Saturday, 28 September 2024

Matrix language conjecture for combinatorics:
Combinatorial set generation via nested cartesian products

Rubik's cube (Wikipedia)

Preamble
 

Counting is one of the most important concepts in probability and statistical mechanics as well. Two primary characteristics of choosing $n$ items from $N$ items are (no)-order and (no)-repeat. This leads to four  possible cases that leads to combinations and permutations of different kinds to found the resulting set size. Specially the case for repeated no-ordering would be quite hard to memorise or derive on the spot. There is a catch, we can compute the set sizes in combinatorics but usually there is no explicit explanation how could we get the entire combinatorial set member via a computation.  In this short exposition, we have demonstrated a conjecture  that obtaining resulting combinatorial set members is possible via nested cartesian products of the set of the $N$ items: Concepts from matrices help to identify resulting set members in the given combinatorics case.  

Compact Combinatorics

Combinatorics over choosing $n$ items over $N$ has four cases as mentioned. Here is the formulations of choosing sets over $N$ items of size $n$ with different constraints, resulting set sizes will be: 

  • Ordering matters with replacement : $N^n$
  • Ordering matters without replacement: $\Large \frac{N!}{(N-n)!} $
  • Ordering doesn't matter with replacement: $\Large \frac{(n+N-1)!}{n!(N-1)!}$
  • Ordering doesn't matter without replacement: $\Large \frac{N!}{n!(N-n)!} $
However, it is often we forget the formulations of the each case, even to identify the case given a problem.

Obtaining resulting sets: Combinatorial Matrix

We know the size of the resulting sets if we do the choose over a set given $n$. However, the members of the resulting set is not usually known via a computational procedure in standard texts. However, this is possible via cartesian product of the initial set by itself if $n=2$, i.e., pairwise matrices over $N$ item, a combinatorial matrix.     

Matrix language conjecture for combinatorics: A combinatorial matrix has different portions. An entire matrix (E), diagonals (D), upper diagonal (UD) and lower diagonal (LD).  These portions or combinations of them corresponds to the cases above, somehow obvious after a close look: 

  • Ordering matters with replacement : E
  • Ordering matters without replacement: E-D
  • Ordering doesn't matter with replacement:   UD+D or LD+D   
  • Ordering doesn't matter without replacement: UD or LD
By choosing certain portions of the matrix we full-fill constraints of generating new sets. 

Nested Combinatorial Matrices ($n \ge 3$  cases) 

It is obvious that the above conjecture is valid for $n=2$. In the case of where $n \ge 3$ we would need to do more cartesian products over a combinatorial matrix selectively. The idea is as follows: we generate new cartesian products with $N$ items row-wise with removal of the base row at each level due to portions selected. Here is the sketch of the nested cartesian products: 

  1. We start with the case of $n=2$ and generate the matrix.  Retain the corresponding matrix portion.
  2. Then repeat portion for the next levels till exhaustion that items are size $n$: We do cartesian product of the given row to all items again to generate new combinatorial matrix of each element, except the elements deleted from the row due to portion selection, if deleted.
  3. At the end we obtain multiple matrices of possibly different sizes, then choose the portions again to form the resulting set of combinatorial items. 

Conclusion

We have discussed shortly how to leverage a matrix language in finding the resulting sets in core combinatorics counting operations. This could be used as a pedagogical tool, algorithmic development and bringing new relationships between matrices and combinatorial formulaThis is still a conjecture, but taught provoking one that connects matrix generation to combinatorial structures.  

Cite this as follows: 

 @misc{suezen24mcg, 
     title = { Matrix language conjecture for combinatorics: Combinatorial set generation via nested cartesian products}, 
     howpublished = {\url{https://memosisland.blogspot.com/2024/09/combinatorics-touch-of-linear-algebra.html}}
     author = {Mehmet Süzen},
     year = {2024}
}  




Saturday, 1 June 2024

A new kind of AutoML: On the fallacy of many-shot in-context learning with LLMs and PLMs

Preamble 

A graph path (Wikipedia)

With the common usage of  Pre-trained Large  Language Models (PLMs/LLMs), now it is possible to direct them to make data analysis, generate predictions or code for very specialised tasks without training. The primary approach in doing such automated analysis is using many-shot learning. In this short post, it is pointed out that pushing analysis into meta model doesn't remove the analyst, as there are now some claims doing so. This is a common generalisation fallacy in many AI systems. Humans are actually are still in the loop and certain automation presented as if human's are removed entirely is not a fair representation of the capabilities and advantages brought by these models. 

How to direct PLMs to do new-prediction without training?

This is quite an attractive premise. Using a foundational model, we could update their ability of prediction beyond training data by simply inducing a memory in their input, so called Chain-of-Though (CoT). At this point, we can deploy the model for an automated task with invoking CoT before its first prompt. This is a new kind of AutoML approach, that supervised learning takes a new great-look: pushes training efforts to building chain-of-thought datasets. Building CoTs appears to be a meta modelling tasks and requires domain knowledge to develop.

Did we really remove the data analyst, software developer or the practitioner from the loop? 

Short answer is No. What we did by is a new way of performing a specialised task, i.e., modelling is moved into a meta modelling stage. Meaning a memory induced by CoT is  designed by experienced human and it will be broken if the task or input data deviates a little differently. A usual culprit in play here; a reliability is problematic. Even though this is quite a promising development and potential to be a game changer on how we practice machine learning, error-rates are still quite high, see Agarwal-Singh et. al. (2024).  

Conclusion 

In this short exposition, we argue that many-shot learning, chain-of-thought learning for foundation models is actually a new kind of AutoML tool. AutoML doesn't imply that human expert is removed from the picture completely. However it greatly assist the scientist, analyst, programmer or practitioner, in automating some tasks as meta programming tool. It will also help less-experienced colleague to start being a bit more productive. These tools indeed improves our computational tool boxes, specifically as a new AutoML tool.  

Further reading

Please cite as follows:

 @misc{suezen24pmlauto, 
     title = {A new kind of AutoML: On the fallacy of many-shot in-context learning with LLMs and PLMs}, 
     howpublished = {\url{https://memosisland.blogspot.com/2024/05/llm-analysis-fallacy.html}, 
     author = {Mehmet Süzen},
     year = {2024}
}  

Saturday, 24 February 2024

Inducing time-asymmetry on reversible classical statistical mechanics via Interventional Thermodynamic Ensembles (ITEs).

Preamble 

Probably, one of the most fundamental issue in classical statistical mechanics is extending reversible dynamics to many-particle systems that behaves irreversibly. In other words, how time's arrow appears even though constituted systems evolves in reversible dynamics. This is the main idea of Loschmidt's paradox. The resolution to this paradox lies into something called interventional thermodynamic ensembles (ITEs).  

Leaning Tower
of Pisa:Recall Galileo's 
Experiments
 (Wikipedia)

Time-asymmetry is about different histories : Counterfactual dynamics

Before trying to understand how ITEs are used in resolving Loschmidt's paradox, we understand that inducing different trajectories on an identical dynamical system in "a parallel universe" implies time-asymmetry. A trajectory provides here a reversibility.  So called "a parallel universe" is about imagining a different dynamics via a sampling, this corresponds to counterfactuals within Causal inference frameworks. 

Interventional Thermodynamic Ensembles (ITEs)

Interventional ensemble build upon an other ensemble, for the sake of simplicity, we can think of an ensemble as an associated chosen sampling scheme. From this perspective,  sampling scheme $\mathscr{E}$ would have an interventional sampling $do(\mathscr{E})$ if the adjusted scheme only  introduces a change in the scheme that doesn't change the inherent dynamics but effects the dynamical  history. One of the first examples of this is appeared recently: single-spin-flip vs. dual-spin-flip dynamics [suezen23]. This is shown with simulations. 

Outlook

Reversibility and time-asymmetry in classical dynamics are a long standing issues in physics. By inducing causal inference perspective in computing dynamical evolution of many body systems leads to reconciliation of reversibility and time-asymmetry i.e., $do-$operator's interpretation.

References

[suezen23] H-theorem do-conjecture (2023) arXiv:2310.01458 (simulation code GitHub).

Please Cite as:

 @misc{suezen24ite, 
     title = {Inducing time-asymmetry on reversible classical statistical mechanics via  Interventional Thermodynamic Ensembles (ITEs)}, 
     howpublished = {\url{https://memosisland.blogspot.com/2024/02/inducing-time-asymmetry-on-reversible.html}, 
     author = {Mehmet Süzen},
     year = {2024}
}  





Friday, 9 February 2024

Exact reproducibility of stochastic simulations for parallel and serial algorithms simultaneously
Random Stream Chunking

Preamble 

Figure: Visual description of
random stream chunking, M.Suzen (2017)
The advent of using computational sciences approaches, i.e., data science and machine learning, in the industry becomes more common practice in almost all organisations due to the democratisation of data science tools and availability of inexpensive cloud infrastructure. This brings the requirement or even compulsory practice of a code being so called parallelised. Note that parallelisation is used as an umbrella term here for using multiple compute resources in accelerating otherwise a serial computation and could mean a distributed or multi-core computing, i.e., multiple CPUs/GPUs. Here we provide a simple yet a very powerful approach to preserve reproducibility of parallel and serial implementation of the same algorithm that uses random numbers, i.e., stochastic simulations. We assume Single Instruction Multiple Data (SIMD) setting. 

Terminology on repeatability, reproducibility and  replicability 

Even though we only use reproducibility as a term as an umbrella term, there are clear distinctions, recommended by ACM, here. We summarise this from computational science perspective :

repeatability : Owner re-run the code to produce the same results on own environment. 
reproducibility: Others can  re-run the code to produce the same results on other's environment. 
replicability: Others can re-code the same thing differently and produce the same results on other's environment. 

In the context of this post, since parallel and serial settings would constitute different environments, the practice of getting the same results, this falls under reproducibility.   

Single Instruction Multiple Data (SIMD) setting. 

This is the most used, and probably the only one you would ever need, approach in parallel processing. It implies using the same instruction, i.e., algorithm or function/module, for the distinct portions of the data. This originates from applied mathematics techniques so called domain decomposition method

Simultaneous Random Stream Chunking 

The approach in ensuring exact reproducibility of a stochastic algorithm in both serial and parallel implementation lies in default chunking in producing the random numbers both in serial and parallel code. This approach used in the Bristol Python package Here is the mathematical definition. 

Defintion Random Stream Chunking Given a random number generator $\mathscr{R}(s_{k})$ with as seed $s_{k}$ is used over $k$-partitions. These partitions are always corresponds to $k$ datasets, MD portion of SIMD. In the case of parallel algorithms, each $k$ corresponds to a different compute resource     $\mathscr{C}_{k}$. 

By this definition we ensure that both parallel and serial processing receiving exactly the same random number, this is summarised in the Figure.

Conclusion

We outline a simple yet effective way of ensuring exact reproducibility of serial and parallel simulations simultaneously. However, reproducibility of stochastic simulations are highly hardware dependent as well, such as GPUs and NPUs, and their internal streams might not be that easy to control partitions, but generic idea presented here should be applicable.


Please cite this article as: 

 @misc{suezen24rep, 
     title = {Exact reproducibility of stochastic simulations for parallel and serial algorithms simultaneously}, 
     howpublished = {\url{https://memosisland.blogspot.com/2024/02/exact-reproducibility-of-stochastic.html}, 
     author = {Mehmet Süzen},
     year = {2024}
}