Wednesday, 12 November 2025

Understanding high-entropy random number generation for research: Leymosun Package

Preamble

The idea of generating randomness via computers goes back to the work of von Neumann and his colleagues at Princeton and Los Alamos. Even though he was sceptical about generating randomness via computer programs, the practice is now almost a de-facto standard, hence the name pseudo-random number generator (PRNG). Is there a way to generate better PRNGs? This is an ongoing line of research. The practical matter of seeding is often ignored, because a good PRNG should be reliable regardless of the seed; however, repeated calls to the same random sequence may undermine this assumption.

PRNGs bird's eye view : Simulating Randomness

Wigner's Cat (Leymosun Package)
Probably the easiest way to explain RNGs is via discrete recurrence equations; in fact, most RNGs generate numbers this way, $$x_{n+1} = f(x_{n}),$$ where $f$ can be parametrised, for example by $a$ and $b$ as $f(x_{n}) = a x_{n} + b$. This basic form, remarkably, allows us to generate quite useful simulations of randomness.
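As a minimal sketch of such a recurrence, the snippet below implements a linear congruential generator; the modulus and the particular constants are illustrative additions to keep the values bounded, not part of the formula above.

def lcg(x0: int, n: int, a: int = 1664525, b: int = 1013904223, m: int = 2**32) -> list:
    """Iterate x_{n+1} = (a*x_n + b) mod m and return the first n values."""
    values, x = [], x0
    for _ in range(n):
        x = (a * x + b) % m
        values.append(x)
    return values

print(lcg(x0=42, n=5))  # the same x0 always reproduces the same five numbers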

Concept of a seed 

Another concept is the seed: it is usually the first number $x_{0}$ that starts the recurrence, but it can also be some other quantity tied to the generation if $x_{0}$ is fixed or computed by a more involved formula. We usually set the seed to an integer like 42 or 4242; there is production code out there using seeds such as 12345.

Any time we call a randomisation function, such as a sampling routine, the sequence it draws from is fully determined by this seed.
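A minimal sketch of this determinism, using NumPy's generator (the seed value 42 is just the example from above):

import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(rng_a.integers(0, 100, size=5))  # five integers from the seeded sequence
print(rng_b.integers(0, 100, size=5))  # exactly the same five integers, same seed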

How does eliminating fixed seeding improve things? High-Entropy Randomness

If we use a new seed at each call, we generate a different sequence every time and sample from that. This practice improves non-predictability, as we sample from different sequences. The question then is: where do we get the seeds at each call? The answer lies in the "entropy pool" provided by the operating system, such as Unix's "/dev/random". Essentially these are random devices we tap into. The hybrid approach of pseudorandom generation via algorithms, seeded from such random devices, yields increased, high-entropy randomness.
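A minimal sketch of this hybrid scheme, assuming NumPy as the algorithmic PRNG and os.urandom as the tap into the operating system's entropy pool (on Unix this reads from /dev/urandom):

import os
import numpy as np

def high_entropy_sample(size):
    """Draw a fresh seed from the OS entropy pool, then run an algorithmic PRNG."""
    seed = int.from_bytes(os.urandom(8), "little")  # non-deterministic seed per call
    rng = np.random.default_rng(seed)               # deterministic generator, fresh seed
    return rng.random(size)

print(high_entropy_sample(3))  # a different sequence on every call

Note that np.random.default_rng() called without an argument already seeds itself from OS entropy; the explicit version above only makes the hybrid scheme visible.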

Leymosun: Tested implementation

The Leymosun Python package provides this facility: its random module operates as described above, and one doesn't need to provide any seed, as it uses non-deterministic seeding. Using NIST's randomness test suite, we have shown that the high-entropy approach gives higher pass rates compared to a fixed baseline seed, such as 42. To install, just use:

 pip install leymosun

Conclusion

Using high-quality randomness in research is quite critical. We have briefly discussed this, covering the high-entropy approach to using RNGs. 

Further reading and links


Please cite as follows:

@article{suzen21,
  title={Empirical deviations of semicircle law in mixed-matrix ensembles},
  author={S{\"u}zen, Mehmet},
  year={2021},
  journal={HAL-Science},
  url={https://hal.science/hal-03464130/}
}
 @misc{suezen25ley, 
     title = {Understanding high-entropy random number generation for research: Leymosun Package}, 
     howpublished = {\url{https://memosisland.blogspot.com/2025/11/leymosun-high-entropy-randomness.html}}, 
     author = {Mehmet Süzen},
     year = {2025}
}  

Tuesday, 22 April 2025

Numerical stability showcase: Ranking with SoftMax or Boltzmann factor

Preamble 

Image: Babylonian table for computation (Wikipedia)
Probably one of the most important aspects of computational work in quantitative fields, such as physics and data science, is the stability of numerical computations. It means that, for given inputs, the outputs should not wildly deviate to large numbers or distort the results, such as a ranking based on scores, one of the most common computations in data science tasks like classification or clustering. In this short post, we provide a striking example of how SoftMax produces wrong results if applied naively. 

SoftMax:  Normalisation with Boltzmann factor

SoftMax is actually more of a physics concept than a data science one. Its most common usage in data science is for ranking. Given a vector with components $x_{i}$, the softmax is computed with the expression $$\exp(x_{i}) / \sum_{j} \exp(x_{j}).$$ This originates from statistical physics, i.e., the Boltzmann factor. 

Source of Numerical Instability

Using the exponential function in the denominator sum creates numerical instability if one of the numbers in the vector deviates significantly from the others. The largest entry then dominates in finite floating-point precision, and all other entries of the softmax output collapse to zero.
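A minimal NumPy sketch of the naive formula makes this explicit; the vector below is an illustrative one where a single entry dominates (in double precision, $\exp(1000)$ already overflows to infinity):

import numpy as np

x = np.array([1.0, 2.0, 1000.0])
naive = np.exp(x) / np.exp(x).sum()  # exp(1000) overflows to inf
print(naive)                         # [ 0.  0. nan], with overflow warnings

Library implementations such as torch.softmax subtract the maximum internally, which avoids the overflow but, in single precision, still collapses the small entries to zero, as in the example below.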

Example Instability: Ranking with SoftMax

Let's say we have the following scores 

scores = [1.4, 1.5, 1.6, 170]

for teams A, B, C, D on some metric, and we want to turn this into a probabilistic interpretation with SoftMax. The result reads [0., 0., 0., 1.]: we see that D comes out on top, but A, B, C are tied at zero. 

LogSoftMax

We can rectify this by using LogSoftMax, which reads $$\log \exp(x_{i}) - \log\Big(\sum_{j} \exp(x_{j})\Big),$$ giving [-168.6000, -168.5000, -168.4000, 0.0000], so that we can induce a consecutive ranking without ties: D, C, B, A.

Conclusion

There is a similar practice in statistics for likelihood computations, as Gaussians bring in exponentials repeatedly. Taking the logarithm of the given operations stabilises the numerical issues caused by repeated exponentiation. This shows the importance of watching for numerical pitfalls in data science. 

Cite as follows

 @misc{suezen25softmax, 
     title = {Numerical stability showcase: Ranking with SoftMax or Boltzmann factor}, 
     howpublished = {\url{https://memosisland.blogspot.com/2025/04/softmax-numerical-stability.html}}, 
     author = {Mehmet Süzen},
     year = {2025}
}  

Appendix: Python method 

A Python method using PyTorch to compute the softmax example from the main text. 

import torch

List = list
Tensor = torch.Tensor


def get_softmax(scores: List, log: bool = False) -> Tensor:
    """
    Compute the softmax of a list of scores.

    If log is True, compute LogSoftMax instead.
    """
    scores = torch.tensor(scores)
    if log:
        scores = torch.log_softmax(scores, dim=0)
    else:
        scores = torch.softmax(scores, dim=0)
    return scores
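
Calling it on the scores from the main text reproduces the values quoted above:

scores = [1.4, 1.5, 1.6, 170]
print(get_softmax(scores))            # tensor([0., 0., 0., 1.])
print(get_softmax(scores, log=True))  # tensor([-168.6000, -168.5000, -168.4000, 0.0000])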




Saturday, 28 September 2024

Matrix language conjecture for combinatorics:
Combinatorial set generation via nested cartesian products

Rubik's cube (Wikipedia)

Preamble
 

Counting is one of the most important concepts in probability, and in statistical mechanics as well. Two primary characteristics of choosing $n$ items from $N$ items are (no-)ordering and (no-)repetition. This leads to four possible cases, yielding combinations and permutations of different kinds for the resulting set size. Especially the case of repetition without ordering is quite hard to memorise or derive on the spot. There is a catch: we can compute the set sizes in combinatorics, but usually there is no explicit explanation of how to obtain all the members of the combinatorial set via a computation. In this short exposition, we demonstrate a conjecture that obtaining the resulting combinatorial set members is possible via nested cartesian products of the set of $N$ items: concepts from matrices help to identify the resulting set members in each of the combinatorics cases.  

Compact Combinatorics

Combinatorics of choosing $n$ items from $N$ has four cases, as mentioned. Here are the formulations for choosing sets of size $n$ from $N$ items under the different constraints; the resulting set sizes are: 

  • Ordering matters with replacement : $N^n$
  • Ordering matters without replacement: $\Large \frac{N!}{(N-n)!} $
  • Ordering doesn't matter with replacement: $\Large \frac{(n+N-1)!}{n!(N-1)!}$
  • Ordering doesn't matter without replacement: $\Large \frac{N!}{n!(N-n)!} $
However, we often forget the formulation of each case, or even how to identify the case from a given problem; a short standard-library check of the four counts follows below.
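
The sketch below evaluates each of the four formulas for an illustrative choice of $N$ and $n$ (the values $N = 4$, $n = 2$ are only an example):

from math import comb, perm

N, n = 4, 2  # illustrative values

print(N ** n)              # ordering matters, with replacement: N^n = 16
print(perm(N, n))          # ordering matters, without replacement: N!/(N-n)! = 12
print(comb(N + n - 1, n))  # no ordering, with replacement: (n+N-1)!/(n!(N-1)!) = 10
print(comb(N, n))          # no ordering, without replacement: N!/(n!(N-n)!) = 6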

Obtaining resulting sets: Combinatorial Matrix

We know the size of the resulting set when we choose $n$ items over a set. However, the members of the resulting set are usually not obtained via a computational procedure in standard texts. This is nevertheless possible via the cartesian product of the initial set with itself when $n=2$, i.e., a pairwise matrix over the $N$ items, a combinatorial matrix.

Matrix language conjecture for combinatorics: A combinatorial matrix has different portions: the entire matrix (E), the diagonal (D), the upper diagonal part (UD) and the lower diagonal part (LD). These portions, or combinations of them, correspond to the cases above, which becomes fairly obvious after a close look: 

  • Ordering matters with replacement : E
  • Ordering matters without replacement: E-D
  • Ordering doesn't matter with replacement:   UD+D or LD+D   
  • Ordering doesn't matter without replacement: UD or LD
By choosing certain portions of the matrix we fulfil the constraints for generating the new sets, as the sketch below makes concrete for $n=2$.
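
A minimal sketch, assuming $N=3$ illustrative items: build the pairwise combinatorial matrix as a cartesian product over index positions, then keep the portions named above and compare the sizes with the four formulas.

from itertools import product

items = ["a", "b", "c"]                                    # N = 3 illustrative items
pairs = list(product(range(len(items)), repeat=2))         # index pairs of the combinatorial matrix

E    = [(items[i], items[j]) for i, j in pairs]            # entire matrix
E_D  = [(items[i], items[j]) for i, j in pairs if i != j]  # E minus the diagonal
UD_D = [(items[i], items[j]) for i, j in pairs if i <= j]  # upper diagonal plus diagonal
UD   = [(items[i], items[j]) for i, j in pairs if i < j]   # upper diagonal only

print(len(E), len(E_D), len(UD_D), len(UD))                # 9 6 6 3, matching the four formulas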

Nested Combinatorial Matrices ($n \ge 3$  cases) 

It is obvious that the above conjecture is valid for $n=2$. For the cases $n \ge 3$ we need to take further cartesian products over a combinatorial matrix, selectively. The idea is as follows: we generate new cartesian products with the $N$ items row-wise, removing from the base row at each level the elements excluded by the selected portion. Here is the sketch of the nested cartesian products; a minimal Python reading of it follows the list: 

  1. We start with the case $n=2$ and generate the matrix. Retain the corresponding matrix portion.
  2. Then repeat the portion selection at the next levels until the items have size $n$: we take the cartesian product of each retained row with all the items again to generate a new combinatorial matrix for each element, except the elements deleted from the row due to the portion selection.
  3. At the end we obtain multiple matrices, possibly of different sizes, and then choose the portions again to form the resulting set of combinatorial items. 
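
One possible reading of this sketch (an illustration under my own index-wise interpretation of the portion rule, not a definitive implementation) is a recursive generator that, at each level, keeps only the indices the chosen portion allows; it reproduces the four standard itertools generators:

from itertools import combinations, combinations_with_replacement, permutations, product

def nested_cartesian(items, n, ordered, replace):
    """Extend tuples level by level, keeping only the matrix portion allowed at each step."""
    def extend(prefix, used, last):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for j, item in enumerate(items):
            if not replace and j in used:  # portions E-D / UD: drop already-used elements
                continue
            if not ordered and j < last:   # portions UD(+D): keep the upper part only
                continue
            yield from extend(prefix + [item], used | {j}, j)
    return sorted(extend([], frozenset(), 0))

items = ["a", "b", "c"]
assert nested_cartesian(items, 3, ordered=True, replace=True) == sorted(product(items, repeat=3))
assert nested_cartesian(items, 3, ordered=True, replace=False) == sorted(permutations(items, 3))
assert nested_cartesian(items, 3, ordered=False, replace=True) == sorted(combinations_with_replacement(items, 3))
assert nested_cartesian(items, 3, ordered=False, replace=False) == sorted(combinations(items, 3))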

Conclusion

We have briefly discussed how to leverage a matrix language for finding the resulting sets of the core combinatorial counting operations. This could serve as a pedagogical tool, a basis for algorithmic development, and a way of establishing new relationships between matrices and combinatorial formulas. It is still a conjecture, but a thought-provoking one that connects matrix generation to combinatorial structures.  

Cite this as follows: 

 @misc{suezen24mcg, 
     title = { Matrix language conjecture for combinatorics: Combinatorial set generation via nested cartesian products}, 
     howpublished = {\url{https://memosisland.blogspot.com/2024/09/combinatorics-touch-of-linear-algebra.html}},
     author = {Mehmet Süzen},
     year = {2024}
}  




Saturday, 1 June 2024

A new kind of AutoML: On the fallacy of many-shot in-context learning with LLMs and PLMs

An extension of this post appears as a short article with a conjecture:

Preamble 
A graph path (Wikipedia)

With the common usage of Pre-trained Large Language Models (PLMs/LLMs), it is now possible to direct them to do data analysis, generate predictions, or write code for very specialised tasks without training. The primary approach to such automated analysis is many-shot learning. In this short post, it is pointed out that pushing analysis into a meta-model doesn't remove the analyst, even though there are now claims of doing so. This is a common generalisation fallacy in many AI systems. Humans are actually still in the loop, and automation presented as if humans were removed entirely is not a fair representation of the capabilities and advantages brought by these models. 

How to direct PLMs to make new predictions without training?

This is quite an attractive premise. Using a foundation model, we can extend its predictive ability beyond the training data by simply inducing a memory in its input, the so-called Chain-of-Thought (CoT). At that point, we can deploy the model for an automated task by invoking the CoT before its first prompt. This is a new kind of AutoML approach in which supervised learning takes on a new look: the training effort is pushed into building chain-of-thought datasets. Building CoTs appears to be a meta-modelling task and requires domain knowledge to develop.
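
As a rough illustration of "inducing a memory in the input", the sketch below prepends worked examples to the task, i.e., a many-shot prompt; the example pairs and the prompt layout are hypothetical and no particular model API is assumed:

def build_many_shot_prompt(examples, task_input):
    """Prepend worked (input, output) pairs so the model can imitate them in-context."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {task_input}\nOutput:"

# Hypothetical examples encoding the meta-model designed by a human expert.
examples = [("2, 4, 6", "even"), ("1, 3, 5", "odd")]
print(build_many_shot_prompt(examples, "8, 10, 12"))
# The resulting string is what gets sent to the PLM/LLM; no weights are updated.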

Did we really remove the data analyst, software developer or the practitioner from the loop? 

The short answer is no. What we did is find a new way of performing a specialised task: the modelling has moved into a meta-modelling stage. The memory induced by the CoT is designed by an experienced human, and it will break if the task or the input data deviates even slightly. The usual culprit is in play here: reliability is problematic. Even though this is quite a promising development, with the potential to be a game changer in how we practice machine learning, error rates are still quite high, see Agarwal, Singh et al. (2024).  

Conclusion 

In this short exposition, we argue that many-shot, chain-of-thought learning for foundation models is actually a new kind of AutoML tool. AutoML doesn't imply that the human expert is removed from the picture completely. However, it greatly assists the scientist, analyst, programmer or practitioner in automating some tasks, as a meta-programming tool. It will also help less-experienced colleagues become a bit more productive. These tools indeed improve our computational toolboxes, specifically as a new kind of AutoML tool.  

Further reading

Please cite as follows:

 @misc{suezen24pmlauto, 
     title = {A new kind of AutoML: On the fallacy of many-shot in-context learning with LLMs and PLMs}, 
     howpublished = {\url{https://memosisland.blogspot.com/2024/05/llm-analysis-fallacy.html}}, 
     author = {Mehmet Süzen},
     year = {2024}
}  
(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License