Showing posts with label statistical physics.

Tuesday, 22 April 2025

Numerical stability showcase: Ranking with SoftMax or Boltzmann factor

Preamble 

Image: Babylonian table for computation (Wikipedia)
One of the most important aspects of computational work in quantitative fields, such as physics and the data sciences, is the stability of numerical computations. For given inputs, the outputs should not blow up to very large numbers or distort derived results, such as rankings based on scores, one of the most common computations in data science tasks like classification and clustering. In this short post, we provide a striking example of how SoftMax produces misleading results if applied naively.

SoftMax: Normalisation with the Boltzmann factor

SoftMax is really more of a physics concept than a data science one; its most common use in data science is for ranking. Given a vector with components $x_{i}$, the softmax reads $$\frac{\exp(x_{i})}{\sum_{j} \exp(x_{j})}.$$ This expression originates from statistical physics, namely the Boltzmann factor.

Source of Numerical Instability

Using the exponential function inside the sum in the denominator creates a numerical instability when one entry of the vector deviates significantly from the others: the exponential of the large entry overflows, or completely dominates the sum, driving all other entries of the softmax output to zero.
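To make this concrete, here is a minimal sketch (an illustration added here, not part of the original post) contrasting a naive evaluation of the formula above with the library routine, using PyTorch's default single precision:

import torch

x = torch.tensor([1.4, 1.5, 1.6, 170.0])  # default dtype is float32

# Naive softmax written out directly from the formula:
# exp(170) overflows float32 to inf, so the result degenerates.
naive = torch.exp(x) / torch.exp(x).sum()
print(naive)  # tensor([0., 0., 0., nan])

# The library routine is implemented in a numerically stable way
# (shifting by the maximum) and stays finite.
print(torch.softmax(x, dim=0))  # tensor([0., 0., 0., 1.])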

Example Instability: Ranking with SoftMax

Let's say we have the following scores 

scores = [1.4, 1.5, 1.6, 170]

for teams A, B, C, and D on some metric, and we want to turn these scores into a probabilistic interpretation with SoftMax. The result reads [0., 0., 0., 1.]: D comes out on top, but A, B, and C are tied at zero.

LogSoftMax

We can rectify this instability by using LogSoftMax, which reads $$\log \exp(x_{i}) - \log\left(\sum_{j} \exp(x_{j})\right).$$ For the scores above this yields [-168.6000, -168.5000, -168.4000, 0.0000], so we can induce a ranking without ties: D, C, B, A.

Conclusion

There is a similar practice in statistics for likelihood computations, since Gaussian densities involve exponentials repeatedly. Working with the logarithm of such quantities stabilises the numerical issues caused by repeated exponentiation. This illustrates the importance of being aware of numerical pitfalls in the data sciences.
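As a rough illustration of this practice (a sketch added here, not from the original post), the joint likelihood of many Gaussian samples underflows when densities are multiplied directly, while the sum of log-densities stays finite:

import torch

normal = torch.distributions.Normal(loc=0.0, scale=1.0)
samples = normal.sample((10_000,))

# Product of densities underflows to zero in finite precision ...
naive_likelihood = torch.exp(normal.log_prob(samples)).prod()
print(naive_likelihood)  # tensor(0.)

# ... whereas the log-likelihood remains a well-behaved finite number.
log_likelihood = normal.log_prob(samples).sum()
print(log_likelihood)  # on the order of -1.4e4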

Cite as follows

 @misc{suezen25softmax, 
     title = {Numerical stability showcase: Ranking with SoftMax or Boltzmann factor}, 
     howpublished = {\url{https://memosisland.blogspot.com/2025/04/softmax-numerical-stability.html}}, 
     author = {Mehmet Süzen},
     year = {2025}
}  

Appendix: Python method 

A Python method using PyTorch to compute the softmax example from the main text.

import torch

List = list
Tensor = torch.Tensor


def get_softmax(scores: List, log: bool = False) -> Tensor:
    """
    Compute the softmax of a list of scores.

    Defaults to plain SoftMax; set log=True for LogSoftMax.
    """
    scores = torch.tensor(scores)
    if log:
        scores = torch.log_softmax(scores, dim=0)
    else:
        scores = torch.softmax(scores, dim=0)
    return scores
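As a usage sketch, calling the method on the scores from the main text reproduces the quoted outputs:

scores = [1.4, 1.5, 1.6, 170]

print(get_softmax(scores))            # tensor([0., 0., 0., 1.])
print(get_softmax(scores, log=True))  # tensor([-168.6000, -168.5000, -168.4000, 0.0000])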




Saturday, 24 February 2024

Inducing time-asymmetry on reversible classical statistical mechanics via Interventional Thermodynamic Ensembles (ITEs).

Preamble 

Probably one of the most fundamental issues in classical statistical mechanics is extending reversible dynamics to many-particle systems that behave irreversibly; in other words, how time's arrow appears even though the constituent systems evolve under reversible dynamics. This is the essence of Loschmidt's paradox. One resolution to this paradox lies in something called interventional thermodynamic ensembles (ITEs).

Image: Leaning Tower of Pisa, recalling Galileo's experiments (Wikipedia)

Time-asymmetry is about different histories: Counterfactual dynamics

Before trying to understand how ITEs are used to resolve Loschmidt's paradox, we note that inducing different trajectories on an identical dynamical system in "a parallel universe" implies time-asymmetry; a single trajectory, here, provides reversibility. The so-called "parallel universe" amounts to imagining a different dynamics via a different sampling, which corresponds to counterfactuals within causal inference frameworks.

Interventional Thermodynamic Ensembles (ITEs)

An interventional ensemble builds upon another ensemble; for the sake of simplicity, we can think of an ensemble as an associated, chosen sampling scheme. From this perspective, a sampling scheme $\mathscr{E}$ has an interventional counterpart $do(\mathscr{E})$ if the adjusted scheme only introduces a change that does not alter the inherent dynamics but does affect the dynamical history. One of the first examples of this appeared recently: single-spin-flip vs. dual-spin-flip dynamics [suezen23], demonstrated with simulations.
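To sketch the idea in code (a minimal illustration under simplifying assumptions, not the simulation code of [suezen23]; see the h-do-conjecture repository for that), single-spin-flip and dual-spin-flip Metropolis schemes on a 1D Ising chain leave the same Boltzmann distribution invariant yet generate different dynamical histories:

import numpy as np

rng = np.random.default_rng(2023)

def energy(spins: np.ndarray) -> float:
    """Nearest-neighbour Ising energy on a periodic chain (J = 1, no field)."""
    return -float(np.sum(spins * np.roll(spins, 1)))

def metropolis_sweep(spins: np.ndarray, beta: float, flips_per_move: int = 1) -> np.ndarray:
    """One Metropolis sweep; flips_per_move=1 is single-spin-flip dynamics,
    flips_per_move=2 a dual-spin-flip scheme: same equilibrium, different history."""
    spins = spins.copy()
    n = spins.size
    for _ in range(n):
        sites = rng.choice(n, size=flips_per_move, replace=False)
        trial = spins.copy()
        trial[sites] *= -1
        delta_e = energy(trial) - energy(spins)
        if delta_e <= 0 or rng.random() < np.exp(-beta * delta_e):
            spins = trial
    return spins

state = rng.choice([-1, 1], size=64)   # identical initial state for both schemes
single = metropolis_sweep(state, beta=0.5, flips_per_move=1)
dual = metropolis_sweep(state, beta=0.5, flips_per_move=2)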

Outlook

Reversibility and time-asymmetry in classical dynamics are long-standing issues in physics. Introducing a causal inference perspective into the computation of the dynamical evolution of many-body systems leads to a reconciliation of reversibility and time-asymmetry, i.e., via the interpretation of the $do$-operator.

References

[suezen23] M. Süzen, H-theorem do-conjecture (2023), arXiv:2310.01458 (simulation code on GitHub).

Please Cite as:

 @misc{suezen24ite, 
     title = {Inducing time-asymmetry on reversible classical statistical mechanics via  Interventional Thermodynamic Ensembles (ITEs)}, 
     howpublished = {\url{https://memosisland.blogspot.com/2024/02/inducing-time-asymmetry-on-reversible.html}}, 
     author = {Mehmet Süzen},
     year = {2024}
}  





Saturday, 14 October 2023

Ising-Conway lattice-games: Understanding increasing entropy

Preamble

Entropy is probably one of the most difficult physical concepts to grasp. Its inception is rooted in the efficiency of engines and in the foundational connection between multi-particle classical mechanics and thermodynamics, i.e., from kinetic theory to thermo-statistics. However, computing entropy for a physical system is a difficult task, as most real physical systems lack an explicit formulation. Apart from advanced simulation techniques that invoke thermodynamic expressions, a pedagogically accessible and physically plausible system is lacking in the literature. Addressing this, we explore here the recently proposed Ising-Conway games.

Figure: Evolution of an Ising-Conway game (arXiv:2310.01458)

Ising-Conway Lattice-Games (ICG)

The Ising-Lenz model is probably one of the landmark models in physics; remarkably, it reaches beyond its idealised case of magnetic domains and now impacts even quantum computational research. However, computing the entropy of Ising-Lenz models is still quite difficult. On the other hand, Conway introduced a game whose simple dynamical rules generate complexity of various orders. By analogy to these two modelling approaches, we recently introduced a game-like physical system of spins, or lattice sites, on a finite space with constraints. This gives physically plausible but simpler dynamics for generating trajectories, whereas vanilla Ising models require more involved Monte Carlo techniques. Here are the configuration and dynamics of Ising-Conway games:

  1. $M$ sites as a fixed space.
  2. $N$ occupied sites, or 1s.
  3. The configuration $C(M,N,t)=C(i)$ changes over time, but at $t=0$ all occupied sites sit at one corner.
  4. Occupied sites can only move to neighbouring sites if those are empty; this is closely related to the spin-flip dynamics of the Ising model.
  5. No two sites occupy the same lattice cell (Pauli exclusion).
  6. The configuration must stay contained within the $M$ cells.
An example evolution is shown in the figure above; a minimal simulation sketch is given below.
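A minimal one-dimensional sketch of these rules (my own illustration; the reference implementation lives in the h-do-conjecture repository) could look like this:

import numpy as np

rng = np.random.default_rng(7)

def step(config: np.ndarray) -> np.ndarray:
    """One move of an Ising-Conway lattice-game: pick a random occupied site
    and move it to a randomly chosen empty neighbour, staying within the M cells."""
    config = config.copy()
    m = config.size
    occupied = np.flatnonzero(config)
    site = rng.choice(occupied)
    target = site + rng.choice([-1, 1])
    if 0 <= target < m and config[target] == 0:  # empty neighbour inside the lattice
        config[site], config[target] = 0, 1
    return config

# N = 5 occupied sites packed into the corner of M = 20 cells at t = 0
config = np.zeros(20, dtype=int)
config[:5] = 1
for t in range(100):
    config = step(config)
print(config)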

Defining ensemble Entropy on ICG

Now we are in a position to define the entropy for ICGs, which is easy to grasp both conceptually and computationally. $C(i, t) \in \{0,1\}$ defines the states of the game. We build an ensemble at a given time $t$ by taking the region enclosed by the 1s. The dimensionality of this ensemble is $$k(t) = \mathrm{argmax}[\mathbb{I}(C(i))] - \mathrm{argmin}[\mathbb{I}(C(i))],$$ where $\mathbb{I}$ returns the indices of the 1s on the lattice. This ensemble closely tracks the maximum entropy of the system at a given time.
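The dimensionality $k(t)$ is straightforward to compute from a configuration; a small sketch, added here for illustration:

import numpy as np

def ensemble_dimension(config: np.ndarray) -> int:
    """k(t) = argmax[I(C(i))] - argmin[I(C(i))]: the span of the region enclosed by the 1s."""
    occupied = np.flatnonzero(config)  # indices of the 1s on the lattice
    return int(occupied.max() - occupied.min())

# For C = [0, 1, 0, 1, 1, 0, 0, 1, 0, 0], k = 7 - 1 = 6.
print(ensemble_dimension(np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 0])))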

Conclusions

We have introduced a new game-like system that helps us understand entropy increase, has plausible physical characteristics, and is easy to simulate.

Further reading

  • H-theorem do-conjecture, M. Süzen, arXiv:2310.01458
  • Effective ergodicity in single-spin-flip dynamics, Mehmet Süzen, Phys. Rev. E 90, 03214
  • The do_ensemble module provides such a simulation via simulate_single_spin_flip_game, from the repo h-do-conjecture.

Please cite as 

 @misc{suezen23iclg, 
     title = {Ising-Conway lattice-games: Understanding increasing entropy}, 
     howpublished = {\url{https://memosisland.blogspot.com/2023/10/ising-conway-games-entropy-increase.html}}, 
     author = {Mehmet Süzen},
     year = {2023}
}  


Saturday, 28 March 2020

Book review: A tutorial introduction to the mathematics of deep learning

Preamble
Artificial Intelligence Engines: An Introduction to the Mathematics of Deep Learning, by Dr James V. Stone (the book and its GitHub repository), (c) 2019 Sebtel Press.
Deep learning and associated connectionist approaches are now applied routinely in industry and academic research, from image analysis to natural language processing and areas as cool as reinforcement learning. As practitioners, we use these techniques through well-designed, tested, and reliable libraries like TensorFlow or PyTorch, shipped as black-boxed algorithms. However, most practitioners lack foundational mathematical knowledge and an understanding of the core algorithms. Unfortunately, many academic books and papers subliminally try to give an impression of superiority and avoid a simple pedagogical approach. In this post we review a unique book that tries to fill this gap with a pedagogical approach to the mathematics of deep learning, avoiding a show of mathematical complexity and aiming instead at conveying an understanding of how things work from the ground up. Moreover, the book provides pseudo-code that one can use to implement things from scratch, along with a supporting implementation in a GitHub repo. The author, Dr James V. Stone, a trained cognitive scientist and researcher in mathematical neuroscience, has provided such an approach in his other books for many years now, writing for students rather than to show off to his peers. One important note: this is not a cookbook or a practice tutorial, but an upper-intermediate level academic book.

Building associations and classify with a network

The logical and conceptual separation of association and classification tasks is introduced in the initial chapters. It is ideal to start from learning one association with one connection, with a gentle introduction to gradient descent for learning the weights, before moving to two associations and two connections. This reminds me of George Gamow's "1, 2 and infinity" as a pedagogical principle. The perceptron is introduced later, showing how classification rules can be generated by a network and the problems it encounters with the XOR problem.

Backpropagation, Hopfield networks and Boltzmann machines

A detailed implementation of backpropagation is provided from scratch, with great clarity and without too much cluttered index notation; it is probably the best explanation I have ever encountered. The following chapters introduce Hopfield networks and Boltzmann machines, from the ground up to the applied level. Unfortunately, many modern deep learning books skip these two great models, but Dr Stone makes them implementable for a practitioner through his chapters. It is very impressive. I am a bit biased towards Hopfield networks, as I see them as an extension of Ising models and their stochastic counterparts, but I have not seen anywhere else such an explanation of how to use Hopfield networks for learning, together with a pseudo-code algorithm to use in a real task.

Advanced topics

Personally, I see the remaining chapters as advanced topics: deep Boltzmann machines, variational autoencoders, GANs, and an introduction to reinforcement learning, with the probable exception of deep backpropagation in Chapter 9. I would say that what is now known as deep learning had its inception in the architectures mentioned in sections 9.1 to 9.7.

Glossary, basic linear algebra and statistics

The appendices provide a fantastic conceptual introduction to the jargon and the basics of the main mathematical techniques. Of course, this isn't a replacement for a fully-fledged linear algebra or statistics book, but it provides immediate, concise explanations.

Not a cookbook: Not import tensorflow as tf book

One other crucial advantage of this book is that it is definitely not a cookbook. Unfortunately, almost all books related to deep learning are written in a cookbook style; this book is not. However, it is supplemented by a full implementation in a repository supporting each chapter.

Conclusion

This little book achieves so much with a down-to-earth approach, introducing basic concepts with a respectful attitude that assumes the reader is very smart but inexperienced in the field. Whether you are a beginner or an experienced research scientist, this is a must-have book. I still see it as an academic book, and it could be used in an upper-undergraduate class as the main text for an elective such as "Mathematics of Deep Learning".

Enjoy reading and learning from this book. Thank you, Dr Stone, for your efforts in making academic books more accessible.

Disclosure: I received a review copy of the book but I have bought another copy for a family member. 

Friday, 27 September 2019

On modern data scientist: A blind empiricist is not a data scientist

As graciously mentioned by Professor Pearl on Twitter.

Preamble
Image: Hubble Space Telescope (Wikipedia). Computational science is to the modern data scientist as the telescope is to astrophysics.

"A better software developer than a statistician and a better statistician than a software developer" would have been a good definition, in the early 2010s, of who counts as a data scientist. In the late 2010s the trend changed dramatically: a data scientist is now identified as someone who can run any set of data through machine learning libraries and get a model deployed as a service. Unfortunately, this blind empiricism is now considered data science practice in many industrial settings, and the term "scientist" has lost its intellectual meaning, turning into the mass hysteria of blindly producing "junk science" in the name of the "democratisation of data science".

Who is the modern data scientist? 

Modern data science actually goes beyond statistics and machine learning. The modern data scientist practices computational science, from dynamical systems to game theory or graph theory; one could think of such practice as applied mathematics or statistical physics as well. For example, most neural network models actually originate from statistical physics. In that sense, a modern data scientist is a computational scientist building the mechanics of data.


  1. Exploratory analysis goes beyond basic PCA or clustering, aiming to form causal relationships or establish the mechanics of the data.
  2. Expresses the mechanics of data in mathematical models and builds parametric inference; not all parameter estimation is learning.
  3. Uses machine learning algorithms from libraries while knowing the underlying algorithm, and relates it to the mechanics of the data.
  4. Builds algorithms fusing the above work.
  5. Produces explainable and transparent work.
  6. Documents findings as in scientific papers and scientific software.

Ignoring the above practice and treating data science like a web-based software development activity is not fair and is an immense waste of time. Organisations should understand that investing in data science means investing in the new computational science of building the mechanics of data. Pushing the outcome of such scientific practice into real-world impact depends on the novelty produced by the scientist and, as with any scientific funding, is a risky investment.

Misconception in democratisation of data science

The democratisation of data science does not mean that anyone should build learning or statistical models by pushing lots of data through machine learning libraries to obtain a black-box model as a blind empiricist. Democratisation was about the availability of tools and services at very low cost and an open culture of transparency in algorithmic and software work.

Artificial Intelligence is modern data science

The separation of AI from the above definition of data science is not really clear, since AI combines the same characteristics to build so-called intelligent agents.

Conclusion

Having a perspective on and understanding of what modern data science is about will help organisations better orient themselves in building modern data science capabilities.

Postscript: Further reading and on the mechanics of data

We used the term "the mechanics of data"; it refers to the effort put into finding signatures of causal relationships and making sense of the correlations within the data. The reason for the name is that one of the core scientific methods that gave rise to modern science lies in Newton-Leibniz mechanics. Coveney and his co-workers dive deep into the intricacies of practising science and data science:
  • Big data: the end of the scientific method?
    Sauro Succi and Peter V. Coveney
    [article]
  • Big data need big theory too
    Peter V. Coveney, Edward R. Dougherty and Roger R. Highfield
    [article]
Post-Postscript
Judea Pearl, a pioneering scientist in the field of causal inference, a quiet revolution in statistics and data science, and a Turing Award laureate, has a similar critique of excessive empiricism. His post explains:

Radical Empiricism and Machine Learning Research, which is also published as an article here: doi

(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

This work is licensed under a Creative Commons Attribution 4.0 International License.