Preamble
Image: Babylonian table for computation (Wikipedia)
SoftMax: Normalisation with Boltzmann factor
SoftMax is really more a physics concept than a data science one. Its most common use in data science is ranking: turning a vector of scores into a probability-like distribution. Given a vector $x_{i}$, the softmax reads $$\mathrm{softmax}(x_{i}) = \exp(x_{i}) \Big/ \sum_{j} \exp(x_{j}).$$ This expression originates from statistical physics, where it appears as the Boltzmann factor.
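As a minimal sketch, the expression above can be written as a small NumPy function (the function name and the example scores are illustrative choices):

```python
import numpy as np

def softmax(x):
    """Naive softmax: exp(x_i) / sum_j exp(x_j), exactly as in the formula above."""
    e = np.exp(x)
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # ~[0.09, 0.245, 0.665], sums to 1
```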
Source of Numerical Instability
The exponential function in the numerator and in the summed denominator creates a numerical instability whenever one entry of the vector deviates strongly from the others: the exponential of the largest entry dominates the sum (or overflows entirely), and the softmax output for all other entries collapses to zero.
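A rough illustration of this, using the naive NumPy implementation above; the dominating values 120.0 and 800.0 are arbitrary choices:

```python
import numpy as np

def softmax(x):
    e = np.exp(x)      # the exponential of a large entry dominates the sum, or overflows
    return e / e.sum()

# One entry far above the rest: the other outputs shrink to ~1e-52, numerically zero.
print(softmax(np.array([1.0, 2.0, 120.0])))

# Larger still: np.exp overflows above ~709 in float64, producing inf/inf = nan.
print(softmax(np.array([1.0, 2.0, 800.0])))
```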
Example Instability: Ranking with SoftMax
Let's say we have the scores $[0.4, 0.5, 0.6, 169.0]$ for teams A, B, C, D on some metric, and we want to turn them into a probabilistic interpretation with SoftMax. The result reads $[0., 0., 0., 1.]$: D comes out on top, but A, B, C are tied at zero (the PyTorch computation is given in the Appendix).
LogSoftMax
We can rectify this instability by using LogSoftMax, which reads $$\log \mathrm{softmax}(x_{i}) = \log \exp(x_{i}) - \log\Big(\sum_{j} \exp(x_{j})\Big) = x_{i} - \log\Big(\sum_{j} \exp(x_{j})\Big).$$ For the scores above this gives $[-168.6000, -168.5000, -168.4000, 0.0000]$, so we obtain a ranking without ties: D, C, B, A.
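For reference, a short PyTorch sketch reproducing these numbers with the built-in torch.log_softmax; the argsort step for the ranking is an illustrative addition:

```python
import torch

scores = torch.tensor([0.4, 0.5, 0.6, 169.0])   # teams A, B, C, D

log_probs = torch.log_softmax(scores, dim=0)
print(log_probs)       # tensor([-168.6000, -168.5000, -168.4000,    0.0000])

# Higher log-probability means a higher rank: indices 3, 2, 1, 0 -> D, C, B, A.
print(torch.argsort(log_probs, descending=True))
```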
Conclusion
There is a similar practice in statistics for likelihood computations, since Gaussian densities bring in exponentials repeatedly. Taking the log of the given operations stabilises the numerical problems caused by repeated exponentiation. This shows the importance of being aware of numerical pitfalls in data science.
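As a hedged illustration of this practice, assuming a synthetic sample and a hand-written Gaussian density, the product of densities underflows to zero while the sum of log-densities stays finite:

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Standard Gaussian density: one exponential per observation."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
data = rng.normal(size=2000)                  # synthetic sample, purely for illustration

print(np.prod(gaussian_pdf(data)))            # 0.0: the product of densities underflows
print(np.sum(np.log(gaussian_pdf(data))))     # finite: the log-likelihood stays stable
```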
Appendix: Python method
A Python method, using PyTorch, that computes the SoftMax example from the main text.
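A minimal sketch, assuming PyTorch; the function name softmax_ranking and the float32 dtype are illustrative choices:

```python
import torch

def softmax_ranking(scores, log=False):
    """(Log-)SoftMax of a score vector, as in the ranking example of the main text."""
    t = torch.as_tensor(scores, dtype=torch.float32)
    return torch.log_softmax(t, dim=0) if log else torch.softmax(t, dim=0)

if __name__ == "__main__":
    scores = [0.4, 0.5, 0.6, 169.0]                # teams A, B, C, D
    print(softmax_ranking(scores))                 # tensor([0., 0., 0., 1.])
    print(softmax_ranking(scores, log=True))       # tensor([-168.6000, -168.5000, -168.4000, 0.0000])
```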