A recently demonstrated phenomenon makes it possible to classify deep learning architectures using only the knowledge contained in their trained weights [suezen20a]. The classification produces a measure of equivalence between two trained neural networks and, astonishingly, captures a family of closely related architectures as equivalent within a given accuracy. In this post, we look at this result from a conceptual perspective.
Figure 1: VGG architecture spectral difference in the long positive tail [suezen20a].
Conjugacy is a mathematical construct reflecting the idea that different approaches to the same system should yield the same outcome; it appears in statistical mechanics in the concept of ensembles. However, for matrix ensembles, such as the ones studied in Random Matrix Theory, conjugacy is not well defined in the literature. One possible resolution is to look at the cumulative spectral difference between two ensembles in the long positive tail of the spectrum [suezen20a]. If this difference vanishes, we can say that the two matrix ensembles are conjugate to each other. We observe this behaviour with VGG weight matrices versus circular ensembles.
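To make this concrete, here is a minimal sketch of such a tail comparison in Python/NumPy. The function name `upper_tail_spectral_difference`, the quantile cut-off, and the averaged absolute ECDF difference are illustrative assumptions on my part, not the exact measure used in [suezen20a]; the toy usage compares two Gaussian-like ensembles rather than VGG weights versus a circular ensemble.

```python
import numpy as np

def upper_tail_spectral_difference(eigs_a, eigs_b, tail_quantile=0.9, n_grid=200):
    """Average absolute difference between two empirical spectral CDFs,
    restricted to the long positive tail (illustrative measure only)."""
    eigs_a, eigs_b = np.sort(eigs_a), np.sort(eigs_b)
    # Shared grid over the upper tails of both spectra.
    lo = max(np.quantile(eigs_a, tail_quantile), np.quantile(eigs_b, tail_quantile))
    hi = min(eigs_a[-1], eigs_b[-1])
    grid = np.linspace(lo, hi, n_grid)
    # Empirical CDFs evaluated on the shared grid.
    cdf_a = np.searchsorted(eigs_a, grid, side="right") / eigs_a.size
    cdf_b = np.searchsorted(eigs_b, grid, side="right") / eigs_b.size
    return np.mean(np.abs(cdf_a - cdf_b))

# Toy usage: two independent draws from the same symmetric Gaussian
# ensemble should be (nearly) conjugate, i.e. the difference is small.
rng = np.random.default_rng(42)

def goe_like_spectrum(n=500):
    a = rng.normal(size=(n, n))
    return np.linalg.eigvalsh((a + a.T) / 2.0)

print(upper_tail_spectral_difference(goe_like_spectrum(), goe_like_spectrum()))
```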
Conjugacy is the first step in building equivalence among different architectures. If two architectures are conjugate to the same third matrix ensemble, and the fluctuations of their spectral differences are very close over the spectral locations, they are equivalent within a given accuracy [suezen20a].
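Under the same illustrative assumptions, the equivalence test could then be sketched as below, reusing `upper_tail_spectral_difference` from the previous snippet; `eigs_ref` stands for the common third ensemble (e.g. a circular ensemble in [suezen20a]) and `tol` is a hypothetical accuracy parameter.

```python
def equivalent_within_accuracy(eigs_a, eigs_b, eigs_ref, tol=0.05):
    """Illustrative equivalence check: A and B must both be conjugate to
    the same reference ensemble, and their tail differences must fluctuate
    closely (here crudely summarised by one scalar per pair)."""
    d_a = upper_tail_spectral_difference(eigs_a, eigs_ref)
    d_b = upper_tail_spectral_difference(eigs_b, eigs_ref)
    both_conjugate = d_a < tol and d_b < tol
    close_fluctuations = abs(d_a - d_b) < tol
    return both_conjugate and close_fluctuations
```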
Outlook: Where to use equivalence in practice?
The equivalence can be used for selecting or compressing an architecture, or for classifying different neural network architectures. A Python notebook demonstrating this with different vision architectures in PyTorch is provided here.
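As a warm-up before the notebook, here is a minimal sketch of how one might collect weight spectra from a pretrained torchvision model; restricting attention to 2-D weight matrices and using singular values are my simplifying choices, and the snippet assumes torchvision >= 0.13 for the weights API.

```python
import torch
from torchvision import models

# Collect singular-value spectra of the 2-D (fully connected) weight
# matrices of a pretrained VGG-16.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

spectra = {}
with torch.no_grad():
    for name, param in model.named_parameters():
        if param.ndim == 2:
            spectra[name] = torch.linalg.svdvals(param).cpu().numpy()

for name, s in spectra.items():
    print(f"{name}: top singular value {s[0]:.3f}")
```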
Reference
[suezen20a] Mehmet Suezen, Equivalence in Deep Neural Networks via Conjugate Matrix Ensembles, arXiv:2006.13687 (2020).