Tuesday, 29 December 2020

Practice causal inference: Conventional supervised learning can't do inference

This is a bit philosophical but goes into causal inference.

A trained model may provide predictions for input values it has never seen before, but this is not inference, at least for 'classical' supervised learning. In reality it provides an interpolation from the training set, i.e., function approximation. If "inference" is taken to mean going beyond the training data, then distributional shift, compositional learning or similar types of learning should have been mentioned.
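A minimal sketch of the interpolation point, in plain Python. The 1-nearest-neighbour regressor and the target function y = x^2 are illustrative assumptions, not a method from this post; any classical function approximator trained on a bounded range would show a similar pattern.

```python
def fit_nearest_neighbour(xs, ys):
    """Return a 1-nearest-neighbour regressor (a simple function approximator)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # Predict with the target of the closest training input.
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


# Hypothetical training set: y = x^2 sampled only on [0, 1].
train_x = [i / 100 for i in range(101)]
train_y = [x ** 2 for x in train_x]
model = fit_nearest_neighbour(train_x, train_y)

# Unseen input inside the training range: interpolation works well.
inside_error = abs(model(0.505) - 0.505 ** 2)

# Unseen input far outside the training range: the model has nothing to go on.
outside_error = abs(model(3.0) - 3.0 ** 2)

print(f"error inside range:  {inside_error:.4f}")
print(f"error outside range: {outside_error:.4f}")
```

Both query points are "never seen before", yet only the one inside the sampled range is predicted well, which is the sense in which the model interpolates and does not infer.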

In the case of ontology inference, the ontology being a causal graph, we have a "real" inference, as it symbolically traverses a graph of causal connections. It is not clear whether we can directly transfer that to a regression scenario, but it is probably possible by augmenting our models with structural causal models (SCMs) and a hybrid symbolic-regression approach.
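The symbolic traversal mentioned above can be sketched as a walk over a directed graph. The graph, the node names and the helper `downstream_effects` below are hypothetical and chosen only for illustration; a real ontology would carry far richer relations.

```python
from collections import deque

# Hypothetical toy causal graph: each key causes the nodes in its list.
causal_graph = {
    "rain": ["wet_road"],
    "sprinkler": ["wet_road"],
    "wet_road": ["slippery_road"],
    "slippery_road": ["accident"],
    "accident": [],
}


def downstream_effects(graph, cause):
    """Collect every node reachable from `cause` by following causal edges."""
    seen = set()
    queue = deque(graph.get(cause, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


print(sorted(downstream_effects(causal_graph, "rain")))
```

The conclusion that rain can lead to an accident is never stored as a data point; it is derived by following the edges, which is what distinguishes this from fitting a function to examples.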

  • The Looper repo provides a resource list for causal inference.
  • Thanks to Patrick McCrae for raising the ontology inference comparison.

Sunday, 1 November 2020

Gems of data science: 1, 2, infinity


Figure: George Gamow's book. (Wikipedia)
Problem-solving, using scientific principles and evidence, is the core activity of data science. On our side, there is an irresistible urge to solve the most generic form of the problem. We do this almost always, from programming to the formulation of the problem. But don't try to solve a generalised version of the problem. Solve it for N=1 if N is 1 in your setting, not for any integer: save time and resources, and try to embed this culture in your teams and management. Extend later, on demand, when needed.

Solving for N=1 is sufficient if it is the setting

This generalisation phenomenon manifests itself in algorithmic design: from programming to problem formulation, strategy and policy setting. The core idea can be expressed as a mapping. Let's say the solution to a problem is a function, mapping from a domain to a range:

$$ f : \mathbb{R} \to \mathbb{R} $$

The urge is to solve instead for the most generic setting of the problem, namely the multivariate setting

$$ f : \mathbb{R}^{m} \to \mathbb{R}^{n} $$

where $m, n$ are the integers generalising the problem.  


It is elegant to solve a generic version of a problem. But is it really needed? Does it reflect reality, and would it be used? If N=1 is sufficient, then implement that solution first before generalising the problem. An exception to this basic pattern is when there is no solution at N=1 but a solution appears once you move to a larger N. You might think this is absurd, but SVMs work in exactly this setting: they solve classification problems with disconnected regions by moving to a higher-dimensional space.
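A small sketch of this exception, in plain Python. It is not an SVM solver; it only illustrates the lifting idea behind kernel methods, with a hand-picked feature map x -> (x, x^2) and hand-picked weights. The data and all function names are assumptions made for the example.

```python
# Hypothetical 1-D data: class +1 lives in two disconnected regions (|x| > 1).
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]


def separable_by_threshold(values, targets):
    """Check whether any single threshold (either orientation) classifies all points."""
    ordered = sorted(values)
    midpoints = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    thresholds = [ordered[0] - 1.0] + midpoints + [ordered[-1] + 1.0]
    for t in thresholds:
        for sign in (1, -1):
            predicted = [1 if sign * (v - t) > 0 else -1 for v in values]
            if predicted == list(targets):
                return True
    return False


def lift(x):
    """Map a 1-D input into 2-D; the extra coordinate is x squared."""
    return (x, x * x)


def linear_rule(point, w=(0.0, 1.0), b=-1.0):
    """A linear classifier in the lifted space: sign(w . point + b)."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score > 0 else -1


separable_in_1d = separable_by_threshold(xs, labels)
separable_after_lift = all(linear_rule(lift(x)) == y for x, y in zip(xs, labels))

print("linearly separable in 1-D:", separable_in_1d)
print("linearly separable after lifting to 2-D:", separable_after_lift)
```

No single cut on the line separates the two classes, while a straight line in the lifted plane does, so here the larger setting is the one that admits a solution.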


  • The title intentionally omits three. This is a reference to physics's inability to solve the three-body problem, or rather to a mathematical issue with it.

Sunday, 28 June 2020

Conjugacy and Equivalence for Deep Neural Networks: Architecture compression to selection


A recently shown phenomenon can classify deep learning architectures using only the knowledge gained from trained weights [suezen20a]. The classification produces a measure of equivalence between two trained neural networks and, astonishingly, captures a family of closely related architectures as equivalent within a given accuracy. In this post, we will look into this from a conceptual perspective.

Figure 1: VGG architecture spectral difference in the long positive tail [suezen20a]

The concept of conjugate matrix ensembles and equivalence

Conjugacy is a mathematical construct reflecting the idea that different approaches to the same system should yield the same outcome: it is reflected in the statistical mechanics concept of ensembles. However, for matrix ensembles, like the ones offered in Random Matrix Theory, conjugacy is not well defined in the literature. One possible resolution is to look at the cumulative spectral difference between two ensembles in the long positive tail of the spectrum [suezen20a]. If this difference vanishes, we can say that the two matrix ensembles are conjugate to each other. We observe this for the VGG matrix ensembles vs. circular ensembles.

Conjugacy is the first step in building equivalence among different architectures. If two architectures are conjugate to the same third matrix ensemble and their fluctuations in the spectral difference are very close over the spectral locations, they are equivalent within a given accuracy [suezen20a].

Outlook: Where to use equivalence in practice?

The equivalence can be used in selecting or compressing an architecture, or in classifying different neural network architectures. A Python notebook demonstrating this with different vision architectures in PyTorch is provided here.


[suezen20a] Equivalence in Deep Neural Networks via Conjugate Matrix Ensembles, Mehmet Suezen, arXiv:2006.13687 (2020)

Tuesday, 12 May 2020

Collaborative data science: High level guidance for ethical scientific peer reviews


Figure: Catalan Castellers are collaborating. (Wikipedia)
The availability of distributed code tracking tools and associated collaborative tools makes life much easier when building collaborative scientific tools and products. This is now especially important in data science, as it is applied in many different industries as a de facto standard. What was essentially an academic computational science field has now become an industry-wide practice.

Peer-review is a pull request

Peer reviews usually appear as pull requests. A pull request usually entails a change to the base work that achieves a certain goal. It is a nice coincidence that the acronym PR corresponds to both peer review and pull request.

Technical excellence does come with decent behaviour

Aiming at technical excellence is all we need to practice. Requesting technical excellence in PRs is our duty as peers. However, it does come with decent behaviour. PRs are tools for collaborative work, even if it isn't your project or you are in a different cross-function. Here we summarise some of the high-level points for PRs. A PR can take the form of software code, an algorithmic method, or a scientific or technical article:
  • Don’t be a jerk: We should not request things that we do not practice ourselves, or invent new standards on the fly. If we do, then we should give a hand in implementing them.
  • Focus on the scope and be considerate: We should not request things that extend the scope of the task much further than the originally stated scope.
  • Nitpicking is not attention to detail: Attention to detail is quite valuable, but excessive nitpicking is not.
  • Be objective and don’t seek revenge: If one of your recommendations on a PR is not accepted by a colleague, don’t retaliate by declining her/his suggestions on your PRs, and don’t create hostility towards that person.


We have provided some basic suggestions for high-level guidance on peer-review processes. Life is funny; there is a saying in Cyprus, and probably in Texas too: "what you seed, you will harvest".

(c) Copyright 2008-2020 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License