As Professor Pearl graciously mentioned on Twitter:
"A better software developer than a statistician and a better statistician than a software developer" would have been a good definition of a data scientist in the early 2010s. In the late 2010s, the trend changed dramatically: a data scientist is now identified as someone who can run any set of data through machine learning libraries and get a model deployed as a service. Unfortunately, this blind empiricism is now considered data science practice in many industrial settings, and the term "scientist" has lost its intellectual meaning, turning into a mass hysteria of blindly producing "junk science" in the name of the "democratisation of data science".
Hubble Space Telescope (Wikipedia)
Computational science is to the modern data scientist what telescopes are to astrophysics.
Who is the modern data scientist?
Modern data science goes beyond statistics and machine learning. Modern data scientists practise computational science, from dynamical systems to game theory and graph theory. One could also think of this practice as applied mathematics or statistical physics. For example, many ideas behind neural networks, such as Hopfield networks and Boltzmann machines, originate in statistical physics. In that sense, a modern data scientist is a computational scientist building the mechanics of data.
A modern data scientist:
 Performs exploratory analysis that goes beyond basic PCA or clustering, aiming to form causal relationships or establish the mechanics of the data.
 Expresses the mechanics of data in mathematical models and builds parametric inference on them. Not all parameter estimation is learning.
 Uses machine learning algorithms from libraries with an understanding of the underlying algorithm, and relates them to the mechanics of data.
 Builds algorithms that fuse the above work.
 Produces explainable and transparent work.
 Documents the findings as a scientific paper and as scientific software.
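The post does not give a concrete example of parametric inference on the mechanics of data, so here is a minimal sketch under stated assumptions. Assume the data come from a known mechanism, exponential decay N(t) = N0 * exp(-k t), as in radioactive decay or a first-order reaction. Estimating the rate constant k then means inferring a physically interpretable parameter of a mechanistic model, here by ordinary least squares on log N = log N0 - k t, rather than training a black-box learner. The model, the noise level and the function names `simulate_decay` and `fit_decay` are hypothetical illustrations, not taken from the original post.

```python
import math
import random


def simulate_decay(n0, k, times, noise=0.0, seed=0):
    """Hypothetical data: N(t) = n0 * exp(-k t) with multiplicative log-normal noise."""
    rng = random.Random(seed)
    return [n0 * math.exp(-k * t) * math.exp(rng.gauss(0.0, noise)) for t in times]


def fit_decay(times, values):
    """Estimate (n0, k) by ordinary least squares on log N = log n0 - k t."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    # Closed-form simple linear regression of log N on t.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # The slope is -k and the intercept is log n0 in the mechanistic model.
    return math.exp(intercept), -slope


times = [0.5 * i for i in range(20)]
obs = simulate_decay(100.0, 0.3, times, noise=0.05)
n0_hat, k_hat = fit_decay(times, obs)
print(f"n0 ~ {n0_hat:.1f}, k ~ {k_hat:.3f}")
```

With the noise set to zero the fit recovers N0 and k exactly, up to floating-point rounding; with noise, the estimates stay close to the true values, and k keeps its meaning as a decay rate.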
Ignoring the above practice and treating data science like a web-based software development activity is not fair practice and is an immense waste of time. Organisations should understand that investing in data science means investing in the new computational science of building the mechanics of data. Turning the outcome of such scientific practice into real-world impact depends on the originality of the scientist, and, as with any scientific funding, this is a very risky investment.
Misconceptions about the democratisation of data science
The democratisation of data science does not mean that anyone should build learning or statistical models by pushing lots of data through machine learning libraries to get a black-box model as a blind empiricist. Democratisation was about the availability of tools and services at very low cost, and an open culture of transparency in algorithmic and software work.
Artificial Intelligence is modern data science
The separation of AI from the above definition of data science is not really clear: AI combines the same characteristics to build so-called intelligent agents.
Conclusion
Having a perspective on, and an understanding of, what modern data science is about will help organisations orient themselves better when building modern data science capabilities.
Postscript: Further reading on the mechanics of data
We used the term "mechanics of data" to mean the effort of finding signatures of causal relationships and making sense of the correlations within the data. The name reflects the fact that one of the core scientific methods that gave rise to modern science lies in Newton-Leibniz mechanics. Coveney and his co-workers take deep dives into the intricacies of practising science and data science.

Sauro Succi and Peter V. Coveney, "Big data: the end of the scientific method?"
Peter V. Coveney, Edward R. Dougherty and Roger R. Highfield, "Big data need big theory too"
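To make "finding signatures of causal relationships and making sense of correlations" a little more concrete, here is a minimal sketch under a loudly stated assumption: the causal structure is known in advance, with a hypothetical common cause Z driving both X and Y while X has no effect on Y. X and Y are then strongly correlated, yet once the linear effect of Z is regressed out of both (a partial correlation computed by residualisation), the correlation essentially vanishes. The simulated mechanism and the helper names `simulate_confounded` and `residualise` are illustrative assumptions, and this simple adjustment is not a general causal-inference method; it only works because the confounder is known and the relationships are linear.

```python
import random


def simulate_confounded(n, seed=1):
    """Hypothetical mechanism: Z -> X and Z -> Y, with no X -> Y effect."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [zi + rng.gauss(0.0, 1.0) for zi in z]
    y = [2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]
    return z, x, y


def mean(v):
    return sum(v) / len(v)


def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5


def residualise(a, z):
    """Remove the linear effect of z from a, returning OLS residuals."""
    ma, mz = mean(a), mean(z)
    beta = sum((ai - ma) * (zi - mz) for ai, zi in zip(a, z)) / sum(
        (zi - mz) ** 2 for zi in z
    )
    return [(ai - ma) - beta * (zi - mz) for ai, zi in zip(a, z)]


z, x, y = simulate_confounded(5000)
raw = corr(x, y)
adjusted = corr(residualise(x, z), residualise(y, z))
print(f"corr(X, Y) = {raw:.2f}, after adjusting for Z = {adjusted:.2f}")
```

The raw correlation is large because both variables share the cause Z; the adjusted one is close to zero, which is the signature that X does not drive Y under this assumed mechanism.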
Judea Pearl, Turing Award laureate and a pioneer of causal inference (a quiet revolution in statistics and data science), makes a similar critique of excessive empiricism. His post explains: