Saturday, 5 August 2017

Post-statistics: Lies, damned lies and data science patents


Wikipedia
 (US Patent)

Statistics is so important field in our daily lives nowadays, the emerging field of 50 years old data science that is applied to almost every human activity now, or post-statistics, a kind of post-rock,  fusing operations research, data mining, software and performance engineering and of course multitude fields of statistics to machine learning. Even though, the reputation of statistics is a bit shaky due to quotes like  Lies, damned lies, and statistics. Post-statistics is still emerging and drive innovation in almost all industries that drive things with data.
One of the most important characteristics of data science appears to be shared ideal with open source movement, i.e. free software. Note that "free" here means freedom of using the source code and sharing recipes, i.e. a workflows/combination of algorithms for example. The entire innovation in data science we are witnessing last 5 years or so fundamentally driven by this attitude that is embraced by giants like Microsoft, Google and IBM supporting a huge number of enthusiastic individuals from industry and academics. These technology giants open source their workflows and tools to the entire community like Tensorflow and supporting community via event or investing in research that partly goes into public.  On the other hand, traditionally patents are designed to encourage innovation and invention culture. A kind of a gift and a natural right to innovator that given certain time frame he/she or organisation ripe some benefits. 


A recent patent on predicting data science project outcome, unfortunately, do not entirely served to this purpose: Data science project automated outcome prediction US 9710767 B1Even though it is very well written a patent, scope reads very restricted in the first instance, however, the core principle is identical to the standard work-flow activity a data science professional applies in daily routine: where to produce an automated outcome prediction.  The interpretation of  'data science project' is open to any activity on prediction outcome. I am of course no legal expert but based on this patent, which claims to invent outcome prediction pipeline for a 'data science project', Sci-kit learn's workflow manager, pipelines can be taken to court while it facilitates the exact same outcome prediction pipeline this patent claims to invent. It does not matter how this is enforceable but it gives right to patent holder an opportunity to sue everyone doing automated data science outcome prediction. 

This patent US 9710767 B1  is a tremendous disservice to the entire data science community and damaging to an industry and professionals that are trying to use the data in outcome prediction for the greater good in society and solve problems. We definitely do not claim that data science is the solution to our problems in general but will help us to tackle important problems in industry and society. So maybe in the post-statistics world, we have to yell; lies, damned lies and data science patents. While holders of such patent may look like encouraging a patent shark or troll, rather than  the intention of innovating or inventing.