Memo's Island

Thursday, 3 January 2019

Core principles of sustainable data science, machine learning and AI product development: Research as a core driver

Kindly reposted to KDnuggets by Gregory Piatetsky-Shapiro

Preamble

Almost all businesses and industries have embraced machine learning (ML) technologies. Apart from ROI concerns, since developing and deploying a service driven by ML techniques is an expensive endeavour, sustainability, in the sense of going beyond proof-of-concept core development, appears to be one of the roadblocks in data science. In this post, we outline basic core principles that can help organisations build a sustainable AI product development cycle, apart from reproducibility issues. The aim is to give a coarse-grained view rather than to list fine-grained good-practice advice.

Research as a core driver: Research Environment

Regardless of the size of your organisation, if you are developing machine learning or AI products, your core asset is the research professional: a data scientist or AI scientist, regardless of their academic background. Developing a model by using software libraries blindly won't resolve the issues you might encounter after deploying the product. For example, even a simple hyperparameter search can easily turn into research. Why? Because most probably no one has ever tried to build a model or attempt a modelling task with your dataset, and you might need a different approach than the ones ML libraries provide. Even the simplest deviation from what ML libraries offer will lead to a research problem.

No full 'black-box' approaches.
No blind usage of software libraries.
Detailed awareness of, and skills in, the mathematical and algorithmic aspects.

Figure: A schematic of core principles for AI product development.

Separate out research code and production code

Software development is an integral part of ML product development. However, during research, code development can go very wild, and scientists, even if they are very good software developers, can end up creating poor, hard-to-follow code. Once there is confidence in the reproducibility and robustness of the results, the production code should be rewritten following high-quality software engineering principles.

Data Standardisation: Release data-sets for research

A cold start problem for ML products is to design and release data-sets before even doing any research-like work. This, of course, has to be aligned with industrial requirements. Think of benchmark datasets such as MNIST or ImageNet. Released sets will be the first step for any model building or product development, and constitute data products themselves. Data versioning is also a must.

Do not obsess with workflows: All workflows are ad-hoc

There is no such thing as a universal or generic workflow. A workflow depends on a human understanding of processes and steps. Human understanding is based on language, and linguistically there is no such thing as a universal language, or at least it isn't practical yet (cf. universal grammar). Loosely defined steps are sufficient for research. However, once the work enters production, a much stricter workflow design might be needed, but be aware that all workflows are ad-hoc.

Do not run sprints for core data science

Agile principles are suitable for software development innovations. Sprints, or Agile in general, are not suitable for AI research and research environments, which involve a different kind of innovation than software engineering. Thinking that Agile is a cure for scientific innovation is naive wishful thinking. Structuring a research group, with periodic reviews and releases of results via presentations and detailed technical reports, on top of mini-workshops, is much more suitable for data science. Simple proposal runs, akin to research proposals, can also be held to decide which direction to invest in.

Feedback loop: Service to Business decision making back to research

A service using ML technologies should produce more data. The very first form of service monitoring is A/null testing, which asks what would happen in the absence of the AI product. Detailed analysis of the service data brings more insights to both business and research.

Produce an impact assessment: A/null testing.
Quality of service: this can be measured by how successful the ML model is, and the measure has to be technical.
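
As a rough illustration of A/null testing, the sketch below compares a metric between users who receive the ML service and a null group who do not. Everything in it is an assumption made for illustration: the data are simulated, the metric is hypothetical, and a Welch t-test is only one of several reasonable ways to estimate the impact.

```r
# Simulated data only: a hypothetical per-user metric (e.g. weekly spend).
set.seed(7)
null_group    <- rnorm(500, mean = 10.0, sd = 3)  # users without the ML service
service_group <- rnorm(500, mean = 10.6, sd = 3)  # users exposed to the ML service

# Welch two-sample t-test: estimated impact and its uncertainty.
res    <- t.test(service_group, null_group)
impact <- unname(res$estimate[1] - res$estimate[2])  # difference in group means
ci     <- res$conf.int                               # 95% interval for that difference

print(impact)
print(ci)
```

If the confidence interval for the difference excludes zero, the service has a measurable effect on the chosen metric; whether that effect justifies the cost is a business question.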

Conclusion and outlook

There is no such thing as a free lunch, and developing AI products won't be fully automated soon. Tools may improve productivity immensely, but AI replacing a data scientist or AI scientist is far from reality, at least for now. If you are investing in AI products, you are basically investing in research at the core; missing that important point may cost organisations a lot. The basic core principles above, or variations of them, may help you sustain AI products longer and form your teams accordingly.

Friday, 10 April 2015

Scale back or transform back multiple linear regression coefficients: Arbitrary case with ridge regression

Summary

In a common case in data science or machine learning applications, different features or predictors manifest themselves on different scales. This can make the resulting coefficients of a linear regression difficult to interpret, for example when one feature has very large or very small values compared to other predictors, and the predictors are in different units in the first place. The common approach to overcome this is to use z-scores for each feature, i.e., centring and scaling.

This approach allows us to interpret the effects. A possible question, however, is how to map regression coefficients obtained with the scaled data back to the original coefficients. In the context of ridge regression, this question was posed by Mark Seeto on the R mailing list, who provided a solution for the two-predictor case with R code. In this post we formalize his approach. Note that, regarding how to scale, Professor Gelman suggests dividing by two standard deviations. In this post we won't cover that approach and use the usual approach.

Algebraic Solution: No error term

An arbitrary linear regression with $n$ variables reads as follows
$$y=\left(\sum_{i=1}^{n} \beta_{i} x_{i}\right) + \beta_{0}$$
where $y$ is the response variable and $x_{i}$ are the predictors, $i=1,\ldots,n$. Let's use primes for the scaled regression equation with $n$ variables.
$$y'=\left(\sum_{i=1}^{n} \beta_{i}' x_{i}'\right) + \beta_{0}'$$
We would like to express $\beta_{i}$ using only $\beta_{i}'$ and two statistics from the data, namely the means and standard deviations, $\mu_{x_{i}}$, $\mu_{y}$, $\sigma_{x_{i}}$ and $\sigma_{y}$.

The following transformation can be shown by using the z-scores and some algebra,

$$\beta_{0}=\beta_{0}' \sigma_{y} + \mu_{y} - \sum_{i=1}^{n} \frac{\sigma_{y}}{\sigma_{x_{i}}}\beta_{i}' \mu_{x_{i}}$$
$$\beta_{i} = \beta_{i}' \frac{\sigma_{y}}{\sigma_{x_{i}}}$$
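
The algebra can be sketched in a few lines, assuming the usual z-score definitions for the primed variables (which the post implies but does not write out):

$$x_{i}' = \frac{x_{i}-\mu_{x_{i}}}{\sigma_{x_{i}}}, \qquad y' = \frac{y-\mu_{y}}{\sigma_{y}}$$

Substituting these into the scaled regression equation and multiplying both sides by $\sigma_{y}$ gives

$$y = \sum_{i=1}^{n} \frac{\sigma_{y}}{\sigma_{x_{i}}}\beta_{i}' x_{i} + \beta_{0}' \sigma_{y} + \mu_{y} - \sum_{i=1}^{n} \frac{\sigma_{y}}{\sigma_{x_{i}}}\beta_{i}' \mu_{x_{i}}$$

Matching the coefficient of each $x_{i}$ and the constant term against the unscaled equation yields the expressions for $\beta_{i}$ and $\beta_{0}$ given above.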

Ridge regression in R

There are many packages and tools in R to perform ridge regression. One of the prominent ones is glmnet. Following Mark Seeto's example, here we extend it to the many-predictor case with a helper function, scaleBack.lm, from the R1magic package. The function provides a transform utility for the $n$-variate case. Here we demo this using 6 predictors; the code is also available as a gist:

rm(list=ls())
library(glmnet)
library(R1magic) # https://github.com/msuzen/R1magic
set.seed(4242)
n <- 100 # observations
X <- model.matrix(~., data.frame(x1 = rnorm(n, 1, 1),
                                 x2 = rnorm(n, 2, 2),
                                 x3 = rnorm(n, 3,2),
                                 x4 = rnorm(n, 4,2),
                                 x5 = rnorm(n, 5,1),
                                 x6 = rnorm(n, 6,1)
                                ))[,-1] # glmnet adds the intercept
Y          <- matrix(rnorm(n, 1, 2),n,1)
# Now apply scaling 
X.s        <- scale(X)
Y.s        <- scale(Y)
# Ridge regression & coefficients with scaled data
glm.fit.s    <- glmnet(X.s, Y.s, alpha=0)
betas.scaled <- as.matrix(as.vector(coef(glm.fit.s)[,80]), 1, 7)
# transform the coefficients back to the original scale
betas.transformed <- scaleBack.lm(X, Y, betas.scaled)
# Now verify the correctness of the transformed coefficients:
# ridge regression & coefficients on the unscaled data
glm.fit    <- glmnet(X, Y, alpha=0)
betas.fit  <- as.matrix(as.vector(coef(glm.fit)[,80]), 1, 7)
# Verify correctness: the largest absolute difference should be negligible
# (abs() stops positive and negative differences cancelling in the check)
max(abs(betas.fit-betas.transformed)) < 1e-12
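
For readers without glmnet or R1magic installed, the same transformation can be checked in base R with ordinary least squares, where the equivalence is exact up to floating-point error. This is a minimal sketch under stated assumptions: `scale_back` is a hypothetical helper written for this illustration, not the R1magic implementation of scaleBack.lm, and the data are simulated.

```r
# Hypothetical helper (not the R1magic implementation): map coefficients
# fitted on z-scored data back to the original scale.
# betas_scaled = c(intercept', slope_1', ..., slope_n')
scale_back <- function(X, y, betas_scaled) {
  mu_x <- colMeans(X)
  sd_x <- apply(X, 2, sd)
  mu_y <- mean(y)
  sd_y <- sd(y)
  slopes    <- betas_scaled[-1] * sd_y / sd_x                       # beta_i
  intercept <- betas_scaled[1] * sd_y + mu_y - sum(slopes * mu_x)   # beta_0
  unname(c(intercept, slopes))
}

# Simulated data with three predictors on different scales
set.seed(1)
n <- 50
X <- cbind(x1 = rnorm(n, 1, 1), x2 = rnorm(n, 2, 2), x3 = rnorm(n, 3, 2))
y <- 1 + 2 * X[, "x1"] - 0.5 * X[, "x2"] + rnorm(n)

# z-score the response and the predictors
X_s <- scale(X)
y_s <- (y - mean(y)) / sd(y)

# Ordinary least squares on scaled and on raw data
fit_scaled <- lm(y_s ~ X_s)
fit_raw    <- lm(y ~ X)

betas_back <- scale_back(X, y, unname(coef(fit_scaled)))
max_diff   <- max(abs(betas_back - unname(coef(fit_raw))))
print(max_diff)  # largest absolute discrepancy between the two routes
```

Using plain least squares here removes the penalty term from the picture, so any discrepancy would point to the transformation itself rather than to how the regularisation path is chosen.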

Conclusion

Multiple regression is used by many practitioners. In this post we have shown how to scale continuous predictors and transform the regression coefficients back to the original scale. Scaled coefficients help us to interpret the results better. The question of when to standardize the data is a separate issue.

Mehmet Suzen

(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)