Thursday, 13 June 2013

Practicing static typing in R: Prime directive on trusting our functions with object oriented programming

The creator of S language which R is derived from John Chambers said in one of his books  Software for data analysis programming with R
...This places an obligation on all creators of software to program in such a
way that the computations can be understood and trusted. This obligation
I label the Prime Directive.
He was referring to prime directive from Star Trek. One of the practice in this direction is to have a proper checks in place for the types we use. We can trust that if we pass for example a wrong type to our function, it will fail gracefully. So a type system of a programming language is quite important in mission critical numerical computations. Since R language is weakly typed language, or dynamically typed similar to Perl, Python or Matlab/Octave, most of R users omit to place type checks in their functions if not rarely. For example take the following function that takes arguments of a matrix, a vector and a function name. It applies the named function to each columns of the matrix listed in the given vector. Assuming named function is returning a single number our function will return a vector of numbers.

myMatrixOperation  <-  function(A, v, fName) {
  sliceA <-  A[, v];
  apply(sliceA, 2, fName);
}

One obvious way to put if statements for each argument in our function. So, function may look like:

myMatrixOperation <- function(A, v, fName) {
  if(!is.matrix(A)) {
   stop("A is not a matrix");
  }
  if(!is.vector(v)) {
   stop("v is not a vector");
  }
  if(!is.funcion(fName)) {
   stop("fName is not a function");
  }
  sliceA <- A[, v];
  apply(sliceA, 2, fName);
}
The problem with this approach appears to be the fact that it is too verbose and if we have a repeating pattern of arguments in many functions and many arguments, we would copy and paste code many times. It would not only look ugly but wastes our time. Luckily there is a mechanism to address this: S4-class system. Let's define an S4 class for our set of arguments, following an example instantiation.

setClass("mySlice", representation(A="matrix", v="vector", fName="function"))

myS <- new("mySlice",, A=matrix(rnorm(9),3,3),v=c(1,2), fName=mean)
str(myS)
Formal class 'mySlice' [package ".GlobalEnv"] with 3 slots
  ..@ A    : num [1:3, 1:3] 0.356 -0.34 -0.642 -0.466 2.915 ...
  ..@ v    : num [1:2] 1 2
  ..@ fName:function (x, ...)

Now if we re-write the function that uses our S4 class with type checking only to passing object once.
is.mySlice <- function(obj) {
  l <- FALSE 
  if(class(obj)[1] == "mySlice") { l <- TRUE } 
  l 
} 

myMatrixOperation <- function(mySliceObject) { 
  if(!is.mySlice(mySliceObject)) { 
    stop("argument is not class of mySlice") 
  }  
  sliceA <- mySliceObject@A[, mySliceObject@v]; 
  apply(sliceA, 2, mySliceObject@fName); 
} 
This simple example demonstrates how we can introduce a good organization to our R codes, that obeys the prime directive. Further more modern approach to object orientation is introduced by John Chambers called Reference classes. If you practice this kind of approach in your R codes than I can only say; Live long and prosper.

No comments:

Post a Comment