Thursday 5 September 2013

Metaprogramming in R with an example: Beating lazy evaluation

Functional languages allow us to treat functions as first-class objects. This gives us the distinct advantage of being able to write code that generates further code, a practice generally known as metaprogramming. As a functional language, R provides tools for well-structured code generation. In this post, I will present a simple example that generates functions on the fly, each with a different parameter value in the function body. Consider the following simple function, which takes a vector as an argument and returns the number of elements that are greater than a given threshold.
myFun <- function(vec) {
  numElements <- length(which(vec > threshold))
  numElements
}
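Note that threshold is not an argument of myFun; it is a free variable that R looks up only when the function is called. A minimal sketch of what that means in practice, where the values 7 and 10 are assumptions chosen purely for illustration:

```r
myFun <- function(vec) {
  numElements <- length(which(vec > threshold))
  numElements
}

threshold <- 7   # assumed value, for illustration only
myFun(1:20)      # counts the elements 8..20

threshold <- 10  # changing the outside variable changes the result
myFun(1:20)      # counts the elements 11..20
```

The same function gives different answers depending on a variable outside of it, which is why we may want the threshold fixed inside the body instead.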
Suppose we need a different threshold value within the body; for the moment, accept this as a requirement rather than proposing another argument in the function definition. Instead of rewriting the function by hand, we will write a function that generates all these functions in our workspace. The problematic bit of this exercise is beating lazy evaluation. Here is a function that produces lots of myFun-type functions:

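genMyFuns <- function(thresholds) {
  print("Generating functions:")
  # seq_along() is safe even when thresholds is empty
  for (i in seq_along(thresholds)) {
    fName <- paste("myFun.", i, sep = "")
    print(fName)
    # substitute() replaces tt with the current value of thresholds[i];
    # eval() turns the resulting expression into a function
    fun <- eval(substitute(
      function(vec) {
        numElements <- length(which(vec > tt))
        numElements
      },
      list(tt = thresholds[i])
    ))
    # create the function under its name in the caller's environment
    assign(fName, fun, envir = parent.frame())
  }
}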
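Before analysing the generator, it may help to look at the substitute and eval step on its own. The following is a minimal sketch; the names expr and f and the value 7 are assumptions made for illustration and are not part of the generator above.

```r
# Build the unevaluated function expression with tt replaced by a value.
expr <- substitute(
  function(vec) {
    numElements <- length(which(vec > tt))
    numElements
  },
  list(tt = 7)
)

class(expr)      # the expression is a call to `function`, not yet a function
f <- eval(expr)  # evaluating that call creates the actual function
is.function(f)
body(f)          # the body now contains the literal 7 instead of tt
f(1:20)
```

Inspecting body(f) is the reliable way to check the substitution; printing f itself in an interactive session may still display the original source text if source references are kept.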
Let's briefly analyse this function. If we don't use substitute explicitly there, then due to lazy evaluation the threshold is not fixed to its value at the current loop iteration: the body keeps referring to thresholds[i], which is only evaluated when the generated function is called, and by then i holds its last value. Here is a numeric example from an R CLI session:

>  genMyFuns(c(7, 9, 10))
[1] "Generating functions:"
[1] "myFun.1"
[1] "myFun.2"
[1] "myFun.3"
>  myFun.1(1:20)
[1] 13
>  myFun.2(1:20)
[1] 11
>  myFun.3(1:20)
[1] 10
> 
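For contrast, here is a sketch of what happens when substitute is left out. The names genNaiveFuns and naiveFun.* are hypothetical and exist only for this comparison.

```r
# Hypothetical variant of the generator without substitute, for contrast only.
genNaiveFuns <- function(thresholds) {
  for (i in seq_along(thresholds)) {
    fName <- paste("naiveFun.", i, sep = "")
    # The body refers to thresholds[i]; i is looked up when the function runs.
    assign(fName,
           function(vec) length(which(vec > thresholds[i])),
           envir = parent.frame())
  }
}

genNaiveFuns(c(7, 9, 10))
naiveFun.1(1:20)  # 10: uses thresholds[3], not thresholds[1]
naiveFun.3(1:20)  # 10
```

All generated functions share the same i, which equals 3 once the loop has finished, so every one of them ends up using the last threshold.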
Being able to generate code is a very powerful tool. However, caution should be taken when practising code generation in a large project, as it may make debugging harder. Every powerful method comes with a hidden cost.

6 comments:

cellocgw said...

It might be a lot easier to use the body() function. As :

body(your_func)[3] <- x*new_parameter

rather than going thru all the pain in your example.

msuzen said...

@cellocgw

Could you post a working example that reproduces the above example? The shorter the code the better, of course, but I always try to make it work first. Yes, a function expression may be defined using 'body'. But the crucial step here is not generating the body of a function but the usage of 'substitute', as I mentioned above; otherwise, due to lazy evaluation, the new parameter won't be updated properly in the generated functions.

cellocgw said...

Like this?

main <- function(newthing, x) {
  changeit <- function(x) {
    foo <- x + 5
  }
  body(changeit)[2] <- parse(text = newthing)
  changeit(x)
}


main('x -3', 4)

There's no lazy evaluation problem.

msuzen said...

@cellocgw.

Your code does not reproduce the example in the post. As I said, the problem is not only changing the body. The idea is to generate many functions using a template function, changing one of the constants inside the template function to a new value pulled from a vector. The functions produced by the generating function must be unevaluated functions that are available in the parent frame.

Anonymous said...

I think this might be a cleaner way to do it. Since R is a functional language, you can make functions that return functions as results. So I create a function that takes thresholds and generates a series of functions, each of which takes a vector and returns the number of elements greater than its threshold.

genMyFuns <- function(thresholds) {
  sapply(thresholds, function(x) {
    force(x)  # evaluate x now so each function keeps its own threshold
    function(vec) length(which(vec > x))
  })
}

We can test this function this way:

thresholds <- seq(0, 1, by = 0.1)
myFuns <- genMyFuns(thresholds)

We can call a specific threshold function this way:

# Create a test vector to test our functions
test_vec <- rnorm(1000)
# Call the function associated with threshold with 0.1
myFuns[[2]](test_vec)

If for some reason, we need each function to be named, we can do something like:
for (i in seq_along(myFuns)) {
  assign(paste0("myFun.", i), myFuns[[i]])
}

We can test it this way:

myFun.10(test_vec)

msuzen said...

@unknown (Arnob) Yes, it produces a similar effect. However, the solution given in the blog post is much cleaner. We don't want to refer to the function from a list. The idea was to have a function available in the current frame, literally with its name.

(c) Copyright 2008-2024 Mehmet Suzen (suzen at acm dot org)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License