Wednesday, 30 January 2013

R's range and loop behaviour: Zero, One, NULL

One of the most common patterns in programming languages is the ability to iterate over a given set (usually a vector) using 'for' loops. In most modern scripting languages, ranges are a built-in construct and trivial to use with 'for' loops. In this post, I would like to discuss R's behaviour when the upper bound of a range is zero. For example:
> length(0:0)
[1] 1

> for(i in 0:0) { print(i) }
[1] 0
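The result above follows from the fact that R's ':' operator includes both endpoints, so a:b is never empty, and it counts downward when a > b. A small sketch of that rule:

```r
# ':' includes both endpoints, so a:b always has at least one element.
length(0:0)   # 1
length(0:3)   # 4, i.e. 3 - 0 + 1
3:0           # counts down: 3 2 1 0
```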
In Python, the behaviour is arguably more intuitive (depending on how you interpret a zero upper bound), because range() excludes its upper bound while R's ':' includes both endpoints:
>>> len(range(0,0))
0
>>> for i in range(0,0): print(i)
...
>>>
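If you want Python-like semantics in R, one option is a half-open helper. This is a minimal sketch assuming integer bounds; the name halfOpenSeq is hypothetical and not part of base R:

```r
# Hypothetical helper (not part of base R): integers from lb up to,
# but NOT including, ub, mirroring Python's range(lb, ub) for integer bounds.
halfOpenSeq <- function(lb, ub) {
  if (ub <= lb) return(integer(0))  # empty, like range(0, 0)
  lb:(ub - 1)
}

length(halfOpenSeq(0, 0))              # 0, like len(range(0, 0))
for (i in halfOpenSeq(0, 0)) print(i)  # prints nothing: body never runs
halfOpenSeq(0, 3)                      # 0 1 2, like list(range(0, 3))
```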
So when you use 'for' loops in R (which you should avoid where you can, in favour of 'apply'-type operations), do not assume that the loop body will be skipped when the upper bound is zero. This matters when the 'length' of a variable-length vector is used as the upper bound: for an empty vector x, 1:length(x) is 1:0, i.e. c(1, 0), so the loop runs twice. A similar issue is discussed in section 8.1.60 of The R Inferno, where 'seq_along' is suggested as a replacement. However, 'seq_along' does not help when the range itself has a zero upper bound: 0:0 is the length-one vector c(0), not an empty one, so the loop still runs once.
> for(i in seq_along(0:0)) { print(i) }
[1] 1
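For contrast, here is a sketch of the case that seq_along does handle: a genuinely empty vector. The vector x is illustrative. seq_along looks at the length of its argument, not its values, which is why seq_along(0:0) still returns 1.

```r
x <- numeric(0)  # an empty vector, e.g. no elements matched a filter

# 1:length(x) is 1:0, which counts down and yields c(1, 0),
# so this loop body runs twice on an empty vector.
visits_colon <- 0
for (i in 1:length(x)) visits_colon <- visits_colon + 1

# seq_along(x) is integer(0) for an empty vector, so the body never runs.
visits_seq_along <- 0
for (i in seq_along(x)) visits_seq_along <- visits_seq_along + 1

visits_colon          # 2
visits_seq_along      # 0
seq_along(0:0)        # 1, because 0:0 is the length-one vector c(0)
seq_along(integer(0)) # integer(0)
```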
Hence, neither 'length' nor 'seq_along' guarantees an empty loop when the upper bound can be zero. In the same spirit, avoid using 'seq' or the range operator ':' directly to generate such a sequence. To avoid this kind of bug completely, a wrapper for sequence generation can be used. This is a rather conservative approach, but it avoids confusion if your vectors or sequences have variable length and there is a good chance that their upper bound will be zero at run time. Here is one simple wrapper:
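> seqWrapper <- function(lb, ub, by=1) {
+   s <- NULL  # returned when the range is empty
+   # same test as !(ub <= lb); note that ub == lb also gives NULL
+   if (ub > lb) s <- seq(lb, ub, by=by)
+   return(s)
+ }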
> seqWrapper(0,0)
NULL
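One caveat worth checking before relying on the wrapper in a loop: because seqWrapper returns NULL whenever ub <= lb, a range with equal bounds is treated as empty too. A minimal sketch with two illustrative vectors (x and y are made up for this example), contrasted with base R's seq_len(), another common guard against a zero upper bound when the lower bound is 1:

```r
# Same definition as seqWrapper in this post.
seqWrapper <- function(lb, ub, by = 1) {
  s <- c()
  if (!ub <= lb) s <- seq(lb, ub, by = by)
  return(s)
}

x <- numeric(0)  # empty vector
y <- c(10)       # vector with a single element

for (i in seqWrapper(1, length(x))) print(x[i])  # skipped, as intended
seqWrapper(1, length(y))  # NULL too: ub == lb counts as empty,
                          # so y's only element would be skipped

# Base R's seq_len(n) returns integer(0) for n = 0 and 1:n otherwise.
seq_len(length(x))  # integer(0)
seq_len(length(y))  # 1
```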
Sometimes being conservative in coding helps to avoid a simple run-time bug.

