This post shares my experience of measuring the performance of code in R. As an R programmer, you may have heard that apply() functions are usually more efficient than for-loops.
But how can we measure this efficiency and quantify the performance of our functions?
One of the most direct ways to find out how long your code takes to run is to use system.time().
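The listing for this step is not reproduced here, so what follows is only a minimal sketch of timing a function once with system.time(). The body of for_fun is a hypothetical stand-in (the original definition is not shown), and the timings you get will differ from the figures quoted in the text, which come from the author's own run.

```r
# Hypothetical stand-in for for_fun(): the original definition is not shown.
for_fun <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)
  }
  total
}

# system.time() evaluates the expression once and reports
# user, system, and elapsed time in seconds.
timing <- system.time(result <- for_fun(100000))
print(timing)

# The elapsed (wall-clock) time can be pulled out by name.
print(timing[["elapsed"]])
```

The elapsed entry is the wall-clock figure that the discussion refers to.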
From the above results, the elapsed time to run for_fun is 0.13 seconds. However, system.time() is not very reliable on its own because it executes the code only once, so it is hard to tell how much the state of your OS affected that single run.
Thus, I’d like to introduce microbenchmark, a package that gives you more comprehensive results about the performance of your code and lets you choose how many times to execute it.
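As a rough sketch of how this might look, the block below runs a function 100 times with microbenchmark and prints the summary statistics. It assumes the microbenchmark package is installed, and for_fun is again a hypothetical stand-in because the original definition is not shown.

```r
# Assumes the microbenchmark package is installed:
# install.packages("microbenchmark")
library(microbenchmark)

# Hypothetical stand-in for for_fun(): the original definition is not shown.
for_fun <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

# Run the expression 100 times and record one timing per run.
mb <- microbenchmark(for_fun(100000), times = 100)

# summary() reports min, lower quartile, mean, median,
# upper quartile, and max across the runs.
print(summary(mb))
```

Because every run is recorded, one unusually slow execution shifts the mean only slightly and barely affects the median.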
Now you can see that after running for_fun(100000) 100 times, the average execution time is 0.112 seconds. You also get other useful information, such as the median time (0.107 seconds).
Besides the microbenchmark package, you can also use RStudio, which has a feature called Profile on the toolbar. Here is the introduction document.
Now let's do more field tests on apply() functions. Assume we have fake.vector, which holds 10000 numeric elements, a function called use_for_loop() that does some conditional computation, and a similar function that does the same work with an apply() function.
First, we can use identical() to check whether these functions return the same result. Then, let’s measure their performance:
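The original listings are not reproduced here, so the following is only a sketch under stated assumptions. The contents of fake.vector, the body of use_for_loop(), and the name and body of use_sapply() are all hypothetical, and sapply() stands in for "an apply() function". The microbenchmark package is assumed to be installed.

```r
library(microbenchmark)  # assumed to be installed

set.seed(42)
fake.vector <- runif(10000)  # 10000 numeric elements; the values are made up

# Hypothetical body: the original conditional computation is not shown.
use_for_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- if (x[i] > 0.5) x[i] * 2 else x[i] / 2
  }
  out
}

# Hypothetical name and body for the apply-family counterpart.
use_sapply <- function(x) {
  sapply(x, function(v) if (v > 0.5) v * 2 else v / 2)
}

# Both versions should agree before their speed is compared.
same_result <- identical(use_for_loop(fake.vector), use_sapply(fake.vector))
print(same_result)

# Time each version 500 times.
mb <- microbenchmark(
  for_loop = use_for_loop(fake.vector),
  sapply   = use_sapply(fake.vector),
  times = 500
)
print(summary(mb))
```

Which version comes out ahead depends on the R version, the machine, and the work done per element, so the numbers from this sketch need not match the ones discussed in the text.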
These results show that over 500 executions, the apply() function takes 12.7 milliseconds on average, which is 5 milliseconds faster than the for-loop. However, this does not mean the apply() function is always a silver bullet compared with the for-loop. Let’s give fake.vector only 10 elements:
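A sketch of the small-input case, assuming the microbenchmark package is installed. The helper bodies and the name use_sapply() are hypothetical, since the original definitions are not shown. The unit argument asks summary() to report microseconds, which suits runs this short.

```r
library(microbenchmark)  # assumed to be installed

# Hypothetical helpers; the original definitions are not shown.
use_for_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- if (x[i] > 0.5) x[i] * 2 else x[i] / 2
  out
}
use_sapply <- function(x) {
  sapply(x, function(v) if (v > 0.5) v * 2 else v / 2)
}

set.seed(42)
fake.vector <- runif(10)  # only 10 elements this time

# On tiny inputs, fixed per-call overhead dominates the timings.
mb_small <- microbenchmark(
  for_loop = use_for_loop(fake.vector),
  sapply   = use_sapply(fake.vector),
  times = 500
)
print(summary(mb_small, unit = "us"))
```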
The for-loop takes 22.1 microseconds per round on average, while the apply() function takes 29.0 microseconds. Thus, we could say:
- When there is not much data to traverse and work on, the for-loop and the apply() function run at almost the same speed, and the for-loop could even be the faster one;
- When you have a large amount of data to process, in most cases, you should use