Programming: Easy Ways to Measure the Performance of R Code

Graph by Hadley Wickham in Advanced R

This post shares my experience of measuring the performance of R code. As an R programmer, you may have heard the common advice:

apply() functions are usually more efficient than for-loops.

But how can we actually measure this efficiency, and how can we quantify the performance of our functions?

One of the most direct ways to know how long your code runs is using system.time():
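As a minimal sketch, assume a hypothetical for_fun() that fills a preallocated vector inside a for-loop (the post never shows the real body). Timing it with system.time() could look like this. Expect different numbers from the ones quoted below, since they depend on the function body and your machine.

```r
# Hypothetical example function: the original for_fun() is not shown,
# so this version just squares each index inside a for-loop.
for_fun <- function(n) {
  out <- numeric(n)              # preallocate the result vector
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# system.time() evaluates the expression once and reports
# user, system and elapsed (wall-clock) time in seconds.
timing <- system.time(for_fun(100000))
timing
```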

In my run, the elapsed time for for_fun was 0.13 seconds. However, a single system.time() measurement is not very reliable: it executes the code exactly once, so it is hard to tell how much the state of your operating system affected that particular run.
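To see this variability without installing anything, you can repeat the single-shot measurement a few times with base R's replicate(). This sketch again uses a hypothetical for_fun(), since the original body is not shown.

```r
# Hypothetical stand-in for the post's for_fun().
for_fun <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Run the same one-off measurement five times and keep only the
# elapsed (wall-clock) seconds; the values usually differ between runs.
elapsed <- replicate(5, system.time(for_fun(100000))[["elapsed"]])
elapsed
```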

This is why I'd like to introduce microbenchmark, a package that gives more comprehensive results about the performance of your code and lets you choose how many times to execute it.
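A minimal sketch of such a benchmark, assuming the microbenchmark package is installed and using a hypothetical for_fun() in place of the post's unshown function. The times argument sets the number of repetitions, and summary() returns the statistics as a data frame so you can pull out values such as the median.

```r
# install.packages("microbenchmark")  # if not already installed
library(microbenchmark)

# Hypothetical stand-in for the post's for_fun().
for_fun <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Evaluate the expression 100 times and record each timing.
mb <- microbenchmark(for_fun(100000), times = 100)

# min, lq, mean, median, uq and max, shown in milliseconds.
print(mb, unit = "ms")

# The same statistics as a data frame, e.g. to extract the median.
stats <- summary(mb, unit = "ms")
stats$median
```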

After running for_fun(100000) 100 times, the mean execution time is 0.112 seconds, and you also get other useful statistics, such as the median time (0.107 seconds).

Besides the microbenchmark package, you can also take advantage of RStudio's built-in profiler, available from the Profile menu on the toolbar. RStudio's own documentation gives an introduction to it.
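RStudio's Profile menu is built on the profvis package, so you can also start the profiler from code. A minimal sketch, assuming profvis is installed and again using a hypothetical for_fun(). The input is made large on purpose: the profiler samples the call stack at intervals, and code that finishes too quickly leaves nothing to show.

```r
# install.packages("profvis")  # if not already installed
library(profvis)

# Hypothetical stand-in for the post's for_fun().
for_fun <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# profvis() runs the code under R's sampling profiler and returns an
# interactive flame graph; in RStudio it opens in the Viewer pane.
p <- profvis({
  for_fun(5e6)
})
p
```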

Now let's run more field tests on for-loops and apply() functions. Assume we have fake.vector, a numeric vector with 10,000 elements; a function called use_for_loop() that performs some conditional computation on it; and a similar function called use_apply_family_funcs().

First, we can use identical() to check that the two functions return the same result. Then we measure their performance:
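The post does not show fake.vector, use_for_loop() or use_apply_family_funcs(), so the bodies below are hypothetical stand-ins that follow the description: a 10,000-element numeric vector and a simple conditional computation. This sketch checks the results with identical() and then benchmarks both versions 500 times. Absolute timings, and possibly even which version wins, depend on the function bodies, your R version and your machine, so don't expect to reproduce the author's figures exactly.

```r
library(microbenchmark)

# Hypothetical data: 10,000 random numbers in [0, 1).
set.seed(42)
fake.vector <- runif(10000)

# Hypothetical conditional computation with a preallocated for-loop.
use_for_loop <- function(x) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] > 0.5) result[i] <- x[i] * 2 else result[i] <- x[i] / 2
  }
  result
}

# The same computation using sapply() from the apply family.
use_apply_family_funcs <- function(x) {
  sapply(x, function(v) if (v > 0.5) v * 2 else v / 2)
}

# Both versions must return exactly the same vector.
same <- identical(use_for_loop(fake.vector), use_apply_family_funcs(fake.vector))
same

# Benchmark each version 500 times; named arguments label the rows.
bench <- microbenchmark(
  for_loop     = use_for_loop(fake.vector),
  apply_family = use_apply_family_funcs(fake.vector),
  times = 500
)
print(bench, unit = "ms")
```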

These results show that, over 500 executions, the apply() version takes 12.7 milliseconds on average, about 5 milliseconds faster than the for-loop. However, this does not mean that apply() functions are always a silver bullet compared with for-loops. Let's shrink fake.vector to only 10 elements:
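Repeating the comparison on a tiny input only requires changing the vector length. This sketch reuses the same hypothetical function bodies as before (the originals are not shown) and reports the results in microseconds, since each call now finishes very quickly.

```r
library(microbenchmark)

# Hypothetical stand-ins for the post's functions.
use_for_loop <- function(x) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] > 0.5) result[i] <- x[i] * 2 else result[i] <- x[i] / 2
  }
  result
}
use_apply_family_funcs <- function(x) {
  sapply(x, function(v) if (v > 0.5) v * 2 else v / 2)
}

# Only 10 elements this time.
set.seed(42)
fake.vector <- runif(10)

bench_small <- microbenchmark(
  for_loop     = use_for_loop(fake.vector),
  apply_family = use_apply_family_funcs(fake.vector),
  times = 500
)
print(bench_small, unit = "us")
```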

Now the for-loop takes 22.1 microseconds per run on average, while the apply() version takes 29.0 microseconds. So we can conclude:

  • When there is not much data to traverse and process, for-loops and apply() functions run at almost the same speed, and the for-loop can even be the faster one;
  • When you have a large amount of data to process, in most cases you should use an apply() function.
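Where "not much data" ends depends on your functions and machine, so it can help to measure a few sizes yourself. A minimal sketch, again with hypothetical function bodies and an arbitrary choice of sizes: it collects the median time in microseconds for each version at each vector length.

```r
library(microbenchmark)

# Hypothetical stand-ins for the post's functions.
use_for_loop <- function(x) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] > 0.5) result[i] <- x[i] * 2 else result[i] <- x[i] / 2
  }
  result
}
use_apply_family_funcs <- function(x) {
  sapply(x, function(v) if (v > 0.5) v * 2 else v / 2)
}

set.seed(42)
sizes <- c(10, 100, 1000, 10000)   # arbitrary sizes to compare

# For each size, benchmark both versions and keep their median times.
medians <- sapply(sizes, function(n) {
  x <- runif(n)
  s <- summary(microbenchmark(
    for_loop     = use_for_loop(x),
    apply_family = use_apply_family_funcs(x),
    times = 50
  ), unit = "us")
  med <- setNames(s$median, as.character(s$expr))
  med[c("for_loop", "apply_family")]   # fixed row order
})
colnames(medians) <- sizes
medians   # rows: versions, columns: vector length
```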

Data Engineer. Interested in topics related to data engineering, marketing, and investment in general.