
I found this previous thread here:

Range standardization (0 to 1) in R

This leads me to my question: when building a function that performs a calculation on every value in a vector, my understanding was that a for loop would be necessary (because the calculation is applied to every element of the vector). However, apparently that is not the case. What am I misunderstanding?

Ben Bolker

1 Answer


All of the basic arithmetic operations in R, and most of the basic numerical/mathematical functions, are natively vectorized; that is, they operate position-by-position on elements of vectors (or pairs of elements from matched vectors). So for example if you compute

c(1,3,5) + c(2,4,7)

you don't need an explicit for loop (although there is one in the underlying C code!) to get c(3,7,12) as the answer.
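As a small illustration of the same idea (the vectors `a` and `b` are made up for this example), other operators and basic math functions work element by element in the same way:

```r
a <- c(1, 3, 5)
b <- c(2, 4, 7)

a + b      # elementwise sum: 3 7 12
a * b      # elementwise product: 2 12 35
sqrt(a)    # the function is applied to each element separately
a > 2      # comparisons are vectorized too: FALSE TRUE TRUE
```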

In addition, R has vector recycling rules; any time you call an operation with two vectors, the shorter automatically gets recycled to match the length of the longer one. R doesn't have scalar types, so a single number is stored as a numeric vector of length 1. So if we compute

(x-min(x))/(max(x)-min(x))

max(x) and min(x) are both of length 1, so the denominator is also of length 1. When min(x) is subtracted from x, min(x) gets recycled to a vector the same length as x, and then the element-by-element subtraction is done. Finally, the numerator (length == length(x)) is divided by the denominator (length 1) following the same recycling rules.
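A minimal sketch of how this could be wrapped in a function and checked by hand. The name `range01` and the sample vector are only illustrative, and the sketch assumes `x` has no missing values and is not constant (otherwise the denominator would be 0):

```r
# Hypothetical helper name; rescales a numeric vector to the range [0, 1].
range01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

x <- c(2, 4, 6, 10)
range01(x)                     # 0.00 0.25 0.50 1.00

# Recycling spelled out by hand: rep() builds the full-length vector
# that R creates implicitly for min(x).
x - rep(min(x), length(x))     # 0 2 4 8, same as x - min(x)
```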

In general, exploiting this built-in vectorization in R is typically much faster than writing your own for loop, because the looping happens in compiled C code rather than in interpreted R code, so it's definitely worth trying to get the hang of when operations can be vectorized.
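For comparison, here is a rough sketch of the same range standardization written both ways. The variable names are made up for illustration; the point is only that the explicit loop does the same arithmetic one element at a time:

```r
x <- c(2, 4, 6, 10)

# Vectorized version: one expression, the loop runs in compiled code.
scaled_vec <- (x - min(x)) / (max(x) - min(x))

# Explicit loop version: same arithmetic, one element at a time.
scaled_loop <- numeric(length(x))   # preallocate the result
lo <- min(x)
hi <- max(x)
for (i in seq_along(x)) {
  scaled_loop[i] <- (x[i] - lo) / (hi - lo)
}

all.equal(scaled_vec, scaled_loop)  # TRUE
```

To see the speed difference on your own machine, you could wrap each version in `system.time()` and use a much longer `x`.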

This kind of vectorization is implemented in many higher-level languages/libraries that are specialized for numerical computation (pandas and NumPy in Python, MATLAB, Julia, ...).

Ben Bolker