Today’s session follows R for Data Science, Chapter 19.
The goal of today’s lesson is to learn the basics of writing your own functions. Why bother writing your own functions when you could just copy and paste your code to do the same thing in slightly different ways?
R for Data Science outlines three main advantages: “(1) You can give a function an evocative name that makes your code easier to understand.”
“(2) As requirements change, you only need to update the code in one place, instead of many.”
“(3) You eliminate the chance of making incidental mistakes when you copy and paste (e.g. updating a variable name in one place, but not in another).”
Consider writing a function whenever you have copied and pasted a block of code more than twice (that is, when you now have three copies of the same code).
For example, review this code:
df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
df$a <- (df$a - min(df$a, na.rm = TRUE)) /
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
What does this code do?
There is an error in this code. Can you spot it?
Pick a name for the function. The name should clearly explain what the function does. What would you name the function you would write to perform the operation above?
List the inputs (“arguments”) to the function inside function(). A function with three inputs would look like function(x, y, z). There are generally two types of arguments: arguments that give the function the data it will work with, and arguments that specify the details of the computation. What would the input(s) be for the function needed to perform the operation above?
Place the code you have developed in the body of the function, a { block that immediately follows function(...). What would the body of the function include if you wanted to perform the operations above?
Generally, it’s easier to start with code that works and then turn it into a function, rather than going the other way around!
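A sketch of that workflow, using the rescaling example above: first rewrite the computation for a single input stored under a generic name, x, simplify it, and check that it gives the expected answer. Only then wrap it in a function. The test vector c(0, 5, 10) is an arbitrary choice for illustration.

```r
# Step 1: store one input under a generic name, so the code no longer
# refers to a specific column such as df$a.
x <- c(0, 5, 10)

# Step 2: compute the minimum and maximum once with range(), which
# returns c(min, max), instead of repeating min() and max() calls.
rng <- range(x, na.rm = TRUE)

# Step 3: confirm the working code gives the expected result before
# turning it into a function.
(x - rng[1]) / (rng[2] - rng[1])
## [1] 0.0 0.5 1.0
```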
Functions generally have the following structure:
function_name <- function(x, y, z) {
  # step 1
  # step 2
  # step 3
}
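As a concrete instance of this structure, here is a small hypothetical function, value_span(), which is not from the chapter. It has one data argument (x) and one detail argument (na.rm) with a default value, so callers can usually leave the detail argument out.

```r
# Hypothetical example: x is the data argument; na.rm is a detail
# argument whose default is used unless the caller overrides it.
value_span <- function(x, na.rm = TRUE) {
  # step 1: find the smallest and largest values
  rng <- range(x, na.rm = na.rm)
  # step 2: the last expression evaluated becomes the return value
  rng[2] - rng[1]
}

value_span(c(2, 7, NA, 4))
## [1] 5
value_span(c(2, 7, NA, 4), na.rm = FALSE)
## [1] NA
```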
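A function's output can be a single value, a vector, a list, a matrix, a data frame, or any other R object; by default, a function returns the value of the last expression it evaluates. Think carefully about what you want the function to return!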
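For example, when a function needs to hand back several results at once, one common choice is a named list. The function below, summarize_vector(), is a hypothetical illustration rather than something from the chapter.

```r
# Hypothetical example: return several results together as a named list.
summarize_vector <- function(x) {
  list(
    mean = mean(x, na.rm = TRUE),    # a single number (length-1 vector)
    range = range(x, na.rm = TRUE),  # a numeric vector of length 2
    n_missing = sum(is.na(x))        # an integer count of NAs
  )
}

result <- summarize_vector(c(4, 8, NA, 6))
result$mean
## [1] 6
result$range
## [1] 4 8
result$n_missing
## [1] 1
```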
Here’s a function that performs the operation we discussed earlier. Try running it with different inputs.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
rescale01(c(0, 5, 10))
## [1] 0.0 0.5 1.0
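With rescale01() defined, the copy-and-paste block from the start of the lesson can be rewritten so that every column goes through exactly the same code. To run on its own, this sketch redefines rescale01() and builds the example with base R's data.frame() instead of tibble::tibble(); the same lines should also work on the tibble from the original example. The lapply() line is our own addition, not from the chapter.

```r
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

set.seed(1)  # only so the random example is reproducible
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# One call per column: the formula now lives in a single place.
df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
df$c <- rescale01(df$c)
df$d <- rescale01(df$d)

# Alternative: apply rescale01() to every column at once.
# Assigning into df[] keeps df a data frame rather than a plain list.
df[] <- lapply(df, rescale01)

# Missing values are ignored when computing the range and stay NA.
rescale01(c(1, 2, 3, NA, 5))
## [1] 0.00 0.25 0.50   NA 1.00
```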
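This exercise is taken directly from R for Data Science, Section 19.3.1: read the source code for each of the following three functions, work out what each one does, and then brainstorm better names for them.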
f1 <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}
f2 <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}
f3 <- function(x, y) {
  rep(y, length.out = length(x))
}
This exercise comes from R-exercises, as do the ones that follow.
Create a function that will return TRUE if a given integer is inside a vector.
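One possible solution, offered as a sketch for checking your own attempt afterwards. The name contains_value() is our choice, and the function assumes value is a single number.

```r
# Return TRUE if value appears anywhere in x, FALSE otherwise.
# Assumes value has length 1.
contains_value <- function(x, value) {
  # x == value compares value with every element of x;
  # any() collapses that to a single TRUE/FALSE, skipping NAs.
  any(x == value, na.rm = TRUE)
}

contains_value(c(3L, 8L, 12L), 8L)
## [1] TRUE
contains_value(c(3L, 8L, 12L), 5L)
## [1] FALSE
```

For a single value, base R's %in% operator does the same job: 8L %in% c(3L, 8L, 12L) returns TRUE.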
Create the function unique(), which, given a vector, will return a new vector containing the elements of the first vector with duplicated elements removed.
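One possible solution, again as a sketch. We call it my_unique() because defining a function named unique() would mask base R's unique() for the rest of the session; keeping the base version available also makes it easy to compare results.

```r
# Keep each element only the first time it appears in x.
my_unique <- function(x) {
  keep <- logical(length(x))  # starts as all FALSE
  for (i in seq_along(x)) {
    # x[seq_len(i - 1)] is everything before position i;
    # keep x[i] only if it has not appeared earlier.
    keep[i] <- !(x[i] %in% x[seq_len(i - 1)])
  }
  x[keep]
}

my_unique(c(3, 1, 3, 2, 1))
## [1] 3 1 2
my_unique(c("b", "a", "b"))
## [1] "b" "a"
```

Because the loop compares each element with everything before it, it slows down on long vectors; for real work, base R's unique() is the practical choice.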
Create a function that, given a data frame and a number or character string, will return the data frame with every occurrence of that value changed to NA.
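One possible solution, sketched under the assumption that value is a single number or string. The name value_to_na() is hypothetical. The sketch works column by column, so each column keeps its own type.

```r
# Replace every cell equal to value with NA, one column at a time.
value_to_na <- function(df, value) {
  df[] <- lapply(df, function(col) {
    # Leave existing NAs alone so the logical index never contains NA.
    col[!is.na(col) & col == value] <- NA
    col
  })
  df
}

df <- data.frame(
  x = c(1, 2, 3),
  y = c("a", "b", "a"),
  stringsAsFactors = FALSE
)
value_to_na(df, "a")  # the two "a" cells in y become NA
value_to_na(df, 2)    # the 2 in x becomes NA
```

Because == coerces types before comparing, value_to_na(df, 2) would also match the string "2" in a character column; whether that is desirable depends on your data.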