Write your own functions to study for the test

March 13, 2019

Motivation and goals

Today we’ll look at writing functions in R that have to do with a topic you may have been reviewing for tomorrow’s test.

I’ve gotten a number of questions about solving interaction problems from 2x2 tables, e.g., slide 51 of the interaction notes or question 2 part 3 from the practice exam. So let’s write some functions that will help us explore these problems.

Interaction from table of risks

Let’s look first at a table of risks like the following:

Proportion surviving at 3 years
E = 0 E = 1
G = 0 0.01 0.05
G = 1 0.04 0.10

What we’ve learned in class is how to calculate the measure of additive interaction with this data: \[ p_{11} - p_{01} - p_{10} + p_{00} = 0.10 - 0.05 - 0.04 + 0.01 = 0.02 \] But what if we want to calculate this from an arbitrary table? That is, what if we have lots of tables of this type and we want to create a function so that we can easily calculate this value?

We might write something like the following:

interact_add <- function(a, b, c, d) {
  int <- a - b - c + d
  return(int)
}

where the a, b, etc. arguments refer to the table cells. Then we could just plug them into the function like this:

interact_add(a = 0.01, b = 0.05, c = 0.04, d = 0.1)
## [1] 0.02

Since this value is greater than 0, we know we have interaction on the additive scale. Whether this is just statistical interaction, effect heterogeneity, or causal interaction depends on what confounding assumptions we’ve made! Let’s assume this is causal interaction (so, for example, both \(G\) and \(E\) were randomly assigned).

In words, this interaction means that the \(G = 1\) group gets an added “boost” from the drug \(E\) that the \(G = 0\) doesn’t get. What about on the multiplicative scale, though? Let’s write a function for that.

We know that the measure of multiplicative interaction is \[ RR_{11} \big / (RR_{01} \times RR_{10}) \] and we can calculate each of those risk ratios from the risks (really, probabilities of surviving!) we’ve been given: \[ RR_{11} = p_{11} \big / p_{00} \] \[ RR_{01} = p_{01} \big / p_{00} \] \[ RR_{10} = p_{10} \big / p_{00} \] Write a function that will calculate the measure of multiplicative interaction from the four cells:

interact_mult <- function(a, b, c, d) {
  # your code goes here
}

Test your code to make sure it gives you the correct answer (0.5) before you look at the solution. (I will make all my functions as explicit as possible, but obviously there are ways you could have made this code much shorter!)

interact_mult <- function(a, b, c, d) {
  RR11 <- d / a
  RR01 <- b / a
  RR10 <- c / a
  int <- RR11 / (RR01 * RR10)
  return(int)
}
interact_mult(a = 0.01, b = 0.05, c = 0.04, d = 0.1)

So, the multiplicative measure of interaction is negative (below 1). That means that on the relative scale, the drug \(E\) is actually less effective among the \(G = 1\) group. How is this possible, when we just saw that that group got that extra boost from \(E\)?

This occurs because of how relative measures depend on the baseline risk – the denominator of the risk ratio. This is true whether or not we’re in an interaction context. For example, let’s consider a drug which has two side effects: cardiac arrest and headache. On a normal day, the risk of cardiac arrest is 0.00001 and the risk of headache is 0.10 (I’m completely making these numbers up). If you take the drug, however, your risk of cardiac arrest goes up to 0.00005 and your risk of headache goes up to 0.2.

This means we have a risk ratio of 5 for cardiac arrest and 2 for headache. Does this mean that cardiac arrest is a “worse” side effect from the drug? (Well, yes, it is objectively worse than a headache but let’s just think about the numbers!) If 1,000,000 people take the drug, an extra 40 people (50 with, 10 without) will have cardiac arrest who wouldn’t have otherwise had it, but an extra 100,000 (200,000 with, 100,000 without) will have a headache. Now headache looks like a pretty bad side effect! In general, additive measures tell us more about the raw number of people who could be harmed or helped by some exposure or treatment.

The same is true of the additive measure of interaction. From the table above, the risk ratio (relative probability of survival) for \(E\) among the \(G = 0\) group is \(0.05/0.01 = 5\) and among the \(G = 1\) group it’s \(0.1/0.04 = 2.5\). So the drug appears to be less effective among the \(G = 1\) group (i.e., doesn’t increase durvival by as much). But that’s just because there were fewer people who would have survived without the drug in the \(G = 0\) group, which essentially “allows” its relative effect measures to be bigger (just like we saw with cardiac arrest). So we can use the direction of the additive interaction measure (if it is non-zero) to decide which group to treat: if it’s positive, the doubly-exposed group is going to get the most bang for their buck; if it’s negative, the singly-exposed group is.

Ok, back to R. We don’t really want two functions that do very similar things. Why not combine them? We can do this by adding an extra argument: let’s call it scale. The most straightforward way we can allow for the choice between the additive and multiplicative scale is just by including if statements for the two possibilities, and then copying the body of the previous functions.

interact <- function(a, b, c, d, scale) {
  if (scale == "additive") {
    int <- a - b - c + d
    return(int)
  }
  
  if (scale == "multiplicative") {
    RR11 <- d / a
    RR01 <- b / a
    RR10 <- c / a
    int <- RR11 / (RR01 * RR10)
    return(int)
  }
}

Try it!

interact(a = 0.01, b = 0.05, c = 0.04, d = 0.1, scale = "additive")
## [1] 0.02
interact(a = 0.01, b = 0.05, c = 0.04, d = 0.1, scale = "multiplicative")
## [1] 0.5

Great! But what if we send this to our friends and they try to put in some other value of scale, or none at all? Let’s see what will happen:

interact(a = 0.01, b = 0.05, c = 0.04, d = 0.1)
## Error in interact(a = 0.01, b = 0.05, c = 0.04, d = 0.1): argument "scale" is missing, with no default
interact(a = 0.01, b = 0.05, c = 0.04, d = 0.1, scale = "relative")

We get an error message for the first one, but the second just fails to produce output!

First of all, you should state these kinds of things in documentation you write for functions – either as part of an R package you’re creating, or as comments in your code. But you should also write good error messages so that users know what they’ve done wrong. One helpful function is stop(), which literally stops the execution of a function and prints an error message. So let’s rewrite the function:

interact <- function(a, b, c, d, scale = NULL) {
  # default to scale = NULL
  # so error if user doesn't provide an option
  if (is.null(scale)) stop("You must provide a scale", call. = FALSE)
  
  # and error if user provides the wrong option
  if (!scale %in% c("additive", "multiplicative")) {
    stop("You can choose either scale = 'additive' or scale = 'multiplicative'", 
         call. = FALSE)
  }
  
  if (scale == "additive") {
    int <- a - b - c + d
    return(int)
  }
  
  if (scale == "multiplicative") {
    RR11 <- d / a
    RR01 <- b / a
    RR10 <- c / a
    int <- RR11 / (RR01 * RR10)
    return(int)
  }
}

Obviously we could make a much more concise function and combine those error messages. But this is a really simple way to start off making error messages, which can even help you when you go back to use your own function later!

Now we get:

interact(a = 0.01, b = 0.05, c = 0.04, d = 0.1)
## Error: You must provide a scale
interact(a = 0.01, b = 0.05, c = 0.04, d = 0.1, scale = "relative")
## Error: You can choose either scale = 'additive' or scale = 'multiplicative'

Interaction from table with RRs

Some other interaction problems start you off with a table of risk ratios instead of the raw risks, like the one in part 2, question 3 of the practice exam, where we see this table:

Risk ratios for stomach flu
Unexposed Exposed
City 1 1 3
City 2 2 5

Since we don’t have the absolute risks, we can’t directly calculate the additive measure of interaction. Instead, we can calculate the RERI. Write a function that can take data from a table like the one above and calculate a multiplicative measure of interaction or the RERI, depending on which one the user wants. What error messages might you want to include?

Here’s one solution:

interact_RR <- function(a, b, c, d, scale = NULL) {
  if (is.null(scale)) stop("You must provide a scale", call. = FALSE)

  if (!scale %in% c("additive", "multiplicative")) {
    stop("You can choose either scale = 'additive' or scale = 'multiplicative'", call. = FALSE)
  }
  
  if (scale == "additive") {
    RERI <- d - b - c + 1
    warning("This is an RERI, not a direct measure of additive interaction")
    return(RERI)
  }
  
  if (scale == "multiplicative") {
    int <- d / (b * c)
    return(int)
  }
}

Notice that I provided a warning so that the RERI is not confused with the actual additive measure of interaction, since I used the scale = additive argument again.

interact_RR(1, 3, 2, 5, scale = "additive")
## Warning in interact_RR(1, 3, 2, 5, scale = "additive"): This is an RERI,
## not a direct measure of additive interaction
## [1] 1
interact_RR(1, 3, 2, 5, scale = "multiplicative")
## [1] 0.8333333

Again we have positive additive interaction but negative multiplicative interaction. So this must mean that more people are affected in the “doubly exposed” group. In this case that’s City 2. But that may seem weird to you – a city is an exposure? Yup. All that matters is how we are laying out the cells. We could have just as easily considered City 1 exposed and City 2 unexposed if we had made the table like this:

Risk ratios for stomach flu
Unexposed Exposed
City 2
City 1

Let’s think about how we would remake the table with the unexposed in City 2 as the common reference group (so cell a will continue to be 1). That’s the benefit of presenting interaction like this, instead of as stratified risk ratios with two reference groups. Try to fill it in on your own before looking below.

Risk ratios for stomach flu
Unexposed Exposed
City 2 1.0 2.5
City 1 0.5 1.5

Now we can use our function to see if we have additive interaction:

interact_RR(1, 2.5, 0.5, 1.5, scale = "additive")
## [1] -0.5

It’s negative! That means that fewer people are affected by being doubly exposed. In this case, the second exposure corresponded to City 1. So we should treat City 2 instead. Which matches our original conclusion! Note that the magnitude of the RERI is not the same, however. The RERI depends on the risk in the baseline, doubly-unexposed group. Since we’ve switched which group that is (from the unexposed in City 1 to the unexposed in City 2), it is natural that the magnitude of the RERI would change.

Is the same the case when we’re working with risks, as in our first example? Use your original function to calculate the measure of interaction for the rearranged table (as below) and see how it compares to our original answer.

Proportion surviving at 3 years
E = 0 E = 1
G = 1
G = 0
Again, try to fill it in before looking below.

Proportion surviving at 3 years
E = 0 E = 1
G = 1 0.04 0.10
G = 0 0.01 0.05
interact(a = 0.04, b = 0.1, c = 0.01, d = 0.05, scale = "additive")
## [1] -0.02

What about when you use the multiplicative measure of interaction?