by Joseph Rickert

We all "know" that correlation does not imply causation, and that unmeasured and unknown factors can confound a seemingly obvious inference. But who has not been tempted by the seductive quality of strong correlations?

Fortunately, it is also well known that a well-done randomized experiment can account for unknown confounders and permit valid causal inferences. But what can you do when it is impractical, impossible or unethical to conduct a randomized experiment? (For example, we wouldn't want to ask a randomly assigned cohort of people to go through life with less education to prove that education matters.) One way of coping with confounders when randomization is infeasible is to introduce what economists call instrumental variables. This is a devilishly clever and apparently fragile notion that takes some effort to wrap one's head around.

On Tuesday, October 20th, we at the Bay Area useR Group (BARUG) had the good fortune to have Hyunseung Kang describe the work that he and his colleagues at the Wharton School have been doing to extend the usefulness of instrumental variables. Hyunseung's talk started with elementary notions, such as why randomized experiments are effective, then described the essential idea of instrumental variables and developed the background necessary for understanding the new results in this area. The slides from Hyunseung's talk are available for download in two parts from the BARUG website. As with most presentations, these slides are little more than the mute residue of the talk itself. Nevertheless, Hyunseung makes such imaginative use of animation and build slides that the deck is worth working through.

The following slide from Hyunseung's presentation captures the essence of the instrumental approach.

The general idea is that one or more variables, the instruments, are added to the model for the purpose of inducing randomness into the exposure. This has to be done in a way that conforms with the three assumptions mentioned in the figure. The first assumption, A1, is that the instruments are relevant: they are associated with the exposure. The second assumption, A2, states that the instruments induce randomness only into the exposure variables and do not also affect the outcome directly. The third assumption, A3, is a strong one: there are no unmeasured confounders of the relationship between the instruments and the outcome. The claim is that if these three assumptions are met, then causal effects can be estimated with coefficients for the exposure variables that are consistent and asymptotically unbiased.
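To make the consistency claim concrete, here is a minimal simulation sketch in Python (not from the talk; the data-generating process and its coefficients are hypothetical). An unmeasured confounder u drives both the exposure d and the outcome y, while the instrument z is drawn independently of u, shifts d, and reaches y only through d. The ordinary regression slope of y on d is pulled away from the true effect by u; the single-instrument IV ratio cov(z, y) / cov(z, d) is not.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=100_000, beta=1.0, seed=1):
    """Hypothetical data set in which the true causal effect of d on y is beta."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)   # unmeasured confounder of d and y
        zi = rng.gauss(0.0, 1.0)  # instrument, drawn independently of u (A3)
        di = 0.8 * zi + u + rng.gauss(0.0, 1.0)         # z shifts the exposure (A1)
        yi = beta * di + 2.0 * u + rng.gauss(0.0, 1.0)  # z reaches y only via d (A2)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


z, d, y = simulate()
ols = sample_cov(d, y) / sample_cov(d, d)  # regression slope of y on d: biased by u
iv = sample_cov(z, y) / sample_cov(z, d)   # IV estimate (equals 2SLS with one instrument)
print(f"OLS: {ols:.3f}   IV: {iv:.3f}   true beta: 1.0")
```

Under these assumed coefficients the OLS slope should settle near 4.64 / 2.64 ≈ 1.76, while the IV estimate should settle near the true value of 1.0 as n grows.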

In the education example developed by Hyunseung, the instrumental variables are the subject's proximity to 2-year and 4-year colleges. Here is where the "rubber meets the road", so to speak. Assessing the relevance of the instrumental variables and interpreting their effects are subject to the kinds of difficulties described by Andrew Gelman in his post of a few years back.

In the second part of his presentation, Hyunseung presents new work: (1) two methods that provide robust confidence intervals when assumption A1 is violated, (2) a method for a sensitivity analysis that assesses how sensitive an instrumental variable model is to violations of assumptions A2 and A3, and (3) the R package ivmodel that ties it all together.
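The question a sensitivity analysis for A2 asks can be illustrated in a much cruder form than the methods in the talk. Suppose the instrument also had a direct effect delta on the outcome. Subtracting an assumed delta times z from the outcome and re-estimating shows how the IV estimate moves across plausible values of delta. The Python sketch below uses a hypothetical simulated data set whose true direct effect is 0.4; it is only an illustration of the idea and is not the procedure implemented in ivmodel.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def iv_estimate(z, d, y):
    """Single-instrument IV estimate cov(z, y) / cov(z, d)."""
    return sample_cov(z, y) / sample_cov(z, d)


def simulate(n=100_000, beta=1.0, direct=0.4, seed=11):
    """Hypothetical data in which z also affects y directly, violating A2."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)   # unmeasured confounder of d and y
        zi = rng.gauss(0.0, 1.0)  # instrument, independent of u
        di = 0.8 * zi + u + rng.gauss(0.0, 1.0)
        yi = beta * di + direct * zi + 2.0 * u + rng.gauss(0.0, 1.0)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


def sensitivity(z, d, y, deltas):
    """IV estimate after removing an assumed direct effect delta of z on y."""
    return {
        delta: iv_estimate(z, d, [yi - delta * zi for yi, zi in zip(y, z)])
        for delta in deltas
    }


z, d, y = simulate()
results = sensitivity(z, d, y, [0.0, 0.2, 0.4])
for delta, est in results.items():
    print(f"assumed direct effect {delta:.1f}: adjusted IV estimate {est:.3f}")
```

With these settings the unadjusted estimate (delta = 0) should land near 1.5 rather than the true 1.0, and only the correctly assumed delta recovers the truth. Since delta is unknown in practice, the point of such an analysis is to report how the conclusion changes across a range of assumed values.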

To delve even deeper into this topic, have a look at the paper: Instrumental Variables Estimation With Some Invalid Instruments and its Application to Mendelian Randomization.

It would probably be helpful for naive readers if you clarified this sentence a bit:

The claim is that if these three assumptions are met then causal effects can be estimated with coefficients for the exposure variables that are consistent and asymptotically unbiased.

Instrumental variable estimators are used to estimate the Local Average Treatment Effect, sometimes referred to as the Complier Average Causal Effect in the medical literature. The estimator is local to the conditions specified in the first-stage equation (i.e., it is an unbiased estimator, but the generalizability of the estimate is reduced to a subset of the available sample and the analogous population). Additionally, IV estimators are used relatively frequently in true randomized controlled trials when the intent, or offer, to treat is not always accepted or when there are problems with compliance with the assigned groups. In these cases, a just-identified first stage using the initial random assignment is sufficient to identify the causal effect of the treatment on the treated. A good reference for these types of estimators is Angrist and Pischke (2008), Mostly Harmless Econometrics, Princeton, NJ: Princeton University Press.
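The noncompliance case described in this comment can be sketched with a small Python simulation (no code appears in the original discussion; the population, the unobserved "health" trait, and the compliance rule are all hypothetical). Assignment z is randomized, only assigned people can take the treatment (one-sided noncompliance), and healthier people are more likely to comply. With one binary instrument and no covariates, the just-identified IV estimate is the Wald ratio: the intent-to-treat effect on the outcome divided by the intent-to-treat effect on treatment uptake.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate_trial(n=100_000, effect=2.0, seed=7):
    """Hypothetical trial with one-sided noncompliance; returns (z, d, y) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)  # unobserved trait that also affects y
        complier = health > -0.5      # hypothetical rule: healthier people comply
        z = rng.random() < 0.5        # randomized assignment (the instrument)
        d = z and complier            # only assigned compliers get treated
        y = effect * d + 1.5 * health + rng.gauss(0.0, 1.0)
        rows.append((z, d, y))
    return rows


def wald_estimate(rows):
    """ITT effect on y divided by ITT effect on treatment uptake."""
    itt_y = mean(y for z, _, y in rows if z) - mean(y for z, _, y in rows if not z)
    itt_d = mean(d for z, d, _ in rows if z) - mean(d for z, d, _ in rows if not z)
    return itt_y / itt_d


def as_treated(rows):
    """Naive comparison of treated vs. untreated people, ignoring assignment."""
    return mean(y for _, d, y in rows if d) - mean(y for _, d, y in rows if not d)


rows = simulate_trial()
late = wald_estimate(rows)
naive = as_treated(rows)
print(f"Wald (LATE): {late:.3f}   as-treated: {naive:.3f}")
```

Under these assumptions the Wald ratio should land near the complier effect of 2.0, while the as-treated comparison is pulled upward (to roughly 3.2) because the people who end up treated are healthier on average.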

Posted by: Billy Buchanan | October 30, 2015 at 00:00

I'm confused about the AR significance values. They are described as a test of the A3 assumption, but I don't know what that means. Does a significant result suggest that the assumption HAS NOT been violated, or that it HAS?

Posted by: Edomaniac | November 05, 2015 at 08:57

@Billy_Buchanan: This is a great point, and thanks for mentioning it. Typically, along with the three “core” assumptions (A1)-(A3), one needs to make additional assumptions to point-identify the treatment effect. Usually, these assumptions revolve around population homogeneity. For example, the estimand you mentioned, the local average treatment effect (LATE), restricts heterogeneity in the population by assuming that no "defiers" exist in the study; in the education example, defiers would be people who are encouraged to finish high school but defy that encouragement and don't finish. The other popular homogeneity assumption is based on some form of structural model, which typically identifies the effect of treatment on the treated (ToT) (see Hernan and Robins (2006), "Instruments for causal inference: an epidemiologist's dream?", for a survey of the various assumptions used to point-identify the treatment effect).

For better or for worse, most users of IV assume a homogeneous treatment effect, which makes the LATE equal to the ToT and both equal to the average treatment effect; the presentation above takes this view. Also, a semi-ideal use of IV methods would apply them to RCT data with non-compliance, since in that setup assumption (A3) is automatically satisfied. Again for better or for worse, though, it's common for users of IV to work with observational data and to use "exogenous variation" as instruments.

In fact, the instrumental variables (IV) literature is pretty extensive, and the presentation, for better or for worse, only went through the most common usage of IV. It also focused on making users aware of the "core" assumptions of IV, (A1)-(A3), in this common setting. But it's definitely important to know the subtleties of the estimands in IV, specifically the additional layer of assumptions regarding homogeneity, and it's definitely worth reading more about them.

@Edomaniac: The AR test, roughly speaking, tests the null hypothesis of no treatment effect, i.e., H_0: beta = 0, against the two-sided alternative, where beta is the treatment effect, although the null also includes, to varying degrees, a test of exogeneity, i.e., Assumption (A3). In any case, the AR test in the R printout above is a robust procedure when assumption (A1) is violated. That is, even if assumption (A1) is violated for this data, the p-value and the resulting confidence interval should still provide "honest" information about the true treatment effect, i.e., beta, in the model.
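For readers who want to see the mechanics, here is a minimal large-sample sketch of an Anderson-Rubin (AR) test with a single instrument, written in Python on hypothetical simulated data; it uses a normal reference distribution, no covariates, and is not the code behind the ivmodel printout. To test H_0: beta = beta0, regress y - beta0 * d on z and test whether the slope on z is zero. If the instrument is valid, that slope is zero under the null no matter how weak the instrument is, which is why the test stays honest when A1 fails. Collecting every beta0 that is not rejected gives the AR confidence set.

```python
import math
import random


def simulate(n=2_000, beta=1.0, strength=0.8, seed=3):
    """Hypothetical data: true effect beta, instrument strength `strength`."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)   # unmeasured confounder of d and y
        zi = rng.gauss(0.0, 1.0)  # instrument, independent of u
        di = strength * zi + u + rng.gauss(0.0, 1.0)
        yi = beta * di + 2.0 * u + rng.gauss(0.0, 1.0)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


def iv_estimate(z, d, y):
    """Single-instrument IV estimate."""
    n = len(z)
    zbar, dbar, ybar = sum(z) / n, sum(d) / n, sum(y) / n
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    szd = sum((zi - zbar) * (di - dbar) for zi, di in zip(z, d))
    return szy / szd


def ar_test(z, d, y, beta0):
    """Large-sample AR test of H0: beta = beta0; returns a two-sided p-value."""
    n = len(z)
    e = [yi - beta0 * di for yi, di in zip(y, d)]  # outcome with hypothesized effect removed
    zbar, ebar = sum(z) / n, sum(e) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    sze = sum((zi - zbar) * (ei - ebar) for zi, ei in zip(z, e))
    slope = sze / szz
    intercept = ebar - slope * zbar
    rss = sum((ei - intercept - slope * zi) ** 2 for zi, ei in zip(z, e))
    t = slope / math.sqrt(rss / (n - 2) / szz)
    # Normal approximation: t**2 is approximately chi-square(1) in large samples
    return math.erfc(abs(t) / math.sqrt(2.0))


z, d, y = simulate()
p_zero = ar_test(z, d, y, 0.0)         # H0: no treatment effect
grid = [k / 20 for k in range(-40, 81)]  # candidate beta0 values from -2.0 to 4.0
ar_set = [b for b in grid if ar_test(z, d, y, b) > 0.05]  # invert the test at the 5% level
print(f"p-value for H0: beta = 0: {p_zero:.2e}")
# With a strong instrument the set is an interval; a weak one can make it unbounded or split
print(f"95% AR confidence set on the grid: {min(ar_set)} to {max(ar_set)}")
```

Read this way, a small p-value at beta0 = 0 is evidence against "no treatment effect", provided the instrument is valid; a large p-value does not certify the assumptions.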

All the methods for violations of IV assumptions (A1)-(A3) reported in the R printout above will provide "honest" information about the treatment effect under the model, even if the assumptions are violated. More information about these methods can be found in the paper that accompanies the R package ivmodel, by Jiang, Kang, and Small (2015): “ivmodel: An R Package for Inference and Sensitivity Analysis of Instrumental Variables Models with One Endogenous Variable.”

Posted by: Hyunseung Kang | November 05, 2015 at 11:13