Exploring interactions in a random forest model

Random forest methodology offers several strengths that make it attractive for data analysis problems in ecology and natural resources. (1) It can accommodate the “large p, small n” problem of many predictors and relatively few observations. (2) It can capture nonlinear relationships between the response and predictors. (3) It makes no distributional assumptions about predictors or response. (4) It remains usable in the presence of highly correlated predictors. (5) It provides measures of variable importance. (6) It can capture complex interactions among predictors, which is my focus here. A classical partial dependence plot depicts the marginal effect of a given predictor on the response, “averaging out” the effects of all other predictors. Consequently, interactions are not visible in a partial dependence plot, and the depicted relationship may be misleading. Using a dataset that relates den locations to several forest structure metrics, I explore the potential for detecting interactions among predictors using Individual Conditional Expectation (ICE) plots (implemented in the R package ICEbox) and forest floor visualizations (implemented in the R package forestFloor).
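
The way averaging can hide an interaction can be illustrated with a minimal sketch. It is written in plain Python rather than with ICEbox or forestFloor, and everything in it is an assumption made for illustration: the function predict is a hypothetical stand-in for a fitted model, and the observations and grid are invented toy values, not the den-location data. Each ICE curve varies one predictor over a grid while holding an observation's other predictor at its observed value; the partial dependence curve is the pointwise average of those ICE curves.

```python
# Toy illustration only: `predict`, `observations`, and `grid` are
# hypothetical and are not taken from the den-location analysis.

def predict(x1, x2):
    """Stand-in for a fitted model whose response is a pure x1-by-x2 interaction."""
    return x1 * x2

# Four invented observations as (x1, x2); x2 is balanced between -1 and +1.
observations = [(0.2, -1.0), (0.5, 1.0), (0.7, -1.0), (0.9, 1.0)]

# Grid of x1 values at which the curves are evaluated.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]


def ice_curves(model, data, x1_grid):
    """One curve per observation: vary x1 over the grid, keep that observation's x2."""
    return [[model(g, x2) for g in x1_grid] for (_, x2) in data]


def partial_dependence(curves):
    """Pointwise average of the ICE curves at each grid value."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


curves = ice_curves(predict, observations, grid)
pdp = partial_dependence(curves)

print("partial dependence:", pdp)
for (x1, x2), curve in zip(observations, curves):
    print("ICE curve for x2 =", x2, ":", curve)
```

In this toy case the partial dependence curve is flat, which would suggest that x1 has no effect, while the individual curves rise for observations with x2 = 1 and fall for those with x2 = -1. Curves that diverge in this way are the kind of pattern an ICE plot is meant to expose.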

Susan Durham
Utah State University