Date Published: 2020-04-20
Seo-young Silvia Kim, California Institute of Technology
R. Michael Alvarez, California Institute of Technology
Objective: What can machine learning tell us about who voted in 2016? There are numerous competing voter turnout theories, and a large number of covariates are required to assess which theory best explains turnout. This paper is a proof-of-concept that machine learning can help overcome this curse of dimensionality and reveal important insights in studies of political phenomena.
Methods: We use Fuzzy Forests, an extension of Random Forests, to screen variables for a parsimonious but accurate prediction. Fuzzy Forests achieve accurate variable importance measures in the face of high-dimensional and highly correlated data. Our data come from the 2016 Cooperative Congressional Election Study.
Results: Fuzzy Forests chose only a small number of covariates as major correlates of 2016 turnout and still boasted high predictive performance.
Conclusion: Our analysis provides three important conclusions about turnout in 2016: registration and voting procedures were important, political issues were important (especially Obamacare, climate change, and fiscal policy), but few demographic variables other than age were strongly associated with turnout. We conclude that Fuzzy Forests is an important methodology for studying over-determined questions in social sciences.