Marshall A. Taylor and Dustin S. Stoltz
This repository contains all the R code and data necessary to reproduce the plots and simulations in our paper "Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts," published in Sociological Science.
Recent methodological work at the intersection of culture, cognition, and computational methods has drawn attention to how cultural schemas can be "recovered" from social survey data. Defining cultural schemas as slowly learned, implicit, and unevenly distributed relational memory structures, researchers show how schemas (or rather, the downstream consequences of people drawing upon them) can be operationalized and measured from domain-specific survey modules. Respondents can then be sorted into "classes" on the basis of the schema with which their survey response patterns best align. In this paper, we extend this "schematic class analysis" method to text data. We introduce concept class analysis (CoCA): a hybrid model that combines word embeddings and correlational class analysis to group documents across a corpus by the similarity of the schemas recovered from them. We describe the CoCA model, illustrate its validity and utility using simulations, and conclude with considerations for future research and applications.