Applied Scaling & Classification Techniques in Political Science using text data
(academic year 2024/25)
Syllabus
Course aims and objectives
Students will learn how to apply widely used methods from the literature to analyze political texts and to extract useful information from them for testing their own theories.
First Lecture
28/11/24 Theory: An introduction to text analytics
Reference texts: (1; 2)
- Grimmer, Justin, and Stewart, Brandon M. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297
- Benoit, Kenneth (2020). Text as data: An overview. In Luigi Curini and Robert Franzese (eds.), SAGE Handbook of Research Methods in Political Science & International Relations, London, Sage, chapter 26
28/11/24 Lab class: An introduction to the Quanteda package (a) packages to install for Lab 1; b) script for Lab 1: R script; c) script for Lab 1: Google Colab notebook); datasets: a) Boston tweets sample (.csv; .rds); b) Inaugural US Presidential speeches sample (to open this file, please use the data compression tool WinRAR); c) sample of Japanese legislative speeches; EXTRA: a) R script to tokenize Japanese & Chinese; b) Google Colab notebook to open files stored in your Google Drive
Second Lecture
5/12/24 Theory: Unsupervised classification methods: the Topic Model (and beyond)
Reference texts (1; 2):
- Grimmer, Justin, and Stewart, Brandon M. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297
- Roberts, Margaret E., Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, David G. Rand. 2014. Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, 58(4), 1064-1082
First Assignment (due: 12 December 2024) (dataset for the first part: Guardian 2013 - .csv; .rds) (dataset for the second part: Trump 2018 tweets. To open this file, use the command readRDS("Trump2018.rds"))
Third Lecture
12/12/24 Theory: (Part 1): Supervised classification methods: automatic tagging; (Part 2): An introduction to supervised classification models
Reference texts (1, 2, 3):
- Grimmer, Justin, and Stewart, Brandon M. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297
- Curini, Luigi, and Robert Fahey. 2020. Sentiment Analysis. In: Luigi Curini and Robert Franzese (eds.), Sage Handbook of Research Methods in Political Science and International Relations, London: Sage, chapter 29
- Barberá, Pablo et al. (2020). Automated Text Classification of News Articles: A Practical Guide. Political Analysis, DOI: 10.1017/pan.2020
12/12/24 Lab class: How to implement dictionary models and a supervised classification model (a) packages to install for Lab 3; b) Lab 3 scripts (part I - dictionaries: R script; Google Colab Notebook; part II - NB model: R script; Google Colab Notebook); datasets for Lab 3: a) Laver and Garry policy dictionary; b) sample of tweets discussing Donald Trump (.rds file); c) disaster dataset - training-set (.csv; .rds); d) disaster dataset - test-set (.csv; .rds); e) US airlines training-set (.csv; .rds); f) US airlines test-set (.csv; .rds); EXTRA: a) converting an external dictionary to a Quanteda format; b) split-half reliability test; c) meaning of compressed sparse matrices)
Second Assignment (due: 19 December 2024)
Fourth Lecture
19/12/24 Theory: (Part 1): Supervised classification models: the Random Forest; (Part 2): The importance of the training set
Reference texts (1; 2; 3; 4):
- Olivella, Santiago, and Kelsey Shoub (2020). Machine Learning in Political Science: Supervised Learning Models. In Luigi Curini and Robert Franzese (eds.), SAGE Handbook of Research Methods in Political Science & International Relations, London, Sage, chapter 56
- Grimmer, Justin, and Stewart, Brandon M. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297
- Curini, Luigi, and Robert Fahey. 2020. Sentiment Analysis. In: Luigi Curini and Robert Franzese (eds.), Sage Handbook of Research Methods in Political Science and International Relations, London: Sage, chapter 29
- Barberá, Pablo et al. (2020). Automated Text Classification of News Articles: A Practical Guide. Political Analysis, DOI: 10.1017/pan.2020
19/12/24 Lab class: How to implement a RF: (a) packages to install for Lab 4; b) Lab 4 scripts (part I - Random Forest: R script; Google Colab Notebook; part II - Inter-coder reliability: R script; Google Colab Notebook); c) Google Colab notebook about Keras package)
Third Assignment (due: 9 January 2025) (datasets for Assignment 3: a) UK training set (.csv; .rds); b) UK test set (.csv; .rds))
Fifth Lecture
9/1/25 Theory: Neural Network Models
9/1/25 Lab class: How to implement a NN model (a) packages to install for Lab 5; b) Lab 5 script (Neural Network Models - R script; Google Colab Notebook))
Fourth Assignment (due: 16 January 2025)
Sixth Lecture
16/1/25 Theory: How to validate the results of an ML algorithm
Reference texts (1, 2, 3, 4):
- Grimmer, Justin, and Stewart, Brandon M. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297
- Cranmer, Skyler J. and Desmarais, Bruce A. (2017) What Can We Learn from Predictive Modeling?, Political Analysis, 25: 145-166
- Jordan, Soren, Hannah L. Paul, and Andrew Q. Philips (2022). How to Cautiously Uncover the “Black Box” of Machine Learning Models for Legislative Scholars, Legislative Studies Quarterly, https://onlinelibrary.wiley.com/doi/abs/10.1111/lsq.12378
- Arnold, Christian, Luka Biedebach, Andreas Küpfer, and Marcel Neunhoeffer. (2023). The Role of Hyperparameters in Machine Learning, Political Science Research and Methods, 2024
16/1/25 Lab class: How to compute internal and external validity of an ML algorithm (a) packages to install for Lab 6; b) Lab 6 scripts (part I: external validity - R script; Google Colab Notebook; part II: global interpretation - R script; Google Colab Notebook); c) functions to compute cross-validation: for 2 class labels; for more than 2 class labels (to open the two files, please use the data compression tool WinRAR); d) .rds file for the grid-search of the NN algorithm; e) .rds files for the global interpretation exercise (to open the files, please use the data compression tool WinRAR); f) Google Colab notebook about the text package)
Fifth Assignment (due: 23 January 2025)
Seventh Lecture
23/1/25 Theory: Word Embedding techniques
Reference texts (1, 2):
- Rodriguez, Pedro L., and Arthur Spirling (2022). Word Embeddings: What works, what doesn’t, and how to tell the difference for applied research, Journal of Politics, 84(1), 101-115
- Kjell, O., Giorgi, S., & Schwartz, H. A. (2023, May 1). The Text-Package: An R-Package for Analyzing and Visualizing Human Language Using Natural Language Processing and Transformers. Psychological Methods. Advance online publication. https://dx.doi.org/10.1037/met0000542
23/1/25 Lab class: How to implement GloVe, word2vec and BERT (scripts: a) packages to install for Lab 7; b) Lab 7 scripts (part I: GloVe and word2vec - R script; Google Colab Notebook; part II: contextualized WE - R script; Google Colab Notebook); dataset & files for the lab: a) movie reviews dataset (.csv; .rds); b) social-disaster dataset (.csv; .rds); c) .rds file with the results of the RF model via permutation; d) pre-trained WE on Google news; e) pre-trained WE on Facebook posts; f) BERT results via text package (.rds file))
Sixth Assignment (due: 30 January 2025) (datasets for Assignment 6 (.csv; .rds))