Big Data Analytics (second term 2021/22)
IMPORTANT INFO
This course will be conducted online using Zoom (Join Zoom Meeting: https://zoom.us/j/5469311951)
Course aims and objectives
In this four-day course, students will learn how to employ some widely discussed methods from the literature to analyze political texts and to extract from them useful information for testing their own theories.
First Lecture
9 March 2022 - Morning session
Theory class: An introduction to text analytics
Reference texts:
- Grimmer, Justin, and Stewart, Brandon M. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297
- Benoit, Kenneth (2020). Text as data: An overview. In Luigi Curini and Robert Franzese (eds.), SAGE Handbook of Research Methods in Political Science & International Relations, London, Sage, chapter 26
- Grossman, Jonathan, and Pedahzur, Ami (2020). Political Science and Big Data: Structured Data, Unstructured Data, and How to Use Them. Political Science Quarterly, 135(2): 225-257
Lab class: An introduction to the Quanteda package (scripts: Lab 1A script; datasets: a) Boston tweets sample; b) Inaugural US Presidential speeches sample; to open this file, please use the data compression tool WinRAR)
9 March 2022 - Afternoon session
Theory class: From words to positions: supervised & unsupervised scaling models
Reference texts:
- Proksch, Sven-Oliver, and Slapin, Jonathan B. 2008. A Scaling Model for Estimating Time-Series Party Positions from Texts. American Journal of Political Science, 52(3): 705-722.
- Proksch, Sven-Oliver, and Slapin, Jonathan B. 2009. How to Avoid Pitfalls in Statistical Analysis of Political Texts: The Case of Germany. German Politics, 18(3): 323-344.
- Curini, Luigi, Airo Hino, and Atsushi Osaka. 2020. Intensity of government–opposition divide as measured through legislative speeches and what we can learn from it: Analyses of Japanese parliamentary debates, 1953–2013. Government and Opposition, 55(2): 184-201
- Laver, Michael, Kenneth Benoit, and John Garry. 2003. Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review, 97(2): 311-331
- Egerod, Benjamin C.K., and Robert Klemmensen (2020). Scaling Political Positions from text. Assumptions, Methods and Pitfalls. In Luigi Curini and Robert Franzese (eds.), SAGE Handbook of Research Methods in Political Science & International Relations, London, Sage, chapter 27
Lab class: How to implement the Wordscores and Wordfish algorithms (scripts: a) Wordscore example; b) Lab 1B script; c) Lab 1C script; dataset: a) UK party programs 1992 and 1997; to open this file, please use the data compression tool WinRAR)
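For orientation before the lab, the core of the Wordscores idea (Laver, Benoit and Garry 2003) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the lab script: the lab itself works with the Quanteda package, and the function names, toy texts and reference positions below are all hypothetical. The sketch also omits the rescaling of virgin-text scores discussed in the original paper.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Share of each word in a tokenized text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_scores(reference_texts, reference_positions):
    """Score each word as the position-weighted probability of reading a
    given reference text when that word is observed."""
    freqs = {name: relative_frequencies(toks) for name, toks in reference_texts.items()}
    vocabulary = set().union(*(f.keys() for f in freqs.values()))
    scores = {}
    for word in vocabulary:
        total = sum(f.get(word, 0.0) for f in freqs.values())
        scores[word] = sum(
            f.get(word, 0.0) / total * reference_positions[name]
            for name, f in freqs.items()
        )
    return scores


def score_text(tokens, scores):
    """Raw score of a virgin text: mean word score over its scored words."""
    scored = [t for t in tokens if t in scores]
    if not scored:
        raise ValueError("no word of the text appears in the reference texts")
    return sum(scores[t] for t in scored) / len(scored)


# Hypothetical toy data: two reference texts with assumed positions -1 and +1.
reference_texts = {
    "left": "tax tax spend welfare".split(),
    "right": "tax cut cut market".split(),
}
reference_positions = {"left": -1.0, "right": 1.0}

scores = word_scores(reference_texts, reference_positions)
virgin_score = score_text("cut market welfare unknown".split(), scores)
```

Words that occur only in one reference text inherit that text's position, while shared words such as "tax" are pulled toward the middle. This is why raw virgin-text scores tend to cluster near the centre of the scale.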
Second Lecture
10 March 2022 - Morning session
Theory class: From words to issues: unsupervised classification models
Reference texts:
- Roberts, Margaret E., Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, David G. Rand. 2014. Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, 58(4), 1064-1082
- Roberts, Margaret E., Brandon M. Stewart, Dustin Tingley. 2014. STM: R Package for Structural Topic Models. Journal of Statistical Software
10 March 2022 - Afternoon session
Theory class: From words to issues. First part: Structural Topic Models; Second part: Automatic Tagging
Reference text:
- Grimmer, Justin, and Stewart, Brandon M. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297
Lab class: How to implement an STM and a dictionary (scripts: a) Lab 2B script; b) Lab 2C script; datasets: a) NyT; b) data for topical content analysis)
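As a rough illustration of the dictionary (automatic tagging) approach covered in this session, the sketch below counts how many tokens of a text fall into each category of a keyword dictionary. It is a plain-Python sketch under stated assumptions, not the lab script: the categories, keywords, example sentence and function names are all hypothetical, and the tokenizer is deliberately naive.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def apply_dictionary(text, dictionary):
    """Count, for each category, the tokens that match one of its keywords."""
    tokens = tokenize(text)
    return {
        category: sum(1 for token in tokens if token in keywords)
        for category, keywords in dictionary.items()
    }


# Hypothetical two-category dictionary.
issue_dictionary = {
    "economy": {"tax", "jobs", "budget"},
    "security": {"police", "crime", "border"},
}

category_counts = apply_dictionary("Lower tax and more jobs; less crime.", issue_dictionary)
```

A text can then be tagged with its highest-count category, or the counts can be normalized by text length to compare documents of different sizes.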
Third Lecture
17 March 2022 - Morning session
Theory class: From words to issues: semi-supervised classification models
Reference texts:
- Kohei Watanabe and Yuan Zhou (2020) Theory-Driven Analysis of Large Corpora: Semisupervised Topic Classification of the UN Speeches. Social Science Computer Review, DOI: 10.1177/0894439320907027
- Shusei Eshima, Kosuke Imai, and Tomoya Sasaki (2020). Keyword Assisted Topic Models, arXiv:2004.05964v1
Lab class: How to implement a semi-supervised classification model (scripts: a) Lab 3A script)
17 March 2022 - Afternoon session
Theory class: From words to issues. First part: an introduction to supervised classification models; Second part: supervised classification models (1)
Reference text:
- Olivella, Santiago, and Shoub, Kelsey (2020). Machine Learning in Political Science: Supervised Learning Models. In Luigi Curini and Robert Franzese (eds.), SAGE Handbook of Research Methods in Political Science & International Relations, London, Sage, chapter 56
Lab class: How to implement supervised classification models (scripts: a) Lab 3B script; b) Lab 3C script and slide; datasets: a) disaster training-set; b) disaster test-set)
Fourth Lecture
18 March 2022 - Morning session
Theory class: From words to issues. Supervised classification models (2)
Reference text:
- Olivella, Santiago, and Shoub, Kelsey (2020). Machine Learning in Political Science: Supervised Learning Models. In Luigi Curini and Robert Franzese (eds.), SAGE Handbook of Research Methods in Political Science & International Relations, London, Sage, chapter 56
Lab class: How to implement supervised classification models (scripts: a) Lab 4A; dataset: a) Nationality)
18 March 2022 - Afternoon session
Theory class: From words to issues. First part: How to validate the results you get from machine learning algorithms; Second part: The importance of the training set
Reference texts:
- Grimmer, Justin, and Stewart, Brandon M. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3): 267-297
- Cranmer, Skyler J. and Desmarais, Bruce A. (2017) What Can We Learn from Predictive Modeling?, Political Analysis, 25: 145-166
- Curini, Luigi, and Robert Fahey. 2020. Sentiment Analysis. In: Luigi Curini and Robert Franzese (eds.), Sage Handbook of Research Methods in Political Science and International Relations, London: Sage, chapter 29
- Barberá, Pablo et al. (2020). Automated Text Classification of News Articles: A Practical Guide. Political Analysis, DOI: 10.1017/pan.2020
Lab class: How to run a cross-validation (scripts: a) Lab 4B; b) Lab 4C; c) Lab 4D; d) Lab 4E; datasets: a) movie review; b) UK tweets training set)
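The logic of k-fold cross-validation can be sketched as follows: split the labelled data into k folds, hold each fold out in turn, fit on the rest, and record the accuracy on the held-out fold. This is a minimal plain-Python sketch, not the lab script. The function names, the toy labels and the majority-class "classifier" are hypothetical stand-ins for whichever supervised model is being validated.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and deal them into k disjoint folds."""
    if not 1 < k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Return the held-out accuracy of each of the k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Hypothetical baseline: always predict the most frequent training label.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, feature):
    return model


example_features = [f"doc {i}" for i in range(10)]
example_labels = ["pos"] * 6 + ["neg"] * 4

fold_accuracies = cross_validate(
    example_features, example_labels, fit_majority, predict_majority, k=5
)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Because every document is held out exactly once, the mean of the fold accuracies gives an estimate of out-of-sample performance that does not depend on a single train/test split.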
Course Assignment (due: 18 April 2022) (Texts for the first part of the Assignment; Texts for the fourth part of the Assignment)