Syllabus

Instructor

Course objectives and overview

This text-as-data course is a section of the Applied Machine Learning and Big Data Analysis class taught by Louise Inguere. It consists of two 3-hour lectures.

Text data are ubiquitous. Recent methodological and technological developments now make it possible to study these data systematically, enabling economists to investigate new questions and to quantify concepts that were previously difficult to measure.

This course thus aims to provide an overview and understanding of:

  • Why text data are useful in economics research and which research questions they make it possible to address
  • A typical empirical workflow for text analysis
  • How to concretely implement these analyses in Python
  • Recent developments in the field
  • How this section relates to the rest of the class

Prerequisites

Prerequisites for this course include the first lectures of the Applied Machine Learning and Big Data Analysis class. This course also requires familiarity with basic coding as well as Python basics as taught in the rest of the class.

Caution

This course will make extensive use of Python. Make sure it is installed and working before class, and bring your laptop.
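One quick way to confirm that Python is working is to run a short script such as the sketch below. It only prints the interpreter version and compares it with a minimum. The syllabus does not state a required version, so the `MIN_VERSION` value and the helper name `python_is_recent_enough` are illustrative assumptions, not course requirements.

```python
import sys

# Assumption: the syllabus gives no minimum version; (3, 9) is a placeholder.
MIN_VERSION = (3, 9)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Show which interpreter is actually being run.
    print("Python", sys.version.split()[0])
    if python_is_recent_enough():
        print("OK: this interpreter meets the placeholder minimum.")
    else:
        print("Consider installing a more recent Python before class.")
```

If the script runs and prints a version number, the interpreter itself is working. Any packages used in class would still need to be installed separately.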

Outline of the course

Lecture 1

  1. Introduction
  2. Applications in economics
  3. Workflow for analysis
  4. Pre-processing

Lecture 2

  1. Transformation and representations
  2. Dictionary-based methods
  3. Word embeddings
  4. Machine learning
  5. Introduction to deep learning methods

Resources

Handbooks and coding resources for NLP

These books are written in R but are worth reading even if you only use Python for your text analyses. The code is easy to transpose, and the structure of the books is great for understanding the possibilities and methods in NLP:

  • Robinson and Silge (2017): a rather thorough introduction to text analysis
  • Hvitfeldt and Silge (2022): a very clear overview of supervised ML

Text analysis in economics

  • Ash and Hansen (2023) gives a great overview of the use of text analysis in economics
  • Gentzkow, Kelly, and Taddy (2019) is one of the key papers (and authors) in text analysis in economics
  • Dugoua, Dumas, and Noailly (2022) discusses the use of text analysis in environmental economics
  • Elliott Ash’s NLP class

References

Ash, Elliott, and Stephen Hansen. 2023. “Text Algorithms in Economics.” Annual Review of Economics 15 (1): 659–88. https://doi.org/10.1146/annurev-economics-082222-074352.
Dugoua, Eugenie, Marion Dumas, and Joëlle Noailly. 2022. “Text as Data in Environmental Economics and Policy.” Review of Environmental Economics and Policy 16 (2): 346–56. https://doi.org/10.1086/721079.
Gentzkow, Matthew, Bryan Kelly, and Matt Taddy. 2019. “Text as Data.” Journal of Economic Literature 57 (3): 535–74. https://doi.org/10.1257/jel.20181020.
Hvitfeldt, Emil, and Julia Silge. 2022. Supervised Machine Learning for Text Analysis in R.
Robinson, David, and Julia Silge. 2017. Text Mining with R.