Use in economics and pre-processing
Objective
Discuss some of the main uses of text as data in economics and describe the necessary steps to prepare your text data for analysis.
Summary
Text data is ubiquitous, and developments in text analysis allow us to study economic questions that could not be explored before. This lecture describes some of the types of analysis that text data makes possible. Before any concrete analysis, however, one needs to gather and pre-process the data. We therefore discuss these steps in the second part of the lecture.
Session Outline
- Introduction
- Specificity of text data
- Why use text data in economics?
- Outline of the course and resources
- Applications in economics
- Measuring document similarity
- Concept detection
- Relation between concepts
- Associating text with metadata
- Workflow for analysis
- Gathering data
- Common data sources and ways to gather data
- Introduction to web scraping
- Optical Character Recognition
- Pre-processing
- Tokenization
- Capitalization and punctuation
- Stemming/lemmatization
- Stopwords
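
As a rough illustration of how the pre-processing steps listed above fit together, here is a minimal sketch in plain Python. The stopword list, the `toy_stem` helper, and the example sentence are assumptions made for illustration only; in practice one would rely on a dedicated NLP library and a proper stemmer or lemmatizer.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}


def toy_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, then stem."""
    # Capitalization: fold everything to lower case.
    lowered = text.lower()
    # Tokenization and punctuation: keep runs of the letters a-z only.
    # Note that this also discards digits and accented characters.
    tokens = re.findall(r"[a-z]+", lowered)
    # Stopwords: remove very common words that carry little content.
    kept = [t for t in tokens if t not in STOPWORDS]
    # Stemming: reduce each remaining token to a rough stem.
    return [toy_stem(t) for t in kept]


tokens = preprocess("The central banks are raising interest rates.")
print(tokens)  # ['central', 'bank', 'rais', 'interest', 'rate']
```

The stem "rais" shows why stemming output is not always a dictionary word; lemmatization would instead map "raising" to "raise", at the cost of needing a vocabulary.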
Materials
Exercise
The assignment is available on the Portail des Etudes.
Specific resources for this lecture
If you should read only one thing
Ash and Hansen (2023) give a great overview of the use of text analysis in economics; it is the reference on which the first part of this lecture is built.
Gathering data
Pre-processing
- Chapters 2 to 4 of Hvitfeldt and Silge (2022), which cover tokenization, stop words, and stemming, give a very clear overview (abstracting from the R coding)
- Lecture 3 of Elliott Ash’s NLP class
References
Ash, Elliott, and Stephen Hansen. 2023. “Text Algorithms in Economics.” Annual Review of Economics 15 (1): 659–88. https://doi.org/10.1146/annurev-economics-082222-074352.
Hvitfeldt, Emil, and Julia Silge. 2022. Supervised Machine Learning for Text Analysis in R.