Syllabus
Instructor
Course objectives and overview
This text-as-data course is a section of the Applied Machine Learning and Big Data Analysis class taught by Louise Inguere. It consists of two 3-hour lectures.
Text data are ubiquitous. Recent methodological and technological developments now make it possible to study these data systematically, enabling economists to investigate new questions and to quantify concepts that were previously difficult to measure.
This course thus aims to provide an overview and understanding of:
- Why text data is useful in economics research and which research questions it makes it possible to address
- A typical empirical workflow for text analysis
- How to concretely implement these analyses in Python
- Recent developments in the field
- How this section relates to the rest of the class
Prerequisites
Prerequisites for this course include the first lectures of the Applied Machine Learning and Big Data Analysis class. This course also requires familiarity with basic coding as well as Python basics as taught in the rest of the class.
This course will make extensive use of Python. Make sure to have it installed and working before class, and bring your laptop to class.
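One way to confirm that Python is working before class is to run a short check like the sketch below. It prints the interpreter version and reports whether a few packages can be found. The package list is an assumption for illustration: spaCy and NLTK appear in the resources of this syllabus, but the packages actually required may differ. The helper name `check_setup` is hypothetical.

```python
import importlib.util
import sys

# Assumed package list for illustration only: spaCy and NLTK are mentioned
# in the resources, but the course may rely on other packages as well.
PACKAGES = ["spacy", "nltk"]


def check_setup(packages=PACKAGES):
    """Return a dict mapping each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Show which Python interpreter version is running this script.
    print("Python version:", sys.version.split()[0])
    for name, found in check_setup().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If a package is reported as missing, it can usually be installed with `pip install` followed by the package name.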
Outline of the course
Lecture 1
- Introduction
- Applications in economics
- Workflow for analysis
- Pre-processing
Lecture 2
- Transformation and representations
- Dictionary-based methods
- Word embeddings
- Machine learning
- Introduction to deep learning methods
Resources
Handbooks and coding resources for NLP
- spaCy 101: tutorial for spaCy, a Python library for NLP
- Natural Language Processing with Python: general Python book on NLP (with the nltk library)
- Companion Python notebooks to Ash and Hansen (2023)
In R, but worth reading even if you only use Python for your text analyses. The code is easy to transpose, and the structure of the books is well suited to understanding the possibilities and methods in NLP:
Text analysis in economics
- Ash and Hansen (2023) gives a great overview of the use of text analysis in economics
- Gentzkow, Kelly, and Taddy (2019) is one of the key papers (and authors) in text analysis in economics
- Dugoua, Dumas, and Noailly (2022) discusses the use of text analysis in environmental economics
- Elliott Ash’s NLP class