Topics in Econometrics
2025-09-30
Questions on replication games?
On your research proposal?
Data viz is everywhere
We work with data, we routinely (need to) visualize it
Seems pretty simple, we all know how to make graphs
Sure BUT there are a few things we need to think about when visualizing data
Once you pay attention to data viz, it is fun, instructive and satisfying!
1. Why is data viz important?
2. Key data viz principles
3. Building a graph
4. Data viz for research in economics
Graphs to explore
Graphs to explain
They have different goals and audiences
Data often contain patterns; data viz can be tremendously helpful to identify them
But you first need to look at your raw data
That’s the role of exploratory data analysis (EDA)
It may also help you
Country | ISO | 2020 | 2020 Ranking | 2025 | 2025 Ranking | Region | Pattern |
---|---|---|---|---|---|---|---|
Afghanistan | AFG | 62.30 | 122 | 17.88 | 175 | Asia-Pacific | Lower |
Albania | ALB | 69.75 | 84 | 58.18 | 80 | EU & Balkans | Lower |
Algeria | DZA | 54.48 | 146 | 44.64 | 126 | MENA | Lower |
Andorra | AND | 76.77 | 37 | 63.30 | 65 | EU & Balkans | Lower |
Angola | AGO | 66.08 | 106 | 52.67 | 100 | Africa | Lower |
Argentina | ARG | 71.22 | 64 | 56.14 | 87 | Americas | Lower |
Armenia | ARM | 71.40 | 61 | 73.96 | 34 | EECA | Higher |
Australia | AUS | 79.79 | 26 | 75.15 | 29 | Asia-Pacific | Lower |
Austria | AUT | 84.22 | 18 | 78.12 | 22 | EU & Balkans | Lower |
Azerbaijan | AZE | 41.52 | 168 | 25.47 | 167 | EECA | Lower |
Bahrain | BHR | 39.87 | 169 | 30.24 | 157 | MENA | Lower |
Bangladesh | BGD | 50.63 | 151 | 33.71 | 149 | Asia-Pacific | Lower |
Belarus | BLR | 50.25 | 153 | 25.73 | 166 | EECA | Lower |
Belgium | BEL | 87.43 | 12 | 80.12 | 18 | EU & Balkans | Lower |
Belize | BLZ | 72.50 | 53 | 68.32 | 47 | Americas | Lower |
Benin | BEN | 64.89 | 113 | 54.60 | 92 | Africa | Lower |
Bhutan | BTN | 71.10 | 67 | 32.62 | 152 | Asia-Pacific | Lower |
Bolivia | BOL | 64.63 | 114 | 54.09 | 93 | Americas | Lower |
Bosnia and Herzegovina | BIH | 71.49 | 58 | 56.33 | 86 | EU & Balkans | Lower |
Botswana | BWA | 76.44 | 39 | 57.64 | 81 | Africa | Lower |
Brazil | BRA | 65.95 | 107 | 63.80 | 63 | Americas | Lower |
Brunei | BRN | 50.35 | 152 | 53.47 | 97 | Asia-Pacific | Higher |
Bulgaria | BGR | 64.94 | 111 | 60.78 | 70 | EU & Balkans | Lower |
Burkina Faso | BFA | 76.53 | 38 | 52.25 | 105 | Africa | Lower |
Burundi | BDI | 44.67 | 160 | 45.44 | 125 | Africa | Higher |
Cambodia | KHM | 54.54 | 144 | 28.18 | 161 | Asia-Pacific | Lower |
Cameroon | CMR | 56.72 | 134 | 42.75 | 131 | Africa | Lower |
Canada | CAN | 84.71 | 16 | 78.75 | 21 | Americas | Lower |
Cape Verde | CPV | 79.85 | 25 | 74.98 | 30 | Africa | Lower |
Central African Republic | CAF | 57.13 | 132 | 60.15 | 72 | Africa | Higher |
Chad | TCD | 60.30 | 123 | 51.89 | 108 | Africa | Lower |
Chile | CHL | 72.69 | 51 | 62.25 | 69 | Americas | Lower |
China | CHN | 21.52 | 177 | 14.80 | 178 | Asia-Pacific | Lower |
Colombia | COL | 57.34 | 130 | 49.80 | 115 | Americas | Lower |
Comoros | COM | 70.23 | 75 | 59.27 | 75 | Africa | Lower |
Congo | COG | 63.44 | 118 | 60.58 | 71 | Africa | Lower |
Costa Rica | CRI | 89.47 | 7 | 73.09 | 36 | Americas | Lower |
Croatia | HRV | 71.49 | 59 | 64.20 | 60 | EU & Balkans | Lower |
Cuba | CUB | 36.19 | 171 | 26.03 | 165 | Americas | Lower |
Cyprus | CYP | 79.55 | 27 | 59.04 | 77 | EU & Balkans | Lower |
Cyprus North | CTU | 70.21 | 77 | 54.84 | 91 | EU & Balkans | Lower |
Czechia | CZE | 76.43 | 40 | 83.96 | 10 | EU & Balkans | Higher |
DR Congo | COD | 50.91 | 150 | 42.31 | 133 | Africa | Lower |
Denmark | DNK | 91.87 | 3 | 86.93 | 6 | EU & Balkans | Lower |
Djibouti | DJI | 23.27 | 176 | 25.36 | 168 | Africa | Higher |
Dominican Republic | DOM | 72.10 | 55 | 69.87 | 43 | Americas | Lower |
East Timor | TLS | 70.10 | 78 | 71.79 | 39 | Asia-Pacific | Higher |
Ecuador | ECU | 67.38 | 98 | 53.76 | 94 | Americas | Lower |
Egypt | EGY | 43.18 | 166 | 24.74 | 170 | MENA | Lower |
El Salvador | SLV | 70.30 | 74 | 41.19 | 135 | Americas | Lower |
Equatorial Guinea | GNQ | 43.62 | 165 | 48.68 | 118 | Africa | Higher |
Eritrea | ERI | 16.50 | 178 | 11.32 | 180 | Africa | Lower |
Estonia | EST | 87.39 | 14 | 89.46 | 2 | EU & Balkans | Higher |
Eswatini | SWZ | 54.85 | 141 | 52.86 | 98 | Africa | Lower |
Ethiopia | ETH | 67.18 | 99 | 36.92 | 145 | Africa | Lower |
Fiji | FJI | 72.59 | 52 | 71.20 | 40 | Asia-Pacific | Lower |
Finland | FIN | 92.07 | 2 | 87.18 | 5 | EU & Balkans | Lower |
France | FRA | 77.08 | 34 | 76.62 | 25 | EU & Balkans | Lower |
Gabon | GAB | 62.80 | 121 | 70.65 | 41 | Africa | Higher |
Gambia | GMB | 69.38 | 87 | 65.49 | 58 | Africa | Lower |
Georgia | GEO | 71.41 | 60 | 50.53 | 114 | EECA | Lower |
Germany | DEU | 87.84 | 11 | 83.85 | 11 | EU & Balkans | Lower |
Ghana | GHA | 77.74 | 30 | 67.13 | 52 | Africa | Lower |
Greece | GRC | 71.20 | 65 | 55.37 | 89 | EU & Balkans | Lower |
Guatemala | GTM | 64.26 | 116 | 40.32 | 138 | Americas | Lower |
Guinea | GIN | 65.66 | 110 | 52.53 | 103 | Africa | Lower |
Guinea-Bissau | GNB | 67.94 | 94 | 51.36 | 110 | Africa | Lower |
Guyana | GUY | 73.37 | 49 | 60.12 | 73 | Americas | Lower |
Haiti | HTI | 69.80 | 83 | 51.06 | 111 | Americas | Lower |
Honduras | HND | 51.80 | 148 | 38.51 | 142 | Americas | Lower |
Hong Kong | HKG | 69.99 | 80 | 39.86 | 140 | Asia-Pacific | Lower |
Hungary | HUN | 69.16 | 89 | 62.82 | 68 | EU & Balkans | Lower |
Iceland | ISL | 84.88 | 15 | 81.36 | 17 | EU & Balkans | Lower |
India | IND | 54.67 | 142 | 32.96 | 151 | Asia-Pacific | Lower |
Indonesia | IDN | 63.18 | 119 | 44.13 | 127 | Asia-Pacific | Lower |
Iran | IRN | 35.19 | 173 | 16.22 | 176 | MENA | Lower |
Iraq | IRQ | 44.63 | 162 | 30.69 | 155 | MENA | Lower |
Ireland | IRL | 87.40 | 13 | 86.92 | 7 | EU & Balkans | Lower |
Israel | ISR | 69.16 | 88 | 51.06 | 112 | MENA | Lower |
Italy | ITA | 76.31 | 41 | 68.01 | 49 | EU & Balkans | Lower |
Ivory Coast | CIV | 71.06 | 68 | 63.69 | 64 | Africa | Lower |
Jamaica | JAM | 89.49 | 6 | 75.83 | 26 | Americas | Lower |
Japan | JPN | 71.14 | 66 | 63.14 | 66 | Asia-Pacific | Lower |
Jordan | JOR | 57.92 | 128 | 35.25 | 147 | MENA | Lower |
Kazakhstan | KAZ | 45.89 | 157 | 39.34 | 141 | EECA | Lower |
Kenya | KEN | 66.28 | 103 | 49.41 | 117 | Africa | Lower |
Kosovo | XKX | 70.67 | 70 | 52.73 | 99 | EU & Balkans | Lower |
Kuwait | KWT | 65.70 | 109 | 44.06 | 128 | MENA | Lower |
Kyrgyzstan | KGZ | 69.81 | 82 | 37.46 | 144 | EECA | Lower |
Laos | LAO | 35.72 | 172 | 33.22 | 150 | Asia-Pacific | Lower |
Latvia | LVA | 81.44 | 22 | 81.82 | 15 | EU & Balkans | Higher |
Lebanon | LBN | 66.81 | 102 | 42.62 | 132 | MENA | Lower |
Lesotho | LSO | 69.55 | 86 | 52.07 | 107 | Africa | Lower |
Liberia | LBR | 67.75 | 95 | 66.61 | 54 | Africa | Lower |
Libya | LBY | 44.23 | 164 | 40.42 | 137 | MENA | Lower |
Liechtenstein | LIE | 80.48 | 24 | 83.42 | 12 | EU & Balkans | Higher |
Lithuania | LTU | 78.81 | 28 | 82.27 | 14 | EU & Balkans | Higher |
Luxembourg | LUX | 84.54 | 17 | 83.04 | 13 | EU & Balkans | Lower |
Madagascar | MDG | 72.32 | 54 | 50.80 | 113 | Africa | Lower |
Malawi | MWI | 70.68 | 69 | 59.20 | 76 | Africa | Lower |
Malaysia | MYS | 66.88 | 101 | 56.09 | 88 | Asia-Pacific | Lower |
Maldives | MDV | 70.07 | 79 | 52.46 | 104 | Asia-Pacific | Lower |
Mali | MLI | 65.88 | 108 | 48.23 | 119 | Africa | Lower |
Malta | MLT | 69.84 | 81 | 62.96 | 67 | EU & Balkans | Lower |
Mauritania | MRT | 67.46 | 97 | 67.52 | 50 | Africa | Higher |
Mauritius | MUS | 72.00 | 56 | 67.31 | 51 | Africa | Lower |
Mexico | MEX | 54.55 | 143 | 45.55 | 124 | Americas | Lower |
Moldova | MDA | 68.84 | 91 | 73.36 | 35 | EECA | Higher |
Mongolia | MNG | 70.39 | 73 | 52.57 | 102 | Asia-Pacific | Lower |
Montenegro | MNE | 66.17 | 105 | 72.83 | 37 | EU & Balkans | Higher |
Morocco | MAR | 57.12 | 133 | 48.04 | 120 | MENA | Lower |
Mozambique | MOZ | 66.21 | 104 | 52.63 | 101 | Africa | Lower |
Myanmar | MMR | 55.23 | 139 | 25.32 | 169 | Asia-Pacific | Lower |
Namibia | NAM | 80.75 | 23 | 75.35 | 28 | Africa | Lower |
Nepal | NPL | 64.90 | 112 | 55.20 | 90 | Asia-Pacific | Lower |
Netherlands | NLD | 90.04 | 5 | 88.64 | 3 | EU & Balkans | Lower |
New Zealand | NZL | 89.31 | 9 | 81.37 | 16 | Asia-Pacific | Lower |
Nicaragua | NIC | 64.19 | 117 | 22.83 | 172 | Americas | Lower |
Niger | NER | 71.75 | 57 | 57.05 | 83 | Africa | Lower |
Nigeria | NGA | 64.37 | 115 | 46.81 | 122 | Africa | Lower |
North Korea | PRK | 14.18 | 180 | 12.64 | 179 | Asia-Pacific | Lower |
North Macedonia | MKD | 68.72 | 92 | 70.44 | 42 | EU & Balkans | Higher |
Norway | NOR | 92.16 | 1 | 92.31 | 1 | EU & Balkans | Higher |
OECS | CSS | 76.22 | 44 | 68.08 | 48 | Americas | Lower |
Oman | OMN | 56.58 | 135 | 42.29 | 134 | MENA | Lower |
Pakistan | PAK | 54.48 | 145 | 29.62 | 158 | Asia-Pacific | Lower |
Palestine | PSE | 55.91 | 137 | 27.41 | 163 | MENA | Lower |
Panama | PAN | 70.22 | 76 | 66.75 | 53 | Americas | Lower |
Papua New Guinea | PNG | 76.07 | 46 | 58.35 | 78 | Asia-Pacific | Lower |
Paraguay | PRY | 67.03 | 100 | 56.84 | 84 | Americas | Lower |
Peru | PER | 69.06 | 90 | 42.88 | 130 | Americas | Lower |
Philippines | PHL | 56.46 | 136 | 49.57 | 116 | Asia-Pacific | Lower |
Poland | POL | 71.35 | 62 | 74.79 | 31 | EU & Balkans | Higher |
Portugal | PRT | 88.17 | 10 | 84.26 | 8 | EU & Balkans | Lower |
Qatar | QAT | 57.49 | 129 | 58.25 | 79 | MENA | Higher |
Romania | ROU | 74.09 | 48 | 66.42 | 55 | EU & Balkans | Lower |
Russia | RUS | 51.08 | 149 | 24.57 | 171 | EECA | Lower |
Rwanda | RWA | 49.66 | 155 | 35.84 | 146 | Africa | Lower |
Samoa | WSM | 81.75 | 21 | 69.28 | 44 | Asia-Pacific | Lower |
Saudi Arabia | SAU | 37.86 | 170 | 27.94 | 162 | MENA | Lower |
Senegal | SEN | 76.01 | 47 | 59.43 | 74 | Africa | Lower |
Serbia | SRB | 68.38 | 93 | 53.55 | 96 | EU & Balkans | Lower |
Seychelles | SYC | 71.34 | 63 | 68.56 | 45 | Africa | Lower |
Sierra Leone | SLE | 69.72 | 85 | 66.36 | 56 | Africa | Lower |
Singapore | SGP | 44.77 | 158 | 45.78 | 123 | Asia-Pacific | Higher |
Slovakia | SVK | 77.33 | 33 | 71.93 | 38 | EU & Balkans | Lower |
Slovenia | SVN | 77.36 | 32 | 74.06 | 33 | EU & Balkans | Lower |
Somalia | SOM | 44.55 | 163 | 40.49 | 136 | Africa | Lower |
South Africa | ZAF | 77.59 | 31 | 75.71 | 27 | Africa | Lower |
South Korea | KOR | 76.30 | 42 | 64.06 | 61 | Asia-Pacific | Lower |
South Sudan | SSD | 55.51 | 138 | 51.63 | 109 | Africa | Lower |
Spain | ESP | 77.84 | 29 | 77.35 | 23 | EU & Balkans | Lower |
Sri Lanka | LKA | 58.06 | 127 | 39.93 | 139 | Asia-Pacific | Lower |
Sudan | SDN | 44.67 | 159 | 30.34 | 156 | Africa | Lower |
Suriname | SUR | 82.50 | 20 | 74.49 | 32 | Americas | Lower |
Sweden | SWE | 90.75 | 4 | 88.13 | 4 | EU & Balkans | Lower |
Switzerland | CHE | 89.38 | 8 | 83.98 | 9 | EU & Balkans | Lower |
Syria | SYR | 27.43 | 174 | 15.82 | 177 | MENA | Lower |
Tajikistan | TJK | 44.66 | 161 | 32.21 | 153 | EECA | Lower |
Tanzania | TZA | 59.75 | 124 | 53.68 | 95 | Africa | Lower |
Thailand | THA | 55.06 | 140 | 56.72 | 85 | Asia-Pacific | Higher |
Togo | TGO | 70.67 | 71 | 48.03 | 121 | Africa | Lower |
Tonga | TON | 72.73 | 50 | 68.39 | 46 | Asia-Pacific | Lower |
Trinidad and Tobago | TTO | 76.78 | 36 | 79.71 | 19 | Americas | Higher |
Tunisia | TUN | 70.55 | 72 | 43.48 | 129 | MENA | Lower |
Turkey | TUR | 49.98 | 154 | 29.40 | 159 | EECA | Lower |
Turkmenistan | TKM | 14.56 | 179 | 19.14 | 174 | EECA | Higher |
Uganda | UGA | 59.05 | 125 | 37.61 | 143 | Africa | Lower |
Ukraine | UKR | 67.48 | 96 | 63.93 | 62 | EECA | Lower |
United Arab Emirates | ARE | 57.31 | 131 | 26.91 | 164 | MENA | Lower |
United Kingdom | GBR | 77.07 | 35 | 78.89 | 20 | EU & Balkans | Higher |
United States | USA | 76.15 | 45 | 65.49 | 57 | Americas | Lower |
Uruguay | URY | 84.21 | 19 | 65.18 | 59 | Americas | Lower |
Uzbekistan | UZB | 46.93 | 156 | 35.24 | 148 | EECA | Lower |
Venezuela | VEN | 54.34 | 147 | 29.21 | 160 | Americas | Lower |
Vietnam | VNM | 25.29 | 175 | 19.74 | 173 | Asia-Pacific | Lower |
Yemen | YEM | 41.75 | 167 | 31.45 | 154 | MENA | Lower |
Zambia | ZMB | 63.00 | 120 | 57.33 | 82 | Africa | Lower |
Zimbabwe | ZWE | 59.05 | 126 | 52.10 | 106 | Africa | Lower |
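Patterns in a table like the one above are hard to spot from the raw numbers alone. As a minimal sketch, the R code below copies six rows from the table and plots the 2025 score against the 2020 score. The object and column names (`scores`, `score_2020`, `score_2025`, `change`) are illustrative choices, not names from the original material.

```r
library(ggplot2)

# Six rows copied from the table above (scores in 2020 and 2025)
scores <- data.frame(
  country = c("Norway", "Estonia", "Czechia",
              "United States", "Bhutan", "Nicaragua"),
  score_2020 = c(92.16, 87.39, 76.43, 76.15, 71.10, 64.19),
  score_2025 = c(92.31, 89.46, 83.96, 65.49, 32.62, 22.83)
)

# Change between the two years (negative = lower score in 2025)
scores$change <- scores$score_2025 - scores$score_2020

# Points below the dashed 45-degree line have a lower score in 2025
score_plot <- ggplot(scores, aes(x = score_2020, y = score_2025)) +
  geom_abline(slope = 1, intercept = 0,
              color = "gray60", linetype = "dashed") +
  geom_point() +
  geom_text(aes(label = country), vjust = -0.8, size = 3) +
  coord_equal(xlim = c(0, 100), ylim = c(0, 100)) +
  labs(
    title = "Scores in 2025 against scores in 2020",
    x = "Score in 2020",
    y = "Score in 2025"
  )

score_plot
```

With the full table, the same plot would make the overall drop visible at a glance: most points would sit below the dashed line.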
We can easily see patterns presented in certain ways, but if they are presented in other ways, they become invisible [..]
Following perception-based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading.
Ware, C. (2012). Information Visualization, Third Edition: Perception for Design
We have briefly discussed that before
There is a breadth of ways in which charts can be misleading
Charts can be wrong. They can also be correct BUT misleading
See Defense Against Dishonest Charts on Flowing Data
There is actually a lot of theory behind data viz:
Worth learning about it and being aware of key principles
Leverage it to make better data viz
Ebbinghaus illusion
Law of simultaneous contrast
\(\text{data-ink ratio} = \dfrac{\text{data ink}}{\text{total ink on graph}}\)
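One possible reading of the data-ink ratio in practice is to remove ink that does not encode data. The sketch below uses the built-in `mtcars` data set (an arbitrary choice for illustration) and compares the default ggplot2 theme with a leaner version. The ratio itself is not computed here; the point is only to show which elements count as non-data ink.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Fuel efficiency and weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  )

# Default theme: gray panel background, major and minor grid lines
default_version <- base_plot

# Leaner version: no panel background, no minor grid lines
lean_version <- base_plot +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())

default_version
lean_version
```

Whether the leaner version is actually better depends on the audience: light grid lines can still help readers look up values.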
Why make nice looking visualizations?
To trigger interest, to intrigue, to catch the eye
That affects how people perceive information
Nice looking visuals may be more memorable
In that sense, not everything is chartjunk
Pretty graphs are useful in data journalism and so on, but what about academia?
They are not only more pleasing to look at, they also make readers want to engage more with them
Maybe better to keep the design rather minimalist
Pretty does not always mean non-simple. Simple graphs have value.
Opinion on credibility partly based on aesthetics
“This paper did not receive the care it deserved” comment given to a now senior researcher when they submitted a paper with a sketchy graph
Original graphs may trigger interest
Familiar graphs may convey the point more easily
My take:
Use the best type of graph, regardless of its originality/familiarity
If it is different from what people are used to, make it easy to read
Know what is possible, so that you can find what you need
Know graph names (to search the internet regarding how to code them)
Refer to existing graph (type) galleries:
What do you want to show in your graph?
What is the main message you want to convey?
With the same data, you can:
Tell a lot of different stories
Emphasize different points
Making a graph = choosing a lighting for your data
In your graph, you may want to show a:
country | year | lifeExp |
---|---|---|
France | 1952 | 67.410 |
France | 1957 | 68.930 |
France | 1962 | 70.510 |
France | 1967 | 71.550 |
France | 1972 | 72.380 |
France | 1977 | 73.830 |
France | 1982 | 74.890 |
France | 1987 | 76.340 |
France | 1992 | 77.460 |
France | 1997 | 78.640 |
France | 2002 | 79.590 |
France | 2007 | 80.657 |
Japan | 1952 | 63.030 |
Japan | 1957 | 65.500 |
Japan | 1962 | 68.730 |
Japan | 1967 | 71.430 |
Japan | 1972 | 73.420 |
Japan | 1977 | 75.380 |
Japan | 1982 | 77.110 |
Japan | 1987 | 78.670 |
Japan | 1992 | 79.360 |
Japan | 1997 | 80.690 |
Japan | 2002 | 82.000 |
Japan | 2007 | 82.603 |
Niger | 1952 | 37.444 |
Niger | 1957 | 38.598 |
Niger | 1962 | 39.487 |
Niger | 1967 | 40.118 |
Niger | 1972 | 40.546 |
Niger | 1977 | 41.291 |
Niger | 1982 | 42.598 |
Niger | 1987 | 44.555 |
Niger | 1992 | 47.391 |
Niger | 1997 | 51.313 |
Niger | 2002 | 54.496 |
Niger | 2007 | 56.867 |
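To show an evolution with the data in the table above, a line chart with direct labels is one natural option. The sketch below rebuilds the table as a data frame and labels each line at its last point instead of using a legend. The object names (`life_exp`, `end_labels`, `life_plot`) are illustrative.

```r
library(ggplot2)

years <- seq(1952, 2007, by = 5)

# Values copied from the table above
life_exp <- data.frame(
  country = rep(c("France", "Japan", "Niger"), each = length(years)),
  year = rep(years, times = 3),
  lifeExp = c(
    67.410, 68.930, 70.510, 71.550, 72.380, 73.830,
    74.890, 76.340, 77.460, 78.640, 79.590, 80.657,
    63.030, 65.500, 68.730, 71.430, 73.420, 75.380,
    77.110, 78.670, 79.360, 80.690, 82.000, 82.603,
    37.444, 38.598, 39.487, 40.118, 40.546, 41.291,
    42.598, 44.555, 47.391, 51.313, 54.496, 56.867
  )
)

# Last observation of each country, used to place the labels
end_labels <- life_exp[life_exp$year == max(life_exp$year), ]

life_plot <- ggplot(life_exp, aes(x = year, y = lifeExp, color = country)) +
  geom_line() +
  geom_text(data = end_labels, aes(label = country),
            hjust = 0, nudge_x = 0.5) +
  # Extra room on the right so the labels are not clipped
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.15))) +
  theme(legend.position = "none") +
  labs(
    title = "Life expectancy, 1952 to 2007",
    x = NULL,
    y = "Life expectancy (years)"
  )

life_plot
```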
Explain orally what your graph represents!
What is on the x-axis?
What is on the y-axis?
What is the message you want to convey?
If you need more than 7 colors or so:
If you start paying attention to data viz when you see them, you will see what works, what does not
It is a process that takes place in the “background”: you will learn quickly and not realize it
You will see nice looking graphs (and hopefully enjoy it)
You will build better and more impactful graphs
Take-away messages
Build legible, understandable and nice looking graphs.
Have a title and explicit axes; present them.
Limit the number of colors you use. Use gray.
Label your graphs directly, add annotations.
Think twice before cutting the y-axis
Overall, facilitate the retrieval of information.
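Several of these take-aways can be combined in one graph. The sketch below uses the `txhousing` data set shipped with ggplot2 (chosen only for illustration): all cities are drawn in gray for context, one city is highlighted with a single color, and it is labeled directly on the graph. The highlighted city and the color are arbitrary choices.

```r
library(ggplot2)

highlight_city <- "Austin"  # arbitrary choice for the example

# Drop months without a recorded median price
housing <- subset(txhousing, !is.na(median))
housing_focus <- subset(housing, city == highlight_city)

# Last observation of the highlighted city, used to place its label
label_point <- housing_focus[which.max(housing_focus$date), ]

highlight_plot <- ggplot(housing, aes(x = date, y = median, group = city)) +
  # Context: every city in light gray
  geom_line(color = "gray80", linewidth = 0.3) +
  # Message: one city in a single accent color
  geom_line(data = housing_focus, color = "firebrick", linewidth = 1) +
  # Direct label instead of a legend
  geom_text(data = label_point, aes(label = city),
            color = "firebrick", hjust = 0, nudge_x = 0.1) +
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.12))) +
  labs(
    title = "Median house sale price in Texas cities",
    x = NULL,
    y = "Median sale price (USD)"
  )

highlight_plot
```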
We often have to deal with a massive number of observations
We often want to display a specific type of graph: estimation output
Our analyses are often based on identifying assumptions that we can sometimes check through graphs
We often use very complex models. Visualization can help understand what we are actually estimating
We make graphs for different audiences
They thus need to be more or less polished
For you (and your future self): can be quite rough around the edges, but you will want to be able to understand it in the future
For presentations: you have some leeway to explain your graphs orally
For the paper: there are only a couple of graphs in a paper; make them perfect
As rhetorical visualization tools for models
To explore our data
To check the validity of our models
As diagnostics
To communicate results
state | year | spirits | unemp | income | emppop | beertax | baptist | mormon | drinkage | dry | youngdrivers | miles | breath | jail | service | fatal | nfatal | sfatal | fatal1517 | nfatal1517 | fatal1820 | nfatal1820 | fatal2124 | nfatal2124 | afatal | pop | pop1517 | pop1820 | pop2124 | milestot | unempus | emppopus | gsp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
al | 1982 | 1.37 | 14.4 | 10544.15 | 50.69204 | 1.539379 | 30.3557 | 0.32829 | 19.00 | 25.0063 | 0.211572 | 7233.887 | no | no | no | 839 | 146 | 99 | 53 | 9 | 99 | 34 | 120 | 32 | 309.438 | 3942002 | 208999.6 | 221553.4 | 290000.1 | 28516 | 9.7 | 57.8 | -0.0221248 |
al | 1983 | 1.36 | 13.7 | 10732.80 | 52.14703 | 1.788991 | 30.3336 | 0.34341 | 19.00 | 22.9942 | 0.210768 | 7836.348 | no | no | no | 930 | 154 | 98 | 71 | 8 | 108 | 26 | 124 | 35 | 341.834 | 3960008 | 202000.1 | 219125.5 | 290000.2 | 31032 | 9.6 | 57.9 | 0.0465583 |
al | 1984 | 1.32 | 11.1 | 11108.79 | 54.16809 | 1.714286 | 30.3115 | 0.35924 | 19.00 | 24.0426 | 0.211484 | 8262.990 | no | no | no | 932 | 165 | 94 | 49 | 7 | 103 | 25 | 118 | 34 | 304.872 | 3988992 | 197000.0 | 216724.1 | 288000.2 | 32961 | 7.5 | 59.5 | 0.0627978 |
al | 1985 | 1.28 | 8.9 | 11332.63 | 55.27114 | 1.652542 | 30.2895 | 0.37579 | 19.67 | 23.6339 | 0.211140 | 8726.917 | no | no | no | 882 | 146 | 98 | 66 | 9 | 100 | 23 | 114 | 45 | 276.742 | 4021008 | 194999.7 | 214349.0 | 284000.3 | 35091 | 7.2 | 60.1 | 0.0274900 |
al | 1986 | 1.23 | 9.8 | 11661.51 | 56.51450 | 1.609907 | 30.2674 | 0.39311 | 21.00 | 23.4647 | 0.213400 | 8952.854 | no | no | no | 1081 | 172 | 119 | 82 | 10 | 120 | 23 | 119 | 29 | 360.716 | 4049994 | 203999.9 | 212000.0 | 263000.3 | 36259 | 7.0 | 60.7 | 0.0321429 |
Look at the documentation: ?Fatalities
or help(Fatalities)
Understand the structure of your data
Rows: 336
Columns: 34
$ state <fct> al, al, al, al, al, al, al, az, az, az, az, az, az, az, a…
$ year <fct> 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1982, 1983, 198…
$ spirits <dbl> 1.37, 1.36, 1.32, 1.28, 1.23, 1.18, 1.17, 1.97, 1.90, 2.1…
$ unemp <dbl> 14.4, 13.7, 11.1, 8.9, 9.8, 7.8, 7.2, 9.9, 9.1, 5.0, 6.5,…
$ income <dbl> 10544.15, 10732.80, 11108.79, 11332.63, 11661.51, 11944.0…
$ emppop <dbl> 50.69204, 52.14703, 54.16809, 55.27114, 56.51450, 57.5098…
$ beertax <dbl> 1.53937948, 1.78899074, 1.71428561, 1.65254235, 1.6099070…
$ baptist <dbl> 30.3557, 30.3336, 30.3115, 30.2895, 30.2674, 30.2453, 30.…
$ mormon <dbl> 0.32829, 0.34341, 0.35924, 0.37579, 0.39311, 0.41123, 0.4…
$ drinkage <dbl> 19.00, 19.00, 19.00, 19.67, 21.00, 21.00, 21.00, 19.00, 1…
$ dry <dbl> 25.0063, 22.9942, 24.0426, 23.6339, 23.4647, 23.7924, 23.…
$ youngdrivers <dbl> 0.211572, 0.210768, 0.211484, 0.211140, 0.213400, 0.21552…
$ miles <dbl> 7233.887, 7836.348, 8262.990, 8726.917, 8952.854, 9166.30…
$ breath <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, no, n…
$ jail <fct> no, no, no, no, no, no, no, yes, yes, yes, yes, yes, yes,…
$ service <fct> no, no, no, no, no, no, no, yes, yes, yes, yes, yes, yes,…
$ fatal <int> 839, 930, 932, 882, 1081, 1110, 1023, 724, 675, 869, 893,…
$ nfatal <int> 146, 154, 165, 146, 172, 181, 139, 131, 112, 149, 150, 17…
$ sfatal <int> 99, 98, 94, 98, 119, 114, 89, 76, 60, 81, 75, 85, 87, 67,…
$ fatal1517 <int> 53, 71, 49, 66, 82, 94, 66, 40, 40, 51, 48, 72, 50, 54, 3…
$ nfatal1517 <int> 9, 8, 7, 9, 10, 11, 8, 7, 7, 8, 11, 19, 16, 14, 5, 2, 2, …
$ fatal1820 <int> 99, 108, 103, 100, 120, 127, 105, 81, 83, 118, 100, 104, …
$ nfatal1820 <int> 34, 26, 25, 23, 23, 31, 24, 16, 19, 34, 26, 30, 25, 14, 2…
$ fatal2124 <int> 120, 124, 118, 114, 119, 138, 123, 96, 80, 123, 121, 130,…
$ nfatal2124 <int> 32, 35, 34, 45, 29, 30, 25, 36, 17, 33, 30, 25, 34, 31, 1…
$ afatal <dbl> 309.438, 341.834, 304.872, 276.742, 360.716, 368.421, 298…
$ pop <dbl> 3942002, 3960008, 3988992, 4021008, 4049994, 4082999, 410…
$ pop1517 <dbl> 208999.6, 202000.1, 197000.0, 194999.7, 203999.9, 204999.…
$ pop1820 <dbl> 221553.4, 219125.5, 216724.1, 214349.0, 212000.0, 208998.…
$ pop2124 <dbl> 290000.1, 290000.2, 288000.2, 284000.3, 263000.3, 258999.…
$ milestot <dbl> 28516, 31032, 32961, 35091, 36259, 37426, 39684, 19729, 1…
$ unempus <dbl> 9.7, 9.6, 7.5, 7.2, 7.0, 6.2, 5.5, 9.7, 9.6, 7.5, 7.2, 7.…
$ emppopus <dbl> 57.8, 57.9, 59.5, 60.1, 60.7, 61.5, 62.3, 57.8, 57.9, 59.…
$ gsp <dbl> -0.022124760, 0.046558253, 0.062797837, 0.027489973, 0.03…
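The structure printed above looks like the output of `dplyr::glimpse()`. A minimal sketch to reproduce it, assuming the data are the `Fatalities` data set from the AER package (the source of the data is not stated in the slides, so this is an assumption):

```r
library(dplyr)

# Assumption: the slides use the Fatalities data shipped with the AER package
data("Fatalities", package = "AER")

# One line per column: name, type and first values
glimpse(Fatalities)
```

Note that `year` is stored as a factor and the law variables (`breath`, `jail`, `service`) as yes/no factors, which matters for the graphs and regressions that follow.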
Variable | Unique | Missing Pct. | Mean | SD | Min | Median | Max | Histogram |
---|---|---|---|---|---|---|---|---|
spirits | 157 | 0 | 1.8 | 0.7 | 0.8 | 1.7 | 4.9 | |
unemp | 100 | 0 | 7.3 | 2.5 | 2.4 | 7.0 | 18.0 | |
income | 333 | 0 | 13880.2 | 2253.0 | 9513.8 | 13763.1 | 22193.5 | |
emppop | 334 | 0 | 60.8 | 4.7 | 43.0 | 61.4 | 71.3 | |
beertax | 302 | 0 | 0.5 | 0.5 | 0.0 | 0.4 | 2.7 | |
baptist | 248 | 0 | 7.2 | 9.8 | 0.0 | 1.7 | 30.4 | |
mormon | 165 | 0 | 2.8 | 9.7 | 0.1 | 0.4 | 65.9 | |
drinkage | 12 | 0 | 20.5 | 0.9 | 18.0 | 21.0 | 21.0 | |
dry | 161 | 0 | 4.3 | 9.5 | 0.0 | 0.1 | 45.8 | |
youngdrivers | 335 | 0 | 0.2 | 0.0 | 0.1 | 0.2 | 0.3 | |
miles | 336 | 0 | 7890.8 | 1475.7 | 4576.3 | 7796.2 | 26148.3 | |
fatal | 291 | 0 | 928.7 | 934.1 | 79.0 | 701.0 | 5504.0 | |
nfatal | 206 | 0 | 182.6 | 188.4 | 13.0 | 135.0 | 1049.0 | |
sfatal | 177 | 0 | 109.9 | 108.5 | 8.0 | 81.0 | 603.0 | |
fatal1517 | 125 | 0 | 62.6 | 55.7 | 3.0 | 49.0 | 318.0 | |
nfatal1517 | 48 | 0 | 12.3 | 12.3 | 0.0 | 10.0 | 76.0 | |
fatal1820 | 171 | 0 | 106.7 | 104.2 | 7.0 | 82.0 | 601.0 | |
nfatal1820 | 88 | 0 | 33.5 | 33.2 | 0.0 | 24.0 | 196.0 | |
fatal2124 | 183 | 0 | 126.9 | 131.8 | 12.0 | 97.5 | 770.0 | |
nfatal2124 | 105 | 0 | 41.4 | 42.9 | 1.0 | 30.0 | 249.0 | |
afatal | 335 | 0 | 293.3 | 303.6 | 24.6 | 211.6 | 2094.9 | |
pop | 336 | 0 | 4930271.5 | 5073703.9 | 478999.7 | 3310503.2 | 28314028.0 | |
pop1517 | 316 | 0 | 230815.5 | 229896.3 | 21000.0 | 163000.2 | 1172000.2 | |
pop1820 | 331 | 0 | 249090.4 | 249345.6 | 21000.0 | 170982.3 | 1321004.4 | |
pop2124 | 328 | 0 | 336389.9 | 345304.4 | 30000.2 | 240999.9 | 1892998.1 | |
milestot | 335 | 0 | 37101.5 | 37454.4 | 3993.0 | 28483.5 | 241575.0 | |
unempus | 7 | 0 | 7.5 | 1.5 | 5.5 | 7.2 | 9.7 | |
emppopus | 7 | 0 | 60.0 | 1.6 | 57.8 | 60.1 | 62.3 | |
gsp | 336 | 0 | 0.0 | 0.0 | -0.1 | 0.0 | 0.1 | |

Variable | Level | N | % |
---|---|---|---|
year | 1982 | 48 | 14.3 |
year | 1983 | 48 | 14.3 |
year | 1984 | 48 | 14.3 |
year | 1985 | 48 | 14.3 |
year | 1986 | 48 | 14.3 |
year | 1987 | 48 | 14.3 |
year | 1988 | 48 | 14.3 |
breath | no | 181 | 53.9 |
breath | yes | 155 | 46.1 |
jail | no | 241 | 71.7 |
jail | yes | 94 | 28.0 |
service | no | 273 | 81.2 |
service | yes | 62 | 18.5 |
We will need to wrangle some of the variables (e.g., jail)
We have panel data, with repeated observations
We can make tons of graphs to understand what’s in our data
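The code on the following slides uses a lower-case `fatalities` object with a logical `jail`, a numeric `year` and a `jail_name` label, none of which exist in the raw data. The wrangling step is not shown, so the sketch below is a guess at what it could look like, again assuming the data come from the AER package. Converting `state` to upper case is a further assumption, made so that it can match two-letter postal codes in the map code.

```r
library(dplyr)

# Assumption: raw data from the AER package
data("Fatalities", package = "AER")

fatalities <- Fatalities |>
  as_tibble() |>
  mutate(
    # Factor -> integer year (go through character to keep 1982, not 1)
    year = as.integer(as.character(year)),
    # yes/no factor -> logical; the one missing value stays NA
    jail = jail == "yes",
    # Readable label for graphs and tables
    jail_name = if_else(jail, "Jail penalty", "No jail penalty"),
    # Assumption: upper-case postal codes, to join with map data
    state = toupper(as.character(state))
  )

count(fatalities, jail_name)
```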
fatal_jail <- fatalities |>
ggplot(aes(x = jail_name, y = fatal)) +
# geom_hline(aes(yintercept = mean(fatal, na.rm = TRUE))) +
geom_jitter(width = 0.25) +
labs(
title = "Jail penalty for drunk driving and fatal accidents",
y = "Number of fatal accidents per state",
x = "State law on drunk driving"
)
fatal_jail_log <- fatal_jail +
scale_y_log10() +
labs(y = "Number of fatal accidents\nper state (log scale)")
# Evolution of law
law_evol <- fatalities |>
group_by(year) |>
summarise(prop_jail = mean(jail, na.rm = TRUE)) |>
ggplot(aes(year, prop_jail)) +
geom_line() +
labs(
title = "Adoption of jail sentence for drunk driving",
y = "Proportion of states with a jail law",
x = NULL
)
# Making the map
states_sf <- tigris::states(
cb = TRUE, resolution = "20m", year = 2024, progress_bar = FALSE) |>
tigris::shift_geometry() |>
rename(state = STUSPS)
fatalities_sf <- fatalities |>
filter(jail) |>
group_by(state) |>
mutate(first_year = min(year, na.rm = TRUE)) |>
ungroup() |>
filter(year == first_year) |>
dplyr::full_join(states_sf, by = join_by(state)) |>
sf::st_as_sf()
law_map <- fatalities_sf |>
ggplot() +
geom_sf(aes(fill = first_year), color = "white", linewidth = 0.1) +
scale_mediocre_c(pal = "portal", gradient = "left") +
theme_mediocre_map() +
labs(
title = "First year of adoption of the 'jail' law",
subtitle = "In the data set",
fill = NULL
) +
theme(axis.text.y = element_blank(),
axis.text.x = element_blank())
Formal statistical tests
Graphs only provide visual information, which can hide information or lack clarity
Variable | Mean, FALSE (N=241) | Std. Dev., FALSE (N=241) | Mean, TRUE (N=94) | Std. Dev., TRUE (N=94) | Diff. in Means | Std. Error | |
---|---|---|---|---|---|---|---|
year | 1984.9 | 2.0 | 1985.2 | 1.9 | 0.3 | 0.2 | |
spirits | 1.8 | 0.6 | 1.7 | 0.8 | -0.1 | 0.1 | |
unemp | 7.1 | 2.4 | 7.9 | 2.7 | 0.8 | 0.3 | |
income | 14078.8 | 2229.7 | 13326.7 | 2203.8 | -752.0 | 268.9 | |
emppop | 61.1 | 4.2 | 59.9 | 5.8 | -1.2 | 0.7 | |
beertax | 0.5 | 0.5 | 0.5 | 0.4 | -0.0 | 0.1 | |
baptist | 7.7 | 10.2 | 5.9 | 8.4 | -1.8 | 1.1 | |
mormon | 1.5 | 5.9 | 6.2 | 15.2 | 4.7 | 1.6 | |
drinkage | 20.5 | 0.8 | 20.3 | 1.0 | -0.2 | 0.1 | |
dry | 5.5 | 10.9 | 1.1 | 2.2 | -4.4 | 0.7 | |
youngdrivers | 0.2 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | |
miles | 7822.9 | 1580.6 | 8057.9 | 1162.6 | 235.1 | 157.3 | |
fatal | 1034.4 | 1013.0 | 610.0 | 386.1 | -424.4 | 76.4 | |
nfatal | 205.2 | 205.2 | 116.6 | 85.9 | -88.6 | 15.9 | |
sfatal | 122.9 | 117.6 | 72.3 | 53.9 | -50.6 | 9.4 | |
fatal1517 | 69.8 | 60.6 | 42.1 | 27.9 | -27.7 | 4.8 | |
nfatal1517 | 13.9 | 13.4 | 7.8 | 5.8 | -6.1 | 1.1 | |
fatal1820 | 117.8 | 112.8 | 73.0 | 49.1 | -44.8 | 8.9 | |
nfatal1820 | 37.6 | 36.0 | 22.0 | 17.9 | -15.5 | 3.0 | |
fatal2124 | 141.3 | 143.8 | 83.9 | 56.9 | -57.4 | 11.0 | |
nfatal2124 | 46.8 | 46.7 | 25.9 | 20.7 | -20.9 | 3.7 | |
afatal | 320.8 | 331.0 | 212.9 | 175.9 | -107.9 | 28.0 | |
pop | 5631590.2 | 5449193.3 | 2883446.3 | 2170402.7 | -2748143.9 | 416321.6 | |
pop1517 | 264373.5 | 247820.4 | 135244.7 | 105382.4 | -129128.8 | 19312.6 | |
pop1820 | 285428.2 | 269031.3 | 145512.7 | 112279.9 | -139915.5 | 20843.2 | |
pop2124 | 384846.5 | 373060.8 | 196904.2 | 149978.8 | -187942.3 | 28579.4 | |
milestot | 42027.3 | 39724.9 | 22297.4 | 15710.6 | -19729.9 | 3028.8 | |
unempus | 7.6 | 1.5 | 7.4 | 1.4 | -0.2 | 0.2 | |
emppopus | 59.9 | 1.6 | 60.1 | 1.5 | 0.2 | 0.2 | |
gsp | 0.0 | 0.0 | 0.0 | 0.0 | -0.0 | 0.0 | |

Variable | Level | N, FALSE (N=241) | Pct., FALSE (N=241) | N, TRUE (N=94) | Pct., TRUE (N=94) |
---|---|---|---|---|---|
breath | no | 103 | 42.7 | 77 | 81.9 |
breath | yes | 138 | 57.3 | 17 | 18.1 |
service | no | 227 | 94.2 | 46 | 48.9 |
service | yes | 14 | 5.8 | 48 | 51.1 |
jail_name | Jail penalty | 0 | 0.0 | 94 | 100.0 |
jail_name | No jail penalty | 241 | 100.0 | 0 | 0.0 |
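A difference in means from the table above can also be checked with a formal test. As a minimal sketch, the code below runs a Welch two-sample t-test on the number of fatal accidents by jail law, assuming the data are `Fatalities` from the AER package. This is only one possible test and it ignores the panel structure (repeated observations per state), so it should be read as a descriptive check, not as inference on the effect of the law.

```r
# Assumption: raw data from the AER package
data("Fatalities", package = "AER")

# Welch t-test (unequal variances is the default in t.test);
# the observation with a missing `jail` value is dropped
fatal_test <- t.test(fatal ~ jail, data = Fatalities)

fatal_test

# Difference in group means (jail = yes minus jail = no)
diff(fatal_test$estimate)
```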
The setting calls for a TWFE approach (staggered roll-out)
Identifying assumptions?
Threats to identification?
How to explore these? Are graphs helpful?
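One way graphs can help here is by plotting raw outcome trends for groups of states before estimating anything. The sketch below compares the average log number of fatal accidents over time for states that ever have a jail law in the sample and states that never do. It assumes the data are `Fatalities` from the AER package, and the grouping (`ever_jail`) is an illustrative simplification: with a staggered roll-out it is a rough descriptive look, not a test of parallel trends.

```r
library(dplyr)
library(ggplot2)

# Assumption: raw data from the AER package
data("Fatalities", package = "AER")

trends <- Fatalities |>
  mutate(
    year = as.integer(as.character(year)),
    jail = jail == "yes"
  ) |>
  group_by(state) |>
  # TRUE if the state has a jail law in at least one sample year
  mutate(ever_jail = any(jail, na.rm = TRUE)) |>
  group_by(ever_jail, year) |>
  summarise(mean_log_fatal = mean(log(fatal)), .groups = "drop")

trend_plot <- ggplot(
  trends,
  aes(x = year, y = mean_log_fatal, linetype = ever_jail)
) +
  geom_line() +
  labs(
    title = "Average log fatalities, by jail law status",
    x = NULL,
    y = "Mean of log(fatal accidents)",
    linetype = "Ever has a jail law"
  )

trend_plot
```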
first_year <- fatalities |>
filter(jail) |>
group_by(state) |>
mutate(treat_year = min(year, na.rm = TRUE)) |>
ungroup() |>
filter(year == treat_year) |>
select(treat_year, state)
dat <- fatalities |>
left_join(first_year, by = join_by(state)) |>
mutate(
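# Never-treated states get 0, a value outside the sample years, so that
# sunab() below does not drop them because of a missing cohort value
treat_year = replace_na(treat_year, 0)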
)
event_study_reg <- feols(
log(fatal) ~ sunab(treat_year, year) | state + year,
data = dat
)
event_study_graph <- event_study_reg |>
tidy(conf.int = TRUE) |>
mutate(term = as.integer(str_remove(term, "year::"))) |>
rbind(c(-1, rep(0, 6))) |>
ggplot(aes(x = term, y = estimate)) +
geom_point() +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
geom_vline(xintercept = -1) +
labs(
title = "Event study graph: law and impact on fatalities",
x = "Number of years relative to the introduction of the law",
y = "Estimate"
)
We will discuss that in the last session
Along with inference aspects
Let’s just consider a simple model for now, even though it might present obvious issues
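# Baseline without controls. Assumed specification: `reg` is used below
# but is not defined in the code shown
reg <- feols(
  data = fatalities,
  log(fatal) ~ jail | state + year
)
reg_ctrl <- feols(
  data = fatalities,
  log(fatal) ~ jail + log(unemp) + log(income) | state + year
)
coef_plot <- modelsummary::modelplot(list(`No controls` = reg, `Controls` = reg_ctrl)) +
  geom_vline(xintercept = 0) +
  labs(title = "Coefficient plot", caption = "R package: modelsummary")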
distrib_coef_plot <- reg_ctrl |>
broom::tidy() |>
ggplot(aes(y = term)) +
ggdist::stat_halfeye(
aes(xdist = distributional::dist_student_t(
df = df.residual(reg_ctrl), mu = estimate, sigma = std.error)),
fill = colors_mediocre$complementary,
color = colors_mediocre$base,
# size = 10,
alpha = 0.6
) +
geom_vline(xintercept = 0) +
labs(
title = "Distribution of estimates",
x = "Point estimate",
y = NULL,
caption = "R package: ggdist"
)
# Relative path, so the code does not depend on one machine's folders
ex_duration <- readRDS("data/ex_duration.RDS")
raw <- ex_duration |>
ggplot(aes(x = date, y = duration)) +
geom_point() +
scale_y_log10() +
labs(
title = "Evolution of the duration of items in time",
x = NULL,
y = "Duration (in s)"
)
opacity <- ex_duration |>
ggplot(aes(x = date, y = duration)) +
geom_point(alpha = 0.01) +
scale_y_log10() +
labs(
title = "Evolution of the duration of items in time, by channel",
x = NULL,
y = "Duration (in s)"
) +
facet_wrap(~ channel)
heat_map <- ex_duration |>
ggplot(aes(x = date, y = duration)) +
geom_bin2d(bins = 70) +
scale_y_log10() +
labs(
title = "Evolution of the duration of items in time, by channel",
x = NULL,
y = "Duration (in s)"
) +
facet_wrap(~ channel) +
scale_mediocre_c(pal = "coty", gradient = "left") +
labs(fill = "Number of observations per tile")
binscatter <- ex_duration |>
ggplot(aes(x = date, y = duration)) +
geom_point(alpha = 0.01) +
scale_y_log10() +
stat_summary_bin(
fun = "mean", # `fun.y` is deprecated since ggplot2 3.3.0
bins = 20,
geom = "point",
color = colors_mediocre[["complementary"]],
size = 3
) +
labs(
title = "Scatter and binscatter plot",
x = NULL,
y = "Duration (in s)"
)
Data viz is powerful, harness its power
It can be super insightful or equally deceptive
It can make your point memorable
It can also be truly beautiful
Leverage perception and data viz principles
Take-away messages
There are many rules in data viz
The main goal is to facilitate the transmission of your message
What is the main point you want to convey?
Choose (one of) the right graph types
Explain your graphs, both orally and on the graph itself, to make them easier to read
BUT avoid clutter
Most data viz principles and ideas also apply to economics and academia in general
There are however some specificities
In particular, some graphs and types of analyses are specific to academic research
Data viz can be extremely useful for research and communication