Overview
rallicagram is an R wrapper for the Gallicagram API.
Gallicagram is a super nice tool and set of databases that makes it easy to run simple Natural Language Processing (NLP) analyses on a wide range of corpora. In particular, it lets you build historical time series of keyword occurrences in various media in one line of code.
Installation
You can install the development version of rallicagram from GitHub with:
# install.packages("devtools")
devtools::install_github("vincentbagilet/rallicagram")
Usage
The main function, gallicagram, builds a data frame with the yearly, monthly or daily proportion of mentions of a term.
library(rallicagram)
gallicagram(keyword = "président", corpus = "lemonde", from = 1960)
#> # A tibble: 756 × 10
#> date keyword n_occur n_total prop_occur year month corpus resolution
#> <date> <chr> <int> <int> <dbl> <int> <int> <chr> <chr>
#> 1 1960-01-01 président 1338 872943 0.00153 1960 1 lemon… monthly
#> 2 1960-02-01 président 1360 915672 0.00149 1960 2 lemon… monthly
#> 3 1960-03-01 président 1461 928764 0.00157 1960 3 lemon… monthly
#> 4 1960-04-01 président 1239 772707 0.00160 1960 4 lemon… monthly
#> 5 1960-05-01 président 1355 835612 0.00162 1960 5 lemon… monthly
#> 6 1960-06-01 président 1314 850245 0.00155 1960 6 lemon… monthly
#> 7 1960-07-01 président 1189 942062 0.00126 1960 7 lemon… monthly
#> 8 1960-08-01 président 979 739018 0.00132 1960 8 lemon… monthly
#> 9 1960-09-01 président 1506 904804 0.00166 1960 9 lemon… monthly
#> 10 1960-10-01 président 1107 826661 0.00134 1960 10 lemon… monthly
#> # ℹ 746 more rows
#> # ℹ 1 more variable: n_of <chr>
The resulting data frame makes it easy to plot how the use of a term evolves over time, in two lines of code.
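As an illustration, here is a minimal sketch of such a plot. It assumes the ggplot2 package is installed (it is not mentioned in this README) and relies only on the date and prop_occur columns shown in the output above. The axis label and title are illustrative choices, not part of the package.

```r
library(rallicagram)
library(ggplot2)

# Same call as in the usage example: mentions of "président" in Le Monde since 1960.
# This queries the Gallicagram API, so it needs an internet connection.
president <- gallicagram(keyword = "président", corpus = "lemonde", from = 1960)

# date and prop_occur are the column names shown in the printed tibble.
p <- ggplot(president, aes(x = date, y = prop_occur)) +
  geom_line() +
  labs(
    x = NULL,
    y = "Proportion of occurrences",
    title = "Mentions of \"président\" in Le Monde"
  )

print(p)
```

The query and the ggplot() call are the two steps referred to above; everything inside labs() is optional decoration.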
The package also provides functions to describe co-occurrences, or words associated with a keyword. These functions are described in the vignette.
Corpora
The corpora available via Gallicagram are:
Corpus | Corpus Name | Reliable From | Reliable To | Nb Words | Max Length | Resolution |
---|---|---|---|---|---|---|
lemonde | Le Monde | 1944 | 2023 | 1.50e+09 | 4 | daily |
presse | Presse de Gallica | 1789 | 1950 | 5.70e+10 | 3 | monthly |
livres | Livres de Gallica | 1600 | 1940 | 1.60e+10 | 5 | yearly |
ddb | Deutsches Zeitungsportal (DDB) | 1780 | 1950 | 3.90e+10 | 2 | monthly |
american_stories | American Stories | 1798 | 1963 | 2.00e+10 | 3 | yearly |
paris | Journal de Paris | 1777 | 1827 | 8.60e+07 | 2 | daily |
moniteur | Moniteur Universel | 1789 | 1869 | 5.11e+08 | 2 | daily |
journal_des_debats | Journal des Débats | 1789 | 1944 | 1.20e+09 | 1 | daily |
la_presse | La Presse | 1836 | 1869 | 2.53e+08 | 2 | daily |
constitutionnel | Le Constitutionnel | 1821 | 1913 | 6.40e+07 | 2 | daily |
figaro | Le Figaro | 1854 | 1952 | 8.70e+08 | 2 | daily |
temps | Le Temps | 1861 | 1942 | 1.00e+09 | 2 | daily |
petit_journal | Le Petit Journal | 1863 | 1942 | 7.45e+08 | 2 | daily |
petit_parisien | Le Petit Parisien | 1876 | 1944 | 6.31e+08 | 2 | daily |
huma | L’Humanité | 1904 | 1952 | 3.18e+08 | 2 | daily |
subtitles | Opensubtitles (français) | 1935 | 2020 | 1.70e+07 | 3 | yearly |
subtitles_en | Opensubtitles (anglais) | 1930 | 2020 | 1.02e+08 | 3 | yearly |
rap | Rap (Genius) | 1989 | 2024 | 2.00e+07 | 5 | yearly |
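To show how the table can be used, here is a hedged sketch that runs the same query on two corpora. It assumes that the codes in the first column are the values accepted by the corpus argument; the usage example above only confirms this for "lemonde". The start year 1945 is chosen because it falls within the reliable range of both corpora listed in the table.

```r
library(rallicagram)

# Corpus codes copied from the first column of the table above.
# Assumption: these codes are valid values for the corpus argument.
corpora <- c("lemonde", "figaro")

# One API query per corpus; each call needs an internet connection.
results <- lapply(corpora, function(corpus_code) {
  gallicagram(keyword = "président", corpus = corpus_code, from = 1945)
})
names(results) <- corpora

# Number of rows returned for each corpus.
sapply(results, nrow)
```

Keeping the results in a named list leaves each data frame untouched, which avoids assuming that the two corpora return identical columns.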
Additional information on Gallicagram can be found in a preprint by Gallicagram developers Benoît de Courson and Benjamin Azoulay and on the “Notice” tab of the Gallicagram website.