Occurrences of a keyword in a Gallicagram corpus
gallicagram.Rd
Retrieves the proportion of occurrences of a keyword in one of the corpora by year, month or day.
Usage
gallicagram(
  keyword,
  corpus = "lemonde",
  from = "earliest",
  to = "latest",
  resolution = "monthly",
  n_of = "grams",
  subcorpora = NULL
)
Arguments
- keyword
A character string. The keyword to search for. It cannot contain more words than the max_length for this corpus, as indicated in the list_corpora dataset.
- corpus
A character string. The corpus to search. The list of available corpora can be found in the list_corpora dataset.
- from
An integer or "earliest". The starting year. If set to "earliest", it uses the earliest date at which the data is reliable for this corpus, as described in list_corpora.
- to
An integer or "latest". The end year. If set to "latest", it uses the latest date at which the data is reliable for this corpus, as described in list_corpora.
- resolution
A character string. Can only be "daily", "monthly" or "yearly". The finest resolution available for the selected corpus can be found in the resolution column of the list_corpora dataset.
- n_of
A character string. The type of object for which to compute the number of occurrences. If set to "grams", the function computes the number of "grams" that correspond to the keyword for the given period. If set to "articles" (only available for lemonde and for unigrams, i.e. for keywords made of a single word), it computes the number of articles that contain the keyword for the given period.
- subcorpora
A character vector. The subcorpora to consider. Only available for corpus = "persee". The list of available Persee subcorpora can be found in the list_subcorpora dataset.
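The constraints stated in the argument descriptions can be checked locally before a request is sent. The following is a minimal sketch in base R: check_gallicagram_args() is a hypothetical helper, not part of the package, and it encodes only the rules written on this page. It does not check max_length or the finest resolution of a corpus, since both require the list_corpora dataset.

```r
# Hypothetical helper (not part of the package): validates arguments against
# the rules stated in the argument descriptions on this page.
check_gallicagram_args <- function(keyword,
                                   corpus = "lemonde",
                                   resolution = "monthly",
                                   n_of = "grams",
                                   subcorpora = NULL) {
  stopifnot(is.character(keyword), length(keyword) == 1)

  # resolution and n_of each accept a fixed set of values
  if (!resolution %in% c("daily", "monthly", "yearly")) {
    stop("`resolution` must be \"daily\", \"monthly\" or \"yearly\".")
  }
  if (!n_of %in% c("grams", "articles")) {
    stop("`n_of` must be \"grams\" or \"articles\".")
  }

  # Count the whitespace-separated words in the keyword
  n_words <- length(strsplit(trimws(keyword), "\\s+")[[1]])

  # Article counts: lemonde only, and one-word keywords only
  if (n_of == "articles" && (corpus != "lemonde" || n_words != 1)) {
    stop("`n_of = \"articles\"` requires corpus = \"lemonde\" and a one-word keyword.")
  }

  # Subcorpora: persee only
  if (!is.null(subcorpora) && corpus != "persee") {
    stop("`subcorpora` is only available for corpus = \"persee\".")
  }

  invisible(TRUE)
}

# A one-word keyword on lemonde passes silently
check_gallicagram_args("président", n_of = "articles")
```

A call such as check_gallicagram_args("président de la République", n_of = "articles") stops with an error, because article counts are documented as available for unigrams only.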
Value
A tibble with the keyword, the number of occurrences (n_occur) or co-occurrences (n_cooccur), the total number of n-grams or articles over the period (n_total), the proportion of occurrences or co-occurrences of the keyword(s) over the period of a given observation (prop_occur or prop_cooccur), either whether the total is a number of grams or of articles (n_of) or the syntagma at which the co-occurrences are computed (cooccur_level), the date at the beginning of the period of a given observation (date), the corpus, the resolution, the year and, where applicable, the month and day of the observation.
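To make the relation between the count columns and the proportion concrete, the sketch below recomputes the proportion for the first three rows of the example output on this page. It assumes that the proportion is simply n_occur divided by n_total, which is consistent with the printed values but is not stated explicitly here. The column name prop_check is introduced for illustration only.

```r
# First three rows of the example output on this page (lemonde, monthly)
obs <- data.frame(
  date    = as.Date(c("1944-12-01", "1945-01-01", "1945-02-01")),
  n_occur = c(102L, 306L, 248L),
  n_total = c(125782L, 262131L, 256110L)
)

# Assumption: proportion = occurrences / total for each period
obs$prop_check <- obs$n_occur / obs$n_total

# Round to three significant digits, the precision shown in the printed tibble
signif(obs$prop_check, 3)
```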
Details
This function corresponds to the Query route of the API.
Information regarding available characteristics of the corpus can be found in the list_corpora dataset.
Examples
gallicagram("président")
#> # A tibble: 937 × 10
#>    date       keyword   n_occur n_total prop_occur  year month corpus resolution
#>    <date>     <chr>       <int>   <int>      <dbl> <int> <int> <chr>  <chr>
#>  1 1944-12-01 président     102  125782   0.000811  1944    12 lemon… monthly
#>  2 1945-01-01 président     306  262131   0.00117   1945     1 lemon… monthly
#>  3 1945-02-01 président     248  256110   0.000968  1945     2 lemon… monthly
#>  4 1945-03-01 président     327  313806   0.00104   1945     3 lemon… monthly
#>  5 1945-04-01 président     565  299972   0.00188   1945     4 lemon… monthly
#>  6 1945-05-01 président     406  327917   0.00124   1945     5 lemon… monthly
#>  7 1945-06-01 président     462  341077   0.00135   1945     6 lemon… monthly
#>  8 1945-07-01 président     633  387425   0.00163   1945     7 lemon… monthly
#>  9 1945-08-01 président     740  376034   0.00197   1945     8 lemon… monthly
#> 10 1945-09-01 président     401  382928   0.00105   1945     9 lemon… monthly
#> # ℹ 927 more rows
#> # ℹ 1 more variable: n_of <chr>
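Monthly rows can be pooled into a coarser period by summing the counts and recomputing the proportion, rather than averaging the monthly proportions, which would give every month the same weight regardless of its size. The sketch below does this in base R for the nine 1945 rows printed above, so it covers January to September only. The names monthly_1945, pooled_prop and mean_of_props are illustrative. Requesting resolution = "yearly" directly is the more natural route when the corpus supports it.

```r
# The nine 1945 rows printed in the example output (January to September)
monthly_1945 <- data.frame(
  month   = 1:9,
  n_occur = c(306L, 248L, 327L, 565L, 406L, 462L, 633L, 740L, 401L),
  n_total = c(262131L, 256110L, 313806L, 299972L, 327917L,
              341077L, 387425L, 376034L, 382928L)
)

# Pool the counts first, then take the ratio
pooled_prop <- sum(monthly_1945$n_occur) / sum(monthly_1945$n_total)

# For comparison: the unweighted mean of the monthly proportions
mean_of_props <- mean(monthly_1945$n_occur / monthly_1945$n_total)

c(pooled = pooled_prop, mean_of_monthly = mean_of_props)
```

The two values differ because months with more total n-grams contribute more to the pooled figure.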