Occurrences of a list of keywords in a Gallicagram corpus
gallicagram_lexicon.Rd
Retrieves the proportion of occurrences of a list of keywords in one of the corpora (historical press, Gallica books, Le Monde newspaper) by year, month or day.
Usage
gallicagram_lexicon(
lexicon,
corpus = "lemonde",
from = "earliest",
to = "latest",
resolution = "monthly"
)
Arguments
- lexicon
A character vector. Keywords to search. Each keyword can contain up to three words (a 3-gram) in the "books" and "press" corpora, and up to four words (a 4-gram) in the "lemonde" corpus.
- corpus
A character string. The corpus to search. The available corpora are listed in the list_corpora dataset.
- from
An integer or "earliest". The starting year. If set to "earliest", the function uses the earliest date at which the data are reliable for this corpus, as described in list_corpora.
- to
An integer or "latest". The end year. If set to "latest", the function uses the latest date at which the data are reliable for this corpus, as described in list_corpora.
- resolution
A character string: "daily", "monthly" or "yearly". The finest resolution available for the selected corpus is given in the resolution column of the list_corpora dataset.
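The constraints on lexicon and resolution above can be sketched in R as follows. This is a minimal illustration and not the package's own validation code. The helpers ngram_length(), fits_corpus() and resolution_ok() are hypothetical. Only the three corpora named above are covered, and words within a keyword are assumed to be separated by whitespace.

```r
# Hypothetical helpers (not part of the package) illustrating the argument
# constraints documented above.

# Maximum n-gram length per corpus, as stated for `lexicon`.
# Only the three corpora named in this page are covered (assumption).
max_ngram <- c(books = 3L, press = 3L, lemonde = 4L)

# Number of words in each keyword, assuming words are whitespace-separated.
ngram_length <- function(lexicon) {
  lengths(strsplit(trimws(lexicon), "[[:space:]]+"))
}

# TRUE for each keyword short enough to be searched in `corpus`.
fits_corpus <- function(lexicon, corpus = "lemonde") {
  ngram_length(lexicon) <= max_ngram[[corpus]]
}

# Resolutions ordered from finest to coarsest.
resolution_levels <- c("daily", "monthly", "yearly")

# TRUE if the requested `resolution` is no finer than `finest`, the finest
# resolution available for the corpus (in the package, this value is listed
# in the resolution column of list_corpora).
resolution_ok <- function(resolution, finest) {
  resolution <- match.arg(resolution, resolution_levels)
  finest <- match.arg(finest, resolution_levels)
  match(resolution, resolution_levels) >= match(finest, resolution_levels)
}

fits_corpus(c("président", "président de la république"), "press")
resolution_ok("yearly", finest = "monthly")
```

A 4-word keyword passes in "lemonde" but not in "press". A yearly request is accepted for a corpus whose finest resolution is monthly, but a daily request is not.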
Value
A tibble containing:
- the first keyword in the vector, which typically stands for the entire lexicon;
- the number of occurrences (n_occur) or co-occurrences (n_cooccur);
- the total number of n-grams over the period of a given observation (n_grams or n_ngrams);
- the proportion of occurrences or co-occurrences of the keyword(s) over the period of a given observation (prop_occur or prop_cooccur);
- the date at the beginning of the period of a given observation (date);
- the corpus and the resolution;
- the year and, depending on the resolution, the month and day of the observation.
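The following sketch illustrates the meaning of the date column. The helper period_start() is hypothetical and not part of the package. It builds the date at the beginning of an observation period from the year and, depending on the resolution, the month and day. It assumes that yearly observations start on 1 January and monthly ones on the first day of the month.

```r
# Hypothetical helper (not part of the package): the date at the beginning
# of an observation period. Assumes yearly periods start on 1 January and
# monthly periods on the first day of the month.
period_start <- function(year, month = NULL, day = NULL,
                         resolution = c("yearly", "monthly", "daily")) {
  resolution <- match.arg(resolution)
  if (resolution == "yearly") month <- 1L   # month is irrelevant for yearly data
  if (resolution != "daily") day <- 1L      # day is irrelevant unless daily
  as.Date(sprintf("%04d-%02d-%02d",
                  as.integer(year), as.integer(month), as.integer(day)))
}

period_start(1900, resolution = "yearly")
period_start(1900, 3, resolution = "monthly")
period_start(1900, 3, 15, resolution = "daily")
```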
Details
This function sums the outputs of calls to gallicagram obtained for each keyword in the vector.
It is typically used together with the function get_same_stem().
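The summation described above can be sketched in base R as follows. This is one plausible reading, not the package's implementation, and it rests on several assumptions:
- sum_lexicon() is hypothetical;
- each keyword's result is modelled as a plain data frame (not a tibble) with date, n_occur and n_grams columns;
- the counts are made-up toy values;
- all keywords have the same length, so the per-date n-gram total is shared.

```r
# Hypothetical sketch (not the package's code): combine per-keyword results,
# each assumed to hold `date`, `n_occur` and `n_grams` columns, by summing
# occurrences per date and recomputing the proportion.
sum_lexicon <- function(per_keyword) {
  stacked <- do.call(rbind, per_keyword)
  # Sum occurrences across keywords for each date.
  occ <- aggregate(n_occur ~ date, data = stacked, FUN = sum)
  # Assumption: n_grams depends only on the date (keywords of equal length),
  # so one total per date is kept.
  totals <- unique(stacked[, c("date", "n_grams")])
  out <- merge(occ, totals, by = "date")
  out$prop_occur <- out$n_occur / out$n_grams
  out
}

# Toy inputs standing in for two per-keyword results (made-up numbers).
dates <- as.Date(c("1900-01-01", "1900-02-01"))
president <- data.frame(date = dates, n_occur = c(10, 12), n_grams = c(1000, 1200))
presidentiel <- data.frame(date = dates, n_occur = c(2, 3), n_grams = c(1000, 1200))

res <- sum_lexicon(list(president, presidentiel))
res
```

Summing the counts before dividing keeps prop_occur equal to the combined occurrences over the shared total. Under the equal-length assumption, this is the same as summing the per-keyword proportions.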
Examples
if (FALSE) {
gallicagram_lexicon(c("président", "présidentiel"))
}