Retrieves the proportion of occurrences of a list of keywords in one of the corpora (historical press, Gallica books, Le Monde newspaper) by year, month or day.

Usage

gallicagram_lexicon(
  lexicon,
  corpus = "lemonde",
  from = "earliest",
  to = "latest",
  resolution = "monthly"
)

Arguments

lexicon

A character vector. Keywords to search. Each keyword can be at most a 3-gram in the "books" and "press" corpora and at most a 4-gram in the "lemonde" corpus.

corpus

A character string. The corpus to search. The list of available corpora can be found in the list_corpora dataset.

from

An integer or "earliest". The starting year. If set to "earliest", it uses the earliest date at which the data is reliable for this corpus, as described in list_corpora.

to

An integer or "latest". The end year. If set to "latest", it uses the latest date at which the data is reliable for this corpus, as described in list_corpora.

resolution

A character string. Can only be "daily", "monthly" or "yearly". The finest available resolution for the corpus selected can be found in the resolution column of the list_corpora dataset.

Value

A tibble containing the first keyword in the vector, the entire lexicon, the number of occurrences (n_occur) or co-occurrences (n_cooccur), the total number of ngrams over the period of a given observation (n_grams or n_ngrams), the proportion of occurrences or co-occurrences of the keyword(s) over the period of a given observation (prop_occur or prop_cooccur), the date at the beginning of the period of a given observation (date), the corpus, the resolution, the year and, depending on the resolution, the month and day of the observation.

Details

This function calls gallicagram() once for each keyword in the vector and sums the resulting outputs.

It is typically used together with the function get_same_stem().
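The summation described in this section can be pictured with a small base R sketch. It is not the package's actual implementation: the two data frames are made-up stand-ins for per-keyword gallicagram() outputs, the column name n_total is hypothetical, and the sketch assumes that every keyword shares the same total number of ngrams for a given period.

```r
# Made-up per-keyword results, standing in for two gallicagram() outputs.
# The column name n_total is hypothetical and only used for illustration.
dates <- as.Date(c("2000-01-01", "2000-02-01"))
res_a <- data.frame(date = dates, n_occur = c(10, 20), n_total = c(1000, 2000))
res_b <- data.frame(date = dates, n_occur = c(5, 15), n_total = c(1000, 2000))

# Stack the per-keyword results and sum the occurrences for each period
stacked <- rbind(res_a, res_b)
summed <- aggregate(n_occur ~ date, data = stacked, FUN = sum)

# Assumption: the total number of ngrams per period is shared by all keywords
totals <- unique(stacked[, c("date", "n_total")])
lexicon_res <- merge(summed, totals, by = "date")

# Proportion of occurrences of the whole lexicon in each period
lexicon_res$prop_occur <- lexicon_res$n_occur / lexicon_res$n_total
lexicon_res
```

Under these assumptions, the proportion for the lexicon is the summed occurrences divided by the shared total for each period.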

Examples

if (FALSE) {
  gallicagram_lexicon(c("président", "présidentiel"))
}
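A further sketch shows the same call with every argument set explicitly. The values are illustrative assumptions: the year range and the "yearly" resolution are assumed to be valid for the "press" corpus, which should be checked against the list_corpora dataset. As in the example above, the call is not run, since it needs the package and network access.

```r
# Illustrative argument values (assumptions, check them against list_corpora)
args <- list(
  lexicon = c("président", "présidentiel"),
  corpus = "press",
  from = 1900,
  to = 1950,
  resolution = "yearly"
)

if (FALSE) {
  # Equivalent to calling gallicagram_lexicon() with the named arguments above
  do.call(gallicagram_lexicon, args)
}
```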