Close co-occurrences of two lexicons in a Gallicagram corpus — gallicagram_cooccur_lexicon

Retrieves the proportion of close co-occurrences of two lexicons in one of the three main corpora (historical press, Gallica books, Le Monde newspaper) by year or month.

Usage

gallicagram_cooccur_lexicon(
  lexicon_1,
  lexicon_2,
  corpus = "lemonde",
  from = "earliest",
  to = "latest",
  resolution = "monthly",
  cooccur_level = "grams"
)

Arguments

lexicon_1: A character vector. One of the two lexicons to search.
lexicon_2: A character vector. The other lexicon to search.
corpus: A character string. The corpus to search. The list of available corpora can be found in the list_corpora dataset.
from: An integer or "earliest". The starting year. If set to "earliest", it uses the earliest date at which the data is reliable for this corpus, as described in list_corpora.
to: An integer or "latest". The end year. If set to "latest", it uses the latest date at which the data is reliable for this corpus, as described in list_corpora.
resolution: A character string. For the press and Le Monde corpora, either "yearly" or "monthly". For the books corpus, only "yearly".
cooccur_level: A character string. Either "grams" or "articles". The level at which to look for co-occurrences of the two keywords: in 3-grams for "livres" and "presse", and in 4-grams or articles for "lemonde".

Value

A tibble containing the keyword(s), the number of occurrences (n_occur) or co-occurrences (n_cooccur), the total number of n-grams or articles over the period (n_total), the proportion of occurrences or co-occurrences of the keyword(s) over the period of a given observation (prop_occur or prop_cooccur), either whether the total is a number of grams or of articles (n_of) or the level at which the co-occurrences are computed (cooccur_level), the date at the beginning of the period of a given observation (date), the corpus, the resolution, the year and, where applicable, the month and day of the observation.
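Because the result carries both counts and totals, it can be re-aggregated after the fact. The sketch below rolls monthly rows up to yearly proportions in base R. The data frame is made up for illustration (it only mimics the documented column names), and the sketch assumes that the proportion is simply n_cooccur divided by n_total.

```r
# Made-up rows mimicking the documented columns (not real Gallicagram data).
monthly <- data.frame(
  year      = c(1945L, 1945L, 1946L, 1946L),
  month     = c(1L, 2L, 1L, 2L),
  n_cooccur = c(0, 2, 1, 3),
  n_total   = c(1000, 1000, 2000, 2000)
)

# Sum counts and totals per year, then recompute the proportion.
# Assumption: prop_cooccur = n_cooccur / n_total.
yearly <- aggregate(cbind(n_cooccur, n_total) ~ year, data = monthly, FUN = sum)
yearly$prop_cooccur <- yearly$n_cooccur / yearly$n_total

yearly
```

Summing the counts before dividing weights each month by its size, which averaging the monthly proportions would not do.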

Details

Close co-occurrences correspond to the number of 3-grams (4-grams in the Le Monde corpus) that contain both keywords of a pair, counted for each pair made of one keyword from each lexicon.
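To make the notion of a close co-occurrence concrete, here is a minimal base R sketch that counts the n-grams of a toy sentence containing both keywords of one pair. The helper count_close_cooccur is hypothetical and is not part of the package. The real counts come from the Gallicagram n-gram data, not from tokenizing text locally.

```r
# Hypothetical helper, for illustration only: count the n-grams of a text
# that contain both keyword_1 and keyword_2.
count_close_cooccur <- function(text, keyword_1, keyword_2, n = 3) {
  # Naive whitespace tokenization (an assumption of this sketch).
  tokens <- strsplit(text, "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) < n) return(0L)

  # Slide a window of n tokens over the text and test each window.
  starts <- seq_len(length(tokens) - n + 1)
  hits <- vapply(starts, function(i) {
    gram <- tokens[i:(i + n - 1)]
    keyword_1 %in% gram && keyword_2 %in% gram
  }, logical(1))
  sum(hits)
}

toy_text <- "le président est mauvais et la présidente est nulle"
n_3grams <- count_close_cooccur(toy_text, "président", "mauvais", n = 3)
n_4grams <- count_close_cooccur(toy_text, "président", "mauvais", n = 4)
```

A wider window can only match at least as many positions, which is one reason counts from 3-gram corpora and from the 4-gram Le Monde corpus are not directly comparable.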

This function loops gallicagram_cooccur over each word of each lexicon and sums the results, so it can take some time to run.
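The loop-and-sum logic described above can be sketched as follows. This is not the package's source code: fake_cooccur is a hypothetical offline stand-in for gallicagram_cooccur that returns made-up counts, and cooccur_lexicon_sketch is an illustrative name. The sketch also assumes that one call is made per pair of keywords and that n_total is identical across pairs for a given date.

```r
# Hypothetical stand-in for gallicagram_cooccur(): returns made-up counts so
# that the sketch runs offline. The real function queries Gallicagram.
fake_cooccur <- function(keyword_1, keyword_2) {
  data.frame(
    date      = as.Date(c("2000-01-01", "2000-02-01")),
    n_cooccur = c(1, 3),
    n_total   = c(1000, 2000)
  )
}

cooccur_lexicon_sketch <- function(lexicon_1, lexicon_2, cooccur_fun) {
  # Every combination of one keyword from each lexicon.
  pairs <- expand.grid(
    keyword_1 = lexicon_1,
    keyword_2 = lexicon_2,
    stringsAsFactors = FALSE
  )

  # One call per pair, then stack the per-pair results.
  per_pair <- lapply(seq_len(nrow(pairs)), function(i) {
    cooccur_fun(pairs$keyword_1[i], pairs$keyword_2[i])
  })
  all_rows <- do.call(rbind, per_pair)

  # Sum co-occurrences per period; n_total is assumed constant per date.
  summed <- aggregate(n_cooccur ~ date + n_total, data = all_rows, FUN = sum)
  summed$prop_cooccur <- summed$n_cooccur / summed$n_total
  summed[order(summed$date), ]
}

res <- cooccur_lexicon_sketch(
  c("président", "présidente"),
  c("mauvais", "nul"),
  fake_cooccur
)
```

Under this reading, the number of underlying calls is length(lexicon_1) * length(lexicon_2), so keeping the lexicons short keeps the running time manageable.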

It is only available for the three main corpora (historical press, Gallica books, Le Monde newspaper).

Examples

gallicagram_cooccur_lexicon(c("président", "présidente"), c("mauvais", "nul"))
#> # A tibble: 937 × 13
#>    date       n_cooccur n_total prop_cooccur  year month corpus  resolution
#>    <date>         <dbl>   <int>        <dbl> <int> <int> <chr>   <chr>     
#>  1 1944-12-01         0   79515            0  1944    12 lemonde monthly   
#>  2 1945-01-01         0  166925            0  1945     1 lemonde monthly   
#>  3 1945-02-01         0  162907            0  1945     2 lemonde monthly   
#>  4 1945-03-01         0  201083            0  1945     3 lemonde monthly   
#>  5 1945-04-01         0  189895            0  1945     4 lemonde monthly   
#>  6 1945-05-01         0  208666            0  1945     5 lemonde monthly   
#>  7 1945-06-01         0  220081            0  1945     6 lemonde monthly   
#>  8 1945-07-01         0  245302            0  1945     7 lemonde monthly   
#>  9 1945-08-01         0  239788            0  1945     8 lemonde monthly   
#> 10 1945-09-01         0  246492            0  1945     9 lemonde monthly   
#> # ℹ 927 more rows
#> # ℹ 5 more variables: cooccur_level <chr>, keyword_1 <chr>, keyword_2 <chr>,
#> #   lexicon_1 <chr>, lexicon_2 <chr>