
Retrieves the proportion of close co-occurrences of two keywords in one of the three main corpora (historical press, Gallica books, Le Monde newspaper) by year or month.

Usage

gallicagram_cooccur(
  keyword_1,
  keyword_2,
  corpus = "lemonde",
  from = "earliest",
  to = "latest",
  resolution = "monthly",
  count_phrase = FALSE,
  cooccur_level = "grams"
)

Arguments

keyword_1

A character string. One of the two keywords to search.

keyword_2

A character string. The other keyword to search.

corpus

A character string. The corpus to search. The list of available corpora can be found in the list_corpora dataset.

from

An integer or "earliest". The starting year. If set to "earliest", it uses the earliest date at which the data is reliable for this corpus, as described in list_corpora.

to

An integer or "latest". The end year. If set to "latest", it uses the latest date at which the data is reliable for this corpus, as described in list_corpora.

resolution

A character string. For the press and Le Monde corpora ("presse" and "lemonde"), either "yearly" or "monthly". For the books corpus ("livres"), only "yearly".

count_phrase

A logical. If TRUE, counts the co-occurrences of each phrase containing both keywords. If FALSE, returns the number of times both keywords co-occur in each resolution period.

cooccur_level

A character string. Either "grams" or "articles". The level at which to look for co-occurrences of the two keywords: 3-grams for "livres" and "presse", and 4-grams or articles for "lemonde".

Value

A tibble containing the keyword(s), the number of occurrences (n_occur) or co-occurrences (n_cooccur), the total number of n-grams or articles over the period (n_total), the proportion of occurrences or co-occurrences of the keyword(s) over the period of a given observation (prop_occur or prop_cooccur), either whether the total is a count of grams or of articles (n_of) or the unit at which co-occurrences are computed (cooccur_level), the date at the beginning of the period of a given observation (date), the corpus, the resolution, the year and, where applicable, the month and day of the observation.
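
The columns described above can be combined after the fact, for instance to regroup monthly results into yearly ones. The following is a minimal sketch in base R, not part of the package. It assumes that the proportion is simply n_cooccur divided by n_total, and the data frame `monthly` holds made-up values that only mimic the shape of the returned tibble.

```r
# Hypothetical monthly counts shaped like the documented output
# (values are invented for illustration, not real API results).
monthly <- data.frame(
  year      = c(1950, 1950, 1950, 1951, 1951),
  month     = c(1, 2, 3, 1, 2),
  n_cooccur = c(2, 0, 1, 4, 1),
  n_total   = c(1000, 1500, 2500, 2000, 3000)
)

# Assumption: the proportion is co-occurrences divided by the period total.
monthly$prop_cooccur <- monthly$n_cooccur / monthly$n_total

# Sum the counts per year first, then divide. Averaging the monthly
# proportions instead would over-weight months with few n-grams.
yearly <- aggregate(cbind(n_cooccur, n_total) ~ year, data = monthly, FUN = sum)
yearly$prop_cooccur <- yearly$n_cooccur / yearly$n_total

print(yearly)
```

Summing counts before dividing keeps the yearly proportion weighted by the size of each month, which matters here because n_total varies a lot from one period to the next.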

Details

Close co-occurrences correspond to the number of 3-grams (4-grams in the Le Monde corpus) that contain the two keywords.
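
To make the notion of a close co-occurrence concrete, here is a toy sketch in base R that slides a window of n tokens over a short text and counts the windows containing both keywords. The helper `count_close_cooccur()` and the sample text are hypothetical. The API works on its own precomputed n-gram tables, so this only illustrates the idea and does not reproduce its tokenisation or its counts.

```r
# Toy illustration only: not the API's implementation.
count_close_cooccur <- function(text, keyword_1, keyword_2, n = 3) {
  # Lowercase, then split on runs of whitespace and punctuation
  tokens <- strsplit(tolower(text), "[[:space:][:punct:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) < n) {
    return(c(n_cooccur = 0, n_total = 0))
  }
  # Start index of every sliding window of n consecutive tokens
  starts <- seq_len(length(tokens) - n + 1)
  has_both <- vapply(starts, function(i) {
    window <- tokens[i:(i + n - 1)]
    keyword_1 %in% window && keyword_2 %in% window
  }, logical(1))
  c(n_cooccur = sum(has_both), n_total = length(starts))
}

toy_text <- "Le ministre a un mauvais jour. Un mauvais ministre parle."

# 3-grams, as for the press and books corpora
res_3 <- count_close_cooccur(toy_text, "ministre", "mauvais", n = 3)
# 4-grams, as for the Le Monde corpus
res_4 <- count_close_cooccur(toy_text, "ministre", "mauvais", n = 4)

print(res_3)
print(res_4)
```

On this toy text, 2 of the 8 3-grams and 3 of the 7 4-grams contain both words: a wider window catches pairs that sit further apart, so counts are not directly comparable between the 3-gram corpora and Le Monde. Note that this sketch lets windows run across sentence boundaries, which the real n-gram tables may or may not do.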

This function is only available for the three main corpora (historical press, Gallica books, Le Monde newspaper).

It corresponds to the Contain route of the API.

Examples

  gallicagram_cooccur("président", "mauvais")
#> # A tibble: 937 × 11
#>    date       keyword_1 keyword_2 n_cooccur n_total prop_cooccur  year month
#>    <date>     <chr>     <chr>         <dbl>   <int>        <dbl> <int> <int>
#>  1 1944-12-01 président mauvais           0   79515            0  1944    12
#>  2 1945-01-01 président mauvais           0  166925            0  1945     1
#>  3 1945-02-01 président mauvais           0  162907            0  1945     2
#>  4 1945-03-01 président mauvais           0  201083            0  1945     3
#>  5 1945-04-01 président mauvais           0  189895            0  1945     4
#>  6 1945-05-01 président mauvais           0  208666            0  1945     5
#>  7 1945-06-01 président mauvais           0  220081            0  1945     6
#>  8 1945-07-01 président mauvais           0  245302            0  1945     7
#>  9 1945-08-01 président mauvais           0  239788            0  1945     8
#> 10 1945-09-01 président mauvais           0  246492            0  1945     9
#> # ℹ 927 more rows
#> # ℹ 3 more variables: corpus <chr>, resolution <chr>, cooccur_level <chr>