Close co-occurrences of two keywords in a Gallicagram corpus
gallicagram_cooccur.Rd
Retrieves the proportion of close co-occurrences of two keywords in one of the three main corpora (historical press, Gallica books, Le Monde newspaper) by year or month.
Usage
gallicagram_cooccur(
keyword_1,
keyword_2,
corpus = "lemonde",
from = "earliest",
to = "latest",
resolution = "monthly",
count_phrase = FALSE,
cooccur_level = "grams"
)
Arguments
- keyword_1
A character string. One of the two keywords to search.
- keyword_2
A character string. The other keyword to search.
- corpus
A character string. The corpus to search. The list of available corpora can be found in the list_corpora dataset.
- from
An integer or "earliest". The starting year. If set to "earliest", it uses the earliest date at which the data is reliable for this corpus, as described in list_corpora.
- to
An integer or "latest". The end year. If set to "latest", it uses the latest date at which the data is reliable for this corpus, as described in list_corpora.
- resolution
A character string. For the press and lemonde corpora, it can be either "yearly" or "monthly". For books, it can only be "yearly".
- count_phrase
If TRUE, counts the co-occurrences of each phrase containing both keywords. If FALSE, returns the number of times both keywords co-occur in each resolution period.
- cooccur_level
A character string. Either "grams" or "articles". The level at which to look for co-occurrences of the two keywords: in 3-grams for "livres" and "presse", and in 4-grams or articles for "lemonde".
Value
A tibble with the keyword,
the number of occurrences (n_occur) or co-occurrences (n_cooccur),
the total number of n-grams or articles over the period (n_total),
the proportion of occurrences or co-occurrences of the keyword(s) over the period of a given observation (prop_occur or prop_cooccur),
either info about whether the total number is a number of grams or articles (n_of) or about the syntagma at which the co-occurrences are computed (cooccur_level),
the date at the beginning of the period of a given observation (date),
the corpus, the resolution, the year and potentially the month and day of the observation.
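As a rough illustration of how the count and proportion columns can be combined, the sketch below aggregates monthly rows to yearly ones. It runs on an invented toy data frame that only mimics the documented column names, and it assumes that prop_cooccur equals n_cooccur divided by n_total, which this page implies but does not state explicitly.

```r
# Toy data mimicking the documented columns; all values are invented.
monthly <- data.frame(
  year = c(1945L, 1945L, 1946L, 1946L),
  month = c(1L, 2L, 1L, 2L),
  n_cooccur = c(2, 0, 1, 3),
  n_total = c(1000L, 3000L, 2000L, 2000L)
)

# Sum the counts per year, then recompute the proportion from the sums
# (assumption: prop_cooccur = n_cooccur / n_total).
yearly <- aggregate(cbind(n_cooccur, n_total) ~ year, data = monthly, FUN = sum)
yearly$prop_cooccur <- yearly$n_cooccur / yearly$n_total
yearly
```

Summing the counts before dividing weights each month by its size, whereas averaging the monthly proportions would give small and large months the same weight.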
Details
Close co-occurrences correspond to the number of 3-grams (4-grams in the Le Monde corpus) that contain the two keywords.
This function is only available for the three main corpora (historical press, Gallica books, Le Monde newspaper).
It corresponds to the Contain route of the API.
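To make the definition of a close co-occurrence concrete, here is a minimal base R sketch that counts the 3-grams of a toy sentence containing both keywords. The helper count_close_cooccur and its naive whitespace tokenisation are assumptions made for illustration; they are not part of the package, and the API may tokenise text differently.

```r
# Hypothetical helper (not part of the package): counts the n-grams of a
# text that contain both keywords, using naive whitespace tokenisation.
count_close_cooccur <- function(text, keyword_1, keyword_2, n = 3) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) < n) {
    return(c(n_cooccur = 0, n_total = 0))
  }
  # Build every window of n consecutive tokens
  starts <- seq_len(length(tokens) - n + 1)
  grams <- lapply(starts, function(i) tokens[i:(i + n - 1)])
  # Flag the windows in which both keywords appear
  both <- vapply(
    grams,
    function(g) keyword_1 %in% g && keyword_2 %in% g,
    logical(1)
  )
  c(n_cooccur = sum(both), n_total = length(grams))
}

toy <- "le président a fait un mauvais choix dit le mauvais président"
res <- count_close_cooccur(toy, "président", "mauvais")
res
```

In the toy sentence, only the final 3-gram ("le mauvais président") contains both keywords, so the sketch finds one co-occurrence among nine 3-grams. Passing n = 4 would mimic the window size described for the Le Monde corpus.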
Examples
gallicagram_cooccur("président", "mauvais")
#> # A tibble: 937 × 11
#>    date       keyword_1 keyword_2 n_cooccur n_total prop_cooccur  year month
#>    <date>     <chr>     <chr>         <dbl>   <int>        <dbl> <int> <int>
#>  1 1944-12-01 président mauvais           0   79515            0  1944    12
#>  2 1945-01-01 président mauvais           0  166925            0  1945     1
#>  3 1945-02-01 président mauvais           0  162907            0  1945     2
#>  4 1945-03-01 président mauvais           0  201083            0  1945     3
#>  5 1945-04-01 président mauvais           0  189895            0  1945     4
#>  6 1945-05-01 président mauvais           0  208666            0  1945     5
#>  7 1945-06-01 président mauvais           0  220081            0  1945     6
#>  8 1945-07-01 président mauvais           0  245302            0  1945     7
#>  9 1945-08-01 président mauvais           0  239788            0  1945     8
#> 10 1945-09-01 président mauvais           0  246492            0  1945     9
#> # ℹ 927 more rows
#> # ℹ 3 more variables: corpus <chr>, resolution <chr>, cooccur_level <chr>
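A second call might look for co-occurrences at the article level rather than in 4-grams. The sketch below uses only argument names and values documented on this page. The year range is arbitrary, and the call is guarded so that it only runs when the function is available in the session (it also needs network access to the API).

```r
# Arguments as documented in the Usage and Arguments sections;
# the keyword pair and the 1990-2000 range are arbitrary choices.
args <- list(
  keyword_1 = "président",
  keyword_2 = "mauvais",
  corpus = "lemonde",
  from = 1990,
  to = 2000,
  resolution = "yearly",
  cooccur_level = "articles"
)

# Only call the function if it is available (package loaded).
if (exists("gallicagram_cooccur", mode = "function")) {
  by_article <- do.call(gallicagram_cooccur, args)
}
```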