Close co-occurrences of two lexicons in a Gallicagram corpus
gallicagram_cooccur_lexicon.Rd
Retrieves the proportion of close co-occurrences of two lexicons in one of the three main corpora (historical press, Gallica books, Le Monde newspaper) by year or month.
Usage
gallicagram_cooccur_lexicon(
  lexicon_1,
  lexicon_2,
  corpus = "lemonde",
  from = "earliest",
  to = "latest",
  resolution = "monthly",
  cooccur_level = "grams"
)
Arguments
- lexicon_1
A character vector. One of the two lexicons to search.
- lexicon_2
A character vector. The other lexicon to search.
- corpus
A character string. The corpus to search. The list of available corpora can be found in the list_corpora dataset.
- from
An integer or "earliest". The starting year. If set to "earliest", it uses the earliest date at which the data is reliable for this corpus, as described in list_corpora.
- to
An integer or "latest". The end year. If set to "latest", it uses the latest date at which the data is reliable for this corpus, as described in list_corpora.
- resolution
A character string. For the "presse" and "lemonde" corpora, it can be either "yearly" or "monthly". For the "livres" corpus, it can only be "yearly".
- cooccur_level
A character string. Either "grams" or "articles". The level at which to look for co-occurrences of the two keywords: in 3-grams for "livres" and "presse", and in 4-grams or articles for "lemonde".
Value
A tibble containing the keyword, the number of occurrences (n_occur) or co-occurrences (n_cooccur), the total number of n-grams or articles over the period (n_total), the proportion of occurrences or co-occurrences of the keyword(s) over the period of a given observation (prop_occur or prop_cooccur), either info about whether the total number is a number of grams or articles (n_of) or about the syntagma at which the co-occurrences are computed (cooccur_level), the date at the beginning of the period of a given observation (date), the corpus, the resolution, the year and potentially the month and day of the observation.
Details
Close co-occurrences correspond to the number of 3-grams (4-grams in the Le Monde corpus) that contain both keywords of a pair, counted for every pair formed by taking one keyword from each lexicon.
This function simply loops the function gallicagram_cooccur over each word of each lexicon and sums the results. Since it runs one query per pair of keywords, it can take some time to run.
It is only available for the three main corpora (historical press, Gallica books, Le Monde newspaper).
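The loop-and-sum logic described above can be illustrated with a small, self-contained sketch in base R. It does not call the package or the Gallicagram service: toy_corpus, make_ngrams, count_pair and cooccur_lexicon_sketch are hypothetical names introduced here for illustration, and count_pair is only a local stand-in for a single-pair query such as gallicagram_cooccur. The proportion is assumed to be n_cooccur divided by n_total.

```r
# Toy corpus: one short text per period (made-up data, for illustration only).
toy_corpus <- data.frame(
  date = as.Date(c("2000-01-01", "2000-02-01")),
  text = c(
    "le président est mauvais mais la présidente est ravie",
    "un président nul et un mauvais président"
  ),
  stringsAsFactors = FALSE
)

# Split a text into overlapping n-grams; each n-gram is a vector of n tokens.
make_ngrams <- function(text, n = 3) {
  tokens <- strsplit(text, " ", fixed = TRUE)[[1]]
  if (length(tokens) < n) return(list())
  lapply(seq_len(length(tokens) - n + 1), function(i) tokens[i:(i + n - 1)])
}

# Hypothetical stand-in for a single-pair query: for each period, count the
# n-grams that contain both keywords (exact token match).
count_pair <- function(keyword_1, keyword_2, corpus, n = 3) {
  grams <- lapply(corpus$text, make_ngrams, n = n)
  has_both <- function(g) keyword_1 %in% g && keyword_2 %in% g
  data.frame(
    date = corpus$date,
    n_cooccur = vapply(grams, function(gs) sum(vapply(gs, has_both, logical(1))), integer(1)),
    n_total = lengths(grams)
  )
}

# Loop over every pair (one keyword from each lexicon) and sum the counts.
cooccur_lexicon_sketch <- function(lexicon_1, lexicon_2, corpus, n = 3) {
  pairs <- expand.grid(
    keyword_1 = lexicon_1,
    keyword_2 = lexicon_2,
    stringsAsFactors = FALSE
  )
  per_pair <- Map(
    count_pair, pairs$keyword_1, pairs$keyword_2,
    MoreArgs = list(corpus = corpus, n = n)
  )
  # Every per-pair result has one row per period, in the same order,
  # so the counts can be added element-wise.
  n_cooccur <- Reduce(`+`, lapply(per_pair, function(d) d$n_cooccur))
  n_total <- per_pair[[1]]$n_total
  data.frame(
    date = corpus$date,
    n_cooccur = n_cooccur,
    n_total = n_total,
    prop_cooccur = n_cooccur / n_total  # assumed definition of the proportion
  )
}

result <- cooccur_lexicon_sketch(
  c("président", "présidente"),
  c("mauvais", "nul"),
  toy_corpus
)
print(result)
```

In this toy data the summed count is 1 for the first period (out of 7 3-grams) and 3 for the second (out of 5). Two lexicons of two words each produce four pairs, so the number of single-pair queries grows with the product of the lexicon sizes, which is consistent with the note above about running time. In this sketch, an n-gram that matches several pairs is counted once per pair.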
Examples
gallicagram_cooccur_lexicon(c("président", "présidente"), c("mauvais", "nul"))
#> # A tibble: 937 × 13
#> date n_cooccur n_total prop_cooccur year month corpus resolution
#> <date> <dbl> <int> <dbl> <int> <int> <chr> <chr>
#> 1 1944-12-01 0 79515 0 1944 12 lemonde monthly
#> 2 1945-01-01 0 166925 0 1945 1 lemonde monthly
#> 3 1945-02-01 0 162907 0 1945 2 lemonde monthly
#> 4 1945-03-01 0 201083 0 1945 3 lemonde monthly
#> 5 1945-04-01 0 189895 0 1945 4 lemonde monthly
#> 6 1945-05-01 0 208666 0 1945 5 lemonde monthly
#> 7 1945-06-01 0 220081 0 1945 6 lemonde monthly
#> 8 1945-07-01 0 245302 0 1945 7 lemonde monthly
#> 9 1945-08-01 0 239788 0 1945 8 lemonde monthly
#> 10 1945-09-01 0 246492 0 1945 9 lemonde monthly
#> # ℹ 927 more rows
#> # ℹ 5 more variables: cooccur_level <chr>, keyword_1 <chr>, keyword_2 <chr>,
#> # lexicon_1 <chr>, lexicon_2 <chr>