Most frequent ngrams containing a keyword in a Gallicagram corpus over a period

Returns the most frequent ngrams containing a keyword over a given period.

Usage

gallicagram_with(
  keyword,
  corpus = "lemonde",
  from = "earliest",
  to = "latest",
  n_results = 20,
  after = FALSE,
  length = 2
)

Arguments

keyword: A character string. Keyword to search. The string cannot contain more words than the max_length for this corpus, as indicated in the list_corpora dataset.
corpus: A character string. The corpus to search. The list of available corpora can be found in the list_corpora dataset.
from: An integer or "earliest". The starting year. If set to "earliest", it uses the earliest date at which the data is reliable for this corpus, as described in list_corpora.
to: An integer or "latest". The end year. If set to "latest", it uses the latest date at which the data is reliable for this corpus, as described in list_corpora.
n_results: An integer or "all". The number of most frequent ngrams to return. If set to "all", it returns all the available results.
after: A boolean. Whether to consider only words following the keyword and not those preceding. Set to FALSE by default.
length: An integer. The length of the ngrams considered. Can be up to 3 in the "books" and "press" corpora and 4 in the "lemonde" corpus.

Value

A tibble with the n_results most frequent ngrams containing the searched keyword (ngram) and their number of occurrences over the period (n_occur). It also returns the input parameters keyword, corpus, from and to.

Details

This function is only available for the three main corpora (historical press, Gallica books, Le Monde newspaper).

This function corresponds to the Joker route of the API, accessed through the 'Joker' function on the Gallicagram app. When length = 1, it is analogous to the 'Joker' function on Ngram Viewer.

It is analogous to gallicagram_with_month but for a period instead of a given month.

For instance, "camarade" is often followed by "staline" or "khrouchtchev" in Le Monde. With after = TRUE, the function returns the most frequent ngrams of the form "camarade *". With after = FALSE, it also includes the most frequent ngrams of the form "* camarade".
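
The difference between the two settings of after can be sketched on a toy vector of bigrams. This is only an illustration of the "camarade *" and "* camarade" patterns in base R: the ngrams below are invented for the example, and nothing here reflects how the API itself computes its results.

```r
# Toy bigrams standing in for corpus data (invented for illustration only)
ngrams <- c("le camarade", "camarade staline", "son camarade",
            "camarade khrouchtchev", "les camarades")
keyword <- "camarade"

# Pattern "camarade *": the keyword is the first word of the ngram
is_following <- startsWith(ngrams, paste0(keyword, " "))
# Pattern "* camarade": the keyword is the last word of the ngram
is_preceding <- endsWith(ngrams, paste0(" ", keyword))

# Analogue of after = TRUE: keep only ngrams where a word follows the keyword
after_true <- ngrams[is_following]
# Analogue of after = FALSE: keep ngrams matching either pattern
after_false <- ngrams[is_following | is_preceding]

print(after_true)
print(after_false)
```

Note that "les camarades" matches neither pattern, since the keyword has to appear as a whole word.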

Searching the "presse" corpus can take a long time to run.

Examples

  gallicagram_with("camarade", from = 1960, to = 1970)
#> # A tibble: 20 × 6
#>    n_occur ngram                 keyword  corpus   from    to
#>      <int> <chr>                 <chr>    <chr>   <dbl> <dbl>
#>  1     452 le camarade           camarade lemonde  1960  1970
#>  2     404 son camarade          camarade lemonde  1960  1970
#>  3     256 camarade de           camarade lemonde  1960  1970
#>  4     235 un camarade           camarade lemonde  1960  1970
#>  5     198 du camarade           camarade lemonde  1960  1970
#>  6     124 camarade khrouchtchev camarade lemonde  1960  1970
#>  7     113 leur camarade         camarade lemonde  1960  1970
#>  8      91 notre camarade        camarade lemonde  1960  1970
#>  9      68 d'un camarade         camarade lemonde  1960  1970
#> 10      62 au camarade           camarade lemonde  1960  1970
#> 11      52 mon camarade          camarade lemonde  1960  1970
#> 12      48 camarade dubcek       camarade lemonde  1960  1970
#> 13      44 camarade mao          camarade lemonde  1960  1970
#> 14      44 ancien camarade       camarade lemonde  1960  1970
#> 15      41 camarade et           camarade lemonde  1960  1970
#> 16      31 camarade waldeck      camarade lemonde  1960  1970
#> 17      27 camarade qui          camarade lemonde  1960  1970
#> 18      24 sa camarade           camarade lemonde  1960  1970
#> 19      24 camarade du           camarade lemonde  1960  1970
#> 20      23 camarade togliatti    camarade lemonde  1960  1970
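
A possible variation, restricting the search to trigrams that start with the keyword. It uses only the arguments documented above and assumes that the package exporting gallicagram_with() is already attached and that the API is reachable. No output is shown, since the counts depend on the live corpus.

```r
# Assumption: the package that exports gallicagram_with() is already attached
# and the Gallicagram API can be reached from this session.
res <- gallicagram_with(
  "camarade",
  corpus = "lemonde",
  from = 1960,
  to = 1970,
  n_results = 10,
  after = TRUE,  # only ngrams of the form "camarade * *"
  length = 3     # trigrams; "lemonde" accepts lengths up to 4
)

# Keep only the ngrams and their counts (columns described under Value)
res[, c("ngram", "n_occur")]
```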