
Retrieves the proportion of occurrences of a keyword in one of the corpora by year, month or day.

Usage

gallicagram(
  keyword,
  corpus = "lemonde",
  from = "earliest",
  to = "latest",
  resolution = "monthly",
  n_of = "grams",
  subcorpora = NULL
)

Arguments

keyword

A character string. Keyword to search. The string cannot contain more words than the max_length for this corpus, as indicated in the list_corpora dataset.
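
As a rough pre-check, the number of words in a keyword can be compared with a corpus's max_length before querying. The sketch below uses base R only. count_words() and fits_max_length() are hypothetical helpers, not part of the package. The limit of 3 is an illustrative value, not one read from list_corpora. Splitting on whitespace is an assumption about how words are counted.

```r
# Hypothetical helper: count whitespace-separated words in a keyword.
# Assumption: a "word" is any run of non-whitespace characters.
count_words <- function(keyword) {
  words <- strsplit(trimws(keyword), "\\s+")[[1]]
  sum(nzchar(words))
}

# Hypothetical helper: TRUE when the keyword has at most max_length words.
# max_length would come from the list_corpora dataset for the chosen corpus.
fits_max_length <- function(keyword, max_length) {
  count_words(keyword) <= max_length
}

# Illustrative limit of 3 words (not taken from list_corpora).
fits_max_length("président", 3)
fits_max_length("président de la république", 3)
```

With the illustrative limit of 3, the one-word keyword passes the check and the four-word keyword does not.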

corpus

A character string. The corpus to search. The list of available corpora can be found in the list_corpora dataset.

from

An integer or "earliest". The starting year. If set to "earliest", it uses the earliest date at which the data is reliable for this corpus, as described in list_corpora.

to

An integer or "latest". The end year. If set to "latest", it uses the latest date at which the data is reliable for this corpus, as described in list_corpora.

resolution

A character string. Can only be "daily", "monthly" or "yearly". The finest available resolution for the corpus selected can be found in the resolution column of the list_corpora dataset.

n_of

A character string. The type of object for which to compute the number of occurrences. If set to "grams", the function computes the number of grams that correspond to the keyword for the given period. If set to "articles" (only available for the lemonde corpus and for unigrams, i.e. keywords made of a single word), it computes the number of articles that contain the keyword for the given period.

subcorpora

A character vector. The subcorpora to consider. Only available for corpus = "persee". The list of available Persee subcorpora can be found in the list_subcorpora dataset.

Value

A tibble containing the keyword, the number of occurrences (n_occur) or co-occurrences (n_cooccur), the total number of n-grams or articles over the period (n_total), the proportion of occurrences or co-occurrences of the keyword(s) over the period of a given observation (prop_occur or prop_cooccur), either whether the total is a number of grams or of articles (n_of) or the syntagma at which the co-occurrences are computed (cooccur_level), the date at the beginning of the period of a given observation (date), the corpus, the resolution, the year and, where applicable, the month and day of the observation.
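
To illustrate how the count columns relate to the proportion column, the sketch below recomputes the proportion for the first three rows of the example output on this page. It assumes that prop_occur is simply n_occur divided by n_total, which is consistent with the displayed values but is not spelled out here. The data frame is typed in by hand, so no API call is made. The column name prop_recomputed is introduced for illustration only.

```r
# First three rows of the example output on this page (lemonde, monthly),
# typed in by hand so the sketch runs without querying the API.
obs <- data.frame(
  date    = as.Date(c("1944-12-01", "1945-01-01", "1945-02-01")),
  n_occur = c(102L, 306L, 248L),
  n_total = c(125782L, 262131L, 256110L)
)

# Assumption: the proportion is the ratio of occurrences to the period total.
obs$prop_recomputed <- obs$n_occur / obs$n_total

# Rounded to three significant digits, as in the printed tibble.
signif(obs$prop_recomputed, 3)
```

Rounded to three significant digits, the recomputed values agree with the prop_occur column printed in the Examples section for the same three months.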

Details

This function corresponds to the Query route of the API.

Information regarding available characteristics of the corpus can be found in the list_corpora dataset.

Examples

  gallicagram("président")
#> # A tibble: 937 × 10
#>    date       keyword   n_occur n_total prop_occur  year month corpus resolution
#>    <date>     <chr>       <int>   <int>      <dbl> <int> <int> <chr>  <chr>     
#>  1 1944-12-01 président     102  125782   0.000811  1944    12 lemon… monthly   
#>  2 1945-01-01 président     306  262131   0.00117   1945     1 lemon… monthly   
#>  3 1945-02-01 président     248  256110   0.000968  1945     2 lemon… monthly   
#>  4 1945-03-01 président     327  313806   0.00104   1945     3 lemon… monthly   
#>  5 1945-04-01 président     565  299972   0.00188   1945     4 lemon… monthly   
#>  6 1945-05-01 président     406  327917   0.00124   1945     5 lemon… monthly   
#>  7 1945-06-01 président     462  341077   0.00135   1945     6 lemon… monthly   
#>  8 1945-07-01 président     633  387425   0.00163   1945     7 lemon… monthly   
#>  9 1945-08-01 président     740  376034   0.00197   1945     8 lemon… monthly   
#> 10 1945-09-01 président     401  382928   0.00105   1945     9 lemon… monthly   
#> # ℹ 927 more rows
#> # ℹ 1 more variable: n_of <chr>