A data frame containing information about the available corpora
For each corpus, it gives the code name, the plain-language name, the range of years for which the data is reliable, the number of words, the maximum n-gram length, and the finest available resolution.
Usage
data("list_corpora")
Format
A data frame with 17 rows and 7 variables:
- corpus
Code name of the corpus.
- corpus_name
Plain language name of the corpus.
- reliable_from
The year at which the corpus starts being reliable.
- reliable_to
The year at which the corpus stops being reliable.
- nb_words
The number of words in the corpus.
- max_length
The maximum length of ngrams available.
- resolution
The finest available resolution (daily, monthly, or yearly).
Examples
if (FALSE) {
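# Load the dataset (the package that provides it must be attached)
data("list_corpora")
# Inspect the seven columns described under Format
str(list_corpora)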
}
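The columns described under Format can be combined to pick a corpus suited to a given query. The following is a minimal sketch, not part of the package: it builds a small stand-in data frame with the same seven column names so that it runs on its own. The rows and values are invented placeholders, `period` and `usable` are hypothetical names, and treating `reliable_from` and `reliable_to` as inclusive bounds is an assumption.

```r
# Stand-in for list_corpora: same seven columns, but every value below is
# an invented placeholder, not real corpus metadata.
corpora <- data.frame(
  corpus        = c("corpus_a", "corpus_b", "corpus_c"),
  corpus_name   = c("Placeholder A", "Placeholder B", "Placeholder C"),
  reliable_from = c(1800L, 1900L, 1950L),
  reliable_to   = c(1950L, 2000L, 2020L),
  nb_words      = c(1e9, 5e8, 2e8),
  max_length    = c(3L, 2L, 4L),
  resolution    = c("daily", "monthly", "yearly"),
  stringsAsFactors = FALSE
)

# Keep corpora that are reliable over the whole period of interest
# (bounds assumed inclusive) and that offer n-grams of length 3 or more.
period <- c(1900, 1940)
usable <- subset(
  corpora,
  reliable_from <= period[1] & reliable_to >= period[2] & max_length >= 3
)

usable$corpus
```

With the real dataset loaded, the same `subset()` call could be applied to `list_corpora` directly in place of the stand-in `corpora`.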