
The list_corpora dataset describes the available corpora. For each corpus, it gives the code name, the plain language name, the years for which the data are reliable, the number of words, the maximum ngram length, and the finest available resolution.

Usage

data("list_corpora")

Format

A data frame with 17 rows and 7 variables:

corpus

Code name of the corpus.

corpus_name

Plain language name of the corpus.

reliable_from

The first year for which the corpus data are reliable.

reliable_to

The last year for which the corpus data are reliable.

nb_words

The number of words in the corpus.

max_length

The maximum ngram length (in words) available in the corpus.

resolution

The finest available resolution (daily, monthly, or yearly).
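To show how the reliability and length columns can be combined, here is a minimal sketch that keeps only the corpora whose reliable period covers a requested range of years and that provide ngrams of a given length. The function usable_corpora() is hypothetical and not part of the package. It assumes only the column names listed above.

```r
# Hypothetical helper (not part of the package).
# Keeps the corpora whose reliability window covers the whole period
# [from, to] and whose maximum ngram length is at least `n` words.
usable_corpora <- function(corpora, from, to, n = 1) {
  stopifnot(is.data.frame(corpora), from <= to, n >= 1)
  keep <- corpora$reliable_from <= from &
    corpora$reliable_to >= to &
    corpora$max_length >= n
  # which() drops NA comparisons, so rows with missing years are excluded
  corpora[which(keep), , drop = FALSE]
}
```

After data("list_corpora"), a call such as usable_corpora(list_corpora, 1900, 1950, n = 2) would return the corpora that can be queried reliably for bigrams over that period.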

Examples

if (FALSE) {
  # Load the dataset
  data("list_corpora")
}
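A second hypothetical sketch uses the resolution column to keep only the corpora that can be queried at a given granularity. It assumes that the column holds the values daily, monthly and yearly, and that data available at a finer resolution can be aggregated to a coarser one. Neither the helper supports_resolution() nor this aggregation is documented by the package.

```r
# Hypothetical helper (not part of the package).
# Keeps the corpora whose finest resolution is at least as fine as `wanted`,
# assuming finer data can be aggregated to a coarser resolution.
supports_resolution <- function(corpora, wanted = c("yearly", "monthly", "daily")) {
  wanted <- match.arg(wanted)
  fine_to_coarse <- c("daily", "monthly", "yearly")
  # Rank each corpus: 1 = daily, 2 = monthly, 3 = yearly (NA if unknown)
  rank <- match(tolower(corpora$resolution), fine_to_coarse)
  corpora[which(rank <= match(wanted, fine_to_coarse)), , drop = FALSE]
}
```

For example, supports_resolution(list_corpora, "monthly") would keep the daily and monthly corpora and drop the yearly ones.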