Subtitle word frequencies for Spanish: SUBTLEX-ESP

Word frequency norms based on film subtitles have been shown to be better than word frequencies based on books and newspapers, because they are more representative of everyday language use. In all languages we tested, word frequencies based on a corpus of 40 million words from film subtitles explain more variance in word recognition times than word frequencies based on much larger written corpora.

This page provides the word frequencies for Spanish. Full details on how the database was compiled can be found in our article (Cuetos et al., 2011).

An Excel file with SUBTLEX-ESP is available here.

A demo showing how to easily enter the SUBTLEX frequencies into your stimulus Excel file is available here.
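For readers who prefer a script to a spreadsheet, the same lookup can be sketched in Python. This is only an illustration, not the procedure shown in the demo: the function name `add_frequencies` is hypothetical, the frequency list is assumed to be already loaded into a dictionary mapping each word to its count, and entries are assumed to be stored in lowercase.

```python
def add_frequencies(stimuli, freq_table, missing=None):
    """Pair each stimulus word with its frequency count.

    Assumption: freq_table maps lowercase words to counts. Words that
    are not in the table get the value of `missing`.
    """
    return [(word, freq_table.get(word.strip().lower(), missing))
            for word in stimuli]


# Toy table holding only the two counts quoted on this page.
freq_table = {"cenar": 3721, "verdad": 54203}

for word, count in add_frequencies(["Verdad", "cenar", "xyzzy"], freq_table):
    print(word, count)
```

Returning `None` for missing words, rather than 0, keeps "not in the list" distinguishable from a real count, so such items can be inspected before analysis.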

Shortly after the publication of the list, it was brought to our attention that there were some copying errors in the original list of SUBTLEX-ESP, mainly involving non-ASCII characters. In addition, some words had two entries.

These problems became apparent in an article by Alonso, Fernandez, and Diez (2011) on oral frequency norms for Spanish words. Although SUBTLEX-ESP did reasonably well, its performance was poorer than we had expected.

We believe we have now corrected all errors. The corrected version has 44,374 words in common with Alonso et al. (2011), up from 42,609. The correlation with the oral frequencies is now .72 (was .67). R² for the naming times of Cuetos & Barbon (2006) is now .308 (was .290); R² for the picture naming times from Cuetos, Ellis & Alvarez (1999) is now .118 (was .033). The analyses reported by Cuetos et al. (2011) are unchanged.

To make sure you are using the correct version of SUBTLEX-ESP, check the following words:

  • cenar [dine]: should have a frequency count of 3721
  • verdad [truth]: should have a frequency count of 54203
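The check above can also be automated. The following is a minimal sketch, assuming the list has already been loaded into a dictionary mapping each word to its frequency count; how to load it depends on the layout of the file, which is not described here. The names `EXPECTED` and `check_version` are hypothetical.

```python
# Reference counts for the corrected version, as listed above.
EXPECTED = {"cenar": 3721, "verdad": 54203}


def check_version(freq_table, expected=EXPECTED):
    """Return a list of (word, found, expected) tuples, one per mismatch.

    Assumption: freq_table maps words to integer frequency counts.
    An empty list means both reference words have the expected counts.
    """
    problems = []
    for word, want in expected.items():
        found = freq_table.get(word)  # None if the word is missing
        if found != want:
            problems.append((word, found, want))
    return problems


# Usage with a toy table; replace it with the table loaded from your copy.
loaded = {"cenar": 3721, "verdad": 54203}
problems = check_version(loaded)
print("corrected version" if not problems else problems)
```

Any tuple returned suggests that the copy in use is the original, uncorrected list, or that the words were altered on import, for example through a character-encoding problem.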

We thank Manolo Perea and Maria Angeles Alonso for their feedback. If you find other problems in our databases, please let us know. Although we check our data as carefully as possible, it is impossible to avoid programming errors completely with databases of this size.

References:

Alonso, M.A., Fernandez, A., & Diez, E. (2011). Oral frequency norms for 67,979 Spanish words. Behavior Research Methods, 43, 449-458.

Cuetos, F., Glez-Nosti, M., Barbon, A., & Brysbaert, M. (2011). SUBTLEX-ESP: Spanish word frequencies based on film subtitles. Psicologica, 32, 133-143.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
