Software & Data

This page lists the data we collected and the software we developed in the course of our research.

Word ratings

  • Word ratings of age of acquisition, concreteness, valence, arousal and sensory modalities for thousands of Dutch and English words

Our Lexicon projects

Software

  • Wuggy, our pseudoword generator.
  • vwr, an R package with utility functions for visual word recognition research.
  • A small Python library for fast computation of average Levenshtein distances (e.g., OLD20); a minimal sketch of the OLD20 measure follows this list.
  • Duometer, a tool for detecting near-duplicate documents in text corpora.
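
The Python library mentioned above is not named here, so the following is only a minimal, self-contained sketch of the OLD20 measure itself (the mean Levenshtein distance from a word to its 20 closest neighbors in a reference lexicon); the toy lexicon and function names are hypothetical.

```python
from heapq import nsmallest

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]

def old20(word: str, lexicon: list[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its n closest neighbors in the lexicon."""
    distances = (levenshtein(word, other) for other in lexicon if other != word)
    closest = nsmallest(n, distances)
    return sum(closest) / len(closest)

# Hypothetical mini-lexicon, for illustration only.
lexicon = ["cat", "cats", "bat", "rat", "car", "cart", "care", "scat"]
print(old20("cat", lexicon, n=5))
```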

Word Frequencies

  • Different databases containing word frequency norms for several languages, based on film subtitles (a sketch of the Zipf frequency scale used with such counts follows below).
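
Subtitle-based frequency norms are often expressed on the Zipf scale: log10 of the frequency per million words plus 3 (equivalently, log10 of the frequency per billion words). Below is a minimal sketch of that conversion; the add-one smoothing and the example figures are illustrative assumptions, not the exact procedure of any particular database.

```python
import math

def zipf_score(count: int, corpus_size: int) -> float:
    """Convert a raw corpus count to the Zipf scale:
    log10(frequency per million words) + 3.
    Add-one smoothing keeps unseen words finite (an illustrative choice)."""
    per_million = (count + 1) / (corpus_size / 1_000_000)
    return math.log10(per_million) + 3

# Hypothetical example: a word seen 51,000 times in a 50-million-word subtitle corpus.
print(round(zipf_score(51_000, 50_000_000), 2))   # ~6.01, i.e. a very frequent word
```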

Word Prevalence

  • Word prevalence refers to the percentage of people who know a word. It is largely complementary to word frequency (a minimal sketch of the underlying probit transform follows this list).
  • Here you find the measure for Dutch words.
  • Here you find the measure for English words.
  • Here you find word prevalence measures for English L2 speakers.
  • Here you find the measure for Spanish words.
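
The prevalence measure is essentially a probit transform (inverse normal CDF) of the proportion of respondents who report knowing a word, which spreads out differences among well-known words. The sketch below shows that core transform only; the clipping of extreme proportions and the example numbers are illustrative choices, and the published norms may involve additional corrections.

```python
from statistics import NormalDist

def prevalence(n_known: int, n_respondents: int) -> float:
    """Probit (inverse normal CDF) of the proportion of people who know the word:
    0.5 known -> 0.0; higher values indicate better-known words."""
    p = n_known / n_respondents
    p = min(max(p, 0.005), 0.995)   # clip to avoid +/- infinity at 0% and 100% (illustrative)
    return NormalDist().inv_cdf(p)

# Hypothetical examples.
print(round(prevalence(98, 100), 2))   # widely known word  -> ~2.05
print(round(prevalence(40, 100), 2))   # rarely known word  -> ~-0.25
```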

Vocabulary tests (language proficiency)

Spelling tests

The Dutch Author Recognition Test

Picture stimuli

Semantic Vectors

List of megastudies with links to data (if available)

Outside Resources

  • Gao et al. (2022) developed SCOPE, a meta-database with an extensive, curated collection of psycholinguistic variable values for English from the major databases. The meta-database contains about 250 lexical variables, organized into 7 major categories. You can do online searches here.
  • Boris New and Manuel Gimenes published Wordlex word frequencies for 55 languages (many of them not yet well investigated). For the few new languages we tried out, they worked better than the other measures available. You can do online searches here.
  • Language Goldmine has links to several hundred datasets and other resources for psycholinguistic research.
  • Jack Taylor has made a Shiny app that lets you select stimulus materials based on some 60 variables (or matched on those variables).
  • Jamie Reilly has a website summarizing the main resources for English words.
  • Geoff Hollis and Chris Westbury calculated measures of valence, arousal, dominance, AoA, and concreteness for 80 thousand words on the basis of our ratings and semantic vectors.
  • The Word Association Database contains word associations to thousands of Dutch and English words. Very useful if you are interested in the meaning of words.
  • Colleagues from Northwestern University have used our databases to collect Dutch, English, French, German and Spanish phonological and orthographic cross-language neighborhood densities. Go and have a look at their Clearpond page (a toy illustration of orthographic neighborhood counting follows this list).
  • Erin Buchanan and her group have created a bibliography of resources and a database of data they collected themselves. You can find more information about the bibliography here.
  • Diconne et al. (2022) collected most of the emotional stimulus sets available for research and made them accessible in a single resource: KAPODI.
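
Clearpond provides precomputed cross-language densities; purely to illustrate what an orthographic neighborhood count is, here is a minimal sketch of Coltheart's N (the number of same-length words differing by exactly one letter). The toy lexicon is hypothetical and this is not Clearpond's pipeline.

```python
def orthographic_neighbors(word: str, lexicon: set[str]) -> list[str]:
    """Words of the same length that differ from `word` by exactly one letter
    (Coltheart's N, the classic substitution-only neighborhood count)."""
    neighbors = []
    for candidate in lexicon:
        if candidate == word or len(candidate) != len(word):
            continue
        mismatches = sum(a != b for a, b in zip(word, candidate))
        if mismatches == 1:
            neighbors.append(candidate)
    return neighbors

# Hypothetical toy lexicon.
lexicon = {"cat", "cot", "car", "can", "bat", "cart", "dog"}
print(orthographic_neighbors("cat", lexicon))   # e.g. ['cot', 'car', 'can', 'bat']
```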

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.