Software & Data

This page lists the data we collected and the software we developed in the course of our research.

Word ratings

  • Word ratings of age of acquisition, concreteness, valence, arousal and sensory modalities for thousands of Dutch and English words

Our Lexicon projects


  • Wuggy, our pseudoword generator.
  • Vwr, an R package with utility functions for visual word recognition research.
  • A small library for fast computation of average Levenshtein distances (e.g., OLD20) in Python.
  • Duometer, a tool for detecting near-duplicate documents in text corpora.
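
For readers unfamiliar with OLD20: it is the mean Levenshtein (edit) distance from a word to its 20 closest orthographic neighbors in a lexicon. The sketch below illustrates that definition in plain Python. It is not the API of the library listed above, and it is not optimized for speed; the function names `levenshtein` and `old_n` and the toy lexicon are assumptions made for the illustration.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: list[str], n: int = 20) -> float:
    """Mean edit distance from `word` to its n closest neighbors in `lexicon`.

    The word itself is excluded from its own neighborhood. If the lexicon
    holds fewer than n other words, all of them are used.
    """
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    if not distances:
        raise ValueError("lexicon contains no word other than the target")
    return mean(distances[:n])


# Hypothetical toy lexicon; a real analysis would use a full word list.
toy_lexicon = ["cat", "bat", "hat", "cart", "dog"]
old4_cat = old_n("cat", toy_lexicon, n=4)
```

With a real lexicon, `n=20` gives OLD20. Lower values indicate a word with many close orthographic neighbors.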

Word Frequencies

  • Databases of word frequency norms for several languages, based on film subtitles

Word Prevalence

  • Word prevalence refers to the percentage of people who know the word. It is largely complementary to word frequency.
  • Here you can find the measure for Dutch words.
  • Here you can find the measure for English words.
  • Here you can find word prevalence measures for English L2 speakers.

Vocabulary tests (language proficiency)

Spelling tests

The Dutch Author Recognition Test

Picture stimuli

Semantic Vectors

List of megastudies with links to data (if available)

Outside Resources

  • Language goldmine has links to several hundred data-sets and other resources for psycholinguistic research.
  • Jack Taylor has made a Shiny app that allows you to select stimulus materials based on, or matched on, some 60 variables.
  • Jamie Reilly has a website summarizing the main resources for English words.
  • Geoff Hollis and Chris Westbury calculated measures of valence, arousal, dominance, AoA, and concreteness for 80 thousand words on the basis of our ratings and semantic vectors.
  • LexicALL also contains many useful data-sets and other resources for psycholinguistic research.
  • The Word Association Database contains word associations to thousands of Dutch and English words. Very useful if you are interested in the meaning of words.
  • Colleagues from Northwestern University have used our databases to collect Dutch, English, French, German and Spanish phonological and orthographic cross-language neighborhood densities. Go and have a look at their Clearpond page.
  • Erin Buchanan and her group have created a bibliography of resources and a database of data they collected themselves. You can find more information about the bibliography here.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
