Software & Data

This page lists the data we collected and the software we developed in the course of our research.

Word ratings

  • Word ratings of age of acquisition, concreteness, valence and arousal for thousands of Dutch and English words

Lexicon projects


  • Wuggy, our pseudoword generator.
  • Vwr, an R package with utility functions for visual word recognition research.
  • A small Python library for fast computation of average Levenshtein distances (e.g., OLD20).
  • Duometer, a tool for detecting near-duplicate documents in text corpora.
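To illustrate what the Levenshtein library computes: OLD20 is the mean Levenshtein distance between a word and its 20 closest neighbours in a lexicon. The brute-force sketch below is not the library's code or API. `levenshtein` and `old_n` are hypothetical names, and the toy lexicon in the usage line stands in for a real word list.

```python
from heapq import nsmallest
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: Iterable[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its n closest lexicon entries
    (hypothetical helper; n=20 gives OLD20)."""
    distances = [levenshtein(word, other) for other in lexicon if other != word]
    closest = nsmallest(n, distances)
    if not closest:
        raise ValueError("lexicon contains no other words")
    return sum(closest) / len(closest)


print(old_n("cat", ["cat", "bat", "cot", "cut", "dog", "cart"], n=3))
```

Because every target is compared with every lexicon entry, this version scales poorly with lexicon size, which is where a dedicated fast implementation pays off.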

Word Frequencies

  • Databases of word frequency norms for several languages, based on film subtitles
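Frequency norms are usually expressed relative to corpus size. The minimal sketch below turns raw counts from subtitle lines into frequencies per million words. It assumes plain-text input and a crude letters-only tokenizer, both hypothetical and not the procedure used to build these databases. `zipf` shows one common log rescaling of the per-million value (log10 of the frequency per million, plus 3).

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable


def frequency_norms(lines: Iterable[str]) -> Dict[str, float]:
    """Count word tokens and express each count per million tokens."""
    counts: Counter = Counter()
    for line in lines:
        # Hypothetical tokenizer: lowercase runs of Unicode letters only.
        counts.update(re.findall(r"[^\W\d_]+", line.lower()))
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def zipf(per_million: float) -> float:
    """log10 of the frequency per million words, plus 3."""
    if per_million <= 0:
        raise ValueError("frequency must be positive")
    return math.log10(per_million) + 3


norms = frequency_norms(["the cat", "the dog"])
print(norms["the"], zipf(1.0))
```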

Word Prevalence

  • Word prevalence refers to the percentage of people who know the word. It is largely complementary to word frequency.
  • Here you can find the measure for Dutch words.
  • Here you can find the measure for English words.
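Computed from raw test data, prevalence is the share of respondents who say they know a word. The sketch below assumes simple yes/no knowledge responses per participant. The probit (z-score) conversion is shown as one commonly used rescaling, and clipping 0% and 100% to stay finite is a hypothetical choice, not necessarily the one used for these norms.

```python
from statistics import NormalDist


def prevalence(n_know: int, n_total: int) -> float:
    """Percentage of respondents who indicated they know the word."""
    if n_total <= 0 or not 0 <= n_know <= n_total:
        raise ValueError("need 0 <= n_know <= n_total and n_total > 0")
    return 100.0 * n_know / n_total


def prevalence_probit(n_know: int, n_total: int) -> float:
    """z-score (probit) of the known proportion, clipped away from 0 and 1."""
    p = prevalence(n_know, n_total) / 100.0
    eps = 0.5 / n_total  # hypothetical clipping rule
    p = min(max(p, eps), 1 - eps)
    return NormalDist().inv_cdf(p)


print(prevalence(90, 100), prevalence_probit(50, 100))
```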

Vocabulary tests (language proficiency)

Pictures of tools

  • The Verma & Brysbaert pictures of tools with matched objects and non-objects
  • The Multipic database with colored pictures of 750 objects

Outside Resources

  • Language Goldmine links to several hundred datasets and other resources for psycholinguistic research.
  • LexicALL also contains many useful datasets and other resources for psycholinguistic research.
  • The Word Association Database contains word associations to thousands of Dutch and English words. It is very useful if you are interested in the meaning of words.
  • Colleagues from Northwestern University have used our databases to compute Dutch, English, French, German and Spanish phonological and orthographic cross-language neighborhood densities. See their Clearpond page.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
