Software & Data

This page lists the data we collected and the software we developed in the course of our research.

Word ratings

  • Word ratings of age of acquisition, concreteness, valence and arousal for thousands of Dutch and English words

Our Lexicon projects

Software

  • Wuggy, our pseudoword generator.
  • Vwr, an R package with utility functions for visual word recognition research.
  • A small Python library for fast computation of average Levenshtein distances (e.g., OLD20).
  • Duometer, a tool for detecting near-duplicate documents in text corpora.
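OLD20 is commonly defined as the mean Levenshtein distance from a word to its 20 closest orthographic neighbours in a lexicon. The sketch below implements that definition in plain Python so the measure is concrete. The function names `levenshtein` and `old_n` are hypothetical. This is not the API of the library listed above, which is built for speed, whereas this version is written only for readability.

```python
from heapq import nsmallest


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (costs 0 if letters match)
            ))
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: list[str], n: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its n closest lexicon entries (OLD20 when n=20)."""
    distances = (levenshtein(word, other) for other in lexicon if other != word)
    closest = nsmallest(n, distances)  # raises ZeroDivisionError below if the lexicon is empty
    return sum(closest) / len(closest)
```

For example, `levenshtein("kitten", "sitting")` gives 3. With a toy lexicon, `old_n("cat", ["bat", "cut", "cart", "dog"], n=2)` averages the two smallest distances (1 and 1) and returns 1.0. Comparing every word with the whole lexicon is quadratic, which is why a dedicated fast library is useful for large word lists.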

Word Frequencies

  • Databases of word frequency norms for several languages, based on film subtitles.
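Raw subtitle counts depend on corpus size, so they are usually normalised before words are compared. The sketch below shows two common normalisations. The first is frequency per million words. The second is the Zipf scale, defined as log10 of the frequency per billion words, which equals log10(frequency per million) + 3. The function names and inputs are hypothetical and are not taken from the databases above.

```python
import math


def per_million(count: int, corpus_size: int) -> float:
    """Convert a raw token count into a frequency per million words."""
    return count * 1_000_000 / corpus_size


def zipf(count: int, corpus_size: int) -> float:
    """Zipf value: log10 of frequency per billion words, i.e. log10(per million) + 3.

    A zero count raises ValueError (log of zero); unseen words would need
    a smoothed variant, which this sketch does not implement.
    """
    return math.log10(per_million(count, corpus_size)) + 3
```

For example, a word that occurs 50 times in a 50-million-word corpus has a frequency of 1 per million and a Zipf value of 3.0.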

Word Prevalence

  • Word prevalence refers to the percentage of people who know the word. It is largely complementary to word frequency.
  • Here you can find the measure for Dutch words.
  • Here you can find the measure for English words.
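Since word prevalence is described above as the percentage of people who know a word, the raw computation is a simple proportion. The sketch assumes a hypothetical input format: a mapping from each word to the yes/no "I know this word" responses it received. The published norms may report prevalence on a transformed scale rather than as a raw percentage.

```python
def prevalence(responses: dict[str, list[bool]]) -> dict[str, float]:
    """Percentage of participants who indicated they know each word."""
    return {
        word: 100 * sum(known) / len(known)  # True counts as 1, False as 0
        for word, known in responses.items()
        if known  # skip words with no responses to avoid dividing by zero
    }
```

For example, `prevalence({"apple": [True, True, True, False]})` returns `{"apple": 75.0}`.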

Vocabulary tests (language proficiency)

Spelling tests

Picture stimuli

  • The Verma & Brysbaert pictures of tools with matched objects and non-objects
  • The Multipic database with colored pictures of 750 objects

Semantic Vectors

List of megastudies with links to data (if available)

Outside Resources

  • Language goldmine has links to several hundred datasets and other resources for psycholinguistic research.
  • Jack Taylor has made a Shiny app that lets you select stimulus materials on the basis of some 60 variables, or match stimuli on those variables.
  • Geoff Hollis and Chris Westbury calculated measures of valence, arousal, dominance, AoA, and concreteness for 80,000 words on the basis of our ratings and semantic vectors.
  • LexicALL also contains many useful datasets and other resources for psycholinguistic research.
  • The Word Association Database contains word associations to thousands of Dutch and English words. Very useful if you are interested in the meaning of words.
  • Colleagues from Northwestern University have used our databases to compute Dutch, English, French, German and Spanish phonological and orthographic cross-language neighborhood densities. Have a look at their Clearpond page.
  • Erin Buchanan and her group have created a bibliography of resources and a database of data they collected themselves. You can find more information about the bibliography here.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.