
Test-based AoA measures for 44 thousand English words

Age of acquisition (AoA) is an important variable in word recognition research. Up to now, nearly all psychology researchers examining the AoA effect have used ratings obtained from adult participants. An alternative basis for determining AoA is to test children’s knowledge of word meanings directly at various ages. In educational research, scholars and teachers have tried to establish the grade at which particular words should be taught by examining the ages at which children know various word meanings. Such a list is available from Dale and O’Rourke’s (1981) Living Word Vocabulary for nearly 44 thousand meanings, coming from over 31 thousand unique word forms and multiword expressions. In Brysbaert & Biemiller (2017) we related these test-based AoA estimates to lexical decision times as well as to adult AoA ratings, and report strong correlations between all the measures. Test-based estimates of AoA can therefore be used as an alternative measure.

You find an Excel file with the test-based AoA norms here.

If you use the norms, please refer to our article:

Brysbaert, M., & Biemiller, A. (2017). Test-based Age-of-Acquisition norms for 44 thousand English word meanings. Behavior Research Methods, 49(4), 1520-1523. pdf

Measures of word prevalence for 61,800 English words

At long last we found time to make the English word prevalence measures available.

Word prevalence indicates how many people know a word. Because the percentage of people who know a word has a heavily skewed distribution (most words are known by nearly everyone), word prevalence is calculated with a probit transformation. The following are interesting landmarks:

  • negative prevalence values: words known by less than 50% of the people; only of interest for word learning studies
  • prevalence = 0.0 : 50% of the people know this word
  • prevalence = 1.0 : 84% know the word
  • prevalence = 1.5 : 93% know the word
  • prevalence = 2.0 : 98% know the word
  • prevalence = 2.5 : nearly everyone knows the word
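For readers who want to recompute the landmarks above, the probit transform is simply the inverse of the cumulative normal distribution. A minimal sketch in Python (standard library only; the function name `prevalence` is ours, not from the norms file):

```python
from statistics import NormalDist

def prevalence(proportion_known: float) -> float:
    """Probit transform: map the proportion of people who know a word
    onto the prevalence scale (the z-score of the normal CDF)."""
    return NormalDist().inv_cdf(proportion_known)

# Reproducing the landmarks:
print(round(prevalence(0.50), 2))  # 0.0  -> half the people know the word
print(round(prevalence(0.84), 2))  # 0.99 -> roughly prevalence 1.0
print(round(prevalence(0.93), 2))  # 1.48 -> roughly prevalence 1.5
print(round(prevalence(0.98), 2))  # 2.05 -> roughly prevalence 2.0
```

Words known by less than half of the participants get negative values, which is why the landmark list starts below zero.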

You find all the information in:

  • Brysbaert, M., Mandera, P., McCormick, S.F., & Keuleers, E. (in press). Word prevalence norms for 62,000 English lemmas. Behavior Research Methods. pdf

You find an Excel file with the word prevalence measure for English here.

If you want more information about the use of word prevalence, have a look at our findings in Dutch.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Semantic vectors for Italian

Marco Marelli, affiliated with our center, has published semantic vectors for Italian. You can find his article here.

This is the reference:

Marelli, M. (2017). Word-Embeddings Italian Semantic Spaces. A semantic model for psycholinguistic research. Psihologija, 50(4), 503–520.

You can do online searches for semantic similarities, semantic neighbors and analogies of Italian words here (for a large co-occurrence window) or here (for a small co-occurrence window; see the article to know which one to use for which question).

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Power analysis and effect size in mixed effects models: A tutorial

We’ve published the outcome of 4 years of study and computer simulations on the power of designs that include more than one observation per condition per participant. A problem with the current literature on the replication crisis is that power is nearly always calculated under the assumption that each participant provides only one observation per condition. This is not what happens in experimental psychology, where participants respond to multiple stimuli per condition and the data are averaged per condition or (preferably) analyzed with mixed effects models.

In a nutshell, these are our findings:

  1. In experimental psychology we can do replicable research with 20 participants or fewer if we have multiple observations per participant per condition, because averaging across observations turns rather small differences between conditions into effect sizes of d > .8 (as psychophysicists have known for almost a century). This is the positive outcome of the analyses.

  2. The more sobering finding is that the required number of observations is higher than the numbers currently used (which is why we run underpowered studies). The ballpark figure we propose for RT experiments with repeated measures is 1600 observations per condition (e.g., 40 participants and 40 stimuli per condition).

  3. The 1600 observations we propose is when you start a new line of research and don’t know what to expect. The article gives you the tools to optimize your design once you’ve run the first study.

  4. Standardized effect sizes in analyses over participants (e.g., Cohen’s d) depend on the number of stimuli that were presented. Hence, you must include the same number of observations per condition if you want to replicate the results. The fact that the effect size depends on the number of stimuli also has implications for meta-analyses.
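To see where ballpark figures like these come from, it can help to simulate such a design. The sketch below is not the simulation code from the article; it is a minimal Monte Carlo illustration with made-up parameter values (a 15 ms effect, 150 ms trial noise, 100 ms between-participant variability), in which the effect is tested with a paired t-test on participant condition means:

```python
import numpy as np

rng = np.random.default_rng(42)

def power_sim(n_subj=40, n_trials=40, effect_ms=15,
              sd_subj=100, sd_resid=150, n_sims=500, t_crit=2.02):
    """Monte Carlo power estimate for a within-subject RT effect.
    Each simulated experiment is analyzed with a paired t-test on
    the participant condition means; t_crit is roughly the two-sided
    .05 critical value for df = 39."""
    significant = 0
    for _ in range(n_sims):
        intercepts = rng.normal(0, sd_subj, n_subj)[:, None]   # per-participant speed
        cond_a = intercepts + rng.normal(0, sd_resid, (n_subj, n_trials))
        cond_b = intercepts + effect_ms + rng.normal(0, sd_resid, (n_subj, n_trials))
        diff = cond_b.mean(axis=1) - cond_a.mean(axis=1)       # averaging over trials
        t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n_subj))
        significant += abs(t) > t_crit
    return significant / n_sims

# 40 participants x 40 trials = 1,600 observations per condition
print(power_sim())
```

Increasing `n_trials` shrinks the noise on each participant’s condition mean, which is exactly why multiple observations per condition buy power.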

If you use the article please refer to it as follows:

  • Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1: 9, 1–20.

After the publication of the article, it has become clear that other researchers had already noticed the relationship between the number of stimuli and the standardized effect size. Usually this was framed negatively (i.e., effect sizes are overestimated when based on the average of multiple observations), without attention to the more positive side for power. Here are some pointers:

  • Brand et al. (2010) already noticed the relationship between number of stimuli per condition and standardized effect sizes. They additionally point to the importance of the correlation between the observations: The higher the correlation, the less multiple observations will increase the standardized effect size (and arguably the less they will help to make the study more powerful).

  • Richard Morey (2016) also noticed that the standardized effect sizes in F1 analyses depend on the number of observations per condition. Maybe the effect size proposed by Westfall et al. is the preferred measure for future use? Alternatively, in reaction time experiments nothing may be more informative than the raw effect in milliseconds.

  • There was an interesting observation by Jeff Rouder pointing to the increased power of experiments with multiple observations. His rule of thumb (if you run within-subject designs in cognition and perception, you can often get high-powered experiments with 20 to 30 people so long as they run about 100 trials per condition) agrees quite well with the norm we put forward (a properly powered reaction time experiment with repeated measures has at least 1,600 observations per condition). With 2,000–3,000 observations per condition you have a high-powered experiment; with 1,600 you have a properly powered one. Within limits (say, a lower limit of 20), in most experiments the numbers of trials and participants can be exchanged, depending on how difficult it is to create items or to find participants.

Kolossa & Kopp (2018) report that for model testing in cognitive neuroscience it is more important to obtain extra data per participant than to test more participants.

Rouder & Haaf (2018) published an article that nicely complements ours. They present a theoretical analysis of when extra trials improve power. The basic message is that extra participants always help more than extra trials, but the degree to which this is the case depends on the phenomenon you are investigating. If there is great interindividual variation in the effect and the variation is theoretically expected, you need many participants rather than many trials. This is true for many experiments in social psychology. In contrast, when the effect is expected to be present in each participant and trial variability is larger than the variability across participants, you can trade people for trials. These conditions were met for the priming studies we discussed: no participant was expected to show a negative orthographic priming effect (faster lexical decision times after unrelated primes than after related primes), and the variability of the priming effect across participants (and stimuli) was much smaller than the residual error. These conditions hold for many robust effects investigated in cognitive psychology, in particular those investigated with reaction times. Indeed, many studies in cognitive psychology address the borderline conditions of well-established effects (to distinguish between alternative explanations).

The article does not deal with interactions. A nice blog by Roger Giner-Sorolla (based on work by Uri Simonsohn) indicates that for an extra variable with 2 levels, it is advised to multiply the number of observations by at least 4 if you want to draw meaningful conclusions about the interaction. So, beware of including multiple variables in your study. Is the interaction really needed to test your hypothesis?

Power of interactions also features in a recent review paper on power issues by Perugini et al. (2018).

Another article warning against skimping on the number of trials per condition was published by Boudewyn et al. (2018). If you look at their small effect sizes (remember, these are the ones we are after most of the time!), the recommendation of 40 participants × 40 trials seems to hold for EEG research as well.

We’ve collaborated to validate a new set of 750 pictures for picture naming experiments

We have collaborated to validate a new set of 750 colored pictures for picture naming research, compiled by Jon Andoni Dunabeitia at the Basque Center on Cognition, Brain and Language. In particular, we have collected name agreement data for Belgian Dutch. Other languages that have been added are Spanish, British English, French, German, Italian, and Netherlands’ Dutch.

You find all information (including files about name agreement and raw data files) at the BCBL website (see the link above).

Please refer to the database as follows:

Dunabeitia, J.A., Crepaldi, D., Meyer, A.S., Pliatsikas, C., Smolka, E., & Brysbaert, M. (in press). MultiPic: A standardized set of 750 drawings with norms for six European languages. Quarterly Journal of Experimental Psychology. pdf

Dutch keywords: set plaatjes, benoeming, prenten, onderzoek, woordbenoeming, psycholinguïstiek

How many words do we know?

How large is our vocabulary? Based on an analysis of the literature and a large-scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, the average person learns 6,000 extra lemmas, or about one new lemma every 2 days. Knowledge of a word can be as shallow as knowing that it exists. In addition, people know tens of thousands of inflected forms and proper nouns (names), which accounts for the substantially higher numbers of ‘words known’ mentioned in other publications.

You find the full details of our calculation of the vocabulary size here.

Here you find the file with all the lemmas and word families (as it turned out, a few words were lost in the file uploaded to Frontiers, among which again, against, and ahead).

Semantic vectors for words in English and Dutch

Algorithms are becoming increasingly powerful at deriving word meanings from word co-occurrences in texts. Paweł Mandera has compared the various algorithms to select the best one so far for use in psycholinguistic research. This turns out to be the Continuous Bag of Words (CBOW) model (Mikolov, Chen, Corrado, & Dean, 2013) trained on a combined corpus of texts and subtitles. The findings have now been accepted for publication in the Journal of Memory and Language. This is the pdf. Please refer to it as:

Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57-78.

More interestingly, Paweł also makes the semantic vectors available online and has created an easy-to-use shell program and a web interface for those who do not feel confident enough to program. So, now everyone can calculate the CBOW-based semantic distance (or semantic similarity) between any two words online in English and Dutch. More information can be found here.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
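For readers curious what “semantic similarity” amounts to computationally: the similarity between two word vectors is typically expressed as the cosine of the angle between them. A toy illustration with made-up 4-dimensional vectors (real CBOW vectors have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(v1, v2):
    """Cosine of the angle between two word vectors; 1 = same direction."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Toy vectors, invented for illustration only:
cat = [0.8, 0.1, 0.3, 0.0]
dog = [0.7, 0.2, 0.4, 0.1]
car = [0.0, 0.9, 0.1, 0.8]
print(cosine_similarity(cat, dog))  # high: related meanings
print(cosine_similarity(cat, car))  # lower: unrelated meanings
```

Semantic distance is then simply 1 minus this similarity.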

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Affective norms for 14,000 Spanish words

Hans Stadthagen-Gonzalez just made valence and arousal norms available for 14,000 Spanish words.

You find the norms here.

If you use the ratings, please refer to the article:

  • Stadthagen-Gonzalez, H., Imbault, C., Pérez Sánchez, M.A., & Brysbaert, M. (in press). Norms of Valence and Arousal for 14,031 Spanish Words. Behavior Research Methods. pdf

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The Dutch Lexicon Project 2 made available

In the Dutch Lexicon Project, we collected lexical decision times for 14K monosyllabic and disyllabic Dutch words. The Dutch Lexicon Project 2 (DLP2) contains lexical decision times for 30K Dutch lemmas. These include almost all words regularly used in Dutch, independent of length.

The reference to this database is:

Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42, 441-458. pdf

These are files you may find interesting:

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Words known in the UK but not in the US, and vice versa

Our vocabulary test keeps doing well (over 600K tests completed now). Below are a list of 20 words known in the UK but not in the US, and a list of 20 words known in the US but not in the UK. By ‘known’ we mean selected by more than 85% of the participants from that country with English as their native language. As you can see, for each word there is a difference of more than 50 percentage points between the two countries.

Better known in the UK (between brackets, percent known in the US and percent known in the UK)

  • tippex (7, 91)
  • biro (17, 99)
  • tombola (17, 97)
  • chipolata (16, 93)
  • dodgem (17, 94)
  • korma (20, 97)
  • yob (22, 97)
  • judder (19, 94)
  • naff (19, 94)
  • kerbside (23, 98)
  • plaice (16, 91)
  • escalope (17, 91)
  • chiropody (20, 93)
  • perspex (22, 94)
  • brolly (24, 96)
  • abseil (15, 87)
  • bodge (18, 89)
  • invigilator (22, 92)
  • gunge (19, 89)
  • gormless (26, 96)

Better known in the US (between brackets, percent known in the US and percent known in the UK)

  • garbanzo (91, 16)
  • manicotti (90, 15)
  • kabob (98, 29)
  • kwanza (91, 24)
  • crawdad (86, 20)
  • sandlot (97, 32)
  • hibachi (89, 27)
  • provolone (97, 36)
  • staph (86, 25)
  • boondocks (96, 37)
  • goober (96, 37)
  • cilantro (99, 40)
  • arugula (88, 29)
  • charbroil (97, 39)
  • tamale (92, 35)
  • coonskin (88, 31)
  • flub (89, 31)
  • sassafras (92, 35)
  • acetaminophen (92, 36)
  • rutabaga (85, 30)

You can still help us get more refined data by taking part in our vocabulary test. For instance, we do not yet have enough data to say anything about differences with Canada, Australia, or any other country with English as an official language.

Words known by men and women

Some words are better known to men than to women, and the other way around. But which ones? On the basis of our vocabulary test, we can begin to answer this question (based on the first 500K tests completed). These are the 12 words with the largest difference in favor of men (between brackets: % of men who know the word, % of women who know the word):

  • codec (88, 48)
  • solenoid (87, 54)
  • golem (89, 56)
  • mach (93, 63)
  • humvee (88, 58)
  • claymore (87, 58)
  • scimitar (86, 58)
  • kevlar (93, 65)
  • paladin (93, 66)
  • bolshevism (85, 60)
  • biped (86, 61)
  • dreadnought (90, 66)

These are the 12 words with the largest difference in favor of women:

  • taffeta (48, 87)
  • tresses (61, 93)
  • bottlebrush (58, 89)
  • flouncy (55, 86)
  • mascarpone (60, 90)
  • decoupage (56, 86)
  • progesterone (63, 92)
  • wisteria (61, 89)
  • taupe (66, 93)
  • flouncing (67, 94)
  • peony (70, 96)
  • bodice (71, 96)

These 24 words should suffice to find out whether a person you are interacting with in digital space is male or female.

Take part in our vocabulary test to make the results even more fine-grained!

The 20 least known words in English

Now that over 480,000 vocabulary tests have been completed, we can have a look at some of the findings. For instance, which words are not known at all in English? The following are the words that fewer than 3% of the participants in our test indicated were English words. For comparison, the fake words were endorsed by 8.3% of the participants on average. So, these are words not only unknown to everyone but also unlikely to be ‘mistaken’ for a true English word. The funny thing is that they often have interesting meanings, including a weapon, a precious stone, animals, several descriptions of people, and so on.

Here they are, the 20 least known words of English, also the least liked words, cast aside by everyone!

You can still take part in our vocabulary test and contribute data.

AoA norms and Concreteness norms for 30,000 Dutch words

We have collected AoA norms and Concreteness norms for 30,000 Dutch words. If you use them, please refer to this publication:

  • Brysbaert, M., Stevens, M., De Deyne, S., Voorspoels, W., & Storms, G. (2014). Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica, 150, 80-84. pdf

Here you find the age of acquisition norms and the concreteness norms.

The AoA norms have been aggregated over the various studies that collected them (Ghyselinck et al., 2000, 2003; Moors et al., 2013; Brysbaert et al., 2014). If you cannot download the Excel files, you are most probably working with Internet Explorer. Ironically, this browser cannot read Microsoft Excel files.

Keywords Dutch: verwervingsleeftijd, concreetheid, voorstelbaarheid.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Pictures of tools with matched objects and non-objects

As part of his PhD thesis on laterality, Ark Verma has developed a set of pictures of tools with matched objects and nonobjects.


You find the full set of pictures of tools, objects, and nonobjects here or here (svg format).

Please refer to the following article when you use the pictures. In this article you also find more information about them.

  • Verma, A., & Brysbaert, M. (2015). A validated set of tool pictures with matched objects and non-objects for laterality research. Laterality, 20, 22-48. pdf

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

The least-loved words in Dutch

Although our vocabulary test was primarily meant to map the words that are generally known, it is perfectly possible to look at which words are not recognized at all. Which words were selected by practically no one? Which words received even fewer yes-responses than the fake words, not only because nobody knows them but also because nobody finds them Dutch-sounding enough to hazard a guess? Which are these orphan words, the rejects of the Dutch language?

One would expect the list to contain animal and plant species known only to obsessive biologists. Indeed, among the least-loved words we find a tropical deciduous tree (knippa) that nevertheless bears tasty fruit. We also find a twatwa (a black, finch-like bird from Suriname), alkanna (the shrub that gives us the red henna color), and sfagnum (a kind of peat moss). Then there is the kamsin (a scorching Sahara wind you had best avoid) and the gerenoek (a slender giraffe gazelle). Finally, the chijl (chyle) also turns out to be unknown, even though we need it for proper intestinal functioning.

Geology is represented by two epochs: the eemien and the ypresien. A pity, because the first is named after the small river Eem in Utrecht and the second after the city of Ieper.

A number of words from Indonesia and Suriname also fail to make the cut. We already had the twatwa; there are also the golok (a kind of machete) and the romusha (an Indonesian forced laborer). The mosjav (an Israeli settlement) likewise seems to have had its day. And some words from Islam are equally unknown, such as moekim (a circle of members attached to a mosque) and hoedna (worth knowing, though, as it denotes a truce with a non-Islamic enemy).

Tools, drinks, and textiles that are no longer used are another source of orphan words. Nobody still knows wem (the widened end of an anchor arm), fijfel (a transverse flute), saguweer (a kind of palm wine), fep (a word for strong drink), dawet (a non-alcoholic drink), or falbala (a kind of trimming on women’s clothing or curtains). An ojief (a molding that is concave at the bottom and convex at the top, or the reverse) is also considered an alien.

Somewhat stranger is that nobody knows a ghazel (a verse form in two-line stanzas), or bisbilles (bickering), or giegagen (to bray like a donkey), or goëtie (black magic). Yet these are perfect words for a text or a poem!

Here it is, then: the list of the least-loved words in Dutch, recognized by no one and regarded by no one as a potential family member of our language. The words everyone pushes aside. (Look here for more explanation of each word.)

  • knippa
  • twatwa
  • alkanna
  • sfagnum
  • kamsin
  • gerenoek
  • chijl
  • eemien
  • ypresien
  • golok
  • romusha
  • mosjav
  • moekim
  • hoedna
  • wem
  • fijfel
  • saguweer
  • fep
  • dawet
  • falbala
  • ojief
  • ghazel
  • bisbilles
  • giegagen
  • goëtie

Anyone interested can still take part in the vocabulary test.

The complete results of the Groot Nationaal Onderzoek Taal (Great National Language Survey) can now also be bought in book form.

Our English vocabulary test (wordORnot) is online now

After the success of our Dutch vocabulary test, we’ve developed an English version (wordORnot). The task is the same: you get 100 letter sequences and have to indicate which are existing English words and which are not. Guessing is discouraged, because you are penalized if you say “yes” to a nonword.

Our experience with the Dutch vocabulary test shows that in the beginning there are some questionable (not to say bad) nonwords and words (for which we apologize). These are next to unavoidable given that we are using so many stimuli. However, on the basis of the responses and the feedback we get (an example of crowdsourcing), the lists are regularly updated, so that after a few days or weeks (depending on the popularity of the test) these problematic cases should be gone. In general, problematic words or nonwords should not change the score by more than 5%.
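The penalty for saying “yes” to nonwords can be implemented as a simple guessing correction. A hedged sketch (the exact formula used by wordORnot is not spelled out here, so take this as an illustration of the principle, not the implementation):

```python
def vocab_score(word_yes, n_words, nonword_yes, n_nonwords):
    """Guessing-corrected score: proportion of real words accepted
    minus proportion of nonwords accepted. A standard correction;
    the test's actual formula may differ in details."""
    return word_yes / n_words - nonword_yes / n_nonwords

# Someone who says "yes" to everything scores 0, not 100%:
print(vocab_score(70, 70, 30, 30))            # 0.0
# Accepting 60 of 70 words and 3 of 30 nonwords:
print(round(vocab_score(60, 70, 3, 30), 2))   # 0.76
```

Under this correction, indiscriminate yes-saying gains nothing, which is what makes the test resistant to guessing.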

Enjoy the English Vocabulary Test!

See the Twitter trail here.

Read the first forum discussions after the launch of the test here (UK), here and here (USA).


  • Jan 31, 2014: After two days the test has been done 50K times already with lots of feedback

  • Feb 1, 2014: 100K tests completed

  • Feb 16, 2014: 200K

  • May 20, 2014: 480K. First cleaning of the lists. Words out: 300 problematic words (the letters, abbreviations, and some long compound words that are usually written as two words) plus 2,300 very low-frequency derived words ending in -ness or -ly (we had too many of them). Words in: 1,300 words from a new frequency list (many science-related words). Nonwords out: 8,000 with false acceptance rates of more than 33% (such as ammicably, peachness, …). Nonwords in: 22,000 nonwords that look like science words or monosyllabic nonwords from the ARC nonword database (because many of the nonwords that had to be dropped were monosyllabic).

Here you find the Dutch test (woordentest).

The results of the Woordentest 2013

Between 16 March 2013 and 15 December 2013, a Groot Nationaal Onderzoek Taal (Great National Language Survey) was organized by our center at Ghent University and the Dutch broadcasters NTR and VPRO, in collaboration with NWO. We will refer to it below as the Woordentest 2013.

Participants were asked to complete a short test of about 4 minutes. Each test consisted of 100 letter sequences, presented one by one, and the participant had to decide for each whether or not it was a Dutch word they knew. To discourage guessing, some 30 of the letter sequences were fake words, and the score went down when “yes” was answered to these fake words.

The results were announced in Labyrint broadcasts on Nederland 2 (Sunday 15 December) and CANVAS (Monday 16 December).

You can also buy a book about it.

Report with the findings

The findings are described in this report.

Here you find an English summary, based on a talk we gave for computational linguists in Leiden (CLIN24).


These were the most important results:

  • This report describes the main findings of the Groot Nationaal Onderzoek Taal, organized between 16 March 2013 and 15 December 2013 by Ghent University and the Dutch broadcasters NTR and VPRO, in collaboration with NWO.

  • Each test consisted of 100 letter sequences, presented one by one, and the participant had to decide for each whether or not it was a Dutch word they knew. To discourage guessing, some 30 of the letter sequences were fake words, and the score went down when “yes” was answered to these fake words.

  • Because 735 different lists were used, we can make statements about almost 53,000 Dutch words.

  • More than 600,000 tests were completed by slightly fewer than 400,000 participants (almost 2% of the Dutch-speaking population). Of these, 212,000 participants came from the Netherlands and 180,000 from Belgium. Proportionally, then, the Flemish participated more.

  • There were three types of participants: 76% took part once; 20% did the test a few times and stopped once they reached a higher score; the remaining 4% did the test at least 10 times (with a maximum of 489 times). The latter were usually people who started with a high score and thus have a great interest in the Dutch language.

  • The most common score is 75.5%. There is, however, a clear effect of age. Vocabulary grows steadily between 12 and 80 years (the extremes we could test): 12-year-olds know on average 50% of the words, 80-year-olds on average 80%. That is a difference of almost 16,000 words.

  • There is also an effect of education level: the higher the degree obtained, the more words one knows on average.

  • There is a difference of 1.5% between the Netherlands and Belgium, in favor of the Netherlands. This difference is due to the lower scores in Belgium among participants older than 40.

  • Participants who speak several languages in addition to their native Dutch know a larger number of Dutch words. The effect is cumulative: those who speak four languages know more Dutch words than those who speak three, and those who speak three know more than those who speak two.

  • The Dutch and the Flemish share a common vocabulary of 16,000 words (known by 97.5% of all participants). By the same criterion, the Flemish know 2,000 additional words and the Dutch 5,000 additional words. Of these, 1,250 are typically Southern Dutch words (such as foor and pagadder) and 1,900 typically Northern Dutch words (kliko, vlaflip, and salmiak). The shared vocabulary is thus larger in the Netherlands than in Belgium.

  • Some words are better recognized by men than by women, and vice versa (e.g., mandekker vs. sleehak).

  • The linguistic dividing line lies clearly on the national border. The Dutch and Belgian provinces form two separate clusters when the similarities in word knowledge between provinces are examined.


The lists below are provisional, for three reasons:

  1. They are based on the 370 thousand participants up to the end of October (whereas we hope to have 500 thousand by the end of the year).

  2. They are simple averages, which do not take individual differences in guessing behavior into account. To correct for these, we need to run a Rasch analysis, but that will take time given the size of the database.

  3. In the last update, at the beginning of December 2013, 3,000 new (longer) words were added. These are not yet part of the lists below.


  • Word knowledge in the Netherlands vs. Belgium (Excel, text)

  • Word knowledge of men and women in the Netherlands and Belgium (Excel, text)

  • Word knowledge by age in the Netherlands and Belgium (Excel, text)

  • Word knowledge by education level in the Netherlands and Belgium (Excel, text)

  • Word knowledge for provinces with more than 7,500 participants (Excel, text)

  • Accuracy on fake words in the Netherlands and Belgium (Excel, text)

  • Accuracy on fake words per province with more than 7,500 participants (Excel, text)

  • List of words removed in the three revisions because they are no longer used, were misspelled, or were too easily confused with the fake words (Excel, text)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Results of the author test available for Flanders

Because the author test was publicized through newspapers (in particular De Standaard), we have results for Flanders much sooner than expected. On the basis of the first week, these are the main findings:

  • Twenty thousand Flemings and five thousand Dutch people took part in the author test. Because the Dutch share is too small, we limit the analysis to the Flemish participants for the time being. Hopefully there will soon be enough responses from the Netherlands.

  • Most participants were readers of quality newspapers and belong to the audience that publishers mainly target.

  • Herman Brusselmans is the author with the greatest name recognition in Flanders; he was recognized by all participants. He is followed by J.R.R. Tolkien, Hugo Claus, William Shakespeare, Dimitri Verhulst, and Bart Moeyaert, with 99% name recognition.

  • Only 55 names were recognized by more than 90% of the participants. They include 22 names of Belgian authors, 10 names of British writers, 5 names each of American, French, and Dutch writers, and one name each from Colombia, Denmark, Germany, Greece, Italy, Russia, Sweden, and Switzerland.

  • The list contains a number of authors who are probably not known primarily for their books, but who are taught about at school or have a prominent place in the media. Also interesting is that the list includes youth authors and comic-book authors.

  • Fewer than 500 additional authors are recognized by half of the participants; 80% are recognized by fewer than a quarter of the participants.

  • The name recognition of almost 15 thousand authors in Flanders can be looked up in the report we have written. These data are also available in an Excel file.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

SUBTLEX-UK: Subtitle-based word frequencies for British English

Attentive readers may have noticed that we have underused the data from the British Lexicon Project in our publications thus far, focusing more on the (American) English Lexicon Project. This was because we felt uneasy about using word frequencies from American English to predict word processing times in British English.

At long last, together with Walter van Heuven from Nottingham University, we have now compiled word frequency norms for British English based on subtitles: SUBTLEX-UK.

As expected, these norms explain 3% more variance in the lexical decision times of the British Lexicon Project than the SUBTLEX-US word frequencies. They also explain 4% more variance than the word frequencies based on the British National Corpus, further confirming the superiority of subtitle-based word frequencies over written-text-based word frequencies for psycholinguistic research. Conversely, the SUBTLEX-UK norms explain 2% less variance in the English Lexicon Project than the SUBTLEX-US norms.

The SUBTLEX-UK word frequencies are based on a corpus of 201.3 million words from 45,099 BBC broadcasts. There are separate measures for pre-school children (the Cbeebies channel) and primary school children (the CBBC channel). For the first time we also present the word frequencies as Zipf-values, which are very easy to understand (values 1-3 = low frequency words; 4-7 = high frequency words) and which we hope will become the new standard.

You can do online searches here.

You can find lists with the word frequencies here:

  • SUBTLEX-UK: A cleaned Excel file with word frequencies for 160,022 word types (also available as a text file). This file is ideal for those who want to use British word frequencies.
  • SUBTLEX-UK_all: An uncleaned Excel file with entries for 332,987 word types, including numbers. To be used for entries not in the cleaned version.
  • SUBTLEX-UK_bigrams: A csv file with information about word pairs. It contains nearly 2 million lines of information and, hence, is too large to open in Excel.

Further information about the collection of the SUBTLEX-UK word frequencies can be found in the article below (please cite):

Van Heuven, W.J.B., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). Subtlex-UK: A new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology, 67, 1176-1190. pdf

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Symposium ‘Dyslexia in higher education’

We will soon be organizing a symposium on dyslexia in higher education, based on the findings of our large-scale dyslexia study.

You can find more information here.

The Zipf-scale: A better standardized measure of word frequency

A problem with word frequency counts is that they depend on the size of the corpus. As a result, absolute numbers are difficult to interpret. For instance, the frequency count of apple in HAL is 65,844. In SUBTLEX-US it is 1,207.

To make word frequency norms comparable, researchers use a standardized measure, a measure that is independent of the corpus size. The standardized measure used thus far has been frequency per million words (fpmw). So, the standardized SUBTLEX-US frequency of apple is 23.67 pmw (as the corpus includes 51 million words). The fpmw measure of HAL is more difficult to calculate because no-one knows how large the HAL corpus is. It has been claimed to be 130 million words or 160 million words, but in all likelihood it is larger than 400 million words (if you simply add up all the frequencies of the words in the ELP lexicon, you already get this figure).
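As a minimal sketch (in Python, using the figures quoted above), the per-million standardization is just a rescaling by corpus size:

```python
# Convert a raw corpus count to frequency per million words (fpmw).
# Figures taken from the text: 'apple' occurs 1,207 times in the
# 51-million-word SUBTLEX-US corpus.

def to_fpmw(raw_count: float, corpus_size: float) -> float:
    """Standardize a raw frequency count by corpus size."""
    return raw_count / corpus_size * 1_000_000

fpmw_apple = to_fpmw(1207, 51_000_000)
print(round(fpmw_apple, 2))  # 23.67
```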

Increasingly, however, we have felt unease with this standardized measure, because it leads to a wrong intuitive understanding of the word frequency effect. Here are two problems with the fpmw measure:

  • Intuitively, people associate a value of 1 with the lowest point on the scale. However, more than half of the words in a frequency list have frequencies lower than 1 pmw. The reason why 1 pmw for a long time seemed like a good start of the scale is that for decades all word frequency research was based on the Kucera & Francis (1967) word frequency list, which was derived from a corpus of only 1 million words, so that a frequency count of 1 indeed was the lowest value. Now that corpora easily include 100 million or even 100 billion words, we see that very many word types have frequencies below 1 pmw.

  • The frequency effect does not stop below 1 pmw. As a matter of fact, as can be seen below and as we have reported a few times before, nearly half of the word frequency effect is situated below 1 pmw. In addition, because the word frequency effect is logarithmic, the difference between .1 fpmw and .7 fpmw equals the difference between 5 fpmw and 35 fpmw. Again, this is very difficult to explain to psycholinguistic researchers. It leads to particularly bad results when authors are “matching” conditions on word frequency. You may read, for instance, that one condition has a mean frequency of .5 pmw and the other a mean frequency of 3 pmw. This means that the average frequency in the former condition is six times lower than in the latter (which no one would accept if the frequencies were 10 and 60). However, because the raw frequency norms are used for the analysis (instead of the logarithmic values), the difference between the conditions usually is not significant (p > .05!) and, hence, goes unnoticed by authors and readers.
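The equivalence of those two differences is easy to verify: on a log scale only the ratio between frequencies matters, and both pairs differ by a factor of 7. A quick check in Python:

```python
import math

# On a logarithmic scale, the distance between two frequencies depends
# only on their ratio. Both pairs below differ by a factor of 7.
diff_low = math.log10(0.7) - math.log10(0.1)   # .1 vs .7 fpmw
diff_high = math.log10(35) - math.log10(5)     # 5 vs 35 fpmw

print(round(diff_low, 4), round(diff_high, 4))  # both equal log10(7), about 0.8451
```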

We have been thinking long and hard about what a standardized word frequency scale should look like in order to promote an intuitively correct understanding. These are the elements we considered necessary:

  1. It should be a logarithmic scale (e.g., like the decibel scale of sound loudness).
  2. It should look like a typical Likert rating scale (e.g., from 1 to 7), so that the values are easy to interpret.
  3. The middle of the scale should separate the low-frequency words from the high-frequency words.
  4. The scale should have a straightforward unit.

Once you know what you are looking for, it is not so difficult to come up with a scale that fulfills all requirements. Simply taking log10(frequency per billion words) already meets the first three requirements. On such a scale, words with a frequency of .1 pmw get a value of 2, words with a frequency of 1 pmw get a value of 3, and words with a frequency of 10 pmw get a value of 4. The word apple gets a SUBTLEX Zipf value of 4.37.
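As a small sketch, the Zipf value is simply the log10 of the frequency per billion words; the snippet below reproduces the landmark values from the text:

```python
import math

def zipf_from_fpmw(fpmw: float) -> float:
    """Zipf value = log10 of the frequency per billion words."""
    return math.log10(fpmw * 1000)  # fpmw * 1000 = frequency per billion

for fpmw in (0.1, 1.0, 10.0):
    print(fpmw, round(zipf_from_fpmw(fpmw), 1))  # 2.0, 3.0, 4.0

print(round(zipf_from_fpmw(23.67), 2))  # 'apple' in SUBTLEX-US -> 4.37
```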

To meet the fourth requirement of our list, we propose to call the new scale the Zipf scale, after the American linguist George Kingsley Zipf (1902–1950) who first thoroughly analyzed the regularities of the word frequency distribution and formulated a law that was named after him (Zipf, 1949). The unit then becomes the Zipf.

We presented the Zipf scale for the first time in a 2014 article on word frequency measures for British English (Van Heuven, Mandera, Keuleers, & Brysbaert, 2014; please refer to it when you use the Zipf scale). In that article we also give examples of words with various Zipf values. Here they are (click on the picture to get a larger image):

To see how the word frequency effect translates to the Zipf values, in the figure below we plot the lexical decision RTs to the known words (accuracy > .67) in the British Lexicon Project (N = 19,487). As can be seen, the word frequency effect is now nicely centralized relative to the word frequency scale, with values of 1-3 representing low frequency words, and values of 4-7 representing high frequency words.

A criticism often raised against frequency values lower than 1 pmw is that these words are not known to the participants. Again, we can have a look at the British Lexicon Project. If we only take the words that were answered positively by more than two thirds of the participants, we get the following distribution as a function of Zipf values:

Again, the distribution centers nicely on the scale. Below we give some examples of BLP words in the various bins (all BLP words were monosyllabic or disyllabic words).

In our future publications we will make the Zipf norms available as the primary word frequency variable, because we think this will help researchers and laypeople better understand what the word frequency effect is and how it should be studied and controlled for. We hope many of you will join us! Zipf values are easy to calculate from fpmw values: simply take log10(fpmw)+3, or equivalently log10(fpmw*1000).
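A small illustrative helper for the conversion and the frequency bands (note that the 3.5 cut-off splitting the 1-3 and 4-7 bands is our own midpoint here, chosen purely for illustration):

```python
import math

def zipf(fpmw: float) -> float:
    # log10(fpmw) + 3 and log10(fpmw * 1000) are the same conversion
    return math.log10(fpmw) + 3

def frequency_band(zipf_value: float) -> str:
    # Zipf 1-3 = low-frequency words, 4-7 = high-frequency words;
    # 3.5 is a hypothetical midpoint cut-off, not a value from the article.
    return "low" if zipf_value < 3.5 else "high"

print(frequency_band(zipf(0.5)))    # low  (0.5 fpmw -> Zipf ~2.7)
print(frequency_band(zipf(23.67)))  # high ('apple' -> Zipf ~4.37)
```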

Here you find a zipped Excel file of the SUBTLEX-US frequencies with the Zipf values added.

Here you can look up the UK Zipf frequencies for thousands of words.


  • Van Heuven, W.J.B., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). Subtlex-UK: A new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology, 67, 1176-1190. pdf

  • Zipf, G. (1949). Human Behaviour and the Principle of Least Effort. Reading, MA: Addison-Wesley.

How many book authors do you know?

In a large library you can borrow books by more than 15 thousand fiction authors and illustrators (novels, stories, comics, children's books, poetry). How many of them do you know?

Following our vocabulary test, we have developed an author test that answers this question. As in the vocabulary test, you are presented with 100 stimuli, in this case personal names. Two thirds of the names refer to people who have contributed to fiction books (as writer or illustrator); one third consists of randomly chosen names (drawn from all kinds of lists, such as war victims, random combinations of popular names, participants in running races, lists of students and staff, etc.).

For each name you have to indicate whether you think it refers to an author or not.

The test takes about 5 minutes and you can take it as often as you like. There are more than 200 different versions.

The first results show that most participants recognize some 3 to 4 writers and illustrators per list. This means that they know an estimated 1000 author names. How do you score?

Update 9 October 2013

The results for Flanders are already available; you can find them here.

Concreteness ratings for 40 thousand English lemmas

We have collected concreteness ratings for 40 thousand English lemma words with Amazon Mechanical Turk. The ratings come from a larger list of 63 thousand words and represent all English words known to 85% of the raters. As such, the list can serve as a reference list for future word recognition research in (American) English.

This is our article about the ratings:

You find the ratings here (Excel file) and here (txt file).

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Call for papers: QJEP Special Issue on megastudies, crowdsourcing and big datasets in psycholinguistics

A QJEP Special Issue on megastudies, crowdsourcing and big datasets in psycholinguistics will be edited by Emmanuel Keuleers (Ghent University) and Dave Balota (Washington University, St. Louis)

We invite papers for a special issue of the Quarterly Journal of Experimental Psychology on recent advances in megastudies and crowdsourcing methods and on the use of large non-experimental data sources. The issue will address both the collection of data and the use of these data to answer important theoretical questions.


In recent years, methods of data collection in psycholinguistics have been rapidly evolving along several dimensions.

First, there is a trend towards large laboratory-based studies that are not constrained by specific research questions. In these megastudies, behavioral measures are collected for many items using tasks such as lexical decision, naming, or sentence reading, with forerunners such as the English Lexicon Project (Balota et al., 2007) and the Dundee Corpus (Kennedy & Pynte, 2005). The number of available datasets produced with the megastudy approach is rapidly increasing, for different languages and with different experimental paradigms (Hutchison et al., 2013; Cohen-Shikora & Balota, 2013; Keuleers et al., 2010).

Another recent trend is to gather behavioral data using crowdsourcing rather than laboratory methods (Mason & Suri, 2011). New norms for variables such as Age-of-Acquisition are being successfully collected using Amazon Mechanical Turk (e.g., Kuperman, Stadthagen-Gonzalez, & Brysbaert, 2012), and large-scale word-association studies are quickly gaining momentum (e.g., De Deyne, Navarro, & Storms, 2012). Recent research in Belgium and the Netherlands shows that it is even possible to recruit hundreds of thousands of participants for a lexical decision experiment. Smartphone technology also appears poised to revolutionize data collection in such large-scale studies, as exemplified by the Dufau et al. (2011) mega lexical decision study of seven different languages.

In addition to the controlled data collection methods described above, psychologists have been increasingly using freely generated behavioral data, such as text corpora, to extract behaviorally relevant measures. With the increased availability of text sources, particularly subtitles from film and television, high quality word frequency norms are becoming available for various languages.

An exciting trend in this regard is that researchers have been using these text sources to operationalize existing psychological constructs traditionally collected via subjective evaluation (e.g., Bestgen & Vincze, 2012) or to extend learning theory to large-scale learning models (Baayen et al., 2011).

Examples of topics for this special issue:

  1. Papers addressing important theoretical issues using rigorous analyses of megastudy or crowdsourcing data. Preferably, these articles should address the same issue using multiple data sources and state-of-the-art statistical and computational methods. Articles that use existing data collections beyond their originally intended purpose are especially welcome.

  2. Papers addressing methodological issues with the collection of large datasets, either introducing new methodology or critically evaluating current methods.

  3. Papers presenting new data collected using megastudy or crowdsourcing methods or presenting new measures derived from large corpora.

We aim for a body of high-quality articles that introduces the collection and analysis of large datasets to a broad audience and encourages the use of novel data sources and new data collection methods in the research community.

Time Line

September 22, 2013 (or shortly after): Send initial proposals, abstracts of max 400 words to

January 23, 2014: Submission of manuscripts

March 23, 2014: Initial round of reviews

May 23, 2014: Second round of reviews

Fall 2014: Publication

References

Baayen, R. H., Milin, P., Djurdjević, D. F., Hendrix, P., & Marelli, M. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118(3), 438.

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., … Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445–459.

Bestgen, Y., & Vincze, N. (2012). Checking and bootstrapping lexical norms by means of word similarity indexes. Behavior Research Methods, 44(4), 998–1006. doi:10.3758/s13428-012-0195-z

De Deyne, S., Navarro, D. J., & Storms, G. (2012). Better explanations of lexical and semantic cognition using networks derived from continued rather than single-word associations. Behavior Research Methods, 1–19.

Dufau, S., Duñabeitia, J.A., Moret-Tatay, C., McGonigal, A., Peeters, D., Alario, F.-X., Balota, D.A., Brysbaert, M., Carreiras, M., Ferrand, L., Ktori, M., Perea, M., Rastle, K., Sasburg, O., Yap, M.J., Ziegler, J.C., & Grainger, J. (2011). Smart phone, smart science: How the use of smartphones can revolutionize research in cognitive science. PLoS ONE, 6, e24974

Hutchison, K. A., Balota, D. A., Neely, J. H., Cortese, M. J., Cohen-Shikora, E. R., Tse, C.-S., … Buchanan, E. (2013). The semantic priming project. Behavior Research Methods. doi:10.3758/s13428-012-0304-z

Kennedy, A., & Pynte, J. (2005). Parafoveal-on-foveal effects in normal reading. Vision Research, 45(2), 153–168.

Keuleers, E., Diependaele, K., & Brysbaert, M. (2010). Practice Effects in Large-Scale Visual Word Recognition Studies: A Lexical Decision Study on 14,000 Dutch Mono- and Disyllabic Words and Nonwords. Frontiers in Psychology, 1. doi:10.3389/fpsyg.2010.00174

Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods. doi:10.3758/s13428-012-0210-4

Mason, W., & Suri, S. (2011). Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods, 44(1), 1–23. doi:10.3758/s13428-011-0124-6

Lextale-Esp: A fast, free vocabulary test for Spanish

Lemhöfer and Broersma (2012) published an English vocabulary test which turned out to be very useful in our research on bilingualism and native language processing. Because we think such a test should be available for all languages, we decided to develop a Spanish one as well.

Here you can find the text describing the test. Please refer to it as:

  • Izura, C., Cuetos, F., & Brysbaert, M. (2014). Lextale-Esp: A test to rapidly and efficiently assess the Spanish vocabulary size. Psicologica, 35, 49-66.

Here you can download the test with instructions in various languages:

Here you find the response key to mark the test.

The test can also be used with Catalan-Spanish bilinguals, as you can read here.

For our Spanish subtitle word frequencies, have a look here.

And here you find the French Lextale test.

Lextale-Esp: A test for the rapid and efficient assessment of Spanish vocabulary size

Methods for measuring vocabulary size vary across disciplines. This heterogeneity makes comparisons between studies difficult and slows the integration of findings. To remedy this problem, a fast, efficient, and free test of English proficiency, the LexTALE, was recently developed. The LexTALE has been validated and shown to be an effective tool for distinguishing between different levels of English proficiency. Versions of the test also exist in Dutch, German, and French. The present study introduces the Spanish version of the test, Lextale-Esp. The test discriminated well between high and low levels of Spanish proficiency and revealed large differences between the vocabulary sizes of native and non-native speakers.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.