How many words do we know?

How large is our vocabulary? Based on an analysis of the literature and a large-scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. Between the ages of 20 and 60, the average person learns 6,000 extra lemmas, or about one new lemma every 2 days. Knowledge of a word can be as shallow as knowing that the word exists. In addition, people learn tens of thousands of inflected forms and proper nouns (names), which account for the substantially higher numbers of ‘words known’ mentioned in other publications.
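As a rough check on the "one new lemma every 2 days" figure, the arithmetic can be sketched as below. The lemma count and the age range come from the text; the function name and the 365.25-day year are assumptions made only for this illustration, not part of the original calculation.

```python
# Figures taken from the summary above.
EXTRA_LEMMAS = 6_000      # lemmas learned between ages 20 and 60
START_AGE = 20
END_AGE = 60

# Assumption for this sketch: average length of a year in days.
DAYS_PER_YEAR = 365.25


def days_per_new_lemma(extra_lemmas, start_age, end_age, days_per_year=DAYS_PER_YEAR):
    """Return the average number of days between newly learned lemmas."""
    years = end_age - start_age
    return years * days_per_year / extra_lemmas


rate = days_per_new_lemma(EXTRA_LEMMAS, START_AGE, END_AGE)
print(f"{rate:.1f} days per new lemma")
```

Under these assumptions the result is a little under two and a half days per lemma, which is consistent with the rounded "every 2 days" in the text.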

You can find the full details of our vocabulary size calculation here.

Here you can find the file with all the lemmas and word families. (As it turned out, a few words were for some reason lost from the file I uploaded to Frontiers, among them again, against and ahead.)
