Reading starts with word recognition. The two most important variables determining how long it takes to recognize a word are the number of times the reader has encountered the word before (the word frequency effect) and the degree to which the word has been primed by the preceding words and context (the priming effect). Intriguingly, despite 40 years of research, both processes are still poorly understood. For instance, our knowledge of the frequency effect is largely limited to the statement that high-frequency words are processed faster than low-frequency words. Similarly, we know little more about priming than that primed words are processed faster than unprimed words.
One reason we know so little about the word frequency effect is that we have had to work with suboptimal measures of word frequency. Only in the last decade, with the widespread availability of electronic texts, has it become possible to collect large corpora of text and speech at a reasonable cost. We have found that the most informative estimates of word use come from television subtitles, books for children, and written sources (in that order). We are therefore collecting these frequencies for a number of languages; so far we have word frequencies for French, English, Spanish, Mandarin Chinese, German, Polish, and Dutch.
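The core computation behind such frequency norms can be illustrated with a minimal sketch: count word tokens across a corpus and express each type's count per million words, the unit in which frequency norms are conventionally reported. This is only an illustration under simplified assumptions (naive tokenization, no lemmatization), not the pipeline actually used for the norms described above.

```python
# Minimal sketch (not the actual norming pipeline): estimate word
# frequencies from a corpus, expressed per million word tokens.
import re
from collections import Counter

def word_frequencies(texts):
    """Count word tokens across documents; return per-million frequency
    for each word type. Tokenization here is deliberately naive."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {w: c * 1_000_000 / total for w, c in counts.items()}

# Toy two-document corpus standing in for, e.g., subtitle files.
corpus = ["The cat sat on the mat.", "The dog saw the cat."]
freqs = word_frequencies(corpus)
# "the" (4 of 11 tokens) comes out more frequent than "cat" (2 of 11).
```

Real norms differ mainly in scale (tens of millions of tokens) and in the care taken with tokenization, but the per-million normalization is the same.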
The second reason our understanding of the word frequency effect has been limited is that we had no hard criterion against which to validate the frequency estimates. Such validation requires processing times for large numbers of words. The task currently best suited for this is the lexical decision task: participants see a string of letters and have to decide whether it forms an existing word or not (e.g., WORD vs. WIRD). Lexical decision times were first collected for (American) English in the E-lexicon project. We now also have lexical decision times for French (the French Lexicon Project), Dutch (the Dutch Lexicon Project), and British English (the British Lexicon Project). As part of this effort, we also needed large numbers of pseudowords (i.e., letter strings that look like words but do not exist). To generate these properly, we developed an easy-to-use program, Wuggy, which does this automatically for different alphabetic languages.
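The idea behind automatic pseudoword generation can be sketched as follows: recombine letter sequences that occur in real words so that the result looks word-like, then reject any candidate that happens to be a real word. The bigram-chain approach below is a hypothetical simplification for illustration only; it is not Wuggy's actual algorithm, which operates on richer syllabified sub-units to match pseudowords to target words.

```python
# Hypothetical sketch of pseudoword generation (NOT Wuggy's algorithm):
# chain letter bigrams drawn from a real lexicon, rejecting real words.
import random

def build_bigram_chain(words):
    """Map each letter (plus a start marker '^') to the letters that
    have been observed to follow it, with '$' marking word end."""
    chain = {}
    for word in words:
        padded = "^" + word + "$"
        for a, b in zip(padded, padded[1:]):
            chain.setdefault(a, []).append(b)
    return chain

def make_pseudoword(chain, lexicon, rng, max_len=10):
    """Walk the bigram chain until the end marker (or max_len);
    reject candidates that are real words and retry."""
    while True:
        letters, current = [], "^"
        while len(letters) < max_len:
            current = rng.choice(chain[current])
            if current == "$":
                break
            letters.append(current)
        candidate = "".join(letters)
        if candidate and candidate not in lexicon:
            return candidate

# Toy lexicon standing in for a real word list.
lexicon = {"word", "ward", "cord", "card", "wind"}
chain = build_bigram_chain(lexicon)
rng = random.Random(0)
pseudo = make_pseudoword(chain, lexicon, rng)
```

Because every adjacent letter pair in the output was observed in a real word, the candidates respect the orthography of the language while being guaranteed nonwords, which is the property a lexical decision experiment needs.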