Word megastudy data and eye movement corpora available

If you are looking for word processing data, the following resources cover a range of languages. If you know of more resources or have better links, feel free to contact marc dot brysbaert at ugent dot be. If you are looking for word features rather than word processing data, have a look here instead.















  • Asahara et al. (2016)
    • Eye movement data of students reading short texts
    • Visual presentation
    • 1600 word tokens











Adelman, J. S., Marquis, S. J., Sabatos-DeVito, M. G., & Estes, Z. (2013). The unexplained nature of reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(4), 1037-1053.

Adelman, J. S., Johnson, R. L., McCormick, S. F., McKague, M., Kinoshita, S., Bowers, J. S., … & Scaltritti, M. (2014a). A behavioral database for masked form priming. Behavior Research Methods, 46(4), 1052-1067.

Adelman, J. S., Sabatos-DeVito, M. G., Marquis, S. J., & Estes, Z. (2014b). Individual differences in reading aloud: A mega-study, item effects, and some models. Cognitive Psychology, 68, 113-160.

Aguasvivas, J., Carreiras, M., Brysbaert, M., Mandera, P., Keuleers, E., & Duñabeitia, J. A. (2018). SPALEX: A Spanish lexical decision database from a massive online data collection. Frontiers in Psychology, 9, 2156. doi: 10.3389/fpsyg.2018.02156.

Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133(2), 283-316.

Balota, D. A., & Spieler, D. H. (1998). The utility of item level analyses in model evaluation: A reply to Seidenberg & Plaut (1998). Psychological Science, 9(3), 238-240.

Balota, D. A., Yap, M. J., Hutchison, K. A., & Cortese, M. J. (2013). Megastudies: What do millions (or so) of trials tell us about lexical processing? In J. S. Adelman (Ed.), Visual Word Recognition Volume 1: Models and methods, orthography and phonology (pp. 90-115). New York, NY: Psychology Press.

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., … & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445-459.

Berzak, Y., Nakamura, C., Smith, A., Weng, E., Katz, B., Flynn, S., & Levy, R. (2022). CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading. Open Mind, 1-10. https://doi.org/10.1162/opmi_a_00054

Boyce, V., & Levy, R. P. (2023). A-maze of Natural Stories: Comprehension and surprisal in the Maze task. Glossa Psycholinguistics, 2(1): X, pp. 1–34. DOI: https://doi.org/10.5070/G6011190

Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A.M., Bölte, J., & Böhl, A. (2011). The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology, 58, 412-424.

Brysbaert, M., Keuleers, E., & Mandera, P. (2019). Recognition times for 54 thousand Dutch words: Data from the Dutch Crowdsourcing Project. Psychologica Belgica, 59(1), 281–300. DOI: http://doi.org/10.5334/pb.491

Brysbaert, M., Keuleers, E., & Mandera, P. (2021). Which words do English non-native speakers know? New supernational levels based on yes/no decision. Second Language Research.  https://doi.org/10.1177/0267658320934526.

Brysbaert, M., Lagrou, E., & Stevens, M. (2017). Visual word recognition in a second language: A test of the lexical entrenchment hypothesis with lexical decision times. Bilingualism: Language and Cognition, 20, 530-548.

Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42, 441-458.

Chang, Y. N., Hsu, C. H., Tsai, J. L., Chen, C. L., & Lee, C. Y. (2016). A psycholinguistic database for traditional Chinese character naming. Behavior Research Methods, 48(1), 112-122.

Chateau, D., & Jared, D. (2003). Spelling–sound consistency effects in disyllabic word naming. Journal of Memory and Language, 48(2), 255-280.

Cohen-Shikora, E. R., Balota, D. A., Kapuria, A., & Yap, M. J. (2013). The past tense inflection project (PTIP): Speeded past tense inflections, imageability ratings, and past tense consistency measures for 2,200 verbs. Behavior Research Methods, 45(1), 151-159.

Cop, U., Dirix, N., Drieghe, D., & Duyck, W. (2017). Presenting GECO: An eyetracking corpus of monolingual and bilingual sentence reading. Behavior Research Methods, 49(2), 602-615.

Cortese, M.J., Hacker, S., Schock, J. & Santo, J.B. (2015a). Is reading aloud performance in megastudies systematically influenced by the list context? Quarterly Journal of Experimental Psychology, 68, 1711-1722. doi: 10.1080/17470218.2014.974624

Cortese, M.J., Khanna, M.M., & Hacker, S. (2010). Recognition memory for 2,578 monosyllabic words. Memory, 18, 595-609. doi: 10.1080/09658211.2010.493892

Cortese, M.J., Khanna, M.M., Kopp, R., Santo, J.B., Preston, K.S., & Van Zuiden, T. (2017). Participants shift response deadlines based on list difficulty during reading aloud megastudies. Memory & Cognition, 45, 589-599.

Cortese, M.J., McCarty, D.P., & Schock, J. (2015b). A mega recognition memory study of 2,897 disyllabic words. Quarterly Journal of Experimental Psychology, 68, 1489-1501. doi: 10.1080/17470218.2014.945096

Cortese, M. J., Yates, M., Schock, J., & Vilks, L. (2018). Examining word processing via a megastudy of conditional reading aloud. Quarterly Journal of Experimental Psychology, 71(11), 2295-2313.

Davies, R., Barbón, A., & Cuetos, F. (2013). Lexical and semantic age-of-acquisition effects on word naming in Spanish. Memory & Cognition, 41(2), 297-311.

Dirix, N., & Duyck, W. (2017). The first- and second-language age of acquisition effect in first- and second-language book reading. Journal of Memory and Language, 97, 103-120.

Dufau, S., Grainger, J., Midgley, K. J., & Holcomb, P. J. (2015). A thousand words are worth a picture: Snapshots of printed-word processing in an event-related potential megastudy. Psychological Science, 26(12), 1887-1897.

Ernestus, M., & Cutler, A. (2015). BALDEY: A database of auditory lexical decisions. The Quarterly Journal of Experimental Psychology, 68(8), 1469-1488.

Ferrand, L., Brysbaert, M., Keuleers, E., New, B., Bonin, P., Meot, A., Augustinova, M., & Pallier, C. (2011). Comparing word processing times in naming, lexical decision, and progressive demasking: evidence from Chronolex. Frontiers in Psychology, 2:306. doi: 10.3389/fpsyg.2011.00306.

Ferrand, L., Méot, A., Spinelli, E., New, B., Pallier, C., Bonin, P., … & Grainger, J. (2018). MEGALEX: A megastudy of visual and auditory word recognition. Behavior Research Methods, 50(3), 1285-1307.

Ferrand, L., New, B., Brysbaert, M., Keuleers, E., Bonin, P., Meot, A., Augustinova, M., & Pallier, C. (2010). The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods, 42, 488-496.

Frank, S. L., & Aumeistere, A. (2023). An eye-tracking-with-EEG coregistration corpus of narrative sentences. Language Resources and Evaluation.

Frank, S. L., Monsalve, I. F., Thompson, R. L., & Vigliocco, G. (2013). Reading time data for evaluating broad-coverage models of English sentence processing. Behavior Research Methods, 45(4), 1182-1190.

Frank, S. L., Otten, L. J., Galli, G., & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1-11.

Futrell, R., Gibson, E., Tily, H. J., Blank, I., Vishnevetsky, A., Piantadosi, S. T., & Fedorenko, E. (2018). The Natural Stories Corpus. In Proceedings of LREC 2018, Eleventh International Conference on Language Resources and Evaluation (pp. 76-82). Miyazaki, Japan.

Goh, W. D., Yap, M. J., Lau, M. C., Ng, M. M., & Tan, L. C. (2016). Semantic richness effects in spoken word recognition: A lexical decision and semantic categorization megastudy. Frontiers in Psychology, 7, 976.

Goh, W.D., Yap, M.J., & Chee, Q.W. (2020). The Auditory English Lexicon Project: A multi-talker, multi-region psycholinguistic database of 10,170 spoken words and nonwords. Behavior Research Methods. https://doi.org/10.3758/s13428-020-01352-0

González-Nosti, M., Barbón, A., Rodríguez-Ferreiro, J., & Cuetos, F. (2014). Effects of the psycholinguistic variables on the lexical decision task in Spanish: A study with 2,765 words. Behavior Research Methods, 46(2), 517-525.

Heyman, T., Van Akeren, L., Hutchison, K. A., & Storms, G. (2016). Filling the gaps: A speeded word fragment completion megastudy. Behavior Research Methods, 48(4), 1508-1527.

Hollenstein, N., Barrett, M., & Björnsdóttir, M. (2022). The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts. arXiv preprint arXiv:2204.13311.

Hollenstein, N., Rotsztejn, J., Troendle, M., Pedroni, A., Zhang, C., & Langer, N. (2018). ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data, 5(1), 1-13.

Hollenstein, N., Troendle, M., Zhang, C., & Langer, N. (2019). ZuCo 2.0: A dataset of physiological recordings during natural reading and annotation. arXiv preprint arXiv:1912.00903.

Hsu, C.R., Clariana, R., Schloss, B., & Li, P. (2019). Neurocognitive Signatures of Naturalistic Reading of Scientific Texts: A Fixation-Related fMRI Study. Scientific Reports, 9, 10678.

Huang, K., Arehalli, S., Kugemoto, M., Muxica, C., Prasad, G., Dillon, B., & Linzen, T. (2023, April 21). Surprisal does not explain syntactic disambiguation difficulty: evidence from a large-scale benchmark. https://doi.org/10.31234/osf.io/z38u6

Husain, S., Vasishth, S., & Srinivasan, N. (2014). Integration and prediction difficulty in Hindi sentence comprehension: Evidence from an eye-tracking corpus. Journal of Eye Movement Research, 8(2), 1-12.

Hutchison, K. A., Balota, D. A., Neely, J. H., Cortese, M. J., Cohen-Shikora, E. R., Tse, C. S., … & Buchanan, E. (2013). The semantic priming project. Behavior Research Methods, 45(4), 1099-1114.

Jäger, L., Kern, T., & Haller, P. (2021). Potsdam Textbook Corpus (PoTeC): Eye tracking data from experts and non-experts reading scientific texts. Available on OSF.

Kamienkowski, J. E., Carbajal, M. J., Bianchi, B., Sigman, M., & Shalom, D. E. (2018). Cumulative repetition effects across multiple readings of a word: Evidence from eye movements. Discourse Processes, 55(3), 256-271.

Kessler, B., Treiman, R., & Mullennix, J. (2002). Phonetic biases in voice key response time measurements. Journal of Memory and Language, 47, 145-171.

Keuleers, E., & Balota, D.A. (2015). Megastudies, crowd-sourcing, and large datasets in psycholinguistics: An overview of recent developments. The Quarterly Journal of Experimental Psychology, 68(8), 1457-1468.

Keuleers, E., Diependaele, K., & Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in Psychology, 1:174. doi: 10.3389/fpsyg.2010.00174.

Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44, 287-304.

Kuperman, V., Siegelman, N., Schroeder, S., Acartürk, C., Alexeeva, S., Amenta, S., … & Usal, K. A. (2022). Text reading in English as a second language: Evidence from the Multilingual Eye-Movements Corpus. Studies in Second Language Acquisition, 1-35.

Lau, M. C., Goh, W. D., & Yap, M. J. (2018). An item-level analysis of lexical-semantic effects in free recall and recognition memory using the megastudy approach. Quarterly Journal of Experimental Psychology, 71, 2207-2222.

Laurinavichyute, A. K., Sekerina, I. A., Alexeeva, S., Bagdasaryan, K., & Kliegl, R. (2019). Russian Sentence Corpus: Benchmark measures of eye movements in reading in Russian. Behavior Research Methods.

Leal, S. E., Lukasova, K., Carthery-Goulart, M. T., & Aluísio, S. M. (2022). RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese. Language Resources and Evaluation, 1-40.

Lee, C. Y., Hsu, C. H., Chang, Y. N., Chen, W. F., & Chao, P. C. (2015). The feedback consistency effect in Chinese character recognition: Evidence from a psycholinguistic norm. Language and Linguistics, 16(4), 535-554.

Lemhöfer, K., Dijkstra, T., Schriefers, H., Baayen, R. H., Grainger, J., & Zwitserlood, P. (2008). Native language influences on word recognition in a second language: A megastudy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(1), 12-31.

Li, J., Bhattasali, S., Zhang, S., Franzluebbers, B., Luh, W. M., Spreng, R. N., … & Hale, J. (2022). Le Petit Prince multilingual naturalistic fMRI corpus. Scientific Data, 9(1), 1-15.

Liben-Nowell, D., Strand, J., Sharp, A., Wexler, T., & Woods, K. (2019). The Danger of Testing by Selecting Controlled Subsets, with Applications to Spoken-Word Recognition. Journal of Cognition, 2(1), 2. DOI: http://doi.org/10.5334/joc.51

Liu, Y., Shu, H., & Li, P. (2007). Word naming and psycholinguistic norms: Chinese. Behavior Research Methods, 39(2), 192-198.

Luke, S. G., & Christianson, K. (2018). The Provo Corpus: A large eye-tracking corpus with predictability norms. Behavior Research Methods, 50(2), 826-833.

Mak, M., & Willems, R. M. (2019). Mental simulation during literary reading: Individual differences revealed with eye-tracking. Language, Cognition and Neuroscience, 34(4), 511-535.

Mandera, P., Keuleers, E., & Brysbaert, M. (2020). Recognition times for 62 thousand English words: Data from the English Crowdsourcing Project. Behavior Research Methods, 52, 741–760. https://doi.org/10.3758/s13428-019-01272-8

Maziyah Mohamed, M., Yap, M.J., Chee, Q.W. & Jared, D. (2022). Malay Lexicon Project 2: Morphology in Malay word recognition. Memory & Cognition. https://doi.org/10.3758/s13421-022-01337-8

Miguel-Abella, R.S., Pérez-Sánchez, M.Á., Cuetos, F. et al. (2021). SpaVerb-WN—A megastudy of naming times for 4562 Spanish verbs: Effects of psycholinguistic and motor content variables. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01734-y

Mousikou, P., Sadat, J., Lucas, R., & Rastle, K. (2017). Moving beyond the monosyllable in models of skilled reading: Mega-study of disyllabic nonword reading. Journal of Memory and Language, 93, 169-192.

Nemati, F., Westbury, C., Hollis, G. et al. (2022). The Persian Lexicon Project: Minimized orthographic neighbourhood effects in a dense language. Journal of Psycholinguistic Research. https://doi.org/10.1007/s10936-022-09863-x

Pan, J., Yan, M., Richter, E. M., Shu, H., & Kliegl, R. (2021). The Beijing Sentence Corpus: A Chinese sentence corpus with eye movement data and predictability norms. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01730-2

Pexman, P. M., Heard, A., Lloyd, E., & Yap, M. J. (2017). The Calgary semantic decision project: concrete/abstract decision data for 10,000 English words. Behavior Research Methods, 49(2), 407-417.

Pritchard, S. C., Coltheart, M., Palethorpe, S., & Castles, A. (2012). Nonword reading: Comparing dual-route cascaded and connectionist dual-process models with human data. Journal of Experimental Psychology: Human Perception and Performance, 38(5), 1268.

Pynte, J., & Kennedy, A. (2006). An influence over eye movements in reading exerted from beyond the level of the word: Evidence from reading English and French. Vision Research, 46(22), 3786-3801.

Rayner, Keith; Abbott, Matthew J; Schotter, Elizabeth R; Belanger, Nathalie N; Higgins, Emily C; Leinenger, Mallorie; von der Malsburg, Titus; Plummer, Patrick (2015): Keith Rayner Eye Movements in Reading Data Collection. UC San Diego Library Digital Collections. http://dx.doi.org/10.6075/J0JW8BSV

Schmidtke, D., Van Dyke, J. A., & Kuperman, V. (2021). CompLex: An eye-movement database of compound word reading in English. Behavior Research Methods, 53, 59-77.

Schröter, P., & Schroeder, S. (2017). The Developmental Lexicon Project: A behavioral database to investigate visual word recognition across the lifespan. Behavior Research Methods, 49(6), 2183-2203.

Seidenberg, M.S., & Waters, G.S. (1989). Word recognition and naming: A mega study. Bulletin of the Psychonomic Society, 27, 489.

Siegelman, N., Schroeder, S., Acartürk, C. et al. (2022). Expanding horizons of cross-linguistic research on reading: The Multilingual Eye-movement Corpus (MECO). Behavior Research Methods. https://doi.org/10.3758/s13428-021-01772-6

Siew, C. S., Yi, K., & Lee, C. H. (2021). Syllable and letter similarity effects in Korean: Insights from the Korean Lexicon Project. Journal of Memory and Language, 116, 104170.

Smith, N. J., & Levy, R. (2013). The effect of word predictability on reading time is logarithmic. Cognition, 128(3), 302-319.

Spieler, D. H., & Balota, D. A. (1997). Bringing computational models of word naming down to the item level. Psychological Science, 8(6), 411-416.

Sui, L., Dirix, N., Woumans, E., & Duyck, W. (2022). GECO-CN: Ghent Eye-tracking COrpus of sentence reading for Chinese-English bilinguals. Behavior Research Methods, 1-21.

Sze, W. P., Liow, S. J. R., & Yap, M. J. (2014). The Chinese Lexicon Project: A repository of lexical decision behavioral responses for 2,500 Chinese characters. Behavior Research Methods, 46(1), 263-273.

Treiman, R., Mullennix, J., Bijeljac-Babic, R., & Richmond-Welty, E. D. (1995). The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General, 124, 107-136.

Tsang, Y. K., Huang, J., Lui, M., Xue, M., Chan, Y. W. F., Wang, S., & Chen, H. C. (2018). MELD-SCH: A megastudy of lexical decision in simplified Chinese. Behavior Research Methods, 50(5), 1763-1777.

Tse, C. S., Yap, M. J., Chan, Y. L., Sze, W. P., Shaoul, C., & Lin, D. (2017). The Chinese Lexicon Project: A megastudy of lexical decision performance for 25,000+ traditional Chinese two-character compound words. Behavior Research Methods, 49(4), 1503-1519.

Tse, C. S., Chan, Y. L., Yap, M. J., & Tsang, H. C. (2023). The Chinese Lexicon Project II: A megastudy of speeded naming performance for 25,000+ traditional Chinese two-character words. Behavior Research Methods.

Tucker, B. V., Brenner, D., Danielson, D. K., Kelley, M. C., Nenadić, F., & Sims, M. (2019). The Massive Auditory Lexical Decision (MALD) database. Behavior Research Methods.

Winsler, K., Midgley, K. J., Grainger, J., & Holcomb, P. J. (2018). An electrophysiological megastudy of spoken word recognition. Language, Cognition and Neuroscience, 1-20.

Yap, M. J., Liow, S. J. R., Jalil, S. B., & Faizal, S. S. B. (2010). The Malay Lexicon Project: A database of lexical statistics for 9,592 words. Behavior Research Methods, 42(4), 992-1003.

Zdorova, N., Parshina, O., Ogly, B., Bagirokova, I., Krasikova, E., Ziubanova, A., … & Dragoy, O. (2023). Eye movement corpora in Adyghe and Russian: an eye-tracking study of sentence reading in bilinguals. Frontiers in Psychology, 14, 1212701.

Zettersten, M., Bergey, C. A., Bhatt, N., Boyce, V., Braginsky, M., Carstensen, A., deMayo, B., Kachergis, G., Lewis, M., Long, B., MacDonald, K., Mankewitz, J., Meylan, S. C., Saleh, A. N., Schneider, R. M., Tsui, A., Uner, S., Xu, T. L., Yurovsky, D., & Frank, M.C. (2021). Peekbank: Exploring children’s word recognition through an open, large-scale repository for developmental eye-tracking data. Proceedings of the 43rd Annual Conference of the Cognitive Science Society.

Zhang, G., Yao, P., Ma, G., Wang, J., Zhou, J., Huang, L., … & Li, X. (2022). The database of eye-movement measures on words in Chinese reading. Nature Scientific Data, 9:411.
