Semantic vectors for words in English and Dutch

Algorithms for deriving word meanings from word co-occurrences in texts are becoming increasingly powerful. Paweł Mandera has compared the available algorithms to identify the best one so far for use in psycholinguistic research. This turns out to be the Continuous Bag of Words (CBOW) model (Mikolov, Chen, Corrado, & Dean, 2013), trained on a combined corpus of texts and subtitles. The findings have been accepted for publication in the Journal of Memory and Language. This is the pdf. Please refer to it as:

Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57-78.

More interestingly, Paweł has also made the semantic vectors available online and created an easy-to-use shell program and a web interface for those who do not feel confident enough to program. Everyone can now calculate the CBOW-based semantic distance (or semantic similarity) between any two words in English or Dutch online. More information can be found here.
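For readers who do want to program, the sketch below illustrates what a distance between two word vectors can look like. It assumes cosine distance (1 minus cosine similarity), a common choice for vector-space models. The function names and the three-dimensional toy vectors are made up for illustration; they are not taken from the released CBOW vectors, which have many more dimensions, and this is not Paweł's shell program.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed distance measure: 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical toy vectors, invented for this example only.
toy_vectors = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.4],
    "car": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for other in ("dog", "car"):
        d = cosine_distance(toy_vectors["cat"], toy_vectors[other])
        print(f"distance(cat, {other}) = {d:.3f}")
```

With these toy values, "cat" ends up closer to "dog" than to "car". With real vectors, the same computation would be applied to the rows looked up for the two words of interest.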

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
