Downloading
Different files are available for different purposes. See this post for more information on working with the PoS information in SUBTLEX-NL.
The most recent Excel files with PoS information and Zipf frequencies
- Word frequency files on OSF: one file with all words observed in the corpus (437K) and one with all words observed in at least 2 films (150K). The latter is more useful for most searches because it contains less noise.
- The Zipf frequencies are based on the equation Zipf=LOG10((frequency+1)/44.106)+3.
- For words not present in the database (i.e., with zero frequency), the Zipf value is Zipf=LOG10(1/44.106)+3 = 1.3555.
- Information on why you should use Zipf frequencies.
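The Zipf equation above can be sketched in a few lines of Python. This is only an illustration, not the script used to build the files: the function name `zipf_from_count` and the constant name `DIVISOR` are assumptions made for the example, and only the equation and the value 44.106 come from the text.

```python
import math

# Divisor taken from the equation above; the name is an assumption.
DIVISOR = 44.106


def zipf_from_count(frequency: int) -> float:
    """Return the Zipf value for a raw frequency count.

    Implements Zipf = LOG10((frequency + 1) / 44.106) + 3.
    """
    if frequency < 0:
        raise ValueError("frequency must be zero or positive")
    return math.log10((frequency + 1) / DIVISOR) + 3


# A word that is absent from the database has frequency 0.
print(round(zipf_from_count(0), 4))  # 1.3555
```

Because the equation adds 1 to the count before dividing, a word missing from the database still gets a finite Zipf value instead of an undefined logarithm of zero.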
Letter strings with a lemma contextual diversity above 2 (134,723 entries).
- Tab delimited text (zip)
- Tab delimited text – with POS information (zip)
- Excel (zip)
- Excel – with POS information (zip)
- R
- R – with POS information
All letter strings (437,503 entries).
- Tab delimited text (zip)
- Tab delimited text – with POS information (zip)
- Excel (zip)
- Excel – with POS information (zip)
- R
- R – with POS information
Lemmas and wordforms with a lemma contextual diversity above 2, automatically POS-tagged (89,564 lemmas, 182,099 wordforms)
All lemmas and wordforms, automatically POS-tagged (446,488 lemmas, 554,339 wordforms)