Leanlex
What is Leanlex
Leanlex is experimental software developed by Emmanuel Keuleers to provide access to the Dutch and English parts of the CELEX lexical database through the Python programming language. Leanlex differs radically from current interfaces in that it offers a rich object-oriented representation of lexical information. The philosophy of leanlex is that lexical information should be mapped to appropriate data structures. For example, in Leanlex the syllabic representation of an entry is not a string with syllables separated by dashes, but a true list of syllables; likewise, the hierarchical morphological decomposition of a word is represented by specialized morphology objects that can be easily navigated.
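To make the data-structure idea concrete, here is a minimal sketch in plain Python. The function `syllables_from_string` and the class `Morph` are hypothetical illustrations, not leanlex's actual API. The example word *werkbaarheid* ('workability'), its bracketing, and the category labels ("V", "A", "N", "affix") are placeholders chosen for illustration rather than CELEX's own codes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Hypothetical illustration of the design idea; NOT leanlex's real classes.


def syllables_from_string(s: str) -> List[str]:
    """Turn a dash-separated syllabification such as 'ap-pel' into a list."""
    return s.split("-")


@dataclass
class Morph:
    """One node in a hierarchical morphological decomposition."""
    label: str                      # the morpheme or constituent, e.g. 'werkbaar'
    category: str                   # placeholder word-class tag, e.g. 'N', 'V', 'A'
    parts: List["Morph"] = field(default_factory=list)  # immediate constituents

    def is_leaf(self) -> bool:
        """A node without constituents is a single morpheme."""
        return not self.parts

    def leaves(self) -> Iterator["Morph"]:
        """Yield the individual morphemes from left to right."""
        if self.is_leaf():
            yield self
        else:
            for part in self.parts:
                yield from part.leaves()

    def depth(self) -> int:
        """Number of levels of structure above the morphemes."""
        return 0 if self.is_leaf() else 1 + max(p.depth() for p in self.parts)


# [[[werk]V [baar]]A [heid]]N
werkbaar = Morph("werkbaar", "A", [Morph("werk", "V"), Morph("baar", "affix")])
werkbaarheid = Morph("werkbaarheid", "N", [werkbaar, Morph("heid", "affix")])

print(syllables_from_string("werk-baar-heid"))     # ['werk', 'baar', 'heid']
print([m.label for m in werkbaarheid.leaves()])   # ['werk', 'baar', 'heid']
print(werkbaarheid.depth())                       # 2
```

With real structures, questions such as "how many syllables does this word have?" or "what is the base of this derivation?" become `len(...)` or `.parts[0]` rather than string manipulation.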
Leanlex Development
Leanlex development has ended, but older releases can still be downloaded. Leanlex comes with scripts that build object databases from other lexical information sources. If you have the CELEX95 CD, the scripts bundled with leanlex will import almost all lexical information on Dutch wordforms and lemmas as well as English wordforms. Please note that the files needed to build databases for leanlex are not part of the leanlex distribution: you need the CELEX95 CD, which is not free.
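The import step can be pictured as parsing each line of a delimited lexicon file into a Python object and storing those objects in a persistent key-value database. The sketch below uses the standard-library `shelve` module and assumes backslash-delimited fields, as in the CELEX text files. The three-field layout (id, headword, dash-separated syllabification), the `WordForm` class, and the helper names are hypothetical; leanlex's bundled scripts may parse and store the data quite differently.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class WordForm:
    """Hypothetical wordform entry; real CELEX records have many more fields."""
    id: int
    headword: str
    syllables: List[str]


def parse_line(line: str) -> WordForm:
    """Parse one hypothetical 'id\\headword\\syl-la-bles' line into an object."""
    fields = line.rstrip("\n").split("\\")
    return WordForm(int(fields[0]), fields[1], fields[2].split("-"))


def build_database(lines: Iterable[str], path: str) -> None:
    """Store every parsed entry in a shelve database, keyed by its id."""
    with shelve.open(path, flag="n") as db:   # 'n': always create a new, empty db
        for line in lines:
            entry = parse_line(line)
            db[str(entry.id)] = entry          # shelve keys must be strings


sample = ["1\\appel\\ap-pel\n", "2\\werkbaar\\werk-baar\n"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wordforms")
    build_database(sample, path)
    with shelve.open(path, flag="r") as db:   # reopen read-only
        loaded = db["2"]

print(loaded.headword, loaded.syllables)  # werkbaar ['werk', 'baar']
```

Because each stored value is a complete object, a lookup returns the syllables as a list rather than as a string that still has to be split.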
Leanlex Documentation
Apart from the code itself, which is well documented, the best introduction to leanlex is currently the presentation I gave at the Python workshop at the Max Planck Institute for Psycholinguistics in Nijmegen on May 4, 2006.
Obtaining Leanlex
This release includes import scripts for the CELEX English wordforms and lemmas.
The package is now split into two parts: leanlex (the library) and leanlexdata (the lexicons).
Download the leanlex interface library here: http://crr.ugent.be/leanlex/leanlex-0.6.zip
Download the leanlex data library here: http://crr.ugent.be/leanlex/leanlexdata-0.6.zip
The bad news
The packages available for download to the general public only work on Unix-like systems (Mac OS X, Linux, BSD, …). Unless you are very familiar with both Python and Windows, you will probably not be able to install these packages on Windows.
The good news
Unix and Windows packages with lexica included are in fact available, but cannot be distributed freely. If you or your organization has a licensed version of the CELEX lexical databases, please contact me, so I can give you the necessary directions.
Starting with leanlex
The best way to get started with leanlex is to look at the examples in the examples directory of the leanlex distribution, or at the documentation in the code itself.
Bugs
Remember that leanlex never left its development stage; the English lexica in particular have not been extensively tested. Please send bug reports to emmanuel.keuleers@ugent.be.