Thursday, March 15, 2012

How to install the NLTK package for Python 2.7 on Windows

These instructions describe how to install NLTK for Python 2.7 on Windows 7. (I installed it using Windows 7 Professional with SP1.)

Note that for these instructions I am using Python 2.7 (32-bit). There is a known bug (unresolved since 2009!) with the installers for the 64-bit version of Python, so I removed it and installed 32-bit Python instead. I also did not use Python 3.2, as only a limited number of packages are compatible with it at this point in time (15th March 2012). This will most likely change in future, but for now it's 2.7.

These instructions assume you have already installed Python 2.7 (32-bit) and set the PATH variable to include 'C:\Python27\' (or wherever you installed Python).
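Before starting, it can help to confirm both assumptions from inside Python. The following is only a rough sketch: the helper names python_bitness and dir_on_path are my own, and the C:\Python27 folder is an assumption you should change to match your install.

```python
from __future__ import print_function
import os
import struct
import sys


def python_bitness():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is an entry in a PATH-style string.

    The comparison ignores case, surrounding quotes and trailing slashes.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(entry):
        return entry.strip().strip('"').rstrip("\\/").lower()

    target = norm(directory)
    return any(norm(entry) == target for entry in path_value.split(sep))


if __name__ == "__main__":
    # Assumption: Python is installed in this folder; edit to match your machine.
    python_dir = r"C:\Python27"
    print("Python version:", sys.version.split()[0])
    print("Interpreter is %d-bit" % python_bitness())
    print("Python folder on PATH:", dir_on_path(python_dir))
```

The same dir_on_path helper can be pointed at the Scripts subdirectory once setuptools is installed in step 1.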

Here are the steps.

1. The best way to install NLTK is using easy_install, which is part of the setuptools package (http://pypi.python.org/pypi/setuptools). Download and install it. [My install used: setuptools-0.6c11.win32-py2.7.exe]

Be sure to add the easy_install directory to your PATH. If you're like me you might have skipped through the instructions in the install windows, but read this part: "Once installation is complete, you will find an ``easy_install.exe`` program in your Python ``Scripts`` subdirectory. Be sure to add this directory to your ``PATH`` environment variable, if you haven't already done so."

2. Install NumPy. Download from: http://numpy.scipy.org/. [My install used: numpy-1.6.1-win32-superpack-python2.7.exe]

3. Install PyYAML. Download from: . [My install used: PyYAML-3.10.win32-py2.7.exe]

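4. Use easy_install to install NLTK. At the Windows Command Prompt (not inside Python or a text editor), type something similar to the following: "\Python27\lib\site-packages\easy_install.py nltk". If the Scripts directory from step 1 is on your PATH, typing "easy_install nltk" should work as well.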

5. Install all the corpora required by running the following at the Windows Command Prompt (not at the Python >>> prompt): "python -m nltk.downloader -d D:\Python27\nltk_data all"

Note that I've used -d D:\Python27\nltk_data because I wanted the data installed in that location. You can leave this option out if you just want to install it in the default directory, C:\nltk_data.
Find more info here: http://www.nltk.org/data
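If you do pick a custom data directory, NLTK has to be able to find it later. NLTK looks through the folders listed in nltk.data.path, and you can add to that list in code (or set the NLTK_DATA environment variable). Here is a small sketch of the in-code approach; add_data_dir is a helper name I made up, and the folder is just the one from step 5.

```python
from __future__ import print_function

# Assumption: the custom location used in step 5; change it to match your machine.
CUSTOM_DATA_DIR = r"D:\Python27\nltk_data"


def add_data_dir(search_path, directory):
    """Append `directory` to a list of search folders unless it is already there."""
    if directory not in search_path:
        search_path.append(directory)
    return search_path


if __name__ == "__main__":
    try:
        import nltk
    except ImportError:
        print("NLTK is not installed yet; finish step 4 first.")
    else:
        # nltk.data.path is the list of folders NLTK searches for corpora.
        add_data_dir(nltk.data.path, CUSTOM_DATA_DIR)
        print("NLTK will search these folders:")
        for folder in nltk.data.path:
            print("  ", folder)
```

This only affects the running Python session, so it would need to go at the top of any script that uses the data.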

If any packages fail to install, just press n at the retry prompt, as shown:

[nltk_data] | Downloading package 'punkt' to
[nltk_data] | D:\Python27\nltk_data...
[nltk_data] | Unzipping tokenizers\punkt.zip.
[nltk_data] | Error with downloaded zip file
Error installing package. Retry? [n/y/e]
n

Then run the following at the Python command line to download the failed package (punkt in this example) on its own:

>>> import nltk
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
Identifier> punkt
Downloading package 'punkt' to D:\python27\nltk_data...
Unzipping tokenizers\punkt.zip.

---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> q
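Once everything is done, a quick check like the one below can confirm that the pieces from steps 2 to 5 are in place. Treat it as a sketch rather than an official test: check_modules and punkt_available are names I invented, and the script only reports what it finds without trying to fix anything.

```python
from __future__ import print_function
import importlib


def check_modules(names):
    """Map each module name to its version string, "unknown", or None if missing."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            results[name] = getattr(module, "__version__", "unknown")
    return results


def punkt_available():
    """Return True if punkt is found, False if missing, None if NLTK is absent."""
    try:
        import nltk
    except ImportError:
        return None
    try:
        # nltk.data.find raises LookupError when the resource is not installed.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        return False
    return True


if __name__ == "__main__":
    # PyYAML is imported under the name "yaml".
    for name, version in sorted(check_modules(["numpy", "yaml", "nltk"]).items()):
        print("%-6s %s" % (name, version if version else "NOT INSTALLED"))
    print("punkt tokenizer found:", punkt_available())
```

If punkt shows up as False, repeating the interactive download above is the first thing to try.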

Good Luck!

14 comments:

  1. Thank you so much! You practically saved my day. I was in deep shit till I read your blog and realized that Python 64-bit installer had that one bug. I owe a big one to you. Thank you!

  2. Thanks a lot !
    You saved me from killing myself by frustration.

    Replies
    1. Me too - that's why I wrote up these instructions. Glad I could help you too!

  3. Liz, I followed your instructions, but it gives an error on import nltk :(

    Replies
    1. I got an error on import as well. Says 'No Disk' found on drive.

  4. Thank you so much! I never realized that I need 32-bit python for nltk! Lifesaving!

  5. I followed the instructions given here with a 32 bit version instead of 64 bit version & everything worked like a charm! Thank you very much :-)

  6. Where do we need to type the following command:

    Just type something similar to the following to install: "\Python27\lib\site-packages\easy_install.py nltk".

    I have installed Python 2.7.9
    Moreover I have just installed setuptools 12.1 which was already present in Python27. I installed sublime text 2.0. I did not install Numpy and PyYAML. I have straight away gone to step 4 so I am not understanding where do I need to type the above command in command prompt or in sublime.

    I have easy_install_scripts.py file in C:\Python27\Scripts path and not in C:\Python27\Lib\site-packages

    Please help
    Arc

  7. I'm facing a problem in the 5th step! It's displaying invalid syntax.
