write down,forget

Natural Language Toolkit Install

<Category: NLP>


Although Python 3.0 is now available, NLTK has not yet been ported to it. For now you should use NLTK with Python 2.4.*, 2.5.*, or 2.6.* only. NLTK 3.0 will be available by mid 2011.

NB: NLTK 2.0b9 will not work with Python 2.4 (to be fixed for 2.0 final). After installing the software, you should also install the data.


You will need to obtain the following packages and install them on your computer using an administrator account:

The following optional packages are used by some NLTK modules; you can install them later if you find that you need them:

Mac OS X

Before installing any new packages for NLTK, check whether you have a sufficiently recent version of Python. First, open a terminal window (Applications > Utilities > Terminal) and type python -V in the window (note the capital V). You should get back a message that looks something like Python 2.3.5. This shows that your version of Python (possibly the version that comes pre-installed on Mac OS X) is version 2.3.5, which means that you need to install a newer version. However, if the version number is of the form 2.4.*, 2.5.* or 2.6.*, then proceed with the steps below on installing PyYAML and NLTK. If you need to upgrade your version of Python, you have two main options. First, you can get the latest 2.6.* version of MacPython:

After downloading it,  install the disk image file like other Mac OS X applications by double-clicking on the package icon.

Alternatively, if you use MacPorts, then you can open a terminal window and type sudo port install python26.  After doing this, get NLTK by typing sudo port install py26-nltk.  This will install NLTK 2.0b9 and should also fetch all its required packages, including Numpy, PyYAML, and Matplotlib. Note that py25-nltk and py-nltk (for Python 2.4) are also available.

If you are not using MacPorts, and have checked for the presence of a sufficiently recent version of Python, then you should install two further packages at this point. First, download PyYAML:

We’ll assume that this has downloaded into your Downloads folder. In a Finder window, you can unpack the archive PyYAML-3.09.tar by double-clicking it; this will create a new folder called PyYAML-3.09. In order to install PyYAML, you will need to open a terminal window, and change directory by typing cd Downloads/PyYAML-3.09. Now type sudo python setup.py install. Finally, download and install NLTK:

To test whether NLTK has been installed correctly, check whether you can import NLTK into your Python session. If you are running MacPython, start up IDLE, Python’s Integrated Development Environment, with Applications > MacPython 2.6 > IDLE. Alternatively, open a terminal window and type python. Now type import nltk at the Python >>> prompt. If the prompt returns silently, then your installation was successful. If not, try the Troubleshooting suggestion given below.
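The import test described above can also be scripted. The following is only a minimal sketch: can_import is a hypothetical helper name, not part of NLTK, and it does nothing more than report whether a named module loads.

```python
def can_import(module_name):
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "nltk" is the package installed above; "yaml" is the module provided by PyYAML.
    for name in ("nltk", "yaml"):
        print("%s importable: %s" % (name, can_import(name)))
```

A False result for nltk corresponds to the case where import nltk does not return silently at the >>> prompt.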

The following optional packages are used by some NLTK modules; you can install them later if you find that you need them, and if they weren’t already installed automatically:

Install the dmg packages in the usual way.  Install Prover9 by unpacking and building the distribution, then move it to a standard location such as /usr/share/prover9, and set the PROVER9HOME environment variable to /usr/share/prover9/bin.
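To confirm that the PROVER9HOME setting is visible to Python, something like the following could be used. It is a sketch under stated assumptions: prover9_bin_dir is a hypothetical helper, and it only checks that the variable is set and names an existing directory, not that Prover9 itself works.

```python
import os


def prover9_bin_dir(environ=None):
    """Return the directory named by PROVER9HOME, or None if unset or missing."""
    if environ is None:
        environ = os.environ
    path = environ.get("PROVER9HOME")
    if path and os.path.isdir(path):
        return path
    return None


if __name__ == "__main__":
    print("PROVER9HOME directory: %s" % prover9_bin_dir())
```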


It is possible that the NLTK installer gives an error message like “Errors occurred. Try installing again.” This may indicate that the NLTK installer could not locate a suitable version of Python on your machine, or that you have more than one suitable version of Python installed. In this case, open a terminal window, type cd /tmp/nltk-installer and then type sudo python setup.py install.


NLTK is included in Ubuntu 10.04 (Lucid Lynx). Get it with: sudo apt-get install python-nltk

NLTK requires Python 2.4.*, 2.5.*, or 2.6.* (check with python -V).  In the unlikely event that you need to install Python, you can do this using your favorite package manager, or find a suitable RPM, or download and build Python from source.
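The same version check can be done from inside Python. This is a minimal sketch: the list of supported versions is copied from the text above, and is_supported is a hypothetical helper name.

```python
import sys

# Supported (major, minor) pairs, taken from the requirement stated above.
SUPPORTED = [(2, 4), (2, 5), (2, 6)]


def is_supported(version_info=None):
    """Return True if the given (or running) interpreter version is in SUPPORTED."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    print("Python %d.%d supported: %s"
          % (sys.version_info[0], sys.version_info[1], is_supported()))
```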

The following optional packages are used by some NLTK modules; you can install them later if you find that you need them. Use your package manager to install numpy, pyyaml, and matplotlib, or download them:

Install Numpy and Matplotlib by unpacking the source distribution and running sudo python setup.py install.
Install Prover9 following the instructions in the Macintosh section above.

OpenSUSE: install the python-nltk package in the “devel:languages:python” repository.

Now proceed with NLTK Source Installation (below).

NLTK Source Installation

These instructions are for Mac OS X, Linux and Unix platforms.

Check that your Python installation is adequate. Open a terminal and type python -V to see which version you have. Then start Python and, at the prompt, type import numpy to check that the numerical library loads (set your PATH and PYTHONPATH environment variables if necessary).

Download the NLTK source distribution:

Unzip the archive; this will create a new folder nltk-2.0.1rc1. Open the terminal and cd into this new folder, and type

Once you have done this installation step you can remove the nltk-2.0.1rc1 folder and the zip file.

Debian Installation

NLTK is available as a Debian package from the following URL:

Installation to Non-Standard Location

If you don’t want to install NLTK in a central location, download and unpack the zip distribution:

Move NLTK to the desired location, then add this location to your PYTHONPATH environment variable.
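Setting PYTHONPATH makes the location visible to every Python session. For a single session, the same effect can be approximated by extending sys.path, as in this sketch. The helper name ensure_on_path and the example directory are hypothetical; the directory you add should be the one that contains the nltk package folder.

```python
import os
import sys


def ensure_on_path(directory, path=None):
    """Prepend `directory` to the import search path unless it is already listed."""
    if path is None:
        path = sys.path
    directory = os.path.abspath(directory)
    if directory not in [os.path.abspath(p) for p in path]:
        path.insert(0, directory)
    return path


if __name__ == "__main__":
    # Hypothetical location; replace with wherever you moved NLTK.
    ensure_on_path(os.path.join(os.path.expanduser("~"), "lib", "python"))
    print(sys.path[0])
```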

Older versions

Older versions of NLTK are available at:


Data Install


Available Data

NLTK comes with many corpora, toy grammars, trained models, etc.   A complete list is posted at: http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml

Apart from individual data packages, you can download the entire collection (using “all”), or just the data required for the examples and exercises in the book (using “book”).

To install the data, first install NLTK, then use NLTK’s data downloader, as described below.

Interactive installer

For central installation on a multi-user machine, do the following from an administrator account.

Run the Python interpreter, and type the commands:

>>> import nltk
>>> nltk.download()

A new window should open, showing the NLTK Downloader. Click on the File menu and select Change Download Directory. For central installation, set this to C:\nltk_data (Windows), or /usr/share/nltk_data (Mac, Unix). Next, select the packages or collections you want to download.

If you did not install the data to one of the above central locations, you will need to set the NLTK_DATA environment variable to specify the location of the data. (On a Windows machine, right-click on “My Computer”, then select Properties > Advanced > Environment Variables > User Variables > New…)
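To see which data directory a given environment would point to, a check along these lines may help. It is an illustrative sketch, not NLTK's own lookup code: the helper names are hypothetical, the central locations are the two named above, and it assumes NLTK_DATA may hold several directories separated by the platform's path separator.

```python
import os

# Central locations named in the text above.
CENTRAL_DIRS = ["C:\\nltk_data", "/usr/share/nltk_data"]


def candidate_data_dirs(environ=None):
    """List NLTK_DATA entries first, followed by the central locations."""
    if environ is None:
        environ = os.environ
    dirs = [d for d in environ.get("NLTK_DATA", "").split(os.pathsep) if d]
    return dirs + CENTRAL_DIRS


def find_data_dir(environ=None):
    """Return the first candidate directory that exists, or None."""
    for d in candidate_data_dirs(environ):
        if os.path.isdir(d):
            return d
    return None


if __name__ == "__main__":
    print("Data directory found: %s" % find_data_dir())
```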

Test that the data has been installed as follows.  (This assumes you downloaded the Brown Corpus):

>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

Installing via a proxy web server

If your web connection uses a proxy server, you should specify the proxy address as follows.  In the case of an authenticating proxy, specify a username and password.  If the proxy is set to None then this function will attempt to detect the system proxy.  (NB this support was added on 21 Sep 2010, and needs a release more recent than 2.0b9.)
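The three cases just described (an explicit proxy, an authenticating proxy, and a proxy of None that falls back to system detection) can be illustrated with Python's standard library alone. This is a sketch of the general idea and not NLTK's own call: build_proxy_opener is a hypothetical helper, and the host name and credentials in the usage lines are placeholders.

```python
try:
    import urllib.request as urlreq  # Python 3
except ImportError:
    import urllib2 as urlreq         # Python 2


def build_proxy_opener(proxy=None, user=None, password=""):
    """Build a URL opener that routes HTTP(S) traffic through `proxy`."""
    if proxy is None:
        # No proxy given: use whatever the system or environment reports.
        proxies = urlreq.getproxies()
        proxy = proxies.get("http")
    else:
        proxies = {"http": proxy, "https": proxy}
    handlers = [urlreq.ProxyHandler(proxies)]
    if user is not None and proxy is not None:
        # Authenticating proxy: register the credentials for any realm.
        manager = urlreq.HTTPPasswordMgrWithDefaultRealm()
        manager.add_password(None, proxy, user, password)
        handlers.append(urlreq.ProxyBasicAuthHandler(manager))
    return urlreq.build_opener(*handlers)


if __name__ == "__main__":
    # Placeholder address and credentials.
    opener = build_proxy_opener("http://proxy.example.com:3128",
                                "USERNAME", "PASSWORD")
    print([type(h).__name__ for h in opener.handlers])
```

Building the opener does not contact the network; it only prepares the handlers that a later download would use.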

Command line installation

The downloader will search for an existing nltk_data directory to install NLTK data.  If one does not exist it will attempt to create one in a central location (when using an administrator account) or otherwise in the user’s filespace.  If necessary, run the download command from an administrator account, or using sudo.  The default system location on Windows is C:\nltk_data; and on Mac and Unix is /usr/share/nltk_data.  You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).

Python 2.5 and 2.6: Run the command “python -m nltk.downloader all”. To ensure central installation, run the command “sudo python -m nltk.downloader -d /usr/share/nltk_data all”.

Python 2.4: Locate downloader.py inside Python’s site-packages/nltk directory, then run the command “python downloader.py all”. To ensure central installation, run the command “sudo python downloader.py -d /usr/share/nltk_data all”.

Windows: Use the “Run…” option on the Start menu.  Windows Vista users need to first turn on this option, using Start -> Properties -> Customize to check the box to activate the “Run…” option.

Test the installation: Check that the user environment and privileges are set correctly by logging in to a user account, starting the Python interpreter, and accessing the Brown Corpus (see the previous section).
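If the download needs to be scripted, the Python 2.5 and 2.6 command lines above can be assembled as an argument list and passed to subprocess.call. This is a minimal sketch: downloader_command is a hypothetical helper that only builds the list, using the -d flag and the “all” and “book” collection names mentioned in this guide.

```python
import sys


def downloader_command(package="all", download_dir=None, use_sudo=False):
    """Return the argument list for running NLTK's downloader as a module."""
    cmd = [sys.executable, "-m", "nltk.downloader"]
    if download_dir is not None:
        # -d selects a download directory other than the default.
        cmd += ["-d", download_dir]
    cmd.append(package)
    if use_sudo:
        cmd = ["sudo"] + cmd
    return cmd


if __name__ == "__main__":
    # Printed rather than executed; pass the list to subprocess.call to run it.
    print(" ".join(downloader_command("book", "/usr/share/nltk_data", use_sudo=True)))
```

Remember that a directory chosen with -d still needs a matching NLTK_DATA setting unless it is one of the central locations.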


Source: Natural Language Toolkit Install