Philips Tech Support


Wednesday, 8 July 2009

NLTK on Ubuntu Quick Start Guide

Posted on 09:59 by Unknown

Update, July 19, 2009: You can now install NLTK from a Python egg instead; see the post NLTK Installation with Python setuptools.

While I was attending a short program in computational linguistics at Dravidian University, Dr. Arul introduced me to NLTK (Natural Language Toolkit). It was a full two years before I finally decided to take a close look at it. Like most linguists at the lab, I used the Perl programming language. With the new NLTK 2.0 release last month, NLTK now works with Python 2.6. Here is a quick start guide for NLTK on Ubuntu Linux.

Installing NLTK on Ubuntu with Python 2.6

At the time of writing, the Debian package on the NLTK download page is built for Python 2.5, while Ubuntu ships with Python 2.6 by default. You therefore need to download the source package from the NLTK download page.

NLTK depends on a few other packages, so let's install them first.

$ sudo apt-get install python-numpy python-matplotlib prover9

Uncompress the source package and run the NLTK setup.

$ unzip nltk-2.0b3.zip

$ cd nltk-2.0b3/

$ ls

build LICENSE.txt nltk PKG-INFO README.txt setup.py yaml

$ sudo python setup.py install
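
A quick way to confirm that the setup worked is to try importing the packages installed above. The following is only a minimal sketch, written for a current Python 3 interpreter rather than the Python 2.6 used in this post; the helper name module_versions is made up for illustration and is not part of NLTK.

```python
import importlib


def module_versions(names):
    """Map each module name to its version string, or None if it cannot be imported.

    Hypothetical helper for illustration; not part of NLTK.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is not installed for this interpreter.
            versions[name] = None
        else:
            # Not every module exposes __version__, so fall back to a marker.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    report = module_versions(["numpy", "matplotlib", "nltk"])
    for name, version in sorted(report.items()):
        print("%-12s %s" % (name, version or "NOT INSTALLED"))
```

If nltk is reported as not installed, the usual cause is that setup.py was run with a different Python interpreter than the one you are testing with.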

Once the NLTK setup has finished, download the NLTK data, which contains various corpora, tagsets, treebank data and more.

$ python

Python 2.6.2+ (release26-maint, Jun 19 2009, 15:14:35)

[GCC 4.4.0] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>> import nltk

>>> nltk.download()

[Screenshot: NLTK Data downloader window]
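
The downloader window is convenient for browsing, but a single package can also be fetched without the GUI by passing its identifier to nltk.download(). The sketch below assumes a recent NLTK release and uses the Brown corpus as an example; the helper names resource_path and ensure_resource are made up for illustration. NLTK is imported inside the function so the snippet loads even on a machine where NLTK is not yet installed.

```python
def resource_path(category, name):
    """Build the relative path NLTK uses to look up a data package,
    for example "corpora/brown". Hypothetical helper for illustration."""
    return "%s/%s" % (category, name)


def ensure_resource(category, name):
    """Return True if the data package is available, downloading it if needed.

    Hypothetical helper; it relies on nltk.data.find() raising LookupError
    for missing data and on nltk.download() returning True on success.
    """
    import nltk  # imported lazily, see the note above

    try:
        nltk.data.find(resource_path(category, name))
        return True
    except LookupError:
        # Not found in any NLTK data directory, so fetch it over the network.
        return nltk.download(name)
```

After ensure_resource("corpora", "brown") returns True, running "from nltk.corpus import brown" followed by brown.words() should give you access to the corpus text.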

Learning NLTK

The best place to start is the NLTK book, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. The book is available under a Creative Commons license, so you can read it online on the NLTK website itself. I would recommend buying a copy, as the proceeds will go into the future development of NLTK.





There aren't many videos about NLTK. I recently stumbled upon this video lecture by the trinity of NLTK: Steven Bird, Ewan Klein, and Edward Loper.

If you are new to computational linguistics and need a good grounding in the field, you should also consider reading these texts.

Speech and Language Processing (2nd Edition)

Natural Language Understanding (2nd Edition)

Foundations of Statistical Natural Language Processing



Posted in Computational Linguistics, Linux, NLP, NLTK, python, ubuntu