Is there any way to find proper nouns using NLTK WordNet? I.e., can I tag possessive nouns using NLTK WordNet?
2 Answers
I don't think you need WordNet to find proper nouns. I suggest using the part-of-speech tagger `pos_tag`.
To find proper nouns, look for the `NNP` tag:
from nltk.tag import pos_tag
sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]
propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
# ['Michael','Jackson', 'McDonalds']
You may not be fully satisfied, since `Michael` and `Jackson` are split into two tokens; in that case you might need something more complex, such as a named entity tagger.
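Short of a full named entity tagger, one lightweight heuristic for the split-name issue is to join runs of adjacent proper-noun tags. The following is only a sketch under assumptions: the helper name `merge_proper_nouns` is hypothetical, the tagged list is hard-coded from the output shown above so that no NLTK models are needed, and treating `NNPS` (plural proper noun) as part of a name is my own choice.

```python
from itertools import groupby

def merge_proper_nouns(tagged_tokens):
    """Join runs of adjacent NNP/NNPS tokens into single multi-word names."""
    names = []
    # groupby splits the sequence into consecutive runs of proper / non-proper tokens
    for is_proper, group in groupby(tagged_tokens, key=lambda pair: pair[1] in ("NNP", "NNPS")):
        if is_proper:
            names.append(" ".join(word for word, _ in group))
    return names

# Tagged output copied from the example above (hard-coded for illustration).
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'),
               ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

print(merge_proper_nouns(tagged_sent))  # ['Michael Jackson', 'McDonalds']
```

This will wrongly merge two distinct names that happen to be adjacent, so it is a stopgap rather than a replacement for named entity recognition.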
In principle, as documented in the Penn Treebank tagset (http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html), you can find possessive nouns by looking for the `POS` tag. In practice, though, the tagger often doesn't emit `POS` when the possessive is attached to an `NNP`.
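In the Penn Treebank convention, `POS` marks the possessive ending as a token of its own, so the tag can only show up when the tokenizer has split `'s` off the noun. A minimal sketch of that lookup, assuming such a split: the tagged list below is hand-written for illustration (it is not real tagger output), and the helper name `possessors_from_pos_tags` is hypothetical.

```python
def possessors_from_pos_tags(tagged_tokens):
    """Return each noun token that is immediately followed by a token tagged POS."""
    nouns = []
    # Walk over adjacent pairs: (current token, next token)
    for (word, tag), (_, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if next_tag == "POS" and tag.startswith("NN"):
            nouns.append(word)
    return nouns

# Hypothetical, hand-written tags with the possessive ending split into its own token.
tagged = [('Daniel', 'NNP'), ('Jackson', 'NNP'), ("'s", 'POS'), ('hamburger', 'NN'),
          ('and', 'CC'), ('Agnes', 'NNP'), ("'", 'POS'), ('fries', 'NNS')]

print(possessors_from_pos_tags(tagged))  # ['Jackson', 'Agnes']
```

With whitespace splitting via `sentence.split()`, the ending stays glued to the noun, so this lookup finds nothing and a suffix check is needed instead.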
To find Possessive Nouns, look for str.endswith("'s") or str.endswith("s'"):
from nltk.tag import pos_tag
sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]
possessives = [word for word, pos in tagged_sent if word.endswith("'s") or word.endswith("s'")]
# ["Jackson's", "Agnes'"]
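The suffix check can be tightened a little by also requiring a noun tag and by stripping the ending to recover the base noun. This is a sketch rather than a definitive method: the helper name `possessive_nouns` is hypothetical, and the tagged list is hard-coded from the output above so the snippet runs without NLTK models.

```python
def possessive_nouns(tagged_tokens):
    """Return (token, base form) pairs for noun tokens ending in 's or s'."""
    results = []
    for word, tag in tagged_tokens:
        if not tag.startswith("NN"):
            continue  # skip tokens the tagger did not label as nouns
        if word.endswith("'s"):
            results.append((word, word[:-2]))  # drop the trailing 's
        elif word.endswith("s'"):
            results.append((word, word[:-1]))  # drop the trailing apostrophe only
    return results

# Tagged output copied from the example above (hard-coded for illustration).
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'),
               ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'),
               ("Agnes'", 'NNP'), ('fries', 'NNS')]

print(possessive_nouns(tagged_sent))
# [("Jackson's", 'Jackson'), ("Agnes'", 'Agnes')]
```

Note that the `s'` branch cannot tell a singular name ending in "s" (Agnes') from a plural possessive (the dogs'), so the base form is only the token minus its apostrophe.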
Alternatively, you can use NLTK's `ne_chunk`, but it doesn't seem to add much unless you care about what kind of proper noun you get from the sentence:
>>> from itertools import chain
>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Jackson', 'Daniel']
Using `ne_chunk` is a little verbose, and it doesn't get you the possessives.

- Thank you for this solution, I implemented it as a console script last November - https://github.com/dereckson/extract-proper-nouns - and successfully imported the list of the proper names from a novel. – Dereckson Mar 27 '15 at 20:36
- Glad the answer helped; it's great to see that you have a ready solution for other people who are trying to perform the same task =) – alvas Mar 27 '15 at 23:17
- Is it possible to use nltk for extracting proper nouns from some _unstructured_ text like a log file, where proper nouns are in **mixed case** and the text is **not completely grammatically correct**? Thanks – user2436428 Jan 18 '16 at 08:30
- @user2436428 Not really, but there's no harm in trying. It's somewhat ironic to have improper proper nouns. What you need is something more like named entity recognition, see http://stackoverflow.com/questions/34439208/nltk-stanfordnertagger-how-to-get-proper-nouns-without-capitalization/34458164#34458164 – alvas Jan 18 '16 at 09:38
- @alvas nltk is asking me to run nltk.download() in order to use it, but it gives the error HTTP 405 Not Allowed? – Ankush Rathi Jul 26 '17 at 15:45
- Are you sure the first word in the sentence is tagged as `NNP`? Worth checking; it may depend on the version. – Catalina Chircu Sep 23 '19 at 04:07
I think what you need is a part-of-speech tagger. This tool assigns a part-of-speech tag (e.g., proper noun, possessive pronoun, etc.) to each word in a sentence.
NLTK includes some taggers: http://nltk.org/book/ch05.html
There's also the Stanford Part-Of-Speech Tagger (open source too, better performance).
