nltk - TypeError - translate() takes one argument (2 given) - Python
I have the following code:

    import nltk, os, json, csv, string, cPickle
    from scipy.stats import scoreatpercentile

    lmtzr = nltk.stem.wordnet.WordNetLemmatizer()

    def sanitize(wordlist):
        answer = [word.translate(None, string.punctuation) for word in wordlist]
        answer = [lmtzr.lemmatize(word.lower()) for word in answer]
        return answer

    words = []
    for filename in json_list:
        words.extend([sanitize(nltk.word_tokenize(' '.join([tweet['text']
            for tweet in json.load(open(filename, 'r'))])))])

I've tested lines 2-4 in a separate testing.py file, where I wrote:

    import nltk, os, json, csv, string, cPickle
    from scipy.stats import scoreatpercentile

    lmtzr = nltk.stem.wordnet.WordNetLemmatizer()

    wordlist = ['\'the', 'the', '"the']
    print wordlist
    wordlist2 = [word.translate(None, string.punctuation) for word in wordlist]
    print wordlist2
    answer = [lmtzr.lemmatize(word.lower()) for word in wordlist2]
    print answer
    freq = nltk.FreqDist(wordlist2)
    print freq

and the command prompt returns ['the', 'the', 'the'], which is what I wanted (the punctuation is removed).
However, when I put the exact same code in a different file, Python raises a TypeError stating that:

    File "foo.py", line 8, in <module>
      for tweet in json.load(open(filename, 'r'))])))])
    File "foo.py", line 2, in sanitize
      answer = [word.translate(None, string.punctuation) for word in wordlist]
    TypeError: translate() takes 1 argument (2 given)

json_list is a list of file paths (I printed it to check that the list is valid). I'm confused by the TypeError, because the code works fine when I test it in a different file.
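I suspect your issue has to do with the differences between str.translate and unicode.translate (these are also the differences between str.translate on Python 2 versus Python 3). I suspect your original code is being sent unicode instances while your test code is using regular 8-bit str instances: on Python 2, json.load returns the tweet text as unicode objects, whereas the string literals in your test file are plain str.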
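To see the difference in isolation, here is a small Python 3 sketch (an assumption: on Python 3, str plays the role of Python 2's unicode and bytes plays the role of Python 2's 8-bit str). The helper names delete_with_bytes and two_arg_translate_fails are hypothetical, made up for this illustration:

```python
import string

# Punctuation as 8-bit data, for the bytes (Python 2 str-like) case.
PUNCT_BYTES = string.punctuation.encode("ascii")

def delete_with_bytes(word_bytes):
    # 8-bit strings still accept (table, deletechars); table=None means
    # "no character mapping, only delete the listed characters".
    return word_bytes.translate(None, PUNCT_BYTES)

def two_arg_translate_fails(word):
    """Return True if the Python 2 style call word.translate(None, chars) raises TypeError."""
    try:
        # Unicode text (unicode on Python 2, str on Python 3) accepts only a table.
        word.translate(None, string.punctuation)
    except TypeError:
        return True
    return False

print(delete_with_bytes(b"'the"))       # b'the'
print(two_arg_translate_fails("'the"))  # True
```

The same call succeeds on 8-bit data and fails on text, which matches the test file working while the JSON-fed code does not.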
I don't suggest converting the unicode strings to regular str instances, since unicode is the better type for handling text data (and it's the future!). Instead, you should adapt to the new unicode.translate syntax. With regular str.translate (on Python 2), you can pass an optional deletechars argument, and the characters in it will be removed from the string. With unicode.translate (and str.translate on Python 3), that extra argument is no longer allowed; instead, translation table entries whose value is None are deleted from the output.
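A short illustration of those table semantics, written for Python 3 (where str.translate behaves like Python 2's unicode.translate here). The table contents are arbitrary examples chosen only to show the three kinds of values:

```python
# Keys are Unicode ordinals (ints); values may be an ordinal, a string, or None.
table = {
    ord("a"): None,      # None: delete every "a"
    ord("b"): "B",       # string: replace each "b" with "B"
    ord("c"): ord("C"),  # int: replace each "c" with the character chr(ord("C"))
}

print("abcab".translate(table))  # BCB
```

Characters with no entry in the table (none here besides a, b, c) are passed through unchanged.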
To solve the problem you'll need to create an appropriate translation table. A translation table is a dictionary mapping Unicode ordinals (that is, ints) to ordinals, strings, or None. A helper function for making them exists in Python 2 as string.maketrans (and in Python 3 as a method of the str type), but the Python 2 version doesn't handle the case we care about (putting None values into the table). You can build an appropriate dictionary yourself with something like {ord(c): None for c in string.punctuation}.
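A minimal sketch of that fix, written and checked for Python 3. The dict comprehension is the one suggested above and also works with unicode.translate on Python 2.7; str.maketrans with a third argument is the Python 3 helper for the same table. strip_punctuation is a hypothetical stand-in for the punctuation step of sanitize() (it lowercases but skips the nltk lemmatizer) and assumes every word is Unicode text:

```python
import string

# Map the ordinal of each punctuation character to None so translate() deletes it.
DELETE_PUNCT = {ord(c): None for c in string.punctuation}

# Python 3 only: the third argument of str.maketrans lists characters to delete.
DELETE_PUNCT_ALT = str.maketrans("", "", string.punctuation)

def strip_punctuation(wordlist):
    """Hypothetical helper: remove punctuation and lowercase each word."""
    return [word.translate(DELETE_PUNCT).lower() for word in wordlist]

wordlist = ["'the", "the", '"the']
print(strip_punctuation(wordlist))  # ['the', 'the', 'the']
```

In sanitize(), this amounts to replacing word.translate(None, string.punctuation) with word.translate(DELETE_PUNCT).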