nltk - TypeError: translate() takes one argument (2 given) - Python


I have the following code:

    import nltk, os, json, csv, string, cPickle
    from scipy.stats import scoreatpercentile

    lmtzr = nltk.stem.wordnet.WordNetLemmatizer()

    def sanitize(wordlist):
        answer = [word.translate(None, string.punctuation) for word in wordlist]
        answer = [lmtzr.lemmatize(word.lower()) for word in answer]
        return answer

    words = []
    for filename in json_list:
        words.extend([sanitize(nltk.word_tokenize(' '.join([tweet['text']
                        for tweet in json.load(open(filename, read))])))])

I've tested lines 2-4 in a separate testing.py file, where I wrote:

    import nltk, os, json, csv, string, cPickle
    from scipy.stats import scoreatpercentile

    wordlist = ['\'the', 'the', '"the']
    print wordlist
    wordlist2 = [word.translate(None, string.punctuation) for word in wordlist]
    print wordlist2
    answer = [lmtzr.lemmatize(word.lower()) for word in wordlist2]
    print answer

    freq = nltk.FreqDist(wordlist2)
    print freq

and the command prompt returns ['the','the','the'], which is what I wanted (removing punctuation).

However, when I put the exact same code in a different file, Python returns a TypeError stating that

    File "foo.py", line 8, in <module>
      for tweet in json.load(open(filename, read))])))])
    File "foo.py", line 2, in sanitize
      answer = [word.translate(None, string.punctuation) for word in wordlist]
    TypeError: translate() takes 1 argument (2 given)

json_list is a list of file paths (I printed it and checked that the list is valid). I'm confused by this TypeError because everything works fine when I test it in a different file.

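I suspect your issue has to do with the differences between str.translate and unicode.translate (these are also the differences between str.translate on Python 2 versus Python 3). I suspect your original code is being sent unicode instances while your test code is using regular 8-bit str instances. That would fit the traceback, since json.load returns unicode strings on Python 2, whereas the literals in your test file are plain str.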
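One way to check this suspicion is to look at the type of a word that came out of the JSON decoder and try the two-argument call on it directly. The snippet below is only a sketch: the one-tweet payload is a made-up stand-in for the contents of one of your files, and it is written so that it also runs on Python 3, where every str behaves like Python 2's unicode.

```python
import json
import string

# Hypothetical one-tweet payload standing in for the contents of a real file.
tweets = json.loads('[{"text": "\'the"}]')
word = tweets[0]["text"]

# Shows the runtime type: unicode on Python 2, str on Python 3.
print(type(word))

try:
    # The same two-argument call that sanitize() makes.
    word.translate(None, string.punctuation)
    two_arg_call_worked = True
except TypeError as exc:
    two_arg_call_worked = False
    print(exc)
```

If the type printed is unicode (or you are on Python 3) and the call raises, the difference between your two files is the type of the strings rather than the code itself.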

I don't suggest converting your unicode strings to regular str instances, since unicode is a much better type for handling text data (and it is the future!). Instead, you should adapt to the new unicode.translate syntax. With regular str.translate (on Python 2), you can pass an optional deletechars argument, and the characters in it are removed from the string. For unicode.translate (and str.translate on Python 3), the extra argument is no longer allowed, but translation table entries with None as their value are deleted from the output.

To solve the problem you'll need to create an appropriate translation table. The translation table is a dictionary mapping Unicode ordinals (that is, ints) to ordinals, strings or None. A helper function for making them exists in Python 2 as string.maketrans (and in Python 3 as a method of the str type), but the Python 2 version of it doesn't handle the case we care about (putting None values into the table). You can build an appropriate dictionary yourself with something like {ord(c): None for c in string.punctuation}.
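Putting that together, the punctuation-stripping step could look roughly like the sketch below. The names PUNCT_TABLE and strip_punctuation are my own, not part of your code or of nltk, and the lemmatizing step is left out so the example runs without nltk installed.

```python
import string

# Map each punctuation character's ordinal to None so translate() deletes it.
PUNCT_TABLE = {ord(c): None for c in string.punctuation}

def strip_punctuation(word_list):
    """Return the words with all ASCII punctuation characters removed."""
    return [word.translate(PUNCT_TABLE) for word in word_list]

# The u prefix keeps these unicode on Python 2; it is a no-op on Python 3.
cleaned = strip_punctuation([u"'the", u"the", u'"the'])
print(cleaned)
```

Inside sanitize, the first list comprehension would then become word.translate(PUNCT_TABLE), with the table built once at module level rather than once per word.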

