python - Get a unique list of strings in pandas after a split() operation
I'm just getting started with pandas, and I have one column of data in a larger DataFrame that looks like this:
    0                  one two
    1            two seven six
    2           three one five
    3    seven five five eight
    4                 six four
    5                    three
    dtype: object
I'd like to split the sequences of words into their component parts and then get either the unique set of words or the count of each word. I can do the split fine:
    numbers.str.split(' ')

    0                   [one, two]
    1            [two, seven, six]
    2           [three, one, five]
    3    [seven, five, five, eight]
    4                  [six, four]
    5                      [three]
    dtype: object
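To make the examples reproducible, the column can be rebuilt as a standalone Series. This is a minimal sketch: the name `numbers` comes from the question, but treating it as a standalone Series rather than a column of the larger DataFrame is an assumption.

```python
import pandas as pd

# Hypothetical standalone copy of the column shown in the question
numbers = pd.Series([
    "one two",
    "two seven six",
    "three one five",
    "seven five five eight",
    "six four",
    "three",
])

# Split each string on single spaces; every element becomes a list of words
split_words = numbers.str.split(" ")
print(split_words)
```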
However, I'm not sure where to go from here. Again, I'd like to end up with output such as:
['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight']
or the same thing as a dictionary of counts, or the Series/DataFrame equivalent of either of these two.
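For comparison, both requested outputs can be produced in plain Python from the split lists. This sketch swaps in `itertools.chain` and `collections.Counter` from the standard library, which neither the question nor the answer uses. Note that it keeps words in order of first appearance, not the numeric order of the desired list.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical standalone copy of the column from the question
numbers = pd.Series([
    "one two", "two seven six", "three one five",
    "seven five five eight", "six four", "three",
])

# Flatten the per-row word lists into a single list of words
all_words = list(chain.from_iterable(numbers.str.split()))

# dict.fromkeys drops duplicates while keeping first-appearance order
unique_words = list(dict.fromkeys(all_words))

# Counter tallies how often each word occurs
word_counts = dict(Counter(all_words))

print(unique_words)
print(word_counts)
```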
The best I've been able to do so far is to use apply() in combination with set() to get the unique words. pandas is the most elegant package I've seen so far, and this seems within easy reach for someone who knows it better than I do.
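The question does not show its apply()/set() code, so the following is only a guess at what that approach might look like: apply set to each row's word list, then union the per-row sets. Because sets are unordered, the result is sorted alphabetically for display.

```python
import pandas as pd

# Hypothetical standalone copy of the column from the question
numbers = pd.Series([
    "one two", "two seven six", "three one five",
    "seven five five eight", "six four", "three",
])

# Turn each row's word list into a set of that row's distinct words
row_sets = numbers.str.split().apply(set)

# Union all per-row sets into one set of distinct words
unique_words = set().union(*row_sets)

print(sorted(unique_words))
```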
Thanks in advance!
If I understand correctly, I think you can do the following using pandas. I'll start with the Series before the strings are split:
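    print(s)

    0                  one two
    1            two seven six
    2           three one five
    3    seven five five eight
    4                 six four
    5                    three
    dtype: object

    stacked = pd.DataFrame(s.str.split().tolist()).stack()
    print(stacked)

    0  0      one
       1      two
    1  0      two
       1    seven
       2      six
    2  0    three
       1      one
       2     five
    3  0    seven
       1     five
       2     five
       3    eight
    4  0      six
       1     four
    5  0    three
    dtype: object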
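Here is a self-contained Python 3 version of the stacking step. One assumption to flag: newer pandas releases change how stack() handles missing values and may keep the padding that pd.DataFrame adds to shorter rows, so this sketch calls dropna() explicitly to discard it. Depending on the pandas version, stack() may also emit a FutureWarning.

```python
import pandas as pd

# Hypothetical standalone copy of the Series from the answer
s = pd.Series([
    "one two", "two seven six", "three one five",
    "seven five five eight", "six four", "three",
])

# One row per original string, one column per word position;
# rows with fewer words are padded with missing values
words_frame = pd.DataFrame(s.str.split().tolist())

# stack() moves the word-position columns into the index (one word per entry);
# dropna() removes any padding values that stack() leaves in place
stacked = words_frame.stack().dropna()

print(stacked)
```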
Now compute the value counts of the stacked Series:
    print(stacked.value_counts())

    five     3
    one      2
    three    2
    six      2
    two      2
    seven    2
    eight    1
    four     1
    dtype: int64
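Once the words sit one per entry, the two outputs asked for in the question are one call each: unique() for the list of distinct words and value_counts().to_dict() for the dictionary of counts. This sketch swaps in Series.explode() (available since pandas 0.25 and not part of the original answer) in place of the intermediate DataFrame and stack(); explode() puts each list element on its own row while repeating the original index.

```python
import pandas as pd

# Hypothetical standalone copy of the Series from the answer
s = pd.Series([
    "one two", "two seven six", "three one five",
    "seven five five eight", "six four", "three",
])

# explode() turns each list element into its own row, repeating the original index
words = s.str.split().explode()

# Distinct words, in order of first appearance
unique_words = words.unique().tolist()

# Mapping of word -> count, built from value_counts() (sorted by frequency)
word_counts = words.value_counts().to_dict()

print(unique_words)
print(word_counts)
```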