python - Get a unique list of strings in pandas after a split() operation


I'm getting started with pandas, and have one column of data in a larger DataFrame such as:

    0                  one two
    1            two seven six
    2           three one five
    3    seven five five eight
    4                 six four
    5                    three
    dtype: object

and I'd like to split the sequences of words into their component parts, then get a unique set or counts of the words. I can do the split fine:

    numbers.str.split(' ')

    0                    [one, two]
    1             [two, seven, six]
    2            [three, one, five]
    3    [seven, five, five, eight]
    4                   [six, four]
    5                       [three]
    dtype: object

However, I'm not sure where to go from here. Again, I'd like to have an output such as

['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight'] 

or the same in a dictionary of counts, or in a Series/DataFrame equivalent of one of these two.

The best I've been able to do so far is to use apply() in combination with a set to get the unique words. pandas is the most elegant package I've seen so far, and it seems this is within easy reach of someone who knows better how to do it.

Thanks in advance!

If I understand correctly, I think you can do this as follows using pandas. I'll start with the Series before you split the strings:

    print(s)

    0                  one two
    1            two seven six
    2           three one five
    3    seven five five eight
    4                 six four
    5                    three

    stacked = pd.DataFrame(s.str.split().tolist()).stack()
    print(stacked)

    0  0      one
       1      two
    1  0      two
       1    seven
       2      six
    2  0    three
       1      one
       2     five
    3  0    seven
       1     five
       2     five
       3    eight
    4  0      six
       1     four
    5  0    three
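To make the stacking step easier to follow, here is a minimal self-contained sketch of what happens in between. The name wide is only used for illustration, and the sample Series is rebuilt from the question's data. The trailing dropna() is an assumption on my part: older pandas versions drop the padding inside stack() already, but newer ones may keep it, so calling it explicitly should give the same result either way.

```python
import pandas as pd

# Sample data rebuilt from the question.
s = pd.Series([
    "one two",
    "two seven six",
    "three one five",
    "seven five five eight",
    "six four",
    "three",
])

# One row per original string, one column per word position.
# Rows with fewer words than the longest row are padded with missing values.
wide = pd.DataFrame(s.str.split().tolist())
print(wide.shape)  # (6, 4): six strings, at most four words each

# stack() moves the columns into an inner index level, giving one word per row.
# dropna() removes the padding in case this pandas version keeps it.
stacked = wide.stack().dropna()
print(len(stacked))  # 15 words in total
```

The outer index level is the original row label and the inner level is the word's position within that row, so stacked.loc[(3, 3)] is the fourth word of the fourth string.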

Now you can compute the value counts of this Series:

    print(stacked.value_counts())

    five     3
    one      2
    three    2
    six      2
    two      2
    seven    2
    eight    1
    four     1
    dtype: int64
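For the plain list and the dictionary of counts the question asks for, something along these lines should work. Note that this sketch swaps in Series.explode() instead of the stack() approach above, which assumes pandas 0.25 or newer. The variable names (words, counts, counts_dict, unique_words) are just illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
s = pd.Series([
    "one two",
    "two seven six",
    "three one five",
    "seven five five eight",
    "six four",
    "three",
])

# Split each string into a list of words, then put every word on its own row.
words = s.str.split().explode()

# Series mapping each word to its number of occurrences.
counts = words.value_counts()

# The same information as a plain dictionary.
counts_dict = counts.to_dict()

# Unique words as a list, in order of first appearance.
unique_words = words.unique().tolist()

print(unique_words)
# ['one', 'two', 'seven', 'six', 'three', 'five', 'eight', 'four']
print(counts_dict["five"])
# 3
```

If you only need the unique words and not the counts, set(words) would also do, at the cost of losing the order.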
