python - Unicode object error in parsing XML using BeautifulSoup
Parsing the contents of the 'name' tag in XML output using BeautifulSoup gives me the following error:
AttributeError: 'unicode' object has no attribute 'get_text'
XML output:
<show>
  <stud>
    <__readonly__>
      <table_stud>
        <row_stud>
          <name>rice</name>
          <dept>chem</dept>
          . . .
        </row_stud>
      </table_stud>
    </__readonly__>
  </stud>
</show>
However, if I access the contents of other tags such as 'dept', it seems to work fine.
stud_info = output_xml.find_all('row_stud')
for eachstud in range(len(stud_info)):
    print stud_info[eachstud].dept.get_text()  # gives 'chem'
    print stud_info[eachstud].name.get_text()  # ---unicode error---
Can Python/BeautifulSoup experts help me resolve this? (I know BeautifulSoup is not ideal for parsing XML. Let's just say I'm compelled to use it.)
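tag.name is an attribute containing the tag's own name; its value here is row_stud, a plain unicode string, which is why calling get_text() on it fails. Attribute access to contained tags is a shortcut for .find(attributename), but it only works if there isn't already an attribute in the API with the same name. Use .find() instead: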
print stud_info[eachstud].find('name').get_text()
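To see why the shortcut works for dept but not for name, here is a minimal sketch of the mechanism. ToyTag is a hypothetical stand-in written for illustration, not BeautifulSoup's actual implementation; it only assumes that child lookup is done through Python's __getattr__ hook, which runs only when normal attribute lookup fails.

```python
# Toy stand-in for a parsed tag; NOT BeautifulSoup's real implementation.
class ToyTag(object):
    def __init__(self, name, text="", children=None):
        self.name = name              # real instance attribute: the tag's own name
        self.text = text
        self.children = children or []

    def find(self, child_name):
        # Return the first direct child with the given tag name, or None.
        for child in self.children:
            if child.name == child_name:
                return child
        return None

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails,
        # so it never runs for 'name', 'text' or 'children'.
        return self.find(attr)

    def get_text(self):
        return self.text


row = ToyTag("row_stud", children=[ToyTag("name", "rice"), ToyTag("dept", "chem")])

print(row.dept.get_text())          # no 'dept' attribute, so find('dept') is used: chem
print(row.name)                     # existing attribute wins: row_stud
print(row.find("name").get_text())  # explicit lookup reaches the child: rice
```

Because row.name resolves to the existing attribute before __getattr__ is ever consulted, the child tag called name can only be reached through an explicit find() call.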
You can loop over the stud_info result list directly; there is no need to use range() here:
stud_info = output_xml.find_all('row_stud')
for eachstud in stud_info:
    print eachstud.dept.get_text()
    print eachstud.find('name').get_text()
I notice you are searching for row_stud in lower-case. If you are parsing XML with BeautifulSoup, make sure you have lxml installed and tell BeautifulSoup to use XML processing, so that it won't HTML-ize your tags (lowercase them):
soup = BeautifulSoup(source, 'xml')