Python 2.7 - Tags with an underscore cause failure with BeautifulSoup selector


XML file:

    <?xml version="1.0" encoding="utf-8"?>
    <sites>
      <site>
        <name>default</name>
        <url_namespace>default</url_namespace>
      </site>
    </sites>

Soup info:

    soup = BeautifulSoup(xml)
    soup.select('url_namespace')

Error:

    ValueError: Unsupported or invalid CSS selector: "url_namespace"

How can one select an XML tag, or an id, that contains an underscore?

I'd suggest lxml, because this can be done with a simple XPath. As for the fun of showing how to select with an invalid CSS selector... well, you don't. There are a couple of things that can be done, one of which is to replace the offending tag with, say, a div tag carrying a specific class, so that you can select it.
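For what it's worth, here is a minimal sketch of the XPath route. It uses the standard library's xml.etree.ElementTree instead of lxml (an assumption made so that it runs without extra installs; ElementTree supports only a subset of XPath), and the variable names are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Same document as in the question (illustrative variable names).
xml_text = """<?xml version="1.0" encoding="utf-8"?>
<sites>
  <site>
    <name>default</name>
    <url_namespace>default</url_namespace>
  </site>
</sites>"""

# Parse from bytes so the encoding declaration is honoured by the parser.
root = ET.fromstring(xml_text.encode("utf-8"))

# ".//url_namespace" matches the tag at any depth below the root;
# the underscore is an ordinary character in an XML tag name.
namespaces = [el.text for el in root.findall(".//url_namespace")]
print(namespaces)
```

With lxml installed, the equivalent call on a parsed tree would be root.xpath("//url_namespace").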

However, one hackish way of doing it is to change the name property of each element you find.

    from bs4 import BeautifulSoup as bsoup

    data = """
        <?xml version='1.0' encoding='utf-8'?>
        <sites>
        <site>
        <name>default</name>
            <url_namespace>default1</url_namespace>
            <url_namespace>default2</url_namespace>
            <url_namespace>default3</url_namespace>
            <url_namespace>default4</url_namespace>
        </site>
        </sites>
        """
    soup = bsoup(data)

    # Rename every <url_namespace> tag to a name without an underscore.
    elements = soup.find_all("url_namespace")
    for element in elements:
        element.name = "urlnamespace"
    print(soup)

The above changes the soup to the following:

    <html><body><sites>
    <site>
    <name>default</name>
    <urlnamespace>default1</urlnamespace>
    <urlnamespace>default2</urlnamespace>
    <urlnamespace>default3</urlnamespace>
    <urlnamespace>default4</urlnamespace>
    </site>
    </sites>
    </body></html>

Adding the following code block to the code above...

    # The renamed tag is now accepted by the CSS selector.
    targets = soup.select("urlnamespace")
    for target in targets:
        print(target.get_text())

... gives the following result:

    default1
    default2
    default3
    default4

It's not the prettiest way, but it works. Out of sheer curiosity, though, why do you need to select the tag this way? find_all works on the tag as it is, as you can see above.

Anyway, let us know if this works.

