Python 2.7 - Tags with an underscore cause failure with BeautifulSoup selector
XML file:
<?xml version="1.0" encoding="utf-8"?>
<sites>
  <site>
    <name>default</name>
    <url_namespace>default</url_namespace>
  </site>
</sites>
Soup info:
soup = BeautifulSoup(xml)
soup.select('url_namespace')
Error:
ValueError: Unsupported or invalid CSS selector: "url_namespace"
How can I select an XML tag, or an id, that contains an underscore?
I'd suggest lxml, because this can be done with a simple XPath. But for the fun of showing how to select with an invalid CSS selector... well, you don't. There are a couple of things that can be done, one of which is to replace the offending tag with, perhaps, a div tag that has a specific class, so that you can select it.
However, one hackish way of doing it is to change the name property of each element you find.
from bs4 import BeautifulSoup as bsoup

data = """
<?xml version='1.0' encoding='utf-8'?>
<sites>
<site>
<name>default</name>
<url_namespace>default1</url_namespace>
<url_namespace>default2</url_namespace>
<url_namespace>default3</url_namespace>
<url_namespace>default4</url_namespace>
</site>
</sites>
"""

soup = bsoup(data)

# find_all matches on the tag name directly, so the underscore is no problem here.
elements = soup.find_all("url_namespace")

# Rename each tag to something the CSS selector will accept.
for element in elements:
    element.name = "urlnamespace"

print soup
The above changes the soup to the following:
<html><body><sites>
<site>
<name>default</name>
<urlnamespace>default1</urlnamespace>
<urlnamespace>default2</urlnamespace>
<urlnamespace>default3</urlnamespace>
<urlnamespace>default4</urlnamespace>
</site>
</sites>
</body></html>
Adding the following code block to the above code...
targets = soup.select("urlnamespace")
for target in targets:
    print target.get_text()
... gives the following result:
default1
default2
default3
default4
It's not the prettiest way, but it works. Out of sheer curiosity, though, why do you need to select the tag this way? find_all works on the tag, as you can see above.
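For the lxml route mentioned at the top of this answer, a rough sketch could look like the one below. It uses the XML from the question. The variable names are my own, and the fallback to the standard library's xml.etree.ElementTree is only an assumption so that the snippet still runs where lxml is not installed. It sticks to findall with a simple path, which both libraries understand; lxml additionally offers a full xpath() method on elements.

```python
# Prefer lxml if it is installed; otherwise fall back to the standard
# library's ElementTree (an assumption made so the sketch runs anywhere).
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Bytes rather than text: lxml refuses a unicode string that carries an
# encoding declaration. The declaration must be the very first thing in the data.
xml = b"""<?xml version="1.0" encoding="utf-8"?>
<sites>
  <site>
    <name>default</name>
    <url_namespace>default</url_namespace>
  </site>
</sites>"""

root = etree.fromstring(xml)

# ".//url_namespace" matches the tag at any depth below the root element;
# the underscore needs no special handling in a path expression.
namespaces = [el.text for el in root.findall(".//url_namespace")]
print(namespaces)
```

This avoids CSS selectors altogether, so no tags need to be renamed.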
Anyway, let me know if this works.