Python 2.7 - Tags with an underscore cause failure with BeautifulSoup selector

- January 15, 2012

XML file:

<?xml version="1.0" encoding="utf-8"?>
<sites>
  <site>
    <name>default</name>
    <url_namespace>default</url_namespace>
  </site>
</sites>

Soup info:

from bs4 import BeautifulSoup
soup = BeautifulSoup(xml)
soup.select('url_namespace')

Error:

ValueError: Unsupported or invalid CSS selector: "url_namespace"

How can I select an XML tag, or an id, that contains an underscore?

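I'd suggest lxml, because this can be done with a simple XPath. As for the fun of showing how to select with an invalid CSS selector... well, don't. There are a couple of things that can be done instead, one of which is to replace the offending tag with, say, a div tag that has a specific class, so that you can select it.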
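To illustrate the XPath route, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for lxml (ElementTree supports only a limited XPath subset, which is enough here). The variable names xml_text, root, and values are illustrative and not part of the original question. With lxml installed, the rough equivalent would be etree.fromstring(...) followed by .xpath('//url_namespace').

```python
import xml.etree.ElementTree as ET

# The XML from the question; the declaration must be the very first
# characters of the string, with no leading whitespace.
xml_text = """<?xml version="1.0" encoding="utf-8"?>
<sites>
  <site>
    <name>default</name>
    <url_namespace>default</url_namespace>
  </site>
</sites>"""

root = ET.fromstring(xml_text)

# ".//url_namespace" matches every <url_namespace> element below the root.
# Underscores are ordinary characters in XML names, so nothing needs escaping.
values = [element.text for element in root.findall(".//url_namespace")]
print(values)
```

Because the tag name is matched as an XML name rather than parsed as a CSS selector, the underscore is not a problem on this route.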

However, one hackish way of doing it is to change the name property of each element you find.

from bs4 import BeautifulSoup as bsoup

data = """
    <?xml version='1.0' encoding='utf-8'?>
    <sites>
    <site>
    <name>default</name>
        <url_namespace>default1</url_namespace>
        <url_namespace>default2</url_namespace>
        <url_namespace>default3</url_namespace>
        <url_namespace>default4</url_namespace>
    </site>
    </sites>
    """
soup = bsoup(data)

# Rename every <url_namespace> element so its tag name has no underscore.
elements = soup.find_all("url_namespace")
for element in elements:
    element.name = "urlnamespace"
print(soup)

The above changes the soup to the following:

<html><body><sites>
<site>
<name>default</name>
<urlnamespace>default1</urlnamespace>
<urlnamespace>default2</urlnamespace>
<urlnamespace>default3</urlnamespace>
<urlnamespace>default4</urlnamespace>
</site>
</sites>
</body></html>

Adding the following code block to the above code...

# The renamed tag is now a valid CSS selector.
targets = soup.select("urlnamespace")
for target in targets:
    print(target.get_text())

... gives the following result:

default1
default2
default3
default4

It's not the prettiest way, but it works. Out of sheer curiosity, though, why do you need to select the tag this way? find_all works on the tag just fine, as you can see above.
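To make the find_all point concrete, here is a minimal sketch that reads the values directly, with no renaming and no CSS selector. It assumes bs4 is installed and passes "html.parser" explicitly so that no extra parser is needed; that parser choice and the variable names are assumptions for illustration, not something the original answer specifies.

```python
from bs4 import BeautifulSoup

data = """<sites>
  <site>
    <name>default</name>
    <url_namespace>default1</url_namespace>
    <url_namespace>default2</url_namespace>
    <url_namespace>default3</url_namespace>
    <url_namespace>default4</url_namespace>
  </site>
</sites>"""

# Assumption: the built-in "html.parser" is good enough for this small document.
soup = BeautifulSoup(data, "html.parser")

# find_all matches on the tag name as a plain string, so the underscore
# never goes through the CSS selector parser.
values = [element.get_text() for element in soup.find_all("url_namespace")]
print(values)
```

This leaves the tree untouched, which matters if the soup is written back out later.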

Anyway, let us know if this works.
