android - Capture image from web page using regex
I am writing a simple program to capture image resources from a web page. The image items in the HTML look like:
case1:<img src="http://www.aaa.com/bbb.jpg" alt="title bbb" width="350" height="385"/>
or
case2:<img alt="title ccc" src="http://www.ddd.com/bbb.jpg" width="123" height="456"/>
I know how to handle either case separately. Take the first one as an example:
    String capture = "<img(?:.*)src=\"http://(.*)\\.jpg\"(?:.*)alt=\"(.*?)\"(?:.*)/>";
    DefaultHttpClient client = new DefaultHttpClient();
    BasicHttpContext context = new BasicHttpContext();
    Scanner scanner = new Scanner(client
            .execute(new HttpGet(uri), context)
            .getEntity().getContent());
    Pattern pattern = Pattern.compile(capture);
    while (scanner.findWithinHorizon(pattern, 0) != null) {
        MatchResult r = scanner.match();
        String imageUrl = "http://" + r.group(1) + ".jpg";
        String imageTitle = r.group(2);
        // do something with the image
    }
The question is: how do I write a correct pattern to capture the image items when the web page source contains both case1 and case2? I want to scan the page only once.
Use jsoup:
    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    ...
    Document doc;
    String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0";
    try {
        // need http protocol
        doc = Jsoup.connect("http://domain.tld/images.html").userAgent(userAgent).get();
        // get all images
        Elements images = doc.select("img");
        for (Element image : images) {
            // get the values of the img attributes (src & alt)
            System.out.println("\nimage: " + image.attr("src"));
            System.out.println("alt : " + image.attr("alt"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
jsoup is an HTML parser. Its "jQuery-like" and "regex" selector syntax is easy to use and flexible enough to get whatever you want.
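If a single regex is still wanted, as the question asks, one possible approach is to capture each attribute inside a lookahead, so the order of src and alt within the tag no longer matters. The following is only a sketch, not a robust HTML parser: the class name ImgTagPattern and the method findImages are made up for illustration, and it assumes double-quoted attribute values, no '>' character inside an attribute value, and that every image of interest has both src and alt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class ImgTagPattern {
    // Each lookahead scans the remainder of the same tag ([^>]* cannot cross '>'),
    // so src and alt are captured whichever one comes first.
    static final Pattern IMG = Pattern.compile(
            "<img\\b"
            + "(?=[^>]*\\bsrc=\"(http://[^\"]*\\.jpg)\")"  // group 1: image URL
            + "(?=[^>]*\\balt=\"([^\"]*)\")"                // group 2: alt text
            + "[^>]*>");                                    // consume the rest of the tag

    // Returns one {url, title} pair per matching <img> tag, in document order.
    static List<String[]> findImages(String html) {
        List<String[]> found = new ArrayList<>();
        Matcher m = IMG.matcher(html);
        while (m.find()) {
            found.add(new String[] {m.group(1), m.group(2)});
        }
        return found;
    }

    public static void main(String[] args) {
        // The two sample tags from the question: case1 (src first) and case2 (alt first).
        String html =
                "<img src=\"http://www.aaa.com/bbb.jpg\" alt=\"title bbb\" width=\"350\" height=\"385\"/>\n"
                + "<img alt=\"title ccc\" src=\"http://www.ddd.com/bbb.jpg\" width=\"123\" height=\"456\"/>";
        for (String[] image : findImages(html)) {
            System.out.println(image[0] + " -> " + image[1]);
        }
    }
}
```

Because the result is one compiled Pattern with the URL in group 1 and the title in group 2, it should also be usable with scanner.findWithinHorizon(pattern, 0) as in the question's loop, with group 1 already holding the full URL. Tags that lack an alt attribute, or that use single quotes, are skipped by this pattern, which is one reason a real parser such as jsoup is usually the safer choice.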