OpenSearch is a specification for accessing search services. A website that provides OpenSearch functionality advertises the capability by including a link element like the following in its XHTML:
<link title="dzone" type="application/opensearchdescription+xml" rel="search" href="http://www.dzone.com/links/DZone-opensearch.xml"/>
The href attribute above points to the site’s OpenSearch description document. This document tells clients how to consume the OpenSearch services provided by the site.
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>DZone Search</ShortName>
  <Description>DZone Search Provider</Description>
  <Url type="text/html"
       template="http://www.dzone.com/search.html?query={searchTerms}"/>
</OpenSearchDescription>
The “Url” elements in the OpenSearch description contain search templates: URLs with curly-brace-enclosed parameters that a client replaces with the desired values. Besides {searchTerms}, common parameters include {startPage?} and {count?}; a trailing question mark marks a parameter as optional.
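As a rough illustration of the substitution described above, here is a minimal sketch of a template filler. The object name TemplateDemo, the fill() helper, and the example.com template are hypothetical and not part of any OpenSearch library. The sketch assumes values should be URL-encoded as UTF-8 and that unfilled optional parameters are replaced with an empty string.

```scala
import java.net.URLEncoder
import java.util.regex.Matcher

object TemplateDemo {
  // Matches {name} and {name?}; group 1 is the name, group 2 is "?" or "".
  private val Param = """\{([^{}?]+)(\??)\}""".r

  // Fill a URL template. `values` maps parameter names (without braces)
  // to raw, unencoded values.
  def fill(template: String, values: Map[String, String]): String =
    Param.replaceAllIn(template, m => {
      val name = m.group(1)
      val optional = m.group(2) == "?"
      values.get(name) match {
        // encode the value, then escape it for use as a regex replacement
        case Some(v) => Matcher.quoteReplacement(URLEncoder.encode(v, "UTF-8"))
        // an unfilled optional parameter becomes the empty string
        case None if optional => ""
        case None =>
          throw new IllegalArgumentException("missing required parameter: " + name)
      }
    })
}
```

For example, filling the hypothetical template "http://example.com/search?q={searchTerms}&page={startPage?}&n={count?}" with searchTerms set to "tungsten carbide" and startPage set to "2" yields "http://example.com/search?q=tungsten+carbide&page=2&n=".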
A description can contain more than one Url element, one per flavor of search, for example when the site offers results in more than one media type. Although “text/html” is the most common type on the web, “application/atom+xml” returns structured results that are easier for a client to process.
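To make the choice between Url elements concrete, here is a small sketch that picks a template by media type. The description document, the example.com URLs, and the UrlSelection and templateFor names are all made up for illustration. It assumes the scala.xml library is on the classpath.

```scala
import scala.xml.XML

object UrlSelection {
  // A hypothetical description offering both HTML and Atom results.
  val description = XML.loadString(
    """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
      |  <ShortName>Example</ShortName>
      |  <Description>Hypothetical provider</Description>
      |  <Url type="text/html" template="http://example.com/search?q={searchTerms}"/>
      |  <Url type="application/atom+xml" template="http://example.com/search.atom?q={searchTerms}"/>
      |</OpenSearchDescription>""".stripMargin)

  // Template of the first Url element with the given type, if there is one.
  def templateFor(mediaType: String): Option[String] =
    (description \ "Url")
      .find(u => (u \ "@type").text == mediaType)
      .map(u => (u \ "@template").text)
}
```

Returning an Option rather than indexing with (0) lets the caller fall back to another media type when the preferred one is missing.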
Modern browsers provide UI support for OpenSearch. In Firefox, if the current page provides OpenSearch services, you can add it to your menu of quick search engines.
Let’s examine a simple OpenSearch client written in the Scala programming language.
package synaptic

import java.io._
import java.net._
import scala.xml._
// JDK-internal Xerces class; newer JDKs may restrict access to it
import com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl

// Find and consume a web site's OpenSearch service.
object Search {

  // HTTP GET to InputStream
  def get(url : String) : InputStream = {
    val u = new URL(url)
    val conn = u.openConnection()
    conn.getInputStream()
  }

  // HTTP GET to String
  def getAsString(url : String) : String = {
    scala.io.Source.fromInputStream(get(url)).getLines().mkString("\n")
  }

  // get OpenSearch search template from web site
  def getSearchTemplate(url : String,
                        mediaType : String = "application/atom+xml") : String = {
    // get page containing an OpenSearch description link
    var xml : Elem = XML.load(get(url))
    // parse OpenSearch description href from a link element
    var href : String = ((xml \\ "link").filter(
      u => { (u \ "@rel").text == "search" &&
             (u \ "@type").text == "application/opensearchdescription+xml" })(0)
      \ "@href").text
    // resolve the href against the page URL; absolute hrefs pass through unchanged
    href = new URL(new URL(url), href).toString
    // get OpenSearch description
    xml = XML.load(get(href))
    // parse OpenSearch template from the first Url element of the requested type
    ((xml \\ "Url").filter(
      u => { (u \ "@type").text == mediaType })(0) \ "@template").text
  }

  // perform search against template with a few OpenSearch parameters
  def search(template : String,
             query : String,
             startPage : Int = 1,
             pageSize : Int = 5) : String = {
    // URL-encode the query, fill in the paging parameters, and blank out
    // any optional parameters that remain unfilled
    val url = template.replace("{searchTerms}", URLEncoder.encode(query, "UTF-8"))
      .replace("{startPage?}", startPage.toString())
      .replace("{count?}", pageSize.toString())
      .replaceAll("\\{[^{}]*\\?\\}", "")
    getAsString(url)
  }

  // search for "tungsten"-related results on www.nature.com
  def main(args: Array[String]): Unit = {
    makeXmlLoadWorkWithXhtml()
    val url = "http://www.nature.com"
    val template = getSearchTemplate(url)
    val results = search(template, "tungsten")
    println(results)
  }

  // only included to make XML.load() work with xhtml
  def makeXmlLoadWorkWithXhtml() = {
    System.setProperty("javax.xml.parsers.SAXParserFactory",
                       "synaptic.MyXMLParserFactory")
  }
}

// only included to make XML.load() work with xhtml
@throws(classOf[org.xml.sax.SAXNotRecognizedException])
@throws(classOf[org.xml.sax.SAXNotSupportedException])
@throws(classOf[javax.xml.parsers.ParserConfigurationException])
class MyXMLParserFactory extends SAXParserFactoryImpl() {
  super.setFeature("http://xml.org/sax/features/validation", false)
  super.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
  super.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
  super.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
}
The getSearchTemplate() method retrieves a site’s XHTML, finds the OpenSearch description href, retrieves that description, and extracts the first template whose type matches the requested media type (Atom by default).
The search() method substitutes a few search parameters into the given template and performs the search.
Here’s one of the results returned by the above search. It’s an Atom entry embellished with some fancy XML.
<entry>
<title>Characterization of exposures among cemented tungsten carbide workers. Part I: Size-fractionated exposures to airborne cobalt and tungsten particles</title>
<link href="http://dx.doi.org/10.1038/jes.2008.37"/>
<id>http://dx.doi.org/10.1038/jes.2008.37</id>
<updated>2011-07-15T13:47:05+00:00</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml"> <p>
<b>Characterization of exposures among cemented tungsten carbide workers. Part I: Size-fractionated exposures to airborne cobalt and tungsten particles</b>
</p>
<p>Journal of Exposure Science and Environmental Epidemiology 19, 475 (2008). <a href="http://dx.doi.org/10.1038/jes.2008.37">doi:10.1038/jes.2008.37</a></p>
<p>Authors: Aleksandr B Stefaniak, M Abbas Virji &amp; Gregory A Day</p>
</div>
</content>
<dc:identifier>doi:10.1038/jes.2008.37</dc:identifier>
<dc:title>Characterization of exposures among cemented tungsten carbide workers. Part I: Size-fractionated exposures to airborne cobalt and tungsten particles</dc:title>
<prism:productCode>jes</prism:productCode>
<dc:creator>Aleksandr B Stefaniak</dc:creator>
<dc:creator>M Abbas Virji</dc:creator>
<dc:creator>Gregory A Day</dc:creator>
<prism:publicationName>Journal of Exposure Science and Environmental Epidemiology</prism:publicationName>
<prism:issn>1559-0631</prism:issn>
<prism:eIssn>1559-064X</prism:eIssn>
<prism:doi>10.1038/jes.2008.37</prism:doi>
<dc:publisher>Nature Publishing Group</dc:publisher>
<prism:publicationDate>2008-07-16</prism:publicationDate>
<prism:volume>19</prism:volume>
<prism:number>5</prism:number>
<prism:startingPage>475</prism:startingPage>
<prism:endingPage>491</prism:endingPage>
<prism:url>http://dx.doi.org/10.1038/jes.2008.37</prism:url>
<prism:channel/>
<prism:section/>
<prism:genre>Research Article</prism:genre>
<prism:copyright>© 2009 Nature Publishing Group</prism:copyright>
<sru:recordSchema>info:srw/schema/11/pam-v2.1</sru:recordSchema>
<sru:recordPacking>unpacked</sru:recordPacking>
<sru:recordData>
<pam:message xmlns:pam="http://prismstandard.org/namespaces/pam/2.0/" xsi:schemaLocation="http://prismstandard.org/namespaces/pam/2.0/ http://prismstandard.org/schemas/pam/2.1/pam.xsd">
<pam:article>
<xhtml:head xmlns:xhtml="http://www.w3.org/1999/xhtml"/>
</pam:article>
</pam:message>
</sru:recordData>
<sru:recordPosition>1</sru:recordPosition>
<sru:extraRecordData/>
</entry>
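A client will usually want to pull a few fields out of entries like the one above. The following is only a sketch of one way to do that with scala.xml (assumed to be on the classpath). The AtomResults object and titlesAndLinks() helper are hypothetical names. The sketch assumes the feed declares the standard Atom namespace, and it filters on that namespace because scala.xml's `\` operator matches on local name only, so a plain `\ "title"` would also pick up dc:title.

```scala
import scala.xml.XML

object AtomResults {
  val Atom = "http://www.w3.org/2005/Atom"

  // (title, link href) for each entry in an Atom feed document.
  def titlesAndLinks(feedXml: String): Seq[(String, String)] = {
    val feed = XML.loadString(feedXml)
    (feed \\ "entry").map { entry =>
      // keep only elements in the Atom namespace, skipping e.g. dc:title
      val title = (entry \ "title").filter(_.namespace == Atom).text
      val href = ((entry \ "link").filter(_.namespace == Atom) \ "@href").text
      (title, href)
    }
  }
}
```

Passing the string returned by search() to titlesAndLinks() would give one pair per result, provided the response is a well-formed Atom feed.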
A couple of caveats. This client parses the site’s initial HTML as XML, so if the page isn’t well-formed XML, as is often the case, you’re out of luck. Also, Scala seems to have a problem with XHTML; the cruft at the bottom of the client is a workaround. Full details are available in this post



