<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Bruised Edge &#187; Search</title>
	<atom:link href="http://weblog.kevinclarke.info/category/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://weblog.kevinclarke.info</link>
	<description>Digital Libraries, Repositories, Programming, Technology, Librarianship, etc.</description>
	<lastBuildDate>Wed, 28 Jul 2010 03:19:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>MARC2Solr (Slight Return)</title>
		<link>http://weblog.kevinclarke.info/2007/02/19/marc2solr-slight-return/</link>
		<comments>http://weblog.kevinclarke.info/2007/02/19/marc2solr-slight-return/#comments</comments>
		<pubDate>Mon, 19 Feb 2007 19:55:52 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Search]]></category>
		<category><![CDATA[XQuery]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=199</guid>
		<description><![CDATA[Awhile back, Andrew Nagy posted an XSLT for turning MARCXML into Solr&#8217;s XML indexing format. I thought it would be fun to take his XSLT and do the same thing in XQuery. I think it is pretty much a 1 to 1 conversion. For the upcoming Code4Lib preconference, I thought about forming an XQuery group. [...]]]></description>
			<content:encoded><![CDATA[<p>Awhile back, Andrew Nagy <a href="http://www.mail-archive.com/code4lib@listserv.nd.edu/msg01213.html" title="marc2solr.xslt">posted</a> an XSLT for turning MARCXML into Solr&#8217;s XML indexing format.  I thought it would be fun to take his XSLT and <a href="http://lisforge.net/xqsolr/marc2solr.xq" title="marc2solr.xq">do the same thing in XQuery</a>.  I think it is pretty much a 1 to 1 conversion.</p>
<p>For the upcoming <a href="http://code4lib.org/node/139" title="Code4lib preconference">Code4Lib preconference</a>, I thought about forming an XQuery group.  I ended up joining the Java group, though, because there aren&#8217;t any native HTTP libs in XQuery (so I&#8217;d have to do that as an extension in Java anyway).   I still think doing an XQuery group would be fun though.</p>
<p>For instance, one nice feature of XQuery is that it allows you to be as strongly or loosely typed as you&#8217;d like. Take off all the &#8220;as &#8230;&#8221; statements from the XQuery and it still works just fine (it just won&#8217;t be so picky about what you pass into (or return from) its functions).</p>
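<p>Here is a minimal sketch of that idea. The two local: functions are made up for illustration (they are not part of marc2solr.xq); the second is simply the first with its &#8220;as &#8230;&#8221; clauses taken off:</p>
<blockquote><pre>xquery version "1.0";

(: Hypothetical functions for illustration only; not from marc2solr.xq :)

(: Strongly typed: the arguments and the return value are checked :)
declare function local:label-typed($tag as xs:string, $count as xs:integer) as xs:string {
  concat($tag, ": ", string($count))
};

(: Loosely typed: the same body with the "as" clauses removed :)
declare function local:label-untyped($tag, $count) {
  concat($tag, ": ", string($count))
};

(
  local:label-typed("245", 3),
  local:label-untyped("245", 3),
  (: only the untyped version accepts a string where a number was intended :)
  local:label-untyped("245", "three")
)</pre>
</blockquote>
<p>Passing &#8220;three&#8221; to the typed version should be rejected with a type error, while the untyped version takes whatever it is given.</p>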
<p>Recently, I&#8217;ve found myself on both sides of this fence; when working with a little bit of throw-away Java code, I&#8217;ve found myself wishing for a little of Ruby&#8217;s loose typing.  On the other hand, when experimenting with Ruby, I sometimes mutter to myself: &#8220;Why can&#8217;t this just be strongly typed so I know what to expect and do?&#8221;</p>
<p>XQuery really gives you the best of both worlds.  This isn&#8217;t to say XQuery can do everything those other languages can (it can&#8217;t&#8230; and far from it). But, if you are working with XML (and want to focus on the data rather than the data&#8217;s source) I can&#8217;t think of a nicer language to use.  It will be interesting to watch XQuery <a href="http://tinyurl.com/2o7uy8" title="XQueryP">grow as a programming language</a>.</p>
<p>So anyway&#8230; since my marc2solr.xq is written as a module, you&#8217;ll need to call it from something else.  This little XQuery (<a href="http://lisforge.net/xqsolr/solr.xq" title="solr.xq">also here</a>) works fine from Saxon (pass in the location of a MARCXML file on the file system as $input):</p>
<blockquote><pre>xquery version "1.0";

import module
  namespace marc2solr = "http://lisforge.net/ns/marc2solr"
  at "marc2solr.xq";

declare variable $input external;

marc2solr:add-records(doc($input))</pre>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2007/02/19/marc2solr-slight-return/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Relevance In Time?</title>
		<link>http://weblog.kevinclarke.info/2005/06/18/relevance-in-time/</link>
		<comments>http://weblog.kevinclarke.info/2005/06/18/relevance-in-time/#comments</comments>
		<pubDate>Sat, 18 Jun 2005 11:35:50 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=227</guid>
		<description><![CDATA[Update: The following was written when this weblog was called Kevin&#8217;s Worklog (and my personal journal was called The Bruised Edge). Now this is called The Bruised Edge and my personal journal is Kevin&#8217;s Journal. Confusing, I know&#8230; anyway, here it is. Interesting… I recently changed the URL of my worklog from “/worklog” to “/weblog” [...]]]></description>
			<content:encoded><![CDATA[<p><b>Update: </b>The following was written when this weblog was called Kevin&#8217;s Worklog (and my personal journal was called The Bruised Edge).  Now this is called The Bruised Edge and my personal journal is Kevin&#8217;s Journal.  Confusing, I know&#8230; anyway, here it is.</p>
<p>Interesting… I recently changed the URL of my worklog from “/worklog” to “/weblog” (I’ve set up redirects in my .htaccess file and, now, everything that was pointing to “/worklog” should go to its new home). What is interesting, though, is that my worklog’s new home used to be the home of my personal weblog (which, obviously, has a new location now too). Stay with me… this is where it gets interesting…</p>
<p><a href="http://www.google.com/search?q=the+bruised+edge">Search</a> for The Bruised Edge (the name of my personal journal) in Google and “Kevin’s Worklog” now shows up as the number one hit. This is because, I assume, the worklog has replaced the location of the journal. There is nothing in the worklog, though, that mentions the journal (also keep in mind the Google results that are returned have been updated to show the worklog’s name (not TBE’s)).</p>
<p>What this means is that Google is finding the phrase “The Bruised Edge” in its cache and ranking my worklog as the number one hit because of data in its cache, not the current data that it has (and even though the worklog has nothing to do with the journal aside from replacing it). All this brings me back to the <a href="http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PG01&amp;s1=20050071741&amp;OS=20050071741&amp;RS=20050071741">Google patent</a> (titled, appropriately enough, Information retrieval based on historical data).</p>
<p>It seems that this historical approach to data is how Google intends to defeat the abuse of its ranking system by spammers (who have latched onto weblogs and the linking aspect of Google’s algorithms to drive up their Google rankings). For more on this new approach on Google’s part, see the <a href="http://www.buzzle.com/editorials/6-10-2005-71368.asp">Buzzle story</a> that brought the patent to my attention.</p>
<p>The Buzzle story was posted, earlier today, to one of the lists to which I subscribe… I can’t remember which one. It is all very interesting (the conditions used to determine relevance (or should we say take a stab at relevance)). Those Google folks are frighteningly smart.</p>
<p>It reminds me of when they spoke at Stanford. I didn’t see them, but Dick reported that one of the most interesting things from the talk was that they said: once you had tons of data it was amazing the types of things you could do. The same algorithms that wouldn’t return good results with smaller sets worked much better when the data set was massive. It seems once you get past a certain point, you get a new perspective.</p>
<p>I wonder if the same is true for library organizations (like OCLC) that just have reams and reams of data?</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2005/06/18/relevance-in-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Federated Searching Misconceptions</title>
		<link>http://weblog.kevinclarke.info/2004/06/03/federated-searching-misconceptions/</link>
		<comments>http://weblog.kevinclarke.info/2004/06/03/federated-searching-misconceptions/#comments</comments>
		<pubDate>Thu, 03 Jun 2004 21:09:54 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=244</guid>
		<description><![CDATA[I found a link on Catalogablog about an article from Information Today that tells The Truth about Federated Searching. I&#8217;d have to agree wholeheartedly with everything it has to say despite working on my library&#8217;s own version of a federated search. To be fair, the point that relevancy ranking is not totally relevant is not [...]]]></description>
			<content:encoded><![CDATA[<p>I found a link on <a href="http://catalogablog.blogspot.com/">Catalogablog</a> about an article from Information Today that tells <a href="http://www.infotoday.com/it/oct03/hane1.shtml">The Truth about Federated Searching</a>. I&#8217;d have to agree wholeheartedly with everything it has to say despite working on my library&#8217;s own version of a federated search. To be fair, the point that relevancy ranking is not totally relevant is not a problem just for federated search. Relevance is never totally relevant even for the best search engine (Cf. D.R. Swanson&#8217;s <i>postulates of impotence</i> from &#8220;Historical Note: Information Retrieval and The Future of an Illusion,&#8221; <i>Journal of the American Society for Information Science</i>, 39(2):92-98, 1988).  It may be true, though, that relevance is <i>further</i> obscured by federated searching.</p>
<p>I think we, at Lane, have side-stepped most of the misconceptions for our own search, at this point at least, because it doesn&#8217;t aspire to be any more than an overview of what is provided by each of the individual engines. We pass the users on to the source rather than try to digest (munge together) all the search results into one interface. Hopefully, we are just trying to act as a tool that lets the user choose where to go rather than one that tells them this source or this result is the best and what &#8220;you&#8221; are actually seeking. When it comes to searching, we shouldn&#8217;t be an aggregator but, rather, a facilitator.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2004/06/03/federated-searching-misconceptions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
