<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Bruised Edge &#187; Databases</title>
	<atom:link href="http://weblog.kevinclarke.info/category/databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://weblog.kevinclarke.info</link>
	<description>Digital Libraries, Repositories, Programming, Technology, Librarianship, etc.</description>
	<lastBuildDate>Wed, 28 Jul 2010 03:19:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Do You Trust Your Data Modelers?</title>
		<link>http://weblog.kevinclarke.info/2006/12/18/do-you-trust-your-data-modelers/</link>
		<comments>http://weblog.kevinclarke.info/2006/12/18/do-you-trust-your-data-modelers/#comments</comments>
		<pubDate>Mon, 18 Dec 2006 19:47:15 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=252</guid>
		<description><![CDATA[In the #code4lib IRC channel today Ed Summers asked me some good questions about storing metadata in a native XML database. The gist of his questions was that he wasn&#8217;t sure he saw any advantages that a native XML database might have over a relational database (yes, I&#8217;m simplifying a bit I&#8217;m sure). As we [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://www.code4lib.org/irc/" title="#code4lib">#code4lib IRC</a> channel today <a href="http://www.inkdroid.org/journal/" title="Inkdroid">Ed Summers</a> asked me some good questions about storing metadata in a native XML database.  The gist of his questions was that he wasn&#8217;t sure he saw any advantages that a native XML database might have over a relational database (yes, I&#8217;m simplifying a bit I&#8217;m sure).  As we were winding down he said, &#8220;just preppin you for my questions at <a href="http://www.code4lib.org/2007" title="Code4Lib 2007">code4libcon</a>.&#8221;</p>
<p>My first thought after digesting the conversation was, &#8220;Hey, wait, I&#8217;m not even talking about native XML databases at code4libcon!&#8221;  My proposal is about using XQuery in the digital library realm.  True, we are using a native XML database here, but just because one uses XQuery doesn&#8217;t mean s/he is using a native XML database.  You can use XQuery just as easily with <a href="http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.xquery.doc/xqrbasics.html" title="DB2 XQuery">DB2</a> or <a href="http://www.oracle.com/technology/tech/xml/xquery/index.html" title="Oracle XQuery">Oracle&#8217;s database</a> (or <a href="http://www.gnu.org/software/qexo/" title="Qexo">files on the file system</a>).</p>
<p>The one thing that native XML databases and XQuery do have in common, though, is that they let you interact with your data directly: it doesn&#8217;t have to be deconstructed into another structure and then reconstructed when you want the whole thing back out again.  (When XQuery is used over a relational database, that deconstruction and reconstruction still takes place, but invisibly, in the database layer.)</p>
<p>But is this a good thing?  Ed kept saying he didn&#8217;t see any data modeling going on with native XML databases.</p>
<p>There is data modeling going on with native XML databases, I&#8217;d suggest, but it happens on the metadata side of things.  Andrew Nagy made this observation recently on the code4lib mailing list when he noted how poorly just putting MARCXML into a native XML database performs.</p>
<p>This is because putting MARCXML into a native XML database makes MARCXML the data model.  MARC was intended for concise transfer, not for working with the data.  The architects of MARC assumed (I believe and hope) that MARC would be reconstructed into something else before anyone tried to do anything meaningful with it (for what it is worth, this only partially happens in the library world).</p>
<p>XQuery allows the developer to work more nimbly with the data models s/he is given (instead of mapping them into other data models that match the database s/he has chosen to use).  So, what are these data models?  They are the XML metadata standards being created by the different knowledge communities.  Unsurprisingly, to use these data models, the developer needs to know them (which is one reason having programmers in our libraries is a better idea than contracting out to people not in the profession).</p>
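<p>As a rough sketch of what querying the data model you are given looks like, here is a small Python example.  It is not XQuery: it uses the limited XPath subset in Python&#8217;s standard <code>xml.etree.ElementTree</code> module as a stand-in, and the record contents and the helper names <code>make_record</code> and <code>subfield_values</code> are made up for illustration.  The point is only that the query addresses MARCXML&#8217;s own structure (datafield tags and subfield codes) rather than a schema the data was re-mapped into.</p>

```python
import xml.etree.ElementTree as ET

# The MARCXML namespace; the "marc" prefix is just a local alias for queries.
MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}


def make_record(title, author):
    """Build a tiny MARCXML-style record (illustrative data, not a real catalog record)."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    for tag, code, value in (("100", "a", author), ("245", "a", title)):
        field = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
        )
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return record


def subfield_values(record, tag, code):
    """Return the text of every subfield matching a MARC tag and subfield code."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [node.text for node in record.findall(path, NS)]


record = make_record("XQuery in the digital library", "Example, Author")
titles = subfield_values(record, "245", "a")
authors = subfield_values(record, "100", "a")
print(titles, authors)
```

<p>Notice that the developer has to know that 245 subfield a holds the title; that knowledge of the metadata standard is the data model.</p>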
<p>Are the people creating these (meta)data models working to make accessing the data easier?  That could be one critique leveled at native XML databases (from the perspective of the developers)&#8230; if you aren&#8217;t doing the data modeling, can you trust the people who are?</p>
<p>It&#8217;s not that bad though (put down that gun you cynical library developers); keep in mind that XQuery isn&#8217;t a fulltext query language.  Think of it more as a database query language (even though there doesn&#8217;t have to be a database).</p>
<p>To use XQuery in the digital library world, in my opinion, you still need to use a fulltext indexer (like <a href="http://lucene.apache.org/java/docs/index.html" title="Lucene">Lucene</a> or the type built into many XML-enabled databases). Indices may be used through proprietary extensions to the XQuery language (indicated by a different namespace) or through separate processes which feed the XQuery engine (as in a pre-processing stage).</p>
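<p>The second arrangement (a separate indexing process that feeds the query engine) can be sketched in a few lines of Python.  This is a toy under stated assumptions, not how Lucene or any particular XML database works: the inverted index is a plain dictionary, the records are invented, and <code>build_index</code> and <code>search_titles</code> are hypothetical names.  The index answers the fulltext question (which records mention a word?), and the structured query then pulls the wanted element out of those records.</p>

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def make_record(record_id, title):
    """Build a minimal record element (illustrative structure, not a real standard)."""
    record = ET.Element("record", {"id": record_id})
    ET.SubElement(record, "title").text = title
    return record


def build_index(records):
    """Pre-processing stage: map each lowercased word to the ids of records containing it."""
    index = defaultdict(set)
    for record in records:
        text = " ".join(record.itertext())
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(record.get("id"))
    return index


def search_titles(records, index, word):
    """Use the index to pick matching record ids, then a structured lookup for each title."""
    hits = index.get(word.lower(), set())
    return sorted(r.findtext("title") for r in records if r.get("id") in hits)


records = [
    make_record("r1", "Native XML databases"),
    make_record("r2", "Relational databases"),
    make_record("r3", "XQuery basics"),
]
index = build_index(records)
print(search_titles(records, index, "databases"))
```

<p>A real indexer would add tokenization rules, stemming, and ranking; the sketch only shows the division of labor between the index and the query.</p>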
<p>For what it is worth, there is a <a href="http://www.w3.org/TR/xquery-full-text/" title="XQuery Fulltext">fulltext</a> extension to the XQuery spec that is being written to take advantage of these external indices, but it is not really out there in the world yet. In the meantime, even if our metadata models (e.g., MARCXML) aren&#8217;t the best, we can still create and use indices that provide an intelligent view of the data.</p>
<p>One nice thing about working directly with the data/data models you receive is that there are fewer &#8220;moving parts&#8221; to fix when things change (in other words, fewer things to get in the way).  Because we all know digital library standards don&#8217;t change, right?</p>
<p>Rather than go through the process of re-mapping to the database&#8217;s structure, you only need to modify the parts of your code that deal with the parts of the metadata that have changed. You&#8217;d have to do this with the relational option too&#8230; just because you have a standard way of normalizing data doesn&#8217;t mean that all data is structured in the same way (in terms of how you get at the pieces you want).</p>
<p>I could mention, I guess, some reasons why I like native XML databases in my presentation, but I&#8217;m not sure this is a good idea.  I think it may distract from the beauty that is XQuery.  I&#8217;m also hoping Andrew Nagy will cover this territory in his presentation comparing different native XML databases. One of XQuery&#8217;s strengths is that it is database agnostic; I shouldn&#8217;t stray from that.</p>
<p>For the record, the Ed Summers that appears in this post is not the real Ed (despite the conversation actually happening), but one of my own conception for rhetorical purposes only.  <img src='http://weblog.kevinclarke.info/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2006/12/18/do-you-trust-your-data-modelers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
