<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Bruised Edge &#187; Metadata</title>
	<atom:link href="http://weblog.kevinclarke.info/category/metadata/feed/" rel="self" type="application/rss+xml" />
	<link>http://weblog.kevinclarke.info</link>
	<description>Digital Libraries, Repositories, Programming, Technology, Librarianship, etc.</description>
	<lastBuildDate>Wed, 28 Jul 2010 03:19:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Amazon Offers Public Datasets&#8230; Bibliographic?</title>
		<link>http://weblog.kevinclarke.info/2008/11/23/amazon-aws-offers-public-datasets-bibliographic/</link>
		<comments>http://weblog.kevinclarke.info/2008/11/23/amazon-aws-offers-public-datasets-bibliographic/#comments</comments>
		<pubDate>Sun, 23 Nov 2008 20:22:44 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Metadata]]></category>
		<category><![CDATA[aws]]></category>

		<guid isPermaLink="false">http://weblog.kevinclarke.info/?p=208</guid>
		<description><![CDATA[Interesting news that Amazon is going to be offering large public datasets through its EC2 (Elastic Compute Cloud) web service.  Some examples included will be the annotated human genome data and various US Census, transportation, and economic databases. I&#8217;ve got an idea for a dataset they could add&#8230; how about MARCXML records [...]]]></description>
			<content:encoded><![CDATA[<p>Interesting news that Amazon is going to be offering <a title="AWS Public Datasets" href="http://aws.amazon.com/publicdatasets/">large public datasets</a> through its <a title="Elastic Compute Cloud" href="http://aws.amazon.com/ec2/">EC2</a> (Elastic Compute Cloud) web service.  Some examples included will be the annotated human genome data and various US Census, transportation, and economic databases.  I&#8217;ve got an idea for a dataset they could add&#8230; how about MARCXML records for all their books, videos, CDs, etc.?  With their collection, that would be a pretty good-sized dataset.  I&#8217;ve sent my suggestion in via email.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2008/11/23/amazon-aws-offers-public-datasets-bibliographic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Evolutionist, Creationist&#8230; Relativist</title>
		<link>http://weblog.kevinclarke.info/2008/04/15/evolutionist-creationist-relativist/</link>
		<comments>http://weblog.kevinclarke.info/2008/04/15/evolutionist-creationist-relativist/#comments</comments>
		<pubDate>Tue, 15 Apr 2008 17:30:52 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=368</guid>
		<description><![CDATA[This quote from Mark Diggory on the OAI-ORE mailing list: [warning... stereotyping ahead] This is an argument on a continuum of evolutionism vs. creationism&#8230; Evolutionist say, establish the smallest possible set of laws to enforce on a system and see what emergent behavior arises&#8230; while the Creationist say, define the entire mechanism, top to bottom, [...]]]></description>
			<content:encoded><![CDATA[<p>This <a href="http://groups.google.com/group/oai-ore/msg/4d7add14617c4e4d">quote from Mark Diggory</a> on the <a href="http://groups.google.com/group/oai-ore/">OAI-ORE mailing list</a>:</p>
<blockquote><p>[warning... stereotyping ahead]</p>
<p>This is an argument on a continuum of evolutionism vs. creationism&#8230; Evolutionist say, establish the smallest possible set of laws to enforce on a system and see what emergent behavior arises&#8230; while the Creationist say, define the entire mechanism, top to bottom, written in stone, and damn all who do not comply. Seems to me, folks that come from the RDF World ascribe to the former and those from XML Schema world tend to the later, both could stand to learn a little from each other.</p></blockquote>
<p>It&#8217;s an interesting quote (though I bristled a bit at being cast into the &#8216;creationist&#8217; camp)&#8230;</p>
<p>I wonder if a third alternative might be the relativist.  The relativist looks at the world from a particular local perspective and codifies it.  Unlike the creationist, she does not seek to enforce her structure on everyone else, but sees it as the solution to a particular problem.  Unlike the evolutionist, she does believe there is a need to validate her data and confirm that it fits with her worldview.</p>
<p>She&#8217;s a relativist because she looks beyond her particular assumptions and perspectives and realizes there are lots of other communities out there (like hers) that have a perceived need for valid data.  Rather than say her schema is the one to rule them all, she looks at ways to &#8220;trade&#8221; (share/crosswalk) information with other like communities.  It doesn&#8217;t mean she wants to share with them all, just that there are some with which she wants to have a relationship.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2008/04/15/evolutionist-creationist-relativist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bruce&#8217;s Answers to Dan&#8217;s Questions</title>
		<link>http://weblog.kevinclarke.info/2008/02/08/bruces-answers-to-dans-questions/</link>
		<comments>http://weblog.kevinclarke.info/2008/02/08/bruces-answers-to-dans-questions/#comments</comments>
		<pubDate>Fri, 08 Feb 2008 19:00:44 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/2008/02/08/bruces-answers-to-dans-questions/</guid>
		<description><![CDATA[Dan: Does the linked data movement really depend upon RDF? It doesn’t seem like it has to. Maybe it could grow faster if it didn’t. Bruce: Let’s turn the question around and ask: if not RDF, then what? You definitely need some model on which to base it, it seems to me, and things like [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://onebiglibrary.net/story/ongoing-questions-about-linked-data-and-the-semantic-web">Dan:</a> Does the linked data movement really depend upon RDF? It doesn’t seem like it has to. Maybe it could grow faster if it didn’t.</p>
<p><a href="http://community.muohio.edu/blogs/darcusb/archives/2008/02/08/dans-questions">Bruce:</a> Let’s turn the question around and ask: if not RDF, then what? You definitely need some model on which to base it, it seems to me, and things like GRDDL, microformats, etc. leave a lot of flexibility on the encoding end. The key for linked data is really the URI, of course, which becomes kind of like a key for a global database.</p>
<p>Me: Does it need a single data model?  It does if machines are doing all the work automagically, but if there are people involved does it?  The key to me seems to be the &#8220;linked&#8221; part.  I really like Dan&#8217;s coining of &#8220;Linked Description&#8221; &#8212; this doesn&#8217;t seem to be splitting hairs to me, like Bruce suggests, but more of a recognition of a component in the process which isn&#8217;t considered necessary in the Linked Data perspective.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2008/02/08/bruces-answers-to-dans-questions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My Metadata Mini-Manifesto</title>
		<link>http://weblog.kevinclarke.info/2007/03/29/my-mini-metadata-manifesto/</link>
		<comments>http://weblog.kevinclarke.info/2007/03/29/my-mini-metadata-manifesto/#comments</comments>
		<pubDate>Thu, 29 Mar 2007 20:44:55 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Cataloging]]></category>
		<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=238</guid>
		<description><![CDATA[How do we approach the problem of metadata (and which formats we should use)? In what way is metadata like cataloging and in what way is it different? What is metadata anyway? I see metadata as cataloging. Libraries have a strong historical tradition of organizing materials. I think it is fair to say that the [...]]]></description>
			<content:encoded><![CDATA[<p>How do we approach the problem of metadata (and which formats we should use)?  In what way is metadata like cataloging and in what way is it different?  What is metadata anyway?</p>
<p>I see metadata as cataloging.  Libraries have a strong historical tradition of organizing materials.  I think it is fair to say that the way we&#8217;ve done this has changed over time with the emergence of new technologies.  There are at least two questions that should, in my opinion, guide us in participating in the process of evolving our cataloging:  “How do we continue to do what is appropriate and right (contextually speaking)” and “How do we adjust what we do to fit our changing environment (e.g., to take advantage of new perspectives)?”</p>
<p>I see metadata as cataloging; it is what cataloging is becoming.  It is the same thing, and yet we give it a different label because we need to adjust our perception of it&#8230; just enough so that we can question it.  We need to distance ourselves from it&#8230; just enough to look at it differently, make judgments about it, and then reintegrate our new perspective into our work.  It is at this point, I believe, that “metadata” becomes “cataloging” again&#8230; the need for a different label is gone – until the process begins anew.</p>
<p>The same is true of the “digital library” &#8212; what is a digital library?  Isn&#8217;t a digital library the same thing as a traditional library?  In a way, I think, it is a trick we play on ourselves to allow us to question some of our basic assumptions, make some discoveries (and mistakes), and then integrate what we&#8217;ve learned.  At that point, the “digital library” disappears and there is just “the library” (we may even look back and think, “Wasn&#8217;t it there all along?  It sure seems like we were doing library work.”)</p>
<p>I believe a key question when looking at metadata formats from the digital materials community is, “How do libraries differ from other institutions of cultural knowledge?”  For instance, are there similarities between libraries and other institutions of cultural knowledge on which we could be building?  How similar are our missions or should the differences between our materials be the determining factor?  What role would a unified authorities system play across a diverse material environment?</p>
<p>Ultimately, I think we should make metadata decisions based on the context of the digital materials community (and the materials that we are digitizing).  It isn&#8217;t that digital materials are somehow fundamentally different from their physical counterparts or that a different level of description is required.  The decision is made to foster an environment where people working towards the same goal can share ideas, tools, and realizations.</p>
<p>The purpose of metadata is to ask the questions; when we understand the answers, I believe, we&#8217;ll realize we&#8217;ve been cataloging all along (but we have to ask the questions to truly understand the realization).  We could liken the question of whether metadata is cataloging to the Zen parable:</p>
<blockquote><p>
“Thirty years ago, before I had studied Zen, I saw mountains as mountains and rivers as rivers. And then later, when I had more intimate knowledge, I came to see mountains not as mountains and rivers not as rivers. But now that I have attained the substance, I again see mountains just as mountains, and rivers just as rivers.”
</p></blockquote>
<p>All this is sort of the philosophical underpinnings of a viewpoint.  It is a different thing to implement this perspective as a way forward, I know.  For instance, libraries have large numbers of catalogers who are trained to catalog.  It may or may not be interesting to them to look at what they do from a broader perspective (though it should be, because what they do is interesting from the micro and macro views).</p>
<p>Pragmatically, I&#8217;d suggest librarians start with MODS and MADS.  They are the most similar to what librarians already do, so they won&#8217;t seem like such a stretch.  Get comfortable with them.  Look at what moving the information around a little does to the overall organization (in comparison to what it would look like in a MARC record).  Look at what is added to and left out of the MARC record.</p>
<p>It is important, though, not to stop there.  After all, the Library of Congress is not advocating that people stop using MARC in favor of MODS.  MODS and MADS are just one set of (possible) answers.</p>
<p>To truly investigate the question of evolving cataloging we need to look from a variety of perspectives.  If MODS and MADS are as far as one wants to question, I think it makes more sense to just stick with MARC.  MARC is more granular than MODS/MADS and, though it has its fair share of organizational problems (even related to its level of granularity), it is certainly well understood by a select group of trained professionals (you know who you are).</p>
<p>To work with metadata (in the sense I&#8217;ve been discussing here), one should look at other metadata formats&#8230; EAD, TEI, Dublin Core, VRA Core, and even XOBIS!  Look at similarities and differences.   Discover why decisions were made when the authors created the schema (for instance, were they made based on the material in hand, because of traditions within the community, or to satisfy perceived user needs?)</p>
<p>The goal of working with metadata shouldn&#8217;t be to make a definitive and final decision about which format a library should use, but to be open to evaluating and re-evaluating the strengths and weaknesses, uses and user communities, and even the cataloging communities from which the metadata comes.  It is important to be pragmatic&#8230; not to overwhelm yourself with more than you can digest at a time, but when you have new projects (or have time to investigate) take a look at other metadata formats.  You might like what you see.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2007/03/29/my-mini-metadata-manifesto/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RDF is &#8220;Easy As Learning English&#8221;</title>
		<link>http://weblog.kevinclarke.info/2006/01/10/rdf-is-easy-as-learning-english/</link>
		<comments>http://weblog.kevinclarke.info/2006/01/10/rdf-is-easy-as-learning-english/#comments</comments>
		<pubDate>Tue, 10 Jan 2006 07:27:49 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=208</guid>
		<description><![CDATA[Hmm, seems I riled some folks up about RDF and didn’t even know it (until someone sent me a link today — too bad I wasn’t linked to in the original post). Anyway, I started to write a semi-long, semi-polite response to Shelley Powers, author of Practical RDF, in response to her blog entry titled [...]]]></description>
			<content:encoded><![CDATA[<p>Hmm, seems I riled some folks up about RDF and didn’t even know it (until someone sent me a link today — too bad I wasn’t linked to in the original post). Anyway, I started to write a semi-long, semi-polite response to <a href="http://weblog.burningbird.net/">Shelley Powers</a>, author of <a href="http://practicalrdf.info">Practical RDF</a>, in response to her blog entry titled <a href="http://practicalrdf.info/2005/07/frustrating/">Frustrating</a> (which was in response to my discussion with <a href="http://www.ldodds.com/blog/archives/000224.html">Leigh Dodds</a>), but I’ve changed my mind.  I’ve already spent way too much time talking about RDF.</p>
<p>I’m happy to concede that the complexity of RDF is like the complexity of the English language. Ms. Powers asks how long I’ve had trouble with the English language? I don’t have too much now but, of course, I’m a native speaker. I’d ask Ms. Powers how long it took her to learn the English language (with all its complexity). I just don’t have that much time to devote to learning RDF inside and out. If I want to put in that sort of commitment I’ll learn Chinese (which will be much more useful in the future in my humble opinion).</p>
<p>Yes, I agree, Ms. Powers, our time will be better spent solving problems rather than going back and forth about whether I should use RDF or not. The people who believe there is something there to justify the initial investment will use it, the rest will not. All I’m saying is I’m happy to take a wait and see approach. As you suggest, there is no need to try and convince me RDF will save the world. I’ll continue to work on what I think is worth my time and effort and RDF folks should do the same.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2006/01/10/rdf-is-easy-as-learning-english/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>For Whom The Bell Tolls</title>
		<link>http://weblog.kevinclarke.info/2005/10/05/for-whom-the-bell-tolls/</link>
		<comments>http://weblog.kevinclarke.info/2005/10/05/for-whom-the-bell-tolls/#comments</comments>
		<pubDate>Wed, 05 Oct 2005 11:50:03 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Cataloging]]></category>
		<category><![CDATA[Metadata]]></category>
		<category><![CDATA[Social Software]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=214</guid>
		<description><![CDATA[Lorcan Dempsey nails it, in my opinion, with his latest weblog entry, Making data work &#8211; Web 2.0 and catalogs. Among other things, it pulls together an interesting collection of links in support of his idea that we need to “make data work harder.” All of it resonates with me. One of his links, O’Reilly’s [...]]]></description>
			<content:encoded><![CDATA[<p>Lorcan Dempsey nails it, in my opinion, with his latest weblog entry, <a href="http://orweblog.oclc.org/archives/000815.html">Making data work &#8211; Web 2.0 and catalogs</a>. Among other things, it pulls together an interesting collection of links in support of his idea that we need to “make data work harder.” All of it resonates with me.</p>
<p>One of his links, O’Reilly’s “What is Web 2.0”, is the first piece I’ve read on “the Web 2.0” that actually explains in plain text what that phrase might really mean (I’ve read many interpretations). There are two main points of interest to me: 1) that added value (a large part of what libraries do) is only as good as our ability to use it, and 2) that people are as important as ‘smarter’ algorithms in going forward with Web development.</p>
<p>On a related note… I recently attended the NISO OpenURL/MXG workshop (I’m still trying to write up a summary… I’ve been a bit under the weather) and was very impressed with what <a href="http://public.csusm.edu/dwalker/">David Walker</a> at Cal State San Marcos has been doing (unfortunately, his web page doesn’t link to what he demoed (yet)). He is a great example of a library programmer working to address the issues raised by Dempsey and others. It is great (and inspiring) to see.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2005/10/05/for-whom-the-bell-tolls/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Jellybean Jar</title>
		<link>http://weblog.kevinclarke.info/2005/07/14/the-jellybean-jar/</link>
		<comments>http://weblog.kevinclarke.info/2005/07/14/the-jellybean-jar/#comments</comments>
		<pubDate>Thu, 14 Jul 2005 22:23:50 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[MARC]]></category>
		<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=221</guid>
		<description><![CDATA[Remember that contest where there is a jellybean jar full of jellybeans and the goal is to guess how many jellybeans there are in the jar? I think the MARC –&#62; RDF question is a bit like that. RDF has been discussed lately on the MODS list (and a bit here). Ironically, there has been [...]]]></description>
			<content:encoded><![CDATA[<p>Remember that contest where there is a jellybean jar full of jellybeans and the goal is to guess how many jellybeans there are in the jar? I think the MARC –&gt; RDF question is a bit like that. RDF has been discussed lately on the MODS list (and a bit here). Ironically, there has been a discussion of relational databases under way on the MARC list at the same time (not related to RDF, but not wholly unrelated either). When one talks about a triple-store (of whatever kind, native RDF or relational), it is only natural to talk in terms of the number of triples stored. This is just like an object-oriented database. How do you measure its capacity? How many objects can it store and reasonably retrieve in a query? What the triple count doesn’t tell me, though, is how many records (in the library sense) this represents.</p>
<p>I don’t think we have an answer to this question yet because no one that I am aware of has moved the complete MARC structure into RDF. So, to me, the question of how many triples there are in a MARC record is a bit like the guessing game: “How many jellybeans are there in the jellybean jar?” I’m sure someone out there knows the answer to how many possible units of information there are in a MARC record (you’d have to break out all the encoded info from the control fields too), but this is a bit different than triples because the relationships between those units have to be represented as well.</p>
<p>It would also, perhaps, be more instructive to know how many triples there are in your <i>average</i> MARC record (and how many on the high end). This is related to the question, “Which parts of the MARC record do we actually use?” Bill Moen’s <a href="http://www.mcdu.unt.edu/?p=8">MCDU</a> group is doing some interesting work in this area.</p>
<p>As for triples in your average MARC record, my guess would be a hundred (keep in mind all the subfields and the number of bytes in each control field if that seems like a lot to you). Okay, that’s totally off the top of my head with no real logic at all, but it doesn’t seem too far off to me either (in fact it might be a bit low). So, using this completely bogus guess, a database of 10 million triples would represent a database of a hundred thousand records. I think it is worth noting that different triple-per-record counts would affect all parts of the system.</p>
<p>It’s funny that I’ve been contemplating all this RDF stuff again. I first looked into it when we were developing XOBIS. I think Leigh Dodds’ question was the right one when he asked: “What does RDF give us that XML can’t?” Well, he said “XML or a relational database” but I think the database part isn’t really the question since we don’t have to put XML into a relational database (I can certainly understand the frustrations of developers who have had to map one XML model after another into a relational db).</p>
<p>His answers seem to fall into three categories to me. The first is the database/modelling question, the second is easily combining disparate data, and the third is the ability to do Semantic Web-like things (any time someone starts talking about machines “inferencing” I lump it into the last category (right or wrong)). The first seems to me to be more directed at relational databases; the second seems to me to be handled by choosing RELAX NG and NRL. The third I remain skeptical about.</p>
<p>I don’t mean to trivialize these issues by categorizing them in this way. I don’t think XML offers a complete solution to MARC yet either (this doesn’t mean we shouldn’t keep looking and working on one though). I guess this is just my last sweep, before leaving the RDF stuff behind (though I have emailed the AustLit people to find out a little more about their caching system), to sort of make sense of it.</p>
<p>Since RDF isn’t something I work with in my daily life, I’ve probably already spent much more time than I should taking another look at it (and, yes, though I’ve been arguing a point I’ve also been genuinely looking). Maybe in another five years I’ll poke my head up again and see if there is any more out there. Back to work…</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2005/07/14/the-jellybean-jar/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RDF: Show Me the Money</title>
		<link>http://weblog.kevinclarke.info/2005/07/06/rdf-show-me-the-money/</link>
		<comments>http://weblog.kevinclarke.info/2005/07/06/rdf-show-me-the-money/#comments</comments>
		<pubDate>Thu, 07 Jul 2005 01:03:56 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=225</guid>
		<description><![CDATA[Bruce D’Arcus and I have been sending some emails back and forth about my last post on metadata interoperability. In a comment on the original post, he suggests RDF instead of XSLT crosswalks. In my reply, I referenced Bill Moen’s tongue-in-cheek comment about RDF (for the record, I don’t really believe it is too complicated [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://netapps.muohio.edu/blogs/darcusb/darcusb/">Bruce D’Arcus</a> and I have been sending some emails back and forth about my last post on metadata interoperability. In a comment on the original post, he suggests RDF instead of XSLT crosswalks. In my reply, I <a href="http://www.kevinclarke.info/weblog/2005/06/30/metadata-interoperability/">referenced</a> Bill Moen’s tongue-in-cheek comment about RDF (for the record, I don’t really believe it is too complicated for Moen to figure out but, perhaps, I didn’t have my tongue planted firmly enough in my cheek while reciting what Moen said — I think what Moen was getting at (and this could just be my interpretation) is that RDF’s real world usefulness, when compared to its complexity, remains questionable).</p>
<p>Over the course of these emails, Bruce was kind enough to refer me to <a href="http://www.ldodds.com/blog/archives/000224.html">an entry</a> on Leigh Dodds’ <a href="http://www.ldodds.com/">weblog</a> discussing RDF and the library world (and, in part, my original post). Dodds, Bruce tells me, has been developing an RDF solution for Ingenta (I’m not clear if it is in production yet or not). I’d like to respond on his weblog, but it seems comments are disabled so I’ll just point to the article (linked above). [Update: looks like comments have been turned on there now, but since I’ve already written here I’ll just go ahead and post here (I wrote more than I was planning to anyway)]</p>
<p>To start, I should go ahead and confess my biases. First, I don’t think RDF will succeed (other than in providing a way for people to put XML into a relational database with greater ease — which, for some people, might be sufficient (keep in mind, though, that there is a lot of baggage involved with that “ease”)). Second, I don’t think relational databases (while the best thing in the world for business data) are the best choice for library bibliographic and authority data. Yes, I know many library databases are built on top of a RDBMS — I have extracted data from many-a-MARC record stored in a series of VARCHARs (in essence having to parse segmented data from a BLOB).</p>
<p>It is not that I dislike RDF just because it wasn’t invented in the library community (Moen warned of the NIH (not invented here) syndrome; that warning has been resonating in my head). As an example of a non-library standard that is worth librarians investigating, I’d point to Topic Maps. If I had to I’d choose to use them instead of RDF (yes, I know they can also work together). Anyway, enough background…</p>
<p>The first comment that Dodds picks up on in his weblog entry is my: “I don’t think [RDF] can/will really accomplish anything that agreement on any XML schema couldn’t/wouldn’t.”</p>
<p>He responds: “This surprised me greatly. One thing that RDF doesn’t mandate is a single all-embracing format, it positively embraces plurality of schemas, and independent adoption and repurposing of schemas.”</p>
<p>Perhaps I should go into a little more detail. By this I meant that RDF says, “We can provide a way to reuse and make your data more interoperable <em>if</em> you just accept our structure.” While allowing arbitrary connections to be made is different than having a schema (or XSLT) that defines how those connections should be made, it is also the same in that it is accomplishing the same goal by limiting what one does with the data (for the record, I don’t really think we will have agreement on one schema either). Dorothea Salo touches (with a sledgehammer) on this ‘plays-well-with-others’ aspect in her post <a href="http://cavlec.yarinareth.net/archives/2004/07/21/look-we-get-it-already/">Look, We Get It Already</a>.</p>
<p>On the other hand, Dodds says RDF is not complex because “… [he’s] had no trouble introducing engineers to RDF….” That gave me a bit of a chuckle. I assume by engineers he means software engineers (since he is a [software] engineer at Ingenta, judging from his home page). Are they the target audience?</p>
<p>In this respect, RDF does seem a bit like XSD to me. Being able to model objects and subclass things in a programming-like way provides a very rich way of defining things. But because people other than software engineers will need to use metadata and its related schemas, I tend to favor RELAX NG over XSD as a schema language (RELAX NG has a sound foundation, but doesn’t push it onto its users).</p>
<p>It is not that RDF is so complex that no one in the library community can figure it out (there are people here experimenting with it). It is rather that its complexity (or should I say its “required knowledge from a particular domain”) makes it prohibitively complex for those not in that domain. Really, though, who wants to edit RDF? Do I really have to think like a relational database? I don’t want to. Perhaps I’m missing the user-friendly RDF editor that already exists out there, but I think we don’t see this because of the “designed in” flexibility of RDF. We end up editing the structure embedded as content and editing the real content too… that is too much.</p>
<p>Dodds says, “RDF has a definite image problem.” This is something I think we all can agree on. I’ve heard the “XML people” say it and I’ve heard the “RDF people” say it (and I’ve heard Salo take it apart as a rhetorical argument (see above)). It is actually not unlike XOBIS’ image problem (if it has one). When you want to replace something that has grown organically over the years with something that has been constructed (and which requires a radical rethinking of the way things are done) there is a lot of drag. How do we know THAT system is better if we can’t see it in the real world?</p>
<p>For reasons why we need RDF, Dodds suggests, “its much harder to map any given model to XML, because XML is limited in how well can express relationships.” I think XOBIS and Topic Maps do a fine job of representing relationships. Relationships are the “reason for being” for both. There are things, though, that we expect a XOBIS-aware system to do (what Dodds calls “implicit semantics”). There are a lot of promises in the RDF world too, though, about the types of things machines will be able to do with these explicit linkages (what I refer to as “taking a leap of faith”).</p>
<p>At the end of his post (yes, I’m really trying to wrap this rant up), Dodds says, “There’s some tasty morsels at the bottom of that semantic web layer cake. The only way to demonstrate that is to come up with more convincing demonstrations, e.g. a recast of MODS as RDF, backed by some useful code.” I’d agree with the last part (though I think it will take a lot of useful code (more than just “some”… librarians are a cautious bunch (rightfully so)) and there are also still social and institutional issues that will have to be addressed too). So, RDF (or XOBIS for that matter)… SHOW ME THE MONEY!</p>
<p>So, I’ve been wrong before and I’ll be wrong again (I’m pretty sure of this). If someone demonstrates the usefulness of RDF and creates the application that revolutionizes the Web, that’s great… I’ll start using RDF. I’m not begrudging anyone their RDF experimentation any more than I’d want someone to take away my chance to experiment with XOBIS (yeah, let them try!). Until that time, though, that time when we start using machine readable data instead of human readable data, I’ll choose to spend my time where I feel it will be best spent (or at least where I’m most interested).</p>
<p>Just a chuckle before I stop… I just “heard” in the code4lib IRC channel: “in [the] Semantic Web World my agent will coordinate rearranging your schedule with your agent….”</p>
<p>Hope this wasn’t too much of a rant  <img src="http://journal.kevinclarke.info/wp-includes/images/smilies/icon_smile.gif" alt="-)" />   I’m going to be hungry today b/c I’ve now missed my lunch!</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2005/07/06/rdf-show-me-the-money/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Metadata Interoperability</title>
		<link>http://weblog.kevinclarke.info/2005/06/30/metadata-interoperability/</link>
		<comments>http://weblog.kevinclarke.info/2005/06/30/metadata-interoperability/#comments</comments>
		<pubDate>Thu, 30 Jun 2005 11:34:04 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Metadata]]></category>
		<category><![CDATA[XOBIS]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=226</guid>
		<description><![CDATA[I’m back from ALA. What a time! It was the first ALA conference I’ve been to and, I must say, I was amazed at the sheer number of librarians (I’ve been to AALL and MLA, but they aren’t anywhere near as large). It was a bit mind boggling, to me, to see librarians at every [...]]]></description>
			<content:encoded><![CDATA[<p>I’m back from ALA. What a time! It was the first ALA conference I’ve been to and, I must say, I was amazed at the sheer number of librarians (I’ve been to AALL and MLA, but they aren’t anywhere near as large). It was a bit mind-boggling, to me, to see librarians at every turn. On a related note, it was nice to see some of the <a href="http://irc.freenode.net/code4lib">Code4Lib</a> people in person (again, for some, and for the first time for others).</p>
<p>The best presentation of the conference (that I saw) was Bill Moen’s (during the MODS, MARC, and Metadata Interoperability session). For those that didn’t attend, Claire Stewart has summarized <a href="http://litablog.org/?p=82">the main points of Moen’s presentation</a> on the <a href="http://www.litablog.org/">LITA weblog</a>.  She did a remarkable job (she must be a fast typist)… it is well worth the read.</p>
<p>Moen suggests that libraries do not play a central role in the metadata world. We do have a privileged role, since we have been doing it for such a long time, but we need to play well with others. Now that <a href="http://www.unalog.com/">unalog</a> is going to have the ability to perform batch updates on its folksonomy tags (this is coming, but not quite here yet… Dan is working on it), I’m starting to wonder what I will do about my “cataloging” and “metadata” tags (should one be a variant of the other?)</p>
<p>I’ve argued, in the past, that metadata is just a renaming of (a small part of) what catalogers have been doing for a long time. If metadata and cataloging are the same thing, which term should we use? Using metadata is more ‘hip’ (and more savvy politically… e.g., we should be reaching out to them), but I don’t like that it doesn’t recognize our experience with the issues involved.</p>
<p>Perhaps this shouldn’t be necessary. We should make the concessions. We should build bridges rather than worry about who gets credit for the accomplishments (assuming anyone would care… we do play a supporting role after all).</p>
<p>The presentation also made me think about a “switchboard schema” that would allow us to move between the other various schemas. OCLC, Moen said, might be using MARC or MARCXML as the transition format (I know there is something up on their site, but I haven’t looked at it yet). I’m curious, now, about XOBIS’ potential to serve in that capacity (just for us… I have no illusions about world domination (well, okay, I have small ones)).</p>
<p>After all, we keep saying it is a ‘high fidelity’ schema. There would need to be a XOBIS database, though, since some of the mapping data would be in XOBIS content (not just in the schema). I still need to finish the beta version of the schema and get the current bibliographic references converted into XOBIS, and stored in eXist, before I start to think about this though.</p>
<p>Ross Singer, of Code4Lib (among other things), also had a very interesting idea about having multiple transition formats (depending on the source and target schemas). The switchboard, in this case, would negotiate the best path for a transformation. I think this would require humans to put that knowledge into the system… I can’t imagine a system that could handle that negotiation by itself (but maybe I’m not creative enough(?))</p>
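<p>To make the idea a little more concrete, here is a minimal Python sketch of such a switchboard. It assumes that people have already registered which direct crosswalks exist (the table below is made up for illustration, not a statement about which conversions really exist), and it treats “best” as simply “fewest hops,” which is a stand-in for whatever measure of fidelity a real system would need.</p>
<blockquote><pre>
from collections import deque

# Hypothetical registry of direct crosswalks, entered by humans:
# each source schema maps to the schemas it can be converted to in one step.
CROSSWALKS = {
    "MARC": ["MARCXML"],
    "MARCXML": ["MARC", "MODS", "XOBIS"],
    "MODS": ["MARCXML", "DC"],
    "XOBIS": ["MARCXML"],
    "DC": [],
}

def transformation_path(source, target):
    """Breadth-first search for the shortest chain of crosswalks.

    Returns the list of schemas to pass through, or None if no chain exists.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_schema in CROSSWALKS.get(path[-1], []):
            if next_schema not in seen:
                seen.add(next_schema)
                queue.append(path + [next_schema])
    return None
</pre>
</blockquote>
<p>With this table, transformation_path("XOBIS", "DC") returns ["XOBIS", "MARCXML", "MODS", "DC"], while transformation_path("DC", "MARC") returns None because nothing leads out of DC. The negotiation itself is mechanical; the human knowledge lives in the crosswalk table.</p>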
<p>Also, somewhat related, after my XOBIS talk, I spoke with a person who characterized XOBIS in an interesting way… he described it as an island. An island on which everything is tightly integrated and logical, but still separate from the world at large. I think it is a fair characterization. It would be ideal if everyone lived on this island but, as Moen said, this isn’t going to happen.</p>
<p>This doesn’t mean we shouldn’t develop XOBIS. If it does what it is supposed to, and does it well, it is worth the investment. We should also focus on building that bridge though.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2005/06/30/metadata-interoperability/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Versioning Metadata</title>
		<link>http://weblog.kevinclarke.info/2005/04/29/versioning-metadata/</link>
		<comments>http://weblog.kevinclarke.info/2005/04/29/versioning-metadata/#comments</comments>
		<pubDate>Fri, 29 Apr 2005 23:24:55 +0000</pubDate>
		<dc:creator>ksclarke</dc:creator>
				<category><![CDATA[Metadata]]></category>
		<category><![CDATA[XOBIS]]></category>

		<guid isPermaLink="false">http://kevinclarke.info/weblog/?p=231</guid>
		<description><![CDATA[At DLF a week or two ago, I heard an interesting idea that has, ever since, been bouncing around in my head a bit. The idea was to use XML namespaces as a way to version metadata values (yes, you heard me, values… not just the XML elements and attributes themselves (which is common)). Unfortunately, [...]]]></description>
			<content:encoded><![CDATA[<p>At DLF a week or two ago, I heard an interesting idea that has, ever since, been bouncing around in my head a bit. The idea was to use XML namespaces as a way to version metadata values (yes, you heard me, values… not just the XML elements and attributes themselves (which is common)). Unfortunately, as I write this, I don’t remember which presentation I heard it in (I wrote it down, but don’t have that slip of paper with me right now).</p>
<p>So, usually, an XML namespace is used like so:</p>
<blockquote><pre>
&lt;element_ns:first_element xmlns:element_ns="http://my.namespace.com/ns/1.0"&gt;
&lt;element_ns:second_element&gt;Metadata term&lt;/element_ns:second_element&gt;
&lt;/element_ns:first_element&gt;
</pre>
</blockquote>
<p>What the library at DLF suggested was to use a namespace to control the metadata itself:</p>
<blockquote><pre>
&lt;element_ns:first_element xmlns:element_ns="http://my.namespace.com/ns/elements/1.0" xmlns:metadata_ns="http://my.namespace.com/ns/metadata/1.0"&gt;
&lt;element_ns:second_element&gt;metadata_ns:Metadata term&lt;/element_ns:second_element&gt;
&lt;/element_ns:first_element&gt;
</pre>
</blockquote>
<p>The metadata namespace would be invisible to the XML processing tools (they don’t care if a data value is using a namespace that is defined in the XML), but the programmers at this library could write code that takes into consideration that data values are tied to a particular namespace (that they are, essentially, versioned). Perhaps different actions are taken with metadata from a particular version. (Before the data value is returned to the user, btw, its namespace prefix would be stripped via XSLT. Or, perhaps, in most cases, the namespace used could be the default namespace, so there is no prefix to append to (or strip from) the data value at all.)</p>
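<p>I don’t know how the library at DLF actually implemented this, but the application-side step might look something like the following Python sketch. The function name is hypothetical, and it assumes the prefix-to-namespace map has already been collected from the document’s xmlns declarations (however your XML parser exposes them).</p>
<blockquote><pre>
# Assumed to be gathered from the xmlns declarations in scope for the element.
PREFIXES = {
    "element_ns": "http://my.namespace.com/ns/elements/1.0",
    "metadata_ns": "http://my.namespace.com/ns/metadata/1.0",
}

def split_versioned_value(value, prefixes):
    """Split 'prefix:term' into (namespace URI, term).

    A value whose leading text is not a declared prefix is returned whole,
    paired with the default namespace (None if there is no default).
    """
    prefix, colon, term = value.partition(":")
    if colon and prefix in prefixes:
        return prefixes[prefix], term
    return prefixes.get(""), value
</pre>
</blockquote>
<p>Here split_versioned_value("metadata_ns:Metadata term", PREFIXES) gives back the metadata namespace URI and the bare “Metadata term,” so the prefix never has to reach the user. A value such as “Plain: value” is left alone because “Plain” is not a declared prefix.</p>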
<p>For example, imagine that we have two namespaces for our metadata: http://my.namespace.com/ns/metadata/20050120 and http://my.namespace.com/ns/metadata/20050601. In version 20050120, a term might have been “Surname”; in version 20050601, that term might be changed to “Family name.” Generalized “types” of changes (the type that LC makes) might warrant their own version bumps, too, perhaps. Or does metadata change too quickly for this whole thing to be useful?</p>
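<p>As a rough illustration of “different actions for different versions,” the Python sketch below upgrades a term from whatever version it was recorded under to the latest one. The change table and function name are invented for this example; only the two namespace URIs and the Surname/Family name change come from the scenario above.</p>
<blockquote><pre>
V20050120 = "http://my.namespace.com/ns/metadata/20050120"
V20050601 = "http://my.namespace.com/ns/metadata/20050601"

# Known versions, oldest first.
VERSIONS = [V20050120, V20050601]

# Hypothetical table: the terms that were renamed when each version was introduced.
CHANGES = {
    V20050601: {"Surname": "Family name"},
}

def upgrade_term(term, namespace):
    """Apply every rename introduced after the version the term was recorded in.

    Raises ValueError if the namespace is not a known version.
    """
    start = VERSIONS.index(namespace)
    for version in VERSIONS[start + 1:]:
        term = CHANGES.get(version, {}).get(term, term)
    return term
</pre>
</blockquote>
<p>So upgrade_term("Surname", V20050120) returns “Family name,” while a term already recorded under 20050601 comes back unchanged. If versions pile up quickly, the table grows with them, which is exactly the worry raised above.</p>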
<p>Of course, the whole thing made me think of XOBIS. In a XOBIS record, there are three parts. The first part, ControlData, is used to record information about the record itself. There is an Actions element that allows changes to the record to be recorded. If the record is edited, the date of this change is recorded in an Action element (and a descriptive note can be added to detail the change). We do not attach anything to the Principal Element of the record (the representation of the thing the record is representing) because the version of a term is about the record (not about the thing the record represents).</p>
<p>Keeping a record of the differences between two versions of an XML record (via XUpdate and a native XML database) provides a real-world use for these XOBIS actions. If an Action element says that a record was changed on 2005/1/23, we can use that date to retrieve the XUpdate result between that date and the current version (or between that date and the next date on which it was updated, which would be another Action in the ControlData’s Actions element). Either, in most cases I believe, would involve a series of XUpdates; these would be stored in the database and associated with the record. In this way, we could see the difference between a Principal Element in one version of a record vs. another.</p>
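<p>The lookup described here could be sketched in Python as follows. This is only an illustration: the list of Action dates, the XUpdate identifiers, and the function name are all hypothetical, and nothing here talks to a real XML database.</p>
<blockquote><pre>
from bisect import bisect_right
from datetime import date

# Hypothetical change log for one record, oldest first:
# (date from an Action element, id of the XUpdate stored for that change).
ACTIONS = [
    (date(2005, 1, 23), "xupdate-001"),
    (date(2005, 3, 2), "xupdate-002"),
    (date(2005, 4, 29), "xupdate-003"),
]

def updates_since(actions, since):
    """Return the ids of the XUpdates applied after the given date, oldest first.

    Replaying them in order against the version as of that date
    would yield the current version of the record.
    """
    dates = [when for when, _ in actions]
    first_later = bisect_right(dates, since)
    return [xupdate_id for _, xupdate_id in actions[first_later:]]
</pre>
</blockquote>
<p>Calling updates_since(ACTIONS, date(2005, 1, 23)) returns ["xupdate-002", "xupdate-003"], the series that separates that version from the current one; taking only the first item gives the difference up to the next recorded Action.</p>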
<p>With XOBIS, this is done explicitly in the record’s ControlData section (and via XUpdate), versus associating version information with the data values themselves. Still, the whole concept of using the namespace for this purpose is interesting (and more interesting if it is the default namespace for the document. Of course, if you have multiple data values in a document that belong to different namespaces, you will have to use explicit prefixes, but still…)</p>
<p>Anyway, I wish I could remember which library was doing this. I’d like to visit their site and learn a little more about how they use this information in their processing.</p>
]]></content:encoded>
			<wfw:commentRss>http://weblog.kevinclarke.info/2005/04/29/versioning-metadata/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
