Versioning Metadata
At DLF a week or two ago, I heard an interesting idea that has been bouncing around in my head ever since. The idea was to use XML namespaces to version metadata values (yes, you heard me, values… not just the XML elements and attributes themselves, which is the common use). Unfortunately, I don’t remember which presentation I heard it in. I wrote it down, but I don’t have that slip of paper with me right now.
Usually, an XML namespace is used like this:
<element_ns:first_element xmlns:element_ns="http://my.namespace.com/ns/1.0">
  <element_ns:second_element>Metadata term</element_ns:second_element>
</element_ns:first_element>
What the library presenting at DLF suggested was to use a second namespace to qualify the metadata values themselves:
<element_ns:first_element
    xmlns:element_ns="http://my.namespace.com/ns/elements/1.0"
    xmlns:metadata_ns="http://my.namespace.com/ns/metadata/1.0">
  <element_ns:second_element>metadata_ns:Metadata term</element_ns:second_element>
</element_ns:first_element>
The metadata namespace would be invisible to XML processing tools: to a parser, “metadata_ns:Metadata term” is just an ordinary text string, whether or not that prefix happens to be declared in the document. The programmers at this library, however, could write code that treats each data value as tied to a particular namespace (that is, as versioned). Perhaps different actions would be taken with metadata from a particular version. Before a data value is returned to the user, by the way, its namespace prefix would be stripped (via XSLT). Or, perhaps, in most cases the metadata namespace could be the default namespace, so there would be no prefix to add to (or strip from) the data value at all.
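Here is a minimal Python sketch of what that code might look like. It is my own illustration, not the library’s: it assumes Python’s standard xml.etree.ElementTree, and the helper names (parse_with_namespaces, resolve_value) are hypothetical. Because ElementTree resolves prefixes on element names but throws away the xmlns declarations themselves, the sketch collects them from iterparse’s "start-ns" events and then uses that map to split a value like “metadata_ns:Metadata term” into its namespace URI and the bare term. The stripping step the library does in XSLT is done here in Python, just for illustration.

```python
import io
import xml.etree.ElementTree as ET

ELEMENT_NS = "http://my.namespace.com/ns/elements/1.0"

# The second example from above, with a namespace-prefixed data value.
DOC = """<element_ns:first_element
    xmlns:element_ns="http://my.namespace.com/ns/elements/1.0"
    xmlns:metadata_ns="http://my.namespace.com/ns/metadata/1.0">
  <element_ns:second_element>metadata_ns:Metadata term</element_ns:second_element>
</element_ns:first_element>"""


def parse_with_namespaces(xml_text):
    """Parse xml_text and return (root element, {prefix: namespace URI}).

    ElementTree resolves prefixes on element names but discards the xmlns
    declarations, so they are captured from 'start-ns' events instead.
    Assumes no prefix is re-bound to a different URI inside the document.
    """
    ns_map = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            ns_map[prefix] = uri
        elif root is None:
            root = item  # the first 'start' event is the root element
    return root, ns_map


def resolve_value(text, ns_map, default_uri=None):
    """Split 'prefix:value' into (namespace URI, bare value).

    Text before the first colon counts as a prefix only if it was declared,
    so ordinary values such as 'Note: see below' pass through unchanged
    (returned with default_uri, which may be None).
    """
    prefix, sep, rest = text.partition(":")
    if sep and prefix and prefix in ns_map:
        return ns_map[prefix], rest
    return default_uri, text


root, ns_map = parse_with_namespaces(DOC)
value_elem = root.find(f"{{{ELEMENT_NS}}}second_element")
uri, term = resolve_value(value_elem.text, ns_map)
print(uri)   # http://my.namespace.com/ns/metadata/1.0
print(term)  # Metadata term
```

Only text before the first colon that matches a declared prefix is treated as a namespace, so a value that merely contains a colon is left alone. The default_uri parameter covers the case where the metadata namespace is the default and values carry no prefix at all.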
For example, imagine that we have two namespaces for our metadata: http://my.namespace.com/ns/metadata/20050120 and http://my.namespace.com/ns/metadata/20050601. In version 20050120, a term might have been “Surname”; in version 20050601, that term might have been changed to “Family name.” General types of changes (the kind LC makes) might warrant their own version bumps, too. Or does metadata change too quickly for this whole approach to be useful?
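A sketch of how that version-specific handling might work, again in Python and entirely hypothetical: each metadata namespace URI stands for a version, and a small crosswalk table (UPGRADES, my own invention) records which terms were renamed on the way to the next version. A value tagged with an older namespace is walked forward until it reaches the newest one.

```python
V20050120 = "http://my.namespace.com/ns/metadata/20050120"
V20050601 = "http://my.namespace.com/ns/metadata/20050601"

# Hypothetical crosswalk: for each older version, the version that replaced
# it and the terms that were renamed along the way. The chain must not loop.
UPGRADES = {
    V20050120: (V20050601, {"Surname": "Family name"}),
}


def upgrade(uri, term):
    """Follow the upgrade chain until the term is in the newest version.

    Terms with no rename entry are carried forward unchanged.
    """
    while uri in UPGRADES:
        next_uri, renames = UPGRADES[uri]
        term = renames.get(term, term)
        uri = next_uri
    return uri, term


print(upgrade(V20050120, "Surname"))
# ('http://my.namespace.com/ns/metadata/20050601', 'Family name')
```

Keeping the crosswalk one step at a time means a new version only needs a table against its immediate predecessor, which seems like the right shape if LC-style changes each get their own version bump.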
Of course, the whole thing made me think of XOBIS. An XOBIS record has three parts. The first part, ControlData, holds information about the record itself, including an Actions element that logs changes to the record: each time the record is edited, the date of the change is stored in an Action element, optionally with a descriptive note detailing the change. We do not attach anything to the record’s Principal Element (the representation of the thing the record describes), because a version belongs to the record, not to the thing the record represents.
Keeping track of the differences between versions of an XML record (via XUpdate and a native XML database) gives these XOBIS actions a real-world use. If an Action element says that a record was changed on 2005/1/23, we can use that date to retrieve the XUpdates between that date and the current version, or between that date and the next time the record was updated (which would be recorded in another Action in the ControlData’s Actions element). In most cases, I believe, either would involve a series of XUpdates, stored in the database and associated with the record. In this way, we could see how a Principal Element differs between one version of a record and another.
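To make the retrieval concrete, here is a rough Python sketch. It assumes the Action dates have already been read from the record’s ControlData and that each stored XUpdate is keyed by the date of the edit it describes; the history list, its placeholder strings, and the helper names are all made up, and nothing here uses a real XML database API.

```python
from datetime import date

# Hypothetical store for one record: the XUpdate saved for each edit,
# keyed by the edit date (the same date recorded in that edit's Action).
history = [
    (date(2005, 1, 23), "xupdate-2005-01-23"),
    (date(2005, 3, 2), "xupdate-2005-03-02"),
    (date(2005, 6, 14), "xupdate-2005-06-14"),
]


def diffs_between(history, start, end=None):
    """XUpdates that take the record from its state on `start` to `end`.

    Includes edits dated after `start` and on or before `end`; with
    end=None, returns everything up to the current version, oldest first.
    """
    return [xu for when, xu in sorted(history)
            if when > start and (end is None or when <= end)]


def next_action_date(history, start):
    """Date of the first edit after `start`, or None if there is none."""
    later = sorted(when for when, _ in history if when > start)
    return later[0] if later else None


since = date(2005, 1, 23)
print(diffs_between(history, since))
print(diffs_between(history, since, next_action_date(history, since)))
```

Applying the returned XUpdates in order, oldest first, would rebuild the later version from the earlier one, which is what lets us compare a Principal Element across versions.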
With XOBIS, this is done explicitly in the record’s ControlData section (and via XUpdate), rather than by associating version information with the data values themselves. Still, using a namespace for this purpose is an interesting concept, and more interesting still if it is the document’s default namespace. Of course, if a document contains data values that belong to different namespaces, you will have to use explicit prefixes, but still…
Anyway, I wish I could remember which library was doing this. I’d like to visit their site and learn a little more about how they use this information in their processing.