This weekend I reimplemented the XMLReader and XMLWriter classes in ruby-marc using Libxml-Ruby, a Ruby layer over the Libxml2 C library.

Currently, ruby-marc uses REXML, a pure Ruby XML library. Since REXML is built into Ruby, it is convenient. I was curious, though, how much of a performance boost there would be from using Libxml2. Here are the results of my very informal test (using some HCL MARC data):

                       User     System      Total       Real
XMLReader [old]:  24.300000   0.030000  24.330000  25.607547
XMLReader [new]:   3.180000   0.010000   3.190000   3.231896
XMLWriter [old]:  38.960000   0.060000  39.020000  41.017238
XMLWriter [new]:  11.950000   0.050000  12.000000  12.607114

Both XMLWriter timings include the time the new XMLReader spends reading records from a source file: as each record is read, it is written out to a new file. This is only meant to give a rough sense of the difference between the two versions, not to be a formal benchmark. Lower numbers are better.
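For anyone who wants to run a similar informal comparison, here is a minimal sketch using Ruby's standard Benchmark module, which prints the same user/system/total/real columns. The two lambdas are placeholder workloads I made up for illustration, not the actual readers; in a real run each one would loop over a MARC XML file with the old or new implementation. The `time_each` helper is also just a hypothetical name.

```ruby
require 'benchmark'

# Placeholder workloads (assumptions, not the real XML readers).
OLD_IMPL = -> { 20_000.times.map { |i| i.to_s }.join.length }
NEW_IMPL = -> { 20_000.times.sum { |i| i.to_s.length } }

# Time each labeled callable once, print a report line for it,
# and return the Benchmark::Tms results keyed by label.
def time_each(candidates)
  results = {}
  Benchmark.bm(18) do |bm|
    candidates.each do |label, work|
      results[label] = bm.report("#{label}:") { work.call }
    end
  end
  results
end

TIMINGS = time_each(
  'XMLReader [old]' => OLD_IMPL,
  'XMLReader [new]' => NEW_IMPL
)
```

A single run like this is noisy, so it is only good for spotting large differences.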

In the process, I completely rewrote the reader. It simply reads from a file and returns MARC::Record objects, so the library used to parse the XML can be swapped for anything else.
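To illustrate what I mean by swappable, here is a rough sketch, not the actual ruby-marc code. `SketchRecord`, `SketchReader` and `REXML_BACKEND` are hypothetical names, and the stand-in record holds only a leader. The reader knows nothing about XML; it hands the input to whatever backend it was given and wraps the results in record objects.

```ruby
require 'rexml/document'
require 'stringio'

# Hypothetical stand-in for MARC::Record, holding only the leader.
SketchRecord = Struct.new(:leader)

# One possible backend: parse the whole input with REXML and
# return the text of every record's leader element.
REXML_BACKEND = lambda do |xml|
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//record/leader').map(&:text)
end

# The reader only depends on the backend being callable.
class SketchReader
  include Enumerable

  def initialize(io, backend = REXML_BACKEND)
    @io = io
    @backend = backend
  end

  # Yields one SketchRecord per leader found by the backend.
  def each
    return enum_for(:each) unless block_given?
    @backend.call(@io.read).each { |leader| yield SketchRecord.new(leader) }
  end
end
```

Passing a different callable as the second argument is all it takes to switch parsers, and callers still only ever see record objects.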

With the writer, I changed the encode method so that it now takes an option specifying which library should be used (REXML is still the default). Since the method is public, I figured someone is probably using the REXML Documents it returns, and their code would break if I returned a Libxml Document instead. The write method, on the other hand, now uses Libxml by default.
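The idea of keeping REXML as the default while letting callers opt in to Libxml could look roughly like the sketch below. This is not the real encode method: `EncodeSketch`, the `library:` keyword and the leader-only record are all assumptions made for illustration. The Libxml branch assumes the libxml-ruby gem is installed and is only loaded when requested.

```ruby
require 'rexml/document'

module EncodeSketch
  # Build a <record> document containing only a leader element,
  # using the requested XML library. REXML stays the default so
  # existing callers keep getting a REXML::Document back.
  def self.encode(leader, library: :rexml)
    case library
    when :rexml
      doc = REXML::Document.new
      doc.add_element('record').add_element('leader').text = leader
      doc
    when :libxml
      require 'libxml' # assumption: the libxml-ruby gem is available
      doc = LibXML::XML::Document.new
      doc.root = LibXML::XML::Node.new('record')
      doc.root << LibXML::XML::Node.new('leader', leader)
      doc
    else
      raise ArgumentError, "unknown XML library: #{library.inspect}"
    end
  end
end
```

Calling `EncodeSketch.encode(leader)` behaves as before, while `EncodeSketch.encode(leader, library: :libxml)` returns a Libxml document for callers that ask for one.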

I haven’t checked in any of these changes yet (since I haven’t passed them by Ed and don’t know whether they should be incorporated), but I have validated that the existing tests still pass just fine.

The speed improvements are pretty nice. If an extra dependency can be tolerated, the performance boost would be worth having. The only other caveat is that I used the 0.4.0pre01 version of Libxml-Ruby, so it might be desirable to wait until the final 0.4.0 release.

Anyway, I’ll get Ed’s opinion on all this sometime next week. Right now, it is just a fun experiment.