Howto: Experiment with Clojure, Eclipse, and Maven

So… I’ve gotten interested in Clojure and, since I’m a native Eclipse/Maven user, my first instinct was to try out Clojure from within my old familiar environment…. Thinks I to myself, “That shouldn’t be too hard since Clojure runs on the JVM.” Luckily for me, it turns out to be very easy to get clojure-dev (the Eclipse Clojure plugin) working with m2eclipse (the Eclipse Maven plugin I use).

First, I installed the Eclipse Clojure plugin. Installation was easy thanks to the Eclipse update site that they provide. I should note that I already had the Eclipse Maven plugin installed, but that can be installed in the same way. Next, I had to install the clojure jars that come with the clojure-dev plugin into my local maven repository.

Maven provides a standard way of installing third party jars. First, I cd’ed into my Eclipse plugins directory (mine is at /opt/eclipse/plugins), then I typed:

mvn install:install-file -Dfile=clojure_1.0.0/clojure.jar -DgroupId=org.clojure -DartifactId=clojure-lang -Dversion=1.0 -Dpackaging=jar

The clojure jar is the one that comes with clojure-dev, version 0.0.36; later versions of the plugin might include different versions of clojure. Next, I wanted to install clojure’s contributed libraries, so I typed:

mvn install:install-file -Dfile=clojurecontrib_0.0.0.20090504_r756/clojure-contrib.jar -DgroupId=org.clojure -DartifactId=clojure-contrib -Dversion=0.0.0.20090504_r756 -Dpackaging=jar

Like the main jar, the name of the contributed jar (and its parent directory) will most definitely change over time. So, with this done, I should be ready to go.
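Since the same install:install-file incantation comes up for every jar that isn't in a public repository, it might be worth wrapping it in a small shell function. Here is a minimal sketch. The function name install_file_cmd is made up for illustration, and it only prints the command so you can look it over before running it:

```shell
# Hypothetical helper (not part of Maven): build the install:install-file
# command line for a jar that has to be added to the local repository by hand.
install_file_cmd() {
  # $1 = path to the jar, $2 = groupId, $3 = artifactId, $4 = version
  printf 'mvn install:install-file -Dfile=%s -DgroupId=%s -DartifactId=%s -Dversion=%s -Dpackaging=jar\n' \
    "$1" "$2" "$3" "$4"
}

# Usage: print the command, review it, then pipe it to sh to run it, e.g.
#   install_file_cmd clojure_1.0.0/clojure.jar org.clojure clojure-lang 1.0 | sh
```

Printing rather than executing is just a cautious default; the jar paths and versions will differ with other releases of the plugin.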

All I needed to do then was to modify my pom.xml to use the clojure jars (and any other jars I might want to use).  I added a dependencies section to my pom.xml:

<dependencies>
  <dependency>
    <groupId>org.clojure</groupId>
    <artifactId>clojure-lang</artifactId>
    <version>1.0</version>
  </dependency>
  <dependency>
    <groupId>org.clojure</groupId>
    <artifactId>clojure-contrib</artifactId>
    <version>0.0.0.20090504_r756</version>
  </dependency>
  <dependency>
    <groupId>org.ccil.cowan.tagsoup</groupId>
    <artifactId>tagsoup</artifactId>
    <version>0.9.7</version>
  </dependency>
</dependencies>

Then I created a .clj file with a simple test:

(ns info.freelibrary.test)
(. (new org.ccil.cowan.tagsoup.Parser) (toString))

It worked! So, the next step was to see if adding a jar with just .clj files would work (I had no reason to think it wouldn’t). I downloaded enlive and then jar’ed up the contents of the src directory.
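For what it's worth, jar'ing up a source directory can be done with the JDK's jar tool. This is a sketch rather than a record of exactly what I typed. The helper name make_source_jar is hypothetical, and it assumes the .clj files sit under the source directory in paths that mirror their namespaces:

```shell
# Hypothetical helper: package a directory of .clj sources as a jar.
make_source_jar() {
  # $1 = source directory, $2 = name of the jar to create
  # -C changes into the source directory first, so the namespace
  # directories end up at the root of the jar rather than under src/.
  jar cf "$2" -C "$1" .
}

# Usage (from the top of the enlive checkout):
#   make_source_jar src enlive.jar
```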

The next step was to add that new jar (enlive.jar) to the local maven repository which I did with:

mvn install:install-file -Dfile=enlive.jar -DgroupId=net.cgrand.enlive-html -DartifactId=enlive-html -Dversion=1.0-SNAPSHOT -Dpackaging=jar

Then I added a new dependency to my pom.xml by including:

<dependency>
  <groupId>net.cgrand.enlive-html</groupId>
  <artifactId>enlive-html</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>

Next, to test that everything was working, I modified my .clj file to:

(ns info.freelibrary.test
  (:refer-clojure :exclude [empty complement])
  (:use net.cgrand.enlive-html))

(select
  (html-resource (java.net.URL. "http://library.appstate.edu/"))
  [:#main [:a (attr? :href)]])

And I ran it from within Eclipse using Ctrl-L. It ran and gave me valid output. Hurray! So, that’s it… I can now continue experimenting with Clojure using Eclipse and the Maven plugin.

Caveat: Though the above worked, the tagsoup dependency for enlive is really 1.2 (but there is no tagsoup-1.2 in the maven repository). I used an older version for testing but you can, of course, add the 1.2 version to your local maven repository in the same way you added the clojure jars (above).

July 5, 2009 • Posted in: Clojure • No Comments

Howto: Saving an XCF with Layers to a PDF with Pages

I’m surprised that there isn’t an easier way to go from a Gimp file (.xcf) to a PDF.  Sure, you can always “print to pdf” if you are working with a single layer image, but what if you have a multi-layer image that you want to turn into a PDF with multiple pages (each page being a layer from the image)?

Here is one way that I’ve found to accomplish this.  I’m using Ubuntu so any install stuff will be specific to that distribution, but the software I’m using should work on any Linux distro.

First, you’ll need Gimp.  I’m assuming that’s already installed.

Gimp won’t save a multi-layer image to a .ps, .tif, or .pdf by itself, though, so you need to install a script called “Save Layers as Individual Files” (this script can be downloaded for Gimp 2.4 or newer from Panotools).

Once you download this script it needs to be put in your Gimp scripts directory.

unzip -d ~/.gimp-2.6/scripts Save-layers-tiff-24.zip

Your scripts directory may be named something else if you are using another version of Gimp (other than 2.6). Once the script is in that directory, it will appear in the Script-Fu > Utils menu within Gimp (and can be applied to any open image).

Next, you need to install ImageMagick. If you don’t already have it installed, it’s as easy (on Ubuntu) as:

sudo aptitude install imagemagick

Once that is installed, you’ll be able to use the mogrify program which comes with ImageMagick. From within the directory that contains all your TIF files, type:

mogrify -format pdf *.tif

This will generate a PDF for each of your TIF files. You can then merge all the PDF files into one using a program called PDFTK. To install that, just type:

sudo aptitude install pdftk

Running that program is as easy as typing:

pdftk filename*.pdf cat output singlename.pdf

The filename*.pdf argument will catch all the individually named files created by the mogrify program (filename1.pdf, filename2.pdf, filename3.pdf, filename4.pdf, etc.)

And, that’s it! You can open your new singlename.pdf file and have all those XCF layers now represented by individual pages within the PDF. This is the easiest way that I’ve found to accomplish this task, but if you know of a better/easier way I’d love to hear it!
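One thing to watch for: the shell expands filename*.pdf in lexical order, so with ten or more layers filename10.pdf would land before filename2.pdf in the merged file. Here is a sketch of one way around that. It assumes GNU sort (for its -V version sort) and file names without spaces, and pages_in_order is a name I made up:

```shell
# Hypothetical helper: list the per-layer PDFs in page order, so that
# filename10.pdf follows filename9.pdf rather than filename1.pdf.
pages_in_order() {
  # $1 = filename prefix shared by the per-layer PDFs
  printf '%s\n' "$1"*.pdf | sort -V
}

# Usage: hand the ordered list to pdftk instead of the bare glob
#   pdftk $(pages_in_order filename) cat output singlename.pdf
```

With nine or fewer layers the plain glob and this function give the same order, so the extra step only matters for larger images.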

July 1, 2009 • Posted in: Linux • 3 Comments

The Foundation for Librarianship

Is there a commonality that all librarians share?  Some would say that instruction is the commonality in librarianship.  I disagree.  My choice would be the Five Laws of Library Science (adjusted to account for non-book materials).

To boil down the five laws even further, one might say that a service-orientation is the central component of librarianship.  This service-orientation doesn’t have to be manifested in face-to-face interactions, though.  In fact, I’d suggest that the majority of the points of service in a library do not involve a librarian or library staff member interacting with a patron.  I don’t say this to diminish the importance of the work that reference librarians do, but just to note that anytime a patron interacts with any of the library’s systems (classification system, website, catalog, desktop computers, etc.) there is a point of service.

In light of this, all teams (departments) in a library are public service oriented.  If a patron sits down at a computer in the library and doesn’t find the software she is looking for (or the system doesn’t work as expected (within reason)), there is a point of failure.  If a patron goes to find a book on the shelves and can’t find it because it has been misshelved, there is a point of failure.  If a patron searches the library’s website (or catalog) but doesn’t find the information she is looking for because of the quality of the markup or content (or cataloging), there is a point of failure. Yes, reference librarians can help with all these points of failure (or find someone else who can help), but I’d wager the majority of patrons don’t seek them out.  We just expect things to work and move on if they don’t.

It’s interesting that librarians don’t (in my experience) measure the public service aspects of all the teams in the library (e.g., often we measure number of books reshelved, but not the patrons’ ability to successfully locate materials on our shelves… or, we measure the number of times we evaluate and reorganize a website, but not how many clicks it takes a patron to access a resource).  I’m sure this is, in part, because we choose to measure what is easier to measure (the low hanging fruit). But, since what is measured is often what’s valued, perhaps we’re doing ourselves a disservice?  That’s fodder for another post.

So, how do we know we are doing a good job? One way is certainly the anecdotal evidence gleaned from day-to-day interactions with our patrons. Another way is to ask people who have these interactions about their impressions (and encourage them to share their experiences). Another, still, is to observe patron behaviors through logs of their activities (click counts, time spent on a page, etc.). One might also find useful information about patron needs and behaviors in the research of our field (often a mix of the first three methods). Another way, of course, is to do usability studies and/or to conduct surveys asking the patrons about their experiences. All these measurements, in my opinion, are individually flawed but work well in unison to give us an adequate (though still incomplete) picture of our patrons.

We don’t expect every librarian to have experience in all these information-gathering activities. The fifth law of library science is that the library is a growing organism. This means there is something organic about it and, I’d suggest, that like any organism it has different parts, each serving a different purpose for the whole. We don’t expect reference librarians to pore through web server logs analyzing their contents (or to catalog books as they come into the library). That’s because there is a certain set of skills required for this. Similarly, there are certain skills required to teach a course, conduct a reference interview, or prepare for a RAP session (planned one-on-one instruction).

One might make the case that anyone can learn these skills (suggesting, perhaps, that there isn’t as much skill involved as one would think), and it might be that anyone can learn to conduct a successful reference interview or write a program that integrates the library’s proxy server with the electronic resources on the library’s website.  If this is the case, then, we truly are all generalized (e.g., any one person in the library can do the job of any other).

I don’t think that is really the case though.  Sure, there are some activities that can be learned (and there are probably groups of activities that share commonalities — suggesting opportunities for cross-training), but to do something well takes a lot of time and practice.  The difference between a student programmer and a professional one is significant (in the amount of time it takes for a project to be completed and in the result that each produces).  I would think that the same is true of teaching.  There is a significant difference between a student teacher fresh out of school and one with ten years of experience (and activity in her field). Don’t we as librarians want experts assisting our patrons, creating our cataloging records, selecting books related to a particular field of study, creating our websites?

Sure, given time, many librarians (though I don’t think all) could become adequate instructors or scripters (or catalogers or acquisition librarians).  What concerns me is that in the meantime, there must be a prioritization of these activities (which means, by necessity, a deprioritization of other existing activities).  This is certainly within the rights of a library (to adjust priorities accordingly).  One might decide that instruction needs are more important than the needs of patrons using the library’s website (or trying to access the library’s special collections, many of which may be unprocessed and unavailable to patrons at the current moment).

To be sure, there is a balance that must be struck. I worry, though, about what gets deprioritized and also the message that it sends to librarians whose main activities don’t involve instruction. It seems to devalue these other activities, obscuring even further the important role that they play in the patrons’ ability to find the resources that they need. I used to say that cataloging was the foundation that libraries stood on. When I moved into more of a systems role I saw the systems with which patrons interacted as being paramount.

I can see how reference and instruction librarians see the world through instruction-tinted glasses. What we need, though, is perhaps not a generalist’s approach to library science but better communication between the different areas/responsibilities within the library (and better assessment of how our patrons’ needs are being met). We need a more organic library that responds to all the service needs that patrons have as they interact with our people, resources, and systems.

June 23, 2009 • Posted in: Librarianship • 4 Comments

Amazon Offers Public Datasets… Bibliographic?

Interesting news that Amazon is going to be offering large public datasets through its EC2 (Elastic Compute Cloud) web service. Some examples included will be the annotated human genome data and various US Census, transportation, and economic databases. I’ve got an idea for a dataset they could add… how about MARCXML records for all their books, videos, CDs, etc.? With their collection, that would be a pretty good-sized dataset. I’ve sent my suggestion in via email.

November 23, 2008 • Posted in: Metadata • No Comments

Note to Future Self

When Eclipse on your 64-bit Ubuntu/Dell laptop starts crashing on start, change the settings so that it doesn’t try to auto-compile the workspace. You’ll have to be fast and change the setting after it has started but before it gets to the “build all projects” stage. Don’t ask me how it got set to auto-build again. You probably did it in a moment of forgetfulness. Also, don’t try to figure out why Eclipse doesn’t start with the auto-build feature activated. You’ve tried this many times before. Save your time and just follow the steps above.
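Future self, there may also be a way to do this without racing the startup. As far as I know, the auto-build flag lives in a workspace preferences file under the key description.autobuilding, so it can be flipped while Eclipse is closed. Treat the file path below as an assumption and check it against your own workspace first. The helper name set_autobuild_off is made up:

```shell
# Assumed location of the setting, relative to the workspace directory.
PREFS_REL=.metadata/.plugins/org.eclipse.core.runtime/.settings/org.eclipse.core.resources.prefs

# Hypothetical helper: turn off automatic building for one workspace.
set_autobuild_off() {
  # $1 = path to the Eclipse workspace directory
  prefs="$1/$PREFS_REL"
  mkdir -p "$(dirname "$prefs")"
  touch "$prefs"
  # Drop any existing autobuilding line, then append the disabled one.
  grep -v '^description\.autobuilding=' "$prefs" > "$prefs.tmp" || true
  echo 'description.autobuilding=false' >> "$prefs.tmp"
  mv "$prefs.tmp" "$prefs"
}

# Usage (with Eclipse closed; back up the file first):
#   set_autobuild_off ~/workspace
```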

September 28, 2008 • Posted in: Linux • No Comments

Elsevier XQuery Challenge

Elsevier has been doing cool things with XQuery for a while now. Now, they are holding a contest where each contestant will get access to 7,500 full-text XML articles from Elsevier journals. The winner will be the one that can develop the best “unique yet useful web-based journal article rendering application.”

So, all you library-land XQuery hackers, start your (XQuery) engines. Here is the official story that came across the XQuery-Talk mailing list:

Elsevier Labs is inviting creative individuals who have wanted the opportunity to view and work with scientific journal article content on the web to enter the Elsevier Article 2.0 Contest. Each contestant will be provided online access to approximately 7,500 full-text XML articles from Elsevier journals, including the associated images, and the Elsevier Article 2.0 API to develop a unique yet useful web-based journal article rendering application.  The sample apps (including source code) we have provided on the Article 2.0 Contest web site were developed in XQuery.  While the contest does not mandate the use of XQuery, our experience has shown the technology is a natural fit for building these types of applications.

If you are interested in the contest, please visit the web site (http://article20.elsevier.com) and apply for an Article 2.0 API Key.

If you have any questions about the contest, drop us an email at info-article20@elsevier.com.

Very cool… Oh, did I mention first prize is $4000, second prize is $2000, and third prize is $1000?  Not too shabby!

September 23, 2008 • Posted in: XQuery • No Comments

12 Seconds

I’ve seen a few folks tweeting lately with a “12seconds” preface (followed by a link). I haven’t clicked on any before because (usually) there isn’t much in a 12seconds tweet other than a link — I need a little more incentive to click on a link in a random tweet. Anyway, my click today took me to a site where people post short 12 second videos of themselves.

If you’ve been reading this weblog for any amount of time at all, you know I tend to ramble on and on. I’ve semi-recently started using Twitter and I find it very interesting (because I’m limited to 140 characters — it’s not so easy for me to confine myself to that amount of space, but I find it works well for some topics (more newsy)).

The 12 second videos seem to be the video equivalent of a tweet. It’s an interesting space. I don’t know if anyone else is doing it (most of the stuff on YouTube is longer (and not that interesting to me)). I know Flickr has recently introduced short videos, but I’m not sure they’re tackling the same space. It seems like 12 Seconds is much more intentionally trying to be a video Twitter, whereas Flickr is just adding short videos as a (cool) afterthought.

I have to admit I’m more of a printed text boy. I don’t follow podcasts and I don’t look at YouTube very often. Yes, I listen to music, but when I want to digest something in a form other than music, I prefer to read about it. Tweets work well for me for this reason, but I’m not sure 12 second videos would. Still, there is something about it that interests me. I’ve signed up to get an invite (I have no idea how long this will take).

I don’t know what I’d be able to do with 12 seconds since I’m a rather slow thinker, but it might be fun to try. If I do give it a shot, I’ll post something here with a link. Perhaps I’m interested, in part, because of the strict time limitation. It seems like a spontaneous haiku… a stream of consciousness haiku? I guess it would be possible to give a great deal of thought to your 12 seconds and to produce a very polished video (some folks on the site seem to be doing this). I don’t think that’s how I’d use it though.

Though I’m not really that interested in video blogging, I might give it a try if/when I get an invite. Unrelated, I find the whole “invite” thing interesting (psychologically speaking), but that’s probably fodder for another post.

Flickr and Capital One Mashup

An interesting mashup… Capital One now allows you to use one of your Flickr photos as the image for your credit card. I really love this idea. I can create a card with the Yale Library catalog, an image from my trip to the Netherlands, or a picture of my kids.

I’m not a huge fan of credit cards, but I like this level of customization. I’d like it even more if my bank card did this. Maybe, though, with a picture of my kids on my credit card, I’ll be less likely to use it (reminded I should be saving for college or something else less frivolous than most of my credit card expenditures).

I’m testing out this service at the moment and it seems to be struggling to get some of my Flickr images. Since the email just arrived perhaps they’re experiencing a lot of traffic at the moment. For those interested in experimenting (I don’t think you already need to have a card to see one of your Flickr images transposed), you might want to wait a bit.

July 30, 2008 • Posted in: Social Software • No Comments

Wordle

My delicious links as a Wordle image:

June 22, 2008 • Posted in: Memes, Worklife • No Comments

Code4Lib Epiphany

I had a bit of an epiphany lately in my thinking about Code4Lib (what it is, what it should be, etc.) It’s all thanks to a post by Ed (who has the ability to shift my thinking every now and then).

I’ve always been of the mind that Code4Lib is an experiment… that it shouldn’t be centralized, organized, etc., and it’s colored my thoughts on other Code4Lib-ish issues (conferences, projects, etc.) I’ve ranted on at times that all these Code4Lib-ish things should be as Code4Lib is. I don’t think so anymore. In fact, I think thinking in this way was sort of putting the cart before the horse. I think thinking like that was actually trying to see Code4Lib in a centralized way — see it as a single thing.

My new approach to Code4Lib is that, if I don’t have a strong enough preference about something to actually get involved with it, I’m going to refrain from commenting unless, of course, opinions are solicited. Then I’ll just offer an opinion and go on my way. Take, for instance, the Code4Lib Planet. Sure, I have opinions but, no, it’s not something I really want to take up at this time. For that reason, I should just let the editors do what they want (which is what Ed said in his post basically).

In a way, this is approaching activities in the same way that I would an open source project. So what if the editors do something I don’t find useful? There is nothing stopping me from setting up my own Planet of Code4Lib authors for my own use if I feel so strongly about an issue. If I find an open source project that is close to what I want, but not quite there, it makes more sense to use it as I need to, modifying it as needed (rather than to try and sway the project’s owners away from their well-considered path).

This doesn’t mean I shouldn’t offer suggestions if solicited, or provide feedback about how something might be more useful to me as a random user, but there’s no need to feel some sort of distributed ownership over anything just because it’s Code4Lib-related. That just gets in the way of those who are already doing great work. If there is a project related to Code4Lib that I want to work on, I can work out the sticky issues with those who also want to put in the time.

As with most epiphanies, this isn’t really a big revelation. Most people were probably already thinking this way. I guess when you realize you’ve been deluded, though, it seems like a burst of light, voices on high, or something like that.

May 28, 2008 • Posted in: Code4Lib • No Comments