Word smatter

I program, speak, and write, primarily about things Solr and Lucene related. I'm a member of the technical staff (and co-founder) at Lucid Imagination.

Books, books, books

I’ve played with a sample of my personal book data in the past, and really wanted to get my entire home book collection, once (but by no stretch for all), into usable shape as a fun dataset to feed into Solr.  Thanks to LibraryThing and Delicious Library, I have succeeded in very quickly getting the bulk of the books in my house scanned.  First, I purchased a Cue Cat scanner from LibraryThing.  Then I scanned the books into my lifetime LibraryThing account.  Unfortunately, the export from LibraryThing doesn’t give me as much info as I’d like (subject headings/genres), so I moved the LibraryThing export (after some basic manual massaging in a spreadsheet) into Delicious Library.  I’m not sure why, but Delicious Library 2 wouldn’t refresh the data from Amazon, while Delicious Library 1 did.  I then exported that to a tab-delimited file, which is easy to index into Solr.
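As a rough illustration of that last step, here is a minimal Python sketch that turns tab-delimited text into Solr-style documents.  The column names (isbn, title, author, genres), the comma-separated genres convention, and the sample rows are all made up for the example; the real Delicious Library export will have its own headers.

```python
import csv
import io
import json


def tsv_to_solr_docs(tsv_text, multi_valued=("genres",), multi_sep=","):
    """Convert tab-delimited text (with a header row) into a list of dicts.

    Field names come from the header row. The multi_valued columns are
    hypothetical: they are split on multi_sep into lists of values.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    docs = []
    for row in reader:
        doc = {}
        for field, value in row.items():
            if field is None:
                continue  # row had more cells than the header; ignore extras
            value = (value or "").strip()
            if not value:
                continue  # leave empty cells out of the document
            if field in multi_valued:
                doc[field] = [v.strip() for v in value.split(multi_sep) if v.strip()]
            else:
                doc[field] = value
        docs.append(doc)
    return docs


# Made-up sample data, not rows from the actual export.
sample = (
    "isbn\ttitle\tauthor\tgenres\n"
    "0000000001\tExample Title\tA. Author\tComputers, Search\n"
    "0000000002\tAnother Title\tB. Writer\t\n"
)

docs = tsv_to_solr_docs(sample)
payload = json.dumps(docs, indent=2)
print(payload)
```

The resulting JSON array is the sort of thing that could be sent to a Solr update handler, assuming the schema defines matching fields.  Solr can also take delimited files directly through its CSV update handler with a tab separator, which is probably less work for a one-off load like this.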

I had guessed there’d be around 1000 books in my house, but that turned out to be a bit high: I ended up scanning around 680.  A number of books aren’t in the Amazon or Library of Congress lookups, though, so my LibraryThing collection isn’t entirely complete.

I’ve added a LibraryThing sidebar on my Tumblr site, to show some random covers.

Now the fun begins: doing something with this data.  First stop will be to use it in my Solr Boot Camp class next week at ApacheCon EU.

I took some book shelf photos that I’ve uploaded to Flickr.

I love books!