Prepping for a Books Hackathon


So a few weekends ago I helped organize a really cool “Literary Hackathon” called CODEX, held at the MIT Media Lab and created by Jennifer 8. Lee of Plympton. I wrote previously about the hack I made, Book Playlist, which lets people collaboratively build playlists for their favorite books (which btw became a featured Hunt on Product Hunt! Yay!).

Today I want to talk about the technical resources we made available for hackathon participants. CODEX is unique in that the attendees are much more multi-disciplinary and diverse than at your average hackathon - developers, of course, but also writers, graphic designers, visual designers, professors, executives, and documentary film-makers, all drawn by the opportunity to mingle with bookish folks and create something fun around a common passion, books. The attendees also varied wildly in experience level - we had everyone from CEOs to experienced mid-career engineers to high school and college students.

As a result, we wanted to provide a good set of technical materials for people to work with, so they didn’t spend hours trying to pull together resources for their hacks. We grouped our resources into a few categories: Sponsor APIs, Books Data, Media APIs, Epub & Bookreader tools, Input Devices, Publishing Tools, and a grab-bag of handy tools (data scrapers & NLP toolkits).

CODEX was made possible by grants and support from a number of sponsors, including Google, The Harvard Bookstore, Plympton, MailChimp, Automattic, Timeline, The Hawaii Project (yours truly!), Pressbooks, BookBub, Amazon Books, and The New York Times. We put our sponsors’ tools and APIs front and center so folks saw them. Where possible we arranged for demo keys to be created beforehand and handed out to attendees, so they didn’t waste time getting keys during the hackathon.

We next wanted teams to have great books data available to them. If you’ve ever tried to do much with books, you know books metadata is something of a disaster - the data is scattered, disorganized, and inconsistently tagged, among many other problems. So we gave people pointers to a variety of data sources in hopes they’d find something that matched their needs, including the usual suspects like Amazon Books, Google Books, and the Goodreads API, as well as more open data sets like OpenLibrary, the Harvard Library Open Metadata project, Project Gutenberg, and many others.
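To give a flavor of the "inconsistently tagged" problem: one source might hand you a hyphenated ISBN-10 while another hands you a bare ISBN-13 for the same book. Here's a minimal sketch of one way to get both into a common form before trying to match records. The function names (`normalize_isbn`, `isbn13_check_digit`) are my own illustration, not part of any of the APIs above, and it assumes the input holds just the ISBN plus separators like hyphens or spaces.

```python
import re
from typing import Optional


def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit from the first 12 digits."""
    # Weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a bare ISBN-13 string, or None if the input doesn't validate.

    Hypothetical helper: accepts ISBN-10 or ISBN-13, with or without
    hyphens and spaces.
    """
    # Keep only digits and a possible ISBN-10 'X' check character.
    cleaned = re.sub(r"[^0-9Xx]", "", raw).upper()

    if len(cleaned) == 13 and cleaned.isdigit():
        ok = isbn13_check_digit(cleaned[:12]) == cleaned[12]
        return cleaned if ok else None

    if len(cleaned) == 10 and cleaned[:9].isdigit():
        # ISBN-10 checksum: weights 10 down to 1, 'X' counts as 10,
        # and the weighted sum must be divisible by 11.
        total = sum(
            (10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(cleaned)
        )
        if total % 11 != 0:
            return None
        # Convert by prefixing 978 and recomputing the check digit.
        core = "978" + cleaned[:9]
        return core + isbn13_check_digit(core)

    return None


if __name__ == "__main__":
    print(normalize_isbn("0-306-40615-2"))
    print(normalize_isbn("978-0-306-40615-7"))
```

Both calls should come out as the same ISBN-13, which gives you a single key to join on. It won't rescue records that have no ISBN at all, and plenty of books data doesn't.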

We wanted to enable the classic “mashup” style application so we pointed people towards a list of readily accessible APIs, like the Digital Public Library, Foursquare, Instagram, Pinterest and Spotify.

We pointed folks at various open source tools for dealing with ebooks, especially ePub format.

Since data is king, we included a list of pointers to various data tools - Import.IO and Scrapy for web scraping, Alchemy, Stanford and Dandelion NLP tools, and a variety of other things.

Lastly we had some unique data sets that CODEX creator Jennifer 8. Lee got us access to, including databases of Short Stories, Novellas, Book Reviews, and the wonderful Recovering the Classics cover art (note: there’s a Kickstarter campaign going to fund Recovering the Classics).

You can see the complete list of resources below. Please feel free to comment with potential additions; this is a living document.

To enable collaboration, we set up a Slack channel for the hackathon. It not only provided a place for Q&A, teaming, and asking for help, it was also the beginning of a community that has outlasted the event itself.

Finally, if you want to see the projects that were created, we have a HackDash of all the projects. Enjoy!

(Oh, and don’t forget lots of POWER STRIPS! The key to a great hackathon. That, and coffee.)