This session was facilitated by Owen Stephens, who does lots of digital library stuff, is involved in mashedlib events, and is lovely. (He could probably write a better bio than that, though.) I'm really keen to learn more about the techy side of librarianing, so I was really pleased to see him offer this session. It was very interesting, and I've come away with an idea for something that would make catalogues more useful for some rare books users, plus lots of places to start finding out more. My notes are a bit sketchy: a lot of links and a few ideas. Sorry about that.
The issues Owen said he hoped to discuss in the session were "How do places with data interact with people who want to use it?" and "How do we make data available?".
Why do organisations publish data?
- Accountability: by releasing data, organisations let people see what's been done and at what cost. Etc.
- To increase the available services.
- People 'out there' will be able to do things that you don't have the resources to do yourself.
- They may also think of better things to do with your data, including combining it with data from other places.
- An example that covers both of these is the public toilet map.
- NB, the government is currently consulting on what data Local Government should release: Making Open Data Real
Library catalogues come with decisions, made by librarians or suppliers, about what can and can't be searched and how. Some of these decisions are constrained by current cataloguing rules, but not all: often the data is recorded, but not in a usable way, or it's there but isn't tapped by the interface. For example, in most catalogues you can limit by publication type to newspapers, but you can't limit by how frequently the issues appear. Releasing data means that people can start to use it in the way they want to.
Releasing data: issues
Different uses will require different kinds of data release. To work on the newspapers query above, a dump of all the MARC (i.e. cataloguing) information would be OK. But for an app using circulation data you'd need to use a live API.
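To make the newspapers example a bit more concrete, here's a rough Python sketch of the kind of thing someone could do with a MARC dump. It's only an illustration under assumptions: the records are taken to be already parsed into a hypothetical, simplified `Record` structure (a real dump would need a proper MARC parser such as pymarc), and the titles and the `make_008` filler helper are made up. It relies on MARC 21 fixed-field positions for continuing resources: Leader/07 `s` marks a serial, 008/21 `n` marks a newspaper, and 008/18 holds a frequency code (`d` daily, `w` weekly, `m` monthly, and so on).

```python
from dataclasses import dataclass

# Partial list of MARC 21 008/18 frequency codes for continuing resources.
FREQUENCY_CODES = {
    "d": "daily",
    "c": "semiweekly",
    "w": "weekly",
    "e": "biweekly",
    "m": "monthly",
    "q": "quarterly",
}


@dataclass
class Record:
    """Hypothetical, simplified record: a title plus two MARC fixed fields."""
    title: str
    leader: str      # 24-character MARC leader
    field_008: str   # 40-character fixed-length data elements (field 008)


def is_newspaper(rec: Record) -> bool:
    # Leader/07 'is' serial and 008/21 is 'n' (newspaper).
    return rec.leader[7] == "s" and rec.field_008[21] == "n"


def newspapers_by_frequency(records, freq_code):
    """Return titles of newspaper records whose 008/18 matches freq_code."""
    return [r.title for r in records
            if is_newspaper(r) and r.field_008[18] == freq_code]


def make_008(freq: str, kind: str) -> str:
    # Hypothetical helper building a dummy 40-character 008: filler up to
    # position 18, then frequency (18), regularity (19), blank (20),
    # type of continuing resource (21), then more filler.
    return "0" * 18 + freq + "r" + " " + kind + " " * 18


SERIAL_LEADER = "00000nas a2200000 a 4500"  # Leader/07 = 's'

# Made-up sample records.
records = [
    Record("Example Daily", SERIAL_LEADER, make_008("d", "n")),
    Record("Example Weekly", SERIAL_LEADER, make_008("w", "n")),
    Record("Example Review", SERIAL_LEADER, make_008("m", "p")),  # periodical
]

print(newspapers_by_frequency(records, "w"))  # ['Example Weekly']
```

The point is that the frequency is already sitting in the record; nothing new has to be catalogued, it just needs someone with access to the data to filter on it.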
Data protection is obviously a big issue for some kinds of data, but it's not a deal-breaker. For example, you can base locations on postcode zones rather than individual postcodes, or, for course-based datasets in a university context, simply leave out data from very small courses.
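Both of those tricks are simple to apply before release. Here's a minimal Python sketch, assuming UK postcodes (where dropping the last three characters, the inward code, leaves the outward code or district), some made-up loan records, and a hypothetical `min_size` threshold of 3; in practice the threshold would come from local data protection policy.

```python
from collections import Counter


def postcode_district(postcode: str) -> str:
    """Reduce a full UK postcode to its outward code, e.g. 'CB2 1AA' -> 'CB2'.

    The inward code is always the last three characters, so drop those.
    """
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


def suppress_small_groups(counts: Counter, min_size: int) -> dict:
    """Drop any group whose count is below min_size before release."""
    return {group: n for group, n in counts.items() if n >= min_size}


# Made-up loan records: (borrower postcode, course code).
loans = [
    ("CB2 1AA", "HIST101"),
    ("CB2 3ZZ", "HIST101"),
    ("cb21aa", "HIST101"),
    ("CB1 2BB", "RARE900"),
]

# Aggregate by district rather than full postcode.
by_district = Counter(postcode_district(pc) for pc, _ in loans)
# Count by course, then suppress courses below the (hypothetical) threshold.
by_course = suppress_small_groups(Counter(c for _, c in loans), min_size=3)

print(by_district)  # Counter({'CB2': 3, 'CB1': 1})
print(by_course)    # {'HIST101': 3}
```

The tiny course disappears from the released figures entirely, rather than being published with a count that could point to one or two identifiable students.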
- Where to get data:
- Other info:
- The Cambridge University Library API (also of interest: the Cambridge Open METadata (COMET) project)
- 5 stars of linked open data (basic principle: start by getting your data out there in whatever format you have it, then work towards putting it in the most usable form)
- JISC Open Bibliographic Data Guide
- VuFind (open source next-gen OPAC type thing)
- Project Blacklight ("Blacklight is a free and open source Ruby on Rails based discovery interface (a.k.a. “next-generation catalog”) especially optimized for heterogeneous collections. You can use it as a library catalog, as a front end for a digital repository, or as a single-search interface to aggregate digital content that would otherwise be siloed.")
- An example of something, presumably crowd-sourced useful things: