| 
  • If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

  • You already know Dokkio is an AI-powered assistant to organize & manage your digital files & messages. Very soon, Dokkio will support Outlook as well as One Drive. Check it out today!

View
 

Linking Museums write-up

Page history last edited by Owen Stephens 12 years, 1 month ago

A place to record some of the discussion from the 'Linking Museums' meetup on July 7, 2010.  Please feel free to edit and add your notes, or comment on the page if that's easier for you.  I didn't get everyone's names/twitter IDs but some are listed on the sign-up page.  There's also a discussion on museum microformats, triggered by this event, elsewhere on the wiki.

 

I'd been desperately hoping I'd get away with not having to do any organising at the event, but I guess that was a bit unrealistic. In part because we needed to break up the tables to allow passage through the area, it seemed easiest to suggest 'birds of a feather'-style breakout discussions.  Possible topics were called out, written on A3 paper then grouped (where appropriate) and passed over to different parts of the room.

 

It would have been a good idea to nominate a time to have each group report back on their discussions, but I didn't think of it at the time.  This would also have given people a chance to easily move around between groups and start a different take on the same questions.

 

All paintings by Stubbs (or Boucher or Picasso)

Joe, Ian, Guys, Mia, Libby...

 

Posting Joe's notes, the lazy version (aka 'I've run out of lunchbreak to write them up in'):

 

Notes from 'linking museums' meetupNotes from 'linking museums' meetup

 

Mia (notes from my phone, pretty much unedited): we need 'recipe books' for how to do the basic things.  

 

Discussion of the activation energy to get started with various models.

 

What's the simplest possible start (that still allows for expansion into more detail later)? We need something simple so we don't lose 90% of museum geeks at the start. Too much structure and startup overhead is scary for busy museum staff with a million other bits of work going on.

 

What's the simplest thing needed for linked museums to work? Namespaces and stable identifiers? Neat problem to be solved: museum identifiers change as object status changes or they change institutions. 

A discussion of the situation with smaller museums that are vulnerable to change (and therefore unable to provide the permanent part of permanent URIs, even if they were able to provide the rest of it) lead Ian to suggest PURIs provided by The National Archives.  [This could be a brilliant idea - how to take it forward?]

 

How much of a standard is enough? Not being a perfect standard is ok but it can't lock you in and prevent future improvements. Joe : we need a skeleton but people need to understand that it's not buildable all at once.

 

Common use cases - starting an exhibition planning process.  Republishing in all sorts of ways

 

Business cases (Libby)

 

We need a forum for hearing about good things already going on, for support and reality checks from peers...

 

Joe showed us the Raphael Research Resource - I can't believe I haven't heard more about it, cos it's pretty cool.

 

There is also an additional experimental presentation of the Raphael data (It does need more work :-) ): the basic ontology, that developed during the project, has been converted into XML and then all of the data have been added into a quadstore here, providing a basic model for URIs. Further work will be required to clean up the ontology and/or replace it with current standards.  The plan is then to express the basic metadata for the entire collection in this form, (an old D2RQ version (unfinished) of this process can be seen here), creating PURIs, which can then be used to connect richer sources of data like the Raphael project.  Similar work is being done elsewhere on Rembrandt and Cranach. (Joe)

 

Why linked data (and various other meta discussions) including how much structure, how is it going to be used (for formats, types of data).

Loads of people!

 

Shelley

(I am the one who went home with our 6 A3 sheets of notes. Feel free to change, update and correct if I have left anything out, got anything wrong.)

 

What is useful for real people and audiences? (Use cases)

 

Who

What

How

Members of the pubic (e.g., non-professional experts, hobbiests, educators) who want to curate their own tours for friends, colleagues, students

Collections/object data

XML, RDF. Really any type of structured data. Need URIs for every object and structured, easily scrape-able web pages for every object

Museum visitors and enthusiasts (“culturally active adults”) who enjoy cultural leisure activities on evenings, weekends. Museums could provide personalized, multi-museum events listings/recommendations mashed up with FB or iCal

Public events listings for various audiences (adults, families, scholars)

Need standard for events data across museums for personal recommendations from multiple institutions. Subject  and audience tags (e.g., Picasso, kids, classicalmusic) would aid personalization. URIs for individual events

Cultural tourist, subject enthusiast, student or scholar doing research who wants to locate all objects of interest across multiple museums. For example: I want a tour of all Roman objects in London museums

Collections/object data, data on whether objects are on view

Standard format for object data across multiple museums located in same city/region

School groups and teachers/educators in specific regions, towns, geographic  areas who want to create resource for local teaching. For example: show me all archaeological objects that were found in a 20km radius of our school and which museums they are in now

Archaeological dig data, geographical/place provenance of objects, collections/object data

Variety of formats possible as long as data is published. Standard for publishing provenance data across museums or standardizing existing data for place names/location

Museums themselves. For-profit departments/divisions of museums, e.g., image licensing and sales. Powerhouse Museum research showed that user-contributed tags in online collection database records improved findability and increased image license sales.

Collections/object images tagged with relevant folksonomic tags

Publish collection records and solicit tags from the public or niche audiences

Open educational resource providers who create/publish materials for teachers. These organisations can repackage educational resources produced by museum educators.

Teachers’ notes, Powerpoint presentations, Flash interactives, activity instructions, object and contextual images produced by museum educators, e-learning web editors

Need some way of describing these materials - metadata on file type, e.g., Powerpoint, Word document, text, image. Extract more standardised data, e,g., teachers’ notes and publishing it machine parsable format, XML, RDF, etc. Need to address licensing issues for contextual images which often come from external sources, e.g, Flickr Creative Commons

Students researching or compiling reports on curriculum topics, e.g., dinosaurs. Data from multiple museums could be published in modularised chunks around themes. Related information (images and data) could be pulled in using machine tags

Collections/object data, educational resources produced by museum educators

Need thematic and machine tags on published data across museums which corresponds with other organizations who produce/publish educational resources for students, e.g., science/art magazines, open educational resource sites like OpenJorum, etc.

First nations peoples: Community websites could virtually repatriate that were removed from the local area and are now kept in many different museum collections

Collections/object data, archaeological dig data, geographical/place provenance of data mashed up with place name in local language/dialect so community can find it

Publish individual object data with geographic/place provenance, standardised way of referencing location

Third-party data publishers of archaeological data who repackage and publish for researchers. Such publishers like Archaeological Data Service (UK) and OpenContext (US) are increasingly important to grant applicants because “data publishing plans” are now required by grant-giving institutions like the National Science Foundation (US)

Archaeological dig data, research data on objects, scientific analyses of collection objects. Popular data searched for on the OpenContext website include: people, object type, specific sites (e.g., Petra)

Need to connect collections/object data with scientific data, records should be tagged with people, object type and place names

Educators/students who want thematic, packaged information like that produced around temporary exhibitions.

Text, images and media assets like videos produced for temporary exhibitions, could include exhibition catalogues, particularly those out of copyright

Need generic/standardised way of extracting exhibitions “stories” from glam islands/silo websites created for exhibitions. Way of describing exhibition story, perhaps specific to museum or subject genre, e.g., art, history, science, natural history

 

Notes on useful types/formats of museum data

 

  • Useful data on objects/collection records include:
    • Where is the object? Is it on display?
    • When the object was last displayed?
    • Are there replica (or similar) objects? If so where can I see one?
  • Location data – where the object was found – is very useful and place names can be matched/mashed up with data from different services:
  • A potential model for information on replica objects comes from the library world. The Functional Requirement for Bibliographic Records (FRBR – see http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records) distinguishes between a literary work and an edition of the work
  • For mashups with music, the Musicbrainz service (http://musicbrainz.org/doc/Database) has a unique ID for artists and songs.  It also includes other metadata
  • In the UK, a potential data licensing issue is that (according to some IP lawyers) Creative Commons is not applicable to data. Recently, an organisation called Open Knowledge Foundation has produced the “Open Data Attribution License” which is essentially the same as Creative Commons but for data
  • Thinking about UGC and community generated content in light of most museums’ obsession with “authoritativeness”, format standards need a field for “Who said it” and some way of cross-checking this person with other community ratings systems, e.g., Wikipedia authors, etc.

 

What's in it for museums?

 

  • Increase access to collections, particularly objects that are rarely/never on view
  • Increase in on-site visitors if they can find out when objects they are interested in will be coming on view
  • Boosts education/public interpretation and storytelling around collections objects that can potentially feed back into museum interpretive strategies/educational offerings
  • Serves public mission to provide resources for scholarship and research
  • Increases persistence, reach of temporary exhibitions if exhibition micro-site data/stories around objects are able to be reused, searched
  • Can dramatically increase the reach and exposure of small museums and collections
  • Better attendance at events, knowledge of the collection if structured data is pushed to places where it will be seen/found/tagged
  • Increase in revenue if data, especially images, are easy to locate through search engines. Powerhouse stats showed increase in licensing of tagged images

 

Paul:

1. Get the raw data out there.

Interesting to hear from Jonty & Richard (museum hackers) - any data is better than no data. Museums, get your raw data out. Stop worrying about completing it and perfecting it. It will never be perfect and waiting until you think it is will stop other trying to do interesting things with it.

Jonty & Richard both said a full dump was more use to them than individual pages, as they'd only have to write routines to find/scrape every page otherwise. The full dump might not be as up to date as the dynamic pages. Museums could provide full dump plus feed of recent changes.

2. Implement basic web best practice.

Each thing should have a persistent structured web page. e.g. Each object, event, place, person documented. Structured could be plain HTML, which is again better than no data.

3. RDFa was seen has having a clearer direction, plus formats that were formally approved, than microformats. There was no obvious microformat for general museum objects.

4. Don't obsess about linked data yet. Until we embed structured data for our records there's no data to link from/to!

 

Why linked data/how's it going to be used - thoughts included better markup of search engine results (e.g. Google rich snippets), potentially link painting by same artist in different collections (possibly via intermediate person record page on common site such as Wikipedia), better structure for hackers to build visualisations on top of.

How much structure - keen users will make do with whatever is there, but structured metadata would increase ability of automated linking of pages.

 

In the education sector particularly, users were more interested in collection/narrative/story level detail on museum object collections rather than individual object records. Museums are often still creating the core object catalogue and haven't yet started creating the level above this. This information is sometimes being created on exhibition microsites, so we need to think about how the data on these sites can also be served as structured data and how it is integrated back into the core collection database at the end of an exhibition.

 

Mia:

Michael made the point: you need a page per thing, with a stable identifer. We need to be able to cite content. We're in the age of point at things. 

Don't do linked data, do structured web pages for each of your 'things'.

Jonty and Richard: if you have a data dump (or an API) a licence to explain how it can be used.

Me: museums need geeks to tell museums what's useful.  [And I think one of the best ways to do this, apart from meetups like this, is hackdays - like usability testing, there's nothing like having someone struggle with your site/data to convince you to improve it.]

Me: if we concentrate on just getting stuff up there, in the first instance, how do we protect the future from well-intentioned short term projects/developers?  Or from projects with purely marketing goals e.g. exhibition microsites that might also include some objects and interpretative information in un-reusable forms?  On a related note, there's a gap in understanding what resources are available to museums - so many museums don't have direct control over their websites, or can only produce brochure-ware sites.

Question from Jonty: how do we contact museums to say we want your data?

People want to contribute, how can we help them?  [This went into a wider discussion of volunteer programmes in museums.]

 

Interestingness

I didn't get to talk to this group so I'm not sure what was discussed and who participated - if you were there, please let us know!

[Rhiannon wrote up some notes for the MCG blog, I've copied parts of it that relate to the 'interestness' discussion here]

"I was on a table that was discussing 'what might real people use it for'? There were about 8-10 of us on the table and I think the majority were coming at the question from a developer rather than a museum perspective.  What I enjoyed most was hearing about what might be possible and what needed to happen to make it possible.  I didn't necessarily understand every part of what was said, but I enjoyed listening, and made a few comments.

 

I took a few things away with me which I hope sound more realistic than negative.  The first was that it seemed to me to be a great knowledge divide between the people who want the data and the people who have the data.  The message from developers was very much 'we could do such cool stuff, it would be so useful to so many people, and would really benefit museums, why aren't more museums publishing the data for us to do this?'  The immediate answer that sprung to my mind was 'because they don't know that they should be for a start'.  Now I'm not talking about bigger museums here, I'm possibly not even talking about London museums, but I'm talking about your regional museum, which might only have a couple of members of staff, or even, one voluntary curator who only comes in once a week.  It seems to me that those museums might not even think of themselves as having data, let alone know that other people might want it, or how to go about publishing it.  Even if they do know these basic things, they may have other worries.  We all know museum professionals who are still quite skeptical about digital, and about web 2.0, and about the risks to authority, copyright etc etc and it felt to me like there was a colossal amount of work to be done in explaining to these people why releasing their data would be a good thing.

 

There is also, however, a probably-equally important explanation that needs to happen (and this evening was very important towards that) so that the world of developers understands the worries of these museum professionals, which, whilst they can seem frustrating, are often very real and important.  There are also some very practical reasons why museum data just may not be in a state to be released either because it's not that detailed, or because it's not in a state to be made public, for whatever reason.

 

The second big thing I took away occurred once the discussions began to round up, when discussion turned to 'so what are you going to do now?'  It seems to me that two key barriers that need to be overcome before any of us can effect any real change are a) the lack of influence and b) the lack of time that most people who believe that all this is a good idea has to actually influence any national or international change.  It may be that these things can change, but my feeling is that there is a lot of work to be done at this basic level before we can really start achieving the undoubtedly more exciting and useful goals that linked data could achieve."

 

General thoughts

 

Richard Light's notes

These thoughts are my own "take homes" from the discussion, rather than any sense of the meeting's overall conclusions.

What data do museums have?

Database content, mostly fielded and designed mainly for collections management support. Textual materials, much of it in a non-accessible "grey literature" format. Images.

The database content is typically (reasonably) self-consistent within a given environment. Thus we have known properties (from the field name) with usable string values. The challenge from a Linked Data perspective is the cost-effective generation of URLs from the string values currently held, e.g. for people and places, given that different museums will have different vocabularies to control their content.

Who wants to use this data?

The public, who are typically interested in classes of objects (rather than individual objects), or in objects with certain properties (e.g. coming from a place of interest to them). Educators, or more specifically people who create resources for educators to use. Students, if relevant objects could be easily accessed as "follow up" to formal learning materials.

How do we improve the data?

There is nothing to stop every museum publishing URLs, and whatever associated Linked Data they have to hand, for each object in their own collection, and thereby giving them a "hook" onto which others can hang added-value information and assertions of their own. They should treat this task as an urgent priority.

Where possible, convert string values in data to URLs, ideally widely-used (not just local) ones. Could use e.g. geonames.org for place names, or dbpedia for object class names. Interest in Portsmouth's historical gazetteer for "old" place names. There is a general need for a service to mint URLs for (dead) people.

There is a clear need for a sector-specific ontology which represents the properties found, i.e. the types of information recorded in museum databases. This will act as the "predicate" in Linked Data triples/assertions. It could be based on an existing agreement about these semantics, e.g. CIDOC CRM or LIDO.

Axis-based data such as geographical co-ordinates or dates/date ranges could be treated as purely numerical data, or "pixellated" by assigning a URL which imposes a certain level of precision (e.g. year for dates). Or both approaches could be adopted.

What's the museum take on Linked Data?

Simple assertions are not enough; we care about the attribution of those assertions (i.e. who is making the assertion). We also want a framework which allows the expression of uncertainty and doubt.

We are not particularly bothered about the specific format (RDF/XML, RDFa, JSON, Topic Maps) in which Linked Data is published, but we would like to be able to "do the job once" and have done with it.

 

Mia:

The discussion was challenging and provoking, which was an excellent tonic.  To summarise the message I heard from non-museum geeks: get your house in order, learn to walk before you try to run. Get the basics right first.  My response is: you're right, but a map of where we're aiming for is good too.  What's right and what's possible for museums isn't often/always the same thing, but at least a roadmap allows us to work towards what's right.  With each project we can improve the publication of our data, if we know what 'improved' looks like.

 

We (museum technologists, especially those working with collections) need to explain: characteristics of museum data, the types of projects that result in websites, the types of success metrics we deal with.

 

We deal with really interesting problems - we should describe these so people outside who want to play with our data will have an idea of how messy it is, how raw (e.g. records that consist only of an accession number), how contextual it can be (e.g. when was the catalogue record written, who was it written before, is it still regarded as 'true'?)...

 

For example, the 'replica object' problem - the data the object was made (say, 1932) isn't a date that fits into the story we want to tell (say, Galileo's telescope of 1609).  We have stories and we have facts, and objects that illustrate each, but there isn't always a neat 1:1 relationship between them.  We also have subjects about objects, but our coverage of those subjects is limited by the collecting histories of our museums.

 

One neat problem to focus attention on might be: how do we get from marketing- or education-based exhibition microsites to decent collections of pages about objects?

 

What next?

Regular meetups, a mailing list?  What works for you (and do you want to help?)

 

Do we want a place to discuss 'The future of museums (online)', or a mailing list with a more specific focus on helping people get sensibly structured web pages up and inch towards linkable data?



Comments (32)

sebchan said

at 6:59 am on Jul 9, 2010

Raw data will never be perfect - certainly once it gets out to a broader community of data users (see http://www.powerhousemuseum.com/dmsblog/index.php/2010/07/05/malcolm-tredinnick-on-some-problems-with-working-with-our-collection-dataset/)

Richard Light said

at 11:12 am on Jul 9, 2010

While it's (very) useful to have meetups like this with a wide range of participants, is there value in also starting a "just do it" group for folks in museums who want to publish Linked Data (or something like it), and who want to share practical ideas and techniques for achieving this?

Mia said

at 11:15 am on Jul 10, 2010

Sounds good to me! It might require a lot of patience on the part of people like you who have a good understanding of what needs to be done, and are able to implement it, for people who are going to find it difficult enough to get a structured page per thing online.

Mia said

at 2:02 pm on Jul 10, 2010

Thinking about it more... perhaps a group with a focus on worked examples (as below) and help with real life implementations?

So, then - online, meetups, combination? If so, then where? (Bearing in mind that some people don't like editing wikis and threaded conversations often work better anyway)

Andy Mabbett said

at 10:39 am on Jul 10, 2010

Sorry I couldn't make the event - it seems to have been very interesting and productive. I would have liked to discuss the view that "There was no obvious microformat for general museum objects" is an obstacle - the answer is to make one! Also to make use of those that do exist; museum web pages aren't just about objects, put people and places, for which hCard exists, and events for which there is hCalendar, for example.

Mia said

at 11:13 am on Jul 10, 2010

That looks like Paul's note - he might be able to expand on the context.

Repeating myself from twitter, it's a shame you weren't there because a discussion of the benefits of microformats vs e.g. RDFa would have been useful.

My gut feeling is that experiments with RDFa could lead to an ad hoc standard that would make the creation of a museum microformat object much easier. If we had to start from abstract contexts, it could be a world of pain.

Richard Light said

at 12:28 pm on Jul 10, 2010

If we use RDFa, there are already tools out there which can suck up our data. If we invent a new format, we also create a new, empty, silo for which tools need to be developed.

Andy Mabbett said

at 12:57 pm on Jul 10, 2010

That's over-stating the issue; we wouldn't really be inventing a new format, but applying the existing format to a new set of data. There are existing microformat-parsing tools, into which new scripts are easily dropped - and it's my experience (with the Species microformat) that this does happen.

Andy Mabbett said

at 1:07 pm on Jul 10, 2010

...and at the end of the day, all that using a microformat involves is agreeing to use the same, meaningful class names in our HTML. Since we all use meaningful class names in our HTML (don't we? ;-) ), where's the harm in agreeing to use the same ones?

Andy Mabbett said

at 1:03 pm on Jul 10, 2010

Thanks, Mia.

I've started adding my thoughts and some notes from our Twitter discussions to http://museum-api.pbworks.com/Microformats

Bear in mind that the microformats "process" does suggest testing which properties are being published in the real world; and reusing (some properties from) existing schema, so no "abstract contexts" should be involved. You can see how this was done for 'Species' here: http://microformats.org/wiki/species-examples

RDFa and microformats are not mutually exclusive, but nor must one come before the other.

Paul Rowe said

at 10:08 am on Jul 11, 2010

At this stage I'm looking for something really simple to test out the amount of work required to build in structured data into object pages and see what kinds of usage this metadata gets. For object pages, RDFa with an existing format such as Dublin Core seems the simplest starting point. I agree that the formats aren't mutually exclusive and if there's a demand it may be worth implementing both. I think there will be interest in an 'object' microformat (other than the more specific formats such as work-of-art), but it's one extra step at this early experimentation stage.

Mia said

at 1:43 pm on Jul 10, 2010

Thinking about it more, it would be really useful to look at some worked examples, from pages-for-things that simply use properly structured HTML (aka POSH or plain old semantic HTML) to examples with various microformats and RDFa applied.

We (Science Museum) might be doing a tiny bit of renovation work on our object pages (http://sciencemuseum.org.uk/onlinestuff/museum_objects.aspx) in the next month or so, so I might be able to include some of the results of any worked examples.

With that in mind, I'm going to suggest some of our objects to see if we can get some concrete results out of these discussions. These are just objects I like, that hopefully also have enough variety to include most cases. I've provided a few links for each object to highlight some of the issues I discussed in http://museum-api.pbworks.com/Science-Museum-linked-data

Mia said

at 1:43 pm on Jul 10, 2010

Babbage's Difference Engine No 2, 2000
(useful for the 'replica problem')
http://sciencemuseum.org.uk/objects/computing_and_data_processing/1992-556.aspx

Mia said

at 1:49 pm on Jul 10, 2010

The Science Museum Arts Project site isn't online yet so this is all I could find for the arts collection: http://www.sciencemuseum.org.uk/about_us/about_the_museum/art.aspx

Mia said

at 12:49 pm on Jul 11, 2010

Paul recorded the following discussion above: "In the education sector particularly, users were more interested in collection/narrative/story level detail on museum object collections rather than individual object records. Museums are often still creating the core object catalogue and haven't yet started creating the level above this. This information is sometimes being created on exhibition microsites, so we need to think about how the data on these sites can also be served as structured data and how it is integrated back into the core collection database at the end of an exhibition."

IIRC Seb Chan has found something similar in the usage of the Powerhouse's collections. At work we're constantly trying to make it easier to build the contextual interpretation written for exhibitions into a central infrastructure, in part so that it can be re-used elsewhere online, and in part to enable 'edit once, update everywhere'. I'm also trying to work to the point where our object pages have links to every instance of their use on exhibition and project microsites, so that all that narrative information is available, but I think we also have to be careful about retaining links to the context in which information originally appeared, as some records are so contextually written that they're almost misleading out of the original narrative.

But most often in museums, this isn't possible as wall captions and thematic narrative text are held outside collections and CMS applications, in Word or InDesign documents.

Richard Light said

at 4:10 pm on Jul 11, 2010

To respond to your last para: the need for effective re-use of museums' descriptive resources is an issue which goes way back (think printed exhibition catalogues). It needs a culture change, so that all information-generating activity is seen as adding to a museum's information resources, and is not just project-specific. It is requires technical developments, so that systems are in place to store this information and deliver it in whatever formats are required. Collections databases are typically not set up to support this type of use.

Richard Light said

at 4:12 pm on Jul 11, 2010

Sorry (don't see how to edit comments): "It also requires ..."

Eric Kansa said

at 6:03 pm on Jul 11, 2010

Hi All,

More basic steps for "data portability" would be good to consider also, above and beyond use of specific microformats or more elaborate and formal RDF(a) expressed semantics. I think there is no way one can anticipate all possible research, educational, or public uses of collections data. So it'll be hard to model our content to meet these unanticipated needs.

So besides thinking about ways to make museum collections "linked data", we should also make museum content more "linkable by others". A good way to do this is to share in nice, simple, machine-readable formats lists of URIs. Atom is great for this. This can be semantically interesting if combined with precision queries. For example, many collections offer precision queries related to place, time, object type, etc. If the results of such queries are expressed as Atom feeds, then it is easy to share a list of materials selected by some analytically (and semantically) important criteria. Third parties can annotate these lists of resources according to whatever ontology or vocabulary they want to use, and in effect, they can make semantics implicit in a query more explicit. The Atom paging extension can be used to share long lists of hundreds or even thousands of URIs, making it possible to annotate large sets of resources that share meaningful characteristics defined by a query. (This scenario is a bit like OpenSearch.org, but with more precision than keyword searches).

Some related papers discussing this more:
http://alexandriaarchive.org/blog/wp-content/uploads/2010/04/CAA_2010_kansa_et_al_draft.pdf

and

http://www.jstage.jst.go.jp/article/dsj/9/0/9_42/_article


eatyourgreens said

at 6:26 pm on Jul 11, 2010

Hi Eric,

Search results from the NMM's collections are available as RSS. Here's a search by year:
http://www.nmm.ac.uk/collections/requestHandlers/doQuickSearch.cfm?searchterm=&authority=category&category=&startyear=1900&endyear=1930&startrow=1&format=rss

There's a YQL table for those searches too:
http://developer.yahoo.com/yql/console/?q=select%20*%20from%20nmm.collections.search%20where%20startyear%3D1900%20and%20endyear%3D1930&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

Items within those feeds are marked up with Dublin Core to show makers, dates, places etc. where known.

Jim

Richard Boulton said

at 8:28 pm on Jul 11, 2010

Jim,

The NMM collection data looks great.

However: what license is the data retrieved from the NMM usable under? For those of us wishing to take data and put it into mashups, this is a fundamental question - it's not worth coding something up if you're just going to be told to take it down again straight away. It's easy to find information on the NMM website about licensing of the images (on http://www.nmm.ac.uk/collections/about_us.cfm), but I couldn't find anything about licensing of the collections data itself.

Also, the "permitted use" for the images is quite restrictive: I read it as saying that you couldn't put together a mashup of some kind using the images, and then put the mashup online, since then it wouldn't be "personal" use. Though maybe if the mashup pointed to the images hosted on www.nmm.ac.uk, it would be okay? It's sufficiently unclear that that would be okay that I wouldn't try it, anyway.

I don't mean to criticise the NMM specifically here - it's very rare to see licensing information for data on sites, and even rarer to find licenses that allow free re-use (even of parts of the data). However, I really hope that will change over the next few years, as it has been for government datasets.

One site which is doing licensing well is http://finds.org.uk/ the data is clearly licensed, and that license allows free reuse (CC BY-NC-SA). It's also all accessible in nice machine readable formats. I'd be happy even if that data didn't include the images; since it does, I'm bouncing up and down with joy! I commend them thoroughly.

Richard

eatyourgreens said

at 11:12 am on Jul 12, 2010

Here's a link to the YQL results which, hopefully, won't break in the pbwiki comment interface: http://tinyurl.com/2wpmfgz

Rights holder and license are in dcterms:rightsHolder and dcterms:license for each item. At the moment, the license just links to the NMM copyright statement: http://www.nmm.ac.uk/collections/copyright.cfm As you say, it's hard to say from the museum's copyright statement whether reuse of the data is allowed, particularly for items where the rights holder is not the National Maritime Museum.

Eric Kansa said

at 6:41 pm on Jul 12, 2010

Hi Jim,

Excellent! The only thing I'd add to my wish list is are links for paging through the feed. That would make it easier to page through and get the large list of query results.

We've also put the query results as paged Atom feeds for Open Context (http://opencontext.org/about/services). Example:

http://opencontext.org/sets/Palestinian+Authority.atom?cat=Small+Find&prop[Object+Name]=Whorl

Best!
-Eric


Best!
-Eric

eatyourgreens said

at 12:43 am on Jul 13, 2010

Hi Eric,

YQL supports paging, so you can get larger lists of results with query syntax like this example: http://tinyurl.com/2d5768p

The RSS feeds behind this use OpenSearch elements, so it should be possible to page through a set of feeds using the totalResults, startIndex and itemsPerPage elements: http://www.nmm.ac.uk/collections/requestHandlers/doQuickSearch.cfm?searchterm=&authority=category&category=&startyear=1700&endyear=1730&sortby=rank&format=rss&startrow=11&per_page=20
I'd guess the only thing you'd need to know is to change the startrow parameter in the URL.

Jim

Richard Boulton said

at 9:28 am on Jul 13, 2010

Jim,

From reading the museum's copyright statement, I think it quite clearly doesn't permit reuse of the data for items whose copyright is held by the National Maritime Museum: no license to reuse the data is given (so the default copyright law is in place, which means no copying allowed). For items which are crown copyright, reuse might be possible, and for items which are "No known copyright" reuse probably is possible. In addition, there's no license statement for the feed of data itself (as opposed to the data on individual items).

Of course, this doesn't stop a developer contacting the NMM to ask for a license, but it's not going to encourage re-use of the collections data by other projects.

Sorry to harp on about this, but I think getting clear licensing (which would ideally be permissive) in place is essential before worrying about formats and linked data; unless you only want to attract developers and projects who won't respect copyright and licensing.

Richard

eatyourgreens said

at 10:26 am on Jul 13, 2010

Richard, data from the NMM feeds is already being reused by Europeana, if that helps: http://www.europeana.eu/portal/brief-doc.html?query=greenwich&qf=PROVIDER:National%20Maritime%20Museum&qf=TYPE:image&tab=image&view=table
I'd imagine, as you say, that some specific licensing agreement has been reached with the museum for that particularly project.

The discussion at our table mentioned the lack of a good license for museum data. I think it would help make the case internally, within museums, for licensing data if we could show that there is a demand to actually use it ie. if we could build something useful with existing museum data.

Jim

Mia said

at 8:32 pm on Jul 15, 2010

Two useful recent links:

http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/
make your stuff available on the web (whatever format)
make it available as structured data (e.g. excel instead of image scan of a table)
non-proprietary format (e.g. csv instead of excel)
use URLs to identify things, so that people can point at your stuff
link your data to other people’s data to provide context

And some museum data: http://blackcountryhistory.org/data/

Mia said

at 10:29 pm on Jul 30, 2010

This post quite usefully summarises some of the issues that museums are facing in getting their data out: http://hadleybeeman.posterous.com/how-are-we-going-to-improve-government-openda

Alexandra said

at 12:20 pm on Aug 7, 2010

Finally getting round to introducing myself here; another person who is very sorry she couldn't get to the meetup on 7th July, although I'm an archivist not a museums person. Hope you don't mind the gatecrashing but many of the issues you discussed are identical for the archives sector, particularly for smaller archives. As a collections person, I've certainly been in the position of wanting to get data 'out there' but having no clue how to go about this, or knowing what formats would be most useful for developers. I'm particularly interested in the point about permanent URIs. Would definitely be interested in keeping in touch/meeting up/working together with you all.

Mia said

at 12:43 pm on Aug 18, 2010

Hi Alexandra, and thanks for introducing yourself. Archives people are more than welcome! At this stage I think the next meetup will be September 27th, I hope you can make that date.

You don't have permission to comment on this page.