Linking Museums write-up


A place to record some of the discussion from the 'Linking Museums' meetup on July 7, 2010.  Please feel free to edit and add your notes, or comment on the page if that's easier for you.  I didn't get everyone's names/twitter IDs but some are listed on the sign-up page.  There's also a discussion on museum microformats, triggered by this event, elsewhere on the wiki.

 

I'd been desperately hoping I'd get away with not having to do any organising at the event, but I guess that was a bit unrealistic. In part because we needed to break up the tables to allow passage through the area, it seemed easiest to suggest 'birds of a feather'-style breakout discussions.  Possible topics were called out, written on A3 paper then grouped (where appropriate) and passed over to different parts of the room.

 

It would have been a good idea to nominate a time to have each group report back on their discussions, but I didn't think of it at the time.  This would also have given people a chance to easily move around between groups and start a different take on the same questions.

 

All paintings by Stubbs (or Boucher or Picasso)

Joe, Ian, Guys, Mia, Libby...

 

Posting Joe's notes, the lazy version (aka 'I've run out of lunchbreak to write them up in'):

 

Notes from 'linking museums' meetup

 

Mia (notes from my phone, pretty much unedited): we need 'recipe books' for how to do the basic things.  

 

Discussion of the activation energy to get started with various models.

 

What's the simplest possible start (that still allows for expansion into more detail later)? We need something simple so we don't lose 90% of museum geeks at the start. Too much structure and startup overhead is scary for busy museum staff with a million other bits of work going on.

 

What's the simplest thing needed for linked museums to work? Namespaces and stable identifiers? Neat problem to be solved: museum identifiers change as object status changes or they change institutions. 

A discussion of the situation with smaller museums that are vulnerable to change (and therefore unable to provide the permanent part of permanent URIs, even if they were able to provide the rest of it) led Ian to suggest PURIs provided by The National Archives.  [This could be a brilliant idea - how to take it forward?]
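To make the identifier problem concrete, here is a minimal sketch of the idea behind a PURI service: one identifier stays fixed while the accession numbers and holding institutions behind it change. The names `Resolver` and `ObjectRecord`, the path "obj/0001" and the museum names are all made up for illustration; this is not a description of any existing service.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """One object, keyed by an identifier that never changes."""
    persistent_id: str                            # hypothetical PURI path
    history: list = field(default_factory=list)   # (institution, local_id), oldest first

    def current(self):
        """Return the most recent (institution, local_id) pair."""
        return self.history[-1]


class Resolver:
    """Hypothetical in-memory PURI resolver."""

    def __init__(self):
        self._by_pid = {}     # persistent id -> ObjectRecord
        self._by_local = {}   # (institution, local id) -> persistent id

    def register(self, persistent_id, institution, local_id):
        """Record a new local identifier for an object without losing old ones."""
        record = self._by_pid.setdefault(persistent_id, ObjectRecord(persistent_id))
        record.history.append((institution, local_id))
        self._by_local[(institution, local_id)] = persistent_id
        return record

    def resolve(self, persistent_id):
        """Where does the object live now, and under which local identifier?"""
        return self._by_pid[persistent_id].current()

    def lookup_local(self, institution, local_id):
        """Old citations using a superseded local identifier still resolve."""
        return self._by_local[(institution, local_id)]


resolver = Resolver()
resolver.register("obj/0001", "Museum A", "LOAN.12")   # arrives on loan
resolver.register("obj/0001", "Museum A", "1932-45")   # accessioned, so the number changes
resolver.register("obj/0001", "Museum B", "B.77")      # transferred to another institution
```

The point of the sketch is only that the mapping lives outside any single museum, which is why hosting it somewhere long-lived is attractive.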

 

How much of a standard is enough? An imperfect standard is OK, but it can't lock you in and prevent future improvements. Joe: we need a skeleton, but people need to understand that it's not buildable all at once.

 

Common use cases - starting an exhibition planning process.  Republishing in all sorts of ways

 

Business cases (Libby)

 

We need a forum for hearing about good things already going on, for support and reality checks from peers...

 

Joe showed us the Raphael Research Resource - I can't believe I haven't heard more about it, cos it's pretty cool.

 

There is also an additional experimental presentation of the Raphael data (it does need more work :-) ): the basic ontology that was developed during the project has been converted into XML, and all of the data have then been added to a quadstore here, providing a basic model for URIs. Further work will be required to clean up the ontology and/or replace it with current standards.  The plan is then to express the basic metadata for the entire collection in this form (an old, unfinished D2RQ version of this process can be seen here), creating PURIs which can then be used to connect richer sources of data like the Raphael project.  Similar work is being done elsewhere on Rembrandt and Cranach. (Joe)

 

Why linked data (and various other meta discussions) including how much structure, how is it going to be used (for formats, types of data).

Loads of people!

 

Shelley

(I am the one who went home with our 6 A3 sheets of notes. Feel free to change, update and correct if I have left anything out, got anything wrong.)

 

What is useful for real people and audiences? (Use cases)

 

Who: Members of the public (e.g., non-professional experts, hobbyists, educators) who want to curate their own tours for friends, colleagues, students
What: Collections/object data
How: XML, RDF. Really any type of structured data. Need URIs for every object and structured, easily scrape-able web pages for every object

Who: Museum visitors and enthusiasts (“culturally active adults”) who enjoy cultural leisure activities on evenings, weekends. Museums could provide personalized, multi-museum events listings/recommendations mashed up with FB or iCal
What: Public events listings for various audiences (adults, families, scholars)
How: Need standard for events data across museums for personal recommendations from multiple institutions. Subject and audience tags (e.g., Picasso, kids, classicalmusic) would aid personalization. URIs for individual events

Who: Cultural tourist, subject enthusiast, student or scholar doing research who wants to locate all objects of interest across multiple museums. For example: I want a tour of all Roman objects in London museums
What: Collections/object data, data on whether objects are on view
How: Standard format for object data across multiple museums located in same city/region

Who: School groups and teachers/educators in specific regions, towns, geographic areas who want to create resources for local teaching. For example: show me all archaeological objects that were found in a 20km radius of our school and which museums they are in now
What: Archaeological dig data, geographical/place provenance of objects, collections/object data
How: Variety of formats possible as long as data is published. Standard for publishing provenance data across museums or standardizing existing data for place names/location

Who: Museums themselves. For-profit departments/divisions of museums, e.g., image licensing and sales. Powerhouse Museum research showed that user-contributed tags in online collection database records improved findability and increased image license sales.
What: Collections/object images tagged with relevant folksonomic tags
How: Publish collection records and solicit tags from the public or niche audiences

Who: Open educational resource providers who create/publish materials for teachers. These organisations can repackage educational resources produced by museum educators.
What: Teachers’ notes, Powerpoint presentations, Flash interactives, activity instructions, object and contextual images produced by museum educators, e-learning web editors
How: Need some way of describing these materials - metadata on file type, e.g., Powerpoint, Word document, text, image. Extract more standardised data, e.g., teachers’ notes, and publish it in a machine-parsable format (XML, RDF, etc.). Need to address licensing issues for contextual images, which often come from external sources, e.g., Flickr Creative Commons

Who: Students researching or compiling reports on curriculum topics, e.g., dinosaurs. Data from multiple museums could be published in modularised chunks around themes. Related information (images and data) could be pulled in using machine tags
What: Collections/object data, educational resources produced by museum educators
How: Need thematic and machine tags on published data across museums that correspond with those used by other organizations who produce/publish educational resources for students, e.g., science/art magazines, open educational resource sites like OpenJorum, etc.

Who: First nations peoples: Community websites could virtually repatriate objects that were removed from the local area and are now kept in many different museum collections
What: Collections/object data, archaeological dig data, geographical/place provenance of data mashed up with place name in local language/dialect so community can find it
How: Publish individual object data with geographic/place provenance, standardised way of referencing location

Who: Third-party data publishers of archaeological data who repackage and publish for researchers. Publishers such as the Archaeology Data Service (UK) and OpenContext (US) are increasingly important to grant applicants because “data publishing plans” are now required by grant-giving institutions like the National Science Foundation (US)
What: Archaeological dig data, research data on objects, scientific analyses of collection objects. Popular data searched for on the OpenContext website include: people, object type, specific sites (e.g., Petra)
How: Need to connect collections/object data with scientific data, records should be tagged with people, object type and place names

Who: Educators/students who want thematic, packaged information like that produced around temporary exhibitions.
What: Text, images and media assets like videos produced for temporary exhibitions, could include exhibition catalogues, particularly those out of copyright
How: Need generic/standardised way of extracting exhibition “stories” from glam islands/silo websites created for exhibitions. Way of describing exhibition story, perhaps specific to museum or subject genre, e.g., art, history, science, natural history
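As a rough illustration of the school use case above ("all archaeological objects found within a 20km radius of our school"), here is a minimal sketch that assumes each published record carries a find-spot latitude and longitude. The field names `found_lat`, `found_lon` and `held_by`, and the sample records, are hypothetical; the distance is the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def found_near(objects, lat, lon, radius_km=20.0):
    """Return the objects whose find spot lies within radius_km of (lat, lon)."""
    return [
        obj for obj in objects
        if distance_km(lat, lon, obj["found_lat"], obj["found_lon"]) <= radius_km
    ]


# Made-up records from two museums; only the find spot and current holder matter here.
objects = [
    {"name": "A", "found_lat": 51.55, "found_lon": -0.10, "held_by": "Museum A"},
    {"name": "B", "found_lat": 52.50, "found_lon": -0.10, "held_by": "Museum B"},
]
school = (51.50, -0.10)
nearby = found_near(objects, *school)
```

Nothing here depends on a particular format, which matches the "variety of formats possible as long as data is published" point: the query only needs the find spot to be published in a consistent way.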

 

Notes on useful types/formats of museum data

 

 

What's in it for museums?

 

 

Paul:

1. Get the raw data out there.

Interesting to hear from Jonty & Richard (museum hackers) - any data is better than no data. Museums, get your raw data out. Stop worrying about completing it and perfecting it. It will never be perfect and waiting until you think it is will stop others trying to do interesting things with it.

Jonty & Richard both said a full dump was more use to them than individual pages, as they'd only have to write routines to find/scrape every page otherwise. The full dump might not be as up to date as the dynamic pages. Museums could provide full dump plus feed of recent changes.

2. Implement basic web best practice.

Each thing should have a persistent structured web page. e.g. Each object, event, place, person documented. Structured could be plain HTML, which is again better than no data.

3. RDFa was seen as having a clearer direction, plus formats that were formally approved, than microformats. There was no obvious microformat for general museum objects.

4. Don't obsess about linked data yet. Until we embed structured data for our records there's no data to link from/to!
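The "full dump plus feed of recent changes" suggestion from point 1 can be sketched from the consumer's side as follows. This is only an illustration: the record shapes, the `modified` and `deleted` keys and the sample data are assumptions, not any museum's actual export format.

```python
def apply_changes(dump, changes):
    """Bring a full dump up to date using a feed of recent changes.

    dump:    dict mapping object id to its record (the periodic full export)
    changes: list of dicts with "id", "modified" (ISO 8601 string) and either
             "record" (new or updated data) or "deleted": True
    """
    records = dict(dump)  # copy, so the original dump is left untouched
    # ISO 8601 timestamps in the same format and timezone sort correctly as strings.
    for change in sorted(changes, key=lambda c: c["modified"]):
        if change.get("deleted"):
            records.pop(change["id"], None)
        else:
            records[change["id"]] = change["record"]
    return records


dump = {
    "obj-1": {"title": "Telescope (replica)"},
    "obj-2": {"title": "Untitled"},
}
changes = [
    {"id": "obj-3", "modified": "2010-07-06T09:00:00Z", "record": {"title": "Roman coin"}},
    {"id": "obj-2", "modified": "2010-07-05T12:00:00Z", "record": {"title": "Portrait of a horse"}},
    {"id": "obj-1", "modified": "2010-07-07T18:30:00Z", "deleted": True},
]
current = apply_changes(dump, changes)
```

A hacker then only needs to download the dump once and poll the much smaller feed, rather than re-scraping every page.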

 

Why linked data/how's it going to be used - thoughts included better markup of search engine results (e.g. Google rich snippets), potentially linking paintings by the same artist in different collections (possibly via an intermediate person record page on a common site such as Wikipedia), and a better structure for hackers to build visualisations on top of.
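The "same artist in different collections" idea comes down to each museum pointing at the same person page. A minimal sketch, assuming every object record carries an `artist_url` field; the field names and all the example.org URLs are placeholders, not real identifiers:

```python
from collections import defaultdict


def paintings_by_artist(records):
    """Group object URLs by the shared person URL they point at."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["artist_url"]].append(record["object_url"])
    return dict(grouped)


# Placeholder person pages standing in for a common site such as Wikipedia.
STUBBS = "http://example.org/person/george-stubbs"
BOUCHER = "http://example.org/person/francois-boucher"

records = [
    {"object_url": "http://museum-a.example.org/object/101", "artist_url": STUBBS},
    {"object_url": "http://museum-b.example.org/object/7", "artist_url": STUBBS},
    {"object_url": "http://museum-b.example.org/object/8", "artist_url": BOUCHER},
]
links = paintings_by_artist(records)
```

No museum needs to know about the others; the link emerges because they agree on the person URL.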

How much structure - keen users will make do with whatever is there, but structured metadata would increase ability of automated linking of pages.

 

In the education sector particularly, users were more interested in collection/narrative/story level detail on museum object collections rather than individual object records. Museums are often still creating the core object catalogue and haven't yet started creating the level above this. This information is sometimes being created on exhibition microsites, so we need to think about how the data on these sites can also be served as structured data and how it is integrated back into the core collection database at the end of an exhibition.

 

Mia:

Michael made the point: you need a page per thing, with a stable identifier. We need to be able to cite content. We're in the age of pointing at things. 

Don't do linked data, do structured web pages for each of your 'things'.

Jonty and Richard: if you have a data dump (or an API), provide a licence to explain how it can be used.

Me: museums need geeks to tell museums what's useful.  [And I think one of the best ways to do this, apart from meetups like this, is hackdays - like usability testing, there's nothing like having someone struggle with your site/data to convince you to improve it.]

Me: if we concentrate on just getting stuff up there, in the first instance, how do we protect the future from well-intentioned short term projects/developers?  Or from projects with purely marketing goals e.g. exhibition microsites that might also include some objects and interpretative information in un-reusable forms?  On a related note, there's a gap in understanding what resources are available to museums - so many museums don't have direct control over their websites, or can only produce brochure-ware sites.

Question from Jonty: how do we contact museums to say we want your data?

People want to contribute, how can we help them?  [This went into a wider discussion of volunteer programmes in museums.]

 

Interestingness

I didn't get to talk to this group so I'm not sure what was discussed and who participated - if you were there, please let us know!

[Rhiannon wrote up some notes for the MCG blog; I've copied the parts that relate to the 'interestingness' discussion here]

"I was on a table that was discussing 'what might real people use it for'? There were about 8-10 of us on the table and I think the majority were coming at the question from a developer rather than a museum perspective.  What I enjoyed most was hearing about what might be possible and what needed to happen to make it possible.  I didn't necessarily understand every part of what was said, but I enjoyed listening, and made a few comments.

 

I took a few things away with me which I hope sound more realistic than negative.  The first was that it seemed to me to be a great knowledge divide between the people who want the data and the people who have the data.  The message from developers was very much 'we could do such cool stuff, it would be so useful to so many people, and would really benefit museums, why aren't more museums publishing the data for us to do this?'  The immediate answer that sprung to my mind was 'because they don't know that they should be for a start'.  Now I'm not talking about bigger museums here, I'm possibly not even talking about London museums, but I'm talking about your regional museum, which might only have a couple of members of staff, or even, one voluntary curator who only comes in once a week.  It seems to me that those museums might not even think of themselves as having data, let alone know that other people might want it, or how to go about publishing it.  Even if they do know these basic things, they may have other worries.  We all know museum professionals who are still quite skeptical about digital, and about web 2.0, and about the risks to authority, copyright etc etc and it felt to me like there was a colossal amount of work to be done in explaining to these people why releasing their data would be a good thing.

 

There is also, however, a probably-equally important explanation that needs to happen (and this evening was very important towards that) so that the world of developers understands the worries of these museum professionals, which, whilst they can seem frustrating, are often very real and important.  There are also some very practical reasons why museum data just may not be in a state to be released either because it's not that detailed, or because it's not in a state to be made public, for whatever reason.

 

The second big thing I took away occurred once the discussions began to round up, when discussion turned to 'so what are you going to do now?'  It seems to me that two key barriers that need to be overcome before any of us can effect any real change are a) the lack of influence and b) the lack of time that most people who believe that all this is a good idea has to actually influence any national or international change.  It may be that these things can change, but my feeling is that there is a lot of work to be done at this basic level before we can really start achieving the undoubtedly more exciting and useful goals that linked data could achieve."

 

General thoughts

 

Richard Light's notes

These thoughts are my own "take homes" from the discussion, rather than any sense of the meeting's overall conclusions.

What data do museums have?

Database content, mostly fielded and designed mainly for collections management support. Textual materials, much of it in a non-accessible "grey literature" format. Images.

The database content is typically (reasonably) self-consistent within a given environment. Thus we have known properties (from the field name) with usable string values. The challenge from a Linked Data perspective is the cost-effective generation of URLs from the string values currently held, e.g. for people and places, given that different museums will have different vocabularies to control their content.

Who wants to use this data?

The public, who are typically interested in classes of objects (rather than individual objects), or in objects with certain properties (e.g. coming from a place of interest to them). Educators, or more specifically people who create resources for educators to use. Students, if relevant objects could be easily accessed as "follow up" to formal learning materials.

How do we improve the data?

There is nothing to stop every museum publishing URLs, and whatever associated Linked Data they have to hand, for each object in their own collection, and thereby giving them a "hook" onto which others can hang added-value information and assertions of their own. They should treat this task as an urgent priority.

Where possible, convert string values in data to URLs, ideally widely-used (not just local) ones. Could use e.g. geonames.org for place names, or dbpedia for object class names. Interest in Portsmouth's historical gazetteer for "old" place names. There is a general need for a service to mint URLs for (dead) people.
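A minimal sketch of the string-to-URL step, assuming a small lookup table built by hand. In practice the URLs would come from a shared source such as geonames.org or dbpedia; the example.org URLs, the `place` field and the function names here are placeholders.

```python
def normalise(label):
    """Lower-case a label and collapse whitespace so near-identical strings match."""
    return " ".join(label.lower().split())


def link_value(record, field_name, vocabulary):
    """Return a copy of record with '<field>_url' added when the vocabulary knows the value.

    The original string is kept, so nothing is lost when no match is found.
    """
    linked = dict(record)
    value = record.get(field_name)
    url = vocabulary.get(normalise(value)) if value else None
    if url:
        linked[field_name + "_url"] = url
    return linked


# Hypothetical vocabulary: normalised label -> shared URL.
places = {
    "portsmouth": "http://example.org/place/portsmouth",
    "london": "http://example.org/place/london",
}

matched = link_value({"id": "obj-1", "place": "  Portsmouth "}, "place", places)
unmatched = link_value({"id": "obj-2", "place": "Portesmuth"}, "place", places)
```

The unmatched record shows where the cost lies: variant and historical spellings need either a richer gazetteer or a person to review them.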

There is a clear need for a sector-specific ontology which represents the properties found, i.e. the types of information recorded in museum databases. This will act as the "predicate" in Linked Data triples/assertions. It could be based on an existing agreement about these semantics, e.g. CIDOC CRM or LIDO.

Axis-based data such as geographical co-ordinates or dates/date ranges could be treated as purely numerical data, or "pixellated" by assigning a URL which imposes a certain level of precision (e.g. year for dates). Or both approaches could be adopted.
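The "pixellated" approach can be sketched as a function that turns a precise value into a URL at a chosen precision. The example.org namespaces and the function names are assumptions for illustration only.

```python
import datetime

TIME_BASE = "http://example.org/time"   # hypothetical namespace
GRID_BASE = "http://example.org/grid"   # hypothetical namespace


def date_url(d, precision="year"):
    """Mint a URL for a date at year, month or day precision."""
    if precision == "year":
        return f"{TIME_BASE}/{d.year:04d}"
    if precision == "month":
        return f"{TIME_BASE}/{d.year:04d}-{d.month:02d}"
    if precision == "day":
        return f"{TIME_BASE}/{d.isoformat()}"
    raise ValueError(f"unknown precision: {precision}")


def grid_url(lat, lon, decimals=1):
    """Mint a URL for a coordinate rounded to a fixed number of decimal places."""
    # Adding 0.0 turns a rounded -0.0 into 0.0, so one cell never gets two URLs.
    lat_r = round(lat, decimals) + 0.0
    lon_r = round(lon, decimals) + 0.0
    return f"{GRID_BASE}/{lat_r:.{decimals}f},{lon_r:.{decimals}f}"


made = datetime.date(1932, 5, 17)
year_bucket = date_url(made)             # every object made in 1932 shares this URL
month_bucket = date_url(made, "month")
cell = grid_url(51.5074, -0.1278)
```

Doing both, as suggested, would mean keeping the raw number alongside the bucket URL, so range queries and simple link-following both remain possible.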

What's the museum take on Linked Data?

Simple assertions are not enough; we care about the attribution of those assertions (i.e. who is making the assertion). We also want a framework which allows the expression of uncertainty and doubt.
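One way to picture this is a triple with two extra slots: who is making the assertion and how sure they are. This is a sketch under stated assumptions: the `Assertion` class, the three certainty levels, the property name and all the example.org URLs are invented for illustration and are not taken from CIDOC CRM, LIDO or any other standard.

```python
from dataclasses import dataclass

CERTAINTY_LEVELS = ("certain", "probable", "doubtful")  # hypothetical vocabulary


@dataclass(frozen=True)
class Assertion:
    subject: str        # URL of the object
    predicate: str      # URL of the property, e.g. from a sector ontology
    obj: str            # URL or literal value
    asserted_by: str    # who is making the assertion
    certainty: str = "certain"

    def __post_init__(self):
        if self.certainty not in CERTAINTY_LEVELS:
            raise ValueError(f"unknown certainty: {self.certainty}")


def claims_about(assertions, subject, predicate, accept=("certain", "probable")):
    """Return the assertions about one property that meet the accepted certainty levels."""
    return [
        a for a in assertions
        if a.subject == subject and a.predicate == predicate and a.certainty in accept
    ]


PAINTING = "http://museum-a.example.org/object/101"
MADE_BY = "http://example.org/ontology/madeBy"

assertions = [
    Assertion(PAINTING, MADE_BY, "http://example.org/person/raphael",
              asserted_by="http://museum-a.example.org/staff/curator-1", certainty="probable"),
    Assertion(PAINTING, MADE_BY, "http://example.org/person/workshop-of-raphael",
              asserted_by="http://example.org/researcher/visiting-scholar", certainty="doubtful"),
]
accepted = claims_about(assertions, PAINTING, MADE_BY)
```

Both claims stay in the data; the consumer decides which sources and certainty levels to trust.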

We are not particularly bothered about the specific format (RDF/XML, RDFa, JSON, Topic Maps) in which Linked Data is published, but we would like to be able to "do the job once" and have done with it.

 

Mia:

The discussion was challenging and provoking, which was an excellent tonic.  To summarise the message I heard from non-museum geeks: get your house in order, learn to walk before you try to run. Get the basics right first.  My response is: you're right, but a map of where we're aiming for is good too.  What's right and what's possible for museums isn't often/always the same thing, but at least a roadmap allows us to work towards what's right.  With each project we can improve the publication of our data, if we know what 'improved' looks like.

 

We (museum technologists, especially those working with collections) need to explain: characteristics of museum data, the types of projects that result in websites, the types of success metrics we deal with.

 

We deal with really interesting problems - we should describe these so people outside who want to play with our data will have an idea of how messy it is, how raw (e.g. records that consist only of an accession number), how contextual it can be (e.g. when was the catalogue record written, who was it written before, is it still regarded as 'true'?)...

 

For example, the 'replica object' problem - the date the object was made (say, 1932) isn't a date that fits into the story we want to tell (say, Galileo's telescope of 1609).  We have stories and we have facts, and objects that illustrate each, but there isn't always a neat 1:1 relationship between them.  We also have subjects about objects, but our coverage of those subjects is limited by the collecting histories of our museums.

 

One neat problem to focus attention on might be: how do we get from marketing- or education-based exhibition microsites to decent collections of pages about objects?

 

What next?

Regular meetups, a mailing list?  What works for you (and do you want to help?)

 

Do we want a place to discuss 'The future of museums (online)', or a mailing list with a more specific focus on helping people get sensibly structured web pages up and inch towards linkable data?