Sample NMSI objects as Linked Data

Page history last edited by eatyourgreens 13 years, 9 months ago

 

Sample objects

Mia has offered some sample objects for us to discuss as potential Linked Data sources:

 

Original orrery planetary model by John Rowley, 1712-1713
http://sciencemuseum.org.uk/objects/astronomy/1952-73.aspx
http://collectionsonline.nmsi.ac.uk/detail.php?t=objects&type=all&f=&s=1952-73&record=0

 

Stephenson's 'Rocket' locomotive, 1829
http://sciencemuseum.org.uk/objects/nrm_-_locomotives_and_rolling_stock/1862-5.aspx
http://www.sciencemuseum.org.uk/Centenary/Home/Icons/StephensonsRocket.aspx
http://www.makingthemodernworld.org.uk/icons_of_invention/technology/1820-1880/IC.007/
http://collectionsonline.nmsi.ac.uk/detail.php?t=objects&type=all&f=&s=1862-5&record=4

 

Babbage's Difference Engine No 2, 2000
(useful for the 'replica problem')
http://sciencemuseum.org.uk/objects/computing_and_data_processing/1992-556.aspx

 

Pilot ACE Computer
http://www.sciencemuseum.org.uk/Centenary/Home/Icons/Pilot_ACE_Computer.aspx
http://www.makingthemodernworld.org.uk/icons_of_invention/technology/1939-1968/IC.059/

 

What do we want to say?

Mia has already outlined some thoughts on what Science Museum linked data might look like. I suggest that we keep the discussion at this general level, and don't get into issues like delivery formats (e.g. microformats vs. RDFa vs. RDF/XML) or ontologies (e.g. FOAF vs. CIDOC/CRM vs. National Gallery home-grown). However, we may need to think a bit about where our Linked Data URLs are going to come from.

 

[Comment from Mia: actually, delivery formats would be useful - we may be able to implement something on Ingenious as part of a site migration, so helping us get started with actual markup would be brilliant!]

 

Identity

A key task will be assigning a unique, persistent identity to each object. Luckily, we already have a cultural acceptance that museum identity/accession/registration numbers should be unique within their collection/museum, that they should not change if at all possible, and that they should be marked on or attached to the object, so as to act as a link between the object and its documentation. That means it should be straightforward to generate a URL for each object by:

  • using an existing or registering a new domain for the museum, with a subdomain for each collection iff required for uniqueness
  • applying a conversion to the object's identity number so that it is URL-friendly (no spaces; no characters which require URL escaping) but still unique
  • sticking the two together

The object identity URL should make no attempt to include "useful information", because this will militate against the objective of permanence.

As well as acting as the subject of all Linked Data statements made by the museum, this published identity will act as a "peg" onto which others can "hang" their own statements about the object. ("Linked Data" is actually a slight misnomer: the point is that it becomes "Linkable Data", where many people use the same string/URL to mean the same thing: in this case your museum object. How and where the actual linking might happen is a separate issue, which we don't need to worry about at this stage.)

So, our sample objects could be given URLs like:

http://collections.nmsi.ac.uk/object/1952-73 (the orrery)

http://collections.nmsi.ac.uk/object/1862-5 ('Rocket')

http://collections.nmsi.ac.uk/object/1992-556 (the Difference Engine)

http://collections.nmsi.ac.uk/object/1956-152 (the ACE computer)

Note that no attempt is made to use existing domains or the URLs of existing web pages. It is probably a good idea to invent a (sub-)domain which has the specific purpose of acting as a "namespace" for object (and other) URLs.
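The three steps listed earlier in this section (pick a namespace, make the identity number URL-friendly, stick the two together) could be sketched roughly as follows. This is only an illustration: the function names, the rule of turning whitespace into underscores, and the exact set of "safe" characters are assumptions, not anything NMSI has agreed.

```python
import re

# Namespace taken from the sample URLs on this page.
BASE = "http://collections.nmsi.ac.uk/object/"

# Assumption: characters we allow unescaped in the path. "/" is kept so that
# part numbers such as 1986-1289/2 survive unchanged.
SAFE = re.compile(r"[A-Za-z0-9._~/-]+")


def object_url(accession_number):
    """Turn an accession number into a URL that needs no escaping."""
    # Hypothetical conversion rule: trim, then replace runs of whitespace with "_".
    slug = re.sub(r"\s+", "_", accession_number.strip())
    if not SAFE.fullmatch(slug):
        raise ValueError(f"needs a conversion rule: {accession_number!r}")
    return BASE + slug


def mint_all(accession_numbers):
    """Mint URLs for a whole collection, refusing any conversion that collides."""
    minted = {}
    for number in accession_numbers:
        url = object_url(number)
        if url in minted and minted[url] != number:
            raise ValueError(f"{number!r} and {minted[url]!r} both map to {url}")
        minted[url] = number
    return minted


print(object_url("1952-73"))      # the orrery
print(object_url("1986-1289/2"))  # a part number containing a slash
```

The collision check in mint_all is there because the conversion must leave the identifiers unique: two different accession numbers must never end up as the same URL.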

 

[Comment from Joe: Just a thought but I think you could end up with three URLs, depending on how you implement it,

http://collections.nmsi.ac.uk/resource/1952-73 => the URI defined in a "database" (redirected to "page/1952-73" in the browser)
http://collections.nmsi.ac.uk/page/1952-73 => the human readable version of the related data
http://collections.nmsi.ac.uk/data/1952-73 => the actual XML data for the resource.

Also if you did remove the "object" bit from the URI, removing the extra information, you may then come up against a problem if you need to define a date range or time span of "1952-73".  Accession/inventory numbers composed of a simple "year-number" can be problematic.]

 

[Richard again: true, but see next section.  The idea is that there is a principal URL, which stands for the museum object in an abstract way, and is then mapped to other URLs which deliver actual content, both human- and machine-processible. For the reasons you give, I would keep /object/ in the URL path.]

 

[Comment from Mia (doing my learning in public): I need to research it but my instinct is to use one base or principal URL - human-readable by default, but providing XML or JSON or whatever if requested through content negotiation or with a different extension e.g. 1956-152.html, 1956-152.json. Good idea, bad?]

 

[Richard: yes, there has to be a base URL, because other people need to be able to use one URL in their own Linked Data to mean, unambiguously, the object that you have identified using that URL. How you vary the URL to denote different physical formats is a matter of personal taste: my "put it in the path" approach (see below) and your "change the filetype extension" suggestion are both in real-life use out there.]

 

[Mia: any views from developers out there?  Is one easier to deal with than the other?  Does one method make the discoverability of structured data more likely?]

 

Content negotiation and web pages

Now we have some URLs, it would be nice to give them some meaning by making them "dereferenceable". The approved Linked Data approach involves "content negotiation", which is a standard but hitherto fairly obscure part of the HTTP protocol. It involves giving the web host some additional information via the HTTP "Accept" header; in effect stating what format you would like the results to come back in.

The fallback position (i.e. when there is no Accept header) is meant to be that you deliver a human-readable set of information, e.g. a normal web page. This can be achieved by setting up a 301 redirect ("moved permanently") from the object's Linked Data URL to its web page.  This brings the LD URLs into play in the real-life web world, but doesn't add any Linked Data value.

However, if you can set up this redirection, you can then add Linked Data value to the web page itself, by ensuring it is well-formed XHTML and includes microformat data or RDFa, and suddenly you have some Linked Data which is worthy of the name, without having done anything particularly difficult.

If you do want to play the full content negotiation game, you would also set up "303 See Other" redirects for each machine-processible format that you support.  As Joe notes above, this could be implemented by including the name of the required format in the redirected URL, e.g.

http://collections.nmsi.ac.uk/object/rdfxml/1952-73

Then you would ensure that this URL delivers an RDF/XML response.  (This is also handy for users, who can then go direct to the specific URL that they require, and get the results they want without bothering with the content negotiation game.)
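As a rough sketch of the redirect logic described in this section, the function below takes an object identifier and the value of the Accept header and returns the status code and target URL. The /page/ path is borrowed from Joe's comment, the "rdfxml" path segment from the example above, and the status codes follow the description in the text; the function names and the simplified Accept parsing are my own assumptions, and a real server or framework would do this more thoroughly.

```python
BASE = "http://collections.nmsi.ac.uk"

# Media type -> path segment naming the format. Only "rdfxml" appears in the
# text; further entries would be added per supported format.
FORMATS = {"application/rdf+xml": "rdfxml"}


def accepted_types(accept):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q= parameter is given
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            prefs.append((q, media_type))
    # The sort is stable, so equally weighted types keep their header order.
    prefs.sort(key=lambda pref: pref[0], reverse=True)
    return [media_type for _, media_type in prefs]


def negotiate(object_id, accept=""):
    """Return (status, location) for a request to /object/<object_id>."""
    for media_type in accepted_types(accept):
        if media_type == "text/html":
            break  # the client prefers the human-readable page
        if media_type in FORMATS:
            return 303, f"{BASE}/object/{FORMATS[media_type]}/{object_id}"
    # Fallback: no Accept header, or nothing we support was asked for.
    return 301, f"{BASE}/page/{object_id}"


print(negotiate("1952-73", "application/rdf+xml"))
print(negotiate("1952-73"))
```

A browser that lists text/html first is sent to the web page, while a Linked Data client asking for application/rdf+xml is sent to the format-specific URL.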

 

[Bill: I've generally implemented content negotiation via a web app that generates HTML or RDF docs on the fly from a database, but if you are using static files, you can also do it using Apache: http://httpd.apache.org/docs/2.0/content-negotiation.html though I must admit I've not done it myself]

 

[Jim: With dynamically generated HTML or RDF, you could do your content negotiation using Apache's mod_rewrite too. Rewrite rules based on a .html vs. .rdf extension in the URL would be straightforward to write. It's probably also worth noting that Rails, as a framework, gives you RESTful content negotiation for free.]
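For comparison, the "change the filetype extension" approach that Mia and Jim mention might look something like this in application code. It is a sketch only: the extension-to-media-type table and the function name are assumptions, and in practice the same mapping could equally live in rewrite rules.

```python
# Assumed mapping of URL extensions to the media type to deliver.
FORMATS = {
    "html": "text/html",
    "rdf": "application/rdf+xml",
    "json": "application/json",
}


def split_format(last_segment, default="html"):
    """Split '1956-152.json' into ('1956-152', 'json').

    Only known extensions are stripped, so an identifier that happens to
    contain a dot is left intact and gets the default format.
    """
    stem, dot, ext = last_segment.rpartition(".")
    if dot and ext.lower() in FORMATS:
        return stem, ext.lower()
    return last_segment, default


object_id, fmt = split_format("1956-152.json")
print(object_id, FORMATS[fmt])
```

Either way, the base URL without an extension stays the one that other people quote in their own Linked Data.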

 

What can we say using the Linked Data approach?

At base, Linked Data consists of simple assertions, or statements, each with a subject, a predicate and an object, e.g.:

object 1952-73 is owned by NMSI

If we're aiming for the easiest possible solution to get us started, we would probably use this approach to issue a set of simple statements about the object.  Look at the dbpedia page for Berlin to get the general idea:

http://dbpedia.org/page/Berlin

(One weakness of this approach is that you can't correlate two related assertions.  On the Berlin page, note that the "population as of" date is a separate assertion from the actual populations quoted.  This means that you have no means of giving the population of Berlin at different dates, and indicating which population relates to which date. Another weakness is that you don't make any statement about the source of, or authority for, your assertions. Which is fine until someone else makes assertions about your objects, using the "peg" you have so helpfully provided. Anyway, let's proceed on the assumption that this is all OK.)

 

In Linked Data, you get most value by using URLs to stand for the things you are making statements about.  Thus, in our ownership statement above, it would be good to have a URL for NMSI. And even better if that URL was one which was used by everyone to mean "The National Museum of Science and Industry (U.K.)". Who should publish such URLs? Each museum could produce their own, of course, but in this context it would make more sense for Culture 24 to augment their existing directory offerings, by publishing URLs for every museum (and lots of associated Linked Data!) as part of its "sharing cultural data" agenda.

U.K. museums are within our domain, and so to some extent within our control. However, when you want to say:

object 1952-73 was made by John Rowley

and you want a URL for the person John Rowley, you are suddenly cast adrift on the wide ocean that is human history. At present, there are only authorities (and so the prospect of URLs) for handy subsets of the human race: artists (ULAN), "noteworthy Brits" (DNB), etc. For now, I think the only sensible way to tackle this issue is to mint your own URLs for the people referenced by your collections database, and publish those too, together with what you know about that person (also expressed as Linked Data, of course). If your database has a separate person authority file, it should be straightforward to do this:

http://collections.nmsi.ac.uk/person/45873

where 45873 is the primary key for (a particular) John Rowley in the NMSI authority file.

 

[Mia: that works for me.  In Science Museum linked data I'd suggested linking to our own subject authority 'person' record - from there we could link out to other authorities and ideally also enable people to manually add other cultural heritage references to the same person by suggesting links.]

 

So that deals with the subject and the object of our initial statement: what about the predicate ("is owned by")?

This is where I think that it is crucial that museums get their act together. I suggest that all museums will be saying much the same sorts of things about their collections objects (who owns/manages it, when it was made, what materials it is made of, etc.).  If they use the same set of properties, expressed as Linked Data URLs, then their collective efforts will have enormously more value as a Linked Data resource than if they were to each invent their own.  This is because you can query Linked Data on any aspect of the triple, including the predicate.  Thus you could do a search on "materials of which objects are made" without any knowledge of (or consistency between) the object identifiers or the materials terms/URLs used by different museums.

It's not currently obvious to me what this "museum Linked Data" ontology should be. When I did this exercise a couple of years ago (for the Wordsworth Trust collection), I tended to use dbpedia properties, so as to link up our "museum" assertions with others outside our domain. However, since then, these dbpedia properties have all been changed! Maybe something stable and domain-specific would be better (though I would want the museum ontology to be able to "reach out" and cover general history, as well as museum-specific information). We need to look at frameworks such as the CIDOC-CRM Core Metadata Element Set and LIDO for guidance.
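To make the point about querying on the predicate concrete, here is a toy sketch with statements from two museums held as plain triples. The ontology namespace (example.org) and its property names are hypothetical placeholders for whatever shared set is eventually agreed, and the material values are invented purely for illustration.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical shared property URLs; no such ontology has been agreed.
ONTOLOGY = "http://example.org/museum-ontology/"
MADE_BY = ONTOLOGY + "madeBy"
MATERIAL = ONTOLOGY + "material"

# Illustrative data only: the material values are made up.
statements = [
    Statement("http://collections.nmsi.ac.uk/object/1952-73", MADE_BY,
              "http://collections.nmsi.ac.uk/person/45873"),
    Statement("http://collections.nmsi.ac.uk/object/1952-73", MATERIAL, "brass"),
    Statement("http://collections.nmm.ac.uk/object/BHC2827", MATERIAL, "oil on canvas"),
]


def with_predicate(data, predicate):
    """Select statements by predicate alone, whatever the subject or object."""
    return [s for s in data if s.predicate == predicate]


for s in with_predicate(statements, MATERIAL):
    print(s.subject, "->", s.obj)
```

The query finds material statements from both museums even though their object identifiers and material terms have nothing in common; that only works because both used the same predicate URL.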

 

Of course, there will be data which can't be expressed as URLs. It may be a textual description, like the "abstract"s for Berlin. Or it may be actual data, such as an object's dimension or the population of a city. Or it may simply be a reference to a class or an entity (such as a material, a colour or an art-historical movement) for which you have no straightforward means of minting a URL within your environment. Don't worry about this: just publish the string value as you have it.  It is better to get the Linked Data out there, and plan to improve it in the future, than to hold back because it could have been better.
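As an illustration of publishing a plain string where no URL is available, the sketch below writes a statement as a line of N-Triples, one of the simpler RDF serialisations: URLs go in angle brackets and string values in double quotes. The function names, the example.org property URLs and the sample values are assumptions for the example, and only the most common escapes are handled.

```python
def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples_line(subject, predicate, obj, literal=False):
    """Format one statement; pass literal=True when obj is a plain string."""
    obj_term = f'"{escape_literal(obj)}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


orrery = "http://collections.nmsi.ac.uk/object/1952-73"

# Object given as a URL (the person authority record from the text).
print(to_ntriples_line(orrery,
                       "http://example.org/museum-ontology/madeBy",
                       "http://collections.nmsi.ac.uk/person/45873"))

# Object given as a string, because no URL has been minted for the material.
print(to_ntriples_line(orrery,
                       "http://example.org/museum-ontology/material",
                       "brass", literal=True))
```

If a URL for the material is minted later, only the object term of the second statement needs to change; the subject and predicate stay as published.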

 

What can't we say using the Linked Data approach?

Linked Data deals in simple assertions, but if we want to "tell a story" we might want a richer format, such as a web page or a document. We can, however, use Linked Data assertions to say that a web page is "about" the object in question.

 

 

Richard Light

Comments (11)

Mia said

at 4:30 pm on Jul 16, 2010

I should point out that our accession numbers can also have / in them - it's been used for part numbers. e.g. 1986-1289/2 (http://collectionsonline.nmsi.ac.uk/detail.php?t=objects&type=all&f=&s=1986-1289%2F2&record=0)

They're also used in the IDs for technical files (e.g. T/1977-398), which could contain useful, if somewhat eccentrically recorded information.

Richard Light said

at 4:38 pm on Jul 16, 2010

Doesn't matter. The main requirements are that the URL should not require URLencoding, i.e. it consists of US-ASCII minus space:
http://www.w3schools.com/TAGS/ref_ascii.asp
... and that it is unique. The "linking" bit in Linked Data is just string matching on URLs. If you had multiple series of tech. files, you would need to add a prefix to distinguish between series.

eatyourgreens said

at 12:26 pm on Jul 22, 2010

Richard's Wordsworth Trust example reminds me that I once wrote a simple XML template for NMM objects, which still exists. It outputs properties of the catalogue record as unqualified Dublin Core (it's old enough to predate qualified DC). Here are a couple of examples:

An oil painting — http://www.nmm.ac.uk/collections/itemrdf.cfm?ID=BHC2827
A set of papers — http://www.nmm.ac.uk/collections/itemrdf.cfm?ID=FIS/43

I'd be interested in revisiting those templates - particularly how we could mint URLs to start linking these records up.

Jim

Richard Light said

at 3:49 pm on Jul 22, 2010

Right: there are three aspects to think about, starting from this format:

1. The object URL. That's the one URL you should mint, as it's under your control and it's your objects you are publishing. It would be nice if the object identifier could be expressed as something like http://collections.nmm.ac.uk/object/BHC2827, with a mod rewrite or some other redirection technique used to call up your existing script. (This identifier would of course then go in the RDF/XML.)
2. The predicates. DC is one possibility, but we all ought to agree on a single set of object properties and try to implement it. If NMSI, NMM, NG and WT all did it, it would be a good start!
3. The objects. This is where the real challenges start! Do you, for example, have a person authority file?

eatyourgreens said

at 4:18 pm on Jul 22, 2010

Would that URL scheme work when the object number contains a slash? eg. http://collections.nmm.ac.uk/object/FIS/43 which is, in fact, part of http://collections.nmm.ac.uk/object/FIS

Totally agree that we should agree on a single set of object properties. This applies not just to Linked Data but also to, e.g., exposing collections via YQL open tables: we don't want to be writing custom parsing code for each distinct collection, in my opinion.

We use Mimsy's people authority for the biographies of a limited number of artists in the paintings collection. Here's one:
http://www.nmm.ac.uk/collections/explore/people.cfm/id/163280
There's a full listing of that authority at
http://www.nmm.ac.uk/collections/feeds/authorityXML.cfm?authority=people

The authorities can also be used in YQL queries (although the syntax is clumsy due to the way our site's URLs work):
http://y.ahoo.it/eWIZ5w0t

eatyourgreens said

at 5:15 pm on Jul 22, 2010

Forgot to mention — that XML feed works for other authorities too.

Vessels: http://www.nmm.ac.uk/collections/feeds/authorityXML.cfm?authority=vessels
Sites (v. small number): http://www.nmm.ac.uk/collections/feeds/authorityXML.cfm?authority=site
Site subjects/categories: http://www.nmm.ac.uk/collections/feeds/authorityXML.cfm?authority=category

Each authority was set up for a specific collection or exhibition, so there's little or no overlap between, say, linked vessels and linked people.

eatyourgreens said

at 5:18 pm on Jul 22, 2010

And the generic URL for one of those authority entries on the NMM site is then
http://www.nmm.ac.uk/collections/explore/index.cfm/{authority}/{id}

eg. http://www.nmm.ac.uk/collections/explore/index.cfm/site/35

Mia said

at 7:10 pm on Jul 25, 2010

Ok, so how to agree on those properties? The attraction of existing standards has been that other people have presumably gone through the process of agreeing on shared properties and how to express them, but then choosing a standard is also a process in itself.

We use Mimsy for subject and people authorities and I'm hoping to publish those.

Andy Mabbett said

at 5:16 pm on Jul 22, 2010

What about Wikipedia as a source of definitive URLs for museums? For example: http://en.wikipedia.org/wiki/National_Museum_of_Science_and_Industry

Taxonomists are already using Wikipedia URLs in this way for species and other ranks. They lead to linked data, thanks to DBPedia.

eatyourgreens said

at 10:02 pm on Jul 22, 2010

Ambiguity might be a problem with wikipedia. I'm not sure, but from memory a bunch of different URLs all point to the Royal Observatory page http://en.wikipedia.org/wiki/Royal_Observatory,_Greenwich
I haven't looked at this in ages, but I seem to recall wikipedia doesn't handle redirects very well. That might be a problem if we want a distinct URL to represent a place.

eatyourgreens said

at 10:16 pm on Jul 22, 2010

For example, removing the comma from Royal Observatory, Greenwich generates a second URL representing the ROG http://en.wikipedia.org/wiki/Royal_Observatory_Greenwich My understanding is that different URLs would represent different places in Linked Data.
