
Implementation formats

Page history last edited by Mia 14 years, 1 month ago

JSON, REST, RDFa, XML, RSS2, ATOM, ATOMPub, OpenSearch, OWL, SRU, csv, Dublin Core plus - what's your poison?

More specifically, what's your institution a) producing and b) consuming? Or, as a developer, what's easiest for you to use?

Tom's made a useful distinction between formats for modelling and for outputs in the comments below. This might resolve some of the 'heavyweight vs lightweight' issues - go as intense as you like on the backend, but start with lightweight outputs on the frontend.

Random resources on best practice, choosing formats, etc

Common RDFa publishing mistakes

"This page contains a number of RDFa publishing mistakes that web authors make when writing RDFa web pages."

Versioning APIs

Twitter: "We've taken the first steps toward introducing versioning into the Twitter REST API. With a versioned API we can make ambitious improvements *today*, not tomorrow, without worrying about breaking backwards compatibility. This will lead to both a better and more reliable API."

http://groups.google.com/group/twitter-api-announce/browse_thread/thread/2b70bd6ea4aec175?pli=1

Comments (15)

That should cover it ;-)

Depends on how you anticipate data being used. If it's likely to be used in javascript-driven web apps, then JSON is great. Atom/RSS is great if the data can also be usefully read in a feed reader. XML is good for just generally being well implemented in libraries. REST is an architecture which can encompass all of the above.
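As a rough illustration of the point above, the same record can be offered in more than one output format with very little code. This is only a sketch: the record and its field names (id, title, date) are made up for the example and do not come from any real collection schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative only.
record = {"id": "obj-1", "title": "Example object", "date": "1900"}


def to_json(rec):
    """Serialise the record as JSON, handy for javascript-driven web apps."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec):
    """Serialise the same record as a flat XML element."""
    root = ET.Element("record")
    for key, value in rec.items():
        child = ET.SubElement(root, key)
        child.text = value
    return ET.tostring(root, encoding="unicode")


print(to_json(record))
print(to_xml(record))
```

Both functions read from the same dictionary, so adding another output format later (an Atom entry, say) should not require touching the underlying data.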

I know I bang on about this a lot :-) but I think it is absolutely essential that this stuff is easy, otherwise we just won't ever see the kind of widespread growth (or growth potential) that we've seen on the rest of the intertubes. So my vote is REST as transport architecture, OpenSearch...etc (the usual suspects) - and all possible formats for data - JSON/RSS/XML. In my experience, JSON is seeing widespread uptake but is javascript-app focused, as Frankie says. Easy RSS or XML formats make it trivial for bad coders like me to wrap content into apps.

In general, I think we need to push for simplicity over perfection - I'd rather see a shoehorning of data types/fields into a simple schema than a "better" and more complete one.

Personally, I think both RDF and RDFa are too heavy. But that's just me.

Btw, I'm just dumping links as I come across them - this is an 'as and when' project at the moment, so the randomness of links isn't an indication of decisions made!

I think we need to allow for scripters, tinkerers, programmers, and specialists - implementation of technologies is only competitive in terms of the resources required to actually do them.

I think the best thing to do is to model it in OWL and then implement in RDFa + microformats. Resource Description is what we are doing, and so RDF(a) is ideal. JSON representation and RSS/Atom interchange is a nice bonus. The RDF community already has chunks of what's required in place: FOAF to describe people and organisations, Dublin Core to describe works, SKOS to describe the conceptual classification - to link exhibitable objects to concepts (the orrery depicted in the Joseph Wright painting probably ought to be linked to concepts like "natural philosophy", "the industrial revolution", "the Enlightenment" and "the moon") - and also the Tag schema.

I had a think about it and this is what I came up with. Three main classes should be able to do it: an ExhibitableObject, an Exhibition and an Organization. The ExhibitableObject has Dublin Core metadata about it, can point to photographic depictions and so on. The ExhibitableObject is exhibited in any number of Exhibitions, which are located spatio-temporally, and are hosted by an organization (as defined by foaf:Organization). Subclasses of foaf:Organization could give you Museums, Galleries, Libraries, shopping centres. Anything really. ExhibitableObjects could be owned by any Agent - that is, any Person or Organization or Group or whatever - and they could be on permanent display at any location. Exhibitions become like foaf:Projects - they can be funded by other agents.
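To make the shape of that three-class model a bit more concrete, here is a minimal sketch in Python rather than OWL. It is not the proposed ontology: the attribute names (title, owner, depictions, place, start, end) and the helper exhibitions_showing are assumptions made up for illustration, and a plain string stands in for the Dublin Core metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """Any Person, Organization or Group (loosely after foaf:Agent)."""
    name: str


@dataclass
class Organization(Agent):
    """Museums, galleries, libraries and so on would subclass this."""


@dataclass
class ExhibitableObject:
    title: str                           # stand-in for Dublin Core metadata
    owner: Optional[Agent] = None        # may be owned by any Agent
    depictions: List[str] = field(default_factory=list)  # e.g. photo URLs


@dataclass
class Exhibition:
    name: str
    host: Organization                   # hosted by an organization
    place: str                           # where it is located
    start: str                           # when it runs (ISO dates assumed)
    end: str
    objects: List[ExhibitableObject] = field(default_factory=list)


def exhibitions_showing(obj, exhibitions):
    """Return every exhibition in which the given object is exhibited."""
    return [e for e in exhibitions if obj in e.objects]


gallery = Organization("Example Gallery")
painting = ExhibitableObject("Example painting", owner=gallery)
show = Exhibition("Example show", gallery, "London", "2009-01-01", "2009-03-01",
                  objects=[painting])
print([e.name for e in exhibitions_showing(painting, [show])])
```

Keeping the object, the exhibition and the host separate means one object can appear in any number of exhibitions without duplicating its description.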

I'm not sure about ExhibitableObject. I don't like the term "object" because it's synonymous with "thing" and there are plenty of owl:Things which aren't the sort of thing with which museums, galleries, archives and libraries concern themselves. They concern themselves with objects which are exhibitable. I thought CulturalArtefact or CulturalObject might nail the concept down better. Oh well, the words might not denote what I mean, but it's getting close.

(Continued in next comment...)

Of course, the neat thing with separating out these three things is that each museum or gallery would be able to have a page about the exhibition generally - say, the collection of Picassos being shown at the Tate from such date to such date. But you can also say that *this individual painting* is usually on show at another place.

Another thing that would need to be represented would be going from ExhibitableObjects to sioc:Posts - so that one can point to media reaction: blog posts, videos etc. Similarly, as authored works, one might want to state that one piece inspired another (for art and literary works), or that one work is made up of multiple parts - for instance, a scientist's laboratory might contain, say, the microscope used in some important work that is held in somewhere like the Science Museum, but also contains a first-edition of some work which is being displayed by the British Library.

Another use case: take something like the Dead Sea Scrolls or other Biblical manuscripts. It'd be useful to be able to point from the objects themselves to transcriptions and translations of the content of the objects. For texts, it would certainly be interesting to do such a linking. Similarly, one might see an early wax cylinder recording from the Victorian era and want to hear a recording of it. We should be able to represent that.

I might draft some OWL in Notation 3 over the weekend and stick it up on Github.

We can't be all things to all people (at least not on day 1!), so individual projects will need to prioritise their APIs - both in terms of technical format and number of methods exposed. For eHive we're starting with a simple REST API returning different formats of XML. There is a lot of potential just with the basics (search method, retrieve record method). e.g. Find photos of beards, return data including dates and places, plot on a map with a timeline (as per the Finland project at MW2009). Ideally, extensions to add other API formats or adding methods would be driven by demand.
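A sketch of how a developer might consume that kind of search result: parse the XML and keep only the records that have both a date and a place, ready to hand to a map or timeline. The XML layout below is invented for the example and is not the eHive response format; the element names (results, record, title, date, place) are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical search response; NOT the real eHive XML format.
SAMPLE_RESPONSE = """<results>
  <record id="1"><title>Portrait with beard</title><date>1890</date><place>Wellington</place></record>
  <record id="2"><title>Group photo</title><date>1905</date></record>
</results>"""


def plottable_points(xml_text):
    """Return the records that carry both a date and a place."""
    points = []
    for rec in ET.fromstring(xml_text).iter("record"):
        date = rec.findtext("date")
        place = rec.findtext("place")
        if date and place:  # skip records that cannot be plotted
            points.append({"id": rec.get("id"), "date": date, "place": place})
    return points


print(plottable_points(SAMPLE_RESPONSE))
```

Records without a place are simply dropped here; a real mashup might prefer to list them separately instead.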

Other methods we've had requests for are: OAI-PMH/XML, RSS.

For HTML we have museum organisation and object related modules for Joomla (connecting in turn to the eHive server) which we're thinking of opening up for others to play with.

Is the OAI-PMH request from another institution or an individual? ie. machine-to-machine or machine-to-interface?

What are the object related modules? Will you expose subject authorities/information records?

OAI-PMH is the harvesting method of choice for Digital NZ. We need to support it so that NZMuseums data can be harvested (machine-to-machine) for other DNZ projects. OAI-PMH has also come up with Europeana, where it looks like they'll use it as a key method for capturing new data in the future.
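For anyone unfamiliar with the machine-to-machine harvesting described above, here is a minimal sketch of the OAI-PMH request loop: ask for ListRecords, then keep following the resumptionToken until the server stops returning one. The base URL is a placeholder and the helper names are made up for the example; only the verb, the argument names, the oai_dc prefix and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    When a resumption token is supplied it is the only argument sent
    besides the verb, as the protocol requires.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token in a response, or None at the end of the list."""
    token = ET.fromstring(response_xml).find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Placeholder endpoint, not a real repository.
print(list_records_url("http://example.org/oai"))
```

A harvester would fetch the first URL, pass the response to next_token, and request again with that token until it returns None.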

At present our Joomla interface modules for object data cover searching, browsing of results in several formats, adding/removing tags, tag clouds, comments logs and a toolbox of links to more specific functionality (reports, data exports, sorting). Object record searching/results would be exposed through the API. Our authority structure is simple at present - just a straight list of terms. We'll consider opening access to the authority files through an API once the structure is more complete (e.g. hierarchical support). Support of hierarchical authorities was surprisingly low on the list of things that small museums were looking for in our original focus groups.

You should try mentioning OAI in the UK, people run screaming. Some of it goes back to harvesting v distributed searches and fear of the Ontology of Everything, but I'm not sure that entirely explains the resistance.

Are you able to post any of your authority lists? I've put some of the Science Museum/NMSI authorities lists online at http://museum-api.pbworks.com/NMSI-term-lists

Not an implementation format, but possibly useful for models of um, modelling: http://www.vrafoundation.org/ccoweb/index.htm

Also the readwriteweb on Yahoo's YQL: http://www.readwriteweb.com/archives/theres_a_great_amount_of.php

Because the data for NZMuseums is primarily highlights of each of the contributors' collections, the authority terms present are just small subsets of the originals. The three standard authority lists that have come up in the data are the Art & Architecture Thesaurus, Chenhall's Revised Nomenclature, and the Taonga Maori Thesaurus (a NZ project to develop terminology for Maori artefacts). All three of these are licensed through the thesaurus copyright holder (Getty Research, American Assoc. State & Local History, Hawke's Bay Cultural Trust) so we can't expose the terms to the public without negotiating a licence with the copyright holder first. For these three the situation is:
- AAT - can purchase a site licence but the cost is too high at present for us to justify this
- Nomenclature - thesaurus currently undergoing a major revision with possible new licensing options
- Taonga Maori Thesaurus - we have a non-commercial use licence for use with the website but don't have the rights to publish this through an API for use elsewhere

We will be exposing the tag cloud through the API, so this will give a snapshot of the most popular folksonomy terms.

Paul - sounds like a good solution, given the realities of the situation!

Another 'comment and run' dash - "We're pleased to announce that Nature.com now has an OAI-PMH interface." http://blogs.nature.com/wp/nascent/2009/05/a_catalog_for_naturecom.html

Revisiting this page post-MW2011, it's interesting that copyright for thesauri is still an issue.

Also it's interesting to note that lightweight, semi-structured formats seem to be making an appearance - CSV turned out to be the best quick-and-dirty solution for the release of Science Museum/NMSI data http://sciencemuseumdiscovery.com/blogs/museumdev/ http://api.sciencemuseum.org.uk/documentation/collections/ , and Europeana used OpenSearch http://europeanalabs.eu/wiki/EuropeanaOpenSearchAPI
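Part of the appeal of CSV as a quick-and-dirty format is how little code it takes to read. A minimal sketch follows; the column names and rows are invented for the example and are not taken from the Science Museum/NMSI release.

```python
import csv
import io

# Hypothetical CSV export; columns and values are illustrative only.
SAMPLE_CSV = "id,title,date\n1,Example object,1900\n2,Another object,\n"


def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


for row in read_records(SAMPLE_CSV):
    print(row["id"], row["title"], row["date"] or "(no date)")
```

Missing values come back as empty strings, so consumers need to decide for themselves how to treat gaps in semi-structured data like this.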


Comments (15)

Paul Walk said

Frankie Roberto said

Mike said

Mia said

Tom Morris said

Tom Morris said

Paul Rowe said

Mia said

Paul Rowe said

Mia said

Mia said

Paul Rowe said

Mia said

Mia said

Mia said
