
Authority Lists

Last edited by Mia 7 years, 3 months ago

Older notes

Stephen Pope started this really useful discussion:

 

One subject I think would be a good area for collaboration across museums is something I call 'Authority Lists' (if someone has a better/more official name please let me know :D)

The idea is that there are certain collections of data that must be common across all collection databases. While these lists are likely to be fairly simple, they provide a quick way to get consistency across various data feeds. If I know that the BM and SCIM are using an agreed authority list, then when I get object data from them I can confidently match a place or a time period, and I can start to pin together different data feeds with a good level of assurance.
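To make the "pin together different data feeds" idea concrete, here is a minimal sketch, assuming two feeds that both cite place IDs from one shared list. The IDs (`place:0001` etc.), the record fields and the feed contents are all hypothetical; real feeds would use whatever identifiers the agreed list defines.

```python
from collections import defaultdict

# Hypothetical shared authority list: place ID -> preferred label
PLACES = {
    "place:0001": "London",
    "place:0002": "Manchester",
}

# Hypothetical records from two different museum feeds
feed_a = [
    {"object": "A-17", "place_id": "place:0001"},
    {"object": "A-42", "place_id": "place:0002"},
]
feed_b = [
    {"object": "B-03", "place_id": "place:0001"},
    {"object": "B-99", "place_id": "place:9999"},  # not on the shared list
]


def group_by_place(*feeds):
    """Group object identifiers from several feeds by shared place ID.

    Records whose place ID is not on the authority list are collected
    separately so they can be reviewed rather than silently matched.
    """
    matched = defaultdict(list)
    unmatched = []
    for feed in feeds:
        for record in feed:
            pid = record["place_id"]
            if pid in PLACES:
                matched[pid].append(record["object"])
            else:
                unmatched.append(record["object"])
    return dict(matched), unmatched


matched, unmatched = group_by_place(feed_a, feed_b)
for pid, objects in matched.items():
    print(PLACES[pid], objects)
print("needs review:", unmatched)
```

The point of the shared ID is that the match is an exact key lookup, not a guess based on comparing free-text place names.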

 

Common lists I can think of (please add more if you think of them) 

 

  • Places/Locations (Ancient and Modern) (Example schema)
  • Time Periods (start + end dates)
  • Materials
  • Colours
  • Units of Measurement
  • Languages

 

Uncommon lists (I'm being a bit cheeky, but I think it's easy not to realise what's a heavily contextualised or contested list)

  • People
  • Events
  • Organisations
  • Topics/Themes

 

This also means that once these are in place, extra supporting data can be added to enrich existing sets as people see fit, e.g. adding lat/long information to places.
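One way to read "enrich existing sets" is to keep the shared list minimal and hold extra data in a separate layer keyed by the same IDs. The sketch below assumes that design; the IDs are hypothetical and the coordinates are approximate, for illustration only.

```python
# Shared authority list (hypothetical IDs): kept minimal and stable
PLACES = {"place:0001": "London", "place:0002": "Manchester"}

# Optional enrichment contributed separately, keyed by the same IDs.
# Coordinates are approximate and for illustration only.
COORDS = {"place:0001": (51.51, -0.13)}


def enriched(place_id):
    """Return the shared entry merged with any available enrichment."""
    entry = {"id": place_id, "label": PLACES[place_id]}
    if place_id in COORDS:
        entry["lat"], entry["long"] = COORDS[place_id]
    return entry


print(enriched("place:0001"))
print(enriched("place:0002"))  # no coordinates contributed yet
```

Keeping enrichment separate means the agreed list itself changes rarely, while anyone can add supporting data without touching it.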

 

Questions

 

What would it take to get such lists agreed between museums/curators? I would have thought things like time periods would be fairly universal, not really up for discussion, and could be agreed fairly quickly (I could of course be wildly wrong).

 

Sharing of lists: obviously these lists could exist as an API/web service themselves for quick lookup, but they would also need to be downloadable whole, as a set, for processing into a DB etc. Would CSV/XML/JSON be OK for this?
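As a rough illustration of offering both a single-term lookup and whole-list downloads, here is a sketch using only the Python standard library. The entries and the field names (`id`, `label`, `symbol`) are assumptions, not an agreed schema.

```python
import csv
import io
import json

# Hypothetical authority list entries; field names are assumptions.
UNITS = [
    {"id": "unit:mm", "label": "millimetre", "symbol": "mm"},
    {"id": "unit:cm", "label": "centimetre", "symbol": "cm"},
    {"id": "unit:kg", "label": "kilogram", "symbol": "kg"},
]


def to_csv(entries):
    """Serialise the whole list as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "label", "symbol"])
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()


def to_json(entries):
    """Serialise the whole list as a JSON array."""
    return json.dumps(entries, indent=2)


def lookup(entries, entry_id):
    """Single-entry lookup, the kind of call a web service might expose."""
    return next((e for e in entries if e["id"] == entry_id), None)


print(to_csv(UNITS))
print(lookup(UNITS, "unit:cm"))
```

Both formats carry the same records, so either could be loaded into a local DB; XML would work equally well as long as the IDs are preserved.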

 

The idea of shared lists implies a central collaborative area that anyone from any museum could go to and access. Updates to the lists would need to be properly change-controlled if people are relying on this data to be accurate. This means a small commitment of time from people in various museums; is this possible? Hopefully these lists wouldn't need to change or be updated very often, and could even be scheduled for something like a yearly refresh.
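A yearly refresh is easier to review if each release comes with a summary of what changed. This sketch assumes a release is simply a mapping of ID to label; the IDs, labels and release names are hypothetical.

```python
def diff_releases(old, new):
    """Compare two releases of a list, each a dict of ID -> label."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    relabelled = sorted(
        (k, old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]
    )
    return {"added": added, "removed": removed, "relabelled": relabelled}


# Hypothetical releases of a materials list
release_2009 = {"mat:01": "bronze", "mat:02": "copper alloy (unspecified)", "mat:03": "wood (general)"}
release_2010 = {"mat:01": "bronze", "mat:03": "wood", "mat:04": "steel"}

changes = diff_releases(release_2009, release_2010)
print(changes)
```

A change log like this lets people relying on the list see which of their stored IDs were removed or relabelled before they upgrade.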

 

Lists by museum

NMSI term lists (Science Museum, National Railway Museum, National Media Museum, UK)

 

Comments (27)

Ian Ibbotson said

at 9:59 am on Mar 19, 2009

One thing that might be worth bearing in mind is that if we can find a way to transform between whatever representation is selected and ZThes, or one of the other standard thesaurus exchange formats, then there are already tools capable of both (a) improving structured metadata based on the presence of more informal names (temporal names being a particular problem) and (b) searching and querying for alternate names (e.g. "WW2", "Neolithic", "Victorian"). Structured authority lists can be a great way to enrich a collection that might not have the most descriptive metadata. I think MLA are currently interested in the area and related web services. Really interested to contribute to any activity in this area.
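The alternate-name lookup in (b) boils down to mapping non-preferred variants onto one preferred term. This is a minimal sketch of that idea only; it does not use the ZThes format, and the term IDs and variant spellings are hypothetical.

```python
# Hypothetical preferred terms with non-preferred ("use for") variants
TERMS = {
    "period:ww2": {"preferred": "Second World War",
                   "alternates": ["WW2", "WWII", "World War II"]},
    "period:victorian": {"preferred": "Victorian",
                         "alternates": ["Victorian era", "Victorian period"]},
}


def build_index(terms):
    """Map every preferred and alternate name (case-folded) to its term ID."""
    index = {}
    for term_id, t in terms.items():
        for name in [t["preferred"], *t["alternates"]]:
            index[name.casefold()] = term_id
    return index


INDEX = build_index(TERMS)


def resolve(name):
    """Return (term ID, preferred label) for a name, or None if unknown."""
    term_id = INDEX.get(name.strip().casefold())
    return (term_id, TERMS[term_id]["preferred"]) if term_id else None


print(resolve("ww2"))
print(resolve("Neolithic"))  # not in this toy list
```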

Ian Ibbotson said

at 10:02 am on Mar 19, 2009

Ooh, sorry to comment twice. I was also talking to James B from Rattle about ways to use their "Muddy Boots" system for this kind of enrichment. They can use DBpedia structures to make links between text and controlled names. I wonder if wiki structures (with narrative text and associated DBpedia entries) are a good way to compose these authority lists. Since the technology already exists to move between the wiki format and more structured data representations, it might be an interesting approach to crowdsourcing the authority data?

jeremy said

at 10:59 am on Mar 19, 2009

I'm playing with Dapper and Pipes at the moment to convert the BM Materials Thesaurus to structured data/API. There are a bunch of other thesauri on the CT and English Heritage sites that are ripe for similar ad hoc APIfication. I'll let you know how I get on.
On the subject of Dapper, I've also used it to turn a great EH site into an API for geophysical survey results. The site dates back to 1995, and it's lovely to be able to breathe new life into it as an API so that I can hook in our own site summaries etc.

[mRg] said

at 1:49 pm on Mar 19, 2009

Ian: I know Mia has a dialogue going with the OpenCalais guys regarding the possible idea of processing data within text using the Calais engine too. They too have asked about authoritative lists of data they could work with.

Nate Solas said

at 3:45 pm on Mar 19, 2009

If you haven't read it already, the paper from last year's MW08 on the Delphi Toolkit talks about doing some pseudo NLP to extract facets for browsing. I downloaded the toolkit last year and turned it on some of our metadata and had some decent success. Might be useful to help reduce alternate names, as Ian said: "Neolithic", "Victorian", but also can expand some terms.

http://www.archimuse.com/mw2008/papers/schmitz/schmitz.html

Frankie Roberto said

at 4:11 pm on Mar 19, 2009

The problem with Authority Lists is the word 'authority'. :-) It assumes from the outset that someone needs to 'decide' what goes on and off the list. I realise, of course, that it can be a bit more nuanced than this, but there is a distinction between an entity list which is supposed to represent some kind of 'facts' of the world, and entity lists which represent the far more fuzzy notions that people hold in their heads. I.e. are we after something that works well for museums, or works well for users?

If the latter, then emergent lists are a good place to start, such as tagging and tag clusters, Wikipedia/DBpedia pages (which is what we've focused on with Muddy), and natural language parsing of terms that people are actually using.

Even the simplest sounding lists can turn out to be hugely contestable and political. There are at least a dozen different criteria for coming up with a list of 'Countries', for example, and none of them maps perfectly onto the countries that people perceive (not least because different people have different perceptions).

Mia said

at 6:47 pm on Mar 23, 2009

"Even the simplest sounding lists can turn out to be hugely contestable and political." - hugely important, IMO.

And one reason why sharing authority lists (if you regard them as 'a definitive list of terms for x') is possibly really difficult - but sharing 'entity lists' (a list of people, places, periods, etc, you might have information on, even if that information might vary hugely between organisations) is more possible.

And as Frankie says, there's the whole issue of emergent lists...

Frankie Roberto said

at 7:12 pm on Mar 23, 2009

It's worth reviewing existing standards/entity lists before embarking upon creating/sharing your own, too. Here's an example of one of them:

*Countries*
The ISO maintain a list called 3166-1 (http://en.wikipedia.org/wiki/ISO_3166-1), which is available in alpha-2 (two-letter), alpha-3 (three-letter) and numeric versions. The numeric version has the advantage of being slightly more stable (the alpha codes sometimes change if a country changes its name), but it is less readable and less widely implemented. The alpha-2 version is the most common, and thus the most interoperable. Overall, using the list has the advantage of being seen as more neutral, as it is maintained by an international body and has stable criteria for inclusion. It has the disadvantage of not being very reactive to changes (it takes a while for a newly declared country to be assigned a code), plus it doesn't always map well to people's mental models of countries. For instance, 'United States Minor Outlying Islands' is listed as a country, but it is really only officially a country for statistical reasons, and in reality has no collective identity (nor any indigenous population). There are a few other examples of this.

In addition, the list only represents 'current' countries. So if your museum holds ancient Egyptian collections, there is no entry on the list for the state they actually came from.
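Since the three ISO 3166-1 code forms coexist, a consumer of several feeds may need to normalise them to one form. The sketch below contains only a handful of real rows, not the full list, and picks alpha-2 as the target because it is the most widely implemented; that choice is an assumption.

```python
# A handful of ISO 3166-1 rows (alpha-2, alpha-3, numeric); not the full list.
ISO_3166_1 = [
    ("AU", "AUS", "036"),
    ("DE", "DEU", "276"),
    ("EG", "EGY", "818"),
    ("GB", "GBR", "826"),
    ("UM", "UMI", "581"),
    ("US", "USA", "840"),
]

# Index every code form so any of them resolves to alpha-2
_BY_CODE = {}
for a2, a3, num in ISO_3166_1:
    for code in (a2, a3, num):
        _BY_CODE[code] = a2


def to_alpha2(code):
    """Normalise an alpha-2, alpha-3 or numeric code to alpha-2.

    Numeric codes are zero-padded to three digits, so 36 and "036" both match.
    Returns None for codes not in this sample.
    """
    code = str(code).strip().upper()
    if code.isdigit():
        code = code.zfill(3)
    return _BY_CODE.get(code)


print(to_alpha2("gbr"), to_alpha2(826), to_alpha2(36))
```

Note that "EG" here is the modern state only, which is exactly the limitation raised above for historical collections.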

Paul Walk said

at 9:37 pm on Mar 23, 2009

One of the more problematic aspects of authority or 'lookup' lists in this space is going to be the extent to which such lists are static. As Frankie points out - a list of current countries is not especially useful for collections based on ancient civilisations. Actually, this is a lesser problem, because you wouldn't even try to use such a list in this context. A more insidious problem is that a list of 'current' countries will change - just in recent times, we've lost Yugoslavia, East/West Germany and gained a whole bunch of Baltic states.
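One way to cope with lists that change over time is to record when each entry was valid instead of deleting it, so a "current countries" list becomes just a filter on a historical one. This sketch assumes that design, simplifies the dates to whole years, and includes only illustrative entries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Country:
    label: str
    start: int            # first year of existence (simplified)
    end: Optional[int]    # last year, or None if it still exists


# Illustrative entries only; years are simplified to whole years.
COUNTRIES = [
    Country("West Germany", 1949, 1990),
    Country("East Germany", 1949, 1990),
    Country("Germany (reunified)", 1990, None),
]


def valid_in(year):
    """Labels of entries that existed at some point in the given year."""
    return [c.label for c in COUNTRIES
            if c.start <= year and (c.end is None or year <= c.end)]


print(valid_in(1970))
print(valid_in(2009))
```

Transition years return more than one entry (1990 matches all three here), which is arguably more honest than forcing a single answer.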

jeremy said

at 11:47 am on Mar 24, 2009

It's true of course that all sorts of terms are up for negotiation, but I think Frankie's more important question was "are we after something that works well for museums, or works well for users?" Well, obviously the answer is both, but the fact that something "works" for a museum doesn't mean it won't for users. We don't need to use certain technical terms at the front end (though that's a question for each individual project, I guess) but we may still find them useful at the back end, to hook up related resources. Since many museums (hopefully all) are using term lists and thesauri of one sort or another, we may as well try to relate these where possible, taking into account concerns over their variation over time. Data from users (tags, search terms, NLPed Wikipedia entries or whatever) would be as important, perhaps, but the challenge may be to bridge the gap between the formal terms used in our databases and these informal ones.
If we wanted to hook up data sources in this way we'd want museums to report which authorities they used for which fields, in the API response to a collection query (or whatever API uses an authority). Just as above, with informal "term lists", the biggest challenge may be bridging different authorities (the old Semantic Web problem of linking ontologies).
In terms of making a start on this, I've got a little further with my games with the BM Materials Thesaurus. Read a bit here: http://doofercall.blogspot.com/2009/03/playing-with-skos.html. In short, I've made what I'm calling a SKOS/RDF file, though doubtless it's not really, and put a Pipe on the front end of it to query by term, which is good for JSON but not RSS. The SKOS has faults and it's all a bit shaky but enough for me to feel it's worth pursuing with other thesauri.
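For anyone wanting to read a SKOS RDF/XML file like this without an RDF library, here is a minimal sketch using Python's ElementTree. It assumes the file uses `skos:Concept` elements directly under `rdf:RDF` with `skos:prefLabel`, `skos:broader` and `skos:narrower`; RDF/XML can legally be written in other shapes (e.g. `rdf:Description` with `rdf:type`), which this does not handle. The `example.org` URIs are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Tiny placeholder sample in the assumed shape
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/mat/metal">
    <skos:prefLabel>metal</skos:prefLabel>
    <skos:narrower rdf:resource="http://example.org/mat/copper"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/mat/copper">
    <skos:prefLabel>copper</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/mat/metal"/>
  </skos:Concept>
</rdf:RDF>"""


def load_concepts(xml_text):
    """Return {concept URI: {label, broader URIs, narrower URIs}}."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for c in root.findall("skos:Concept", NS):
        concepts[c.get(RDF_ABOUT)] = {
            "label": c.findtext("skos:prefLabel", default="", namespaces=NS),
            "broader": [b.get(RDF_RESOURCE) for b in c.findall("skos:broader", NS)],
            "narrower": [n.get(RDF_RESOURCE) for n in c.findall("skos:narrower", NS)],
        }
    return concepts


concepts = load_concepts(SAMPLE)
print(concepts["http://example.org/mat/copper"])
```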

jeremy said

at 11:47 am on Mar 24, 2009

[pt 2 of a long comment...]
What's not in the blog post is that I got a crude navigable view onto the JSON here: http://www.ottevanger.plus.com/skos/skos_pipe.html. It doesn't work properly in Mozilla (which won't reload the script when you click a link) and I'm a JSON noob so I couldn't get it to go through the array of narrower parameters and turn them into links but hey, it's a start.
Mia, your question about periods at the start: they may be agreed, sort of, but they vary geographically and, as you know from bitter experience, with archaeology you may find things dated differently by different methods, so I guess a period may be dated differently even within one place.
So, whilst taking it as read that we're also interested in informal terms and semi-formal sources like Wikipedia, shall we put together a list of the formal word lists, thesauri and authorities that we know of, and come up with a plan of action for APIfying them? (those that aren't already machine-friendly) And of course a list of those that are already out there as APIs.

Mia said

at 6:19 pm on Mar 27, 2009

There are two issues that we need to work with - lists of terms might be contingent, and the definitions of those terms (e.g. periods) might also be contingent.
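One possible way to model the second issue, contingent definitions, is to let a single period term carry several date ranges, each attributed to a source and region, rather than forcing one agreed span. In this sketch the source names are hypothetical and the Bronze Age ranges are rough, illustrative figures only.

```python
from typing import NamedTuple


class PeriodDefinition(NamedTuple):
    source: str   # who asserts this definition (hypothetical names)
    region: str
    start: int    # negative years are BCE
    end: int


# Illustrative only: the same term defined differently for different regions.
BRONZE_AGE = [
    PeriodDefinition("source A", "Britain", -2500, -800),
    PeriodDefinition("source B", "Near East", -3300, -1200),
]


def covers(definitions, year, region=None):
    """Sources whose definition places `year` inside the period."""
    return [d.source for d in definitions
            if (region is None or d.region == region)
            and d.start <= year <= d.end]


print(covers(BRONZE_AGE, -2000))
print(covers(BRONZE_AGE, -1000))
```

Asking "is this object Bronze Age?" then returns who says so, which keeps the disagreement visible instead of hiding it.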

What's the best way to involve collections people in this work? I'm not sure just pointing people at this will help - so what would? Do we need specific questions or issues, or to translate our aims into documentation/collections friendly vocabulary?

Mia said

at 8:17 pm on Mar 27, 2009

Speak of the devil, this might help: http://museumtwo.blogspot.com/2009/03/guest-post-language-matters.html

"This guest post was written by Koven J. Smith, who makes technology things at the Metropolitan Museum of Art. He writes about the lessons he's learned about how to talk tech with curators, educators, and other museum beings who don't know what "OCR" means (like me)."

Mia said

at 4:36 pm on May 9, 2009

British Museum Materials Thesaurus: http://www.mda.org.uk/bmmat/matintro.htm

jeremy said

at 8:03 pm on May 10, 2009

re BMMT see previous comment. You can get it as SKOS RDF here: http://www.ottevanger.plus.com/xml/BMMT_skos_all.xml and you can query the SKOS via a Pipe to get related terms, though there's more that could be done with this. Here's a page for a basic query of it: http://www.ottevanger.plus.com/skos/skos_pipe.html

jeremy said

at 8:28 pm on May 10, 2009

PS there are errors in the SKOS that have crept in along the various translations via Dapper, XSLT and so on, so don't take it as gospel, but it is a start. The others on the MDA/CT site generally look harder to do, but I learned lessons from this one and I'm sure we can scrape them too. IP issues aside....

Mia said

at 11:59 am on May 15, 2009

Sorry Jeremy, that's what I get for doing a hit and run link dump!

Can you take us back a step and talk us through the end goals of your experiments with skos and the materials thesaurus?

jeremy said

at 1:39 pm on May 15, 2009

no problem :)
this damn thing lost my last attempt to write this. Curses! Anyway, basically I thought I'd like to enhance our collections online stuff with thesauri that aren't in our CollMS - even if they aren't what was used for documentation they could provide useful cues/clues for expanded and narrower searches, used behind the scenes. I can't find much machine-friendly stuff out there so thought I'd APIfy some machine-unfriendly stuff, BMMT being easiest (last night I realised the BM Object thesaurus is the same format so I'll do that one tonight, if poss). I tried Dapper into Pipes but bringing together lots of pages into one pipe is horrible and slow. Better to compile the Dapp results into one file and search that. But it may as well be in a well-known terminology interchange format, right? SKOS was easy enough to have a stab at, and the RDF-XML version was documented somewhere or other so I transformed the Dapp output to that and now the Pipe is much easier. You can YQL it too, of course, which is better!
In the end I envisage being able to, for example, narrow your search parameters or expand materials mentioned on an object information page. I'll let you know if and when I do this. We need lots more like this though - I especially want to see that period/event chronology one come into existence, likewise one that includes lost places (Where Not On Earth ID). One dream I'll have to wait a while for, I think.
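The "expanded search" idea can be sketched as walking a thesaurus's narrower-term links so that a search for a broad term also matches objects catalogued under more specific ones. The hierarchy fragment below is hypothetical, not taken from the BM Materials Thesaurus.

```python
# Hypothetical fragment of a materials hierarchy: term -> narrower terms
NARROWER = {
    "metal": ["copper alloy", "iron"],
    "copper alloy": ["bronze", "brass"],
    "iron": ["cast iron", "wrought iron"],
}


def expand(term, max_depth=None):
    """Return the term plus all narrower terms, breadth-first.

    max_depth limits how many levels down to go (None means no limit).
    A seen-set guards against cycles in imperfect data.
    """
    result, seen = [term], {term}
    frontier = [term]
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = []
        for t in frontier:
            for n in NARROWER.get(t, []):
                if n not in seen:
                    seen.add(n)
                    result.append(n)
                    next_frontier.append(n)
        frontier = next_frontier
        depth += 1
    return result


print(expand("metal"))
print(expand("metal", max_depth=1))
```

The expanded list could then be passed behind the scenes as OR'd search terms; capping the depth is one way to stop very broad terms from flooding the results.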

jeremy said

at 12:42 am on May 16, 2009

OK, first pass at it here: http://www.ottevanger.plus.com/xml/BMOT_skos_all.xml (previous caveats apply: there are errors). Forgot to say, for YQL there's an Open Data Table definition XML file for the materials thesaurus here: http://www.ottevanger.plus.com/skos/skosODT.xml. Here's one for the BMOT too, but I've not tried it yet: http://www.ottevanger.plus.com/skos/skos_BMOT_ODT.xml

Mia said

at 3:32 pm on May 22, 2009

WNOEIDs - I like it! We had a question recently "is 'space' allowed for 'where used' location in Mimsy?".

Paul Rowe said

at 11:20 pm on Feb 11, 2010

The Getty made their thesauri accessible through a web services API last year. It's available on a licence basis: http://www.getty.edu/research/conducting_research/vocabularies/vocab_api_announcement.pdf

I think there's merit in both free-form tags (users can easily add terms that make sense to them) and more formal, structured lists (more consistency in the terms used). Is there a way we can get a broader range of users to help organise and connect both systems? E.g. expose social and traditional authority lists through a website where anyone can mark terms which have a connection. The more users who make a connection between two terms (both within their existing system and between the social and formal realms), the stronger the implied connection. Having a better understanding of the links between various descriptive terms could open up more flexible naming and searching functions.
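The "more users, stronger connection" idea could be as simple as counting distinct users per pair of terms. This is a minimal in-memory sketch under that assumption; the class name, user IDs and terms are hypothetical, and a real site would need storage and some defence against spam.

```python
from collections import defaultdict


class TermLinks:
    """Count distinct users who say two terms are connected."""

    def __init__(self):
        self._votes = defaultdict(set)   # unordered term pair -> user IDs

    @staticmethod
    def _key(a, b):
        return tuple(sorted((a, b)))

    def link(self, user, a, b):
        """Record that a user thinks terms a and b are connected."""
        if a != b:
            self._votes[self._key(a, b)].add(user)

    def strength(self, a, b):
        """Number of distinct users who connected a and b."""
        return len(self._votes.get(self._key(a, b), ()))

    def related(self, term, min_strength=1):
        """Terms linked to `term` as (term, strength), strongest first."""
        out = []
        for (x, y), users in self._votes.items():
            if term in (x, y) and len(users) >= min_strength:
                out.append((y if x == term else x, len(users)))
        return sorted(out, key=lambda p: (-p[1], p[0]))


links = TermLinks()
links.link("u1", "trains", "railway locomotive")   # social tag <-> formal term
links.link("u2", "railway locomotive", "trains")
links.link("u2", "trains", "railway locomotive")   # repeat vote, not counted twice
links.link("u3", "trains", "steam")
print(links.related("trains"))
```

Storing sets of users rather than raw counts means one enthusiastic person can't inflate a connection by clicking repeatedly.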

We're also seeing some ways to allow more structured data within previously free-form tag/term fields, such as Flickr's triple tags (http://www.flickr.com/groups/api/discuss/72157594497877875/) and DigitalNZ's plans for user-enhanced metadata using a namespace:field convention similar to Flickr's triple tagging.

Mia said

at 4:10 pm on Feb 14, 2010

Aaron's running a workshop on machine tags at MW2010 (http://www.archimuse.com/mw2010/abstracts/prg_335002366.html) so it might come up there.

We've started using machine tags to eventually link objects mentioned in blog posts back to our collections online pages e.g. http://www.flickr.com/photos/sciencemuseum/tags/num%3AScienceMuseum%3D19811886/ and more generally http://www.flickr.com/search/?q=num%3AScienceMuseum%3D*&w=all
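A service that links Flickr images back to collection pages would first need to pick the object number out of tags like the one in the URLs above (`num:ScienceMuseum=19811886`, URL-encoded there). Here is a sketch of that parsing step only; it assumes namespaces and predicates consist of letters, digits and underscores, and it does not call the Flickr API.

```python
import re
from urllib.parse import unquote

# namespace:predicate=value, with an assumed character set for the first two parts
MACHINE_TAG = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):([A-Za-z_][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(tag):
    """Split a (possibly URL-encoded) machine tag into its three parts."""
    m = MACHINE_TAG.match(unquote(tag).strip())
    if not m:
        return None
    namespace, predicate, value = m.groups()
    return {"namespace": namespace, "predicate": predicate, "value": value}


def object_numbers(tags, namespace="num", predicate="ScienceMuseum"):
    """Collect values of machine tags matching the given namespace/predicate."""
    numbers = []
    for tag in tags:
        parsed = parse_machine_tag(tag)
        if parsed and parsed["namespace"] == namespace and parsed["predicate"] == predicate:
            numbers.append(parsed["value"])
    return numbers


print(parse_machine_tag("num%3AScienceMuseum%3D19811886"))
print(object_numbers(["steam", "num:ScienceMuseum=19811886"]))
```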

I haven't had time to build a service that would query the Flickr API for the images so they can be displayed on our collections pages, but that's the general idea. I haven't thought of doing the same for user images - I guess there might be lots of some of the more glamorous objects on display, though we might also find people tagging up less obvious objects if we explain what the tags are and how we'd use them.

I'm really interested in your suggestion about a service/website where people could connect terms.

And can you tell us more about DigitalNZ's use of machine tags for user-enhanced metadata?

Mia said

at 4:44 pm on Feb 14, 2010

I wish I knew more about SKOS because after a bit of a hunt it looks like there's some useful work on 'vocabulary alignment' e.g. "Combining Vocabulary Alignment Techniques" by Anna Tordai, Jacco Van, Guus Schreiber http://www.cs.vu.nl/~guus/papers/Tordai09a.pdf and "Integrated access to cultural heritage resources through representation and alignment of controlled vocabularies" http://www.emeraldinsight.com/Insight/viewContainer.do;containerType=Issue&containerId=6012769
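For a feel of what the simplest alignment step looks like, here is a naive lexical baseline: pair concepts from two vocabularies whose labels match after normalisation. This is not the method from either paper cited above, just a starting point whose output would need human review; the vocabularies and IDs are hypothetical.

```python
import unicodedata


def normalise(label):
    """Case-fold, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def align_exact(vocab_a, vocab_b):
    """Pair concept IDs whose normalised labels are identical.

    vocab_a, vocab_b: dict of concept ID -> list of labels (preferred + alternate).
    Returns sorted (id_a, id_b) candidate pairs for human review.
    """
    index = {}
    for cid, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(cid)
    pairs = set()
    for cid, labels in vocab_a.items():
        for label in labels:
            for match in index.get(normalise(label), ()):
                pairs.add((cid, match))
    return sorted(pairs)


vocab_a = {"a:1": ["Bronze"], "a:2": ["Papier-mâché"], "a:3": ["Ivory"]}
vocab_b = {"b:7": ["bronze"], "b:8": ["papier-mache"], "b:9": ["bone"]}
print(align_exact(vocab_a, vocab_b))
```

Exact label matching misses synonyms and can wrongly pair homonyms, which is why the literature combines it with structural and instance-based techniques.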

Paul Rowe said

at 10:15 pm on Feb 14, 2010

DigitalNZ outlined its plans for what it's calling User Enhanced Metadata at a workshop before the National Digital Forum conference (Nov 2009).

I couldn't find anything published on the web about this yet, but they gave the example of the education community in NZ tagging material with age ratings within their own tag namespace. Educators could then pull out material from DigitalNZ's repository that is aimed at a particular age group using tags that have been more consistently applied. My understanding was that some namespaces could only be updated (and perhaps only viewed) by certain users. This gave other projects a ready-to-use architecture for extending the DigitalNZ data, rather than by creating yet more separate copies of the records.
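As described, this sounds like namespaced tags where only certain users may write to a given namespace. The sketch below is a hypothetical illustration of that design, not DigitalNZ's actual implementation; the namespace, field, record IDs and users are invented for the example.

```python
from collections import defaultdict


class NamespacedTags:
    """Tags of the form namespace:field=value, with per-namespace writers."""

    def __init__(self, writers):
        # writers: namespace -> set of user IDs allowed to write to it
        self._writers = writers
        self._tags = defaultdict(set)  # record ID -> {(namespace, field, value)}

    def add(self, user, record_id, namespace, field, value):
        if user not in self._writers.get(namespace, set()):
            raise PermissionError(f"{user} cannot write to namespace {namespace!r}")
        self._tags[record_id].add((namespace, field, value))

    def find(self, namespace, field, value):
        """Record IDs carrying exactly this namespaced tag."""
        return sorted(r for r, tags in self._tags.items()
                      if (namespace, field, value) in tags)


store = NamespacedTags({"edu": {"teacher1"}})
store.add("teacher1", "rec-1", "edu", "age", "8-10")
try:
    store.add("someone", "rec-2", "edu", "age", "8-10")
except PermissionError as err:
    print(err)
print(store.find("edu", "age", "8-10"))
```

The records themselves stay in one place; each community only adds its own layer of tags on top.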

Mia said

at 10:37 pm on May 16, 2011

There's a useful list from Europeana at http://europeanalabs.eu/wiki/WP12Vocabularies

M. Schwendener said

at 11:13 pm on Jan 13, 2016

Not sure if this is any help, but there's http://bartoc.org : "A multilingual, interdisciplinary Terminology Registry for Knowledge Organization Systems"

Mia said

at 11:27 pm on Jan 13, 2016

Thanks for the suggestion!
