
LOD-LAM crowdsourcing, annotations and machine-learning

Page history last edited by Mia 10 years ago

Dave Lester, David Henry, Mia Ridge, Jane Mandelbaum, Romain Wenz, Asa Tourneau, SungHyuk Kim, Ingrid Mason, John Deck, William Gunn, Jane Hunter.


Question: machine or user annotations? Both, for different people.  Implicit and explicit annotations.


Using combinations of methods e.g. expert and non-expert tagging to train machine learning.

Trying to get data to the point where it's usable in programmatic ways - queryable, visualisations...


Issues around interpretation and the illusion of authority and changes over time.


['When you average the colour of a film you get brown' - summary of problems with data aggregation - mushes all that's special about a dataset?]


Impact of changes in academic practice and theory over time - what is the impact of paradigm shifts on already created data?


Can go from crowdsourced tags to an ontology if you have a large enough sample size.


Role of experts... difference between content creation and content validation. e.g. different editor roles in Wikipedia. But in Wikipedia, who made the edit is part of the interpretation of the correctness of the edit.  Fluency in the medium counts - people who are experts but don't know how to look trustworthy on Wikipedia.  Look to existing methods for verifying trustworthiness.


Implicit annotations - similar to recommender systems?  Who's reading what, sharing what?


ORE - bundling resources together - possible model?  Or FRBR?


Annotations - Mendeley looking at the highlighting and notes that people make on the desktop client?  Issues around who's looking at what and taking notes as potential clue to what individuals are working on - issues and fears around 'need' for secrecy about current research.  But fear can be mitigated by the benefits, what you get out of participating in the service. But maybe people would also want credit for making connections between different parts of (e.g. library) collections.


Showing recommendations as heat maps (aggregated visualisations) vs lists (named individuals and resources) - a way to deal with the tension between wanting anonymity for fear of idea stealing and wanting credit.


Providing digitisation tools like mounted cameras as bait with reward of getting digitised or catalogued content.


What about scholars who attempt to deliberately mislead? e.g. hide a certain resource to stop others using it (like undergrads hiding books they're using in assignments) ... determining reliability of collaborator - weighted signals.


Tentative principle for crowdsourcing: be generous in what you accept and strict in what you keep [at least in what you display].
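
A minimal sketch of how this principle might look in code, assuming a simple in-memory list of contributions. The function names, the record fields and the "at least two distinct contributors" threshold are illustrative assumptions, not something agreed in the session.

```python
from collections import Counter


def accept(store, item_id, contributor, tag):
    """Generous intake: normalise lightly and keep every contribution."""
    store.append({"item": item_id, "by": contributor, "tag": tag.strip().lower()})


def tags_to_display(store, item_id, min_contributors=2):
    """Strict display: show a tag only when enough distinct people agree on it.

    The threshold is a hypothetical stand-in for whatever quality signal
    a real service would use.
    """
    # Count each (contributor, tag) pair once so repeat submissions do not inflate a tag.
    seen = {(c["by"], c["tag"]) for c in store if c["item"] == item_id}
    counts = Counter(tag for _, tag in seen)
    return sorted(tag for tag, n in counts.items() if n >= min_contributors)
```

Nothing is thrown away at intake, so the display rule can be changed later without losing contributed data.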


Annotating tools also for use by internal specialists.


Record retention for provenance URIs for contributed content?  Questions about whether minting a URI for every statement is sustainable, scalable... Fit for purpose.

Comments (7)

Nick Poole said

at 12:02 am on Jun 3, 2011

Hi Mia, thanks for these notes, which give a really good flavour of the direction of the conversations today. I can't be at the event, but I wanted to post a question and ask participants for their views. The question really hinges on the role of Collections Management Systems in integrating Linked Open Data into the workflows of people who work with Collections.

It has been my view for a while that demonstrating the value of Linked Open Data depends on critical mass, and that achieving critical mass depends both on making semantic interoperability an 'out-of-the-box' function of Management Systems (in the way that I know MODES has already been pursuing for some time) and on delivering value back to the people (crowd and expert) who will need to draw the connections and do the tagging.

Apart from the twitch motivation of gaming, the long-term motivating factor is likely to be the extent to which LOD makes someone's job easier, rather than harder. To me, this means having cataloguing systems that use LOD to create dynamic and emergent references which support the cataloguer in their work. Imagine, for example, a visual interface which draws references out of LOD-space to enrich a cataloguer's description of an industrial artefact.

Experience shows, though, that the sector is relatively slow to demand even simple interoperability such as OAI targets from their Collections Management Systems (even though most vendors offer them). Can the people participating in the LOD-LAM event think of a business case or value proposition which would encourage their less LOD-aware peers to demand LOD-capabilities from their management systems?

Mia said

at 1:36 am on Jun 3, 2011

A quick note - publishing and consuming linked open data are two different things, and I don't think they have to be tied together for one or the other to happen. You might need a critical mass of usable published LOD to start to build interfaces that consume it for collections documentation tasks...

Richard Light said

at 12:47 pm on Jun 3, 2011

Publishing and consuming should definitely be separate, and I think we need to break away from the pattern where museum dev staff create a unique API, use it themselves, and claim that they have thereby helped other people.

Separately-published Linked Data resources will only be useful if they are identically "visible" (in terms of collections-level metadata and licensing terms) and "structured" (in terms of the RDF patterns they contain), so that spidering them gives an aggregation which is greater than the sum of its parts. Ideally this homogeneity should apply across all of ALM, not just e.g. museums.

At the recent UK Discovery launch I was concerned that the projects being proposed were like Disneyland attractions - but with no thought being given to the non-existence of the infrastructure on which these castles in the air would be built.

Nick Poole said

at 5:30 pm on Jun 3, 2011

Hi Richard, many thanks for this. To be fair, the Discovery event was more of a game to get people talking across sectors than an actual scoping exercise. I thought it was very revealing in some interesting ways, but the main point was the social interaction. I was interested to see your response and Mia's about the difference between consuming and publishing LOD. I am not sure they *can* be entirely disentangled - or at least, if we're going to expect people to create LOD, then we have to give them something in return, otherwise they won't do it.

Like Richard, I have an aversion to techno-utopias. I have personally been through the 'hype, expectation, failure' cycle with too many technologies in the past 10 years and I haven't yet heard much from the LOD world about how it will solve the human psychological/motivational, business case and inertia issues which have confronted variously Z39.50, OAI, RSS and many, many others! In the case of LOD, the 'castle built on air' is the issue of connecting the technological potential with the real immediate institutional and personal behaviours which will define its utility. Persuading businesses to go 'Cloud' has meant demonstrating how the Cloud delivers to their bottom-line. Similarly, for LOD to become a core product (or to fulfil a use case) for cultural institutions, surely we have to show how it will help us deliver cultural value with less money?

Richard Light said

at 8:24 pm on Jun 3, 2011

One obvious way in which institutions can get value from using LOD internally is to have access, for free, to all the additional information which is lurking at the far end of the dereferenceable URL which they have just implanted in their catalogue record. For example, a Geonames URL will give you lat/long, population, etc. for the place in question.

However, this assumes (a) that the publisher of said URL provides a reliable enough service for dereferencing it (or that local caching removes the need for real-time lookup); (b) that the institution has an information-processing food chain which can digest the Linked Data it gets back from such lookups; and (c) that it has some use for said information, either for publishing or for internal added-value. As regards the digestion issue, I have been experimenting with good old XSLT and its document() function. All I needed to make this work was a proxy service which takes an HTTP request and adds the "Accept: application/rdf+xml" header to it. With this I can dereference Linked Data URLs and get back an RDF/XML document to do with as I wish, as part of my XSLT transform. The returned RDF/XML can (will) include further URLs, so I can explore the Linked Data, all within my XSLT transform.

However, I suspect that many web publishing frameworks will find it hard/impossible to digest Linked Data in this way - PHP for example has enough problems with XML in any form ...
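
For readers outside XSLT, the dereferencing step described in this comment (request a URL with the "Accept: application/rdf+xml" header, then pick out the further URLs in the RDF/XML that comes back) could be sketched in Python roughly as follows. This is an illustration, not the proxy or transform described above: the function names are hypothetical, it uses only the standard library, and it assumes the server honours the Accept header and returns well-formed RDF/XML.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_request(url):
    """Build a request that asks for RDF/XML instead of an HTML page."""
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


def fetch_rdf_xml(url, timeout=10):
    """Dereference a Linked Data URL and return the raw response bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


def linked_urls(rdf_xml):
    """List the http(s) URLs found in rdf:about and rdf:resource attributes."""
    root = ET.fromstring(rdf_xml)
    found = []
    for element in root.iter():
        for attr in ("about", "resource"):
            value = element.get("{%s}%s" % (RDF_NS, attr))
            if value and value.startswith(("http://", "https://")) and value not in found:
                found.append(value)
    return found
```

Calling linked_urls(fetch_rdf_xml(url)) on a dereferenceable URL would give the next set of URLs to explore. Caching the fetched documents locally would address point (a) above.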

Mia said

at 6:41 pm on Jun 4, 2011

Rewarding people for effort is a basic design pattern, isn't it? Even if museums or GLAM organisations more widely and tech evangelists aren't used to applying it.

And yes, demonstrating the benefit of publishing structured data is vital but again that requires a critical mass of published data which requires people to be convinced it's worth publishing it - I think there are various 'chicken or the egg' discussions elsewhere on the wiki.

To return to the question, 'Can the people participating in the LOD-LAM event think of a business case or value proposition which would encourage their less LOD-aware peers to demand LOD-capabilities from their management systems?' - I'm not sure that's the right question.

Some vendors are great but most systems can't even publish a stable URI for an object - IMO we'd get more benefit working at the aggregated level than asking individual orgs to negotiate something they don't understand with their CMS vendor.

There was discussion elsewhere at LOD-LAM about building LOD publishing layers as a stack on top of an org's collection management system, but for the UK and Europe we might have skipped that requirement if services like the Culture Grid and Europeana can publish LOD on behalf of contributing organisations. If only we knew someone who worked with the Culture Grid...

It's then easier for people to build compelling services for internal use and external audiences based on that published LOD-LAM when they only have to deal with the idiosyncrasies of one implementation (or a small set of them offering access to all the data they'd want).

Mia said

at 7:09 pm on Jun 4, 2011

One issue in all this is that while the collecting institutions pay the collections management vendors, the real clients for publication services like OAI-PMH are the aggregators... Is that conversation joined-up enough?
