LOD-LAM: crowdsourcing, annotations and machine learning


Dave Lester, David Henry, Mia Ridge, Jane Mandelbaum, Romain Wenz, Asa Tourneau, SungHyuk Kim, Ingrid Mason, John Deck, William Gunn, Jane Hunter.

 

Question: machine-generated or user-created annotations? Both, for different people. Covers implicit and explicit annotations.

 

Using combinations of methods, e.g. expert and non-expert tagging, to train machine-learning models.
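
A rough sketch of the idea, assuming scikit-learn and entirely invented item descriptions, tags and weights: expert-assigned tags simply get a higher sample weight than crowd tags when training a simple classifier.

```python
# Sketch with invented data: expert tags weigh more than crowd tags
# when training a text classifier over item descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training records: (item description, tag, source of the tag).
records = [
    ("portrait of a woman, oil on canvas", "painting", "expert"),
    ("oil painting of a lady", "painting", "crowd"),
    ("bronze figure of a horse", "sculpture", "crowd"),
    ("cast bronze equestrian statue", "sculpture", "expert"),
]

texts = [desc for desc, _, _ in records]
labels = [tag for _, tag, _ in records]
# Illustrative weighting: expert-assigned tags count three times as much.
weights = [3.0 if source == "expert" else 1.0 for _, _, source in records]

vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(texts)
model = LogisticRegression().fit(X, labels, sample_weight=weights)

print(model.predict(vectoriser.transform(["small oil portrait on canvas"])))
```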

Trying to get data to the point where it's usable programmatically - queryable, visualisable...
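
A small illustration of what "queryable" could mean in practice, using invented triples in an example.org namespace: once the data is RDF it can be queried with SPARQL, here via rdflib.

```python
# Invented triples: once exposed as RDF, the data can be queried
# programmatically, e.g. with SPARQL via rdflib.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.item42, EX.depicts, EX.windmill))
g.add((EX.item7, EX.depicts, EX.lighthouse))

results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?item WHERE { ?item ex:depicts ex:windmill }
""")
for row in results:
    print(row.item)   # -> http://example.org/item42
```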

 

Issues around interpretation, the illusion of authority, and changes over time.

 

['When you average the colour of a film you get brown' - a summary of the problems with data aggregation: does it mush together everything that's special about a dataset?]

 

Impact of changes in academic practice and theory over time - what is the effect of paradigm shifts on already-created data?

 

Can go from crowdsourced tags to an ontology if the sample size is large enough.
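
One very rough sketch of how that might start, using invented per-item tag sets: tags that frequently co-occur on the same items can be surfaced as candidates for shared concepts in a draft ontology.

```python
# Sketch with invented tag sets: group crowdsourced tags that frequently
# co-occur on the same items as candidate concepts for a draft ontology.
from collections import Counter
from itertools import combinations

# Hypothetical per-item tag sets gathered from the crowd.
item_tags = [
    {"ship", "harbour", "sail"},
    {"ship", "sail", "sea"},
    {"portrait", "woman", "oil"},
    {"portrait", "oil", "canvas"},
]

cooccur = Counter()
for tags in item_tags:
    for a, b in combinations(sorted(tags), 2):
        cooccur[(a, b)] += 1

# Pairs seen more than once are candidates for a shared parent concept.
candidates = [pair for pair, n in cooccur.items() if n > 1]
print(candidates)   # e.g. [('sail', 'ship'), ('oil', 'portrait')]
```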

 

Role of experts... the difference between content creation and content validation, e.g. the different editor roles in Wikipedia. But on Wikipedia, who made the edit is part of how the correctness of the edit is interpreted. Fluency in the medium counts - there are people who are experts but don't know how to look trustworthy on Wikipedia. Look to existing methods for verifying trustworthiness.

 

Implicit annotations - similar to recommender systems?  Who's reading what, sharing what?
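
A toy sketch of the recommender analogy, with invented reading histories: items that co-occur in other readers' histories become recommendations, weighted by how much those readers overlap with you.

```python
# Treat "who read what" as an implicit annotation and recommend items
# that co-occur in other readers' histories (invented data).
from collections import Counter

# Hypothetical reading histories: user -> set of item identifiers.
histories = {
    "u1": {"item:A", "item:B", "item:C"},
    "u2": {"item:A", "item:C"},
    "u3": {"item:B", "item:D"},
}

def recommend(user, histories):
    seen = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(seen & items)          # how similar the other reader is
        for item in items - seen:
            scores[item] += overlap          # weight by shared reading
    return [item for item, score in scores.most_common() if score > 0]

print(recommend("u2", histories))   # -> ['item:B']
```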

 

ORE (OAI Object Reuse and Exchange) - bundling resources together into aggregations - a possible model? Or FRBR?
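
A hedged sketch, with hypothetical example.org URIs, of what the ORE approach could look like for a bundle of resources: one Aggregation URI that ore:aggregates its parts. (FRBR would instead express the grouping as work/expression/manifestation relationships.)

```python
# Hypothetical URIs: an OAI-ORE Aggregation bundling several resources
# under one URI, built with rdflib.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
agg = URIRef("http://example.org/aggregations/letters-1905")
g.add((agg, RDF.type, ORE.Aggregation))
g.add((agg, DCTERMS.title, Literal("Letters from 1905 (example)")))
for part in ("page1.jpg", "page2.jpg", "transcript.txt"):
    g.add((agg, ORE.aggregates, URIRef("http://example.org/items/" + part)))

print(g.serialize(format="turtle"))
```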

 

Annotations - is Mendeley looking at the highlighting and notes that people make in the desktop client? Issues around who's looking at what and taking notes as a potential clue to what individuals are working on - issues and fears around the 'need' for secrecy about current research. But fear can be mitigated by the benefits - what you get out of participating in the service. But maybe people would also want credit for making connections between different parts of (e.g. library) collections.

 

Showing recommendations as heat maps (aggregated visualisations) vs lists (named individuals and resources) - a way to deal with the tension between wanting anonymity (fear of idea stealing) and wanting credit.

 

Providing digitisation tools such as mounted cameras as bait, with the reward of getting the digitised or catalogued content.

 

What about scholars who deliberately attempt to mislead? E.g. hiding a certain resource to stop others using it (like undergrads hiding the books they're using for assignments)... determining the reliability of a collaborator via weighted signals.
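
A very simple sketch of "weighted signals", with illustrative (not validated) weights and signal names, for scoring a collaborator's reliability:

```python
# Illustrative weights and signal names for a collaborator reliability score.
signal_weights = {
    "agrees_with_expert": 0.5,
    "edits_accepted": 0.3,
    "flags_upheld_against": -0.4,   # penalise contributions later removed
    "account_age_years": 0.1,
}

def reliability(signals):
    """Weighted sum of the available signals, clamped to [0, 1]."""
    score = sum(signal_weights.get(name, 0.0) * value
                for name, value in signals.items())
    return max(0.0, min(1.0, score))

print(reliability({"agrees_with_expert": 0.9, "edits_accepted": 0.7,
                   "flags_upheld_against": 0.1, "account_age_years": 2}))
# -> 0.82
```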

 

Tentative principle for crowdsourcing: be generous in what you accept and strict in what you keep [at least in what you display].
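
A minimal sketch of that principle, with an assumed agreement threshold: every contribution is recorded, but only tags confirmed by several independent contributors are displayed.

```python
# Be generous in what you accept, strict in what you display (assumed threshold).
from collections import defaultdict

raw_store = defaultdict(set)          # (item, tag) -> set of contributor ids

def accept(item, tag, contributor):
    """Be generous: record every contribution, with its contributor."""
    raw_store[(item, tag)].add(contributor)

def displayed_tags(item, min_agreement=3):
    """Be strict: only surface tags confirmed by several contributors."""
    return [tag for (i, tag), who in raw_store.items()
            if i == item and len(who) >= min_agreement]

for who in ("u1", "u2", "u3"):
    accept("item:42", "windmill", who)
accept("item:42", "lighthouse", "u4")

print(displayed_tags("item:42"))      # -> ['windmill']
```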

 

Annotation tools should also be usable by internal specialists.

 

Record retention and provenance URIs for contributed content? Questions about whether minting a URI for every statement is sustainable, scalable and fit for purpose.
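
One alternative to a URI per statement, sketched with rdflib and hypothetical URIs: put each contribution's statements in a named graph and attach provenance (creator, date) once, to that single graph URI.

```python
# Hypothetical URIs: provenance per contribution (named graph) rather than
# per statement, using rdflib's Dataset.
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

EX = Namespace("http://example.org/")

ds = Dataset()
contribution = URIRef("http://example.org/contributions/2011-06-02/abc")

# The contributed statements live inside the named graph...
g = ds.graph(contribution)
g.add((EX.item42, EX.depicts, EX.windmill))
g.add((EX.item42, EX.location, Literal("Kinderdijk")))

# ...and the provenance is asserted once, about the graph itself.
ds.add((contribution, DCTERMS.creator, Literal("contributor:u1")))
ds.add((contribution, DCTERMS.created, Literal("2011-06-02")))

print(ds.serialize(format="trig"))
```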