LOD-LAM Messy data and same-as

Saved by Mia
on June 2, 2011 at 7:59:51 pm

Attendees: Mia Ridge, Shawn, David Henry, Rob Warren, John Deck, Eric Kansa, Asa Letourneau, Lisa Dawn Colvin, Gerry Parsons


Intros round: issues around data that's beyond messy - some data is unknown, and there's lots of ambiguity, lots of gaps in 'authoritative' data, and lots of incompleteness. Also: what are the semantics of 'same-as', and how do you separate facts from policies?


First question - how clean does data have to be before you move forward? If it's the best information you have, can you still publish it?


Learning to live with messiness: making it ok for things to be messy, because otherwise only a tiny percentage of data will ever be released.


What do you do when data doesn't fit the model? There's a gap between what currently fits into the schema and what's too messy to fit.


Finding ways to manage ambiguity - ways to store information as it's resolved.


Finding the right point to assert certainty in data, and how to mark things as unknown/messy.
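One way to picture marking things as unknown/messy is to carry a certainty flag alongside each value, so that gaps are published as explicit unknowns rather than dropped. This is only a minimal sketch, not something agreed in the session: the names Certainty, Statement and to_publishable, the three certainty levels, and the sample record are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Certainty(Enum):
    # Hypothetical levels; a real project would choose its own vocabulary.
    ASSERTED = "asserted"    # the publisher stands behind this value
    UNCERTAIN = "uncertain"  # best available information, may be wrong
    UNKNOWN = "unknown"      # no value is known at all


@dataclass(frozen=True)
class Statement:
    subject: str            # identifier of the thing described
    prop: str               # which property the statement is about
    value: Optional[str]    # None when the value is unknown
    certainty: Certainty
    note: str = ""          # free-text explanation of the messiness


def to_publishable(statements: List[Statement]) -> List[dict]:
    """Serialise every statement, keeping unknowns instead of hiding them."""
    return [
        {
            "subject": s.subject,
            "prop": s.prop,
            "value": s.value,
            "certainty": s.certainty.value,
            "note": s.note,
        }
        for s in statements
    ]


# Made-up sample data for illustration only.
records = [
    Statement("photo-17", "date_taken", "1902", Certainty.UNCERTAIN, "circa; from caption"),
    Statement("photo-17", "photographer", None, Certainty.UNKNOWN),
]
published = to_publishable(records)
```

In this sketch the uncertain date and the unknown photographer both survive into the published output, each labelled with how far it can be trusted.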


Feeding corrections back - have to think through the lifecycle of data: not just putting it out there, but being able to ingest any improvements back in, otherwise opportunities are lost (again, a theme from the crowdsourcing session). [Action point - recommendations or models for different LAM domains?] Which leads to managing the provenance of data or corrections (again)...
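As a rough illustration of ingesting corrections while keeping their provenance, the sketch below appends each correction to a history list before overwriting the value, so the earlier value and the source of the change stay recoverable. The Record and Correction classes, their fields, and the sample values are assumptions made up for this example, not a model proposed in the session.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Correction:
    field_name: str   # which field is being corrected
    new_value: str
    source: str       # who or what supplied the correction
    received: str     # date received, as an ISO 8601 string


@dataclass
class Record:
    identifier: str
    values: Dict[str, Optional[str]]
    history: List[dict] = field(default_factory=list)

    def apply(self, correction: Correction) -> None:
        """Log the old value and the correction's provenance, then update."""
        self.history.append(
            {
                "field": correction.field_name,
                "old": self.values.get(correction.field_name),
                "new": correction.new_value,
                "source": correction.source,
                "received": correction.received,
            }
        )
        self.values[correction.field_name] = correction.new_value


# Made-up sample data for illustration only.
record = Record("obj-1", {"maker": "unknown"})
record.apply(Correction("maker", "J. Smith", "crowdsourced comment", "2011-06-02"))
```

Because nothing is discarded, a later reviewer could still see that the maker was originally recorded as unknown and where the replacement value came from.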


Using something like Freebase as a way of linking information in different repositories...
