LOD-LAM Messy data and same-as

Saved by Mia
on June 2, 2011 at 7:58:35 pm
 

Attendees: Mia Ridge, Shawn, David Henry, Rob Warren, John Deck, Eric Kansa, Asa Letourneau, Lisa Dawn Colvin, Gerry Parsons

 

Intros round: issues around data that's beyond messy - some data is unknown, and there's lots of ambiguity, lots of incompleteness and lots of gaps in 'authoritative' data. Also: what are the semantics of 'same-as', and how do you separate facts from policies?

 

First question - how clean does data have to be before you move forward? If it's the best information you have, can you still publish it?

 

Learning to live with messiness: making it ok for things to be messy, because otherwise only a tiny percentage of data will ever be released.

 

What do you do when data doesn't fit the model - there's a gap between what currently fits into the schema and what's too messy to fit.

 

Finding ways to manage ambiguity - ways to store information as it's resolved.

 

Finding right point to assert certainty in data and how to mark things as unknown/messy.
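
One possible way to picture the last two points (storing information as it's resolved, and marking things as unknown/messy) is a field that keeps its candidate values alongside any value that has been asserted. This is only a sketch under assumptions: the names `MessyField` and `Certainty` and the example place names are made up for illustration and did not come from the session.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Certainty(Enum):
    UNKNOWN = "unknown"      # no usable value recorded
    AMBIGUOUS = "ambiguous"  # candidate values exist, none confirmed
    ASSERTED = "asserted"    # someone has committed to one value


@dataclass
class MessyField:
    """Hypothetical container for a value that may still be unresolved."""
    name: str
    candidates: List[str] = field(default_factory=list)
    value: Optional[str] = None

    @property
    def certainty(self) -> Certainty:
        if self.value is not None:
            return Certainty.ASSERTED
        if self.candidates:
            return Certainty.AMBIGUOUS
        return Certainty.UNKNOWN

    def resolve(self, chosen: str) -> None:
        """Assert one value, keeping the candidates so the earlier ambiguity stays visible."""
        if chosen not in self.candidates:
            self.candidates.append(chosen)
        self.value = chosen


# Illustrative data only: two places that share a name.
birth_place = MessyField("birth_place",
                         candidates=["Perth, Scotland", "Perth, Australia"])
status_before = birth_place.certainty
birth_place.resolve("Perth, Australia")
status_after = birth_place.certainty
```

The record can be published while `status_before` is still ambiguous, and the candidate list is kept after resolution rather than overwritten.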

 

Feeding corrections back - have to think through the lifecycle of data: not just putting it out there, but being able to ingest any improvements back in, otherwise opportunities are lost (again, a theme from the crowdsourcing session). [Action point - recommendations or models for different LAM domains?] Which leads to managing provenance of data or corrections (again)...
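
As a rough illustration of ingesting corrections while keeping their provenance, the sketch below logs the old value, the new value and the source each time a correction is applied. The classes `Correction` and `Record`, the field names and the sample data are all hypothetical; the session did not settle on any particular model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Correction:
    field_name: str
    new_value: str
    source: str      # who or what suggested the change
    received: date


@dataclass
class Record:
    identifier: str
    values: Dict[str, Optional[str]]
    history: List[dict] = field(default_factory=list)

    def ingest(self, c: Correction) -> None:
        """Apply a correction and log what it replaced, and where it came from."""
        self.history.append({
            "field": c.field_name,
            "old": self.values.get(c.field_name),
            "new": c.new_value,
            "source": c.source,
            "received": c.received.isoformat(),
        })
        self.values[c.field_name] = c.new_value


# Illustrative data only.
record = Record("obj-1", {"maker": "unknown"})
record.ingest(Correction("maker", "J. Smith", "crowdsourced comment", date(2011, 6, 2)))
```

Because the history is append-only, an improvement that later turns out to be wrong can be traced back to its source and reverted.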
