Attendees: Mia Ridge, Shawn, David Henry, Rob Warren, John Deck, Eric Kansa, Asa Letourneau, Lisa Dawn Colvin, Gerry Parsons
Intros round: issues with data that's beyond messy - some data is unknown, lots of ambiguity, lots of gaps in 'authoritative' data, lots of incompleteness, unclear semantics of 'same-as', and separating facts from policies.
First question - how clean does data have to be before you move forward? If it's the best information you have, can you still publish it?
Learning to live with messiness: making it OK for things to be messy, because otherwise only a tiny percentage of data will ever be released.
What do you do when data doesn't fit the model - there's a gap between what currently fits into the schema and what's too messy to fit.
Finding ways to manage ambiguity - ways to store information as it's resolved.
Finding the right point to assert certainty in data, and how to mark things as unknown/messy.
Feeding corrections back - have to think through the lifecycle of data: not just putting it out there but being able to ingest any improvements back in, otherwise opportunities are lost (again, a theme from the crowdsourcing session). [Action point - recommendations or models for different LAM domains?] Which leads to managing the provenance of data or corrections (again)...