Attendees: Mia Ridge, Shawn, David Henry, Rob Warren, John Deck, Eric Kansa, Asa Letourneau, Lisa Dawn Colvin, Gerry Parsons
Intros round: issues around data that's beyond messy - some data is unknown, there's lots of ambiguity, gaps and incompleteness in 'authoritative' data, questions about the semantics of 'same-as', and the need to separate facts from policies.
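On the 'same-as' question, a rough sketch of how the strength of a link might be expressed in RDF - the URIs and the rdflib framing are illustrative, not anything proposed in the session:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL, SKOS

g = Graph()

# Two descriptions that may refer to the same person; both URIs are
# invented for illustration.
museum_rec = URIRef("http://example.org/museum/person/42")
archive_rec = URIRef("http://example.org/archive/agent/42")

# owl:sameAs asserts strict logical identity - every statement about
# one resource is then true of the other. Risky with messy data.
g.add((museum_rec, OWL.sameAs, archive_rec))

# skos:closeMatch is a weaker claim: "probably the same", without
# merging the two descriptions. Often the more honest option.
g.add((museum_rec, SKOS.closeMatch, archive_rec))

print(g.serialize(format="turtle"))
```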
First question - how clean does data have to be before you move forward? If it's the best information you have, can you still publish it?
Learning to live with messiness: making it ok for things to be messy, because otherwise only a tiny percentage of data will ever be released.
What do you do when data doesn't fit the model - there's a gap between what currently fits into the schema and what's too messy.
Finding ways to manage ambiguity - and to store information as it's resolved.
Finding the right point to assert certainty in data, and how to mark things as unknown/messy.
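One way this might look in practice - a minimal sketch in which 'unknown' is an explicit recorded state rather than a missing value (the Certainty scale and Assertion shape are hypothetical, not from the session):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Certainty(Enum):
    """Hypothetical certainty scale for a catalogue assertion."""
    KNOWN = "known"        # verified against a source
    INFERRED = "inferred"  # derived, e.g. a date range from style
    UNKNOWN = "unknown"    # explicitly recorded as not known

@dataclass
class Assertion:
    """One field value plus an explicit certainty marker."""
    value: Optional[str]
    certainty: Certainty
    note: str = ""

# 'Unknown' is stated, not just absent - consumers can distinguish
# "we checked and don't know" from "nobody has looked yet".
creation_date = Assertion(value=None, certainty=Certainty.UNKNOWN,
                          note="No date on object or in accession file")
```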
Feeding corrections back - have to think through the lifecycle of data: not just putting it out there, but being able to ingest any improvements back in. [Action point - recommendations or models for different LAM domains?] - otherwise lost opportunities (again, a theme from the crowdsourcing session). Which leads to managing provenance of data or corrections (again)...
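A sketch of what a round-trippable correction might carry so provenance travels with it - the field names and review-queue shape are assumptions for illustration only:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Correction:
    """Hypothetical shape for an ingestable correction: enough
    provenance to review it before touching the core record."""
    record_id: str     # identifier of the published record
    field: str         # which statement is being corrected
    old_value: str
    new_value: str
    source: str        # who proposed it (crowdsourcer, curator, API)
    evidence: str      # citation or justification
    submitted: datetime

queue = []

def submit_correction(c: Correction) -> None:
    """Corrections go to a review queue, not straight into the core
    system - the suggested change and its provenance stay together."""
    queue.append(c)

submit_correction(Correction(
    record_id="obj-1893-0042", field="maker",
    old_value="Unknown", new_value="J. Smith & Sons",
    source="crowdsourcing transcription project",
    evidence="Maker's mark visible in image 3",
    submitted=datetime.now(timezone.utc),
))
```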
Using something like Freebase as a way of linking information in different repositories...
Gate-keeping around core collections systems (and the documentation backlog) might lead to a layered model for storing additional information (it doesn't deal with corrections to core records, but it's a start) - 'thou shalt not touch a record'. External systems for discovery of other links, e.g. through other APIs, crowdsourced data...
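A minimal sketch of the layered idea - the core record stays read-only and enrichments live in a separate layer merged only at display time (the data shapes here are made up for illustration):

```python
# Hypothetical layered store: the core catalogue record is read-only
# ('thou shalt not touch a record'); enrichments sit in a separate
# layer keyed by record ID and are merged only at display/API time.

core_records = {
    "obj-1893-0042": {"title": "Teapot", "maker": "Unknown"},
}

annotation_layer = {
    "obj-1893-0042": [
        {"field": "maker", "value": "J. Smith & Sons",
         "source": "crowdsourced", "status": "unreviewed"},
    ],
}

def merged_view(record_id: str) -> dict:
    """Read-only merge: core data first, annotations attached
    alongside it rather than overwriting it."""
    view = dict(core_records[record_id])
    view["annotations"] = annotation_layer.get(record_id, [])
    return view

print(merged_view("obj-1893-0042"))
```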