You don’t need a degree in statistics or computer science to be an investigative journalist or even to do simple fact-checking. But with the increasing number of online data collection systems, tools, lingo and technologies, it helps to know where to begin and what stories you can tell. This session will help you take the first steps in finding, understanding and interpreting data, and maybe even in doing a mash-up or two and creating a visualization. You’ll get a set of replicable case studies and methods to get you going, plus tips on when to get a developer on board.
Resources
Audience kudos!
I love everything about @j_la28. #ONA12 #ONA12cleandata
— Andrew McGill (@andrewmcgill) September 21, 2012
If you’re at #ONA12 and you consider yourself a data journalist, get to Seacliff A-D for @j_la28 tips for working with data #ONA12cleandata
— David Herzog (@davidherzog) September 21, 2012
More tips when analyzing datasets – names are not enough, for example. Others:
Know your data fields, tools to check values/ranges – Excel filter, Google Refine #ona12cleandata
— Paul Hyland (@paulhyland) September 21, 2012
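For anyone who prefers a script to an Excel filter, here is a minimal sketch of the same idea in Python: summarize one field at a time so blanks, odd values and impossible ranges stand out. The function name `profile_field` and the sample records are invented for illustration; they are not from the session.

```python
from collections import Counter


def profile_field(rows, field):
    """Summarize one field: blanks, distinct values, smallest and largest value."""
    values = [row.get(field, "") for row in rows]
    non_blank = [v for v in values if str(v).strip() != ""]
    counts = Counter(non_blank)
    return {
        "rows": len(values),
        "blank": len(values) - len(non_blank),
        "distinct": len(counts),
        "min": min(non_blank) if non_blank else None,
        "max": max(non_blank) if non_blank else None,
        "most_common": counts.most_common(3),
    }


# Hypothetical records, made up for this example only.
sample_rows = [
    {"name": "A. Smith", "age": 34},
    {"name": "B. Jones", "age": 210},  # a value worth questioning
    {"name": "", "age": 51},           # a blank name
    {"name": "A. Smith", "age": 34},
]

age_profile = profile_field(sample_rows, "age")
name_profile = profile_field(sample_rows, "name")
print(age_profile)
print(name_profile)
```

A maximum age of 210 or a blank name does not tell you what went wrong, only where to start asking questions.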
.@j_la28: Be creative with duplicate search. More than unique ID#, look for same time, other values for 2 fields. #ona12cleandata #OJR
— Brian Frank (@frankreporting) September 21, 2012
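One way to try the "creative duplicate search" from that tip is to group records by a combination of fields instead of the ID alone. The sketch below is only an illustration: `find_duplicates`, the field names and the records are all hypothetical, and it normalizes values by trimming spaces and lowercasing, which may or may not suit your data.

```python
from collections import defaultdict


def find_duplicates(rows, fields):
    """Return groups of row positions that share the same values in `fields`."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        # Light normalization so "Main St" and "main st " compare as equal.
        key = tuple(str(row.get(f, "")).strip().lower() for f in fields)
        groups[key].append(index)
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}


# Hypothetical records, made up for this example only.
records = [
    {"id": "101", "date": "2012-03-01", "time": "14:05", "location": "Main St"},
    {"id": "102", "date": "2012-03-01", "time": "14:05", "location": "main st "},
    {"id": "103", "date": "2012-03-02", "time": "09:30", "location": "Oak Ave"},
]

by_id = find_duplicates(records, ["id"])
by_event = find_duplicates(records, ["date", "time", "location"])
print(by_id)     # no repeats when only the ID is checked
print(by_event)  # rows 0 and 1 share date, time and location
```

Here the IDs are all unique, yet the first two rows describe the same date, time and place, which is the kind of duplicate an ID-only check would miss.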
Ways that agencies try to say “no”
Tips from @j_la28 in #ONA12cleandata #ONA12 twitter.com/kegill/status/…
— Kathy E Gill (@kegill) September 21, 2012
Other data-related tips:
Most important tip- READ!your documents @j_la28 on #ona12cleandata #ONA12
— Nicole Chavez (@NicoleChavz) September 21, 2012
Tip from @j_la28 when agency doesn’t have data in shareable format: Shame them into converting it. #ona12cleandata #OJR
— Brian Frank (@frankreporting) September 21, 2012
Cleaning data: do integrity checks for every data set. Read/understand the contents of every field. – @j_la28 #ONA12cleandata #OJR
— Brian Frank (@frankreporting) September 21, 2012
Do your “due diligence” – that includes cost estimates
*Always* do your own analysis of data rather than an agency’s. Cross-check it, too. #ONA12cleandata #ONA12
— Brian J. Manzullo (@BrianManzullo) September 21, 2012
Bottom line: don’t just accept access cost estimates says @j_la28 #ona12cleandata #ona12
— Keith Robinson (@kdawg39) September 21, 2012
Know your database and what you need before asking to query data, @j_la28 says. #ONA12 #ONA12cleandata
— Nicole Ely (@JourStudent) September 21, 2012
Here’s the story about pardons that @j_la28 mentioned in #ONA12cleandata http://t.co/Y99P4VTs
Here’s the story about pardons that @j_la28 mentioned in #ONA12cleandata propublica.org/article/shades… #ONA12
— Kathy E Gill (@kegill) September 21, 2012
Be sure to specify the dataset and time series you need, to manage cost and time:
Getting data: Know costs. Know who does data entry. Get to know Leon (IT guy who works in basement, knows data best) #ona12cleandata #OJR
— Brian Frank (@frankreporting) September 21, 2012
Ways of denying you data: Huge costs, delay tactics, “Silly little journalist,” sent wrong stuff, “request unclear.” #ona12cleandata #OJR
— Brian Frank (@frankreporting) September 21, 2012
Last year ProPublica obtained a list of about 2,000 people who were denied a pardon, pulled a random sample and spent a year backgrounding those people. The analysis showed that whites were more likely to get a pardon.
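If you want to pull your own random sample from a long list, a fixed seed makes the draw repeatable so an editor or colleague can check it. This is a generic sketch, not a description of how ProPublica drew its sample; the function `draw_sample`, the placeholder case IDs, the sample size of 500 and the seed are all assumptions made for illustration.

```python
import random


def draw_sample(records, size, seed):
    """Draw `size` records without replacement; the same seed gives the same draw."""
    rng = random.Random(seed)
    return rng.sample(records, size)


# Placeholder case IDs standing in for a real list of records.
case_ids = list(range(1, 2001))

sample = draw_sample(case_ids, 500, seed=2012)
print(len(sample), "records drawn; first five:", sample[:5])
```

Write down the seed and the order of the source list along with your notes, since changing either one changes the sample.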
How do you find the data?
If there is a report, if there is a form, there probably is a database. #ona12cleandata
— Brian Ries (@moneyries) September 21, 2012
Examples of data-based stories include NPR on auto acceleration #ONA12cleandata #ona12 twitter.com/kegill/status/…
— Kathy E Gill (@kegill) September 21, 2012
Most powerful part of data, to me, is organization & easily finding irregular patterns, perspective. Worth the work. #ONA12cleandata #ONA12
— Brian J. Manzullo (@BrianManzullo) September 21, 2012
Some of the most powerful stories that are data-based are not about numbers – @j_la28


Remember to adjust dollars for inflation (this is one of MY pet peeves when it’s not done!)
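The usual arithmetic is to multiply the old amount by the ratio of a price index in the target year to the index in the original year. Here is a small sketch; the index values below are placeholders chosen to keep the math easy, not real CPI figures, so swap in numbers from an official source before using it. The function name `adjust_for_inflation` is also just an illustration.

```python
def adjust_for_inflation(amount, index_then, index_now):
    """Convert `amount` from the earlier year's dollars to the later year's dollars."""
    if index_then <= 0:
        raise ValueError("index_then must be positive")
    return amount * index_now / index_then


# Placeholder index values, NOT real CPI data.
PLACEHOLDER_INDEX = {2002: 100.0, 2012: 125.0}

budget_2002 = 1_000_000
budget_in_2012_dollars = adjust_for_inflation(
    budget_2002, PLACEHOLDER_INDEX[2002], PLACEHOLDER_INDEX[2012]
)
print(budget_in_2012_dollars)
```

With these placeholder values, a budget that looks like it grew 20 percent over the decade would actually have shrunk in real terms, which is why comparing unadjusted dollars across years can mislead.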