The Media / The Internet

Data Mining Rescues Investigative Journalism

John Mecklin sends in word of initiatives suggesting that the digital revolution which has been undermining in-depth reportage may be ready to give something back, in the form of a new academic and professional discipline known as "computational journalism." "James Hamilton, director of the DeWitt Wallace Center for Media and Democracy at Duke University, is in the process of filling an endowed chair with a professor who will develop sophisticated computing tools that enhance the capabilities — and, perhaps more important in this economic climate, the efficiency — of journalists and other citizens who are trying to hold public officials and institutions accountable. The goal: Computer algorithms that can sort through the huge amounts of databased information available on the Internet, providing public-interest reporters with sets of potential story leads they otherwise might never have found. Or, in short, data mining in the public interest."
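
For a concrete, if toy, picture of what such lead-generating algorithms might do, here is a minimal sketch (an illustration only, not anything from the Duke project): scan a hypothetical CSV of city contract awards and flag vendors whose totals are statistical outliers. The file name, column names, and threshold are all assumptions.

    # Toy "data mining in the public interest": flag vendors whose contract
    # totals are statistical outliers. Hypothetical CSV with columns
    # "vendor" and "amount"; hits are story leads to check, not conclusions.
    import csv
    from collections import defaultdict
    from statistics import mean, stdev

    def flag_outlier_vendors(path, threshold=3.0):
        totals = defaultdict(float)
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                totals[row["vendor"]] += float(row["amount"])
        values = list(totals.values())
        if len(values) < 2:
            return []
        mu, sigma = mean(values), stdev(values)
        # Flag vendors whose total sits far above the citywide average.
        return sorted((vendor, total) for vendor, total in totals.items()
                      if sigma > 0 and (total - mu) / sigma >= threshold)

    if __name__ == "__main__":
        for vendor, total in flag_outlier_vendors("contract_awards.csv"):
            print(f"lead: {vendor} received ${total:,.0f}; worth a closer look")
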
Comments:
  • Oh bull (Score:5, Interesting)

    by Groo Wanderer ( 180806 ) <{charlie} {at} {semiaccurate.com}> on Sunday January 04, 2009 @07:12PM (#26323777) Homepage

    As someone who does investigative journalism for a living, I can tell you data mining won't get you squat. I've done this for 5+ years and I'm very familiar with data mining; the two cross paths so rarely that it rounds to zero.

    Why? Because if it is in minable form, it doesn't take any digging to find. If you can run a Google search and get even a tidbit about what you need, you don't need investigative journalism.

    None of the stories I have gotten would have been possible through data mining: little ones like the P4 going 64-bit, it never reaching 4GHz, the Dell exploding laptops (an assist on that one), and more recently the Nvidia bump-cracking problem(s).

    If it is out there, it doesn't need an investigative journalist. If it isn't, then data mining won't help. The end.

                    -Charlie

  • Just another use (Score:2, Interesting)

    by emilienne ( 647608 ) on Sunday January 04, 2009 @07:18PM (#26323849) Journal
    The Cline Center for Democracy at UIUC has been running a data mining project, scanning archives and contents of newspapers around the world for reports of political disturbances such as riots, etc. The project, a collaboration between the center and the UIUC CS department, is meant to facilitate research on domestic stability and the like (a rough sketch of that kind of keyword scan follows at the end of this comment). Currently it's focused primarily on English-language papers, but efficiency and completeness will dictate searches in other languages sooner or later.

    Information can be suppressed or 'spun', but at least this will ensure that the data's available for such evaluations instead of paying some graduate student peanuts for years and years to put it together.

    Of course it does mean that I'm sort of out of a job...
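
    Roughly the kind of scan I mean; the event terms and archive layout below are made-up assumptions, not the Cline Center's actual pipeline.

        # Toy keyword scan over a directory of plain-text articles, flagging
        # those that mention political-disturbance terms. The terms and the
        # "newspaper_archive" directory are illustrative assumptions.
        import pathlib
        import re

        EVENT_TERMS = {
            "riot": r"\briots?\b",
            "strike": r"\bstrikes?\b",
            "coup": r"\bcoup\b",
            "protest": r"\bprotest(s|ers)?\b",
        }

        def scan_archive(directory):
            # Yield (filename, event_type) for every article that matches a term.
            for path in sorted(pathlib.Path(directory).glob("*.txt")):
                text = path.read_text(errors="ignore").lower()
                for label, pattern in EVENT_TERMS.items():
                    if re.search(pattern, text):
                        yield path.name, label

        if __name__ == "__main__":
            for article, event in scan_archive("newspaper_archive"):
                print(f"{article}: possible {event} report")
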
  • Re:Oh bull (Score:2, Interesting)

    by binpajama ( 1213342 ) on Sunday January 04, 2009 @07:59PM (#26324223)

    I'm a grad student and have recently been asked to help out on a research grant proposal for the very same thing. I agree with the point made in the parent post: if it's already out there, there's not much investigation needed. Additionally:

    1) How will algorithms figure out if a story is relevant? There's no deus ex machina here. They will check whether the article has the relevant buzzwords and whether it has been released by a reputable source.

    2) The buzzword factor kills the algorithm's chances of finding something really new. It's just going to find something that is 'current'. Thus, it's doing news aggregation, not investigative journalism.

    3) The 'reputable' source issue will be decided by looking at factors like source authority (measured by incoming links, etc.), which means the algorithm will be scraping sites that are already highly visible. Again, this is simply 'Google News' by another name. I cannot think of a way for algorithms to look into the nooks and crannies of the internet while being agnostic about source reputation. If they tried, they would quickly start coming up with 9/11 conspiracy theories and other balderdash as news reports. (A toy version of this kind of scoring is sketched at the end of this comment.)

    Basically, data mining is going the way of fuzzy logic. It has reached saturation in terms of its utility and applications, and now people are trying to sell all kinds of possibilities to allow for the overshoot in academia (too many PhDs, too little to do).
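
    To make points 1-3 concrete, here is a toy scoring sketch; the buzzwords, sources, and authority numbers are all invented. Multiplying buzzword hits by link-derived authority means obscure sources score near zero no matter what they report, which is exactly the aggregation trap described above.

        # Toy relevance scoring: buzzword hits weighted by link-based "authority".
        # Everything here (terms, sources, authority values) is made up.
        BUZZWORDS = {"bailout", "stimulus", "subprime", "indictment"}

        # Pretend authority derived from incoming links; visible outlets win.
        SOURCE_AUTHORITY = {
            "bigwire.example.com": 0.9,
            "localblog.example.net": 0.1,
        }

        def relevance(article):
            words = set(article["text"].lower().split())
            buzz_hits = len(words & BUZZWORDS)
            authority = SOURCE_AUTHORITY.get(article["source"], 0.0)
            # Obscure sources score near zero regardless of what they report.
            return buzz_hits * authority

        articles = [
            {"source": "bigwire.example.com",
             "text": "senate passes stimulus bailout"},
            {"source": "localblog.example.net",
             "text": "county landfill contract smells like an indictment"},
        ]

        for a in sorted(articles, key=relevance, reverse=True):
            print(f"{relevance(a):.2f}  {a['source']}")
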

  • Re:Oh bull (Score:1, Interesting)

    by Anonymous Coward on Sunday January 04, 2009 @08:15PM (#26324337)

    The way I understand this journalistic data mining malarkey is that, as mentioned, it helps to discover leads or starting points in public-interest stories.

    It's pretty much the way (I think) science should be conducted in the future. The best leads in science come from blips in your data, things that shouldn't be there. Data mining helps to identify these blips in the data, but does nothing to analyse them. As always, there's the hard slog of trying to figure out what the blip means that comes after identification of it. Using data mining allows you to prioritise your leads, so you can focus on those that are the most frequent (although not necessarily most important).

    Similarly, I think that in public-interest areas (things like water quality, public health, etc.) there are a number of data sources that never get analysed to identify important blips in the data. Maybe it's not in the interests of the organisation that collected the data to do the analysis, or doing the analysis might not help academics get further funding, but the analysis remains of import to the public.

    It's in these situations that I'd see this data mining journalism being important: identify the leads, then follow through on an analysis of the data (and really, you'd probably just be outsourcing the analysis to academics). You catch the important stories that have fallen through the cracks, and hopefully educate and inform at the same time.
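
    For what it's worth, a toy version of the flag-the-blips-then-prioritise idea; the readings, site names, and threshold are all made up.

        # Toy blip detection over hypothetical water-quality readings: flag
        # values far above a site's typical level, then rank sites by how
        # often the blip recurs (frequent, though not necessarily important).
        from statistics import median

        readings = {
            "well_a": [3.1, 3.0, 3.2, 3.1, 9.8, 3.0, 10.1],  # contaminant, ppb
            "well_b": [2.9, 3.0, 3.1, 3.0, 3.2, 2.8, 3.1],
        }

        def blips(values, factor=2.0):
            # Flag readings more than `factor` times the site's median level.
            m = median(values)
            return [v for v in values if m > 0 and v > factor * m]

        leads = {site: blips(vals) for site, vals in readings.items()}
        for site, hits in sorted(leads.items(), key=lambda kv: len(kv[1]), reverse=True):
            if hits:
                print(f"lead: {site} has {len(hits)} readings far above its norm: {hits}")
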

    I didn't read the article, so it may have said the same thing ;)

  • by InsurgentGeek ( 926646 ) on Sunday January 04, 2009 @11:12PM (#26325623)
    If you're in the world of investigative journalism, I'd encourage you to take a look at a new class of semantic data generation tools. New capabilities like Calais (www.opencalais.com) from Thomson Reuters allow you to ingest unstructured text (news articles, press releases, FOIA documents, whatever) and automatically extract semantic metadata such as people, companies, management changes, natural disasters, and hundreds of other entity and event types. You can take the output of these tools and load it directly into databases to query. You could take news stories, build a social network of family relationships, and then play news events against the network. We're already seeing some initial uses in the area of investigative journalism and would love to see more. Jump in and give it a try.
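
    A rough sketch of that extract-then-query workflow, using spaCy's entity recognizer as a stand-in for Calais (this is not the Calais API itself; it assumes spaCy and its en_core_web_sm model are installed):

        # Extract person/organisation/place mentions from raw text and load
        # them into SQLite so a reporter can query across documents. spaCy
        # stands in for Calais here; the sample document is invented.
        import sqlite3
        import spacy

        nlp = spacy.load("en_core_web_sm")

        def index_documents(docs, db_path="leads.db"):
            con = sqlite3.connect(db_path)
            con.execute("CREATE TABLE IF NOT EXISTS mentions "
                        "(doc_id TEXT, entity TEXT, label TEXT)")
            for doc_id, text in docs:
                for ent in nlp(text).ents:
                    if ent.label_ in ("PERSON", "ORG", "GPE"):
                        con.execute("INSERT INTO mentions VALUES (?, ?, ?)",
                                    (doc_id, ent.text, ent.label_))
            con.commit()
            return con

        if __name__ == "__main__":
            con = index_documents([
                ("press_release_1",
                 "Acme Corp. named Jane Doe as chief lobbyist in Springfield."),
            ])
            # Once the metadata is in a database, it can be queried directly,
            # e.g. every document that mentions a given organisation.
            for row in con.execute("SELECT doc_id, entity FROM mentions WHERE label = 'ORG'"):
                print(row)
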
