Slashdot
Data Mining In Law Enforcement
Posted by
Soulskill
on Thu May 08, 2008 06:05 PM
from the can't-they-just-google-it dept.
jcatcw points out a blog entry by Scott McPherson, CIO for the Florida House of Representatives. McPherson condemns the state of data sharing and data mining in law enforcement, saying that the US causes itself a great deal of trouble by focusing more on "antiterror armor and nuke-sniffing devices" than a useful information distribution network. He discusses a few such projects, and how they could have directly affected the events of 9/11. Quoting:
"One of those ingenious things that actually worked, Seisint founder Hank Asher's brilliant MATRIX system, remains mired in controversy and politics. Hank showed me MATRIX just a few short weeks after the 9/11 attacks. Using law enforcement data and commercial data, all of the commercial data available in the public domain, Asher's query produced [hijacker Mohamed] Atta's photo -- and about 80 others, many of them fellow 9/11 hijackers, many of them associates of the 9/11 hijackers. It was simple data mining and algorithms, and none of the information was obtained illegally."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Hold on a minute here (Score:5, Insightful)
Re:Hold on a minute here (Score:4, Insightful)
But really. Lots of people *may* commit crimes. Computers may decide you are likely to rob a bank tomorrow; that does not mean you will. We need to make sure the law is always about what you do, not what a computer projects you're going to do. The day we jail people who *might* be about to commit a crime is the day we put people in jail for their thoughts.
Re: (Score:3, Insightful)
Now for the thought experiment.
Stipulation: The computer produces 0.00% false positive identifications.
The computer identifies a suspect as 100% likely to rob a bank (he's at the teller window, has demanded cash and is pointing a gun) is it OK to arrest him?
The computer identifies a suspect as 99.9% likely to rob a bank (he's next in line for a teller, has a gun and a demand note) is it OK to arrest him?
The computer identifies a suspect as 99% likely to rob a ba
Re:Hold on a minute here (Score:5, Insightful)
If that is the case, this is a pretty impressive set of results. Being able to identify, say, 5 of the attackers, and to have a number of the other hits be known associates, when the training set likely consisted of at least tens of thousands of names, is pretty fair accuracy. The false positive rate is pretty fair, as well, especially when you contrast it to the No Fly list, which has numerous false positives, and no known successes in identifying anyone of interest.
There is likely some sort of clustering algorithm behind this, and the math behind those is pretty solid. Before you dis this, or even get excited about privacy issues, I'd suggest you check out a reference such as this [amazon.com]
I'm not really concerned about data mining as a privacy issue, and I think it's a pretty legitimate approach for law enforcement. As a side note, I do data mining and predictive analytics for a living. It's objective, it's factual, and if the practitioner is knowledgeable about it, it shouldn't be stigmatizing. Indeed, it would reduce scrutiny on the majority of the folks that would otherwise be tarred by having an Arabic surname and swarthy skin.
It would have the potential to be vastly more effective, and vastly less expensive than the path we are on now. One reason that we might not be using it could be that we -have- used it, and didn't find anything. That's the thing about objective data mining: if there is nothing there, it'll tell you that. I don't think, for our current administration, that it's a desirable outcome to find that there is nothing to worry about. If that happened, the populace would be less fearful, and less easy to control.
Take this one step further, and apply this bit of thought. It has been shown time and again that the TSA is incompetent, and that any motivated terrorist could get a weapon on board a plane. It is further obvious that our ports are porous, and that soft targets abound. We have seen no triumphant pictures of the authorities frog marching attempted terrorists away, no success stories of how these measures have saved our lives again. We have also seen no further attacks.
This strongly suggests to this practitioner that we have a near zero incidence rate of terrorists in the US; that when a terrorist attempts an attack, he succeeds, and that the lack of attacks suggests that the attack rate is close to zero.
Data mining would be a useful tool to calibrate this theory.
I call BS (Score:3, Informative)
Or not (Score:2, Insightful)
As counterintuitive as it may seem at first, agencies have strict rules on this kind of behavior.
Re: (Score:2)
Re: (Score:2)
Sadly, punishment is no guarantee of behavior modification.
Re: (Score:3, Funny)
Re:Or not (Score:4, Funny)
Capital punishment has its uses, but as a deterrent it's pretty limited.
Re: (Score:2, Insightful)
Re:Or not (Score:5, Insightful)
Hindsight is 20/20 (Score:5, Insightful)
Re:Hindsight is 20/20 (Score:5, Interesting)
This guy also doesn't seem to have much knowledge of intel gathering. The idea that forward projection isn't happening is...uh...wrong, and that's all I'll say on the matter (disclaimer: I'm ex-NSA)
He also doesn't seem to comprehend the concept of misdirection, as the term is used by performance magicians.
I'd guess he can't even pronounce the name, "Sun Tzu", let alone have read the writings.
Algorithms are easy (Score:4, Interesting)
This guy also doesn't seem to have much knowledge of intel gathering. The idea that forward projection isn't happening is...uh...wrong, and that's all I'll say on the matter (disclaimer: I'm ex-NSA)
If you're ex-NSA, then you also know that the difficulty isn't in writing the algorithms, it's in getting somebody to stitch together all the goddamn databases that are strung out all over creation.
Shit, *I* can write the social networking algorithms, anomaly detection, etc. But it doesn't do any good if you don't have the data integrated, and despite what's happened the last 8 years we still don't have it.
I also don't get the false dichotomy the author uses to rag on sensor-based detection.
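The integration problem described in the comment above (the same person sitting in several databases under slightly different formats) can be sketched in a few lines. This is only an illustration under stated assumptions: the record fields (`name`, `dob`, `citation`, `flight`), the two sample sources, and the matching rule are all hypothetical, and real record linkage needs far more care with misspellings, aliases, and missing fields.

```python
import unicodedata


def name_key(name: str) -> str:
    """Crude matching key: strip accents, punctuation and case, then sort the name parts."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    parts = "".join(ch if ch.isalpha() else " " for ch in ascii_name.lower()).split()
    return " ".join(sorted(parts))


def link_records(source_a, source_b):
    """Pair up records from two sources that share a name key and a birth date."""
    index = {}
    for rec in source_a:
        index.setdefault((name_key(rec["name"]), rec["dob"]), []).append(rec)
    return [
        (a, b)
        for b in source_b
        for a in index.get((name_key(b["name"]), b["dob"]), [])
    ]


# Hypothetical sample data: two systems that format the same name differently.
dmv = [{"name": "Doe, John", "dob": "1975-03-02", "citation": "speeding"}]
airline = [
    {"name": "JOHN DOE", "dob": "1975-03-02", "flight": "XX123"},
    {"name": "Jane Roe", "dob": "1980-01-01", "flight": "XX456"},
]

matches = link_records(dmv, airline)
for a, b in matches:
    print(a["name"], "<->", b["name"], b["flight"])
```

The join itself is trivial once both sources agree on a key; the hard part the commenter points at is getting access to the sources and agreeing on that key in the first place.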
Re: (Score:3, Insightful)
The most likely source of false data is not the people they are trying to catch but supposedly legitimate sources pushing their own barrow, intelligence consultants trying to rack up hundreds of thousands of dollars
Re:Hindsight is 20/20 (Score:5, Insightful)
I stand around a marketplace in Baghdad. When a guy runs up to a crowd, screams "Allah Akhbar", pulls a string on his coat, and fucking explodes all over the place, I point at the spot where he used to be, and say "That was a suicide bomber".
And before you try to horn in on my business, know that I've already sold the DoD enhancements to my algorithm that covers cases where the bomber doesn't scream "Allah Akhbar", or where the bomber is a she not a he, or where the explosives are in a car not a coat. Or combinations thereof.
But seriously, it says that "his query" produced Atta's photo (and 80 others, only some of which apparently had anything to do with 9/11). What exactly was this query? "9/11 hijackers"? "terrorists named Atta"? "Arabs who've been pulled over"? So Atta's driving citations mean it was theoretically possible for someone to pull his name up. The question is, why would they have done this? What would have motivated someone to perform that query, and how exactly does data mining driving citations lead to the important conclusion that Atta was a terrorist?
The article makes good points that data sharing between law enforcement agencies is a good thing, and helps with such rather mundane things as finding fugitives who skip out on parole, or people who don't show up for court dates. But that MATRIX nonsense is yet another attempt to cash in on post-9/11 anti-terror funding bonanzas. Which, now that I've gotten my slice of the pie, I'm against.
Re: (Score:2)
Anyone know of a system with an effectively low false positive rate? When dealing with millions of "possibles", it seems even a 1% or 2% false positive rate generates far too many false positives for the system to be effective.
This system seems to generate a number of false positives even in hindsight.
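To put rough numbers on the false-positive worry above, here is a minimal base-rate sketch. Every input is an assumption chosen for illustration (1,000,000 people screened, 20 real targets, a 1% false positive rate, a perfect detection rate); none of these figures come from MATRIX or any real system.

```python
def screening_outcomes(population, true_targets, false_positive_rate, detection_rate):
    """Return (true_hits, false_alarms, precision) for a screening system."""
    innocents = population - true_targets
    true_hits = true_targets * detection_rate
    false_alarms = innocents * false_positive_rate
    flagged = true_hits + false_alarms
    # Precision: the share of flagged people who are real targets.
    precision = true_hits / flagged if flagged else 0.0
    return true_hits, false_alarms, precision


# Hypothetical inputs, for illustration only.
hits, alarms, precision = screening_outcomes(
    population=1_000_000,
    true_targets=20,
    false_positive_rate=0.01,
    detection_rate=1.0,
)
print(f"true hits: {hits:.0f}, false alarms: {alarms:.0f}, precision: {precision:.4f}")
```

Under those assumptions, roughly 10,000 innocent people get flagged alongside the 20 real targets, so only about 0.2% of the flags point at someone of interest, even with a detector that never misses.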
I could find 19 terrorists in like 5 minutes! (Score:2, Funny)
No (Score:2)
Re: (Score:3, Insightful)
Especially at airports I sometimes get so angry about all the silliness that I play some mind-game with the aim of blowing it all up. My current favorite is
Re: (Score:3, Insightful)
Especially at airports I sometimes get so angry about all the silliness that I play some mind-game with the aim of blowing it all up.
Last time I was at an airport dropping my sister off, I was thinking the exact same thing. I saw her going through the security checkpoint and she had to turn on her laptop so they knew it wasn't a bomb. How silly is that: "could you please activate the potential on-switch of a bomb, so we can be sure it isn't a bomb?"
Not sure if it is the same everywhere, but the security-checkpoint was pretty crowded, at least 50 at the checkpoint and 100 in close vicinity. If your goal, as a terrorist, is to instill fear
Re: (Score:2)
Iraq, Israel, Southeast Asia... it's all about markets and churches and hotels for the high frag count. Like you said, a few attacks would completely shut down air travel in the United States for the foreseeable future. Like V for Vendetta, where they just gave up and abandoned the subway system.
Re: (Score:2)
it's all about markets and churches and hotels for the high frag count.
But those are not likely to be the targets over here (Europe, North-America) and those are different kinds of terrorism. Take Iraq; some of the terrorism is a form of resistance (violence aimed at occupying forces and collaborators), some is sectarian/tribal and some is foreign/imported. Different goals, different organization, different funding, etc.
I'm not saying there aren't terrorists whose sole goal is to spread death, destruction, chaos and fear. But, apart from the occasional fruitcakes, is not so
Wonder how long until this is all public domain (Score:3, Interesting)
First it was suspected enemy agents.
Then it was suspected associates, even though separation may be 3-4 people away in a chain.
Now it's anyone suspected of a crime.
How long until everyone is dumped in this database for not just intel or law enforcement, but potential employers, stalkers, and violent criminals data mining for easy marks?
Re: (Score:2, Insightful)
I keep watching the bar for spying on people get lower and lower.
First it was suspected enemy agents.
Now it's anyone suspected of a crime.
What the hell are you talking about? People suspected of crimes have always been subject to spying, e.g. wiretaps.
Hmm (Score:4, Interesting)
It was simple data mining and algorithms, and none of the information was obtained illegally.
2. I wonder what he means by "commercial data available in the public domain". Either it's commercial and you have to pay for it, or it's public domain. My long distance calling patterns are commercial data (and are sold by the phone company for marketing), but they're not "public domain" in the way that most of us would understand it.
Re:Hmm (Score:4, Informative)
In the context of Intelligence Analysis, "public domain" [sra.com] means information that is available publicly, as opposed to classified or secret information. Whether something is copyright or not doesn't enter into it.
Maybe (Score:5, Interesting)
Also, no local law enforcement officer would have been able to piece together this plot from looking through one car BEFORE the event. Piloting multiple planes simultaneously into various landmarks was just too implausible to be believed before it happened. Even if John McClane himself figured it out, he wouldn't be able to convince anyone to help him stop 19 other people from boarding planes in multiple airports.
Sharing information sure beats what we're doing now, both in law enforcement and the intelligence community where I work, which is holding everything close so no one else can take credit. But let's not exaggerate the benefits here.
Re: (Score:2)
Re: (Score:2)
Not when either my fiancee or I are at the other end of it... which is darn near 100% of the time.
Re: (Score:3, Insightful)
Worst. Clairvoyant. Ever. (Score:5, Funny)
A few short weeks after the Kentucky Derby, I devised a database system that predicted the winner. Impressive, no?
Re: (Score:2)
Though implementing it to actually make some predictions of events that have not yet occurred, that are then validated, certainly lends it credibility.
I predict tomorrow we will have daytime.
what 70% of the population missed (Score:2, Interesting)
It's a shame more of the public doesn't realize that it's not necessary to either break the law or pass laws to legalize violations of one's rights, to provide reasonable protection for the public good.
License plates (Score:4, Interesting)
I'm not saying it would put up a big "pull over and detain!" notice, but it could pop up the plate, the vehicle it should be on, the owner, and why it's of interest, then the officer would decide what to do. I.e., if a car pops up as belonging to a wanted 22-year-old male but it's obviously someone else in the car (too old, wrong gender, etc.) then they would ignore it.
Of course, like anything, there is the potential for abuse, but before you freak out about privacy, remember that driving, by definition, is a very public act. We're not talking about millimeter-wave radio or looking behind closed curtains with an infrared camera, we're talking about reading the required-by-law several-inch-high unique identifier on a hunk of steel with unobstructed windows on the public roads. If you're wanted and don't want to get caught, it's your responsibility to not go out in public with a visible unique identifier.
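The lookup described in this comment (read a plate, check it against a list of plates of interest, show the officer what it should be attached to and why) is simple to sketch. This assumes the OCR step has already produced a text string; the `HotlistEntry` fields, the helper names, and the sample entry are hypothetical, not taken from any deployed product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HotlistEntry:
    plate: str
    vehicle: str   # what the plate should be attached to
    owner: str     # registered owner description
    reason: str    # why the plate is of interest


def normalize_plate(raw: str) -> str:
    """Uppercase and drop spaces and dashes so OCR variants compare equal."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def build_hotlist(entries):
    """Index entries by normalized plate for constant-time lookup."""
    return {normalize_plate(e.plate): e for e in entries}


def check_plate(hotlist, ocr_text: str) -> Optional[HotlistEntry]:
    """Return the matching entry, or None if the plate is not of interest."""
    return hotlist.get(normalize_plate(ocr_text))


# Hypothetical sample entry.
hotlist = build_hotlist([
    HotlistEntry("ABC-1234", "blue 1998 sedan", "male, age 22", "outstanding warrant"),
])

hit = check_plate(hotlist, "abc 1234")
if hit:
    # Only display the information; the officer decides what to do with it.
    print(f"{hit.plate}: {hit.vehicle}, owner {hit.owner}, reason: {hit.reason}")
```

The sketch deliberately stops at displaying the match, in line with the comment: the system surfaces the plate, vehicle, owner, and reason, and the decision stays with the officer.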
Re: (Score:2, Insightful)
I've always wondered why they don't equip police cars with a video camera and the ability to OCR every single plate that comes into view
There are already systems like this deployed. I don't know specifically where, but I receive a Law Enforcement monthly magazine and I've seen many ads for exactly this type of product.
A quick search for 'automated license plate [google.com]' on Google brings up a bunch of relevant results if you're interested in finding out more.
Re: (Score:2)
Re:License plates (Score:4, Interesting)
I never understood why anyone involved in lucrative crime (drugs mainly) would ever commit even the most minor violation (I imagine the successful ones that you don't read about in the blotter do just this). If I were carrying anything even remotely illegal, I would make sure all my blinkers and lights work, and that the plates, insurance, registration and driver's license that I hand the officer are all spotless and in my name. I wouldn't speed, change lanes, honk, swerve or even imperceptibly roll a stop sign. The fact that criminals routinely cannot implement even this smallest amount of common sense boggles the mind. It's as if they just aren't thinking at all.
Re: (Score:2, Interesting)
The car belongs to a 22 year old male, a 50 year old woman is driving it, obviously stolen. Pull over and handcuff the driver with my gun d
TV and Phone Psychics? (Score:2)
I never paid much attention to them because I figured that if they were really a 'psychic', then they would already KNOW to call me instead. Had to be some kind of phone charges scam I concluded.
Hmmm, maybe I'M psychic! (nah, I'm probably just psycho)
Islands of Automation (Score:5, Informative)
Re: (Score:2)
For example, system A may separate "asian" and "pacific islander" for the race code, while system B lumps "pacific islander" into "asian" and has no pacific islander category. This is especially true in towns that may have very few o
Re: (Score:3, Informative)
Personally, the company I worked for had a system that kicked the butts of the larger initiatives. It replicated in near real time, worked with incremental data, optimized network resources and bandwidth, fault tolerant, highly scalable (from local to nationa
pff (Score:5, Funny)
I'll take "How do you round up the most possible innocent people and make false charges against them" for $500, Alex...
Bad news actually (Score:4, Insightful)
Re: (Score:3, Interesting)
It's left up to the officer's discretion to enforce or not enforce. And giving him more information with which to make that decision isn't a bad thing. You can't say we can't have more efficient tools because they can be abused more efficiently. You ob
It's called GOOGLE dumbass! (Score:2, Flamebait)
Algorithm training (Score:4, Informative)
Without additional information it's impossible to say if this is impressive, or just a stupid algorithm trick. With many mining algos, you can easily train them to pull certain needles out of the haystack. The question is, will your training situation look anything like the future situations? Training the algo only with the 9/11 terrorists, would it pull out the trade center bombers, or Timothy McVeigh? Will future predictions be right, or will it pull out groups of Arabic student pilots who had the misfortune of buying the same shampoo most preferred by 9 out of 10 terrorists? Especially with rare events, I think you mostly get into a hypercomplicated version of correlation != causation.
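The training worry above can be shown with a toy example. This is a deliberately naive sketch, not how any real system works: the "algorithm" just keeps whatever attributes all known positive cases share, and every attribute name and record below is made up for illustration.

```python
def learn_rule(known_positives):
    """Keep only the attributes shared by every known positive case."""
    return set.intersection(*(set(p) for p in known_positives))


def flags(rule, person):
    """Flag anyone who has every attribute in the learned rule."""
    return rule <= set(person)


# Hypothetical training cases, known only in hindsight.
training_positives = [
    {"flight_school", "one_way_ticket", "brand_x_shampoo"},
    {"flight_school", "brand_x_shampoo", "cash_payment"},
]
rule = learn_rule(training_positives)

# Hypothetical future cases the rule has never seen.
future_attacker = {"truck_rental", "fertilizer_purchase"}
innocent_student = {"flight_school", "brand_x_shampoo", "student_visa"}

train_recall = sum(flags(rule, p) for p in training_positives) / len(training_positives)
print("rule:", sorted(rule))
print("recall on training cases:", train_recall)
print("flags different future attacker:", flags(rule, future_attacker))
print("flags innocent look-alike:", flags(rule, innocent_student))
```

By construction the rule finds every case it was built from, which is why a demo run after the fact proves little. Here it misses an attacker who looks nothing like the training cases and flags an innocent person who happens to share the incidental attributes.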
Re: (Score:2)
>It was shut down over privacy profiling and other concerns, surely you remember, it wasn't that long ago.
> This story seems to lament this but geeze, make up your mind, if it's not an outcry about the lack of datamining it's someone saying datamining is one foot in Orwell's 1984.
>If this ever grows legs it'll become a political hot potato again and get dropped.
"shut down", v.t., to change the name of something, preferably in a way that doesn't e