British DNA Database Mismatch 194
nahal writes "DNA evidence is extremely compelling to a jury at trial when trying to convict a suspect. In this article at USA Today, the world's largest DNA crime-solving machine, located in Great Britain, mistakenly matched a suspect to a crime in a 1-in-37 million chance. American experts have called it 'mind blowing'."
The Odds (Score:1)
If a person FIRST was a suspect and then his DNA was tested and found to match the crime scene evidence... then that would be 1 in 37 million odds.
But comparing a DNA sample against the whole databank gives, as others here have said, about a 1 in 56 chance of a coincidental match. NOT good odds.
It reminds me of the old parlor trick of matching birthdays. I forget the exact number (and am too lazy to calculate it), but if about 23 people are in a group and their birthdays are compared with one another, it is more likely than not that two of them share a birthday.
The same principle applies to the DNA, except that the police are matching against a database of samples.
You would think that the police would be smarter than that. Well, on second thought, no....
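The birthday comparison can be checked numerically. The following is a minimal Python sketch, assuming 365 equally likely birthdays and ignoring leap years; the function names are illustrative and do not come from the thread.

```python
def shared_birthday_probability(group_size: int, days: int = 365) -> float:
    """Chance that at least two people in the group share a birthday."""
    p_all_distinct = 1.0
    for already_taken in range(group_size):
        # Each new person must avoid every birthday already taken.
        p_all_distinct *= (days - already_taken) / days
    return 1.0 - p_all_distinct


def break_even_group_size(days: int = 365) -> int:
    """Smallest group for which a shared birthday is more likely than not."""
    size = 1
    while shared_birthday_probability(size, days) <= 0.5:
        size += 1
    return size


if __name__ == "__main__":
    for size in (15, 23, 30):
        print(size, round(shared_birthday_probability(size), 3))
    print("break-even group size:", break_even_group_size())
```

Under these assumptions the break-even group size comes out at 23, and a group of 15 gives roughly a one in four chance of a shared birthday.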
6 loci, 10 loci, loci-schmoci (Score:1)
Re:Statistics and probability (Score:1)
The rule you are mistakenly using, P(A or B) = P(A) + P(B), only holds if A and B are mutually exclusive, which would imply that 37m samples guarantee a match. In general, the formula is P(A or B) = P(A) + P(B) - P(A and B).
BTW, the 1/56 figure is the probability that you have the right person given only the match in the DNA.
John
Book: "Randomness" by Deborah J. Bennett (Score:1)
At Amazon [amazon.com]
old news (Score:1)
Only a tiny part of our DNA is human-unique (Score:1)
So, presumably any moment now someone will get prosecuted for some of Godzy's mayhem.
Re:juries (Score:1)
...or is this a clever troll intended to demonstrate how OJ got off? "The policemen who arrested him were racists, so if you find him guilty, that makes you a racist too. It doesn't matter if he's guilty or not, we have to punish the racist police by finding OJ not guilty". All these arguments weighed heavily enough in the minds of the jury that they found the small but non-zero chance that the overwhelming forensic evidence was the result of an elaborate conspiracy to rob the black community of one of its icons to be grounds for reasonable doubt.
Re:juries (Score:1)
Its not 1 in 37e6 its 1 in 56 duh.. (Score:1)
could be so stupid. If they have 660,000 records
on file, and the chance of a random match is
1 in 37e6, then the chance of matching someone at
random in the entire database is
1 in 37e6/660,000 = 1 in 56. The bigger the database gets the worse the problem.
If they have been using such "evidence" to put
people away, they must have hundreds of innocents
in the slammer by now.
Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? (Score:1)
Firstly you assume that each of the loci is
statistically independent, I'd guess not.
Secondly you assume that all gel locations
are equally probable, this has to be wrong, I'd
guess the distribution is highly skewed towards a
small subset of locations.
The article itself states that adding four
additional loci takes you from 1 in 37e6 to 1 in
1e9, an improvement of a factor of 27 or 2.3 per
locus, a long way from 600!
There must be a strong case of diminishing returns
here, so even the 13 loci tests used in the US are
probably not much better than 10 loci test
mentioned in the article.
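The per-locus figure above is a geometric mean, which can be reproduced in a few lines. This is only a sketch of the arithmetic using the two odds quoted from the article; the variable names are made up for illustration.

```python
# Odds quoted in the article for the two test sizes.
ODDS_6_LOCI = 37e6   # 1 in 37 million
ODDS_10_LOCI = 1e9   # 1 in one billion
EXTRA_LOCI = 4

# Overall gain from the four additional loci.
total_improvement = ODDS_10_LOCI / ODDS_6_LOCI

# Average (geometric mean) gain contributed by each extra locus.
per_locus_improvement = total_improvement ** (1 / EXTRA_LOCI)

if __name__ == "__main__":
    print(f"total improvement: {total_improvement:.1f}x")
    print(f"per extra locus:   {per_locus_improvement:.2f}x")
```

The result is about 27x overall and a little under 2.3x per added locus, which is the diminishing-returns point the post is making.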
Re:juries (Score:1)
The LAPD got caught framing a guilty man. Since the police in this country have to play by the rules, there was no reasonable way he could have been convicted. The un-reasonable way would have been to ignore police tampering with evidence and violating the chain of custody, but that would have been worse than letting a guilty man walk free.
I did run that LA Times search URL [latimes.com] posted elsewhere, and it's pretty clear that the LAPD will do whatever they think they can get away with. For example,
If Not Guilty verdicts and overturned convictions are the price we pay to send a message that police misconduct Will Not Be Tolerated, then we grit our teeth and bear it. Remember, America is supposed to be a free society, and police misconduct cannot be tolerated.
Gotta use the technology correctly (Score:1)
Re:Not as unlikely as you might think... (Score:1)
--
Re:I'll take the ignorant over the "annointed"... (Score:1)
>>stood up for the right of the defendant to be
>>tried by proper licensed professionals.
>So why not do away with juries altogether? The
>lawyers and judges are all certified, surely that
>makes them elite enough to decide the fate of
>the accused.
Heck, why not just let the police do it. Just add
a certification test to their training...
That would save some of the money spent on the
justice department. Just let the police drop their
suspects off at the local maximum security prison.
No judges, no lawyers. Just good old fashioned
martial law.
Re:Twins? (Score:1)
>out the identical twins life time to make their
>DNA different. They're known as viruses. Other
>mutagens will also cause even more differences
>over time.
I'm afraid not. Generally mutagenic changes to each cell will be idiosyncratic -- that is they don't change every cell and the cells that are changed are altered differently, so they get washed out as 'noise' in a PCR RFLP.
You are very correct in citing viruses as one of the few agents (mutagens are not at all uncommon) that could cause a highly consistent change in the genome, but they are rarely (never, so far as I know) so consistent that they approach total alteration.
On a PCR-PAGE gel (with an adequate sample), this could result in aberrant or extra bands, but it should not erase bands that already exist (because some unaltered cells remain). Therefore twins would definitely register as positives for each other -- though perhaps with some aberrant bands.
>Then there is the testing method. The
>electrophoresis gel tests used have rather poor
>repeatability.
Maybe they aren't perfect, but they are pretty darn good. I certainly would never call their reproducibility "poor".
The things that produce bad results are well known, and avoidable. But I don't doubt that technician error or carelessness could create real problems.
>Sure some things can be done to help make them
>better. I wouldn't accept a match when the
>samples are done on two different machines in
>different labs. Having two different gel
>suppliers also makes a huge difference. The test
>is really only telling you the length of strands
>between markers where the chemicals split the
>strands into segments.
I, too, would want both samples run on adjoining lanes on the same gel (though irreproducibility would be *far, far* more likely to produce false negatives than false positives).
I don't think they still do RFLP on DNA IDs -- but I could be wrong. PCR RFLP would seem the way to go -- a stronger signal with a tiny sample, and many other advantages.
MY NIGHTMARE:
The tired/careless/whatever technician who double-dips, placing my DNA in (or contaminating) both the "evidence" and "suspect" lanes and creating a match (especially with PCR RFLP).
Problems with this sort of estimates (Score:1)
I love PCR and the million ways it can be used, and I am very happy that it's being increasingly used in criminal investigations. The former 'gold' standard (eyewitnesses) has been demonstrated in study after study to be frequently unreliable.
However, when I see a number like 1:3.7x10^7, I really fume. It's based on far too many assumptions that we simply do not have the knowledge to verify. The specifics vary with the loci and methods used, but I think I can illustrate a few major points with general principles.
1) DNA matching is *NOT* done by sequencing the entire sample of DNA available. Instead, a few quick measurements are performed. The principle is that no one individual is likely to match all of them ["Gee, how many green convertibles with a Z on the license plate could have been driving in this part of town at three a.m. last night? One, buster -- you!"] DNA evidence assumes a reasonable degree of randomness and statistical independence, but those qualities are poorly characterized in the real world.
2) DNA is far from random. In fact (despite the inevitable mutations we all have) it's just a mix-n-match of the DNA of existing humans (who are similarly non-random, breed non-randomly, etc.).
3) Even after we sequence the Human genome, we won't have the information about genomic variance needed to make such estimates accurate. To even come close, we would first have to characterize hundreds of thousands of people chosen in a deliberately random fashion.
[It *must* be random -- not based on criminals or even volunteers. Many 'classic' post WWII medical studies were heavily biased towards the "70-kg white male medical student" (who will volunteer for almost any test).]
Think about it: how can any statistical analysis claim an accurate probability of 1 in 37 million, from a database of 660,000 individuals? or even 6.6 million? The number was created by assuming the individual measurements were independent -- even 'partial independence' would require a quantification of the degree of dependence for any real calculation. That data does not exist -- and would require millions of test subjects.
3) Variance information would be tricky to interpret, even if we had the data.
"A rare Mediterranean genetic trait" isn't quite as significant if the crime took place in Italy -- or the 'Little Italy' of your favorite town.
If a witness sends the police on a round up of "short blond female caucasians with freckles", then the probative value of the DNA analysis depends on the likelihood of a match for a random "short blond female caucasian with freckles", not "tall, dark hispanics" or "short-haired male tabbies with spots".
[Want to start a fight? Ask ten forensic geneticists how the overall odds change if the suspect turns out to have a known identical twin. Even this seemingly simple question has never been completely resolved mathematically. Many investigators will mumble 'No change', but in fact, there clearly is a difference. We just can't quantify it. The same applies, crudely, to an only child vs a child in a large family.]
Sadly, characteristics cluster in precisely the way we wish they wouldn't. Relatives share genetic similarities, tend to be in the same general area, and often share enough situational factors to predispose them to similar motives. The same applies (much less strongly) for ethnicity.
It's important to note that deviations from perfectly independent assortment will ALWAYS make an incorrect match more likely than the quoted 'odds' suggest, making any DNA match less conclusive.
4) Generally, these corrections tend to have larger effects when the base (uncorrected) likelihood is small (i.e. it's easy for a correction to reduce one in 37 million to one in 10 million, but very hard to reduce 1:4 to 1:1.1)
The article says that the error reflects the rapid increase in the database size (470K to 660K in the past year). However, I think that it is more likely that the error reflects the flaws in the assumptions behind the estimates. As the database size grows (and DNA is more widely used), we will see more errors -- not because of "all that nasty data" but because "all that data" will highlight (as data is supposed to do) the error of our assumptions.
Re:Statistics and probability (Score:1)
If there were only 37M permutations of 6 loci, that would imply roughly 20 discrete possible values at each locus. Is that how you envisioned the underlying data?
I don't know what test they use in the UK, but I'm assuming that it's the RFLP [Restriction Fragment Length Polymorphism]-- basically they use a highly specific enzyme to chop up the DNA, and place it on a polyacrylamide gel under an electric field to measure the size of the fragments. (Actually, nowadays, they probably use pre-synthesized n-nucleoside primers and PCR [polymerase chain reaction] to chop and selectively amplify the fragments, but the principle is the same)
A single gel can easily measure fragments ranging from a few hundred base pairs to 10-400+ kbp with good resolution. The exact range varies according to current/field, gel composition, and other factors, but the bottom line is: it's easy to see bands that are a millimeter apart, so if you use a foot long gel, the range of possible values is close to 300. That creates:
300^6= 7.29 x 10^14 possible permutations
Actually, 0.5mm is a more realistic resolution limit, so the actual number of resolvable values is at least 600.
(600 values) ^ (6 loci) = 4.7x10^16 permutations
These are just crude estimates, for the benefit of those who've never read an electrophoresis gel. In actuality, the range of allowable values might be limited by other factors (values that are too extreme may be eliminated as artifacts). But it does give a sense of the TRUE numbers involved.
(with modern gels and automated readers, the resolution may be even higher, but my experience was with UV lamps, eyeballs and Polaroid prints way back in the 1900's... 1991 or so)
Please run your analysis again using this range of possible permutations, and you'll see that 1:37M could well be a FINAL probability of a false match.
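To make the back-of-envelope numbers above easy to redo, here is a small Python sketch. It assumes, as the crude estimate does, that every locus can take the same number of equally likely values and that the loci are independent; the helper names are hypothetical.

```python
def profile_permutations(values_per_locus: int, loci: int = 6) -> int:
    """Number of distinct profiles if each locus has the same value count."""
    return values_per_locus ** loci


def implied_values_per_locus(total_profiles: float, loci: int = 6) -> float:
    """Values per locus needed to produce a given total number of profiles."""
    return total_profiles ** (1 / loci)


if __name__ == "__main__":
    # Gel resolution estimates from the post: 1 mm and 0.5 mm bands.
    for values in (300, 600):
        print(f"{values} values/locus -> {profile_permutations(values):.2e} profiles")
    # What a flat 1 in 37 million would imply per locus.
    print(f"37e6 profiles -> {implied_values_per_locus(37e6):.1f} values/locus")
```

The last line is the source of the "roughly 20 values per locus" remark: a flat 37 million profiles over six loci works out to about 18 values each.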
Re:Statistics and probability (Score:1)
>case) is 1 to 37000000.
Where does it say that? If you'd ever run a gel, you'd know that this is ridiculously low. It implies that there are only about 20 possible 'positions' for a band. The actual number of values on a full-size gel is in the 100's.
>2: That means that ONE DNA-sample compared to
>ONE other DNA-sample has the chance to in 1 of
>37000000.
No, it means (as stated in the article) that the chances of a mismatch OVERALL (under the condition listed for the database) were estimated at 1:37M. I have serious issues with the underlying assumptions of the model used in DNA ID calculations, because we lack the critical data needed to justify the assumption of "independent assortment" (a basic concept in first year genetics), but I DON'T doubt the ability of the statisticians to do basic math. I just think that they made assumptions (required by the limits of current data) that are not justified.
That may be okay for a research paper, but not for a person's life -- no matter how much law enforcement may want answers! [note: the polygraph, inadmissible in most courts and widely discredited as a 'truth tool', is often used in investigations because police want answers, and are willing to accept the "risk" (minimal, to them) of a wrong answer]
>4: Any other circumstances have no impact on this
>if THEY HAVE NOTHING TO DO WITH THE DNA-CODE !
Jesu christu, tu mater est stertocarari! I can name a few dozen things unrelated to the genetic code, from start (AUG) to finish (3 codons) that impact the calculation -- lab criteria for artifacts, choice of primer, ethnic dependencies, underlying population composition, inbreeding and genetic relationships with the suspect pool... and if I hadn't been up for days, I'd have a much longer and more varied list.
>5: In this case we have 660000 OTHER DNA-samples
>to match against ! The rest is obvious
Yes, obvious. So obvious that (as I have shown in another post) the number of independent permutations may well be over 10^17 -- and the 3.7x10^7 figure cited in the article obviously is reduced to take account of this crude pairwise database comparison (and other factors).
Re:Welcome to the world of statistics . . . (Score:1)
>this anomalous case does not invalidate DNA
>evidence... (assuming the methodology of the
>tests is good) is exactly as useful now as it
>was before
I DON'T hate statistics. I hate what is often done with it.
As a result I can see that the assumption of independent assortment is severely flawed. This makes DNA IDs very useful for their negative predictive value (if it says you're not guilty, you almost certainly are not) but their positive predictive value is much weaker.
That's why blood tests are far more accurate at disproving paternity than proving it: it may be impossible to prove a negative ["I have never fathered a child"] but it may be easy to disprove a positive assertion ["You fathered this child"].
Orpheus, father of the finest children, bar none
Re:P(false positive) -> 1 as n -> oo (Score:1)
>negative; this is very low since only one person
>can commit a crime!
Your logic is extremely weak here.
Are you saying that if I have lots of co-conspirators, I decrease my chances of getting caught?
Gee-- no wonder white collar crime is so rarely prosecuted
Re:Statistics and probability (Score:1)
I understand it to mean that a given DNA sample, using their testing techniques, will only match 1 in 37 million of the general population.
However, given 660,000 of the general population, the probability of you finding that one has just increased.
The probability you refer to "1 in 37 million that you will get a false match if it is in the database." is what the juries and others are led into believing, but it is not explicitly put that way, because (I believe) that is false.
The article actually says, "British authorities estimated that the likelihood of that match occurring at random was one in 37 million.", which is a totally different thing.
Re:Birthday Paradox? (Score:1)
Ok to get back to the subject....
There isn't that much chance of two people coming up with the same DNA.. That's the whole point of the DNA fingerprint.
On the other hand it is possible to pollute DNA samples... If you have sample a and sample b in the same lab at the same time it is possible to mistakenly mix a and b together...
DNA evidence shouldn't be total proof, simply backup evidence...
Juries: The Morons tried by the Ignorant (Score:1)
I'm amazed human rights organisations have not stood up for the right of the defendant to be tried by proper licensed professionals.
DNA Testing Policy (Score:1)
The correct use of DNA testing is for verifying suspects. Ideally, evidence is collected at the scene of the crime and a list of suspects is generated. If DNA evidence is found, it is checked against the list of suspects only. Only then is a DNA match meaningful. I think that using the entire database violates the principle of probable cause. Citizens should be innocent until proven guilty. If your DNA happens to match the DNA found at a random crime scene, you should not have to prove your innocence.
Explanation of the math:
Chance of a DNA sample matching another random sample: 1/37e6
Chance of 2 DNA samples both matching a random sample: (1/37e6)^2
Chance that neither of the 2 DNA samples matches a random sample: (1 - 1/37e6)^2
Chance that none of the 660,000 samples matches a random sample: (1 - 1/37e6)^660000
Chance that at least one of the 660,000 samples matches any given random sample:
1 - (1 - 1/37e6)^660000 = 1/56.6
-Nathan Whitehead
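The steps above can be reproduced numerically. This is a minimal sketch, assuming each database entry is an independent comparison with the same 1 in 37 million chance of a coincidental match; `p_at_least_one_match` is an illustrative name, not anything from the article.

```python
import math


def p_at_least_one_match(p_single: float, database_size: int) -> float:
    """P(at least one coincidental match) = 1 - (1 - p)^n.

    log1p/expm1 keep precision when p_single is tiny.
    """
    return -math.expm1(database_size * math.log1p(-p_single))


P_SINGLE = 1 / 37e6        # quoted chance of one random match
DATABASE_SIZE = 660_000    # quoted number of records

exact = p_at_least_one_match(P_SINGLE, DATABASE_SIZE)
linear = P_SINGLE * DATABASE_SIZE  # the simple "divide 37e6 by 660,000" shortcut

if __name__ == "__main__":
    print(f"exact:  1 in {1 / exact:.1f}")
    print(f"linear: 1 in {1 / linear:.1f}")
```

The exact form and the simple division agree closely here (roughly 1 in 56.6 versus 1 in 56.1) because the single-match probability is so small; they would diverge for a database approaching 37 million entries.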
MATH! (Score:1)
Ooops . . . ;-) (Score:1)
Thanks for the laugh!
himi
--
Re:Statistics and probability (Score:1)
I'll take the ignorant over the "annointed"... (Score:1)
I'm amazed human rights organisations have not stood up for the right of the defendant to be tried by proper licensed professionals.
So why not do away with juries altogether? The lawyers and judges are all certified, surely that makes them elite enough to decide the fate of the accused.
Death Penalty (a bit offtopic) (Score:1)
Re:furst (Score:1)
And the weakest link is...... (Score:1)
With a test as sensitive as DNA analysis, it doesn't take a lot of contamination to blow the test.
Especially if, as here in the USA, the detectives like to take the samples back to the scene for no good reason (as in the O.J.Simpson case).
As a juror, I would not be comfortable convicting on DNA evidence alone. I could not examine the evidence itself, only what prosecutors tell me about the evidence. In the USA, it is the jurors' specific role to examine the evidence and weigh its relevance to the case. Evidence that cannot be examined by the jurors is thin ice, AFAIC. (Of course, my attitude would get me kicked out in the voir dire.)
Re:juries (Score:1)
--
Re:Statistics and probability (Score:1)
>2: That means that ONE DNA-sample compared to ONE other DNA-sample has the chance to in 1 of 37000000.
>3: If You have TWO other DNA-samples to match against you have a chance of match in 2 (TWO!) of 37000000 !
For a bunch of supposed geeks, you lot are fucking useless at mathematics.
Here we have a classic example of no logic skills at all.
I hardly need to point out that taking "i"'s logic a bit further,
If You have 37000000 other DNA-samples to match against you have a chance of match in 37000000 of 37000000 !
ie. a definite match. However this is blatantly false.
The guy who replied to this message has a bit of sense, although his calculation is irrelevant.
Now, time for another piece of using your brain instead of being a dick. It is blatantly false that the DNA database has a failure rate of about 1/56. I think it is safe to conclude that the 1/37,000,000 chance has *already taken into account* the fact that there are 660,000 people in the database.
I don't know how the locus check matches (perhaps we need to find a genetics expert here), but it is reasonable to suggest that if there were a 1/(37*10^6) chance of a match on six loci, then the chance of a match on one locus is the sixth root of this, ie. about one in eighteen. It seems highly likely to me that each locus check would be more definite than this low chance.
Here's another piece of reasoning: If the chance of two DNA sets matching is 1 in 37,000,000, then it is *almost certain* that two in a pool of 660,000 will match. (Recall the birthday example; if there are 23 people in a room, then it is more than even chances that two will have the same birthday).
The formula here is:
chance = 1 - (37000000! / 36340000! / (37000000^660000))
which I can't really be bothered calculating, but would wager highly (based on my arithmetic intuition) that it's rather close to 1.
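The factorial expression above is too large to evaluate directly, but its logarithm is manageable. A hedged sketch in Python, assuming (as the birthday analogy does) 37 million equally likely profiles and 660,000 independent database entries; it uses `math.lgamma` for the log-factorials, and the function name is illustrative.

```python
import math


def log_prob_all_distinct(profiles: int, entries: int) -> float:
    """Natural log of P(no two entries share a profile).

    Equals log( profiles! / (profiles - entries)! / profiles**entries ),
    computed with log-gamma so the factorials never have to be formed.
    """
    return (math.lgamma(profiles + 1)
            - math.lgamma(profiles - entries + 1)
            - entries * math.log(profiles))


PROFILES = 37_000_000
ENTRIES = 660_000

log_p_distinct = log_prob_all_distinct(PROFILES, ENTRIES)
# exp() of a hugely negative number underflows to 0.0, so the chance is 1.0
# to within floating-point precision.
chance_of_some_pair = 1.0 - math.exp(log_p_distinct)

if __name__ == "__main__":
    print(f"log P(all distinct) = {log_p_distinct:.1f}")
    print(f"chance of at least one matching pair = {chance_of_some_pair}")
```

Note that this answers a different question from the database search itself: it is the chance that some pair of entries in the database coincide with each other, not the chance that one crime-scene sample matches somebody.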
Anyhow, my purpose here was to show that one should reach sensible conclusions by using your brain and looking at different angles on a problem; and CHECK THAT YOUR ANSWER IS SENSIBLE before blindly trusting it. Perhaps the court judges could follow that principle. I know that if school students did, then the average marks on exams would be a lot higher.
</RANT>
Re:Twins? (Score:1)
Although if one got the death penalty, perhaps they could cut one head off or something
Re:Evidence (Score:1)
Re:Welcome to the world of statistics . . . (Score:1)
Re:Statistical problem can be overcome (Score:1)
First, each 'test' is a unique event, no two tests have any impact on each other. That said, for any single given comparison between two sets of human DNA the entire genome is certainly not compared directly for two reasons:
A) That is technically approaching 'ridiculous' since the entire genome run out on an agarose gel looks like a big streak instead of the banding pattern we've all seen on Court TV.
B) It's an exercise in futility since a good percentage of all human DNA is repetitious. Some of these repetitious areas define what we look like (since we all basically look the same) and some are just 'junk' DNA that doesn't do anything. Comparing either of these sub groups is fairly futile.
To this end, DNA science instead turns to hypervariable regions of the human genome known as 'marker' regions. It's these marker regions that are actually compared in court cases. And the variability of these marker regions is what leads to the 1:37 million statistical figure.
Now since each trial is independent in a test like this, EACH test for DNA similarity has a 1:37 million chance of matching to the level of exactness set by law. It doesn't really matter how many of these trials are carried out, whether it's 80 a year or 80 million, each one has the same (tiny) chance of mis-conviction. This would be why the experts are suffering from blown minds about now, considering my chances of winning the lottery every weekend are 1:~14million and I consider _that_ ridiculous.
As a side note, as our ability to sequence genomic data increases in speed our ability to compare larger and larger regions of human DNA will improve. At the moment it's fairly archaic, we chop each DNA sample with enzymes that cut at particular loci and then see if the pieces come out the same size. It's a dirty method to be sure. Perhaps someday soon we'll be able to just sequence each person's genome and compare them directly ... though I don't see that being a possibility for at least ten years hence as it would require amazing computing power to compare 3x10^9 bases directly as well as some serious sequencing technology we just don't have yet. A typical state-of-the-art capillary DNA sequencer costs $250,000.00 (US) and can sequence approx. 100,000 bases a week. Do the math and you'd need a lot of machines to get 3x10^9 in any sort of court-friendly time frame ... for now!
-----------------------------------------------
James C. Diggans
jdiggans@excelsior-web.com
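The "do the math" step in the post above can be written out. The sketch below uses only the figures quoted there (3x10^9 bases, 100,000 bases per machine per week, $250,000 per machine); the four-week deadline is a hypothetical value chosen purely for illustration.

```python
import math

GENOME_BASES = 3 * 10**9           # bases to sequence, as quoted
BASES_PER_MACHINE_WEEK = 100_000   # quoted sequencer throughput
MACHINE_COST_USD = 250_000         # quoted price per sequencer

# Total work, measured in weeks of one machine running flat out.
machine_weeks = GENOME_BASES // BASES_PER_MACHINE_WEEK


def machines_needed(deadline_weeks: int) -> int:
    """Sequencers required to finish one genome within the deadline."""
    return math.ceil(machine_weeks / deadline_weeks)


if __name__ == "__main__":
    deadline = 4  # hypothetical "court-friendly" deadline, in weeks
    count = machines_needed(deadline)
    print(f"machine-weeks of work: {machine_weeks:,}")
    print(f"machines for a {deadline}-week turnaround: {count:,}")
    print(f"hardware cost: ${count * MACHINE_COST_USD:,}")
```

On those quoted figures a single machine would need 30,000 weeks for one genome, so even a generous deadline implies thousands of machines.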
Re:Twins? (Score:1)
If I didn't get that right, somebody please tell me (as I'm sure you will anyway) I may have misunderstood. Biology was NEVER my strong point.
Excellent commentary! (Score:1)
(Don't even get me started on the people who post random made-up facts like "We have 90% of our DNA in common with dinosaurs" and expect that to be relevant to the discussion)
Thanks for posting something I actually enjoyed reading!
smallstar
Re:Statistics and probability (Score:1)
Re: Terry Pratchett (Score:1)
Canberra, Australia, and he didn't sound Australian to me.
Alex.
juries (Score:1)
Re:juries (Score:1)
Re:Problems with this sort of estimates (Score:1)
The assumptions you detail above and the intricacy of statistics are probably beyond analysis by the typical juror. This means that the jury is weighing not the technical or factual merit of the "experts" but some unknown subjective reason: the persuasiveness of the lawyers, how scary the defendant looks, how many days the case has gone on, or something else. In short, it was only a matter of time before the collection of assumption, error (possibly in procedure or in recording), and juror ignorance produced the result in this British case.
I hope that this acts as a wake-up call here in the U.S. but somehow I doubt it.
There was no mismatch, that is a lie. (Score:1)
This is like saying that we accidentally killed the wrong guy after releasing an innocent man from questioning. It didn't happen. It didn't even come close to happening. People are just ignoring that the guy is sitting at home in his house and that it was the exact same DNA matching lab (the ones that CORRECTLY matched the first 5 pairs) that matched the 10 and said it wasn't the right guy.
Esperandi
Re: British DNA Database Mismatch (Score:1)
Mind blown American expert. (Score:1)
X is the unknown quantity,
a spurt is a drip under pressure.
So a 1 in 56 chance actually happening blows the minds of some 'American experts'. Scary is the word that comes to mind. So it's easy to mock journalists. Mock mock mock.
Re:juries (Score:1)
this should be considered in context (Score:1)
criminal. These kinds of problems actually happen with
fingerprints too, as for purposes of searching the databases
they only use something like 29 features of the fingerprints
for matching by computer, then use humans to make exact matches.
That said, even if he had been convicted, this one case
in 37 million doesn't even begin to compare in magnitude
to the number of people who have been wrongly convicted
by eyewitnesses and the like.
bitter, but true. (Score:1)
odds not so great (Score:1)
660,000 "possible suspects". (Score:1)
Great. Guilty by association, and there's two-thirds of a million associated people. I wonder if the cops take a sample, and you prove not to be guilty, if you can insist that your DNA be removed from the database. I doubt it, though.
--
You're all doing statistics incorrectly (Score:1)
Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? (Score:1)
If the 1 in 37M figure was the final probability, they must have taken the size of the data set into account. But if that was taken into account, why did the article say:
"British authorities say the mismatch probably was caused by the rapidly increasing size of their database, which has grown from 470,000 potential suspects to 660,000 in the past year."
Also, your numbers are too high to come out with a final probability of 1 in 37M.
Re:thank goodness for the recount.. (Score:1)
All you Americans must realise that you sue far too much, and unfortunately the rest of the world is going that way as well
Re:Evidence (Score:1)
Re:Statistics and probability (Score:1)
The point is that the DNA might not have been in the database. If it was guaranteed to be in there then you would be right, perhaps.
Re:Evidence (Score:1)
Re:Statistics and probability (Score:1)
Although if you are in the database you have previously committed a crime, so there is a good reason for suspicion if you do match.
Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? (Score:1)
It's a bad day for some one... (Score:1)
Essentially the jury believed that the DNA test was 100% accurate and that therefore the man, and all the witnesses (all of them), were lying.
I can understand that: when you have a dodgy looking normal Joe charged by the Police force, and respectable scientist types stating how accurate the tests are etc etc, then poor old Joe is screwed.
The sad thing was the people doing the DNA tests F%&#D up. Further tests (after a couple of years in jail) proved that this man wasn't the Rapist!
Re:thank goodness for the recount.. (Score:1)
Q: you know what the funniest thing about that is?
A:
hehehe. i think i'm soooo funny.
but really, i did a 'furst' post, not a first post. and besides, I apologized.
what about an amendment to andy warhol's "15 minutes" saying.. something like 'Every geek will, once in their life, be entitled to a first post and offtopic reply string,' or something along those lines? huh? what about it?
hehehe. i think i'm soooo funny.
furst (Score:1)
thank goodness for the recount.. (Score:1)
can you imagine the horror that the accused must have gone through? poor 'bloke' will probably be able to sue pretty nicely, though. what's the basis of law in britain, more like canada than california i hope..?
for the sake of Charles P. Taxpayer Jr. that is.
Re:juries (Score:1)
6 loci: 1 in 37 million
10 loci: 1 in one billion
The FBI tests 13 loci. There is the potential for one in billions chance. It just depends on the testing methodology.
Re:Death Penalty (a bit offtopic) (Score:1)
Re:Are those in the DB likely to have similar DNA? (Score:1)
Strangely, not so. In Britain at least, the Police may ask you to provide a DNA sample, generally to exclude people from a crime (This is generally used in cases such as a sex attack, killing etc.) You are not obliged to give a sample, but many people do.
There is nothing to stop the DNA sample being added to the National database even if you have not committed a crime. This was the situation in this case: the gentleman in question had no previous convictions, but had provided a DNA sample in the past.
Evidence (Score:1)
The most astounding thing about this is that the suspect in question was disabled, epileptic (IIRC), had never even been to the place where the crime in question was committed, and had a rock solid alibi for the time & date the crime was committed. Scotland Yard ignored all of this, and prosecuted solely on the basis of the forensic "evidence".
How many more people who are protesting their innocence, have been convicted on the basis of forensic evidence alone? How many of these convictions could now be wrong?
Re:Evidence (Score:1)
Prosecutors Fallacy (Score:1)
What they forgot to mention.. (Score:1)
Regards,
Re:The system is already flawed (Score:2)
This system can only be relied upon to "prove" guilt where every locus is tested.
Even if all match, there is a very tiny but non-zero probability that the match is a false positive. The question then is how much doubt constitutes reasonable doubt? (Or the equivalent phrase in non U.S. courts).
This is what happens... (Score:2)
... When prosecutors abuse scientific evidence with pseudoscience. DNA evidence is exclusionary in nature, not inclusionary. In other words (assuming no procedural errors etc) no match = didn't do it, match = COULD have done it. Of course, prosecutors would have the jury believe the opposite. If science is to be used to convict, then scientific thinking MUST be involved if there is to be fairness. No proper scientist would consider a DNA match on 6, 10, or 16 loci as conclusive (but would consider it a VERY strong reason to investigate further).
Consider the 1 in 37 million. If the database were complete for the world population (about 6 billion), that means that on average, any given DNA sample would appear to match 162 people. The 16 locus test that the FBI uses is better, but still is not damning in and of itself.
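For anyone who wants to check that 162 figure, here is a rough Python sketch. It only uses the two numbers quoted above (a world population of about 6 billion and the 1-in-37-million match rate); the function name is made up for illustration, and it assumes every profile is equally likely to match.

```python
# Expected number of coincidental matches if the whole world were in the database.
WORLD_POPULATION = 6_000_000_000     # "about 6 billion", as quoted in the comment
MATCH_PROBABILITY = 1 / 37_000_000   # quoted 6-locus random-match figure


def expected_matches(population: int, p_match: float) -> float:
    """Mean number of people whose profile matches by chance alone."""
    return population * p_match


world_matches = expected_matches(WORLD_POPULATION, MATCH_PROBABILITY)
print(f"about {world_matches:.0f} expected coincidental matches")
```

This is an average, not a guarantee: some samples would match more people and some fewer.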
Now, add in procedural error and other bad thinking and you have (to me) reasonable doubt unless there is some other evidence.
I am certainly not against convicting criminals, but I AM against deceiving juries into believing that a DNA match is damning evidence. Matching DNA evidence should be regarded as the beginning of an investigation, not the end.
They reported problems with DNA testing long ago (Score:2)
Anyway, they made a claim that the current DNA testing at that time was flawed and often made matches that were incorrect, flying in the face of the astronomical odds. I think that there were two stages to the problem: one was cross-contamination, and two, the cloning process that makes the sample big enough for testing cloned the contaminating DNA too.
Perhaps the labs were using the same containers for both the evidence DNA and the sample DNA without proper cleaning between tests? It only takes one fragment of DNA to screw the whole thing up. I think that there was serious concern about the use of cost-cutting independent labs who were bidding to do this work for the police at the lowest possible rate.
How is this mind-blowing? (Score:2)
Look. It's a 1:37-million chance if you're comparing one person's DNA to one sample (probably found at the crime scene). That's why you only use DNA testing to weed out people who couldn't possibly have been involved from a very narrow range of subjects. You can't pick out one suspect from a huge list.
This is the problem with archiving everyone's DNA. You know it'll be used for stuff like this, because law enforcement will get lazy.
DNA testing is a Good Thing. It's a very safe, reliable way to identify suspects. But only if you use it properly. This is hardly a "proper" use of the tests, and I'm not at all surprised that this happened. It's a case of lazy law enforcement more than faulty testing.
Re:Statistics and probability (Score:2)
It makes the utmost difference whether the police have a suspect and then use DNA matching to see if he did the crime or if they use DNA matching to find a suspect. As this poster mentions it is then a much lower probability that you did in fact commit the crime.
It is exactly the same as disease testing. If you have a large population which is uninfected (not guilty) a positive match even from a very reliable test is highly likely to in fact be an error.
Of course, if you up the test to some obscene number of points you can probably make the probability of error very small again. Of course, this leaves the scary possibility that people are falsely convicted because they left a hair lying around... but there are always false convictions.
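The disease-testing comparison is just Bayes' rule, so here is a minimal sketch of it in Python. The priors are my own hypothetical choices, not figures from the case: a "cold hit" prior of 1 in 60 million (roughly one culprit somewhere in a UK-sized population) and a 50/50 prior for someone already suspected on other evidence. It also assumes the true culprit always matches.

```python
def posterior_given_match(prior: float, false_match_rate: float,
                          true_match_rate: float = 1.0) -> float:
    """P(source of the sample | DNA match), by Bayes' rule."""
    matched_and_source = prior * true_match_rate
    matched_by_chance = (1 - prior) * false_match_rate
    return matched_and_source / (matched_and_source + matched_by_chance)


FALSE_MATCH_RATE = 1 / 37_000_000  # quoted 6-locus figure

# Hypothetical priors, chosen only to illustrate the contrast.
cold_hit = posterior_given_match(prior=1 / 60_000_000,
                                 false_match_rate=FALSE_MATCH_RATE)
named_suspect = posterior_given_match(prior=0.5,
                                      false_match_rate=FALSE_MATCH_RATE)

print(f"cold hit, no other evidence: {cold_hit:.3f}")
print(f"suspect identified first:    {named_suspect:.9f}")
```

Under these assumptions the same match is close to conclusive for a prior suspect and well short of it for a cold hit, which is the asymmetry the comment describes.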
Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? (Score:2)
More statistical problems (Score:2)
What not so many have pointed out is that the true odds against a coincidental match are probably lower than 37 million to 1. That figure is based on the contents of each locus being independently distributed (with about a 1/18 chance of a match at each locus). Well, we know that is strictly not true - after all, a sibling of yours will have 1/64 of getting the same loci from the same source that you did. But are there any larger effects?
The answer is that there are. Suppose that some of the loci have different frequency distributions among Anglo-Saxons, Celts, and East Indians. Then the chance of finding a match between 2 East Indians could be far higher than they estimate. For instance, if that 1/18 figure was changed to around 1/9, the chance of matching 2 East Indians now becomes about 1/530,000. Even if your database has only 50,000 East Indians in it, if an East Indian committed the crime, the chance of a false positive is around 10%. Much higher than you would expect. (I am using East Indians because I understand that they are a disliked racial minority in England. Substitute your favorite group if you wish.)
So the moral of the story? Not only is the technique going to inevitably produce false positives, but it is likely to do so in a racially biased manner!
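The numbers in this argument can be reproduced with a short Python sketch. The 1/18 and 1/9 per-locus rates and the 50,000-entry subgroup are the illustrative values from the paragraphs above, not measured frequencies, and the function names are invented for the example. Loci are treated as independent, which is exactly the assumption being questioned.

```python
def profile_match_probability(per_locus: float, loci: int = 6) -> float:
    """Chance that two unrelated profiles agree at every locus (independence assumed)."""
    return per_locus ** loci


def chance_of_false_hit(p_match: float, database_size: int) -> float:
    """Chance that at least one innocent entry matches the crime-scene sample."""
    return 1 - (1 - p_match) ** database_size


general = profile_match_probability(1 / 18)   # whole-population estimate
subgroup = profile_match_probability(1 / 9)   # hypothetical within-group rate
subgroup_hit = chance_of_false_hit(subgroup, 50_000)

print(f"general:  1 in {1 / general:,.0f}")
print(f"subgroup: 1 in {1 / subgroup:,.0f}")
print(f"false hit among 50,000 subgroup entries: {subgroup_hit:.1%}")
```

Doubling the per-locus match rate multiplies the full-profile rate by 2^6 = 64, which is why a modest population effect has such a large impact.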
Regards,
Ben
Re:Statistics and probability (Score:2)
1: The chance of a DNA match (in this 6-loci case) is 1 to 37000000.
2: That means that ONE DNA-sample compared to ONE other DNA-sample has a 1 in 37000000 chance of matching.
3: If You have TWO other DNA-samples to match against you have a chance of match in 2 (TWO!) of 37000000 !
4: Any other circumstances have no impact on this if THEY HAVE NOTHING TO DO WITH THE DNA-CODE !
5: In this case we have 660000 OTHER DNA-samples to match against ! The rest is obvious.
Thomas Berg
Mundus Vult Decipi
Birthday Paradox? (Score:2)
The question I ask... (Score:2)
Sure, all things being equal, I would prefer there be no chance of anyone being wrongly convicted; however, the fact of the matter is that we don't live in a perfect world. We were no better off before DNA testing. All we've ever been able to guarantee in the courts is due process. There has always been (and likely always will be, to some degree) human error and prejudice involved in any trial. DNA, despite its flaws, brings us that much further away from those kinds of errors...
Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? (Score:2)
The "1 in 37,000,000" figure is presented as a final probability of a match. Where did you see *anything* about there only being 37,000,000 possible permutations?
If there were only 37M permutations of 6 loci, that would imply roughly 20 discrete possible values at each locus. Is that how you envisioned the underlying data?
I don't know what test they use in the UK, but I'm assuming that it's the RFLP -- basically they use a highly specific enzyme to chop up the DNA, and place it on a polyacrylamide gel under an electric current/field to measure the size of the fragments. (Actually, nowadays, they probably use pre-synthesized n-nucleoside primers and PCR [polymerase chain reaction] to chop and selectively amplify the fragments, but the principle is the same)
A single gel can easily measure fragments ranging from a few hundred base pairs to 10-400+ kbp with good resolution. The exact range varies according to current/field, gel composition, and other factors, but the bottom line is: it's easy to see bands that are a millimeter apart, so if you use a foot-long gel, the range of possible values is close to 300. That creates:
300^6= 7.29 x 10^14 possible permutations
Actually, 0.5mm is a more realistic resolution limit, so the actual number of resolvable values is at least 600.
(600 values) ^ (6 loci) =4.6x10^16 permutations
These are just crude estimates, for the benefit of those who've never read an electrophoresis gel. In actuality, the range of allowable values might be limited by other factors (values that are too extreme may be eliminated as artifacts). But it does give a sense of the TRUE numbers involved.
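The two estimates above are easy to verify, along with the "roughly 20 values per locus" that a 37-million permutation space would imply. A quick Python sketch, using only the 300 and 600 resolvable values from the gel estimate (the function name is just for illustration):

```python
def resolvable_profiles(values_per_locus: int, loci: int = 6) -> int:
    """Number of distinct profiles if each locus can take this many readable values."""
    return values_per_locus ** loci


coarse = resolvable_profiles(300)   # 1 mm resolution on a foot-long gel
fine = resolvable_profiles(600)     # 0.5 mm resolution

# Values per locus needed for the total to be only 37 million.
implied_values = 37_000_000 ** (1 / 6)

print(f"300 values/locus: {coarse:.3g} profiles")
print(f"600 values/locus: {fine:.3g} profiles")
print(f"37M profiles would need about {implied_values:.1f} values/locus")
```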
(with modern gels and automated readers, the resolution may be even higher, but my experience was with UV lamps, eyeballs and Polaroid prints way back in the 1900's... 1991 or so)
Please run your analysis again using this range of possible permutations, and you'll see that 1:37M could well be a FINAL probability.
Actual experience counts for something. (And as someone who still likes to consider himself a Young Turk, I hate myself for saying that!)
Statistics and probability (Score:2)
I don't think so. Maybe only one person in 37 million would match that DNA, but they were searching among 660,000 people. That makes the probability 660,000 : 37,000,000 or, more plainly,
1:56.
I bet that figure never came up at trial. From what I have read about the case, this is blatantly a misunderstanding of probability. They have to use DNA to narrow the search from a few suspects, instead of using it to pick out a person from 660,000 previous convicts.
Re:Simple Probability (Score:2)
For n=2
364 ways second person could have birthday without matching first
For n=3
363 ways third person can have birthday not matching other two
p(match) = 1-365x364x363/(365^3)
....
when this gets to about 20, p(match) is about 40%!!
The chance of a DNA match amounts to a similar problem, so the stats rapidly build up to a high likelihood of a match after about 20-30 samples.
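Here is the birthday calculation above as a small Python sketch, generalised so the number of "days" can be swapped for the number of possible DNA profiles. Treating the 37 million figure as 37 million equally likely profiles is my own simplifying assumption, and the function names are made up for the example.

```python
def p_shared(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a 'birthday'."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1 - p_distinct


def smallest_group(target: float, days: int = 365) -> int:
    """Smallest group size whose chance of a shared value reaches target."""
    n, p_distinct = 1, 1.0
    while 1 - p_distinct < target:
        p_distinct *= (days - n) / days  # add one more person
        n += 1
    return n


print(f"n=3:  {p_shared(3):.4f}")
print(f"n=20: {p_shared(20):.4f}")
print(f"50% point for birthdays: {smallest_group(0.5)}")
print(f"50% point for 37M equally likely profiles: {smallest_group(0.5, 37_000_000)}")
```

Under that assumption the 50% point for 37 million profiles comes out in the thousands of samples rather than 20-30, so the analogy holds in shape but the scale is different. It is still tiny next to a database of 660,000.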
Re:Twins? (Score:2)
The environmental factors are acting all throughout the identical twins' lifetime to make their DNA different. They're known as viruses. Other mutagens will also cause even more differences over time.
Then there is the testing method. The electrophoresis gel tests used have rather poor repeatability. Sure, some things can be done to help make them better. I wouldn't accept a match when the samples are done on two different machines in different labs. Having two different gel suppliers also makes a huge difference. The test is really only telling you the length of strands between markers where the chemicals split the strands into segments.
I knew DNA was all mythology (Score:2)
Welcome to the world of statistics . . . (Score:2)
They say there was a one in 37 million chance of this false match occurring - so? There's a one in multi-millions chance of someone winning the lottery, and yet it generally happens (I realise they're not equivalent cases, but it does show my point) - whenever you talk about probabilities, you have to realise that they are only relevant over a statistically significant sample size. They say nothing about individual cases - anomalies happen, the one-in-a-million chance does happen, and almost certainly will happen if you take a large enough sample.
The most important thing to understand is that this anomalous case does not invalidate DNA evidence - all it does is highlight the statistical nature of such evidence. DNA evidence (assuming the methodology of the tests is good) is exactly as useful now as it was before - that is, very useful - as long as it isn't abused. And generally speaking, the various police forces that use it are honest enough that they don't abuse it (witness the fact that they got a second opinion in this case).
This is an interesting and eye-opening occurrence, but it isn't the end of DNA evidence in forensics.
himi
--
Re:Death Penalty (a bit offtopic) (Score:2)
Re:1:370000000 (Score:2)
This point is the only valid take-away from the whole article. The British database only captures a DNA fingerprint based on 6 loci and we've all seen the math on that. The vast majority of US states and all federal cases require DNA tests with more than 10 loci. The odds of this error cropping up in the States are significantly lower.
p.s. This is an *old* story. It was reported at the beginning of the month in several British papers and ran on CNN on Tuesday. Granted, Saturday night is a slow time for Slashdot, but it'd be nice to hear stuff we didn't already know. :)
Re:Statistics and probability (Score:2)
Not quite. If you are in the database you have been *convicted* of a crime. You may not have actually been guilty. For this reason, your previous track record cannot legally be taken into account when deciding whether you are guilty.
Re:Welcome to the world of statistics . . . (Score:2)
To be specific, if their database has 700000 entries in it, it has 700000*699999/2 pairs in it. That's 245 billion pairs. If the odds against any pair matching at random are 37 million to one, that means there are a *LOT* of matches in that database, probably about 7000 of them.
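The pair count is easy to check. A minimal Python sketch using the 700,000 entries and 37-million-to-one odds quoted above, and assuming every pair matches independently at that rate:

```python
from math import comb

DATABASE_SIZE = 700_000
ODDS_AGAINST_MATCH = 37_000_000

# Number of distinct pairs of entries: n * (n - 1) / 2.
pairs = comb(DATABASE_SIZE, 2)

# Expected number of pairs that match purely by chance.
expected_matching_pairs = pairs / ODDS_AGAINST_MATCH

print(f"{pairs:,} pairs")
print(f"about {expected_matching_pairs:,.0f} expected coincidental matching pairs")
```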
This simply seems to be a case of scaling the database without scaling the identification key --with predictable results, non-unique keys.
Anyone know how the probability of a bad match decreases with number of loci tested?
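As a rough first answer, here is a toy model in Python. It assumes every locus is independent and equally informative, with the per-locus rate back-calculated from the quoted 6-locus figure. That is my assumption, not how the forensic labs compute their numbers.

```python
def match_probability(loci: int, per_locus: float) -> float:
    """Random-match probability when loci are independent and equally informative."""
    return per_locus ** loci


# Per-locus rate implied by 1 in 37 million over 6 loci (about 1 in 18).
per_locus = (1 / 37_000_000) ** (1 / 6)

for loci in (6, 10, 13):
    p = match_probability(loci, per_locus)
    print(f"{loci:2d} loci: 1 in {1 / p:.3g}")
```

In this model the false-match probability falls geometrically, by a factor of about 18 per extra locus. It gives a much smaller 10-locus probability than the 1-in-a-billion figure quoted in the thread, which suggests the published numbers rest on different per-locus frequencies or on conservative rounding.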
Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? (Score:2)
This brings up two issues. The first is the tendency to believe that technology can put complex techniques within the capabilities of people without training in the field. The second, closely related, is the belief that the reliability of the technology is not affected by the possibility of human error. On anything where the odds are stated as being that long, the two things I always ask are:
Re:Twins? (Score:2)
DNA is evidence, not proof (Score:2)
As the article mentions, there is a 1 in 37 million chance of this happening. Statistically this means that while it will not happen often, it will happen at some point.
I think the problem arises from the widespread belief that DNA testing is infallible and provides concrete proof of a person's guilt/innocence - it does not.
DNA evidence is just that, evidence, and should be regarded as such in court. If DNA testing along with corroborating evidence indicates the person is guilty, then they probably are - or vice versa. If there is evidence that points against the DNA results, one should not automatically assume that the DNA results are correct.
Not as unlikely as you might think... (Score:2)
"Million to one chances happen nine times out of ten."
Re:juries (Score:2)
Did you read the article? They re-tested with 10 points of reference, which supposedly has a 1 in 1,000,000,000 chance of a mismatch, so it was more a case of not using the most reliable test they could. Also, apparently in the US they use 13 points of reference, which presumably has a stupidly large number for its mismatch chance. I guess it'll just change the procedure so they use the 1 in 37,000,000 and re-test with a higher level if it matches to confirm it.
Are there any figures for fingerprint testing? How truly unique is a single fingerprint, and what's the chance of mismatch with 2 fingerprints? DNA testing is still pretty accurate!
Statistical problem can be overcome (Score:3)
If the probability of a false positive in any individual test is p, then the probability of conducting n tests without getting any false positives is (1-p)^n. As pointed out, this means that if enough tests are done you'll almost certainly convict an innocent person. If you have two crimes with DNA evidence that is only this reliable, then more than likely some innocent person in the UK would test guilty.
Actually, it's worse than this because people don't have independent DNA - they're likely to be distantly related. This makes false positives even more likely.
If there are n people and you want the probability that any of them test positive to be less than x then you need
1 - (1-p)^n < x, which is nearly the same as p*n < x. So to be fairly sure that nobody in the world falsely tests positive you need p to be less than about 1 in 80 billion.
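To put numbers on that inequality, here is a small Python sketch that solves 1 - (1-p)^n = x for p. The 6 billion population and the 7% tolerance are hypothetical inputs of mine (the comment does not say what x it used), and tests are assumed independent.

```python
from math import expm1, log1p


def required_p(n_people: int, max_risk: float) -> float:
    """Largest per-test false-positive rate p with 1 - (1-p)^n <= max_risk."""
    # expm1/log1p keep precision when p is many orders of magnitude below 1.
    return -expm1(log1p(-max_risk) / n_people)


def risk(n_people: int, p: float) -> float:
    """Probability of at least one false positive in n independent tests."""
    return -expm1(n_people * log1p(-p))


N_WORLD = 6_000_000_000   # hypothetical: roughly the world population
TOLERANCE = 0.07          # hypothetical: accept a 7% chance of any false positive

p_needed = required_p(N_WORLD, TOLERANCE)
print(f"need p below about 1 in {1 / p_needed:.3g}")
print(f"check: risk at that p is {risk(N_WORLD, p_needed):.3f}")
```

With those inputs the requirement lands in the tens of billions to one, the same ballpark as the figure above. A stricter tolerance pushes it higher.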
P(false positive) -> 1 as n -> oo (Score:3)
P(false positive) = 1 - P(no false positives)
= 1 - (P(correct answer))^n
= 1 - (1-p)^n
-> 1 as n -> oo.
This is ignoring the probability of a false negative; this is very low since only one person can commit a crime!
Re:Statistics and probability (Score:3)
1 - (1 - 1/37million) ^ 660,000
which is nearly the same as
660,000 / 37million = 1/56.
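For the curious, a quick Python sketch comparing the exact expression with the linear approximation, using only the two figures above and assuming each database entry is an independent trial:

```python
from math import expm1, log1p

P_MATCH = 1 / 37_000_000
DATABASE_SIZE = 660_000

# Exact: 1 - (1 - p)^n, computed via log1p/expm1 to avoid rounding loss.
exact = -expm1(DATABASE_SIZE * log1p(-P_MATCH))

# Approximation: n * p, good while n * p is small.
approx = DATABASE_SIZE * P_MATCH

print(f"exact:  {exact:.5f}  (1 in {1 / exact:.1f})")
print(f"approx: {approx:.5f}  (1 in {1 / approx:.1f})")
```

The two agree to within about one percent, so the 1/56 shorthand is fine here.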
DON'T THEY KNOW ANYTHING ABOUT STATISTICS? (Score:4)
This is so basic, I can't even believe it! I can't believe people's lives are decided on such a weak mathematical basis!
If the chance of a match between two random DNA samples is 1/(37*10^6), and they have 660000 samples in their database, then the likelihood -- assuming their system doesn't give false positives, which I doubt -- of a database match is ... 1.78% !!! We don't know how many DNA tests they make each year, but it's probably well over a thousand, which leads to over 10 false positives a year!
Americans find that "mind blowing"? Mind-boggling stupidity, if you ask me.
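The yearly estimate works out as follows, as a rough Python sketch. The 1,000 searches per year is the guess made in the comment above, not a known figure, and each search is treated as an independent trawl of the full database.

```python
def expected_false_hits(searches_per_year: int, database_size: int,
                        p_match: float) -> float:
    """Expected coincidental database hits per year, one trawl per search."""
    per_search = database_size * p_match   # about 1.78% per search
    return searches_per_year * per_search


per_year = expected_false_hits(searches_per_year=1_000,   # guessed in the comment
                               database_size=660_000,
                               p_match=1 / 37_000_000)
print(f"about {per_year:.1f} expected false hits per year")
```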