Lies, Damned Lies, and Statistics

JonB

BBC Radio 4 news just stated:

"A man in France has just won some Lottery Jackpot for a second time. The chances of this are 16 trillion to 1."

Now, I have always been interested in statistics, and this sounds offensive to my ear....

16 trillion (short scale) is 16,000,000,000,000, which is 16 * 10^12. Since the guy won ~€1 million, I think we can safely assume this was calculated via (4 * 10^6)^2, i.e. 1 in 4 million, times itself.

1 in 4 million sounds vaguely right, on the basis that a ticket costs, say, €1; 4 million are sold; and the jackpot of €1 million is 25% of the takings.
I don't think the chances of winning the lottery are 1 in 16 trillion rather than 1 in 4 million, otherwise the French lottery is making a mint!
We'll assume that 4 million tickets get sold [the news did not claim this was the biggest lottery], there is always precisely 1 jackpot winner, and that the same people play each time, etc., to keep things simple. We will also consider there are just 2 lottery draws --- it's difficult to know how the Beeb would have calculated chances if they attempted to take into account the number of plays over an unknown period of time.
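A quick check of the arithmetic in this post, as a minimal Python sketch. The €1 ticket price, 4 million tickets and 25% jackpot share are the assumed round figures above, not actual French lottery data; the constant names are illustrative.

```python
from fractions import Fraction
import math

TICKET_PRICE_EUR = 1            # assumed: €1 per ticket
TICKETS_SOLD = 4_000_000        # assumed: 4 million tickets per draw
JACKPOT_SHARE = Fraction(1, 4)  # assumed: 25% of takings paid out as the jackpot

# The BBC figure: 16 trillion (short scale) = 16 * 10^12
bbc_odds = 16 * 10**12

# Its integer square root recovers the single-draw odds
single_draw_odds = math.isqrt(bbc_odds)

# Jackpot implied by the assumed ticket sales and pay-out share
jackpot = TICKETS_SOLD * TICKET_PRICE_EUR * JACKPOT_SHARE

# Chance of one ticket winning once, and of the same ticket winning twice
p_once = Fraction(1, TICKETS_SOLD)
p_twice = p_once ** 2

print(single_draw_odds)                  # 4000000
print(jackpot)                           # 1000000
print(p_twice == Fraction(1, bbc_odds))  # True
```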

Now, the question is: why do they calculate the chances of such a double-win as the chance of one win times itself?

It is true that for the individual guy who won, "M. de Gaulle", the chances of two wins are the chances of one squared --- the chance he wins the first time, times the same chance the second time.

However, the item would not only have been newsworthy if this particular M. de Gaulle had won twice. Any punter winning twice would have made the headlines.

That means the chance behind the news item's gist, "any previous winner winning for a second time", is precisely the chance of winning once, i.e. 1 in 4 million, not 1 in 16 trillion.
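The difference between "this particular person wins twice" and "somebody wins twice" can be made concrete. A minimal Python sketch under the simplifying assumptions above (the same 4 million players in both draws, exactly one uniformly random jackpot winner per draw); the function names are illustrative, and the simulation uses a tiny 10-player lottery so the effect shows up in a reasonable number of trials.

```python
from fractions import Fraction
import random

N_PLAYERS = 4_000_000  # assumed: the same 4 million players enter both draws

def p_named_player_wins_both(n: int) -> Fraction:
    """Chance that one particular player (say, "M. de Gaulle") wins both draws."""
    return Fraction(1, n) ** 2

def p_some_player_wins_both(n: int) -> Fraction:
    """Chance that *any* player wins both draws.

    Exactly one winner per draw, each player equally likely, so the events
    "player i wins both" are mutually exclusive and their chances add up.
    """
    return n * p_named_player_wins_both(n)

def simulate_repeat_winner(n: int, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate: pick two winners and count how often they coincide."""
    rng = random.Random(seed)
    repeats = sum(rng.randrange(n) == rng.randrange(n) for _ in range(trials))
    return repeats / trials

print(p_named_player_wins_both(N_PLAYERS))  # 1/16000000000000
print(p_some_player_wins_both(N_PLAYERS))   # 1/4000000
print(simulate_repeat_winner(10, 100_000))  # close to 0.1, i.e. 1 in 10
```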

Hmm, as I type this in I begin to critique my own argument, and wonder just how they came to the statistic they did. Still, I've got this far, so I'm posting! Feel free to comment.

Of course, I may well be the only person who cares about this at all in this forum, in which case I'll get no discussion....

:)

SGaist

Did you calculate the chances to get an answer? ^^

JonB

@SGaist
Well, I was hoping higher than my chances of winning a euro lottery. :)

mzimmers

Math is hard...and statistics are harder.

Your derivation is correct...given your provisos. In reality, however, it's probable that the winner played much, much more than 2 times, as did most of the single-time winners.

If the news source were really interested in accuracy, their calculations would reflect this, meaning that the actual number of entrants is greater than 4MM.

But, if your news is as interested in truth as ours (in the US) is, then this figure has all the credibility of a horoscope reading.
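The effect of playing many draws can be sketched with the classic birthday-problem calculation. A minimal Python sketch, assuming (as in JonB's simplification) the same n players in every draw and exactly one uniformly random winner per draw; the draw counts are illustrative only, not real lottery history.

```python
def p_any_repeat_winner(n_players: int, k_draws: int) -> float:
    """Chance that at least one player wins more than once in k draws.

    Assumes the same n_players enter every draw and each draw has exactly one
    winner chosen uniformly at random (birthday-problem model).
    """
    p_all_distinct = 1.0
    for i in range(k_draws):
        # the (i+1)-th winner must differ from the i earlier winners
        p_all_distinct *= (n_players - i) / n_players
    return 1.0 - p_all_distinct

for draws in (2, 100, 1_000, 5_000):
    p = p_any_repeat_winner(4_000_000, draws)
    print(f"{draws:>5} draws: {p:.3g}")
```

With only 2 draws this reduces to JonB's 1 in 4 million; the chance of some repeat winner grows quickly as the number of draws increases.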

kshegunov

Your spidey sense is correct, Jon, this is called the Gambler's fallacy, and journalists, being the lowest of the low, have no idea what statistics is to begin with. The point is that something happening will not alter the chance of it happening again. If you toss a coin you have a 50% chance of getting heads; if you get heads and then toss it again, you still have a 50% chance of getting heads ... :)
Ironically it's exactly BBC that contradicts BBC Radio 4:
http://www.bbc.com/future/story/20150127-why-we-gamble-like-monkeys

... what would be the chance of that happening? ;)
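The independence claim is easy to check empirically. A minimal Python simulation, assuming a fair coin modelled with `random.random() < 0.5`: it looks only at tosses that come straight after a heads and measures how often they are heads too.

```python
import random

def p_heads_after_heads(tosses: int, seed: int = 42) -> float:
    """Estimate P(heads on toss t+1 | heads on toss t) for a simulated fair coin."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(tosses)]  # True means heads
    # Keep only the tosses that immediately follow a heads
    followers = [nxt for prev, nxt in zip(flips, flips[1:]) if prev]
    return sum(followers) / len(followers)

print(p_heads_after_heads(1_000_000))  # close to 0.5
```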

mzimmers

Not exactly the gambler's fallacy; just a case of flawed analysis. Not too surprising, really...probability can be quite counter-intuitive. My father, for example, was an EE and one of the more intelligent people I've known, yet he just couldn't grasp simple probability.

Here's a good problem to illustrate how confounding it can be: imagine a population that suffers from a particular disorder at the rate of 0.1% (1 in 1000 are afflicted). Someone devises a test for this disorder which correctly diagnoses all cases, but also reports a false positive exactly 1% of the time.

You take the test and it reports positive. What are the chances you have the disorder?

kshegunov

I don't get it. The test has nothing to do with your chances of having the disorder (.1%) ...

mzimmers

So...you're claiming that, after you know the test results, your chances are the same as before?

kshegunov

Yeah, pretty much, I guess.

mzimmers

So, here's the deal. As Mike Caro (a brilliant professional gambler) has observed, "in the beginning, everything was even money." In other words, lacking any other information, one's best guess as to the probability of ANYTHING is 50-50.

Now, consider the problem I posed. If all I told you was a certain population was (partially) afflicted with a disorder, and I asked you what the chances were that a given individual in that population is afflicted, your best guess would be 50-50, because you have absolutely NO other information upon which to base an estimate.

So, now I feed you another datum: the population is afflicted with an incidence of .1%. You immediately change your answer from 50-50 to 1 in 1000.

Nothing has changed except the amount of information you possess, yet you've just profoundly altered your estimate (and correctly so).

So, I ask you, why would my giving you a second datum (your test result) not cause you to further revise your answer?

kshegunov

@mzimmers said in Lies, Damned Lies, and Statistics:

So, I ask you, why would my giving you a second datum (your test result) not cause you to further revise your answer?

The second piece of information relates to the accuracy of the test, not the incidence level. The incidence level is unchanged by the reliability of the test.

I am one person, not a population to base a measurement on. With some probability (99%) the test is correct, and if you averaged the test results you'd find that of the 0.1% of people who have the condition, 99% were correctly diagnosed and 1% were incorrectly diagnosed (had false positives). Still, this does not affect the incidence level, just the reliability of the testing.

mzimmers

But I'm not asking what the incidence level is -- I'm asking, what are the chances that you have the disorder? Your goal is to use the available information to make the best guess/estimate possible.

With no other information, your best estimate is 50-50.

With knowledge that your population has an incidence rate of 0.1%, your best estimate is 1 in 1000 (or 999-1 against, to express it as odds).

With knowledge that your test came back positive, your best estimate is...?

kshegunov

Yeah, I got it now, but I have to point out I really hated statistics at university and Bayes' theorem wasn't one of my favorite topics. I would have the particular disease with a probability of 1% and change ...

mzimmers

I'll wait to see if anyone else wants to hazard a guess before I give the answer.

kshegunov

Okay but you do realize this is different from gambling (i.e. the lottery), where every run is independent.

JonB

@mzimmers said in Lies, Damned Lies, and Statistics:

I'll wait to see if anyone else wants to hazard a guess before I give the answer.

Can you wait 24 hours on that? I want to read & get my head around what you're saying so I can try to answer, but it's way too late tonight now .... :)

mzimmers

@JonB heh...sure, I'm not going anywhere. Anyone who can't wait for the answer can message me...

JonB

@mzimmers ... tell me tomorrow how many ppl messaged you ... :)

JonB

@mzimmers
Right, let's start my logical analysis :)

First, let me see if I've got the figures from what you have said:

Out of every 1,000 people, 1 has the affliction.
The test will always identify that one person as being afflicted.
Additionally, the test will report 10* other people as being afflicted who in fact are healthy.

[* Actually, the remaining population is 999, so really 9.99 rather than 10.0. This would affect my final figure, but I imagine you're not looking for that degree of accuracy, so my answer will be right to the nearest couple of decimal places!]

Obviously, if I have misunderstood them, I reserve the right to be corrected by you and then re-analyse! Otherwise, please continue....

So, I take the test, and it reports me positive. (I knew it! Just my luck :( This is about my smoking, isn't it?)

Well, in this case, the test has reported 11 people as positive. 1 is genuinely positive, while 10 are false positive.

My conclusion:

Before the test result I had 1 in 1,000 chance of the terminal illness you are imposing.
After the test I have a 1 in 11 chance of being the positive one, and a 10 in 11 chance of being one of the falsies.
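JonB's counting argument can be checked with exact fractions. A minimal Python sketch, assuming the figures as JonB restated them (prevalence 1 in 1,000, 1% of healthy people wrongly flagged, no genuine case missed); it uses the footnoted 9.99 false positives rather than the rounded 10, which shows how little that correction changes the answer. The constant names are illustrative.

```python
from fractions import Fraction

POPULATION = 1_000
PREVALENCE = Fraction(1, 1_000)         # 1 in 1,000 has the affliction
FALSE_POSITIVE_RATE = Fraction(1, 100)  # assumed: 1% of healthy people test positive
# The test as posed catches every genuine case (no false negatives)

true_positives = POPULATION * PREVALENCE                               # 1 person
false_positives = POPULATION * (1 - PREVALENCE) * FALSE_POSITIVE_RATE  # 9.99 people

# Of everyone who tests positive, what fraction is genuinely afflicted?
p_ill_given_positive = true_positives / (true_positives + false_positives)

print(false_positives)              # 999/100
print(p_ill_given_positive)         # 100/1099
print(float(p_ill_given_positive))  # about 0.09099
print(float(Fraction(1, 11)))       # JonB's rounded 1 in 11, about 0.09091
```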

If it helps any, you can also think of this as balls in a bag:

There is 1 black ball, which has "You're toast" on a piece of paper inside it.
There are 10 black balls, which have "Only kidding" on a piece of paper inside them.
There are 989 white balls.

You put your hand in the bag and pull out a ball. It's black :( Given that, until you open the ball and look at the piece of paper, there's a 1 in 11 chance it contains the fateful news.
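The balls-in-a-bag picture also lends itself to a quick Monte Carlo check. A minimal Python sketch, assuming each draw is a fresh random pick from the full bag (with replacement) and discarding white balls, which mirrors the condition "it's black":

```python
import random

# The bag from the post: 1 black "You're toast", 10 black "Only kidding", 989 white
BAG = ["toast"] * 1 + ["kidding"] * 10 + ["white"] * 989

def p_toast_given_black(draws: int, seed: int = 7) -> float:
    """Draw balls at random (with replacement) and keep only the black ones."""
    rng = random.Random(seed)
    black = [ball for ball in (rng.choice(BAG) for _ in range(draws)) if ball != "white"]
    return black.count("toast") / len(black)

print(len(BAG))                        # 1000
print(p_toast_given_black(1_000_000))  # close to 1/11
```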

Right?

======================================================

Meanwhile....
You also wrote:

As Mike Caro (a brilliant professional gambler) has observed, "in the beginning, everything was even money." In other words, lacking any other information, one's best guess as to the probability of ANYTHING is 50-50.

I don't know if there was a context in which he wrote this which you have omitted, but that's a very strange statement. Lacking any information at all, one's "best guess" of a probability should not be anything like "50-50". I can only think a gambler might think that way!

BTW, a quick analysis:

I tell you I have a bag of balls, which you cannot see.
I ask you to guess how many balls are in the bag.
This is an example of "you have absolutely NO [other] information upon which to base an estimate".
You say: There are 23 balls in the bag.
According to you/him, the odds of this being correct are 0.5.
You decide to guess again. This time you predict 587.
Again, you/he claim the odds of this being right are 0.5.
Finally, you decide to change your mind to 77.
One more time, it's 0.5 likely you're right.

3 guesses, each of which has a 0.5 chance of being right? I don't think so!
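The objection can be stated as a consistency check: the three guesses are mutually exclusive (the bag cannot hold both 23 and 587 balls), so their probabilities must add up to at most 1. A minimal Python sketch using the three guesses from the post; `is_consistent` is an illustrative helper, not a library function.

```python
from fractions import Fraction

def is_consistent(probabilities: dict[int, Fraction]) -> bool:
    """Probabilities of mutually exclusive outcomes must each lie in [0, 1]
    and sum to no more than 1."""
    return (all(0 <= p <= 1 for p in probabilities.values())
            and sum(probabilities.values()) <= 1)

# Each guessed ball count paired with the claimed 50-50 chance of being right
guesses = {23: Fraction(1, 2), 587: Fraction(1, 2), 77: Fraction(1, 2)}

print(sum(guesses.values()))   # 3/2
print(is_consistent(guesses))  # False
```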

Now, we could re-analyse precisely what you mean by "one's best guess as to the probability of ANYTHING is 50-50", because perhaps you didn't have just the case above in mind.

But the point is: "lacking any other information, one's best guess as to the probability of ANYTHING is 50-50." is not a "good guess". The correct answer is: "Lacking any information, a 'probability' is simply meaningless." Probability requires some information in order to have anything to say.

J.Hilk

OK, I'll give it a try myself.

We have the starting position: you either have the illness or you don't, with a 0.1% chance that you have it.
The test always has a result, but there's a 1% chance the result is the exact opposite.
We only care about the cases where the test says "You have it":

you have it 0.001 and the test shows it 0.99 => 0.00099
you don't have it 0.999 but the test says you have it 0.01 => 0.00999

=> 0.01098 ~ 1.1% chance you're diagnosed with the illness when only 0.1% of all people have it?
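J.Hilk's two products can be reproduced directly. A minimal Python sketch, assuming J.Hilk's model (the test is wrong 1% of the time in both directions, so it catches 99% of genuine cases). The last step, dividing the genuine-positive share by the total, is Bayes' theorem: it gives the chance of actually being ill once a positive result is known, as opposed to the chance of testing positive at all.

```python
PREVALENCE = 0.001  # 0.1% of people have the illness
ERROR_RATE = 0.01   # J.Hilk's assumption: 1% of results are the opposite of the truth

p_ill_and_positive = PREVALENCE * (1 - ERROR_RATE)       # 0.001 * 0.99
p_healthy_and_positive = (1 - PREVALENCE) * ERROR_RATE  # 0.999 * 0.01
p_positive = p_ill_and_positive + p_healthy_and_positive  # total chance of a positive result

# Bayes' theorem: P(ill | positive) = P(ill and positive) / P(positive)
p_ill_given_positive = p_ill_and_positive / p_positive

print(f"P(positive)       = {p_positive:.5f}")            # 0.01098
print(f"P(ill | positive) = {p_ill_given_positive:.3f}")  # 0.090
```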