# Correcting numerical misconceptions

Okay, time to loosen up a bit on this blog. Ponderous meditations on philosophy and history are great, but I want to stretch all kinds of writing muscles. Here are a few short notes I wrote today, each aiming to clear up a numerical misconception.

First: ICBC recently released a list of the top 10 Vancouver intersections for car-bicycle collisions; yesterday’s Vancouver Sun ran a front-page article warning us all about these “dangerous” intersections. But these are not at all the same thing. I wrote the following letter to the editor *[update: it was published on Dec 4 2010]*:

Dear editor,

As an avid Vancouver cyclist, I read with interest your article on ICBC’s recent top 10 list of intersections for cyclist-car collisions. However, it seems to me that calling these intersections “dangerous” might be giving commuters misleading information.

As I understand it, the list seems to be based on the total number of collisions at each intersection. This is valuable information for any public agency interested in reducing the total number of collisions, since it suggests where safety measures would have the biggest impact.

However, if I’m an individual cyclist planning my route, should I avoid these intersections? Not necessarily. These intersections could be the safest in the city, but if that’s where most people go, they could still account for the lion’s share of collisions. All else being equal, more cyclists (or cars) means more collisions with cars (or cyclists). Rather than by total numbers, an intersection should be judged “dangerous” when it sees a **disproportionate** number of collisions for the amount of traffic it carries. Cyclists should be told about collision-per-cyclist rates, and motorists about collision-per-motorist rates.

This is not just an academic point: if people start avoiding these so-called “dangerous” intersections, they might just end up at intersections which are actually **more** dangerous, but which were previously not on this list simply because they were going (rightly) unused. This would make the problem worse.

I wonder whether ICBC (or anyone else) has bike and car traffic data for each intersection. If so, I’d love to see the traffic-adjusted collision rates. Until I see those data, however, I’ll stick to my favourite bike routes—Burrard and 10th.

I also wrote a letter to ICBC pointing out the same misconception, and asking if they have the traffic data.
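To make the totals-versus-rates point concrete, here is a toy calculation. The intersection names and all numbers are invented for illustration:

```python
# Invented numbers: raw collision totals vs. traffic-adjusted rates.
intersections = {
    "Busy & Main": {"collisions": 20, "cyclists_per_year": 100_000},
    "Quiet & 5th": {"collisions": 4, "cyclists_per_year": 5_000},
}

for name, data in intersections.items():
    # Normalize by traffic volume to get a per-cyclist measure of danger.
    per_10k = 10_000 * data["collisions"] / data["cyclists_per_year"]
    print(f"{name}: {data['collisions']} collisions, "
          f"{per_10k:.1f} per 10,000 cyclists")

# "Busy & Main" tops the raw-totals list (20 collisions vs. 4), yet it is
# four times safer per cyclist (2.0 vs. 8.0 per 10,000).
```

On raw totals, the busy intersection looks four times worse; adjusted for traffic, it is four times safer.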

Then, today, we had a lecture at school on interpreting medical research, which included a review of such fundamental concepts as sensitivity, specificity, pre- and post-test probabilities, likelihood ratios, absolute and relative risk reduction, and number needed to treat. The lecturer kept pre-emptively acknowledging our collective boredom and confusion, which I find slightly tragic (I for one love this stuff and think I understand it pretty well). As is usual in any discussion of research findings, there came the offhand remark that a low *p*-value means the finding is probably true. Unfortunately that’s not exactly true either, so it’s always bugged me when I’ve heard it, though I’d never before bothered to work out the details for myself. And—sorry to 95% of my audience—I’m not going to bother explaining the math; I merely leave it here for posterity.

An experiment compares a null hypothesis *H0* (e.g. “the experimental treatment is not helpful”) with an alternative hypothesis *H1* (“the experimental treatment has some benefit”). It then produces some result.

The *p*-value is the probability that a result at least as extreme as the one produced would have occurred if *H0* were true. That is, *p* = P(*result* | *H0*). If this is low, we say “good!”

But that’s not what we’re actually interested in. We want to know if *H0* is likely, given our result. That is, we want to know if P(*H0* | *result*) is low. (Call this *h*.) Precisely the opposite!

The reason we use the *p*-value is that P(*result* | *H0*) can be calculated, while P(*H0* | *result*) can’t: computing it requires knowing P(*H0*) and P(*result*), which we generally don’t. A classic case of measuring what is measurable and treating it as important, rather than measuring what is actually important!

Still: if *p* << 1 (i.e. *p* is low), can this tell us anything about whether *h* << 1? Well:

*h* = P(*H0* | *result*) = P(*result* and *H0*) / P(*result*) = P(*result* | *H0*) * P(*H0*) / P(*result*)

And so, *h* << 1 if and only if P(*result* | *H0*) * P(*H0*) / P(*result*) << 1.

That is, *p* = P(*result* | *H0*) << P(*result*) / P(*H0*).

Even if *H0* is likely without the result, P(*H0*) can’t get any higher than 1, so it can’t do much to change how we judge *p*. But if P(*result*) is small — that is, if your experiment has produced a result that seemed unlikely from the beginning — then you need a lower *p*-value to be convincing. So it probably doesn’t make sense to apply the same *p*-value cutoff (usually 0.05) to all situations; we still need to evaluate the claims of an experiment in light of the rest of our knowledge about how plausible those claims are. I believe this constitutes a proof of the old adage that “extraordinary claims demand extraordinary evidence”.
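For the curious, here is a quick simulation of the point. It asks: among experiments that clear the usual *p* < 0.05 cutoff, how often was *H0* actually true? The 0.8 “power” under *H1* and the prior probabilities are invented numbers, purely for illustration:

```python
import random

random.seed(0)

def prob_h0_given_significant(prior_h0, alpha=0.05, power=0.8, n=200_000):
    """Among results that clear the p < alpha cutoff, how often was H0 true?"""
    significant_total = 0
    significant_h0 = 0
    for _ in range(n):
        h0_true = random.random() < prior_h0
        # Under H0, p-values are uniform on (0, 1), so we reject with
        # probability alpha; under H1, assume we reject with probability
        # `power` (an invented number for illustration).
        rejected = random.random() < (alpha if h0_true else power)
        if rejected:
            significant_total += 1
            significant_h0 += h0_true
    return significant_h0 / significant_total

# The same 0.05 cutoff leaves very different residual belief in H0,
# depending on how plausible H0 was to begin with:
print(prob_h0_given_significant(prior_h0=0.5))  # ~0.06
print(prob_h0_given_significant(prior_h0=0.9))  # ~0.36
```

With a 50/50 prior, a significant result leaves *H0* with only about a 6% chance of being true; if *H0* started out 90% likely, the very same cutoff leaves it with roughly a 36% chance. Same *p*-value threshold, very different conclusions.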

This is, I hope, a very basic derivation found in every experimental statistics textbook; I’m sure I have done nothing new here, but it’s more fun to derive this stuff than look it up.

Actually, I am just doing this to distract myself as I positively twitch with anticipation for the Ontario decision on prostitution laws. The Canadian government was originally given 90 days to come up with a good argument why the laws shouldn’t be struck down. They haven’t been able to, so they’ve asked for more time. Those 90 days ended last Saturday and I’ve been checking Google News regularly for any updates. Today I learned that the decision on whether to give the government a little more time, and keep the laws in place a little longer, is due to be released on Thursday at 11am EST. I can’t wait! I’ll write more about this, and the related issue of sex-positivity, in my next entry.

I really like your point re: adjusting for traffic volume, with the caveat that I think the benefits of this approach diminish somewhat when you deal with rare events. Comparing two intersections with similar amounts of traffic, if one has 1 accident/year and the other 2 accidents/year, do you think it’s *really* fair to say that the second is twice as dangerous? In those cases I think I’d rather see the raw data and figure it out for myself. But I do think ICBC should at least consider normalizing for volume of traffic. Let us know if they get back to you!

Side note: if I had to take a wild guess, I’d expect a graph of collision risk (normalized by number of cyclists) vs. number of cyclists to be U-shaped – high risk when there are so few cyclists that no one is expecting them, and probably highest at very busy intersections with lots of cyclists.

However, I didn’t understand why you think that “if the thing you hoped to prove was unlikely from the beginning — then you need a lower p-value to be convincing. So it probably doesn’t make sense to apply the same p-value cutoff (usually 0.05) to all situations”. For the non-mathematicians in the audience, could you illustrate this using an example?

Re: rare events: you are right, of course, except that I think that’s a separate issue and not a caveat of traffic-adjustment specifically. The benefits of raw totals, as a measure of “danger” or anything else, also diminish with rare events.

I’m not sure why per-cyclist risk would go up with more cyclists. I would think more cyclists would mean fewer collisions, due to greater immediate visibility and (as you say) driver expectations. More cars, though, definitely. (And vice versa, swapping ‘cars’ and ‘cyclists’.)

As for your last question – I’ve edited slightly for clarification, so the quote no longer matches, but your question still applies. Suppose you decided to re-test whether insulin really was useful in type I diabetes, and your findings contradicted all prior study of the subject. Even if your methods were flawless, a p-value of 0.01 should be taken as not at all convincing, even though that might pass muster in other settings. A p < 0.000001 might be considered compelling, though. (I am just making these numbers up for illustration.) The proof essentially shows that p-values are not meaningful as absolute values, but must be considered relative to the prior probability of the experimental result.

Aha! I knew you were a closet Bayesian 🙂

Closet?! If this makes me a Bayesian, why would anyone NOT be a Bayesian?