In this article, we will explain how to measure the prevalence of a disease with an imperfect test (sensitivity and/or specificity less than 100%). We will assume that the reader understands the information in the previous articles on binary classification.
When we measure a group of 100 people without a disease using a test with 95% specificity, we expect 95 true negatives and 5 false positives. Just as a coin flipped 10 times does not always give 5 heads and 5 tails, the situation above does not always give exactly 95 true negatives and 5 false positives. There is a probability distribution describing how often the result deviates from the expected value. To make this easier to interpret, we will start by examining probability mass functions, the probability functions for discrete events.
Consider a weighted coin with probability \(p\) of coming up heads. The probability of \(n\) heads in \(x\) flips is the product of the probabilities of those individual events, \(p^{n}(1-p)^{x-n}\), times the number of combinations in which those events may occur, \(\frac{x!}{n!(x-n)!}\). Identical math applies to a medical diagnostic test given only to positive patients: for a sensitivity \(S_n\), we can compute the probability of obtaining \(n\) positive results from \(x\) positive patients.
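As a concrete check, here is a minimal Python sketch of this binomial probability mass function. The function name and the example values (90% sensitivity, 100 positive patients) are assumptions chosen for illustration.

```python
from math import comb

def binom_pmf(n, x, p):
    """Probability of exactly n successes in x independent trials,
    each with success probability p."""
    return comb(x, n) * p**n * (1 - p)**(x - n)

# Example: a test with 90% sensitivity applied to 100 truly positive patients
print(binom_pmf(90, 100, 0.90))  # probability of exactly 90 positive results
```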
When we measure a population, we do not know who is positive and who is negative (that is why we have a test). All of the positive people form a distribution that depends on the sensitivity, and all of the negative people form a distribution that depends on the specificity. Notation is tricky here: \(P_\text{pos}\) is the probability function of positive results for the patients who are really positive, and \(P_\text{neg}\) is the probability function of positive results for the patients who are really negative.
To combine these distributions, we take a sum of products. The probability of any two independent events occurring is the product of their individual probabilities, and the probability of \(n\) positive tests is the sum of the probabilities of all situations that lead to \(n\) positive test results: \(P_\text{tot}(n) = \sum_{k} P_\text{pos}(k)\,P_\text{neg}(n-k)\). This is a convolution of the two distributions.
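Since that sum of products is a discrete convolution, the combined distribution can be sketched in a few lines of Python. The specific values (1000 patients, 10% prevalence, 90%/95% sensitivity/specificity) are taken from the example discussed next; the variable names are mine.

```python
import numpy as np
from scipy.stats import binom

n_patients, prevalence = 1000, 0.10
sensitivity, specificity = 0.90, 0.95

n_pos = int(n_patients * prevalence)  # 100 truly positive patients
n_neg = n_patients - n_pos            # 900 truly negative patients

# PMF of positive results among the truly positive (driven by sensitivity)
p_pos = binom.pmf(np.arange(n_pos + 1), n_pos, sensitivity)
# PMF of positive results among the truly negative (driven by 1 - specificity)
p_neg = binom.pmf(np.arange(n_neg + 1), n_neg, 1 - specificity)

# The sum over all ways to reach n total positives is a convolution:
# p_total[k] = probability of exactly k positive tests overall
p_total = np.convolve(p_pos, p_neg)
```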
The graph above shows the probability mass function for the number of positive test results from a sample of 1000 patients with a 10% disease prevalence, using a test with 90% sensitivity and 95% specificity. It is important to notice that if we run this experiment, we will arrive at one measured prevalence (x out of 1000 patients will test positive). It is overwhelmingly likely that we will obtain 120-150 positive test results. However, unless we immediately retest the samples with a perfect test, we never learn how many of the positive test results were true positives and how many were false positives.
This test (90%/95% sensitivity/specificity) cannot exactly measure the prevalence of this population, but it can put bounds on it. If the prevalence is 10%, there is a 98% chance the test returns 119-152 positive results. If the results fall outside that range, we would conclude the prevalence is not 10%.
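One plausible way to recover such a range is to read the 1st and 99th percentiles off the cumulative distribution of the convolved PMF built above; the 0.01/0.99 cutoffs are my assumption, chosen to match the 98% figure quoted here.

```python
import numpy as np
from scipy.stats import binom

# Rebuild the combined PMF from the previous sketch
p_pos = binom.pmf(np.arange(101), 100, 0.90)      # 100 truly positive patients
p_neg = binom.pmf(np.arange(901), 900, 1 - 0.95)  # 900 truly negative patients
p_total = np.convolve(p_pos, p_neg)

counts = np.arange(len(p_total))
cdf = np.cumsum(p_total)
low = counts[np.searchsorted(cdf, 0.01)]   # ~1st percentile of positive counts
high = counts[np.searchsorted(cdf, 0.99)]  # ~99th percentile of positive counts
print(f"98% of experiments yield between {low} and {high} positive tests")
```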
This equation produces computational problems for large \(n\) and \(x\). The first two factors are extremely small and the ratio of factorials is extremely large. To avoid this limitation, take the logarithm of both sides and recall Stirling's approximation, \(\log(a!) \approx a\log(a)-a\) for large \(a\). The application of Stirling's approximation to combinatorics is a crucial part of statistical mechanics; as such, it appears in the early chapters of every statistical mechanics textbook (or at least every one I am familiar with) [1,2,3].
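Here is a minimal sketch of the log-space formula using Stirling's approximation exactly as stated above. The helper names are mine, and the approximation is only accurate for large arguments.

```python
import numpy as np

def log_factorial_stirling(a):
    """Stirling's approximation: log(a!) ~ a*log(a) - a for large a."""
    a = np.asarray(a, dtype=float)
    return np.where(a > 0, a * np.log(np.maximum(a, 1.0)) - a, 0.0)

def log_binom_pmf(n, x, p):
    """Logarithm of the binomial PMF, safe for very large n and x."""
    return (log_factorial_stirling(x)
            - log_factorial_stirling(n)
            - log_factorial_stirling(x - n)
            + n * np.log(p)
            + (x - n) * np.log(1 - p))

# Sample sizes in the millions pose no problem in log space
print(log_binom_pmf(900_000, 1_000_000, 0.9))
```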
The bottom equation is far less limiting on modern computers. With my desktop and the Python interpreter, I can use the non-Stirling formula with samples no larger than 10k. With Stirling's approximation, I can solve for probabilities with samples in the millions without difficulty.
Unfortunately, up to this point we have been viewing the problem backwards relative to our intuition, and that can cause subtle but important mistakes. We solved the problem "given the prevalence is \(x\), what are the odds the measured prevalence is \(b\)?" We want to solve the problem "given the measured prevalence is \(b\), what are the odds the real prevalence is \(x\)?" Our solution is similar to p-values in this way. With p-values, as here, every informed discussion must emphasize the following: the probability of the observation given a hypothesis is not the probability of the hypothesis given the observation.
Part II of this discussion covers statistical inference, the methods used to estimate probability distributions based on experiments.
[1] "Introduction and Review," in Statistical Thermodynamics, pp. 1—34, 2000.
[2] "Entropy and the Boltzmann Law," in Molecular Driving Forces, pp. 81—92, 2011.
[3] "Equilibrium and Entropy," in Thermodynamics and Statistical Mechanics: An Integrated Approach, pp. 6—20, 2015.