Confronting Corona: Using Pooling to Detect Coronavirus with Fewer Tests

Dina Berenbaum, Researcher at Stardat (a UST company)

A few days ago, I suggested to my former advisor, Professor Roy Kishony, the idea of pooling COVID-19 samples as a way to make testing more efficient and thus test more people.

Since then, Prof. Kishony’s lab has, with remarkable speed and efficiency, coordinated an experiment showing that the limit of detection (LOD) of the currently used COVID-19 test allows for a 1:32 and even a 1:64 dilution. The result was posted on Twitter.

Prof. Kishony’s experiment shows that the COVID-19 testing process can be pooled: a single positive sample can be mixed with 31 or 63 negative samples, respectively, and the virus is still detected.

How Does Pooling Help?

South Korea, which so far seems to be the most successful country at slowing down the infection rate, has demonstrated the value of mass testing to find infected individuals. Pooling helps speed up that testing process.

Together with Maor Ivgi, Stardat’s CTO, I present the motivation behind the sample pooling method and explain how the ability to detect COVID-19 in pooled samples can be used to identify infected individuals with far fewer tests.

Important disclaimer: We are neither epidemiologists nor laboratory workers. We do not claim to understand or predict the course of the epidemic or to know the best lab practice. We are simply adding some mathematical background that can help improve testing efficiency. All the code for the calculations can be found here.

An Introduction to Pooling Strategy

Let’s first define some basics. This section should be clear even if you are not a mathematician; I’ve tried to weave some basic math into a coherent explanation of the technique:

Problem to Solve: Given N samples collected from individuals suspected of being infected with the COVID-19 virus, labs currently have to conduct N tests to identify all the positive samples. Given the limited number of laboratories and the limited equipment, this is inefficient. We would like to decrease the number of tests required.
Pooling: Mixing s samples from different people together and conducting a single test on the mixed sample.
Probability of Sickness: Denoted by p, the estimated probability that an individual is infected, assuming that infections are independent across the population.
Claim: Given N > 1 samples, for certain values of p, we can identify all positive samples with fewer than N tests in expectation. This is done using a “Divide & Conquer” approach within each pool.
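To make the claim concrete, here is a short derivation that follows only from the definitions above plus one assumption of ours: the test is perfect (no false negatives or false positives). It gives the probability that a pool of s samples tests negative, and the expected number of tests T(N) for the divide & conquer approach, in which every pool that tests positive is split into two halves that are both tested. Here I is our notation for the set of pools of size greater than 1 in that halving tree.

```latex
% Probability that a pool of s independent samples contains no positive sample
P(\text{pool of } s \text{ samples is negative}) = (1-p)^{s}

% The root pool is always tested; each positive pool u in I triggers 2 tests
\mathbb{E}[T(N)] = 1 + \sum_{u \in I} 2\left(1 - (1-p)^{|u|}\right)

% Smallest case, N = 2: pooling beats individual testing exactly when
\mathbb{E}[T(2)] = 1 + 2\left(1 - (1-p)^{2}\right) < 2
  \iff (1-p)^{2} > \tfrac{1}{2}
  \iff p < 1 - \tfrac{1}{\sqrt{2}} \approx 0.29
```

For example, with p = 0.01 and s = 32, (0.99)^32 ≈ 0.725, so roughly 72% of such pools would be ruled out by a single test.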

Stated more simply: we can find all of the positive samples in a group without testing each of them individually. We test the entire group; if it is negative, the whole group is ruled out. Otherwise, we divide the group into two halves and test each of them, repeating the procedure recursively on every half that tests positive.
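The recursive procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ published code: `find_positives` and `SimulatedLab` are hypothetical names, and the simulated lab assumes a perfect test that answers pooled tests from known ground truth while counting how many tests were run.

```python
from typing import Callable, List, Sequence, Set


def find_positives(
    samples: Sequence[int],
    pool_is_positive: Callable[[Sequence[int]], bool],
) -> List[int]:
    """Return the IDs of the positive samples in `samples`.

    Tests the whole group first. A negative result rules out the entire
    group; otherwise the group is split into two halves and both halves
    are tested recursively. A single positive sample is reported.
    """
    if not samples:
        return []
    if not pool_is_positive(samples):
        return []  # one negative test rules out the whole group
    if len(samples) == 1:
        return list(samples)
    mid = len(samples) // 2
    return (find_positives(samples[:mid], pool_is_positive)
            + find_positives(samples[mid:], pool_is_positive))


class SimulatedLab:
    """Hypothetical stand-in for a lab: a perfect pooled test that counts calls."""

    def __init__(self, infected: Set[int]):
        self.infected = infected
        self.tests_run = 0

    def pooled_test(self, pool: Sequence[int]) -> bool:
        self.tests_run += 1
        return any(sample in self.infected for sample in pool)


# 32 samples, two of which (5 and 20) are positive.
lab = SimulatedLab(infected={5, 20})
found = find_positives(list(range(32)), lab.pooled_test)
print(found, lab.tests_run)  # prints [5, 20] 19
```

In this example, 19 tests identify both positives among 32 samples. When the pool contains many positives, the same procedure can need more tests than individual testing, which is why the claim holds only for certain values of p.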

Pre-scoring and Ordering the Samples

In the previous section, we treated p, the probability of infection, as the same for the entire population. In reality, it is not.

If we can score the samples before testing according to their risk of infection, we can order them by score so that high-risk samples are pooled together. This concentrates the positives in fewer pools, so fewer pools test positive and fewer tests are needed.

Again, without claiming to be epidemiologists or healthcare professionals, some possible criteria for scoring could be:

Verified close contact with a known patient
Second-order contact
Presence of symptoms
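To see why ordering by score helps, the sketch below computes the expected number of tests for one pool resolved by recursive halving when each sample has its own infection probability. It is an illustration under our assumptions (independent infections, a perfect test), not the authors’ code; `expected_tests` is a hypothetical name, and the risk values 0.5 and 0.01 are made-up stand-ins for scores derived from criteria such as those above, not real data.

```python
from math import prod
from typing import Sequence


def expected_tests(probs: Sequence[float]) -> float:
    """Expected number of tests to resolve one pool by recursive halving.

    probs[i] is the (assumed independent) infection probability of the
    sample placed at position i. The whole pool is always tested; every
    pool of size > 1 that turns out positive triggers tests of its two
    halves. Assumes a perfect test.
    """
    def positive_pool_mass(block: Sequence[float]) -> float:
        # Sum of P(pool is positive) over all pools of size > 1 in the tree.
        if len(block) <= 1:
            return 0.0
        p_positive = 1.0 - prod(1.0 - p for p in block)
        mid = len(block) // 2
        return (p_positive
                + positive_pool_mass(block[:mid])
                + positive_pool_mass(block[mid:]))

    return 1.0 + 2.0 * positive_pool_mass(probs)


# Hypothetical scores: 4 high-risk samples and 12 low-risk samples in one pool of 16.
high, low = 0.5, 0.01
interleaved = [high, low, low, low] * 4          # high-risk spread across the pool
sorted_by_risk = sorted(interleaved, reverse=True)  # high-risk grouped together

print(round(expected_tests(sorted_by_risk), 2),
      round(expected_tests(interleaved), 2))  # prints 10.27 14.27
```

Grouping the high-risk samples keeps the low-risk halves negative with high probability, so they are ruled out with a single test instead of being split repeatedly.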

Usage of Blocks

At some stage, or under certain circumstances, it may not be necessary to identify each individual positive sample. Instead, it may be enough to identify blocks of very close contacts.

For example, for family members who live in the same household, it may be enough to determine whether at least one of them is infected and act on that.

In this case, keeping each block together when splitting allows the algorithm to stop early: once a pool that consists of a single block tests positive, there is no need to keep splitting and testing it.
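A minimal sketch of this block-level variant, under our own assumptions: the recursion splits the list of blocks (not individual samples) in half, so a block is never divided, and a positive pool made of a single block is reported without further tests. `find_positive_blocks`, `pooled_test`, and the household data are hypothetical, and the test is assumed to be perfect.

```python
from typing import Callable, List, Sequence

Block = List[str]  # hypothetical: the sample IDs of one household


def find_positive_blocks(
    blocks: Sequence[Block],
    pool_is_positive: Callable[[List[str]], bool],
) -> List[Block]:
    """Recursive halving over whole blocks, with early stopping.

    All samples of the given blocks are pooled and tested together. A
    negative result rules out every block; a positive pool consisting of
    a single block is reported as-is instead of being split further.
    """
    if not blocks:
        return []
    pooled = [sample for block in blocks for sample in block]
    if not pool_is_positive(pooled):
        return []
    if len(blocks) == 1:
        return [blocks[0]]  # early stop: we only need to know the block is positive
    mid = len(blocks) // 2
    return (find_positive_blocks(blocks[:mid], pool_is_positive)
            + find_positive_blocks(blocks[mid:], pool_is_positive))


# Hypothetical example: four households, one infected person (b1).
infected = {"b1"}
households = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"], ["d1", "d2", "d3", "d4"]]
counter = {"tests": 0}


def pooled_test(pool: List[str]) -> bool:
    counter["tests"] += 1
    return any(sample in infected for sample in pool)


positive = find_positive_blocks(households, pooled_test)
print(positive, counter["tests"])  # prints [['b1', 'b2']] 5
```

Splitting by number of blocks rather than number of samples is a simplifying choice here; balancing halves by sample count is another reasonable option.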

Reducing the Number of Tests to Identify Positive Samples

We have presented a mathematically grounded strategy for reducing the number of tests needed to identify positive samples.

Prof. Roy Kishony’s experiment shows that COVID-19 samples can be pooled at a 1:32 and even a 1:64 dilution. This means that a single positive sample can be mixed with 31 or 63 negative samples, respectively, and the virus is still detected. The optimal pool size depends on the probability of infection.
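To illustrate how the optimal pool size depends on p, here is a minimal sketch (not the authors’ published code) that computes the expected number of tests per sample for one pool resolved by recursive halving, assuming independent infections with a common probability p and a perfect test. The function names are hypothetical, and capping the pool size at 64 is our assumption, mirroring the 1:64 dilution.

```python
from functools import lru_cache


def expected_tests_per_sample(pool_size: int, p: float) -> float:
    """Expected tests per sample for one pool of `pool_size` samples resolved
    by recursive halving (independent infections, perfect test assumed)."""
    q = 1.0 - p  # probability that a single sample is negative

    @lru_cache(maxsize=None)
    def positive_pool_mass(n: int) -> float:
        # Sum of P(pool is positive) over all pools of size > 1 in the
        # halving tree of a pool of n samples (halves of n // 2 and n - n // 2).
        if n <= 1:
            return 0.0
        return ((1.0 - q ** n)
                + positive_pool_mass(n // 2)
                + positive_pool_mass(n - n // 2))

    # The whole pool is tested once; each positive pool of size > 1 adds 2 tests.
    return (1.0 + 2.0 * positive_pool_mass(pool_size)) / pool_size


def best_pool_size(p: float, max_size: int = 64) -> int:
    """Pool size in 1..max_size with the fewest expected tests per sample."""
    return min(range(1, max_size + 1),
               key=lambda s: expected_tests_per_sample(s, p))


for p in (0.001, 0.01, 0.05, 0.3):
    s = best_pool_size(p)
    print(p, s, round(expected_tests_per_sample(s, p), 3))
```

A pool size of 1 means testing individually (exactly one test per sample). When p is high, that is the best choice in this model, which matches the claim that pooling pays off only for certain values of p.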

Moreover, we can further reduce the number of tests by scoring the samples before testing according to their risk of infection, and by taking advantage of close-contact blocks (such as members of the same family).

Pooling COVID-19 samples with this strategy requires fewer tests to obtain the same results. Thus, we can test more people, more quickly.