
During the Zika virus outbreak of 2015–16, public health officials scrambled to contain the epidemic and curb the pathogen’s devastating effects on pregnant women. At the same time, scientists around the globe tried to understand the genetics of this mysterious virus.

The problem was, there just aren’t many Zika virus particles in the blood of a sick patient. Looking for them in clinical samples can be like fishing for a minnow in an ocean.

A new computational method developed by VHFC researchers at the Broad Institute helps overcome this hurdle. Built in the lab of VHFC researcher Pardis Sabeti, the “CATCH” method can be used to design molecular “baits” for any virus known to infect humans and all of its known strains, including viruses that are present in low abundance in clinical samples, such as Zika. The approach can help small sequencing centers around the globe conduct disease surveillance more efficiently and cost-effectively, which can provide crucial information for controlling outbreaks.

The new study was led by MIT graduate student Hayden Metsky and postdoctoral researcher Katie Siddle, and it appears online in Nature Biotechnology.

“As genomic sequencing becomes a critical part of disease surveillance, tools like CATCH will help us and others detect outbreaks earlier and generate more data on pathogens that can be shared with the wider scientific and medical research communities,” said Christian Matranga, a co-senior author of the new study who has joined a biotech startup in the Boston area.

Scientists have been able to detect some low-abundance viruses by analyzing all the genetic material in a clinical sample, a technique known as “metagenomic” sequencing, but the approach often misses viral material that gets lost in the abundance of other microbes and the patient’s own DNA.

Another approach is to “enrich” clinical samples for a particular virus. To do this, researchers use a kind of genetic “bait” to immobilize the target virus’s genetic material, so that other genetic material can be washed away. Scientists in the Sabeti lab had successfully used baits, which are molecular probes made of short strands of RNA or DNA that pair with bits of viral DNA in the sample, to analyze the Ebola and Lassa virus genomes. However, the probes were always directed at a single microbe, meaning the researchers had to know exactly what they were looking for, and the probes were not designed in a rigorous, efficient way.

What they needed was a computational method for designing probes that could provide a comprehensive view of the diverse microbial content in clinical samples, while enriching for low-abundance microbes like Zika.

“We wanted to rethink how we were actually designing the probes to do capture,” said Metsky. “We realized that we could capture viruses, including their known diversity, with fewer probes than we’d used before. To make this an effective tool for surveillance, we then decided to try targeting about 20 viruses at a time, and we eventually scaled up to the 356 viral species known to infect humans.”

Short for “Compact Aggregation of Targets for Comprehensive Hybridization,” CATCH allows users to design custom sets of probes to capture genetic material of any combination of microbial species, including viruses or even all forms of all viruses known to infect humans.

To run CATCH truly comprehensively, users can easily input genomes from all forms of all human viruses that have been uploaded to the National Center for Biotechnology Information’s GenBank sequence database. The program determines the best set of probes based on what the user wants to recover, whether that’s all viruses or only a subset. The list of probe sequences can be sent to one of a few companies that synthesize probes for research. Scientists and clinical researchers looking to detect and study the microbes can then use the probes like fishing hooks to catch desired microbial DNA for sequencing, thereby enriching the samples for the microbe of interest.
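To give a feel for how a program might pick a compact probe set, here is a toy sketch in Python. It is not CATCH's code or its published algorithm: it assumes a simple greedy covering strategy, in which a probe "captures" any stretch of a genome it matches with at most a few mismatches, and it repeatedly picks the probe that captures the most not-yet-covered bases. The function names, parameters, and the two short "strain" sequences are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[str, int]  # (genome name, base index)


def hamming(a: str, b: str) -> int:
    """Count mismatched bases between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def candidate_probes(genomes: Dict[str, str], probe_len: int, stride: int) -> List[str]:
    """Tile every genome with windows of probe_len, stepping by stride."""
    probes: Set[str] = set()
    for seq in genomes.values():
        for start in range(0, len(seq) - probe_len + 1, stride):
            probes.add(seq[start:start + probe_len])
    return sorted(probes)  # sorted so results are deterministic


def covered_positions(probe: str, genomes: Dict[str, str], max_mismatches: int) -> Set[Position]:
    """Bases a probe would capture under the toy mismatch-tolerance model."""
    covered: Set[Position] = set()
    k = len(probe)
    for name, seq in genomes.items():
        for start in range(len(seq) - k + 1):
            if hamming(probe, seq[start:start + k]) <= max_mismatches:
                covered.update((name, pos) for pos in range(start, start + k))
    return covered


def design_probes(genomes: Dict[str, str], probe_len: int = 8,
                  stride: int = 8, max_mismatches: int = 1) -> List[str]:
    """Greedy set cover: keep adding the probe that captures the most new bases."""
    coverage = {
        probe: covered_positions(probe, genomes, max_mismatches)
        for probe in candidate_probes(genomes, probe_len, stride)
    }
    uncovered: Set[Position] = set().union(*coverage.values()) if coverage else set()
    chosen: List[str] = []
    while uncovered:
        best = max(coverage, key=lambda p: len(coverage[p] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


# Two made-up "strains" that differ at a single base.
toy_genomes = {
    "strain_a": "ACGTACGGTCATTGCA",
    "strain_b": "ACGTACGATCATTGCA",
}
strict = design_probes(toy_genomes, max_mismatches=0)
tolerant = design_probes(toy_genomes, max_mismatches=1)
print(len(strict), "probes when exact matches are required:", strict)
print(len(tolerant), "probes when one mismatch is tolerated:", tolerant)
```

In this toy case, letting a probe tolerate one mismatch means a single probe can stand in for both strains at the variable site, so the tolerant design needs fewer probes than the exact-match design. That mirrors the article's point about capturing known diversity with fewer probes, though the real tool's models and parameters may differ.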

Tests of probe sets designed with CATCH showed that after enrichment, viral content made up 18 times more of the sequencing data than before enrichment, allowing the team to assemble genomes that could not be generated from un-enriched samples. They validated the method by examining 30 samples with known content spanning eight viruses. The researchers also showed that samples of Lassa virus from the 2018 Lassa outbreak in Nigeria that proved difficult to sequence without enrichment could be “rescued” by using a set of CATCH-designed probes against all human viruses. In addition, the team was able to improve viral detection in samples with unknown content from patients and mosquitos.
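As a rough illustration of what an "18 times" figure means, the short Python sketch below computes fold enrichment as the viral share of sequencing reads after enrichment divided by the viral share before. The formula is one plausible reading of the result, not necessarily the study's exact metric, and the read counts are invented for the example rather than taken from the paper. The function name is hypothetical.

```python
def fold_enrichment(viral_before: int, total_before: int,
                    viral_after: int, total_after: int) -> float:
    """Ratio of the viral fraction of reads after enrichment to the fraction before."""
    if min(viral_before, total_before, total_after) <= 0:
        raise ValueError("read counts must be positive")
    # Cross-multiplied form of (viral_after / total_after) / (viral_before / total_before).
    return (viral_after * total_before) / (total_after * viral_before)


# Invented read counts, for illustration only.
example = fold_enrichment(
    viral_before=500, total_before=1_000_000,
    viral_after=9_000, total_after=1_000_000,
)
print(f"fold enrichment: {example:.1f}x")
```

With these made-up counts, the viral share of reads rises from 0.05 percent to 0.9 percent of the data.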

Using CATCH, Metsky and colleagues generated a subset of viral probes directed at Zika and chikungunya, another mosquito-borne virus found in the same geographic regions. Along with Zika genomes generated with other methods, the data they generated using CATCH-designed probes helped them discover that the Zika virus had been introduced in several regions months before scientists were able to detect it, a finding that can inform efforts to control future outbreaks.

To demonstrate other potential applications of CATCH, Siddle used samples from a range of different viruses. Siddle and others have been working with scientists in West Africa, where viral outbreaks and hard-to-diagnose fevers are common, to establish laboratories and workflows for analyzing pathogen genomes on-site. “We’d like our partners in Nigeria to be able to efficiently perform metagenomic sequencing from diverse samples, and CATCH helps them boost the sensitivity for these pathogens,” said Siddle.

The method is also a powerful way to investigate undiagnosed fevers with a suspected viral cause. “We’re excited about the potential to use metagenomic sequencing to shed light on those cases and, in particular, the possibility of doing so locally in affected countries,” said Siddle.

One advantage of the CATCH method is its adaptability. As new mutations are identified and new sequences are added to GenBank, users can quickly redesign a set of probes with up-to-date information. In addition, while most probe designs are proprietary, Metsky and Siddle have made publicly available all of the ones they designed with CATCH. Users have access to the actual probe sequences in CATCH, allowing researchers to explore and customize the probe designs before they are synthesized.

Sabeti and fellow researchers are excited about the potential for CATCH to improve large-scale high-resolution studies of microbial communities. They are also hopeful that the method could one day have utility in diagnostic applications, in which results are returned to patients to make clinical decisions. For now, they’re encouraged by its potential to improve genomic surveillance of viral outbreaks like Zika and Lassa, and other applications requiring a comprehensive view of low-level microbial content.

The CATCH software is publicly accessible on GitHub. Its development and validation, supervised by Sabeti and Matranga, are described online in Nature Biotechnology.