Abstract
The COVID-19 pandemic underscored the need for biosurveillance platforms capable of rapidly detecting previously unseen pathogens. Oxford Nanopore Technologies (ONT) sequencing couples long reads with portability, opening the door to real-time, in-field biosurveillance. Despite this promise, streaming assignment of accurate functional and taxonomic labels to nanopore reads remains challenging because: (i) individual reads can span multiple genes, (ii) individual reads may contain truncated genes and pseudogenes, (iii) the error rate of the ONT platform may introduce frameshifts and missense errors, and (iv) the computational cost of read-by-read analysis may exceed the capacity of in-field computing equipment. Together, these challenges highlight a need for novel computational approaches. To this end, we describe SeqScreen-Nano, a novel and portable computational platform for the characterization of novel pathogens. Based on results from simulated and synthetic microbial communities, SeqScreen-Nano can identify Open Reading Frames (ORFs) across the length of raw ONT reads and then use the predicted ORFs for accurate functional characterization and taxonomic classification. SeqScreen-Nano runs efficiently in a memory-constrained environment (less than 32 GB of RAM), allowing it to be used in resource-limited settings. It can also process reads directly from the ONT MinION sequencing device, enabling rapid, in-field characterization of previously unseen pathogens. SeqScreen-Nano (v4.0) is available on GitLab at: https://gitlab.com/treangenlab/seqscreen
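To make the idea of identifying ORFs across the length of a single read concrete, the following is a minimal Python sketch of a naive six-frame scan. It is an illustration only and is not SeqScreen-Nano's implementation: the function names, the `min_len` threshold, the example read, and the choice of ATG as the only start codon with the standard stop codons are all assumptions made for this example.

```python
# Illustrative sketch only: a naive six-frame ORF scan.
# This is NOT SeqScreen-Nano's method; all names and parameters are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard stop codons (assumed)
START_CODON = "ATG"                   # only ATG is treated as a start (assumed)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(read, min_len=30):
    """Return (strand, start, end, sequence) for every ATG...stop ORF.

    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand (the read itself for "+", its reverse complement for "-").
    ORFs shorter than min_len nucleotides are discarded.
    """
    orfs = []
    read = read.upper()
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for frame in range(3):
            start = None
            # Walk the frame codon by codon.
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        orfs.append((strand, start, end, seq[start:end]))
                    start = None
    return orfs


# A toy read carrying two short ORFs in different frames, mimicking a
# long read that spans more than one gene.
example_read = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG" + "ATG" + "AAA" * 2 + "TGA" + "C"

for strand, start, end, orf in find_orfs(example_read, min_len=12):
    print(strand, start, end, orf)
```

A scan this simple reports only complete ORFs, so it would miss genes truncated at the ends of a read, and a single insertion or deletion error would shift the frame and split or hide an ORF. Those are the kinds of difficulties listed in points (ii) and (iii) above.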
Read the preprint here.
Authors
Advait Balaji (a), Yunxi Liu (a), Michael G. Nute (a), Bingbing Hu (a), Anthony D. Kappell (b), Danielle S. LeSassier (b), Gene D. Godbold (b), Krista L. Ternus (b), Todd J. Treangen (a)
(a) Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, USA
(b) Signature Science LLC, 8329 North Mopac Expressway, Austin, TX, USA