Motivation: Single-molecule real-time (SMRT) sequencing has important and underutilized advantages that amplification-based platforms lack. These include the absence of GC bias (a systematic error), the ability to accurately assemble large repetitive regions, and complete de novo assembly without the need for scaffolding. Here, we introduce PBHoover, a software tool that uses a heuristic calling algorithm to make base calls with high certainty in low-coverage regions. PBHoover can also detect mixed populations with high sensitivity. To improve coverage depth, PBHoover uses CigarRoller, a CIGAR string correction package developed in-house.

Results: We tested both modules on 349 Mycobacterium tuberculosis clinical isolates sequenced on chemistry 1 (C1) (n=76) and chemistry 2 (C2) (n=275). On average, CigarRoller fixed 31% of reads per C1 isolate and 49% of reads per C2 isolate. To validate our results, we compared 801,920 PBHoover base calls with those of Sanger sequencing and observed 99.97% concordance, which corresponds to a quality value of 35.
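
The quality value quoted above is consistent with the standard Phred scale, Q = -10 * log10(error rate), if the discordance with Sanger sequencing is taken as the error rate. The short Python sketch below only illustrates that arithmetic; the function name phred_quality is hypothetical and is not part of PBHoover.

```python
import math


def phred_quality(concordance: float) -> float:
    """Convert a concordance fraction into a Phred-scaled quality value.

    Assumption: the error rate is taken to be 1 - concordance.
    """
    if not 0.0 <= concordance < 1.0:
        raise ValueError("concordance must be in [0, 1)")
    error_rate = 1.0 - concordance
    return -10.0 * math.log10(error_rate)


# 99.97% concordance, as reported in the Results
q = phred_quality(0.9997)
print(round(q, 1))  # 35.2
```

Rounded down to a whole number, this gives the reported quality value of 35.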


HADTB Database

Effective control of tuberculosis (TB) is critically important today, as rising drug resistance hinders efforts to contain the disease. Recent advances in sequencing have generated vast amounts of TB genomic data, but existing databases are not reliably maintained and do not integrate well with other analysis tools. To address these issues, the Hub for Aggregated Data for TB (HADTB) is being developed with an interactive user interface that presents all the necessary information in a single-page web application. The web interface supports gene-based, variant-based, and isolate-based analysis. The search fields for each type of analysis are relationally mapped to one another, which allows the user to retrieve all associated genomic data and metadata in a single table view. Moreover, a built-in report function provides users with a broad range of statistical and graphical methods for easy visualization.



MIRU-HERO is a stand-alone, Linux-based program that takes a whole-genome sequence assembly from Mycobacterium tuberculosis or M. bovis and predicts the spoligotype, MIRU type, and lineage for that sequence. The program also identifies M. canettii. As whole-genome sequencing (WGS) becomes cheaper and more commonplace, scientists need practical ways to analyze the resulting genomic data. MIRU-HERO was designed to let public-health scientists working with WGS data efficiently generate the results that matter for mycobacterial strain typing (genotyping) and outbreak detection.



GitHub website