De novo assembly

Typically, sequencing reads from a genome of interest are aligned to a reference sequence. Though this approach is cost-effective, every genome is different, and reference-based alignment precludes the discovery of key novel elements such as structural variations. By assembling each genome “de novo”, we are able to detect important genomic characteristics such as copy number variants, structural variants, and gene transfers.


As a proof of concept of our de novo assembly pipeline’s accuracy, we assembled the avirulent reference strain H37Ra. To our surprise, not only did our assembly match up well with the reference assembly published in 2008, but we also discovered that the mutations in over half of the genes reported to differ between H37Ra and the reference strain H37Rv were sequencing errors: at those positions, our assembly matched H37Rv precisely and did not match the prior assembly of H37Ra. We wrote a paper describing our findings, their potential phenotypic implications for the source of virulence attenuation in H37Ra, and the utility of single-molecule sequencing for producing high-quality reference genomes. The pre-print is available here and will soon be published in BMC Genomics.


Clinical Isolates

We have de novo assembled the genomes of dozens of isolates from TB patients across the world, and are analyzing their genomes for genetic indicators of antibiotic resistance and other characteristics of interest.

  • Pan-Genome
  • Positive Selection


Epigenetics provides a mechanism for clonal species to alter phenotype in response to environmental cues without a change in DNA sequence. We believe that persistence and other phenotypic characteristics may be related to epigenetic changes in the pathogen. Such changes may buy the organism time until it acquires the genomic mutations that confer drug resistance. With single-molecule sequencing, we are able to study methylation patterns in the M. tuberculosis genome at single-base resolution.

DNA Methylation Patterns

Once thought to be merely a mechanism for differentiation of self-DNA and foreign DNA, DNA methylation has been unveiled as a marker relevant to many physiological processes including antigenic variation, recruitment of alternative DNA repair machinery, cell-cycle control, and transposition frequency. We believe such mechanisms exist in Mtb, and are examining the methylomes of clinical isolates for patterns indicative of such roles.

DNA Methylation Heterogeneity

Heterogeneity in DNA methylation may act as a means to allow phenotypic variation in a genotypically clonal species such as Mycobacterium tuberculosis. We are examining this phenomenon in the genomes of clinical isolates. The concept of methylation heterogeneity and its two forms are explained below.

A) Site-specific heterogeneity – A clinical isolate genome exhibiting site-specific heterogeneity (“SSH”, black circle). Projection to subpopulations (“SP”, red bars) within the isolate shows three distributions among SPs that would give rise to the observed methylated read fraction (33%) at the circled position. Competitive antagonism from DNA-binding factors (DBFs) can cause both uniform SSH across SPs (first arrow) and varying levels of SSH across SPs (second arrow). Non-competitive antagonism from DBFs (not pictured) can cause binary variation between full methylation and no methylation across SPs (third arrow) by completely occluding the motif from methyltransferase (MTase) access.

B) Genome-wide heterogeneity (GWH) – A clear signature of GWH is illustrated in panel 3 (green dots). This pattern can occur when a loss-of-function (LOF) MTase mutation arises in one SP but is absent from the other SPs. This may confer phenotypic heterogeneity to the colony by creating SP-specific regulatory responses, potentially widening the range of habitable microniches at the population level. Alternatively, the LOF mutation could either be selected out of the population or become the dominant genotype.
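
The arithmetic behind both panels can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our analysis pipeline: the function name `pooled_methylated_fraction`, the choice of three equally sized SPs, the per-SP methylation levels, and the number of motif sites are all hypothetical, and sequencing noise and read sampling are ignored. The sketch shows that several different SP distributions pool to the same site-level fraction (panel A), and that an LOF MTase mutation confined to one SP lowers the pooled fraction by the same amount at every site of the affected motif (panel B).

```python
from fractions import Fraction


def pooled_methylated_fraction(sp_fractions, sp_weights=None):
    """Expected methylated read fraction at one site when reads are pooled
    across subpopulations (SPs).

    sp_fractions: methylated fraction at the site within each SP (0 to 1).
    sp_weights:   relative SP sizes; assumed equal when omitted.
    """
    if sp_weights is None:
        sp_weights = [1] * len(sp_fractions)
    if len(sp_weights) != len(sp_fractions):
        raise ValueError("need one weight per subpopulation")
    total = sum(Fraction(w) for w in sp_weights)
    if total <= 0:
        raise ValueError("subpopulation weights must sum to a positive value")
    weighted = sum(Fraction(f) * Fraction(w)
                   for f, w in zip(sp_fractions, sp_weights))
    return weighted / total


# Panel A (SSH): three hypothetical, equally sized SPs. Each distribution
# pools to the same methylated read fraction at the circled site.
third = Fraction(1, 3)
ssh_scenarios = {
    "uniform across SPs": [third, third, third],
    "varying across SPs": [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)],
    "binary across SPs": [1, 0, 0],
}
for name, fractions_per_sp in ssh_scenarios.items():
    print(name, pooled_methylated_fraction(fractions_per_sp))

# Panel B (GWH): hypothetically, one of three equally sized SPs carries an
# LOF MTase mutation, so it contributes no methylation at any site of that
# MTase's motif, while the other two SPs are fully methylated.
n_motif_sites = 5  # illustrative number of motif sites
gwh_profile = [pooled_methylated_fraction([0, 1, 1])
               for _ in range(n_motif_sites)]
print(gwh_profile)
```

Under these assumptions, each of the three SSH scenarios pools to 1/3, which is why the pooled read fraction at a single site cannot by itself distinguish them. In the GWH case every motif site shares the same reduced fraction, in contrast to SSH, where only individual sites deviate.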