Loss of sigma factors could have initiated pseudogene accumulation in the
Mycobacterium leprae genome

Supplementary Material

M. Madan Babu

MRC Laboratory of Molecular Biology, Hills Road, Cambridge - CB2 2QH, United Kingdom
Phone: +44-(0)1223-402041 :: Fax: +44-(0)1223-213556

Pseudogenes are non-functional regions of the genome that arise through the accumulation of mutations that either cause premature termination during protein synthesis or disrupt transcription. The origin of pseudogenes and models for their formation have been discussed (refs 1-3); however, little has been said about how pseudogenes could have accumulated in an organism. In this brief communication, I propose a two-step model for pseudogene accretion in the Mycobacterium leprae genome, triggered by the loss of different sets of sigma factors at different points in the course of evolution.

Figure 1

Figure 1: A. The normal condition, in which the RNA polymerase core enzyme associates with different alternative sigma factors to regulate the expression of different sets of genes under different environmental conditions or stresses (depicted in blue, red and green). The dotted lines represent the mRNA transcribed from each gene under the control of its alternative sigma factor. B. During the course of evolution, when a sigma factor (here s1) is mutated, the organism can no longer express the set of genes that this sigma factor previously controlled. However, the organism will survive as long as the corresponding environmental condition is not encountered, because those proteins need not be expressed until then. C. This creates pressure to select a particular environment, forcing the organism to adopt a specialised niche. The organism is now in a state where it cannot survive if it encounters the condition (blue). Since this set of proteins will never be expressed, the genes are equivalent to any non-coding region of the genome, and there is no selective pressure to keep them free of mutations. The genes may therefore start to mutate, leading to an accumulation of pseudogenes in the genome. Thus the loss of sigma factors as an early event will lead to the accumulation of mutations and pseudogenes in a genome. If a particular protein is absolutely important, selective pressure will favour mutations in the upstream region that introduce a different recognition site, allowing expression by a 'different' sigma factor.
This explanation fits the case of Mycobacterium leprae very well, because it has retained only 4 sigma factors and has lost 9. The lost sigma factors carry a large number of mutations, suggesting that their inactivation was an early event. Assuming that sigma factors are as likely to be lost as any other gene, more sigma factors have been lost than expected at random (p-value of 2.8 × 10⁻²). If the sigma factors were lost successively over time, this will lead to different rates of accumulation of stop codons in the nucleotide sequence. Panels D and E show two different mutation rates in pseudogenes of similar length, suggesting that the set of sigma factors lost first initiated the accumulation of mutations (D), followed by the loss of a second set of sigma factors (E), which led to the accumulation of mutations in another set of genes. If this is the case, proteins of the same length in these two subsets should have accumulated different numbers of stop codons by now, because they have been pseudogenes for different amounts of time, and that is clearly seen here. The red points in panels D and E represent the sigma factors that are pseudogenes in Mycobacterium leprae.
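
The enrichment argument above (more sigma factors lost than expected if every gene is equally likely to become a pseudogene) can be illustrated with a one-sided hypergeometric tail probability. This is only a sketch: the text does not state which statistical test was used, so the choice of test is an assumption, as is the function name `hypergeom_tail`. The counts of 13 sigma factors, 9 lost, and 1116 pseudogenes come from this page; the total gene count of 2720 is an assumed value that is not given here. The result therefore need not reproduce the reported p-value.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric draw.

    `draws` genes are sampled without replacement from `population` genes,
    of which `successes` are pseudogenes; X is the number of pseudogenes
    among the sampled genes.
    """
    total = comb(population, draws)
    upper = min(draws, successes)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# 13 sigma factors, 9 of them lost, 1116 pseudogenes: taken from the text.
# 2720 total genes: ASSUMED value, not stated on this page.
p = hypergeom_tail(population=2720, successes=1116, draws=13, observed=9)
print(f"one-sided p-value under these assumptions: {p:.3g}")
```

A small tail probability here would mean that finding 9 or more pseudogenes among 13 randomly chosen genes is unlikely, which is the sense in which sigma factors appear to be lost more often than expected at random.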

Figure 2

Figure 2: Both sigJ and sigH have higher stop codon accumulation rates than the genes they regulate, as shown in the figure. The pink dots represent the respective sigma factors. The x-axis represents the number of bases and the y-axis represents the number of stop codons accumulated.
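
The quantity plotted above (stop codons against sequence length) can be sketched as follows. This is a minimal illustration, not the procedure used to build the dataset: it assumes that stop codons are counted in a single reading frame starting at the first base and that any terminal stop codon is included in the count. The function names and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_inframe_stops(seq, frame=0):
    """Count stop codons in one reading frame of a DNA sequence.

    ASSUMPTION: codons are read from offset `frame`; a trailing partial
    codon is ignored and a terminal stop codon is counted like any other.
    """
    seq = seq.upper()
    return sum(
        seq[i:i + 3] in STOP_CODONS
        for i in range(frame, len(seq) - 2, 3)
    )


def stops_per_kb(seq, frame=0):
    """Stop codons per 1000 bases, a length-normalised accumulation rate."""
    if not seq:
        return 0.0
    return 1000 * count_inframe_stops(seq, frame) / len(seq)


# Hypothetical 15-base sequence: codons ATG TAA GGG TGA CCC.
example = "ATGTAAGGGTGACCC"
print(count_inframe_stops(example), round(stops_per_kb(example), 1))
```

Normalising by length is what allows genes of different sizes, such as a sigma factor and the genes it regulates, to be compared on the same scale.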

Dataset used for the analysis

This link leads to the list of 1116 pseudogenes, with the gene identifier, length, number of stop codons, gene name, functional class, start and stop positions of the gene in the genome, and predicted function, in MySQL format.

Relevant references

  • Vanin, E. F. (1985) Processed pseudogenes: characteristics and evolution. Annu Rev Genet. 19, 253-272.
  • Li, W. H., Gojobori, T. and Nei, M. (1981) Pseudogenes as a paradigm of neutral evolution. Nature. 292, 237-239.
  • Lawrence, J. G., Hendrix, R. W. and Casjens, J. (2001) Where are the pseudogenes in bacterial genomes? Trends Microbiol. 9, 535-540.
  • Cole, S. T. et al. (2001) Massive gene decay in the leprosy bacillus. Nature. 409, 1007-1011.
  • Cole, S. T. et al. (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 393, 537-544.
  • Lewin, B. (1998) Genes VI. Oxford University Press.
  • Missiakas, D. and Raina, S. (1998) The extracytoplasmic function sigma factors: role and regulation. Mol. Microbiol. 28, 1066-1069.
  • Petrov, D. A., Sangster, T. A., Johnston, J. S., Hartl, D. L. and Shaw, K. L. (2000) Evidence for DNA loss as a determinant of genome size. Science. 287, 1060-1062.
  • Eiglmeier, K. et al. (2001) The decaying genome of Mycobacterium leprae. Lepr Rev. 72, 387-398.
  • Manganelli, R., Voskuil, M. I., Schoolnik, G. K., Dubnau, E., Gomez, M. and Smith, I. (2002) Role of the extracytoplasmic-function sigma factor, SigH in Mycobacterium tuberculosis global gene expression. Mol. Microbiol. 45, 365-374.
  • Hu, Y. and Coates, R. M. (2001) Increased levels of sigJ mRNA in late stationary phase cultures of Mycobacterium tuberculosis detected by DNA array hybridization. FEMS Microbiol. Lett. 202, 59-65.

This page was last updated on 25th Nov 2002





