
---------njboot ------------

This program was written by me (Graeme Mitchison) when Sarah Teichmann and I were analysing some bacterial genomic data*. Anyone is welcome to use it, but should be warned that it's just my home-made program, not tailored for general convenience in any way and not written in well-structured C. I don't guarantee that it's a reliable tool and I offer no support (though I'll do my best to answer questions, of course).

Click here to download NJBOOT.

The program does 100 bootstrap runs using neighbour-joining and analyses the output in two ways: (1) as a list of the distinct trees generated by the bootstrap runs, in decreasing order of frequency; (2) as a list of the clades found across all of those trees, again in decreasing order of frequency. The program does not generate a consensus tree, so the output differs from that produced by running the programs SEQBOOT, NEIGHBOR and CONSENSE in PHYLIP. For some purposes, it is nice to see the individual bootstrap trees rather than a consensus tree. For instance, one can attach lengths and variances of lengths to edges in the individual trees. However, the number of individual trees may grow too large for convenience if the data do not have a strong phylogenetic signal.
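To make the bootstrap step concrete, here is a minimal sketch in C of how one resampled alignment could be built. It assumes the standard phylogenetic bootstrap, in which alignment columns are drawn with replacement (as SEQBOOT does). The function names bootstrap_columns and resample_alignment are hypothetical and are not taken from njboot.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from njboot.c: fill idx[0..ncols-1] with
   column indices drawn with replacement from 0..ncols-1. */
void bootstrap_columns(int *idx, int ncols)
{
    assert(idx != NULL && ncols > 0);
    for (int i = 0; i < ncols; i++)
        idx[i] = rand() % ncols;  /* slight modulo bias; fine for a sketch */
}

/* Build one pseudo-replicate: out[s][i] = aln[s][idx[i]] for each sequence s.
   Each out[s] must have room for ncols + 1 characters. */
void resample_alignment(char **out, char *const *aln, int nseqs,
                        const int *idx, int ncols)
{
    for (int s = 0; s < nseqs; s++) {
        for (int i = 0; i < ncols; i++)
            out[s][i] = aln[s][idx[i]];
        out[s][ncols] = '\0';
    }
}
```

Calling srand once, then bootstrap_columns and resample_alignment once per run, would give the 100 resampled alignments on which the trees are built. The same column indices are used for every sequence, so the resampled data remain an alignment.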

The program constructs PAM matrices using the file pamdata, which must always be present. The sequences to be analysed must be given as an alignment (using "-" for gaps), in the format:

Name1 Seq1
Name2 Seq2
....
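As an illustration of this format, the following C sketch reads one "Name Seq" line and checks that a set of rows forms an alignment. It assumes that names contain no white space, that each sequence sits on a single line, and that the size limits MAX_NAME and MAX_SEQ are large enough. The type AlnRow and both functions are hypothetical, not the reader used inside njboot.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 64
#define MAX_SEQ  4096

/* Hypothetical record type, not taken from njboot.c. */
typedef struct {
    char name[MAX_NAME];
    char seq[MAX_SEQ];
} AlnRow;

/* Parse one "Name Seq" line; return 1 on success, 0 otherwise.
   The field widths keep sscanf from overrunning the buffers. */
int parse_row(const char *line, AlnRow *row)
{
    assert(line != NULL && row != NULL);
    return sscanf(line, "%63s %4095s", row->name, row->seq) == 2;
}

/* Return 1 if all n rows have the same length (gaps included), else 0. */
int rows_aligned(const AlnRow *rows, int n)
{
    for (int i = 1; i < n; i++)
        if (strlen(rows[i].seq) != strlen(rows[0].seq))
            return 0;
    return 1;
}
```

A data file could be checked by passing each line obtained from fgets to parse_row and then calling rows_aligned on the collected rows before running njboot.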

First compile the program using

cc -O -o njboot njboot.c -lm

(or whatever is appropriate for your C compiler). To run the program, type

njboot <datafile>

I have provided an example data file, testset, which should run in a few minutes. It consists of 5 globins. The output begins:

TREES
frequency=72.00 percent
( HBA_CATCL ((( HBA_HETPO HBA_SQUAC ) HBA_LEPPA ) HBA_LIOMI ))
5 l-branch=HBA_HETPO r-branch=HBA_SQUAC 0.396+-0.077 0.357+-0.071
6 l-branch=5 r-branch=HBA_LEPPA 0.156+-0.079 0.642+-0.124
7 l-branch=6 r-branch=HBA_LIOMI 0.167+-0.061 0.557+-0.100
8 l-branch=HBA_CATCL r-branch=7 0.000+-0.000 0.585+-0.105

This is the highest-frequency tree. Parsing the brackets allows one to draw it as an unrooted tree.
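The numbered lines under the bracketed tree are regular enough to be read mechanically. The C sketch below parses one such line with sscanf. It assumes that the first "+-" pair belongs to the left branch and the second to the right branch, and it does not assume what kind of spread the number after "+-" represents. The type NodeLine and the function parse_node_line are hypothetical and not part of njboot.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one numbered line of the TREES output. */
typedef struct {
    int node;                    /* node number, e.g. 5 */
    char left[32], right[32];    /* a sequence name or a node number */
    double left_len, left_err;   /* assumed: first "+-" pair is the left branch */
    double right_len, right_err; /* assumed: second pair is the right branch */
} NodeLine;

/* Parse a line such as
   "5 l-branch=HBA_HETPO r-branch=HBA_SQUAC 0.396+-0.077 0.357+-0.071".
   Returns 1 if all seven fields were read, 0 otherwise. */
int parse_node_line(const char *line, NodeLine *n)
{
    assert(line != NULL && n != NULL);
    return sscanf(line, "%d l-branch=%31s r-branch=%31s %lf+-%lf %lf+-%lf",
                  &n->node, n->left, n->right,
                  &n->left_len, &n->left_err,
                  &n->right_len, &n->right_err) == 7;
}
```

Lines that do not match the pattern, such as the "frequency=" line, make the function return 0, so it can also be used to pick the node lines out of the output.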

Note that there is a redundancy in my way of notating trees, so one of the lengths is set to zero, allowing one of the nodes to be removed (here node 8). This tree pairs HETPO with SQUAC and LIOMI with CATCL. One can see how frequently these two clades occur over all trees by looking at the list following the trees, which begins:

CLADES
0.8800 HBA_CATCL HBA_LIOMI
0.8200 HBA_HETPO HBA_SQUAC

There is no extra information in the list of clades: it can all be read from the tree list. However, it's often convenient to have the clades listed separately.
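To show how clades can be read from a bracketed tree, here is a minimal C sketch. It records, for each bracket pair, the set of leaves inside it as a bitmask. Because the tree is unrooted, each such set defines a split, and the leaves outside the brackets form the other side of that split. How njboot itself derives its clade list is not described above, so this is only an illustration; the type TreeGroups and both functions are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_LEAVES 31   /* keeps every shift below 32 bits */
#define MAX_DEPTH  64

/* Hypothetical result type: leaf names in order of appearance, plus one
   bitmask per bracket pair (bit i set means leaf i lies inside the pair). */
typedef struct {
    char names[MAX_LEAVES][32];
    int nleaves;
    unsigned long groups[MAX_LEAVES];
    int ngroups;
} TreeGroups;

/* Scan a bracketed tree string; return 0 on success, -1 on malformed input. */
int parse_groups(const char *s, TreeGroups *t)
{
    unsigned long stack[MAX_DEPTH];  /* leaf sets of the brackets still open */
    int depth = 0;
    assert(s != NULL && t != NULL);
    t->nleaves = t->ngroups = 0;
    while (*s) {
        if (isspace((unsigned char)*s)) { s++; continue; }
        if (*s == '(') {
            if (depth == MAX_DEPTH) return -1;
            stack[depth++] = 0UL;
            s++;
        } else if (*s == ')') {
            if (depth == 0 || t->ngroups == MAX_LEAVES) return -1;
            unsigned long g = stack[--depth];
            t->groups[t->ngroups++] = g;
            if (depth > 0) stack[depth - 1] |= g;  /* pass leaves to parent */
            s++;
        } else {
            /* A leaf name runs up to the next white space or bracket. */
            size_t len = strcspn(s, " \t\n()");
            if (depth == 0 || t->nleaves == MAX_LEAVES || len >= 32) return -1;
            memcpy(t->names[t->nleaves], s, len);
            t->names[t->nleaves][len] = '\0';
            stack[depth - 1] |= 1UL << t->nleaves;
            t->nleaves++;
            s += len;
        }
    }
    return depth == 0 ? 0 : -1;
}

/* Leaves on the other side of the split defined by group g. */
unsigned long other_side(const TreeGroups *t, unsigned long g)
{
    unsigned long all = (1UL << t->nleaves) - 1UL;
    return all & ~g;
}
```

For the example tree above, the innermost bracket pair contains HBA_HETPO and HBA_SQUAC, and the other side of the pair that also encloses HBA_LEPPA is HBA_CATCL together with HBA_LIOMI. These are the two clades named in the CLADES list. Counting such sets over all bootstrap trees would give clade frequencies.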

Graeme Mitchison
Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH.
Jan 3rd 1999.

* Teichmann, S.A. and Mitchison, G. (1999) Is there a phylogenetic signal in prokaryote proteins? Submitted to J. Mol. Evol.