GEnome ANalysis and Protein FAMily MakER
If any of these programs are used, please cite "Park, J. and Teichmann, S.A. DIVCLUS: an automatic method in the GEANFAMMER package that finds homologous domains in single- or multi-domain proteins. (Bioinformatics, 14, 144-150)".
News on geanfammer development. (July 1998)
"GEANFAMMER" refers to a perl5 program*, a suite
of perl5 programs, a perl5 module, or a perl5 subroutine library. These
are all available by anonymous ftp at cyrah.ebi.ac.uk.
It was developed for the
analysis of most of the complete bacterial genomes announced since
1995. It covers the whole procedure of preparing statistically and
biologically more relevant protein (sequence) duplication modules before
any further biological analysis, such as structure and function assignment.
With it, anybody can easily analyse the level of duplication and the types of
sequence families in any genome or database.
*A very preliminary GUI version (Perl/Tk) is also available (older version).
Breaking sequences down into domains is critically important because many protein sequences are multi-domain, and this can cause serious problems when large numbers of sequences are analysed automatically without first being split into sequence domains.
Geanfammer uses FASTA or SSEARCH, which allow gaps in sequence comparison,
in contrast to the older BLASTP algorithm, which does not. It also uses the E
value instead of the Z-score to increase sensitivity.
Download GEANFAMMER single program. A summary of the single program geanfammer.pl is as follows.
The program takes the protein sequences of one or two databases and creates protein families. Each protein sequence database can be a whole genome, part of a genome, or any other protein sequence database in FASTA format.
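For readers unfamiliar with the input format, here is a minimal Python sketch of what a FASTA format database looks like and how its entries can be read. It is an illustration only and is not part of GEANFAMMER, which is written in perl5; the function name read_fasta and the two example sequences are made up for this sketch.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {identifier: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts with '>'; take the first word as the identifier.
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated until the next header.
            records[current_id] += line
    return records


# Two invented entries, for illustration only.
example = """\
>protA hypothetical protein (made-up example)
MKTAYIAKQR
QISFVKSHFS
>protB another made-up entry
MADEEKLPPG
""".splitlines()

db = read_fasta(example)
```

Counting the entries with len(db) is a quick way to check that a database file was read as expected before starting a long search.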
The protein sequence databases are compared to each other (or one database is compared to itself) using one of the two sequence comparison programs of the FASTA package. Using the output of the sequence comparison, the proteins are clustered by single linkage. GEANFAMMER then uses the DIVCLUS algorithm to divide those single linkage clusters that contain unrelated sequences (brought together by multi-domain proteins).
Finally, a sorted cluster file containing the duplication module families is created together with a summary file, which summarizes the distribution of duplication module families.
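The single linkage step can be sketched in a few lines of Python. This is an illustration of the general technique (union-find over pairwise hits), not the code used in geanfammer.pl; the Hit tuple layout, the function name single_linkage and the example hits are all invented here. DIVCLUS itself is not reproduced; see the paper cited above for that algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, match id, E value); hypothetical layout


def single_linkage(hits: Iterable[Hit], threshold: float) -> List[List[str]]:
    """Group sequences connected by any chain of hits with E value <= threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b, evalue in hits:
        ra, rb = find(a), find(b)  # register both ids even if the hit is rejected
        if evalue <= threshold and ra != rb:
            parent[ra] = rb

    clusters: Dict[str, List[str]] = defaultdict(list)
    for seq_id in parent:
        clusters[find(seq_id)].append(seq_id)
    # Largest clusters first, members in alphabetical order.
    return sorted((sorted(m) for m in clusters.values()),
                  key=lambda c: (-len(c), c))


# Invented hits: P2 is a two-domain protein that shares one domain with P1
# and a different domain with P3. P1 and P3 are not similar to each other.
hits = [
    ("P1", "P2", 1e-20),
    ("P2", "P3", 1e-15),
    ("P1", "P3", 5.0),
    ("P4", "P5", 1e-8),
]
clusters = single_linkage(hits, threshold=0.001)
```

In this toy data P1 and P3 end up in the same cluster only because both match the multi-domain protein P2. That is the kind of cluster that the division step is meant to split.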
An example run could be:
prompt> geanfammer.pl YOUR_GENOME.fa
The distribution includes a test FASTA format database (geanfammer_test_FASTA_DB.fa), so you can see for yourself what the program does before trying a bigger real DB. Just type:
prompt> geanfammer.pl geanfammer_test_FASTA_DB.fa
The final result will be
Real Genome TEST!!
We have included the smallest complete genome, Mycoplasma genitalium (MG.fa), in the distribution to play with. Depending on your choice of E value threshold, geanfammer should produce a domain-level clustering. Try:
prompt> geanfammer.pl MG.fa E=0.2 e=0.2
prompt> geanfammer.pl MG.fa E=0.01 e=0.01
and see what it produces. E=0.2 will produce
larger protein families, as it is more permissive of possible mismatches.
E=0.01 is quite reasonable; we used
0.001 for our genome analysis work to be very strict (avoiding wrong clusters
at the cost of losing distant but true members). The search part of the
program takes the most time. It creates a subdirectory called
MG in which the search results are stored. The final results are
written to the current directory, so it is a good idea to make a new directory
for the test and run geanfammer inside it.
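If you want to script such test runs, the following Python sketch builds the command lines shown above and runs each one in its own new directory. It assumes geanfammer.pl is executable and on your PATH; the helper names geanfammer_command and run_in_fresh_dir are invented for this example, and the only arguments used are those that appear in the commands above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def geanfammer_command(fasta: str, evalue: float) -> List[str]:
    """Build a command line of the form: geanfammer.pl DB E=x e=x."""
    return ["geanfammer.pl", fasta, f"E={evalue}", f"e={evalue}"]


def run_in_fresh_dir(fasta: Path, evalue: float, workdir: Path) -> None:
    """Copy the database into a new directory and run geanfammer there."""
    # Fail rather than mix new results with those of an earlier run.
    workdir.mkdir(parents=True, exist_ok=False)
    shutil.copy(fasta, workdir / fasta.name)
    # Assumption: geanfammer.pl is on PATH; check=True raises if it exits non-zero.
    subprocess.run(geanfammer_command(fasta.name, evalue), cwd=workdir, check=True)


# Only build the command lines here; nothing is executed at this point.
commands = [geanfammer_command("MG.fa", e) for e in (0.2, 0.01)]
```

A call such as run_in_fresh_dir(Path("MG.fa"), 0.01, Path("mg_test_E0.01")) would then keep the MG search subdirectory and the final result files for each threshold apart.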
The suite of perl5 programs essentially consists of the constituent parts of the GEANFAMMER single program. A flow chart of the constituent programs can be found by clicking here.
Documentation for the individual programs follows here, although details on usage can be found in the headers of all programs:
You can also download geanfammer
from CPAN. However, it might not be as up to date as the ftp site above.
Related WWW sites
We are the programmers who made this, so we will do our best to tackle any problems you have while using the program(s). However, there is no legal guarantee against the possible malfunction of any part of the package.
Bug report --> firstname.lastname@example.org , email@example.com
S. A. Teichmann & Jong Park
(C) Copyright. 1995.
Free for non-profit academic research and educational purposes. The copyright rule for Perl itself applies to the program(s). For commercial use and collaboration please contact the authors.