The GEANFAMMER Package
GEnome ANalysis and Protein FAMily MakER
If you use any of these programs, please cite: Park, J. and Teichmann, S.A. "DIVCLUS: an automatic method in the GEANFAMMER package that finds homologous domains in single- or multi-domain proteins." Bioinformatics, 14, 144-150.
News on geanfammer development (July 1998).
"GEANFAMMER" refers to a perl5 program*, a suite of perl5 programs, a perl5 module, or a perl5 subroutine library. All of these are available by anonymous ftp at cyrah.ebi.ac.uk. The package was developed for the analysis of most of the complete bacterial genomes announced since 1995. It covers the whole procedure of preparing statistically and biologically more relevant protein sequence duplication modules, a step that comes before further biological analysis such as structure and function assignment. With it, anybody can easily analyse the level of duplication and the types of sequence families in any genome or database.
*A very preliminary GUI version (Perl/Tk) is also available (older version).
Breaking sequences down into domains is critically important: many protein sequences are multi-domain, and the automatic analysis of large numbers of sequences can go seriously wrong if the sequences are not first broken down into sequence domains.
Geanfammer uses FASTA or SSEARCH, which allow gaps in sequence comparison, in contrast to the older BLASTP algorithm, which does not. It also uses the E value instead of the Z-score to increase sensitivity.
Download the GEANFAMMER single program.
A summary of the single program, geanfammer.pl, is as follows.
The program takes the protein sequences of one or two databases and creates protein families. Each protein sequence database can be a whole genome, part of a genome, or any other protein sequence database in FASTA format.
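Before a run, it can be useful to check that the input really is a readable FASTA file. The following is only a minimal sketch in Python and is not part of GEANFAMMER; the function name `read_fasta` and the two sample records are hypothetical and exist only for this illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-protein database, given inline for illustration only.
sample = """>prot1 example protein
MKTAYIAKQR
QISFVKSHFS
>prot2 another example
MSDNELLK
""".splitlines()

records = list(read_fasta(sample))
for name, seq in records:
    print(name.split()[0], len(seq))
```

To check a real file, the same function can be given an open file handle instead of the inline sample.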
The protein sequence databases are compared to each other (or one database is compared to itself) using one of the two sequence comparison programs of the FASTA package. Using the output of the sequence comparison, the proteins are clustered by single linkage. GEANFAMMER then uses the DIVCLUS algorithm to divide those single-linkage clusters that contain unrelated sequences (an effect of multi-domain proteins).
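The single-linkage step, and the reason a division step is needed afterwards, can be illustrated with a small Python sketch. This is not the GEANFAMMER or DIVCLUS code: the function name `single_linkage`, the protein names and the E values are all invented for the example, and it assumes that pairwise hits are already available as (query, target, E value) tuples.

```python
def single_linkage(hits, threshold):
    """Cluster names so that any pair with E value <= threshold is linked."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        root_a, root_b = find(a), find(b)  # registers both names
        if evalue <= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for name in list(parent):
        clusters.setdefault(find(name), set()).add(name)
    # Largest clusters first, then alphabetical order
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical hits: A carries domain X, B carries X and Y, C carries Y.
hits = [
    ("A", "B", 1e-5),   # shared domain X
    ("B", "C", 1e-4),   # shared domain Y
    ("A", "C", 5.0),    # no real similarity
    ("D", "E", 0.05),   # weak similarity
]

strict = single_linkage(hits, 0.01)
loose = single_linkage(hits, 0.2)
print(strict)  # -> [['A', 'B', 'C'], ['D'], ['E']]
print(loose)   # -> [['A', 'B', 'C'], ['D', 'E']]
```

In this toy data A and C share no domain, yet they land in the same cluster because the two-domain protein B links them. Clusters of this kind are the ones that need to be divided after single linkage.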
Finally, a sorted cluster file containing the duplication module families is created, together with a summary file that summarizes the distribution of the duplication module families.
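The layout of the summary file is not described on this page. As a rough illustration of what a distribution of family sizes means, the following Python sketch counts how many families have each size. The function name and the example families are hypothetical, and the output format is not that of GEANFAMMER.

```python
from collections import Counter


def family_size_distribution(families):
    """Map family size -> number of families of that size, sorted by size."""
    counts = Counter(len(family) for family in families)
    return dict(sorted(counts.items()))


# Hypothetical families, each a list of member names.
families = [["A", "B", "C"], ["D", "E"], ["F", "G"], ["H"]]

distribution = family_size_distribution(families)
for size, count in distribution.items():
    print(size, count)
```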
An example run could be:
prompt> geanfammer.pl YOUR_GENOME.fa
The distribution includes a test database in FASTA format (geanfammer_test_FASTA_DB.fa), so you can see for yourself what the program does before trying a larger, real database. Just type:
prompt> geanfammer.pl geanfammer_test_FASTA_DB.fa
The final result will be "geanfammer_test_FASTA_DB.gclu".
Real Genome Test
We have included the smallest complete genome, that of Mycoplasma genitalium (MG.fa), in the distribution for you to experiment with. Depending on your choice of E value threshold, geanfammer should produce a domain-level clustering.
Try:
geanfammer.pl MG.fa E=0.2 e=0.2
or
geanfammer.pl MG.fa E=0.01 e=0.01
and see what it produces. E=0.2 will produce larger protein families, because it is more permissive about possible mismatches. E=0.01 is quite reasonable; we used 0.001 for our genome analysis work to be very strict (avoiding wrong clusters at the cost of losing distant but true members). The search part of the program takes the most time. It creates a subdirectory called MG, in which the search results are stored. The final results are written to the current directory, so it is a good idea to make a new directory for the test and run geanfammer inside it.
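The advice about using a fresh directory can be scripted. The Python sketch below is only an illustration: the helper names `build_command` and `run_in_fresh_dir` are hypothetical, and the command line simply repeats the `E=` and `e=` arguments exactly as in the examples above without assuming anything further about their meaning. By default it is a dry run that prepares the directory and returns the command without executing it.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_command(fasta_name, evalue):
    """Build the argument list used above: geanfammer.pl FILE E=... e=..."""
    return ["geanfammer.pl", fasta_name, f"E={evalue}", f"e={evalue}"]


def run_in_fresh_dir(fasta_path, evalue, workdir=None, dry_run=True):
    """Copy the FASTA file into a new directory and run geanfammer.pl there."""
    fasta_path = Path(fasta_path)
    if workdir is None:
        workdir = Path(tempfile.mkdtemp(prefix="geanfammer_"))
    else:
        workdir = Path(workdir)
        workdir.mkdir(parents=True, exist_ok=True)
    shutil.copy(fasta_path, workdir / fasta_path.name)
    command = build_command(fasta_path.name, evalue)
    if not dry_run:
        # Assumption: geanfammer.pl and the FASTA package programs are on PATH.
        subprocess.run(command, cwd=workdir, check=True)
    return workdir, command


command = build_command("MG.fa", "0.01")
print(" ".join(command))  # -> geanfammer.pl MG.fa E=0.01 e=0.01
```

Calling `run_in_fresh_dir("MG.fa", "0.01", dry_run=False)` would run the search inside the new directory, so the MG subdirectory and the final result files stay separate from other work.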
The suite of perl5 programs essentially consists of the constituent parts of the GEANFAMMER single program. A flow chart of the constituent programs is also available.
Documentation of the individual programs follows here, although details on usage can be found in the headers of all the programs.
You can also download geanfammer from CPAN. However, the CPAN copy might not be as up to date as the ftp distribution above.
Related WWW sites
Domainer (Prodom), Prodom HOME, Prodom DB search
Warranty
We wrote these programs, so we will do our best to tackle any problems you have while using them. However, there is no legal guarantee against malfunction of any part of the package.
Bug reports: sat@mrc-lmb.cam.ac.uk, jong@mrc-lmb.cam.ac.uk
Sarah A. Teichmann & Jong H. Park
1st/Oct/1997.
(C) Copyright. 1995.
Free for non-profit academic research and educational purposes. The copyright rule for Perl itself applies to the program(s). For commercial use and collaboration, please contact the authors.