*,Carlo Berzuini2, Matthew Bashton1, Julian Gough3,
Sarah A. Teichmann1
1MRC Laboratory of Molecular Biology, Cambridge, CB2 2QH, UK; 2MRC Biostatistics Unit, Cambridge, CB2 2QH, UK; 3Genome Exploration Research Group, RIKEN Genomic Sciences Centre, W121 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan, and Department of Structural Biology, Fairchild bldg, D109, Stanford, CA 94305-5126, USA.
* To whom correspondence should be addressed: cvogel[at]mrc-lmb.cam.ac.uk
The structural assignments to all proteins were obtained from SUPERFAMILY (Gough et al.) for 131 genomes.
The domains are defined in the Structural Classification of Proteins database (SCOP). Each domain superfamily is identified by a 5-digit number as used in SCOP and listed here.
The structural assignments were post-processed to obtain strings of consecutive domains which determine the domain architecture of a protein.
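As a rough illustration of this post-processing step, the sketch below orders the domain assignments of one protein by start position and joins their superfamily numbers into a single string. The function name `domain_architecture`, the `(start, end, superfamily)` tuple layout, the comma separator, and the example numbers are assumptions made for illustration; they are not taken from the SUPERFAMILY file formats.

```python
from typing import List, Tuple

# Hypothetical record layout: (start residue, end residue, superfamily number)
Assignment = Tuple[int, int, int]


def domain_architecture(assignments: List[Assignment], sep: str = ",") -> str:
    """Return the superfamily numbers of one protein in N- to C-terminal order."""
    # Sort by start position so the string reflects the order along the sequence.
    ordered = sorted(assignments, key=lambda a: a[0])
    return sep.join(str(superfamily) for _, _, superfamily in ordered)


# Illustrative input only: two assignments given out of sequence order.
example = [(150, 320, 52540), (10, 70, 50044)]
architecture = domain_architecture(example)
```

Here `architecture` lists the domain starting at residue 10 first, followed by the domain starting at residue 150.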
The length of each domain was taken as the average length of all domains of that superfamily in SCOP (Lo Conte et al., 2002). To estimate the probability that an unknown domain lies between two known domains in a protein, one standard deviation of the length of each of the two known domains was added to the respective domain. A threshold of 30 amino acids, the length of the smallest known domains, was then used to decide whether a residual unknown domain is present. Such an unknown domain, or 'unassigned region', is denoted in the domain architecture of the protein.
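One possible reading of this rule can be sketched as follows: the stretch of sequence between two assigned domains is shortened by one standard deviation of the length of each flanking superfamily, and an unassigned region is recorded only if at least 30 residues remain. The function name, the argument names, and the choice of "at least" rather than "more than" 30 are assumptions of this sketch, not details confirmed by the text.

```python
MIN_DOMAIN_LENGTH = 30  # threshold in amino acids, as stated in the text


def has_unassigned_region(gap_length: float,
                          sd_left: float,
                          sd_right: float,
                          threshold: float = MIN_DOMAIN_LENGTH) -> bool:
    """Decide whether the gap between two assigned domains may hold another domain.

    gap_length: residues between the end of the left and the start of the right domain
    sd_left, sd_right: standard deviation of the length of each flanking superfamily
    """
    # Assumption: each flanking domain may extend by one standard deviation
    # into the gap, so that margin is subtracted from the gap on both sides.
    residual = gap_length - sd_left - sd_right
    # Assumption: a residual of exactly the threshold counts as an unassigned region.
    return residual >= threshold


# Illustrative values only.
wide_gap = has_unassigned_region(gap_length=100, sd_left=20, sd_right=25)
narrow_gap = has_unassigned_region(gap_length=60, sd_left=20, sd_right=25)
```

With these illustrative values, the 100-residue gap leaves 55 residues and is flagged, while the 60-residue gap leaves 15 residues and is not.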
Further information can be found here.
Domain architectures as a string of consecutive domains for all sequences used.
Please also refer to the downloads page of the SUPERFAMILY database.
IDENTIFICATION OF OVER-REPRESENTED SUPRA-DOMAINS
A more detailed description of the statistical procedures than the one given in the manuscript can be found here.
The program (S-PLUS) can be downloaded here.
Illustrations of the relationship between R2 and the p-value can be found here.
Based on COGs, we defined 41 categories of domain function which were then grouped into six larger functional classes, listed here. The six larger functional classes are defined as