Supra-domains: evolutionary units larger than single protein domains

 

Christine Vogel1*, Carlo Berzuini2,3, Matthew Bashton1, Julian Gough4 and Sarah A. Teichmann1*

1MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK,

2MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK,

3Dipartimento di Informatica e Sistemistica, University of Pavia, 27100 Pavia, Italy

and 4Genome Exploration Research Group, RIKEN Genomic Sciences Centre, W121 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan & Department of Structural Biology, Fairchild bldg, D109, Stanford, CA 94305-5126, U.S.A.

*Corresponding authors: cvogel {at} mrc-lmb.cam.ac.uk, sat {at} mrc-lmb.cam.ac.uk

 

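Coverage

| | Number | Sequences (%) | Domain architectures (%) | Domain combinations (%) |
|---|---|---|---|---|
| Two-domain combinations | | | | |
| All | 9,398 | 44 | 72 | |
| in PDB | 616 | 30 | 31 | 63 |
| not in PDB | 8,782 | 18 | 55 | 37 |
| Supra | 2,368 | 40 | 54 | |
| in PDB | 491 | 29 | 31 | 62 |
| not in PDB | 1,877 | | | 28 |
| Over-represented | 1,203 | 38 | 47 | 84 |
| in PDB | 456 | 29 | 31 | 61 |
| not in PDB | 747 | 11 | 25 | 23 |
| Top 200 most duplicated | 200 | 28 | 27 | |
| in PDB | 161 | | | 48 |
| not in PDB | 39 | | | 8 |
| Top 200 most versatile | 200 | 12 | 30 | |
| in PDB | 99 | | | 31 |
| not in PDB | 101 | | | 8 |
| Three-domain combinations | | | | |
| All | 4,323 | 12 | 30 | |
| in PDB | 217 | 7 | 10 | |
| not in PDB | 4,106 | 7 | 24 | |
| Supra | 935 | 10 | 22 | |
| in PDB | 150 | 6 | 10 | |
| not in PDB | 785 | 5 | 15 | |
| Over-represented | 166 | 3 | 9 | |
| in PDB | 37 | 2 | 5 | |
| not in PDB | 129 | 1 | 6 | |
| Top 200 most duplicated | 200 | | | |
| in PDB | 107 | 6 | 10 | |
| not in PDB | 93 | 3 | 7 | |
| Top 200 most versatile | 200 | | | |
| in PDB | 45 | 4 | 8 | |
| not in PDB | 155 | 2 | 8 | |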
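
As a quick consistency check on the Number column, the "in PDB" and "not in PDB" counts of each category should add up to that category's total. The sketch below is illustrative and not part of the original analysis: the dictionary layout and the function name `inconsistent_rows` are assumptions, and the counts are copied by hand from the table. For the values listed here, all ten splits add up.

```python
# Counts copied from the Number column of the coverage table.
# Key: (combination size, category); value: (total, in PDB, not in PDB).
# The layout and names are illustrative assumptions, not the authors' code.
counts = {
    ("two-domain", "All"): (9398, 616, 8782),
    ("two-domain", "Supra"): (2368, 491, 1877),
    ("two-domain", "Over-represented"): (1203, 456, 747),
    ("two-domain", "Top 200 most duplicated"): (200, 161, 39),
    ("two-domain", "Top 200 most versatile"): (200, 99, 101),
    ("three-domain", "All"): (4323, 217, 4106),
    ("three-domain", "Supra"): (935, 150, 785),
    ("three-domain", "Over-represented"): (166, 37, 129),
    ("three-domain", "Top 200 most duplicated"): (200, 107, 93),
    ("three-domain", "Top 200 most versatile"): (200, 45, 155),
}


def inconsistent_rows(table):
    """Return the keys whose PDB split does not add up to the total."""
    return [
        key
        for key, (total, in_pdb, not_in_pdb) in table.items()
        if in_pdb + not_in_pdb != total
    ]


if __name__ == "__main__":
    print(inconsistent_rows(counts))
```

The percentage columns are deliberately left out of this check: they need not be additive, since a single sequence or architecture can contain combinations of both kinds.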