TIM Barrel Analysis
M. Madan Babu, Center for Biotechnology, Anna University
The eight-stranded α/β barrel (TIM barrel) is by far the most common tertiary fold observed in high-resolution protein crystal structures. It is estimated that 10% of all known enzymes have this domain (Farber et al.). The members of this large family of proteins catalyze very different reactions. Such diversity in function has made this family an attractive target for protein engineering. Moreover, the evolutionary history of this protein family has been the subject of rigorous debate, with arguments made in favor of both convergent and divergent evolution. Because of the lack of sequence homology, the ancestry of this fold is still a mystery. In this study, proteins found to have the α/β structural motif were analysed for their structural features, including conformational preferences, functional significance, solvent accessibility, residue preferences, salt bridges and sequence similarity.
The α/β Barrel Domain
The first protein discovered to have an eight-stranded α/β domain was triose phosphate isomerase. This fold is characterized by a central barrel formed by parallel β-strands, surrounded by seven or eight α-helices which shield the barrel from solvent. The β-strands of the barrel form an intrinsic network of hydrogen bonds with their neighbouring strands and are oriented in the same direction. The overall twist of the strands causes the first and the eighth strands to register in parallel, held in place by hydrogen bonds, which closes the barrel. The overall sequence topology can therefore be either (β/α)8, where the protein begins with a strand, or (α/β)8, where a helix is the first secondary structure element. In addition, many (α/β)8 enzymes have additional domains that are not a part of this fold. This fold has intrigued many researchers over the past years for two reasons:
Materials and Methods:
The Protein Data Bank (PDB) was screened for proteins with this topology, and 75 targets were identified. They were then compiled into a database for further analysis. Secondary structures were identified as helices, strands and turns by implementing a previously reported algorithm (Prof. C. Ramakrishnan et al.), in which four or more consecutive residues in the αR conformation were considered a helix and four consecutive residues in the extended conformation were taken to be a strand. The strands making up the central barrel core were identified visually, and the number and location of extra domains in these structures were tabulated. The distribution of lengths of each of these strands was then tabulated to study the variation among different members. The dihedral angles of all residues were calculated and a Ramachandran map was made to check the stereochemical features of this class. The amino acid sequences of the proteins of this class were multiply aligned to look for any local similarities between them; the alignment was done for all the proteins, in Swiss-Prot format, using the program ClustalW. The barrel region was then analysed for features such as solvent accessibility, salt bridges and geometric features such as the lengths of the major and minor axes, using programs developed in-house (coded in Fortran 77 by Prof. C. Ramakrishnan). A literature survey was done to explore the functions of these proteins.
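The run-length rule described above (consecutive residues in the right-handed helical, αR, conformation form a helix; consecutive residues in the extended conformation form a strand) can be sketched as follows. This is only an illustration, not the reported algorithm itself: the (phi, psi) boundaries in `classify`, the function names, and the reading of the strand rule as "four or more" are all assumptions, since the text does not specify them.

```python
from itertools import groupby

def classify(phi, psi):
    """Classify one residue by its backbone dihedrals (degrees).

    Returns 'R' (right-handed helical), 'E' (extended) or '-' (other).
    The region boundaries are assumed values chosen for illustration.
    """
    if -140 <= phi <= -30 and -90 <= psi <= 0:
        return "R"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "E"
    return "-"

def assign_segments(angles, min_helix=4, min_strand=4):
    """Label each residue as helix (H), strand (S) or other (-).

    A run of at least min_helix consecutive 'R' residues becomes a helix;
    a run of at least min_strand consecutive 'E' residues becomes a strand.
    """
    states = [classify(phi, psi) for phi, psi in angles]
    labels = []
    for state, run in groupby(states):
        n = len(list(run))
        if state == "R" and n >= min_helix:
            labels.append("H" * n)
        elif state == "E" and n >= min_strand:
            labels.append("S" * n)
        else:
            labels.append("-" * n)
    return "".join(labels)

# Hypothetical (phi, psi) values: a helical run, a break, a strand,
# and a helical run that is too short to count as a helix.
example = [(-60, -45)] * 5 + [(60, 45)] + [(-120, 130)] * 4 + [(-60, -45)] * 2
print(assign_segments(example))
```

Runs shorter than the threshold are left unassigned here; a fuller implementation would also need to identify turns, which this sketch does not attempt.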
Dataset for study:
A collection of proteins possessing this topology was made by screening the Brookhaven Protein Data Bank, as updated to June 2000, and all proteins with a resolution better than 3.0 Å were compiled into a database. Redundant entries, such as structures of mutants of the same protein, ligand-bound and ligand-free forms, and the same protein from different sources, were excluded from the dataset.
Results and discussion:
As can be observed from the table, although all the proteins have the same tertiary structure, they have very diverse functions. The major biochemical pathways in which these proteins have been found to play a catalytic role are:
1. Glycolytic pathway.
2. Amino acid metabolism.
3. Nucleotide metabolism.
4. Others (not belonging to the above three).
Click here for some general information on the proteins in the dataset (info on Functions, Salt Bridges and Multimeric states)
Conformational features of these proteins:
The Ramachandran plot of all the non-Gly residues in the dataset shows the rigidity of this structure. Most of the residues are in favoured conformations, lying primarily in one of the three well-defined secondary structure regions: αR, αL or extended. The helices are almost ideal, with backbone dihedral angles in the most favoured region of the Ramachandran map. Very few amino acids were in disallowed conformations, which may be due to crystallographic errors in structure determination. Indeed, other biochemical and structural studies have found that most of these proteins are rather robust: it is very difficult to disrupt their structure with denaturants, and most of them retain some structure even at high temperature and at urea concentrations sometimes as high as 6-8 M. Most of these proteins are multimeric, with well-defined interfaces which bury a substantial surface area upon multimerisation. They also bind a host of inorganic ions such as calcium, magnesium, manganese and chloride, and many cofactors such as pyridoxal 5'-phosphate and inorganic phosphate. Many of these enzymes are dependent on vitamin B5 as a cofactor to carry out catalysis. With the exception of narbonin, which is a storage protein, and a protein of unknown function, all the members of this dataset are enzymes.
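As an illustration of how the points on such a Ramachandran plot are obtained, the sketch below computes a signed dihedral angle from four atomic positions and bins a (phi, psi) pair into the αR, αL or extended region. It is a minimal sketch, not the in-house program used in the study: the function names and the region boundaries are assumptions. For residue i, phi is the dihedral of C(i-1), N(i), CA(i), C(i) and psi is the dihedral of N(i), CA(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (range -180 to 180) for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def region(phi, psi):
    """Bin a (phi, psi) pair; the boundaries are assumed, illustrative values."""
    if -140 <= phi <= -30 and -90 <= psi <= 0:
        return "aR"
    if 30 <= phi <= 100 and 0 <= psi <= 90:
        return "aL"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "extended"
    return "other"

# Four hypothetical points whose dihedral is a quarter turn.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(region(-60, -45), region(60, 45), region(-120, 130))
```

Glycine residues would be left out before tallying regions, as in the plot described above.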
Geometry of the β-barrel:
This topology is characterised by two main features: i) the presence of alternating helical and strand segments along the primary sequence, interspersed with turns and loops, and ii) the individual strands, which more often than not are eight in number, form a central, solvent-excluded parallel β-barrel.
This β-barrel exhibits a slight right-handed twist, like most other β-barrel proteins. As expected, the residues in the barrel are shielded from solvent, as evidenced by their low solvent accessibility values. The residues in the β-barrel region are primarily β-branched, with a very high frequency of occurrence of valine and isoleucine.
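A residue preference of this kind can be tallied with a few lines of code. The sketch below counts one-letter amino acid codes over a set of barrel-strand sequences and reports the fraction that is β-branched (valine, isoleucine and threonine). The function names and the example strand sequences are hypothetical and are not taken from the dataset.

```python
from collections import Counter

BETA_BRANCHED = {"V", "I", "T"}  # Val, Ile and Thr carry a branch at the beta carbon

def residue_frequencies(strand_sequences):
    """Return {residue: fraction} over all strand residues, most common first."""
    counts = Counter("".join(strand_sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}

def beta_branched_fraction(strand_sequences):
    """Fraction of strand residues that are beta-branched (0.0 if no residues)."""
    residues = "".join(strand_sequences)
    if not residues:
        return 0.0
    return sum(aa in BETA_BRANCHED for aa in residues) / len(residues)

# Hypothetical barrel-strand sequences in one-letter code.
strands = ["VIVA", "IVGV"]
print(residue_frequencies(strands))
print(beta_branched_fraction(strands))
```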
As reported earlier, the barrels of proteins in this family were found to be not strictly circular but slightly elliptical, with an average major axis of around 16.5 Å and a minor axis of around 14 Å.
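One simple way to estimate such axis lengths is sketched below. It is not the in-house Fortran program used in the study. It assumes that one backbone atom per strand has already been projected onto the plane perpendicular to the barrel axis, and that these points are spread roughly evenly around the barrel cross-section. Under that assumption the variance along each principal direction of an ellipse with semi-axes a and b is a²/2 and b²/2, so each full axis length is 2·sqrt(2·eigenvalue) of the 2×2 covariance matrix. The function name and the test points are hypothetical.

```python
import math

def ellipse_axes(points):
    """Estimate (major, minor) full axis lengths from 2D points on an ellipse.

    Uses the eigenvalues of the 2x2 covariance matrix (closed form), assuming
    the points are spread evenly around the ellipse.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    mean = (sxx + syy) / 2
    diff = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_max = mean + diff
    lam_min = max(mean - diff, 0.0)  # guard against tiny negative round-off
    return 2 * math.sqrt(2 * lam_max), 2 * math.sqrt(2 * lam_min)

# Eight hypothetical strand positions on an ellipse with semi-axes 8.25 and 7.0,
# i.e. full axes of 16.5 and 14, matching the averages quoted above.
pts = [(8.25 * math.cos(2 * math.pi * k / 8), 7.0 * math.sin(2 * math.pi * k / 8))
       for k in range(8)]
major, minor = ellipse_axes(pts)
print(round(major, 2), round(minor, 2))
```

With real coordinates the strand positions are not evenly spaced, so a least-squares ellipse fit would be the more careful choice; the covariance estimate is used here only because it is short and has a closed form.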
Sequence Similarity and evolution of α/β proteins:
One of the most intriguing features of this class of proteins is that, although they all exhibit the same tertiary fold, there is very little sequence homology between them. Hence this fold has attracted structural biologists and evolutionary researchers alike. Understanding the evolutionary history of α/β-barrel proteins is essential for understanding the relationship between amino acid sequence and three-dimensional structure, as well as for aiding the design of molecules with new functions. Two different theories exist about the origin of these proteins.
One of the strongest arguments in favour of divergent evolution from a common ancestor is the fact that nearly all known α/β-barrel proteins are enzymes. The second argument in its favour is the location of the active sites of these enzymes: despite the chemical diversity of the reactions they catalyse, the active site is always found at the C-terminal end of the β-barrel. The third and most convincing argument comes from a more detailed study of the crystal structures of the α/β-barrel enzymes. If these enzymes have diverged from one or more common ancestors, some structural patterns should be present. These patterns, if present, should enable us to construct a family tree for the α/β-barrel enzymes.
Which of the two theories is correct remains open to debate: concrete evidence that supports one while ruling out the other is not yet forthcoming, and speculation abounds. There is, however, an emerging general consensus that a definite evolutionary guiding force directed the formation and evolution of this rather non-specific fold so that it specialised in performing various critical biochemical reactions in living organisms. Although only 75 proteins are so far known to possess this fold, there may still be many proteins of unknown function that share it. This is all the more apparent from the recently reported crystal structure of a hypothetical protein from yeast, whose function is as yet unknown but which has this motif. Hence, a critical analysis of this structural fold may provide many important insights for deciphering structure-function relationships and improve our understanding of the interdependence of a protein's sequence, structure, function and evolution.