This database contains information on the manual curation of 1052 FlyBase identifiers, which are putative site-specific transcription factors, based on FlyBase/Gene Ontology annotation or the DBD Transcription Factor Database.
Authors: Boris Adryan and Sarah A. Teichmann
Although the published sequence of the fly genome has been available for many years, it is still difficult to state the exact number of site-specific transcription factors (TFs). The major obstacles to counting these factors lie in the experimental and computational identification of TFs, and in the lack of a unifying resource for the community. We sought to address this challenge by combining powerful computational methods with vigorous manual curation and literature study. The results of this effort can be seen on www.FlyTF.org.
We first seeded a list of proteins that were putative transcription factors, identified by one or both of the two following approaches:
GO. Transcription factors were identified using the Gene Ontology annotation (September 2005) from FlyBase. We counted as candidate TFs those FlyBase identifiers (FBgn) that had the GO annotations described in the Supplementary Material.
DBD. A long-standing interest of our group is the structural identification of transcription factors (for background, see DBD, the DNA Binding Domain Transcription Factor Database, Kummerfeld & Teichmann, 2006). This resource detects TFs using hidden Markov models of a manually curated list of structurally determined DNA binding domains. An evaluation on a large set of annotated protein sequences showed that this strategy is extremely accurate (97% correct) and has good coverage (65% identification rate). We used DBD v1.2 for this analysis, which yielded 591 candidates.
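The two benchmark figures quoted for DBD can be read as a precision-like and a recall-like measure. The sketch below shows how such figures are computed from counts; treating "97% correct" as precision and "65% identification rate" as recall is our assumption, and the counts are invented purely so that the arithmetic reproduces those percentages. They are not the counts from the DBD evaluation.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a set of predictions.

    precision: fraction of predicted TFs that are real TFs
    recall:    fraction of real TFs that were predicted
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts, chosen only for illustration (not DBD benchmark data).
precision, recall = precision_recall(true_pos=650, false_pos=20, false_neg=350)
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

A method with these characteristics rarely calls a non-TF a TF, but misses roughly a third of real TFs, which is one reason for combining it with a second, independent source of candidates.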
There was generally a good overlap between the two approaches. In total, 1052 proteins were primary candidate TFs.
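Seeding the candidate list from "one or both" approaches amounts to taking the union of two identifier sets, with the overlap being their intersection. A minimal sketch follows; the FBgn identifiers are made-up placeholders and the function name `seed_candidates` is hypothetical, not part of any FlyTF tooling.

```python
def seed_candidates(go_ids, dbd_ids):
    """Combine two candidate sets and record which approach found each ID."""
    combined = go_ids | dbd_ids  # union: found by GO, by DBD, or by both
    return {
        fbgn: {"GO": fbgn in go_ids, "DBD": fbgn in dbd_ids}
        for fbgn in sorted(combined)
    }


# Placeholder identifiers for illustration only.
go_candidates = {"FBgn9990001", "FBgn9990002", "FBgn9990003"}
dbd_candidates = {"FBgn9990002", "FBgn9990003", "FBgn9990004"}

seeded = seed_candidates(go_candidates, dbd_candidates)
overlap = go_candidates & dbd_candidates  # found by both approaches

print(len(seeded), "candidates in total,", len(overlap), "found by both")
```

Keeping the source of each candidate alongside the identifier makes it possible to treat candidates found by only one approach differently during curation.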
Reading & Verdict.
We focused on two separate aspects of the molecular function of transcription factors: DNA binding and transcription regulatory properties. We assessed the evidence for these two properties of each candidate transcription factor using FlyBase annotation (mostly the sections on Gene Ontology and References), which included a review of the evidence stated for the GO annotation. Assignments to GO were not treated as evidence as such, but rather as pointers to the literature. Assignments based on sequence similarity or electronic annotation were accepted only if the carefully benchmarked predictions (DBD) or experimental evidence from the literature were in favour of the annotation. Additional information was retrieved from PubMed or the iHOP search tool. Curated data from Casey Bergman's Drosophila DNase I Footprint Database (v2.0) at www.flyreg.org and from the data-mining project FlyMine were included in our annotation.
We documented the DNA binding property particularly carefully and consistently. The evidence for transcriptional regulatory activity, on the other hand, was not curated exhaustively for the more obvious cases. For TFs predicted only by DBD and lacking further experimental data, the transcriptional regulatory activity was inferred from the structural assignment to a commonly known transcription factor DNA binding domain; these are annotated as 'maybe' for the DNA binding property and 'yes' for the putative short-range TF property. The C2H2 zinc finger domain family is an exception, since there is currently no known method for reliably assigning its members to DNA binding, RNA binding or protein interaction. These proteins are annotated as 'maybe' for the DNA binding property, but no verdict is reached on their potential role in transcription.
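The default rule for candidates predicted only by DBD can be summarised as a small decision function. This is a sketch of the rule as described above, not the curators' actual procedure; the function name, parameter names and the domain family strings are illustrative assumptions.

```python
from typing import Optional, Tuple

# (DNA binding verdict, putative short-range TF verdict); None = no verdict.
Annotation = Tuple[str, Optional[str]]


def default_annotation(
    dbd_only: bool, has_experimental_data: bool, domain_family: str
) -> Optional[Annotation]:
    """Return the default annotation, or None if manual curation applies."""
    if not dbd_only or has_experimental_data:
        # Evidence from GO or the literature exists: curate manually instead.
        return None
    if domain_family == "C2H2 zinc finger":
        # Exception: DNA vs. RNA binding vs. protein interaction is unclear,
        # so no verdict is reached on the role in transcription.
        return ("maybe", None)
    # Known TF DNA binding domain: regulatory role inferred from structure.
    return ("maybe", "yes")


print(default_annotation(True, False, "homeobox"))
print(default_annotation(True, False, "C2H2 zinc finger"))
```

Returning `None` for the whole annotation when other evidence exists keeps the default rule separate from the literature-based verdicts, which cannot be derived mechanically.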
Database search tips
The entire set of the 1052 originally seeded candidate TFs is available on www.FlyTF.org. Researchers can browse the catalogues (all candidates, putative TFs, definitive TFs, not a TF) and filter by the criteria that matter most to them.
Quick guide to the Advanced Search tool
You can query our database for a variety of parameters, depending on what you expect from the candidate TFs. A few example queries should make the use clear:
(1) all candidates
(2) putative and known site-specific TFs
(3) well supported site-specific TFs
(4) not a TF in our sense
After you have entered your criteria, hit 'Please find appropriate TFs' and wait for the 'list view', which will appear updated in a separate window.
After querying the database, the list will contain summary information on your candidate TFs. The number of genes (FlyBase identifiers, FBgn) is displayed at the top of the list. Further information includes: FBid, Symbol/Name, Synonym, Architecture (as identified by DBD), Curator's verdict, GO terms (only those related to DNA binding or transcription), and TF DB (presence in TF databases). Clicking the FBid of a candidate TF opens a detailed compilation of data on this protein.
This is a compilation of data on the candidate TF. Most of the information from the list view is repeated. For your convenience, we provide the sequences of the proteins encoded by the gene. A click on the FlyBase ID opens the FlyBase Web site in a separate window. The 'meaningful phrase' is the evidence that we documented for the call on the DNA binding capabilities. A click on the 'P' opens the corresponding abstract on PubMed.
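For readers who prefer to think of the list view as data, each row can be modelled as a record with the fields named above, and a catalogue as a filter on the curator's verdict. The sketch below is purely illustrative: the class, field names, verdict strings and example rows are assumptions and do not reflect the FlyTF database schema or any export format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ListViewRow:
    fbid: str           # FlyBase identifier (FBgn)
    symbol: str         # Symbol/Name
    synonym: str
    architecture: str   # domain architecture as identified by DBD
    verdict: str        # curator's verdict
    go_terms: List[str] = field(default_factory=list)
    tf_db: List[str] = field(default_factory=list)  # presence in TF databases


def filter_by_verdict(rows, wanted):
    """Keep only the rows whose verdict is in the `wanted` collection."""
    return [row for row in rows if row.verdict in wanted]


# Made-up rows for illustration only.
rows = [
    ListViewRow("FBgn9990001", "geneA", "", "homeobox", "definitive TF",
                go_terms=["DNA binding"]),
    ListViewRow("FBgn9990002", "geneB", "", "C2H2 zinc finger", "putative TF"),
    ListViewRow("FBgn9990003", "geneC", "", "", "not a TF"),
]

hits = filter_by_verdict(rows, {"putative TF", "definitive TF"})
print(len(hits), "genes:", [row.fbid for row in hits])
```

Printing the number of hits first mirrors the list view, where the gene count appears at the top of the list.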
The GO annotation as it appears in FlyBase is shown as full GO term.
The colour code represents the degree of evidence:
The single letters are links to the respective references in 'G' (GO), 'F' (FlyBase), or 'P' (PubMed).