Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.

Original publication

DOI

10.1038/76069

Type

Journal article

Journal

Nat Genet

Publication Date

06/2000

Volume

25

Pages

201–204

Keywords

Amino Acid Sequence; Automation; Catalytic Domain; Cloning, Molecular; Computational Biology; Databases, Factual; Expressed Sequence Tags; Genome, Human; Humans; Internet; Molecular Sequence Data; Monomeric GTP-Binding Proteins; Protein Structure, Tertiary; Proteins; Sequence Alignment; Sequence Homology, Amino Acid; Signal Transduction; Software