A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of antiparallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them well suited to protein interaction. Although alpha-rods most likely originated by tandem duplication of a two-helix unit, their repeats are poorly detected by methods based on sequence similarity between repeats. Here, we show that alpha-rod repeats can be detected using a neural network. The network detects more repeats than domain databases that use multiple profiles, with a low false-positive rate (<10%). We identify alpha-rod repeats in approximately 0.4% of proteins in eukaryotic genomes. We then investigate the results for all human proteins, identifying alpha-rod repeats for the first time in six protein families, including STAG1-3, SERAC1, PSMD1, PSMD2, and PSMD5. We also characterize a short version of these repeats in eight protein families from archaeal, bacterial, and fungal species. Finally, we demonstrate the utility of these predictions in directing experimental work to demarcate three alpha-rods in huntingtin, the protein mutated in Huntington's disease. Using yeast two-hybrid analysis and an immunoprecipitation technique, we show that huntingtin fragments containing alpha-rods associate with each other. This is the first definition of domains in huntingtin and the first validation of predicted interactions between huntingtin fragments, and it points the way toward the functional characterization of this protein. An implementation of the repeat detection algorithm is available as a Web server with a simple graphical output, which can be further visualized using BiasViz, a graphical tool for representing multiple sequence alignments.

Original publication

Journal article
PLoS Comput Biol

Keywords: Algorithms; Amino Acid Sequence; Binding Sites; Computer Simulation; Models, Chemical; Models, Molecular; Molecular Sequence Data; Nerve Tissue Proteins; Neural Networks (Computer); Nuclear Proteins; Pattern Recognition, Automated; Protein Binding; Repetitive Sequences, Amino Acid; Sequence Analysis, Protein