Perhaps unsurprisingly, one of my interests is accurate functional annotation of proteins. The default way to annotate new sequences, especially in a high-throughput manner, is to search for similar sequences with some form of BLAST and transfer the annotation of the best hit to your sequence of interest.
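
As a rough illustration of that best-hit transfer, here is a minimal Python sketch. It assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) and a hypothetical `subject_annotations` dictionary mapping database sequence IDs to annotations such as EC numbers. The e-value cutoff of 1e-5 is an arbitrary illustrative choice. This is the naive approach the rest of this post questions, not the DomSign method.

```python
import csv
import io

# Default 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query_id: hit}, keeping the highest-bitscore hit per query."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue  # discard weak hits (cutoff is an illustrative assumption)
        current = best.get(hit["qseqid"])
        # Prefer the higher bitscore; break ties with the lower e-value.
        if current is None or (hit["bitscore"], -hit["evalue"]) > (
            current["bitscore"], -current["evalue"]
        ):
            best[hit["qseqid"]] = hit
    return best


def transfer_annotations(tabular_text, subject_annotations, max_evalue=1e-5):
    """Copy each query's best-hit annotation (None if the subject is unannotated)."""
    return {
        query: subject_annotations.get(hit["sseqid"])
        for query, hit in best_hits(tabular_text, max_evalue).items()
    }


if __name__ == "__main__":
    blast_out = (
        "query1\tsubjA\t45.2\t300\t150\t5\t1\t300\t1\t298\t1e-40\t150.0\n"
        "query1\tsubjB\t80.1\t310\t60\t2\t1\t310\t1\t310\t1e-90\t320.0\n"
        "query2\tsubjC\t30.0\t100\t70\t3\t1\t100\t5\t104\t0.5\t25.0\n"
    )
    annotations = {"subjA": "EC 1.1.1.1", "subjB": "EC 2.7.1.1"}
    print(transfer_annotations(blast_out, annotations))
    # {'query1': 'EC 2.7.1.1'}
```

Here `query1` inherits the annotation of `subjB`, its highest-scoring hit, while `query2` gets nothing because its only hit fails the e-value cutoff. Nothing in this logic checks whether the function is actually conserved between query and hit.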

This approach has limitations. It has been shown that enzyme function is not necessarily conserved even between fairly similar sequences. We have demonstrated that each orphan enzyme (an enzyme activity with no known associated sequence) for which we find a sequence can lead to the re-annotation of hundreds of genomes.

In their paper "DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe", Wang et al. from the Tokyo Institute of Technology and Tsinghua University apply a protein-domain-based approach to try to expand our ability to predict enzyme activities for proteins.
