Perhaps unsurprisingly, one of my interests is functional, accurate protein annotations. The default way to annotate new sequences, especially in a high-throughput manner, is to run some form of BLAST and transfer the annotation from the best-scoring hit to your sequence of interest.
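To make the "best hit" approach concrete, here is a minimal sketch of annotation transfer from BLAST tabular output. This is illustrative only, not a pipeline from the post: the annotation lookup table and the identity cutoff are assumptions, and the input follows BLAST's standard 12-column tabular format (`-outfmt 6`).

```python
# Hedged sketch of best-BLAST-hit annotation transfer.
# Assumes BLAST tabular output (-outfmt 6): query, subject, %identity, ...,
# with bitscore in the last column. The annotation table is hypothetical.

def best_hit_annotation(blast_tabular, annotations, min_identity=40.0):
    """Pick the highest-bitscore hit per query and copy its annotation."""
    best = {}  # query id -> (bitscore, subject id)
    for line in blast_tabular.strip().splitlines():
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        bitscore = float(fields[11])
        if identity < min_identity:
            continue  # skip hits below the (assumed) identity cutoff
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    # Transfer the subject's annotation; fall back to a generic label
    return {q: annotations.get(s, "hypothetical protein")
            for q, (_, s) in best.items()}

# Toy example: two hits for one query; the higher-bitscore hit wins.
hits = (
    "q1\tP12345\t92.1\t200\t15\t1\t1\t200\t1\t200\t1e-80\t240.0\n"
    "q1\tP99999\t45.0\t180\t90\t3\t5\t180\t2\t185\t1e-20\t90.5\n"
)
known = {"P12345": "EC 1.1.1.1 alcohol dehydrogenase"}
print(best_hit_annotation(hits, known))
# → {'q1': 'EC 1.1.1.1 alcohol dehydrogenase'}
```

The failure mode discussed below follows directly from this design: the annotation is only as good as the best hit, even when function has diverged.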

There are some limitations to this approach. It's been shown that enzyme function is not necessarily conserved even between fairly similar sequences, and we've demonstrated that each orphan enzyme we find a sequence for can lead to the re-annotation of hundreds of genomes.

In their paper DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe, Wang et al. from the Tokyo Institute of Technology and Tsinghua University apply a protein-domain-based approach to try to expand our ability to predict enzyme activities for proteins.


Scientific publishing has changed quite a bit in the last decade, and especially in the last five years. Starting with open access efforts (notably the PLOS journals, which launched when I was in grad school and have driven a revolution in publishing) and continuing in venues like Retraction Watch and PubPeer, the movement toward making science open, clear, honest, and above all accurate is powerful, and completely unlike the environment when I started doing research.

Retraction Watch put up a guest post today by Drummond Rennie and C.K. Gunsalus titled If you think it's rude to ask to look at your co-author's data, you're not doing science. In it they discuss how a couple of recent (and slightly less recent) high-profile scientific frauds unraveled, and how senior authors failed to do their part to validate the data and really know what was going on. They also provide a truly helpful breakdown of approaches to make sure everyone knows what work is being done, that the work is actually being done (and funded, and so forth), and that everyone is credited properly.

If you’re doing, well, science, or any kind of collaboration at all, I recommend this genuinely helpful piece.