Our lab is excited to share the work led by William Harrigan, titled “Beyond Annotations: Annotating Viral Genotypes Using Large Language Models,” which was presented at the 19th International Society of Microbial Ecology conference (ISME19) in Cape Town, South Africa, this past week.

This research uses large language models, similar to those that power ChatGPT and a number of popular protein analysis tools, to better understand viral genomes and their ecological impacts.

One innovative aspect of this research is that it treats a virus as a market basket. In market basket analysis, people study which products are purchased together to understand consumer behavior. For example, market basket analysis might reveal that whenever you buy bread, you also buy butter, or that when you buy nachos, you also buy salsa. Similarly, we view each virus as a collection of proteins and seek to understand which proteins frequently appear together, and under which specific conditions. While it’s easy to identify the products (say, butter) in a shopping cart, the function of the vast majority of viral proteins is unknown. This is where our second innovation comes in: protein language models help us identify which proteins share the same function and are, therefore, functionally related.
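To make the market basket analogy concrete, here is a minimal Python sketch of a generic pairwise association-rule calculation. It is an illustration only, not the pipeline used in this study. It assumes each virus has already been reduced to a set of protein cluster IDs, for example by grouping proteins that a protein language model places together by function. The cluster IDs, the function name `pair_rules`, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Score co-occurring item pairs across baskets (market basket analysis).

    baskets: list of sets; each set holds the (hypothetical) protein cluster
    IDs found in one virus.
    Returns {(a, b): (support, confidence, lift)} for every pair whose support
    (fraction of baskets containing both items) is at least min_support.
    """
    n = len(baskets)
    item_counts = Counter()  # number of baskets containing each item
    pair_counts = Counter()  # number of baskets containing each item pair
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        confidence = count / item_counts[a]       # estimate of P(b | a)
        lift = confidence / (item_counts[b] / n)  # P(a, b) / (P(a) * P(b))
        rules[(a, b)] = (support, confidence, lift)
    return rules


# Hypothetical example: four viruses, each represented as a set of cluster IDs.
viruses = [
    {"pc1", "pc2", "pc3"},
    {"pc1", "pc2"},
    {"pc1", "pc2", "pc4"},
    {"pc3", "pc4"},
]
rules = pair_rules(viruses, min_support=0.5)
print(rules)
```

In this toy data, `pc1` and `pc2` appear together in three of the four viruses, so only that pair clears the support threshold, and its lift above 1 means the two co-occur more often than chance alone would predict. Conditions such as sampling environment could, under the same assumptions, be added as extra items in each basket.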

Although Will was unable to attend the conference, our collaborator, Dr. Eric Wommack, graciously presented our research. We’re thrilled that this work was showcased on such a significant platform.

Dr. Wommack presenting our work at ISME19