Improvements in viral gene annotation using large language models and soft alignments

We’re excited to share our novel method, which uses Large Language Models (LLMs) to tackle the long-standing challenge of annotating viral protein sequences. Our soft alignment algorithm, based on embedding similarity at the amino acid level, outperforms traditional methods in both efficiency and interpretability.
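To make the idea of residue-level embedding similarity concrete, here is a minimal sketch rather than our actual implementation. It assumes each amino acid already has an embedding vector (random toy vectors stand in for LLM output here), scores every query/target residue pair by cosine similarity, and keeps only mutual best matches. The function names, the mutual-best-match rule, and the `min_score` threshold are all illustrative assumptions.

```python
import numpy as np


def cosine_similarity_matrix(query_emb, target_emb):
    """Cosine similarity between every query residue and every target residue."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    return q @ t.T  # shape: (len(query), len(target))


def mutual_best_matches(sim, min_score=0.5):
    """Keep residue pairs that are each other's best match (an assumed rule)."""
    best_target = sim.argmax(axis=1)  # best target residue for each query residue
    best_query = sim.argmax(axis=0)   # best query residue for each target residue
    pairs = []
    for i, j in enumerate(best_target):
        if best_query[j] == i and sim[i, j] >= min_score:
            pairs.append((i, int(j), float(sim[i, j])))
    return pairs


# Toy data: the "query" is a noisy copy of target residues 1..4.
# Real per-residue embeddings would come from a protein language model.
rng = np.random.default_rng(0)
target_emb = rng.normal(size=(6, 16))
query_emb = target_emb[1:5] + 0.01 * rng.normal(size=(4, 16))

pairs = mutual_best_matches(cosine_similarity_matrix(query_emb, target_emb))
for i, j, score in pairs:
    print(f"query residue {i} <-> target residue {j} (similarity {score:.3f})")
```

Because every pair carries its own similarity score, each aligned position can be inspected individually, which is where the interpretability comes from.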

Our new approach provides transparent, BLAST-like visualizations, making it easy to trace homologous amino acids. This advancement showcases the potential of LLMs in viral genomics, paving the way for more accurate protein function inference.
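As a rough illustration of what a BLAST-like view of aligned residues can look like, the sketch below renders a list of matched residue pairs as three text lines: query, midline, and target. It is a hypothetical renderer, not our visualization code. The function name, the example sequences, and the midline convention (the residue letter for an identical pair, `+` for a matched pair with different residues) are assumptions made for the example.

```python
def render_alignment(query, target, pairs):
    """Render matched (query_index, target_index) pairs as three text lines.

    `pairs` must be sorted and strictly increasing in both indices.
    Unmatched residues between pairs are shown against '-' gaps.
    """
    q_line, m_line, t_line = [], [], []
    qi = tj = 0
    for i, j in pairs:
        # Unmatched query residues before this pair face a gap in the target.
        while qi < i:
            q_line.append(query[qi]); m_line.append(" "); t_line.append("-")
            qi += 1
        # Unmatched target residues before this pair face a gap in the query.
        while tj < j:
            q_line.append("-"); m_line.append(" "); t_line.append(target[tj])
            tj += 1
        q_line.append(query[i])
        t_line.append(target[j])
        # Midline: the letter itself for identity, '+' for a differing matched pair.
        m_line.append(query[i] if query[i] == target[j] else "+")
        qi, tj = i + 1, j + 1
    return "\n".join("".join(line) for line in (q_line, m_line, t_line))


# Hypothetical sequences and matches, for illustration only.
view = render_alignment("MKTA", "GMRTAL", [(0, 1), (1, 2), (2, 3), (3, 4)])
print(view)
```

Reading down any column shows which query residue was matched to which target residue, so a proposed homology can be traced position by position.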