RNA Secondary Structure Prediction using a Transformer Encoder
30 MAR 2026
Marcel Arian Hadi, Peregrin Wahle
course paper
Posted: 25 MAY 2026
Abstract
Predicting RNA secondary structure from nucleotide sequence is an important problem in computational biology. In this work, we study the task of translating RNA sequences into dot-bracket secondary structure representations using a Transformer encoder model. We systematically evaluate multiple model configurations and training settings, and assess their performance using both token-level and structure-aware metrics, including sequence-level exact match, paired F1 score, and structural validity. In addition, we analyze the impact of simple postprocessing strategies for correcting invalid structures and perform a downstream structure-level analysis based on the repaired dot-bracket outputs. Our findings show that Transformer encoders capture relevant structural patterns in RNA sequences, but that standard token-level objectives alone are insufficient for producing valid structures. Structure-aware evaluation exposes these failures, and lightweight postprocessing substantially improves the validity and quality of the predicted outputs.
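To make the structure-aware metrics and the postprocessing idea concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that structures use only the characters `(`, `)`, and `.` (no pseudoknot brackets), that paired F1 is computed over exact (i, j) base pairs, and that repair replaces every unmatched bracket with `.`. The function names `extract_pairs`, `is_valid`, `repair`, and `paired_f1` are hypothetical and chosen for illustration.

```python
from typing import List, Set, Tuple


def extract_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Return the (i, j) index pairs of matched brackets; unmatched ones are ignored."""
    stack: List[int] = []
    pairs: Set[Tuple[int, int]] = set()
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")" and stack:
            pairs.add((stack.pop(), idx))
    return pairs


def is_valid(structure: str) -> bool:
    """Structural validity: only '(', ')', '.' and every bracket is balanced."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no open partner
                return False
        elif ch != ".":
            return False
    return depth == 0


def repair(structure: str) -> str:
    """Assumed repair strategy: turn every unmatched bracket into '.'."""
    chars = list(structure)
    stack: List[int] = []
    for idx, ch in enumerate(chars):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if stack:
                stack.pop()
            else:
                chars[idx] = "."  # unmatched closing bracket
    for idx in stack:
        chars[idx] = "."  # unmatched opening brackets left on the stack
    return "".join(chars)


def paired_f1(pred: str, gold: str) -> float:
    """F1 over base pairs, assuming a pair counts only if both indices match."""
    p, g = extract_pairs(pred), extract_pairs(gold)
    if not p and not g:
        return 1.0  # both structures are fully unpaired
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction such as `(()` fails `is_valid`, while `repair` maps it to `.()`, which is balanced. This sketch only removes unmatched brackets; it does not attempt to recover the pairs the model may have intended.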