30 MAR 2026
RNA Secondary Structure Prediction using a Transformer Encoder
Marcel Arian Hadi, Peregrin Wahle
Predicting RNA secondary structure from nucleotide sequence is an important problem in computational biology. In this work, we frame the task as translating an RNA sequence into its dot-bracket secondary structure representation using a Transformer encoder model. We systematically evaluate multiple model configurations and training settings using both token-level and structure-aware metrics, including sequence-level exact match, paired F1 score, and structural validity. In addition, we analyze the impact of simple postprocessing strategies for correcting invalid structures and perform a downstream structure-level analysis based on the repaired dot-bracket outputs. Our findings show that Transformer encoders capture relevant structural patterns in RNA sequences, but that standard token-level objectives alone are insufficient for producing valid structures. Structure-aware evaluation exposes these failures, and lightweight postprocessing significantly improves the quality and validity of the predicted outputs.
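To make the structure-aware quantities named above concrete, the following is a minimal Python sketch of structural validity, a paired F1 score, and a simple repair step for dot-bracket strings. It is illustrative only and rests on assumptions that the text above does not fix: validity is taken to mean balanced parentheses over the alphabet `(`, `)`, `.` (no pseudoknot symbols), paired F1 is computed over sets of base-pair index tuples, and repair replaces unmatched brackets with dots. The function names `base_pairs`, `is_valid`, `repair`, and `paired_f1` are hypothetical and are not taken from this work.

```python
from typing import Set, Tuple


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Return the (i, j) base pairs of a balanced dot-bracket string.

    Raises ValueError if the string is unbalanced or contains a symbol
    other than '(', ')' or '.'.
    """
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def is_valid(structure: str) -> bool:
    """Structural validity, assumed here to mean balanced brackets."""
    try:
        base_pairs(structure)
    except ValueError:
        return False
    return True


def repair(structure: str) -> str:
    """Assumed repair rule: turn unmatched or unknown symbols into dots."""
    chars = list(structure)
    stack = []
    for i, ch in enumerate(chars):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if stack:
                stack.pop()
            else:
                chars[i] = "."  # closing bracket with no partner
        elif ch != ".":
            chars[i] = "."  # symbol outside the assumed alphabet
    for i in stack:
        chars[i] = "."  # opening brackets that were never closed
    return "".join(chars)


def paired_f1(predicted: str, reference: str) -> float:
    """F1 over base-pair sets; both inputs must be valid structures."""
    pred = base_pairs(predicted)
    ref = base_pairs(reference)
    if not pred and not ref:
        return 1.0  # convention chosen here: two unpaired structures agree
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    raw = "((..)"  # hypothetical model output with one unmatched '('
    fixed = repair(raw)
    print(is_valid(raw), fixed, is_valid(fixed))
    print(paired_f1(fixed, "((.))"))
```

Under these assumptions the repair step is conservative: it never introduces a new base pair, so it can raise validity but cannot raise recall on paired positions. Other repair rules, or a token-level definition of paired F1, are equally plausible readings of the description above.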