
A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction

Lei Kang*, Xuanshuo Fu, Mohamed Ali Souibgui, Andrey Barsky, Lluis Gomez, Javier Vazquez-Corral, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas

*Corresponding author for this work

Research output: Contribution to journal, Article, Research, Peer-reviewed

Abstract

Grid-structured visual data such as forms, tables, and game boards require models that pair pixel-level perception with symbolic consistency under global constraints. Recent Pixel Language Models (PLMs) map images to token sequences with promising flexibility, yet we find they generalize poorly when observable evidence becomes sparse or corrupted. We present GridMNIST-Sudoku, a benchmark that renders large numbers of Sudoku instances with style-diverse handwritten digits and provides parameterized stress tracks for two tasks: Completion (predict missing cells) and Correction (detect and repair incorrect cells), across difficulty levels ranging from 1 to 90 altered positions in a 9 × 9 grid. Attention diagnostics on PLMs trained with conventional one-dimensional positional encodings reveal weak structure awareness outside the natural Sudoku sparsity band. Motivated by these findings, we propose a lightweight Row-Column-Box (RCB) positional prior that injects grid-aligned coordinates, and we combine it with simple sparsity and corruption augmentations. Trained only on the natural distribution, the resulting model substantially improves out-of-distribution accuracy across wide sparsity and corruption ranges while maintaining strong in-distribution performance.
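To make the idea of grid-aligned coordinates concrete, the following is a minimal sketch of how a row, column, and box index could be derived for each cell of a 9 × 9 Sudoku grid read in row-major order. The function name `rcb_coordinates` and its parameters are hypothetical, and the abstract does not say how the RCB prior actually encodes or injects these coordinates into the model; this only illustrates the indexing.

```python
def rcb_coordinates(cell_index, size=9, box=3):
    """Return (row, column, box) indices for a cell given in row-major order.

    Hypothetical helper for illustration only; not the paper's implementation.
    """
    if not 0 <= cell_index < size * size:
        raise ValueError("cell_index out of range")
    # Row and column follow directly from row-major ordering.
    row, col = divmod(cell_index, size)
    # Boxes are numbered left to right, top to bottom.
    box_id = (row // box) * (size // box) + (col // box)
    return row, col, box_id


# One (row, column, box) triple per cell of the 9 x 9 grid.
coords = [rcb_coordinates(i) for i in range(81)]
```

Under this sketch, a one-dimensional position such as 12 maps to row 1, column 3, box 1, so cells that share a Sudoku constraint share at least one coordinate, which a flat sequence index does not expose.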

Original language: English
Article number: 2851
Number of pages: 14
Journal: Mathematics
Volume: 13
Issue number: 17
Publication status: Published - 4 Sep 2025
