An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension

Liangcheng Li, Feiyu Gao, Jiajun Bu, Yongpan Wang, Zhi Yu, Qi Zheng

Abstract


Nowadays, rich descriptions on detail images help users learn more about commodities. With OCR technology, the description text can be detected and recognized as auxiliary information that removes comprehension barriers for visually impaired users. However, because these OCR text blocks lack a proper logical structure, it is challenging to comprehend the detail images accurately. To tackle this problem, we propose a novel end-to-end OCR text reorganizing model. Specifically, we build a Graph Neural Network with an attention map to encode the text blocks together with visual layout features; an attention-based sequence decoder inspired by the Pointer Network, combined with Sinkhorn global optimization, then reorders the OCR text into a proper sequence. Experimental results show that our model outperforms the baselines, and a real-world study of blind users' experience shows that our model improves their comprehension.
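
As a rough illustration of the Sinkhorn step mentioned in the abstract, the sketch below applies generic Sinkhorn normalization to a small score matrix. It is not the authors' implementation: the function names, the `temperature` parameter, the iteration count, the example scores, and the reading of entry (i, j) as "text block i goes to reading position j" are all assumptions made for illustration.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Alternately normalize rows and columns of exp(scores / temperature).

    The result is approximately doubly stochastic, i.e. a soft permutation.
    Hypothetical helper for illustration, not taken from the paper.
    """
    n = len(scores)
    log_p = [[s / temperature for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization in log space.
        for i in range(n):
            z = _logsumexp(log_p[i])
            log_p[i] = [v - z for v in log_p[i]]
        # Column normalization in log space.
        for j in range(n):
            z = _logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= z
    return [[math.exp(v) for v in row] for row in log_p]


# Made-up scores: row = text block, column = candidate reading position.
scores = [
    [2.0, 0.1, 0.3],
    [0.2, 0.1, 1.5],
    [0.4, 1.8, 0.2],
]
soft_perm = sinkhorn(scores)
for row in soft_perm:
    print([round(p, 3) for p in row])
```

Each row and each column of `soft_perm` sums to roughly 1, so the matrix can be read as a relaxed assignment of blocks to positions that takes all blocks into account at once, rather than choosing each position independently.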
