
Xiaomi’s new technology: convert the table in the picture into an Excel file


Lei Jun, the founder of Xiaomi, has introduced a set of table recognition algorithms developed by Xiaomi that convert tables in pictures into editable Excel files efficiently and accurately. Table recognition means converting the table structure and text in a picture into a data format that computers can process. It has wide practical value in office, business, and education scenarios, and has long been an active topic in document analysis research.

To address this problem, Xiaomi has developed a set of table recognition algorithms that extract the tables in pictures and convert them into editable Excel files. The algorithms have shipped on the Xiaomi 10S series, MIX Fold 2, and other flagship models. Users can access the feature from Album – More – Table Recognition, or swipe to enter.

Table Detection Algorithm

Xiaomi said that the table detection algorithm extracts the table area accurately from the picture and rectifies it, producing a flat table picture for the next step, table recognition.

The table recognition algorithm extracts the table structure and the text content from that picture, then combines the two to output an editable Excel table.

Table detection poses two difficulties. On the one hand, computing power and memory on a phone are limited. On the other hand, the detection results must be very precise: a table is often surrounded by other text, and an inaccurate detection degrades the recognition results that follow.

Xiaomi’s table detection algorithm detects both the table area and the four corner points of the table, then applies a perspective transformation and Xiaomi’s self-developed anti-distortion algorithm to obtain a flat picture containing only the table area. The effect is shown in the figure.
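The perspective step can be illustrated with a minimal sketch. It assumes the four detected corners are given in the order top-left, top-right, bottom-right, bottom-left, and solves for the 3x3 matrix that maps them onto an upright rectangle. The function names, the sample coordinates, and the 400x300 output size are hypothetical, and Xiaomi's anti-distortion algorithm is not described in the article, so it is not shown.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def homography(src, dst):
    """Return the 3x3 perspective matrix mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each point pair contributes two linear equations in 8 unknowns.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the perspective matrix to one point (with the projective divide)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corner detections for a tilted table in a photo.
corners = [(120, 80), (520, 110), (560, 400), (90, 360)]
# Upright 400x300 rectangle the table should be flattened onto.
target = [(0, 0), (400, 0), (400, 300), (0, 300)]

H = homography(corners, target)
for (x, y) in corners:
    print((x, y), "->", warp_point(H, x, y))
```

In practice an image library would apply the same matrix to every pixel; the sketch only maps points to keep the idea visible.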

Because the algorithm runs on the phone, both running speed and model size matter. Xiaomi adopts a very lightweight one-stage detection framework with ShuffleNetV2 as the backbone.

The model regresses the key points while detecting the table box, which facilitates perspective correction of the table, and uses Wing loss instead of L1 loss to make the key point regression more accurate.
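To show why Wing loss can help key point regression, here is a minimal sketch of the published Wing loss formula next to plain L1 loss. Wing loss is logarithmic for small errors, which amplifies their gradient, and behaves like L1 for large ones. The defaults `w=10.0` and `eps=2.0` are assumptions taken from common usage; the article does not state the values Xiaomi uses.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error over the coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def wing_loss(pred, target, w=10.0, eps=2.0):
    """Mean Wing loss: w*ln(1 + |x|/eps) if |x| < w, else |x| - c.

    w and eps are assumed defaults, not values confirmed by the article.
    """
    # c makes the two pieces meet continuously at |x| == w.
    c = w - w * math.log(1.0 + w / eps)
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += w * math.log(1.0 + d / eps) if d < w else d - c
    return total / len(pred)


# A small corner error is penalized more strongly than under L1.
print(l1_loss([0.5], [0.0]), wing_loss([0.5], [0.0]))
```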

On the data side, Xiaomi uses the algorithm to mine a large amount of table detection data from public data at low cost, which significantly improves detection quality. The final model is about 1 MB in size and runs smoothly on Xiaomi phones.

Table Recognition Algorithm

The table recognition algorithm runs on the server side and contains the following main modules: text detection, text recognition, table structure prediction, cell matching, alignment algorithm, and Excel export.
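The cell matching module can be pictured with a minimal sketch: each recognized text box is assigned to the cell whose rectangle contains the box's center, and the result is a grid of strings ready for export. The data layout (`row`, `col`, `box` keys and `(x0, y0, x1, y1)` boxes) and the center-point rule are assumptions for illustration; the article does not describe how Xiaomi's matching actually works.

```python
def center(box):
    """Center of an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def match_text_to_cells(cells, texts):
    """Place each text into the cell that contains its center point.

    cells: list of {"row": int, "col": int, "box": (x0, y0, x1, y1)}
    texts: list of ((x0, y0, x1, y1), string) from text recognition
    """
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    # Visit texts top to bottom, then left to right, so that several
    # lines landing in one cell are joined in reading order.
    ordered = sorted(texts, key=lambda t: (center(t[0])[1], center(t[0])[0]))
    for box, string in ordered:
        cx, cy = center(box)
        for c in cells:
            x0, y0, x1, y1 = c["box"]
            if x0 <= cx < x1 and y0 <= cy < y1:
                r, k = c["row"], c["col"]
                grid[r][k] = (grid[r][k] + " " + string).strip()
                break  # text outside every cell is simply dropped
    return grid


# Hypothetical 2x2 table structure and OCR output.
cells = [
    {"row": 0, "col": 0, "box": (0, 0, 100, 40)},
    {"row": 0, "col": 1, "box": (100, 0, 200, 40)},
    {"row": 1, "col": 0, "box": (0, 40, 100, 80)},
    {"row": 1, "col": 1, "box": (100, 40, 200, 80)},
]
texts = [
    ((110, 48, 160, 70), "42"),
    ((10, 8, 60, 30), "Item"),
    ((112, 6, 170, 32), "Qty"),
    ((8, 46, 70, 72), "Pens"),
    ((300, 300, 320, 320), "stray"),
]
grid = match_text_to_cells(cells, texts)
print(grid)
```

The resulting grid of strings is the kind of structure a spreadsheet library could write out row by row in the Excel export step.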

The current mainstream approach represents a table as HTML and trains a model to predict the HTML tag sequence together with the corresponding cell coordinates.

This method has achieved good results on open-source datasets, and Ping An Technology and Baidu have also adopted it, but the large number of HTML tags makes table structure recognition error-prone.
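To make the HTML representation concrete, the sketch below serializes a table structure into a tag sequence of the kind such a model would have to predict. The token scheme, in which spanning cells are split into `<td`, an attribute token, and `>`, is an assumption modeled on common public table datasets. It is not Xiaomi's four-tag encoding, which the article does not detail.

```python
def table_to_html_tokens(rows):
    """Serialize a table structure into HTML structure tokens.

    rows: list of rows; each cell is a (rowspan, colspan) tuple.
    The token scheme is an illustrative assumption.
    """
    tokens = []
    for row in rows:
        tokens.append("<tr>")
        for rowspan, colspan in row:
            if rowspan == 1 and colspan == 1:
                tokens.append("<td>")
            else:
                # Spanning cells need extra attribute tokens.
                tokens.append("<td")
                if rowspan > 1:
                    tokens.append(' rowspan="%d"' % rowspan)
                if colspan > 1:
                    tokens.append(' colspan="%d"' % colspan)
                tokens.append(">")
            tokens.append("</td>")
        tokens.append("</tr>")
    return tokens


# A header cell spanning two columns above two ordinary cells.
header_spanning = [[(1, 2)], [(1, 1), (1, 1)]]
tokens = table_to_html_tokens(header_spanning)
print("".join(tokens))
print(len(tokens), "tokens,", len(set(tokens)), "distinct")
```

Even this three-cell table needs a dozen tokens, and every distinct span value adds another token type, which hints at why long tag sequences are easy to get wrong.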

To address this shortcoming, Xiaomi adopts a new encoding method for tables that can represent tables of arbitrary structure with only four tags, greatly improving the accuracy of table structure recognition.

During deployment, table recognition is accelerated with the FasterTransformer inference framework. Xiaomi says this improves inference speed by about 20 times, significantly improving the user experience.

Summary

The algorithm extracts tables from images efficiently and easily, greatly improving office efficiency. Xiaomi said its engineers will continue to improve the recognition of document images on Xiaomi phones.
