Exceptional recognition performance! The new world of digitalization made possible by using highly accurate AI OCR to read forms


In recent years, optical character recognition (OCR) has been attracting attention as part of efforts to improve operational efficiency with software robots, an approach known as robotic process automation (RPA). Conventional OCR requires a scanner (hardware) and specialized software, and it works only on forms designed for those devices and that software, which has limited how widely it can be used. It has also struggled technically with open-ended comments and hand-written corrections, both of which are common on real forms. As a result, OCR has seen only limited use in day-to-day operations, and the data on paper forms has been difficult to put to use. AI OCR is increasingly addressing these challenges by applying machine learning to recognize text accurately across a wide range of forms. In this issue, we explain the technologies behind AI OCR, introduce our AI OCR Service, which supports customers’ business growth by drawing on the experience and system expertise we have built over many years of providing services to government and private-sector customers, and discuss how data can be leveraged in the future.

How AI OCR differs from conventional OCR

Companies work with various kinds of forms. These include everything from billing statements sent to or from other companies to forms used in internal administrative processes. The data in those forms may be printed or hand-written. It could include symbols, barcodes, or even QR codes. OCR can be effective in digitalizing this data.

OCR has evolved over the years, recognizing characters more accurately and supporting more form formats, which has widened the range of operations and usage scenarios where it can be applied. However, conventional OCR technologies still come with usage restrictions, and they have been limited both in handling the variety of formats that operations require and in further improving character recognition accuracy.

That is where AI OCR came in. Applying AI technology to OCR made it possible to go beyond the limitations of conventional OCR to handle all kinds of formats and produce even more accurate character recognition results.

Conventional OCR and AI OCR recognize characters in different ways. With conventional OCR, a program (the recognition engine) follows predefined rules to locate character positions or boundaries and then reads the characters one at a time. With AI OCR, by contrast, an entire block of what appears to be text is read at once, and an AI identifies the characters in it. Because the recognition window is shifted a little at a time as it reads, characters that are sloppily written and overlap, or that are joined to one another, can still be read with a high level of accuracy (Fig. 1).
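To make the sliding-window idea concrete, here is a minimal Python sketch, not Toshiba’s implementation. It assumes a text line is given as a 2D list of pixels and that some trained classifier (passed in as the hypothetical `classify` callable) labels each window with a character or a blank. Neighbouring windows that see the same character produce repeated labels, so the sketch merges repeats and drops blanks; this CTC-style greedy decoding step is our own substitution for illustration.

```python
from typing import Callable, List, Sequence

# A text-line image: rows of pixel values (1 = ink, 0 = background).
Image = List[List[int]]

# Label the (hypothetical) classifier returns for a window with no character in it.
BLANK = ""


def sliding_windows(image: Image, width: int, stride: int) -> List[Image]:
    """Cut a text-line image into overlapping column windows, left to right."""
    n_cols = len(image[0])
    last_start = max(n_cols - width, 0)
    return [
        [row[start:start + width] for row in image]
        for start in range(0, last_start + 1, stride)
    ]


def greedy_decode(labels: Sequence[str]) -> str:
    """Merge repeated labels from neighbouring windows and drop blanks.

    A blank between two identical labels keeps them as two characters,
    so "A", "", "A" decodes to "AA" while "A", "A" decodes to "A".
    """
    out: List[str] = []
    prev = None
    for label in labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


def recognize_line(image: Image, classify: Callable[[Image], str],
                   width: int = 4, stride: int = 1) -> str:
    """Slide a window across the line, classify each window, then decode."""
    labels = [classify(w) for w in sliding_windows(image, width, stride)]
    return greedy_decode(labels)
```

In a real engine the classifier would be a trained neural network; here any function from a window to a label works, which keeps the windowing and decoding steps visible.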

Toshiba has provided OCR-based solutions to a wide range of customers over the years. Through this work, we have developed and refined OCR-related technologies, experience, and expertise, alongside AI technologies that are the fruit of years of research and development. The AI OCR Service draws on these OCR and AI technologies and this experience to improve recognition accuracy and enhance functionality, making it even more useful for customers’ operations.

Achieving a high level of character recognition accuracy for both fixed and free-format scanning

In general, there are two main methods for reading the text on forms. Fixed-format scanning reads text in specific areas of specific forms. Free-format scanning reads forms that have no predefined format, locating text by means of keywords or similar cues registered in advance. Because free-format scanning is technically challenging, conventional OCR mostly supports only fixed-format scanning, whereas AI OCR has broadened support for free-format scanning. The AI OCR Service supports both.

For fixed-format scanning, the user selects the area to be read with a mouse. The content of that area, whether printed text, hand-written text, checkboxes, or prefecture names, can then be read. The function can also read text that spans multiple lines, take strikethrough lines into account, and skip over blacked-out text, all of which are common on actual forms. Furthermore, on forms where an address and a telephone number are hand-written in the same box, it can selectively read only the address portion (Fig. 2).
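Because the field positions are fixed, fixed-format scanning largely reduces to cropping the areas the user marked and passing each crop to a suitable recognizer. The Python sketch below assumes a hypothetical template schema (`Field`) and a page image given as a 2D pixel list; it shows only the cropping step and a simple ink-ratio rule for checkboxes, not the service’s actual recognition of printed or hand-written text.

```python
from dataclasses import dataclass
from typing import Dict, List

# A scanned page: rows of pixel values (1 = ink, 0 = background).
Image = List[List[int]]


@dataclass(frozen=True)
class Field:
    """One area the user marked on the form template (hypothetical schema)."""
    name: str
    x: int        # left edge, in pixels
    y: int        # top edge, in pixels
    width: int
    height: int
    kind: str     # e.g. "printed", "handwritten", "checkbox"


def crop(image: Image, field: Field) -> Image:
    """Cut the field's rectangle out of the page image."""
    return [row[field.x:field.x + field.width]
            for row in image[field.y:field.y + field.height]]


def extract_fields(image: Image, template: List[Field]) -> Dict[str, Image]:
    """Return the cropped region for every field in the template."""
    return {f.name: crop(image, f) for f in template}


def is_checked(region: Image, min_ink_ratio: float = 0.2) -> bool:
    """Treat a checkbox as ticked when enough of its pixels are inked.

    The 0.2 ratio is an arbitrary placeholder, not a tuned value.
    """
    total = sum(len(row) for row in region)
    if total == 0:
        return False
    ink = sum(sum(row) for row in region)
    return ink / total >= min_ink_ratio
```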

The free-format scanning method includes two functions: item search scanning, which reads text based on registered keywords, and title preset scanning, which is designed specifically for billing statements. The item search scanning function searches a dictionary for keywords (guide words) that the user has registered in advance and then reads the text to the right of or below those keywords. For example, to read form numbers or project names, the corresponding item names can be registered in advance as keywords. This makes it possible to automatically read text from a variety of forms with different formats. The title preset scanning function is a version of item search scanning optimized for billing statements, and it can read them without any special configuration. Billing statements contain certain essential information, such as the billing statement number, issue date, monetary amounts, billing address, and the name of the billing party. These categories of information are read automatically, without the user needing to register keywords. This function can also be used with Japan’s national invoice system, which will go into effect on October 1, 2023.
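The guide-word lookup described above can be sketched in Python as follows. It assumes the page has already been recognized into text tokens with bounding boxes, matches guide words by exact string equality (a real dictionary search would be more tolerant), and picks the closest token to the right on the same line or below in the same column. The `BILLING_PRESET` mapping is a hypothetical stand-in for how a title preset might bundle guide words so that users need not register them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Token:
    """A piece of recognized text with its bounding box (pixels)."""
    text: str
    x: int
    y: int
    w: int
    h: int


def _overlaps(a0: int, a1: int, b0: int, b1: int) -> bool:
    """True if the intervals [a0, a1) and [b0, b1) share any length."""
    return min(a1, b1) > max(a0, b0)


def value_for(keyword: str, tokens: List[Token]) -> Optional[str]:
    """Return the nearest token to the right of, or below, the keyword."""
    anchors = [t for t in tokens if t.text == keyword]
    if not anchors:
        return None
    k = anchors[0]
    candidates = []
    for t in tokens:
        if t is k:
            continue
        # Same line: starts after the keyword and overlaps it vertically.
        if t.x >= k.x + k.w and _overlaps(k.y, k.y + k.h, t.y, t.y + t.h):
            candidates.append((t.x - (k.x + k.w), t.text))
        # Same column: starts below the keyword and overlaps it horizontally.
        elif t.y >= k.y + k.h and _overlaps(k.x, k.x + k.w, t.x, t.x + t.w):
            candidates.append((t.y - (k.y + k.h), t.text))
    return min(candidates)[1] if candidates else None


# Hypothetical preset: output field -> guide words that often label it.
BILLING_PRESET: Dict[str, List[str]] = {
    "invoice_number": ["Invoice No.", "Invoice Number"],
    "issue_date": ["Issue Date", "Date"],
    "amount": ["Total", "Amount Due"],
}


def scan_with_keywords(tokens: List[Token],
                       keywords: Dict[str, List[str]]) -> Dict[str, str]:
    """Try each field's guide words in order and keep the first hit."""
    result: Dict[str, str] = {}
    for field, guide_words in keywords.items():
        for word in guide_words:
            value = value_for(word, tokens)
            if value is not None:
                result[field] = value
                break
    return result
```

Fields whose guide words never appear on the page are simply left out of the result, which is one reasonable way to let the same preset run on statements from many different issuers.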

The AI OCR Service also has a full text OCR function that recognizes the text printed on forms one whole line at a time. This can be used to read entire documents, such as contracts or meeting minutes.

Another major feature of this service is that the four scanning functions (fixed-format scanning, item search scanning, title preset scanning, and full text scanning) can be combined according to operational needs or the types of forms to be read.
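One way to picture such combinations is as a per-form-type configuration. The Python sketch below is purely illustrative: the form-type names, step identifiers, and the `run_pipeline` helper are hypothetical, and each scanning function is stood in for by a plain callable rather than any actual interface of the service.

```python
from typing import Any, Callable, Dict, List

# Hypothetical identifiers for the four scanning functions described above.
FIXED = "fixed_format"
ITEM_SEARCH = "item_search"
TITLE_PRESET = "title_preset"
FULL_TEXT = "full_text"

# Hypothetical per-form-type configuration: which functions run, in order.
PIPELINES: Dict[str, List[str]] = {
    "application_form": [FIXED],
    "billing_statement": [TITLE_PRESET, ITEM_SEARCH],
    "contract": [FULL_TEXT, ITEM_SEARCH],
}


def run_pipeline(form_type: str, page: Any,
                 scanners: Dict[str, Callable[[Any], Any]]) -> Dict[str, Any]:
    """Run the configured scanning functions for one page of a given form type."""
    return {step: scanners[step](page) for step in PIPELINES[form_type]}
```

Keeping the choice of functions in data rather than code means that supporting a new form type only requires a new entry in the mapping.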

Convenient functions that leverage our technologies and experience to provide ultimate ease of use

Character recognition accuracy is also strongly affected by external factors, such as the condition of the forms (folds, faded or smudged writing, etc.) and the capabilities of the MFP or other device used for scanning. Our service therefore takes measures both to bring accuracy as close to 100% as possible and to prevent misrecognition.

A key function for accomplishing this is the error suppression OCR function. Instead of showing the actual OCR result, this function displays a “?” on the correction screen for any character that the recognition engine recognized with low confidence. This makes it obvious at a glance which characters need to be checked or corrected, making confirmation work more efficient (Fig. 1 example 5). Various techniques also keep these problem points from being overlooked, such as giving each “?” a red background so that it stands out and displaying a warning message if any “?” remains in the document. Deciding whether to show a recognition result or return a “?” (that is, where to set the threshold between the two) is an important choice, and this is where our expertise comes into play.
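The threshold decision described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the recognizer is taken to return a confidence score between 0 and 1 for each character, and the 0.9 threshold is an arbitrary placeholder rather than the value the service uses. The sketch records the positions it suppressed, so a genuine “?” in the text is never confused with a suppressed one.

```python
from typing import Dict, List, Tuple

SUPPRESSED = "?"


def suppress_low_confidence(chars: List[Tuple[str, float]],
                            threshold: float = 0.9) -> Tuple[str, List[int]]:
    """Replace low-confidence characters with "?" and record their positions.

    The 0.9 threshold is an illustrative placeholder; choosing it well is
    the judgment call described above.
    """
    shown: List[str] = []
    flagged: List[int] = []
    for i, (ch, confidence) in enumerate(chars):
        if confidence >= threshold:
            shown.append(ch)
        else:
            shown.append(SUPPRESSED)
            flagged.append(i)
    return "".join(shown), flagged


def apply_corrections(text: str, flagged: List[int],
                      corrections: Dict[int, str]) -> Tuple[str, List[int]]:
    """Apply a checker's corrections and return positions still unresolved."""
    chars = list(text)
    for i, ch in corrections.items():
        chars[i] = ch
    remaining = [i for i in flagged if i not in corrections]
    return "".join(chars), remaining
```

A non-empty `remaining` list is the condition that would trigger a warning about leftover “?” characters.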

* The error suppression OCR function currently supports hand-written numerals and katakana characters.

Another key point is how efficiently people can correct the recognition results after OCR processing. One feature of the AI OCR Service is that the process for confirming and correcting character recognition results can be configured in great detail: who checks which portions, how many people perform confirmation and correction, and who approves the results. For example, correction workflows can be set flexibly for individual types of operations and forms: multiple staff members check and correct the recognition results in turn, multiple people check in parallel and a verifier then decides based on their results, or one person performs the confirmation and another person checks that work. Such correction workflows are often handled with separate systems and applications, but in response to customer requests, we provide them as standard functions.
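The parallel pattern, in which several people check the same form and a verifier settles disagreements, can be sketched as below. This is a minimal Python illustration under our own assumptions (each checker’s result is a field-to-value mapping, and the verifier is modelled as a hypothetical callback), not the service’s workflow engine.

```python
from typing import Callable, Dict, List

Entry = Dict[str, str]  # field name -> value a checker confirmed or typed


def reconcile(entries: List[Entry],
              verifier: Callable[[str, List[str]], str]) -> Entry:
    """Accept fields all parallel checkers agree on; send the rest to a verifier.

    Assumes every checker filled in the same set of fields.
    """
    final: Entry = {}
    for field in entries[0]:
        values = [entry[field] for entry in entries]
        if all(v == values[0] for v in values):
            final[field] = values[0]
        else:
            final[field] = verifier(field, values)
    return final
```

Only the fields on which the checkers disagree reach the verifier, which is what keeps the verifier’s workload small compared with re-checking every field.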

There is also a form recognition function that automatically identifies what kind of form has been scanned. As mentioned earlier, companies handle many different forms in varying formats. Until now, they have had to sort forms into categories before scanning, such as “Company A estimates,” “Company A billing statements,” “Company B order forms,” and “Company C vouchers.” With our service, even when many types of forms are in use, they can all be scanned together, reducing the workload on staff. Form recognition is based on both the overall layout of the form (the locations of tables, text, etc.) and the actual text on it, and we have continually improved the AI model’s learning methods to make form identification more accurate (Fig. 3).
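The service uses a trained AI model for this. As a much simpler stand-in that shows how layout and text can both contribute, the Python sketch below scores a scanned page against registered form profiles using the cosine similarity of a layout vector and the Jaccard overlap of keywords, then picks the best match. The feature choices (a coarse grid of ink densities, a keyword set) and the 50/50 weighting are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass(frozen=True)
class FormProfile:
    """A registered form type (hypothetical features)."""
    layout: Sequence[float]   # e.g. ink density per cell of a coarse page grid
    keywords: frozenset       # words expected on this form type


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two layout vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of keywords the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_form(layout: Sequence[float], words: Iterable[str],
                  profiles: Dict[str, FormProfile], alpha: float = 0.5) -> str:
    """Return the profile name with the highest combined layout/text score."""
    seen = frozenset(words)

    def score(name: str) -> float:
        p = profiles[name]
        return alpha * cosine(layout, p.layout) + (1 - alpha) * jaccard(seen, p.keywords)

    return max(profiles, key=score)
```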

The quest for even more accurate character recognition technologies and new ways of using OCR data

We have looked at the four scanning functions of the AI OCR Service and the convenient functions we have developed based on our experience. Behind these functions, we have also worked to develop elemental technologies that raise character recognition accuracy even further.

Because of the way it works, AI OCR sometimes reads parts of characters twice or, conversely, skips over parts of characters. To help prevent this, we have refined our learning models and training data, and in the process we developed technologies for inferring character coordinates (locations). These technologies are used to confirm that no two scanned characters have overlapping coordinates (character overlap) and that there are no areas where coordinate information has been obtained but no character has been read (skipped characters), which helps raise recognition accuracy.
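The two coordinate checks can be illustrated along a single text line. In the Python sketch below, which is our own simplification rather than Toshiba’s method, each character is reduced to a horizontal interval: recognized characters that largely overlap their neighbour are flagged as possible double reads, and inferred character positions that no recognized character covers are flagged as possible skips. The 0.5 overlap threshold is an arbitrary assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float]      # (left, right) position along the text line
Read = Tuple[str, Box]         # a recognized character and where it was read


def overlap_ratio(a: Box, b: Box) -> float:
    """Overlap length as a fraction of the narrower box."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    narrower = min(a[1] - a[0], b[1] - b[0])
    return inter / narrower if narrower > 0 else 0.0


def find_double_reads(read: List[Read], min_overlap: float = 0.5) -> List[Tuple[int, int]]:
    """Pairs of neighbouring characters whose boxes largely coincide.

    Assumes `read` is sorted left to right.
    """
    return [(i, i + 1) for i in range(len(read) - 1)
            if overlap_ratio(read[i][1], read[i + 1][1]) >= min_overlap]


def find_skipped(inferred: List[Box], read: List[Read],
                 min_overlap: float = 0.5) -> List[int]:
    """Inferred character positions that no recognized character covers."""
    return [j for j, box in enumerate(inferred)
            if not any(overlap_ratio(box, rb) >= min_overlap for _, rb in read)]
```

For example, if the digit 1 is emitted twice for almost the same span and the character between it and the next one is never read, the first check flags the duplicated pair and the second flags the uncovered position.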

The functions described above have been made possible by various OCR and AI technologies developed in-house. Having evolved in response to customer feedback, the AI OCR Service is a convenient, easy-to-use solution that provides highly accurate character recognition. It supports both cloud and on-premises environments and can be integrated with other systems through its application programming interfaces (APIs).

When people think about how AI OCR can be used, they tend to focus on the efficiency gains from reducing transcription work. In the future, however, it will also be crucial to make use of the massive amounts of data collected by scanning various types of forms, including hand-written text that was difficult to read in the past. For example, answers to open-ended questionnaire items can be scanned and the frequency of keywords analyzed to identify trends and characteristics. Such insights can strengthen existing businesses through greater operational efficiency and also contribute to launching new ones. Our AI OCR Service will continue to evolve, helping to digitalize information on paper forms, whose use has been limited until now, and supporting the analysis and use of that data (Fig. 4).
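As a simple illustration of the questionnaire example, the Python sketch below counts how many OCR-transcribed free-text answers mention each keyword from a user-supplied list. The keyword list and the per-answer counting rule are assumptions; real trend analysis would typically add text normalization, morphological analysis for Japanese, and more than raw counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def keyword_trends(answers: Iterable[str], keywords: Iterable[str],
                   top: int = 3) -> List[Tuple[str, int]]:
    """Count how many answers mention each keyword and return the most common.

    Uses plain substring matching, which also works for Japanese text
    written without spaces between words.
    """
    targets = [k.lower() for k in keywords]
    counts: Counter = Counter()
    for answer in answers:
        text = answer.lower()
        for k in targets:
            if k in text:
                counts[k] += 1  # one count per answer, however often k appears
    return counts.most_common(top)
```

Counting answers rather than raw occurrences keeps one long, repetitive response from dominating the trend.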

  • The corporate names, organization names, job titles and other names and titles appearing in this article are those as of June 2023.
  • All other company names or product names mentioned in this article may be trademarks or registered trademarks of their respective companies.
  • Our AI OCR service is not currently available for purchase outside Japan.
  • This article may contain content specific to Japan.