Although 2018 appears to be the year Artificial Intelligence arrives in the RPA world, we cannot honestly say that there are many real applications or concrete examples of robots that use it in routine work. There is still a long way to go before Intelligent Automation programs fully adopt cognitive technologies!
We recently worked on an interesting case that we would like to showcase in this article as a credible example of computer vision in an RPA scenario. At its core, the scenario is a highly specialized case of data capture from a specific type of PDF document by a back-office team (BPO).
The problem statement is the following:
The European Commission, through the UCITS IV directive, requires companies that distribute investment funds to publish a document known as the KIID (Key Investor Information Document), or Document of Fundamental Data for the Investor. The objective is to condense the most relevant information about a fund, making it simpler and easier for investors to understand. The KIID is a file, usually a PDF, that must contain the following key fund data:
- Fund objective and investment policy.
- Costs and associated expenses.
- Historical profitability.
- Fund risk profile.
Since this document has to be updated constantly, fund distributors (banks and investment companies) are obliged to capture this information and integrate it into their databases so that they can offer it to customers through their systems. Because the work is repetitive, mechanical and high-volume (more than 100,000 documents, updated several times a year), it is usually outsourced to BPO services, which makes it an ideal business case for automation. That was the challenge a client set us and, as usual, we solved it successfully by implementing a Jidoka robot to capture and process this information.
Most of the information in the KIID is structured, so rules can be programmed based on patterns in the PDF structure. The exception is the fund risk profile. The risk profile uses a scale of 1 to 7 (ordered from lowest to highest risk), which is displayed visually so that the risk level of the fund is clear to the client. Because the scale is visual, however, each fund manager marks the level in its own way, with different colors and gradient schemes, as the six examples in the following image show:
Therefore, the challenge is to use artificial vision techniques to detect the number associated with the level of risk with absolute precision, given how sensitive this data is for the investor.
The following video shows the Jidoka robot we implemented.
The robot first uses a technique to identify tables within the document. Once it has selected the table containing the risk profile, it checks that the table is composed of numbers and detects which number is marked. In some KIIDs, however, the numbers are not framed as a table, so the robot also needs a second, color-based method to locate them.
Logically, the robot does not “see” the risk scale in the same way as a human. We need to adapt the document and make it “visible” to a machine. These adaptations resemble what our brain does (although they are still very far from it) when it processes the information that reaches our eyes in the form of light.
The robot takes the PDF document and transforms it into an image, and then into numbers, vectors, and matrices, a form of information much better suited to it. In the example above, we taught the robot to find the risk scale using the two vision techniques mentioned earlier, which are explained in more detail below.
Applying the first method, the robot looks for patterns in the image that resemble a table. It erodes, dilates and binarizes the image in search of this pattern, discarding candidates that do not fit the initial parameters (width, height, position on the page, etc.). This analysis usually identifies several tables in the document, although the only one the robot needs to select is the risk scale.
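To give a feel for how erosion and dilation can expose table borders, here is a minimal pure-Python sketch. It is an illustration only, not the robot's code: the function names and the run length `k` are hypothetical, and it assumes the page has already been binarized into a matrix where 1 means ink. Eroding along a row and then dilating again (a morphological opening) keeps long strokes such as table borders and removes short ones such as text.

```python
def erode_horizontal(binary, k):
    """Keep a pixel only if it starts a run of k ink pixels (1s) to its right."""
    width = len(binary[0])
    return [[1 if x + k <= width and all(row[x:x + k]) else 0
             for x in range(width)]
            for row in binary]


def dilate_horizontal(binary, k):
    """Set a pixel if any of the k pixels ending at it is ink (regrows eroded runs)."""
    return [[1 if any(row[max(0, x - k + 1):x + 1]) else 0
             for x in range(len(row))]
            for row in binary]


def horizontal_lines(binary, k):
    """Erosion then dilation: only runs of at least k ink pixels survive."""
    return dilate_horizontal(erode_horizontal(binary, k), k)


page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # long stroke: a table border
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 0],  # short strokes: text
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
lines = horizontal_lines(page, k=5)
print(lines[1])  # the border survives: [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(lines[2])  # the text is removed: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Vertical borders can be found the same way by running the functions over columns instead of rows. Candidate tables would then be the regions where horizontal and vertical lines meet, filtered by width, height and position on the page.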
How, then, does the robot recognize which of the tables is the correct one? Again, by examining the images and analyzing their characteristics with artificial intelligence. The risk scale has two main characteristics: a predominant color (the unselected risk values) and at least one secondary color (the marked value). The secondary color is the interesting one. Once it is known, the robot focuses its attention on the correct box, and all that remains is to extract the number from the marked box using OCR.
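The color test can be sketched in a few lines of Python. This is a simplified illustration under several assumptions: each candidate table has already been split into cells, each cell is a flat list of RGB pixels, colors match exactly (real scans and gradients would need a tolerance), and exactly one cell is marked. The names `dominant_color` and `find_marked_cell` are hypothetical, and the OCR step is left out.

```python
from collections import Counter


def dominant_color(pixels):
    """Return the most frequent color among a cell's pixels."""
    return Counter(pixels).most_common(1)[0][0]


def find_marked_cell(cells):
    """Return the index of the one cell whose dominant color differs from
    the predominant color of the table, or None if the table does not
    look like a risk scale under this sketch's assumptions."""
    colors = [dominant_color(cell) for cell in cells]
    counts = Counter(colors)
    if len(counts) < 2:
        return None  # a single color everywhere: nothing is marked
    predominant = counts.most_common(1)[0][0]
    marked = [i for i, color in enumerate(colors) if color != predominant]
    return marked[0] if len(marked) == 1 else None


GREY, BLUE, BLACK = (200, 200, 200), (0, 70, 160), (0, 0, 0)

# Seven cells; each is mostly background plus one black pixel for the digit.
cells = [[GREY, GREY, GREY, BLACK] for _ in range(7)]
cells[4] = [BLUE, BLUE, BLUE, BLACK]  # the fifth box is highlighted

print(find_marked_cell(cells))  # 4, i.e. the fifth box on the scale
```

In this sketch the index only tells the robot which box to look at. Reading the digit inside that box is still a job for OCR, which is what guards against scales that are laid out in an unexpected order.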
Sometimes the method described above cannot find these patterns, or it cannot be certain that any of the tables is the correct one. This normally happens when the KIID design is unusual and the risk scale is not a regular table. For these cases, we have implemented a second method that “filters” the image by a given color, passed in as a parameter. This is precisely the color in which the risk level is marked on the scale. In effect, the robot focuses only on the areas of that color and ignores the rest. Again, all that follows is the extraction of the value from the risk scale.
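A color filter of this kind might look like the following pure-Python sketch. The function names, the target color and the tolerance value are all assumptions made for illustration. The image is taken to be a list of rows of RGB tuples; the filter builds a mask of pixels close to the target color and returns the bounding box of the matching region, which is the area that would then be handed to OCR.

```python
def color_mask(image, target, tolerance=30):
    """Return a True/False matrix: True where every RGB channel of the
    pixel is within `tolerance` of the target color."""
    def is_close(pixel):
        return all(abs(a - b) <= tolerance for a, b in zip(pixel, target))
    return [[is_close(pixel) for pixel in row] for row in image]


def bounding_box(mask):
    """Return (left, top, right, bottom) of the True pixels, or None."""
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, hit in enumerate(row) if hit]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


W = (255, 255, 255)   # page background
B1 = (0, 70, 160)     # highlight color
B2 = (10, 80, 150)    # same highlight, slightly shifted by a gradient

image = [
    [W, W, W, W,  W,  W],
    [W, W, W, B1, B2, W],
    [W, W, W, B2, B1, W],
    [W, W, W, W,  W,  W],
]
mask = color_mask(image, target=(0, 75, 155), tolerance=20)
print(bounding_box(mask))  # (3, 1, 4, 2)
```

The tolerance is what lets one parameter cover the gradient schemes some fund managers use. Set it too wide, though, and unrelated elements of a similar color start to match.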
Both methods rely on the same secondary processing steps to make the document easier to read:
- Erosion and dilation: enhancing the image and bringing shapes into focus.
- Binarization: reducing the image to two-valued vectors and matrices that the robot can interpret.
- Conversion to grayscale: “eliminating noise” in low-resolution documents.
- Color filtering.
- Pixel information extraction.
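As a small illustration of two of these steps, the sketch below converts RGB pixels to grayscale and then binarizes the result. It is not the robot's code: the function names and the threshold of 128 are assumptions, and the grayscale weights are the widely used BT.601 luminance coefficients.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) pixels to 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in image]


def binarize(gray, threshold=128):
    """Return 1 for ink (darker than the threshold) and 0 for background."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


WHITE = (255, 255, 255)
SPECKLE = (230, 230, 230)  # faint scanning noise
TEXT = (40, 40, 40)
BLUE = (0, 70, 160)        # a colored highlight

image = [
    [WHITE, SPECKLE, WHITE],
    [TEXT,  BLUE,    WHITE],
]
binary = binarize(to_grayscale(image))
print(binary)  # [[0, 0, 0], [1, 1, 0]]
```

The faint speckle disappears because it stays above the threshold, while the text and the highlight are kept as ink. The resulting 0/1 matrix is the kind of input that erosion and dilation work on.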
At Jidoka we are continually looking for new challenges. Do you have something that could surprise our software robots? If so, do not hesitate to contact us; we will accept the challenge.