A few years ago, a television personality became very famous in Spain. One of her best-known phrases was “I’m going to put two black candles on you”, and another, which gives this entry its title, was “control control”.
In the automation of tasks by software robots, one of the most interesting challenges to face is to identify and obtain information from controlled applications.
This identification means knowing where a specific text box is, where a certain button is located, or where the scroll bar thumb is at a given moment.
Once the control we want to manage has been identified, we can obtain the information or carry out the action we need.
For a person, identifying a control is trivial in 99% of cases. You only have to read a little, and if you already know the application because you have used it for some time, you just confirm that the text box you are looking for is where it should be.
For a robot, things are different. Humans and robots have very different abilities, and these differences mean that the approach to solving a problem must also be different, at least in part.
In addition, bear in mind that the applications robots use while imitating people can change without notice, and that robots, once they reach maturity, usually run with little human supervision.
With these premises in mind, we must resolve the issue.
To do this, we will list the various methods available in the Jidoka SDK for accessing the desired information.
The keyboard is a very precise and consistent method. For example, Alt+F4 in a given window will almost always perform the same action, and pressing Tab three times from the beginning will put the focus on the same field.
The keyboard is also a good choice when the application to be controlled does not change very often. A change in the order of the fields or in the shortcut keys can be a problem until it is detected. Once it is detected, updating the robot is usually fast.
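To see why keyboard navigation is consistent yet sensitive to field order, here is a minimal sketch that only simulates Tab focus over a form. The field names and the `focus_after_tabs` helper are hypothetical. It sends no real keystrokes and does not use the Jidoka SDK.

```python
def focus_after_tabs(fields, tabs, start=0):
    """Return the field that has focus after pressing Tab `tabs` times.

    Focus starts on fields[start] and wraps around at the end of the form,
    which is how Tab typically behaves in a dialog.
    """
    return fields[(start + tabs) % len(fields)]


# Hypothetical form layout in the current version of the application.
form_v1 = ["name", "surname", "email", "phone", "submit"]
# Hypothetical next version: "email" and "phone" have swapped places.
form_v2 = ["name", "surname", "phone", "email", "submit"]

print(focus_after_tabs(form_v1, 3))  # phone
print(focus_after_tabs(form_v2, 3))  # email
```

The same three Tab presses always reach the same position, but after the reorder that position holds a different field, so the robot would type into the wrong box until the change is noticed.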
On Windows, an API is available to select the control we want based on an identifier that is unique within each window. With this method we get correct results as long as the window we are dealing with is not redesigned. The redesign must be deep to affect the identifiers, so it is a very safe technique when available.
However, it is of little use when applications do not use standard controls.
DOM (document object model)
For web applications, we have a particularly interesting method, access to the DOM. The DOM is the representation of the components (controls) of a page. Through it, we can navigate until we find a button, a checkbox, text or any other element. It is also possible to perform searches using the various attributes of HTML tags as well as the relationship of one with another.
When we are working with web applications, the DOM method is one of the most suitable, although we must bear in mind that this DOM changes with each new version of the application. The robot must be programmed so that it doesn’t have to be modified with each change in the application. When we achieve this goal, we can say that the robot is a “mature robot”.
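The difference between a lookup that breaks with each new version and one that survives can be sketched with Python's standard `xml.etree.ElementTree` module. The two pages, the `user` field and both helper functions are made up for illustration. A real robot would query the browser's live DOM rather than a static string.

```python
import xml.etree.ElementTree as ET

# Hypothetical login page, version 1.
PAGE_V1 = """
<html><body>
  <form id="login">
    <input name="user" type="text"/>
    <input name="password" type="password"/>
    <button id="send">Log in</button>
  </form>
</body></html>
"""

# Hypothetical version 2: a banner, a wrapper div and a checkbox were added.
PAGE_V2 = """
<html><body>
  <div class="banner">New!</div>
  <div class="wrapper">
    <form id="login">
      <input name="remember" type="checkbox"/>
      <input name="user" type="text"/>
      <input name="password" type="password"/>
      <button id="send">Log in</button>
    </form>
  </div>
</body></html>
"""


def find_by_position(root):
    """Brittle: the first input of a form that sits directly under body."""
    return root.find("./body/form/input[1]")


def find_by_attribute(root):
    """More robust: any input, at any depth, whose name is 'user'."""
    return root.find(".//input[@name='user']")


def name_of(element):
    """Return the element's name attribute, or None if nothing was found."""
    return None if element is None else element.get("name")


for label, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    root = ET.fromstring(page.strip())
    print(label, name_of(find_by_position(root)), name_of(find_by_attribute(root)))
# v1 user user
# v2 None user
```

Searching by a stable attribute instead of by position in the tree is one way to move toward the “mature robot” described above, since the lookup keeps working when the surrounding markup is rearranged.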
Locating a control by coordinates is very easy to implement: we just need to know where the control we want to manipulate will be located. This is its great advantage.
The main disadvantage is that a change in the window where we use this technique would require an adjustment to the robot. The same can happen if the resolution of the window changes, or that of the monitor (if any) of the machine where the robot is running.
The use of coordinates can also be problematic when the volume of information in sections of the window varies, since scrolling can occur.
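The resolution problem can be made concrete with a little arithmetic. The sketch below is an illustration only, with hypothetical window sizes and helper names. It stores a recorded point as a fraction of the window size and converts it back for a different size. This helps only if the layout scales proportionally, and it does nothing for scrolling or a redesigned window.

```python
def to_relative(x, y, width, height):
    """Convert absolute pixel coordinates into fractions of the window size."""
    return x / width, y / height


def to_absolute(rel_x, rel_y, width, height):
    """Convert fractions of the window size back into whole pixels."""
    return round(rel_x * width), round(rel_y * height)


# Hypothetical recording: a button clicked at (640, 360) in a 1280x720 window,
# which is the exact center of that window.
recorded = to_relative(640, 360, 1280, 720)

# At 1920x1080 the raw point (640, 360) is no longer the center of the window.
# Rescaling the stored fractions gives the new center instead.
print(to_absolute(*recorded, 1920, 1080))  # (960, 540)
```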
At Jidoka we have developed our own image recognition solution, which we have called “Hawk-Eye”. Based on our experience developing hundreds of robots, we have observed that minor changes in applications usually do not include modifications to the icons, images, fonts, or presentation style. Developers often reserve this type of change for version jumps.
Taking advantage of that stable visual appearance of applications, we developed Hawk-Eye, an image recognition tool that lets us identify controls even if they change position within the windows when new versions are installed.
We should also point out that image identification does not only mean matching identical images. It also covers matching similar images, within a programmable tolerance threshold. In this way, we can achieve the expected results even with slight differences caused by changes in screen resolution or color depth, or by corporate monitoring systems that sometimes modify the screen output.
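The idea of a tolerance threshold can be illustrated with a deliberately naive template search in plain Python. This is not Hawk-Eye, whose internals are not described here. The scoring rule (mean absolute pixel difference), the tiny grayscale grids and the function names are all assumptions made for the example.

```python
def mean_abs_diff(screen, template, top, left):
    """Average absolute pixel difference between the template and the
    screen patch whose top-left corner is at (top, left)."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            total += abs(screen[top + r][left + c] - value)
    return total / (len(template) * len(template[0]))


def find_template(screen, template, tolerance):
    """Return (row, col) of the best patch whose mean difference is within
    `tolerance`, or None if no patch is close enough."""
    best_pos, best_score = None, None
    for top in range(len(screen) - len(template) + 1):
        for left in range(len(screen[0]) - len(template[0]) + 1):
            score = mean_abs_diff(screen, template, top, left)
            if score <= tolerance and (best_score is None or score < best_score):
                best_pos, best_score = (top, left), score
    return best_pos


# Reference icon as grayscale values (0-255).
icon = [[200, 50],
        [50, 200]]

# Tiny "screenshot": the icon sits at row 1, column 2, with its pixels
# slightly altered, as a change in color depth might cause.
screen = [
    [10, 10, 10, 10, 10],
    [10, 10, 198, 53, 10],
    [10, 10, 52, 201, 10],
    [10, 10, 10, 10, 10],
]

print(find_template(screen, icon, tolerance=0))  # None
print(find_template(screen, icon, tolerance=5))  # (1, 2)
```

With a tolerance of zero only a pixel-perfect copy matches, so the altered icon is missed. A small tolerance finds it without accepting the unrelated background patches.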
A little bit of everything: imagination
However, the best method by far to locate a control on a screen is imagination: the imagination to make different techniques work together, and the imagination to “invent” a new technique. Don’t you think it’s fun to program robots?
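One simple way to make techniques work together is a fallback chain: try the most reliable locator first and fall back to the next one when it finds nothing. The sketch below uses hypothetical stand-in locators and a dictionary in place of a real screen. It is not part of the Jidoka SDK.

```python
def locate(locators, screen):
    """Try each (name, locator) pair in order and return the first hit
    as (name, position), or None if every locator fails."""
    for name, locator in locators:
        position = locator(screen)
        if position is not None:
            return name, position
    return None


# Hypothetical stand-ins for real techniques; each returns (x, y) or None.
def by_identifier(screen):
    return screen.get("ids", {}).get("send_button")


def by_image(screen):
    return screen.get("icons", {}).get("send_icon")


def by_coordinates(screen):
    return (640, 360)  # last resort: a fixed, previously recorded position


# Simulated screen where the identifier is unavailable but the icon is visible.
screen = {"ids": {}, "icons": {"send_icon": (612, 340)}}

chain = [("identifier", by_identifier),
         ("image", by_image),
         ("coordinates", by_coordinates)]
print(locate(chain, screen))  # ('image', (612, 340))
```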
CTO at Jidoka.