Creating a DSL for AWT’s Robot

Martin Mois, April 6th, 2015 (Last Updated: April 5th, 2015)

The Java SDK ships with the class java.awt.Robot, which allows the automation of keyboard and mouse input as well as the creation of screen captures. When you want to write a small test application that simulates user input, or you just want to automate the input of some recurring text, this feature comes in handy. But you do not want to write a complete Java application every time.

ANTLR, in turn, is a parser generator that enables us to create “Domain Specific Languages” (DSLs). With the help of ANTLR we can develop a simple DSL that provides one command for each of the methods of java.awt.Robot. From then on we can easily write a script for various kinds of simple automation tasks.

The first step is to invent the syntax of our new “DSL”:

Different “statements” should be separated by a semicolon.
Each statement should consist of one “command” and a few parameters for this command.
Comments should either span multiple lines (using the C-like comments /* … */) or run only until the end of the line (using //).

A simple file could look like this:

/*
* A simple example demonstrating the basic features.
*/
delay 300; // sleep for 300ms
mouseMove 20,30;
createScreenCapture 100,100,200,200 file=/home/siom/capture.png;
mouseClick button1;
keyboardInput "Test";
delay 400;
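
The three syntax rules can be made concrete without any parser generator. The following is only a rough sketch and not the approach taken in the rest of this article: a hypothetical helper class StatementSplitter strips both comment styles with regular expressions and splits the remainder at semicolons. It assumes that neither a semicolon nor a comment marker ever occurs inside a quoted string or a filename, an assumption a real grammar does not need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; the article itself uses ANTLR instead.
class StatementSplitter {

    /** Removes comments and returns the trimmed, non-empty statements of a script. */
    static List<String> split(String script) {
        String withoutComments = script
                .replaceAll("(?s)/\\*.*?\\*/", " ")  // block comments, possibly spanning lines
                .replaceAll("//[^\\r\\n]*", " ");    // line comments up to the end of the line
        List<String> statements = new ArrayList<>();
        for (String part : withoutComments.split(";")) {
            String statement = part.trim();
            if (!statement.isEmpty()) {
                statements.add(statement);
            }
        }
        return statements;
    }
}
```

Applied to the comment block and the first two statements of the example file, split() returns the two strings "delay 300" and "mouseMove 20,30".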

With these requirements we can start to write down the grammar:

grammar Robot;
 
instructions:
    (instruction ';')+
    EOF;
 
instruction:
    instructionDelay |
    instructionMouseMove |
    instructionCreateScreenCapture |
    instructionMouseClick |
    instructionKeyboardInput;

We name the grammar “Robot” and define the first rule, instructions, such that it matches one or more instructions, each terminated by a semicolon, until the end of the file (EOF) is reached. The instructions that we want to support are listed as alternatives of the rule instruction. The pipe between the alternatives denotes a logical OR, i.e. exactly one of them has to match.

The simplest rule is instructionDelay:

instructionDelay:
    'delay' paramMs=INTEGER;
...
INTEGER:
    [0-9]+;

The rule starts with the command ‘delay’, followed by its only parameter, which specifies the number of milliseconds to sleep as an integer. The token INTEGER is shown below the rule. It simply defines that we expect at least one digit between zero and nine. In order to ease the processing of the parameter later on, we assign the parameter to a separate tree node named paramMs.
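
One consequence of the INTEGER token is worth spelling out: it accepts arbitrarily long digit sequences, while the value later has to fit into a Java int, and the Javadoc of Robot.delay() documents an allowed range of 0 to 60,000 ms. The following sketch shows how such a check could look. The class DelayParam, its methods and the constant MAX_DELAY_MS are assumptions made for illustration and are not part of the article's code.

```java
import java.util.regex.Pattern;

// Hypothetical helper; none of these names come from the article.
class DelayParam {

    // Same pattern as the INTEGER lexer rule: one or more digits, no sign.
    private static final Pattern INTEGER = Pattern.compile("[0-9]+");

    // Upper bound taken from the Javadoc of java.awt.Robot.delay(int).
    static final int MAX_DELAY_MS = 60_000;

    static boolean isInteger(String text) {
        return INTEGER.matcher(text).matches();
    }

    /** Converts the token text to milliseconds and rejects values Robot.delay() would not accept. */
    static int toMillis(String text) {
        if (!isInteger(text)) {
            throw new IllegalArgumentException("Not an INTEGER token: " + text);
        }
        int ms;
        try {
            ms = Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // The digits matched the token but overflow an int.
            throw new IllegalArgumentException("Delay does not fit into an int: " + text, e);
        }
        if (ms > MAX_DELAY_MS) {
            throw new IllegalArgumentException("Delay exceeds " + MAX_DELAY_MS + " ms: " + text);
        }
        return ms;
    }
}
```

Whether to reject such values in the listener or to let Robot throw is a design choice; checking early mainly yields a clearer error message.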

The rule to take a screen capture looks like the following one:

instructionCreateScreenCapture:
    'createScreenCapture' x=INTEGER ',' y=INTEGER ',' w=INTEGER ',' h=INTEGER  'file=' file=FILENAME;
...
FILENAME:
    FileNameChar+;
fragment FileNameChar:
    [a-zA-Z0-9/\\:_\-$~.];

After the keyword createScreenCapture, the user has to provide the x and y coordinates of the upper left corner of the rectangle that should be captured. The two following values denote the width and the height of the rectangle. Finally, the user has to provide a filename for the captured image.

The filename consists of one or more characters from the fragment FileNameChar. This fragment defines all characters that should be allowed for a filename.
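
To get a feeling for which filenames this token accepts, the character set can be mirrored as a Java regular expression. This is only a sketch: the class FileNameCheck is hypothetical, and the hyphen is treated as a literal character, which is presumably what the grammar intends.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the FILENAME token as a Java regex.
class FileNameCheck {

    // Letters, digits, slash, backslash, colon, underscore, hyphen, dollar, tilde and dot.
    private static final Pattern FILENAME =
            Pattern.compile("[a-zA-Z0-9/\\\\:_\\-$~.]+");

    /** Returns true if every character of the name is in the allowed set. */
    static boolean isValid(String name) {
        return FILENAME.matcher(name).matches();
    }
}
```

With this set, a path such as /home/siom/capture.png is accepted, whereas a name containing a space, for example "my file.png", is not, so paths with spaces cannot be expressed in the DSL as defined here.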

Using Maven, we can now store this grammar as the file Robot.g4 in the folder src/main/antlr4 and use the corresponding Maven plugin to generate the Java lexer and parser:

<build>
    <plugins>
        <plugin>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-maven-plugin</artifactId>
            <version>${antlr.version}</version>
            <executions>
                <execution>
                    <goals>
                        <goal>antlr4</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        ...
    </plugins>
</build>
 
<dependencies>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-runtime</artifactId>
        <version>${antlr.version}</version>
    </dependency>
    ...
</dependencies>

The dependency on antlr4-runtime is necessary to use the generated classes in our own code.

The method execute() takes a Path to an input file as parameter and parses and executes it:

public void execute(Path inputPath) throws IOException, AWTException {
    RobotLexer lexer = new RobotLexer(new ANTLRInputStream(new FileInputStream(inputPath.toFile())));
    RobotParser parser = new RobotParser(new CommonTokenStream(lexer));
    final Robot robot = new Robot();
    parser.addParseListener(new RobotBaseListener() {
        @Override
        public void exitInstructionDelay(@NotNull RobotParser.InstructionDelayContext ctx) {
            int delayParam = Integer.parseInt(ctx.paramMs.getText());
            LOGGER.info("delay(" + delayParam + ")");
            robot.delay(delayParam);
        }
        ...
    });
    parser.instructions();
}

The content of the file is forwarded via the ANTLRInputStream to the RobotLexer that has been generated by ANTLR. After the lexer has split the input into a stream of tokens, this stream can be passed to the actual RobotParser.

In order to react to the incoming instructions, a ParseListener is added. Fortunately, ANTLR has already created a base listener that implements all callback methods with an empty implementation. Hence we only have to override the methods we are interested in. ANTLR creates callback methods for each parser rule, one that is called when the rule is entered and one when it is exited, so we can override, for example, the method exitInstructionDelay(). The parameter passed in by the generated code is of type RobotParser.InstructionDelayContext. This context object has a field paramMs because we assigned the parameter to a separate node in the grammar. Its getText() method returns the value of this parameter as a String. We only have to convert it to an integer value and then pass it to the delay() method of the Robot instance.
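
The pattern behind the base listener can be illustrated without ANTLR on the classpath. The following sketch uses simplified stand-ins for the generated classes; Token, DelayContext, BaseListener and collectDelays() are hypothetical names, and the delays are collected in a list instead of being passed to a Robot so that the example also runs without a display.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the classes ANTLR generates; all names are hypothetical.
class ListenerSketch {

    /** Stand-in for a token: only its text matters here. */
    static class Token {
        private final String text;
        Token(String text) { this.text = text; }
        String getText() { return text; }
    }

    /** Stand-in for InstructionDelayContext: the grammar label becomes a field. */
    static class DelayContext {
        final Token paramMs;
        DelayContext(Token paramMs) { this.paramMs = paramMs; }
    }

    /** Stand-in for the generated base listener: every callback is empty. */
    static class BaseListener {
        void enterInstructionDelay(DelayContext ctx) { }
        void exitInstructionDelay(DelayContext ctx) { }
    }

    /** Overrides only the exit callback and records the parsed delays. */
    static List<Integer> collectDelays(List<DelayContext> contexts) {
        final List<Integer> delays = new ArrayList<>();
        BaseListener listener = new BaseListener() {
            @Override
            void exitInstructionDelay(DelayContext ctx) {
                delays.add(Integer.parseInt(ctx.paramMs.getText()));
            }
        };
        for (DelayContext ctx : contexts) {
            listener.enterInstructionDelay(ctx); // inherited empty implementation
            listener.exitInstructionDelay(ctx);  // overridden callback
        }
        return delays;
    }
}
```

In the real application the loop is replaced by the generated parser, which invokes the callbacks while it walks through the input.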

The implementation for the rule instructionCreateScreenCapture is shown in the following block:

@Override
public void exitInstructionCreateScreenCapture(@NotNull
    RobotParser.InstructionCreateScreenCaptureContext ctx) {
    int x = Integer.parseInt(ctx.x.getText());
    int y = Integer.parseInt(ctx.y.getText());
    int w = Integer.parseInt(ctx.w.getText());
    int h = Integer.parseInt(ctx.h.getText());
    LOGGER.info("Rectangle rectangle = new Rectangle(" + x + "," + y + 
        "," + w + "," + h + ")");
    Rectangle rectangle = new Rectangle(x, y, w, h);
    LOGGER.info("createScreenCapture(rectangle);");
    BufferedImage bufferedImage = robot.createScreenCapture(rectangle);
    File output = new File(ctx.file.getText());
    LOGGER.info("Save file to " + output.getAbsolutePath());
    try {
        ImageIO.write(bufferedImage, "png", output);
    } catch (IOException e) {
        throw new RuntimeException("Failed to write image file: " + e.getMessage(), e);
    }
}

The principle is the same as shown for the previous instruction. The context object passed in has one field for each parameter, and the string values of the coordinates have to be converted into integer values. With this information we can construct a Rectangle object, call the createScreenCapture() method of the Robot and write the resulting BufferedImage to the given file.
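
The handlers for the remaining instructions are not shown here. As one possible sketch for mouseClick, the button parameter could be translated into the button masks defined in java.awt.event.InputEvent and then passed to mousePress() and mouseRelease() of the Robot. The class MouseButtons is hypothetical, and the names button2 and button3 are an assumption, since only button1 appears in the example file.

```java
import java.awt.Robot;
import java.awt.event.InputEvent;

// Hypothetical helper; the button names other than "button1" are assumed.
class MouseButtons {

    /** Maps a button name from the DSL to the corresponding InputEvent mask. */
    static int maskFor(String buttonName) {
        switch (buttonName) {
            case "button1": return InputEvent.BUTTON1_DOWN_MASK;
            case "button2": return InputEvent.BUTTON2_DOWN_MASK;
            case "button3": return InputEvent.BUTTON3_DOWN_MASK;
            default:
                throw new IllegalArgumentException("Unknown button: " + buttonName);
        }
    }

    /** A click is a press followed by a release of the same button. */
    static void click(Robot robot, String buttonName) {
        int mask = maskFor(buttonName);
        robot.mousePress(mask);
        robot.mouseRelease(mask);
    }
}
```

Inside a listener method such as exitInstructionMouseClick(), the text of the button parameter would be handed to click() together with the Robot instance; keeping the mapping in a separate method allows it to be tested without a display.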

Conclusion

Creating a specialized DSL for AWT’s Robot was easier than expected. The provided Maven plugin creates all necessary classes from the grammar file and thus integrates smoothly into the build process. The resulting DSL can be used to automate simple mouse and keyboard tasks, including the creation of screenshots.