Home » Software Development » Antlr 4 grammar with spaces

About Vadim Korkin

Vadim is a senior software engineer with lots of experience with different software technologies including Java, Javascript, databases (Oracle and Postgres), HTML/CSS and even machine learning. He likes learning new technologies and using the latest libraries in his work. Vadim also has some personal projects on Github

Antlr 4 grammar with spaces


This post discusses the difficulties and solutions of creating an Antlr grammar without removing spaces from the text being parsed. But first, a brief explanation of what Antlr really is.

Here is the link https://github.com/antlr/antlr4. Antlr can take a text with a well-defined rules and create a syntax tree out of it. It is basically what it is so good for. The user must create the grammar of the DSL (domain-specific language) and feed it to the tool. The tool generates the parser code. It can be a Java code or Javascript code. Antlr has no less than 5 targets (supported languages). This generated parser code can parse the text and create an abstract syntax tree. Then the user can use the generated visitor (it is the well-known visitor pattern) to walk the AST to get something out of it. 


 Most Antlr grammars remove the whitespace in the lexer like this:

 : [ \r\n\t] + -> skip

This definitely makes parsing somewhat easier as it transforms this expression:

    q1 + q2 / f2 – ( 4 + 5 )

into a much simpler form:


Without these spaces it is much easier to write the rules for the Antlr grammar.

But in some cases, it is not possible to remove all spaces because they may be part of a variable name. In the case, I was working on variables were with single quotes like this: ‘Variable 1’. Spaces were part of the variable name. The grammar had to take into account that a space could be part of the variable name but also could be a “useless” character that the user put in for readability like a space after a comma. Special handling of spaces is required to parse this expression:

‘Variable ABC’ * 5 + ‘Variable Q’*( 1 + ‘Variable W’)


The idea is as follows: if we cannot remove spaces from the text we have to “pull” spaces into parenthesis, operation signs, etc. 

The rules for the operators and parenthesis must change to pull in spaces. Firstly, we need to declare the tokens for spaces:

ANY_SPACE:                    SINGLE_SPACE+;

SINGLE_SPACE:                 ‘ ‘;

We cannot use SINGLE_SPACE* because the Antlr tool correctly shows that SINGLE_SPACE* can match an empty string. 

Then incorporate these tokens into rules for operators (MUL is ‘*’ and Div is ‘/’, rParameterSeparator is for the comma that separates function actual parameters

rLeftVarParenthesis:          L_VAR_PAREN ANY_SPACE | L_VAR_PAREN;

rArithmeticOperatorMulDiv:    ANY_SPACE (MUL | DIV) ANY_SPACE | ANY_SPACE (MUL | DIV) | (MUL | DIV) ANY_SPACE | (MUL | DIV);

rParameterSeparator:          ANY_SPACE ‘,’ ANY_SPACE | ANY_SPACE ‘,’ | ‘,’ ANY_SPACE | ‘,’;

Now, this grammar can handle variables with spaces in their names. The visitor when visiting the operator rule can join the text from the children and use trim to remove all unnecessary space. In Javascript it may look like this:

return ctx.children.join(”).trim();

This simple and elegant solution allows for parsing variable names with spaces. 

Published on Java Code Geeks with permission by Vadim Korkin, partner at our JCG program. See the original article here: Antlr 4 grammar with spaces

Opinions expressed by Java Code Geeks contributors are their own.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you our best selling eBooks for FREE!


1. JPA Mini Book

2. JVM Troubleshooting Guide

3. JUnit Tutorial for Unit Testing

4. Java Annotations Tutorial

5. Java Interview Questions

6. Spring Interview Questions

7. Android UI Design


and many more ....


Receive Java & Developer job alerts in your Area

I have read and agree to the terms & conditions


Notify of

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Inline Feedbacks
View all comments