Converting XText grammars into Antlr4
Recently, a developer asked a question about using an Antlr grammar generated from an XText grammar. The grammar, which had been added to a Git repo, was generated from an XText file elsewhere in the same repo. Beyond the fact that the generated grammar was completely out of sync with the XText source, it seemed like the grammar should compile using the Antlr3 tool, but it didn’t. The Antlr3 grammar contained action code for tree construction using XText libraries.
But, could the Antlr3 grammar be converted to Antlr4 without the action code? Of course, Trash contains a tool to do just that.
Christian Dietrich, the lead developer for Eclipse XText, suggested that one just generate a debug version of the grammar. This note explains how to generate an Antlr3 grammar from an XText file using Eclipse.
Generating an Antlr grammar from XText
According to Wikipedia,
Xtext is an open-source software framework for developing programming languages and domain-specific languages (DSLs). Unlike standard parser generators, Xtext generates not only a parser, but also a class model for the abstract syntax tree, as well as providing a fully featured, customizable Eclipse-based IDE.
Unfortunately, XText is tightly bound to the Eclipse IDE. The generated parser has many dependencies on the IDE, which makes it difficult to use on its own.
To get started, you need to download the IDE and install “Eclipse IDE for Java and DSL Developers”. Afterward, you can create a sample XText project and generate an Antlr3 grammar from it. The “15 Minute Tutorial” is helpful, but you can do this even faster by accepting the defaults.
Create a fresh Workspace. For me, this was a stumbling block because I was not familiar with the terminology. A “workspace” is a directory that contains the source code and Eclipse-specific files. To create a fresh Workspace, start Eclipse. A form pops open where you can type in the path of the directory that you want created.
Afterwards, the main Eclipse window appears. You can now choose File -> New -> Project, which opens a window where you can create a new XText project.
Creating an XText project involves inputting information for that project, including the name of the domain-specific language. You can just accept all the defaults.
After clicking on Finish, you are taken right back to the previous window, and you may wonder what just happened. It looks the same, but once you close the “Welcome” sub-window, you have a full view of the program and grammar.
Next, you will need to alter the properties for the project so that a trimmed-down version of the Antlr3 grammar is produced. In the Package Explorer, navigate to the "org.xtext.example.mydsl" -> GenerateMyDsl.mwe2 file. In the file, add a few lines: “parserGenerator = { debugGrammar = true }” within the “language = StandardLanguage” section.
Finally, you are now in a position to generate the Antlr3 grammar for the XText grammar. Right-click anywhere in the grammar, and select Run As -> Generate Xtext Artifacts from the pop-up menu. After the code has been generated, you can fish around for the grammar using Bash and find.
The generated Antlr grammar is “DebugInternalMyDsl.g”.
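The search can be sketched as a small shell function. This is only a sketch: the helper name find_debug_grammar is made up for illustration, and the "DebugInternal*.g" pattern is an assumption based on the file name above, so adjust it if your language is named differently.

```shell
# Hypothetical helper: list generated debug grammars below a directory.
# $1 is the directory to search; it defaults to the current directory.
# The "DebugInternal*.g" pattern is assumed from the file name
# DebugInternalMyDsl.g and may need adjusting for other DSL names.
find_debug_grammar() {
    find "${1:-.}" -type f -name 'DebugInternal*.g'
}
```

Running find_debug_grammar from the workspace directory prints the path of each matching file, one per line.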
It may be possible to convert an XText grammar to Antlr3 outside the IDE, but for sure it’s not easy. And, although I haven’t checked, I suspect that naked XText grammars (the grammar plus any included grammars) can’t be converted in the absence of the Eclipse project information. Also, it is dog slow to start up an instance of Eclipse: 20 seconds or so on my 8 core Ryzen system.
Generating an Antlr4 grammar from XText
Trash currently doesn’t convert XText grammars directly to Antlr4, but it can convert the Antlr3 grammar generated by Eclipse into Antlr4. The main problem with the Antlr3 conversion is that it doesn’t take care of the extra parentheses that XText generates for some basic rules, such as RULE_STRING and RULE_SL_COMMENT:
RULE_STRING : ('"' ('\\' .|~(('\\'|'"')))* '"'|'\'' ('\\' .|~(('\\'|'\'')))* '\'');
RULE_SL_COMMENT : '//' ~(('\n'|'\r'))* ('\r'? '\n')? {skip();};
Trash does have a command to remove useless parentheses, but it has a bug that makes it a little too greedy in doing so. Consequently, after you generate an Antlr3 “debug” grammar, edit the grammar by hand to remove the extra parentheses. Then, using Trash, you can convert the file to Antlr4:
trparse -t antlr3 InternalDebugFoobar.g | trconvert | trsponge
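Before running the conversion, it can help to list the lines that still carry the doubled parentheses, so you know where to edit. The following is a minimal sketch using plain grep. The helper name list_double_parens is hypothetical, and the fixed string "~((" only catches the negated-set form seen in RULE_STRING and RULE_SL_COMMENT, not every redundant pair of parentheses.

```shell
# Hypothetical helper: print "line-number:text" for every line of the
# grammar file $1 that contains a doubled parenthesis after "~",
# as in ~(('\n'|'\r')). -F treats the pattern as a fixed string.
list_double_parens() {
    grep -n -F '~((' "$1"
}
```

An empty result from list_double_parens on your grammar file means none of these patterns remain in it.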
Conversion of XText grammars directly to Antlr4 should be possible, but it is currently not implemented. It should not be too hard to do. An XText grammar would require changes in syntax to handle XText scoping, also known as cross-referencing. Instead of using a terminal or non-terminal as a right-hand side element of a rule, XText uses a reference to another symbol’s type.
Model: packages+=Pack*;
Pack: 'package' name=ID '{' (defs+=Def | calls+=Call)* '}';
Def: 'def' name=ID;
Call: 'call' ref=[Def]; // same as "ref=[Def|ID]"
// From https://goto40.github.io/self-dsl/xtext_scoping/
When Eclipse converts the grammar to Antlr3, it substitutes the reference “ref=[Def]” with the “name” attribute value “ID”.
I find the syntax that XText uses for scoping confusing. Worse, it embeds static semantics constructs in the grammar as opposed to placing the static semantics in a separate specification, that is, a separate file. I will have more to say on this later.