Finding direct recursion in an Antlr grammar via XPath
This XPath expression is the first important rule I wrote for Antlr grammars:
'//parserRuleSpec[RULE_REF/text() = ruleBlock/ruleAltList/labeledAlt/alternative/*[name()="element"][1]/atom/ruleref/*[1]/text()]'
.
This expression finds all rules that have direct left recursion, and is an important step for removing indirect left recursion.
Example
grammar test;
s : a EOF;
a : a 'b1a' 'b1b' 'b1c'
| a 'b2a' 'b2b' 'b2c'
| 'c1a' 'c1b'
| 'c2'
| 'c3'
;
ra : 'b1a' 'b1b' 'b1c' ra
| 'b2a' 'b2b' 'b2c' ra
| 'c1a' 'c1b'
| 'c2'
| 'c3'
;
$ trparse -l -t ANTLRv4 *.g4 2> /dev/null | trxgrep '//parserRuleSpec[RULE_REF/text() = ruleBlock/ruleAltList/labeledAlt/alternative/*[name()="element"][1]/atom/ruleref/*[1]/text()]' | trcaret
L5: a : a 'b1a' 'b1b' 'b1c'
^
Note, the expression for right-recursion is similar.
$ trparse -l -t ANTLRv4 *.g4 2> /dev/null | trxgrep ' //parserRuleSpec[RULE_REF/text() = ruleBlock/ruleAltList/labeledAlt/alternative/*[name()="element"][last()]/atom/ruleref/*[1]/text()]' | trcaret
L12: ra : 'b1a' 'b1b' 'b1c' ra
^
–Ken