Finding unreferenced lexer and parser symbols
Parser rule symbols
Unreferenced parser symbols in a grammar are rules that have a defining occurrence but no applied occurrence. The EOF-terminated start rule of a grammar would technically qualify as an unreferenced symbol, but it is used outside the grammar by a driver program, so it is excluded. Instead, we are interested in parser rule symbols that are simply not used anywhere.
Why are these rules important? Unreferenced parser symbols indicate a grammar that may be ill-defined. It may be that the LHS symbol was once referenced, but somewhere along the way the RHS of the referencing rule was changed, removing the applied occurrence.
How do we find these symbols? This Trash script can be used.
#
echo Finding unused parser symbols in grammars...
for i in `find . -name desc.xml | grep -v Generated\*`
do
echo $i
d=`dirname $i`
pushd $d > /dev/null 2>&1
# Parse all grammar files so that any imports can also be checked.
trparse *.g4 2> /dev/null | trxgrep ' //parserRuleSpec[not(doc("*")//ruleBlock//RULE_REF/text() = ./RULE_REF/text()) and not(./ruleBlock//TOKEN_REF/text() = "EOF")]/RULE_REF' | trtext
popd > /dev/null 2>&1
done
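The test in the XPath query above (a rule name that never appears inside any rule body, and whose own body does not mention EOF) can be illustrated without Trash. The following is only a rough, text-based sketch: it assumes every rule is written on a single line as "name : body ;", and the function name unused_parser_rules and the demo grammar are made up for illustration. It is not a replacement for the parse-tree-based script.

```shell
#!/bin/sh
# Text-only sketch (does not use Trash): list parser rules that are defined
# but never referenced. Assumes every rule sits on one line as "name : body ;".
unused_parser_rules() {
  awk '
    /^[a-z][A-Za-z0-9_]*[ \t]*:/ {
      name = $0; sub(/[ \t]*:.*$/, "", name)      # defining occurrence (LHS)
      body = $0; sub(/^[^:]*:/, "", body)         # right-hand side
      gsub(/\047[^\047]*\047/, " ", body)         # drop string literals
      names[++n] = name
      k = split(body, tok, /[^A-Za-z0-9_]+/)
      for (j = 1; j <= k; j++) {
        used[tok[j]] = 1                          # applied occurrences
        if (tok[j] == "EOF") is_start[name] = 1   # EOF-terminated start rule
      }
    }
    END {
      for (i = 1; i <= n; i++)
        if (!(names[i] in used) && !(names[i] in is_start)) print names[i]
    }
  ' "$1"
}

# Demo on a throwaway grammar; "orphan" has no applied occurrence.
demo=$(mktemp)
cat > "$demo" <<'GRAMMAR'
grammar Demo;
file_ : stmt* EOF ;
stmt : ID '=' expr ';' ;
expr : ID | INT ;
orphan : INT INT ;
ID : [a-z]+ ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
GRAMMAR
unused_parser_rules "$demo"   # prints: orphan
rm -f "$demo"
```

Here file_ is also never referenced, but it is skipped because its body contains EOF, which mirrors the second condition of the XPath predicate.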
Lexer rule symbols
Unreferenced lexer symbols in a grammar are similar to unreferenced parser rule symbols, but with a twist. In addition to not occurring on the RHS of any rule (parser or lexer), the lexer symbol must not have a lexer command such as skip or channel. (An action in the lexer rule could also skip the token or set a channel, but that is not easy to check.) Furthermore, a string literal in a parser rule can implicitly reference the lexer rule that defines that literal. So, string literals in parser rules first have to be folded, that is, replaced by the name of the matching lexer rule. Because trfoldlit modifies the tree using string replacements, the grammar must then be reparsed.
#
echo Finding unused lexer symbols in grammars...
for i in `find . -name desc.xml | grep -v Generated\*`
do
echo $i
d=`dirname $i`
pushd $d > /dev/null 2>&1
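# Work in a scratch copy so that the original grammar files are not modified.
rm -rf foobarfoobar
mkdir foobarfoobar
cp *.g4 foobarfoobar
cd foobarfoobar
# Fold string literals in parser rules into lexer rule names and save the result.
trparse *.g4 2> /dev/null | trfoldlit | trsponge -c true 2> /dev/null
# Reparse all grammar files so that any imports can also be checked.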
trparse *.g4 2> /dev/null | \
trxgrep ' //lexerRuleSpec[
not(doc("*")//ruleBlock//TOKEN_REF/text() = ./TOKEN_REF/text())
and not(doc("*")//lexerRuleBlock//TOKEN_REF/text() = ./TOKEN_REF/text())
and not(./lexerRuleBlock//lexerCommands)
]/TOKEN_REF' | \
trtext
popd > /dev/null 2>&1
done
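The three conditions checked by the script above (no reference from a parser rule after literal folding, no reference from another lexer rule, and no lexer command) can be sketched without Trash as well. Again, this is only a crude text-based approximation under the assumption of one rule per line; the function name unused_lexer_rules and the demo grammar are hypothetical, and escaped quotes or brackets inside literals and character sets are not handled.

```shell
#!/bin/sh
# Text-only sketch (does not use Trash): list lexer rules that are never
# referenced, counting a parser-rule string literal as a reference to the
# lexer rule that consists of exactly that literal.
unused_lexer_rules() {
  awk '
    /^(fragment[ \t]+)?[A-Za-z][A-Za-z0-9_]*[ \t]*:/ {
      line = $0; sub(/^fragment[ \t]+/, "", line)
      name = line; sub(/[ \t]*:.*$/, "", name)
      body = line; sub(/^[^:]*:/, "", body)
      sub(/[ \t]*;[ \t]*$/, "", body); sub(/^[ \t]+/, "", body)
      names[++n] = name; bodies[n] = body
      # A lexer rule that is a single string literal can be reached by that literal.
      if (name ~ /^[A-Z]/ && body ~ /^\047[^\047]*\047$/) literal_rule[body] = name
    }
    END {
      for (i = 1; i <= n; i++) {
        body = bodies[i]
        if (names[i] ~ /^[a-z]/) {
          # Fold literals in parser rules: credit the lexer rule behind each one.
          while (match(body, /\047[^\047]*\047/)) {
            lit = substr(body, RSTART, RLENGTH)
            if (lit in literal_rule) used[literal_rule[lit]] = 1
            body = substr(body, 1, RSTART - 1) " " substr(body, RSTART + RLENGTH)
          }
        } else {
          gsub(/\047[^\047]*\047/, " ", body)     # drop literals
          gsub(/\[[^]]*\]/, " ", body)            # drop character sets
          if (body ~ /->/) has_command[names[i]] = 1
        }
        k = split(body, tok, /[^A-Za-z0-9_]+/)
        for (j = 1; j <= k; j++) used[tok[j]] = 1
      }
      for (i = 1; i <= n; i++)
        if (names[i] ~ /^[A-Z]/ && !(names[i] in used) && !(names[i] in has_command))
          print names[i]
    }
  ' "$1"
}

# Demo on a throwaway grammar; only COMMA is unreachable.
demo=$(mktemp)
cat > "$demo" <<'GRAMMAR'
grammar Demo;
file_ : stmt* EOF ;
stmt : ID '=' INT ';' ;
ID : [a-z]+ ;
INT : DIGIT+ ;
EQ : '=' ;
SEMI : ';' ;
COMMA : ',' ;
fragment DIGIT : [0-9] ;
WS : [ \t\r\n]+ -> skip ;
GRAMMAR
unused_lexer_rules "$demo"   # prints: COMMA
rm -f "$demo"
```

EQ and SEMI are never named in a parser rule, yet they are not reported because the literals '=' and ';' in stmt refer to them. This is the reason the Trash script runs trfoldlit before the query. WS is skipped because it has a lexer command, and DIGIT because INT references it.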