Parser rule symbols

Unreferenced parser symbols in a grammar are rules that have a defining occurrence but no applied occurrence. An EOF-terminated start rule would technically qualify, since nothing inside the grammar references it, but it is used outside the grammar by a driver program. We are instead interested in parser rule symbols that are not used anywhere at all.

Why are these rules important? Unreferenced parser symbols indicate a grammar that may be ill-defined. It may be that the symbol was once referenced, but somewhere along the way the RHS of the referencing rule was changed, removing the applied occurrence.

How do we find these symbols? This Trash script can be used.

#
echo Finding unused parser symbols in grammars...
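for i in $(find . -name desc.xml | grep -v Generated)
do
  echo "$i"
  d=$(dirname "$i")
  # Skip this entry if the directory cannot be entered.
  pushd "$d" > /dev/null 2>&1 || continue
  # Parse all grammar files so that any imports can also be checked.
  # Report each parser rule whose name is never referenced in any rule body
  # and whose own body does not mention EOF (i.e., it is not a start rule).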
  trparse *.g4 2> /dev/null | trxgrep ' //parserRuleSpec[not(doc("*")//ruleBlock//RULE_REF/text() = ./RULE_REF/text()) and not(./ruleBlock//TOKEN_REF/text() = "EOF")]/RULE_REF' | trtext
  popd > /dev/null 2>&1
done
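
To make the criterion concrete, here is a minimal sketch that applies the same two conditions (never referenced in a rule body, and no EOF in its own body) to a toy grammar using only awk. It is a rough text-based approximation, not a replacement for the Trash pipeline above, which works on the real parse tree. The function name unused_parser_rules and the grammar Demo.g4 are made up for illustration, and the sketch assumes one rule per line.

```shell
# Hypothetical helper: list parser rules that are never referenced.
# Assumes each rule is written on a single line as "name : body ;".
unused_parser_rules() {
  awk -v q="'" '
    BEGIN { lit = q "[^" q "]*" q }          # regex for a quoted string literal
    /^[a-z][A-Za-z0-9_]*[ \t]*:/ {           # parser rules start in lowercase
      name = $0; sub(/[ \t]*:.*/, "", name)  # text before the colon
      body = $0; sub(/^[^:]*:/, "", body)    # text after the colon
      order[++n] = name
      gsub(lit, " ", body)                   # drop string literals
      gsub(/[^A-Za-z0-9_]+/, " ", body)      # keep identifiers only
      m = split(body, words, " ")
      for (i = 1; i <= m; i++) {
        if (words[i] == "EOF") has_eof[name] = 1
        used[words[i]] = 1                   # applied occurrence
      }
    }
    END {
      for (i = 1; i <= n; i++)
        if (!(order[i] in used) && !(order[i] in has_eof)) print order[i]
    }
  ' "$@"
}

# Toy grammar: "file" is the EOF-terminated start rule, "unused" is dead.
tmp=$(mktemp -d)
cat > "$tmp/Demo.g4" <<'GRAMMAR'
grammar Demo;
file : stat* EOF ;
stat : expr ';' ;
expr : ID | INT ;
unused : ID '=' expr ;
ID : [a-z]+ ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
GRAMMAR

unused_parser_rules "$tmp/Demo.g4"   # prints: unused
```

The start rule file is not reported even though nothing references it, because its body contains EOF. That mirrors the second condition in the XPath expression.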

Lexer rule symbols

Unreferenced lexer symbols in a grammar are similar to unreferenced parser rule symbols, but with a twist. In addition to not occurring on the RHS of any rule, parser or lexer, the lexer symbol must not have a lexer command of skip or channel, since such tokens are deliberately kept away from the parser. (The script below is conservative and ignores any lexer rule that has lexer commands at all. It is also possible for the actions in a lexer rule to perform a skip or set a channel, but these are not easy to check.) Furthermore, a string literal in a parser rule implicitly references the lexer rule that defines that literal. So string literals in parser rules have to be folded, that is, replaced by the name of the matching lexer rule, before the check is made. Because trfoldlit modifies the tree using string replacements, the grammar must then be reparsed.

#
echo Finding unused lexer symbols in grammars...
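for i in $(find . -name desc.xml | grep -v Generated)
do
  echo "$i"
  d=$(dirname "$i")
  # Skip this entry if the directory cannot be entered.
  pushd "$d" > /dev/null 2>&1 || continue
  # Work on a scratch copy, since the folded grammars are written back to disk.
  rm -rf foobarfoobar
  mkdir foobarfoobar
  cp *.g4 foobarfoobar
  cd foobarfoobar
  # Fold string literals in parser rules into references to lexer rules.
  trparse *.g4 2> /dev/null | trfoldlit | trsponge -c true 2> /dev/null
  # Reparse all grammar files so that any imports can also be checked.
  # Report each lexer rule that is never referenced from a parser or lexer
  # rule body and that has no lexer commands.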
  trparse *.g4 2> /dev/null | \
	trxgrep ' //lexerRuleSpec[
		not(doc("*")//ruleBlock//TOKEN_REF/text() = ./TOKEN_REF/text())
		and not(doc("*")//lexerRuleBlock//TOKEN_REF/text() = ./TOKEN_REF/text())
		and not(./lexerRuleBlock//lexerCommands)
		]/TOKEN_REF' | \
	trtext
  popd > /dev/null 2>&1
  # Remove the scratch copy so it does not linger in the grammar directory.
  rm -rf "$d/foobarfoobar"
done
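
The effect of folding can be seen in a small text-only sketch. It is only an approximation of the pipeline above, written with awk under several assumptions: the grammar is a single file, each rule sits on one line, and there are no fragment rules. The function name unused_lexer_rules and the grammar Demo.g4 are hypothetical. A literal in a parser rule is treated as a use of the lexer rule whose entire body is that literal, and any rule containing "->" is skipped.

```shell
# Hypothetical helper: list lexer rules that are never referenced.
# Assumes one grammar file with each rule on a single line, "NAME : body ;".
unused_lexer_rules() {
  awk -v q="'" '
    # Split a rule line into the globals "name" and "body".
    function parse(line) {
      name = line; sub(/[ \t]*:.*/, "", name)
      body = line; sub(/^[^:]*:[ \t]*/, "", body); sub(/[ \t]*;[ \t]*$/, "", body)
    }
    BEGIN { lit = q "[^" q "]*" q }          # regex for a quoted string literal
    # Pass 1: remember lexer rules whose whole body is one string literal.
    NR == FNR {
      if ($0 ~ /^[A-Z][A-Za-z0-9_]*[ \t]*:/) {
        parse($0)
        if (body ~ ("^" lit "$")) literal[body] = name
      }
      next
    }
    # Pass 2: collect every applied occurrence of a symbol.
    /^[A-Za-z][A-Za-z0-9_]*[ \t]*:/ {
      parse($0)
      if (name ~ /^[A-Z]/) {                 # lexer rule definition
        order[++n] = name
        if (body ~ /->/) has_cmd[name] = 1   # any lexer command
      } else {
        # Fold: a literal in a parser rule counts as a use of its token.
        for (l in literal) if (index(body, l)) used[literal[l]] = 1
      }
      gsub(lit, " ", body)                   # drop string literals
      gsub(/\[[^]]*\]/, " ", body)           # drop character sets
      gsub(/[^A-Za-z0-9_]+/, " ", body)      # keep identifiers only
      m = split(body, words, " ")
      for (i = 1; i <= m; i++) used[words[i]] = 1
    }
    END {
      for (i = 1; i <= n; i++)
        if (!(order[i] in used) && !(order[i] in has_cmd)) print order[i]
    }
  ' "$1" "$1"
}

# Toy grammar: EQ and SEMI are used only through literals, COMMENT is dead.
tmp=$(mktemp -d)
cat > "$tmp/Demo.g4" <<'GRAMMAR'
grammar Demo;
file : stat* EOF ;
stat : ID '=' INT ';' ;
ID : [a-z]+ ;
INT : [0-9]+ ;
EQ : '=' ;
SEMI : ';' ;
COMMENT : '#' ~[\r\n]* ;
WS : [ \t\r\n]+ -> skip ;
GRAMMAR

unused_lexer_rules "$tmp/Demo.g4"   # prints: COMMENT
```

Without the folding step, EQ and SEMI would also be reported, because the parser rule stat refers to them only through the literals '=' and ';'. WS is not reported because it carries a lexer command.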