Antlr4 contains an XPath engine in the runtime of several targets (Cpp, CSharp, Java, Python2, Python3; implementations for Dart, Go, JavaScript, PHP, and Swift are missing), documented here. While this engine is a good start, it does not fully implement XPath version 1, and it lacks many more features of XPath version 2. I feel those features are necessary for a compiler writer who wants to explore and debug a grammar.

I have therefore written an XPath2 engine for the CSharp target, ported from the Eclipse XPath2 code. It offers a more realistic set of the operations a compiler writer needs.

This document is a quick “cheatsheet” for this engine.

Terminology

Step

  • Steps are separated by ‘/’.
  • Absolute location path: /step/step/…
  • Relative location path: step/step/…

Each step is evaluated against the nodes in the current node-set.

A step consists of:

  • an axis (defines the tree-relationship between the selected nodes and the current node)
  • a node-test (identifies a node within an axis)
  • zero or more predicates (to further refine the selected node-set)

Syntax: axisname::nodetest[predicate]
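A step can be modeled as a function from one node-set to another. The Python sketch below is a minimal illustration under stated assumptions, not Trash's implementation. The `Node` class, the `make`, `step` and `named` helpers, and the `#document` node are hypothetical, and only the `child` and `descendant-or-self` axes are modeled. It shows that `//expression` abbreviates `/descendant-or-self::node()/child::expression`, and that a positional predicate is counted separately for each context node.

```python
class Node:
    """Hypothetical parse-tree node (not an ANTLR or Trash type)."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def make(name, *children):
    return Node(name, children)

# Axes: each maps one context node to candidate nodes in document order.
def child(n):
    return list(n.children)

def descendant_or_self(n):
    out = [n]
    for c in n.children:
        out.extend(descendant_or_self(c))
    return out

def in_document_order(nodes):
    """Remove duplicates and sort into document order, as XPath requires."""
    if not nodes:
        return []
    top = nodes[0]
    while top.parent is not None:
        top = top.parent
    return [n for n in descendant_or_self(top) if any(n is m for m in nodes)]

def step(context_nodes, axis, test, *predicates):
    """Evaluate axis::test[p1][p2]... against every node of the current node-set."""
    result = []
    for ctx in context_nodes:
        selected = [n for n in axis(ctx) if test(n)]
        for pred in predicates:
            size = len(selected)  # the value of last() for this predicate
            selected = [n for pos, n in enumerate(selected, 1) if pred(n, pos, size)]
        result.extend(selected)
    return in_document_order(result)

def any_node(n):  # the node() test
    return True

def named(name):  # a name test such as "expression"
    return lambda n: n.name == name

# Hypothetical parse tree for "1+2", under a document node standing in for "/".
doc = make("#document",
           make("file_",
                make("expression",
                     make("expression", make("SCIENTIFIC_NUMBER")),
                     make("PLUS"),
                     make("expression", make("SCIENTIFIC_NUMBER"))),
                make("EOF")))

# //expression  ==  /descendant-or-self::node()/child::expression
exprs = step(step([doc], descendant_or_self, any_node), child, named("expression"))
# //expression/expression[2]: the second "expression" child of each "expression"
second = step(exprs, child, named("expression"), lambda n, pos, size: pos == 2)
print(len(exprs), len(second))  # 3 1
```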

Selectors

Descendant selectors

Expr                     Grammar        Result
//expression             Arithmetic.g4  select parser-rule nodes named “expression”
/                        Arithmetic.g4  select the document node (not the IParseTree root of the parse tree)
/file_                   Arithmetic.g4  select parser-rule nodes named “file_” (the root of the parse tree)
file_                    Arithmetic.g4  select parser-rule nodes named “file_” (the root of the parse tree)
//expression/expression  Arithmetic.g4  select “expression” nodes whose parent is also an “expression”
//.                      Arithmetic.g4  select all nodes of the document
//*                      Arithmetic.g4  select all element nodes, i.e. parse-tree nodes (not attributes or text)
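To experiment with these selectors outside Trash, Python's standard `xml.etree.ElementTree` module supports a small XPath subset. This sketch assumes a hand-written, hypothetical XML rendering of a parse tree for the input `1+2`; it is not Trash output. ElementTree has no separate document node, so its root element stands in for `/file_`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a parse tree for "1+2" (not Trash output).
root = ET.fromstring("""<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
  <EOF/>
</file_>""")

# //expression: every "expression" at any depth (iter() also checks the root itself)
all_expr = list(root.iter("expression"))
# //expression/expression: "expression" elements whose parent is an "expression"
nested = root.findall(".//expression/expression")
# //*: every element, including the root element
all_elements = list(root.iter())

print(len(all_expr), len(nested), len(all_elements))  # 3 2 8
```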

Attribute selectors

Trash provides the following attributes on Antlr4 parse-tree nodes (ParserRuleContext and TerminalNodeImpl): ChildCount, SourceInterval, Start, End, Text.

Expr                             Grammar        Result
//expression/@Text               Arithmetic.g4  select the text attribute of “expression”
//expression/@SI                 Arithmetic.g4  select the “SourceInterval” attribute of “expression”
//expression/@ChildCount         Arithmetic.g4  select the number-of-children attribute of “expression”
//expression/@Start              Arithmetic.g4  select the start token index of “expression”
//expression/@End                Arithmetic.g4  select the end token index of “expression”
//SCIENTIFIC_NUMBER/@Text        Arithmetic.g4  select the text attribute of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@SI          Arithmetic.g4  select the “SourceInterval” attribute of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@ChildCount  Arithmetic.g4  select the number-of-children attribute of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@Start       Arithmetic.g4  select the start token index of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@End         Arithmetic.g4  select the end token index of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@*           Arithmetic.g4  select all attributes of “SCIENTIFIC_NUMBER”
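The attributes can be read as values computed from each parse-tree node. The sketch below is a hypothetical model, not Trash's code. `Terminal`, `Rule` and `attributes` are invented names, `SourceInterval` is represented as a `(start, end)` pair of token indexes, and rule nodes are assumed to have at least one child. Note that `ChildCount` counts only children, never attributes.

```python
class Terminal:
    """Hypothetical leaf: one token with its index in the token stream."""
    def __init__(self, name, index, text):
        self.name, self.index, self.text = name, index, text
        self.children = []

class Rule:
    """Hypothetical parser-rule node; assumed to have at least one child."""
    def __init__(self, name, *children):
        self.name, self.children = name, list(children)

def start(n):  # index of the first token covered by n
    return n.index if isinstance(n, Terminal) else start(n.children[0])

def end(n):  # index of the last token covered by n
    return n.index if isinstance(n, Terminal) else end(n.children[-1])

def text(n):  # concatenated token text, like ANTLR's GetText()
    return n.text if isinstance(n, Terminal) else "".join(text(c) for c in n.children)

def attributes(n):
    """Assumed meaning of @ChildCount, @Start, @End, @SourceInterval and @Text."""
    return {
        "ChildCount": len(n.children),
        "Start": start(n),
        "End": end(n),
        "SourceInterval": (start(n), end(n)),
        "Text": text(n),
    }

# Tokens of "1+2": SCIENTIFIC_NUMBER(0) PLUS(1) SCIENTIFIC_NUMBER(2)
expr = Rule("expression",
            Rule("expression", Terminal("SCIENTIFIC_NUMBER", 0, "1")),
            Terminal("PLUS", 1, "+"),
            Rule("expression", Terminal("SCIENTIFIC_NUMBER", 2, "2")))

print(attributes(expr))
# {'ChildCount': 3, 'Start': 0, 'End': 2, 'SourceInterval': (0, 2), 'Text': '1+2'}
print(attributes(expr.children[1]))
# {'ChildCount': 0, 'Start': 1, 'End': 1, 'SourceInterval': (1, 1), 'Text': '+'}
```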

Order selection

Expr                                   Grammar        Result
/file_/expression[2]                   Arithmetic.g4  select the second “expression” child of “file_” (root)
/file_/expression[last()]              Arithmetic.g4  select the last “expression” child of “file_” (root)
/file_/*[name()="expression"][last()]  Arithmetic.g4  select the last “expression” child of “file_” (root) (uses a predicate)
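ElementTree's XPath subset includes 1-based positional predicates, so the first two rows can be tried directly. The tree below is hypothetical (the `SEMI` separator token is made up). The `name()` variant is emulated with a Python filter because ElementTree has no `name()` function.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: "file_" with three "expression" children (SEMI is a made-up token).
root = ET.fromstring(
    "<file_><expression>1</expression><SEMI/><expression>2</expression>"
    "<SEMI/><expression>3</expression><EOF/></file_>")

# /file_/expression[2] and /file_/expression[last()] (positions are 1-based)
second = root.findall("expression[2]")
last = root.findall("expression[last()]")

# /file_/*[name()="expression"][last()]: filter by name, then take the last survivor.
by_name = [child for child in root if child.tag == "expression"]

print(second[0].text, last[0].text, by_name[-1].text)  # 2 3 3
```

Filtering before taking `[last()]` matters: the second predicate sees only the nodes that passed the first one.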

Predicates

Expr                             Grammar        Result
//SCIENTIFIC_NUMBER[text()='1']  Arithmetic.g4  select all “SCIENTIFIC_NUMBER” nodes whose text is “1”
//*[not(name()="expression")]    Arithmetic.g4  select all elements except “expression”
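Here are the two predicates in ElementTree, on a hypothetical tree for `1+2`. ElementTree writes the text test as `[.='1']` (Python 3.7 or later), which compares an element's full text content rather than its `text()` children; for leaf tokens the two agree. `not()` and `name()` are unavailable, so the second row becomes a Python filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse tree for "1+2" (not Trash output).
root = ET.fromstring(
    "<file_><expression><expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>"
    "<PLUS>+</PLUS><expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>"
    "</expression><EOF/></file_>")

# //SCIENTIFIC_NUMBER[text()='1'], spelled with ElementTree's [.='value'] test
ones = root.findall(".//SCIENTIFIC_NUMBER[.='1']")
# //*[not(name()="expression")]: keep every element whose name is not "expression"
not_expr = [e.tag for e in root.iter() if e.tag != "expression"]

print(len(ones), not_expr)
# 1 ['file_', 'SCIENTIFIC_NUMBER', 'PLUS', 'SCIENTIFIC_NUMBER', 'EOF']
```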

Operators

Comparison

Expr                         Grammar        Result
//.[name() = "expression"]   Arithmetic.g4  select all “expression” nodes
//.[name() != "expression"]  Arithmetic.g4  select all nodes except “expression”
//.[@ChildCount > 1]         Arithmetic.g4  select all nodes that have more than one child (attributes and text are not children)
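ElementTree predicates cannot compare numbers, so this sketch evaluates the comparison rows as Python filters over a hypothetical tree that carries a `ChildCount` attribute. Attribute values are strings, so `@ChildCount > 1` needs an explicit numeric conversion. Unlike XPath's `//.`, `root.iter()` yields only elements, not the document node or text nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse tree for "1+2" with a made-up ChildCount attribute on each element.
root = ET.fromstring(
    '<file_ ChildCount="2"><expression ChildCount="3">'
    '<expression ChildCount="1"><SCIENTIFIC_NUMBER ChildCount="0">1</SCIENTIFIC_NUMBER></expression>'
    '<PLUS ChildCount="0">+</PLUS>'
    '<expression ChildCount="1"><SCIENTIFIC_NUMBER ChildCount="0">2</SCIENTIFIC_NUMBER></expression>'
    '</expression><EOF ChildCount="0"/></file_>')

nodes = list(root.iter())
# //.[name() = "expression"] and //.[name() != "expression"]
eq = [e for e in nodes if e.tag == "expression"]
ne = [e for e in nodes if e.tag != "expression"]
# //.[@ChildCount > 1]: convert the attribute text to a number before comparing
many = [e.tag for e in nodes if int(e.get("ChildCount")) > 1]

print(len(eq), len(ne), many)  # 3 5 ['file_', 'expression']
```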

Logic (and/or)

Expr                                   Grammar        Result
//expression[@Start="0" and @End="0"]  Arithmetic.g4  select all “expression” nodes that have Start=0 and End=0
//expression[@Start="0" or @Start>3]   Arithmetic.g4  select all “expression” nodes that have Start=0 or Start>3
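In ElementTree, chaining two attribute predicates behaves like `and`; `or` and a numeric `>` have no ElementTree form and are written as a Python filter. The `Start`/`End` values below are made up for illustration and do not come from a real parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical "expression" nodes with made-up Start/End token indexes.
root = ET.fromstring(
    '<file_>'
    '<expression Start="0" End="0"/>'
    '<expression Start="0" End="2"/>'
    '<expression Start="4" End="6"/>'
    '<expression Start="2" End="2"/>'
    '</file_>')

# //expression[@Start="0" and @End="0"]: chained predicates act as "and"
both = root.findall(".//expression[@Start='0'][@End='0']")
# //expression[@Start="0" or @Start>3]: string test for "0", numeric test for > 3
either = [(e.get("Start"), e.get("End")) for e in root.iter("expression")
          if e.get("Start") == "0" or int(e.get("Start")) > 3]

print(len(both), either)  # 1 [('0', '0'), ('0', '2'), ('4', '6')]
```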

Union

Expr                                Grammar        Result
//expression | //SCIENTIFIC_NUMBER  Arithmetic.g4  select all “expression” and “SCIENTIFIC_NUMBER” nodes
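An XPath union returns each node at most once, in document order. A small hypothetical helper, `union`, reproduces that behavior over ElementTree element iterators; it is a sketch, not part of ElementTree or Trash.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse tree for "1+2" (not Trash output).
root = ET.fromstring(
    "<file_><expression><expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>"
    "<PLUS>+</PLUS><expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>"
    "</expression><EOF/></file_>")

def union(root, *node_sets):
    """XPath-style union: each node at most once, results in document order."""
    wanted = {id(e) for nodes in node_sets for e in nodes}
    return [e for e in root.iter() if id(e) in wanted]

# //expression | //SCIENTIFIC_NUMBER
both = union(root, root.iter("expression"), root.iter("SCIENTIFIC_NUMBER"))
print([e.tag for e in both])
# ['expression', 'expression', 'SCIENTIFIC_NUMBER', 'expression', 'SCIENTIFIC_NUMBER']
```

Because membership is tracked by identity, `//expression | //expression` still yields each "expression" once.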

Using node sets in predicates

Use them inside functions, or on their own, where a non-empty node set counts as true.

Expr                                 Grammar        Result
//expression[count(expression) > 1]  Arithmetic.g4  select all “expression” nodes that have more than one “expression” child
//expression[expression]             Arithmetic.g4  select all “expression” nodes that have at least one “expression” child
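Here are both rows in ElementTree, on a hypothetical tree for `(1)+2`, in which the parenthesized `expression` has exactly one `expression` child. ElementTree supports the bare `[expression]` test directly; `count()` is emulated in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse tree for "(1)+2"; element names only mimic Arithmetic.g4.
root = ET.fromstring(
    "<file_><expression>"
    "<expression><LPAREN>(</LPAREN>"
    "<expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>"
    "<RPAREN>)</RPAREN></expression>"
    "<PLUS>+</PLUS>"
    "<expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>"
    "</expression><EOF/></file_>")

# //expression[count(expression) > 1]: count the "expression" children in Python
many = [e for e in root.iter("expression") if len(e.findall("expression")) > 1]
# //expression[expression]: true when the child node set is non-empty
some = root.findall(".//expression[expression]")

print(len(many), len(some))  # 1 2
```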

Functions

Function    Expr            Result
name()      //*/name()      return the name of each node
text()      //*/text()      select the text nodes that are children of each node
count(x)    count(//*)      count the number of nodes in x
position()  //*/position()  return the position of each node in the sequence
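Rough Python equivalents of the four functions over ElementTree elements follow, for a hypothetical tree of `1+2`. `e.text` holds only the text before an element's first child, so it approximates `text()` well only for leaf tokens.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse tree for "1+2" (not Trash output).
root = ET.fromstring(
    "<file_><expression><expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>"
    "<PLUS>+</PLUS><expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>"
    "</expression><EOF/></file_>")

elements = list(root.iter())                              # //* in document order
names = [e.tag for e in elements]                         # //*/name()
texts = [e.text for e in elements if e.text is not None]  # //*/text(), leaf text only here
total = len(elements)                                     # count(//*)
positions = [pos for pos, _ in enumerate(elements, 1)]    # //*/position(), 1-based

print(total, texts)  # 8 ['1', '+', '2']
```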

Examples

Extracting relationships between classes

Based on:

https://devhints.io/xpath
https://www.w3schools.com/xml/xpath_axes.asp