Antlr4 includes an XPath engine in the runtimes of some targets (Cpp, CSharp, Java, Python2, Python3; implementations for Dart, Go, JavaScript, PHP, and Swift are missing), and it is documented here. While this XPath engine is a good start, it is not quite XPath version 1 and lacks many more features of XPath version 2, which I feel is what a compiler writer needs to explore and debug a grammar.

Thus, I have written an XPath2 engine for the CSharp target, which is a port of the Eclipse XPath2 code. This XPath engine offers a more realistic set of the operations that a compiler writer would need.

This document is a quick “cheatsheet” for this engine.

Terminology

Step

  • Steps are separated by ‘/’.
  • Absolute location path: /step/step/…
  • Relative location path: step/step/…

Each step is evaluated against the nodes in the current node-set.

A step consists of:

  • an axis (defines the tree-relationship between the selected nodes and the current node)
  • a node-test (identifies a node within an axis)
  • zero or more predicates (to further refine the selected node-set)

Syntax: axisname::nodetest[predicate]
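
The step syntax can be made concrete with a small sketch. The Python helper below, parse_step, is hypothetical: it is not part of Trash or Antlr4, and it only handles the simple shape axisname::nodetest[predicate], with no nested brackets and no bracket characters inside string literals. It assumes the usual XPath abbreviations, where a missing axis means child and a leading '@' means the attribute axis.

```python
import re

# Hypothetical helper: split one XPath step into (axis, node_test, predicates).
# Only the simple shape axisname::nodetest[predicate]... is recognized.
STEP_RE = re.compile(
    r"^(?:(?P<axis>[\w-]+)::)?"      # optional explicit axis, e.g. child::
    r"(?P<test>[^\[\]]+)"            # node-test, e.g. expression, *, node()
    r"(?P<preds>(?:\[[^\[\]]*\])*)$" # zero or more [predicate] groups
)

def parse_step(step):
    m = STEP_RE.match(step)
    if m is None:
        raise ValueError(f"unsupported step: {step!r}")
    axis = m.group("axis") or "child"       # the default axis is child
    test = m.group("test")
    if m.group("axis") is None and test.startswith("@"):
        axis, test = "attribute", test[1:]  # '@' abbreviates attribute::
    predicates = re.findall(r"\[([^\[\]]*)\]", m.group("preds"))
    return axis, test, predicates
```

For example, parse_step("child::expression[2]") yields the axis "child", the node-test "expression", and the single predicate "2", while parse_step("@Text") yields the attribute axis with the node-test "Text".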

Selectors

Descendant selectors

Expr Grammar Result
//expression Arithmetic.g4 select parser rules named “expression”
/ Arithmetic.g4 select the document (not the IParseTree root of grammar)
/file_ Arithmetic.g4 select parser rules named “file_” (root of the parse tree)
file_ Arithmetic.g4 select parser rules named “file_” (root of the parse tree)
//expression/expression Arithmetic.g4 select parser rules named “expression” that must have a parent named “expression”
//. Arithmetic.g4 select all nodes (elements and text nodes; attributes are not included)
//* Arithmetic.g4 select all element nodes (parser rules and tokens, not attributes or text)
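
To get a feel for these selectors without Trash at hand, here is a minimal sketch using Python's standard xml.etree.ElementTree. This is not the Trash engine: ElementTree supports only a small subset of XPath 1, and the XML below is a hypothetical stand-in for a parse tree of "1+2", not actual Trash output. ElementTree paths are evaluated relative to an element, so the leading "." replaces the absolute "/" used in the table.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML stand-in for a parse tree of "1+2"; element names
# mirror rule and token names, but this is not output from Trash.
XML = """
<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
  <EOF/>
</file_>
"""
root = ET.fromstring(XML)  # root is the file_ element

# Analogue of //expression: every "expression" at any depth.
all_expressions = root.findall(".//expression")

# Analogue of //expression/expression: "expression" with an "expression" parent.
nested_expressions = root.findall(".//expression/expression")

# Analogue of /file_/expression: "expression" children of the root only.
top_expressions = root.findall("./expression")

# Analogue of //*: every element below the root.
all_elements = root.findall(".//*")

print(len(all_expressions), len(nested_expressions),
      len(top_expressions), len(all_elements))
```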

Attribute selectors

Trash provides attributes for Antlr4’s CommonParserRule and TerminalNodeImpl: ChildCount, SourceInterval, Start, End, Text.

Expr Grammar Result
//expression/@Text Arithmetic.g4 select text attribute of “expression”
//expression/@SI Arithmetic.g4 select “SourceInterval” attribute of “expression”
//expression/@ChildCount Arithmetic.g4 select number of children attribute of “expression”
//expression/@Start Arithmetic.g4 select start token index of “expression”
//expression/@End Arithmetic.g4 select end token index of “expression”
//SCIENTIFIC_NUMBER/@Text Arithmetic.g4 select text attribute of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@SI Arithmetic.g4 select “SourceInterval” attribute of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@ChildCount Arithmetic.g4 select number of children attribute of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@Start Arithmetic.g4 select start token index of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@End Arithmetic.g4 select end token index of “SCIENTIFIC_NUMBER”
//SCIENTIFIC_NUMBER/@* Arithmetic.g4 select all attributes of “SCIENTIFIC_NUMBER”
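
As a rough illustration of attribute selection, the sketch below again uses Python's standard ElementTree rather than Trash. ElementTree cannot end a path with an attribute step such as /@Text, so the attribute is read from each selected element instead. The XML and its attribute values are hypothetical, chosen to resemble a parse tree of "1+2".

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a parse tree of "1+2" with Trash-like attributes.
XML = """
<file_>
  <expression ChildCount="3" Start="0" End="2" Text="1+2">
    <expression ChildCount="1" Start="0" End="0" Text="1">
      <SCIENTIFIC_NUMBER ChildCount="0" Start="0" End="0" Text="1"/>
    </expression>
    <PLUS ChildCount="0" Start="1" End="1" Text="+"/>
    <expression ChildCount="1" Start="2" End="2" Text="2">
      <SCIENTIFIC_NUMBER ChildCount="0" Start="2" End="2" Text="2"/>
    </expression>
  </expression>
</file_>
"""
root = ET.fromstring(XML)

# Analogue of //expression/@Text
expression_texts = [e.get("Text") for e in root.findall(".//expression")]

# Analogue of //expression/@ChildCount (attribute values are strings in XML)
child_counts = [int(e.get("ChildCount")) for e in root.findall(".//expression")]

# Analogue of //SCIENTIFIC_NUMBER/@*: all attributes of each matching element
number_attrs = [dict(e.attrib) for e in root.findall(".//SCIENTIFIC_NUMBER")]

print(expression_texts, child_counts, number_attrs[0])
```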

Order selection

Expr Grammar Result
/file_/expression[2] Arithmetic.g4 select second “expression” child of “file_” (root)
/file_/expression[last()] Arithmetic.g4 select last “expression” child of “file_” (root)
/file_/*[name()="expression"][last()] Arithmetic.g4 select last “expression” child of “file_” (root) (uses predicate)
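
Positional predicates are 1-based, which is easy to get wrong. The sketch below shows this with Python's standard ElementTree, which supports [n] and [last()] but not name(); the third query is therefore emulated in plain Python. The XML is a hypothetical stand-in with three top-level expressions, not Trash output.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in: a file_ root with three "expression" children.
XML = """
<file_>
  <expression Text="1"/>
  <expression Text="2+3"/>
  <expression Text="4"/>
  <EOF/>
</file_>
"""
root = ET.fromstring(XML)

# Analogue of /file_/expression[2]: positions start at 1, not 0.
second = root.findall("./expression[2]")

# Analogue of /file_/expression[last()]
last = root.findall("./expression[last()]")

# Analogue of /file_/*[name()="expression"][last()], emulated in Python:
# filter the children by name first, then take the last survivor.
last_by_name = [child for child in root if child.tag == "expression"][-1]

print(second[0].get("Text"), last[0].get("Text"), last_by_name.get("Text"))
```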

Predicates

Expr Grammar Result
//SCIENTIFIC_NUMBER[text()='1'] Arithmetic.g4 select all SCIENTIFIC_NUMBER that have text ‘1’
//*[not(name()="expression")] Arithmetic.g4 select all but “expression”

Operators

Comparison

Expr Grammar Result
//.[name() = "expression"] Arithmetic.g4 select all “expression”
//.[name() != "expression"] Arithmetic.g4 select all but “expression”
//.[@ChildCount > 1] Arithmetic.g4 select all nodes that have more than one child (attributes and text are not children)

Logic (and/or)

Expr Grammar Result
//expression[@Start="0" and @End="0"] Arithmetic.g4 select all “expression” that have Start=0 and End=0
//expression[@Start="0" or @Start>3] Arithmetic.g4 select all “expression” that have Start=0 or Start>3

Union

Expr Grammar Result
//expression | //SCIENTIFIC_NUMBER Arithmetic.g4 select all “expression” and SCIENTIFIC_NUMBER
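
The operators above have no direct counterpart in Python's standard ElementTree (it supports neither and/or nor the union operator), so the sketch below spells out what each query computes using plain list comprehensions. It is only an emulation under assumptions: the XML is a hypothetical stand-in for a parse tree of "1+2*3" with made-up Start and End token indices.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a parse tree of "1+2*3" (token indices 0..4).
XML = """
<file_>
  <expression Start="0" End="4">
    <expression Start="0" End="0"><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression Start="2" End="4">
      <expression Start="2" End="2"><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
      <TIMES>*</TIMES>
      <expression Start="4" End="4"><SCIENTIFIC_NUMBER>3</SCIENTIFIC_NUMBER></expression>
    </expression>
  </expression>
</file_>
"""
root = ET.fromstring(XML)
expressions = list(root.iter("expression"))  # document order

# Emulates //expression[@Start="0" and @End="0"]
both = [e for e in expressions
        if e.get("Start") == "0" and e.get("End") == "0"]

# Emulates //expression[@Start="0" or @Start>3]
either = [e for e in expressions
          if e.get("Start") == "0" or int(e.get("Start")) > 3]

# Emulates //expression | //SCIENTIFIC_NUMBER (one pass keeps document order)
union = [e for e in root.iter() if e.tag in ("expression", "SCIENTIFIC_NUMBER")]

# Emulates //.[@ChildCount > 1], using the element's real child count
many_children = [e for e in root.iter() if len(e) > 1]

print(len(both), len(either), len(union), len(many_children))
```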

Using node sets in predicates

Use them inside functions

Expr Grammar Result
//expression[count(expression) > 1] Arithmetic.g4 select all “expression” that have more than one “expression” child
//expression[expression] Arithmetic.g4 select all “expression” that have an “expression” child
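
The difference between the two predicates is easiest to see on a tree where they disagree. In this sketch, Python's standard ElementTree handles the [expression] form directly, while count() is not supported and is emulated with len(). The XML is a hypothetical stand-in for a parse tree of "(1)+2"; the LPAREN and RPAREN token names are assumptions, not taken from Arithmetic.g4.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a parse tree of "(1)+2".
XML = """
<file_>
  <expression>
    <expression>
      <LPAREN>(</LPAREN>
      <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
      <RPAREN>)</RPAREN>
    </expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
  <EOF/>
</file_>
"""
root = ET.fromstring(XML)

# Analogue of //expression[expression]: at least one "expression" child.
has_expression_child = root.findall(".//expression[expression]")

# Emulates //expression[count(expression) > 1]: more than one such child.
has_many_expression_children = [
    e for e in root.iter("expression")
    if len(e.findall("expression")) > 1
]

print(len(has_expression_child), len(has_many_expression_children))
```

Here the parenthesized expression matches the first query but not the second, because it has exactly one “expression” child.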

Functions

Function Expr Result
name() //*/name() select the name of the node
text() //*/text() select the nodes with text
count(x) count(//*) count the number of nodes
position() //*/position() select the position of the node in the list

Examples

Extracting relationships between classes

Online XPath

The best XPath engine online is XPather.com. There are others (FreeFormatter.com, w3schools.com) but they are not as good.

A good reference for XPath is on Mozilla. You can also check devhints.io, Wikipedia, w3.org, and the Mulberry Tech slides on XPath2.