XPath cheatsheet for Antlr parse trees
Antlr4 contains an XPath engine in the runtimes of some targets (Cpp, CSharp, Java, Python2, Python3; implementations for Dart, Go, JavaScript, PHP, and Swift are missing), which is documented here. While this XPath engine is a good start, it is not quite XPath version 1 and lacks many more features of XPath version 2, which I feel is what a compiler writer needs when exploring and debugging a grammar.
Thus, I have written an XPath2 engine for the CSharp target, which is a port of the Eclipse XPath2 code. This XPath engine offers a more realistic set of operations that a compiler writer would need.
This document is a quick “cheatsheet” for this engine.
Terminology
Step
- Steps are separated by ‘/’.
- Absolute location path: /step/step/…
- Relative location path: step/step/…
Each step is evaluated against the nodes in the current node-set.
A step consists of:
- an axis (defines the tree-relationship between the selected nodes and the current node)
- a node-test (identifies a node within an axis)
- zero or more predicates (to further refine the selected node-set)
Syntax: axisname::nodetest[predicate]
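To make the three parts of a step concrete, here is a minimal Python sketch that splits a single step into its axis, node test, and predicates. It is only an illustration and not how the engine parses expressions: the `parse_step` helper and its regular expression are hypothetical, and the sketch assumes predicates do not contain nested brackets.

```python
import re

# Optional "axis::" prefix, then a node test, then zero or more [predicate] groups.
# Simplification (assumption): predicates never contain nested brackets.
STEP_RE = re.compile(
    r"^(?:(?P<axis>[\w-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)

def parse_step(step):
    """Split one step into (axis, node_test, predicates)."""
    match = STEP_RE.match(step)
    if match is None:
        raise ValueError(f"not a step this sketch understands: {step!r}")
    axis = match.group("axis") or "child"      # "child" is the default axis
    test = match.group("test")
    if match.group("axis") is None and test.startswith("@"):
        axis, test = "attribute", test[1:]     # "@name" abbreviates "attribute::name"
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return axis, test, predicates

print(parse_step("child::expression[2]"))
print(parse_step('*[name()="expression"][last()]'))
print(parse_step("@Text"))
```

When no axis is written, the step falls back to the child axis, which is why expression and child::expression select the same nodes.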
Selectors
Descendant selectors
Expr | Grammar | Result |
//expression | Arithmetic.g4 | select parser rules named “expression” |
/ | Arithmetic.g4 | select the document (not the IParseTree root of grammar) |
/file_ | Arithmetic.g4 | select parser rules named “file_” (root of the parse tree) |
file_ | Arithmetic.g4 | select parser rules named “file_” relative to the current node (here, the root of the parse tree) |
//expression/expression | Arithmetic.g4 | select parser rules named “expression” whose parent is also named “expression” |
//. | Arithmetic.g4 | select all attributes of all nodes |
//* | Arithmetic.g4 | select all children of all nodes (non-attributes) |
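The descendant selectors can be tried outside the engine with Python's standard xml.etree.ElementTree module, which implements a small subset of XPath. The sketch below is only an approximation: the XML document is a hypothetical rendering of the parse tree for the input 1+2, and ElementTree paths are written relative to the context node (.//expression rather than //expression).

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the parse tree for the input "1+2".
TREE = """
<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
</file_>
"""
root = ET.fromstring(TREE)   # root is the <file_> element

# .//expression : every "expression" below the context node (like //expression)
all_expressions = root.findall(".//expression")

# .//expression/expression : "expression" nodes whose parent is an "expression"
nested = root.findall(".//expression/expression")

# .//* : every element below the context node, at any depth
everything = root.findall(".//*")

print(len(all_expressions), len(nested), len(everything))
```

Unlike //*, the ElementTree path .//* does not include the context node itself, so the root file_ element is not part of the last result.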
Attribute selectors
Trash provides attributes for Antlr4’s CommonParserRule and TerminalNodeImpl: ChildCount, SourceInterval (written @SI in the expressions below), Start, End, Text.
Expr | Grammar | Result |
//expression/@Text | Arithmetic.g4 | select text attribute of “expression” |
//expression/@SI | Arithmetic.g4 | select “SourceInterval” attribute of “expression” |
//expression/@ChildCount | Arithmetic.g4 | select number of children attribute of “expression” |
//expression/@Start | Arithmetic.g4 | select start token index of “expression” |
//expression/@End | Arithmetic.g4 | select end token index of “expression” |
//SCIENTIFIC_NUMBER/@Text | Arithmetic.g4 | select text attribute of “SCIENTIFIC_NUMBER” |
//SCIENTIFIC_NUMBER/@SI | Arithmetic.g4 | select “SourceInterval” attribute of “SCIENTIFIC_NUMBER” |
//SCIENTIFIC_NUMBER/@ChildCount | Arithmetic.g4 | select number of children attribute of “SCIENTIFIC_NUMBER” |
//SCIENTIFIC_NUMBER/@Start | Arithmetic.g4 | select start token index of “SCIENTIFIC_NUMBER” |
//SCIENTIFIC_NUMBER/@End | Arithmetic.g4 | select end token index of “SCIENTIFIC_NUMBER” |
//SCIENTIFIC_NUMBER/@* | Arithmetic.g4 | select all attributes of “SCIENTIFIC_NUMBER” |
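As a rough illustration of attribute selection, the sketch below models the parse-tree attributes as XML attributes in a hypothetical document and reads them with Python's xml.etree.ElementTree. ElementTree cannot return attribute nodes from a path, so the final /@Name step is emulated by selecting the elements and calling get() on each; the attribute values shown are made up for the input 1+2.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering: parse-tree attributes become XML attributes.
TREE = """
<file_ ChildCount="1" Start="0" End="2" Text="1+2">
  <expression ChildCount="3" Start="0" End="2" Text="1+2">
    <SCIENTIFIC_NUMBER ChildCount="0" Start="0" End="0" Text="1"/>
    <PLUS ChildCount="0" Start="1" End="1" Text="+"/>
    <SCIENTIFIC_NUMBER ChildCount="0" Start="2" End="2" Text="2"/>
  </expression>
</file_>
"""
root = ET.fromstring(TREE)

# //SCIENTIFIC_NUMBER/@Text : select the elements, then read the attribute.
number_texts = [n.get("Text") for n in root.findall(".//SCIENTIFIC_NUMBER")]

# //expression/@ChildCount
child_counts = [e.get("ChildCount") for e in root.findall(".//expression")]

# //SCIENTIFIC_NUMBER/@* : all attributes of each matching node
all_attrs = [dict(n.attrib) for n in root.findall(".//SCIENTIFIC_NUMBER")]

print(number_texts, child_counts, all_attrs[0])
```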
Order selection
Expr | Grammar | Result |
/file_/expression[2] | Arithmetic.g4 | select second “expression” child of “file_” (root) |
/file_/expression[last()] | Arithmetic.g4 | select last “expression” child of “file_” (root) |
/file_/*[name()="expression"][last()] | Arithmetic.g4 | select last “expression” child of “file_” (root) (uses predicate) |
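Positional predicates can be checked on a small example. The sketch below uses Python's xml.etree.ElementTree on a hypothetical tree in which file_ has three expression children separated by SEMI tokens; the tree shape is an assumption made for illustration. ElementTree counts positions among siblings with the same name, which agrees with expression[2] and expression[last()].

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: a root "file_" with three "expression" children.
TREE = """
<file_>
  <expression Text="1"/>
  <SEMI Text=";"/>
  <expression Text="2"/>
  <SEMI Text=";"/>
  <expression Text="3"/>
</file_>
"""
root = ET.fromstring(TREE)

# expression[2] : second "expression" child (positions are 1-based)
second = root.findall("expression[2]")

# expression[last()] : last "expression" child
last = root.findall("expression[last()]")

print([e.get("Text") for e in second], [e.get("Text") for e in last])
```

The SEMI siblings are not counted, because the position is taken within the nodes selected by the step's node test.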
Predicates
Expr | Grammar | Result |
//SCIENTIFIC_NUMBER[text()='1'] | Arithmetic.g4 | select all SCIENTIFIC_NUMBER that have text ‘1’ |
//*[not(name()="expression")] | Arithmetic.g4 | select all but “expression” |
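The two predicates above can be approximated in Python as follows. This is a sketch over a hypothetical XML rendering of the parse tree for 1+2. ElementTree has no name() or not() function, so the negated name test is written as a plain Python filter, and the text test uses ElementTree's [.='...'] form (available from Python 3.7).

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the parse tree for the input "1+2".
TREE = """
<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
</file_>
"""
root = ET.fromstring(TREE)

# //SCIENTIFIC_NUMBER[text()='1'] : keep only the tokens whose text is "1"
ones = root.findall(".//SCIENTIFIC_NUMBER[.='1']")

# //*[not(name()="expression")] : every element that is not an "expression",
# emulated with a Python filter over all elements in document order
not_expressions = [e.tag for e in root.iter() if e.tag != "expression"]

print(len(ones), not_expressions)
```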
Operators
Comparison
Expr | Grammar | Result |
//.[name() = "expression"] | Arithmetic.g4 | select all “expression” |
//.[name() != "expression"] | Arithmetic.g4 | select all but “expression” |
//.[@ChildCount > 1] | Arithmetic.g4 | select all nodes that have more than one child (attributes and text are not children) |
Logic (and/or)
Expr | Grammar | Result |
//expression[@Start="0" and @End="0"] | Arithmetic.g4 | select all “expression” that have Start=0 and End=0 |
//expression[@Start="0" or @Start>3] | Arithmetic.g4 | select all “expression” that have Start=0 or Start>3 |
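To see what the comparison and logic operators select, the filters can be restated as Python conditions. This sketch does not run XPath at all: ElementTree supports neither and/or nor numeric comparisons, so each expression is emulated with a list comprehension over a hypothetical XML rendering of the parse tree for 1+2. The `num` helper is introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the parse tree for "1+2";
# Start and End are token indices.
TREE = """
<file_ ChildCount="1" Start="0" End="2">
  <expression ChildCount="3" Start="0" End="2">
    <expression ChildCount="1" Start="0" End="0">
      <SCIENTIFIC_NUMBER ChildCount="0" Start="0" End="0"/>
    </expression>
    <PLUS ChildCount="0" Start="1" End="1"/>
    <expression ChildCount="1" Start="2" End="2">
      <SCIENTIFIC_NUMBER ChildCount="0" Start="2" End="2"/>
    </expression>
  </expression>
</file_>
"""
root = ET.fromstring(TREE)

def num(node, name):
    """Read an attribute as an integer (all values in this sketch are integers)."""
    return int(node.get(name))

# //.[@ChildCount > 1]  (only element nodes are visited here)
many_children = [e.tag for e in root.iter() if num(e, "ChildCount") > 1]

# //expression[@Start="0" and @End="0"]
single_token = [e for e in root.iter("expression")
                if num(e, "Start") == 0 and num(e, "End") == 0]

# //expression[@Start="0" or @Start>3]
start_zero_or_late = [e for e in root.iter("expression")
                      if num(e, "Start") == 0 or num(e, "Start") > 3]

print(many_children, len(single_token), len(start_zero_or_late))
```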
Union
Expr | Grammar | Result |
//expression | //SCIENTIFIC_NUMBER | Arithmetic.g4 | select all “expression” and SCIENTIFIC_NUMBER nodes (the first “|” in this row is the XPath union operator, not a column separator) |
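A union returns the nodes matched by either operand, without duplicates and in document order. ElementTree has no union operator, so the sketch below emulates it in Python over a hypothetical XML rendering of the parse tree for 1+2: both paths are evaluated separately and the results are merged by walking the tree once.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the parse tree for the input "1+2".
TREE = """
<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
</file_>
"""
root = ET.fromstring(TREE)

expressions = root.findall(".//expression")
numbers = root.findall(".//SCIENTIFIC_NUMBER")

# Emulated union: remember which elements either path selected, then walk
# the tree once so the merged result is in document order with no duplicates.
selected = {id(e) for e in expressions + numbers}
union = [e for e in root.iter() if id(e) in selected]

print([e.tag for e in union])
```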
Using node sets in predicates
Node sets can be passed to functions or used directly as a test.
Expr | Grammar | Result |
//expression[count(expression) > 1] | Arithmetic.g4 | select all “expression” that have more than one “expression” child |
//expression[expression] | Arithmetic.g4 | select all “expression” that have an “expression” child |
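The difference between the two rows shows up on a tree where some expression has exactly one expression child. The sketch below uses a hypothetical XML rendering of the parse tree for (1)+2. ElementTree supports the bare child test [expression] directly, but it has no count() function, so that row is emulated with len() in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the parse tree for the input "(1)+2".
TREE = """
<file_>
  <expression>
    <expression>
      <LPAREN>(</LPAREN>
      <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
      <RPAREN>)</RPAREN>
    </expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
</file_>
"""
root = ET.fromstring(TREE)

# //expression[expression] : a node set used as a test is true when non-empty
with_child = root.findall(".//expression[expression]")

# //expression[count(expression) > 1] : emulated by counting matching children
with_several = [e for e in root.iter("expression")
                if len(e.findall("expression")) > 1]

print(len(with_child), len(with_several))
```

The parenthesized expression passes the first test but not the second, since it has exactly one expression child.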
Functions
Function | Expr | Result |
name() | //*/name() | select the name of the node |
text() | //*/text() | select the text nodes (the text of each node that has any) |
count(x) | count(//*) | count the number of nodes |
position() | //*/position() | select the position of the node in the list |
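For readers more used to a general-purpose language, the four functions map roughly onto ordinary Python operations over a list of nodes. The sketch below is an analogy rather than an XPath evaluation: it walks a hypothetical XML rendering of the parse tree for 1+2 with xml.etree.ElementTree and computes the same values by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the parse tree for the input "1+2".
TREE = """
<file_>
  <expression>
    <expression><SCIENTIFIC_NUMBER>1</SCIENTIFIC_NUMBER></expression>
    <PLUS>+</PLUS>
    <expression><SCIENTIFIC_NUMBER>2</SCIENTIFIC_NUMBER></expression>
  </expression>
</file_>
"""
root = ET.fromstring(TREE)
nodes = list(root.iter())          # all elements in document order, like //*

# //*/name() : the name of each node
names = [e.tag for e in nodes]

# //*/text() : the text of the nodes that carry text (layout whitespace skipped)
texts = [e.text for e in nodes if e.text and e.text.strip()]

# count(//*) : the number of nodes
total = len(nodes)

# //*/position() : 1-based position of each node in the selected list
positions = [i for i, _ in enumerate(nodes, start=1)]

print(names, texts, total, positions)
```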
Examples
Extracting relationships between classes
Online XPath
The best XPath engine online is XPather.com. There are others (FreeFormatter.com, w3schools.com) but they are not as good.
A good reference for XPath is on Mozilla. You can also check devhints.io, Wikipedia, w3.org, and the Mulberry Tech slides on XPath2.