XPath (1, 2, 3) is a language for selecting nodes in an XML tree, and it has a long history in AST search. Maletic et al. (4) is probably the first paper to apply XPath to ASTs, albeit by representing the AST as XML. That work was developed further and is now available as open source (5). In 2014, Parr added an XPath API to Antlr for searching Antlr-generated parse trees (6). Src-d uses XPath together with an engine for “universal” ASTs (7).
Piggy patterns are similar to XPath expressions, and the Piggy tool includes a simple grep function. Beyond the superficial difference in syntax, Piggy patterns differ from XPath patterns in two ways.
First, the Piggy and XPath search engines do different things. XPath patterns select a set of nodes (including attribute nodes) from anywhere in a tree. Piggy patterns match a partial subtree of nodes and intersperse output between the matched nodes. Second, an XPath pattern describes a single search path (i.e., it is one expression). Piggy is really a template engine, more like XSLT: it has multiple “passes” (i.e., it allows more than one expression). After a pattern matches and selects nodes, the engine goes on to consider further nodes in the tree. However, once a sub-tree has matched, the engine does not attempt any further matches from the root of that sub-tree. In this regard, Piggy patterns behave like the “visitor pattern”; they would need to be extended to support the “listener pattern” (8) before they could be used for symbol table construction.
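The difference in what gets selected can be sketched in a few lines of Python. This is a simplified model, not Piggy's implementation: `Node`, `select_all`, and `match_and_skip` are hypothetical names, patterns here match on node name only, and the sketch assumes that a matched sub-tree is not searched again (much as an Antlr visitor method that does not call `visitChildren` never reaches the children).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal tree node: a name plus ordered children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def select_all(node: Node, name: str) -> List[Node]:
    """XPath-style //name: every node with that name, in document order,
    including nodes nested inside another match."""
    result = [node] if node.name == name else []
    for child in node.children:
        result.extend(select_all(child, name))
    return result


def match_and_skip(node: Node, name: str) -> List[Node]:
    """Assumed Piggy-like behaviour: once a node matches, its sub-tree is not
    searched again; the walk resumes with the matched node's next sibling."""
    if node.name == name:
        return [node]
    result = []
    for child in node.children:
        result.extend(match_and_skip(child, name))
    return result


# A block that contains a nested block, followed by a second block.
tree = Node("file", [
    Node("block", [Node("stmt"), Node("block", [Node("stmt")])]),
    Node("block", [Node("stmt")]),
])

print(len(select_all(tree, "block")))      # 3: both outer blocks and the nested one
print(len(match_and_skip(tree, "block")))  # 2: only the two outer blocks
```

The nested block is still reachable in the second model, but only by whatever handles the matched outer block, which is where a template engine would decide how to emit its contents.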
XPath and Piggy pattern syntax comparison
| XPath | Piggy | Description |
|---|---|---|
| `bookstore` | `( bookstore )` | Selects all nodes with the name “bookstore” |
| `/bookstore` | No equivalent; you must use an explicit top-level node name with a Kleene star | Selects the root element bookstore |
| `//book` | `(* book *)` | Selects all book elements, no matter where they are in the document |
| `bookstore//book` | `( bookstore (* book *) )` | Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element |
| `//@lang` | `(* lang=* *)` Note: Piggy cannot select attributes of an AST, only the nodes themselves. However, it can find nodes with specific attribute values (see below), or nodes missing a particular attribute. | XPath: selects all attributes named lang. Piggy: selects the nodes of the AST that have a lang attribute with any value. |
| `//title[@lang]` | `(* title lang=* *)` | Selects all title elements that have an attribute named lang |
| `//title[@lang="en"]` | `(* title lang="en" *)` | Selects all title elements that have a lang attribute with the value “en” |
| `/bookstore/book[price>35.00]/title` | No equivalent: Piggy has no numeric comparison (>); everything is a string, and expressions are regular expressions | Selects all title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00 |
| `//book/title \| //book/price` | `(% (* book (title) ) \| ( book (price) *) %)` | Selects both the title and the price elements of all book elements |
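To try the XPath column, Python's standard `xml.etree.ElementTree` module implements a limited subset of XPath. The sketch below runs several rows of the table against a small, made-up bookstore document (the data and variable names are assumptions for illustration). The subset has no `|` union and no numeric comparison, so those two rows are emulated in plain Python, and paths must be relative (`.//book` rather than `//book`) when called on an element.

```python
import xml.etree.ElementTree as ET

# Made-up sample data, shaped like the bookstore examples in the table.
DOC = """<bookstore>
  <book><title lang="en">Cooking Basics</title><price>30.00</price></book>
  <book><title lang="de">Kochen</title><price>49.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the <bookstore> element

# //book  (ElementTree needs a relative path on an Element, hence ".//")
books = root.findall(".//book")

# //title[@lang]
titles_with_lang = [t.text for t in root.findall(".//title[@lang]")]

# //title[@lang="en"]
english_titles = [t.text for t in root.findall(".//title[@lang='en']")]

# /bookstore/book[price>35.00]/title: ElementTree has no numeric predicates,
# so the comparison is done in Python after converting the text to float.
expensive_titles = [
    b.findtext("title")
    for b in root.findall("book")
    if float(b.findtext("price")) > 35.00
]

# //book/title | //book/price: no "|" union in ElementTree, so collect
# both kinds of child element in document order instead.
title_or_price = [
    e.tag for b in root.iter("book") for e in b if e.tag in ("title", "price")
]

print(len(books))        # 3
print(titles_with_lang)  # ['Cooking Basics', 'Kochen']
print(english_titles)    # ['Cooking Basics']
print(expensive_titles)  # ['Kochen', 'Learning XML']
print(title_or_price)    # ['title', 'price', 'title', 'price', 'title', 'price']
```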
A few other notes. I have found the way XPath documentation uses the word “node” confusing. According to the XPath specification (9), tutorials, and the Wikipedia page on XPath, XPath is a notation for selecting “nodes”, including “attribute nodes”. Be aware, however, that the XML specification itself never uses the word “node” for elements (10). Just remember that XML describes a tree, and XPath describes a collection of nodes in that tree.
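The node/attribute distinction is easier to see in code. The sketch below uses Python's `xml.etree.ElementTree`, where attributes are not nodes at all but a dictionary on each element (`Element.attrib`). It contrasts what XPath's `//@lang` returns with what the `(* lang=* *)` row of the table selects. The sample document is made up, and the Piggy behaviour is modelled from the table, not from Piggy's code.

```python
import xml.etree.ElementTree as ET

# Made-up sample: one title with lang, one without, and a non-title node with lang.
DOC = ('<bookstore>'
       '<book><title lang="en">Cooking Basics</title></book>'
       '<book><title>Learning XML</title></book>'
       '<note lang="fr"/>'
       '</bookstore>')
root = ET.fromstring(DOC)

# XPath //@lang yields attribute nodes. ElementTree has no attribute nodes,
# so this emulation can only return the attribute values.
lang_values = [e.get("lang") for e in root.iter() if "lang" in e.attrib]

# What (* lang=* *) selects, per the table: the element nodes that carry
# a lang attribute, whatever its value.
nodes_with_lang = [e.tag for e in root.iter() if "lang" in e.attrib]

# The "missing attribute" case mentioned in the table: title nodes without lang.
titles_without_lang = [e.text for e in root.iter("title") if "lang" not in e.attrib]

print(lang_values)          # ['en', 'fr']
print(nodes_with_lang)      # ['title', 'note']
print(titles_without_lang)  # ['Learning XML']
```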
XPath has notation for directly addressing the parent and sibling “axes”. Piggy does not, because Piggy ties the output and code content to the AST structure, in order, as a tree. Introducing parent accessor functions would complicate what it means to insert code or text during the traversal of the AST for code generation.
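Why in-order, child-only traversal keeps code generation simple can be sketched with a small template walk. This is a hypothetical model, not Piggy's template syntax: `Node`, `Templates`, and `generate` are made up, and a “template” here is just a pair of strings emitted before and after each node with a given name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical AST node: a name, optional token text, ordered children."""
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Hypothetical templates: text emitted before and after each node with a given name.
Templates = Dict[str, Tuple[str, str]]


def generate(node: Node, templates: Templates, out: List[str]) -> None:
    """Depth-first, left-to-right walk that appends output as it goes, so each
    string lands at a fixed point in tree order: before a node, or after its
    last child."""
    before, after = templates.get(node.name, ("", ""))
    out.append(before)
    out.append(node.text)
    for child in node.children:
        generate(child, templates, out)
    out.append(after)


tree = Node("decl", children=[Node("type", "int"), Node("name", "x")])
templates: Templates = {"type": ("", " "), "decl": ("", ";")}

out: List[str] = []
generate(tree, templates, out)
print("".join(out))  # int x;
```

Because output is appended during a single ordered walk, every emitted string has an unambiguous position. A pattern that could jump to a parent or a preceding sibling would refer to a node whose output has already been partly emitted, which is the complication described above.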
- XPath, https://en.wikipedia.org/wiki/XPath, accessed Jan 12, 2019.
- XPath Syntax, https://www.w3schools.com/xml/xpath_syntax.asp, accessed Jan 12, 2019.
- Maletic, Jonathan I., Michael L. Collard, and Andrian Marcus. “Source Code Files as Structured Documents.” Program Comprehension, 2002. Proceedings. 10th International Workshop on. IEEE, 2002.
- srcML, https://www.srcml.org/, accessed Jan 12, 2019.
- Parse Tree Matching and XPath, https://github.com/antlr/antlr4/blob/master/doc/tree-matching.md, accessed Jan 12, 2019.
- Src-d, https://github.com/src-d/engine/blob/master/README.md, accessed Jan 12, 2019.
- Antlr4 Visitor vs Listener Pattern, https://saumitra.me/blog/antlr4-visitor-vs-listener-pattern/, accessed Jan 12, 2019.
- XML Path Language (XPath) Version 1.0, https://www.w3.org/TR/1999/REC-xpath-19991116/, accessed Jan 12, 2019.
- Extensible Markup Language (XML) 1.0, Attribute-List Declarations, https://www.w3.org/TR/REC-xml/#attdecls, accessed Jan 12, 2019.
Updated July 25 2022 by Ken Domino