Porting extensions to Visual Studio 2019

I was recently was trying to use my Visual Studio extension for Antlr in Visual Studio 2019, when I found that it just isn’t working anymore. In fact, I couldn’t even install the extension because it wouldn’t even show up in the search for the plug-in. In fact, there weren’t any extensions available for “Antlr” for Visual Studio 2019! I guess I hadn’t ported the add-in to VS 2019, so I decided to work on that.

Getting it to install in VS 2019

Mads Kristensen says that you only need to change a few things to get an extension that worked in Visual Studio 2017 to work with Visual Studio 2019. That’s simply not true in my case. But, Kristensen’s suggested fixes do take care of the problem of adding the extension to Visual Studio 2019 when the .vsix file is double-clicked. And, it will likely take care of the problem of it not showing up when searching for the extension in VS 2019.

Top-level menus have changed

But, the main problem is an organizational issue. Most extension that I use in VS 2019 have the UI for the extension now appear under the Extensions menu. It is not clear how to do that. The documentation for adding a menu to the Visual Studio menu bar says little about “top-level menus”, except this “Note: In VS 2019, top-level menus contributed by extensions are placed under the Extensions menu.” This, of course, means absolutely nothing because it’s never explained anywhere how to create a top-level menu! In fact, there are only a handful of discussions found via Google Search of IDM_VS_TOOL_MAINMENU, which is a key implementation detail never described.

Starting with the VSSDK Extensibility Samples, Menu and Commands, the code to add a menu to Tools menu is right there and works. However, modifying the .vsct file for the parent menu is not apparent.

<Parent guid="guidSHLMainMenu" id="IDM_VS_MENU_TOOLS"/>

Replacing IDM_VS_MENU_TOOLS with IDM_VS_MENU_EXTENSIONS results in a compiler error:

...\VSSDK-Extensibility-Samples\Menu_And_Commands\C#\PkgCmd.vsct(112,40):
error VSCT1103: Undefined 'Parent/@id' attribute 'IDM_VS_MENU_EXTENSIONS'
in a <Group> element

Although IDM_VS_MENU_EXTENSIONS is listed as a valid menu in the documentation, I cannot use the constant like the other IDM_VS_MENU_… ID’s.

The solution to adding the menu group to the EXTENSIONS menu is to use IDM_VS_TOOL_MAINMENU.

<Parent guid="guidSHLMainMenu" id="IDM_VS_TOOL_MAINMENU"/>

Managed Extensibility Framework (MEF)

It seems that MEF (aka, unmanageable unextensibility fuck-up) has changed and no longer loads my extension. I’m not exactly sure how to debug this because the MEF Analysis Tool (MEFX) no longer exists. But, it isn’t clear how you use the tool as it’s pretty cryptic. But, it does point to some portions that I can comment out.

Long story short: It turns out I got a little “cut” happy with source.extension.manifest. If your extension uses MEF, do not delete the MEF reference.

<Assets>
    <Asset Type="Microsoft.VisualStudio.MefComponent"
       d:Source="Project" d:ProjectName="%CurrentProject%" 
           Path="|%CurrentProject%|" />
</Assets>

Migrating packages.config to PackageReference

In VS 2019, there is now an option to convert the packages.config to package references within the .csproj file. To do that, I had to upgrade to Net 4.7.2, then navigate to References in the Solution Manager and right-click. It then offered an option to do the migration. Note, if I removed the app.config file, or didn’t do the upgrade to Net 4.7.2, the option to migrate didn’t appear. Afterward, I could then remove the app.config file.

Posted in Tip

New directions for Piggy patterns

Up to now, I have been making an assumption for the Piggy transformation system that regular expressions and NFA recognizers can be used to match arbitrary ASTs. After all, a lot of research has been published for tree recognizers, including XPath, TreeRegex, and Tregex, which are NFA based. With all this, I lost sight of the problem of trying to match subtrees within an AST. Unfortunately, pattern matching of ASTs is not regular, and so cannot be recognized by an NFA. This blog entry goes over why NFAs cannot be used, and how Piggy patterns should work.

XPath is regular; AST grammars are not regular

At issue is the difference between a grammar that recognizes a path through a tree vs. a grammar that recognizes an entire tree.

XPath is a language that matches a particular path of a node through the tree from the root. A grammar that models the abbreviated form of an XPath expression is:

s -> ‘/’ ID s | ‘/’ ID ;

This grammar is regular because there is no production that has a non-terminal symbol in the middle of the right-hand side.

ASTs, as output from a Piggy serializer, is represented as a parenthesized expression. A grammar that recognizes an serialized AST is:

s -> ( ID s* ) ;

Ignoring the fact that this grammar uses the Kleene closure on a non-terminal, this grammar is not regular because the production contains a non-terminal in the middle of the right-hand side.

The conclusion is that Piggy expressions cannot be regular expressions, and they cannot be recognized by an NFA.

The “…” operator problem

In addition to the above problem, Piggy implicitly adds an operator to accept any tree nodes that are “irrelevant” to a matching pattern. This operator is used in Coccinelle, and it used to match arbitrary code [Padioleau 2006, 2007] when the pattern needs to refer to two or more non-adjacent code blocks separated by code that is “irrelevant“. Piggy offers a similar operator but differs in a number of ways.

In Piggy, the “…” operator is equivalent to “.*”, where “.” refers to an AstParserParser.NodeContext or AstParserParser.AttrContext, and “*” is the Kleene operator (see grammar for node and attr ASTs in Piggy). This contrasts with Coccinelle, where “…” can refer to any code path, including ones that are nested within another block. In Piggy, the irrelevant code operator is inserted automatically in the user pattern, before every
AstParserParser.NodeContext or AstParserParser.AttrContext. The reason for this is so the user doesn’t have to explicitly add it, which helps the readability of a pattern. In Coccinelle, the operator is explicit.

Unfortunately, adding the irrelevant code operator “…” to a pattern results in a pattern that is ambiguous, meaning there are multiple parse trees for a given input. How does the recognizer work and resolve the ambiguity in patterns? For an NFA, I ordered the edges of the NFA and performed a greedy pattern match. Unfortunately, this isn’t going to work for a CFG recognizer.

Examples with the irrelevant operator

Piggy pattern: ( classBodyDeclaration ( modifier )* ( memberDeclaration ( methodDeclaration ) ) )

Explicitly stated, the above pattern is equivalent to the following pattern:

( classBodyDeclaration … ( modifier … )* … ( memberDeclaration … ( methodDeclaration … ) … ) … )

For input “( classBodyDeclaration ( modifier ) (memberDeclaration ( methodDeclaration ) ) )”, the first “…” will match the empty string if a greedy pattern matcher is employed.

For input ” ( classBodyDeclaration (memberDeclaration ( methodDeclaration ) ) )”, the pattern … ( modifier … )* … will match the empty string because the “(memberDeclaration etc)” matches the pattern
( memberDeclaration … ( methodDeclaration … ) … ) …

In any case, the irrelevant operator is essential for Piggy. But, adding it makes the resulting pattern highly ambiguous.

What should patterns in Piggy look like?

At this point, I need to re-evaluate what a Piggy pattern looks like. There are really three different possibilities for a pattern:

  • a collection of visitor or listener functions;
  • a collection of grammar rules with integrated text and code blocks;
  • a parse string with integrated text and code blocks.

Let’s examine each of these alternatives.

Piggy patterns as AST visitors and listeners

With this method, Piggy patterns are simply text and code blocks that are executed via an AST traversal. The pattern matcher traverses the tree using a pre- or post-order traversal, which can be specified for the pattern.

The main issue here is that the pattern is now specified as a code predicate that indicates a match. The LHS symbol of the pattern, “ast”, means that the code is executed when an “ast” node type is traversed.

// Pattern to recognize and operate on an enum definition from the AST.
	ast -> {{ if (tree.ID == "enumspecifier"
		&& tree.Child(0).ID == "enumhead"
		&& tree.Child(0).Child(0).ID == "enumkey"
		&& tree.Child(0).Child(1).ID == "TOKEN")
		{
		...
		return true; // match
		}

There are several other issues specifying patterns in this manner. One issue is that in order to note that “return true;” is used to note to the engine that the predicate matches. Another issue is that the runtime must provide appropriate routines to skip nodes for the irrelevant operator. In the above example, Child(0) means to match exactly the first child of a node. In all likelihood, the pattern will require for-loops, or Linq expressions, to scan the children. Lastly, it is difficult to visualize the subtree structure based on C# code. That was always the reason for a parenthesized expression in Piggy for patterns.

Piggy patterns integrated into the Antlr grammar

Another possibility of expressing Piggy patterns is to use “action”-like items in the grammar for the tree, as used in YACC, or Antlr. Here, we need to distinguish the grammar for ASTs and the grammar used in the serialized parse, e.g., C++, because we want to write patterns in the syntax of the serialized AST.

enumhead
   : enumkey attributespecifierseq? Identifier?
	{{ if (Identifer != null) { ... } }}
	enumbase?
	;

The advantage of this method is that it works in much the same was as YACC or Antlr actions. The main disadvantage is that the rules much match exactly the syntax within the original grammar, so must be carefully duplicated from the grammar.

Antlr Tree Patterns

Antlr version 4 has a pattern matching mechanism to parse input according to a grammar and fill in placeholders for nodes of the parse tree. This feature appears useful for matching Piggy patterns, but it seems to have several shortcomings.

For example, consider the following code that takes a pattern and a Piggy AST, then tries to match the pattern against the tree.

private void Fun(string pat, string ast_string)
{
    var ast_stream = CharStreams.fromstring(ast_string);
    var ast_lexer = new AstLexer(ast_stream);
    var ast_tokens = new CommonTokenStream(ast_lexer);
    var ast_parser = new AstParserParser(ast_tokens);
    ast_parser.BuildParseTree = true;
    var listener = new ErrorListener<IToken>();
    ast_parser.AddErrorListener(listener);
    IParseTree ast_tree = ast_parser.ast();
    if (listener.had_error) throw new Exception();
    _ast = ast_tree;

    var lexer = new AstLexer(null);
    var parser = new AstParserParser(new CommonTokenStream(lexer));
    var ptpm = new ParseTreePatternMatcher(lexer, parser);
    var re = ptpm.Compile(pat, AstParserParser.RULE_node);
    var c = ast_tree.GetChild(0);
    var b = re.Matches(c);
    System.Console.WriteLine(b);
}

Antlr parse tree matching works well if the tree you are trying to match fits exactly in the number and types of children for each node, including patterns that have sub-trees.

Fun("( <ID> <attr> <node> )", @"( u a=""1"" ( v ) )"); => true
Fun("( <ID> )", @"( u )"); => true
Fun("( <ID> ( <ID> ) )", @"( u ( v ) )"); => true

However, Antlr tree patterns do not work for patterns that contain an irrelevant operator, since there is no way to specify EBNF within the pattern.

Fun("( <ID> <attr> )", @"( u a=""1"" ( v ) )"); => false

Tagged nodes must be a left-hand symbol of a rule. Again, there is no way to use EBNF in a parse string pattern.

Fun("( <ID> <node | empty> )", @"( u ( v ) )"); => throws exception.

Parse tree patterns might be more useful if tagged items could contain the irrelevant operator and BNF syntax.

Conclusions

Some lessons are hard to learn. Piggy patterns cannot be expressed in XPath-like syntax because they need to express a whole subtree. We have looked at a number of ways to express patterns. The most promising seems to be something like Antlr’s parse tree patterns, perhaps even in code pre-serialized. But, it’s unclear how these would look in Piggy.

References and Further Reading

Padioleau, Y., Hansen, R.R., Lawall, J.L. and Muller, G., 2006, October. Semantic patches for documenting and automating collateral evolutions in Linux device drivers. In Proceedings of the 3rd workshop on Programming languages and operating systems: linguistic support for modern operating systems (p. 10). ACM.

Padioleau, Y., Lawall, J.L. and Muller, G., 2007. SmPL: A domain-specific language for specifying collateral evolutions in Linux device drivers. Electronic Notes in Theoretical Computer Science166, pp.47-62.

https://regular-expressions.mobi/lookaround.html?wlr=1

https://www.regular-expressions.info/lookaround.html

https://nlp.stanford.edu/software/tregex.html

https://pdfs.semanticscholar.org/0779/c4686c66dbbf1d1ccf41c49cea4c75b73d05.pdf

Posted in Tip

Rewriting the pattern matching engine — part 2

Patterns in Piggy are regular expressions of parenthesized expressions that represent trees. The conversion of the regular expressions to a finite state automaton is easy (via Thompson’s Construction), but the resulting automaton for the pattern may not work for an important reason: patterns that specify attributes and children of an AST node have an implicit “…” between each attribute or child pattern. While it’s possible to introduce additional states to recognize “…” between each attribute or child node in the pattern, the resulting automaton is ambiguous. For every input AST node, any attribute or child AST node can mismatch a “sub-pattern”, but can still match the complete pattern.

Continue reading
Posted in Tip

Speeding up this website

A few months ago, I started to migrate some of my websites to DigitalOcean because the cost of a virtual server is $5/mo. So, I moved CodingGorilla.com to the new host. (Note, a long story, but the name came from an old boss, who saw I have the patience of a saint and attention to minute details, the traits of any good programmer.)

Unfortunately, the website has been painfully slow because I was told that you should keep your database and web server on separate hardware. This may be fine for large corps which have their servers on a fast LAN, but this was the wrong advice for a blog. I moved the MySQL database to the web server, and now the website works ~100x faster. Adage for the day: Believe half of what you see and nothing of what you hear.

Incidentally, the tool which I used to find this problem is Query Monitor by John Blackbourn (plugin page, website). I flags the slow queries, places the runtime for each query in a table, which you can then copy and paste into Excel to compute the total time required for the queries.

Posted in Tip

Rewriting the pattern matching engine — part 1

For the last two weeks, I’ve trying to write Piggy patterns to construct a symbol table from a Java AST. Patch after patch, I’d change the pattern matching code to “fix” something that wasn’t working. Unfortunately, I finally wrote a pattern that broke the camel’s back, “< classBodyDeclaration < modifier >* < memberDeclaration < methodDeclaration > > >”, which looks for methods within a class.

So, I decided to rewrite the engine the way it should have been done: using an NFA. It’s so far taken a week or so, but it turns that the pattern matcher is much cleaner, and likely faster. In addition, the output engine–which executes the code blocks in the pattern–is also much cleaner. I’ll try to see if I can combine the patterns together in one automaton.

I really should have known better than to approach the tree regular expression matching problem following what other people did, using a top-down recursive recognizer. Live and learn. Always follow a clean, clear theory instead of just hacking.

Posted in Tip

Series on program transformation systems: Coccinelle (2006)

Due to my work on Piggy, I’m starting to do a thorough review of the literature on program transformation systems, how Piggy relates to prior research, and what improvements I can make to Piggy. Note, a good list to start from is in Wikipedia: ASF+SDF, CIL (for C), Coccinelle (for C), DMS, Fermat, Spoon (for Java), Stratego/XT, TXL). This is the first entry in the series, on Coccinelle, a system that modifies C source code.

Continue reading
Posted in Tip