New — Antlr4BuildTasks.Templates

For those interested in creating an Antlr4 program using C#, I wrote a dotnet package and uploaded it to Nuget. There is similar functionality in the VS2019 extension AntlrVSIX, but I am starting to move towards a Language Server Protocol client/server implementation for Antlr.

It’s finally pretty apparent that VS IDE will eventually go away once “complex workflows” are implemented in VS Code (e.g., attach to process). This is rather sad because VS Code’s UI is actually kind of lousy, just my opinion. But, it is faster than VS IDE, which isn’t saying much because just about everything is faster than VS IDE. However, MS can’t put many resources into a product that’s free, isn’t open-source, and isn’t being extended much by third parties anymore. Why should third parties write extensions for a tool in which the fundamental infrastructure is stagnant? After 20 or 30 years, it’s still a 32-bit app, still uses COM interfaces that are nearly impossible to figure out, and still doesn’t have any extensions for supporting any languages other than C#, JavaScript, C++, VB, and F#. As of this post, there are 2955 extensions for VS2019, yet 14993 for VS Code, which was first released only in 2015. Contrary to what others may try to convince themselves, the future of VS IDE isn’t so bright.

Posted in Tip

Getting VS2019 to work with the Clangd LSP server

Continuing with my work in making Antlrvsix a Language Server Protocol server implementation, I created another extension for Visual Studio 2019 that uses the client API Microsoft.VisualStudio.LanguageServer.Client, this time with the Clangd LSP server. The extension source code is here. But, there is a hitch…

This client does not work out-of-the-box with version 9 of Clangd (LLVM). After a bit of bantering about what the LSP protocol means, I learned that the Clangd LSP server, while it follows the LSP spec, does not follow the intent of the protocol, which is that any client can work with any server. In the spec, TextDocumentSyncKind is used to tell the client whether the server requires open requests and incremental vs. full file content updates. Clangd indicates that it is an “Incremental” server (“Documents are synced by sending the full content on open. After that only incremental updates to the document are send.”) Whether this also means that an open is required before language feature calls, the spec isn’t so clear-cut, so anyone can interpret the spec however they want. But Clangd assumes an open is required, even for language feature requests. As a result, the server rejects all requests from the client after the client/server initialization conversation, so the server never opens the files, never emits a diagnostics notification to the client, and the client never issues any open. It’s a complete stalemate.
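To make the stalemate a little more concrete, here is a minimal sketch in Python of the two messages involved, following the message shapes in the LSP spec. The file URI, the source text, and the helper name frame are made up for illustration, and the capabilities are shown in their simplest form; this is not code from my extension or from Clangd.

```python
import json

# TextDocumentSyncKind values as defined in the LSP spec.
SYNC_NONE, SYNC_FULL, SYNC_INCREMENTAL = 0, 1, 2


def frame(message: dict) -> bytes:
    """Wrap a JSON-RPC message in the Content-Length header LSP uses on the wire."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# The relevant part of a server's reply to "initialize": it announces Incremental sync.
initialize_result = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"capabilities": {"textDocumentSync": SYNC_INCREMENTAL}},
}

# The notification the server waits for before it answers anything about the file.
# The URI and the text are hypothetical.
did_open = {
    "jsonrpc": "2.0",
    "method": "textDocument/didOpen",
    "params": {
        "textDocument": {
            "uri": "file:///c:/temp/hello.cpp",
            "languageId": "cpp",
            "version": 1,
            "text": "int main() { return 0; }\n",
        }
    },
}

wire = frame(did_open)
print(wire.decode("utf-8"))
```

If the client never sends the second message, a server that insists on it has nothing to work with, which is the situation described above.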

The problem here is two-fold: (1) the spec is ambiguous, and (2) Clangd assumes the worst case possible, that an open is required even for language feature requests. Unfortunately, open is essentially a locking mechanism, where the “truth” is a “write lock” on the entire contents of the file. Clearly, requiring a write lock on a file in order to use the server for language markup is ridiculous, but I can’t seem to convince anyone that there are multiple problems here. The fact is that Clangd already reads #include files on disk without locking each of those files, a fact the developers ignore.

When I made a change request to Clangd to back off on this assumption, a change that affects 0.5% of the source code, it was rejected based on being too complex. Apparently the authors insist that the client issue an open request immediately after initialization. But, the change does work.

Even with the change, the server took 60+ s to parse and perform semantic analysis of a 7-line “Hello World” program before it was able to process Language Feature requests. (It was a debug build, so I will check it against a release build.) Pre-indexing with the tool clangd-indexer did not improve the processing time. Surprisingly, in VSCode the server responds in a few seconds.

With changes in the master branch of LLVM post-release 9.x, the MS VS IDE client code (https://www.nuget.org/packages/Microsoft.VisualStudio.LanguageServer.Client/16.3.57) no longer works at all with Clangd due to an unrecognized initialization response packet returned from the server. Other operations such as “go to def”, “find all refs”, reformat, hover, and typing completions do work.

Based on my interactions with the folks developing Clangd and the LSP spec, it feels like they’re okay working with one client (Visual Studio Code) and with a bad spec. If you aren’t aware, MS announced Visual Studio Online, with a screenshot that essentially shows the Visual Studio Code UI. It looks like the end of days for Visual Studio IDE, which is why there needs to be an LSP implementation for Antlr.

One further sidenote: LLVM and the build procedure have completely reorganized since the source code is now on Github.com. To do a build:

git clone --branch release/9.x https://github.com/llvm/llvm-project.git
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;clangd;clangd-xpc-support" -G "Visual Studio 16 2019" -Thost=x64 ..\llvm-project\llvm
msbuild LLVM.sln /p:Configuration=Debug /p:Platform=x64

Posted in Tip

Getting VS2019 working with the Eclipse Java LSP server

Continuing with my work in making Antlrvsix a Language Server Protocol server implementation, I created an extension for Visual Studio 2019 that uses the client API Microsoft.VisualStudio.LanguageServer.Client with the Eclipse Java Language Server Protocol implementation. This extension follows the steps outlined in an old article on creating LSP clients in Visual Studio. The extension source code is here. The code has been updated so that the only requirement is that you have the Java runtime downloaded and installed, and JAVA_HOME set to the top-level directory of the Java runtime. The code will prompt you for the path and warn you if it isn’t set properly.
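The kind of check involved is simple. Here is a rough sketch in Python (the extension itself is not written in Python, and the function name find_java is made up): it looks for the java executable under JAVA_HOME and reports when it can’t find one.

```python
import os
from pathlib import Path


def find_java(env=os.environ):
    """Return the path to the java executable under JAVA_HOME, or None if unusable."""
    home = env.get("JAVA_HOME")
    if not home:
        return None
    exe = "java.exe" if os.name == "nt" else "java"
    candidate = Path(home) / "bin" / exe
    return candidate if candidate.is_file() else None


java = find_java()
if java is None:
    print("JAVA_HOME is not set to the top-level directory of a Java runtime.")
else:
    print(f"Using {java}")
```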

As noted in the old MS documentation page, many of the client features are enabled, e.g., go to def, find all refs, reformat, hover, and typing completions. What is missing is building and debugging. But it is very usable.

Posted in Tip

A note on getting Gnu Emacs working with Omnisharp-Roslyn LSP server

After wasting a bit of time the last few days, I figured out how to get the Gnu Emacs editor to work with the Omnisharp-Roslyn LSP server for C#. Finding the right solution required a lot of trial and error work because I work mainly on Windows, and that is completely sacrilegious.

First off: Why Emacs? Sorry, but I grew up with Emacs and vi, starting in the early ’80s. Emacs was decent, and it still is. It comes in handy on the command line for a GUI-less Linux server.

There are two Gnu Emacs clients. One is written by OmniSharp before LSP existed, at https://github.com/OmniSharp/omnisharp-emacs . This client works fine, but the short instructions at http://www.omnisharp.net/#integrations are completely bogus. The other is https://github.com/emacs-lsp/lsp-mode/ . This is what I use below. Like the instructions for omnisharp-emacs, the instructions here are terrible. But, it does work. All operations, like “go to definition” are defined as functions that begin with “lsp-“, available through the M-x command, e.g., M-x lsp-find-references.

  1. Download Gnu Emacs from https://www.gnu.org/software/emacs/download.html , specifically the latest at http://ftp.wayne.edu/gnu/emacs/windows/ . Do not install MSYS2 and run pacman as per the instructions. While I could get MSYS2 working, the paths are all messed up due to filtering out of Windows binaries, to the point of being useless. In addition, I tried to use the Bash LSP server, but that required npm to be installed, and there isn’t an npm in MSYS2 (though there is in Node.js, etc, etc, etc). Again, do not go down this road.
  2. Clone the repository https://github.com/OmniSharp/omnisharp-roslyn . Start VS2019, open the file OmniSharp.sln, then build the executable, and make a note of the full path to OmniSharp.exe.
  3. Create an init.el file for Emacs. It should be at c:/Users/etcetcetc/AppData/Roaming/.emacs.d/init.el. You can check the location by opening Gnu Emacs, then in the start-up screen, click on “Open Home Directory”. This file should contain this:
(require 'package)
 (let* ((no-ssl (and (memq system-type '(windows-nt ms-dos))
                     (not (gnutls-available-p))))
        (proto (if no-ssl "http" "https")))
   (when no-ssl
     (warn "\
 Your version of Emacs does not support SSL connections,
 which is unsafe because it allows man-in-the-middle attacks.
 There are two things you can do about this warning:
 Install an Emacs version that does support SSL and be safe.
 Remove this warning from your init file so you won't see it again."))
 ;; Comment/uncomment these two lines to enable/disable MELPA and MELPA Stable as desired
 (add-to-list 'package-archives (cons "melpa" (concat proto "://melpa.org/packages/")) t)
 ;;(add-to-list 'package-archives (cons "melpa-stable" (concat proto "://stable.melpa.org/packages/")) t)
 (when (< emacs-major-version 24)
 ;; For important compatibility libraries like cl-lib
 (add-to-list 'package-archives (cons "gnu" (concat proto "://elpa.gnu.org/packages/")))))
 (package-initialize) 
 ;; Added by Package.el.  This must come before configurations of
 ;; installed packages.  Don't delete this line.  If you don't want it,
 ;; just comment it out by adding a semicolon to the start of the line.
 ;; You may delete these explanatory comments.
 (package-initialize)
 ;;(custom-set-variables
  ;; custom-set-variables was added by Custom.
  ;; If you edit it by hand, you could mess it up, so be careful.
  ;; Your init file should contain only one such instance.
  ;; If there is more than one, they won't work right.
 ;; '(package-selected-packages (quote (omnisharp lsp-mode))))
 (custom-set-faces
  ;; custom-set-faces was added by Custom.
  ;; If you edit it by hand, you could mess it up, so be careful.
  ;; Your init file should contain only one such instance.
  ;; If there is more than one, they won't work right.
  '(default ((t (:family "Courier New" :foundry "outline" :slant normal :weight normal :height 161 :width normal)))))
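 (package-refresh-contents)
 (package-install 'flycheck)
 (global-flycheck-mode)
 ;; Install lsp-mode on first start so that the require below succeeds.
 (unless (package-installed-p 'lsp-mode)
   (package-install 'lsp-mode))
 (require 'lsp-mode)
 (add-hook 'csharp-mode-hook #'lsp)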
 ;;(package-install 'omnisharp)
 ;;(add-hook 'csharp-mode-hook 'omnisharp-mode)
 ;;
 ;;
 (defcustom lsp-clients-csharp-language-server-path
   (expand-file-name "c:/users/etcetcetc/documents/omnisharp-roslyn/bin/debug/omnisharp.stdio.driver/net472/OmniSharp.exe")
   "The path to the OmniSharp Roslyn language-server."
   :group 'lsp-csharp
   :type '(string :tag "Single string value"))
 (custom-set-variables
  ;; custom-set-variables was added by Custom.
  ;; If you edit it by hand, you could mess it up, so be careful.
  ;; Your init file should contain only one such instance.
  ;; If there is more than one, they won't work right.
  '(package-selected-packages (quote (flycheck omnisharp lsp-mode))))

You will need to change the location of the Omnisharp-roslyn server executable which you built in step 2 in this init.el file.

Start Gnu Emacs and open a C# file and follow the instructions. All operations begin with M-x lsp-.

Posted in Tip

A comparison of Antlr grammars for parsing Java

This is an article about my ongoing research regarding the state of Antlr grammars for parsing Java. Some of the tests are taking weeks of computing time, so the results are preliminary.

Antlr is a popular LL(*) parser generator for recognizers of C#, Java, and many other programming languages. For Java, there are three grammars available on the Antlr grammar website: Java, Java8, and Java9. If you are a developer who hasn’t followed the maintenance history, it is unclear which grammar you should choose. Some of the changes that have been made to one grammar have not been applied to the other grammars. The basis for all grammars, however, is The Java Language Specification. Unfortunately, the latest available now is version 13, making all of them out of date.

The Java grammar was written by Parr originally with left recursion and common left-factors removed. (For additional information on left recursion removal, see this blog series.) The code has not been updated in the last year and apparently does not accept anything more recent than Java 8. After the release of Antlr version 4, left recursion and common left-factors were allowed. The Java8 code was written by Harwell and Parr with this in mind. It was last updated a few months ago. The Java9 code was forked by Chan from the Java8 grammar and was last updated two years ago. It apparently accepts Java 9 source code. Both Java8 and Java9 are very slow.

Which brings us to the following questions: How well do these grammars perform? How well do they accept currently available open-source code? Is it possible to improve the performance of the grammars? Which one would be best to choose to update to the current specification?

In order to answer these questions, we studied the Java and Java9 grammars for performance and acceptance of several open-source projects.

Methods

1. Three grammars for Java that were tested:

  • Java
  • Java8
  • Java9

2. These were tested against two runtime targets

  • Java
  • C#

3. The Java source code that was used for the tests:

  • http://hg.openjdk.java.net/jdk/jdk
    • 17234 files; 4816078 lines. (Using the Java grammar, 25349955 tokens; 38019443 tree nodes.)
  • https://github.com/AndroidSDKSources/android-sdk-sources-for-api-level-5.git
    • 11101 files; 2803192 lines. (Using the Java grammar, 16567260 tokens; 25457752 tree nodes in all parse trees generated.)
  • https://github.com/AndroidSDKSources/android-sdk-sources-for-api-level-29.git
    • 11473 files; 4175779 lines. (Using the Java grammar, 23624317 tokens; 35668427 tree nodes.)

4. Source and scripts for testing are here.

5. Machine 1: AMD Ryzen 7 2700 eight-core processor, ASRock B450 Gaming-ITX/ac, 16 GB DDR4-2666 (1333 MHz) memory; Antlr 4.7.2 tool and runtime; Visual Studio 2019; Java SE 11.0.4.

6. Machine 2: AMD Ryzen 3 2200G four-core processor, Gigabyte Technology Co. Ltd. B450M DS3H-CF, 16 GB DDR4 (1066 MHz) memory; Antlr 4.7.2 tool and runtime; Visual Studio 2019; Java SE 11.0.4. Used to count nodes in parse tree.

Preliminary results:

  • In the Java grammar, white space is shunted into the HIDDEN channel. In the Java9 grammar, white space is thrown away via the “skip” Antlr keyword. After adjusting the grammars and token types, Antlr generates lexers that produce the same token input stream, judging from a few large test cases.
  • A Java source that ran particularly slowly for the Java9 grammar compared to the Java grammar was android-sdk-sources-for-api-level-5/com/android/phone/BluetoothHandsfree.java. This file produced a large number of parse tree nodes associated with expressions, e.g., primary, assignmentExpression, expression, conditionalExpression, conditionalOrExpression, conditionalAndExpression, exclusiveOrExpression, inclusiveOrExpression, andExpression, equalityExpression, relationalExpression, shiftExpression, additiveExpression, multiplicativeExpression, postfixExpression, unaryExpressionNotPlusMinus, unaryExpression, identifier, etc. The Java9 grammar disambiguates expression using the usual refactoring, but notably includes left-recursive productions for andExpression, equalityExpression, relationalExpression, shiftExpression, additiveExpression, etc. Although Antlr handles these productions, they are very expensive to use.
  • Running a .NET Core program through dotnet.exe against a stub input program has a minimum runtime of 1.2s.
  • For .NET Core, the Java9 grammar ran on average 40 times slower than the Java grammar for the Android 5 source code.
  • There were not any differences in the parse trees constructed between C# and Java for the Java grammar.
  • Out of 39673 Java source files, 134 parsed with errors with the Java grammar.
  • The file android-sdk-sources-for-api-level-29/com/android/server/pm/ActivityManagerService.java parsed particularly slowly with Java9: 18 m 50 s with C#!
  • C# tests of all three libraries of code:
    • Java/ grammar
      • android-sdk-sources-for-api-level-5 – 50m 10s
        • All 11101 files parsed:
          • No exceptions
          • 1 file parsed with errors.
      • android-sdk-sources-for-api-level-29 – 54m 7s
        • All 11473 files parsed:
          • No exceptions.
          • 47 files parsed with errors.
      • jdk – 1h 19m 1s
        • All 17234 files parsed:
          • One file parsed with exception.
          • 86 files parsed with errors.
    • Java9/ grammar
      • android-sdk-sources-for-api-level-5 – 20h 53m 59s
        • All 11101 files parsed:
          • No exceptions
          • 5 files parsed with errors.
      • android-sdk-sources-for-api-level-29 – 49h 20m 35s
        • All 11473 files parsed:
          • No exceptions.
          • 3 files parsed with errors.
      • jdk – 29h 55m 13s
        • All 17234 files parsed:
          • One file parsed with exception.
          • 3 files parsed with errors.
    • Java8/ grammar:
      • android-sdk-sources-for-api-level-5 – 14h 37m 06s
      • android-sdk-sources-for-api-level-29 – 20h 42m 28s
      • jdk – 31h 55m 37s
  • Java tests of all three libraries of code:
    • Java/ grammar
      • android-sdk-sources-for-api-level-5 – 1h 14m 27s
        • All 11101 files parsed:
          • No files with exceptions.
          • 1 file parsed with errors.
      • android-sdk-sources-for-api-level-29 – 1h 20m 13s
        • All 11473 files parsed:
          • No files with exceptions.
          • 47 files parsed with errors.
      • jdk – 1h 56m 39s
        • All 17234 files parsed:
          • One file parsed with exception.
          • 86 files parsed with errors.
    • Java9/ grammar
      • android-sdk-sources-for-api-level-5 – 9h 06m 22s
        • 11098 files parsed:
          • 3 files caused a complete crash uncaught.
          • No files parsed with exception.
          • 5 files parsed with errors.
      • android-sdk-sources-for-api-level-29 – 23h 51m 24s
        • 11448 files parsed:
          • 25 files caused a complete crash uncaught.
          • No files parsed with exception.
          • 3 files parsed with errors.
      • jdk – 14h 47m 9s
        • 17220 files parsed:
          • 14 files caused a complete crash uncaught.
          • 1 file parsed with exception.
          • 2 files parsed with errors.
  • More files parsed with the Java9 grammar than the Java grammar.
  • For ActivityManagerService.java, is the runtime proportional to the number of nodes in the parse tree? No!
    • Java grammar– 1.58 s, 243452 nodes
      • Top nodes:
        • 8497 blockStatement
        • 9331 statement
        • 21677 primary
        • 37843 expression
        • 50202 HIDDEN
        • 107849 TOKEN
    • Java9 grammar– 27 m 8 s, 559781 nodes.
      • Top nodes:
        • 8491 blockStatement
        • 9018 primaryNoNewArray_lfno_primary
        • 9074 statement
        • 9086 primary
        • 11487 expressionName
        • 14756 assignmentExpression
        • 14761 expression
        • 14915 conditionalExpression
        • 15133 conditionalOrExpression
        • 15485 conditionalAndExpression
        • 15535 exclusiveOrExpression
        • 15535 inclusiveOrExpression
        • 15614 andExpression
        • 16821 equalityExpression
        • 17332 shiftExpression
        • 17355 relationalExpression
        • 18758 additiveExpression
        • 18890 multiplicativeExpression
        • 19145 postfixExpression
        • 19289 unaryExpressionNotPlusMinus
        • 19311 unaryExpression
        • 34858 identifier
        • 50202 HIDDEN
        • 107849 TOKEN
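    • Java grammar: 1.58 s / 243452 nodes = 6.5 x 10^-6 s / node.
      • Excluding HIDDEN and TOKEN: 1.58 s / 85401 nodes = 1.9 x 10^-5 s / node.
    • Java9 grammar: 1628 s / 559781 nodes = 2.9 x 10^-3 s / node.
      • Excluding HIDDEN and TOKEN: 1628 s / 401730 nodes = 4.1 x 10^-3 s / node.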
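
The per-node arithmetic for ActivityManagerService.java can be reproduced with a few lines of Python. The numbers are copied from the list above; the variable names are mine.

```python
# Figures for ActivityManagerService.java copied from the results list.
runs = {
    "Java": {"seconds": 1.58, "nodes": 243452},
    "Java9": {"seconds": 27 * 60 + 8, "nodes": 559781},
}

# HIDDEN and TOKEN counts are the same for both grammars.
TOKEN_ENTRIES = 50202 + 107849

for name, r in runs.items():
    all_nodes = r["seconds"] / r["nodes"]
    rule_nodes = r["seconds"] / (r["nodes"] - TOKEN_ENTRIES)
    print(f"{name}: {all_nodes:.1e} s/node, {rule_nodes:.1e} s/node without HIDDEN and TOKEN")

node_ratio = runs["Java9"]["nodes"] / runs["Java"]["nodes"]
time_ratio = runs["Java9"]["seconds"] / runs["Java"]["seconds"]
print(f"nodes: {node_ratio:.1f}x, time: {time_ratio:.0f}x")
```

The Java9 tree has about 2.3 times as many nodes, but the parse takes roughly a thousand times longer, so node count alone does not explain the runtime.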

Update November 11, 2019

Posted in Tip

Notes on Language Server Protocol

Surprise! I’ve been implementing something like the Language Server Protocol (LSP) with AntlrVSIX. The LSP is an abstraction for an editor (ed) using JSON as the lingua franca between the editor and a GUI client. The editor server offers the persistence of code files, organizes files in a “workspace”, and provides an API for edits, tagging, go to def, find all refs, reformat, code completion, defs for tooltips, etc. LSP is a step in the right direction because it separates an editor server backend from a GUI frontend, which is what AntlrVSIX is about.

What is not surprising is that the LSP handles things a little differently than what I did. In AntlrVSIX, parse trees were shared with the GUI code. But LSP doesn’t share them, rightly so if you don’t want to swamp the link between GUI and server. But, LSP doesn’t seem to offer a “workspace” model that has nested “projects”, or attributes associated with a document. In VS, this information encodes whether a file participates in the build, compiler options, etc. LSP also hardwires the classes of symbols in a file, so it can’t handle an Antlr grammar because there is no class for terminal or non-terminal symbols. In AntlrVSIX, classifications are strings, so it can represent anything.
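To illustrate the hardwiring: the spec defines a fixed, numbered list of symbol kinds, so a server for Antlr grammars has to squeeze its own classifications into whichever kinds look closest. The sketch below lists a few of the kinds with their numbers from the spec; the mapping table and the function to_lsp_kind are hypothetical choices of mine, not anything the spec or AntlrVSIX prescribes.

```python
from enum import IntEnum


class SymbolKind(IntEnum):
    """A few of the fixed symbol kinds from the LSP spec."""
    CLASS = 5
    METHOD = 6
    FIELD = 8
    FUNCTION = 12
    VARIABLE = 13
    CONSTANT = 14


# Hypothetical mapping: which LSP kind to report for each Antlr classification.
# Nothing in the spec says these are the right choices.
ANTLR_TO_LSP = {
    "nonterminal": SymbolKind.FUNCTION,
    "terminal": SymbolKind.CONSTANT,
}


def to_lsp_kind(classification: str) -> SymbolKind:
    # Fall back to VARIABLE for classifications the table does not know.
    return ANTLR_TO_LSP.get(classification, SymbolKind.VARIABLE)


print(to_lsp_kind("terminal").name)
```

With string classifications there is no such table to maintain, which is the point made above.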

The next release of AntlrVSIX is taking a while. I’ve been cleaning it up considerably and fixing bugs.

–Ken

Posted in Tip

Adding “workspaces” to AntlrVSIX

After a lot of hemming and hawing, I am now adding the concept of “workspaces” to AntlrVSIX. By this I mean an equivalent of workspaces that is defined in Roslyn. In AntlrVSIX, a Document will be a source file; a Project will be a collection of Documents; a program will be a Workspace, a collection of Projects. Properties on a Project or Document are copied to the equivalent AntlrVSIX object as a property list.
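
As a rough picture of the containment described above, here is a sketch in Python. AntlrVSIX is written in C#, and the field and method names here are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    """A source file, with its properties copied in as a plain property list."""
    full_path: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    """A collection of Documents."""
    name: str
    documents: List[Document] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Workspace:
    """A program: a collection of Projects."""
    projects: List[Project] = field(default_factory=list)

    def find_document(self, full_path: str) -> Optional[Document]:
        for project in self.projects:
            for document in project.documents:
                if document.full_path == full_path:
                    return document
        return None


# Hypothetical file and project names.
grammar = Document("c:/src/Expr.g4", {"FullPath": "c:/src/Expr.g4"})
workspace = Workspace([Project("Parser", [grammar])])
print(workspace.find_document("c:/src/Expr.g4") is grammar)
```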

The reason for this is clear if you consider what constitutes a program for languages like C# or Java. But, Antlr has a similar issue: a grammar can be “imported” into another, and the scope of the grammar symbols is local to the project.

This required a lot of changes to AntlrVSIX, and will be released as v4.0. Although there aren’t many new features, it is a significant change nonetheless.

Note: I found that obtaining the property lists for the older csproj formats is very, very slow, while the newer format is much faster. For AntlrVSIX itself in VS 2019, it takes an additional 10 seconds if I query the property lists for the 10 projects contained in the solution. Investigating this with the profiler, I found that querying the Value of a property runs especially slowly. I now only query the FullPath property and ignore the others. I’m planning on performing a lazy evaluation of EnvDTE properties for properties in general because the Microsoft code that performs the parsing and semantic analysis of build files is exceedingly slow.
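
Lazy evaluation here just means not asking for a property until something needs it, and remembering the answer afterwards. A minimal sketch in Python, assuming a slow lookup function stands in for the EnvDTE query (the class and function names are invented):

```python
class LazyProperties:
    """Looks up a property only when it is first asked for, then remembers it."""

    def __init__(self, loader):
        self._loader = loader  # function: property name -> value (the slow call)
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._loader(name)
        return self._cache[name]


calls = []


def slow_lookup(name):
    # Stand-in for the expensive query; records each call so they can be counted.
    calls.append(name)
    return f"value of {name}"


props = LazyProperties(slow_lookup)
props["FullPath"]
props["FullPath"]
print(len(calls))  # 1: the second access is served from the cache
```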

–Ken

Posted in Tip

Updates to AntlrVSIX

I’ve been busy extending AntlrVSIX to include new features and correct bugs. Some of the things added with the 40 or so changes since the beginning of September are:

  • Persistent option settings;
  • Tagging of Java symbols using a symbol table;
  • Improved reformatter for languages;
  • Improved symbol click-on highlighter;
  • Improved co-existence with other extensions.

What is more important is that I’ve been settling on how to write attribute evaluation for the parse tree using Antlr’s Listener pattern, using a symbol table on the Java parse tree. If done in an unstructured, undisciplined, and ill-informed manner, the code could become quite disastrous.

For example, I looked for “listeners in parse tree traversal” and Google Search found in the top answers Saumitra’s blog, Jakub Dziworski’s blog, the MIT OpenCourseWare page on Antlr, and the Positive Technologies page on Antlr, but there is no discussion of how this framework would be used in attribute evaluation, which is discussed nicely in these lecture notes. As I’m not disciplined enough to write a system to declare the formal semantic equations and an evaluator of those equations, I’ve settled on setting the attributes for a node only within that node’s own listener (either “enter” or “exit”). Otherwise, I wouldn’t know which listener is setting what attribute for a node. This is “basic” compiler stuff, but “basic” information gets lost when people start using these tools.
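
Here is a small sketch of that rule in Python. It does not use the Antlr runtime; Node, walk, and ValueListener are stand-ins I made up so the example runs by itself. Each listener method sets the attribute for its own node only, reading (never writing) the attributes of its children.

```python
class Node:
    """A bare-bones parse tree node: a rule name, optional text, and children."""

    def __init__(self, rule, text="", children=()):
        self.rule, self.text, self.children = rule, text, list(children)


def walk(listener, node):
    """Depth-first walk that calls enter_<rule> and exit_<rule> if they exist."""
    getattr(listener, "enter_" + node.rule, lambda n: None)(node)
    for child in node.children:
        walk(listener, child)
    getattr(listener, "exit_" + node.rule, lambda n: None)(node)


class ValueListener:
    def __init__(self):
        self.value = {}  # attribute table: node -> computed value

    def exit_int(self, node):
        # Sets the attribute of this node only.
        self.value[node] = int(node.text)

    def exit_add(self, node):
        # Reads the children's attributes, writes only its own.
        self.value[node] = sum(self.value[c] for c in node.children)


tree = Node("add", children=[
    Node("int", "1"),
    Node("add", children=[Node("int", "2"), Node("int", "3")]),
])
listener = ValueListener()
walk(listener, tree)
print(listener.value[tree])  # 6
```

Because every write to the table happens in the listener of the node being written, it is always clear where an attribute came from.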

All this is important, and will be eventually applied to the Piggy transformation system.

More folks seem to be using it, maybe because there is a link from the Antlr.org developer page to the extension.

–Ken

Posted in Tip

Adding in a symbol table into AntlrVSIX

I’m now working on adding in a full symbol table implementation into AntlrVSIX. This will help make the extension much more powerful with tasks such as tagging and navigating to defs.

For better or worse, I’m starting with the implementation in Antlr Symtab. It looks like it’ll work out pretty well, but it is missing certain classes, and needs a little polishing. For example, VariableSymbol is the base class for FieldSymbol and ParameterSymbol, but there isn’t a class for variables defined and referenced in a method body. This makes it somewhat difficult to distinguish between a field and a local variable in a block.
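
A sketch of the gap, in Python rather than the C# of the actual code: VariableSymbol, FieldSymbol, and ParameterSymbol mirror the classes named above, while LocalVariableSymbol and the describe function are hypothetical additions of the kind I have in mind.

```python
class VariableSymbol:
    def __init__(self, name):
        self.name = name


class FieldSymbol(VariableSymbol):
    pass


class ParameterSymbol(VariableSymbol):
    pass


class LocalVariableSymbol(VariableSymbol):
    """Hypothetical class for variables declared in a method body."""


def describe(symbol):
    # With a dedicated class, a local can be told apart from a field by type alone.
    if isinstance(symbol, FieldSymbol):
        return "field"
    if isinstance(symbol, ParameterSymbol):
        return "parameter"
    if isinstance(symbol, LocalVariableSymbol):
        return "local"
    return "variable"


print(describe(LocalVariableSymbol("i")))
```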

Also, for better or worse, I’ve added Mono.Cecil into the symbol table code. I’m not sure whether I’ll need this, but it’s there just in case. Mono contains a reader of PE files, which might be useful. But, what I’m really thinking of is the equivalent over in Java and other languages.

Posted in Tip

Extending AntlrVSIX with Rust; Adding a general wrapper for CLI tools to Msbuild

I’ve started two new tasks.

(1) I’m adding a Rust parser to AntlrVSIX. The grammar I’m using is an old grammar from Jason Orendorff. Unfortunately, it hasn’t been updated for several years. Also, it looks like it isn’t correct because there are cases where tokens are not correctly defined, e.g., ‘>’ ‘=’ instead of ‘>=’. It’s incorrect because the parser would allow a space between the ‘>’ and ‘=’ in the input, which is probably not intended. I’m first updating the grammar by separating the parser and lexer grammars, then I’ll add in rules recognized by another grammar.
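
The difference is easy to show with a toy tokenizer in Python. This is not Antlr and not the Rust grammar; the function names are made up, and white space is simply skipped as a typical lexer would do.

```python
import re


def tokenize(text, operators):
    """Split text into tokens, skipping white space; longer operators are tried first."""
    ops = "|".join(re.escape(op) for op in sorted(operators, key=len, reverse=True))
    pattern = re.compile(rf"\s*(?:(\w+)|({ops}))")
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = pattern.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


def is_ge_two_tokens(tokens):
    """Parser rule written as '>' '=': two adjacent tokens after the first operand."""
    return tokens[1:3] == [">", "="]


def is_ge_one_token(tokens):
    """Parser rule that expects a single '>=' token after the first operand."""
    return tokens[1:2] == [">="]


separate = {">", "="}        # grammar as found: '>' and '=' are separate tokens
combined = {">", "=", ">="}  # corrected: '>=' is a token of its own

for source in ("a >= b", "a > = b"):
    print(source,
          is_ge_two_tokens(tokenize(source, separate)),
          is_ge_one_token(tokenize(source, combined)))
```

With separate tokens both inputs are accepted as a comparison; with a single '>=' token only the first one is.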

(2) I’m really disappointed in the fact that no one seems to have written a general wrapper for command-line tools for Msbuild. I wrote a wrapper for the Antlr Java tool, but I’ve noticed that I’m going to have to write yet another wrapper for Rust’s tools. I now realize that all of these wrappers can be generalized into a single wrapper with a specification of inputs and outputs. When I’m finished with the Rust grammar, I plan to write this tool, and rewrite the Antlr Java wrapper with it. It is no wonder that it seems everyone is using Visual Studio Code. I would too, but it’s written in the most awful of all languages, JavaScript.
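
The idea of one wrapper driven by a specification of inputs and outputs can be sketched in a few lines of Python. This is not an Msbuild task; ToolSpec, needs_run, and run_if_needed are names I invented, and the example command line and file names are only illustrative.

```python
import os
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class ToolSpec:
    command: List[str]  # the command line to run
    inputs: List[str]   # files the tool reads
    outputs: List[str]  # files the tool writes


def needs_run(spec: ToolSpec) -> bool:
    """True if any output is missing or older than the newest input."""
    if not all(os.path.exists(p) for p in spec.outputs):
        return True
    newest_input = max((os.path.getmtime(p) for p in spec.inputs), default=0.0)
    oldest_output = min((os.path.getmtime(p) for p in spec.outputs), default=0.0)
    return newest_input > oldest_output


def run_if_needed(spec: ToolSpec) -> bool:
    """Run the tool only when its outputs are out of date; return whether it ran."""
    if not needs_run(spec):
        return False
    subprocess.run(spec.command, check=True)
    return True


# Illustrative spec for the Antlr tool; the grammar and output names are hypothetical.
antlr = ToolSpec(
    command=["java", "-jar", "antlr-4.7.2-complete.jar", "-Dlanguage=CSharp", "Expr.g4"],
    inputs=["Expr.g4"],
    outputs=["ExprLexer.cs", "ExprParser.cs"],
)
```

A wrapper for a Rust tool would then be another ToolSpec rather than another piece of code.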

Posted in Tip