Survey of practical parsers
Last year, Phil Eaton wrote a nice survey of parsers for the most popular languages (1). More recently, a StackOverflow question asked why the parsers for Acorn and Babel don’t follow the ECMAScript Specification (2, 3). So I thought I’d update Phil’s list with some additional notes of my own.
Despite years of research and development into parsing technologies, most parsers for programming languages are handwritten, using a recursive-descent design. Arguments in favor of handwritten parsers over ones generated from a grammar vary, but all assert handwritten parsers are better (4):
- “Handwritten parsers are faster.”
- “Handwritten parsers have better error reporting and recovery.”
- “Handwritten parsers are smaller.”
- “Handwritten parsers are easier to write.”
There’s a bit of evidence that generated parsers aren’t as good as handwritten ones. The problem is that once a parser is handwritten, it’s hard to back out. Semantic checks usually creep into the parser, and the grammar that the design follows is not documented. Even if you could recover that grammar, the parser’s structure looks nothing like the spec’s grammar because it was “refactored” to group alternatives that share a common prefix. And whether the parser follows a spec at all is entirely up to the developer.
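To make “semantics enters the parser” concrete, here is a minimal sketch in TypeScript. The toy language, the `ToyParser` class, and the `functionDepth` counter are all made up for illustration and are not taken from any of the parsers surveyed here. The point is that the rule “`return` is only allowed inside a function” is enforced by parser state that no context-free grammar rule mentions.

```typescript
// Hypothetical toy language: a program is a list of statements, and a
// statement is either `return ;` or `function { Statement* }`.
class ToyParser {
  private tokens: string[];
  private pos = 0;
  // Semantic state kept inside the parser; no grammar rule mentions it.
  private functionDepth = 0;

  constructor(tokens: string[]) {
    this.tokens = tokens;
  }

  parseProgram(): string[] {
    const statements: string[] = [];
    while (this.pos < this.tokens.length) {
      statements.push(this.parseStatement());
    }
    return statements;
  }

  private parseStatement(): string {
    const tok = this.next();
    if (tok === "return") {
      // A context-free grammar would accept `return ;` anywhere a statement
      // can appear. This check is a semantic rule applied during parsing.
      if (this.functionDepth === 0) {
        throw new SyntaxError("'return' outside of function");
      }
      this.expect(";");
      return "Return";
    }
    if (tok === "function") {
      this.expect("{");
      this.functionDepth++;
      const body: string[] = [];
      // At end of input, parseStatement() fails inside next().
      while (this.tokens[this.pos] !== "}") {
        body.push(this.parseStatement());
      }
      this.functionDepth--;
      this.expect("}");
      return `Function(${body.join(", ")})`;
    }
    throw new SyntaxError(`unexpected token '${tok}'`);
  }

  private next(): string {
    if (this.pos >= this.tokens.length) {
      throw new SyntaxError("unexpected end of input");
    }
    return this.tokens[this.pos++];
  }

  private expect(expected: string): void {
    const got = this.next();
    if (got !== expected) {
      throw new SyntaxError(`expected '${expected}', got '${got}'`);
    }
  }
}
```

With this sketch, `new ToyParser(["function", "{", "return", ";", "}"]).parseProgram()` succeeds, while the token list `["return", ";"]` is rejected with a `SyntaxError`, even though both are fine as far as the statement grammar alone is concerned.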
Survey of programming language parsers
- JavaScript
  - Acorn – handwritten in JavaScript
  - Babel – handwritten in TypeScript
  - V8 – handwritten, used in Node and Chrome
  - WebKit – handwritten
  - Spec
- Java
  - OpenJDK – handwritten
  - Spec
- TypeScript
  - Implementation – handwritten
  - Specification – was documented, but no longer is
Reverse engineering a grammar from an implementation
One could derive a context-free grammar from the implementation, but such a derived grammar rarely looks like the one described by a spec.
Examples of specified grammar vs implementation
From the ECMAScript specification:
ExportDeclaration :
    'export' ExportFromClause FromClause ';'
    'export' NamedExports ';'
    'export' VariableStatement
    'export' Declaration
    'export' 'default' HoistableDeclaration
    'export' 'default' ClassDeclaration
    'export' 'default' [lookahead ∉ { function, async [no LineTerminator here] function, class }] AssignmentExpression ';'
ExportFromClause :
    '*'
    '*' 'as' ModuleExportName
    NamedExports
NamedExports :
    '{' '}'
    '{' ExportsList '}'
    '{' ExportsList ',' '}'
ExportsList :
    ExportSpecifier
    ExportsList ',' ExportSpecifier
ExportSpecifier :
    ModuleExportName
    ModuleExportName 'as' ModuleExportName
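For contrast with the grammar above, here is a rough TypeScript sketch of how a handwritten recursive-descent parser tends to handle the same construct. It is not the code from Acorn, Babel, V8, or WebKit. The function `parseExport`, the `ExportNode` shape, and the pre-split token array are assumptions made for illustration. The declaration and `default` branches are stubbed, each name or module specifier is treated as a single token, and the optional `;` stands in for automatic semicolon insertion. What it does show is the shape: the shared `'export'` prefix is consumed once, and the choice between `NamedExports ';'` and `NamedExports FromClause ';'` is made only after the braces have been parsed.

```typescript
type ExportNode =
  | { kind: "ExportAll"; exported: string | null; source: string }
  | {
      kind: "ExportNamed";
      specifiers: { local: string; exported: string }[];
      source: string | null;
    }
  | { kind: "ExportDefault"; rest: string[] }
  | { kind: "ExportDeclaration"; rest: string[] };

function parseExport(tokens: string[]): ExportNode {
  let pos = 0;
  const peek = (): string | undefined => tokens[pos];
  const next = (): string => {
    if (pos >= tokens.length) throw new SyntaxError("unexpected end of input");
    return tokens[pos++];
  };
  const expect = (expected: string): void => {
    const got = next();
    if (got !== expected) {
      throw new SyntaxError(`expected '${expected}', got '${got}'`);
    }
  };
  // Consume the token only if it matches.
  const eat = (t: string): boolean => {
    if (peek() === t) {
      pos++;
      return true;
    }
    return false;
  };

  // All seven ExportDeclaration alternatives start with 'export',
  // so the prefix is consumed exactly once.
  expect("export");

  if (eat("default")) {
    // Three spec alternatives collapse into one branch. A real parser would
    // turn the lookahead restriction into ordinary if/else here (stubbed).
    return { kind: "ExportDefault", rest: tokens.slice(pos) };
  }

  if (eat("*")) {
    const exported = eat("as") ? next() : null;
    expect("from");
    const source = next();
    eat(";");
    return { kind: "ExportAll", exported, source };
  }

  if (eat("{")) {
    const specifiers: { local: string; exported: string }[] = [];
    // The left-recursive ExportsList becomes a loop; a trailing comma is allowed.
    while (!eat("}")) {
      const local = next();
      const exported = eat("as") ? next() : local;
      specifiers.push({ local, exported });
      if (!eat(",")) {
        expect("}");
        break;
      }
    }
    // NamedExports is reachable from two spec alternatives. Whether a
    // FromClause follows is decided here, after the braces.
    const source = eat("from") ? next() : null;
    eat(";");
    return { kind: "ExportNamed", specifiers, source };
  }

  // 'export' VariableStatement and 'export' Declaration (stubbed).
  return { kind: "ExportDeclaration", rest: tokens.slice(pos) };
}
```

For instance, `parseExport(["export", "{", "a", "as", "b", "}", "from", "'m'"])` yields an `ExportNamed` node with one specifier and `'m'` as its source. Reading a grammar back out of code like this would give one rule with nested optional parts, not the flat list of alternatives in the spec.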
Acorn
Babel
V8
WebKit
References
(1) Parser generators vs. handwritten parsers: surveying major language implementations in 2021. Accessed Aug 1, 2022.
(2) StackOverflow question. Accessed Aug 1, 2022.
(3) ECMAScript Specification.
(4) Jonge, Maartje de, Emma Nilsson-Nyman, Lennart CL Kats, and Eelco Visser. “Natural and flexible error recovery for generated parsers.” In International Conference on Software Language Engineering, pp. 204-223. Springer, Berlin, Heidelberg, 2009.