Last year, Phil Eaton wrote a nice survey of parsers for the most popular languages (1). More recently, a Stack Overflow question asked why the parsers for Acorn and Babel don’t follow the ECMAScript Specification (2, 3). So, I thought I’d update Phil’s list with some additional notes of my own.

Despite years of research and development into parsing technologies, most parsers for programming languages are handwritten, using a recursive-descent design. Arguments in favor of handwritten parsers over ones generated from a grammar vary, but all assert handwritten parsers are better (4):

  • “Handwritten parsers are faster.”
  • “Handwritten parsers have better error reporting and recovery.”
  • “Handwritten parsers are smaller.”
  • “Handwritten parsers are easier to write.”

There’s a bit of evidence that generated parsers aren’t as good as handwritten ones. The problem is that once a parser is handwritten, it’s hard to back out. Semantics usually creeps into the parser, and the grammar that the design follows is not documented. Even if it were, the parser code does not look anything like the grammar, because it has been “refactored” to group alternatives that share a common prefix. And whether the parser follows a spec or not is entirely up to the developer.

Survey of programming language parsers

Reverse engineering a grammar from an implementation

One could derive a context-free grammar from the implementation, but the result rarely looks like the grammar described by a spec.

Examples of specified grammar vs implementation

From the ECMAScript specification:

ExportDeclaration :
    'export' ExportFromClause FromClause ';'
    'export' NamedExports ';'
    'export' VariableStatement
    'export' Declaration
    'export' 'default' HoistableDeclaration
    'export' 'default' ClassDeclaration
    'export' 'default' [lookahead ∉ { function , async [no LineTerminator here] function , class }] AssignmentExpression ';'
ExportFromClause :
    '*'
    '*' 'as' ModuleExportName
    NamedExports
NamedExports :
    '{' '}'
    '{' ExportsList '}'
    '{' ExportsList ',' '}'
ExportsList :
    ExportSpecifier
    ExportsList ',' ExportSpecifier
ExportSpecifier :
    ModuleExportName
    ModuleExportName 'as' ModuleExportName
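
To make the common-prefix grouping concrete, here is a minimal sketch of how a handwritten recursive-descent parser might handle a subset of the rules above. It is not taken from Acorn, Babel, V8, or WebKit. The names ExportParser, parseExport, ExportNode, and Specifier are hypothetical, the input is assumed to be an array of pre-split token strings rather than the output of a real lexer, and only the '*' and NamedExports forms are covered. Notice that 'export' is consumed once, and that the two spec alternatives that begin with NamedExports are merged into a single branch that decides afterwards whether a 'from' clause follows.

```typescript
// Hypothetical sketch: tokens are plain strings, names are not validated,
// and automatic semicolon insertion is ignored.
type Specifier = { local: string; exported: string };
type ExportNode =
  | { kind: "all"; exported: string | null; source: string }
  | { kind: "named"; specifiers: Specifier[]; source: string | null };

class ExportParser {
  private pos = 0;
  private readonly tokens: string[];

  constructor(tokens: string[]) {
    this.tokens = tokens;
  }

  private peek(): string | undefined {
    return this.tokens[this.pos];
  }

  private next(): string {
    const tok = this.tokens[this.pos];
    if (tok === undefined) throw new SyntaxError("Unexpected end of input");
    this.pos++;
    return tok;
  }

  private expect(expected: string): void {
    const tok = this.next();
    if (tok !== expected) {
      throw new SyntaxError(`Expected ${expected} but found ${tok}`);
    }
  }

  // Consume the next token only if it matches.
  private eat(expected: string): boolean {
    if (this.peek() === expected) {
      this.pos++;
      return true;
    }
    return false;
  }

  parseExport(): ExportNode {
    // The prefix shared by every ExportDeclaration alternative is read once.
    this.expect("export");

    if (this.eat("*")) {
      // '*' and '*' 'as' ModuleExportName, followed by FromClause ';'
      const exported = this.eat("as") ? this.next() : null;
      this.expect("from");
      const source = this.next();
      this.expect(";");
      return { kind: "all", exported, source };
    }

    if (this.peek() === "{") {
      // Merges "NamedExports FromClause ';'" and "NamedExports ';'".
      const specifiers = this.parseNamedExports();
      const source = this.eat("from") ? this.next() : null;
      this.expect(";");
      return { kind: "named", specifiers, source };
    }

    throw new SyntaxError(`Unsupported export form at ${this.peek()}`);
  }

  // The left-recursive ExportsList rule becomes a loop; a trailing comma is allowed.
  private parseNamedExports(): Specifier[] {
    this.expect("{");
    const specifiers: Specifier[] = [];
    while (!this.eat("}")) {
      const local = this.next();
      const exported = this.eat("as") ? this.next() : local;
      specifiers.push({ local, exported });
      if (!this.eat(",")) {
        this.expect("}");
        break;
      }
    }
    return specifiers;
  }
}

function parseExport(tokens: string[]): ExportNode {
  return new ExportParser(tokens).parseExport();
}
```

For example, parseExport(["export", "*", "as", "ns", "from", "'m'", ";"]) returns { kind: "all", exported: "ns", source: "'m'" }. Even in this small sketch, the functions no longer line up one-to-one with the ExportDeclaration, ExportFromClause, and ExportsList productions, which is why a grammar reverse engineered from such code tends to differ from the specified one.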

Acorn

Babel

V8

WebKit

References

(1) Parser generators vs. handwritten parsers: surveying major language implementations in 2021. Accessed Aug 1, 2022.
(2) StackOverflow question. Accessed Aug 1, 2022.
(3) ECMAScript Specification.
(4) Jonge, Maartje de, Emma Nilsson-Nyman, Lennart CL Kats, and Eelco Visser. “Natural and flexible error recovery for generated parsers.” In International Conference on Software Language Engineering, pp. 204-223. Springer, Berlin, Heidelberg, 2009.