Comparison of parser generators

This is a list of notable lexer generators and parser generators for various language classes.

Regular languages

Regular languages are a category of languages (sometimes termed Chomsky Type 3) which can be matched by a state machine (more specifically, by a deterministic finite automaton or a nondeterministic finite automaton) constructed from a regular expression. In particular, a regular grammar can match constructs like "A follows B", "Either A or B", and "A, followed by zero or more instances of B", but cannot match constructs which require consistency between non-adjacent elements, such as "some instances of A followed by the same number of instances of B". It also cannot express the concept of recursive "nesting" ("every A is eventually followed by a matching B"). A classic example of a problem which a regular grammar cannot handle is the question of whether a given string contains correctly nested parentheses. (This is typically handled by a Chomsky Type 2 grammar, also termed a context-free grammar.)
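
The distinction can be illustrated with a minimal Python sketch. The state names and the function names `dfa_accepts` and `balanced` are illustrative assumptions, not taken from any tool listed here. The first function is a table-driven deterministic finite automaton for "A, followed by zero or more instances of B"; the second checks parenthesis nesting with a counter that can grow without bound, which is exactly what a finite automaton lacks.

```python
# Transition table of a DFA for the regular expression AB*.
# Any (state, symbol) pair that is missing from the table rejects the input.
TRANSITIONS = {
    ("start", "A"): "seen_a",
    ("seen_a", "B"): "seen_a",
}
ACCEPTING = {"seen_a"}


def dfa_accepts(text):
    """Run the DFA over text, using a fixed, finite set of states."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


def balanced(text):
    """Check parenthesis nesting with an unbounded depth counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing parenthesis with no matching opener
                return False
    return depth == 0
```

Because the counter in `balanced` has no upper limit, the function cannot be rewritten as a finite transition table like the one used by `dfa_accepts`.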

| Name | Lexer algorithm | Output languages | Grammar, code | Development platform | License |
| --- | --- | --- | --- | --- | --- |
| Alex | DFA | Haskell | Mixed | All | Free, BSD |
| AnnoFlex | DFA | Java | Mixed | Java virtual machine | Free, BSD |
| AustenX | DFA | Java | Separate | All | Free, BSD |
| Booze-tools | DFA | state machine is runtime-generated or saved as JSON | Mixed | Python | Free, public domain |
| C# Flex | DFA | C# | Mixed | .NET CLR | Free, GNU GPL |
| C# Lex | DFA | C# | Mixed | .NET CLR | ? |
| CookCC | DFA | Java | Mixed | Java virtual machine | Free, Apache 2.0 |
| DFAlex | DFA | no code generation required | Java | Java | Free, Apache 2.0 |
| Dolphin | DFA | C++ | Separate | All | Proprietary |
| Flex | DFA table driven | C, C++ | Mixed | All | Free, BSD |
| gelex | DFA | Eiffel | Mixed | Eiffel | Free, MIT |
| golex | DFA | Go | Mixed | Go | Free, BSD-style |
| gplex | DFA | C# | Mixed | .NET CLR | Free, BSD-like |
| JFlex | DFA | Java | Mixed | Java virtual machine | Free, BSD |
| JLex | DFA | Java | Mixed | Java virtual machine | Free, BSD-like |
| lex | DFA | C | Mixed | POSIX | Partial, proprietary, CDDL |
| lexertl | DFA | C++ | ? | All | Free, GNU LGPL |
| LRSTAR | DFA | C++ | Separate | Windows | Free, BSD |
| Quex | DFA direct code | C, C++ | Mixed | All | Free, GNU LGPL |
| Ragel | DFA | C, C++, assembly | Mixed | All | Free, GNU GPL, MIT[1] |
| RE/flex | DFA direct code, DFA table driven, and NFA regex libraries | C++ | Mixed | All | Free, BSD |
| re2c | DFA direct code | C | Mixed | All | Free, public domain |

Deterministic context-free languages

Context-free languages are a category of languages (sometimes termed Chomsky Type 2) which can be matched by a sequence of replacement rules, each of which maps a nonterminal element to a sequence of terminal elements and/or other nonterminal elements. Grammars of this type can match anything that can be matched by a regular grammar, and furthermore can handle the concept of recursive "nesting" ("every A is eventually followed by a matching B"), such as the question of whether a given string contains correctly nested parentheses. The rules of context-free grammars are purely local, however, and therefore cannot handle questions that require non-local analysis, such as "Does a declaration exist for every variable that is used in a function?". Doing so technically requires a more sophisticated grammar, such as a Chomsky Type 1 grammar, also termed a context-sensitive grammar. However, parser generators for context-free grammars often support the ability for user-written code to introduce limited amounts of context-sensitivity. (For example, upon encountering a variable declaration, user-written code could save the name and type of the variable into an external data structure, so that these could be checked against later variable references detected by the parser.)
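
The declaration example can be sketched in Python as follows. This is a hand-written illustration rather than the output of any generator listed here: the toy grammar, the `var`/`use` keywords, and the function name `check_program` are all assumptions made for the sketch. The grammar itself stays context-free, while the marked "semantic actions" consult an external symbol table.

```python
def check_program(source):
    """Parse a toy language and return the names used before declaration.

    Context-free grammar (hypothetical):
        program := stmt*
        stmt    := "var" NAME ":" TYPE ";" | "use" NAME ";"
    """
    tokens = source.replace(":", " : ").replace(";", " ; ").split()
    symbols = {}      # external data structure: name -> declared type
    undeclared = []
    pos = 0

    def take():
        """Consume and return the next token."""
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(literal):
        """Consume the next token and require it to equal literal."""
        tok = take()
        if tok != literal:
            raise SyntaxError(f"expected {literal!r}, found {tok!r}")

    while pos < len(tokens):
        keyword = take()
        if keyword == "var":
            name = take()
            expect(":")
            symbols[name] = take()   # semantic action: record the declaration
            expect(";")
        elif keyword == "use":
            name = take()
            if name not in symbols:  # semantic action: context-sensitive check
                undeclared.append(name)
            expect(";")
        else:
            raise SyntaxError(f"unexpected token {keyword!r}")
    return undeclared
```

For instance, `check_program("var x: int; use x; use y;")` reports `y` as undeclared, a check that the grammar rules alone could not express.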

The deterministic context-free languages are a proper subset of the context-free languages: those that can be recognized by deterministic pushdown automata, which allows them to be parsed efficiently (in linear time).
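
As a rough illustration, the following Python sketch simulates a deterministic pushdown automaton for strings of correctly nested brackets. The function name `dpda_accepts` and the choice of three bracket kinds are assumptions made for the example. At every step the current input symbol and the top of the stack allow at most one move, so the input is processed in a single left-to-right pass without backtracking.

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def dpda_accepts(text):
    """Accept exactly the strings of correctly nested (), [] and {}."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)              # push: remember the open bracket
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False              # wrong or missing opener
        else:
            return False                  # symbol outside the alphabet
    return not stack                      # accept only if nothing is left open
```

With several bracket kinds a simple depth counter is no longer enough, so the stack is genuinely needed; the running time is still linear in the length of the input.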

| Name | Parsing algorithm | Input grammar notation | Output languages | Grammar, code | Lexer | Development platform | IDE | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ANTLR4 | ALL(*)[2] | EBNF | C#, Java, Python, JavaScript, C++, Swift, Go, PHP | Mixed | generated | Java virtual machine | Yes | Free, BSD |
| ANTLR3 | LL(*) | EBNF | ActionScript, Ada95, C, C++, C#, Java, JavaScript, Objective-C, Perl, Python, Ruby | Mixed | generated | Java virtual machine | Yes | Free, BSD |
| APG | Recursive descent, backtracking | ABNF | C, C++, JavaScript, Java | Separate | none | All | No | Free, GNU GPL |
| AXE | Recursive descent | AXE/C++ | C++17, C++11 | Mixed | none | Any with C++17 or C++11 standard compiler | No | Free, Boost |
| Beaver | LALR(1) | EBNF | Java | Mixed | external | Java virtual machine | No | Free, BSD |
| Belr | Recursive descent | ABNF | C++17, C++11 | Separate | included | POSIX | No | Partial, GNU GPL, proprietary |
| Bison | LALR(1), LR(1), IELR(1), GLR | Yacc | C, C++, Java | Mixed | external | All | No | Free, GNU GPL with exception |
| Bison++[note 1] | LALR(1) | ? | C++ | Mixed | external | POSIX | No | Free, GNU GPL |
| Bisonc++ | LALR(1) | ? | C++ | Mixed | external | POSIX | No | Free, GNU GPL |
| Booze-tools | LALR(1) or LR(1) canonical or minimal | BNF with macros in place of EBNF | State machine can be runtime-generated or saved as JSON | Mixed, separable | included | Python | No | Free, public domain |
| BtYacc | Backtracking Bottom-up | ? | C++ | Mixed | external | All | No | Free, public domain |
| byacc | LALR(1) | Yacc | C | Mixed | external | All | No | Free, public domain |
| BYACC/J | LALR(1) | Yacc | C, Java | Mixed | external | All | No | Free, public domain |
| CL-Yacc | LALR(1) | Lisp | Common Lisp | Mixed | external | All | No | Free, MIT |
| Coco/R | LL(1) | EBNF | C, C++, C#, F#, Java, Ada, Object Pascal, Delphi, Modula-2, Oberon, Ruby, Swift, Unicon, Visual Basic .NET | Mixed | generated | Java virtual machine, .NET Framework, Windows, POSIX (depends on output language) | No | Free, GNU GPL |
| CookCC | LALR(1) | Java annotations | Java | Mixed | generated | Java virtual machine | No | Free, Apache 2.0 |
| CppCC | LL(k) | ? | C++ | Mixed | generated | POSIX | No | Free, GNU GPL |
| CSP | LR(1) | ? | C++ | Separate | generated | POSIX | No | Free, Apache 2.0 |
| CUP | LALR(1) | ? | Java | Mixed | external | Java virtual machine | No | Free, BSD-like |
| Dragon | LR(1), LALR(1) | ? | C++, Java | Separate | generated | All | No | Free, GNU GPL |
| eli | LALR(1) | ? | C | Mixed | generated | POSIX | No | Free, GNU GPL, GNU LGPL |
| Tunnel Grammar Studio | Recursive descent, backtracking | ABNF | C++ | Separate | generated | Windows | Yes | Proprietary |
| Essence | LR(?) | ? | Scheme 48 | Mixed | external | All | No | Free, BSD |
| Eto.Parse | LL(k) | BNF, EBNF or C# | N/A (state machine is runtime generated) | Separate | internal | .NET Framework | No | Free, MIT |
| eyapp | LALR(1) | ? | Perl | Mixed | external or generated | All | No | Free, Artistic |
| Frown | LALR(k) | ? | Haskell 98 | Mixed | external | All | No | Free, GNU GPL |
| geyacc | LALR(1) | ? | Eiffel | Mixed | external | All | No | Free, MIT |
| GOLD | LALR(1) | BNF | x86 assembly language, ANSI C, C#, D, Java, Pascal, Object Pascal, Python, Visual Basic 6, Visual Basic .NET, Visual C++ | Separate | generated | Windows | Yes | Free, zlib modified |
| GPPG | LALR(1) | Yacc | C# | Separate | external | Windows | Yes | Free, BSD |
| Grammatica | LL(k) | BNF dialect | C#, Java | Separate | generated | Java virtual machine | No | Free, BSD |
| HiLexed | LL(*) | EBNF or Java | Java | Separate | internal | Java virtual machine | No | Free, GNU LGPL |
| Hime Parser Generator | LALR(1), GLR | BNF dialect | C#, Java, Rust | Separate | generated | .NET Framework, Java virtual machine | No | Free, GNU LGPL |
| Hyacc | LR(1), LALR(1), LR(0) | Yacc | C | Mixed | external | All | No | Free, GNU GPL |
| Irony | LALR(1) | C# | N/A (state machine is runtime generated) | Separate | internal | .NET Framework | Yes | Free, MIT |
| iyacc | LALR(1) | Yacc | Icon | Mixed | external | All | No | Free, GNU LGPL |
| jacc | LALR(1) | ? | Java | Mixed | external | Java virtual machine | No | Free, BSD |
| JavaCC | LL(k) | EBNF | Java, C++, JavaScript (via GWT compiler)[3] | Mixed | generated | Java virtual machine | Yes | Free, BSD |
| jay | LALR(1) | Yacc | C#, Java | Mixed | none | Java virtual machine | No | Free, BSD |
| JFLAP | LL(1), LALR(1) | ? | Java | ? | ? | Java virtual machine | Yes | ? |
| JetPAG | LL(k) | ? | C++ | Mixed | generated | All | No | Free, GNU GPL |
| JS/CC | LALR(1) | EBNF | JavaScript, JScript, ECMAScript | Mixed | internal | All | Yes | Free, BSD |
| KDevelop-PG-Qt | LL(1), backtracking, shunting-yard | ? | C++ | Mixed | generated or external | All, KDE | No | Free, GNU LGPL |
| Kelbt | Backtracking LALR(1) | ? | C++ | Mixed | generated | POSIX | No | Free, GNU GPL |
| kmyacc | LALR(1) | ? | C, Java, Perl, JavaScript | Mixed | external | All | No | Free, GNU GPL |
| Lapg | LALR(1) | ? | C, C++, C#, Java, JavaScript | Mixed | generated | Java virtual machine | No | Free, GNU GPL |
| Lark | LALR(1), Earley, CYK | EBNF | Python (no generation, library) | Separate | none | All | No | Free, MIT |
| Lemon | LALR(1) | ? | C | Mixed | external | All | No | Free, public domain |
| LEPL | Recursive descent | Python | Python (no generation, library) | Separate | none | All | No | Free, MPL, GNU LGPL |
| Lime | LALR(1) | ? | PHP | Mixed | external | All | No | Free, GNU GPL |
| LISA | LR(?), LL(?), LALR(?), SLR(?) | ? | Java | Mixed | generated | Java virtual machine | Yes | Free, public domain |
| LLgen | LL(1) | ? | C | Mixed | external | POSIX | No | Free, BSD |
| LLnextgen | LL(1) | ? | C | Mixed | external | All | No | Free, GNU GPL |
| LLLPG | LL(k) + syntactic and semantic predicates | ANTLR-like | C# | Mixed | generated (?) | .NET Framework, Mono | Visual Studio | Free, GNU LGPL |
| LPG | Backtracking LALR(k) | ? | Java | Mixed | generated | Java virtual machine | No | Free, EPL |
| LRSTAR | LALR(1), LR(1), LR(*) | EBNF, Yacc-like | C++ | Separate | generated | Windows | Visual Studio | Free, BSD |
| Menhir | LR(1) | ? | OCaml | Mixed | generated | All | No | Free, QPL |
| ML-Yacc | LALR(1) | ? | ML | Mixed | external | All | No | ? |
| Monkey | LR(1) | ? | Java | Separate | generated | Java virtual machine | No | Free, GNU GPL |
| Msta | LALR(k), LR(k) | YACC, EBNF | C, C++ | Mixed | external or generated | POSIX, Cygwin | No | Free, GNU GPL |
| MTP (More Than Parsing) | LL(1) | ? | Java | Separate | generated | Java virtual machine | No | Free, GNU GPL |
| MyParser | LL(*) | Markdown | C++11 | Separate | internal | Any with standard C++11 compiler | No | Free, MIT |
| NLT | GLR | C#/BNF-like | C# | Mixed | mixed | .NET Framework | No | Free, MIT |
| ocamlyacc | LALR(1) | ? | OCaml | Mixed | external | All | No | Free, QPL |
| olex | LL(1) | ? | C++ | Mixed | generated | All | No | Free, GNU GPL |
| parglare | Scannerless LALR(1)/SLR(1)/GLR | BNF-like, Python | N/A (state machine is runtime generated) | Mixed | none | All | No | Free, MIT |
| Parsec | LL, backtracking | Haskell | Haskell | Mixed | none | All | No | Free, BSD |
| Parse::Yapp | LALR(1) | ? | Perl | Mixed | external | All | No | Free, GNU GPL |
| Parser Objects | LL(k) | ? | Java | Mixed | ? | Java virtual machine | No | Free, zlib |
| PCCTS | LL | ? | C, C++ | ? | ? | All | No | ? |
| PLY | LALR(1) | BNF | Python | Mixed | generated | All | No | Free, MIT |
| PlyPlus | LALR(1) | EBNF | Python | Separate | generated | All | No | Free, MIT |
| PRECC | LL(k) | ? | C | Separate | generated | DOS, POSIX | No | Free, GNU GPL |
| QLALR | LALR(1) | ? | C++ | Mixed | external | All | No | Free, GNU GPL |
| RPATK | Recursive descent, backtracking | BNF | C (no generation, library) | Separate | none | All | No | Free, GNU GPL |
| SableCC | LALR(1) | ? | C, C++, C#, Java, OCaml, Python | Separate | generated | All | No | Free, GNU LGPL |
| SLK[4] | LL(k), LR(k), LALR(k) | EBNF | C, C++, C#, Java, JavaScript | Separate | external | All | No | SLK[5] |
| SP (Simple Parser) | Recursive descent | Python | Python | Separate | generated | All | No | Free, GNU LGPL |
| Spirit | Recursive descent | ? | C++ | Mixed | internal | All | No | Free, Boost |
| Sprache | LL, backtracking | C# | interpreted | Mixed | internal | .NET Framework | No | Free, MIT |
| Styx | LALR(1) | ? | C, C++ | Separate | generated | All | No | Free, GNU LGPL |
| Sweet Parser | LALR(1) | ? | C++ | Separate | generated | Windows | No | Free, zlib |
| Tap | LL(1) | ? | C++ | Mixed | generated | All | No | Free, GNU GPL |
| TextTransformer | LL(k) | ? | C++ | Mixed | generated | Windows | Yes | Proprietary |
| TinyPG | LL(1) | ? | C#, Visual Basic | ? | ? | Windows | Yes | Partial, CPOL 1.0 |
| Toy Parser Generator | Recursive descent | ? | Python | Mixed | generated | All | No | Free, GNU LGPL |
| TP Yacc | LALR(1) | ? | Turbo Pascal | Mixed | external | All | Yes | Free, GNU GPL |
| UltraGram | LALR(1), LR(1), GLR | BNF | C++, Java, C#, Visual Basic .NET | Separate | external | Windows | Yes | Free, public domain |
| UniCC | LALR(1) | EBNF | C, C++, Python, JavaScript, JSON, XML | Mixed | generated | POSIX | No | Free, BSD |
| UrchinCC | LL(1) | ? | Java | ? | generated | Java virtual machine | No | ? |
| Whale | LR(?), some conjunctive stuff, see Whale Calf | ? | C++ | Mixed | external | All | No | Proprietary |
| wisent | LALR(1) | ? | C++, Java | Mixed | external | All | No | Free, GNU GPL |
| Yacc AT&T/Sun | LALR(1) | Yacc | C | Mixed | external | POSIX | No | Free, CPL & CDDL |
| Yacc++ | LR(1), LALR(1) | Yacc | C++, C# | Mixed | generated or external | All | No | Proprietary |
| Yapps | LL(1) | ? | Python | Mixed | generated | All | No | Free, MIT |
| yecc | LALR(1) | ? | Erlang | Separate | generated | All | No | Free, Apache 2.0 |
| Visual BNF | LR(1), LALR(1) | ? | C# | Separate | generated | .NET Framework | Yes | Proprietary |
| YooParse | LR(1), LALR(1) | ? | C++ | Mixed | external | All | No | Free, MIT |
| Parse | LR(1) | BNF in C++ types | ? | ? | none | C++11 standard compiler | No | Free, MIT |
| GGLL | LL(1) | Graph | Java | Mixed | generated | Windows | Yes | Free, MIT |

Parsing expression grammars, deterministic boolean grammars

This table compares parser generators for parsing expression grammars and deterministic boolean grammars.

| Name | Parsing algorithm | Output languages | Grammar, code | Development platform | License |
| --- | --- | --- | --- | --- | --- |
| Arpeggio | PEG parser interpreter, Packrat | Python (no generation, interpreted) | Mixed | All | Free, MIT |
| AustenX | Packrat (modified) | Java | Separate | All | Free, BSD |
| Aurochs | Packrat | C, OCaml, Java | Mixed | All | Free, GNU GPL |
| BNFlite | Recursive descent | C++ | Mixed | All | Free, MIT |
| Canopy | Packrat | Java, JavaScript, Python, Ruby | Separate | All | Free, GNU GPL |
| CL-peg | Packrat | Common Lisp | Mixed | All | Free, MIT |
| Drat! | Packrat | D | Mixed | All | Free, GNU GPL |
| Frisby | Packrat | Haskell | Mixed | All | Free, BSD |
| grammar::peg | Packrat | Tcl | Mixed | All | Free, BSD |
| Grako | Packrat + Cut + Left Recursion | Python, C++ (beta) | Separate | All | Free, BSD |
| IronMeta | Packrat | C# | Mixed | Windows | Free, BSD |
| Katahdin | Packrat (modified), mutating interpreter | C# | Mixed | All | Free, public domain |
| Laja | 2-phase scannerless top-down backtracking + runtime support | Java | Separate | All | Free, GNU GPL |
| lars::Parser | Packrat (supporting left-recursion and grammar ambiguity) | C++ | Identical | All | Free, BSD |
| LPeg | Parsing machine | Lua | Mixed | All | Free, MIT |
| lug | Parsing machine | C++17 | Mixed | All | Free, MIT |
| Mouse | Recursive descent | Java | Separate | Java virtual machine | Free, Apache 2.0 |
| Narwhal | Packrat | C | Mixed | POSIX, Windows | Free, BSD |
| Nearley | Earley | JavaScript | Mixed | All | Free, MIT |
| Nemerle.Peg | Recursive descent + Pratt | Nemerle | Separate | All | Free, BSD |
| neotoma | Packrat | Erlang | Separate | All | Free, MIT |
| NPEG | Recursive descent | C# | Mixed | All | Free, MIT |
| OMeta | Packrat (modified, partial memoization) | JavaScript, Squeak, Python | Mixed | All | Free, MIT |
| PackCC | Packrat (modified) | C | Mixed | All | Free, MIT |
| Packrat | Packrat | Scheme | Mixed | All | Free, MIT |
| Pappy | Packrat | Haskell | Mixed | All | Free, BSD |
| parboiled | Recursive descent | Java, Scala | Mixed | Java virtual machine | Free, Apache 2.0 |
| Lambda PEG | Recursive descent | Java | Mixed | Java virtual machine | Free, Apache 2.0 |
| parsepp | Recursive descent | C++ | Mixed | All | Free, public domain |
| Parsnip | Packrat | C++ | Mixed | Windows | Free, GNU GPL |
| peg | Recursive descent | C | Mixed | All | Free, MIT |
| PEG.js | Packrat (partial memoization) | JavaScript | Mixed | All | Free, MIT |
| peg-parser | PEG parser interpreter | Dylan | Separate | All | ? |
| Pegasus | Recursive descent, Packrat (selectively) | C# | Mixed | Windows | Free, MIT |
| pegc | Recursive descent | C | Mixed | All | Free, public domain |
| pest | Recursive descent | Rust | Separate | All | Free, MPL |
| PetitParser | Packrat | Smalltalk, Java, Dart | Mixed | All | Free, MIT |
| PEGTL | Recursive descent | C++11 | Mixed | All | Free, MIT |
| Parser Grammar Engine (PGE) | Hybrid recursive descent / operator precedence[6] | Parrot bytecode | Mixed | Parrot virtual machine | Free, Artistic 2.0 |
| PyPy rlib | Packrat | Python | Mixed | All | Free, MIT |
| pyPEG | PEG parser interpreter, Packrat | Python | Mixed | All | Free, GNU GPL |
| Rats! | Packrat | Java | Mixed | Java virtual machine | Free, GNU LGPL |
| Spirit2 | Recursive descent | C++ | Mixed | All | Free, Boost |
| textX | PEG parser interpreter, Packrat | Python (no generation, interpreted) | Separate | All | Free, MIT |
| Treetop | Recursive descent | Ruby | Mixed | All | Free, MIT |
| Yard | Recursive descent | C++ | Mixed | All | Free, MIT or public domain |
| Waxeye | Parsing machine | C, Java, JavaScript, Python, Racket, Ruby | Separate | All | Free, MIT |
| PHP PEG | PEG Parser? | PHP | Mixed | All | Free, BSD |

General context-free, conjunctive, or boolean languages

This table compares parser generators for general context-free grammars, conjunctive grammars, or boolean grammars.

| Name | Parsing algorithm | Input grammar notation | Output languages | Grammar, code | Lexer | Development platform | IDE | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ACCENT | Earley | Yacc variant | C | Mixed | external | All | No | Free, GNU GPL |
| APaGeD | GLR, LALR(1), LL(k) | ? | D | Mixed | generated | All | No | Free, Artistic |
| Bison | LALR(1), LR(1), IELR(1), GLR | Yacc | C, C++, Java, XML | Mixed, except XML | external | All | No | Free, GNU GPL |
| DMS Software Reengineering Toolkit | GLR | ? | Parlanse | Mixed | generated | Windows | No | Proprietary |
| DParser | Scannerless GLR | ? | C | Mixed | scannerless | POSIX | No | Free, BSD |
| Dypgen | Runtime-extensible GLR | ? | OCaml | Mixed | generated | All | No | Free, CeCILL-B |
| E3 | Earley | ? | OCaml | Mixed | external, or scannerless | All | No | ? |
| Elkhound | GLR | ? | C++, OCaml | Mixed | external | All | No | Free, BSD |
| eu.h8me.Parsing | GLR | ? | N/A (state machine is runtime generated) | Separate | external | .NET Framework | No | Free, BSD |
| GDK | LALR(1), GLR | ? | C, Lex, Haskell, HTML, Java, Object Pascal, Yacc | Mixed | generated | POSIX | No | Free, MIT |
| Happy | LALR, GLR | ? | Haskell | Mixed | external | All | No | Free, BSD |
| Hime Parser Generator | GLR | ? | C#, Java, Rust | Separate | generated | .NET Framework, Java virtual machine | No | Free, GNU LGPL |
| IronText Library | LALR(1), GLR | C# | C# | Mixed | generated or external | .NET Framework | No | Free, Apache 2.0 |
| Jison | LALR(1), LR(0), SLR(1) | Yacc | JavaScript, C#, PHP | Mixed | generated | All | No | Free, MIT |
| Syntax | LALR(1), LR(0), SLR(1), CLR(1), LL(1) | JSON/Yacc | JavaScript, Python, PHP, Ruby, C#, Rust, Java | Mixed | generated | All | No | Free, MIT |
| Laja | Scannerless, two phase | Laja | Java | Separate | scannerless | All | No | Free, GNU GPL |
| ModelCC | Earley | Annotated class model | Java | Generated | generated | All | No | Free, BSD |
| parglare | Scannerless LR/GLR | BNF-like | Python interpreted, automata run-time generated | Mixed | scannerless | All | No | Free, MIT |
| P1 | Combinators | BNF-like | OCaml | Mixed | external, or scannerless | All | No | ? |
| P3 | Earley–combinators | BNF-like | OCaml | Mixed | external, or scannerless | All | No | ? |
| P4 | Earley–combinators, infinitary CFGs | BNF-like | OCaml | Mixed | external, or scannerless | All | No | ? |
| Scannerless Boolean Parser | Scannerless GLR (Boolean grammars) | ? | Haskell, Java | Separate | scannerless | Java virtual machine | No | Free, BSD |
| SDF/SGLR | Scannerless GLR | SDF | C, Java | Separate | scannerless | All | Yes | Free, BSD |
| SmaCC | GLR(1), LALR(1), LR(1) | ? | Smalltalk | Mixed | internal | All | Yes | Free, MIT |
| SPARK | Earley | ? | Python | Mixed | external | All | No | Free, MIT |
| Tom | GLR | ? | C | Generated | none | All | No | Free, "No licensing or copyright restrictions" |
| UltraGram | LALR, LR, GLR | ? | C++, C#, Java, Visual Basic .NET | Separate | generated | Windows | Yes | Proprietary |
| Wormhole | Pruning, LR, GLR, Scannerless GLR | ? | C, Python | Mixed | scannerless | Windows | No | Free, MIT |
| Whale Calf | General tabular, SLL(k), Linear normal form (conjunctive grammars), LR, Binary normal form (Boolean grammars) | ? | C++ | Separate | external | All | No | Proprietary |
| yaep | Earley | Yacc-like | C | Mixed | external | All | No | Free, GNU LGPL |
| Zecc | Recursive pattern matching | Zecc/Zacc | Linkable library | Mixed | Scannerless | macOS | Yes | Proprietary |

Context-sensitive grammars

This table compares parser generators for context-sensitive grammars.

| Name | Parsing algorithm | Input grammar notation | Boolean grammar abilities | Development platform | License |
| --- | --- | --- | --- | --- | --- |
| LuZc[7][8] | delta chain | modular | Conjunctive, not complementary | POSIX | Proprietary |
| bnf2xml | Recursive descent (a text filter whose output is XML) | simple BNF grammar (input matching), output is XML | ? | Beta, and not a full EBNF parser | Free, GNU GPL |

References

  1. http://www.colm.net/open-source/ragel/
  2. "Adaptive LL(*) Parsing: The Power of Dynamic Analysis" (PDF). Terence Parr. Retrieved 2016-04-03.
  3. "Building parsers for the web with JavaCC & GWT (Part one)". Chris Ainsley. Retrieved 2014-05-04.
  4. "The SLK Parser Generator supports C, C++, Java, JavaScript, and C#, optional backtracking, free".
  5. http://www.H8dems.com/license.txt
  6. "Parrot: Grammar Engine". The Parrot Foundation. 2011. PGE rules provide the full power of recursive descent parsing and operator precedence parsing.
  7. "LuZ: A context sensitive parser". 2016-10-17. Archived from the original on 2016-10-17. Retrieved 2018-10-17.
  8. "LuZc – A conjunctive context-sensitive parser". luzc.zohosites.com. Retrieved 2018-10-17.

Notes

  1. Bison 1.19 fork
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.