| @@ -26,7 +26,7 @@ Most importantly, Lark will save you time and prevent you from getting parsing h | |||||
| - [Documentation @readthedocs](https://lark-parser.readthedocs.io/) | - [Documentation @readthedocs](https://lark-parser.readthedocs.io/) | ||||
| - [Cheatsheet (PDF)](/docs/_static/lark_cheatsheet.pdf) | - [Cheatsheet (PDF)](/docs/_static/lark_cheatsheet.pdf) | ||||
| - [Online IDE (very basic)](https://lark-parser.github.io/lark/ide/app.html) | |||||
| - [Online IDE](https://lark-parser.github.io/ide) | |||||
| - [Tutorial](/docs/json_tutorial.md) for writing a JSON parser. | - [Tutorial](/docs/json_tutorial.md) for writing a JSON parser. | ||||
| - Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) | - Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) | ||||
| - [Gitter chat](https://gitter.im/lark-parser/Lobby) | - [Gitter chat](https://gitter.im/lark-parser/Lobby) | ||||
| @@ -66,6 +66,8 @@ UnexpectedInput | |||||
| .. autoclass:: lark.exceptions.UnexpectedCharacters | .. autoclass:: lark.exceptions.UnexpectedCharacters | ||||
| .. autoclass:: lark.exceptions.UnexpectedEOF | |||||
| InteractiveParser | InteractiveParser | ||||
| ----------------- | ----------------- | ||||
| @@ -113,7 +113,7 @@ Resources | |||||
| .. _Examples: https://github.com/lark-parser/lark/tree/master/examples | .. _Examples: https://github.com/lark-parser/lark/tree/master/examples | ||||
| .. _Third-party examples: https://github.com/ligurio/lark-grammars | .. _Third-party examples: https://github.com/ligurio/lark-grammars | ||||
| .. _Online IDE: https://lark-parser.github.io/lark/ide/app.html | |||||
| .. _Online IDE: https://lark-parser.github.io/ide | |||||
| .. _How to write a DSL: http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/ | .. _How to write a DSL: http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/ | ||||
| .. _Program Synthesis is Possible: https://www.cs.cornell.edu/~asampson/blog/minisynth.html | .. _Program Synthesis is Possible: https://www.cs.cornell.edu/~asampson/blog/minisynth.html | ||||
| .. _Cheatsheet (PDF): _static/lark_cheatsheet.pdf | .. _Cheatsheet (PDF): _static/lark_cheatsheet.pdf | ||||
| @@ -427,9 +427,9 @@ I measured memory consumption using a little script called [memusg](https://gist | |||||
| | Lark - Earley *(with lexer)* | 42s | 4s | 1167M | 608M | | | Lark - Earley *(with lexer)* | 42s | 4s | 1167M | 608M | | ||||
| | Lark - LALR(1) | 8s | 1.53s | 453M | 266M | | | Lark - LALR(1) | 8s | 1.53s | 453M | 266M | | ||||
| | Lark - LALR(1) tree-less | 4.76s | 1.23s | 70M | 134M | | | Lark - LALR(1) tree-less | 4.76s | 1.23s | 70M | 134M | | ||||
| | PyParsing ([Parser](http://pyparsing.wikispaces.com/file/view/jsonParser.py)) | 32s | 3.53s | 443M | 225M | | |||||
| | funcparserlib ([Parser](https://github.com/vlasovskikh/funcparserlib/blob/master/funcparserlib/tests/json.py)) | 8.5s | 1.3s | 483M | 293M | | |||||
| | Parsimonious ([Parser](https://gist.githubusercontent.com/reclosedev/5222560/raw/5e97cf7eb62c3a3671885ec170577285e891f7d5/parsimonious_json.py)) | ? | 5.7s | ? | 1545M | | |||||
| | PyParsing ([Parser](https://github.com/pyparsing/pyparsing/blob/master/examples/jsonParser.py)) | 32s | 3.53s | 443M | 225M | | |||||
| | funcparserlib ([Parser](https://github.com/vlasovskikh/funcparserlib/blob/master/tests/json.py)) | 8.5s | 1.3s | 483M | 293M | | |||||
| | Parsimonious ([Parser](https://gist.github.com/reclosedev/5222560)) | ? | 5.7s | ? | 1545M | | |||||
| I added a few other parsers for comparison. PyParsing and funcparserlib fare pretty well in their memory usage (they don't build a tree), but they can't compete with the run-time speed of LALR(1). | I added a few other parsers for comparison. PyParsing and funcparserlib fare pretty well in their memory usage (they don't build a tree), but they can't compete with the run-time speed of LALR(1). | ||||
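For context, the "tree-less" LALR(1) row means the transformer runs while parsing, so no parse tree is ever built. A minimal sketch of that mode, assuming a toy grammar and transformer (not the benchmark code):

```python
from lark import Lark, Transformer

class SumInts(Transformer):
    # With parser='lalr' and transformer=..., callbacks run as rules are reduced,
    # so the result is computed without materializing a tree.
    def start(self, children):
        return sum(int(tok) for tok in children)

parser = Lark(r"""
    start: INT ("," INT)*
    %import common.INT
    %import common.WS
    %ignore WS
""", parser='lalr', transformer=SumInts())

print(parser.parse("1, 2, 3"))  # -> 6
```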
| @@ -442,7 +442,7 @@ Once again, shout-out to PyPy for being so effective. | |||||
| This is the end of the tutorial. I hope you liked it and learned a little about Lark. | This is the end of the tutorial. I hope you liked it and learned a little about Lark. | ||||
| To see what else you can do with Lark, check out the [examples](examples). | |||||
| To see what else you can do with Lark, check out the [examples](/examples). | |||||
| For questions or any other subject, feel free to email me at erezshin at gmail dot com. | For questions or any other subject, feel free to email me at erezshin at gmail dot com. | ||||
| @@ -107,3 +107,8 @@ Discard | |||||
| ------- | ------- | ||||
| .. autoclass:: lark.visitors.Discard | .. autoclass:: lark.visitors.Discard | ||||
| VisitError | |||||
| ---------- | |||||
| .. autoclass:: lark.exceptions.VisitError | |||||
| @@ -21,7 +21,7 @@ decorators: decorator+ | |||||
| decorated: decorators (classdef | funcdef | async_funcdef) | decorated: decorators (classdef | funcdef | async_funcdef) | ||||
| async_funcdef: "async" funcdef | async_funcdef: "async" funcdef | ||||
| funcdef: "def" NAME "(" parameters? ")" ["->" test] ":" suite | |||||
| funcdef: "def" NAME "(" [parameters] ")" ["->" test] ":" suite | |||||
| parameters: paramvalue ("," paramvalue)* ["," SLASH] ["," [starparams | kwparams]] | parameters: paramvalue ("," paramvalue)* ["," SLASH] ["," [starparams | kwparams]] | ||||
| | starparams | | starparams | ||||
| @@ -29,25 +29,36 @@ parameters: paramvalue ("," paramvalue)* ["," SLASH] ["," [starparams | kwparams | |||||
| SLASH: "/" // Otherwise the it will completely disappear and it will be undisguisable in the result | SLASH: "/" // Otherwise the it will completely disappear and it will be undisguisable in the result | ||||
| starparams: "*" typedparam? ("," paramvalue)* ["," kwparams] | starparams: "*" typedparam? ("," paramvalue)* ["," kwparams] | ||||
| kwparams: "**" typedparam | |||||
| kwparams: "**" typedparam ","? | |||||
| ?paramvalue: typedparam ["=" test] | |||||
| ?typedparam: NAME [":" test] | |||||
| ?paramvalue: typedparam ("=" test)? | |||||
| ?typedparam: NAME (":" test)? | |||||
| varargslist: (vfpdef ["=" test] ("," vfpdef ["=" test])* ["," [ "*" [vfpdef] ("," vfpdef ["=" test])* ["," ["**" vfpdef [","]]] | "**" vfpdef [","]]] | |||||
| | "*" [vfpdef] ("," vfpdef ["=" test])* ["," ["**" vfpdef [","]]] | |||||
| | "**" vfpdef [","]) | |||||
| vfpdef: NAME | |||||
| lambdef: "lambda" [lambda_params] ":" test | |||||
| lambdef_nocond: "lambda" [lambda_params] ":" test_nocond | |||||
| lambda_params: lambda_paramvalue ("," lambda_paramvalue)* ["," [lambda_starparams | lambda_kwparams]] | |||||
| | lambda_starparams | |||||
| | lambda_kwparams | |||||
| ?lambda_paramvalue: NAME ("=" test)? | |||||
| lambda_starparams: "*" [NAME] ("," lambda_paramvalue)* ["," [lambda_kwparams]] | |||||
| lambda_kwparams: "**" NAME ","? | |||||
| ?stmt: simple_stmt | compound_stmt | ?stmt: simple_stmt | compound_stmt | ||||
| ?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE | ?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE | ||||
| ?small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | nonlocal_stmt | assert_stmt) | |||||
| ?expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) | |||||
| | ("=" (yield_expr|testlist_star_expr))*) | |||||
| annassign: ":" test ["=" test] | |||||
| ?testlist_star_expr: (test|star_expr) ("," (test|star_expr))* [","] | |||||
| !augassign: ("+=" | "-=" | "*=" | "@=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>=" | "**=" | "//=") | |||||
| ?small_stmt: (expr_stmt | assign_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | nonlocal_stmt | assert_stmt) | |||||
| expr_stmt: testlist_star_expr | |||||
| assign_stmt: annassign | augassign | assign | |||||
| annassign: testlist_star_expr ":" test ["=" test] | |||||
| assign: testlist_star_expr ("=" (yield_expr|testlist_star_expr))+ | |||||
| augassign: testlist_star_expr augassign_op (yield_expr|testlist) | |||||
| !augassign_op: "+=" | "-=" | "*=" | "@=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>=" | "**=" | "//=" | |||||
| ?testlist_star_expr: test_or_star_expr | |||||
| | test_or_star_expr ("," test_or_star_expr)+ ","? -> tuple | |||||
| | test_or_star_expr "," -> tuple | |||||
| // For normal and annotated assignments, additional restrictions enforced by the interpreter | // For normal and annotated assignments, additional restrictions enforced by the interpreter | ||||
| del_stmt: "del" exprlist | del_stmt: "del" exprlist | ||||
| pass_stmt: "pass" | pass_stmt: "pass" | ||||
| @@ -71,43 +82,52 @@ global_stmt: "global" NAME ("," NAME)* | |||||
| nonlocal_stmt: "nonlocal" NAME ("," NAME)* | nonlocal_stmt: "nonlocal" NAME ("," NAME)* | ||||
| assert_stmt: "assert" test ["," test] | assert_stmt: "assert" test ["," test] | ||||
| compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated | async_stmt | |||||
| ?compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated | async_stmt | |||||
| async_stmt: "async" (funcdef | with_stmt | for_stmt) | async_stmt: "async" (funcdef | with_stmt | for_stmt) | ||||
| if_stmt: "if" test ":" suite ("elif" test ":" suite)* ["else" ":" suite] | |||||
| if_stmt: "if" test ":" suite elifs ["else" ":" suite] | |||||
| elifs: elif_* | |||||
| elif_: "elif" test ":" suite | |||||
| while_stmt: "while" test ":" suite ["else" ":" suite] | while_stmt: "while" test ":" suite ["else" ":" suite] | ||||
| for_stmt: "for" exprlist "in" testlist ":" suite ["else" ":" suite] | for_stmt: "for" exprlist "in" testlist ":" suite ["else" ":" suite] | ||||
| try_stmt: ("try" ":" suite ((except_clause ":" suite)+ ["else" ":" suite] ["finally" ":" suite] | "finally" ":" suite)) | |||||
| with_stmt: "with" with_item ("," with_item)* ":" suite | |||||
| try_stmt: "try" ":" suite except_clauses ["else" ":" suite] [finally] | |||||
| | "try" ":" suite finally -> try_finally | |||||
| finally: "finally" ":" suite | |||||
| except_clauses: except_clause+ | |||||
| except_clause: "except" [test ["as" NAME]] ":" suite | |||||
| with_stmt: "with" with_items ":" suite | |||||
| with_items: with_item ("," with_item)* | |||||
| with_item: test ["as" expr] | with_item: test ["as" expr] | ||||
| // NB compile.c makes sure that the default except clause is last | // NB compile.c makes sure that the default except clause is last | ||||
| except_clause: "except" [test ["as" NAME]] | |||||
| suite: simple_stmt | _NEWLINE _INDENT stmt+ _DEDENT | suite: simple_stmt | _NEWLINE _INDENT stmt+ _DEDENT | ||||
| ?test: or_test ("if" or_test "else" test)? | lambdef | |||||
| ?test: or_test ("if" or_test "else" test)? | |||||
| | lambdef | |||||
| ?test_nocond: or_test | lambdef_nocond | ?test_nocond: or_test | lambdef_nocond | ||||
| lambdef: "lambda" [varargslist] ":" test | |||||
| lambdef_nocond: "lambda" [varargslist] ":" test_nocond | |||||
| ?or_test: and_test ("or" and_test)* | ?or_test: and_test ("or" and_test)* | ||||
| ?and_test: not_test ("and" not_test)* | ?and_test: not_test ("and" not_test)* | ||||
| ?not_test: "not" not_test -> not | |||||
| ?not_test: "not" not_test -> not_test | |||||
| | comparison | | comparison | ||||
| ?comparison: expr (_comp_op expr)* | |||||
| ?comparison: expr (comp_op expr)* | |||||
| star_expr: "*" expr | star_expr: "*" expr | ||||
| ?expr: xor_expr ("|" xor_expr)* | |||||
| ?expr: or_expr | |||||
| ?or_expr: xor_expr ("|" xor_expr)* | |||||
| ?xor_expr: and_expr ("^" and_expr)* | ?xor_expr: and_expr ("^" and_expr)* | ||||
| ?and_expr: shift_expr ("&" shift_expr)* | ?and_expr: shift_expr ("&" shift_expr)* | ||||
| ?shift_expr: arith_expr (_shift_op arith_expr)* | ?shift_expr: arith_expr (_shift_op arith_expr)* | ||||
| ?arith_expr: term (_add_op term)* | ?arith_expr: term (_add_op term)* | ||||
| ?term: factor (_mul_op factor)* | ?term: factor (_mul_op factor)* | ||||
| ?factor: _factor_op factor | power | |||||
| ?factor: _unary_op factor | power | |||||
| !_factor_op: "+"|"-"|"~" | |||||
| !_unary_op: "+"|"-"|"~" | |||||
| !_add_op: "+"|"-" | !_add_op: "+"|"-" | ||||
| !_shift_op: "<<"|">>" | !_shift_op: "<<"|">>" | ||||
| !_mul_op: "*"|"@"|"/"|"%"|"//" | !_mul_op: "*"|"@"|"/"|"%"|"//" | ||||
| // <> isn't actually a valid comparison operator in Python. It's here for the | // <> isn't actually a valid comparison operator in Python. It's here for the | ||||
| // sake of a __future__ import described in PEP 401 (which really works :-) | // sake of a __future__ import described in PEP 401 (which really works :-) | ||||
| !_comp_op: "<"|">"|"=="|">="|"<="|"<>"|"!="|"in"|"not" "in"|"is"|"is" "not" | |||||
| !comp_op: "<"|">"|"=="|">="|"<="|"<>"|"!="|"in"|"not" "in"|"is"|"is" "not" | |||||
| ?power: await_expr ("**" factor)? | ?power: await_expr ("**" factor)? | ||||
| ?await_expr: AWAIT? atom_expr | ?await_expr: AWAIT? atom_expr | ||||
| @@ -118,61 +138,75 @@ AWAIT: "await" | |||||
| | atom_expr "." NAME -> getattr | | atom_expr "." NAME -> getattr | ||||
| | atom | | atom | ||||
| ?atom: "(" [yield_expr|tuplelist_comp] ")" -> tuple | |||||
| | "[" [testlist_comp] "]" -> list | |||||
| | "{" [dict_comp] "}" -> dict | |||||
| | "{" set_comp "}" -> set | |||||
| ?atom: "(" yield_expr ")" | |||||
| | "(" _tuple_inner? ")" -> tuple | |||||
| | "(" comprehension{test_or_star_expr} ")" -> tuple_comprehension | |||||
| | "[" _testlist_comp? "]" -> list | |||||
| | "[" comprehension{test_or_star_expr} "]" -> list_comprehension | |||||
| | "{" _dict_exprlist? "}" -> dict | |||||
| | "{" comprehension{key_value} "}" -> dict_comprehension | |||||
| | "{" _set_exprlist "}" -> set | |||||
| | "{" comprehension{test} "}" -> set_comprehension | |||||
| | NAME -> var | | NAME -> var | ||||
| | number | string+ | |||||
| | number | |||||
| | string_concat | |||||
| | "(" test ")" | | "(" test ")" | ||||
| | "..." -> ellipsis | | "..." -> ellipsis | ||||
| | "None" -> const_none | | "None" -> const_none | ||||
| | "True" -> const_true | | "True" -> const_true | ||||
| | "False" -> const_false | | "False" -> const_false | ||||
| ?testlist_comp: test | tuplelist_comp | |||||
| tuplelist_comp: (test|star_expr) (comp_for | ("," (test|star_expr))+ [","] | ",") | |||||
| ?string_concat: string+ | |||||
| _testlist_comp: test | _tuple_inner | |||||
| _tuple_inner: test_or_star_expr (("," test_or_star_expr)+ [","] | ",") | |||||
| ?test_or_star_expr: test | |||||
| | star_expr | |||||
| ?subscriptlist: subscript | ?subscriptlist: subscript | ||||
| | subscript (("," subscript)+ [","] | ",") -> subscript_tuple | | subscript (("," subscript)+ [","] | ",") -> subscript_tuple | ||||
| subscript: test | ([test] ":" [test] [sliceop]) -> slice | |||||
| ?subscript: test | ([test] ":" [test] [sliceop]) -> slice | |||||
| sliceop: ":" [test] | sliceop: ":" [test] | ||||
| exprlist: (expr|star_expr) | |||||
| | (expr|star_expr) (("," (expr|star_expr))+ [","]|",") -> exprlist_tuple | |||||
| testlist: test | testlist_tuple | |||||
| ?exprlist: (expr|star_expr) | |||||
| | (expr|star_expr) (("," (expr|star_expr))+ [","]|",") | |||||
| ?testlist: test | testlist_tuple | |||||
| testlist_tuple: test (("," test)+ [","] | ",") | testlist_tuple: test (("," test)+ [","] | ",") | ||||
| dict_comp: key_value comp_for | |||||
| | (key_value | "**" expr) ("," (key_value | "**" expr))* [","] | |||||
| _dict_exprlist: (key_value | "**" expr) ("," (key_value | "**" expr))* [","] | |||||
| key_value: test ":" test | key_value: test ":" test | ||||
| set_comp: test comp_for | |||||
| | (test|star_expr) ("," (test | star_expr))* [","] | |||||
| _set_exprlist: test_or_star_expr ("," test_or_star_expr)* [","] | |||||
| classdef: "class" NAME ["(" [arguments] ")"] ":" suite | classdef: "class" NAME ["(" [arguments] ")"] ":" suite | ||||
| arguments: argvalue ("," argvalue)* ("," [ starargs | kwargs])? | arguments: argvalue ("," argvalue)* ("," [ starargs | kwargs])? | ||||
| | starargs | | starargs | ||||
| | kwargs | | kwargs | ||||
| | test comp_for | |||||
| | comprehension{test} | |||||
| starargs: "*" test ("," "*" test)* ("," argvalue)* ["," kwargs] | |||||
| starargs: stararg ("," stararg)* ("," argvalue)* ["," kwargs] | |||||
| stararg: "*" test | |||||
| kwargs: "**" test | kwargs: "**" test | ||||
| ?argvalue: test ("=" test)? | ?argvalue: test ("=" test)? | ||||
| comp_iter: comp_for | comp_if | async_for | |||||
| async_for: "async" "for" exprlist "in" or_test [comp_iter] | |||||
| comp_for: "for" exprlist "in" or_test [comp_iter] | |||||
| comp_if: "if" test_nocond [comp_iter] | |||||
| comprehension{comp_result}: comp_result comp_fors [comp_if] | |||||
| comp_fors: comp_for+ | |||||
| comp_for: [ASYNC] "for" exprlist "in" or_test | |||||
| ASYNC: "async" | |||||
| ?comp_if: "if" test_nocond | |||||
| // not used in grammar, but may appear in "node" passed from Parser to Compiler | // not used in grammar, but may appear in "node" passed from Parser to Compiler | ||||
| encoding_decl: NAME | encoding_decl: NAME | ||||
| yield_expr: "yield" [yield_arg] | |||||
| yield_arg: "from" test | testlist | |||||
| yield_expr: "yield" [testlist] | |||||
| | "yield" "from" test -> yield_from | |||||
| number: DEC_NUMBER | HEX_NUMBER | BIN_NUMBER | OCT_NUMBER | FLOAT_NUMBER | IMAG_NUMBER | number: DEC_NUMBER | HEX_NUMBER | BIN_NUMBER | OCT_NUMBER | FLOAT_NUMBER | IMAG_NUMBER | ||||
| string: STRING | LONG_STRING | string: STRING | LONG_STRING | ||||
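The `comprehension{comp_result}` definition above is a Lark rule template: a rule parameterized by another rule or terminal, instantiated with `name{arg}`. A small, self-contained sketch of the same mechanism (unrelated to the Python grammar itself):

```python
from lark import Lark

parser = Lark(r"""
    start: "[" _separated{NUMBER, ","} "]"

    // Template rule: a sequence of 'x sep x sep x ...'
    _separated{x, sep}: x (sep x)*

    %import common.NUMBER
    %import common.WS
    %ignore WS
""", parser='lalr')

print(parser.parse("[1, 2, 3]").pretty())
```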
| @@ -181,6 +215,7 @@ string: STRING | LONG_STRING | |||||
| %import python (NAME, COMMENT, STRING, LONG_STRING) | %import python (NAME, COMMENT, STRING, LONG_STRING) | ||||
| %import python (DEC_NUMBER, HEX_NUMBER, OCT_NUMBER, BIN_NUMBER, FLOAT_NUMBER, IMAG_NUMBER) | %import python (DEC_NUMBER, HEX_NUMBER, OCT_NUMBER, BIN_NUMBER, FLOAT_NUMBER, IMAG_NUMBER) | ||||
| // Other terminals | // Other terminals | ||||
| _NEWLINE: ( /\r?\n[\t ]*/ | COMMENT )+ | _NEWLINE: ( /\r?\n[\t ]*/ | COMMENT )+ | ||||
| @@ -10,7 +10,9 @@ Standalone Parser | |||||
| import sys | import sys | ||||
| from json_parser import Lark_StandAlone, Transformer, inline_args | |||||
| from json_parser import Lark_StandAlone, Transformer, v_args | |||||
| inline_args = v_args(inline=True) | |||||
| class TreeToJson(Transformer): | class TreeToJson(Transformer): | ||||
| @inline_args | @inline_args | ||||
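The tutorial code above replaces the removed `inline_args` import with `v_args(inline=True)`, which makes a callback receive a rule's children as separate positional arguments rather than one list. A brief sketch of the difference (the `array` and `pair` rules are illustrative):

```python
from lark import Transformer, v_args

inline_args = v_args(inline=True)

class TreeToJson(Transformer):
    # Plain callback: one argument holding the list of children.
    def array(self, children):
        return list(children)

    # Inlined callback: children are spread into positional arguments.
    @inline_args
    def pair(self, key, value):
        return key, value
```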
| @@ -38,8 +38,8 @@ def create_transformer(ast_module: types.ModuleType, transformer: Optional[Trans | |||||
| Classes starting with an underscore (`_`) will be skipped. | Classes starting with an underscore (`_`) will be skipped. | ||||
| Parameters: | Parameters: | ||||
| ast_module - A Python module containing all the subclasses of `ast_utils.Ast` | |||||
| transformer (Optional[Transformer]) - An initial transformer. Its attributes may be overwritten. | |||||
| ast_module: A Python module containing all the subclasses of ``ast_utils.Ast`` | |||||
| transformer (Optional[Transformer]): An initial transformer. Its attributes may be overwritten. | |||||
| """ | """ | ||||
| t = transformer or Transformer() | t = transformer or Transformer() | ||||
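A hedged usage sketch of `create_transformer`; the `Name` class, the `ToAst` callbacks, and the matching grammar rule names are illustrative, not part of the library:

```python
import sys
from lark import Transformer, ast_utils

this_module = sys.modules[__name__]

class _Ast(ast_utils.Ast):
    # Underscore-prefixed classes are skipped by create_transformer.
    pass

class Name(_Ast):
    def __init__(self, name):
        self.name = name

class ToAst(Transformer):
    # Non-AST callbacks (e.g. terminal handling) can live on the initial transformer.
    def STRING(self, s):
        return s[1:-1]

# Collects the Ast subclasses of this module into callbacks, merged into ToAst().
transformer = ast_utils.create_transformer(this_module, ToAst())
```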
| @@ -1,4 +1,5 @@ | |||||
| from types import ModuleType | from types import ModuleType | ||||
| from copy import deepcopy | |||||
| from .utils import Serialize | from .utils import Serialize | ||||
| from .lexer import TerminalDef, Token | from .lexer import TerminalDef, Token | ||||
| @@ -40,6 +41,17 @@ class LexerConf(Serialize): | |||||
| def _deserialize(self): | def _deserialize(self): | ||||
| self.terminals_by_name = {t.name: t for t in self.terminals} | self.terminals_by_name = {t.name: t for t in self.terminals} | ||||
| def __deepcopy__(self, memo=None): | |||||
| return type(self)( | |||||
| deepcopy(self.terminals, memo), | |||||
| self.re_module, | |||||
| deepcopy(self.ignore, memo), | |||||
| deepcopy(self.postlex, memo), | |||||
| deepcopy(self.callbacks, memo), | |||||
| deepcopy(self.g_regex_flags, memo), | |||||
| deepcopy(self.skip_validation, memo), | |||||
| deepcopy(self.use_bytes, memo), | |||||
| ) | |||||
| class ParserConf(Serialize): | class ParserConf(Serialize): | ||||
| @@ -41,8 +41,9 @@ class UnexpectedInput(LarkError): | |||||
| Used as a base class for the following exceptions: | Used as a base class for the following exceptions: | ||||
| - ``UnexpectedToken``: The parser received an unexpected token | |||||
| - ``UnexpectedCharacters``: The lexer encountered an unexpected string | - ``UnexpectedCharacters``: The lexer encountered an unexpected string | ||||
| - ``UnexpectedToken``: The parser received an unexpected token | |||||
| - ``UnexpectedEOF``: The parser expected a token, but the input ended | |||||
| After catching one of these exceptions, you may call the following helper methods to create a nicer error message. | After catching one of these exceptions, you may call the following helper methods to create a nicer error message. | ||||
| """ | """ | ||||
| @@ -136,10 +137,13 @@ class UnexpectedInput(LarkError): | |||||
| class UnexpectedEOF(ParseError, UnexpectedInput): | class UnexpectedEOF(ParseError, UnexpectedInput): | ||||
| """An exception that is raised by the parser, when the input ends while it still expects a token. | |||||
| """ | |||||
| expected: 'List[Token]' | expected: 'List[Token]' | ||||
| def __init__(self, expected, state=None, terminals_by_name=None): | def __init__(self, expected, state=None, terminals_by_name=None): | ||||
| super(UnexpectedEOF, self).__init__() | |||||
| self.expected = expected | self.expected = expected | ||||
| self.state = state | self.state = state | ||||
| from .lexer import Token | from .lexer import Token | ||||
| @@ -149,7 +153,6 @@ class UnexpectedEOF(ParseError, UnexpectedInput): | |||||
| self.column = -1 | self.column = -1 | ||||
| self._terminals_by_name = terminals_by_name | self._terminals_by_name = terminals_by_name | ||||
| super(UnexpectedEOF, self).__init__() | |||||
| def __str__(self): | def __str__(self): | ||||
| message = "Unexpected end-of-input. " | message = "Unexpected end-of-input. " | ||||
| @@ -158,12 +161,17 @@ class UnexpectedEOF(ParseError, UnexpectedInput): | |||||
| class UnexpectedCharacters(LexError, UnexpectedInput): | class UnexpectedCharacters(LexError, UnexpectedInput): | ||||
| """An exception that is raised by the lexer, when it cannot match the next | |||||
| string of characters to any of its terminals. | |||||
| """ | |||||
| allowed: Set[str] | allowed: Set[str] | ||||
| considered_tokens: Set[Any] | considered_tokens: Set[Any] | ||||
| def __init__(self, seq, lex_pos, line, column, allowed=None, considered_tokens=None, state=None, token_history=None, | def __init__(self, seq, lex_pos, line, column, allowed=None, considered_tokens=None, state=None, token_history=None, | ||||
| terminals_by_name=None, considered_rules=None): | terminals_by_name=None, considered_rules=None): | ||||
| super(UnexpectedCharacters, self).__init__() | |||||
| # TODO considered_tokens and allowed can be figured out using state | # TODO considered_tokens and allowed can be figured out using state | ||||
| self.line = line | self.line = line | ||||
| self.column = column | self.column = column | ||||
| @@ -182,7 +190,6 @@ class UnexpectedCharacters(LexError, UnexpectedInput): | |||||
| self.char = seq[lex_pos] | self.char = seq[lex_pos] | ||||
| self._context = self.get_context(seq) | self._context = self.get_context(seq) | ||||
| super(UnexpectedCharacters, self).__init__() | |||||
| def __str__(self): | def __str__(self): | ||||
| message = "No terminal matches '%s' in the current parser context, at line %d col %d" % (self.char, self.line, self.column) | message = "No terminal matches '%s' in the current parser context, at line %d col %d" % (self.char, self.line, self.column) | ||||
| @@ -198,10 +205,15 @@ class UnexpectedToken(ParseError, UnexpectedInput): | |||||
| """An exception that is raised by the parser, when the token it received | """An exception that is raised by the parser, when the token it received | ||||
| doesn't match any valid step forward. | doesn't match any valid step forward. | ||||
| The parser provides an interactive instance through `interactive_parser`, | |||||
| which is initialized to the point of failture, and can be used for debugging and error handling. | |||||
| Parameters: | |||||
| token: The mismatched token | |||||
| expected: The set of expected tokens | |||||
| considered_rules: Which rules were considered, to deduce the expected tokens | |||||
| state: A value representing the parser state. Do not rely on its value or type. | |||||
| interactive_parser: An instance of ``InteractiveParser``, initialized to the point of failure, | |||||
| and can be used for debugging and error handling. | |||||
| see: ``InteractiveParser``. | |||||
| Note: These parameters are available as attributes of the instance. | |||||
| """ | """ | ||||
| expected: Set[str] | expected: Set[str] | ||||
| @@ -209,6 +221,8 @@ class UnexpectedToken(ParseError, UnexpectedInput): | |||||
| interactive_parser: 'InteractiveParser' | interactive_parser: 'InteractiveParser' | ||||
| def __init__(self, token, expected, considered_rules=None, state=None, interactive_parser=None, terminals_by_name=None, token_history=None): | def __init__(self, token, expected, considered_rules=None, state=None, interactive_parser=None, terminals_by_name=None, token_history=None): | ||||
| super(UnexpectedToken, self).__init__() | |||||
| # TODO considered_rules and expected can be figured out using state | # TODO considered_rules and expected can be figured out using state | ||||
| self.line = getattr(token, 'line', '?') | self.line = getattr(token, 'line', '?') | ||||
| self.column = getattr(token, 'column', '?') | self.column = getattr(token, 'column', '?') | ||||
| @@ -223,7 +237,6 @@ class UnexpectedToken(ParseError, UnexpectedInput): | |||||
| self._terminals_by_name = terminals_by_name | self._terminals_by_name = terminals_by_name | ||||
| self.token_history = token_history | self.token_history = token_history | ||||
| super(UnexpectedToken, self).__init__() | |||||
| @property | @property | ||||
| def accepts(self) -> Set[str]: | def accepts(self) -> Set[str]: | ||||
| @@ -245,18 +258,24 @@ class VisitError(LarkError): | |||||
| """VisitError is raised when visitors are interrupted by an exception | """VisitError is raised when visitors are interrupted by an exception | ||||
| It provides the following attributes for inspection: | It provides the following attributes for inspection: | ||||
| - obj: the tree node or token it was processing when the exception was raised | |||||
| - orig_exc: the exception that cause it to fail | |||||
| Parameters: | |||||
| rule: the name of the visit rule that failed | |||||
| obj: the tree-node or token that was being processed | |||||
| orig_exc: the exception that caused it to fail | |||||
| Note: These parameters are available as attributes | |||||
| """ | """ | ||||
| obj: 'Union[Tree, Token]' | obj: 'Union[Tree, Token]' | ||||
| orig_exc: Exception | orig_exc: Exception | ||||
| def __init__(self, rule, obj, orig_exc): | def __init__(self, rule, obj, orig_exc): | ||||
| self.obj = obj | |||||
| self.orig_exc = orig_exc | |||||
| message = 'Error trying to process rule "%s":\n\n%s' % (rule, orig_exc) | message = 'Error trying to process rule "%s":\n\n%s' % (rule, orig_exc) | ||||
| super(VisitError, self).__init__(message) | super(VisitError, self).__init__(message) | ||||
| self.rule = rule | |||||
| self.obj = obj | |||||
| self.orig_exc = orig_exc | |||||
| ###} | ###} | ||||
| @@ -79,7 +79,7 @@ class LarkOptions(Serialize): | |||||
| Applies the transformer to every parse tree (equivalent to applying it after the parse, but faster) | Applies the transformer to every parse tree (equivalent to applying it after the parse, but faster) | ||||
| propagate_positions | propagate_positions | ||||
| Propagates (line, column, end_line, end_column) attributes into all tree branches. | Propagates (line, column, end_line, end_column) attributes into all tree branches. | ||||
| Accepts ``False``, ``True``, or "ignore_ws", which will trim the whitespace around your trees. | |||||
| Accepts ``False``, ``True``, or a callable, which will filter which nodes to ignore when propagating. | |||||
| maybe_placeholders | maybe_placeholders | ||||
| When ``True``, the ``[]`` operator returns ``None`` when not matched. | When ``True``, the ``[]`` operator returns ``None`` when not matched. | ||||
| @@ -137,7 +137,7 @@ class LarkOptions(Serialize): | |||||
| A List of either paths or loader functions to specify from where grammars are imported | A List of either paths or loader functions to specify from where grammars are imported | ||||
| source_path | source_path | ||||
| Override the source of from where the grammar was loaded. Useful for relative imports and unconventional grammar loading | Override the source of from where the grammar was loaded. Useful for relative imports and unconventional grammar loading | ||||
| **=== End Options ===** | |||||
| **=== End of Options ===** | |||||
| """ | """ | ||||
| if __doc__: | if __doc__: | ||||
| __doc__ += OPTIONS_DOC | __doc__ += OPTIONS_DOC | ||||
| @@ -195,7 +195,7 @@ class LarkOptions(Serialize): | |||||
| assert_config(self.parser, ('earley', 'lalr', 'cyk', None)) | assert_config(self.parser, ('earley', 'lalr', 'cyk', None)) | ||||
| if self.parser == 'earley' and self.transformer: | if self.parser == 'earley' and self.transformer: | ||||
| raise ConfigurationError('Cannot specify an embedded transformer when using the Earley algorithm.' | |||||
| raise ConfigurationError('Cannot specify an embedded transformer when using the Earley algorithm. ' | |||||
| 'Please use your transformer on the resulting parse tree, or use a different algorithm (i.e. LALR)') | 'Please use your transformer on the resulting parse tree, or use a different algorithm (i.e. LALR)') | ||||
| if o: | if o: | ||||
| @@ -484,11 +484,11 @@ class Lark(Serialize): | |||||
| d = f | d = f | ||||
| else: | else: | ||||
| d = pickle.load(f) | d = pickle.load(f) | ||||
| memo = d['memo'] | |||||
| memo_json = d['memo'] | |||||
| data = d['data'] | data = d['data'] | ||||
| assert memo | |||||
| memo = SerializeMemoizer.deserialize(memo, {'Rule': Rule, 'TerminalDef': TerminalDef}, {}) | |||||
| assert memo_json | |||||
| memo = SerializeMemoizer.deserialize(memo_json, {'Rule': Rule, 'TerminalDef': TerminalDef}, {}) | |||||
| options = dict(data['options']) | options = dict(data['options']) | ||||
| if (set(kwargs) - _LOAD_ALLOWED_OPTIONS) & set(LarkOptions._defaults): | if (set(kwargs) - _LOAD_ALLOWED_OPTIONS) & set(LarkOptions._defaults): | ||||
| raise ConfigurationError("Some options are not allowed when loading a Parser: {}" | raise ConfigurationError("Some options are not allowed when loading a Parser: {}" | ||||
| @@ -545,11 +545,11 @@ class Lark(Serialize): | |||||
| Lark.open_from_package(__name__, "example.lark", ("grammars",), parser=...) | Lark.open_from_package(__name__, "example.lark", ("grammars",), parser=...) | ||||
| """ | """ | ||||
| package = FromPackageLoader(package, search_paths) | |||||
| full_path, text = package(None, grammar_path) | |||||
| package_loader = FromPackageLoader(package, search_paths) | |||||
| full_path, text = package_loader(None, grammar_path) | |||||
| options.setdefault('source_path', full_path) | options.setdefault('source_path', full_path) | ||||
| options.setdefault('import_paths', []) | options.setdefault('import_paths', []) | ||||
| options['import_paths'].append(package) | |||||
| options['import_paths'].append(package_loader) | |||||
| return cls(text, **options) | return cls(text, **options) | ||||
| def __repr__(self): | def __repr__(self): | ||||
| @@ -560,6 +560,8 @@ class Lark(Serialize): | |||||
| """Only lex (and postlex) the text, without parsing it. Only relevant when lexer='standard' | """Only lex (and postlex) the text, without parsing it. Only relevant when lexer='standard' | ||||
| When dont_ignore=True, the lexer will return all tokens, even those marked for %ignore. | When dont_ignore=True, the lexer will return all tokens, even those marked for %ignore. | ||||
| :raises UnexpectedCharacters: In case the lexer cannot find a suitable match. | |||||
| """ | """ | ||||
| if not hasattr(self, 'lexer') or dont_ignore: | if not hasattr(self, 'lexer') or dont_ignore: | ||||
| lexer = self._build_lexer(dont_ignore) | lexer = self._build_lexer(dont_ignore) | ||||
| @@ -602,6 +604,10 @@ class Lark(Serialize): | |||||
| If a transformer is supplied to ``__init__``, returns whatever is the | If a transformer is supplied to ``__init__``, returns whatever is the | ||||
| result of the transformation. Otherwise, returns a Tree instance. | result of the transformation. Otherwise, returns a Tree instance. | ||||
| :raises UnexpectedInput: On a parse error, one of these sub-exceptions will be raised: | |||||
| ``UnexpectedCharacters``, ``UnexpectedToken``, or ``UnexpectedEOF``. | |||||
| For convenience, these sub-exceptions also inherit from ``ParseError`` and ``LexError``. | |||||
| """ | """ | ||||
| return self.parser.parse(text, start=start, on_error=on_error) | return self.parser.parse(text, start=start, on_error=on_error) | ||||
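A hedged sketch of the `on_error` hook visible in the signature above (LALR only); a real handler would usually inspect the exception, possibly via `e.interactive_parser`, before deciding:

```python
from lark import Lark
from lark.exceptions import UnexpectedInput

parser = Lark(r"""
    start: NUMBER ("," NUMBER)*
    %import common.NUMBER
    %import common.WS
    %ignore WS
""", parser='lalr')

def handle(e: UnexpectedInput) -> bool:
    print("parse error at line %s, column %s (%s)" % (e.line, e.column, type(e).__name__))
    return True  # True asks the parser to try to resume; False (or None) re-raises

tree = parser.parse("1, 2,, 3", on_error=handle)
```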
| @@ -158,20 +158,20 @@ class Token(str): | |||||
| def __new__(cls, type_, value, start_pos=None, line=None, column=None, end_line=None, end_column=None, end_pos=None): | def __new__(cls, type_, value, start_pos=None, line=None, column=None, end_line=None, end_column=None, end_pos=None): | ||||
| try: | try: | ||||
| self = super(Token, cls).__new__(cls, value) | |||||
| inst = super(Token, cls).__new__(cls, value) | |||||
| except UnicodeDecodeError: | except UnicodeDecodeError: | ||||
| value = value.decode('latin1') | value = value.decode('latin1') | ||||
| self = super(Token, cls).__new__(cls, value) | |||||
| self.type = type_ | |||||
| self.start_pos = start_pos | |||||
| self.value = value | |||||
| self.line = line | |||||
| self.column = column | |||||
| self.end_line = end_line | |||||
| self.end_column = end_column | |||||
| self.end_pos = end_pos | |||||
| return self | |||||
| inst = super(Token, cls).__new__(cls, value) | |||||
| inst.type = type_ | |||||
| inst.start_pos = start_pos | |||||
| inst.value = value | |||||
| inst.line = line | |||||
| inst.column = column | |||||
| inst.end_line = end_line | |||||
| inst.end_column = end_column | |||||
| inst.end_pos = end_pos | |||||
| return inst | |||||
| def update(self, type_: Optional[str]=None, value: Optional[Any]=None) -> 'Token': | def update(self, type_: Optional[str]=None, value: Optional[Any]=None) -> 'Token': | ||||
| return Token.new_borrow_pos( | return Token.new_borrow_pos( | ||||
| @@ -234,15 +234,13 @@ class LineCounter: | |||||
| class UnlessCallback: | class UnlessCallback: | ||||
| def __init__(self, mres): | |||||
| self.mres = mres | |||||
| def __init__(self, scanner): | |||||
| self.scanner = scanner | |||||
| def __call__(self, t): | def __call__(self, t): | ||||
| for mre, type_from_index in self.mres: | |||||
| m = mre.match(t.value) | |||||
| if m: | |||||
| t.type = type_from_index[m.lastindex] | |||||
| break | |||||
| res = self.scanner.match(t.value, 0) | |||||
| if res: | |||||
| _value, t.type = res | |||||
| return t | return t | ||||
| @@ -257,6 +255,11 @@ class CallChain: | |||||
| return self.callback2(t) if self.cond(t2) else t2 | return self.callback2(t) if self.cond(t2) else t2 | ||||
| def _get_match(re_, regexp, s, flags): | |||||
| m = re_.match(regexp, s, flags) | |||||
| if m: | |||||
| return m.group(0) | |||||
| def _create_unless(terminals, g_regex_flags, re_, use_bytes): | def _create_unless(terminals, g_regex_flags, re_, use_bytes): | ||||
| tokens_by_type = classify(terminals, lambda t: type(t.pattern)) | tokens_by_type = classify(terminals, lambda t: type(t.pattern)) | ||||
| assert len(tokens_by_type) <= 2, tokens_by_type.keys() | assert len(tokens_by_type) <= 2, tokens_by_type.keys() | ||||
| @@ -268,40 +271,54 @@ def _create_unless(terminals, g_regex_flags, re_, use_bytes): | |||||
| if strtok.priority > retok.priority: | if strtok.priority > retok.priority: | ||||
| continue | continue | ||||
| s = strtok.pattern.value | s = strtok.pattern.value | ||||
| m = re_.match(retok.pattern.to_regexp(), s, g_regex_flags) | |||||
| if m and m.group(0) == s: | |||||
| if s == _get_match(re_, retok.pattern.to_regexp(), s, g_regex_flags): | |||||
| unless.append(strtok) | unless.append(strtok) | ||||
| if strtok.pattern.flags <= retok.pattern.flags: | if strtok.pattern.flags <= retok.pattern.flags: | ||||
| embedded_strs.add(strtok) | embedded_strs.add(strtok) | ||||
| if unless: | if unless: | ||||
| callback[retok.name] = UnlessCallback(build_mres(unless, g_regex_flags, re_, match_whole=True, use_bytes=use_bytes)) | |||||
| terminals = [t for t in terminals if t not in embedded_strs] | |||||
| return terminals, callback | |||||
| def _build_mres(terminals, max_size, g_regex_flags, match_whole, re_, use_bytes): | |||||
| # Python sets an unreasonable group limit (currently 100) in its re module | |||||
| # Worse, the only way to know we reached it is by catching an AssertionError! | |||||
| # This function recursively tries less and less groups until it's successful. | |||||
| postfix = '$' if match_whole else '' | |||||
| mres = [] | |||||
| while terminals: | |||||
| pattern = u'|'.join(u'(?P<%s>%s)' % (t.name, t.pattern.to_regexp() + postfix) for t in terminals[:max_size]) | |||||
| if use_bytes: | |||||
| pattern = pattern.encode('latin-1') | |||||
| try: | |||||
| mre = re_.compile(pattern, g_regex_flags) | |||||
| except AssertionError: # Yes, this is what Python provides us.. :/ | |||||
| return _build_mres(terminals, max_size//2, g_regex_flags, match_whole, re_, use_bytes) | |||||
| callback[retok.name] = UnlessCallback(Scanner(unless, g_regex_flags, re_, match_whole=True, use_bytes=use_bytes)) | |||||
| mres.append((mre, {i: n for n, i in mre.groupindex.items()})) | |||||
| terminals = terminals[max_size:] | |||||
| return mres | |||||
| new_terminals = [t for t in terminals if t not in embedded_strs] | |||||
| return new_terminals, callback | |||||
| def build_mres(terminals, g_regex_flags, re_, use_bytes, match_whole=False): | |||||
| return _build_mres(terminals, len(terminals), g_regex_flags, match_whole, re_, use_bytes) | |||||
| class Scanner: | |||||
| def __init__(self, terminals, g_regex_flags, re_, use_bytes, match_whole=False): | |||||
| self.terminals = terminals | |||||
| self.g_regex_flags = g_regex_flags | |||||
| self.re_ = re_ | |||||
| self.use_bytes = use_bytes | |||||
| self.match_whole = match_whole | |||||
| self.allowed_types = {t.name for t in self.terminals} | |||||
| self._mres = self._build_mres(terminals, len(terminals)) | |||||
| def _build_mres(self, terminals, max_size): | |||||
| # Python sets an unreasonable group limit (currently 100) in its re module | |||||
| # Worse, the only way to know we reached it is by catching an AssertionError! | |||||
| # This function recursively tries fewer and fewer groups until it's successful. | |||||
| postfix = '$' if self.match_whole else '' | |||||
| mres = [] | |||||
| while terminals: | |||||
| pattern = u'|'.join(u'(?P<%s>%s)' % (t.name, t.pattern.to_regexp() + postfix) for t in terminals[:max_size]) | |||||
| if self.use_bytes: | |||||
| pattern = pattern.encode('latin-1') | |||||
| try: | |||||
| mre = self.re_.compile(pattern, self.g_regex_flags) | |||||
| except AssertionError: # Yes, this is what Python provides us.. :/ | |||||
| return self._build_mres(terminals, max_size//2) | |||||
| mres.append((mre, {i: n for n, i in mre.groupindex.items()})) | |||||
| terminals = terminals[max_size:] | |||||
| return mres | |||||
| def match(self, text, pos): | |||||
| for mre, type_from_index in self._mres: | |||||
| m = mre.match(text, pos) | |||||
| if m: | |||||
| return m.group(0), type_from_index[m.lastindex] | |||||
| def _regexp_has_newline(r): | def _regexp_has_newline(r): | ||||
| @@ -390,9 +407,9 @@ class TraditionalLexer(Lexer): | |||||
| self.use_bytes = conf.use_bytes | self.use_bytes = conf.use_bytes | ||||
| self.terminals_by_name = conf.terminals_by_name | self.terminals_by_name = conf.terminals_by_name | ||||
| self._mres = None | |||||
| self._scanner = None | |||||
| def _build(self) -> None: | |||||
| def _build_scanner(self): | |||||
| terminals, self.callback = _create_unless(self.terminals, self.g_regex_flags, self.re, self.use_bytes) | terminals, self.callback = _create_unless(self.terminals, self.g_regex_flags, self.re, self.use_bytes) | ||||
| assert all(self.callback.values()) | assert all(self.callback.values()) | ||||
| @@ -403,20 +420,16 @@ class TraditionalLexer(Lexer): | |||||
| else: | else: | ||||
| self.callback[type_] = f | self.callback[type_] = f | ||||
| self._mres = build_mres(terminals, self.g_regex_flags, self.re, self.use_bytes) | |||||
| self._scanner = Scanner(terminals, self.g_regex_flags, self.re, self.use_bytes) | |||||
| @property | @property | ||||
| def mres(self) -> List[Tuple[REPattern, Dict[int, str]]]: | |||||
| if self._mres is None: | |||||
| self._build() | |||||
| assert self._mres is not None | |||||
| return self._mres | |||||
| def match(self, text: str, pos: int) -> Optional[Tuple[str, str]]: | |||||
| for mre, type_from_index in self.mres: | |||||
| m = mre.match(text, pos) | |||||
| if m: | |||||
| return m.group(0), type_from_index[m.lastindex] | |||||
| def scanner(self): | |||||
| if self._scanner is None: | |||||
| self._build_scanner() | |||||
| return self._scanner | |||||
| def match(self, text, pos): | |||||
| return self.scanner.match(text, pos) | |||||
| def lex(self, state: LexerState, parser_state: Any) -> Iterator[Token]: | def lex(self, state: LexerState, parser_state: Any) -> Iterator[Token]: | ||||
| with suppress(EOFError): | with suppress(EOFError): | ||||
| @@ -428,7 +441,7 @@ class TraditionalLexer(Lexer): | |||||
| while line_ctr.char_pos < len(lex_state.text): | while line_ctr.char_pos < len(lex_state.text): | ||||
| res = self.match(lex_state.text, line_ctr.char_pos) | res = self.match(lex_state.text, line_ctr.char_pos) | ||||
| if not res: | if not res: | ||||
| allowed = {v for m, tfi in self.mres for v in tfi.values()} - self.ignore_types | |||||
| allowed = self.scanner.allowed_types - self.ignore_types | |||||
| if not allowed: | if not allowed: | ||||
| allowed = {"<END-OF-FILE>"} | allowed = {"<END-OF-FILE>"} | ||||
| raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column, | raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column, | ||||
| @@ -10,7 +10,7 @@ from numbers import Integral | |||||
| from contextlib import suppress | from contextlib import suppress | ||||
| from typing import List, Tuple, Union, Callable, Dict, Optional | from typing import List, Tuple, Union, Callable, Dict, Optional | ||||
| from .utils import bfs, logger, classify_bool, is_id_continue, is_id_start, bfs_all_unique | |||||
| from .utils import bfs, logger, classify_bool, is_id_continue, is_id_start, bfs_all_unique, small_factors | |||||
| from .lexer import Token, TerminalDef, PatternStr, PatternRE | from .lexer import Token, TerminalDef, PatternStr, PatternRE | ||||
| from .parse_tree_builder import ParseTreeBuilder | from .parse_tree_builder import ParseTreeBuilder | ||||
| @@ -176,27 +176,136 @@ RULES = { | |||||
| } | } | ||||
| # Value 5 keeps the number of states in the lalr parser somewhat minimal | |||||
| # It isn't optimal, but close to it. See PR #949 | |||||
| SMALL_FACTOR_THRESHOLD = 5 | |||||
| # The threshold for when repeats via ~ are split up into different rules. | |||||
| # 50 is chosen since it keeps the number of states low and therefore LALR analysis time low, | |||||
| # while not being too overaggressive and unnecessarily creating rules that might cause shift/reduce conflicts. | |||||
| # (See PR #949) | |||||
| REPEAT_BREAK_THRESHOLD = 50 | |||||
| @inline_args | @inline_args | ||||
| class EBNF_to_BNF(Transformer_InPlace): | class EBNF_to_BNF(Transformer_InPlace): | ||||
| def __init__(self): | def __init__(self): | ||||
| self.new_rules = [] | self.new_rules = [] | ||||
| self.rules_by_expr = {} | |||||
| self.rules_cache = {} | |||||
| self.prefix = 'anon' | self.prefix = 'anon' | ||||
| self.i = 0 | self.i = 0 | ||||
| self.rule_options = None | self.rule_options = None | ||||
| def _add_recurse_rule(self, type_, expr): | |||||
| if expr in self.rules_by_expr: | |||||
| return self.rules_by_expr[expr] | |||||
| new_name = '__%s_%s_%d' % (self.prefix, type_, self.i) | |||||
| def _name_rule(self, inner): | |||||
| new_name = '__%s_%s_%d' % (self.prefix, inner, self.i) | |||||
| self.i += 1 | self.i += 1 | ||||
| t = NonTerminal(new_name) | |||||
| tree = ST('expansions', [ST('expansion', [expr]), ST('expansion', [t, expr])]) | |||||
| self.new_rules.append((new_name, tree, self.rule_options)) | |||||
| self.rules_by_expr[expr] = t | |||||
| return new_name | |||||
| def _add_rule(self, key, name, expansions): | |||||
| t = NonTerminal(name) | |||||
| self.new_rules.append((name, expansions, self.rule_options)) | |||||
| self.rules_cache[key] = t | |||||
| return t | return t | ||||
| def _add_recurse_rule(self, type_, expr): | |||||
| try: | |||||
| return self.rules_cache[expr] | |||||
| except KeyError: | |||||
| new_name = self._name_rule(type_) | |||||
| t = NonTerminal(new_name) | |||||
| tree = ST('expansions', [ | |||||
| ST('expansion', [expr]), | |||||
| ST('expansion', [t, expr]) | |||||
| ]) | |||||
| return self._add_rule(expr, new_name, tree) | |||||
| def _add_repeat_rule(self, a, b, target, atom): | |||||
| """Generate a rule that repeats target ``a`` times, and repeats atom ``b`` times. | |||||
| When called recursively (into target), it repeats atom for x(n) times, where: | |||||
| x(0) = 1 | |||||
| x(n) = a(n) * x(n-1) + b | |||||
| Example rule when a=3, b=4: | |||||
| new_rule: target target target atom atom atom atom | |||||
| """ | |||||
| key = (a, b, target, atom) | |||||
| try: | |||||
| return self.rules_cache[key] | |||||
| except KeyError: | |||||
| new_name = self._name_rule('repeat_a%d_b%d' % (a, b)) | |||||
| tree = ST('expansions', [ST('expansion', [target] * a + [atom] * b)]) | |||||
| return self._add_rule(key, new_name, tree) | |||||
| def _add_repeat_opt_rule(self, a, b, target, target_opt, atom): | |||||
| """Creates a rule that matches atom 0 to (a*n+b)-1 times. | |||||
| Assuming target matches atom exactly n times, and target_opt matches atom 0 to n-1 times, | |||||
| First we generate target * i followed by target_opt, for i from 0 to a-1 | |||||
| These match 0 to n*a - 1 times atom | |||||
| Then we generate target * a followed by atom * i, for i from 0 to b-1 | |||||
| These match n*a to n*a + b-1 times atom | |||||
| The created rule will not have any shift/reduce conflicts so that it can be used with lalr | |||||
| Example rule when a=3, b=4: | |||||
| new_rule: target_opt | |||||
| | target target_opt | |||||
| | target target target_opt | |||||
| | target target target | |||||
| | target target target atom | |||||
| | target target target atom atom | |||||
| | target target target atom atom atom | |||||
| """ | |||||
| key = (a, b, target, atom, "opt") | |||||
| try: | |||||
| return self.rules_cache[key] | |||||
| except KeyError: | |||||
| new_name = self._name_rule('repeat_a%d_b%d_opt' % (a, b)) | |||||
| tree = ST('expansions', [ | |||||
| ST('expansion', [target]*i + [target_opt]) for i in range(a) | |||||
| ] + [ | |||||
| ST('expansion', [target]*a + [atom]*i) for i in range(b) | |||||
| ]) | |||||
| return self._add_rule(key, new_name, tree) | |||||
| def _generate_repeats(self, rule, mn, mx): | |||||
| """Generates a rule tree that repeats ``rule`` exactly between ``mn`` to ``mx`` times. | |||||
| """ | |||||
| # For a small number of repeats, we can take the naive approach | |||||
| if mx < REPEAT_BREAK_THRESHOLD: | |||||
| return ST('expansions', [ST('expansion', [rule] * n) for n in range(mn, mx + 1)]) | |||||
| # For large repeat values, we break the repetition into sub-rules. | |||||
| # We treat ``rule~mn..mx`` as ``rule~mn rule~0..(diff=mx-mn)``. | |||||
| # We then use small_factors to split mn and diff up into values [(a, b), ...] | |||||
| # These values are used with the help of _add_repeat_rule and _add_repeat_opt_rule | |||||
| # to generate a complete rule/expression that matches the corresponding number of repeats | |||||
| mn_target = rule | |||||
| for a, b in small_factors(mn, SMALL_FACTOR_THRESHOLD): | |||||
| mn_target = self._add_repeat_rule(a, b, mn_target, rule) | |||||
| if mx == mn: | |||||
| return mn_target | |||||
| diff = mx - mn + 1 # We add one because _add_repeat_opt_rule generates rules that match one less | |||||
| diff_factors = small_factors(diff, SMALL_FACTOR_THRESHOLD) | |||||
| diff_target = rule # Match rule 1 times | |||||
| diff_opt_target = ST('expansion', []) # match rule 0 times (e.g. up to 1 -1 times) | |||||
| for a, b in diff_factors[:-1]: | |||||
| diff_opt_target = self._add_repeat_opt_rule(a, b, diff_target, diff_opt_target, rule) | |||||
| diff_target = self._add_repeat_rule(a, b, diff_target, rule) | |||||
| a, b = diff_factors[-1] | |||||
| diff_opt_target = self._add_repeat_opt_rule(a, b, diff_target, diff_opt_target, rule) | |||||
| return ST('expansions', [ST('expansion', [mn_target] + [diff_opt_target])]) | |||||
| def expr(self, rule, op, *args): | def expr(self, rule, op, *args): | ||||
| if op.value == '?': | if op.value == '?': | ||||
| empty = ST('expansion', []) | empty = ST('expansion', []) | ||||
| @@ -221,7 +330,9 @@ class EBNF_to_BNF(Transformer_InPlace): | |||||
| mn, mx = map(int, args) | mn, mx = map(int, args) | ||||
| if mx < mn or mn < 0: | if mx < mn or mn < 0: | ||||
| raise GrammarError("Bad Range for %s (%d..%d isn't allowed)" % (rule, mn, mx)) | raise GrammarError("Bad Range for %s (%d..%d isn't allowed)" % (rule, mn, mx)) | ||||
| return ST('expansions', [ST('expansion', [rule] * n) for n in range(mn, mx+1)]) | |||||
| return self._generate_repeats(rule, mn, mx) | |||||
| assert False, op | assert False, op | ||||
| def maybe(self, rule): | def maybe(self, rule): | ||||
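The `_generate_repeats` machinery above backs the `~` repetition operator (`rule ~ n`, `rule ~ n..m`): large ranges are now factored into a handful of helper rules instead of one alternative per possible count. A hedged usage sketch:

```python
from lark import Lark

# A range that would previously have expanded into roughly 500 alternatives.
parser = Lark(r"""
    start: DIGIT ~ 3..500
    DIGIT: "0".."9"
""", parser='lalr')

print(len(parser.parse("123456").children))  # expected: 6
```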
| @@ -22,54 +22,59 @@ class ExpandSingleChild: | |||||
| class PropagatePositions: | class PropagatePositions: | ||||
| def __init__(self, node_builder): | |||||
| def __init__(self, node_builder, node_filter=None): | |||||
| self.node_builder = node_builder | self.node_builder = node_builder | ||||
| self.node_filter = node_filter | |||||
| def __call__(self, children): | def __call__(self, children): | ||||
| res = self.node_builder(children) | res = self.node_builder(children) | ||||
| # local reference to Tree.meta reduces number of presence checks | |||||
| if isinstance(res, Tree): | if isinstance(res, Tree): | ||||
| res_meta = res.meta | |||||
| # Calculate positions while the tree is streaming, according to the rule: | |||||
| # - nodes start at the start of their first child's container, | |||||
| # and end at the end of their last child's container. | |||||
| # Containers are nodes that take up space in text, but have been inlined in the tree. | |||||
| src_meta = self._pp_get_meta(children) | |||||
| if src_meta is not None: | |||||
| res_meta.line = src_meta.line | |||||
| res_meta.column = src_meta.column | |||||
| res_meta.start_pos = src_meta.start_pos | |||||
| res_meta.empty = False | |||||
| res_meta = res.meta | |||||
| src_meta = self._pp_get_meta(reversed(children)) | |||||
| if src_meta is not None: | |||||
| res_meta.end_line = src_meta.end_line | |||||
| res_meta.end_column = src_meta.end_column | |||||
| res_meta.end_pos = src_meta.end_pos | |||||
| res_meta.empty = False | |||||
| first_meta = self._pp_get_meta(children) | |||||
| if first_meta is not None: | |||||
| if not hasattr(res_meta, 'line'): | |||||
| # meta was already set, probably because the rule has been inlined (e.g. `?rule`) | |||||
| res_meta.line = getattr(first_meta, 'container_line', first_meta.line) | |||||
| res_meta.column = getattr(first_meta, 'container_column', first_meta.column) | |||||
| res_meta.start_pos = getattr(first_meta, 'container_start_pos', first_meta.start_pos) | |||||
| res_meta.empty = False | |||||
| res_meta.container_line = getattr(first_meta, 'container_line', first_meta.line) | |||||
| res_meta.container_column = getattr(first_meta, 'container_column', first_meta.column) | |||||
| last_meta = self._pp_get_meta(reversed(children)) | |||||
| if last_meta is not None: | |||||
| if not hasattr(res_meta, 'end_line'): | |||||
| res_meta.end_line = getattr(last_meta, 'container_end_line', last_meta.end_line) | |||||
| res_meta.end_column = getattr(last_meta, 'container_end_column', last_meta.end_column) | |||||
| res_meta.end_pos = getattr(last_meta, 'container_end_pos', last_meta.end_pos) | |||||
| res_meta.empty = False | |||||
| res_meta.container_end_line = getattr(last_meta, 'container_end_line', last_meta.end_line) | |||||
| res_meta.container_end_column = getattr(last_meta, 'container_end_column', last_meta.end_column) | |||||
| return res | return res | ||||
| def _pp_get_meta(self, children): | def _pp_get_meta(self, children): | ||||
| for c in children: | for c in children: | ||||
| if self.node_filter is not None and not self.node_filter(c): | |||||
| continue | |||||
| if isinstance(c, Tree): | if isinstance(c, Tree): | ||||
| if not c.meta.empty: | if not c.meta.empty: | ||||
| return c.meta | return c.meta | ||||
| elif isinstance(c, Token): | elif isinstance(c, Token): | ||||
| return c | return c | ||||
| class PropagatePositions_IgnoreWs(PropagatePositions): | |||||
| def _pp_get_meta(self, children): | |||||
| for c in children: | |||||
| if isinstance(c, Tree): | |||||
| if not c.meta.empty: | |||||
| return c.meta | |||||
| elif isinstance(c, Token): | |||||
| if c and not c.isspace(): # Disregard whitespace-only tokens | |||||
| return c | |||||
| def make_propagate_positions(option): | def make_propagate_positions(option): | ||||
| if option == "ignore_ws": | |||||
| return PropagatePositions_IgnoreWs | |||||
| if callable(option): | |||||
| return partial(PropagatePositions, node_filter=option) | |||||
| elif option is True: | elif option is True: | ||||
| return PropagatePositions | return PropagatePositions | ||||
| elif option is False: | elif option is False: | ||||
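With the dedicated `ignore_ws` mode gone, the same effect can be recovered through the callable form handled above; the callable receives each child node and returns whether it may contribute positions. A hedged sketch:

```python
from lark import Lark, Token

def not_whitespace(node) -> bool:
    # Disregard whitespace-only tokens as position sources,
    # roughly what propagate_positions="ignore_ws" used to do.
    if isinstance(node, Token):
        return bool(node) and not node.isspace()
    return True

parser = Lark(r"""
    start: WORD (WS WORD)*
    WORD: /[a-z]+/
    WS: / +/
""", parser='lalr', propagate_positions=not_whitespace)

tree = parser.parse("hello   world")
print(tree.meta.line, tree.meta.column, tree.meta.end_column)
```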
| @@ -39,8 +39,7 @@ class MakeParsingFrontend: | |||||
| lexer_conf.lexer_type = self.lexer_type | lexer_conf.lexer_type = self.lexer_type | ||||
| return ParsingFrontend(lexer_conf, parser_conf, options) | return ParsingFrontend(lexer_conf, parser_conf, options) | ||||
| @classmethod | |||||
| def deserialize(cls, data, memo, lexer_conf, callbacks, options): | |||||
| def deserialize(self, data, memo, lexer_conf, callbacks, options): | |||||
| parser_conf = ParserConf.deserialize(data['parser_conf'], memo) | parser_conf = ParserConf.deserialize(data['parser_conf'], memo) | ||||
| parser = LALR_Parser.deserialize(data['parser'], memo, callbacks, options.debug) | parser = LALR_Parser.deserialize(data['parser'], memo, callbacks, options.debug) | ||||
| parser_conf.callbacks = callbacks | parser_conf.callbacks = callbacks | ||||
| @@ -92,26 +91,26 @@ class ParsingFrontend(Serialize): | |||||
| def _verify_start(self, start=None): | def _verify_start(self, start=None): | ||||
| if start is None: | if start is None: | ||||
| start = self.parser_conf.start | |||||
| if len(start) > 1: | |||||
| raise ConfigurationError("Lark initialized with more than 1 possible start rule. Must specify which start rule to parse", start) | |||||
| start ,= start | |||||
| start_decls = self.parser_conf.start | |||||
| if len(start_decls) > 1: | |||||
| raise ConfigurationError("Lark initialized with more than 1 possible start rule. Must specify which start rule to parse", start_decls) | |||||
| start ,= start_decls | |||||
| elif start not in self.parser_conf.start: | elif start not in self.parser_conf.start: | ||||
| raise ConfigurationError("Unknown start rule %s. Must be one of %r" % (start, self.parser_conf.start)) | raise ConfigurationError("Unknown start rule %s. Must be one of %r" % (start, self.parser_conf.start)) | ||||
| return start | return start | ||||
| def parse(self, text, start=None, on_error=None): | def parse(self, text, start=None, on_error=None): | ||||
| start = self._verify_start(start) | |||||
| chosen_start = self._verify_start(start) | |||||
| stream = text if self.skip_lexer else LexerThread(self.lexer, text) | stream = text if self.skip_lexer else LexerThread(self.lexer, text) | ||||
| kw = {} if on_error is None else {'on_error': on_error} | kw = {} if on_error is None else {'on_error': on_error} | ||||
| return self.parser.parse(stream, start, **kw) | |||||
| return self.parser.parse(stream, chosen_start, **kw) | |||||
| def parse_interactive(self, text=None, start=None): | def parse_interactive(self, text=None, start=None): | ||||
| start = self._verify_start(start) | |||||
| chosen_start = self._verify_start(start) | |||||
| if self.parser_conf.parser_type != 'lalr': | if self.parser_conf.parser_type != 'lalr': | ||||
| raise ConfigurationError("parse_interactive() currently only works with parser='lalr' ") | raise ConfigurationError("parse_interactive() currently only works with parser='lalr' ") | ||||
| stream = text if self.skip_lexer else LexerThread(self.lexer, text) | stream = text if self.skip_lexer else LexerThread(self.lexer, text) | ||||
| return self.parser.parse_interactive(stream, start) | |||||
| return self.parser.parse_interactive(stream, chosen_start) | |||||
| def get_frontend(parser, lexer): | def get_frontend(parser, lexer): | ||||
| @@ -65,7 +65,7 @@ class InteractiveParser(object): | |||||
| """Print the output of ``choices()`` in a way that's easier to read.""" | """Print the output of ``choices()`` in a way that's easier to read.""" | ||||
| out = ["Parser choices:"] | out = ["Parser choices:"] | ||||
| for k, v in self.choices().items(): | for k, v in self.choices().items(): | ||||
| out.append('\t- %s -> %s' % (k, v)) | |||||
| out.append('\t- %s -> %r' % (k, v)) | |||||
| out.append('stack size: %s' % len(self.parser_state.state_stack)) | out.append('stack size: %s' % len(self.parser_state.state_stack)) | ||||
| return '\n'.join(out) | return '\n'.join(out) | ||||
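For context (not in the patch itself): the `parse_interactive()` and `pretty()` paths touched here are easiest to see end to end with a tiny sketch. The grammar is invented; `exhaust_lexer()` and `accepts()` are existing `InteractiveParser` methods:

```python
from lark import Lark

parser = Lark('start: "a" "b" "c"', parser='lalr')

# parse_interactive() is LALR-only, as the ConfigurationError above enforces.
ip = parser.parse_interactive("ab")
ip.exhaust_lexer()      # feed every token the lexer produces for "ab"
print(ip.accepts())     # token types the parser could accept next, e.g. {'C'}
print(ip.pretty())      # the human-readable choice dump formatted above
```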
| @@ -178,8 +178,8 @@ class _Parser(object): | |||||
| for token in state.lexer.lex(state): | for token in state.lexer.lex(state): | ||||
| state.feed_token(token) | state.feed_token(token) | ||||
| token = Token.new_borrow_pos('$END', '', token) if token else Token('$END', '', 0, 1, 1) | |||||
| return state.feed_token(token, True) | |||||
| end_token = Token.new_borrow_pos('$END', '', token) if token else Token('$END', '', 0, 1, 1) | |||||
| return state.feed_token(end_token, True) | |||||
| except UnexpectedInput as e: | except UnexpectedInput as e: | ||||
| try: | try: | ||||
| e.interactive_parser = InteractiveParser(self, state, state.lexer) | e.interactive_parser = InteractiveParser(self, state, state.lexer) | ||||
| @@ -61,14 +61,13 @@ class Serialize(object): | |||||
| fields = getattr(self, '__serialize_fields__') | fields = getattr(self, '__serialize_fields__') | ||||
| res = {f: _serialize(getattr(self, f), memo) for f in fields} | res = {f: _serialize(getattr(self, f), memo) for f in fields} | ||||
| res['__type__'] = type(self).__name__ | res['__type__'] = type(self).__name__ | ||||
| postprocess = getattr(self, '_serialize', None) | |||||
| if postprocess: | |||||
| postprocess(res, memo) | |||||
| if hasattr(self, '_serialize'): | |||||
| self._serialize(res, memo) | |||||
| return res | return res | ||||
| @classmethod | @classmethod | ||||
| def deserialize(cls, data, memo): | def deserialize(cls, data, memo): | ||||
| namespace = getattr(cls, '__serialize_namespace__', {}) | |||||
| namespace = getattr(cls, '__serialize_namespace__', []) | |||||
| namespace = {c.__name__:c for c in namespace} | namespace = {c.__name__:c for c in namespace} | ||||
| fields = getattr(cls, '__serialize_fields__') | fields = getattr(cls, '__serialize_fields__') | ||||
| @@ -82,9 +81,10 @@ class Serialize(object): | |||||
| setattr(inst, f, _deserialize(data[f], namespace, memo)) | setattr(inst, f, _deserialize(data[f], namespace, memo)) | ||||
| except KeyError as e: | except KeyError as e: | ||||
| raise KeyError("Cannot find key for class", cls, e) | raise KeyError("Cannot find key for class", cls, e) | ||||
| postprocess = getattr(inst, '_deserialize', None) | |||||
| if postprocess: | |||||
| postprocess() | |||||
| if hasattr(inst, '_deserialize'): | |||||
| inst._deserialize() | |||||
| return inst | return inst | ||||
| @@ -163,7 +163,7 @@ def get_regexp_width(expr): | |||||
| return 1, sre_constants.MAXREPEAT | return 1, sre_constants.MAXREPEAT | ||||
| else: | else: | ||||
| return 0, sre_constants.MAXREPEAT | return 0, sre_constants.MAXREPEAT | ||||
| ###} | ###} | ||||
| @@ -198,14 +198,6 @@ def dedup_list(l): | |||||
| return [x for x in l if not (x in dedup or dedup.add(x))] | return [x for x in l if not (x in dedup or dedup.add(x))] | ||||
| def compare(a, b): | |||||
| if a == b: | |||||
| return 0 | |||||
| elif a > b: | |||||
| return 1 | |||||
| return -1 | |||||
| class Enumerator(Serialize): | class Enumerator(Serialize): | ||||
| def __init__(self): | def __init__(self): | ||||
| self.enums = {} | self.enums = {} | ||||
| @@ -253,7 +245,7 @@ except ImportError: | |||||
| class FS: | class FS: | ||||
| exists = os.path.exists | exists = os.path.exists | ||||
| @staticmethod | @staticmethod | ||||
| def open(name, mode="r", **kwargs): | def open(name, mode="r", **kwargs): | ||||
| if atomicwrites and "w" in mode: | if atomicwrites and "w" in mode: | ||||
| @@ -324,3 +316,29 @@ def _serialize(value, memo): | |||||
| return {key:_serialize(elem, memo) for key, elem in value.items()} | return {key:_serialize(elem, memo) for key, elem in value.items()} | ||||
| # assert value is None or isinstance(value, (int, float, str, tuple)), value | # assert value is None or isinstance(value, (int, float, str, tuple)), value | ||||
| return value | return value | ||||
| def small_factors(n, max_factor): | |||||
| """ | |||||
| Splits n up into smaller factors and summands <= max_factor. | |||||
| Returns a list of [(a, b), ...] | |||||
| so that the following code returns n: | |||||
| n = 1 | |||||
| for a, b in values: | |||||
| n = n * a + b | |||||
| Currently, we also keep a + b <= max_factor, but that might change | |||||
| """ | |||||
| assert n >= 0 | |||||
| assert max_factor > 2 | |||||
| if n <= max_factor: | |||||
| return [(n, 0)] | |||||
| for a in range(max_factor, 1, -1): | |||||
| r, b = divmod(n, a) | |||||
| if a + b <= max_factor: | |||||
| return small_factors(r, max_factor) + [(a, b)] | |||||
| assert False, "Failed to factorize %s" % n | |||||
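To make the docstring's invariant concrete, here is a small check (an illustration, not part of the patch; it assumes the helper lands in `lark.utils` next to the other utilities in this hunk):

```python
from lark.utils import small_factors  # assumed import path

def rebuild(pairs):
    # The reconstruction loop from the docstring: start at 1, then n = n * a + b
    n = 1
    for a, b in pairs:
        n = n * a + b
    return n

for n in (0, 1, 7, 546, 8191, 10000):
    pairs = small_factors(n, 16)
    assert rebuild(pairs) == n                 # round-trips back to n
    assert all(a + b <= 16 for a, b in pairs)  # factors/summands stay small
```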
| @@ -3,7 +3,7 @@ from __future__ import absolute_import | |||||
| import sys | import sys | ||||
| from unittest import TestCase, main | from unittest import TestCase, main | ||||
| from lark import Lark, Token, Tree | |||||
| from lark import Lark, Token, Tree, ParseError, UnexpectedInput | |||||
| from lark.load_grammar import GrammarError, GRAMMAR_ERRORS, find_grammar_errors | from lark.load_grammar import GrammarError, GRAMMAR_ERRORS, find_grammar_errors | ||||
| from lark.load_grammar import FromPackageLoader | from lark.load_grammar import FromPackageLoader | ||||
| @@ -198,6 +198,53 @@ class TestGrammar(TestCase): | |||||
| x = find_grammar_errors(text) | x = find_grammar_errors(text) | ||||
| assert [e.line for e, _s in find_grammar_errors(text)] == [2, 6] | assert [e.line for e, _s in find_grammar_errors(text)] == [2, 6] | ||||
| def test_ranged_repeat_terms(self): | |||||
| g = u"""!start: AAA | |||||
| AAA: "A"~3 | |||||
| """ | |||||
| l = Lark(g, parser='lalr') | |||||
| self.assertEqual(l.parse(u'AAA'), Tree('start', ["AAA"])) | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AA') | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAA') | |||||
| g = u"""!start: AABB CC | |||||
| AABB: "A"~0..2 "B"~2 | |||||
| CC: "C"~1..2 | |||||
| """ | |||||
| l = Lark(g, parser='lalr') | |||||
| self.assertEqual(l.parse(u'AABBCC'), Tree('start', ['AABB', 'CC'])) | |||||
| self.assertEqual(l.parse(u'BBC'), Tree('start', ['BB', 'C'])) | |||||
| self.assertEqual(l.parse(u'ABBCC'), Tree('start', ['ABB', 'CC'])) | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAB') | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAABBB') | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'ABB') | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAABB') | |||||
| def test_ranged_repeat_large(self): | |||||
| g = u"""!start: "A"~60 | |||||
| """ | |||||
| l = Lark(g, parser='lalr') | |||||
| self.assertGreater(len(l.rules), 1, "Expected that more than one rule will be generated") | |||||
| self.assertEqual(l.parse(u'A' * 60), Tree('start', ["A"] * 60)) | |||||
| self.assertRaises(ParseError, l.parse, u'A' * 59) | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'A' * 61) | |||||
| g = u"""!start: "A"~15..100 | |||||
| """ | |||||
| l = Lark(g, parser='lalr') | |||||
| for i in range(0, 110): | |||||
| if 15 <= i <= 100: | |||||
| self.assertEqual(l.parse(u'A' * i), Tree('start', ['A']*i)) | |||||
| else: | |||||
| self.assertRaises(UnexpectedInput, l.parse, u'A' * i) | |||||
| # 8191 is a Mersenne prime (2**13 - 1); being prime, it forces small_factors to rely on the summand terms | |||||
| g = u"""start: "A"~8191 | |||||
| """ | |||||
| l = Lark(g, parser='lalr') | |||||
| self.assertEqual(l.parse(u'A' * 8191), Tree('start', [])) | |||||
| self.assertRaises(UnexpectedInput, l.parse, u'A' * 8190) | |||||
| self.assertRaises(UnexpectedInput, l.parse, u'A' * 8192) | |||||
| if __name__ == '__main__': | if __name__ == '__main__': | ||||
| @@ -94,6 +94,26 @@ class TestParsers(unittest.TestCase): | |||||
| r = g.parse('a') | r = g.parse('a') | ||||
| self.assertEqual( r.children[0].meta.line, 1 ) | self.assertEqual( r.children[0].meta.line, 1 ) | ||||
| def test_propagate_positions2(self): | |||||
| g = Lark("""start: a | |||||
| a: b | |||||
| ?b: "(" t ")" | |||||
| !t: "t" | |||||
| """, propagate_positions=True) | |||||
| start = g.parse("(t)") | |||||
| a ,= start.children | |||||
| t ,= a.children | |||||
| assert t.children[0] == "t" | |||||
| assert t.meta.column == 2 | |||||
| assert t.meta.end_column == 3 | |||||
| assert start.meta.column == a.meta.column == 1 | |||||
| assert start.meta.end_column == a.meta.end_column == 4 | |||||
| def test_expand1(self): | def test_expand1(self): | ||||
| g = Lark("""start: a | g = Lark("""start: a | ||||
| @@ -2183,27 +2203,7 @@ def _make_parser_test(LEXER, PARSER): | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAABB') | self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAABB') | ||||
| def test_ranged_repeat_terms(self): | |||||
| g = u"""!start: AAA | |||||
| AAA: "A"~3 | |||||
| """ | |||||
| l = _Lark(g) | |||||
| self.assertEqual(l.parse(u'AAA'), Tree('start', ["AAA"])) | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AA') | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAA') | |||||
| g = u"""!start: AABB CC | |||||
| AABB: "A"~0..2 "B"~2 | |||||
| CC: "C"~1..2 | |||||
| """ | |||||
| l = _Lark(g) | |||||
| self.assertEqual(l.parse(u'AABBCC'), Tree('start', ['AABB', 'CC'])) | |||||
| self.assertEqual(l.parse(u'BBC'), Tree('start', ['BB', 'C'])) | |||||
| self.assertEqual(l.parse(u'ABBCC'), Tree('start', ['ABB', 'CC'])) | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAB') | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAABBB') | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'ABB') | |||||
| self.assertRaises((ParseError, UnexpectedInput), l.parse, u'AAAABB') | |||||
| @unittest.skipIf(PARSER=='earley', "Priority not handled correctly right now") # TODO XXX | @unittest.skipIf(PARSER=='earley', "Priority not handled correctly right now") # TODO XXX | ||||
| def test_priority_vs_embedded(self): | def test_priority_vs_embedded(self): | ||||