@@ -1,15 +1,13 @@
-# Classes - Reference
+# Classes Reference

 This page details the important classes in Lark.

 ----

-## Lark
+## lark.Lark

 The Lark class is the main interface for the library. It's mostly a thin wrapper for the many different parsers, and for the tree constructor.

-### Methods

 #### \_\_init\_\_(self, grammar, **options)

 The Lark class accepts a grammar string or file object, and keyword options:

@@ -50,14 +48,10 @@ If a transformer is supplied to `__init__`, returns whatever is the result of th

 The main tree class

-### Properties

 * `data` - The name of the rule or alias
 * `children` - List of matched sub-rules and terminals
 * `meta` - Line & Column numbers, if using `propagate_positions`

-### Methods

 #### \_\_init\_\_(self, data, children)

 Creates a new tree, and stores "data" and "children" in attributes of the same name.

@@ -92,102 +86,6 @@ Trees can be hashed and compared.

 ----
-## Transformers & Visitors
-
-Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.
-
-They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each methods accepts the children as an argument. That can be modified using the `v-args` decorator, which allows to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.
-
-See: https://github.com/lark-parser/lark/blob/master/lark/visitors.py
-
-### Visitors
-
-Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.
-
-They work bottom-up, starting with the leaves and ending at the root of the tree.
-
-**Example**
-```python
-class IncreaseAllNumbers(Visitor):
-    def number(self, tree):
-        assert tree.data == "number"
-        tree.children[0] += 1
-
-IncreaseAllNumbers().visit(parse_tree)
-```
-
-There are two classes that implement the visitor interface:
-
-* Visitor - Visit every node (without recursion)
-* Visitor_Recursive - Visit every node using recursion. Slightly faster.
-
-### Transformers
-
-Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.
-
-They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.
-
-Transformers can be used to implement map & reduce patterns.
-
-Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).
-
-Transformers can be chained into a new transformer by using multiplication.
-
-**Example:**
-```python
-from lark import Tree, Transformer
-
-class EvalExpressions(Transformer):
-    def expr(self, args):
-        return eval(args[0])
-
-t = Tree('a', [Tree('expr', ['1+2'])])
-print(EvalExpressions().transform( t ))
-
-# Prints: Tree(a, [3])
-```
-
-Here are the classes that implement the transformer interface:
-
-- Transformer - Recursively transforms the tree. This is the one you probably want.
-- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
-- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances
-
-### v_args
-
-`v_args` is a decorator.
-
-By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.
-
-When used on a transformer/visitor class definition, it applies to all the callback methods inside it.
-
-`v_args` accepts one of three flags:
-
-- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
-- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
-- `tree` - Provides the entire tree as the argument, instead of the children.
-
-Examples:
-
-```python
-@v_args(inline=True)
-class SolveArith(Transformer):
-    def add(self, left, right):
-        return left + right
-
-
-class ReverseNotation(Transformer_InPlace):
-    @v_args(tree=True):
-    def tree_node(self, tree):
-        tree.children = tree.children[::-1]
-```
-
-### Discard
-
-When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.
 ## Token

 When using a lexer, the resulting tokens in the trees will be of the Token class, which inherits from Python's string. So, normal string comparisons and operations will work as expected. Tokens also have other useful attributes:

@@ -199,17 +97,25 @@ When using a lexer, the resulting tokens in the trees will be of the Token class

 * `end_line` - The line where the token ends
 * `end_column` - The next column after the end of the token. For example, if the token is a single character with a `column` value of 4, `end_column` will be 5.
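
Reviewer note: the `end_column` convention above can be stated as a one-liner. A hedged standalone sketch (`end_column_for` is an assumed helper, not Lark's API; columns are 1-based and the token is assumed not to span lines):

```python
# end_column points one past the token's last character (assumed helper, not Lark API).
def end_column_for(column, token_text):
    return column + len(token_text)

# A single-character token starting at column 4 ends with end_column 5,
# matching the example in the text.
print(end_column_for(4, "x"))
```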
+## Transformer
+
+## Visitor
+
+## Interpreter
+
+See the [visitors page](visitors.md)
+
 ## UnexpectedInput

+## UnexpectedToken
+
+## UnexpectedException
+
 - `UnexpectedInput`
 - `UnexpectedToken` - The parser received an unexpected token
 - `UnexpectedCharacters` - The lexer encountered an unexpected string

 After catching one of these exceptions, you may call the following helper methods to create a nicer error message:

-### Methods

 #### get_context(text, span)

 Returns a pretty string pinpointing the error in the text, with `span` amount of context characters around it.
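
Reviewer note: as a rough illustration of the kind of output `get_context` produces, here is a standalone sketch (`get_context_sketch` is an assumed stand-in, not Lark's actual implementation) that pinpoints a position in the text with a caret:

```python
def get_context_sketch(text, pos, span=40):
    # Illustrative stand-in for UnexpectedInput.get_context: show up to `span`
    # characters around `pos` on the offending line, then a caret under it.
    before = text[max(pos - span, 0):pos].rsplit('\n', 1)[-1]
    after = text[pos:pos + span].split('\n', 1)[0]
    return before + after + '\n' + ' ' * len(before) + '^\n'

print(get_context_sketch("select * frmo table", 9))
```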
@@ -1,5 +1,13 @@
 # Grammar Reference

+Table of contents:
+
+1. [Definitions](#defs)
+1. [Terminals](#terms)
+1. [Rules](#rules)
+1. [Directives](#dirs)
+
+<a name="defs"></a>
 ## Definitions

 **A grammar** is a list of rules and terminals, that together define a language.

@@ -25,6 +33,7 @@ Lark begins the parse with the rule 'start', unless specified otherwise in the o

 Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects, for the shape of the generated parse-tree, and the automatic construction of the lexer (aka tokenizer, or scanner).

+<a name="terms"></a>
 ## Terminals

 Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals.

@@ -70,6 +79,53 @@ WHITESPACE: (" " | /\t/ )+

 SQL_SELECT: "select"i
 ```
+### Regular expressions & Ambiguity
+
+Each terminal is eventually compiled to a regular expression. All the operators and references inside it are mapped to their respective expressions.
+
+For example, in the following grammar, `A1` and `A2`, are equivalent:
+```perl
+A1: "a" | "b"
+A2: /a|b/
+```
+
+This means that inside terminals, Lark cannot detect or resolve ambiguity, even when using Earley.
+
+For example, for this grammar:
+
+```perl
+start : (A | B)+
+A     : "a" | "ab"
+B     : "b"
+```
+We get this behavior:
+
+```bash
+>>> p.parse("ab")
+Tree(start, [Token(A, 'a'), Token(B, 'b')])
+```
+
+This is happening because Python's regex engine always returns the first matching option.
+
+If you find yourself in this situation, the recommended solution is to use rules instead.
+
+Example:
+
+```python
+>>> p = Lark("""start: (a | b)+
+...             !a: "a" | "ab"
+...             !b: "b"
+...             """, ambiguity="explicit")
+>>> print(p.parse("ab").pretty())
+_ambig
+  start
+    a   ab
+  start
+    a   a
+    b   b
+```
+
+<a name="rules"></a>
 ## Rules

 **Syntax:**
@@ -114,6 +170,7 @@ Rules can be assigned priority only when using Earley (future versions may suppo

 Priority can be either positive or negative. If not specified for a terminal, it's assumed to be 1 (i.e. the default).

+<a name="dirs"></a>
 ## Directives

 ### %ignore

@@ -122,7 +179,7 @@ All occurrences of the terminal will be ignored, and won't be part of the parse.

 Using the `%ignore` directive results in a cleaner grammar.

-It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extranous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1.
+It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1.

 **Syntax:**
 ```html
@@ -7,7 +7,7 @@ There are many ways you can help the project:

 * Write new grammars for Lark's library
 * Write a blog post introducing Lark to your audience
 * Port Lark to another language
-* Help me with code developemnt
+* Help me with code development

 If you're interested in taking one of these on, let me know and I will provide more details and assist you in the process.

@@ -60,4 +60,4 @@ Another way to run the tests is using setup.py:

 ```bash
 python setup.py test
-```
+```
@@ -35,8 +35,8 @@ $ pip install lark-parser

 * [Examples](https://github.com/lark-parser/lark/tree/master/examples)
 * Tutorials
   * [How to write a DSL](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) - Implements a toy LOGO-like language with an interpreter
-  * [How to write a JSON parser](json_tutorial.md)
-* External
+  * [How to write a JSON parser](json_tutorial.md) - Teaches you how to use Lark
+* Unofficial
   * [Program Synthesis is Possible](https://www.cs.cornell.edu/~asampson/blog/minisynth.html) - Creates a DSL for Z3
 * Guides
   * [How to use Lark](how_to_use.md)

@@ -44,6 +44,7 @@ $ pip install lark-parser

 * Reference
   * [Grammar](grammar.md)
   * [Tree Construction](tree_construction.md)
+  * [Visitors & Transformers](visitors.md)
   * [Classes](classes.md)
   * [Cheatsheet (PDF)](lark_cheatsheet.pdf)
 * Discussion
@@ -230,7 +230,8 @@ from lark import Transformer

 class MyTransformer(Transformer):
     def list(self, items):
         return list(items)
-    def pair(self, (k,v)):
+    def pair(self, key_value):
+        k, v = key_value
         return k, v
     def dict(self, items):
         return dict(items)

@@ -251,9 +252,11 @@ Also, our definitions of list and dict are a bit verbose. We can do better:

 from lark import Transformer

 class TreeToJson(Transformer):
-    def string(self, (s,)):
+    def string(self, s):
+        (s,) = s
         return s[1:-1]

-    def number(self, (n,)):
+    def number(self, n):
+        (n,) = n
         return float(n)

     list = list

@@ -315,9 +318,11 @@ json_grammar = r"""
 """

 class TreeToJson(Transformer):
-    def string(self, (s,)):
+    def string(self, s):
+        (s,) = s
         return s[1:-1]

-    def number(self, (n,)):
+    def number(self, n):
+        (n,) = n
         return float(n)

     list = list
@@ -5,9 +5,9 @@ Lark implements the following parsing algorithms: Earley, LALR(1), and CYK

 An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar at O(n^3), and O(n^2) when the grammar is unambiguous. It can parse most LR grammars at O(n). Most programming languages are LR, and can be parsed at a linear time.

-Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitely using `lexer='dynamic'`.
+Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using `lexer='dynamic'`.

-It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independant first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`
+It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`

 **SPPF & Ambiguity resolution**

@@ -21,7 +21,7 @@ Lark provides the following options to combat ambiguity:

 1) Lark will choose the best derivation for you (default). Users can choose between different disambiguation strategies, and can prioritize (or demote) individual rules over others, using the rule-priority syntax.

-2) Users may choose to recieve the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.
+2) Users may choose to receive the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.

 3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves. Future versions of Lark intend to improve and simplify this interface.
@@ -0,0 +1,117 @@
+## Transformers & Visitors
+
+Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.
+
+They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each method accepts the children as an argument. That can be modified using the `v_args` decorator, which allows you to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.
+
+See: <a href="https://github.com/lark-parser/lark/blob/master/lark/visitors.py">visitors.py</a>
+
+### Visitors
+
+Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.
+
+They work bottom-up, starting with the leaves and ending at the root of the tree.
+
+**Example**
+```python
+class IncreaseAllNumbers(Visitor):
+    def number(self, tree):
+        assert tree.data == "number"
+        tree.children[0] += 1
+
+IncreaseAllNumbers().visit(parse_tree)
+```
+
+There are two classes that implement the visitor interface:
+
+* Visitor - Visit every node (without recursion)
+* Visitor_Recursive - Visit every node using recursion. Slightly faster.
+
+### Transformers
+
+Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.
+
+They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.
+
+Transformers can be used to implement map & reduce patterns.
+
+Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).
+
+Transformers can be chained into a new transformer by using multiplication.
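
Reviewer note: the chaining behavior can be sketched in plain Python. This is a standalone illustration of the semantics only; `Chain`, `Doubler` and `Incrementer` are assumed stand-ins, not Lark's classes (Lark builds its own chain object when you use `*`):

```python
class Chain:
    # Stand-in for the object produced by chaining transformers with `*`:
    # transforming with the chain applies each step's transform in order.
    def __init__(self, *steps):
        self.steps = steps

    def transform(self, tree):
        for step in self.steps:
            tree = step.transform(tree)
        return tree

class Doubler:
    def transform(self, values):
        return [v * 2 for v in values]

class Incrementer:
    def transform(self, values):
        return [v + 1 for v in values]

chained = Chain(Doubler(), Incrementer())
print(chained.transform([1, 2, 3]))  # doubled, then incremented -> [3, 5, 7]
```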
+
+`Transformer` can do anything `Visitor` can do, but because it reconstructs the tree, it is slightly less efficient.
+
+**Example:**
+```python
+from lark import Tree, Transformer
+
+class EvalExpressions(Transformer):
+    def expr(self, args):
+        return eval(args[0])
+
+t = Tree('a', [Tree('expr', ['1+2'])])
+print(EvalExpressions().transform( t ))
+
+# Prints: Tree(a, [3])
+```
+
+All these classes implement the transformer interface:
+
+- Transformer - Recursively transforms the tree. This is the one you probably want.
+- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
+- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances
+
+### visit_tokens
+
+By default, transformers only visit rules. `visit_tokens=True` will tell Transformer to visit tokens as well. This is a slightly slower alternative to `lexer_callbacks`, but it's easier to maintain and works for all algorithms (even when there isn't a lexer).
+
+Example:
+
+```python
+class T(Transformer):
+    INT = int
+    NUMBER = float
+    def NAME(self, name):
+        return lookup_dict.get(name, name)
+
+T(visit_tokens=True).transform(tree)
+```
+### v_args
+
+`v_args` is a decorator.
+
+By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.
+
+When used on a transformer/visitor class definition, it applies to all the callback methods inside it.
+
+`v_args` accepts one of three flags:
+
+- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
+- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
+- `tree` - Provides the entire tree as the argument, instead of the children.
+
+Examples:
+
+```python
+@v_args(inline=True)
+class SolveArith(Transformer):
+    def add(self, left, right):
+        return left + right
+
+
+class ReverseNotation(Transformer_InPlace):
+    @v_args(tree=True)
+    def tree_node(self, tree):
+        tree.children = tree.children[::-1]
+```
+
+### Discard
+
+When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.
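
Reviewer note: the Discard mechanism boils down to "raise a sentinel exception, and the parent drops that child". A standalone sketch of the idea (illustrative only; `strip_comments` and `transform_children` are assumed names, not Lark's implementation):

```python
class Discard(Exception):
    """Sentinel: a callback raises this to drop the node it was given."""
    pass

def strip_comments(node):
    # Example callback: discard string children that look like comments.
    if isinstance(node, str) and node.startswith('#'):
        raise Discard()
    return node

def transform_children(children, callback):
    kept = []
    for child in children:
        try:
            kept.append(callback(child))
        except Discard:
            pass  # the child is discarded and won't appear in the parent
    return kept

print(transform_children(['a', '# comment', 'b'], strip_comments))  # -> ['a', 'b']
```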
@@ -5,4 +5,4 @@ from .exceptions import ParseError, LexError, GrammarError, UnexpectedToken, Une
 from .lexer import Token
 from .lark import Lark

-__version__ = "0.7.4"
+__version__ = "0.8.0rc1"

@@ -13,6 +13,14 @@ class ParseError(LarkError):

 class LexError(LarkError):
     pass

+
+class UnexpectedEOF(ParseError):
+    def __init__(self, expected):
+        self.expected = expected
+
+        message = ("Unexpected end-of-input. Expected one of: \n\t* %s\n" % '\n\t* '.join(x.name for x in self.expected))
+        super(UnexpectedEOF, self).__init__(message)
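
Reviewer note: for reference, the message built by the new `UnexpectedEOF` formats each expected terminal on its own bulleted line. A standalone sketch of just the formatting (`Term` is a stand-in for any object with a `.name`, like Lark's terminals):

```python
class Term:
    # Minimal stand-in for a terminal with a .name attribute.
    def __init__(self, name):
        self.name = name

expected = [Term('RPAR'), Term('COMMA')]

# Mirrors the string formatting used in UnexpectedEOF.__init__ above.
message = ("Unexpected end-of-input. Expected one of: \n\t* %s\n"
           % '\n\t* '.join(t.name for t in expected))
print(message)
```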
 class UnexpectedInput(LarkError):
     pos_in_stream = None

@@ -69,6 +69,7 @@ class LarkOptions(Serialize):
         'propagate_positions': False,
         'lexer_callbacks': {},
         'maybe_placeholders': False,
+        'edit_terminals': None,
     }

     def __init__(self, options_dict):

@@ -85,7 +86,7 @@ class LarkOptions(Serialize):
                 options[name] = value

-        if isinstance(options['start'], str):
+        if isinstance(options['start'], STRING_TYPE):
             options['start'] = [options['start']]

         self.__dict__['options'] = options

@@ -205,6 +206,10 @@ class Lark(Serialize):
         # Compile the EBNF grammar into BNF
         self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)

+        if self.options.edit_terminals:
+            for t in self.terminals:
+                self.options.edit_terminals(t)
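
Reviewer note: the new `edit_terminals` option is a callback applied once to every compiled terminal, letting users rewrite terminal definitions before the lexer is built. A standalone sketch of the calling pattern (`TermSketch` and the edit shown are assumptions for illustration, not Lark's real `TerminalDef`):

```python
class TermSketch:
    # Minimal stand-in for a terminal definition with a mutable pattern.
    def __init__(self, name, pattern):
        self.name = name
        self.pattern = pattern

def edit_terminals(t):
    # Example edit: widen NAME so it also accepts uppercase letters.
    if t.name == 'NAME':
        t.pattern = r'[A-Za-z_]\w*'

terminals = [TermSketch('NAME', r'[a-z_]\w*'), TermSketch('INT', r'\d+')]
for t in terminals:  # mirrors the loop added to Lark.__init__ above
    edit_terminals(t)

print([t.pattern for t in terminals])
```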
         self._terminals_dict = {t.name:t for t in self.terminals}

         # If the user asked to invert the priorities, negate them all here.

@@ -3,7 +3,7 @@
 import re

 from .utils import Str, classify, get_regexp_width, Py36, Serialize
-from .exceptions import UnexpectedCharacters, LexError
+from .exceptions import UnexpectedCharacters, LexError, UnexpectedToken

 ###{standalone

@@ -43,7 +43,7 @@ class PatternStr(Pattern):
     __serialize_fields__ = 'value', 'flags'

     type = "str"

     def to_regexp(self):
         return self._get_flags(re.escape(self.value))
@@ -166,36 +166,33 @@ class _Lex:
         while line_ctr.char_pos < len(stream):
             lexer = self.lexer
-            for mre, type_from_index in lexer.mres:
-                m = mre.match(stream, line_ctr.char_pos)
-                if not m:
-                    continue
-
-                t = None
-                value = m.group(0)
-                type_ = type_from_index[m.lastindex]
-                if type_ not in ignore_types:
-                    t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
-                    if t.type in lexer.callback:
-                        t = lexer.callback[t.type](t)
-                        if not isinstance(t, Token):
-                            raise ValueError("Callbacks must return a token (returned %r)" % t)
-                    last_token = t
-                    yield t
-                else:
-                    if type_ in lexer.callback:
-                        t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
-                        lexer.callback[type_](t)
-
-                line_ctr.feed(value, type_ in newline_types)
-                if t:
-                    t.end_line = line_ctr.line
-                    t.end_column = line_ctr.column
-
-                break
+            res = lexer.match(stream, line_ctr.char_pos)
+            if not res:
+                allowed = {v for m, tfi in lexer.mres for v in tfi.values()} - ignore_types
+                if not allowed:
+                    allowed = {"<END-OF-FILE>"}
+                raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
+
+            value, type_ = res
+
+            t = None
+            if type_ not in ignore_types:
+                t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
+                if t.type in lexer.callback:
+                    t = lexer.callback[t.type](t)
+                    if not isinstance(t, Token):
+                        raise ValueError("Callbacks must return a token (returned %r)" % t)
+                last_token = t
+                yield t
             else:
-                allowed = {v for m, tfi in lexer.mres for v in tfi.values()}
-                raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
+                if type_ in lexer.callback:
+                    t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
+                    lexer.callback[type_](t)
+
+            line_ctr.feed(value, type_ in newline_types)
+            if t:
+                t.end_line = line_ctr.line
+                t.end_column = line_ctr.column
 class UnlessCallback:

@@ -330,6 +327,11 @@ class TraditionalLexer(Lexer):

         self.mres = build_mres(terminals)

+    def match(self, stream, pos):
+        for mre, type_from_index in self.mres:
+            m = mre.match(stream, pos)
+            if m:
+                return m.group(0), type_from_index[m.lastindex]
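
Reviewer note: `match` returns the first pattern that matches, and within a compiled alternation Python's `re` likewise returns the first alternative that matches, not the longest. This is exactly the behavior the grammar docs describe for `A: "a" | "ab"`; a minimal demonstration with plain `re` (the pattern below is a simplified sketch of what `build_mres` produces, not its actual output):

```python
import re

# One compiled alternation of named terminal groups (simplified sketch).
mre = re.compile('(?P<A>a|ab)|(?P<B>b)')

m = mre.match('ab')
# 'a' wins even though 'ab' could also match: re alternation is
# first-match, not longest-match.
print(m.lastgroup, m.group(0))  # A a
```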
     def lex(self, stream):
         return _Lex(self).lex(stream, self.newline_types, self.ignore_types)

@@ -367,9 +369,21 @@ class ContextualLexer(Lexer):

     def lex(self, stream):
         l = _Lex(self.lexers[self.parser_state], self.parser_state)
-        for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
-            yield x
-            l.lexer = self.lexers[self.parser_state]
-            l.state = self.parser_state
+        try:
+            for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
+                yield x
+                l.lexer = self.lexers[self.parser_state]
+                l.state = self.parser_state
+        except UnexpectedCharacters as e:
+            # In the contextual lexer, UnexpectedCharacters can mean that the terminal is defined,
+            # but not in the current context.
+            # This tests the input against the global context, to provide a nicer error.
+            root_match = self.root_lexer.match(stream, e.pos_in_stream)
+            if not root_match:
+                raise
+
+            value, type_ = root_match
+            t = Token(type_, value, e.pos_in_stream, e.line, e.column)
+            raise UnexpectedToken(t, e.allowed, state=e.state)

 ###}
| ###} | ###} | ||||
| @@ -479,7 +479,7 @@ class Grammar: | |||||
| # =================== | # =================== | ||||
| # Convert terminal-trees to strings/regexps | # Convert terminal-trees to strings/regexps | ||||
| transformer = PrepareLiterals() * TerminalTreeToPattern() | |||||
| for name, (term_tree, priority) in term_defs: | for name, (term_tree, priority) in term_defs: | ||||
| if term_tree is None: # Terminal added through %declare | if term_tree is None: # Terminal added through %declare | ||||
| continue | continue | ||||
| @@ -487,7 +487,8 @@ class Grammar: | |||||
| if len(expansions) == 1 and not expansions[0].children: | if len(expansions) == 1 and not expansions[0].children: | ||||
| raise GrammarError("Terminals cannot be empty (%s)" % name) | raise GrammarError("Terminals cannot be empty (%s)" % name) | ||||
| terminals = [TerminalDef(name, transformer.transform(term_tree), priority) | |||||
| transformer = PrepareLiterals() * TerminalTreeToPattern() | |||||
| terminals = [TerminalDef(name, transformer.transform( term_tree ), priority) | |||||
| for name, (term_tree, priority) in term_defs if term_tree] | for name, (term_tree, priority) in term_defs if term_tree] | ||||
| # ================= | # ================= | ||||
| @@ -638,11 +639,10 @@ def import_from_grammar_into_namespace(grammar, namespace, aliases): | |||||
| def resolve_term_references(term_defs): | def resolve_term_references(term_defs): | ||||
| # TODO Cycles detection | |||||
| # TODO Solve with transitive closure (maybe) | # TODO Solve with transitive closure (maybe) | ||||
| token_dict = {k:t for k, (t,_p) in term_defs} | |||||
| assert len(token_dict) == len(term_defs), "Same name defined twice?" | |||||
| term_dict = {k:t for k, (t,_p) in term_defs} | |||||
| assert len(term_dict) == len(term_defs), "Same name defined twice?" | |||||
| while True: | while True: | ||||
| changed = False | changed = False | ||||
| @@ -655,11 +655,21 @@ def resolve_term_references(term_defs): | |||||
| if item.type == 'RULE': | if item.type == 'RULE': | ||||
| raise GrammarError("Rules aren't allowed inside terminals (%s in %s)" % (item, name)) | raise GrammarError("Rules aren't allowed inside terminals (%s in %s)" % (item, name)) | ||||
| if item.type == 'TERMINAL': | if item.type == 'TERMINAL': | ||||
| exp.children[0] = token_dict[item] | |||||
| term_value = term_dict[item] | |||||
| assert term_value is not None | |||||
| exp.children[0] = term_value | |||||
| changed = True | changed = True | ||||
| if not changed: | if not changed: | ||||
| break | break | ||||
| for name, term in term_dict.items(): | |||||
| if term: # Not just declared | |||||
| for child in term.children: | |||||
| ids = [id(x) for x in child.iter_subtrees()] | |||||
| if id(term) in ids: | |||||
| raise GrammarError("Recursion in terminal '%s' (recursion is only allowed in rules, not terminals)" % name) | |||||
| def options_from_rule(name, *x): | def options_from_rule(name, *x): | ||||
| if len(x) > 1: | if len(x) > 1: | ||||
| priority, expansions = x | priority, expansions = x | ||||
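The new recursion check in `resolve_term_references` compares object identities: after reference substitution, a self-referencing terminal ends up containing its own tree object as a subtree. Below is a minimal standalone sketch of that idea; the `Tree` class here is a simplified stand-in for Lark's `lark.tree.Tree` (whose subtree iteration likewise deduplicates by `id()`, so cyclic trees terminate).

```python
# Sketch of the identity-based recursion check (stand-in Tree class,
# not Lark's real implementation).

class Tree:
    def __init__(self, data, children):
        self.data = data
        self.children = children

    def iter_subtrees(self):
        # Breadth-first walk, deduplicated by object id so that
        # cyclic trees do not loop forever.
        queue = [self]
        seen = set()
        while queue:
            t = queue.pop(0)
            if id(t) in seen:
                continue
            seen.add(id(t))
            yield t
            queue.extend(c for c in t.children if isinstance(c, Tree))

def has_terminal_recursion(term):
    """Return True if `term` contains itself as a subtree.

    Comparing ids rather than values is what catches the cycle: a
    recursive terminal literally holds its own tree object.
    """
    for child in term.children:
        if isinstance(child, Tree):
            if any(id(term) == id(x) for x in child.iter_subtrees()):
                return True
    return False

# A terminal whose resolution produced a cycle:
a = Tree('expansion', [])
a.children.append(Tree('expansion', [a]))
print(has_terminal_recursion(a))   # True

# A well-formed terminal:
b = Tree('expansion', [Tree('value', [])])
print(has_terminal_recursion(b))   # False
```

In the real code, a `True` result raises `GrammarError`, since recursion is only allowed in rules, not terminals.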
@@ -10,10 +10,11 @@ is better documented here:
 http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/
 """
 
+import logging
 from collections import deque
 
 from ..visitors import Transformer_InPlace, v_args
-from ..exceptions import ParseError, UnexpectedToken
+from ..exceptions import UnexpectedEOF, UnexpectedToken
 from .grammar_analysis import GrammarAnalyzer
 from ..grammar import NonTerminal
 from .earley_common import Item, TransitiveItem
@@ -45,12 +46,8 @@ class Parser:
         # skip the extra tree walk. We'll also skip this if the user just didn't specify priorities
         # on any rules.
         if self.forest_sum_visitor is None and rule.options and rule.options.priority is not None:
-            self.forest_sum_visitor = ForestSumVisitor()
-
-        if resolve_ambiguity:
-            self.forest_tree_visitor = ForestToTreeVisitor(self.callbacks, self.forest_sum_visitor)
-        else:
-            self.forest_tree_visitor = ForestToAmbiguousTreeVisitor(self.callbacks, self.forest_sum_visitor)
+            self.forest_sum_visitor = ForestSumVisitor
 
         self.term_matcher = term_matcher
@@ -273,6 +270,7 @@ class Parser:
         ## Column is now the final column in the parse.
         assert i == len(columns)-1
+        return to_scan
 
     def parse(self, stream, start):
         assert start, start
@@ -291,7 +289,7 @@ class Parser:
             else:
                 columns[0].add(item)
 
-        self._parse(stream, columns, to_scan, start_symbol)
+        to_scan = self._parse(stream, columns, to_scan, start_symbol)
 
         # If the parse was successful, the start
         # symbol should have been completed in the last step of the Earley cycle, and will be in
@@ -299,18 +297,25 @@ class Parser:
         solutions = [n.node for n in columns[-1] if n.is_complete and n.node is not None and n.s == start_symbol and n.start == 0]
         if self.debug:
             from .earley_forest import ForestToPyDotVisitor
-            debug_walker = ForestToPyDotVisitor()
-            debug_walker.visit(solutions[0], "sppf.png")
+            try:
+                debug_walker = ForestToPyDotVisitor()
+            except ImportError:
+                logging.warning("Cannot find dependency 'pydot', will not generate sppf debug image")
+            else:
+                debug_walker.visit(solutions[0], "sppf.png")
 
         if not solutions:
             expected_tokens = [t.expect for t in to_scan]
-            # raise ParseError('Incomplete parse: Could not find a solution to input')
-            raise ParseError('Unexpected end of input! Expecting a terminal of: %s' % expected_tokens)
+            raise UnexpectedEOF(expected_tokens)
         elif len(solutions) > 1:
             assert False, 'Earley should not generate multiple start symbol items!'
 
         # Perform our SPPF -> AST conversion using the right ForestVisitor.
-        return self.forest_tree_visitor.visit(solutions[0])
+        forest_tree_visitor_cls = ForestToTreeVisitor if self.resolve_ambiguity else ForestToAmbiguousTreeVisitor
+        forest_tree_visitor = forest_tree_visitor_cls(self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor())
+        return forest_tree_visitor.visit(solutions[0])
 
 
 class ApplyCallbacks(Transformer_InPlace):
@@ -146,4 +146,5 @@ class Parser(BaseParser):
         self.predict_and_complete(i, to_scan, columns, transitives)
 
         ## Column is now the final column in the parse.
-        assert i == len(columns)-1
+        assert i == len(columns)-1
+        return to_scan
@@ -3,6 +3,7 @@ from functools import wraps
 from .utils import smart_decorator
 from .tree import Tree
 from .exceptions import VisitError, GrammarError
+from .lexer import Token
 
 ###{standalone
 from inspect import getmembers, getmro
@@ -21,6 +22,10 @@ class Transformer:
     Can be used to implement map or reduce.
     """
 
+    __visit_tokens__ = False    # For backwards compatibility
+    def __init__(self, visit_tokens=False):
+        self.__visit_tokens__ = visit_tokens
+
     def _call_userfunc(self, tree, new_children=None):
         # Assumes tree is already transformed
         children = new_children if new_children is not None else tree.children
@@ -45,10 +50,29 @@ class Transformer:
         except Exception as e:
             raise VisitError(tree, e)
 
+    def _call_userfunc_token(self, token):
+        try:
+            f = getattr(self, token.type)
+        except AttributeError:
+            return self.__default_token__(token)
+        else:
+            try:
+                return f(token)
+            except (GrammarError, Discard):
+                raise
+            except Exception as e:
+                raise VisitError(token, e)
+
     def _transform_children(self, children):
         for c in children:
             try:
-                yield self._transform_tree(c) if isinstance(c, Tree) else c
+                if isinstance(c, Tree):
+                    yield self._transform_tree(c)
+                elif self.__visit_tokens__ and isinstance(c, Token):
+                    yield self._call_userfunc_token(c)
+                else:
+                    yield c
             except Discard:
                 pass
@@ -66,6 +90,11 @@ class Transformer:
         "Default operation on tree (for override)"
         return Tree(data, children, meta)
 
+    def __default_token__(self, token):
+        "Default operation on token (for override)"
+        return token
+
     @classmethod
     def _apply_decorator(cls, decorator, **kwargs):
         mro = getmro(cls)
@@ -157,6 +186,11 @@ class Visitor(VisitorBase):
             self._call_userfunc(subtree)
         return tree
 
+    def visit_topdown(self,tree):
+        for subtree in tree.iter_subtrees_topdown():
+            self._call_userfunc(subtree)
+        return tree
+
 
 class Visitor_Recursive(VisitorBase):
     """Bottom-up visitor, recursive
@@ -169,8 +203,16 @@ class Visitor_Recursive(VisitorBase):
             if isinstance(child, Tree):
                 self.visit(child)
 
-        f = getattr(self, tree.data, self.__default__)
-        f(tree)
+        self._call_userfunc(tree)
+        return tree
+
+    def visit_topdown(self,tree):
+        self._call_userfunc(tree)
+
+        for child in tree.children:
+            if isinstance(child, Tree):
+                self.visit_topdown(child)
+
         return tree
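The `visit_tokens` change above routes any `Token` child to a method named after the token's type, falling back to `__default_token__`. A minimal standalone sketch of that dispatch is below; the `Token` and `TokenDispatcher` classes are simplified stand-ins, not Lark's real `lark.lexer.Token` or `Transformer`.

```python
# Sketch of the token-dispatch logic behind visit_tokens
# (stand-in classes, not Lark's real API).

class Token(str):
    """A string that also carries its terminal type, like Lark's Token."""
    def __new__(cls, type_, value):
        inst = super().__new__(cls, value)
        inst.type = type_
        return inst

class TokenDispatcher:
    def __init__(self, visit_tokens=False):
        self.visit_tokens = visit_tokens

    def __default_token__(self, token):
        # Tokens with no matching method pass through unchanged.
        return token

    def transform_child(self, c):
        # Route Token children to the method named after their type.
        if self.visit_tokens and isinstance(c, Token):
            f = getattr(self, c.type, self.__default_token__)
            return f(c)
        return c

class Upper(TokenDispatcher):
    def A(self, tok):       # called for every token of type 'A'
        return tok.upper()

t = Token('A', 'x')
print(Upper().transform_child(t))       # visit_tokens off -> 'x'
print(Upper(True).transform_child(t))   # visit_tokens on  -> 'X'
```

This mirrors the test added to `test_parser.py` further down, where `T(True).transform(...)` upper-cases the `"x"` terminal while `T()` leaves it alone.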
@@ -9,5 +9,6 @@ pages:
 - How To Develop (Guide): how_to_develop.md
 - Grammar Reference: grammar.md
 - Tree Construction Reference: tree_construction.md
+- Visitors and Transformers: visitors.md
 - Classes Reference: classes.md
 - Recipes: recipes.md
@@ -10,7 +10,7 @@ from .test_reconstructor import TestReconstructor
 try:
     from .test_nearley.test_nearley import TestNearley
 except ImportError:
-    pass
+    logging.warn("Warning: Skipping tests for Nearley (js2py required)")
 
 # from .test_selectors import TestSelectors
 # from .test_grammars import TestPythonG, TestConfigG
@@ -15,9 +15,12 @@ NEARLEY_PATH = os.path.join(TEST_PATH, 'nearley')
 BUILTIN_PATH = os.path.join(NEARLEY_PATH, 'builtin')
 
 if not os.path.exists(NEARLEY_PATH):
-    print("Skipping Nearley tests!")
+    logging.warn("Nearley not installed. Skipping Nearley tests!")
     raise ImportError("Skipping Nearley tests!")
 
+import js2py    # Ensures that js2py exists, to avoid failing tests
+
+
 class TestNearley(unittest.TestCase):
     def test_css(self):
         fn = os.path.join(NEARLEY_PATH, 'examples/csscolor.ne')
@@ -94,6 +94,24 @@ class TestParsers(unittest.TestCase):
         r = g.parse('xx')
         self.assertEqual( r.children[0].data, "c" )
 
+    def test_visit_tokens(self):
+        class T(Transformer):
+            def a(self, children):
+                return children[0] + "!"
+            def A(self, tok):
+                return tok.upper()
+
+        # Test regular
+        g = Lark("""start: a
+                    a : A
+                    A: "x"
+                 """, parser='lalr')
+        r = T().transform(g.parse("x"))
+        self.assertEqual( r.children, ["x!"] )
+        r = T(True).transform(g.parse("x"))
+        self.assertEqual( r.children, ["X!"] )
+
     def test_embedded_transformer(self):
         class T(Transformer):
             def a(self, children):
@@ -7,7 +7,7 @@ import pickle
 import functools
 
 from lark.tree import Tree
-from lark.visitors import Transformer, Interpreter, visit_children_decor, v_args, Discard
+from lark.visitors import Visitor, Visitor_Recursive, Transformer, Interpreter, visit_children_decor, v_args, Discard
 
 class TestTrees(TestCase):
@@ -34,6 +34,43 @@ class TestTrees(TestCase):
         nodes = list(self.tree1.iter_subtrees_topdown())
         self.assertEqual(nodes, expected)
 
+    def test_visitor(self):
+        class Visitor1(Visitor):
+            def __init__(self):
+                self.nodes=[]
+
+            def __default__(self,tree):
+                self.nodes.append(tree)
+
+        class Visitor1_Recursive(Visitor_Recursive):
+            def __init__(self):
+                self.nodes=[]
+
+            def __default__(self,tree):
+                self.nodes.append(tree)
+
+        visitor1=Visitor1()
+        visitor1_recursive=Visitor1_Recursive()
+
+        expected_top_down = [Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]),
+                             Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]
+        expected_botton_up= [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z'),
+                             Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')])]
+
+        visitor1.visit(self.tree1)
+        self.assertEqual(visitor1.nodes,expected_botton_up)
+
+        visitor1_recursive.visit(self.tree1)
+        self.assertEqual(visitor1_recursive.nodes,expected_botton_up)
+
+        visitor1.nodes=[]
+        visitor1_recursive.nodes=[]
+
+        visitor1.visit_topdown(self.tree1)
+        self.assertEqual(visitor1.nodes,expected_top_down)
+
+        visitor1_recursive.visit_topdown(self.tree1)
+        self.assertEqual(visitor1_recursive.nodes,expected_top_down)
+
     def test_interp(self):
         t = Tree('a', [Tree('b', []), Tree('c', []), 'd'])
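The difference the new `visit_topdown` introduces is purely one of traversal order: bottom-up `visit` calls the user function on children before the parent, while `visit_topdown` calls it on the parent first. A minimal standalone sketch (stand-in classes, not `lark.visitors`) makes the ordering concrete:

```python
# Sketch contrasting bottom-up visit with the new visit_topdown
# (simplified stand-ins for lark.tree.Tree and Visitor_Recursive).

class Tree:
    def __init__(self, data, children):
        self.data = data
        self.children = children

class Visitor_Recursive:
    def visit(self, tree):
        # Bottom-up: children first, then the node itself.
        for child in tree.children:
            if isinstance(child, Tree):
                self.visit(child)
        self._call_userfunc(tree)
        return tree

    def visit_topdown(self, tree):
        # Top-down: the node first, then its children.
        self._call_userfunc(tree)
        for child in tree.children:
            if isinstance(child, Tree):
                self.visit_topdown(child)
        return tree

    def _call_userfunc(self, tree):
        getattr(self, tree.data, lambda t: None)(tree)

class Collect(Visitor_Recursive):
    def __init__(self):
        self.order = []
    def _call_userfunc(self, tree):
        self.order.append(tree.data)

t = Tree('a', [Tree('b', []), Tree('c', [])])

v = Collect(); v.visit(t)
print(v.order)           # bottom-up: ['b', 'c', 'a']

v = Collect(); v.visit_topdown(t)
print(v.order)           # top-down:  ['a', 'b', 'c']
```

This matches the `expected_botton_up` / `expected_top_down` orderings asserted in `test_visitor` above.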