| @@ -7,10 +7,29 @@ Lark | |||
| .. autoclass:: lark.Lark | |||
| :members: open, parse, save, load | |||
| LarkOptions | |||
| ----------- | |||
| **Using Unicode character classes with regex** | |||
| .. autoclass:: lark.lark.LarkOptions | |||
| Python's builtin `re` module has a few persistent known bugs and also won't parse | |||
| advanced regex features such as character classes. | |||
| With `pip install lark-parser[regex]`, the `regex` module will be installed alongside `lark` and can act as a drop-in replacement to `re`. | |||
| Any instance of `Lark` instantiated with `regex=True` will now use the `regex` module instead of `re`. | |||
| For example, we can now use character classes to match PEP-3131 compliant Python identifiers. | |||
| Example: | |||
| :: | |||
| from lark import Lark | |||
| >>> g = Lark(r""" | |||
| ?start: NAME | |||
| NAME: ID_START ID_CONTINUE* | |||
| ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/ | |||
| ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/ | |||
| """, regex=True) | |||
| >>> g.parse('வணக்கம்') | |||
| 'வணக்கம்' | |||
| Tree | |||
| ---- | |||
| @@ -24,7 +43,7 @@ Token | |||
| .. autoclass:: lark.Token | |||
| Transformer, Vistor & Interpretor | |||
| Transformer, Visitor & Interpreter | |||
| --------------------------------- | |||
| See :doc:`visitors`. | |||
| @@ -33,10 +33,10 @@ Welcome to Lark's documentation! | |||
| grammar | |||
| tree_construction | |||
| visitors | |||
| classes | |||
| visitors | |||
| nearley | |||
| Lark is a modern parsing library for Python. Lark can parse any context-free grammar. | |||
| @@ -17,12 +17,33 @@ See: `visitors.py`_ | |||
| Visitor | |||
| ------- | |||
| .. autoclass:: lark.visitors.VisitorBase | |||
| Visitors visit each node of the tree, and run the appropriate method on it according to the node's data. | |||
| They work bottom-up, starting with the leaves and ending at the root of the tree. | |||
| There are two classes that implement the visitor interface: | |||
| - ``Visitor``: Visit every node (without recursion) | |||
| - ``Visitor_Recursive``: Visit every node using recursion. Slightly faster. | |||
| Example: | |||
| :: | |||
| class IncreaseAllNumbers(Visitor): | |||
| def number(self, tree): | |||
| assert tree.data == "number" | |||
| tree.children[0] += 1 | |||
| IncreaseAllNumbers().visit(parse_tree) | |||
| .. autoclass:: lark.visitors.Visitor | |||
| .. autoclass:: lark.visitors.Visitor_Recursive | |||
| Interpreter | |||
| ----------- | |||
| .. autoclass:: lark.visitors.Interpreter | |||
| Transformer | |||
| ----------- | |||
| @@ -30,11 +51,6 @@ Transformer | |||
| .. autoclass:: lark.visitors.Transformer | |||
| :members: __default__, __default_token__ | |||
| Interpreter | |||
| ----------- | |||
| .. autoclass:: lark.visitors.Interpreter | |||
| v_args | |||
| ------ | |||
| @@ -43,4 +59,4 @@ v_args | |||
| Discard | |||
| ------- | |||
| .. autoclass:: lark.visitors.Discard | |||
| .. autoclass:: lark.visitors.Discard | |||
| @@ -26,11 +26,12 @@ class UnexpectedEOF(ParseError): | |||
| class UnexpectedInput(LarkError): | |||
| """UnexpectedInput Error. | |||
| Used as a base class for the following exceptions: | |||
| - ``UnexpectedToken``: The parser recieved an unexpected token | |||
| - ``UnexpectedCharacters``: The lexer encountered an unexpected string | |||
| After catching one of these exceptions, you may call the following | |||
| helper methods to create a nicer error message. | |||
| After catching one of these exceptions, you may call the following helper methods to create a nicer error message. | |||
| """ | |||
| pos_in_stream = None | |||
| @@ -57,7 +58,7 @@ class UnexpectedInput(LarkError): | |||
| def match_examples(self, parse_fn, examples, token_type_match_fallback=False, use_accepts=False): | |||
| """Allows you to detect what's wrong in the input text by matching | |||
| against example errors. | |||
| Given a parser instance and a dictionary mapping some label with | |||
| some malformed syntax examples, it'll return the label for the | |||
| example that bests matches the current error. The function will | |||
| @@ -66,7 +67,7 @@ class UnexpectedInput(LarkError): | |||
| For an example usage, see examples/error_reporting_lalr.py | |||
| Args: | |||
| Parameters: | |||
| parse_fn: parse function (usually ``lark_instance.parse``) | |||
| examples: dictionary of ``{'example_string': value}``. | |||
| use_accepts: Recommended to call this with ``use_accepts=True``. | |||
| @@ -27,75 +27,67 @@ class LarkOptions(Serialize): | |||
| """ | |||
| OPTIONS_DOC = """ | |||
| **General** | |||
| **=== General ===** | |||
| start | |||
| The start symbol. Either a string, or a list of strings for | |||
| multiple possible starts (Default: "start") | |||
| The start symbol. Either a string, or a list of strings for multiple possible starts (Default: "start") | |||
| debug | |||
| Display debug information, such as warnings (default: False) | |||
| Display debug information, such as warnings (default: False) | |||
| transformer | |||
| Applies the transformer to every parse tree (equivlent | |||
| to applying it after the parse, but faster) | |||
| Applies the transformer to every parse tree (equivlent to applying it after the parse, but faster) | |||
| propagate_positions | |||
| Propagates (line, column, end_line, end_column) attributes into all tree branches. | |||
| Propagates (line, column, end_line, end_column) attributes into all tree branches. | |||
| maybe_placeholders | |||
| When True, the ``[]`` operator returns ``None`` | |||
| when not matched. When ``False``, ``[]`` behaves like the ``?`` | |||
| operator, and returns no value at all. (default= ``False``. Recommended | |||
| to set to ``True``) | |||
| When True, the ``[]`` operator returns ``None`` when not matched. | |||
| When ``False``, ``[]`` behaves like the ``?`` operator, and returns no value at all. | |||
| (default= ``False``. Recommended to set to ``True``) | |||
| regex | |||
| When True, uses the ``regex`` module instead of the | |||
| stdlib ``re``. | |||
| When True, uses the ``regex`` module instead of the stdlib ``re``. | |||
| cache | |||
| Cache the results of the Lark grammar analysis, for x2 to | |||
| x3 faster loading. LALR only for now. | |||
| Cache the results of the Lark grammar analysis, for x2 to x3 faster loading. LALR only for now. | |||
| - When ``False``, does nothing (default) | |||
| - When ``True``, caches to a temporary file in the local directory | |||
| - When given a string, caches to the path pointed by the string | |||
| - When ``False``, does nothing (default) | |||
| - When ``True``, caches to a temporary file in the local directory | |||
| - When given a string, caches to the path pointed by the string | |||
| g_regex_flags | |||
| Flags that are applied to all terminals (both regex and strings) | |||
| Flags that are applied to all terminals (both regex and strings) | |||
| keep_all_tokens | |||
| Prevent the tree builder from automagically removing "punctuation" tokens (default: False) | |||
| Prevent the tree builder from automagically removing "punctuation" tokens (default: False) | |||
| **Algorithm** | |||
| **=== Algorithm ===** | |||
| parser | |||
| Decides which parser engine to use. Accepts "earley" or "lalr". | |||
| (Default: "earley"). (there is also a "cyk" option for legacy) | |||
| Decides which parser engine to use. Accepts "earley" or "lalr". (Default: "earley"). | |||
| (there is also a "cyk" option for legacy) | |||
| lexer | |||
| Decides whether or not to use a lexer stage | |||
| - "auto" (default): Choose for me based on the parser | |||
| - "standard": Use a standard lexer | |||
| - "contextual": Stronger lexer (only works with parser="lalr") | |||
| - "dynamic": Flexible and powerful (only with parser="earley") | |||
| - "dynamic_complete": Same as dynamic, but tries *every* variation | |||
| of tokenizing possible. | |||
| Decides whether or not to use a lexer stage | |||
| - "auto" (default): Choose for me based on the parser | |||
| - "standard": Use a standard lexer | |||
| - "contextual": Stronger lexer (only works with parser="lalr") | |||
| - "dynamic": Flexible and powerful (only with parser="earley") | |||
| - "dynamic_complete": Same as dynamic, but tries *every* variation of tokenizing possible. | |||
| ambiguity | |||
| Decides how to handle ambiguity in the parse. Only relevant if parser="earley" | |||
| - "resolve" - The parser will automatically choose the simplest | |||
| derivation (it chooses consistently: greedy for tokens, | |||
| non-greedy for rules) | |||
| - "explicit": The parser will return all derivations wrapped in | |||
| "_ambig" tree nodes (i.e. a forest). | |||
| Decides how to handle ambiguity in the parse. Only relevant if parser="earley" | |||
| **Domain Specific** | |||
| - "resolve" - The parser will automatically choose the simplest derivation | |||
| (it chooses consistently: greedy for tokens, non-greedy for rules) | |||
| - "explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest). | |||
| **=== Misc. / Domain Specific ===** | |||
| postlex | |||
| Lexer post-processing (Default: None) Only works with the | |||
| standard and contextual lexers. | |||
| Lexer post-processing (Default: None) Only works with the standard and contextual lexers. | |||
| priority | |||
| How priorities should be evaluated - auto, none, normal, invert (Default: auto) | |||
| How priorities should be evaluated - auto, none, normal, invert (Default: auto) | |||
| lexer_callbacks | |||
| Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution. | |||
| Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution. | |||
| use_bytes | |||
| Accept an input of type ``bytes`` instead of ``str`` (Python 3 only). | |||
| Accept an input of type ``bytes`` instead of ``str`` (Python 3 only). | |||
| edit_terminals | |||
| A callback | |||
| A callback for editing the terminals before parse. | |||
| """ | |||
| if __doc__: | |||
| __doc__ += OPTIONS_DOC | |||
| @@ -170,13 +162,11 @@ class LarkOptions(Serialize): | |||
| class Lark(Serialize): | |||
| """Main interface for the library. | |||
| It's mostly a thin wrapper for the many different parsers, and for | |||
| the tree constructor. | |||
| It's mostly a thin wrapper for the many different parsers, and for the tree constructor. | |||
| Args: | |||
| grammar: a string or file-object containing the | |||
| grammar spec (using Lark's ebnf syntax) | |||
| options : a dictionary controlling various aspects of Lark. | |||
| Parameters: | |||
| grammar: a string or file-object containing the grammar spec (using Lark's ebnf syntax) | |||
| options: a dictionary controlling various aspects of Lark. | |||
| Example: | |||
| >>> Lark(r'''start: "foo" ''') | |||
| @@ -317,8 +307,7 @@ class Lark(Serialize): | |||
| self.save(f) | |||
| # TODO: merge with above | |||
| if __init__.__doc__: | |||
| __init__.__doc__ += "\nOptions:\n" + LarkOptions.OPTIONS_DOC | |||
| __doc__ += "\nOptions:\n" + LarkOptions.OPTIONS_DOC | |||
| __serialize_fields__ = 'parser', 'rules', 'options' | |||
| @@ -391,8 +380,7 @@ class Lark(Serialize): | |||
| def open(cls, grammar_filename, rel_to=None, **options): | |||
| """Create an instance of Lark with the grammar given by its filename | |||
| If ``rel_to`` is provided, the function will find the grammar | |||
| filename in relation to it. | |||
| If ``rel_to`` is provided, the function will find the grammar filename in relation to it. | |||
| Example: | |||
| @@ -426,17 +414,15 @@ class Lark(Serialize): | |||
| def parse(self, text, start=None, on_error=None): | |||
| """Parse the given text, according to the options provided. | |||
| If a transformer is supplied to ``__init__``, returns whatever is the | |||
| result of the transformation. | |||
| Args: | |||
| Parameters: | |||
| text (str): Text to be parsed. | |||
| start (str, optional): Required if Lark was given multiple | |||
| possible start symbols (using the start option). | |||
| on_error (function, optional): if provided, will be called on | |||
| UnexpectedToken error. Return true to resume parsing. | |||
| LALR only. See examples/error_puppet.py for an example | |||
| of how to use on_error. | |||
| start (str, optional): Required if Lark was given multiple possible start symbols (using the start option). | |||
| on_error (function, optional): if provided, will be called on UnexpectedToken error. Return true to resume parsing. | |||
| LALR only. See examples/error_puppet.py for an example of how to use on_error. | |||
| Returns: | |||
| If a transformer is supplied to ``__init__``, returns whatever is the | |||
| result of the transformation. Otherwise, returns a Tree instance. | |||
| """ | |||
| @@ -7,11 +7,9 @@ from .. import Token | |||
| class ParserPuppet(object): | |||
| """ParserPuppet gives you advanced control over error handling when | |||
| parsing with LALR. | |||
| """ParserPuppet gives you advanced control over error handling when parsing with LALR. | |||
| For a simpler, more streamlined interface, see the ``on_error`` | |||
| argument to ``Lark.parse()``. | |||
| For a simpler, more streamlined interface, see the ``on_error`` argument to ``Lark.parse()``. | |||
| """ | |||
| def __init__(self, parser, state_stack, value_stack, start, stream, set_state): | |||
| self.parser = parser | |||
| @@ -24,8 +22,7 @@ class ParserPuppet(object): | |||
| self.result = None | |||
| def feed_token(self, token): | |||
| """Feed the parser with a token, and advance it to the next state, | |||
| as if it recieved it from the lexer. | |||
| """Feed the parser with a token, and advance it to the next state, as if it recieved it from the lexer. | |||
| Note that ``token`` has to be an instance of ``Token``. | |||
| """ | |||
| @@ -89,9 +86,9 @@ class ParserPuppet(object): | |||
| return '\n'.join(out) | |||
| def choices(self): | |||
| """Returns a dictionary of token types, matched to their action in | |||
| the parser. Only returns token types that are accepted by the | |||
| current state. | |||
| """Returns a dictionary of token types, matched to their action in the parser. | |||
| Only returns token types that are accepted by the current state. | |||
| Updated by ``feed_token()``. | |||
| """ | |||
| @@ -18,15 +18,14 @@ class Meta: | |||
| class Tree(object): | |||
| """The main tree class. | |||
| Creates a new tree, and stores "data" and "children" in attributes of | |||
| the same name. Trees can be hashed and compared. | |||
| Creates a new tree, and stores "data" and "children" in attributes of the same name. | |||
| Trees can be hashed and compared. | |||
| Args: | |||
| Parameters: | |||
| data: The name of the rule or alias | |||
| children: List of matched sub-rules and terminals | |||
| meta: Line & Column numbers (if ``propagate_positions`` is enabled). | |||
| meta attributes: line, column, start_pos, end_line, | |||
| end_column, end_pos | |||
| meta attributes: line, column, start_pos, end_line, end_column, end_pos | |||
| """ | |||
| def __init__(self, data, children, meta=None): | |||
| self.data = data | |||
| @@ -79,9 +78,8 @@ class Tree(object): | |||
| def iter_subtrees(self): | |||
| """Depth-first iteration. | |||
| Iterates over all the subtrees, never returning to the | |||
| same node twice (Lark's parse-tree is actually a DAG). | |||
| Iterates over all the subtrees, never returning to the same node twice (Lark's parse-tree is actually a DAG). | |||
| """ | |||
| queue = [self] | |||
| subtrees = OrderedDict() | |||
| @@ -121,8 +119,7 @@ class Tree(object): | |||
| def iter_subtrees_topdown(self): | |||
| """Breadth-first iteration. | |||
| Iterates over all the subtrees, return nodes in order like | |||
| pretty() does. | |||
| Iterates over all the subtrees, return nodes in order like pretty() does. | |||
| """ | |||
| stack = [self] | |||
| while stack: | |||
| @@ -45,28 +45,23 @@ class _Decoratable: | |||
| class Transformer(_Decoratable): | |||
| """Transformer visit each node of the tree, and run the appropriate method | |||
| on it according to the node's data. | |||
| """Transformers visit each node of the tree, and run the appropriate method on it according to the node's data. | |||
| Calls its methods (provided by user via inheritance) according to | |||
| ``tree.data``. The returned value replaces the old one in the structure. | |||
| Calls its methods (provided by user via inheritance) according to ``tree.data``. | |||
| The returned value replaces the old one in the structure. | |||
| They work bottom-up (or depth-first), starting with the leaves and | |||
| ending at the root of the tree. Transformers can be used to | |||
| implement map & reduce patterns. Because nodes are reduced from leaf to | |||
| root, at any point the callbacks may assume the children have already been | |||
| transformed (if applicable). ``Transformer`` can do anything ``Visitor`` | |||
| can do, but because it reconstructs the tree, it is slightly less | |||
| efficient. | |||
| They work bottom-up (or depth-first), starting with the leaves and ending at the root of the tree. | |||
| Transformers can be used to implement map & reduce patterns. Because nodes are reduced from leaf to root, | |||
| at any point the callbacks may assume the children have already been transformed (if applicable). | |||
| ``Transformer`` can do anything ``Visitor`` can do, but because it reconstructs the tree, | |||
| it is slightly less efficient. It can be used to implement map or reduce patterns. | |||
| All these classes implement the transformer interface: | |||
| - ``Transformer`` - Recursively transforms the tree. This is the one you | |||
| probably want. | |||
| - ``Transformer_InPlace`` - Non-recursive. Changes the tree in-place | |||
| instead of returning new instances | |||
| - ``Transformer_InPlaceRecursive`` - Recursive. Changes the tree in-place | |||
| instead of returning new instances | |||
| - ``Transformer`` - Recursively transforms the tree. This is the one you probably want. | |||
| - ``Transformer_InPlace`` - Non-recursive. Changes the tree in-place instead of returning new instances | |||
| - ``Transformer_InPlaceRecursive`` - Recursive. Changes the tree in-place instead of returning new instances | |||
| Example: | |||
| :: | |||
| @@ -82,7 +77,7 @@ class Transformer(_Decoratable): | |||
| # Prints: Tree(a, [3]) | |||
| Args: | |||
| Parameters: | |||
| visit_tokens: By default, transformers only visit rules. | |||
| visit_tokens=True will tell ``Transformer`` to visit tokens | |||
| as well. This is a slightly slower alternative to lexer_callbacks | |||
| @@ -164,16 +159,16 @@ class Transformer(_Decoratable): | |||
| def __default__(self, data, children, meta): | |||
| """Default operation on tree (for override) | |||
| Function that is called on if a function with a corresponding name has | |||
| not been found. Defaults to reconstruct the Tree | |||
| Function that is called on if a function with a corresponding name has not been found. | |||
| Defaults to reconstruct the Tree. | |||
| """ | |||
| return Tree(data, children, meta) | |||
| def __default_token__(self, token): | |||
| """Default operation on token (for override) | |||
| Function that is called on if a function with a corresponding name has | |||
| not been found. Defaults to just return the argument. | |||
| Function that is called on if a function with a corresponding name has not been found. | |||
| Defaults to just return the argument. | |||
| """ | |||
| return token | |||
| @@ -259,25 +254,6 @@ class Transformer_InPlaceRecursive(Transformer): | |||
| # Visitors | |||
| class VisitorBase: | |||
| """Visitors visit each node of the tree | |||
| Run the appropriate method on it according to the node's data. | |||
| They work bottom-up, starting with the leaves and ending at the root | |||
| of the tree. There are two classes that implement the visitor interface: | |||
| - ``Visitor``: Visit every node (without recursion) | |||
| - ``Visitor_Recursive``: Visit every node using recursion. Slightly faster. | |||
| Example: | |||
| :: | |||
| class IncreaseAllNumbers(Visitor): | |||
| def number(self, tree): | |||
| assert tree.data == "number" | |||
| tree.children[0] += 1 | |||
| IncreaseAllNumbers().visit(parse_tree) | |||
| """ | |||
| def _call_userfunc(self, tree): | |||
| return getattr(self, tree.data, self.__default__)(tree) | |||
| @@ -293,8 +269,7 @@ class Visitor(VisitorBase): | |||
| """Bottom-up visitor, non-recursive. | |||
| Visits the tree, starting with the leaves and finally the root (bottom-up) | |||
| Calls its methods (provided by user via inheritance) according to | |||
| ``tree.data`` | |||
| Calls its methods (provided by user via inheritance) according to ``tree.data`` | |||
| """ | |||
| def visit(self, tree): | |||
| @@ -312,8 +287,7 @@ class Visitor_Recursive(VisitorBase): | |||
| """Bottom-up visitor, recursive. | |||
| Visits the tree, starting with the leaves and finally the root (bottom-up) | |||
| Calls its methods (provided by user via inheritance) according to | |||
| ``tree.data`` | |||
| Calls its methods (provided by user via inheritance) according to ``tree.data`` | |||
| """ | |||
| def visit(self, tree): | |||
| @@ -348,13 +322,12 @@ class Interpreter(_Decoratable): | |||
| """Interpreter walks the tree starting at the root. | |||
| Visits the tree, starting with the root and finally the leaves (top-down) | |||
| Calls its methods (provided by user via inheritance) according to | |||
| ``tree.data`` | |||
| Unlike ``Transformer`` and ``Visitor``, the Interpreter doesn't | |||
| automatically visit its sub-branches. The user has to explicitly call ``visit``, | |||
| ``visit_children``, or use the ``@visit_children_decor``. This allows the | |||
| user to implement branching and loops. | |||
| For each tree node, it calls its methods (provided by user via inheritance) according to ``tree.data``. | |||
| Unlike ``Transformer`` and ``Visitor``, the Interpreter doesn't automatically visit its sub-branches. | |||
| The user has to explicitly call ``visit``, ``visit_children``, or use the ``@visit_children_decor``. | |||
| This allows the user to implement branching and loops. | |||
| Example: | |||
| :: | |||
| @@ -452,21 +425,17 @@ def _vargs_tree(f, data, children, meta): | |||
| def v_args(inline=False, meta=False, tree=False, wrapper=None): | |||
| """A convenience decorator factory for modifying the behavior of | |||
| user-supplied visitor methods. | |||
| By default, callback methods of transformers/visitors accept one argument - | |||
| a list of the node's children. ``v_args`` can modify this behavior. When | |||
| used on a transformer/visitor class definition, it applies to all the | |||
| callback methods inside it. Accepts one of three following flags. | |||
| Args: | |||
| inline: Children are provided as ``*args`` instead of a list | |||
| argument (not recommended for very long lists). | |||
| meta: Provides two arguments: ``children`` and ``meta`` (instead of | |||
| just the first) | |||
| tree: Provides the entire tree as the argument, instead of the | |||
| children. | |||
| """A convenience decorator factory for modifying the behavior of user-supplied visitor methods. | |||
| By default, callback methods of transformers/visitors accept one argument - a list of the node's children. | |||
| ``v_args`` can modify this behavior. When used on a transformer/visitor class definition, | |||
| it applies to all the callback methods inside it. | |||
| Parameters: | |||
| inline: Children are provided as ``*args`` instead of a list argument (not recommended for very long lists). | |||
| meta: Provides two arguments: ``children`` and ``meta`` (instead of just the first) | |||
| tree: Provides the entire tree as the argument, instead of the children. | |||
| Example: | |||
| :: | |||