Multiline PyParsing example


The fix was to remove newlines from pyparsing's default whitespace characters, so that line ends are no longer skipped silently between tokens. As Paul suggested in his comment, Float and Name can also be defined more strictly.

string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

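from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, Group, ParserElement

# Newlines are significant here, so keep only spaces and tabs as skippable whitespace.
# This must run before the grammar elements below are created.
ParserElement.setDefaultWhitespaceChars(" \t")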

Float = Word(nums + '.' + '-')
Name = Word(alphanums)
NL = Suppress(LineEnd())
# A line is either data with an optional "% name" part, or a bare newline.
Line = (OneOrMore(Float)('data') + Suppress(Literal(';'))
        + Optional(~NL + Suppress(Literal('%')) + OneOrMore(Name)('name') + NL)) | NL
Lines = OneOrMore(Group(Line))

grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
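
Following up on Paul's remark about stricter definitions, here is a minimal sketch of one way to tighten Float and Name. The regular expression, the `Word(alphas, alphanums)` rule, the extra inner `Group` around each named part, and the names `text` and `rows` are assumptions made for illustration, not something taken from the original answer.

```python
from pyparsing import (Group, Keyword, LineEnd, OneOrMore, Optional,
                       ParserElement, Regex, Suppress, Word, alphanums, alphas)

# Must run before any grammar element is created.
ParserElement.setDefaultWhitespaceChars(" \t")

text = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

# Optional sign, digits, optional fractional part; converted to a Python float.
# Strings such as "...", "---" or "1.1.1" no longer parse as a single Float.
Float = Regex(r"[+-]?\d+(?:\.\d+)?").setParseAction(lambda t: float(t[0]))
# A Name must start with a letter, so "12345" and "221B" are rejected.
Name = Word(alphas, alphanums)
NL = Suppress(LineEnd())

# Assumption: each named part is wrapped in its own Group so the name
# refers to the whole list of matches.
Line = Group(Group(OneOrMore(Float))("data") + Suppress(";")
             + Optional(Suppress("%") + Group(OneOrMore(Name))("name")))

grammar = (Suppress(Keyword("START")) + NL
           + OneOrMore(Line + NL)
           + Suppress(Keyword("END")))

result = grammar.parseString(text)
rows = [(list(row["data"]), list(row.get("name", []))) for row in result]
print(rows)  # [([1.0, 10.0], ['Name1']), ([2.0, 20.0], ['Name2'])]
```

Unlike the grammar above, this version requires a newline after every data line and does not accept blank lines, which keeps empty groups out of the result.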

Author: kdheepak

Updated on December 01, 2022

Comments

  • kdheepak
    kdheepak over 1 year

    I'm trying to parse something really simple in PyParsing that is multiline, but I'm struggling to understand why it doesn't work. The string I want to parse is as follows.

    string = '''START
        1   10; %   Name1
        2   20; %   Name2
    END'''
    

    I know that every line between the START and END tokens will contain one or more positive or negative numbers, each of which can be an int or a float. I also expect that a user may optionally add metadata after a % sign.

    So I start by defining the basic grammar for Floats and Names.

    Float = Word(nums + '.' + '-')
    Name = Word(alphanums)
    

    I know that a line can contain one or more Floats followed by a semicolon, optionally followed by a % sign and one or more Names.

    Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Suppress(Optional(Literal('%'))) + Optional(OneOrMore(Name)('name')) + Suppress(LineEnd())
    

    I expect many lines, so I can define the grammar for Lines as follows.

    Lines = OneOrMore(Group(Line))
    

    I use Group, as suggested by Paul in this answer, so that the results of each line can be retrieved separately.

    grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
    
    grammar.parseString(string)
    

    However, this raises the following error.

    ParseException: Expected end of line (at char 62), (line:3, col:19)
    

    Full code below for easier copy and pasting.

    string = '''START
        1   10; %   Name1
        2   20; %   Name2
    END'''
    
    from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group
    
    Float = Word(nums + '.' + '-')
    Name = Word(alphanums)
    Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Suppress(Optional(Literal('%'))) + Optional(OneOrMore(Name)('name')) + Suppress(LineEnd())
    Lines = OneOrMore(Group(Line))
    
    grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
    grammar.parseString(string)
    

    Edit :

    I've tried the following to no avail either.

    string = '''START
        1   10; %   Name1
        2   20; %   Name2
    END'''
    
    from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group
    
    Float = Word(nums + '.' + '-')
    Name = Word(alphanums)
    NL = Suppress(LineEnd())
    Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(~NL +
                                                                Suppress(Literal('%'))
                                                                + OneOrMore(Name)('name') + NL ) | NL
    Lines = OneOrMore(Group(Line))
    
    grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
    grammar.parseString(string)
    

    The only thing that seems to work is using restOfLine.

    Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(restOfLine)
    

    However, this does not return the portion after the semi-colon in a structured fashion and I have to parse it separately again. Is that the recommended approach?

    • PaulMcG
      PaulMcG over 7 years
      Add "Name.setDebug()" and "Float.setDebug()" and see if that output is helpful.
    • kdheepak
      kdheepak over 7 years
      That output seems to suggest that OneOrMore(Name) is going further than the end of the line. What is the recommended way to make sure that OneOrMore(Name) stops at the end of the line? I tried OneOrMore(Name) + NL and that didn't work either, and I wasn't able to understand why.
    • PaulMcG
      PaulMcG over 7 years
      First of all, is "2" a valid Name? Second, are end-of-lines significant in your grammar? If so, then you should remove them from the set of ignorable whitespace, using ParserElement.setDefaultWhitespaceChars (see example inline with docs at pythonhosted.org/pyparsing/…). Finally, you might want to tighten up your definitions of Float and Name. As you have them now, Float will match strings such as "...", "---", and "1.1.1", and Name will match "12345" and "221B".
    • kdheepak
      kdheepak over 7 years
      Thanks Paul! That solved it. I had to remove the newline from ParserElement's default whitespace characters. I've posted the full answer below for reference.
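
To make PaulMcG's point about significant line ends concrete, here is a minimal sketch that parses the same two-line fragment with and without newlines in the default whitespace set. The fragment and the names `text`, `greedy_names` and `line_names` are made up for illustration. Note that `setDefaultWhitespaceChars` changes a global default and only affects elements created after the call.

```python
from pyparsing import OneOrMore, ParserElement, Word, alphanums

# End of one input line followed by the start of the next one.
text = "Name1\n    2   20"

# Built with pyparsing's default whitespace, which includes "\n",
# so matching continues onto the next line.
greedy_names = OneOrMore(Word(alphanums))
greedy_result = greedy_names.parseString(text).asList()

# Elements created after this call skip only spaces and tabs,
# so matching stops at the newline.
ParserElement.setDefaultWhitespaceChars(" \t")
line_names = OneOrMore(Word(alphanums))
line_result = line_names.parseString(text).asList()

print(greedy_result)  # ['Name1', '2', '20']
print(line_result)    # ['Name1']
```

The first result shows why the original grammar failed: OneOrMore(Name) consumed the numbers of the following line, so the parser was no longer at a line end when it expected one.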