Multiline PyParsing example

python pyparsing

2,806

Removing newlines from the default whitespace characters solved this. As Paul suggested in his comment, the definitions of Float and Name can also be tightened so that numbers and names are parsed more strictly.

```
string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group, ParserElement

# Skip only spaces and tabs, so newlines stay visible to LineEnd().
# This must be called before the expressions below are created.
ParserElement.setDefaultWhitespaceChars(" \t")

Float = Word(nums + '.' + '-')
Name = Word(alphanums)
NL = Suppress(LineEnd())
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(~NL +
                                                            Suppress(Literal('%'))
                                                            + OneOrMore(Name)('name') + NL ) | NL
Lines = OneOrMore(Group(Line))

grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
```
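Following Paul's suggestion, one way to tighten Float and Name is sketched below. This is only an illustration, not the grammar from the answer: the regular expressions, the rule that a name must start with a letter, the extra `Group` around the data and name tokens, and the modified sample text (a negative float and a line without a name) are all assumptions made for the example.

```python
from pyparsing import (
    Group, Keyword, LineEnd, OneOrMore, Optional, ParserElement, Regex, Suppress,
)

# Must run before any expression is created: newlines stop being skippable.
ParserElement.setDefaultWhitespaceChars(" \t")

text = """START
    1   10; %   Name1
    -2.5   20; %   Name2
    3;
END"""

# Optional sign, digits, optional fraction. A digit is required, so "..." and
# "---" no longer match, and "1.1.1" is not consumed as a single token.
Float = Regex(r"[+-]?\d+(\.\d+)?").setParseAction(lambda t: float(t[0]))
# Assumption: a name starts with a letter, which excludes "12345" and "221B".
Name = Regex(r"[A-Za-z][A-Za-z0-9]*")
NL = Suppress(LineEnd())

Line = Group(
    Group(OneOrMore(Float))("data")
    + Suppress(";")
    + Optional(Suppress("%") + Group(OneOrMore(Name))("name"))
    + NL
)
grammar = Suppress(Keyword("START")) + NL + OneOrMore(Line) + Suppress(Keyword("END"))

result = grammar.parseString(text, parseAll=True)
rows = [
    (row["data"].asList(), row["name"].asList() if "name" in row else [])
    for row in result
]
print(rows)
# [([1.0, 10.0], ['Name1']), ([-2.5, 20.0], ['Name2']), ([3.0], [])]
```

Because the newline after START is matched explicitly here, the `| NL` alternative is not needed and no empty groups appear in the result.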

kdheepak

Updated on December 01, 2022

Comments

kdheepak over 1 year
I'm trying to parse something really simple and multiline in PyParsing, but I'm struggling to understand why it doesn't work. The string I want to parse is as follows.
```
string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''
```
I know that every line between the START and END tokens will contain one or more positive or negative numbers, each of which can be an int or a float. I also expect that a user may optionally add metadata after a % sign.

So I start by defining the basic grammar for Floats and Names.
```
Float = Word(nums + '.' + '-')
Name = Word(alphanums)
```
I know that a line can contain one or more Float followed by a semi-colon, and optionally by a % Name.
```
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Suppress(Optional(Literal('%'))) + Optional(OneOrMore(Name)('name')) + Suppress(LineEnd())
```
I expect many lines, so I can define the grammar for Lines as follows.
```
Lines = OneOrMore(Group(Line))
```
I use Group, as suggested by Paul in this answer, so that the results for each line can be retrieved separately.
```
grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))

grammar.parseString(string)
```
However, this raises the following error:
```
ParseException: Expected end of line (at char 62), (line:3, col:19)
```
Full code below for easier copy and pasting.
```
string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group

Float = Word(nums + '.' + '-')
Name = Word(alphanums)
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Suppress(Optional(Literal('%'))) + Optional(OneOrMore(Name)('name')) + Suppress(LineEnd())
Lines = OneOrMore(Group(Line))

grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
```
Edit:

I've also tried the following, to no avail.
```
string = '''START
    1   10; %   Name1
    2   20; %   Name2
END'''

from pyparsing import Word, Keyword, nums, OneOrMore, Optional, Suppress, Literal, alphanums, LineEnd, LineStart, Group

Float = Word(nums + '.' + '-')
Name = Word(alphanums)
NL = Suppress(LineEnd())
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(~NL +
                                                            Suppress(Literal('%'))
                                                            + OneOrMore(Name)('name') + NL ) | NL
Lines = OneOrMore(Group(Line))

grammar = Suppress(Keyword('START')) + Lines + Suppress(Keyword('END'))
grammar.parseString(string)
```
The only thing that does seem to work is using restOfLine:
```
Line = OneOrMore(Float)('data') + Suppress(Literal(';')) + Optional(restOfLine)
```
However, this does not return the portion after the semi-colon in a structured fashion and I have to parse it separately again. Is that the recommended approach?
- PaulMcG over 7 years
  
  Add "Name.setDebug()" and "Float.setDebug()" and see if that output is helpful.
- kdheepak over 7 years
  
  That output seems to suggest that OneOrMore(Name) is going further than the end of the line. What is the recommended way to make sure that OneOrMore(Name) stops at the end of the line? I tried OneOrMore(Name) + NL and that didn't work either, and I wasn't able to understand why.
- PaulMcG over 7 years
  
  First of all, is "2" a valid Name? Second, are end-of-lines significant in your grammar? If so, then you should remove them from the set of ignorable whitespace, using ParserElement.setDefaultWhitespaceChars (see example inline with docs at pythonhosted.org/pyparsing/…). Finally, you might want to tighten up your definitions of Float and Name. As you have them now, Float will match strings such as "...", "---", and "1.1.1", and Name will match "12345" and "221B".
- kdheepak over 7 years
  
  Thanks Paul! That solved it. I had to remove the newline from ParserElement's default whitespace characters. I've posted the full answer below for reference.
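The whitespace behaviour discussed in the pyparsing comments can be reproduced in isolation. The sketch below is a minimal illustration under stated assumptions: the two-line sample string and the variable names are made up for the example. It parses the same text twice, once with pyparsing's default whitespace and once after removing the newline from it.

```python
from pyparsing import OneOrMore, ParserElement, Word, alphanums

# Hypothetical sample: the end of one data line and the start of the next.
line_pair = "Name1\n    2   20"

# With the default whitespace (which includes "\n"), Name keeps matching
# on the following line, so the numbers are swallowed as names.
Name = Word(alphanums)
crossing = OneOrMore(Name).parseString(line_pair).asList()

# Expressions created after this call skip only spaces and tabs.
ParserElement.setDefaultWhitespaceChars(" \t")
NameSameLine = Word(alphanums)
same_line = OneOrMore(NameSameLine).parseString(line_pair).asList()

# Restore pyparsing's usual default so later grammars are unaffected.
ParserElement.setDefaultWhitespaceChars(" \n\t\r")

print(crossing)   # ['Name1', '2', '20']
print(same_line)  # ['Name1']
```

The setting is global and applies to expressions created after the call, which is why the second `Word` is built only once the default has been changed.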