Python shlex.split(), ignore single quotes

Solution 1

import shlex

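def newSplit(value):
    lex = shlex.shlex(value)
    lex.quotes = '"'             # only double quotes delimit quoted strings
    lex.whitespace_split = True  # split tokens on whitespace only
    lex.commenters = ''          # do not treat '#' as the start of a comment
    return list(lex)

print(newSplit('''This string has "some double quotes" and 'some single quotes'.'''))
# ['This', 'string', 'has', '"some double quotes"', 'and', "'some", 'single', "quotes'."]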

Solution 2

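You can set the quotes attribute of a shlex.shlex instance to control which characters are treated as string quotes. You'll also need to add ' to its wordchars attribute so that each apostrophe stays attached to its word (the i and the say) instead of being split off as a separate token.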

import shlex

text = '"hello, world" is what \'i say\''  # renamed from input to avoid shadowing the builtin
lexer = shlex.shlex(text)
lexer.quotes = '"'
lexer.wordchars += "'"

output = list(lexer)
# ['"hello, world"', 'is', 'what', "'i", "say'"]
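
To see why the wordchars change matters, here is a small sketch that runs the same lexer with and without the apostrophe added. The helper name tokenize and its extra_wordchars parameter are made up for this illustration; they are not part of shlex.

```python
import shlex

text = '"hello, world" is what \'i say\''

def tokenize(value, extra_wordchars=""):
    # Hypothetical helper: a non-POSIX lexer that treats only " as a quote.
    lexer = shlex.shlex(value)
    lexer.quotes = '"'
    lexer.wordchars += extra_wordchars
    return list(lexer)

# Without the apostrophe in wordchars, each ' becomes its own token.
without_apostrophe = tokenize(text)
# With it, the apostrophes stay attached to the neighbouring words.
with_apostrophe = tokenize(text, "'")

print(without_apostrophe)
# ['"hello, world"', 'is', 'what', "'", 'i', 'say', "'"]
print(with_apostrophe)
# ['"hello, world"', 'is', 'what', "'i", "say'"]
```

Note that in this default (non-POSIX) mode the double quotes remain part of the first token.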
Author by

tekknolagi

Updated on November 24, 2020

Comments

  • tekknolagi, over 3 years

    How, in Python, can I use shlex.split() or similar to split strings, preserving only double quotes? For example, if the input is "hello, world" is what 'i say' then the output would be ["hello, world", "is", "what", "'i", "say'"].

  • Matt Ball, over 12 years
    Only negligibly so, and the lex.commenters bit is actually something that my answer doesn't do. +1 for a different way to git-r-dun.
  • tekknolagi, over 12 years
    actually, this one failed on something else I was writing, and Peter's worked. thank you though!
  • Matt Ball, over 12 years
    Out of curiosity, what input did it fail on?
  • Peter Lyons, over 12 years
    I started with the source code to the shlex.split function from the python source and just tweaked it with the list of quotes characters.
  • Pykler, over 10 years
    It took me a while, but this is slightly different from the normal shlex.split, which passes posix=True: shlex.shlex(value, posix=True)
  • user14717, almost 8 years
    If the escaped quote is in the quoted section, is there a way forward? Say, s = "1 'K\^o, Suzuk\'e'" to split into [1, "K\^o, Suzuk\'e"]
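
Following up on Pykler's comment about posix=True, here is a rough sketch of a POSIX-mode variant of Solution 1. The function name split_double_quotes_only is invented for this example. As far as I can tell, POSIX mode strips the surrounding double quotes from the token, which matches the output asked for in the question. It also treats a backslash as an escape character, so inputs containing backslashes may behave differently from the non-POSIX version.

```python
import shlex

def split_double_quotes_only(value):
    # Hypothetical helper: same settings as Solution 1, but in POSIX mode,
    # which is what shlex.split() uses by default.
    lexer = shlex.shlex(value, posix=True)
    lexer.quotes = '"'             # single quotes are ordinary characters
    lexer.whitespace_split = True  # split tokens on whitespace only
    lexer.commenters = ''          # do not treat '#' as a comment marker
    return list(lexer)

result = split_double_quotes_only('"hello, world" is what \'i say\'')
print(result)
# ['hello, world', 'is', 'what', "'i", "say'"]
```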