Given an example string s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"', I want to split it into the following pieces:

# To Do: something like {l = s.split(',')}
l = ['Hi', 'my name is Humpty-Dumpty', '"Alice, Through the Looking Glass"']

I don't know in advance where the delimiters will be or how many there are.

This is my initial attempt. It is rather long and not exact: it removes every delimiter, whereas I want the delimiters inside quotes to survive:

s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
ss = []
inner_string = ""
delimiter = ','

for item in s.split(delimiter):
    if not inner_string: 
        if '\"' not in item: # regular string, not interesting
            ss.append(item)
        else:
            inner_string += item # start inner string

    elif inner_string:
        inner_string += item

        if '\"' in item:  # end inner string
            ss.append(inner_string)
            inner_string = ""
        else:            # middle of inner string
            pass

print(ss)
# prints ['Hi', ' my name is Humpty-Dumpty', ' from "Alice Through the Looking Glass"'] which is OK-ish
CIsForCookies, Nov 20, 2018 at 14:13

3 Answers

Best answer

You can split on a regular expression with re.split:

>>> import re
>>> [x for x in re.split(r'([^",]*(?:"[^"]*"[^",]*)*)', s) if x not in (',','')]

When s is equal to:

'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'

It outputs:

['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']

The regular expression explained:

(
    [^",]*          zero or more chars other than " or ,
    (?:             non-capturing group
        "[^"]*"     quoted block
        [^",]*      followed by zero or more chars other than " or ,
    )*              zero or more times
)
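
The same pattern can also be written with re.VERBOSE, so the explanation sits next to the code. This is only a sketch: the names FIELD and split_outside_quotes are made up for illustration, and it assumes quotes are balanced and never escaped.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE.
FIELD = re.compile(r'''
    (
        [^",]*          # zero or more chars other than " or ,
        (?:             # non-capturing group
            "[^"]*"     # quoted block
            [^",]*      # followed by zero or more chars other than " or ,
        )*              # zero or more times
    )
''', re.VERBOSE)


def split_outside_quotes(text):
    # re.split returns the captured fields mixed with the separators and
    # empty strings between them, so those are filtered out afterwards.
    return [x for x in FIELD.split(text) if x not in (',', '')]


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_outside_quotes(s))
# ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
```

One caveat of the filter: genuinely empty fields, as in 'a,,b', are dropped along with the separators.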
fferri, Nov 20, 2018 at 12:56

I solved this problem by avoiding split entirely:

s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
l = []
substr = ""
quotes_open = False

for c in s:
    if c == ',' and not quotes_open: # check for comma only if no quotes open
        l.append(substr)
        substr = ""
    elif c == '\"':
        quotes_open = not quotes_open
    else:
        substr += c

l.append(substr)

print(l)

Output:

['Hi', ' my name is Humpty-Dumpty', ' from Alice, Through the Looking Glass']

A more general function could look something like this:

def custom_split(input_str, delimiter=' ', avoid_between_char='\"'):
    l = []
    substr = ""
    between_avoid_chars = False
    for c in input_str:  # iterate over the argument, not the global s
        if c == delimiter and not between_avoid_chars:
            l.append(substr)
            substr = ""
        elif c == avoid_between_char:
            between_avoid_chars = not between_avoid_chars
        else:
            substr += c
    l.append(substr)
    return l
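
Note that both loops above drop the quote characters themselves, whereas the question asks for the quoted text to survive intact. A possible variant is sketched below. The name split_keep_quotes and its parameters are hypothetical, and it assumes quotes are balanced and not escaped.

```python
def split_keep_quotes(input_str, delimiter=',', quote_char='"'):
    parts = []
    current = []
    in_quotes = False
    for c in input_str:
        if c == quote_char:
            # Toggle the state, but keep the quote character in the output.
            in_quotes = not in_quotes
            current.append(c)
        elif c == delimiter and not in_quotes:
            # A delimiter outside quotes closes the current piece.
            parts.append(''.join(current))
            current = []
        else:
            current.append(c)
    parts.append(''.join(current))
    return parts


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_keep_quotes(s))
# ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
```

Unlike the regex-plus-filter approach, this keeps empty fields: 'a,,b' gives ['a', '', 'b'].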
Aquarthur, Nov 20, 2018 at 11:39

This works for this specific case and could provide a starting point.

import re
s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'

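# Greedy match: grabs everything from the first quote to the last one
cut = re.search(r'(".*")', s)

# Hide the quoted block behind a placeholder, then split on commas
r = re.sub(r'(".*")', '$VAR$', s).split(',')
res = []
for i in r:
    # Raw string avoids the invalid '\$' escape; restores the quoted block
    res.append(re.sub(r'\$VAR\$', cut.group(1), i))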

Output

print(res)
['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
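
Because ".*" is greedy, the snippet above treats everything between the first and the last quote as a single block, so it only fits input with one quoted section. A rough sketch of the same placeholder idea for several quoted blocks follows. The function name split_with_placeholders is hypothetical, and the sketch assumes the input never contains the NUL character used as a marker and that quotes are balanced.

```python
import re


def split_with_placeholders(text, delimiter=','):
    stash = []

    def hide(match):
        # Remember the quoted block and leave an indexed marker in its place.
        stash.append(match.group(0))
        return '\x00{}\x00'.format(len(stash) - 1)

    # Non-greedy by construction: each "..." block is masked separately.
    masked = re.sub(r'"[^"]*"', hide, text)

    def restore(match):
        # Look the original quoted block up by the index stored in the marker.
        return stash[int(match.group(1))]

    return [re.sub(r'\x00(\d+)\x00', restore, part)
            for part in masked.split(delimiter)]


s = 'Hi, my name is Humpty-Dumpty, from "Alice, Through the Looking Glass"'
print(split_with_placeholders(s))
# ['Hi', ' my name is Humpty-Dumpty', ' from "Alice, Through the Looking Glass"']
```

Restoring through a function rather than a replacement string means backslashes inside the quoted text are not reinterpreted by re.sub.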
Richy, Nov 20, 2018 at 11:31