Use token as variable in pyparsing

Question

I've recently started using Python and pyparsing to process a string of hex values, and I'm having trouble with the following. Consider this string:

string = "1002030405991736584304025326"

I want the end result to be this:

['10', '02', '03', ['04', '05', '9917365843'], ['04', '02', '5326']]

Assume that 04 is a tag which means data (same concept as in ASN.1), and 05 is the size of that data. I can't see how to use the size as a variable in the pyparsing code. The best that I can do is:

from pyparsing import Word, hexnums

byte = Word(hexnums, exact=2)
process = byte + byte + byte + Word(hexnums)
newstring = process.parseString(string)
print(newstring.dump())

Any help would be greatly appreciated.

PS: With Hooked's help, my final code is:

from pyparsing import *

string = "10 02 03 04 05 99 17 36 58 43 04 02 53 26"

tag = Word(hexnums, exact=2)
# countedArray reads the count itself; this expression matches each data item
size = Word(hexnums)
array = Group(tag + countedArray(size))

process = tag + tag + tag + ZeroOrMore(array)

newstring = process.parseString(string)
print (newstring.dump())

Which prints:

['10', '02', '03', ['04', ['99', '17', '36', '58', '43']], ['04', ['53', '26']]]

Hope this helps in the future.
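One caveat with the code above: by default countedArray reads its count as a decimal integer of any length, so it relies on the spaces between bytes and would not accept a hex count such as 0A. Here is a minimal sketch, assuming the count is always a single two-digit hex byte, that passes a custom count expression as the second argument of countedArray and parses the original unspaced string. The names length, record, grammar and flatten are illustrative only, and the nesting of the raw result depends on the pyparsing version, which is why the tokens are flattened before printing.

```python
from pyparsing import Group, Word, ZeroOrMore, countedArray, hexnums


def flatten(tokens):
    """Collapse nested token lists into one flat list."""
    flat = []
    for tok in tokens:
        if isinstance(tok, list):
            flat.extend(flatten(tok))
        else:
            flat.append(tok)
    return flat


byte = Word(hexnums, exact=2)

# Assumption: the count is exactly one byte, interpreted as hexadecimal.
length = Word(hexnums, exact=2).setParseAction(lambda t: int(t[0], 16))

# A record is a tag byte followed by a counted run of data bytes.
record = Group(byte + countedArray(byte, length))
grammar = byte + byte + byte + ZeroOrMore(record)

result = grammar.parseString("1002030405991736584304025326")
flat = flatten(result.asList())
print(flat)
```

The count token itself is consumed by countedArray and does not appear in the result, so only the tag and the data bytes are printed.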

score 2 · Accepted Answer · edited May 23 '17 at 10:24

I asked the same question in a more general sense: Can a BNF handle forward consumption? The answer there was no, because a context-free grammar cannot know what is coming up. Thankfully, pyparsing is more than a context-free grammar, as the author of the package points out:

Pyparsing includes the helper countedArray which does exactly what you ask. It takes a single argument expr, and will parse an integer followed by 'n' instances of expr
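As a rough illustration of the helper described in that quote (a sketch, not the author's own example): the count is read as a decimal integer and then exactly that many instances of expr are matched. The flatten helper is my own addition, used because the nesting of the returned tokens differs between pyparsing versions.

```python
from pyparsing import Word, countedArray, hexnums


def flatten(tokens):
    """Collapse nested token lists so the output is the same across versions."""
    flat = []
    for tok in tokens:
        if isinstance(tok, list):
            flat.extend(flatten(tok))
        else:
            flat.append(tok)
    return flat


byte = Word(hexnums, exact=2)
counted = countedArray(byte)  # a decimal count, then that many bytes

# "03" is the count, so only AA, BB and CC are consumed; DD is left unparsed.
tokens = counted.parseString("03 AA BB CC DD").asList()
items = flatten(tokens)
print(items)  # ['AA', 'BB', 'CC']
```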

A far more complete solution, with a minimal working example, is provided in his answer. The question PyParsing lookaheads and greedy expressions is also a good reference for what you are trying to do.

score 0 · Answer 2 · answered Jun 19 '12 at 16:11

Would this work? It doesn't use pyparsing, but it records variable length sub-lists when it sees '04'.

def func(s):
    d = []
    # consume s two characters at a time until it is empty
    while s:
        b = s[0:2]
        if b != '04':
            # not a tag: record the byte as-is
            d.append(b)
            # shorten s
            s = s[2:]
        else:
            # take the length, as a string
            size = s[2:4]
            # take the length, as a decimal integer
            n = int(size)
            # record b='04', the length, and then the next n bytes of data
            d.append([b, size, s[4:4 + n * 2]])
            # drop the tag, the length, and the data from s
            s = s[4 + n * 2:]
    return d

Unfortunately not for me. The example that I gave is just a simple part of what I aim to do, so it is critical that I use pyparsing (I have a large number of tags, which represent various sizes). Thanks for the help though! - Luis, Jun 19 '12 at 16:25
