
A pcap file is downloaded from a URL with the help of the Python (2.7.9) Requests library:

import requests
response = requests.get('http://example.com/path/1.pcap',  stream=True)

According to the documentation, response.raw is a file-like object, and my goal is to process the downloaded file without saving it to disk.

I first looked at the Scapy and Pyshark libraries for parsing .pcap files, but their functions (rdpcap and FileCapture) accept a file path string as an argument. pcap.Reader from the dpkt library accepts a file object. The first try, pcap = dpkt.pcap.Reader(response.raw), gave an error:

AttributeError: 'HTTPResponse' object has no attribute 'name'

So I added a name attribute:

setattr(response.raw,'name', 'test.pcap')

After that, pcap = dpkt.pcap.Reader(response.raw) didn't raise any errors, but pcap.readpkts() failed with

io.UnsupportedOperation: seek

And indeed response.raw.seekable() returns False.
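The `seek` failure can be reproduced without a network connection. The sketch below assumes Python 3 and uses a hypothetical `NonSeekableStream` class as a stand-in for `response.raw`: like a socket-backed HTTP body, it can only be read forward, so `seekable()` is `False` and `seek()` raises `io.UnsupportedOperation`, which is what a reader that rewinds its input runs into.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for response.raw: forward-only reads, no seek."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy the next chunk of bytes into the caller's buffer; 0 means EOF.
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n
    # seekable() and seek() are inherited from io.IOBase: False / unsupported.


def try_seek(stream):
    """Return True if seeking to offset 0 works, False if the stream refuses."""
    try:
        stream.seek(0)
        return True
    except io.UnsupportedOperation:
        return False


stream = NonSeekableStream(b"\xd4\xc3\xb2\xa1")
print(stream.seekable())  # False
print(try_seek(stream))   # False
```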

I tried setting response.raw.decode_content = True but that didn't help.

Is there a solution for processing the object the way I'm trying? Maybe additional request parameters are required to get a seekable response object?

By the way, if the response object is written to a file (shutil.copyfileobj(response.raw, file)), dpkt works with that file afterwards.
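That workaround can be done without managing a named file on disk. A minimal sketch, assuming Python 3 (both `tempfile.TemporaryFile` and `shutil.copyfileobj` also exist in 2.7); `to_seekable_tempfile` is a hypothetical helper name, and `io.BytesIO` only stands in for `response.raw` so the example runs offline.

```python
import io
import shutil
import tempfile


def to_seekable_tempfile(stream, chunk_size=64 * 1024):
    """Copy a forward-only stream into an anonymous temporary file.

    The returned file object is seekable and rewound to offset 0, so a
    parser that calls seek() can use it. The caller should close it.
    """
    tmp = tempfile.TemporaryFile()  # binary mode ('w+b') by default
    shutil.copyfileobj(stream, tmp, chunk_size)
    tmp.seek(0)
    return tmp


# io.BytesIO stands in for response.raw here; any readable file-like object works.
source = io.BytesIO(b"pcap bytes would go here")
with to_seekable_tempfile(source) as f:
    print(f.seekable())  # True
    print(f.read())      # b'pcap bytes would go here'
```

The file is deleted automatically when closed. If most captures are small, `tempfile.SpooledTemporaryFile` is a possible alternative that keeps data in memory until it exceeds a size threshold.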

Andrey Grachev

1 Answer


Support for StringIO objects was recently added to dpkt. So now you can create a StringIO object from your string and then pass it to pcap.Reader.

To create a StringIO object from a string:

from StringIO import StringIO
data = StringIO("aaaaa..aa")

You can then do

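import dpkt
import requests
from StringIO import StringIO

response = requests.get('http://example.com/path/1.pcap', stream=True)
# Wrap the downloaded bytes, not the raw file object: StringIO(response.raw)
# would store str(response.raw), i.e. the object's repr, not the capture.
data = StringIO(response.content)
pcap = dpkt.pcap.Reader(data)
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    # ... process each Ethernet frame here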
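Python 3 no longer has the `StringIO` module; `io.BytesIO` is the in-memory counterpart for binary data such as a pcap file. A minimal sketch, assuming Python 3 and that the whole capture fits in RAM; `buffer_in_memory` is a hypothetical helper name, and any readable binary file object (for example `response.raw`) can be passed in.

```python
import io


def buffer_in_memory(stream, chunk_size=64 * 1024):
    """Drain a forward-only binary stream into a seekable io.BytesIO.

    Holds the whole body in memory, so it only suits captures that fit in RAM.
    """
    buf = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means EOF
            break
        buf.write(chunk)
    buf.seek(0)  # rewind so a reader starts at the pcap global header
    return buf
```

With Requests, `io.BytesIO(response.content)` achieves the same in one line. One difference worth knowing: `response.content` undoes any gzip/deflate `Content-Encoding`, whereas `response.raw.read()` returns the undecoded bytes unless `decode_content` is enabled.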
Kiran Bandla
  • You should mention that the way to do this is `from StringIO import StringIO` (or `cStringIO`) and then wrap your string with `wrapped_string = StringIO("my string here")`. – a p Jun 18 '15 at 00:54
  • Kiran, thank you. I hope that this method will work, but I haven't been able to test it yet. After upgrading dpkt, it fails at `from test import pystone` with the error `cannot import name pystone`. I installed the `test` package from pip, but it does not contain `pystone`. I haven't been able to resolve that yet (I'm using Python 2.7 on Ubuntu 12.04). – Andrey Grachev Jun 26 '15 at 08:27
  • Andrey, pystone is part of Python 2.7. You do not have to install `test` from pip. At some point, you must have accidentally installed/copied a test.py or a test directory to your standard python lib. To check for this, do `import test` and then do `print test`. This will show you the location of test, which most likely overwrote the default `test` module. – Kiran Bandla Jun 27 '15 at 03:16
  • Kiran, thank you, that solved the problem with `pystone`. But when passing a `StringIO` object to `pcap.Reader` I get an 'invalid tcpdump header' exception. When passing a file object for the same pcap file, everything is processed correctly. – Andrey Grachev Nov 14 '15 at 08:26
  • @AndreyGrachev if that worked for you, can you mark it as the answer – Kiran Bandla Dec 28 '16 at 02:07
  • @KiranBandla, actually it is not resolved for me yet: I still get an `invalid tcpdump header` exception when using StringIO. I ended up copying the response to a file object before processing. I'll test once more after upgrading `dpkt` and will update later. – Andrey Grachev Dec 28 '16 at 19:59
  • @KiranBandla, I upgraded `dpkt` to the latest version (1.9.0) and still got `invalid tcpdump header` when using `StringIO`. – Andrey Grachev Jan 27 '17 at 16:15
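A plausible explanation for the `invalid tcpdump header` error: in Python 2, `StringIO.StringIO` converts a non-string argument with `str()`, so `StringIO(response.raw)` likely holds the text of the object's repr rather than the downloaded bytes, and the first four bytes are then not a pcap magic number. The sketch below (Python 3, stdlib only) parses the 24-byte libpcap global header to show that check in isolation; `read_pcap_global_header` is a hypothetical helper, it only handles the classic microsecond magic `0xa1b2c3d4`, and the exact check inside dpkt may differ.

```python
import io
import struct

# Classic libpcap global header: magic, major, minor, thiszone, sigfigs,
# snaplen, linktype -> 24 bytes in total.
PCAP_MAGIC = 0xA1B2C3D4
HEADER_LEN = 24


def read_pcap_global_header(f):
    """Parse the libpcap global header from a binary file object.

    Raises ValueError when the magic number matches in neither byte order,
    the kind of failure dpkt reports as 'invalid tcpdump header'.
    """
    raw = f.read(HEADER_LEN)
    if len(raw) < HEADER_LEN:
        raise ValueError("short read: not a pcap file")
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", raw[:4])
        if magic == PCAP_MAGIC:
            major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
                endian + "HHiIII", raw[4:])
            return {"endian": endian, "version": (major, minor),
                    "snaplen": snaplen, "linktype": linktype}
    raise ValueError("invalid pcap magic %r" % raw[:4])


good = io.BytesIO(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1))
print(read_pcap_global_header(good))
# {'endian': '<', 'version': (2, 4), 'snaplen': 65535, 'linktype': 1}

# Illustrative repr text, like what str(response.raw) would produce:
bad = io.BytesIO(b"<urllib3.response.HTTPResponse object>")
try:
    read_pcap_global_header(bad)
except ValueError as exc:
    print(exc)  # invalid pcap magic b'<url'
```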