Scapy and rdpcap function

python networking pcap packet-capture scapy


Solution 1

Scapy has another function, sniff(), which you can also use to read pcap files:

from scapy.all import sniff

def method_filter_HTTP(pkt):
    # Your processing goes here; pkt is one dissected packet
    pass

sniff(offline="your_file.pcap", prn=method_filter_HTTP, store=0)

rdpcap loads the entire pcap file into memory, so it uses a lot of memory and, as you said, it is slow. sniff, by contrast, reads one packet at a time and passes it to the provided prn function. The store=0 parameter tells sniff not to keep processed packets in the list it returns, so each packet can be freed as soon as prn has handled it.

Solution 2

While I agree the load time is longer than one might expect, it is likely because the file is parsed into a list of deeply layered packet objects. What I've had to do is use editcap to chop the packet captures into smaller pieces, which makes reading them a bit easier. For example:

$ editcap -B "2013-05-28 10:05:55" -i 5 -F libpcap inputcapture.pcap outputcapture.pcap

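Please note: a full explanation of this command's switches is available in the editcap documentation. In short, -B keeps only packets captured before the given time, -i 5 starts a new output file for every 5-second interval, and -F libpcap sets the output file format.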

Also, the -F libpcap part seemed to be necessary (at least for me) to get Scapy's rdpcap function to parse the file. (This is supposed to be the default output format, but for whatever reason it was not in my case.) You can verify the file type of your input and output files with capinfos (e.g., simply run capinfos your_capture.pcap).

Both capinfos and editcap are available with the Wireshark distribution.
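As a rough stand-in for the file-type line that capinfos prints, the first four bytes of a capture identify its format: libpcap files start with the magic number 0xa1b2c3d4 (0xa1b23c4d for nanosecond timestamps) in the writer's byte order, while pcapng files start with the Section Header Block type 0x0a0d0d0a. The helper capture_format is hypothetical and only looks at those bytes; it does not validate the rest of the file.

```python
import os
import struct
import tempfile

# First four bytes of a libpcap file, as they appear on disk.
PCAP_MAGICS = {
    b"\xa1\xb2\xc3\xd4": "libpcap (big-endian, microsecond timestamps)",
    b"\xd4\xc3\xb2\xa1": "libpcap (little-endian, microsecond timestamps)",
    b"\xa1\xb2\x3c\x4d": "libpcap (big-endian, nanosecond timestamps)",
    b"\x4d\x3c\xb2\xa1": "libpcap (little-endian, nanosecond timestamps)",
}
# Block type of the pcapng Section Header Block (same bytes in either order).
PCAPNG_SHB = b"\x0a\x0d\x0d\x0a"


def capture_format(path):
    """Guess a capture file's format from its first four bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head in PCAP_MAGICS:
        return PCAP_MAGICS[head]
    if head == PCAPNG_SHB:
        return "pcapng"
    return "unknown"


# Minimal little-endian libpcap global header (24 bytes, no packets).
demo = os.path.join(tempfile.mkdtemp(), "demo.pcap")
with open(demo, "wb") as f:
    f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))

print(capture_format(demo))  # libpcap (little-endian, microsecond timestamps)
```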

Solution 3

If you are looking for more responsive code, consider using PcapReader() instead of rdpcap().

PcapReader() returns an iterator that reads and dissects a packet only when it is needed, whereas rdpcap() loads the entire trace into memory. PcapReader() is therefore well suited to a large trace that takes forever to load with rdpcap(), or that raises a MemoryError because it is simply too large for your system.

Example code:

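from scapy.all import Ether, PcapReader

with PcapReader('filename.pcap') as packets:
    for packet in packets:  # packets are read lazily, one at a time
        if Ether in packet:  # skip packets without an Ethernet header
            mac_src = packet[Ether].src
            mac_dst = packet[Ether].dst
            ...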

Please refer to the PcapReader() documentation for more information.

If you are only concerned about how long it takes to get the final output, then rdpcap() might have an advantage over PcapReader(), although I'm not sure about the magnitude of the difference.

Solution 4

Since version 2.4.3, Scapy has built-in support for parsing HTTP sessions. It can be used together with the sessions functionality of sniff(), e.g.

from scapy.all import sniff, TCPSession
pkts = sniff(offline="http_chunk.pcap.gz", session=TCPSession)

When TCPSession is used with an HTTP/1.x capture, sniff() returns a list of 'packets' that each contain the reassembled data from all the underlying packets making up one HTTPRequest or HTTPResponse. Individual packets, such as ACKs, are still returned as well. So, for example, if a 'packet' haslayer(HTTPResponse), that 'packet' contains the entire response payload. You can also use the answers() method to match requests with responses. Depending on your Scapy version, you may first need to load the HTTP layer with load_layer("http"). Note that sniff() only returns packets when store is left at its default (with store=0 the returned list is empty), and that it works with a live capture, an offline capture file, or a list of packets.



Author: auino

Updated on December 18, 2020

Comments

auino over 3 years

I'm using Scapy's rdpcap function to read a PCAP file. I also use a module that adds HTTP support to Scapy, which I need because I have to retrieve all the HTTP requests and responses and their related packets.

I noticed that, when parsing a large PCAP file, the rdpcap function takes too much time to read it.

Is there a solution to read a pcap file faster?