Scapy and rdpcap function
Solution 1
Scapy has another function, sniff, which can also read pcap files:
from scapy.all import sniff

def method_filter_HTTP(pkt):
    pass  # your per-packet processing goes here

sniff(offline="your_file.pcap", prn=method_filter_HTTP, store=0)
rdpcap loads the entire pcap file into memory, so it uses a lot of memory and, as you said, it is slow. sniff, by contrast, reads one packet at a time and passes it to the provided prn function. The store=0 parameter ensures that each packet is discarded from memory as soon as it has been processed.
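Since store=0 throws every packet away, any result you want to keep has to be accumulated by the prn callback itself. Here is a minimal sketch of that pattern, assuming you want a tally of TCP destination ports; make_port_counter and count_tcp_ports are hypothetical helper names, not part of Scapy. Scapy is imported inside the function only so that the callback can be exercised without it.

```python
from collections import Counter


def make_port_counter():
    """Return (counts, callback); the callback is meant for sniff(prn=...)."""
    counts = Counter()

    def on_packet(pkt):
        # Layers are looked up by name so this module does not need Scapy
        # imported at the top level.
        if pkt.haslayer("TCP"):
            counts[pkt["TCP"].dport] += 1
        # Returning None keeps sniff() from printing anything per packet.

    return counts, on_packet


def count_tcp_ports(path):
    """Stream a capture file and tally TCP destination ports."""
    from scapy.all import sniff  # lazy import; requires Scapy to be installed

    counts, on_packet = make_port_counter()
    sniff(offline=path, prn=on_packet, store=0)
    return counts
```

Calling count_tcp_ports("your_file.pcap") should then give you a Counter keyed by port while holding only one packet in memory at a time.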
Solution 2
While I agree the load time is longer than one might expect, that is likely because the file is being parsed into an array of highly composed objects. What I've had to do is use editcap to chop the packet captures into smaller pieces that are easier to read. For example:
$ editcap -B "2013-05-28 10:05:55" -i 5 -F libpcap inputcapture.pcap outputcapture.pcap
Please note: a full explanation of this command's switches is available in the editcap manual page.
Also, the -F libpcap part seemed to be necessary (at least for me) to get scapy's pcap function able to parse the file. This is supposed to be the default pcap file output format, but that was not the case for me, for whatever reason. You can verify the file type of your input and output files with capinfos (e.g., simply enter capinfos your_capture.pcap).
Both capinfos and editcap are available with the Wireshark distribution.
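If you would rather drive the splitting from the same Python script that does the parsing, a thin wrapper around the command is enough. This is only a sketch: build_editcap_split_cmd and split_capture are hypothetical helper names, it uses just the -i and -F switches from the command above, and it assumes editcap is on your PATH.

```python
import subprocess


def build_editcap_split_cmd(src, dst, seconds_per_file=5, file_format="libpcap"):
    """Return the argv list that splits src into time-sliced files named after dst."""
    return ["editcap", "-i", str(seconds_per_file), "-F", file_format, src, dst]


def split_capture(src, dst, **kwargs):
    """Run editcap; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_editcap_split_cmd(src, dst, **kwargs), check=True)
```

Keeping the argument list in its own function makes it easy to print or log the exact command before running it.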
Solution 3
If you are looking for more responsive code, consider using PcapReader() instead of rdpcap().
PcapReader() returns an iterator that loads a packet only when it is needed, as opposed to rdpcap(), which loads the entire trace into memory. PcapReader() is therefore well suited to a large trace that takes forever to load with rdpcap(), or that throws a MemoryError because it is simply too large for your system.
Example code:
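from scapy.all import Ether, PcapReader

# PcapReader yields packets lazily; the with-block closes the file afterwards
with PcapReader('filename.pcap') as packets:
    for packet in packets:
        if Ether in packet:  # skip frames without an Ethernet layer
            mac_src = packet[Ether].src
            mac_dst = packet[Ether].dst
            ...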
Please refer to the PcapReader() documentation for more information.
If you are only concerned about how long it takes to get the final output, then rdpcap() might have an advantage over PcapReader(), although I'm not sure about the magnitude of the difference.
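To make the "loads a packet only when it is needed" idea concrete, here is a standard-library-only sketch of a lazy reader for the classic pcap format. It is not how Scapy implements PcapReader(); iter_pcap_records is a hypothetical name, and the sketch assumes a classic (non-pcapng) file with microsecond timestamps and does no protocol dissection at all.

```python
import struct


def iter_pcap_records(path):
    """Yield (timestamp_seconds, raw_bytes) one record at a time."""
    with open(path, "rb") as f:
        header = f.read(24)  # the classic pcap global header is 24 bytes
        if len(header) < 24:
            raise ValueError("file too short for a pcap global header")
        magic = header[:4]
        if magic == b"\xd4\xc3\xb2\xa1":
            endian = "<"  # written on a little-endian machine
        elif magic == b"\xa1\xb2\xc3\xd4":
            endian = ">"  # written on a big-endian machine
        else:
            raise ValueError("not a classic microsecond-resolution pcap file")
        while True:
            rec = f.read(16)  # per-record header: ts_sec, ts_usec, incl_len, orig_len
            if len(rec) < 16:
                return  # clean end of file
            ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
            data = f.read(incl_len)
            if len(data) < incl_len:
                return  # truncated final record
            yield ts_sec + ts_usec / 1e6, data
```

Because this is a generator, nothing past the header is read until you start iterating, and only one record is held at a time. Calling list() on it would give you the rdpcap()-style behaviour of pulling everything into memory.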
Solution 4
Since version 2.4.3, Scapy has built-in support for parsing HTTP sessions. It can be used with the sessions functionality of sniff(), e.g.
from scapy.all import sniff, TCPSession
pkts = sniff(offline="http_chunk.pcap.gz", session=TCPSession)  # store=0 would leave pkts empty
When the TCPSession functionality is used with an HTTP/1 capture, sniff() returns a list of 'packets' that contain the assembled data from all the underlying packets that make up each HTTPRequest or HTTPResponse. It still also returns individual packets such as ACK packets. So, for example, if a 'packet' satisfies haslayer(HTTPResponse), then that 'packet' contains the entire response payload. It's also possible to use the answers() functionality to match requests and responses. Note that you can use sniff() for a live capture, with an offline packet capture, or with a list of packets.
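For a large capture you may not want sniff() to keep the whole list, so here is a rough sketch that combines TCPSession with a prn callback and store=0, keeping only a short summary of each reassembled HTTP message. summarize_http and read_http_messages are hypothetical names; Method, Path and Status_Code are the field names I believe Scapy's HTTP layer uses, so double-check them against your Scapy version. Scapy is imported inside the function only so that the callback can be tested without it.

```python
def summarize_http(pkt, out):
    """Append a short tuple to out if pkt carries a reassembled HTTP message."""
    if pkt.haslayer("HTTPRequest"):
        req = pkt["HTTPRequest"]
        out.append(("request", req.Method, req.Path))
    elif pkt.haslayer("HTTPResponse"):
        out.append(("response", pkt["HTTPResponse"].Status_Code))
    # Anything else (bare ACKs and so on) is ignored.


def read_http_messages(path):
    """Stream a capture through TCPSession and collect HTTP summaries."""
    from scapy.all import sniff, TCPSession  # lazy import; requires Scapy >= 2.4.3
    import scapy.layers.http  # noqa: F401  (importing registers the HTTP layers)

    messages = []
    sniff(offline=path, session=TCPSession, store=0,
          prn=lambda pkt: summarize_http(pkt, messages))
    return messages
```

The field values come back as bytes (for example b"GET"), so decode them if you need strings.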
auino
Updated on December 18, 2020
I'm using the rdpcap function of Scapy to read a PCAP file. I also use the module described in a link to HTTP support in Scapy, which is needed in my case, as I have to retrieve all the HTTP requests and responses and their related packets. I noticed that when parsing a large PCAP file, the rdpcap function takes too much time to read it. Is there a way to read a pcap file faster?