Thursday, June 23, 2011

HTTP support in Scapy

"Scapy is a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more. It can easily handle most classical tasks like scanning, tracerouting, probing, unit tests, attacks or network discovery (it can replace hping, 85% of nmap, arpspoof, arp-sk, arping, tcpdump, tethereal, p0f, etc.)."

Scapy's documentation does a good job of explaining both how to use the tool and how to add new protocols. To become more familiar with this great tool, I decided to implement one of the most widely used protocols: HTTP (RFC 2616).
steeve-pc:blog steeve$ ./
Welcome to Scapy (2.2.0)
HTTP Scapy extension
>>> test=rdpcap("HTTP.pcap")
>>> test.summary()
Ether / IP / TCP > S
Ether / IP / TCP > SA
Ether / IP / TCP > A
Ether / IP / TCP > PA / HTTP / HTTPrequest / Raw
Ether / IP / TCP > A
Ether / IP / TCP > A / HTTP / HTTPresponse / Raw
Ether / IP / TCP > A
Ether / IP / TCP > A / HTTP / Raw
Ether / IP / TCP > A
Ether / IP / TCP > A / HTTP / Raw
Ether / IP / TCP > PA / HTTP / Raw
Ether / IP / TCP > A
The summary() function shows the layers of each packet, and here we can see packets with three interesting layers: HTTP, HTTPrequest and HTTPresponse. The HTTP layer contains the header fields that can appear in either of the other two layers, such as Date or Connection. The HTTPrequest layer corresponds to an HTTP request (GET, POST, TRACE, HEAD...) and the HTTPresponse layer to an HTTP response ("200 OK", "404 Not Found"...).
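
To make the split between the three layers more concrete, here is a minimal sketch in plain Python, independent of Scapy and not taken from the script itself. The helper name classify_http_payload and the REQUEST_METHODS tuple are assumptions made for illustration; the idea is only that the first line of a TCP payload is enough to guess whether it starts a request, starts a response, or is a continuation that carries neither.

```python
# Hypothetical helper (not part of the Scapy extension): guess which
# layer name a TCP payload would map to by looking at its first line.
# The method names are the ones defined in RFC 2616.
REQUEST_METHODS = ("OPTIONS", "GET", "HEAD", "POST",
                   "PUT", "DELETE", "TRACE", "CONNECT")


def classify_http_payload(payload):
    """Return 'HTTPrequest', 'HTTPresponse' or 'HTTP' for a payload string."""
    first_line = payload.split("\r\n", 1)[0]
    # A response starts with a status line such as 'HTTP/1.1 200 OK'.
    if first_line.startswith("HTTP/"):
        return "HTTPresponse"
    # A request line has three parts: method, target and version.
    words = first_line.split(" ")
    if (len(words) == 3 and words[0] in REQUEST_METHODS
            and words[2].startswith("HTTP/")):
        return "HTTPrequest"
    # Anything else is treated as a continuation of an earlier message.
    return "HTTP"


print(classify_http_payload("GET /download.html HTTP/1.1\r\nHost:\r\n\r\n"))
print(classify_http_payload("HTTP/1.1 200 OK\r\nServer: Apache\r\n\r\n"))
```

The fallback to a plain 'HTTP' result mirrors the packets summarised above as "HTTP / Raw", which carry message data without a request or status line of their own.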

We can look at the content of the packet containing the HTTPrequest layer:
>>> test[3].show()
###[ Ethernet ]###
  dst= fe:ff:20:00:01:00
  src= 00:00:01:00:00:00
  type= 0x800
###[ IP ]###
     version= 4L
     ihl= 5L
     tos= 0x0
     len= 519
     id= 3909
     flags= DF
     frag= 0L
     ttl= 128
     proto= tcp
     chksum= 0x9010
###[ TCP ]###
        sport= tip2
        dport= http
        seq= 951057940
        ack= 290218380
        dataofs= 5L
        reserved= 0L
        flags= PA
        window= 9660
        chksum= 0xa958
        urgptr= 0
        options= []
###[ HTTP ]###
           CacheControl= None
           Connection= 'Connection: keep-alive\r\n'
           Date= None
           Pragma= None
           Trailer= None
           TransferEncoding= None
           Upgrade= None
           Via= None
           Warning= None
           KeepAlive= 'Keep-Alive: 300\r\n'
           Allow= None
           ContentEncoding= None
           ContentLanguage= None
           ContentLength= None
           ContentLocation= None
           ContentMD5= None
           ContentRange= None
           ContentType= None
           Expires= None
           LastModified= None
###[ HTTP Request ]###
              Method= 'GET /download.html HTTP/1.1\r\n'
              Host= 'Host:\r\n'
              UserAgent= 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113\r\n'
              Accept= 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1\r\n'
              AcceptLanguage= 'Accept-Language: en-us,en;q=0.5\r\n'
              AcceptEncoding= 'Accept-Encoding: gzip,deflate\r\n'
              AcceptCharset= 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n'
              Referer= 'Referer:\r\n'
              Authorization= None
              Expect= None
              From= None
              IfMatch= None
              IfModifiedSince= None
              IfNoneMatch= None
              IfRange= None
              IfUnmodifiedSince= None
              MaxForwards= None
              ProxyAuthorization= None
              Range= None
              TE= None
###[ Raw ]###
                 load= '\r\n'
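
In the output above, each field keeps the whole header line, including the header name and the trailing CRLF, and the field names drop the hyphen ('User-Agent' becomes UserAgent). The sketch below reproduces that shape in plain Python without Scapy. The function name split_http_request is an assumption, and this is not necessarily how the extension itself dissects packets; unlike the layers shown, it also keeps headers that are not in a predefined list.

```python
def split_http_request(payload):
    """Split a raw request string into (fields, body).

    fields maps names such as 'Method' or 'UserAgent' to the full header
    line, with its trailing CRLF, as in the show() output. body is
    whatever follows the blank line that ends the headers.
    """
    head, _, body = payload.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line is stored under 'Method', as in the output above.
    fields = {"Method": lines[0] + "\r\n"}
    for line in lines[1:]:
        name = line.split(":", 1)[0]
        # 'User-Agent' -> 'UserAgent', 'Keep-Alive' -> 'KeepAlive'
        fields[name.replace("-", "")] = line + "\r\n"
    return fields, body


raw = ("GET /download.html HTTP/1.1\r\n"
       "Host:\r\n"
       "Keep-Alive: 300\r\n"
       "Connection: keep-alive\r\n"
       "\r\n")
fields, body = split_http_request(raw)
print(repr(fields["Method"]))
print(repr(fields["KeepAlive"]))
```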

Now we can easily manipulate HTTP packets with Scapy. Here, I filter the packets that have an HTTPrequest or HTTPresponse layer and then print some of their fields:

>>> http=test.filter(lambda(s): HTTPrequest in s or HTTPresponse in s)
>>> http.summary()
Ether / IP / TCP > PA / HTTP / HTTPrequest / Raw
Ether / IP / TCP > A / HTTP / HTTPresponse / Raw
Ether / IP / TCP > PA / HTTP / HTTPrequest / Raw
Ether / IP / TCP > PA / HTTP / HTTPresponse / Raw
Ether / IP / TCP > PA / HTTP / HTTPresponse / Raw
>>> for p in http.filter(lambda(s): HTTPrequest in s):
...     print p.Method, p.Host
GET /download.html HTTP/1.1
GET /pagead/ads?client=ca-pub-2309191948673629&random=1084443430285&lmt=1082467020&format=468x60_as&output=html& HTTP/1.1
>>> for p in http.filter(lambda(s): HTTPresponse in s):
...     print p.StatusLine, p.Server
HTTP/1.1 200 OK
Server: Apache
HTTP/1.1 200 OK
Server: CAFE/1.0
HTTP/1.1 200 OK
Server: CAFE/1.0
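
Since the printed fields are plain strings that still contain the header name and the trailing CRLF, a little string handling is enough to turn them into something like a URL. The following is a minimal sketch under that assumption; request_url is a hypothetical helper and www.example.com is a made-up host, because the Host header in this capture is empty.

```python
def request_url(method_field, host_field):
    """Build a URL from field strings shaped like the ones printed above."""
    # 'GET /download.html HTTP/1.1\r\n' -> '/download.html'
    path = method_field.split(" ")[1]
    # 'Host: www.example.com\r\n' -> 'www.example.com'
    host = host_field.split(":", 1)[1].strip()
    return "http://" + host + path


print(request_url("GET /download.html HTTP/1.1\r\n",
                  "Host: www.example.com\r\n"))
```

With packets from the filtered list, the two arguments would be p.Method and p.Host.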
My script can be downloaded here. Don't hesitate to give me your opinion or to improve my script ;) 


  1. Excellent, finally :). Giving it a go.

  2. Works great, although it would be great to have all headers parsed regardless if they are defined in RFC or not. For example, GET requests do not parse Cookie/Connection headers. I had to stick a global variable in the class to set a.split(\r\n) to.

  3. please put on

    1. That's already on Github. Someone forked my code and improved it :

  4. Can you please help me extract a file name or URL from the packet?