Scapy's documentation is a great way to learn how to use the tool and how to add new protocols. To become more familiar with it, I decided to try to implement one of the most widely used protocols: HTTP (RFC 2616).
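As a reminder of what the documentation describes (this toy example only illustrates the general pattern, it is not the HTTP extension discussed below), adding a protocol boils down to subclassing Packet, listing its fields and binding the new class to its transport layer:

from scapy.all import Packet, ByteField, ShortField, TCP, bind_layers

class Toy(Packet):
    name = "Toy"
    fields_desc = [ByteField("version", 1),
                   ShortField("length", 0)]

# Any TCP segment to or from port 9999 will now be dissected as Toy.
bind_layers(TCP, Toy, dport=9999)
bind_layers(TCP, Toy, sport=9999)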
steeve-pc:blog steeve$ ./HTTP.py
Welcome to Scapy (2.2.0)
HTTP Scapy extension
>>> test=rdpcap("HTTP.pcap")
>>> test.summary()
Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http S
Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 SA
Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http A
Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http PA / HTTP / HTTPrequest / Raw
Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A
Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A / HTTP / HTTPresponse / Raw
Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http A
Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A / HTTP / Raw
Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http A
Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A / HTTP / Raw
Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 PA / HTTP / Raw
Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http A
[...]
The summary() function shows the content of each packet, and here we can see packets carrying the interesting layers: HTTP, HTTPrequest and HTTPresponse. The HTTP layer holds the general headers that can appear in both of the other layers, such as Date or Connection. The HTTPrequest layer corresponds to HTTP requests (GET, POST, TRACE, HEAD...), and HTTPresponse to responses such as "200 OK" or "404 Not Found".
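To give an idea of how this split between the three layers can be organised, here is a hedged sketch; the actual HTTP.py certainly differs in its details. A common HTTP layer looks at the beginning of the TCP payload to decide between HTTPrequest and HTTPresponse, and each sub-layer keeps the header lines it recognises as raw 'Name: value' strings, which is why the fields printed by show() below still contain the header names and the trailing '\r\n':

from scapy.all import Packet, StrField, TCP, bind_layers

class HTTPrequest(Packet):
    name = "HTTP Request"
    fields_desc = [StrField("Method", None), StrField("Host", None)]

    def do_dissect(self, s):
        # Request line first, then "Name: value" header lines.
        head, _, body = s.partition("\r\n\r\n")
        lines = head.split("\r\n")
        self.setfieldval("Method", lines[0] + "\r\n")
        for line in lines[1:]:
            if line.startswith("Host:"):
                self.setfieldval("Host", line + "\r\n")
        return body          # whatever is left becomes the next layer (Raw)

class HTTPresponse(Packet):
    name = "HTTP Response"
    fields_desc = [StrField("StatusLine", None), StrField("Server", None)]

    def do_dissect(self, s):
        head, _, body = s.partition("\r\n\r\n")
        lines = head.split("\r\n")
        self.setfieldval("StatusLine", lines[0] + "\r\n")
        for line in lines[1:]:
            if line.startswith("Server:"):
                self.setfieldval("Server", line + "\r\n")
        return body

class HTTP(Packet):
    name = "HTTP"
    def guess_payload_class(self, payload):
        # A status line starts with "HTTP/", a request line with a method.
        if payload.startswith("HTTP/"):
            return HTTPresponse
        if payload.split(" ", 1)[0] in ("GET", "POST", "HEAD", "PUT",
                                        "DELETE", "TRACE", "OPTIONS"):
            return HTTPrequest
        return Packet.guess_payload_class(self, payload)

bind_layers(TCP, HTTP, dport=80)
bind_layers(TCP, HTTP, sport=80)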
We can look at the content of the packet containing the HTTPrequest layer:
>>> test[3].show()
###[ Ethernet ]###
dst= fe:ff:20:00:01:00
src= 00:00:01:00:00:00
type= 0x800
###[ IP ]###
version= 4L
ihl= 5L
tos= 0x0
len= 519
id= 3909
flags= DF
frag= 0L
ttl= 128
proto= tcp
chksum= 0x9010
src= 145.254.160.237
dst= 65.208.228.223
\options\
###[ TCP ]###
sport= tip2
dport= http
seq= 951057940
ack= 290218380
dataofs= 5L
reserved= 0L
flags= PA
window= 9660
chksum= 0xa958
urgptr= 0
options= []
###[ HTTP ]###
CacheControl= None
Connection= 'Connection: keep-alive\r\n'
Date= None
Pragma= None
Trailer= None
TransferEncoding= None
Upgrade= None
Via= None
Warning= None
KeepAlive= 'Keep-Alive: 300\r\n'
Allow= None
ContentEncoding= None
ContentLanguage= None
ContentLength= None
ContentLocation= None
ContentMD5= None
ContentRange= None
ContentType= None
Expires= None
LastModified= None
###[ HTTP Request ]###
Method= 'GET /download.html HTTP/1.1\r\n'
Host= 'Host: www.ethereal.com\r\n'
UserAgent= 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113\r\n'
Accept= 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1\r\n'
AcceptLanguage= 'Accept-Language: en-us,en;q=0.5\r\n'
AcceptEncoding= 'Accept-Encoding: gzip,deflate\r\n'
AcceptCharset= 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n'
Referer= 'Referer: http://www.ethereal.com/development.html\r\n'
Authorization= None
Expect= None
From= None
IfMatch= None
IfModifiedSince= None
IfNoneMatch= None
IfRange= None
IfUnmodifiedSince= None
MaxForwards= None
ProxyAuthorization= None
Range= None
TE= None
###[ Raw ]###
load= '\r\n'
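Note that each field keeps the whole 'Name: value\r\n' line, as the output above shows. If only the value is needed, a simple split does the job (an illustrative snippet continuing the session, not part of the extension):

req = test[3]
host = req.Host.split(": ", 1)[1].strip()     # 'www.ethereal.com'
path = req.Method.split(" ")[1]               # '/download.html'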
Now we can easily manipulate HTTP packets with Scapy. My script can be downloaded here; don't hesitate to give me your opinion or to improve it ;). Below, I filter the packets that contain an HTTPrequest or HTTPresponse layer and then print some of their fields:
>>> http=test.filter(lambda(s): HTTPrequest in s or HTTPresponse in s)
>>> http.summary()
Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http PA / HTTP / HTTPrequest / Raw
Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A / HTTP / HTTPresponse / Raw
Ether / IP / TCP 145.254.160.237:3371 > 216.239.59.99:http PA / HTTP / HTTPrequest / Raw
Ether / IP / TCP 216.239.59.99:http > 145.254.160.237:3371 PA / HTTP / HTTPresponse / Raw
Ether / IP / TCP 216.239.59.99:http > 145.254.160.237:3371 PA / HTTP / HTTPresponse / Raw
>>> for p in http.filter(lambda(s): HTTPrequest in s):
... print p.Method, p.Host
...
GET /download.html HTTP/1.1
Host: www.ethereal.com
GET /pagead/ads?client=ca-pub-2309191948673629&random=1084443430285&lmt=1082467020&format=468x60_as&output=html&url=http%3A%2F%2Fwww.ethereal.com%2Fdownload.html&color_bg=FFFFFF&color_text=333333&color_link=000000&color_url=666633&color_border=666633 HTTP/1.1
Host: pagead2.googlesyndication.com
>>> for p in http.filter(lambda(s): HTTPresponse in s):
... print p.StatusLine, p.Server
...
HTTP/1.1 200 OK
Server: Apache
HTTP/1.1 200 OK
Server: CAFE/1.0
HTTP/1.1 200 OK
Server: CAFE/1.0
>>>
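The same processing can be wrapped in a small standalone script. The sketch below only mirrors the interactive session above; the import line is an assumption about how the extension exposes its classes, so adjust it to wherever HTTP.py lives:

#!/usr/bin/env python
from scapy.all import rdpcap
from HTTP import HTTPrequest, HTTPresponse   # hypothetical import, adjust to your setup

packets = rdpcap("HTTP.pcap")
http = packets.filter(lambda p: HTTPrequest in p or HTTPresponse in p)

for p in http:
    if HTTPrequest in p:
        print(p[HTTPrequest].Method)       # e.g. GET /download.html HTTP/1.1
        print(p[HTTPrequest].Host)         # e.g. Host: www.ethereal.com
    else:
        print(p[HTTPresponse].StatusLine)  # e.g. HTTP/1.1 200 OK
        print(p[HTTPresponse].Server)      # e.g. Server: Apache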
Comments:

"Excellent, finally :). Giving it a go."

"Works great, although it would be great to have all headers parsed regardless of whether they are defined in the RFC or not. For example, GET requests do not parse Cookie/Connection headers. I had to stick a global variable in the class to set a.split('\r\n') to."

"Please put it on GitHub."

(My reply) That's already on GitHub. Someone forked my code and improved it: https://github.com/invernizzi/scapy-http

"Can you please help me extract a file name or URL out of the packet?"