标题: 使用curl/libcurl/pycurl时如何获取请求报文的Header 创建: 2015-05-13 15:49 更新: 链接: https://scz.617.cn/unix/201505131549.txt 很多人会碰上这种需求,查看HTTP响应的Header。有时也会碰上另一种类似的需求, 查看HTTP请求的Header。 如果是HTTP协议,直接Wireshark。 如果是HTTPS协议,中间人劫持、开发者工具等等。 我们这次简要介绍一下使用curl/libcurl/pycurl时如何获取请求报文的Header。在 一些快速测试过程中可能会用到。 对于curl,使用-v、--verbose参数即可: $ curl -vs http://www.nsfocus.net -o /dev/null 2> >(sed "/^*/d") -------------------------------------------------------------------------- > GET / HTTP/1.1 > User-Agent: curl/7.26.0 > Host: www.nsfocus.net > Accept: */* > < HTTP/1.1 200 OK < Date: Wed, 13 May 2015 02:19:53 GMT < Server: Apache < Expires: Wed, 13 May 2015 02:20:53 GMT < Vary: Accept-Encoding < Transfer-Encoding: chunked < Content-Type: text/html < { [data not shown] -------------------------------------------------------------------------- 冗余模式输出的内容太多了,请求报文以>号打头,响应报文以<号打头,还有以*号 打头的其他信息。上述命令删除了以*号打头的信息。 $ curl -vs http://www.nsfocus.net -o /dev/null 2> >(sed "/^[*{]/d") | cut -b3- -------------------------------------------------------------------------- GET / HTTP/1.1 User-Agent: curl/7.26.0 Host: www.nsfocus.net Accept: */* HTTP/1.1 200 OK Date: Wed, 13 May 2015 03:12:45 GMT Server: Apache Expires: Wed, 13 May 2015 03:13:45 GMT Vary: Accept-Encoding Transfer-Encoding: chunked Content-Type: text/html -------------------------------------------------------------------------- $ curl -vs http://www.nsfocus.net -o /dev/null 2> >(grep "^>") | cut -b1-2 --complement $ curl -vs http://www.nsfocus.net -o /dev/null 2> >(grep "^>") | cut -b3- $ curl -vs http://www.nsfocus.net -o /dev/null 2> >(grep "^>") | sed -e 's/^> \(.*\)/\1/g' $ curl -vs http://www.nsfocus.net -o /dev/null 2> >(sed -e '/^[^>].*/d' -e 's/^> \(.*\)/\1/g') -------------------------------------------------------------------------- GET / HTTP/1.1 User-Agent: curl/7.26.0 Host: www.nsfocus.net Accept: */* -------------------------------------------------------------------------- cut命令的"--complement"参数是取补集的意思。 下面提几个可能被误用于原始需求的curl参数: -D, --dump-header 不能用来显示请求报文的Header,它的目标对象是响应报文 $ curl -sD - http://www.nsfocus.net -o /dev/null -------------------------------------------------------------------------- HTTP/1.1 200 OK Date: Wed, 13 May 2015 02:34:51 GMT Server: Apache Expires: Wed, 13 May 2015 02:35:51 GMT Vary: Accept-Encoding Transfer-Encoding: chunked Content-Type: text/html -------------------------------------------------------------------------- -I, --head 发送HEAD请求并显示响应报文的Header。假设我们想看响应报文的Header,也不 应该使用它,毕竟HEAD请求不同于GET、POST请求。 $ curl -sI http://www.nsfocus.net -------------------------------------------------------------------------- HTTP/1.1 200 OK Date: Wed, 13 May 2015 02:39:09 GMT Server: Apache Expires: Wed, 13 May 2015 02:40:09 GMT Vary: Accept-Encoding Content-Type: text/html -------------------------------------------------------------------------- 与之前-D对比,少了"Transfer-Encoding:"。 除了冗余模式,curl还有一个参数可以满足原始需求: --trace --trace-ascii 针对请求、响应报文进行调试性转储。会覆盖之前的-v、--trace、--trace-ascii。 二者区别在于前者进行16进制格式转储,后者进行文本格式转储。 $ curl -s --trace - http://www.example.com -o /dev/null -------------------------------------------------------------------------- == Info: About to connect() to www.example.com port 80 (#0) == Info: Trying 93.184.216.34... == Info: connected == Info: Connected to www.example.com (93.184.216.34) port 80 (#0) => Send header, 79 bytes (0x4f) 0000: 47 45 54 20 2f 20 48 54 54 50 2f 31 2e 31 0d 0a GET / HTTP/1.1.. 0010: 55 73 65 72 2d 41 67 65 6e 74 3a 20 63 75 72 6c User-Agent: curl 0020: 2f 37 2e 32 36 2e 30 0d 0a 48 6f 73 74 3a 20 77 /7.26.0..Host: w 0030: 77 77 2e 65 78 61 6d 70 6c 65 2e 63 6f 6d 0d 0a ww.example.com.. 0040: 41 63 63 65 70 74 3a 20 2a 2f 2a 0d 0a 0d 0a Accept: */*.... == Info: additional stuff not fine transfer.c:1037: 0 0 == Info: HTTP 1.1 or later with persistent connection, pipelining supported <= Recv header, 17 bytes (0x11) 0000: 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d HTTP/1.1 200 OK. 0010: 0a . <= Recv header, 22 bytes (0x16) 0000: 41 63 63 65 70 74 2d 52 61 6e 67 65 73 3a 20 62 Accept-Ranges: b 0010: 79 74 65 73 0d 0a ytes.. <= Recv header, 31 bytes (0x1f) 0000: 43 61 63 68 65 2d 43 6f 6e 74 72 6f 6c 3a 20 6d Cache-Control: m 0010: 61 78 2d 61 67 65 3d 36 30 34 38 30 30 0d 0a ax-age=604800.. <= Recv header, 25 bytes (0x19) 0000: 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74 65 Content-Type: te 0010: 78 74 2f 68 74 6d 6c 0d 0a xt/html.. <= Recv header, 37 bytes (0x25) 0000: 44 61 74 65 3a 20 57 65 64 2c 20 31 33 20 4d 61 Date: Wed, 13 Ma 0010: 79 20 32 30 31 35 20 30 33 3a 30 36 3a 33 33 20 y 2015 03:06:33 0020: 47 4d 54 0d 0a GMT.. <= Recv header, 19 bytes (0x13) 0000: 45 74 61 67 3a 20 22 33 35 39 36 37 30 36 35 31 Etag: "359670651 0010: 22 0d 0a ".. <= Recv header, 40 bytes (0x28) 0000: 45 78 70 69 72 65 73 3a 20 57 65 64 2c 20 32 30 Expires: Wed, 20 0010: 20 4d 61 79 20 32 30 31 35 20 30 33 3a 30 36 3a May 2015 03:06: 0020: 33 33 20 47 4d 54 0d 0a 33 GMT.. <= Recv header, 46 bytes (0x2e) 0000: 4c 61 73 74 2d 4d 6f 64 69 66 69 65 64 3a 20 46 Last-Modified: F 0010: 72 69 2c 20 30 39 20 41 75 67 20 32 30 31 33 20 ri, 09 Aug 2013 0020: 32 33 3a 35 34 3a 33 35 20 47 4d 54 0d 0a 23:54:35 GMT.. <= Recv header, 24 bytes (0x18) 0000: 53 65 72 76 65 72 3a 20 45 43 53 20 28 69 61 64 Server: ECS (iad 0010: 2f 31 38 32 41 29 0d 0a /182A).. <= Recv header, 14 bytes (0xe) 0000: 58 2d 43 61 63 68 65 3a 20 48 49 54 0d 0a X-Cache: HIT.. <= Recv header, 22 bytes (0x16) 0000: 78 2d 65 63 2d 63 75 73 74 6f 6d 2d 65 72 72 6f x-ec-custom-erro 0010: 72 3a 20 31 0d 0a r: 1.. <= Recv header, 22 bytes (0x16) 0000: 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 Content-Length: 0010: 31 32 37 30 0d 0a 1270.. <= Recv header, 2 bytes (0x2) 0000: 0d 0a .. <= Recv data, 1127 bytes (0x467) 0000: 3c 21 64 6f 63 74 79 70 65 20 68 74 6d 6c 3e 0a . 0010: 3c 68 74 6d 6c 3e 0a 3c 68 65 61 64 3e 0a 20 20 .. 0020: 20 20 3c 74 69 74 6c 65 3e 45 78 61 6d 70 6c 65 Example 0030: 20 44 6f 6d 61 69 6e 3c 2f 74 69 74 6c 65 3e 0a Domain. 0040: 0a 20 20 20 20 3c 6d 65 74 61 20 63 68 61 72 73 . . -------------------------------------------------------------------------- $ curl -s --trace-ascii - http://www.example.com -o /dev/null -------------------------------------------------------------------------- == Info: About to connect() to www.example.com port 80 (#0) == Info: Trying 93.184.216.34... == Info: connected == Info: Connected to www.example.com (93.184.216.34) port 80 (#0) => Send header, 79 bytes (0x4f) 0000: GET / HTTP/1.1 0010: User-Agent: curl/7.26.0 0029: Host: www.example.com 0040: Accept: */* 004d: == Info: additional stuff not fine transfer.c:1037: 0 0 == Info: HTTP 1.1 or later with persistent connection, pipelining supported <= Recv header, 17 bytes (0x11) 0000: HTTP/1.1 200 OK <= Recv header, 22 bytes (0x16) 0000: Accept-Ranges: bytes <= Recv header, 31 bytes (0x1f) 0000: Cache-Control: max-age=604800 <= Recv header, 25 bytes (0x19) 0000: Content-Type: text/html <= Recv header, 37 bytes (0x25) 0000: Date: Wed, 13 May 2015 03:06:07 GMT <= Recv header, 19 bytes (0x13) 0000: Etag: "359670651" <= Recv header, 40 bytes (0x28) 0000: Expires: Wed, 20 May 2015 03:06:07 GMT <= Recv header, 46 bytes (0x2e) 0000: Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT <= Recv header, 24 bytes (0x18) 0000: Server: ECS (iad/182A) <= Recv header, 14 bytes (0xe) 0000: X-Cache: HIT <= Recv header, 22 bytes (0x16) 0000: x-ec-custom-error: 1 <= Recv header, 22 bytes (0x16) 0000: Content-Length: 1270 <= Recv header, 2 bytes (0x2) 0000: <= Recv data, 1270 bytes (0x4f6) 0000: ... Example Domain. 0040: . . . . ....
. 03c0:

Example Domain

.

This domain is established to be 0400: used for illustrative examples in documents. You may use this. 0440: domain in examples without prior coordination or asking for pe 0480: rmission.

.

More information...

.
... -------------------------------------------------------------------------- $ curl -s --trace-ascii - http://www.example.com -o /dev/null | grep -vP "^[<>=]" | cut -b7- -------------------------------------------------------------------------- GET / HTTP/1.1 User-Agent: curl/7.26.0 Host: www.example.com Accept: */* HTTP/1.1 200 OK Accept-Ranges: bytes Cache-Control: max-age=604800 Content-Type: text/html Date: Wed, 13 May 2015 03:10:45 GMT Etag: "359670651" Expires: Wed, 20 May 2015 03:10:45 GMT Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT Server: ECS (iad/182A) X-Cache: HIT x-ec-custom-error: 1 Content-Length: 1270 ... Example Domain. . . . . ....
.

Example Domain

.

This domain is established to be used for illustrative examples in documents. You may use this. domain in examples without prior coordination or asking for pe rmission.

.

More information...

.
... -------------------------------------------------------------------------- $ curl -s --trace-ascii - http://www.example.com -o /dev/null | grep -vP "^[<>=]" | cut -b7- | head -n 17 -------------------------------------------------------------------------- GET / HTTP/1.1 User-Agent: curl/7.26.0 Host: www.example.com Accept: */* HTTP/1.1 200 OK Accept-Ranges: bytes Cache-Control: max-age=604800 Content-Type: text/html Date: Wed, 13 May 2015 03:15:41 GMT Etag: "359670651" Expires: Wed, 20 May 2015 03:15:41 GMT Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT Server: ECS (iad/182A) X-Cache: HIT x-ec-custom-error: 1 Content-Length: 1270 -------------------------------------------------------------------------- 显然,就原始需求而言,--trace-ascii不如-v好用,前者无法有效分离请求报文的 Header。 前面已经很好地演示了使用curl时如何查看请求报文的Header,在shell script编程 中可以利用这些技术。现在我们演示在pycurl编程中查看请求报文的Header。 -------------------------------------------------------------------------- #!/usr/bin/env python # -*- encoding: utf-8 -*- import sys, pycurl from cStringIO import StringIO info = [] send_header = [] recv_header = [] recv_body = [] ssl_recv_body = [] def debug_callback ( debug_type, debug_data ) : if pycurl.INFOTYPE_TEXT == debug_type : info.append( debug_data ) elif pycurl.INFOTYPE_HEADER_OUT == debug_type : send_header.append( debug_data ) elif pycurl.INFOTYPE_HEADER_IN == debug_type : recv_header.append( debug_data ) # # 对于HTTPS,这里存放的是解密后的数据 # # 如果响应报文Header中含有 # # Content-Encoding: gzip # Content-Length: 8379 # # 这里存放的是压缩数据,并未自动解压还原。Content-Length对应压缩数据的 # 长度。但c.body.getvalue()返回的是解压后的数据。 # elif pycurl.INFOTYPE_DATA_IN == debug_type : recv_body.append( debug_data ) # # 对于HTTPS,这里存放的是解密前的数据 # elif pycurl.INFOTYPE_SSL_DATA_IN == debug_type : ssl_recv_body.append( debug_data ) # # end of debug_callback # httpheader = \ [ r'User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E)', r'Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*', r'Connection: close' ] c = pycurl.Curl() c.setopt( pycurl.URL, 'https://www.cia.gov' ) c.setopt( pycurl.HTTPHEADER, httpheader ) c.setopt( pycurl.FOLLOWLOCATION, True ) #c.setopt( pycurl.MAXREDIRS, 5 ) #c.setopt( pycurl.REDIR_PROTOCOLS, 3 ) #c.setopt( pycurl.PROTOCOLS, 3 ) c.setopt( pycurl.SSL_VERIFYPEER, False ) c.setopt( pycurl.SSL_VERIFYHOST, False ) c.setopt( pycurl.ENCODING, "" ) c.setopt( pycurl.SSL_CIPHER_LIST, "DEFAULT" ) c.setopt( pycurl.DEBUGFUNCTION, debug_callback ) # # 查看pycurl源码中的module.c、easy.c,pycurl没有真正支持CURLOPT_STDERR # # c.setopt( pycurl.STDERR, ... ) # # pycurl.NOPROGRESS缺省就是True # # c.setopt( pycurl.NOPROGRESS, True ) # # the DEBUGFUNCTION has no effect until we enable VERBOSE # c.setopt( pycurl.VERBOSE, True ) # # 不要用CURLOPT_NOBODY,它将导致HEAD请求 # # c.setopt( pycurl.NOBODY, True ) # # 靠下面两行代码捕捉响应报文的Body,避免其向stdout输出 # c.body = StringIO() c.setopt( pycurl.WRITEFUNCTION, c.body.write ) c.perform() print "".join( send_header ) print "".join( info ) print "".join( recv_header ) print c.body.getvalue() -------------------------------------------------------------------------- GET / HTTP/1.1 Host: www.cia.gov Accept-Encoding: deflate, gzip User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E) Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* Connection: close GET /index.html HTTP/1.1 Host: www.cia.gov Accept-Encoding: deflate, gzip User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E) Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* Connection: close Rebuilt URL to: https://www.cia.gov/ Trying 23.47.179.36... Connected to www.cia.gov (23.47.179.36) port 443 (#0) TLSv1.2, TLS Unknown, Unknown (22): TLSv1.2, TLS handshake, Client hello (1): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Server hello (2): SSLv2, Unknown (22): TLSv1.2, TLS handshake, CERT (11): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Server key exchange (12): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Server finished (14): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Client key exchange (16): SSLv2, Unknown (20): TLSv1.2, TLS change cipher, Client hello (1): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Finished (20): SSLv2, Unknown (20): TLSv1.2, TLS change cipher, Client hello (1): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Finished (20): SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 Server certificate: subject: jurisdictionC=US; C=US; ST=Virginia; L=McLean; businessCategory=Government Entity; serialNumber=Government Entity; O=Central Intelligence Agency; CN=www.cia.gov start date: 2015-04-01 00:00:00 GMT expire date: 2016-04-11 23:59:59 GMT issuer: C=US; O=Symantec Corporation; OU=Symantec Trust Network; CN=Symantec Class 3 EV SSL CA - G3 SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway. SSLv2, Unknown (23): SSLv2, Unknown (23): Closing connection 0 SSLv2, Unknown (21): TLSv1.2, TLS alert, Client hello (1): Issue another request to this URL: 'https://www.cia.gov/index.html' Hostname www.cia.gov was found in DNS cache Trying 23.47.179.36... Connected to www.cia.gov (23.47.179.36) port 443 (#1) SSL re-using session ID SSLv2, Unknown (22): TLSv1.2, TLS handshake, Client hello (1): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Server hello (2): SSLv2, Unknown (20): TLSv1.2, TLS change cipher, Client hello (1): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Finished (20): SSLv2, Unknown (20): TLSv1.2, TLS change cipher, Client hello (1): SSLv2, Unknown (22): TLSv1.2, TLS handshake, Finished (20): SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 Server certificate: subject: jurisdictionC=US; C=US; ST=Virginia; L=McLean; businessCategory=Government Entity; serialNumber=Government Entity; O=Central Intelligence Agency; CN=www.cia.gov start date: 2015-04-01 00:00:00 GMT expire date: 2016-04-11 23:59:59 GMT issuer: C=US; O=Symantec Corporation; OU=Symantec Trust Network; CN=Symantec Class 3 EV SSL CA - G3 SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway. SSLv2, Unknown (23): SSLv2, Unknown (23): Closing connection 1 SSLv2, Unknown (21): TLSv1.2, TLS alert, Client hello (1): HTTP/1.1 301 Moved Permanently Content-Length: 0 Location: https://www.cia.gov/index.html Date: Wed, 13 May 2015 06:55:35 GMT Connection: close HTTP/1.1 200 OK ETag: "d0add78de1a39a4786dd944690fb2543:1431444402" Last-Modified: Tue, 12 May 2015 15:14:21 GMT Accept-Ranges: bytes Content-Type: text/html Vary: Accept-Encoding Content-Encoding: gzip Date: Wed, 13 May 2015 06:55:36 GMT Content-Length: 8379 Connection: close ... -------------------------------------------------------------------------- 真正的要点是: c.setopt( pycurl.DEBUGFUNCTION, debug_callback ) 在回调函数debug_callback()中可以任意处理请求、响应报文的Header、Body。 curl/libcurl/pycurl会自动处理HTTPS所涉及的加解密环节,在临时测试HTTPS时, 前述技术极大地方便了测试者,使得我们专注于明文部分。