httpGrab.pl Documentation
- NAME
- Purpose
- SCRIPT CATEGORIES
- PREREQUISITES
- COREQUISITES
- OSNAMES
- Description
- Usage
- Outstanding issues
NAME
httpGrab.pl
Purpose
httpGrab.pl uses the LWP library to make HTTP requests.
SCRIPT CATEGORIES
HTTP - suggested
PREREQUISITES
This script depends on both the strict and vars pragmas. The script also uses
the LWP::UserAgent and Getopt::Long modules.
COREQUISITES
If the Time::HiRes module is available, it is used to generate
higher-resolution timings on the time test criterion and the script timing.
If the MIME::Base64 module is available, it can be used to generate Basic
HTTP authentication for a request.
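One common way to treat modules as optional is to load them at run time inside an eval and fall back when the load fails. The following is a minimal sketch of that idea, not the script's actual code; the helper names have_module and now are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time.
# Returns 1 if the module loaded, 0 otherwise.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Time::HiRes -> Time/HiRes.pm
    return eval { require $file; 1 } ? 1 : 0;
}

my $HAVE_HIRES  = have_module('Time::HiRes');
my $HAVE_BASE64 = have_module('MIME::Base64');

# Hypothetical helper: fractional seconds when Time::HiRes is present,
# whole seconds from the built-in time() otherwise.
sub now {
    return $HAVE_HIRES ? Time::HiRes::time() : time();
}

print "Time::HiRes available:  ", ($HAVE_HIRES  ? "yes" : "no"), "\n";
print "MIME::Base64 available: ", ($HAVE_BASE64 ? "yes" : "no"), "\n";
```

With this pattern, features that need a missing module can be disabled or degraded instead of stopping the script.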
OSNAMES
any. Tested on MSWin32 and Linux.
Description
httpGrab.pl uses the LWP library to make HTTP requests. Although the output is normally written to STDOUT, httpGrab.pl can also be used in simple profiling and other tests.
httpGrab.pl began as a simple script to understand the LWP::UserAgent
module. Although it was originally written as a throw-away script, it was useful
in working on dynamic web sites. As time went on, the script was enhanced with
various extra features to make it useful for the work I was doing. This
accumulation of features has resulted in a script unlike any other of its type.
Usage
Usage: httpGrab.pl [options] url
   or: httpGrab.pl [options] -f file

Where options are any of

  -m method           perform a 'method' request instead of a GET
  -a agent-string     provide a user-agent string
  -c cookie           provide a cookie, can be used multiple times
                      for multiple cookies
  -t content-type     provide a content-type for POST requests
  -x proxy            provide a proxy server for the request
  -H header=value     provide header data for the request, can be
                      used multiple times, argument is of the form
                      header=value
  -A userid:password  provide a userid and password string for
                      HTTP Basic authentication

  -b                  output the body of the response
  -B                  output the body of the response, forced
                      binary write
  -h                  output the headers of the response
  -r                  output the response line

  -p [n]              profiling, return the time taken in seconds,
                      supports an optional number of repetitions
  -n number           number of repetitions, when profiling
                      (deprecated)

  -f file             load urls from a file instead of from command
                      line
  -s                  simple request, do not follow redirects
The options are separated into four groupings. The first set defines the request. The second set defines what parts of the response are printed. The third set specifies profiling options. The last set specifies a file containing a list of URLs to request and whether redirects are followed.
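The option summary above maps fairly directly onto a Getopt::Long specification. The sketch below is one possible way to declare it, not necessarily how httpGrab.pl itself does; the parse_args helper and the use of the 'bundling' and 'no_ignore_case' settings are assumptions, chosen so that -h/-H and -b/-B stay distinct and flags can be combined as in -rhb.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Assumption: case-sensitive single-letter options that can be bundled (-rhb).
Getopt::Long::Configure('bundling', 'no_ignore_case');

# Hypothetical helper: parse a list of arguments, return the option hash
# and the remaining (non-option) arguments, such as the URL.
sub parse_args {
    my @args = @_;
    my %opt  = (m => 'GET');            # documented default method
    GetOptionsFromArray(
        \@args, \%opt,
        'm=s', 'a=s', 'c=s@', 't=s', 'x=s', 'H=s@', 'A=s',  # request
        'b', 'B', 'h', 'r',                                 # response
        'p:i', 'n=i',                                       # profiling
        'f=s', 's',                                         # URL file, redirects
    ) or die "invalid options\n";
    $opt{c} ||= [];                     # repeated options default to empty lists
    $opt{H} ||= [];
    return (\%opt, \@args);
}

my ($opt, $rest) = parse_args(qw(-m POST -c a=1 -c b=2 -rh http://example.com/));
print "method:  $opt->{m}\n";
print "cookies: @{ $opt->{c} }\n";
print "url:     $rest->[0]\n";
```

Here 'c=s@' and 'H=s@' collect repeated options into arrays, and 'p:i' makes the repetition count optional.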
Request Options
The -m option allows the user to specify the HTTP method to use on this request. The default is 'GET'. If the method specified is 'POST', httpGrab.pl will retrieve the body of the request from STDIN.
The -a option specifies a user agent identification. This is particularly useful for pages that have different behavior for different browsers. The default user-agent string is 'httpGrab/0.92'.
The -c option specifies a cookie to be sent with the request. The -c option may be supplied multiple times to send multiple cookies. The value of the argument for this option is of the form 'name=value'.
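As an illustration of how several -c values could end up in one request, the sketch below joins 'name=value' pairs into a single Cookie header value, separated by "; ". The cookie_header helper is hypothetical and is not taken from the script.

```perl
use strict;
use warnings;

# Hypothetical helper: combine repeated -c arguments into one Cookie
# header value. Each argument must look like name=value.
sub cookie_header {
    my @cookies = @_;
    for my $cookie (@cookies) {
        die "cookie '$cookie' is not of the form name=value\n"
            unless $cookie =~ /^[^=]+=/;
    }
    return join('; ', @cookies);
}

print cookie_header('session=abc123', 'theme=dark'), "\n";
```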
The -t option specifies the content-type header for a 'POST' request. The default is 'application/x-www-form-urlencoded'. This content type is ignored unless the specified method is 'POST'.
The -x option specifies a proxy server to use when attempting to reach the specified URLs.
The -H option provides a header for the request. This option can be supplied multiple times for multiple headers. The argument for this option is of the form 'header=value'.
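A 'header=value' argument has to be split at the first '=' only, since the value itself may contain '=' characters. A minimal sketch, using a hypothetical parse_header_arg helper:

```perl
use strict;
use warnings;

# Hypothetical helper: split a -H argument into (name, value).
# The limit of 2 keeps any later '=' characters inside the value.
sub parse_header_arg {
    my ($arg) = @_;
    my ($name, $value) = split /=/, $arg, 2;
    die "header '$arg' is not of the form header=value\n"
        unless defined $value && length $name;
    return ($name, $value);
}

my ($name, $value) = parse_header_arg('X-Token=abc=def');
print "$name: $value\n";
```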
The -A option provides a userid and password for use with the HTTP Basic authentication scheme.
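For HTTP Basic authentication, the 'userid:password' string is Base64-encoded and sent in an Authorization header. The sketch below shows that encoding with MIME::Base64; the basic_auth_header helper is hypothetical, and how httpGrab.pl attaches the header to the request is not shown here.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: build the Authorization header value from a
# userid:password string. The empty second argument to encode_base64
# suppresses the trailing newline it would otherwise append.
sub basic_auth_header {
    my ($userpass) = @_;
    die "expected userid:password\n" unless $userpass =~ /:/;
    return 'Basic ' . encode_base64($userpass, '');
}

print 'Authorization: ', basic_auth_header('user:pass'), "\n";
```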
Response Options
By default, httpGrab.pl prints all of the response to STDOUT. This behavior can be modified through the use of one or more of the following options. To duplicate the default behavior, use the options -rhb.
The -b option causes the body of the response to be written to STDOUT.
The -h option causes the headers of the response to be written to STDOUT.
The -r option causes the response line to be written to STDOUT.
The -B option causes the body of the response to be written to STDOUT, just like the -b option, except that the output is written in binary mode. This is particularly useful for retrieving binary data that the server has misidentified, because it prevents the translation (of line endings, for example) that some operating systems apply to text output.
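The selection rules above can be sketched as two small routines: one that falls back to "everything" when no response option is given, and one that switches the output handle to binary mode before writing the body. Both helper names are hypothetical, and the use of binmode is an assumption about how a forced binary write would be done in Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: which parts to print, given the flags that were set.
# With no flags at all, behave like -rhb.
sub parts_to_print {
    my (%flag) = @_;
    my @chosen = grep { $flag{$_} } qw(r h b B);
    return @chosen ? @chosen : qw(r h b);
}

# Hypothetical helper: write the body to a filehandle. binmode turns off
# newline translation on platforms that would otherwise apply it.
sub write_body {
    my ($fh, $body, $binary) = @_;
    binmode($fh) if $binary;
    print {$fh} $body;
    return;
}

print join(' ', parts_to_print()), "\n";          # no flags given
print join(' ', parts_to_print(h => 1)), "\n";    # only -h given
```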
Profiling Options
The profiling options perform only the most basic timing test. The time is measured from the beginning of the request until the response is completely returned. The profiling in httpGrab.pl does not (currently) support downloading any embedded components of the page, such as images or stylesheets.
The -p option turns on profiling and optionally supplies a number of times to repeat the request for better accuracy.
The deprecated -n option supplies a number of times to repeat the request. That ability is now provided by the -p option.
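The basic timing test described above amounts to recording the time before and after a number of repeated requests. A minimal sketch with Time::HiRes follows; the time_repeated helper is hypothetical, the request is stood in for by a code reference, and treating a missing count as one repetition is an assumption.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run $code $n times and return the total and the
# average elapsed time in (fractional) seconds.
sub time_repeated {
    my ($code, $n) = @_;
    $n = 1 if !defined $n || $n < 1;    # assumption: no count means once
    my $start = time();
    $code->() for 1 .. $n;
    my $total = time() - $start;
    return ($total, $total / $n);
}

# Stand-in for a real request; a real run would issue the HTTP request here.
my ($total, $average) = time_repeated(sub { my $x = 0; $x += $_ for 1 .. 1000 }, 5);
printf "total %.6f s, average %.6f s\n", $total, $average;
```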
URL File
The -f option specifies a file from which to read a list of URLs. These URLs will be requested in order by httpGrab.pl. The only potentially surprising behavior is the interaction between -f and -p. If -p supplies a number, httpGrab.pl makes the request that many times on the first URL, then makes the same number of requests on the second URL, and so on.
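The ordering described above corresponds to an outer loop over URLs with an inner loop over repetitions. The sketch below illustrates it; the helper names are hypothetical, and the one-URL-per-line file format with blank lines skipped is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: read URLs from a filehandle, one per line
# (assumed format), skipping blank lines.
sub read_urls {
    my ($fh) = @_;
    my @urls;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;        # trim whitespace and line endings
        push @urls, $line if length $line;
    }
    return @urls;
}

# Hypothetical helper: all repetitions of the first URL, then all
# repetitions of the second, and so on.
sub request_plan {
    my ($urls, $reps) = @_;
    return map { my $url = $_; ($url) x $reps } @$urls;
}

my $data = "http://example.com/a\n\nhttp://example.com/b\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my @urls = read_urls($in);
close $in;
print "$_\n" for request_plan(\@urls, 2);
```

With a count of 2, the plan is a, a, b, b rather than a, b, a, b.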
Outstanding issues
In general, httpGrab.pl works fairly well, but there are a few features I would like to add at some point.
- Ability to do SSL.
- Ability to retrieve embedded content and stylesheets for timing.
To my knowledge there are no bugs in the current release.