This Perl script uses the Google API to search Google for a query string and returns a list of the web hosts found in the results. You can then expand any of these hosts to display only the results from that host.

Hosts are listed for 10 search results at a time; use the Next, Previous, and First links above the list of hosts to page through them. Fewer than 10 hosts may appear when several of those 10 results are on the same host. A set of hosts may also include some you've already seen on previous pages. The script simply gathers the host names from the 10 results returned by Google and does not try to filter out ones you've seen before. Google's own filter does a passable job of this, however.
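For the curious, the host-gathering step amounts to something like the following. This is a minimal Python sketch, not the script's own Perl code. It assumes each result arrives as a plain URL string, and `hosts_in_page` is a hypothetical name. As described above, duplicates are dropped only within one page of results; hosts from earlier pages are not remembered.

```python
from urllib.parse import urlsplit


def hosts_in_page(result_urls):
    """Return the distinct host names in one page of results, in first-seen order.

    Only duplicates within this page are removed; hosts seen on earlier
    pages are not tracked (matching the behavior described above).
    """
    hosts = []
    seen = set()
    for url in result_urls:
        # .hostname is lowercased with any port stripped; None if the URL has no host
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts
```

For example, `hosts_in_page(["http://www.foo.com/a.html", "http://www.foo.com/b.html", "http://example.org/"])` gives two hosts rather than three, which is why a page can show fewer than 10.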

Clicking on the triangle to the left of a host will perform the same query again, but restricted to that host (using Google's "site:www.foo.com" query syntax), and expand the listing to display the first 10 results. If there are more results, you can page through the results for the host with the Next, Previous, and First links directly under the host name. Clicking on the triangle next to an expanded host will collapse the listing.
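Expanding a host and paging within it boils down to two small calculations, sketched here in Python. The sketch assumes that result offsets start at zero and advance in steps of 10, since each query returns at most 10 results. `site_query` and `paging_offsets` are hypothetical names, not functions from the GAWSH source.

```python
RESULTS_PER_PAGE = 10  # each query returns at most 10 results


def site_query(query, host):
    """Restrict the original query to one host with Google's site: syntax."""
    return f"{query} site:{host}"


def paging_offsets(start, total=None):
    """Return the result offsets behind the First, Previous, and Next links.

    `start` is the zero-based index of the first result on the current page;
    `total` is the (estimated) number of results, if known. A link whose
    offset is None would not be shown.
    """
    previous = max(start - RESULTS_PER_PAGE, 0) if start > 0 else None
    next_start = start + RESULTS_PER_PAGE
    if total is not None and next_start >= total:
        next_start = None  # nothing further to page to
    return {"first": 0, "previous": previous, "next": next_start}
```

Under these assumptions, every First, Previous, or Next click is simply `site_query(query, host)` re-sent with a different start offset.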

If you want more flexibility in navigating through a particular set of results, click the View in Google link under the name of the host you're interested in. The same query will open directly in Google in a new browser window.
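A View in Google link could be built roughly as below. This Python sketch assumes the link points at Google's ordinary /search page with the query in the `q` parameter. The real script's URL and markup may differ, and `view_in_google_url`/`view_in_google_link` are hypothetical names. `target="_blank"` is what opens the result in a new browser window.

```python
from html import escape
from urllib.parse import urlencode


def view_in_google_url(query, host):
    """URL-encode the host-restricted query into a google.com search URL (assumed format)."""
    return "https://www.google.com/search?" + urlencode({"q": f"{query} site:{host}"})


def view_in_google_link(query, host):
    """Wrap the URL in an HTML link that opens in a new window."""
    url = escape(view_in_google_url(query, host))  # HTML-escape for use in an attribute
    return f'<a href="{url}" target="_blank">View in Google</a>'
```

URL-encoding matters here because queries routinely contain spaces, quotes, and colons. `urlencode` turns spaces into `+` and `:` into `%3A`, which plays the same role URI::Escape does for the Perl scripts.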

Form Fields

Find: The query string to be passed to Google. You can use Google's normal query syntax (quoted phrases, OR, + and -, etc.) and special keywords such as daterange:, intitle:, and allinurl:. The link: and related: keywords will not work, however.

License key: Google requires a license key to be passed with each query. Keys are assigned when a developer signs up to use the Google API, and each key currently allows 1000 queries per day. By default, these scripts use staggernation.com's key. Because the scripts make multiple queries (every time you expand a listing in the outline or click a Next or Previous link, that's another query), we might hit that limit rather quickly. So if you've signed up for the Google API program and aren't approaching the 1000-query limit with your own key, we would appreciate it if you could enter it here, especially if you plan to use these scripts extensively. Staggernation.com will not store your key or do anything with it except pass it to Google for whatever searches you do with these scripts.
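The key logic described above can be sketched in Python as follows. It uses the visitor's key when one is entered and falls back to the site's default otherwise. Since every expand, Next, or Previous click spends one query, it also counts queries per key per day. `pick_key` and `QuotaTracker` are hypothetical illustrations. Google enforced the real limit on its own side, and nothing here is taken from the GAWSH source.

```python
from datetime import date

DAILY_LIMIT = 1000  # queries allowed per key per day, as stated above


def pick_key(user_key, default_key):
    """Use the visitor's own key if one was entered, else the site's default key."""
    user_key = (user_key or "").strip()
    return user_key or default_key


class QuotaTracker:
    """Hypothetical count of queries sent per key per day."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.counts = {}  # (key, day) -> number of queries sent

    def record(self, key, day=None):
        """Count one query; return False if this key's limit for the day is used up."""
        slot = (key, day or date.today())
        used = self.counts.get(slot, 0)
        if used >= self.limit:
            return False
        self.counts[slot] = used + 1
        return True
```

A single session that expands five hosts and pages twice within each one already spends 1 + 5 + 10 = 16 queries. That is why one shared key can run out quickly.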

DHTML Outline

The GAWSH script uses a hidden frame plus some JavaScript and DHTML to generate a dynamic outline (notice that the whole outline does not reload when you expand or collapse a topic). This currently relies on the DHTML innerHTML property, which is supported only by Internet Explorer 5 and up and Mozilla/Netscape 6, so the script works only in those browsers.
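The hidden-frame technique works roughly like this: clicking a triangle loads a page into the hidden frame, and that page's script rewrites one element of the visible outline through innerHTML. Below is a minimal Python sketch of the server side. The frame name `main`, the element ids, and `hidden_frame_page` are all hypothetical, not GAWSH's actual names.

```python
import json


def hidden_frame_page(element_id, fragment_html):
    """Build the page loaded into the hidden frame after an expand click.

    When it loads, its script replaces the contents of one outline element
    in the visible frame (assumed to be named "main") via innerHTML, so the
    rest of the outline is not reloaded.
    """
    # json.dumps produces a valid JavaScript string literal (non-ASCII is
    # \u-escaped by default); rewriting "</" as "<\/" stops a "</script>"
    # inside the fragment from ending the script block early.
    js_html = json.dumps(fragment_html).replace("</", "<\\/")
    js_id = json.dumps(element_id)
    return (
        "<html><body><script>\n"
        f"parent.main.document.getElementById({js_id}).innerHTML = {js_html};\n"
        "</script></body></html>"
    )
```

Collapsing a host would work the same way with an empty fragment. A W3C-DOM version would replace the innerHTML assignment with node-building calls.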

It probably wouldn't be too difficult to implement a W3C-DOM-compatible version of this outlining technique. If you'd like us to try, or have any other suggestions for cross-browser ways of doing a dynamic outline, please contact us (see below). And if you want to try re-coding the JavaScript yourself (it's not very complicated), by all means grab the source code and see what you can do.

Source Code

GAWSH has the following components:

  • index.html: The main page, a frameset that calls the CGI script into its lower frame; if the script is called into the top level of a browser window, it will redirect to this page.
  • blank.html: A blank page that's initially loaded into the hidden top frame. Don't look to it for wisdom, for it has none to impart.
  • gawsh.cgi: The Perl CGI script.
  • ga_lib.pl: A Perl code library with some routines shared by the Google API scripts.
  • ga_outlinelib.pl: A Perl code library with some routines shared by the scripts that do the DHTML outlining thing (currently, GAWSH and GARBO).
Please feel free to download, peruse, and improve as you see fit. The scripts were coded hastily, so there's probably plenty of room for improvement.

If you'd like to host a mirrored version of any of these scripts on your own site, that would be great. A standard Perl installation plus SOAP::Lite and URI::Escape (and a Google API license key) should be all you need.

Change History

4/24/02 - version 1.0 released

To Do

Possible future enhancements, some less bloody likely than others:

  • W3C-DOM version of dynamic outlining code
  • Use young, hip CSS instead of old, stodgy spacer GIFs to indent outline elements
  • Show ODP page summary and category if present
  • Display Google "search comments" field if returned
  • Option to sort hosts
  • Option to use domains as the first outline level, which would expand to hosts
  • Use the inurl: keyword to create lower outline levels, somehow based on path components beyond the hostname. This would be especially useful when many sites share the same host, e.g. www.geocities.com.

Contact

Email googlescripts [AT] staggernation [DOT] com with questions, comments, bug reports, feature requests, and like that.


The End As I Know It: A Novel of Millennial Anxiety, by staggernation.com proprietor Kevin Shay, is now available in paperback.

Please visit kshay.com for more information.