
Readme for analog 4.01

Search arguments

Sometimes a URL contains arguments after a question mark. For example, the URL
/cgi-bin/script.pl?x=1&y=2
runs the /cgi-bin/script.pl program with arguments x=1 and y=2. (Sometimes the server records these arguments in a separate field in the logfile. If so, you can use the %q field in the LOGFORMAT command, and analog will translate the filename to the above format.)
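
As a rough illustration of the split described above, the following Python sketch separates a logged URL into its filename and arguments, and rebuilds the combined form when the arguments were logged in a separate field. It is not analog's own code; the function names split_args and join_args are made up for this example.

```python
def split_args(url):
    """Split a logged URL at the first question mark (hypothetical helper)."""
    filename, sep, args = url.partition("?")
    return filename, (args.split("&") if sep and args else [])


def join_args(filename, query):
    """Rebuild the combined form from a separately logged query field."""
    return f"{filename}?{query}" if query else filename


filename, args = split_args("/cgi-bin/script.pl?x=1&y=2")
print(filename)  # /cgi-bin/script.pl
print(args)      # ['x=1', 'y=2']
print(join_args("/cgi-bin/script.pl", "x=1&y=2"))  # /cgi-bin/script.pl?x=1&y=2
```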

You can tell analog either to read or to ignore the arguments using the commands ARGSINCLUDE and ARGSEXCLUDE, which are discussed below. By default, all arguments are read, and as this is usually what you want, you rarely need those commands.

You don't always see the arguments in the reports, even if they're being read, because analog omits any that fall below the corresponding ARGSFLOOR threshold. To see them, you have to set that ARGSFLOOR parameter low enough.

Also note that within a report, the search arguments are listed immediately under the file to which they refer. This temporarily interrupts the normal order of the files. It may be clearer if you turn the N column on.
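
The layout described above can be pictured with a small Python sketch. It is only an illustration, not how analog builds its reports: the function request_report and its args_floor parameter are hypothetical, and the floor is assumed here to be a minimum number of requests.

```python
from collections import Counter, defaultdict


def request_report(requests, args_floor=1):
    """List files by total requests, with argument lines nested underneath."""
    totals = Counter()
    by_args = defaultdict(Counter)
    for url in requests:
        filename, sep, args = url.partition("?")
        totals[filename] += 1
        if sep:
            by_args[filename][args] += 1

    lines = []
    for filename, n in totals.most_common():
        lines.append(f"{n:5d}  {filename}")
        # Argument lines sit directly under their file, if they reach the floor.
        for args, m in by_args[filename].most_common():
            if m >= args_floor:
                lines.append(f"{m:5d}    ?{args}")
    return lines


sample = (["/a.html"] * 5
          + ["/cgi-bin/script.pl?x=1&y=2"] * 3
          + ["/cgi-bin/script.pl?x=2"]
          + ["/cgi-bin/script.pl"] * 2
          + ["/b.html"] * 4)
report = request_report(sample, args_floor=2)
print("\n".join(report))
```

In this sample the argument line with 3 requests appears between files with 6 and 5 requests, which is the interruption of the normal order mentioned above. The ?x=2 line is dropped because it falls below the assumed floor of 2.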


Assuming that the arguments are being read, analog treats the file /cgi-bin/script.pl?x=1&y=2 as a different file from /cgi-bin/script.pl (or from /cgi-bin/script.pl?y=2&x=1 for that matter). It doesn't look like that in the Request Report because you see a grand total for /cgi-bin/script.pl with all its different arguments. But it matters if you want to do inclusions and exclusions or aliases on the file.
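
To make the distinction concrete, here is a minimal Python sketch, assuming each logged filename is simply used as a counting key. The request list is invented for the example.

```python
from collections import Counter

hits = Counter([
    "/cgi-bin/script.pl?x=1&y=2",
    "/cgi-bin/script.pl?y=2&x=1",   # same arguments, different order
    "/cgi-bin/script.pl",
    "/cgi-bin/script.pl?x=1&y=2",
])

# Each spelling is a separate file as far as the counting goes.
distinct_files = len(hits)

# The grand total groups everything that shares the part before the "?".
grand_total = sum(
    n for name, n in hits.items()
    if name.partition("?")[0] == "/cgi-bin/script.pl"
)

print(distinct_files)  # 3
print(grand_total)     # 4
```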

The reason is that, for example, the command

FILEINCLUDE /cgi-bin/script.pl
doesn't match the file /cgi-bin/script.pl?x=1&y=2. To match that, you would have to use something like
FILEINCLUDE /cgi-bin/script.pl*
instead. Similarly
FILEALIAS /cgi-bin/script.pl /script.pl
will change /cgi-bin/script.pl itself, but not /cgi-bin/script.pl?x=1&y=2. You might want to use something like
FILEALIAS /cgi-bin/script.pl?* /script.pl?$1
as well. (However, PAGEINCLUDE and PAGEEXCLUDE always refer to the part of the filename before the question mark.)
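
The matching behaviour above can be imitated in a few lines of Python. This is a sketch under stated assumptions, not analog's matcher: it assumes * is the only wildcard, that ? is an ordinary character, and that $1 stands for the text matched by the first *. The helper names compile_pattern, matches and alias are hypothetical.

```python
import re


def compile_pattern(pattern):
    """Build a regex in which each * becomes a capture group; all else is literal."""
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("(.*)".join(parts), re.DOTALL)


def matches(pattern, filename):
    """True if the whole filename, arguments included, matches the pattern."""
    return compile_pattern(pattern).fullmatch(filename) is not None


def alias(pattern, replacement, filename):
    """Rewrite filename if it matches, substituting $1, $2, ... from the wildcards."""
    m = compile_pattern(pattern).fullmatch(filename)
    if m is None:
        return filename
    return re.sub(r"\$(\d)", lambda r: m.group(int(r.group(1))), replacement)


name = "/cgi-bin/script.pl?x=1&y=2"
print(matches("/cgi-bin/script.pl", name))    # False
print(matches("/cgi-bin/script.pl*", name))   # True
print(alias("/cgi-bin/script.pl", "/script.pl", name))        # unchanged
print(alias("/cgi-bin/script.pl?*", "/script.pl?$1", name))   # /script.pl?x=1&y=2
```
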

The alternative is to tell analog not to read the search arguments. The commands ARGSINCLUDE and ARGSEXCLUDE, and REFARGSINCLUDE and REFARGSEXCLUDE, do this. They work the same way as the other INCLUDE and EXCLUDE commands, which we discussed in the previous section. So, for example, if the command
ARGSEXCLUDE /cgi-bin/script.pl
were given, analog would ignore the arguments to that file, and so read /cgi-bin/script.pl?x=1&y=2 as just /cgi-bin/script.pl. On the other hand, if
ARGSINCLUDE /cgi-bin/script.pl
were specified, analog would read the arguments, and so treat /cgi-bin/script.pl?x=1&y=2 as a different file from /cgi-bin/script.pl. REFARGSINCLUDE and REFARGSEXCLUDE are the same for referrers.
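
The effect of ARGSEXCLUDE on a single filename can be sketched as follows. This is a simplified model, not analog's logic: it takes a plain list of exclusion patterns, assumes each pattern is compared with the part of the filename before the question mark (as the example above suggests), and ignores how several INCLUDE and EXCLUDE commands interact. The names wildcard_match and read_filename are hypothetical.

```python
import re


def wildcard_match(pattern, text):
    """Match text against a pattern in which * is the only wildcard."""
    regex = ".*".join(re.escape(p) for p in pattern.split("*"))
    return re.fullmatch(regex, text, re.DOTALL) is not None


def read_filename(logged, args_exclude=()):
    """Return the name used for counting: arguments dropped if excluded."""
    filename, _, _ = logged.partition("?")
    if any(wildcard_match(p, filename) for p in args_exclude):
        return filename   # arguments ignored
    return logged         # arguments read, so this stays a separate file


logged = "/cgi-bin/script.pl?x=1&y=2"
print(read_filename(logged, args_exclude=["/cgi-bin/script.pl"]))  # /cgi-bin/script.pl
print(read_filename(logged))                                       # /cgi-bin/script.pl?x=1&y=2
```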

Technical note: the check for whether the arguments should be included happens before the filename has been subject to either built-in or user-specified aliases. So you have to use the unaliased name, exactly as it occurs in the logfile. For example, ARGSINCLUDE /~sret1/script.pl won't match /%7Esret1/script.pl even though they are really the same file. It also means that you can't use "pages" in the ARGSINCLUDE or ARGSEXCLUDE command, because we don't know whether a file is a page until after it's been aliased.
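
The ordering issue in the technical note can be shown with a short Python sketch. Here urllib.parse.unquote merely stands in for the built-in alias that turns %7E into ~; the logged line is invented for the example.

```python
from urllib.parse import unquote

logged = "/%7Esret1/script.pl?x=1"
pattern = "/~sret1/script.pl"

# The argument check sees the name exactly as it occurs in the logfile.
raw_name = logged.partition("?")[0]
matches_before_alias = (raw_name == pattern)

# Only after decoding would the two names agree, and that happens too late.
matches_after_alias = (unquote(raw_name) == pattern)

print(matches_before_alias)  # False
print(matches_after_alias)   # True
```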


There is a related command called SEARCHENGINE. If you have referrers with search arguments, usually from search engines, you can tell analog which field corresponds to the search term. It uses this information to compile the Search Query Report and the Search Word Report. For example, consider the referrer
http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=carrot+cake
The search term is in the field q= so the appropriate SEARCHENGINE command is
SEARCHENGINE http://www.altavista.com/cgi-bin/query q
or even better
SEARCHENGINE http://*.altavista.*/* q
to allow for all their mirror sites in different countries.
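
A minimal Python sketch of the idea, using the referrer above: match the referrer against the pattern, pull out the named field, then count whole queries and individual words. It is an illustration only. The helper names are hypothetical, and matching the pattern against the part of the referrer before the question mark is an assumption of this sketch.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def wildcard_match(pattern, text):
    """Match text against a pattern in which * is the only wildcard."""
    regex = ".*".join(re.escape(p) for p in pattern.split("*"))
    return re.fullmatch(regex, text, re.DOTALL) is not None


def search_term(referrer, engine_pattern, field):
    """Return the decoded search term, or None if the rule does not apply."""
    base = referrer.partition("?")[0]
    if not wildcard_match(engine_pattern, base):
        return None
    values = parse_qs(urlsplit(referrer).query).get(field)
    return values[0] if values else None


referrer = "http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=carrot+cake"
term = search_term(referrer, "http://*.altavista.*/*", "q")

queries, words = Counter(), Counter()
if term is not None:
    queries[term] += 1          # whole query, as in a Search Query Report
    words.update(term.split())  # single words, as in a Search Word Report

print(term)  # carrot cake
```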

Sometimes a search engine has two or more possible fields for the search term. In that case you can list all of them separated by commas, like this:

SEARCHENGINE http://*.webcrawler.*/* search,searchText
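
Handling a comma-separated list of fields might look like the Python sketch below. Trying the fields in the order listed is an assumption of this sketch, the function name first_search_term is hypothetical, and the referrer URLs are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def first_search_term(referrer, fields):
    """Return the value of the first listed field present in the referrer."""
    params = parse_qs(urlsplit(referrer).query)
    for field in fields.split(","):
        if field in params:
            return params[field][0]
    return None


# Hypothetical referrers; only the field names come from the command above.
a = first_search_term("http://www.webcrawler.com/find?searchText=carrot+cake",
                      "search,searchText")
b = first_search_term("http://www.webcrawler.com/find?search=apple+pie",
                      "search,searchText")
print(a)  # carrot cake
print(b)  # apple pie
```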

Stephen Turner
