{= include("docs.util"); start_docs_page(docs.technical_manual.page_titles.xref); =}
$PRODUCT_NAME lets you "zoom in" using complex filters, for instance to break down the events on any particular day by page (in a web log, to see which pages were hit on that day), or to break down the events on any page by day (to see which days the page was accessed). $PRODUCT_NAME can be configured to allow this sort of cross-referencing between any or all fields in the database. This zooming ability is always available, but without cross-reference tables $PRODUCT_NAME must scan the entire main table to compute results, which can be slow for large datasets. Cross-reference tables provide "roll-ups" of common queries, so those queries can be answered quickly without reference to the main log data table.
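The roll-up idea can be illustrated with a minimal Python sketch. This is not how $PRODUCT_NAME stores its tables; the `MAIN_TABLE` rows, the field names, and the `build_xref` and `pages_on_day` helpers are all hypothetical and exist only to show the principle of pre-aggregating hit counts by a pair of fields.

```python
from collections import Counter

# Hypothetical main table: one row per log event.
MAIN_TABLE = [
    {"date": "2004-06-01", "page": "/index.html"},
    {"date": "2004-06-01", "page": "/myfile.html"},
    {"date": "2004-06-02", "page": "/myfile.html"},
    {"date": "2004-06-02", "page": "/myfile.html"},
]

def build_xref(rows, fields):
    """Roll up rows into hit counts keyed by the values of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)

def pages_on_day(xref, day):
    """Break down one day's hits by page, reading only the roll-up."""
    return {page: hits for (date, page), hits in xref.items() if date == day}

# Built once, at database build time; queried many times afterwards.
date_page_xref = build_xref(MAIN_TABLE, ("date", "page"))
```

Once `date_page_xref` exists, `pages_on_day(date_page_xref, "2004-06-02")` never touches `MAIN_TABLE`. The roll-up has one entry per distinct (date, page) pair rather than one per event, which is where the speed comes from.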
Cross-references are not an enabling feature, as they were in earlier versions of $PRODUCT_NAME -- all reports are available, even if no cross-reference tables are defined. Cross-reference tables are an optimizing feature, which increases the speed of certain queries.
Another way of looking at this feature is in terms of filters; when two fields are cross-referenced against each other, $PRODUCT_NAME is able to apply filters quickly to both fields at the same time, without needing to access the main table.
If two fields are not cross-referenced against each other, $PRODUCT_NAME can apply filters to one field or the other quickly, but filtering both simultaneously will require a full table scan. If the page field is not cross-referenced against the date/time field, for instance, $PRODUCT_NAME can quickly show the number of hits on /myfile.html, or the number of hits in Jun/2004, but not the number of hits on /myfile.html which occurred during Jun/2004 (which will require a full table scan). This means not only that $PRODUCT_NAME can't quickly show a page with filters applied to both fields in the Filters section, but also that $PRODUCT_NAME cannot quickly show a "pages" report when there is a filter on the date/time field, or a "years/months/days" or "days" report when there is a filter on the page field, since each individual item in these reports is effectively computed with simultaneous filters on both fields.
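The difference between single-field and two-field filtering can be sketched as follows. This is an illustrative Python model only, under the assumption that a query can use a roll-up when that roll-up contains every filtered field; `XREFS`, `count_hits`, and the sample rows are hypothetical names and data, not part of $PRODUCT_NAME.

```python
from collections import Counter

# Hypothetical main table: one row per log event.
MAIN_TABLE = [
    {"page": "/myfile.html", "month": "Jun/2004"},
    {"page": "/myfile.html", "month": "Jul/2004"},
    {"page": "/other.html", "month": "Jun/2004"},
]

def build_xref(rows, fields):
    """Roll up rows into hit counts keyed by the values of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)

# One single-field roll-up per field; no (page, month) roll-up is defined.
XREFS = {
    ("page",): build_xref(MAIN_TABLE, ("page",)),
    ("month",): build_xref(MAIN_TABLE, ("month",)),
}

def count_hits(filters):
    """Return (hits, source) for a dict mapping field name to required value."""
    for fields, xref in XREFS.items():
        # A roll-up is usable only if it contains every filtered field.
        if set(filters) <= set(fields):
            hits = sum(
                n for key, n in xref.items()
                if all(key[fields.index(f)] == v for f, v in filters.items())
            )
            return hits, "xref"
    # No roll-up covers all filtered fields: examine every event.
    hits = sum(all(r[f] == v for f, v in filters.items()) for r in MAIN_TABLE)
    return hits, "full scan"
```

In this model, `count_hits({"page": "/myfile.html"})` and `count_hits({"month": "Jun/2004"})` are both answered from a roll-up, while `count_hits({"page": "/myfile.html", "month": "Jun/2004"})` falls through to the full scan. Adding a `("page", "month")` entry to `XREFS` would let the combined filter use a roll-up as well.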
On the other hand, cross-reference tables use space in the database. The more fields you cross-reference, the larger and slower your database gets. Restricting cross-referencing only to those fields that you really need cross-referenced is a good way to limit the size of your database, and speed browsing of the statistics.
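A rough way to see the space cost is to count how many rows a roll-up needs, which is the number of distinct value combinations of its fields. The Python sketch below uses made-up data in which every combination occurs (a worst case); `EVENTS` and `xref_rows` are hypothetical and say nothing about $PRODUCT_NAME's actual storage format.

```python
from itertools import product

# Hypothetical events: every combination of 3 days, 4 pages and 2 browsers.
days = ["01/Jun/2004", "02/Jun/2004", "03/Jun/2004"]
pages = ["/a.html", "/b.html", "/c.html", "/d.html"]
browsers = ["Firefox", "MSIE"]
EVENTS = [
    {"day": d, "page": p, "browser": b}
    for d, p, b in product(days, pages, browsers)
]

def xref_rows(rows, fields):
    """Rows needed by a roll-up over `fields`: one per distinct combination."""
    return len({tuple(r[f] for f in fields) for r in rows})

for fields in [("day",), ("day", "page"), ("day", "page", "browser")]:
    print(fields, xref_rows(EVENTS, fields))
```

Each added field multiplies the potential number of rows by that field's number of distinct values, so a roll-up over many high-cardinality fields can approach the size of the main table itself.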
Cross-references are set by default for each field, but no two non-aggregating fields are included in the same cross-reference group. This is a fairly minimal use of cross-references, but for faster database builds, you can delete those as well (at a cost in query speed, since the database main table must be queried whenever no suitable cross-reference group is available).
Generally, you should start out with few cross-references; a default analysis is a good starting point. If you need a type of information not available (for instance, if you want to know the browser versions that are accessing a particular page), and the report generates too slowly, try adding the necessary cross-references. See {=docs_chapter_link('resources')=} for more information on optimizing your memory, disk space, and processing time.
Again, cross-references are never necessary to generate a particular report -- they just make reports faster.
{= end_docs_page() =}