Introduction
This page describes in detail how eValid Site Analysis processes a WebSite search.
It assumes that you are already familiar with the terms and eValid Site Analysis options mentioned.
Basic Mapping Process
A WebSite is mapped by creating a worklist of all the URLs specified on the starting page.
Each URL is then visited and, if it resides on the same WebSite/sub-WebSite as the starting page, it is in turn mapped and the URLs it contains are added to the worklist.
Any link that is OFF the starting WebSite/sub-WebSite is visited (to check for its existence) but is not mapped.
Processing continues from the current worklist until every URL has been visited once. At the end of the search process the completed worklist is used as the basis for the map reports.
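The process above can be sketched as a worklist traversal. This is a minimal illustration, not eValid's implementation: the names map_site, fetch_links, and on_site are hypothetical, and the pages are an in-memory table rather than real HTTP requests.

```python
from collections import deque

def map_site(start_url, fetch_links, on_site):
    """Visit every URL once; map (expand) only on-site URLs.

    fetch_links and on_site are hypothetical callables supplied by the caller.
    Returns a dict of url -> "mapped" or "visited".
    """
    status = {start_url: "pending"}
    worklist = deque([start_url])
    while worklist:
        url = worklist.popleft()
        if not on_site(url):
            # Off-site links are checked for existence but not expanded.
            status[url] = "visited"
            continue
        status[url] = "mapped"
        for link in fetch_links(url):
            if link not in status:  # each URL enters the worklist only once
                status[link] = "pending"
                worklist.append(link)
    return status

# Hypothetical three-page link table standing in for real page fetches.
PAGES = {
    "http://www.this.com/info/index.html": [
        "http://www.this.com/info/a.html",
        "http://www.other.com/x.html",
    ],
    "http://www.this.com/info/a.html": ["http://www.this.com/info/index.html"],
    "http://www.other.com/x.html": ["http://www.other.com/y.html"],
}

result = map_site(
    "http://www.this.com/info/index.html",
    fetch_links=lambda url: PAGES.get(url, []),
    on_site=lambda url: url.startswith("http://www.this.com/info/"),
)
```

In this sketch the off-site page x.html is recorded as visited, but its own link to y.html never reaches the worklist because off-site pages are not mapped.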
Limits to Mapping
The following settings impose or relax limits on the Basic Mapping Process:
DEFAULT: Search for All Links.
This means the search uses ALL EXTENSIONS and BROWSER PROTOCOLS.
DEFAULT: "html, htm"
URLs are added to the worklist only if they have one of the specified Extensions or contain one of the specified Query Strings.
Note that if you specify multiple such query strings, they are treated as if they were OR'd together: a match on any of them means the link is accepted for analysis.
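The extension and query string rule can be sketched as a small filter. The function name accepted_by_extension is hypothetical, and matching each query string as a substring of the URL's query component is an assumption; the source does not say exactly how eValid performs the match.

```python
import posixpath
from urllib.parse import urlsplit

def accepted_by_extension(url, extensions=("html", "htm"), query_strings=()):
    """Return True if the URL passes the extension / query string filter.

    Assumption: query strings are matched as substrings of the query part.
    """
    parts = urlsplit(url)
    ext = posixpath.splitext(parts.path)[1].lstrip(".").lower()
    if ext in extensions:
        return True
    # Multiple query strings are OR'd: any single match accepts the link.
    return any(q in parts.query for q in query_strings)

ok_html = accepted_by_extension("http://www.this.com/info/index.html")
ok_gif = accepted_by_extension("http://www.this.com/img/logo.gif")
ok_query = accepted_by_extension(
    "http://www.this.com/cgi/page.cgi?id=5&view=full",
    query_strings=("print=1", "view=full"),
)
```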
DEFAULT: "http:"
URLs are added to the worklist only if they use one of the specified Protocols.
Note: URLs without an explicit protocol are assumed to be HTTP:.
Example: mailto, ftp, and file are alternative protocols that could be mapped.
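The protocol rule, including the HTTP: fallback for URLs with no explicit protocol, might look like the sketch below. The function name accepted_by_protocol is hypothetical, and writing protocols with a trailing colon simply mirrors the "http:" default shown above.

```python
from urllib.parse import urlsplit

def accepted_by_protocol(url, protocols=("http:",)):
    """Return True if the URL's protocol is in the allowed list."""
    scheme = urlsplit(url).scheme.lower()
    # URLs without an explicit protocol are treated as HTTP.
    protocol = (scheme or "http") + ":"
    return protocol in {p.lower() for p in protocols}

default_http = accepted_by_protocol("http://www.this.com/info/index.html")
default_relative = accepted_by_protocol("/info/page.html")
default_mailto = accepted_by_protocol("mailto:info@this.com")
extended_mailto = accepted_by_protocol(
    "mailto:info@this.com", protocols=("http:", "mailto:")
)
```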
DEFAULT: (empty) (starting WebSite is implied)
Additional WebSites or sub-WebSites that are to be searched can be added to this list.
URLs that are on such WebSites will be treated as if they are on the starting WebSite, i.e. they will be visited AND mapped and their URLs will be added to the worklist.
Example: If the starting page is http://www.this.com/info/index.html the starting WebSite is this.com/info.
The user could add "this.com" to the include WebSites/sub-WebSites list to ensure ALL pages on the WebSite are mapped, not just those on the this.com/info sub-WebSite.
As each page is mapped, the worklist is built up according to the above criteria.
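The this.com/info example can be made concrete with a small on-site test. This is a sketch under stated assumptions: the helper names site_of and on_site are hypothetical, a leading "www." is dropped because the example above reduces www.this.com to this.com, and a URL counts as on-site when its host and directory fall under the starting site or any entry in the include list.

```python
from urllib.parse import urlsplit

def site_of(url):
    """Reduce a URL to host + directory, e.g. 'this.com/info' (assumed rule)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    directory = parts.path.rsplit("/", 1)[0]  # drop the file name
    return host + directory

def on_site(url, start_url, include_sites=()):
    """True if url lies under the starting site or an included site."""
    target = site_of(url) + "/"
    sites = [site_of(start_url), *include_sites]
    # The trailing slash avoids treating 'this.company.com' as under 'this.com'.
    return any(target.startswith(site.rstrip("/") + "/") for site in sites)

START = "http://www.this.com/info/index.html"
start_site = site_of(START)
news_default = on_site("http://www.this.com/news/today.html", START)
news_included = on_site(
    "http://www.this.com/news/today.html", START, include_sites=("this.com",)
)
```

With the default (empty) include list the news page is off-site; adding "this.com" brings every page on the host into scope.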
Whether a link is then visited (and in turn mapped to have its URLs added to the worklist) depends on the following criteria.
DEFAULT: (empty)
This is a text file, specified by name in the SiteMap Preferences.
Any string added to this file that is not a #comment is used to determine which URLs are to be excluded.
Any URL which contains one or more of these strings is not visited and is marked on the worklist as [Excluded URL].
DEFAULT: OFF
If this is set to ON then URLs that are 'off-WebSite', according to the above, are not visited. They are marked on the worklist as [Off-Site].
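The two visiting criteria above can be combined into one decision function. This is only a sketch: load_exclusions and classify are hypothetical names, the exclude file is assumed to hold one string per line with comments starting with "#", and checking exclusions before the off-site setting is an assumed ordering.

```python
def load_exclusions(text):
    """Parse exclude-file content; blank lines and '#' comments are skipped.

    Assumption: one exclusion string per line.
    """
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]

def classify(url, is_on_site, exclusions=(), skip_off_site=False):
    """Return the worklist marking or action for a single URL."""
    if any(s in url for s in exclusions):
        return "[Excluded URL]"   # not visited
    if is_on_site:
        return "visit and map"    # its URLs are added to the worklist
    if skip_off_site:
        return "[Off-Site]"       # not visited when the setting is ON
    return "visit only"           # existence check, no mapping

exclusions = load_exclusions("# skip logout links\nlogout\n\n/private/\n")
excluded = classify("http://www.this.com/info/logout.html", True, exclusions)
off_site = classify("http://www.other.com/x.html", False, exclusions, skip_off_site=True)
checked = classify("http://www.other.com/x.html", False, exclusions)
```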