Your e-Business Quality Partner eValid™ -- Automated Web Quality Solution
Browser-Based, Client-Side, Functional Testing & Validation,
Load & Performance Tuning, Page Timing, Website Analysis,
and Rich Internet Application Monitoring.

eValid -- Orphan File Identification

Summary
eValid can help identify orphan files on a server. This page explains how this is done and makes usage recommendations.

Background
An orphan file is one that exists on a website's server but cannot be reached by a user from a browser. Because an orphan file is by definition unavailable via a browser request, it cannot by itself give a user the impression of a low-quality site.

Orphan files arise through normal website maintenance, through oversights or errors by the webmaster, or from a variety of other causes. In normal operation, orphan files are not a problem, except that too many of them: (i) make website maintenance confusing; and (ii) may increase the chance of errors.

eValid Solution
When eValid makes a complete site analysis of a website in Browser Mode, it visits every page and image that can be reached. On the server side, the server operating system records each of those files as having been accessed, so any file whose access time was not updated during the run was not reached by eValid.

If you have high confidence that every reachable file in your website was actually visited, you can be confident that deleting the remaining orphan files won't cause any problems. However, it is good practice to run the complete eValid site analysis again after candidate orphan files are removed, just to confirm the analysis.

Usage Recommendations
To ensure that you visit all accessible files, it is best to make the eValid site analysis run using: (1) Full Browser Mode; (2) an empty cache, either (2a) by manually deleting the cache before the run or (2b) by selecting delete cache on start of run; and (3) minimal or no Excluded Files, so that the search is as thorough as possible. These settings may require more time, but the result is more accurate.

UNIX-Based Servers
The "last accessed (used)" attribute for files in any particular folder can be seen with the ls command using the u option. You might try the commands ls -lust or ls -lrust (reverse order).

Consult the UNIX documentation for your machine with man ls for complete details on this command.

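Locate all files not accessed within a specified time (the smallest interval is one day) with the command: find . -name "*" -atime +1 -print. The -atime +1 clause causes find to report those files whose last access was more than one day ago. Note that find counts whole 24-hour periods and discards any fraction, so on POSIX-conforming and GNU versions +1 in practice means at least two full days ago, and -atime +0 is the form that matches everything not accessed in the last 24 hours. If the eValid search was completed less than one day ago, such files are candidate orphan files.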

Consult the UNIX documentation for your machine with man find for complete details on this command.
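Because -atime works in whole days, a finer-grained variant is to create a marker file just before the eValid run starts and afterwards list every file whose access time is not newer than that marker. The following is only a sketch under stated assumptions: the function name list_candidate_orphans and the marker path are hypothetical, the -anewer test is a GNU/BSD find extension rather than part of POSIX, and the file system must actually record access times (many Linux systems are mounted with noatime or relatime, which suppresses or delays these updates).

```shell
# list_candidate_orphans DOCROOT MARKER
#   DOCROOT - top-level folder of the website on the server (assumed layout)
#   MARKER  - a file created with "touch" just before the eValid run started;
#             keep it OUTSIDE the document root so it is not listed itself
# Prints every regular file under DOCROOT whose last-access time is NOT more
# recent than the marker's modification time, i.e. files the run did not touch.
list_candidate_orphans() {
    docroot=$1
    marker=$2
    # -anewer is a GNU/BSD extension (assumption: your find supports it)
    find "$docroot" -type f ! -anewer "$marker" -print
}

# Typical use (paths are placeholders):
#   touch /tmp/evalid-run.marker        # immediately before starting eValid
#   ... run the complete eValid site analysis ...
#   list_candidate_orphans /var/www/html /tmp/evalid-run.marker
```

The output is a list of candidates only; each file still needs the recheck described elsewhere on this page before it is treated as an orphan.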

Windows-Based Servers
Windows servers also record the time at which each file was last accessed. Use Windows File Explorer to display the files: move to the folder in which you suspect there are orphan files, right-click on the menu bar to show the display options, and click "Accessed" ON. The display will now show each file in that folder in the order in which it was accessed. If the eValid search was completed less than one day ago, files that were not accessed within that time are candidate orphan files.

eValid Site Analysis Searching Limitation
The eValid SiteMap engine examines the URL string to determine if it is "searchable". The following are not searchable, but are included in eValid mappings:

Protocols: JAVASCRIPT, MAILTO, NEWS

Suffixes: .gz, .tgz, .tar, .jar, .zip, .css, .xml, .pdf, .doc, .ppt, .gif, .png, .jpg, .jpeg

If an actual URL link exists within one of these types of files it is not visited by eValid because eValid does not scan these non-searchable files for possible links.
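The rule above can be illustrated with a small shell function that decides whether a URL falls into one of the listed non-searchable categories. This is not eValid's implementation; it is a sketch of the rule as stated on this page. The function name is_searchable_url is hypothetical, and both the case-insensitive matching and the stripping of query strings and fragments before the suffix check are assumptions.

```shell
# is_searchable_url URL
# Returns 0 (true) if the URL would be scanned for further links under the
# rule described above, 1 (false) if its protocol or suffix is on the
# non-searchable list.
is_searchable_url() {
    # Assumption: matching is case-insensitive, so lower-case the URL first
    url=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

    # Non-searchable protocols
    case "$url" in
        javascript:*|mailto:*|news:*) return 1 ;;
    esac

    # Assumption: ignore any query string or fragment when testing the suffix
    path=${url%%[?#]*}

    # Non-searchable suffixes
    case "$path" in
        *.gz|*.tgz|*.tar|*.jar|*.zip|*.css|*.xml|*.pdf|*.doc|*.ppt|*.gif|*.png|*.jpg|*.jpeg)
            return 1 ;;
    esac
    return 0
}

# Example:
#   is_searchable_url "http://www.example.com/docs/manual.pdf" || echo "not scanned for links"
```

A file that is linked only from inside one of these non-searchable files will look like an orphan to the access-time check even though a user can reach it, so such files deserve a manual review.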

The section below provides additional details about orphan file identification procedures and related risks.

Warnings And Cautions
A file on a website server is only truly an orphan file if its removal from the server file system will not cause any failure evident to the user who views or uses the site through a browser.

There are certain technical problems with all non-browser approaches to identifying orphan files:

The conclusion [and caution] to be understood here is that server-side orphan file identification and removal needs to be coupled with a systematic recheck of the potentially affected pages to be a completely reliable process. If removal of the file breaks the site [causes a page to download incorrectly] then the file is not an orphan.
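One way to make that recheck low-risk is to move candidate files into a holding folder instead of deleting them outright, rerun the site analysis, and move back any file whose absence breaks a page. This is a suggestion rather than an eValid feature; the function names quarantine_candidates and restore_candidate and the folder layout are hypothetical.

```shell
# quarantine_candidates DOCROOT QUARANTINE_DIR
# Reads candidate paths, one per line and relative to DOCROOT, from standard
# input and moves each file into QUARANTINE_DIR, preserving its sub-folder.
quarantine_candidates() {
    docroot=$1
    quarantine=$2
    while IFS= read -r rel; do
        # Skip blank lines and anything that is not a regular file
        [ -n "$rel" ] || continue
        [ -f "$docroot/$rel" ] || continue
        mkdir -p "$quarantine/$(dirname "$rel")"
        mv "$docroot/$rel" "$quarantine/$rel"
    done
}

# restore_candidate DOCROOT QUARANTINE_DIR RELATIVE_PATH
# Moves one quarantined file back to its original place in the site.
restore_candidate() {
    docroot=$1
    quarantine=$2
    rel=$3
    mkdir -p "$docroot/$(dirname "$rel")"
    mv "$quarantine/$rel" "$docroot/$rel"
}

# Typical use (paths are placeholders):
#   quarantine_candidates /var/www/html /var/tmp/orphans < candidates.txt
#   ... rerun the complete eValid site analysis and check for broken pages ...
#   restore_candidate /var/www/html /var/tmp/orphans ./img/logo-old.gif
```

Only after a clean rerun would the contents of the holding folder be deleted for good.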