

The Designing of Web Services to Deliver Web Documents Associated with Historical Links

The historical links of a web site include the URLs invalidated by web site reorganization or by document removal, renaming, or relocation, as well as links to document snapshots, where a snapshot is a document’s contents as of a specific point in time. Tracking historical links allows users to use out-of-date URLs and to retrieve removed documents and document snapshots. This paper presents a logging and archiving scheme to track a document’s history of changes, and designs a web service to deliver the web document associated with a historical link.

A web site is a system of integrated web documents embedded with links that reference related documents. At any given time, a Uniform Resource Locator (URL) uniquely identifies a document on the Internet. The URLs of all web documents currently published by a web site are the web site’s current links. One of the basic services of a web server is to deliver the document associated with a URL submitted by a user. However, not all URLs submitted to a web site are valid. Several causes can render an old URL invalid: the host may reorganize its directory structure, which changes the path to a document, or a document may be renamed, deleted, or moved to a different directory. In these cases the document either no longer exists or has a new URL. Even a valid URL does not necessarily retrieve the document the user wants, for two reasons. (1) The relationship between URLs and web documents is many-to-many (m:m) over time: each URL may be associated with many documents, and each document may be associated with many URLs. A URL may point to a document different from the one it pointed to earlier, because documents and directories can be renamed. For instance, document A may be renamed to C and another document B may be renamed to A; the URL that originally pointed to A now points to a different document. Conversely, a web document acquires a different URL when it is renamed or moved to a different directory. (2) A document may have gone through several changes. A URL typically retrieves the current version of a document, whose contents may differ from those of the user’s last download. There are times when users would like to see previous versions of a document. These previous versions are snapshots of the document as of specific points in time. Studying a document’s snapshots may help users identify trends and patterns of change in the document.
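
The renaming example above (A renamed to C, B renamed to A) can be made concrete with a small sketch. It is only an illustration of the many-to-many relationship over time, not the scheme proposed in this paper: the `bindings` table, the integer timestamps, the paths, and the document identifiers `doc-1` and `doc-2` are all assumptions introduced here.

```python
from bisect import bisect_right

# Hypothetical timeline: for each URL, a time-sorted list of
# (effective_time, document_id). Times are plain integers for illustration.
bindings = {
    "/docs/A": [(1, "doc-1"), (5, "doc-2")],  # at t=5, B (doc-2) is renamed to A
    "/docs/B": [(1, "doc-2"), (5, None)],     # B's URL becomes invalid at t=5
    "/docs/C": [(5, "doc-1")],                # at t=5, A (doc-1) is renamed to C
}

def document_at(url, t):
    """Return the document id bound to `url` at time `t`, or None if unbound."""
    history = bindings.get(url, [])
    times = [start for start, _ in history]
    i = bisect_right(times, t)  # number of bindings that started at or before t
    return history[i - 1][1] if i else None

def urls_of(doc_id):
    """Return every URL that has ever been bound to `doc_id`."""
    return sorted(u for u, h in bindings.items() if any(d == doc_id for _, d in h))
```

Under these assumptions, `document_at("/docs/A", 3)` and `document_at("/docs/A", 6)` return different documents for the same URL, while `urls_of("doc-1")` lists two URLs for the same document.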
Together, the URLs invalidated by a web site through reorganization or through document removal, renaming, or relocation, plus the links to document snapshots, are the web site’s historical links. Maintaining historical links enables a web site to retrieve the document associated with an out-of-date URL, as well as deleted documents and document snapshots. This improves the web site’s ability to locate documents and provides users with the historical information they need. This research has two objectives. First, it presents a logging scheme that tracks historical links by recording the time and type of each change to a document and to the document’s URL, together with an archiving scheme for retrieving document snapshots and deleted documents. Tracking changes has been a research topic in many areas, such as managing database snapshots [1], materialized views [4], and document versions [6]. Using a log to record changes [2] [3] and archiving the versions of a document are typical methods for tracking changes. Second, it designs a web service to deliver the web document associated with a historical link. A web server is designed to retrieve the current document associated with a URL. Retrieving the document associated with a historical link requires dedicated programs that work with the logging scheme. Although these programs may be implemented in many ways, such as server-side scripts, implementing them as a web service has the benefits of assembling related programs in one service and of flexible integration with systems that need historical documents. In the remainder of this paper, Section 2 presents an analysis of the problem; Section 3 presents the logging and archiving scheme; Section 4 presents the algorithms for processing historical links and the web service design; and Section 5 is a summary.
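
To illustrate how a log that records the time and type of each change could be used to resolve an out-of-date URL, the following is a minimal sketch. It is not the logging scheme of Section 3: the `Change` record, its field names, the change types, the integer timestamps, and the `resolve` function are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Change:
    time: int                       # when the change took effect (illustrative integer)
    kind: str                       # "rename", "move", or "delete" (assumed change types)
    old_url: str
    new_url: Optional[str] = None   # None for a delete

# Hypothetical log: A is renamed to C, B is renamed to A, then C is deleted.
LOG = [
    Change(5, "rename", "/docs/A", "/docs/C"),
    Change(5, "rename", "/docs/B", "/docs/A"),
    Change(9, "delete", "/docs/C"),
]

def resolve(url, as_of, log=LOG):
    """Follow a URL that was valid at time `as_of` forward through the log.

    Returns (url, status), where status is "current" or "deleted".
    """
    for change in sorted(log, key=lambda c: c.time):
        # Skip changes that predate the URL's validity or concern another URL.
        if change.time <= as_of or change.old_url != url:
            continue
        if change.kind == "delete":
            return url, "deleted"
        url = change.new_url
        # Advance the clock so a simultaneous rename onto the new name is not
        # mistaken for a later change to this same document.
        as_of = change.time
    return url, "current"
```

With this sample log, a URL `/docs/A` bookmarked at time 1 resolves to `/docs/C` with status `"deleted"`, at which point an archive would be needed to supply the removed document, whereas the same URL bookmarked at time 6 is still current.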
