Using Server-side Caching to Increase Web Site Performance and Scalability
By Marc Mulzer (Issue 4 2001)
Caching is a technique that provides increased performance and scalability for Web sites. This article discusses Web content caching and the various techniques for accessing static and dynamic content. It also details the advantages of server-side caching of dynamic content.
In the early days of the World Wide Web, the Internet consisted of primarily static information. Users shared fixed information over a public network. Dial-up connections were slow and online traffic was light.
Since those early years, the Internet has changed dramatically—and continues to change as content becomes increasingly dynamic. Powerful Web servers are now required to handle the increasing demands from users. Additionally, bandwidth capacity continues to improve as users demand instant access to entire business applications through their browsers. With these ever-increasing needs, server-side caching emerges as the best response to the tremendous performance and scalability requirements of the modern Web site.
Static versus dynamic content
HTML and graphic files represent static content because they look identical each time users download them into their browsers. The Web server locates the requested files on its hard drives and pushes those files directly to the clients.
Dynamic content is displayed information that can change with every user request. This content is based on a set of instructions executed on the server side. When the Web server locates the requested script file on the local hard drive, it then runs this script inside a script engine, which turns the result into static HTML code. This dynamically created data, not the code hidden inside the dynamic page originally requested, is sent to the browser for display.
This process establishes the Web server as a Web application server. The served content is no longer static. Information may change each time it is displayed, depending on the code that is being executed. Furthermore, the server discards the results from its memory once the browser has received all of the data.
Database-driven Web sites, such as e-commerce applications, are typical examples of dynamic content. In this case, the visual appearance of the Web page depends on the data stored in the tables.
Dynamic caching: capturing the static data
Dynamic content puts a heavy load on the Web server since it requires immense processing power to produce dynamic results for thousands of visitors in real time. Database servers also work hard to provide the data to these Web application servers. The simple technique of Web content caching can improve both Web site performance and scalability—quickly and efficiently.
A cache is a temporary library of files, such as HTML documents, designed and optimized for fast read-and-write access to short-lived data. Clients can receive frequently requested static HTML files much faster if they have been stored in a specific Web content cache.
Caching allows static files to be quickly located in the cache rather than downloaded each time from the Web site. The same approach applies if consecutive requests for a particular dynamic page produce identical results. In this case, output also can be stored in a cache and set aside for future use. Rather than reproducing the HTML version for each request, the server can send the page from its cache, which avoids repetitive work and saves processor cycles for more productive work.
A common dynamic content caching scenario, shown in Figure 1, is a product catalog Web page that multiple users browse simultaneously. Chances are high that many individual users will request identical pages since the catalog data does not change frequently. In this example, the Web server could create this product page once, but serve it to clients multiple times from the cache.
Figure 1. Dynamic content caching
Current caching options for high performance
Forward and reverse proxy caching are familiar techniques for Web sites that must deliver high performance under increasing traffic. A third option, server-side caching, is becoming increasingly popular as a method for caching dynamic information.
Forward proxy cache saves bandwidth
Traditionally, forward proxy servers are implemented on the edge of a corporate network to improve the delivery of external Web content to internal corporate users. The cache stores and delivers the most frequently accessed content from the millions of documents on the Web. This technique provides better quality of service for end users and reduces the networking cost necessary to retrieve these documents from the origin server.
Figure 2 shows the forward proxy cache.
Figure 2. Forward proxy cache
Although proxies do not directly affect the corporate Web application server performance, they do save precious bandwidth that can be allocated for both internal and external access to the Web application server.
Reverse proxy cache serves data quickly
Reverse proxy caches also reside on the border of a corporate network to the World Wide Web, but they deliver the content of a finite number of internal documents to the millions of external users on the entire Internet.
Requests for content to an internal Web server are filtered through the cache before they reach the source Web server. Reverse proxy caches, which store the most frequently accessed data, are optimized to serve data quickly. See Figure 3 for an illustration of the reverse proxy cache.
Figure 3. Reverse proxy cache
Reverse proxy cache appliances should be the first to provide static content to customers. In a successful implementation, reverse proxies positively affect performance of the static Web site.
Server-side dynamic cache improves performance
This caching technique improves performance and scalability by executing only the necessary code on the Web application server and in the database behind it. A specially designed filter between the server and the clients caches the output of dynamic requests.
Once the HTML has been saved to the Web content cache, the filter detects subsequent requests to the same dynamic page, intercepts the requests, and immediately responds with the cached output. Therefore, the Web server does not try to fulfill the request.
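The filter behavior described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the design of any particular product: the class name CachingFilter, the render_catalog_page function, and the use of the full URL as the cache key are all hypothetical.

```python
from typing import Callable, Dict


class CachingFilter:
    """Hypothetical filter that sits in front of a dynamic page handler."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self.handler = handler            # the expensive dynamic page generator
        self.cache: Dict[str, str] = {}   # URL -> previously generated HTML
        self.handler_calls = 0            # how often the handler really ran

    def handle(self, url: str) -> str:
        if url in self.cache:
            # Cache hit: respond immediately; the handler is never invoked.
            return self.cache[url]
        # Cache miss: run the handler once and remember its output.
        self.handler_calls += 1
        html = self.handler(url)
        self.cache[url] = html
        return html


def render_catalog_page(url: str) -> str:
    # Stand-in for script execution and database queries (assumption).
    return f"<html><body>Catalog page for {url}</body></html>"


catalog_filter = CachingFilter(render_catalog_page)
first = catalog_filter.handle("/catalog?category=servers")
second = catalog_filter.handle("/catalog?category=servers")
```

Both calls return the same HTML, but the handler runs only for the first one. A real filter would also have to decide which requests are safe to cache at all.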
Figure 4 shows the process of server-side dynamic caching.
Figure 4. Server-side dynamic caching
Depending on the page content, server-side dynamic caching may be up to 1,000 times faster than processing the page on the fly. This caching process also saves bandwidth and processing cycles because the database does not need to be accessed and no business logic needs to be executed.
Caching scenarios depend on type of content
The type of content to be stored is the primary factor for determining the best caching option for a Web site.
Reverse proxies for static content
Reverse proxies are ideal for caching unsecured, static content. Very large sites implement server appliances because these appliances are significantly less expensive and easier to maintain than the Web servers.
Reverse proxies are optimized to serve cached content outside a firewall, representing a cost-effective alternative for Web sites that require high performance and scalability. Reverse proxies keep Internet users who browse the static Web site outside the internal network, which reduces bandwidth consumption. Additionally, the reverse proxy server functions as a gatekeeper against service attacks, significantly reducing the risk of security breaches.
Dynamic content caching affects Web site architecture
Dynamic content caching solutions address the scaling and performance issues of Web application servers directly. They work well for heavily visited sites containing many dynamic features that must be performance-tuned quickly and easily.
For a Web site on the verge of utilizing the Web application server's maximum processing power, dynamic caching solutions enable administrators to extend the time frame before an additional machine must be added to manage load.
Unlike reverse proxies, dynamic content caching affects the architecture of the entire Web site, including the database servers. Depending on the solution, Web applications may need retrofitting so that caching will work. This may result in increased Web site development costs.
Server-side dynamic caching maximizes performance and scalability
Server-side dynamic caching, a fairly new concept, is specific to the Web server in use. However, this innovative strategy clearly maximizes the performance and scalability of both application and data layers in three ways:
- Less bandwidth consumption as fewer requests and responses pass through the network
- Less server load because the server handles fewer requests
- Less page load time since responses for cached requests are closer to the client and are available immediately
Some server-side dynamic content caching solutions also implement page compression. This functionality removes all the remarks and white space inside the HTML that is sent to the clients. Additionally, the file can be zipped, greatly reducing its size. Such cleaned and compressed pages download much faster and further enhance the user's experience with the site.
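As a rough illustration of the page compression step, the following Python sketch strips HTML comments and surplus white space and then compresses the result with gzip. The function names are hypothetical, and the regular expressions are deliberately naive: they assume the page contains no preformatted text, inline scripts, or other white-space-sensitive markup.

```python
import gzip
import re


def minify_html(html: str) -> str:
    # Remove HTML comments (the "remarks" mentioned in the text).
    without_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Remove white space that only separates adjacent tags.
    collapsed = re.sub(r">\s+<", "><", without_comments)
    # Collapse any remaining run of white space to a single space.
    return re.sub(r"\s+", " ", collapsed).strip()


def compress_page(html: str) -> bytes:
    # Clean the page first, then compress the cleaned bytes.
    return gzip.compress(minify_html(html).encode("utf-8"))


page = "<html>\n  <!-- note -->\n  <body>  Hello   world </body>\n</html>"
cleaned = minify_html(page)
compressed = compress_page(page)
```

Compression pays off mainly on larger pages; for a page as small as this sample, the gzip header overhead can outweigh the savings.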
Determining cacheable pages
Two factors determine whether a dynamic page is worth caching: how often its content changes and how current the content must be for users. The anticipated hit count then determines which pages to cache first. More popular pages take precedence over less frequently requested pages.
The home page for a Web site is an excellent example of a popular page to cache. Generally it is the first page that users see; therefore, it has tremendous impact on the user experience with the site. Daily specials and personalized content, often used on e-commerce sites, can be served right from the cache to improve the response time of the default page.
Other Web pages also can be partially cached. While parts of the page can be fairly static, the dynamic portion of the Web page must be refreshed for each request. A feature called partial page caching excludes portions of the page from the cache via special remarks inside the script code, which provides greater flexibility and more control. Figure 5 shows a page from the Dell Web site.
Figure 5. Partially cacheable page on the Dell Web site
Imagine running nine database queries to display the static content of the page and a single database request to display current shopping cart information. By caching the nine static queries and dynamically processing only the shopping cart query, it is possible to save 90 percent of the original amount of work for each request.
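The arithmetic in this example can be checked with a small Python sketch. It assumes, purely for illustration, that all ten queries cost the same and that each static section is cached as a separate fragment; the names run_query, fragment_cache, and render_page are hypothetical.

```python
from typing import Dict, List

query_count = 0  # counts simulated database queries


def run_query(name: str) -> str:
    """Stand-in for one database query that produces a piece of HTML."""
    global query_count
    query_count += 1
    return f"<div>{name}</div>"


# Nine page sections whose content is the same for every visitor.
STATIC_SECTIONS: List[str] = [f"section-{i}" for i in range(1, 10)]
fragment_cache: Dict[str, str] = {}


def render_page(cart_owner: str) -> str:
    parts = []
    for section in STATIC_SECTIONS:
        # Static fragments are queried once and then served from the cache.
        if section not in fragment_cache:
            fragment_cache[section] = run_query(section)
        parts.append(fragment_cache[section])
    # The shopping cart is excluded from the cache and queried every time.
    parts.append(run_query(f"cart-for-{cart_owner}"))
    return "".join(parts)


render_page("alice")                        # cold cache: all ten queries run
queries_cold = query_count
render_page("bob")                          # warm cache: only the cart query runs
queries_warm = query_count - queries_cold
```

Once the cache is warm, each request runs one query instead of ten, which is the 90 percent saving described above.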
Some pages cannot be cached at all. Server-side script pages that process information submitted via an HTML form must be executed for every request and should never reside in a Web content cache. However, it may be useful to cache the form itself, especially if its fields are populated from a database table.
Practical considerations when caching content
With caching in place, the application server no longer creates pages for each request, but sends stored copies to the clients. Therefore, users do not see content changes made after a page has been cached.
Refreshing the Web content cache frequently by using its "Time To Live" (TTL) settings can resolve this problem. These settings allow the server to discard cached pages once their TTL has expired and to force the application server to reprocess the expired pages, which refreshes the cache with current information.
Establishing appropriate TTL settings requires careful testing because the settings depend heavily on the page functionality and the overall traffic on the site. If the database content is somewhat static, the application will benefit from server-side dynamic caching features combined with effective TTL settings. TTL settings as low as two minutes may result in huge performance gains on a heavily visited site.
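A minimal Python sketch of TTL-based expiry follows. It is an illustration under stated assumptions rather than the mechanism of any specific product: the class name TtlPageCache is hypothetical, and the clock is injectable only so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlPageCache:
    """Hypothetical page cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str]] = {}  # URL -> (expiry, HTML)

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]                  # still fresh: serve the cached copy
        html = render(url)                   # missing or expired: reprocess the page
        self.entries[url] = (now + self.ttl_seconds, html)
        return html


fake_now = [0.0]                             # simulated clock, in seconds
render_calls: List[str] = []


def render(url: str) -> str:
    render_calls.append(url)
    return f"<html>version {len(render_calls)}</html>"


page_cache = TtlPageCache(ttl_seconds=120, clock=lambda: fake_now[0])
at_start = page_cache.get("/home", render)
fake_now[0] = 60.0
after_one_minute = page_cache.get("/home", render)
fake_now[0] = 121.0
after_expiry = page_cache.get("/home", render)
```

With a two-minute TTL, the page is regenerated at most once every 120 seconds regardless of how many requests arrive in between.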
The Web server's authentication, encryption, and security features also present challenges to both dynamic content and reverse proxy caching. Once a page resides inside the cache, it loses its security context, which can expose any sensitive information stored inside the page. For this reason, caching secured sites should be carefully considered and thoroughly tested.
Commercial Web content caching solutions
Three commercial reverse-proxy and dynamic caching solutions available today are described in the following section.
The PowerApp® cache is a hardware-based server appliance that is easy to set up and maintain. It is a preconfigured, turnkey solution that includes the PowerApp Kick Start Utility software and the PowerApp administration tool for simplified deployment, administration, and management.
Its rack-dense 1U and 2U form factors make it ideal when space is at a premium. PowerApp cache installs in most disparate environments, including Microsoft® Windows NT®, Novell® NetWare®, and UNIX®. It also works well in both forward and reverse proxy modes.
X-Cache®, when combined with Microsoft Active Server™ Pages, speeds up Web page response and decreases user download time through content compression. Completely software-based, X-Cache requires no scripting changes, does not affect the log files, and integrates with Microsoft Internet Information Server (IIS) versions 4.0 and 5.0.
X-Cache offers various options for configuring the cache. Entire directories can be cached with a single click in the management interface, which is also used to set the TTL of cached pages.
Most importantly, X-Cache exposes an application programming interface (API) that allows database developers to trigger a cache flush based on events inside the database. This feature enables developers to control the cache settings at runtime and to instruct the server to empty and rebuild the cache if a major database update has occurred.
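The general idea of an event-triggered flush can be sketched in Python. This is not the X-Cache API, whose calls are not shown in this article; FlushableCache, ProductTable, and their methods are hypothetical names for a toy cache and a toy table that notifies listeners when its data changes.

```python
from typing import Callable, Dict, List


class FlushableCache:
    """Hypothetical page cache that can be emptied on demand."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}

    def get(self, url: str, render: Callable[[], str]) -> str:
        if url not in self.pages:
            self.pages[url] = render()       # build the page on a cache miss
        return self.pages[url]

    def flush(self) -> None:
        self.pages.clear()                   # discard every cached page


class ProductTable:
    """Toy stand-in for a database table that reports its updates."""

    def __init__(self) -> None:
        self.prices: Dict[str, int] = {"server": 1000}
        self.listeners: List[Callable[[], None]] = []

    def on_update(self, callback: Callable[[], None]) -> None:
        self.listeners.append(callback)

    def set_price(self, name: str, price: int) -> None:
        self.prices[name] = price
        for callback in self.listeners:      # notify listeners after the change
            callback()


table = ProductTable()
flush_cache = FlushableCache()
table.on_update(flush_cache.flush)           # a database update empties the cache


def render_price_page() -> str:
    return f"<p>server: {table.prices['server']}</p>"


before = flush_cache.get("/price", render_price_page)
table.set_price("server", 900)
after = flush_cache.get("/price", render_price_page)
```

Unlike a TTL, this approach keeps cached pages for as long as the data is unchanged and refreshes them as soon as an update occurs.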
X-Cache supports Secure Sockets Layer (SSL) encryption for cached pages and restores access control list (ACL) settings on the cached pages, greatly reducing security problems related to cached pages.
The new release of the ASP.NET™ server scripting technology from Microsoft has built-in caching that can be configured on a per-page basis, similar to X-Cache. Since caching decisions are made on the code level only, pages must be specifically designed to benefit from the ASP.NET caching features. Caching is integrated into ASP.NET so there are no security or encryption issues.
ASP.NET includes two modes of dynamic content caching:
- Cache an entire page. The first line of code in the ASPX file must include the @ OutputCache directive, as shown in the example. In this case, the TTL for the page is set to 120 seconds:
<%@ OutputCache Duration="120" VaryByParam="None" %>
- Place entire objects of a Web form in the cache to be programmatically retrieved from the cache during subsequent requests.
Marc Mulzer (email@example.com) is a systems engineer in Server and Storage Systems Engineering, Web Technologies at Dell. Marc works with Web technologies to architect scalable, highly available, and cost-effective Web application solutions. Marc has a B.S. in Computer Science from the College of Advanced Vocational Studies in Mannheim, Germany, and is a Microsoft Certified Systems Engineer (MCSE).
For more information
Dell servers: www.dell.com
ASP.NET caching features: