Service Performance Optimization Techniques for .NET – Part I

Abstract: Tuning service runtime performance will improve the utilization of individual services as well as the performance of service compositions that aggregate these services. Even though it is important to optimize every service architecture, agnostic services in particular need to be carefully tuned to maximize their potential for reuse and recomposition.

Because the logic within a service is composed of the collective logic of its service capabilities, we need to begin by focusing on performance optimization at the service capability level.

In this article we will explore several approaches for reducing the duration of service capability processing. The upcoming techniques specifically focus on avoiding redundant processing, minimizing idle time, minimizing concurrent access to shared resources, and optimizing the data transfer between service capabilities and service consumers.

Caching to Avoid Costly Processing

Let’s first look at the elimination of unnecessary processing inside a service capability.

Specifically what we’ll be focusing on is:

avoidance of repeating calculations if the result doesn’t change

avoidance of costly database access if the data doesn’t change

developing a better performing implementation of capability logic

delegating costly capability logic to specialized hardware solutions

avoidance of costly XML transformations by designing service contracts with canonical schemas

A common means of reducing the quantity of processing is to avoid redundant executions of the same capability through caching. Instead of executing the same capability twice, you simply store the results of the capability the first time and return the stored results the next time they are requested. Figure 1 shows a flow chart that illustrates a simple caching solution.


Figure 1 – Caching the results of expensive business process activities can significantly improve performance.

For example, it doesn’t make sense to retrieve data from a database more than once if the data is known to not change (or at least known not to change frequently). Reading data from a database requires communication between the service logic and the database. In many cases it even requires communication over a network connection.

There is a lot of overhead just in setting up this type of communication and then there’s the effort of assembling the results of the query in the database. You can avoid all of this processing by avoiding database calls after the initial retrieval of the results. If the results change over time, you can still improve average performance by re-reading every 100 requests (or however often).
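The re-read strategy mentioned above can be sketched in a few lines of C#. This is a minimal illustration rather than a recommended implementation: the class name RefreshEveryNRequestsCache and its members are hypothetical, and the load delegate stands in for whatever expensive database query the capability performs.

```csharp
using System;

// Hypothetical helper (not from the article): caches one query result and
// re-reads it from the source after a fixed number of requests.
public sealed class RefreshEveryNRequestsCache<T>
{
    private readonly Func<T> _load;       // the expensive call, e.g. a database query
    private readonly int _refreshEvery;   // e.g. 100 requests
    private readonly object _gate = new object();
    private T _value;
    private bool _loaded;
    private int _servedSinceLoad;

    public RefreshEveryNRequestsCache(Func<T> load, int refreshEvery)
    {
        if (load == null) throw new ArgumentNullException(nameof(load));
        if (refreshEvery < 1) throw new ArgumentOutOfRangeException(nameof(refreshEvery));
        _load = load;
        _refreshEvery = refreshEvery;
    }

    public T Get()
    {
        // The lock keeps the counter consistent; it also serializes reloads,
        // which is acceptable for a sketch but may need refinement under load.
        lock (_gate)
        {
            if (!_loaded || _servedSinceLoad >= _refreshEvery)
            {
                _value = _load();
                _loaded = true;
                _servedSinceLoad = 0;
            }
            _servedSinceLoad++;
            return _value;
        }
    }
}
```

With refreshEvery set to 100, the source is queried on the first request and then once for every further 100 requests; all other requests are answered from memory.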

Caching can also be effective for expensive computations, data transformations or service invocations as long as:

results for a given input do not change or at least do not change frequently

delays in visibility of different results are acceptable

the number of computation results or database queries is limited

the same results are requested frequently

a local cache can be accessed faster than a remotely located database

computation of the cache key is not more expensive than computing the output

increased memory requirements due to large caches do not increase paging to disk (which slows down the overall throughput)

If your service capability meets these criteria, you can remove several blocks from the performance model and replace them with cache access, as shown in Figure 2.


Figure 2 – The business logic, resource access, and message transformation blocks are removed.

To build a caching solution you can:

explicitly implement caching in the code of the service

intercept incoming messages before the capability logic is invoked

centralize the caching logic into a utility caching service

Each solution has its own strengths and weaknesses. For example, explicitly implementing caching logic inside of a service capability allows you to custom-tailor this logic to that particular capability. In this case you can be selective about the cache expiration and refresh algorithms or which parameters make up the cache key. This approach can also be quite labor intensive.
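As a rough sketch of what explicit caching inside a capability might look like, the following C# class chooses its own expiration time and decides which parameters form the cache key. All names here (CatalogPriceCapability, GetPrice, the correlationId parameter) are hypothetical and only serve the illustration; the clock is injectable purely to make the expiration logic easy to exercise.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical capability with hand-written caching logic.
public sealed class CatalogPriceCapability
{
    private sealed class Entry
    {
        public decimal Value;
        public DateTime ExpiresUtc;
    }

    private readonly ConcurrentDictionary<string, Entry> _cache =
        new ConcurrentDictionary<string, Entry>();
    private readonly Func<string, string, decimal> _lookup; // expensive lookup (itemId, currency)
    private readonly TimeSpan _timeToLive;
    private readonly Func<DateTime> _utcNow;

    public CatalogPriceCapability(Func<string, string, decimal> lookup,
                                  TimeSpan timeToLive,
                                  Func<DateTime> utcNow = null)
    {
        if (lookup == null) throw new ArgumentNullException(nameof(lookup));
        _lookup = lookup;
        _timeToLive = timeToLive;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public decimal GetPrice(string itemId, string currency, string correlationId)
    {
        // Only itemId and currency influence the result, so only they form the key.
        // correlationId differs per request and would defeat the cache.
        // Assumes neither value contains the '|' delimiter.
        string key = itemId + "|" + currency;
        DateTime now = _utcNow();

        Entry entry;
        if (_cache.TryGetValue(key, out entry) && entry.ExpiresUtc > now)
        {
            return entry.Value;
        }

        // Two concurrent misses may both run the lookup; the last writer wins.
        var fresh = new Entry
        {
            Value = _lookup(itemId, currency),
            ExpiresUtc = now + _timeToLive
        };
        _cache[key] = fresh;
        return fresh.Value;
    }
}
```

The flexibility comes at the cost noted above: every capability that wants this behavior needs its own key and expiration decisions.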

Intercepting messages, on the other hand, can be an efficient solution because messages for more than one service capability can be intercepted, potentially without changing the service implementation at all.

You can intercept messages in several different places:


Intermediary

An intermediary between the service and the consumer can transparently intercept messages, inspect them to compute a cache key for the parameters, and then only forward messages to the destination service if no response for the request parameters is present in the cache (Figure 3). This approach relies on the application of Service Agent [SDP].


Figure 3 – Passive intermediaries can cache responses without requiring modifications to the service or the consumer.

Service Container

This is a variation of the previous technique, but here the cache lives inside the same container as the service to avoid introducing a scalability bottleneck with the intermediary (Figure 4). Service frameworks, such as ASMX and WCF, allow for the interception of messages with an HTTP Module or a custom channel.


Figure 4 – Message interception inside the service container enables caching to occur outside the service implementation without involving an intermediary.

Service Proxy

With WCF we can build consumer-side custom channels that can make the caching logic transparent to service consumers and services. Figure 5 illustrates how the cache acts as a service proxy on the consumer side before sending the request to the service. Note that with this approach you will only realize significant performance benefits if the same consumer frequently requests the same data.


Figure 5 – Message interception by a service proxy inside the service consumer introduces caching logic that avoids unnecessary network communication.

Caching Utility Service

An autonomous utility service can be used to provide reusable caching logic, as per the Stateful Service pattern. For this technique to work, the performance savings of the caching logic need to outweigh the performance impact introduced by the extra utility service invocation and communication. This approach can be justified if autonomy and vendor neutrality are high design priorities.


Figure 6 – A utility service is explicitly invoked to handle caching.

Comparing Caching Techniques

Each option has its own trade-offs between potential performance increases and additional overhead. Table 1 provides a summary.


Table 1 – The pros and cons of different service caching architectures.

Cache Implementation Technologies

When you decide on a caching architecture, keep in mind that server-side message interception can still impact performance: the service has to compute a cache key for every request, and an oversized cache can actually decrease performance (especially if multiple services run on a shared server).

The higher memory requirements of a service that caches data can lead to increased paging activity on the server as a whole. Modern 64-bit servers equipped with terabytes of memory can reduce the amount of paging activity and thus avoid any associated performance reduction. Hardware-assisted virtualization further enables you to partition hardware resources and isolate services running on the same physical hardware from each other.

You can also leverage existing libraries such as the System.Web.Caching namespace for Web applications. Solutions like System.Runtime.Caching or the Caching Application Block from the Enterprise Library are available for all .NET-based services. These libraries include some more specialized caching features, such as item expiration and cache scavenging. REST services hosted within WCF can leverage ASP.NET caching profiles for output caching and controlling caching headers.
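To give a feel for item expiration with System.Runtime.Caching, here is a minimal sketch built on MemoryCache and CacheItemPolicy. The CatalogCache class, the "catalog:" key prefix, the 20-second lifetime, and the load delegate are assumptions made for the example, and the project is assumed to reference the System.Runtime.Caching assembly (or the package of the same name on newer runtimes).

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical wrapper around the process-wide default MemoryCache.
public static class CatalogCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    // Returns the cached item, or loads and caches it for 20 seconds.
    // Assumes load never returns null (MemoryCache rejects null values).
    public static string GetOrLoad(string itemId, Func<string, string> load)
    {
        string key = "catalog:" + itemId;

        var cached = Cache.Get(key) as string;
        if (cached != null)
        {
            return cached;
        }

        string value = load(itemId);
        var policy = new CacheItemPolicy
        {
            // The entry is evicted 20 seconds after it was stored.
            AbsoluteExpiration = DateTimeOffset.UtcNow.AddSeconds(20)
        };
        Cache.Set(key, value, policy);
        return value;
    }
}
```

A sliding expiration (CacheItemPolicy.SlidingExpiration) could be used instead when entries should stay cached for as long as they keep being requested.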

Furthermore, a distributed caching extension is provided with Windows Server AppFabric that offers a distributed, in-memory cache for high performance requirements associated with large-scale service processing. This extension in particular addresses the following problems of distributed and partitioned caching:

storing cached data in memory across multiple servers to avoid costly database queries

synchronizing cache content across multiple caching nodes for low latency, high scale, and high availability

caching partitions for fast lookups and load balancing across multiple caching servers

local in-memory caching of cache subsets within services to reduce lookup times beyond savings realized by optimizations on the caching tier

You also have several options for implementing the message interceptor. ASMX and WCF both offer extensibility points to intercept message processing before the service implementation is invoked. WCF even offers the same extensibility on the service consumer side. Table 2 lists the technology options for these caching architectures.


Table 2 – Technology choices for implementing caching architectures.

Computing Cache Keys

Let’s take a closer look at the moving parts that comprise a typical caching solution. First, we need to compute the cache key from the request message to check if we already have a matching response in the cache. Computing a generic key before the message has been deserialized is straightforward when:

the document format does not vary (for example, there is no optional content)

the messages are XML element-centric and don’t contain data in XML attributes or mixed mode content

the code is already working with XML documents (for example, as with XmlDocument, XmlNode or XPathNavigator objects)

the message design only passes reference data (not fully populated business documents)

the services expose RESTful endpoints where URL parameters or partial URLs contain all reference data

In these situations, you can implement a simple, generic cache key algorithm. For example, you can load the request into an XmlDocument object and get the request data by examining the InnerText property of the document’s root node. The danger here is that you could wind up with a very long and comprehensive cache key if your request message contains many data elements.
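The generic approach described above might look like the following sketch. GenericCacheKey and FromRequest are hypothetical names. Hashing the extracted text with SHA-256 is an addition not taken from the article; it is one possible way to keep the key short when the request carries many data elements.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Xml;

// Hypothetical helper that derives a cache key from a raw request message.
public static class GenericCacheKey
{
    public static string FromRequest(string requestXml)
    {
        var doc = new XmlDocument(); // PreserveWhitespace is false by default
        doc.LoadXml(requestXml);

        // InnerText concatenates the text content of all descendant nodes.
        string data = doc.DocumentElement.InnerText;

        // Assumption: hash the text so long requests still yield a fixed-size key.
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(data));
            return doc.DocumentElement.LocalName + ":" + Convert.ToBase64String(hash);
        }
    }
}
```

Because InnerText joins values without delimiters, two different requests could in principle produce the same text (for example, values "12" and "3" versus "1" and "23"). This is one more reason the approach is only suitable for simple, fixed message formats.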

Computing a message type-specific cache key requires much more coding work and you may have to embed code for each message type. For server-side caching with ASMX Web services, for example, you would add an HTTP Module to the request processing pipeline for the service call. Inside the custom module, you can then inspect the data items in the XML message content that uniquely identify a service request and, on a cache hit, bypass the service call.

For client-side caching with ASMX on the other hand, there is no transparent approach for adding caching logic. Custom proxy classes would have to perform all the caching-related processing. Depending on requirements and the number of service consumers, it might be easier to implement caching logic in the service consumer’s code or switch to WCF for adding caching logic transparently.

For WCF-based services, you would define a custom binding with a custom caching channel as part of the channel stack for either the service or the consumer. A custom channel gives you access to the Message object and its operations. Oftentimes that’s more convenient than programming against the raw XML message.

Caching REST Responses

The HTTP protocol defines content caching behavior on the server, on intermediaries, and on the client. This native form of content caching was originally responsible for driving widespread support in Web frameworks, like ASP.NET.

WCF adds support for ASP.NET caching profiles. These profiles control the caching behavior on the server as well as sending HTTP headers to control caching on intermediaries and the client.

You configure a WCF REST service for ASP.NET caching with a combination of attributes and configuration file settings. You can begin by attributing the service contract with the AspNetCacheProfile attribute. The attribute is only valid for GET requests, which fits the way REST uses GET as the preferred verb for read operations.


[ServiceContract]
public interface ICatalogService
{
    [OperationContract]
    [WebGet(UriTemplate = "/param/{itemId}")]
    [AspNetCacheProfile("CacheFor20SecondsServer")]
    string GetCatalogItem(string itemId);
}


Example 1

The attribute references a named profile stored in the service’s configuration file. The service implementation class also needs an attribute to connect the service into the ASP.NET processing pipeline.


[AspNetCompatibilityRequirements(
    RequirementsMode =
        AspNetCompatibilityRequirementsMode.Allowed)]
public class CatalogService : ICatalogService
{
    ...
}



Example 2

The configuration file further needs to set up the service host for ASP.NET caching, by adding the aspNetCompatibilityEnabled attribute:


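<system.serviceModel>
  <serviceHostingEnvironment aspNetCompatibilityEnabled="true" />
  <services>
    <service name="Service">
      <endpoint address="" binding="webHttpBinding"
                contract="IService" />
    </service>
  </services>
  <bindings>
    <webHttpBinding />
  </bindings>
</system.serviceModel>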



Example 3

Note that this configuration is at the host level and therefore enables ASP.NET for all services under this host. This could change behavior and performance for other services that don’t require capabilities of ASP.NET. You should evaluate carefully if RESTful services and SOAP-based services should run in the same hosting environment.

The caching profile is also stored in the configuration file:





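<system.web>
  <caching>
    <outputCacheSettings>
      <outputCacheProfiles>
        <add name="CacheFor20SecondsServer"
             duration="20" enabled="true"
             location="Server" varyByParam="itemId" />
      </outputCacheProfiles>
    </outputCacheSettings>
  </caching>
</system.web>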





Example 4

The profile’s location attribute indicates where the response can be cached. The preceding example configures server-side caching only, but other values are available to allow clients to cache responses as well. Client-side caching offers higher scalability and better performance because it doesn’t increase the service’s memory footprint and avoids unnecessary network calls.

If your architecture allows for response caching, caching should be enabled along the transmission chain because not all consumers may be built to support HTTP caching headers. WCF consumers, for example, ignore the caching attributes and repeat network calls even when HTTP Cache-Control headers indicate client-side caching is allowed.

If you consume cacheable data, you may invoke services with the System.Net.WebClient or System.Net.HttpWebRequest classes to optimize for performance.
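As a hedged sketch of such a consumer, the following code creates a WebClient with an explicit request cache policy so that HTTP caching headers returned by the service can be honored. CacheAwareClient and its methods are hypothetical names, and the behavior described applies to the .NET Framework; newer runtimes may treat the cache policy differently.

```csharp
using System;
using System.Net;
using System.Net.Cache;

// Hypothetical consumer-side helper for calling a cacheable REST endpoint.
public static class CacheAwareClient
{
    // Creates a client that may answer requests from the local HTTP cache
    // when the cached copy is still fresh according to the response headers.
    public static WebClient Create()
    {
        return new WebClient
        {
            CachePolicy = new RequestCachePolicy(RequestCacheLevel.Default)
        };
    }

    // Performs a GET against the given service address (a real network call
    // unless a fresh cached copy exists).
    public static string GetCatalogItem(Uri serviceUri)
    {
        using (WebClient client = Create())
        {
            return client.DownloadString(serviceUri);
        }
    }
}
```

Whether a response is actually reused depends on the Cache-Control headers the service sends, so the caching profile on the service side and the policy on the consumer side have to be considered together.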

Monitoring Cache Efficiency

After you have created a suitable caching architecture, it’s important that you monitor the cache for efficiency. System.Web.Caching and the Caching Application Block both include numerous performance counters to monitor efficiency metrics, such as the cache hit ratio, the number of cache misses, etc.

The cache hit ratio is the number of times a cached response was returned divided by the total number of requests. If you notice that your cache hit ratio is low, or that the number of misses is growing, then either your cache criteria do not match the data requested by consumers or the cached items expire too quickly. In that case, your caching is probably adding more overhead than it saves.
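When a custom cache does not come with built-in performance counters, the hit ratio can be tracked with two counters. The following is a minimal sketch; CacheStatistics and its members are hypothetical names, and the cache implementation is assumed to call RecordHit or RecordMiss on every lookup.

```csharp
using System.Threading;

// Hypothetical thread-safe hit/miss bookkeeping for a custom cache.
public sealed class CacheStatistics
{
    private long _hits;
    private long _misses;

    public void RecordHit() { Interlocked.Increment(ref _hits); }
    public void RecordMiss() { Interlocked.Increment(ref _misses); }

    public long Hits { get { return Interlocked.Read(ref _hits); } }
    public long Misses { get { return Interlocked.Read(ref _misses); } }

    // Hits divided by total lookups; 0 when nothing has been recorded yet.
    public double HitRatio
    {
        get
        {
            long hits = Hits;
            long total = hits + Misses;
            return total == 0 ? 0.0 : (double)hits / total;
        }
    }
}
```

Three hits and one miss, for example, give a hit ratio of 0.75.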

Reducing Resource Contention

By decreasing resource contention we can further improve performance by minimizing the time a service capability spends waiting for access to shared resources.

Shared resources in this context can be:

CPU time

system memory

service container threads

single-threaded code sections

database connections

databases (locks, rows, tables)

Several of these may exist as shared resources that can be accessed concurrently, whereas others may be limited to one executing thread.

It’s important to understand that even resources that can be accessed concurrently, such as system memory, are not isolated from other programs. Physical memory allocated on behalf of one Web service on a server impacts all other processes on that server because it’s not available to other processes. Therefore, allocating large portions of available system memory to one service can actually reduce performance for all other services on that server.

The memory required by the other services may only be available as virtual memory, which means increased paging activity will reduce performance. Each time a service tries to access a page that is not currently loaded, the operating system has to load the page from disk. That is a slow operation compared to accessing a page that’s already available in memory (because disks are orders of magnitude slower than memory).

Execution of the service stalls until a page that’s currently in memory is written to disk to make room for the requested page, which is then loaded into memory. In effect, paging turns a fast, memory-based operation into a slow, disk-based operation that can degrade performance.

You can monitor performance counters built into the Windows operating system and the .NET framework to determine if paging is impacting performance and if you can improve performance by reducing paging.

A high number of Page Faults indicates high contention for available system memory. If requested pages are frequently not available and have to be loaded from disk, you may also want to keep an eye on performance counters like:

Memory\Committed Bytes to ensure that it doesn’t exceed the amount of physical memory in your server (if you cannot tolerate performance degradation due to paging).

Process\Private Bytes to check that processes do not impact the performance of other processes by allocating all memory for themselves.

You can best avoid the performance degradation caused by paging by reducing contention for system memory. You reduce this contention either by supplying more physical memory or by reducing concurrent access to the available memory. Likewise, eliminating contention for other system resources improves performance as well.

Request Throttling

Exclusive access to resources reduces contention and thus improves performance. Sometimes you can relieve contention just by adding more resources, such as more system memory or more CPUs. Other times, adding more resources is not an option, perhaps due to hardware restrictions or a limited number of available database connections. In that case you can limit concurrency by throttling the number of requests sent to the service. This caps the number of concurrent service requests, which reduces CPU context switches and concurrent access to shared resources.

Effective throttling shrinks the idle time component in the performance model as shown in Figure 7.


Figure 7 – Request throttling reduces idle time in a service capability.

Remember, though, that throttling is typically scoped to a single service. Throttling can reduce contention within a service, but multiple services sharing a server can still compete for these resources and cause contention and slowdowns. You could introduce an intermediary service to throttle access to multiple services, but an intermediary service often introduces performance issues instead of fixing them.
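Independent of any particular framework, the basic idea of request throttling can be sketched with a semaphore that admits only a fixed number of requests at a time. RequestThrottle and RunAsync are hypothetical names; the handler delegate stands in for the capability logic that touches the contended resource.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical throttle that caps the number of concurrently running handlers.
public sealed class RequestThrottle
{
    private readonly SemaphoreSlim _slots;

    public RequestThrottle(int maxConcurrent)
    {
        if (maxConcurrent < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrent));
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Waits for a free slot, runs the handler, and always releases the slot.
    public async Task<T> RunAsync<T>(Func<Task<T>> handler)
    {
        await _slots.WaitAsync().ConfigureAwait(false);
        try
        {
            return await handler().ConfigureAwait(false);
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

Requests beyond the limit wait for a slot instead of competing for the shared resource, which is the same trade-off the throttling features of a service framework make.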

Throttling With WCF

WCF allows for the throttling of messages by adding a throttling behavior to the channel. You can configure the throttling behavior in the service or client configuration file as follows:







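<system.serviceModel>
  <services>
    <service name="..."
             behaviorConfiguration="Throttled">
      ...
    </service>
  </services>
  <behaviors>
    <serviceBehaviors>
      <behavior name="Throttled">
        <serviceThrottling
          maxConcurrentInstances="8" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>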






Example 5

This configuration limits concurrent processing to eight and improves performance in scenarios where contention for limited resources causes a problem. Processing more than eight requests simultaneously on a server with fewer than eight processors would cause costly context switches. Throttling the message processing reduces the amount of concurrency and thus the time lost due to switching context between the processing threads.
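For comparison, a similar limit can be applied in code through the ServiceThrottlingBehavior class. The sketch below assumes a self-hosted .NET Framework WCF service; ThrottlingSetup and LimitConcurrency are hypothetical names, and the method has to be called before the host is opened.

```csharp
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical helper that applies throttling limits to a WCF service host.
public static class ThrottlingSetup
{
    // Call before host.Open(); behaviors cannot be changed on an opened host.
    public static void LimitConcurrency(ServiceHostBase host, int limit)
    {
        ServiceThrottlingBehavior throttle =
            host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }

        // The three thresholds exposed by the throttling behavior.
        throttle.MaxConcurrentCalls = limit;
        throttle.MaxConcurrentSessions = limit;
        throttle.MaxConcurrentInstances = limit;
    }
}
```

Hard-coding the limit this way means a recompile for every adjustment, so in practice the value would likely be read from a setting.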

With the Windows Server AppFabric Application Server Extensions installed, you can also configure throttling parameters from the IIS Management tool. Note that as with everything else in WCF, you can configure the concurrency thresholds programmatically. However, placing these values in the configuration file allows for more flexibility to adjust the numbers as necessary without having to recompile code.

Request Throttling with BizTalk Server

BizTalk Server provides more sophisticated throttling features than WCF. The BizTalk messaging engine continuously monitors several performance counters for each BizTalk host. Under several load conditions, BizTalk throttles message processing when certain performance counters exceed configured thresholds. The engine slows down processing of incoming and outgoing messages to reduce contention for resources, such as memory or server connections. You can configure the thresholds for each BizTalk host from the Administration Console.

Throttling offers a relatively simple means of reducing resource contention. But just like with other performance tuning steps, it’s important to understand what it can and cannot do. Throttling in WCF happens for a single service, but other services running on the same server could still compete for the same resources. Throttling in BizTalk occurs at the host level. Other BizTalk hosts or services not hosted in BizTalk could compete for the same physical resources on the server.

You can only control contention in a meaningful way when fully applying the Service Autonomy principle to create truly autonomous service implementations that have their own set of dedicated resources, either physically or through hardware-based virtualization. This level of autonomy allows you to commit your services to hard Service Level Agreement (SLA) requirements.


The second article in this two-part series goes over the following topics: Coarse-Grained Service Contracts, Selecting Application Containers, Performance Policies, REST Service Message Sizes, Hardware Encryption, High Performance Transport, MTOM Encoding, Performance Considerations for Service Contract Design, and Impact on Service-Orientation Principles.
