Thursday, August 24, 2006

Caching in Distributed Applications

This is an overview of different approaches to caching in a distributed application environment. Distributed N-Tier applications generally have at least two server farms: web servers and application servers.

Web servers, of course, can cache entire HTTP responses, using, for example, the ASP.NET OutputCache page directive. This is a blunt tool, though. It can result in high memory load and can interfere with page processing logic. Sometimes it's not applicable at all. In these cases, application data - in the form of objects - should be cached instead. Naturally, this is the only caching option for application servers.

Application data that we need to cache can be either static or dynamic. I'm not suggesting that static data doesn't change at all (otherwise we could just build it into the application), only that it changes very infrequently. Static data can be loaded into a cache on every server (let's call it isolated cache). We get the benefit of fast reads, because the data is always stored locally. On the other hand, it is heavily duplicated - every single server has to hold its own copy, which may be a waste of memory. Here's another drawback of isolated cache: imagine several servers joining the farm at the same time. How stressful will it be for the database server while they all fill their respective isolated caches?
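To make the isolated cache concrete, here is a minimal sketch in Python. The class, the loader function, and the sample data are all made up for illustration; the loader simply stands in for a database query and counts how often it is called.

```python
class IsolatedCache:
    """Per-server cache of static data, filled once at startup."""

    def __init__(self, load_all):
        # Every server pulls its own full copy from the data source.
        self._data = dict(load_all())

    def get(self, key):
        # Reads never leave the server.
        return self._data.get(key)


db_queries = 0


def load_reference_data():
    """Hypothetical stand-in for a database query returning static data."""
    global db_queries
    db_queries += 1
    return {"US": "United States", "CA": "Canada"}


# Five servers joining the farm at once means five full loads.
servers = [IsolatedCache(load_reference_data) for _ in range(5)]
country = servers[3].get("CA")
```

Each server ends up with an identical copy of the data, and the database is queried once per server, which is exactly the duplication and startup load described above.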

Caching dynamic data is much more complicated, because any server may need to modify it at any time. The first thing that comes to mind is to use a common caching data store, such as a database or a dedicated server (let's call it centralized cache). For example, ASP.NET allows you to keep session state in a centralized data store. Unfortunately, a centralized mechanism creates a single point of failure, so it may not be a good solution, depending on your availability requirements. Other drawbacks of centralized cache are generally reduced performance, since every read and write goes over the network, and limited scalability, since all servers share one store.
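A rough sketch of the centralized approach, again in Python with invented names: every server reads and writes through one shared store, so updates are visible everywhere, but nothing works when that store is down. The `available` flag is only an assumption used to simulate an outage.

```python
class CentralStore:
    """The single shared cache, e.g. a database or a dedicated server."""

    def __init__(self):
        self._data = {}
        self.available = True  # flipped to False to simulate an outage

    def _check(self):
        if not self.available:
            raise ConnectionError("central cache is down")

    def get(self, key):
        self._check()
        return self._data.get(key)

    def put(self, key, value):
        self._check()
        self._data[key] = value


class Server:
    """A farm member that keeps nothing locally."""

    def __init__(self, store):
        self.store = store

    def read(self, key):
        return self.store.get(key)

    def write(self, key, value):
        self.store.put(key, value)


store = CentralStore()
web1, web2 = Server(store), Server(store)
web1.write("cart:42", ["book"])
seen_by_web2 = web2.read("cart:42")  # the other server sees the update

store.available = False
try:
    web2.read("cart:42")
    outage_detected = False
except ConnectionError:
    outage_detected = True  # every server fails at the same time
```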

A good alternative to centralized cache is distributed cache, which assumes some kind of communication and coordination among the servers. Distributed cache comes in two essential flavors: fully replicated and partitioned. In the fully replicated architecture, once an application puts an object into the local cache on one server, it is immediately copied to all other servers in the cluster. The end result may look very similar to the isolated cache, but remember that isolated cache only works with static data. Still, as the server farm grows, it will take more and more time and memory to maintain a fully replicated cache.
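The fully replicated flavor can be sketched like this. It is a toy in-process model, not a real product's API: the class and helper names are hypothetical, and a direct dictionary write stands in for the network message a real cluster would send.

```python
class ReplicatedNode:
    """One server in a fully replicated cache cluster."""

    def __init__(self, name):
        self.name = name
        self.local = {}
        self.peers = []

    def put(self, key, value):
        self.local[key] = value
        # Copy the object to every other server (a network call in real life).
        for peer in self.peers:
            peer.local[key] = value

    def get(self, key):
        # Reads are always local.
        return self.local.get(key)


def make_cluster(names):
    nodes = [ReplicatedNode(n) for n in names]
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
    return nodes


a, b, c = make_cluster(["a", "b", "c"])
a.put("user:7", {"name": "Ann"})
value_on_c = c.get("user:7")
copies = sum(1 for node in (a, b, c) if "user:7" in node.local)
```

Every put costs one message per peer and every object is stored once per server, so both the write time and the total memory grow with the size of the farm.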

Enter partitioned cache. While the "get" operation in the previous scenario was always local, getting data from a partitioned cache could mean querying the other servers in the cluster until one is found that holds the required object. "Put", on the other hand, is local. Partitioned architecture represents a trade-off: we utilize memory more efficiently and don't waste time replicating the data, but it may take longer to retrieve it.
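Here is the same kind of toy model for the partitioned cache as described above: put is local, get falls back to asking the peers one by one. The names are invented, and the `remote_lookups` counter is only there to make the cost of a non-local read visible.

```python
class PartitionedNode:
    """One server in a partitioned cache cluster."""

    def __init__(self, name):
        self.name = name
        self.local = {}
        self.peers = []
        self.remote_lookups = 0  # how many peers this node had to ask

    def put(self, key, value):
        # The object lives only on the server that stored it.
        self.local[key] = value

    def get(self, key):
        if key in self.local:
            return self.local[key]
        # Not here: query the other servers until one has it.
        for peer in self.peers:
            self.remote_lookups += 1
            if key in peer.local:
                return peer.local[key]
        return None


def make_cluster(names):
    nodes = [PartitionedNode(n) for n in names]
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
    return nodes


a, b, c = make_cluster(["a", "b", "c"])
b.put("user:7", {"name": "Ann"})
value_on_c = c.get("user:7")  # c asks a first, then finds it on b
copies = sum(1 for node in (a, b, c) if "user:7" in node.local)
```

Only one copy exists in the whole cluster, but the read on server c had to go to two peers before it found the object.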

Last but not least: tools. Enterprise Library from Microsoft is free and includes a Caching Application Block, which unfortunately doesn't support distributed cache. The Enterprise edition of NCache from Alachisoft does support distributed cache but is far from free.

Thursday, August 17, 2006

Coding Standards - Good, Bad, and Ugly

Are coding standards a good thing to have in a development organization? Most companies would say yes, and cite a variety of reasons, among them ease of code maintenance and improved continuity (which is important in an industry with such a high turnover rate). In addition, good coding standards could help developers avoid common pitfalls. Code reuse, that holy grail of enterprise development, supposedly improves, too.

Yet the employees of the few companies I know that actually have a coding standards document are rarely excited about it. Usually the document is extremely large and unbelievably boring. In an effort to make it comprehensive, the authors put together lots of small rules, which makes the document feel like a programming textbook. Well, at least a textbook has a target audience, while the standards document contains a mixture of trivial, simple, moderate, and advanced items. Also, the rules in a textbook are supported by detailed explanations. A coding standards document can be very vague or simply omit the explanations.

The ugly part begins when project managers and team leaders require their engineers to follow coding standards to the letter. This immediately kills all creativity; people think more about compliance than solutions. Dogmatism in such a dynamic profession as software engineering can only mean one thing: stagnation.

So, how do we get all the benefits of coding standards without any of the drawbacks? First, we need to recognize that software engineering is a creative profession. I would put an emphasis on both words. It's creative, so we shouldn't limit the spectrum of algorithms, technologies, and patterns available for solving a programming problem. It's a profession, so we need to treat engineers as professionals and assume that they don't need another textbook. Of course, there are plenty of bad programmers out there, but that is a subject for a different blog.

An ideal standards document would concentrate on the specifics of the architecture adopted by the company. Describe how the application layers are structured and what the common components are for logging, data access, exception handling, configuration management, and caching. Don't bother defining naming conventions for variables.