Monday, December 11, 2006

Design For Operations (Part I)

When I design an enterprise application, I need to realize one simple truth: the system is going to spend just 15% (or less) of its life in the development environment. After that, it moves to the gated community known as production, and the only people who are supposed to have access to production are system operations engineers, a.k.a. "IT guys". Of course, I cannot assume that the IT guys will become familiar with the intricacies of the application's design. If I do, I am making a grave mistake that results in those dreaded 4:50 PM or 2:30 AM phone calls from the NOC.

So, I really need to design the system with operations in mind. It should be able to report its status and notify operations about any issues. I should allow operations to monitor my system with their usual tools, such as Event Viewer, Performance Monitor, management consoles, or MOM, instead of running SQL queries and reviewing XML configuration files. This means instrumenting my application with event logs, performance counters, and WMI objects and events.

Event Logging. Although a simple file log is a convenient place for all sorts of debugging and profiling information, I can't really expect IT to dig through megabytes of text looking for error details. Instead, they should be able to get them from the Windows Event Viewer. So, I will create an instance of EventLogInstaller in the application's installer class and specify the Source and Log properties. I will also make sure to log all unhandled exceptions (see my previous post) using the EventLog.WriteEntry method.
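
Here is a minimal sketch of both pieces; the "MyCompanyApp" source name is a placeholder:

using System.ComponentModel;
using System.Configuration.Install;
using System.Diagnostics;

[RunInstaller(true)]
public class LoggingInstaller : Installer
{
    public LoggingInstaller()
    {
        // Registers the event source at install time
        EventLogInstaller eventLogInstaller = new EventLogInstaller();
        eventLogInstaller.Source = "MyCompanyApp";  // placeholder source name
        eventLogInstaller.Log = "Application";      // target event log
        Installers.Add(eventLogInstaller);
    }
}

// Later, in the unhandled exception handler:
// EventLog.WriteEntry("MyCompanyApp", ex.ToString(), EventLogEntryType.Error);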

Performance counters are invaluable tools for monitoring and profiling a system in production. They may also give an early indication of system issues. Windows and the .NET Framework already contain dozens of performance counters, but custom counters can provide insight into my application's processing logic. So, exactly what kind of information should I expose via performance counters, and what kind of counters (instantaneous or average) should I use? There is no standard answer; it really depends on the nature of the system. There is a good introduction to the concept on MSDN. In order to register custom performance counters, I usually create a custom installer:

using System;
using System.Collections;
using System.ComponentModel;
using System.Configuration.Install;
using System.Diagnostics;

[RunInstaller(true)]  // lets InstallUtil discover this installer
public class PerformanceCountersInstaller : Installer
{
    public const String CategoryName = "...";
    public const String CategoryHelp = "...";
    public const String CounterName = "...";
    public const String CounterHelp = "...";

    public override void Install(IDictionary state)
    {
        base.Install(state);
        Context.LogMessage("Installing performance counters...");
        SetupPerformanceCounters();
    }

    public override void Uninstall(IDictionary state)
    {
        Context.LogMessage("Uninstalling performance counters...");
        if (PerformanceCounterCategory.Exists(CategoryName))
            PerformanceCounterCategory.Delete(CategoryName);
        Context.LogMessage("Successfully uninstalled performance counters");
        base.Uninstall(state);
    }

    private void SetupPerformanceCounters()
    {
        try
        {
            // Recreate the category from scratch on every install
            if (PerformanceCounterCategory.Exists(CategoryName))
                PerformanceCounterCategory.Delete(CategoryName);

            CounterCreationDataCollection counters = new CounterCreationDataCollection();

            // Create and add the counters
            CounterCreationData ccd = new CounterCreationData();
            ccd.CounterType = PerformanceCounterType.CounterDelta32;
            ccd.CounterName = CounterName;
            ccd.CounterHelp = CounterHelp;
            counters.Add(ccd);

            // Create the category
            PerformanceCounterCategory.Create(CategoryName,
                CategoryHelp,
                PerformanceCounterCategoryType.SingleInstance,
                counters);
            Context.LogMessage("Successfully installed performance counters");
        }
        catch (Exception ex)
        {
            Context.LogMessage("Could not install performance counters: " + ex.Message);
        }
    }
}
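
At run time, the application opens the counter it installed and updates it. A minimal sketch, assuming the single-instance category registered above:

// Open the custom counter in writable mode and update it
PerformanceCounter counter = new PerformanceCounter(
    PerformanceCountersInstaller.CategoryName,
    PerformanceCountersInstaller.CounterName,
    false);              // false = read/write, not read-only

counter.Increment();     // or counter.IncrementBy(n), or counter.RawValue = x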

In the next post I will discuss using WMI to publish application status information.

Tuesday, November 28, 2006

Dealing With Exceptions

Although I don't have exact statistics, it certainly feels like most .NET developers don't know how to deal with exceptions. I often see code whose author assumed that nothing ever goes wrong and decided not to put in any kind of exception handling. Such "infantile" code is clearly not ready for the hard realities of life. On the other end of the spectrum, we've got programs that swallow all exceptions in an effort to make themselves bullet-proof. What these developers don't realize is that this actually makes their programs more vulnerable to security attacks. When such an attack destabilizes the operating environment, a normal system would fail, but the "exception-swallower" carries on, making an ideal target for exploitation.

So, when do I actually need to catch exceptions? In essence, there are three distinct scenarios. The first is called handling. It's when I know what kind of exception to expect and, more importantly, how to recover from it. For example, my stored procedure may become a victim of a SQL Server deadlock. In managed code, this will result in a SqlException, which I should handle by retrying the transaction up to a pre-defined number of times (a sketch follows the next example). Another example is trying to read some configuration data from a file:

try
{
    configData = File.ReadAllText(configFilePath);
}
catch (FileNotFoundException)
{
    configData = DefaultConfigData;
}

As you can see, I am handling FileNotFoundException by force-feeding some default configuration data into the variable. It's important to emphasize that I didn't attempt to handle any other kind of exception that File.ReadAllText can throw. For instance, it may throw UnauthorizedAccessException or SecurityException, and I'd rather have those bubble to the top and, hopefully, force program termination.
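
Going back to the deadlock example, here is a minimal retry sketch, assuming System.Data.SqlClient and a hypothetical ExecuteProcedure helper; 1205 is the SQL Server error number for a deadlock victim:

const int MaxRetries = 3;
for (int attempt = 1; ; attempt++)
{
    try
    {
        ExecuteProcedure();   // hypothetical helper that runs the stored procedure
        break;                // success, stop retrying
    }
    catch (SqlException ex)
    {
        if (ex.Number != 1205 || attempt == MaxRetries)
            throw;            // not a deadlock, or out of retries: let it bubble up
    }
}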

This brings us to the second scenario: unhandled exceptions. If an exception hasn't been handled anywhere in the call stack (which means there is either an unknown problem or a problem I don't know how to recover from), it should be caught at the top level and properly logged. Windows applications should display a generic error message to the user and shut down, web applications should redirect the user to a generic error page, and services can either shut down or terminate the failed thread.
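
For a Windows application, a last-chance handler can tie this back to the event logging discussed in my other post. A minimal sketch; "MyCompanyApp" is a placeholder event source registered at install time:

AppDomain.CurrentDomain.UnhandledException +=
    delegate(object sender, UnhandledExceptionEventArgs e)
    {
        // Log everything we know about the failure before the process dies
        EventLog.WriteEntry("MyCompanyApp",
            "Unhandled exception: " + e.ExceptionObject,
            EventLogEntryType.Error);
    };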

The third scenario is called exception wrapping. The idea is to substitute a low-level exception object with a higher-level exception class containing additional information (only if you are absolutely positive that the original error is not sufficient). Wrapping is different from handling because there is no recovery: a new exception is thrown. In the example below, I am replacing SqlException with ScriptException, which adds the stored procedure name in an effort to facilitate debugging:

catch (SqlException ex)
{
    throw new ScriptException(storedProcName, ex);
}

Wrapping should be used with caution because it changes the call stack and can make debugging more difficult. It is imperative to assign the original exception object to the InnerException property of the new exception (in the above example, this is done via a constructor overload).
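
The post doesn't show ScriptException itself, but one possible shape is below; the key detail is passing the original exception to the base constructor so it ends up in InnerException:

[Serializable]
public class ScriptException : Exception
{
    private readonly string _StoredProcName;

    public string StoredProcName
    {
        get { return _StoredProcName; }
    }

    public ScriptException(string storedProcName, Exception inner)
        : base("Error executing stored procedure: " + storedProcName, inner)
    {
        _StoredProcName = storedProcName;
    }
}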

An interesting implication is that in order to handle exceptions, I need to know what exceptions a method can throw in the first place. The list of exceptions should really be part of the method signature. In fact, Java has the concept of checked exceptions and the corresponding "throws" syntax, while in .NET we have to rely on class documentation. If you are interested in a comparative analysis of the two approaches, read this interview with Anders Hejlsberg, the creator of C#.

Saturday, November 11, 2006

Recruitment By [Lucky] Numbers

In the past few years I have interviewed a lot of people for various software development positions. Finding the right employee is always a challenge (as Franco DiAddezio put it, recruitment is the equivalent of finding the perfect spouse after just one or two dates). Candidates can have plenty of work experience, and you can fairly easily confirm whether or not they really know the technologies advertised in their resume. But are technology skills alone sufficient? My personal opinion is that a good software engineer is defined by his or her analytical thinking and problem-solving abilities. Specific technologies, such as programming languages and APIs, can always be learned.

My own litmus test for identifying the right engineer is a small but elegant programming problem called the "Lucky Numbers" problem. I first heard of it years ago at university, and more recently on Mikhail Gustokashin's site dedicated to programming problems, where it is ranked "Very Easy" (follow the link only if you can read Russian). Here it is:
A 6-digit ticket number is considered "lucky" if the sum of its first 3 digits equals the sum of its last 3 digits. For example, "006123" and "511304" are both lucky, while "980357" isn't. Write an efficient algorithm to determine how many lucky numbers exist among all 6-digit numbers (from 000000 to 999999).

First, let's write an inefficient algorithm. We will iterate through all six-digit numbers and increment a counter whenever the sum of the first 3 digits equals the sum of the last 3.

int luckyNumbersCount = 0;
for (int i = 0; i < 10; i++)
    for (int j = 0; j < 10; j++)
        for (int k = 0; k < 10; k++)
            for (int l = 0; l < 10; l++)
                for (int m = 0; m < 10; m++)
                    for (int n = 0; n < 10; n++)
                        if (i + j + k == l + m + n) luckyNumbersCount++;

This algorithm performs 1 million iterations, and it is the least I would expect from a candidate (amazingly, more than half failed to produce it). We can arrive at an efficient solution by carefully reading the problem. It doesn't ask us to produce all "lucky numbers", only their quantity. Can we find it without generating the numbers? We know that the digit sums of both halves of a lucky number are equal. A sum of three digits can take values from 0 (0+0+0) to 27 (9+9+9). For each value, we need to find out how many combinations of digits produce it; for example, "1" has 3 combinations: "001", "010", and "100". Evidently, there are 3 * 3 = 9 "lucky numbers" whose halves both sum to "1". So, here is an optimized algorithm that performs only 1,028 iterations (1,000 to count the combinations plus 28 to add them up):

int[] combinations = new int[28];
for (int i = 0; i < 10; i++)
    for (int j = 0; j < 10; j++)
        for (int k = 0; k < 10; k++)
            combinations[i + j + k]++;

int luckyNumbersCount = 0;
for (int i = 0; i < 28; i++)
    luckyNumbersCount += combinations[i] * combinations[i];
// the answer works out to 55,252

Wednesday, October 25, 2006

Dependency Injection

Let's talk about dependency injection (DI). DI is, essentially, a design pattern that helps decouple tightly coupled systems. For example, imagine that several modules of our system use a cryptographic component. The code may look like this:

public class OrderManager
{
    private CCryptography _Crypto;

    public OrderManager()
    {
        _Crypto = new CCryptography();
    }
}

The obvious drawback of this approach is that we cannot easily swap cryptographic algorithms: multiple references in the code would need to be changed. Using the DI pattern, we would extract an interface from the CCryptography class and delegate the creation of the concrete object to an outside factory:

public class OrderManager
{
    private ICryptography _Crypto;

    public OrderManager(ICryptography crypto)
    {
        _Crypto = crypto;
    }
}

This way, different kinds of cryptographic objects can be created and swapped at run-time; the OrderManager class doesn't know anything about it (and doesn't need to, either). Another important benefit is testability: we can now test OrderManager functionality without a fully functional cryptographic component. All we need is a mock object that implements the ICryptography interface and probably doesn't even encrypt or decrypt anything.
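
For instance, a test double might look like this; the Encrypt/Decrypt signatures are an assumption, since the post doesn't show the ICryptography interface:

// A hypothetical no-op implementation for unit tests; it satisfies
// ICryptography without doing any real cryptographic work
public class FakeCryptography : ICryptography
{
    public byte[] Encrypt(byte[] data) { return data; }
    public byte[] Decrypt(byte[] data) { return data; }
}

// OrderManager can now be tested in complete isolation:
// OrderManager manager = new OrderManager(new FakeCryptography());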

By moving object creation to a new entity, we can address additional issues. For example, by caching objects in a dictionary we can apply the Singleton pattern and ensure that only one instance of the cryptography component is created. We can also control the order in which objects are created. Thus, we have complemented the original DI pattern with the concept of a lifetime container.
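
A minimal sketch of such a factory with Singleton-style caching might look like this (CryptographyFactory is my name for it, not something from the pattern itself):

public static class CryptographyFactory
{
    private static ICryptography _Instance;
    private static readonly object _Lock = new object();

    public static ICryptography GetInstance()
    {
        lock (_Lock)   // guard against concurrent first-time creation
        {
            if (_Instance == null)
                _Instance = new CCryptography();   // concrete type chosen in one place
            return _Instance;
        }
    }
}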

The folks at the Microsoft Patterns & Practices group took the idea even further and created a dependency injection container called ObjectBuilder (OB). OB uses reflection to analyze classes and automatically fulfill their dependency requests. So, as long as we have explicitly expressed a dependency (by placing it in a constructor, as in the above example, or in a property decorated with a special attribute), OB will know what to do. There is much more to OB than I just mentioned, so if you'd like to read more, here are a couple of links:

- Download Object Builder from CodePlex: http://www.codeplex.com/Wiki/View.aspx?ProjectName=ObjectBuilder
- Great tutorial by Sayed Hashimi: http://www.sayedhashimi.com/PermaLink,guid,d05aed4f-a211-4969-893e-7ffea324a56c.aspx

Friday, September 15, 2006

Business Objects and Value Objects

Encapsulation of data and behavior is one of the cornerstones of object-oriented programming. This basically means that a business object contains both the data and the methods that manipulate the data. In the example below, a CreditCard object contains the public method Authorize and the public property AuthorizationCode. It's important for the object to establish a proper public interface: the authorization code value is returned by the payment processor, and we wouldn't want clients to accidentally modify it. Therefore, I exposed a read-only property rather than a field.

public class CreditCard
{
    ...
    private string _AuthorizationCode;
    public string AuthorizationCode
    {
        get { return _AuthorizationCode; }
    }

    public bool Authorize(double amount) {...}
    ...
}


The object-oriented approach is great - within a single application tier. When designing a distributed application, we need to take other factors into consideration. Suppose my application needs to display customer credit card data on a web page. Should I pass the CreditCard object to the web tier? Sure, it is possible, but the web tier only needs the credit card data, not the behavior. Frankly, I wouldn't want web tier code to accidentally call the Authorize() method. So, for the sake of security, we should somehow limit the objects we pass around. Performance is another concern, especially when passing objects between physical tiers. Regardless of which remoting technology we use - web services, .NET Remoting, or COM+ - large objects with lots of methods and properties are not ideal candidates.

A simple and elegant solution is to combine the essential business object data into a "value object". Individual data elements of the value object are exposed to the business object's clients via public properties. The revised CreditCard class below demonstrates this approach. Note how an overload of the constructor allows us to easily create a business object from a value object. We can extract the value object just as easily and send it to another application tier.

public class CreditCard
{
    ...
    private CreditCardInfo _CCInfo;
    public CreditCardInfo CCInfo
    {
        get { return _CCInfo; }
    }

    public CreditCard(CreditCardInfo info)
    {
        _CCInfo = info;
    }

    public string AuthorizationCode
    {
        get { return _CCInfo.AuthorizationCode; }
    }

    public bool Authorize(double amount) {...}
    ...
}


public class CreditCardInfo
{
    ...
    public string CardNumber;
    public DateTime ExpDate;
    public string AuthorizationCode;
    public DateTime? LastTransactionDate;
    ...
}

Saturday, September 02, 2006

How Many Layers Is Enough?

A colleague recently asked my advice about the design of a web application he was working on. He showed me the draft: an elaborate architecture involving a web application calling a web service, which in turn invoked an application server over .NET Remoting. The problem? This was an intranet application with a maximum number of concurrent users below 50. Had he implemented the original design, he would have ended up with an extremely scalable system that would never get a chance to realize its potential. On the other hand, the flip side of that scalability - poor performance - would be obvious to every user.

So, how many layers is enough for an enterprise application? I am talking about physical layers, of course. With logical layers, the approach is well-known: you would typically have a data abstraction tier, a business objects tier, and a workflow tier. In addition, the web application itself should be designed using the Model-View-Presenter pattern.

With physical layers, it's far less straightforward. When all logical tiers run inside a single application domain, we achieve the best possible performance, because all calls are in-process. Once we place the business and data tiers in a dedicated application server, we lose performance to out-of-process calls even if the application server is running on the same physical machine. It really doesn't matter which remote calling mechanism we use - web services, .NET Remoting, or Enterprise Services (COM+) - performance will suffer. When we move the application server to separate hardware, performance gets even worse because of network latency.

So why not run everything in-process? Well, many web applications do just that: every web server in the farm hosts all logical tiers of the application. The downside is that the web servers have to process both web application logic and business logic. This limits the number of HTTP requests each server can process and thus dramatically reduces system scalability. There are other drawbacks, too. The servers have to be really beefed up to handle the load, so hardware cost is high. Also, this deployment layout is inherently insecure: web servers are usually placed in a DMZ outside the company firewall. Think of all the connection strings, encryption keys, and other sensitive data a hacker could obtain by breaking into a web server.

By placing the business and data tiers on a separate physical layer (an application server farm), we are trading performance for scalability. Web servers no longer have to process business logic, so they can handle many more page requests (and don't require high-end CPUs and tons of memory). Application servers can utilize connection pooling, object pooling, and data caching in order to effectively support the web layer. Better yet, if we move from homogeneous app servers to more specialized "application services", we can improve performance even more by fine-tuning each server's configuration.

Coming back to the question in the title: there really isn't a universal rule. The number of physical layers can differ depending on the scalability, performance, security, and other requirements of a particular application.

Thursday, August 24, 2006

Caching in Distributed Applications

This is an overview of different approaches to caching in a distributed application environment. Distributed N-Tier applications generally have at least two server farms: web servers and application servers.

Web servers, of course, can cache entire HTTP responses, using, for example, the ASP.NET OutputCache page directive. This is a blunt tool, though: it can result in high memory load and interfere with page processing logic, and sometimes it's not applicable at all. In those cases, application data - in the form of objects - should be cached instead. Naturally, this is the only caching option for application servers.
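
To make the two options concrete, here is what each might look like in ASP.NET; the cache key, the cached object, and the 5-minute policy are all placeholders:

// Page-level output caching (the blunt tool) is a one-line ASPX directive:
//   <%@ OutputCache Duration="60" VaryByParam="None" %>

// Caching application data as objects instead:
using System;
using System.Web;
using System.Web.Caching;

public static class ProductCache
{
    public static void Store(object productList)
    {
        HttpRuntime.Cache.Insert(
            "ProductList",               // cache key
            productList,                 // the object to cache
            null,                        // no cache dependency
            DateTime.Now.AddMinutes(5),  // absolute expiration
            Cache.NoSlidingExpiration);  // no sliding expiration
    }
}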

Application data that we need to cache can be either static or dynamic. I'm not suggesting that static data doesn't change at all (otherwise we could just build it into the application), only that it changes very infrequently. Static data can be loaded into a cache on every server (let's call it an isolated cache). We get the benefit of fast reads, because the data is always stored locally. On the other hand, the data is heavily duplicated - every single server has to have a copy - which may be a waste of memory. Here's another drawback of the isolated cache: imagine several servers joining the farm at the same time. How stressful will it be for the database server while they are all filling up their respective isolated caches?

Caching dynamic data is much more complicated, because any server may need to modify it at any time. The first thing that comes to mind is to use a common caching data store, such as a database or a dedicated server (let's call it a centralized cache). For example, ASP.NET allows you to have a centralized data store for session state. Unfortunately, a centralized mechanism always creates a single point of failure, so it may not be a good solution depending on your availability requirements. Other drawbacks of the centralized cache are generally reduced performance and scalability.

A good alternative to the centralized cache is a distributed cache, which assumes some kind of communication and coordination among the servers. Distributed caches come in two essential flavors: fully replicated and partitioned. In the fully replicated architecture, once an application puts an object into the local cache on one server, it is immediately copied to all other servers in the cluster. The end result may look very similar to the isolated cache, but remember that the isolated cache only works with static data. Still, as the server farm grows, it takes more and more time and memory to maintain a fully replicated cache.

Enter the partitioned cache. While the "get" operation in the previous scenario was always local, getting data from a partitioned cache could mean querying other servers in the cluster until one is found that holds the required object. "Put", on the other hand, is local. The partitioned architecture represents a trade-off: we utilize memory more efficiently and don't waste time replicating the data, but it may take longer to retrieve it.
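
Purely as an illustration, a partitioned cache's "put" and "get" might be organized like this; ICachePeer and the peer list are hypothetical stand-ins for the real cluster communication:

using System.Collections;

// Hypothetical proxy to another server in the cluster
public interface ICachePeer
{
    object Lookup(string key);
}

public class PartitionedCache
{
    private readonly Hashtable _Local = new Hashtable();
    private readonly ArrayList _Peers;   // ICachePeer proxies to the other servers

    public PartitionedCache(ArrayList peers)
    {
        _Peers = peers;
    }

    // "Put" is always local
    public void Put(string key, object value)
    {
        _Local[key] = value;
    }

    // "Get" tries locally first, then queries the other servers one by one
    public object Get(string key)
    {
        object value = _Local[key];
        if (value != null)
            return value;

        foreach (ICachePeer peer in _Peers)
        {
            value = peer.Lookup(key);
            if (value != null)
                return value;
        }
        return null;
    }
}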

Last but not least: tools. The Enterprise Library from Microsoft is free and includes a Caching Application Block, which unfortunately doesn't support distributed caching. The Enterprise version of NCache from Alachisoft does support distributed caching but is far from free.

Thursday, August 17, 2006

Coding Standards - Good, Bad, and Ugly

Are coding standards a good thing to have in a development organization? Most companies would say yes, and cite a variety of reasons, among them ease of code maintenance and improved continuity (which is important in an industry with such high turnover rate). In addition, good coding standards could help developers to avoid common pitfalls. Code reuse, that holy grail of enterprise development, supposedly improves, too.

Yet the employees of the few companies I know that actually have a coding standards document are rarely excited about it. Usually the document is extremely large and unbelievably boring. In an effort to make it comprehensive, the authors put together lots of small rules, which makes the document feel like a programming textbook. Well, at least a textbook has a target audience, while the standards document contains a mixture of trivial, simple, moderate, and advanced items. Also, the rules in a textbook are supported by detailed explanations; a coding standards document can be very vague or simply omit the explanations.

The ugly part begins when project managers and team leaders require their engineers to follow coding standards to the letter. This immediately kills all creativity; people think more about compliance than solutions. Dogmatism in such a dynamic profession as software engineering can only mean one thing: stagnation.

So, how do we get all the benefits of coding standards without the drawbacks? First, we need to recognize that software engineering is a creative profession, with an emphasis on both words. It's creative, so we shouldn't limit the spectrum of algorithms, technologies, and patterns available to solve a programming problem. And we need to treat engineers as professionals and assume that they don't need another textbook. Of course, there are plenty of bad programmers out there, but that is a subject for a different post.

An ideal standards document would concentrate on the specifics of the architecture adopted by the company: describe how the application layers are structured and what the common components are for logging, data access, exception handling, configuration management, and caching. Don't bother defining naming conventions for variables.