Sunday, January 25, 2009

Cloud Storage or SDS?

Ever since Ray Ozzie announced Windows Azure at last year's PDC (watch the keynote), there has been a lot of buzz about the new platform. Every recent Microsoft event, it seems, has included a session or two on Azure. The SoCal Code Camp that took place this past weekend at Cal State Fullerton had an entire track of cloud-related presentations.

One general observation: the details of the new platform are still, well, cloudy. Windows Azure is currently in the CTP stage, and nobody expects an RTM before the end of 2009. There are many technology-, process-, and cost-related questions that no one can answer yet. What's worse, the marketing geniuses at Microsoft decided to slap the Azure label on a set of technologies that originated in different parts of the company (and even Microsoft evangelists admit that there is very little coordination among them).

Take a look at the obligatory Azure platform stack slide:

[Image: Azure platform stack slide]

My initial assumption was that SQL Server Data Services (SDS) was somehow built on top of Azure. As it turns out, SDS is a completely separate service; in fact, you don't even need an Azure application in order to use it.

Let's take a closer look at data support for cloud applications. In my opinion, this is the biggest paradigm shift for developers and architects. After all, it's easy to grasp the idea of deploying your application code to a whole bunch of virtual servers, but how are we going to survive without our beloved connection strings, stored procedures, and triggers?

There are two options available to us: Cloud Storage and SDS. Both promise to be reliable, scalable, and highly available, and to support terabytes of data. On the back end, both will use a vast network of SQL Server nodes running advanced algorithms for distributed data storage and replication. Below is a side-by-side comparison.

Signup
Cloud Storage: When you sign up for Azure, you receive a separate storage account.
SDS: You will need to get yet another account.

Hierarchy
Cloud Storage: Provides an account/container/entity model for your data.
SDS: Uses a similar model: authority/container/entity.
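To make the parallel concrete, here is a minimal in-memory sketch of the three-level model both services share: a top-level namespace (the Cloud Storage account or the SDS authority) holds containers, and each container holds entities addressed by an id. The class and method names (`Namespace`, `put`, `get`, `path`) are hypothetical and are not part of either service's API; the real services expose these levels over REST, and the path string below is illustrative only, not an actual URL format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Namespace:
    """Hypothetical model of the top level: a Cloud Storage account or an SDS authority."""
    name: str
    # container name -> (entity id -> entity properties)
    containers: Dict[str, Dict[str, Dict[str, Any]]] = field(default_factory=dict)

    def put(self, container: str, entity_id: str, props: Dict[str, Any]) -> None:
        # Create the container on first use, then store (or overwrite) the entity.
        self.containers.setdefault(container, {})[entity_id] = dict(props)

    def get(self, container: str, entity_id: str) -> Dict[str, Any]:
        # An entity is located by walking namespace -> container -> entity id.
        return self.containers[container][entity_id]

    def path(self, container: str, entity_id: str) -> str:
        # Illustrative three-part address; not either service's real URL scheme.
        return f"{self.name}/{container}/{entity_id}"


store = Namespace("myauthority")
store.put("books", "b1", {"Title": "Example"})
print(store.path("books", "b1"))  # myauthority/books/b1
```

The point of the sketch is only that both services force every piece of data into the same three-part address, whatever the top level happens to be called.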

Data Abstractions
Cloud Storage: Supports blobs (essentially named files with metadata) of up to 50 GB, tables (which are really lists of entities, not database tables), and queues with messages of up to 8 KB.
SDS: Works only with entities, which are similar to the table entities above.
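What makes these "tables" different from database tables is the lack of a fixed schema: each entity is essentially a bag of named properties, and two entities in the same table (or SDS container) may carry entirely different properties. Below is a minimal Python sketch of that idea, assuming a hypothetical `EntityTable` class with a simple equality filter; the property names are made up, and the system properties and query syntax of the real services are not modeled here.

```python
from typing import Any, Dict, List


class EntityTable:
    """Hypothetical schemaless table: a list of entities, each a dict of properties."""

    def __init__(self) -> None:
        self.entities: List[Dict[str, Any]] = []

    def insert(self, entity: Dict[str, Any]) -> None:
        # No schema check: any set of properties is accepted.
        self.entities.append(dict(entity))

    def where(self, prop: str, value: Any) -> List[Dict[str, Any]]:
        # Entities that lack the property simply don't match (no "missing column" error).
        return [e for e in self.entities if e.get(prop) == value]


table = EntityTable()
table.insert({"Id": "1", "Type": "Book", "Title": "Example", "Pages": 300})
table.insert({"Id": "2", "Type": "Author", "Name": "Someone"})  # different properties, same table

books = table.where("Type", "Book")
print(len(books))  # 1
```

In a relational table the second row would need a `Pages` column (probably NULL); here it simply doesn't have one.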

Data Access
Cloud Storage: Blobs, queues, and tables are all accessible via REST, and tables are additionally exposed through ADO.NET Data Services. This allows for a more convenient API (for example, you can query a table using LINQ). Large blobs can be uploaded in small 4 MB chunks.
SDS: Entities can be queried using REST. Although there is no ADO.NET Data Services support, you can pass a LINQ-style query in the HTTP request (the query may even join entities).
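Chunked uploads are easy to picture: the client splits a large blob into pieces of at most 4 MB, sends each piece as a separate request, and then asks the service to commit the pieces as one blob. The sketch below shows only the splitting step and the related arithmetic in plain Python; `CHUNK_SIZE`, `chunk_blob`, and `chunk_count` are hypothetical names, and the actual upload calls are deliberately left out.

```python
from typing import Iterator

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB per chunk, the size mentioned above


def chunk_blob(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of at most chunk_size bytes."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def chunk_count(blob_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed for blob_size bytes (ceiling division)."""
    return -(-blob_size // chunk_size)


# A 9 MB payload splits into 4 MB + 4 MB + 1 MB.
payload = b"\0" * (9 * 1024 * 1024)
sizes = [len(c) for c in chunk_blob(payload)]
print(sizes)  # [4194304, 4194304, 1048576]

# A blob at the 50 GB limit needs 50 * 1024 / 4 = 12,800 chunks.
print(chunk_count(50 * 1024 ** 3))  # 12800
```

Sending fixed-size pieces means a failed request costs at most one 4 MB retry rather than restarting a multi-gigabyte upload.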

*** UPDATE (March 17, 2009)
Someone at Microsoft has finally noticed the striking similarities between Cloud Storage and SDS. The Data Platform Insider blog has announced a change to the SDS architecture: the REST-based interface will be decommissioned and replaced by a service protocol based on Tabular Data Stream (TDS), the network protocol SQL Server has used since its earliest versions. A public CTP of the new architecture will be available in the middle of this year.