Monday 24 December 2012

Achieving High Availability with Windows Azure Environment - Part 2


  1. Windows Azure Storage Services

    Windows Azure Storage Services provides 99.9 percent availability by default to subscribers.

    There are three types of storage services available in Windows Azure Storage: Blob, Table and Queue. Three additional storage types are currently in preview: Drives, Virtual Machine Disks and Virtual Machine Images. The SLA applies only to the three storage types that are in production (Blob, Table and Queue).
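
    To put the 99.9 percent figure in perspective, a quick back-of-the-envelope calculation (a rough sketch, not wording from the SLA document) shows roughly how much downtime that availability target allows in a month:

      # Rough downtime budget implied by a 99.9% monthly availability target.
      HOURS_PER_MONTH = 30 * 24                  # assuming a 30-day billing month

      def allowed_downtime_minutes(availability):
          """Downtime (in minutes per month) permitted by a given availability target."""
          return (1.0 - availability) * HOURS_PER_MONTH * 60

      print(f"99.9% availability allows about {allowed_downtime_minutes(0.999):.0f} minutes of downtime per month")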

    Before understanding the high availability feature, we need to understand the architecture of storage services.

    Windows Azure Storage Service architecture has three layers, each of which performs an important part of processing a request. The diagram below shows the architecture of Azure storage services.

    Figure 3 - Windows Azure Storage Architecture
    1. Storage Request – As shown in the picture, any request on the storage service is made using a unique DNS name (Ex: https://shellstorage.blob.core.windows.net/). Once the request is raised, it reaches the storage account server allocated for that DNS name in a region. This is done by the Location Service, which refers to the DNS entry and routes the request to the allocated server using VIP mapping. The request then reaches the Front-End Layer of the storage service.
    2. Front-End (FE) Layer – This layer’s responsibility is to take the incoming request, authenticate and authorize it, and route it to a particular partition server located in the Partition Layer. There are multiple partition servers in the Partition Layer. To forward the request to a particular partition server, it refers to the Partition Map. The Partition Map keeps track of the partition information of the storage service and knows which partition is stored in which partition server.
      Once the response for the request comes back from the partition layer, it is sent back to the client.
    3. Partition Layer – There are multiple partition servers in the partition layer. This layer manages the partitioning of all the data objects. To distribute data across different partitions, the storage service has a key concept known as the partition key, which is present in all the storage types (blob, table and queue). The objects belonging to a single partition are managed by a single partition server, and each partition server can serve many partitions. (A small illustrative sketch of this routing appears after this list.)
      This layer also provides automatic load balancing of partitions across the servers to meet the traffic needs of blobs, tables and queues.
      This layer is responsible for reading data from the DFS servers and for sending requests to store data on the primary DFS server.
      • When a request arrives for reading data (GET), it checks whether the data is present in the cache memory. If it is, it returns the data directly from the cache. If not, it sends a read request to one of the DFS servers holding a replica of the data.
      • When a request arrives for adding/modifying/deleting data (PUT/POST/DELETE), it sends the request to the primary replica for the insert, update or delete.
    4. Distributed and replicated File System (DFS) Layer – This is the layer that stores the actual data on disk and also manages the distribution and replication of data across many servers to keep the data durable.

      For insert, update and delete operations, it completes the transaction on the primary replica, replicates it to the other replicas, and then returns the status to the partition layer. For a read request, it reads data from disk and returns the result to the partition layer.
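
    To make the flow above more concrete, below is a minimal sketch in Python. It is purely illustrative: the partition ranges, server names and cache are hypothetical placeholders, not the actual Azure implementation. It shows a front-end server looking up a partition server in the Partition Map, and a partition server serving reads from cache (falling back to a DFS replica) while sending writes to the primary DFS replica.

      # Illustrative model of the three-layer request flow (not real Azure code).
      # The Partition Map records which key range each partition server owns (hypothetical values).
      PARTITION_MAP = [
          ("a", "m", "partition-server-1"),
          ("n", "z", "partition-server-2"),
      ]

      CACHE = {}                                        # partition-layer read cache
      DFS_REPLICAS = ["dfs-primary", "dfs-2", "dfs-3"]  # replicas of the partition's data

      def route(partition_key):
          """Front-End layer: find the partition server that owns this partition key."""
          first = partition_key[0].lower()
          for low, high, server in PARTITION_MAP:
              if low <= first <= high:
                  return server
          raise KeyError("no partition server owns this key")

      def read(partition_key):
          """Partition layer GET: serve from cache, otherwise read from a DFS replica."""
          if partition_key in CACHE:
              return CACHE[partition_key]
          data = f"<data for {partition_key} read from {DFS_REPLICAS[1]}>"
          CACHE[partition_key] = data
          return data

      def write(partition_key, data):
          """Partition layer PUT/POST/DELETE: sent to the primary DFS replica
          (which, per the text above, replicates to the other replicas)."""
          server = route(partition_key)
          print(f"{server} -> {DFS_REPLICAS[0]}: store {partition_key}")
          CACHE[partition_key] = data

      write("orders", "order #1")   # routed to partition-server-2, stored via the primary replica
      print(read("orders"))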

    5.1 Fault Domains and Server Failures
    As discussed for the Azure Compute services, the servers of these three layers sit in different fault domains. Each fault domain is separate from the others and has its own dedicated network, power and hardware. So if any fault domain fails at any given time, it never impacts the other fault domains, and the servers in the other fault domains can respond to the requests raised by clients.

    As the Storage service has three layers, when any server fails for any reason, another server on the same layer takes over the responsibility of processing the request.
    • Front-End Server Failure – When a front-end server fails (for any reason, such as a network or hardware failure), the load balancer realizes it and updates the server status as unavailable in the data store. So when any request comes from a client to the VIP, the load balancer does not route the request to the failed front-end server and routes it to the available servers instead.
    • Partition Server Failure – When a partition server is unavailable, the storage system realizes it, immediately reassigns the partitions it was serving to other available partition servers, and updates the Partition Map. So for subsequent requests, no partition is assigned to the failed partition server.
    • DFS Server Failure – When a DFS server is unavailable, the partition server stops sending requests to the failed DFS server and routes them to another available replica, which will be located in another fault domain. When a DFS server is unavailable for a long time, the fabric controller generates a new DFS server replica and brings it online. (A small sketch of this failover routing follows below.)
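
    Below is a minimal sketch of this failover behaviour, again purely illustrative (the server and replica names are hypothetical): the load balancer only picks front-end servers still marked available, and the partition server skips DFS replicas known to be down.

      # Illustrative failover routing (hypothetical names, not Azure internals).
      front_end_servers = {"fe-1": "available", "fe-2": "unavailable", "fe-3": "available"}

      def pick_front_end():
          """Load balancer: route only to front-end servers still marked available."""
          healthy = [name for name, status in front_end_servers.items() if status == "available"]
          if not healthy:
              raise RuntimeError("no healthy front-end server")
          return healthy[0]

      # Each DFS replica sits in a different fault domain.
      dfs_replicas = ["dfs-1", "dfs-2", "dfs-3"]

      def read_with_failover(key, failed=("dfs-1",)):
          """Partition server: skip replicas known to be down and read from another one."""
          for replica in dfs_replicas:
              if replica not in failed:
                  return f"{key} read from {replica}"
          raise RuntimeError("all replicas unavailable")

      print(pick_front_end())              # fe-1
      print(read_with_failover("orders"))  # orders read from dfs-2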

    5.2 Upgrade Domains and Rolling Upgrade
    Windows Azure provides 99.9% availability on the Storage service using fault domains and upgrade domains, just as the cloud services do. As shown in Figure #1, fault domains and upgrade domains are a horizontal and a vertical division of the hardware. So the servers of each of the three layers sit in different fault domains and upgrade domains for the storage services.

    When a server in a fault domain goes down, the users lose 1/x of the servers for the respective layer (where x is the number of fault domains). In the same way, when any server goes down during an upgrade, the users lose 1/y of the servers until the upgrade completes (where y is the number of upgrade domains).

    When upgrading a particular layer, multiple activities are carried out before and after the upgrade to make sure the process does not impact anything else. The upgrade is done one upgrade domain at a time, so if there are x upgrade domains, (100/x)% of the servers are considered for upgrade at a time, and those servers are marked unavailable before the upgrade process starts so that no requests are routed to them.

    Once the upgrade process is completed for any server, a validation process makes sure everything is running properly. Once the validation is successful, the server status is marked as available and the servers can take requests for processing. If anything goes wrong while upgrading the domain, or the validation does not complete successfully, the servers are rolled back to the previous version of the production software.
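
    The rolling-upgrade loop can be sketched as below (illustrative only; the upgrade domain layout and validation step are placeholders): one upgrade domain, i.e. (100/x)% of the servers, is taken offline, upgraded and validated at a time, with a rollback if validation fails.

      # Sketch of a rolling upgrade across upgrade domains (illustrative only).
      upgrade_domains = {
          "UD-0": ["server-0", "server-3"],
          "UD-1": ["server-1", "server-4"],
          "UD-2": ["server-2", "server-5"],
      }

      def rolling_upgrade(validate):
          share = 100 / len(upgrade_domains)   # (100/x)% of servers offline at a time
          for ud, servers in upgrade_domains.items():
              print(f"{ud}: marking {share:.0f}% of servers unavailable and applying the new build")
              if not validate(servers):
                  print(f"{ud}: validation failed, rolling back to the previous production build")
                  return False
              print(f"{ud}: validation succeeded, servers marked available again")
          return True

      rolling_upgrade(lambda servers: True)    # the lambda stands in for real health checks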

    5.3 Geo-replication
    Windows Azure Storage services are made highly available using the upgrade domain and fault domain concepts, but that is limited to a single datacenter. Geo-replication on storage services additionally keeps one more copy of the data in another datacenter, hundreds of miles away from the specified datacenter (but in the same region).

    Geo-replication on storage services protects against major datacenter-level disasters such as earthquakes, wildfires, tornados, nuclear reactor meltdowns, etc., by keeping a copy of the data in another datacenter.

    5.3.1 Primary and secondary locations in Geo-replication
    When creating a storage account, the customer selects the datacenter where the data must be located, such as North Central US, South Central US, East US, etc. This datacenter is termed the primary location, and all the reads and writes to that storage service go there.

    The secondary location is determined automatically from the primary location using the mapping table below. Windows Azure constantly maintains multiple healthy (three) replicas of the data in both locations.

    Primary            Secondary
    North Central US   South Central US
    South Central US   North Central US
    East US            West US
    West US            East US
    North Europe       West Europe
    West Europe        North Europe
    South East Asia    East Asia
    East Asia          South East Asia
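
    The same pairing can also be expressed as a simple lookup, for example to work out where a given account's geo-replica lives (a convenience sketch of the table above; the secondary is chosen by Windows Azure and cannot be picked by the customer):

      # Primary-to-secondary datacenter pairing from the table above.
      SECONDARY_OF = {
          "North Central US": "South Central US",
          "South Central US": "North Central US",
          "East US": "West US",
          "West US": "East US",
          "North Europe": "West Europe",
          "West Europe": "North Europe",
          "South East Asia": "East Asia",
          "East Asia": "South East Asia",
      }

      print(SECONDARY_OF["East US"])   # West US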

    5.3.2 LRS vs GRS

    Microsoft terms the two storage categories as below –
    1. Locally Redundant Storage (LRS), which refers to the three replicas located in the primary location. When creating a storage account, the customer provides the name of the datacenter where it is supposed to be located. Windows Azure provisions three replicas of the storage services (blobs, tables and queues) in that same datacenter, spread across different fault domains and upgrade domains as mentioned earlier. This provides high availability when particular hardware in a datacenter goes down.
    2. Geo Redundant Storage (GRS), which refers to the geo-replicated storage service. The location is decided by Windows Azure (as defined in the table). When any data is written to LRS, Azure writes it to the primary replica and replicates it to the other two replicas. Once the transaction on LRS completes successfully, the same data is replicated to GRS asynchronously in the background.
      This storage also keeps three replicas, just as LRS does, and replication across those three replicas happens in the same way as in LRS. (A small sketch of this write path follows below.)
      Note: Queue storage data is not replicated to GRS as of now.
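
    The write path can be sketched as below (illustrative only; the replica names and the background worker are placeholders): the three LRS replicas are written before the transaction is acknowledged, while the geo-replication to the secondary location happens asynchronously afterwards.

      # Illustrative write path: synchronous local replication (LRS),
      # asynchronous geo-replication (GRS) in the background.
      import queue
      import threading
      import time

      lrs_replicas = ["primary", "replica-2", "replica-3"]   # same datacenter
      geo_queue = queue.Queue()                              # pending geo-replication work

      def write(key, value):
          for replica in lrs_replicas:                       # completed before success is returned
              print(f"LRS: {key} written to {replica}")
          geo_queue.put((key, value))                        # shipped to the secondary later
          return "success"

      def geo_replicator():
          while True:
              key, value = geo_queue.get()
              time.sleep(0.1)                                # happens after the client already got success
              print(f"GRS: {key} replicated to the secondary datacenter")
              geo_queue.task_done()

      threading.Thread(target=geo_replicator, daemon=True).start()
      print(write("orders", "order #1"))
      geo_queue.join()                                       # wait so the demo prints the GRS line too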

    5.3.3 Cost impact on Geo-replication
    There is no additional cost for geo-replication of the Windows Azure Storage service. So customers need not pay for the additional replicas located in the secondary location.

    When creating a storage account, Azure creates the replicas in another datacenter, hundreds of miles from the chosen datacenter but in the same region. Once the replicas are created, the data is replicated asynchronously. There is no cost billed for creating the replicas, the data transfer, etc.

    Windows Azure also provides an option to turn off geo-replication when required. That is suitable when the data in the primary location falls under the category of non-critical or temporary data, or data that can be recreated from another source if it is lost. By turning off the geo-replication facility, the data in the secondary location is deleted completely, and the customer gets a price discount of 23% to 34% depending upon the data size.

    The customer can turn geo-replication on again if required, but doing so triggers a one-time bandwidth charge to bootstrap the data from the primary to the secondary location (LRS to GRS). The amount of bandwidth charged for the bootstrap is equal to the amount of data in the storage account at the time of the bootstrap. Once the bootstrap is completed, there is no billing for the normal replication of data from the primary to the secondary location.
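
    As a rough sketch of how those two billing rules play out (the figures fed in below are placeholders, not published prices):

      # Rough sketch of the geo-replication billing rules described above (placeholder numbers).
      def bootstrap_bandwidth_gb(stored_gb):
          """One-time bootstrap charge covers the data held when geo-replication is re-enabled."""
          return stored_gb                        # charged once; ongoing replication afterwards is free

      def lrs_saving_range(grs_monthly_cost):
          """Turning geo-replication off gives roughly a 23% to 34% discount, depending on data size."""
          return grs_monthly_cost * 0.23, grs_monthly_cost * 0.34

      print(bootstrap_bandwidth_gb(500))          # 500 GB of bandwidth billed once at re-enable time
      print(lrs_saving_range(100.0))              # between 23.0 and 34.0 saved per 100.0 spent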

    5.3.4 Geo-Failover
    When a primary location goes down due to a major disaster, the Windows Azure team first tries to restore the primary location. If it is a major disaster and the location cannot be restored easily, the geo-failover process starts.

    In the geo-failover process, the customer is notified about the issue and its impact via the subscription contact information, and the DNS entry is updated to point to the secondary location instead of the primary location. So all the traffic is routed to the secondary location once the DNS is updated. The secondary location is treated as the primary location once the failover process completes successfully.

    Note: There is no code change required in the application as everything will be handled by Windows Azure.

    Once the datacenter affected by the major disaster comes back up, new secondary replicas are created there and the data is replicated again.
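
    The reason no code change is needed is that the application only ever talks to the storage account's DNS name; the failover simply repoints that name. A tiny sketch (the account name is hypothetical):

      # The application only uses the account's DNS name, so geo-failover needs no code change.
      import socket

      ACCOUNT_HOST = "shellstorage.blob.core.windows.net"    # hypothetical account

      def storage_endpoint():
          ip = socket.gethostbyname(ACCOUNT_HOST)            # resolves to whichever location is currently primary
          return f"https://{ACCOUNT_HOST}/", ip

      print(storage_endpoint())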

The other links on Achieving High Availability with Windows Azure Environment:
  1. Achieving High Availability with Windows Azure Environment - Part 1
  2. Achieving High Availability with Windows Azure Environment - Part 2
  3. Achieving High Availability with Windows Azure Environment - Part 3
