Thursday, May 17, 2012

Autoscaling Application Blocks


The Autoscaling Application Block can automatically scale a Windows Azure application based on rules defined specifically for that application. The block supports two autoscaling mechanisms:

1)      Instance autoscaling, where the block changes the number of role instances based on constraint and reactive rules.
2)      Throttling, where the application modifies its own behavior to change its resource utilization based on a set of reactive rules, for example by switching off non-essential features or gracefully degrading its UI.

So, there are two types of rules:

1)      Constraint rules: Constraint rules set upper and lower bounds on the number of instances. For example, if between 6:00 and 8:00 in the evening you need a minimum of 3 instances and a maximum of 7 instances, use a constraint rule.
2)      Reactive rules: Reactive rules enable the number of role instances to change in response to unpredictable changes in demand. For example, if the workload increases, increase the number of role instances by 1. Reactive rules can use a variety of data points, such as performance counters or Windows Azure queue lengths, to monitor and control an application's workload. A reactive rule changes the number of role instances only if a constraint rule applies at the same time; it is easy to create a default constraint rule that always applies.

When defining the target of an autoscaling rule, you can specify a scale group instead of an individual role. A scale group enables you to define autoscaling rules that target multiple roles, and it can contain any number of roles.

The following is an example rules XML file. It contains two constraint rules: one is the always-active default, while the other becomes active daily at 6:00 for 2 hours (peak time) and overrides the default rule. There are also two reactive rules: one increases the instance count by 1 if the average CPU usage over the last 30 minutes is above 70%, while the other decreases the instance count by 1 if the average CPU usage over the last 30 minutes is below 30%.

<rules xmlns="http://schemas.microsoft.com/practices/2011/entlib/autoscaling/rules" enabled="true">
<constraintRules>
       <rule name="Default" description="Always active" enabled="true" rank="1">
              <actions>
                     <range min="2" max="5" target="RoleA"/>
              </actions>
       </rule>

       <rule name="Peak" description="Active at peak times" enabled="true" rank="100">
              <actions>
                     <range min="3" max="7" target="RoleA"/>
              </actions>
              <timetable startTime="06:00:00" duration="02:00:00">
                     <daily/>
              </timetable>
       </rule>
</constraintRules>

<reactiveRules>
       <rule name="ScaleUp" description="Increases instance count" enabled="true" rank="10">
              <when>
                     <greater operand="Avg_CPU_RoleA" than="70"/>
              </when>
              <actions>
                     <scale target="RoleA" by="1"/>
              </actions>
       </rule>
       <rule name="ScaleDown" description="Decreases instance count" enabled="true" rank="10">
              <when>
                     <less operand="Avg_CPU_RoleA" than="30"/>
              </when>
              <actions>
                     <scale target="RoleA" by="-1"/>
              </actions>
       </rule>
</reactiveRules>

<operands>
       <performanceCounter alias="Avg_CPU_RoleA" performanceCounterName="\Processor(_Total)\% Processor Time" aggregate="Average" source="RoleA" timespan="00:30:00"/>
</operands>
</rules>

Conflicting Rules
1)      Conflicting Constraint and Reactive rules: A constraint rule always overrides a reactive rule.
2)      Conflicting constraint rules: If two or more constraint rules include timetables that specify they are active at the same time, then
a)      The rule with the highest rank is given priority.
b)      If two or more constraint rules of the same rank conflict, the block performs the action from the first constraint rule it finds.
3)      Conflicting reactive rules: If two or more reactive rules result in conflicting changes to the number of role instances, then
a)      The rule with the highest rank is given priority.
b)      If two or more reactive rules of the same rank conflict and any rule suggests an increase in the number of instances, the largest increase is used.
c)      If two or more reactive rules of the same rank conflict and the rules only suggest decreases in the number of instances, the smallest decrease is used.
For example, if one rule suggests increasing the number of instances by one, another by two, and a third suggests decreasing it by one, the number will increase by two. As another example, if one rule suggests decreasing the number of instances by one and another by three, the number of instances will decrease by one.
4)      Conflicting actions on a scale group: Multiple rules could suggest different scaling actions on the same target at the same time, for example because the same role is a member of different scale groups. In that case, the block uses the same logic as for conflicting reactive rules.
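The resolution logic described above can be sketched in a few lines. This is a minimal Python sketch of the behavior, not the block's actual implementation; the function names and the (rank, delta) rule representation are illustrative assumptions.

```python
def resolve_reactive(proposals):
    """Resolve conflicting reactive-rule proposals.

    proposals: list of (rank, delta) pairs, where delta is the suggested
    change in instance count (+n to scale up, -n to scale down).
    """
    if not proposals:
        return 0
    top_rank = max(rank for rank, _ in proposals)   # highest rank wins outright
    deltas = [d for rank, d in proposals if rank == top_rank]
    increases = [d for d in deltas if d > 0]
    if increases:                        # any increase suggested: take the largest
        return max(increases)
    decreases = [d for d in deltas if d < 0]
    if decreases:                        # only decreases: take the smallest one
        return max(decreases)            # e.g. max(-1, -3) == -1
    return 0

def clamp_to_constraint(current, delta, minimum, maximum):
    """A constraint rule always overrides a reactive rule."""
    return min(max(current + delta, minimum), maximum)

# The worked examples from the text:
assert resolve_reactive([(10, 1), (10, 2), (10, -1)]) == 2
assert resolve_reactive([(10, -1), (10, -3)]) == -1
```

Note how clamping to the active constraint rule happens last, which is exactly why a reactive rule can never push the instance count outside the constraint's min/max range.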

Tuesday, May 15, 2012

Queue based messaging in Windows Azure

A typical messaging solution exchanges data between its distributed components using message queues: publishers publish messages into queues, and subscribers receive them. A subscriber can be implemented as a single- or multi-threaded process, either continuously running or initiated on demand.
At a high level, there are two primary queuing mechanisms that enable a queue listener (receiver) to receive messages stored in a queue:

Polling or pull-based model: A listener monitors a queue by checking it at regular intervals. The listener is typically part of a worker role instance, and its main processing logic is a loop in which messages are dequeued and dispatched for processing. The queue is polled until the listener is notified to exit the loop. Note that the Windows Azure pricing model counts storage transactions based on requests performed against the queue, regardless of whether the queue is empty.

Triggering or push-based model: A listener subscribes to an event, triggered either by the publisher or by the queue service manager, whenever a message arrives in a queue. The listener then dispatches the message for processing, so it does not have to poll the queue to determine whether any new work is available. A notification can be pushed to the queue listeners for every new message, only when the first message arrives in an empty queue, or only when the queue depth reaches a certain level. When using Windows Azure Service Bus, the volume of messaging entities (queues or topics) should be taken into account.
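The difference from polling can be illustrated with a small in-process sketch. Here Python's `queue.Queue` stands in for the broker: the listener blocks until a message is handed to it rather than issuing poll requests on a timer. All names here are illustrative, not part of any Azure API.

```python
import queue
import threading

work = queue.Queue()
processed = []

def listener(stop):
    """Waits to be woken by an arriving message instead of polling on a schedule."""
    while not stop.is_set():
        try:
            message = work.get(timeout=0.1)  # blocks until a message is "pushed"
        except queue.Empty:
            continue                         # timeout only used to check for shutdown
        processed.append(message)
        work.task_done()

stop = threading.Event()
t = threading.Thread(target=listener, args=(stop,))
t.start()
work.put("order-42")   # the publisher pushes a message; no poll requests were issued
work.join()            # returns once the listener has dispatched the message
stop.set()
t.join()
```

In a real Windows Azure deployment the blocking `get` would be replaced by the notification mechanism the text describes (an event raised by the publisher or the queue service manager).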

Best Practices for Optimizing Transaction Costs
In a queue-based messaging solution, the volume of storage transactions can be reduced using a combination of the following methods:
  1. Group related messages into a single larger batch, compress it, and store the compressed image in blob storage, keeping a reference to the blob in the queue.
  2. Batch multiple messages together in a single storage transaction. The GetMessages method in the Queue Service API dequeues the specified number of messages in a single transaction.
  3. While polling, avoid aggressive polling intervals and implement a back-off delay that increases the time between polling requests while a queue remains empty.
  4. Reduce the number of queue listeners: when using a pull-based model, use only one queue listener per role instance when a queue is empty. To further reduce the number of queue listeners per role instance to zero, use a notification mechanism to instantiate queue listeners when the queue receives work items.
  5. If queues remain empty most of the time, automatically scale down the number of role instances, and continue to monitor relevant system metrics to determine if and when the application should scale the number of role instances back up to handle an increasing workload.
  6. Use a combination of polling and push-based notifications, enabling listeners to subscribe to a notification event (trigger) that is raised under certain conditions to indicate that new work has been put on the queue.
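Practices 2 and 3 can be combined into a single dequeue loop. The sketch below is a hedged illustration: the `get_messages` callable stands in for a batched dequeue call such as GetMessages and is an assumption of this example, not a real SDK binding.

```python
import time

def poll_with_backoff(get_messages, handle, *, batch_size=32,
                      min_delay=0.5, max_delay=30.0, iterations=None):
    """Poll a queue, backing off exponentially while it stays empty.

    get_messages(n) should return up to n messages in one transaction
    (mirroring a batched dequeue); handle(msg) processes one message.
    iterations limits the loop for testing; None means run forever.
    """
    delay = min_delay
    i = 0
    while iterations is None or i < iterations:
        i += 1
        messages = get_messages(batch_size)    # one storage transaction per poll
        if messages:
            for msg in messages:
                handle(msg)
            delay = min_delay                  # reset the delay once work appears
        else:
            time.sleep(delay)                  # queue empty: wait before retrying
            delay = min(delay * 2, max_delay)  # exponential back-off, capped
```

Because every poll is a billable storage transaction whether or not the queue is empty, widening the interval while the queue stays empty directly reduces transaction cost.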

Friday, May 11, 2012

Windows Azure Service Bus


Windows Azure Service Bus provides infrastructure for communication, event distribution, naming, and service publishing.

Azure Service bus features:
  1. It provides connectivity options for WCF and other service endpoints (including REST endpoints) running behind firewalls and NAT routers, bound to frequently changing or dynamically assigned IP addresses.
  2. It enables secure inbound communication from devices outside the firewall.
  3. It provides relayed messaging capabilities. The relay service supports direct one-way messaging, request/response messaging, full duplex messaging, and peer-to-peer messaging.
  4. It provides brokered (asynchronous) messaging capabilities. It supports publish/subscribe and temporal decoupling, i.e., the sender and receiver do not have to be online at the same time. The service reliably stores messages until the receiving party is ready to receive them. The core components of brokered messaging are queues, topics, and subscriptions.
  5. It provides a global namespace system that is location-independent, i.e., the name of a service in the Service Bus gives no information about the final destination of the communication.
  6. It provides a service registry for publishing and discovering service endpoint references in a service namespace.
  7. It builds and hosts service endpoints that support:
a)      Exposing a web service to remote users: expose and secure a local web service in the cloud without managing any firewall or NAT settings.
b)     Eventing behavior: listen for notifications on any device, anywhere in the world.
c)      Tunneling between any two endpoints to enable bidirectional streams.
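The temporal decoupling and publish/subscribe semantics of brokered messaging (feature 4) can be illustrated with a minimal in-memory stand-in for a topic with subscriptions. This is not the real Service Bus SDK; the class and method names are illustrative assumptions.

```python
from collections import deque

class Topic:
    """In-memory sketch of publish/subscribe brokering: the broker keeps a
    copy of each message per subscription until that subscriber reads it,
    so senders and receivers never have to be online at the same time."""

    def __init__(self):
        self._subs = {}

    def subscribe(self, name):
        self._subs[name] = deque()         # each subscription has its own store

    def send(self, message):
        for q in self._subs.values():      # fan out one copy per subscription
            q.append(message)

    def receive(self, name):
        q = self._subs[name]
        return q.popleft() if q else None  # None signals an empty subscription

topic = Topic()
topic.subscribe("billing")
topic.subscribe("audit")
topic.send("order-1")                          # publisher sends, then goes offline
assert topic.receive("billing") == "order-1"   # each subscriber gets its own copy
assert topic.receive("audit") == "order-1"
assert topic.receive("audit") is None          # stored until read, then gone
```

A Service Bus queue is the degenerate case of this sketch with exactly one subscription: point-to-point delivery with the same store-until-received guarantee.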