Reading various blog posts, forums, and Stack Overflow questions, I still see quite a bit of confusion around the different data storage options available in Windows Azure. This is probably due mostly to Microsoft changing its strategy, architecture, and naming conventions a few times.
For example, SQL Azure used to be called SQL Server Data Services (SSDS), was non-relational, and used an Entity-Attribute-Value (EAV) schema – giving it a major overlap in functionality with Azure Table Storage. This provided a pretty poor migration path to the cloud for existing applications using relational SQL Server storage. It was also confusing for developers, since there were two non-relational storage options. So Microsoft revamped SSDS and turned it into the now fully relational hosted SQL Server offering, SQL Azure.
Below I’ve outlined some key bullet points around the various Azure storage options, as well as when each option should be used.
Azure Blob Storage
- You can think of blob storage as the file system of Windows Azure.
- Blobs are stored in blob containers. You can think of a blob container as a folder.
- You can give a blob a primary key and some key/value metadata when you upload the blob to storage.
- Use the StorageClient class included with the Windows Azure SDK Samples to interact with blob storage from .NET (rather than using the REST API directly). This greatly simplifies things and adds some additional functionality like retries on failed calls.
- Use blob storage to store anything that you would normally store as a file or database blob.
- Querying is very limited – you can only pull a list of blobs by their container or by their primary key.
- There are two steps to retrieving a blob – first you pull the metadata, and then you make a separate call to pull down the actual byte content. For example, retrieving a list of blobs by their container retrieves a list of blob metadata. This prevents you from unnecessarily pulling down a ton of data.
- Azure Blob Storage can be accessed from within the cloud or from outside of the cloud (e.g. a desktop or intranet application). Your blob storage API URL is publicly accessible, but it requires a secret key for access unless you’re retrieving public blobs.
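To make the two-step retrieval concrete, here is a minimal in-memory sketch in Python. It is not the Azure API or the StorageClient library; `InMemoryBlobContainer`, `BlobInfo`, and their methods are hypothetical names I made up purely to illustrate the idea that listing a container returns metadata only, and the bytes come down in a separate call.

```python
from dataclasses import dataclass


@dataclass
class BlobInfo:
    """What a listing returns: name, size, and key/value metadata (no content)."""
    name: str
    size: int
    metadata: dict


class InMemoryBlobContainer:
    """Hypothetical stand-in for a blob container; not a real Azure class."""

    def __init__(self, name):
        self.name = name
        self._blobs = {}  # blob name -> (metadata dict, content bytes)

    def upload(self, blob_name, content, metadata=None):
        # The blob name acts as the key; metadata is optional key/value pairs.
        self._blobs[blob_name] = (dict(metadata or {}), bytes(content))

    def list_blobs(self):
        # Step 1: listing returns metadata only, never the byte content.
        return [BlobInfo(name, len(content), dict(meta))
                for name, (meta, content) in sorted(self._blobs.items())]

    def download(self, blob_name):
        # Step 2: a separate call fetches the bytes for one blob by its key.
        return self._blobs[blob_name][1]


container = InMemoryBlobContainer("photos")
container.upload("cat.jpg", b"\xff\xd8fake-jpeg", {"owner": "alice"})
container.upload("notes.txt", b"hello", {"owner": "bob"})

# Inspect the cheap metadata first, then download only what is needed.
listing = container.list_blobs()
wanted = [info.name for info in listing if info.metadata.get("owner") == "bob"]
content = container.download(wanted[0])
```

The point of the split is that the filtering happens on the small metadata records, so only the one blob you actually want is transferred.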
Azure Table Storage
- Use Azure Table storage to store simple structured data and objects.
- You can think of Azure Table Storage as a bunch of standalone object tables with no relation to each other. There are no real foreign keys except what you implement on your own.
- Azure table storage is implemented as an Entity-Attribute-Value (EAV) database in the backend, but this is mostly abstracted via the StorageClient class included with the Windows Azure SDK samples.
- You can store more than one type of object with varying properties in a table, but this is not recommended for the sake of keeping things simple. The flexible schema does at least allow your objects to evolve over time as you add/remove fields.
- Every object that you store in Azure Table Storage must have a PartitionKey and a RowKey.
- The combination of the Partition Key and Row Key is the Primary Key.
- The Partition Key and Row Key are the only indexed values on the objects.
- The Partition Key can be used to logically partition data within your tables if required – though you can hard code the Partition Key if you don’t need it.
- The Row Key should be a unique key within the partition. You could use a Guid, for example.
- You can query Azure Table Storage via LINQ, though not all operations (e.g. scalars) are supported. Expect queries that filter on non-indexed properties to be slow, as they would be against any EAV database. Make use of the indexed partition and row keys wherever possible for faster querying.
- Azure Table Storage can also be accessed from within the cloud or from outside the cloud (e.g. a desktop or intranet application). Your table storage API URL is publicly accessible, but it requires a secret key for access.
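The key rules above can be sketched with a small in-memory model in Python. This is only an illustration under my own assumptions, not the Table Storage API: `InMemoryTable` and its methods are hypothetical, and the entity properties (`Name`, `City`) and partition names are invented for the example. It shows the PartitionKey/RowKey pair acting as the primary key, entities with differing properties living in one table, and why a lookup by key is cheaper than a filter on any other property.

```python
import uuid


class InMemoryTable:
    """Hypothetical stand-in for a single table; not a real Azure class."""

    def __init__(self):
        self._rows = {}  # (PartitionKey, RowKey) -> entity dict

    def insert(self, entity):
        # Every entity must carry both keys; together they form the primary key.
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._rows:
            raise KeyError(f"duplicate primary key {key}")
        self._rows[key] = dict(entity)

    def get(self, partition_key, row_key):
        # Point lookup on the only indexed values: a direct dictionary hit.
        return self._rows[(partition_key, row_key)]

    def query(self, predicate):
        # A filter on any other property has to examine every entity.
        return [e for e in self._rows.values() if predicate(e)]


table = InMemoryTable()
row_key = str(uuid.uuid4())  # a Guid-style RowKey, unique within the partition
table.insert({"PartitionKey": "customers-us", "RowKey": row_key,
              "Name": "Ada", "City": "Seattle"})
# Entities in the same table may have different properties (no City here).
table.insert({"PartitionKey": "customers-uk", "RowKey": str(uuid.uuid4()),
              "Name": "Alan"})

by_key = table.get("customers-us", row_key)                     # fast path
in_seattle = table.query(lambda e: e.get("City") == "Seattle")  # full scan
```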
Azure Queue Storage
- The main use of Azure Queue Storage is for communication between your Azure web and worker roles or between worker roles, i.e. for picking up and dropping off data to be processed.
- You can pass simple string messages through a queue.
- It works much like other message queueing frameworks.
- You can also use the StorageClient class in the Windows Azure SDK samples to abstract the HTTP REST API.
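The web role / worker role hand-off can be sketched with Python's standard `queue` module standing in for an Azure queue. Nothing here talks to Azure; the function names, the `process-order:` message format, and the `stop` sentinel are my own assumptions for illustration. The shape is the same, though: one side drops off simple string messages and the other picks them up to process.

```python
import queue
import threading

work_queue = queue.Queue()  # local stand-in for an Azure queue
results = []                # what the "worker role" has processed so far


def web_role_enqueue(order_ids):
    # The web role drops off simple string messages describing the work.
    for order_id in order_ids:
        work_queue.put(f"process-order:{order_id}")


def worker_role_loop():
    # The worker role picks messages up one at a time and processes them.
    while True:
        message = work_queue.get()
        if message == "stop":  # hypothetical sentinel to end the sketch
            work_queue.task_done()
            break
        _, order_id = message.split(":", 1)
        results.append(int(order_id))
        work_queue.task_done()


worker = threading.Thread(target=worker_role_loop)
worker.start()
web_role_enqueue([101, 102, 103])
work_queue.put("stop")
worker.join()
```

Because messages are plain strings, anything larger than an identifier is usually better stored elsewhere (a blob or a table entity) with only its key passed through the queue.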
SQL Azure
- SQL Azure is very similar to an on-premises SQL Server database.
- You can currently create a 1 GB database (currently $9.99/mo) or a 10 GB database (currently $99/mo). This is much more expensive than the Windows Azure Storage options (blob/table/queue).
- Use SQL Azure for storing complex relational data – i.e. any time you would normally store data in a database.
- In most cases, you can simply change your SQL Server connection string to point to your SQL Azure database and your application will work fine.
- Your SQL Azure instance is publicly accessible via a SQL TCP connection over the internet. You can retrieve the connection string for your database from the SQL Azure dashboard.
- Though the database is publicly accessible, you can create basic inbound firewall rules for which hosts are allowed to access your SQL Azure database. For example, you can set it up such that your database is only accessible from within your Windows Azure roles, or such that it is only accessible from IPs originating from your organization.
- SQL Azure has many of the same capabilities as on-premises SQL Server, e.g. full T-SQL querying capabilities, indexing, stored procedures, triggers, views, etc.
- You can connect to SQL Azure via all the standard methods, e.g. ODBC, ADO.NET, PHP Drivers, LLBL, NHibernate, Entity Framework, etc.
- Some features are not supported including SQL Profiler, backup, replication, filegroups, manipulation of physical file resources, etc.
- Many of the unsupported features are taken care of for you by Azure, including backup, replication, high availability, and resource governance.
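The inbound firewall rules mentioned above boil down to checking a client address against a list of allowed IP ranges. Here is a rough Python sketch of that check using the standard `ipaddress` module. It is not how SQL Azure implements it; the rule names and addresses (taken from the reserved documentation ranges) and the `is_allowed` helper are assumptions made up for the example.

```python
import ipaddress

# Hypothetical rules: (name, first allowed IP, last allowed IP), inclusive.
FIREWALL_RULES = [
    ("office", "203.0.113.0", "203.0.113.255"),
    ("build-server", "198.51.100.7", "198.51.100.7"),
]


def is_allowed(client_ip, rules=FIREWALL_RULES):
    """Return True if client_ip falls inside any allowed range."""
    ip = ipaddress.ip_address(client_ip)
    return any(
        ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end)
        for _name, start, end in rules
    )


office_ok = is_allowed("203.0.113.42")     # inside the office range
build_ok = is_allowed("198.51.100.7")      # the single build-server address
neighbour_ok = is_allowed("198.51.100.8")  # one past the build-server rule
stranger_ok = is_allowed("192.0.2.1")      # matches no rule
```

A single-address rule is just a range whose start and end are the same, which is how you would lock the database down to one known host.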