Published on January 1st, 2015 | by Warner Chaves2
Tutorial: Using a SQL Server distributed partitioned view to allow transparent querying of multiple Azure SQL databases
Today we’re goingto go over a very interesting scenario that is possible now with Hybrid-IT environments. We’ll do a step by step on how to access data from multiple Azure SQL databases transparently from ONE On-Premise or in the cloud SQL Server. One early warning: this will require SQL Server Enterprise 2012 or higher. This scenario might work with 2008 or 2008 R2 but I have personally not tested it on those versions. If this sounds interesting, read along.
What’s the Setup?
You have data split into multiple Azure SQL databases. Now you want to query this data or use reporting tools but you don’t want your users to have to be writing queries against all the specific Azure SQL databases. You want your users to connect to ONE SQL Server, query this one server and pull data from the multiple Azure dbs in a way that is transparent to them.
For example, you have a Transactions table with data from 2013, 2014 and starting to populate 2015. You want to take advantage of the different Azure Service Tiers (see our beginner’s guide here) so you split each year into 3 databases and assign different tiers to each year. Now you have 3 separate Transactions table on 3 separate databases but your users are used to just using SSRS for reports that can access ALL the data with ONE connection.
Basically, we will make data that is stored in multiple Azure SQL database tables look as if they were one SQL Server view.
How do we accomplish this?
- Once you have split the data into multiple Azure SQL databases, you need to create a CHECK CONSTRAINT on your table that DOESN’T OVERLAP with the other ones. This could be a check constraint on a datetime column, for example, one constraint for 2013, one for 2014 and one for 2015.
- You create Linked Servers to each of your Azure SQL databases.So if we split the data into 2013, 2014, 2015 we would end up with 3 Azure SQL databases and 3 SQL Server Linked Servers.
- You create a distributed partitioned view on the SQL Server that points back to the Azure Linked Servers.
- Instead of querying each Azure database separately, you query the distributed partitioned view and let SQL Server handle the data routing.
A very important note: you *could* create the Linked Servers to Azure with ODBC but that is NOT what you want to do. You want to create them with the SQL Native Client 11.0 or higher!!!!
The reason for this is that using the Native Client will allow SQL Server to extract the metadata from the different Linked Servers and do server elimination. What this means is that SQL Server will be aware of the constraints on the data and will only query the databases that hold the data requested. Using ODBC will have a nasty effect: the queries will always FAN OUT to ALL the Linked Servers. This is a waste of resources and again, not what you want.
Here’s a high level diagram to illustrate it:
What’s the use case?
This specific scenario has the potential for many powerful advantages:
- You can easily get up and running on a solid SQL platform using Azure SQL databases. Pay as you go and for what you need and expand or contract to more or less databases as your needs change.
- You can change to different service tiers that are the best fit for the different data. For example, you could put really old data on S0 service tiers and the latest data as a P1 service database.
- Existing tools that depend on having ONE connection will work just fine by querying the On-Premise SQL Server view. For example, SSRS, SSAS, Excel, etc. The tool is not aware at all that the data is split into multiple Azure SQL databases and sitting in the cloud.
Let’s look at the entire solution in detail now.
Step by Step
Let’s do an end to end example. I will be using a table that would hold data from all of 2014 and we’re going to split it into 3 different databases each one holding 4 months of data.
First, I created 3 different Azure databases called T1, T2, T3.
Ran this on T1:
-- Population of T1 - Q1 2014 CREATE TABLE azurePartitionTest ( entityID int IDENTITY(1,1), recordDate datetime, CONSTRAINT PKAzurePTest PRIMARY KEY(entityID,recordDate) ) GO alter table azurePartitionTest add constraint DateCHK check(recordDate >= '2014-01-01 00:00:00' and recordDate < '2014-05-01 00:00:00') insert into azurePartitionTest select datefromparts(2014,cast(rand()*1000 as int)%4+1,cast(rand()*1000 as int)%27+1) GO 5000
Ran this on T2:
CREATE TABLE azurePartitionTest ( entityID int IDENTITY(1,1), recordDate datetime, CONSTRAINT PKAzurePTest PRIMARY KEY(entityID,recordDate) ) GO alter table azurePartitionTest add constraint DateCHK check(recordDate >= '2014-05-01 00:00:00' and recordDate < '2014-09-01 00:00:00') insert into azurePartitionTest select datefromparts(2014,cast(rand()*1000 as int)%4+5,cast(rand()*1000 as int)%29+1) GO 5000
Ran this on T3:
CREATE TABLE azurePartitionTest ( entityID int IDENTITY(1,1), recordDate datetime, CONSTRAINT PKAzurePTest PRIMARY KEY(entityID,recordDate) ) GO alter table azurePartitionTest add constraint DateCHK check(recordDate >= '2014-09-01 00:00:00' and recordDate < '2015-01-01 00:00:00') insert into azurePartitionTest select datefromparts(2014,cast(rand()*1000 as int)%4+9,cast(rand()*1000 as int)%29+1) GO 5000
At this point we have 3 Azure databases, each one with a table called azurePartitionTest and each table loaded with 5000 records corresponding to the time range from the Constraint.
Next step, create the Linked Servers on the SQL Server. You need to make sure you do these:
- Use the SQL Native Client. Remember my comments above.
- You have added the IP of your SQL Server on the firewall rules for your Azure SQL database server. Otherwise you won’t be able to connect at all.
- Use the security option to pass through username and password information. You need to pre-create this user on your Azure SQL database so that it has proper access to read the data.
In this case, I created 3 Linked Servers: AzureT1, AzureT2, AzureT3. One for each database.
Once the Linked Servers have been created, you can create the distributed partitioned view:
CREATE VIEW azureDistributedView AS SELECT entityID, recordDate from AZURET1.t1.dbo.azurePartitionTest union all SELECT entityID, recordDate from AZURET2.t2.dbo.azurePartitionTest union all SELECT entityID, recordDate from AZURET3.t3.dbo.azurePartitionTest
At this point we can run queries like the following:
select * from azureDistributedView where recordDate >= '2014-06-01 00:00:00' and recordDate < '2014-08-01 00:00:00'
SQL Server will read the metadata from the Linked Servers and in this case will send the query only to database T2.
Your users don’t have to know where in the cloud the data for this particular date range is sitting, SQL Server knows because of the Linked Servers and the non-overlapping CONSTRAINTS we created on the tables.
If you expand the date range to cover more dates, SQL will send the query only to the databases it needs to.
Here’s an example of an execution plan, notice how the date range in the query applies to T1 and T2 and SQL Server generates two remote queries as expected:
Here are the disadvantages from this approach:
- Execution plans generated from distributed partitioned views are One-Threaded. No parallelism will happen and queries sent to multiple databases will NOT happen in parallel.
- You have to create a VIEW per TABLE that you want to apply this technique to. This can be somewhat automated with some scripting though (a topic for another blog post!).
- Any changes in the Azure SQL database schema need to be reflected back to the SQL Server view.
- The user needs to remember to specify the date range predicate on the WHERE clause to get proper server elimination.
I hope you can see how powerful these Hybrid-IT scenarios can be. I outlined both the pros and cons of this particular technique so I advise readers to set it up on a development environment first. Get comfortable with the solution and it’s caveats and if the results are positive then roll it out to Production!