- February 19, 2015
- Posted by: Warner Chaves
- Category: Database Administration, SQL Server Performance Tuning
I’m surprised by how little of SQL Server’s 2014 marketing went into all the small but very meaningful improvements made to the core database engine. Of course we all heard about In-Memory OLTP and Buffer Pool Extensions but for example, not a lot of people are familiar with all the improvements done for partitioned tables.
These are the three major improvements for partitioned tables:
1. Individual partitions can now be rebuilt ONLINE
This was a big let down in previous versions of SQL Server. Oh you have a nice partitioned table and only the most frequent partition is getting active UPDATES and INSERTS and now you need to rebuild it. And what’s that? Yeah, you could only do it OFFLINE or you had to rebuild the ENTIRE TABLE. Total bummer.
Finally in SQL 2014 we have access to this feature in the way that we’ve been waiting on for several years. You can pick a specific partition, set it to REBUILD and set it to ONLINE=ON. The rest of the table is unaffected and the partition itself is still accessible thanks to the ONLINE functionality.
When should you use it?
Any time that your fragmentation levels have gotten too high on a particular partition. No need to do it OFFLINE anymore, no need to rebuild the entire table ONLINE (which nobody was doing anyway because it’s such an expensive operation). Just do it ONLINE from now on.
Full details in the syntax for ALTER INDEX.
ALTER INDEX IX_SalesHistoryClustered
REBUILD PARTITION = 10
WITH (ONLINE = ON)
2. Partitioned tables support incremental statistics
So you have a large partitioned table, you notice that your stats are not covering a particular range and you want to update them. In versions previous to 2014 you had two options:
a) You bite the bullet and update the stats on the entire table.
b) You come up with some filtered stats strategy where you create filtered stats to cover ranges that align with your partitions.
Neither one of those approaches is ideal. The first one is a waste of server resources and the second one requires spending time on scripting and ongoing management.
Now on SQL Server 2014 we have a new approach that solves this problem: incremental statistics. Basically, a statistic object that spans the entire partitioned table but that can merge new information from updating only a specific partition in the table.
When should you use it?
If you were not using filtered stats as a workaround then I recommend you switch to incremental statistics as soon as you upgrade to SQL Server 2014. If you already invested time on getting the filtered stats approach working and it’s working fine for you then I would be more hesitant and do some testing to decide which way you want to go. The reason for this is that filtered stats can potentially be providing more accurate estimations to the optimizer by producing multiple histograms compared to switching to one histogram for the entire table.
First option is to set it at the database level and then every time the engine has a chance to create an incremental stat it will do so.
ALTER DATABASE [ErpDB] SET AUTO_CREATE_STATISTICS ON (INCREMENTAL=ON);
Second option is to create them manually:
CREATE STATISTICS salesVendorStat ON dbo.SalesHistory(Vendor)
WITH FULLSCAN, INCREMENTAL = ON;
Either way they get created, after they are there you now have the option of updating the statistics only on specific partitions:
UPDATE STATISTICS dbo.SalesHistory(salesVendorStat) WITH RESAMPLE ON PARTITIONS(9,10)
Note that when using UPDATE STATISTICS with incremental stats, the RESAMPLE option is required if you want to specify a particular partition. This is because the stats object can’t be merged if the stats were calculated with a different sampling rate.
3. Priorities on index operations make maintenance easier
The last feature we’re going to highlight is the addition of “low priority waits” when acquiring locks for Index maintenance and for Partition switching. The “Running SAP on SQL Server” team covered this new feature in amazing detail on this blog post. If you’re interested in the technical history and internals of how this feature came to be then please head over that great article. In summary, the team wanted to provide better control to DBAs to decide who had the “right of way” to acquiring locks between maintenance routines and the normal database workload.
This new feature now gives us the option of deciding how long we will wait to acquire the necessary schema locks and what behavior to take after the time has elapsed: do we kill the blockers or we kill our own process?
When should you use it?
If you’re doing Online Index rebuilds or Partition Switching and the table is always busy serving application requests you definitely want to consider this. The actual settings of how long to wait and which process to kill definitely will depend on the environment conditions.
On some environments you can afford to wait a long time before maintenance starts where in others you will need to get it started by a specific time to avoid going for too long. Some environments will prioritize getting the maintenance done over lower priority application requests, others will always prioritize client requests over scheduled maintenance. You as a DBA need to be the one to make this call with your business owners.
ALTER index [IX_SalesHistoryClustered]
REBUILD WITH (ONLINE = ON
(WAIT_AT_LOW_PRIORITY ( MAX_DURATION = 60 MINUTES, ABORT_AFTER_WAIT = SELF) ))
Stuff I’d still like to see
This is my partitioning related wishlist for features that SQL Server could still implement:
- The engine would manage a histogram per partition instead of having just one 200 step histogram for an entire partitioned table.
- If I wanted to have an aligned PK index, it would not require me to change the index definition to add the partitioning key column to it. This is a pain when trying to convert existing tables to partitioned tables.
- The optimizer could have some dynamic rules where you wouldn’t need to specify the partitioning key to get partition elimination but it would do it based on actual runtime values (for example from the range of a table joining to the partitioned table).
- Instead of just range partitioning, provide a way to specify a function to apply to a column and get the partition from there (newsflash: both MySQL and Oracle already have this!).
Maybe we’ll see some of these in a future release of SQL Server… I can dream right?