Published on January 7th, 2015 | by Warner Chaves
Top 10 Must-Do Items for your SQL Server Very Large Database
I’ve received some great questions and comments after delivering my material on “Best Practices for your SQL Server Very Large Databases”, both when presenting to live audiences and posting it here on SQLTurbo. A common thread along those comments is people asking me “OK, I understand all these are very important but I don’t have time/resources/budget to get all these done, so where do I start?”.
To answer this particular question I created this Top 10 of Must-Do Items for your SQL Server Very Large Database. I’m going to be outlining the practices that in my experience have given my clients the biggest benefits when working with their Very Large Databases. I will also explain why this particular item is important and make any notes about version or usage as necessary.
I’m not going to rank them in any order. The criticality of each one of these depends on your particular environment so I will leave that to you as a DBA to decide which ones should have a higher priority. Without further ado, let’s get started!
1. Use Multiple Filegroups
A lot of times a VLDB will start life like any other small database and keep growing and growing over time. Suddenly the DBA is left with multiple TBs, all on a single PRIMARY filegroup. On the other hand, if you already know that your database is expected to grow to multiple TBs then you should design it from the ground up to have multiple filegroups.
I don’t personally think there’s any advantage in doing something like an ‘Indexes’ filegroup. I think filegroups should be created based on these criteria:
a) Is it read-only data? Then you can mark the filegroup as read-only and back it up once and keep that one backup.
b) Does the data in the filegroup support a specific set of application functionality? In this case, you can think of the scenario of doing a piecemeal restore. For example, if your application has a set of functionality that is absolutely critical you could put it on a separate filegroup that you will restore right after the PRIMARY one. Or for example, restoring the filegroup with the oldest partitions in a table last.
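As a minimal sketch of the filegroup approach (the database, filegroup and file names here are made up for illustration), creating a separate filegroup and marking it read-only once its data is final looks like this:

```sql
-- Hypothetical names: SalesVLDB database, Archive2015 filegroup
ALTER DATABASE SalesVLDB ADD FILEGROUP Archive2015;

ALTER DATABASE SalesVLDB ADD FILE
    (NAME = Archive2015_1,
     FILENAME = 'E:\Data\Archive2015_1.ndf',
     SIZE = 100GB)
TO FILEGROUP Archive2015;

-- Once the data in it will no longer change, mark it read-only.
-- Back it up once and keep that one backup.
ALTER DATABASE SalesVLDB MODIFY FILEGROUP Archive2015 READ_ONLY;
```

After this, a filegroup backup of Archive2015 only ever needs to be taken once, and maintenance routines can skip it entirely.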
2. Perform Smart Maintenance Routines
What I mean by “Smart” is routines that look at the current state of the database to decide what work they will do. The classic example is checking the fragmentation level of a table to decide between a defrag and a rebuild. This is in contrast to the SQL Server Maintenance Plans, which will simply run on every table regardless of whether it’s fragmented or not.
A similar approach can be taken with updating statistics as well. Some tables will have a higher amount of modifications than others, and you should target the ones that have gone through many changes. Suffice it to say, using read-only filegroups also helps in this regard, since you simply don’t need to run any maintenance on data that is not changing at all.
As the database grows you might need to customize these routines to fit your particular needs but to get started you can use Ola Hallengren’s great scripts.
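As a sketch of the “check first, then act” idea, this query uses the `sys.dm_db_index_physical_stats` DMV to suggest an action per index. The 5% and 30% thresholds are the commonly used rules of thumb, not hard rules; tune them for your environment:

```sql
-- Suggest REORGANIZE vs REBUILD based on current fragmentation
SELECT OBJECT_NAME(ips.object_id)          AS table_name,
       i.name                              AS index_name,
       ips.avg_fragmentation_in_percent,
       CASE WHEN ips.avg_fragmentation_in_percent >= 30 THEN 'REBUILD'
            WHEN ips.avg_fragmentation_in_percent >= 5  THEN 'REORGANIZE'
            ELSE 'NONE' END                AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.page_count > 1000;  -- skip tiny indexes where fragmentation is noise
```

A smart routine would iterate this result set and issue the corresponding `ALTER INDEX ... REORGANIZE` or `REBUILD`, which is essentially what Ola Hallengren’s scripts do for you.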
3. Use Backup Compression or similar technology
I’m surprised by how many people run SQL Server 2008 R2+ and still don’t use backup compression. This of course is a product of never properly configuring an instance after installation, since the feature has been available even on Standard edition since the 2008 R2 release. When dealing with VLDBs it’s critical to use compression to keep backup sizes in check. Extra points if your strategy also uses filegroup backups.
Of course you don’t have to stick to the native backups. There are many 3rd party tools that offer some great features as well on top of compression like multi-threading, configurable CPU consumption, object-level restore, etc. Also, most modern storage will have some type of deduplication tech to cut down on storage cost as well.
4. Use Table Partitioning
This one has been available since SQL 2005 but is still an Enterprise-only feature (or a Premium feature if we’re talking about Azure SQL Database V12). Why is table partitioning important for a VLDB? Let’s see:
a) You can have a table that spans multiple filegroups. This gives you all kinds of flexibility for storage, performance and recovery.
b) You can load data and extract data as a metadata-only operation with SWITCH commands.
c) If you’re running SQL Server 2014, you can update stats on particular partitions instead of one massive table.
d) You can do index maintenance at a partition level instead of the entire table. Bonus points if you’re on SQL Server 2014 so you can even do online-rebuilds per partition.
e) It makes it easier to use Columnstore indexes with SQL Server 2012, because you can manage the columnstore at the partition level.
f) You can get better performance for queries that specify predicates that allow SQL Server to do partition elimination.
g) You can specify different compression levels on a per partition basis.
In general the toughest thing about partitioning is dealing with a table that wasn’t designed this way from the beginning. In those cases, do the work in your development environment to script out an entire solution that migrates the data over to a partitioned table, and then work little by little to apply it on your production server.
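As a sketch of the moving parts (all object names here are hypothetical), a range-partitioned table needs a partition function, a partition scheme and a table created on that scheme, with the partitioning column part of any clustered key:

```sql
-- Monthly partitioning by order date (RANGE RIGHT: each boundary
-- value belongs to the partition on its right)
CREATE PARTITION FUNCTION pfOrderDate (datetime2)
AS RANGE RIGHT FOR VALUES ('2014-11-01', '2014-12-01', '2015-01-01');

-- Map every partition to PRIMARY here for simplicity; in a real VLDB
-- you would map older partitions to their own filegroups
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.Orders (
    OrderId   bigint    NOT NULL,
    OrderDate datetime2 NOT NULL,
    Amount    money     NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId, OrderDate)
) ON psOrderDate (OrderDate);

-- Metadata-only load (point b above): switch a fully staged month in
-- ALTER TABLE dbo.OrdersStaging SWITCH TO dbo.Orders PARTITION 2;
```

The `SWITCH` at the end is what makes loads and extracts near-instant: it swaps partition metadata instead of moving rows, provided the staging table matches the target’s schema, constraints and filegroup.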
5. Use Data Compression
As your VLDB grows you’re going to want to keep its size in check, and you’re going to have a lot of accumulated data that is not queried often. For example, clients in the finance industry usually have to comply with all kinds of regulations that might force them to keep several years’ worth of records. It doesn’t make sense for the data from 7 years ago to consume as much storage as the data from last week.
Starting with SQL Server 2008 Enterprise, we’ve had different compression options on the menu. Row compression is recommended for data that is still very active, and I recommend page compression for all the data that has to stay available but is not frequently queried. Depending on the type of queries you run, compression might even perform better because of the decrease in IO.
Like I mentioned above, this pairs very powerfully with partitioning, since you’ll be able to do something like no compression for the 3 newest partitions, row compression for the next 3 and page compression for the rest.
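That tiering scheme can be applied one partition at a time, without touching the rest of the table. A sketch, reusing a hypothetical partitioned `dbo.Orders` table:

```sql
-- Oldest partition: rarely queried, maximize space savings
ALTER TABLE dbo.Orders REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = PAGE);

-- Warm data: cheaper row compression
ALTER TABLE dbo.Orders REBUILD PARTITION = 2
    WITH (DATA_COMPRESSION = ROW);

-- Hottest, newest partition: leave uncompressed
ALTER TABLE dbo.Orders REBUILD PARTITION = 3
    WITH (DATA_COMPRESSION = NONE);
```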
If you’re on SQL 2014 Enterprise then you can also take advantage of the clustered Columnstore indexes and get great compression gains as well. The non-clustered columnstore (2012) is also compressed but since it’s non-clustered it doesn’t give the gains of the clustered version in terms of reducing data footprint.
6. Implement a High Availability Solution
Duh, this should be on the Must-Do list for ANY modern 24×7 production system. And yet I still frequently see systems that rely purely on backups as their recovery strategy. Now if your database is small and you can simply fire up your restore scripts on a different server and have it up and running within your RTO, then sure, this might not be a priority. But this article is talking about VLDBs, and it will most likely take more than a few minutes to run a full RESTORE sequence of that 10TB data warehouse.
Now depending on your budget you might not be able to keep a warm copy of your VLDB, which rules out Log Shipping, Mirroring, Availability Groups and storage replication. If that’s the case then Windows Failover Clustering is a solid solution, and it has consistently gotten better and more resilient with every release.
7. Take advantage of Columnstore indexes
Columnstore Indexes are a great fit for VLDBs, especially if you’re running SQL 2014. This Enterprise-only feature was introduced in SQL 2012, but with the caveat that it turned the table into a read-only object. As I mentioned before, this was alleviated if you had a partitioned table, since then you only needed to rebuild the columnstore on the partition where you were loading data.
Regardless, with the release of SQL Server 2014 this feature really shines when used on VLDBs for the following reasons:
a) It’s read-write, so no more limitations on that front.
b) Huge compression gains. The columnar storage means that more similar values are packed together and thus better compression is achieved. On top of that you can even activate the new archival compression to squeeze out even more space gains.
c) Processing in batch mode. This will optimize and greatly increase performance for queries that scan many rows.
A big note on columnstores though: they are best suited for Append-Only modification patterns. If you’re going to be deleting or updating many of the columnstore rows you will not get maximum performance out of it.
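A minimal sketch on SQL Server 2014, assuming a hypothetical `dbo.FactSales` fact table:

```sql
-- Convert the table's storage to a clustered columnstore (2014+:
-- the table remains read-write)
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;

-- 2014 also adds archival compression, useful for cold partitions
ALTER INDEX CCI_FactSales ON dbo.FactSales
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
```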
8. Use Tiered storage
Tiered storage is getting a lot more popular with vendors because of the ability to build cost-effective solutions that perform well for different types of data. Usually a tiered storage system will have a cache, a front layer of SSDs, some fast magnetic drives and finally some slower but very high-density magnetic drives. Depending on the vendor, you can either create mountpoints on a specific tier or the system can even move data automatically between tiers based on usage.
The advantages for a VLDB should be obvious to any experienced DBA. You can put the “fresher” data on the faster tiers and keep the stuff from years past on the slower tiers. You can even move your capacity up or down on each tier depending on your needs, and all the while your entire database remains available for querying. Since this is a hardware feature, SQL Server will not even be aware of what’s going on under the hood and it will all be transparent to your end users. The recent data will be faster and the old data might be slower, but available nonetheless.
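On the SQL Server side, pairing tiers with filegroups just means pointing each filegroup’s files at a mountpoint on the right tier. A sketch, reusing hypothetical names from earlier, of repointing an archive file at a slow-tier mountpoint:

```sql
-- Repoint the archive file at the slow tier's mountpoint.
-- The new path takes effect the next time the database starts up,
-- after the physical file has been moved there.
ALTER DATABASE SalesVLDB
MODIFY FILE (NAME = Archive2015_1,
             FILENAME = 'S:\SlowTier\Archive2015_1.ndf');
```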
9. Check for duplicate, redundant or unused indexes
As time goes by many databases start accumulating a lot of trash. Many people work on them, people apply DTA suggestions blindly, someone creates an index for a query that they ran once, etc. All kinds of things. The end result is that a significant portion of your VLDB might be occupied by redundant pieces or completely unused pieces.
Thanks to the DMVs we can easily check the definition and usage of indexes and get rid of most of the trash. Look for indexes that might be subsets of others and speak to the developers to see if they’re necessary. The same goes for indexes with the same keys but different INCLUDE columns; there might be an opportunity for index consolidation there.
Finally, use the Index Usage DMV to look at your index usage pattern over time and over a full business cycle. If you have unused indexes even after your entire business cycle is done then propose to disable them and eventually drop them.
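As a starting point, this query flags nonclustered indexes with no reads recorded in `sys.dm_db_index_usage_stats`. Keep in mind the stats reset on every instance restart, so only trust the result after a full business cycle of uptime:

```sql
-- Nonclustered indexes with zero reads since the last restart;
-- user_updates shows the write cost you are paying for them
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND COALESCE(us.user_seeks + us.user_scans + us.user_lookups, 0) = 0;
```

Disable the candidates first (`ALTER INDEX ... DISABLE`); if nothing breaks over the next cycle, drop them for good.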
The end result that we want is a lean, mean database schema where every table and index has a known purpose and function.
10. Make sure your tempdb is properly configured
This one should also apply to any production system but again, it has a really big impact on a VLDB. The main reasons are that ETL processes will usually do operations on temp tables, execution plans that work with a lot of data might use tempdb a lot more aggressively and also row-versioning concurrency is common on these systems to avoid shared locking. So always make sure that:
a) You have enough tempdb files for your core count (I usually say start with 50% of the number of cores, up to a maximum of 8, and increase if necessary).
b) Your storage tier holding tempdb is fast enough.
c) Your execution plans are optimal and not doing a lot of spills or spooling on tempdb.
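A sketch of point a) for a hypothetical 16-core server, starting with 8 equally sized files (paths and sizes are illustrative):

```sql
-- Resize the existing primary tempdb file to match its siblings
ALTER DATABASE tempdb MODIFY FILE
    (NAME = tempdev, SIZE = 8GB, FILEGROWTH = 1GB);

-- Add additional equally sized files; equal size and growth keeps
-- SQL Server's proportional fill spreading allocations evenly
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev2,
     FILENAME = 'T:\tempdb\tempdev2.ndf',
     SIZE = 8GB, FILEGROWTH = 1GB);
-- ...repeat for tempdev3 through tempdev8
```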
And that’s the list! I’m sure I missed some good ones and there were many tips mentioned on the presentation that I didn’t include to keep the list down to 10. If you have any other suggestions, leave a comment or shoot us an email!