AlwaysOn Availability Groups and Automation

Last week I encountered a unique application requirement—we have a database environment configured with AlwaysOn Availability Groups for high availability (HA) and disaster recovery (DR), and our application will be creating and dropping databases on the server on a regular basis. So I had to develop some code to handle this process automatically. I had a couple of things going in my favor on this—it was our custom-developed application doing the creating and dropping, so I knew the procedures I was writing would be called as part of the process, and additionally, our availability group was only two nodes at the moment, so my version 1 code could be relatively simplistic and still work. Aaron Bertrand (b|t) posted on this Stack Exchange thread, and his code is a good start. I'm not going to put all of my code in this post—it's ready for our prime time, but I have a few more fixes I'd like to make before releasing the code into the wild.

Dropping a Database

First of all, I need to say it's very important to secure these procedures—they can do bad things to your environment if run out of context, particularly this one. I denied execute to everyone except the user that would be calling the procedure; I didn't want any accidents happening. Dropping a database from an availability group is slightly different than doing it from a standalone server. The process is as follows:

  1. Remove the database from the availability group (on the primary)
  2. Drop the database from the primary
  3. Drop the database from the secondary node(s)

Since we have to initiate this from the primary instance, we need to find out two pieces of data: 1) which availability group we are removing the database from, and 2) whether the instance we are on is the primary. In my code I didn't actually have to drop the database from the primary server—that piece was being called from another proc—so I just had to remove it from the availability group and drop it on the secondary. There are a number of ways to connect to the secondary instance. This needs to happen in SQLCMD mode, which isn't possible in a stored procedure, so we could use a linked server, or we could enable xp_cmdshell, run a SQLCMD script, and then disable xp_cmdshell. The latter isn't my favorite technique from a security perspective, but I was under a time crunch and the procedure is locked down; in the future I will probably rebuild this with linked servers (created within the procedure).



ALTER PROCEDURE [dbo].[RemoveDBfromAG] @dbname VARCHAR(50)
AS
BEGIN
    -- Enable xp_cmdshell for the duration of the procedure
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'xp_cmdshell', 1;
    RECONFIGURE;

    DECLARE @groupid UNIQUEIDENTIFIER;
    DECLARE @groupname VARCHAR(50);
    DECLARE @primary VARCHAR(50);
    DECLARE @secondary VARCHAR(50);
    DECLARE @sql NVARCHAR(500);
    DECLARE @dropsecondary VARCHAR(500);

    -- Which availability group does this database belong to?
    SELECT @groupid = group_id
    FROM sys.availability_databases_cluster
    WHERE database_name = @dbname;

    SELECT @groupname = name
    FROM sys.availability_groups
    WHERE group_id = @groupid;

    -- Identify the primary and the (single) secondary replica
    SELECT @primary = primary_replica
    FROM sys.dm_hadr_availability_group_states
    WHERE group_id = @groupid;

    SELECT @secondary = node_name
    FROM sys.dm_hadr_availability_replica_cluster_nodes
    WHERE node_name != @primary;

    SELECT @sql = N'ALTER AVAILABILITY GROUP ' + @groupname
                + N' REMOVE DATABASE [' + @dbname + N']';

    SELECT @dropsecondary = 'sqlcmd -S "' + @secondary
                + '" -E -Q "exec ReturnsTestInstanceDropSecondaryDB [' + @dbname + ']"';

    -- Only proceed if this instance is the primary
    IF @primary = @@SERVERNAME
    BEGIN
        EXECUTE sp_executesql @sql;

        -- After removal, the database on the secondary sits in recovery
        -- briefly; wait for it to clear before attempting the drop
        WAITFOR DELAY '00:00:25';

        EXEC xp_cmdshell @dropsecondary;
    END

    -- Turn xp_cmdshell back off
    EXEC sp_configure 'xp_cmdshell', 0;
    RECONFIGURE;
END







The one particularly unique thing you will notice in the code is the WAITFOR DELAY. What I observed is that after the secondary database is removed from the availability group, it goes into recovery for about 10 seconds—and we are unable to drop a database while it's in recovery. By implementing that wait (the T-SQL equivalent of a sleep command) the database was able to be dropped.

Adding a New Database

Adding a new database has similar requirements—we have a couple of additional factors, though. We have to verify that the instance we are on is the primary for the availability group we are adding the database to. This is where I really need to fix my code—it assumes that there are only two nodes in our availability group. I need to refactor the code to potentially loop through up to four secondaries (or eight if we are talking about SQL Server 2014). I'm also using a linked server connection, which assumes that the drive letter and path to the data file directory on the secondary are the same as on the primary. To summarize, the process is as follows:

  1. Accept the availability group and database names as input parameters
  2. Verify that the node you are working on is the primary for the availability group
  3. Back up the database and its transaction log to a file share accessible from the secondary
  4. Add the database to the availability group on the primary
  5. Restore the database and transaction log to the secondary WITH NORECOVERY
  6. Alter the database on the secondary to join it to the availability group
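The steps above can be sketched roughly as follows. This is only a rough sketch under my two-node assumptions—the database name, AG name, share path, and linked server name ([SECONDARY1], which would need RPC Out enabled) are all placeholders, not my production code:

```sql
-- Rough sketch: add a database to an availability group (two-node AG,
-- matching file paths on both replicas, linked server [SECONDARY1]).
DECLARE @dbname SYSNAME = N'MyNewDB';   -- placeholder
DECLARE @agname SYSNAME = N'MyAG';      -- placeholder
DECLARE @sql NVARCHAR(MAX);

-- Verify this instance is the primary for the AG before doing anything
IF NOT EXISTS (SELECT 1
               FROM sys.dm_hadr_availability_group_states s
               JOIN sys.availability_groups g ON g.group_id = s.group_id
               WHERE g.name = @agname AND s.primary_replica = @@SERVERNAME)
BEGIN
    RAISERROR('This instance is not the primary replica for this AG.', 16, 1);
    RETURN;
END

-- Back up the database and log to a share both replicas can reach
SET @sql = N'BACKUP DATABASE [' + @dbname + N'] TO DISK = N''\\fileshare\backups\' + @dbname + N'.bak''';
EXEC sp_executesql @sql;
SET @sql = N'BACKUP LOG [' + @dbname + N'] TO DISK = N''\\fileshare\backups\' + @dbname + N'.trn''';
EXEC sp_executesql @sql;

-- Add the database to the availability group on the primary
SET @sql = N'ALTER AVAILABILITY GROUP [' + @agname + N'] ADD DATABASE [' + @dbname + N']';
EXEC sp_executesql @sql;

-- Restore to the secondary WITH NORECOVERY, via the linked server
SET @sql = N'RESTORE DATABASE [' + @dbname + N'] FROM DISK = N''\\fileshare\backups\' + @dbname + N'.bak'' WITH NORECOVERY';
EXEC (@sql) AT [SECONDARY1];
SET @sql = N'RESTORE LOG [' + @dbname + N'] FROM DISK = N''\\fileshare\backups\' + @dbname + N'.trn'' WITH NORECOVERY';
EXEC (@sql) AT [SECONDARY1];

-- Join the database to the availability group on the secondary
SET @sql = N'ALTER DATABASE [' + @dbname + N'] SET HADR AVAILABILITY GROUP = [' + @agname + N']';
EXEC (@sql) AT [SECONDARY1];
```
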

My code for this isn't totally fleshed out—it works in my environment, but I don't think it's ready for sharing. I'll share it later, I promise.

Pitfalls to Avoid

This isn't that different from just building out an availability group, and many of the same caveats apply: you need to ensure the agent jobs and logins affiliated with a database exist on all nodes in the availability group. Additionally, the procedures to add and remove databases from your availability group need to exist on all of the nodes. Also, if you are doing this programmatically, the connection should use the listener, so that your application is always connecting to the primary instance.

On Presenting—Making Bad Days Good

Last weekend I had the pleasure of speaking at a wonderful technology event, in the Caribbean. Ok, so it wasn’t the Caribbean, it was Cleveland, but a lot of my friends were there, I had a good time and the event was awesome. Thanks Adam (b|t) and crew for running a really fantastic SQL Saturday.

Anyway, on Friday my good friend Erin Stellato (b|t) asked if I could do an additional presentation that I've done frequently in the past (it's on high availability and disaster recovery in SQL Server—one of my favorite topics). I accepted the offer; the only issue was that I didn't have a demo environment, and if there's anything I've learned about SQL Server audiences in my years of speaking, it's that they love demos. So I spent a decent amount of time on Friday and Saturday morning getting a virtualization environment built, and didn't pay too much attention to my other presentation, on columnstore indexes.

The HA and DR session went great, but I got a little flustered in my columnstore session—I knew the materials, but I hadn't touched the demos in a while, and I just didn't like the general flow of the session. Combining the way I felt with the attendee feedback (attendees—always give feedback; speakers love it, even if it's bad), I decided to refactor both my demos and some visualizations in my slides, in an effort to make things clearer. Fortunately, I get to present the session again this week (thanks, Edmonton SQL Server UG), so I'll get to test the new slides.

The moral of this story is that sometimes we have bad days—everyone does. However, getting overly upset about it doesn't do a whole lot of good—think about where you can improve, and do it.


T-SQL Tuesday—Bad Bets

“The year was '94, in my trunk is raw, and in the rear view mirror was the mother $%^in' law. Got two choices y'all: pull over the car or bounce on the devil, put the pedal to the floor” – Jay-Z, “99 Problems”

Ok, so the situation I'm going to be talking about isn't as extreme as Jay's of being chased by the cops with a large load of uncut cocaine in my trunk. The year was actually 2010. Things had been going well for me in my job: I had just completed a major SQL consolidation project, built out DR and HA where there was none before, and was speaking regularly at SQL community events. A strange thing happened though—I got turned down to go to PASS Summit (in retrospect, I should have just paid out of pocket), and I wasn't happy about it. All of the resources in the firm were being sucked into the forthcoming SAP project. So, back when I thought staying with a company for a long time was a very good thing (here's a major career hint—focus on your best interest first, because the company will focus on its best interest ahead of yours), I decided to apply for the role of Infrastructure Lead for the SAP project. I've blogged about this in great detail here, so if you want to read the gory details you can, but I'll summarize below.

The project was a nightmare—my manager was quite possibly mental, the CIO was an idiot who had no idea how to set a technical direction, and as an infrastructure team we were terribly under-resourced. The only cool part was that, due to the poor planning of the CIO and my manager, the servers ended up in Switzerland, so I got to spend about a month there and work with some really dedicated folks. I was working 60-70 hours a week, gaining weight, and just generally unhappy with life. I had a couple of speaking engagements that quarter, the biggest of which was SQL Rally in Orlando—I caught up with friends (thank you Karen, Kevin, and Jen) and started talking about career prospects. I realized that what I was doing was useless and wasn't going to move my career forward in any meaningful sense. So I started applying for jobs, and I ended up in a great position.

I’ve since left that role, and am consulting for a great company now, but almost every day I look back fondly at the all the experiences I got to have in that job (the day after I met my boss, he offered to pay my travel to PASS Summit 2011) and how much it improved my technical skills and gave me tons of confidence to face any system.

The moral of this story is to think about your life ahead of your firm's. The job market is great for data pros—if you are unhappy, leave. Seriously—I had five recruiters contact me today.

What Not To Do With Hadoop

I gave a presentation on Hadoop to the Toronto SQL Server User Group this week—it was a really good time, and I got a ton of questions covering a wide range of areas. One thing to note—distribution of data in the cluster is managed by the name node, while job and task management is handled by the job tracker. I did get one question—more theoretical than anything, I think—that I wanted to put here for posterity.

“Can we mix Windows and Linux nodes in Hadoop cluster?”

One of the things I love about Hadoop is ease of deployment—the clustering is all network-based (no shared storage) and clustering is inherent in the software, so there is very little to configure. You won't see tips from me on how to add disk to your Hadoop cluster—you just tell the cluster where your new node is and what its IP address is, and you're set.

Back To the Question

I haven't worked with Hortonworks' Windows distribution of Hadoop much, so I don't even know if it's theoretically possible to mix nodes. But just don't do it. a) I would never mix even OS versions (e.g. Windows 2003/2008—and yes, I know you can't) in any cluster computing scenario, so mixing different OS platforms strikes me as insanity. b) No one else is doing this, so support will be painful. One of the beauties of open source software is the community support—forums, email lists, and blogs. If you are using a system configuration that virtually no one else is using, it will be much harder to get support.

The Answer

No. A million times no.



Backing Up Your SQL Server Database to Windows Azure Blob Storage

So recently I was speaking at my first SQL Saturday of 2014, in Nashville (great job by Tammy Clark (b|t) and her team putting on a great event!), and due to a weird confluence of circumstances I couldn't get to my laptop (it involved my car being in the shop, an emergency performance tuning engagement, snow, and not being able to get into my office).

So, I thought, no fear—I have my trusty Surface Pro and an MSDN account—and I fired up a couple of virtual machines running SQL Server 2012 and 2014. The databases I use for my data compression and columnstore presentations are Adventure Works, but I do a lot of prep work to build some big tables to make the demos more dramatic. Unfortunately, when I started down this path, performance was pretty slow—something I attribute to thin provisioning on the Windows Azure infrastructure side.

Since both databases (2012 and 2014) are the same, I thought: let me just do a backup and restore, especially since I didn't need to actually do work—I just had to fire up a restore. Additionally, I wanted to learn how to do something new.

First Things First

So I'm going to go out on a limb and assume that you have a Windows Azure account. From there you'll need to create a storage container.


There are a couple of things to note in your storage account. I added a new storage account called 'sqlbackupsjd', and then created a container called 'mycontainer.' Make note of the URL associated with this container—we will need it when we get back to SQL Server.

Also, you’ll want to click ‘Manage Access Keys’ and grab the primary access key.


Moving on to SQL Server

You'll need to be on SQL Server 2012 SP1 CU2 or later to take advantage of this feature. The first thing we need to do is create a credential to pass our security information to Windows Azure.
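The credential looks something like this—the credential name is my own choice, and the secret is a placeholder for your actual key:

```sql
-- Credential holding the storage account name and access key
CREATE CREDENTIAL myAzureCredential
WITH IDENTITY = 'mystorageaccount',     -- your storage account name
     SECRET = '<storage access key>';   -- the primary access key from 'Manage Access Keys'
```
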

So in my case, ‘mystorageaccount’ is ‘sqlbackupsjd’ and the secret is my storage access key. From here—backing up the database is really straightforward.
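Something along these lines—the database name is from my demo environment, and the credential name (myAzureCredential) is one I made up when creating the credential above:

```sql
-- Back up straight to the blob container; the URL is the container URL
-- noted earlier plus the backup file name
BACKUP DATABASE AdventureWorks2012
TO URL = 'https://sqlbackupsjd.blob.core.windows.net/mycontainer/AdventureWorks2012.bak'
WITH CREDENTIAL = 'myAzureCredential', COMPRESSION, STATS = 10;
```
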


Restoring is just the opposite of the process. On the target server we create the credential, and then do the restore.
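On the target server the sketch looks like this—same placeholder names as the backup example:

```sql
-- Recreate the credential on the target server
CREATE CREDENTIAL myAzureCredential
WITH IDENTITY = 'mystorageaccount',
     SECRET = '<storage access key>';

-- Restore directly from the blob container
RESTORE DATABASE AdventureWorks2012
FROM URL = 'https://sqlbackupsjd.blob.core.windows.net/mycontainer/AdventureWorks2012.bak'
WITH CREDENTIAL = 'myAzureCredential', STATS = 10;
```
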



Cloud backup is one of my favorite use cases for Azure services. Much like DR, many organizations have a requirement to keep offsite backups, and this can be an easy way to automate the process without worrying about handing your tapes off to someone else (or storing them in the DBA's wine cellar). It's a really easy process with SQL 2012.


Don’t $%^&ing Delete Your SQL Server Cluster Resource

I track the metrics on my blog to see which posts are popular, in an effort to gauge my writing and what people are interested in. One post in particular, that I wish wasn’t popular but is, is “Recovering From a Deleted Cluster Resource.”

This post stemmed from an incident at my previous employer, where an offshore DBA had deleted the cluster resource from Failover Cluster Manager "because he used to work at Microsoft, and they did this all the time" (bad idea jeans). In most cases that I've seen, people try to do this when there is some security problem with a cluster. This won't fix your security problem; however, this KB article might.

What will happen, however, is that your clustered instance will go down. HARD. Your clustered instance is dependent on that resource—so when the cluster loses contact with the resource, it will stop the SQL Server service and won't be able to start it back up, because of the missing dependency. You also run the risk of data corruption if this happens. Fun, fun.

Note—this isn't the worst thing you can do to a cluster. That would be deleting the Active Directory cluster computer object—if your AD recycle bin isn't turned on, there is no recovery except to rebuild the cluster. Formatting cluster disks is also a really terrible idea.

A Little Bit More on Columnstore (SQL 2014) Internals

I wrote a post on Monday with some details about the columnstore index insert and update process. I mistakenly posted initially (I've since updated) that the tuple mover process kicked in when 1,000 rows were loaded (thanks to Jimmy May from Microsoft b|t)—the correct answer is 2^20, or 1,048,576 rows. Remus confirms this here, but I wanted to test out the process for myself, and I found some interesting things.

Before I go too deep into my lab work, this is pretty important to the columnstore index performance story for a couple of reasons. First, when a columnstore index has open delta stores, it needs to scan those in order to answer a query. This is expensive, as it prevents the query from being completely executed in batch mode; in general, inserts that land in a delta store are also slower, since they bypass the optimized bulk insert API Microsoft wrote for loading the columnstore directly. Additionally, we would like our columnstore row groups to be as large as possible to optimize segment elimination when answering the query. Let's examine some of the impact.

Batch Loading

I’m not sure if I’ve found this documented anywhere, but the threshold to trigger the bulk insert API appears to be 102,400 rows—if we do an insert of 102,399 rows we see an open delta store. See below:

Note the message "Column 0 was omitted from VertiPaq during column index build."—that's indicative of the data getting processed directly into the columnstore. Also note the time: 279 ms to load 102,400 rows. Now let's check sys.column_store_row_groups for the table.
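The check is just a query against the DMV—something like this, with my demo table name as a placeholder:

```sql
-- Row group states for the table: COMPRESSED means it went straight
-- into the columnstore; OPEN means it's sitting in a delta store
SELECT row_group_id, state_description, total_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.MyFactTable');
```
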

As we see, that load was added to the columnstore and compressed. Note—102,400 rows in our row group is not optimal for performance; we'd really like it to be much larger for better segment elimination. In fact, if we do a rebuild on the index, we will see our rows get distributed more evenly.


Now let’s see what happens if we only load 102,399 rows.



So first of all, we see something huge—the execution time is 994 ms compared to 279 ms. That means it most likely got loaded into a b-tree delta store. Let's confirm.

Sure enough—we see an open delta store. So I run my standard warehouse query against the table:
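The warehouse query is a typical aggregate—this is only representative of the shape of my demo query, with hypothetical table and column names:

```sql
SET STATISTICS IO ON;   -- so we can compare logical reads before/after rebuild

-- A typical warehouse-style aggregate against the columnstore table
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.MyFactTable
GROUP BY ProductKey;
```
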


If I rebuild the index (so the delta store gets compressed) my logical reads drop, and my execution time drops slightly. Note—this is a small size dataset in terms of columnstore—in a much larger data set the effects would be much more dramatic.



One Last Thing

While I was playing with the tuple mover process yesterday, I noticed something unique—a row group with the state of "invisible".

This state is documented in this Connect item, and will be in Books Online in the near future. According to Remus:

“The Tuple Mover reads one closed delta store at a time and starts building the corresponding compressed segments. During this time scans continue to see and read the delta store. When the Tuple Mover has finished compressing the delta store, the newly created segments are made visible and the delta store is made invisible in a single, atomic operation. New scans will see and scan the compressed format. The Tuple Mover then waits for all scans still operating in the delta store to drain after which the delta store is removed.” 

I just happened to run the row groups query while the tuple mover was firing, and got to see this.


Ideally, your data warehouse loads come in perfect 1,048,576-row increments, so your row groups are as large as possible and you avoid having open delta stores. Since the world is not ideal, you will probably want to monitor for open delta stores, as they can greatly impact performance. Additionally, if your row groups get too "fragmented" (I use that word hesitantly—it's not really fragmentation, just a large number of row groups with less-than-optimal row counts) you may want to consider periodically rebuilding the columnstore index—or better, the partition it's in.
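A monitoring query for open delta stores might look like this—a sketch you'd run against the database holding your warehouse tables:

```sql
-- Find columnstore tables with open delta stores and how many rows
-- are waiting to be compressed
SELECT OBJECT_NAME(object_id) AS table_name,
       COUNT(*)               AS open_delta_stores,
       SUM(total_rows)        AS rows_pending_compression
FROM sys.column_store_row_groups
WHERE state_description = 'OPEN'
GROUP BY object_id;
```
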

SQL Server 2014 Columnstore Index Update Internals

I have this bad habit of wanting to know how things actually work that often leads me down rabbit holes. I’ve taken apart way too many mechanical things in my lifetime, and mostly have gotten them back together (though if you’re a cyclist—I never recommend taking apart a Shimano STI lever). Anyway, in SQL Server 2014, we now have the ability to update a columnstore index.

Given its highly compressed nature, the CPU cost of making a direct update to the columnstore index (which is stored as LOBs made up of segments) would be extremely high. So in order to facilitate updates and inserts, SQL Server uses the concept of a delta store. This delta store is a b-tree (row-oriented) structure where inserts and updates accumulate until they reach a threshold of 1,048,576 rows (thanks to Jimmy May for catching this—I originally wrote 1,000 rows, and I'm not sure where that came from). Then, when memory is available, they are moved into the columnstore. Let's take a look at the process in detail.

Here we update a table with a columnstore index:

For the moment this data gets put into a delta store—which we can see from the following query:
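The query was something along these lines—the table name is from my demo, so substitute your own:

```sql
-- Row group state plus the b-tree (delta store) HOBT id backing it
SELECT row_group_id, state_description, total_rows, delta_store_hobt_id
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.MyColumnstoreTable');
```
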

The delta_store_hobt_id helps us here. Next I'm going to search sys.partitions for this hobt_id. Nothing.

I did some more chasing, but I'll spare you the details. I ended up at sys.syscsrowgroups—don't look for it; it's in the resource database. I told you this was a rabbit hole. That didn't really help, so I then went to sys.dm_db_database_page_allocations. There we can see the IN_ROW_DATA allocations—that is where our delta store pages reside.

From there, we can run DBCC PAGE to look at the contents of the page:
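The invocation looks like this—the page IDs are the ones from my environment mentioned below, and the database name is a placeholder for my demo database:

```sql
DBCC TRACEON(3604);  -- send DBCC output to the client session

-- DBCC PAGE (database, file_id, page_id, print_option)
DBCC PAGE ('ColumnstoreDemo', 1, 608240, 3);  -- delta store page: readable row data
DBCC PAGE ('ColumnstoreDemo', 1, 522712, 2);  -- compressed columnstore (LOB) page: hex dump
```
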

Page 608240 is the delta store page—as you see, we can still read real data. If we look at the columnstore compressed page (522712), we see a bunch of gibberish on the page, indicative of compression (and the fact that it's a LOB now).

My VM was being fussy, so I forced the tuple mover (the process which moves data from the delta store into the columnstore) by doing an index rebuild.

Since there are no page allocations that aren't of type LOB_DATA, all of our data has been compressed and added to the columnstore.

So that’s the path our data takes. Many thanks to Remus Rusanu (b|t) for both helping write this new feature and documenting it.

My Favorite New Feature in SQL Server 2014

Happy New Year, readers. I was slack about blogging over the holidays, but I'm back and ready to get going in 2014. Last night, our user group featured a speaker who gave an overview of all of the new features in SQL Server 2014 (which I think will arrive in the first half of this year), and there are some really nice things:

  • In-Memory OLTP (Formerly known as Hekaton)
  • New Cardinality Estimator
  • Security Enhancements (new roles for better separation of duties)
  • Updateable and clustered columnstore indexes (I’ve blogged and presented on these)
  • Enhancements to AlwaysOn Availability Groups
  • Better integration with Windows Azure services

All of these are really nice features—in particular I really like the option of being able to use Windows Azure for offsite backup storage and even a disaster recovery (DR) site in a number of different SQL Server DR scenarios. However, one thing stands out, and it is even in the standard edition of SQL Server 2014—Backup Encryption.

Due to their nature, database backups tend to travel to a lot of places and pass through a lot of hands. In some shops, you may be backing up to tape and then shipping that tape off to a third party (like Iron Mountain). Even though young Timmy who unloads the tape library seems trustworthy—do you know what he's doing with those tapes? Or whether he made a copy and sold your customer database to the Russian mafia? You might want to check how Timmy paid for his new M5.

SQL Server somewhat addressed this with Transparent Data Encryption (TDE), included since SQL 2008—if we turn on TDE, our backups are encrypted by default, along with the performance hit that entails, and the fact that it requires Enterprise Edition. TDE is a nice feature—I've used it where I had high security requirements—but you may still have data that either can't support the performance hit of encryption or isn't sensitive enough to need encrypting at rest. Or maybe your company won't spring for Enterprise Edition.

In SQL 2014, things get a lot simpler when it comes to encrypting backups. First let's take a look at an unencrypted backup file. We create a database, add a table, and take a backup.
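Something like the following—all of the names, the sample row, and the backup path are placeholders for my demo setup:

```sql
-- A throwaway database with one row of "sensitive" data
CREATE DATABASE EncryptDemo;
GO
USE EncryptDemo;
CREATE TABLE dbo.Customers (Name VARCHAR(50), SSN CHAR(11));
INSERT dbo.Customers VALUES ('John Smith', '111-22-3333');
GO
-- A plain, unencrypted backup
BACKUP DATABASE EncryptDemo TO DISK = N'C:\Backups\EncryptDemo.bak';
```
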


If Timmy's Russian mafia friends had a copy of this backup, they'd have a couple of options—they could restore it to another instance of SQL Server, or if they were looking for a specific piece of data, they could fire up the old text or hex editor (hey look—it's John Smith's SSN).


So let's see how we do this encryption thing. First we need to create a certificate (NOTE: PROTECT THIS CERT WITH YOUR LIFE AND JOB—YOU CAN'T RESTORE WITHOUT IT).
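The certificate setup looks roughly like this—the certificate name, passwords, and file paths are placeholders:

```sql
USE master;
-- A database master key is required before creating the certificate
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
CREATE CERTIFICATE BackupCert WITH SUBJECT = 'Backup encryption certificate';

-- Back the certificate and private key up IMMEDIATELY and store them
-- somewhere safe—you cannot restore an encrypted backup without them
BACKUP CERTIFICATE BackupCert TO FILE = N'C:\Backups\BackupCert.cer'
WITH PRIVATE KEY (FILE = N'C:\Backups\BackupCert.pvk',
                  ENCRYPTION BY PASSWORD = '<another strong password>');
```
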


Once our certificate is created, we can back up using the WITH ENCRYPTION option, or through the backup options in the GUI.
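In T-SQL that looks like this, with my placeholder database, certificate, and path:

```sql
-- Encrypted backup using the server certificate created earlier
BACKUP DATABASE EncryptDemo
TO DISK = N'C:\Backups\EncryptDemo_encrypted.bak'
WITH ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);
```
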


Let’s try that hex editor again:


Since the file is encrypted, everything is a mash of gibberish. More importantly, no one can take our backup file and restore it to another database server. This is a very good thing for us. However, when Timmy starts sending the Russian mob backups they can’t do anything with, he may not be long for this world. So start looking for a new backup technician when you turn this feature on.


My Blog–2013 in review

The stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 29,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 11 sold-out performances for that many people to see it.


