arapaima.uk - Data, Databases, Delivery

Using Vagrant to create a Virtual Machine running SQL Server on CentOS Linux

In my mind, the ability to do this kind of thing is the really big “win” with SQL Server on Linux. In Vagrant’s own words:

Vagrant is a tool for building and managing virtual machine environments in a single workflow. With an easy-to-use workflow and focus on automation, Vagrant lowers development environment setup time, increases production parity, and makes the “works on my machine” excuse a relic of the past.

I’ve been doing presentations at conferences and meetups for a few years, so I’m familiar with the pain of setting up “Demo VMs” for SQL Server on Windows. I also have less than fond memories of setting up and maintaining multiple VMs on a single host to reproduce client problems with clustering, replication, etc.; fortunately I don’t tend to get involved with this kind of thing too much these days, so it’s not in the scope of this example.

Anyway, with the advent of SQL Server on Linux, it struck me that Vagrant, which I’d used for some other stuff in the “day job”, might be a way out of this spiral of despair.

The promise is that having set up a couple of config files, we can stand up a new VM with everything configured and in the right place, merely by typing vagrant up at a command prompt, and remove it again by typing vagrant destroy. If we want to keep it around for another day, vagrant halt is at our disposal. We can even ssh to our new VM by typing vagrant ssh, without any fiddling around with keys or passwords.

Preamble

Yes, I’m aware it’s possible to manage Windows guests with Vagrant, even from a Windows host, but my experience of trying this is that it’s a world of pain, not least because the Windows box files are 800lb gorillas. The most complete attempt I’ve seen has been by Chocolatey creator Rob Reynolds, who I suppose has (or had) some fairly unique needs in this area.

My own use case is further complicated by the fact that I’ve been using Fedora Linux on the desktop for the last few years, which doesn’t really support VirtualBox - though I’m aware it does mostly work - but prefers KVM-based virtual machines, which can be managed through a number of utilities, including the supposedly idiot-proof GNOME Boxes. There are some slightly outdated results from Phoronix that suggest there may be some performance gains from using KVM as opposed to VirtualBox.

Anyway, on with the show…

There are three supported platforms (plus Docker) listed on the installation guide for SQL Server on Linux, namely Ubuntu, SUSE, and Red Hat.

Given that, I decided to proceed with the creation of a CentOS VM, since it’s “quite like” Red Hat and also has a vendor-created image in the Vagrant Catalog that supports libvirt (almost all boxes, including the Ubuntu ones, are VirtualBox-only). SUSE also has vendor images for libvirt, but I haven’t used SUSE Linux since the Spice Girls were topping the charts and wasn’t inclined to investigate further.

The Vagrantfile

The Vagrantfile is, as the name suggests, a file called Vagrantfile; it tells Vagrant what the VM you want to create should look like.

In this case it’s fairly simple, but I’ve left the autogenerated comments in the version in the GitHub repo that accompanies this post (they’re useful if, as is almost certain, you want to reproduce this on a slightly different platform to me, e.g. VirtualBox!), so I’ve extracted the highlights to a gist here.

We specify the “base box” - in this case CentOS 7 - on which our VM is going to be based. After that, we just specify the ways in which our VM is going to diverge from this base. In our case, we are setting up a forwarded port for 1433, so that clients on the host can connect to the SQL Server as if it were “local”.

We’re increasing the memory allocation to the VM from the default (which I think is 512MB) to 4GB - the minimum requirement for SQL Server on Linux is 3.25GB, and I have read that the installer will fail if you have less than this available.

After that, we specify a shell script to run to set up everything else; in Vagrant-speak this is known as a “provisioner”.
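For orientation, here’s a minimal sketch of the kind of Vagrantfile described above; the box name reflects the vendor CentOS 7 image, while the script name, password, and exact settings are placeholders rather than the verbatim contents of the repo.

Vagrant.configure("2") do |config|
  # Base box: the vendor-built CentOS 7 image that supports libvirt
  config.vm.box = "centos/7"

  # Forward 1433 so clients on the host can treat the guest's SQL Server as "local"
  config.vm.network "forwarded_port", guest: 1433, host: 1433

  # SQL Server on Linux wants at least 3.25GB of RAM, so raise the default
  config.vm.provider "libvirt" do |lv|
    lv.memory = 4096
  end

  # Everything else is handled by a shell provisioner
  config.vm.provision "shell", path: "provision.sh",
    env: { "SA_PASSWORD" => "SuperSecret1!" }
end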

Vagrant doesn’t have to use shell scripts for configuration; it supports a number of alternative provisioners, such as the usual suspects of Ansible, Chef, and Puppet, in addition to a couple of others.

The advantage of these latter approaches, of course, is that they are idempotent, making the scripts easy to build up over time; I intend to revisit this example, probably with Ansible, once SQL Server on Linux is generally available and the installation procedure is a bit less “dynamic”.

The shell script fetches updates with yum (for those more accustomed to Debian-derivatives such as Ubuntu this is like apt), then adds the Microsoft repository definitions to yum’s configuration. It also installs a package called tcping from another repo called epel (which stands for Extra Packages for Enterprise Linux), which we’re going to need in our script. Having got everything downloaded, we install the SQL Server client and server, which are separate packages.

Not only are they separate packages, they require different mechanisms for accepting the terms of the EULA; one requires an environment variable ACCEPT_EULA=y, and the other requires a parameter accept-eula!

We also pass the top-secret sa password as an environment variable; this is required to run the installation silently.

Having done all that, we wait for the service to start before proceeding. This is why tcping was required; it’s among the simplest ways to figure out whether there’s anything listening on a given port.
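The shape of the provisioning script is roughly as follows; the repo URLs, package names, and setup command are from memory of the preview-era instructions and may not match the current ones exactly, so treat this as an outline rather than a copy-paste install guide.

#!/bin/bash
sudo yum update -y

# Microsoft repo definitions for the server and the client tools (URLs assumed)
sudo curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server.repo
sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo

# tcping comes from EPEL; we need it to tell when SQL Server is up
sudo yum install -y epel-release
sudo yum install -y tcping

# Server and client are separate packages with different EULA mechanisms
sudo yum install -y mssql-server
sudo ACCEPT_EULA=y yum install -y mssql-tools unixODBC-devel

# Non-interactive setup; the sa password is passed in from the Vagrantfile
sudo MSSQL_SA_PASSWORD="$SA_PASSWORD" /opt/mssql/bin/mssql-conf -n setup accept-eula

# Wait until something is listening on 1433 before restoring databases etc.
until tcping -q localhost 1433; do sleep 5; done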

Finally, we restore a database from a backup and run a script to install the tSQLt unit testing framework. By default, Vagrant will rsync the folder containing the Vagrantfile to a /vagrant folder inside our VM, so we can simply put any files we need for provisioning (or for anything else) inside this folder. In this case, I’ve extracted the contents of the tSQLt.zip from the tSQLt downloads page.

Originally, I had the database backup copied from the original repo here too, but had to replace this with a call to curl; I’m glad to say I’d never run into the GitHub file size limit before! Obviously this makes the re-provisioning process a bit slower, so it’s best to keep these large file dependencies somewhere local if you can manage it.

What we didn’t do

There’s nothing in the (abbreviated) Vagrantfile about networking (other than the forwarded port), storage, logins, cores, etc. etc. All these things are configurable, but the point of this approach is that we trust Vagrant to “do the right thing” unless we specify otherwise.

The Moment of Truth

vagrant up

Profit

vscode on fedora connected to sql server on centos in vagrant!

To be clear, this is a picture of Visual Studio Code, running on Fedora Linux 25 (the host), connected to SQL Server running on CentOS 7.3 (the guest) in a virtual machine provisioned by Vagrant. The mssql extension for Visual Studio Code needs to be a very recent alpha; this is to do with .NET Core compatibility on “modern” versions of Linux.

The VM is a “regular” VM; you can see it here in the Fedora Virtual Machine Manager application, with the console open showing the SQL Server process:

vagrant vm opened in virt-manager showing the sql server process

The Wrap

I was slightly surprised that all this worked as well as it did; not so much the Vagrant part as the SQL Server on Linux part, which is certainly more complete than when I last looked at it. In particular, I wasn’t expecting to be able to install tSQLt on Linux - I did have to make a change to the SetClrEnabled.sql script that is distributed with tSQLt to turn off clr strict security, but apart from that it all went pretty smoothly. I have a presentation about tSQLt to do next month, which was one of the motivations for this exercise, and I’ll certainly be setting aside some time in the next day or two to see if everything else works the way one might expect. If you want to try this out at home, and assuming your setup is roughly like mine (kvm rather than Virtualbox, vagrant already working, etc, etc):

mkdir hereGoesNothing && cd hereGoesNothing;
git clone https://github.com/gavincampbell/VagrantSqlServerCentosEtc .
vagrant up

If your setup is different in relevant ways, there will be some steps between 2 and 3 (i.e. between cloning the repo and running vagrant up) where you install things and hack away at the Vagrantfile.

Enabling per-branch configuration in a Jenkins Multibranch Pipeline

For reasons, you might want your Jenkins Multibranch Pipeline jobs to do a different thing depending on which branch is being built.

Fortunately, the multibranch plugin provides us with a built-in variable BRANCH_NAME, which we can use to figure out which branch we are currently building.

In such scenarios, it’s not a bad idea to create a minimal Jenkinsfile at the repo root that contains just enough logic to figure out which branch we are on, and then call another groovy script that contains the actual build definition:
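The root Jenkinsfile ends up looking something like the sketch below; the per-branch script names are my own convention for this illustration, not a plugin requirement.

node {
    // Fetch the repo so the per-branch scripts are available on disk
    checkout scm

    // Pick a build definition based on the branch being built
    def scriptName = (env.BRANCH_NAME == 'master') ? 'master.groovy' : 'default.groovy'

    def buildDefinition = load scriptName
    buildDefinition.run_build()
}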

In this script we load an external groovy file based on the current branch, and then call the function defined therein. In this particular case, the run_build() functions don’t do anything particularly exciting, but they probably do enough to demonstrate this mechanism.

The script for the master branch:
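Something along these lines, assuming it’s called master.groovy as in the sketch above:

// master.groovy - loaded and called from the root Jenkinsfile
def run_build() {
    echo 'Running the build steps for the master branch...'
}
return this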

The script for any other branch:
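And a near-identical default.groovy for everything else:

// default.groovy - used for any branch other than master
def run_build() {
    echo 'Running the build steps for a feature branch...'
}
return this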

The most important line in each of these scripts is the return this at the end; as documented, this is required for the functions to be callable from the outer script. The checkout scm step in the root Jenkinsfile is also required, as without it the rest of the scripts won’t be fetched from the repo. All three of these files are in the root of the repo; this is because gist doesn’t support folders. In “real life”, it’s probably a good idea to create a separate folder for these scripts, and provide the path to the load function.

Trying this out at home

If you don’t have a Jenkins server handy, you can create a new one with docker run -p 8080:8080 jenkinsci/jenkins:lts¹ and connect to it on http://localhost:8080. After that, it’s just a question of creating a new job of type “Multibranch Pipeline” and specifying an appropriate name.

In the “Branch Sources” section, add a source of type “Git” (not “Github”!), and provide it with the path to the source repo, in this case the gist.

Using a gist as a source repo?

It might be a lesser known fact about Github that gists are really just repos with a few extra rules that can be forked and cloned like any other Github repo. What it doesn’t say in that linked page is that you can also create branches, even if these new branches aren’t visible in the Gist UI. So, if you create a fork of https://gist.github.com/gavincampbell/480843552e43efa84c60f9bb4840d6c1, you should end up with an identical gist in your own Github account. You can then paste the url of this gist into the Jenkins job definition as shown:

Gist url in git scm step

The other thing to note here is the checkbox for “Scan Multibranch Pipeline Triggers”; since we aren’t configuring any push notifications from our git repo, we need to get Jenkins to scan the repo periodically to look for any new branches, or new commits in existing branches. (In my particular case, since the Jenkins instance is just an ephemeral Docker image, there’s no route to it anyway.)

When you save the job configuration, Jenkins will scan the source repo, and create the first pipeline job:

pipeline job for master branch

Creating a new branch

As noted above, there’s nothing in the gist UI to support creating new branches. However, if we clone the gist repo to a local folder we can create the branch and push it back to the origin. You can substitute the url to your own fork of the gist in place of mine (there’s only one branch in mine, and you don’t have permission to create more!). The downside of this approach of using a gist rather than a “proper” repo is that by default you get a long guid instead of a readable folder name.

$ mkdir j && cd j;
$ git clone https://gist.github.com/gavincampbell/480843552e43efa84c60f9bb4840d6c1
Cloning into '480843552e43efa84c60f9bb4840d6c1'...
remote: Counting objects: 32, done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 32 (delta 7), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (32/32), done.
Checking connectivity... done.
$ cd 480843552e43efa84c60f9bb4840d6c1/
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
$ git checkout -b fancyFeature
Switched to a new branch 'fancyFeature'
$ git push --set-upstream origin fancyFeature

Total 0 (delta 0), reused 0 (delta 0)
To https://gist.github.com/gavincampbell/480843552e43efa84c60f9bb4840d6c1
 * [new branch]      fancyFeature -> fancyFeature
Branch fancyFeature set up to track remote branch fancyFeature from origin.

The next time the repo is scanned, the new job will be created in Jenkins:

job created for new branch

When the branch is deleted, the job will disappear too:

$ git push origin --delete fancyFeature

To https://gist.github.com/gavincampbell/480843552e43efa84c60f9bb4840d6c1
 - [deleted]         fancyFeature

Looking in the “Multibranch Pipeline Log” confirms this:

[Thu May 25 22:40:36 UTC 2017] Finished branch indexing. Indexing took 1.7 sec
Evaluating orphaned items in jenkins-branch-conditions
Will remove fancyFeature as it is #1 in the list
Finished: SUCCESS

  1. Depending on your system configuration, this step may have one or more prerequisites!

Automating SSDT build and deployment with Jenkins from a local git repo

This is a short illustration of using a local installation of Jenkins on Windows to build an SSDT project from a local git repo and deploy it to a SQL Server on the same machine.

This is probably useful for a quick demonstration or to understand how the various moving parts fit together, but possibly less applicable to “Real Life” production environments.

There are no build agents and no git remotes; all the action takes place on the Jenkins master, and the git repo is local to the same machine. No “git management” software such as GitHub/VSTS/GitLab/BitBucket/etc is involved (except for some slight cheating regarding the build definition itself.)

Getting set up

I am using Windows 10 with SQL Server 2016 Developer Edition and Visual Studio 2015 Professional. Most of this stuff should work with most editions and versions of SQL Server and Visual Studio, but some of the paths will vary here and there. I am using SQL Server Data Tools Version 14.0.61021.0, released on October 26, 2016.

Installing Jenkins

I used the windows installer linked here (note that this link will trigger the download immediately). At the time of writing this is version 2.46.1. In the initial setup, I selected the option to “Install Recommended Plugins”; this installs more than is required for this example.

Further Jenkins twiddling

The MSBuild plugin needs to be installed separately (as it isn’t “recommended”!), which can be done from http://localhost:8080/pluginManager/available or by clicking through Jenkins -> Manage Jenkins -> Manage Plugins -> Available. We also need to tell Jenkins where it can find MSBuild.exe; this is done at http://localhost:8080/configureTools/ (Jenkins -> Manage Jenkins -> Global Tool Configuration). You can find out where MSBuild is installed by opening the “Developer Command Prompt for Visual Studio” and typing

C:\Program Files (x86)\Microsoft Visual Studio 14.0>where msbuild
C:\Program Files (x86)\MSBuild\14.0\Bin\MSBuild.exe
C:\Windows\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe

I used the first of these, as this is the one installed by Visual Studio¹ 2015, and should avoid having to re-read the somewhat convoluted StackOverflow thread here.

I already had git installed and in the path, so there was no need to configure this separately in Jenkins.

$ git --version
git version 2.11.0.windows.3

Compromising your SQL Server²

By default, the Jenkins service will run as “Local System” on Windows. In order to allow this account - which will authenticate to SQL Server as the machine account - to deploy the database, I made “Local System” an sa of the SQL Server:

EXEC sp_addsrvrolemember 'NT AUTHORITY\SYSTEM', 'sysadmin';

There are many circumstances in which this wouldn’t be a good idea; if any of them apply to your scenario, you should make appropriate adjustments to the scripts that follow.

Find something to build

I started by cloning a repo I had prepared earlier, containing a copy of the Chinook database.

C:\Projects>git clone https://github.com/arapaima-uk/Chinook.JenkinsDemo.git
Cloning into 'Chinook.JenkinsDemo'...
remote: Counting objects: 26, done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 26 (delta 5), reused 26 (delta 5), pack-reused 0
Unpacking objects: 100% (26/26), done.

C:\Projects>tree
Folder PATH listing
Volume serial number is 000000B7 806E:E890
C:.
└───Chinook.JenkinsDemo
    └───Chinook.JenkinsDemo
        └───dbo
            └───Tables

C:\Projects>

Having done that, I removed the reference to the GitHub remote with

C:\Projects\Chinook.JenkinsDemo>git remote rm origin

From here on, everything is local (geddit?).

Creating the Jenkins job

The first step is to create the Jenkins job that will build our project into a dacpac, and deploy it to a local SQL Server.

The job definition is contained in the following Jenkinsfile, which consists of three fairly self-explanatory stages. The first stage, git checkout, checks out our master branch from the local repo (indicated by the file:// prefix). The second stage calls the MSBuild tool we defined earlier, taking advantage of the fact that our project is very simple to provide very few parameters on the command line. The third calls out to the shell to run sqlpackage.exe, again with very few parameters. This last stage is the one that requires whatever account Jenkins is running under to be able to authenticate to the SQL Server (though there are alternatives involving storing credentials in Jenkins).
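Reconstructed from the console output further down, the pipeline script looks roughly like this; the 'MSBuild' tool name is whatever you entered in Global Tool Configuration, and the stash/unstash of the dacpac between stages mirrors what the log shows rather than being strictly necessary on a single node.

node {
    stage('git checkout') {
        git url: 'file:///C:/Projects/Chinook.JenkinsDemo', branch: 'master'
    }
    stage('Build Dacpac from SQLProj') {
        def msbuild = tool 'MSBuild'   // name as configured in Global Tool Configuration
        bat "\"${msbuild}\" /p:Configuration=Release"
        stash includes: '**/*.dacpac', name: 'theDacpac'
    }
    stage('Deploy Dacpac to SQL Server') {
        unstash 'theDacpac'
        bat '"C:\\Program Files (x86)\\Microsoft SQL Server\\130\\DAC\\bin\\sqlpackage.exe" ' +
            '/Action:Publish /SourceFile:"Chinook.JenkinsDemo\\bin\\Release\\Chinook.JenkinsDemo.dacpac" ' +
            '/TargetServerName:(local) /TargetDatabaseName:Chinook'
    }
}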

Conveniently, since this is stored in a GitHub gist, and a gist is a git repo, I can create a new pipeline job and specify “Pipeline Script from SCM” in the build definition then provide the url of the gist for the repository url.

Note that this repo contains a single file, namely the build definition - all the project files are coming from the git repo on our local machine. If you want to copy and paste the complete url, it’s in the build output log reproduced below.

Screenshot of Jenkins showing pipeline from SCM

One more detail is that the job needs to be set to “Poll SCM”, but the schedule can be left empty (ignore the warning). This isn’t required to test the build, but will be required later on to trigger the build on a commit to our git repo.

Jenkins Pipeline Job showing Poll SCM Setting with empty schedule

Testing the build

We should now be able to trigger the build from the Jenkins dashboard, and see some output like the following (under “Console Output”)

Started by user arapaima
Obtained Jenkinsfile from git https://gist.github.com/gavincampbell/e3dfa25abf427752e919eb2110f03852
[Pipeline] node
Running on master in C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo
[Pipeline] {
[Pipeline] stage
[Pipeline] { (git checkout)
[Pipeline] git
 > git.exe rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git.exe config remote.origin.url file:///C:/Projects/Chinook.JenkinsDemo # timeout=10
Fetching upstream changes from file:///C:/Projects/Chinook.JenkinsDemo
 > git.exe --version # timeout=10
 > git.exe fetch --tags --progress file:///C:/Projects/Chinook.JenkinsDemo +refs/heads/*:refs/remotes/origin/*
 > git.exe rev-parse "refs/remotes/origin/master^{commit}" # timeout=10
 > git.exe rev-parse "refs/remotes/origin/origin/master^{commit}" # timeout=10
Checking out Revision 8ccbac95d2edd4ce0cbf14ec9f5f3f7ac2868eac (refs/remotes/origin/master)
 > git.exe config core.sparsecheckout # timeout=10
 > git.exe checkout -f 8ccbac95d2edd4ce0cbf14ec9f5f3f7ac2868eac
 > git.exe branch -a -v --no-abbrev # timeout=10
 > git.exe branch -D master # timeout=10
 > git.exe checkout -b master 8ccbac95d2edd4ce0cbf14ec9f5f3f7ac2868eac
 > git.exe rev-list 8ccbac95d2edd4ce0cbf14ec9f5f3f7ac2868eac # timeout=10
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Build Dacpac from SQLProj)
[Pipeline] tool
[Pipeline] bat
[BuildDeploySsdtFromLocalRepo] Running batch script

C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo>"C:\Program Files (x86)\MSBuild\14.0\Bin\MSBuild.exe"  /p:Configuration=Release 
Microsoft (R) Build Engine version 14.0.25420.1
Copyright (C) Microsoft Corporation. All rights reserved.

Building the projects in this solution one at a time. To enable parallel build, please add the "/m" switch.
Build started 05/04/2017 00:55:50.
Project "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo.sln" on node 1 (default targets).
ValidateSolutionConfiguration:
  Building solution configuration "Release|Any CPU".
Project "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo.sln" (1) is building "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo\Chinook.JenkinsDemo.sqlproj" (2) on node 1 (default targets).
GenerateSqlTargetFrameworkMoniker:
Skipping target "GenerateSqlTargetFrameworkMoniker" because all output files are up-to-date with respect to the input files.
CoreCompile:
Skipping target "CoreCompile" because all output files are up-to-date with respect to the input files.
SqlBuild:
Skipping target "SqlBuild" because all output files are up-to-date with respect to the input files.
CopyFilesToOutputDirectory:
  Chinook.JenkinsDemo -> C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo\bin\Release\Chinook.JenkinsDemo.dll
SqlPrepareForRun:
  Chinook.JenkinsDemo -> C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo\bin\Release\Chinook.JenkinsDemo.dacpac
Done Building Project "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo\Chinook.JenkinsDemo.sqlproj" (default targets).
Done Building Project "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo.sln" (default targets).

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:00.53
[Pipeline] stash
Stashed 1 file(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Deploy Dacpac to SQL Server)
[Pipeline] unstash
[Pipeline] bat
[BuildDeploySsdtFromLocalRepo] Running batch script

C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo>"C:\Program Files (x86)\Microsoft SQL Server\130\DAC\bin\sqlpackage.exe" /Action:Publish /SourceFile:"Chinook.JenkinsDemo\bin\Release\Chinook.JenkinsDemo.dacpac" /TargetServerName:(local) /TargetDatabaseName:Chinook 
Publishing to database 'Chinook' on server '(local)'.
Initializing deployment (Start)
Initializing deployment (Complete)
Analyzing deployment plan (Start)
Analyzing deployment plan (Complete)
Updating database (Start)
Update complete.
Updating database (Complete)
Successfully published database.
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

The “stage view” of the build should be showing green:

passing builds in Jenkins

Triggering the build from the git repo

Many approaches to building a project automatically on each commit rely on the CI server, in this case Jenkins, polling the source control system at some pre-defined interval.

However, if the geo-political events of 2016 are any guide, polling doesn’t always lead to the outcome we expect, so in this case we’ll create a git “hook” to trigger the build on every commit to master.

A git hook is a script that runs in response to a specific event that occurs in git. There are a bunch of events that can fire hooks, these can be used to enforce commit message conventions, run static code analysis, prevent force-pushes, and all kinds of other arbitrary busywork. The event that’s of interest here is the post-commit hook, which runs after a commit has been added. In “Real Life”, hooks are normally added to the server copy of the repo but since there’s no remote in our setup, everything has to be local.

Notably, even though we’re on Windows in this example, hooks are executed by the msysgit subsystem so much of the usual UNIX shell stuff will work.

Hooks are stored in the .git folder in our repo (this is usually hidden on Windows), in files that match the name of the hook. So, our file is called post-commit (no extension) and contains the #!/bin/sh “shebang”, plus a single line that uses curl to hit a URL on the Jenkins server that causes our Jenkins job to poll its source repo as defined in the build pipeline (i.e. file:///C:/Projects/Chinook.JenkinsDemo).

C:\Projects\Chinook.JenkinsDemo\.git\hooks>type post-commit
#!/bin/sh
curl http://localhost:8080/git/notifyCommit?url=file:///C:/Projects/Chinook.JenkinsDemo

Note for git beginners: this file isn’t in the repo when you download it, you’ll need to add it yourself!

Now, all that remains is to make an arbitrary change to our project and add a commit.

Visual Studio showing changed table and commit dialog

If everything is working the pipeline job should be triggered, and the new build will also log that it was “Started by an SCM Change”.

Build Triggered by SCM Change

To clarify what’s happened here: we made a change and committed it to our local repo; the post-commit hook called the URL in Jenkins; Jenkins then “polled” the git repo, found a new commit, and triggered the pipeline job.

Until next time, happy triggering!

If you commit the change from the git command line, you may see output along the lines of:

$ git commit -am "add shoesize to artist table"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   135  100   135    0     0   9000      0 --:--:-- --:--:-- --:--:--  9000Scheduled polling of BuildDeploySsdtFromLocalRepo
No Git consumers using SCM API plugin for: file:///C:/Projects/Chinook.JenkinsDemo

The “error” message is nothing to worry about; as long as the line “Scheduled polling of myJobName” is present, everything is hunky-dory!

Recovering from the "La-La-Land Moment"

How SSDT can help with restoring a SQL Server database to “just before that last deployment”

For as long as I can remember, SSDT and its predecessors have had the option to “Back up database before deployment”, currently available in the “Advanced Publish Settings” dialog, among other places. Regrettably, I’ve never really had much use for this particular option. Whilst restoring from backup might be a valid strategy for recovering from some kinds of deployment disaster, this could add a great deal of time to the deployment process, assuming a database of non-trivial size.

In any case, there are really only two kinds of database - the ones where you don’t care about the data, such as development and test environments, and the ones (one?) where you do. In the former case, backups don’t matter anyway, and in the latter case it is to be hoped that there is already some kind of backup solution in place, ideally managed by someone who isn’t you.

Further, the “Back up database before deployment” option appears not to offer much control over how the database is backed up. The backup will be created in the default backup directory of the instance, and will be a full rather than a copy-only backup. This has the potential to endanger the “real” disaster recovery plans if those include differential database backups: the SSDT-generated backup will reset the differential base, meaning this backup - which the DBAs don’t even know about - will be needed, along with the subsequent differentials, if the database ever has to be restored.
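For comparison, an ad-hoc backup taken WITH COPY_ONLY leaves the differential base alone, which is what you’d want if you really did need a quick safety-net backup before a deployment; a minimal example (database name and path are illustrative):

BACKUP DATABASE pubs
TO DISK = N'C:\Backups\pubs_pre_deploy.bak'
WITH COPY_ONLY, INIT, STATS = 10;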

It’s likely that the “real” backup strategy will involve some combination of full, differential, and transaction log backups, and if you’re especially lucky will be managed by some kind of “Enterprise Backup Solution”, which will allow rapid restores to any point in time, just as soon as you figure out where it is that the “Enterprise Backup Solution” keeps the backup files for your database.

So, assuming the worst case scenario has arisen, your newly deployed release has been writing rubbish data to the database for the last few hours, the “war room” has been convened and someone you’ve never heard of called the “Problem Management Executive Consultant” has decided that the right course of action is to “Restore the database to immediately before the deployment”, how are you to decide when this was?

Well, hopefully you know from your release logs what time the database was deployed, but what if there was a better way? And what if you aren’t sure what timezone the deployment server is in or whether it uses daylight savings time? Well, one solution to this involves a somewhat neglected feature of SQL Server known as Marked Transactions. (See, it doesn’t even get its own page in MSDN!)

Marked Transactions

The syntax BEGIN TRANSACTION tran-name WITH MARK 'Mark Description' will record the name of the transaction in the transaction log, along with the date, time, LSN, etc. The description is optional, but that gets saved too.

We can see this in action using the pubs database, which if you’re younger than a certain age, sadly isn’t what you think.

Note that if you’re playing along at home, there are a couple of extra considerations; the database needs to be using the full (or bulk-logged) recovery model and a full backup must already have been taken, i.e. the database is not in the pseudo-simple recovery model.

USE pubs;
GO

BEGIN TRANSACTION update_auths WITH MARK 'update authors entry'
UPDATE authors set 
phone = '408 496-7223' WHERE au_id = '172-32-1176'
COMMIT TRANSACTION

If we take a peek at the transaction log we can see our marked transaction there:

SELECT  [Transaction ID] ,
        [Current LSN] ,
        [Transaction Name] ,
        [Operation] ,
        Description ,
        [Transaction SID] ,
        sp.name
FROM    fn_dblog(NULL, NULL) AS f
        JOIN sys.server_principals sp ON f.[Transaction SID] = sp.sid
WHERE   [transaction Name] = 'update_auths'
Transaction ID | Current LSN | Transaction Name | Operation | Description | Transaction SID | name
0000:000004ee | 00000022:000005f2:0001 | update_auths | LOP_BEGIN_XACT | 2017/03/16 18:09:58:327;update_auths;0x0105000000000005150000004cca9a3fa9173a6eba0c5dc9e9030000 | 0x0105000000000005150000004CCA9A3FA9173A6EBA0C5DC9E9030000 | ARAPAIMA\Arapaima

The logmarkhistory table in msdb also stores a list of our marked transactions:

SELECT * FROM msdb..logmarkhistory
database_name | mark_name | description | user_name | lsn | mark_time
pubs | update_auths | update authors entry | ARAPAIMA\Arapaima | 34000000152200003 | 2017-03-16 18:09:58.327

So, it seems we can use this functionality to give us a named point in the transaction log to restore to in the event of career-threatening disaster.

The thing to note is that we need to update something in the database where we’re creating the mark, but it doesn’t really matter what we update. It’s often useful to have a Releases table in our databases where we can store the version number, date, and other fun facts about our database, so this is as good a candidate as any.

CREATE TABLE [dbo].[Releases]
(
    ReleaseVersion VARCHAR(20) NOT NULL PRIMARY KEY,
    DateDeployed DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    --Not taking any chances with the time zone!
)

We need to create this table and add it to our database project.

I’ve created a proc to do the inserting:

CREATE PROCEDURE [dbo].[spFireInTheHole]
	@ReleaseVersion varchar(20)
AS
BEGIN TRANSACTION @ReleaseVersion WITH MARK @ReleaseVersion 
  INSERT INTO dbo.Releases(ReleaseVersion) VALUES (@ReleaseVersion);
COMMIT TRANSACTION
RETURN 0

And now for the tricky part…

The first time we deploy the database project after adding these objects, they won’t be present in the target database, so we won’t be able to call them from the pre-deployment script. This means we need to wrap them with IF EXISTS in the pre-deployment script so that they don’t get called if they don’t exist. The drawback of this approach, naturally, is that “Release 0.0.1” won’t be recorded in your database for posterity. In my experience of such matters, the gap between Release 0.0.1 and Release 0.0.2 is normally measured in minutes rather than years, so I’m not particularly concerned about this.

Finally, we are going to use a sqlcmd variable to hold the release number; this means we can pass it in at deployment time. Most deployment tools know how to do this, for this example we’ll just pass it on the command line.

The pre-deployment script

/*
 Pre-Deployment Script Template							
--------------------------------------------------------------------------------------
 This file contains SQL statements that will be executed before the build script.	
 Use SQLCMD syntax to include a file in the pre-deployment script.			
 Example:      :r .\myfile.sql								
 Use SQLCMD syntax to reference a variable in the pre-deployment script.		
 Example:      :setvar TableName MyTable							
               SELECT * FROM [$(TableName)]					
--------------------------------------------------------------------------------------
*/

IF EXISTS(SELECT * FROM sys.tables where name = 'Releases') 
AND 
EXISTS (SELECT * FROM sys.procedures WHERE NAME = 'spFireInTheHole')
BEGIN
	
        EXEC spFireInTheHole @ReleaseVersion = '$(ReleaseVersion)';
END

Now, after we build our dacpac, we supply the version number at deploy time:

sqlpackage.exe /Action:Publish /SourceFile:pubs.dacpac /TargetServerName:(local) /TargetDatabaseName:pubs /v:ReleaseVersion=0.2.0

If we run a few “Releases”, we can see the marks building up in msdb..logmarkhistory:

database_name | mark_name | description | user_name | lsn | mark_time
pubs | 0.0.2 | 0.0.2 | ARAPAIMA\Arapaima | 34000000172100002 | 2017-03-16 20:06:40.490
pubs | 0.0.3 | 0.0.3 | ARAPAIMA\Arapaima | 34000000172200003 | 2017-03-16 20:07:14.673
pubs | 0.0.3 | 0.0.3 | ARAPAIMA\Arapaima | 34000000172300002 | 2017-03-16 20:08:16.183
pubs | 0.0.4 | 0.0.4 | ARAPAIMA\Arapaima | 34000000172400003 | 2017-03-16 20:08:39.760

In this particular setup, you’ll get an error if you try to supply the same version number more than once, as ReleaseVersion is the primary key of the Releases table.

Disaster Strikes

So, the career-threatening disaster has happened, and we need to restore our database to its state immediately prior to release 0.6.0. If there hasn’t been a log backup since the deployment, the first step is to make one.

Then, with a bit of DBA magic, we can restore our database to just before the deployment started. I generated these scripts from SQL Server Management Studio; unless you’ve got a competent adult handy, I suggest you do the same. The crucial bit is in the last line, where we specify STOPBEFOREMARK, which is exactly what we want to do.

USE [master]
BACKUP LOG [pubs] TO  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs_LogBackup_2017-03-16_21-05-56.bak' WITH NOFORMAT, NOINIT,  NAME = N'pubs_LogBackup_2017-03-16_21-05-56', NOSKIP, NOREWIND, NOUNLOAD,  NORECOVERY ,  STATS = 5
RESTORE DATABASE [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 2,  NORECOVERY,  NOUNLOAD,  STATS = 5
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 3,  NORECOVERY,  NOUNLOAD,  STATS = 10
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 4,  NORECOVERY,  NOUNLOAD,  STATS = 10
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 5,  NOUNLOAD,  STATS = 10,  STOPBEFOREMARK = N'0.6.0' AFTER N'2017-03-16T20:46:07'
GO

And with that, the database is restored to its state immediately prior to the deployment (you can check this by looking at the Releases table), and all that is left to do is blame the guy that left last month for the “rogue code” that “crept” into the release. This is normally possible by editing old git commit messages and force pushing the master branch, but that’s a topic for another day.

A script to download and install the Pester testing framework for PowerShell

I finally got around to looking into Pester for test-driven development with PowerShell. In case it’s useful to anyone, and so that I don’t lose it, I created a script to download the latest release version, store it in the user’s PowerShell modules folder, and run a quick test to make sure everything is working. The script is stored in a GitHub Gist and is reproduced below. Some of the cmdlets require a fairly recent version of PowerShell, possibly even 5.0. The steps are roughly as follows:

  • Figure out what the latest release number is.
  • Assemble the correct url to download the zip file.
  • Download and “Unblock” the zip file.
  • Figure out where the user’s modules folder is and create it if it doesn’t exist. I had to exclude the vscode modules folder for this to work for me.
  • Unzip the downloaded file to the right place.
  • Import the newly installed modules.
  • Scaffold a test with New-Fixture. The name is randomised to minimise the chance of overwriting something that’s already there.
  • Hack the generated files about a bit.
  • Run the test.
  • Clean up the generated files.
  • Profit!

The script
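What follows is a sketch of the shape of the script rather than the gist verbatim; the GitHub API call, the folder handling, and the test name are illustrative assumptions.

# Find the latest release and download its source zip (API URL assumed)
$release = Invoke-RestMethod -Uri 'https://api.github.com/repos/pester/Pester/releases/latest'
$zipFile = Join-Path $env:TEMP 'Pester.zip'
Invoke-WebRequest -Uri $release.zipball_url -OutFile $zipFile
Unblock-File $zipFile

# The per-user modules folder is (usually) the first entry in PSModulePath
$userModules = ($env:PSModulePath -split ';')[0]
if (-not (Test-Path $userModules)) { New-Item -ItemType Directory -Path $userModules | Out-Null }

# The zipball unpacks into a single "pester-Pester-<sha>" folder; copy it in as "Pester"
$tmp = Join-Path $env:TEMP 'PesterUnzipped'
Expand-Archive -Path $zipFile -DestinationPath $tmp -Force
$inner = Get-ChildItem -Path $tmp -Directory | Select-Object -First 1
Copy-Item -Path $inner.FullName -Destination (Join-Path $userModules 'Pester') -Recurse -Force

Import-Module Pester

# Scaffold a test (the generated test fails until you edit it), run it, then tidy up
New-Fixture -Name 'Get-ArbitraryThing'
Invoke-Pester
Remove-Item 'Get-ArbitraryThing.ps1', 'Get-ArbitraryThing.Tests.ps1'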

Testing whether a branch exists before checking out in Jenkins Pipeline

For reasons, I recently found myself in a scenario where I needed to test whether a branch existed before checking it out, falling back to a sensible default - such as master - if it didn’t. From the command line, this is a simple matter of git branch -l | grep myBranch, but I needed to do this from the context of a Jenkins pipeline job.

Preliminaries

For simplicity, I’m creating a local repo I can point my Jenkins job at, right inside the JENKINS_HOME folder.

jenkins@d99f4f77acdb:/$ cd $JENKINS_HOME
jenkins@d99f4f77acdb:~$ mkdir testBranch ; cd testBranch
jenkins@d99f4f77acdb:~/testBranch$ git init
Initialized empty Git repository in /var/jenkins_home/testBranch/.git/
jenkins@d99f4f77acdb:~/testBranch$ echo "Half a league, half a league, Half a league onward." > myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git add .
jenkins@d99f4f77acdb:~/testBranch$ git commit -m "Initial Commit"
[master (root-commit) 7ef4b3b] Initial Commit
 1 file changed, 1 insertion(+)
 create mode 100644 myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git checkout -b thisBranchExists
Switched to a new branch 'thisBranchExists'
jenkins@d99f4f77acdb:~/testBranch$ echo "All in the valley of Death Rode the six hundred" >> myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git commit -am "a change"
jenkins@d99f4f77acdb:~/testBranch$ git checkout master
Switched to branch 'master'
jenkins@d99f4f77acdb:~/testBranch$ git log --oneline --decorate --all --graph
* 550587f (thisBranchExists) a change
* 7ef4b3b (HEAD, master) Initial Commit

With that out of the way, we have a repo with two branches, one of which is a single commit ahead of the other. Our goal, in the pipeline script, is to checkout the thisBranchExists branch, if it exists, or to fall back to master if it doesn’t.

The Pipeline script

I’ve created a simple script in the Pipeline UI to demonstrate the use of resolveScm to figure out whether the branch we want exists. resolveScm will go through the contents of targets in order, and return the first one that matches a branch name in the repo. We then checkout the branch and print the contents of the file to the console with cat.

node {
    def b = resolveScm source: [$class: 'GitSCMSource', credentialsId: '',
        excludes: '', id: '_', ignoreOnPushNotifications: false, includes: '*',
        remote: 'file://$JENKINS_HOME/testBranch'], targets: ['thisBranchExists', 'master']
    checkout b
    sh 'cat myFile.txt'
}

The Happy path

When we run this job we get the following output, indicating that our branch was located successfully. The contents of the file after the second commit are printed to the console.

Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchExists, master]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch master
Checking branch thisBranchExists
Found thisBranchExists at revision 550587fa43778989f50ca314feea38fdee50b21c
[Pipeline] checkout
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url file:///var/jenkins_home/testBranch # timeout=10
Fetching upstream changes from file:///var/jenkins_home/testBranch
 > git --version # timeout=10
 > git fetch --tags --progress file:///var/jenkins_home/testBranch +refs/heads/*:refs/remotes/origin/*
Checking out Revision 550587fa43778989f50ca314feea38fdee50b21c (thisBranchExists)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 550587fa43778989f50ca314feea38fdee50b21c
 > git rev-list 550587fa43778989f50ca314feea38fdee50b21c # timeout=10
[Pipeline] sh
[checkout_branch_if_exists] Running shell script
+ cat myFile.txt
Half a league, half a league, Half a league onward.
All in the valley of Death Rode the six hundred
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

The “less-happy” path

Let’s see what happens if we change our list of targets to ['thisBranchDoesNotExist', 'master']. This time we don’t find our “preferred” branch, so we fall back to master and print the contents of the file as it was after the first commit.

[Pipeline] node
Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchDoesNotExist, master]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch thisBranchExists
Checking branch master
Done.
Found master at revision 7ef4b3b3eaf296528e47cdf8946545b38f49fb95
[Pipeline] checkout
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url file:///var/jenkins_home/testBranch # timeout=10
Fetching upstream changes from file:///var/jenkins_home/testBranch
 > git --version # timeout=10
 > git fetch --tags --progress file:///var/jenkins_home/testBranch +refs/heads/*:refs/remotes/origin/*
Checking out Revision 7ef4b3b3eaf296528e47cdf8946545b38f49fb95 (master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7ef4b3b3eaf296528e47cdf8946545b38f49fb95
 > git rev-list 550587fa43778989f50ca314feea38fdee50b21c # timeout=10
[Pipeline] sh
[checkout_branch_if_exists] Running shell script
+ cat myFile.txt
Half a league, half a league, Half a league onward.
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

The Unhappy path

Finally, let’s see what happens if none of the branches we specify in targets exist in the repo. Regrettably, the output is rather shorter, as the step fails if no matching branches are found.

Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchDoesNotExist, thisBranchDoesNotExistEither]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch master
Checking branch thisBranchExists
Done.
Could not find any matching branch%n
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: Could not find any matching branch
Finished: FAILURE

What's in a name?

Continuing our horticultural theme, in this article we’ll look at the built-in support in SSDT for renaming database objects including tables, columns, and programmable objects, as well as peering into the details of how these changes are managed at deployment time.

That which we call a rose / By any other name would smell as sweet

Renaming columns

Refactoring Databases, p 109

The easy way

We can rename a column just by right-clicking in the CREATE TABLE script and selecting Refactor → Rename.

Renaming the InvoiceId Column by right-clicking in the editor window

Under normal circumstances, renaming the Primary Key column of a table such as “Invoices” would be a recipe for disaster, but SSDT can help to ease such changes by automatically updating all references to the column to use the new name. In this case we are renaming the column InvoiceId to Invoice_Id, and by specifying the option to preview the changes, we can see a list of all the objects that reference this column by its old name.

SSDT shows a preview of which objects will be updated to refer to the new name

There’s something of note here, which is that only the InvoiceId column from the Invoices table is being renamed, any other columns called InvoiceId (such as the one in the InvoiceLine table) are unaffected. The foreign key constraint on that particular column, however, is updated to use the new name of the referenced column.

What this demonstrates is that there is something more than global search and replace going on here; SSDT is using its in-memory model of the database to determine which changes need to be made¹.

The refactorlog

When we click apply, two things happen. The first is that all the references to this column are updated to use the new name. The second is that a new file appears in the solution, with the extension .refactorlog.

<?xml version="1.0" encoding="utf-8"?>
<Operations Version="1.0" xmlns="http://schemas.microsoft.com/sqlserver/dac/Serialization/2012/02">
  <Operation Name="Rename Refactor" Key="209f3afd-7195-401f-853f-aa3a906d39db" ChangeDateTime="11/08/2016 20:02:31">
  <Property Name="ElementName" Value="[dbo].[Invoice].[InvoiceId]" />
  <Property Name="ElementType" Value="SqlSimpleColumn" />
  <Property Name="ParentElementName" Value="[dbo].[Invoice]" />
  <Property Name="ParentElementType" Value="SqlTable" />
  <Property Name="NewName" Value="[Invoice_Id]" />
  </Operation>
</Operations>

This file is how sqlpackage.exe (or SSDT publish, or DacFX.Deploy) will determine at deploy time that we are renaming this column from InvoiceId to Invoice_Id, rather than dropping the InvoiceId column and creating a new column called Invoice_Id. We can see in the XML that this is specifying the column and table name, and the precise action to perform. This is known, in the jargon, as “preserving the intent” of the refactoring. If we build a project containing a .refactorlog file and examine the resulting .dacpac, we can see that the regular .dacpac contents have been joined by a refactor.xml file.

$ unzip -l Refactoring.Chinook.dacpac
Archive:  Refactoring.Chinook.dacpac
  Length      Date    Time    Name
---------  ---------- -----   ----
    69711  2016-11-10 05:44   model.xml
      606  2016-11-10 05:44   refactor.xml
      203  2016-11-10 05:44   DacMetadata.xml
     1118  2016-11-10 05:44   Origin.xml
      175  2016-11-10 05:44   [Content_Types].xml
---------                     -------
    71813                     5 files

The contents of this file are the same as the .refactorlog file from our solution.

When we publish the project we see the following output:

The following operation was generated from a refactoring log file 209f3afd-7195-401f-853f-aa3a906d39db
Rename [dbo].[Invoice].[InvoiceId] to Invoice_Id
Caution: Changing any part of an object name could break scripts and stored procedures.

Altering [dbo].[InvoicesWithLineTotals]...
Altering [dbo].[UpdateInvoiceBillingAddress]...
Update complete.

The publish action has read the refactorlog file and taken the appropriate action. In addition, the “key” for this refactoring has been stored in a new table in our database called dbo.__RefactorLog, which gets created the first time we deploy a dacpac containing a refactor.xml file:

-- Refactoring step to update target server with deployed transaction logs
IF NOT EXISTS (SELECT OperationKey FROM [dbo].[__RefactorLog] WHERE OperationKey = '209f3afd-7195-401f-853f-aa3a906d39db')
INSERT INTO [dbo].[__RefactorLog] (OperationKey) values ('209f3afd-7195-401f-853f-aa3a906d39db')

On subsequent deployments, this table is read and any refactorings recorded here are skipped from the deployment.

Appearances can be deceptive

There’s another UI wrinkle here which is worth examining, so we will rename another column, this time using the table designer.

Another entry has been added to the refactorlog file, and it appears as if the references to this column elsewhere in the model have been updated (note the red tick showing that PlaylistTrack.sql and InvoiceLine.sql have been modified).

Renaming a column using the table designer

However, when we go to build the project, we get an error:

SQL71501: Procedure: [dbo].[ChangeTrackPriceByFactor] has an unresolved reference to object [dbo].[Track].[TrackId].	

(Alternatively, the error will appear in the SSDT UI as soon as the Intellisense catches up.) There was a stored procedure referencing this column by name, which is now causing the build to fail (remember that deferred name resolution works for table names but not for column names).

This is because of a detail of how foreign keys - and primary keys, for that matter - are maintained by SQL Server itself. If we look at the contents of sys.foreign_key_columns, there isn’t a column name in sight (sys.index_columns looks much the same):

constraint_object_id | constraint_column_id | parent_object_id | parent_column_id | referenced_object_id | referenced_column_id
1045578763 | 1 | 885578193 | 3 | 565577053 | 1
917578307 | 1 | 565577053 | 3 | 597577167 | 1
965578478 | 1 | 725577623 | 2 | 629577281 | 1
933578364 | 1 | 629577281 | 13 | 661577395 | 1

If we fix the reference in our stored procedure and go on to generate a publish script, all we see for this change is

PRINT N'The following operation was generated from a refactoring log file 8a905288-76de-4cc4-aad9-c6dddf081a17';

PRINT N'Rename [dbo].[Track].[TrackId] to Track_Id';


GO
EXECUTE sp_rename @objname = N'[dbo].[Track].[TrackId]', @newname = N'Track_Id', @objtype = N'COLUMN';


GO
PRINT N'Altering [dbo].[ChangeTrackPriceByFactor]...';


GO
ALTER PROCEDURE [dbo].[ChangeTrackPriceByFactor]
	@TrackID int,
	@Factor NUMERIC(10, 2)
AS
	UPDATE Track SET UnitPrice *= @Factor WHERE Track_Id = @TrackID;

RETURN 0
GO
-- Refactoring step to update target server with deployed transaction logs
IF NOT EXISTS (SELECT OperationKey FROM [dbo].[__RefactorLog] WHERE OperationKey = '8a905288-76de-4cc4-aad9-c6dddf081a17')
INSERT INTO [dbo].[__RefactorLog] (OperationKey) values ('8a905288-76de-4cc4-aad9-c6dddf081a17')

GO

The refactorlog is updated to include this change, but there are no changes deployed to any of the tables that referenced the TrackId column via foreign keys. This is because foreign keys - as noted above - don’t really use column names, they use object and column ids, so it is sufficient to rename the referenced column with sp_rename. Stored Procedures, triggers, and other programmable objects, however, do reference columns and tables by name - in sys.sql_modules - so these references aren’t updated automatically.

However, SSDT itself takes into consideration that when we rename a column referenced by a foreign (or primary) key, the definition required to create the constraint from scratch will need to be updated, which is why the files containing the referencing tables are all updated.

This behaviour may seem inconsistent, but it is in fact consistent with the behaviour of sp_rename itself, which is to say that constraints and indexes aren’t broken by sp_rename, but stored procedures, triggers, etc. are.

The wrong way

In contrast, renaming a column by editing the Transact-SQL file directly delivers the promised disaster, as SSDT will attempt to drop the column with the old name and create a new column with the new name.

In the best-case scenario we get a validation error that stops the project from building, assuming the renamed column is referenced by some other object in the project.

Renaming a column by editing the .sql file

In all other scenarios, the script executed at deploy time is as follows:

PRINT N'Altering [dbo].[Genre]...';

GO
ALTER TABLE [dbo].[Genre] DROP COLUMN [Name];

GO
ALTER TABLE [dbo].[Genre]
    ADD [GenreName] NVARCHAR (120) NULL;

The only safety net left at this point is the BlockOnPossibleDataLoss deploy-time option, which is enabled by default. This will stop the deployment from proceeding.

Msg 50000, Level 16, State 127, Line 48
Rows were detected. The schema update is terminating because data loss might occur.
** An error was encountered during execution of batch. Exiting.
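If the data loss is genuinely intended, this check can be switched off at deploy time with the corresponding publish property; for example, with the dacpac from earlier:

sqlpackage.exe /Action:Publish /SourceFile:Refactoring.Chinook.dacpac /TargetServerName:(local) /TargetDatabaseName:Chinook /p:BlockOnPossibleDataLoss=False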

Renaming tables

Refactoring Databases, p 113

There are two options in the refactoring context menu that are relevant to naming tables; “Rename” and “Move to Schema”. In some RDBMSs, notably Oracle, the notion of a schema is tightly coupled to the notion of a user, such that the user account in question “owns” the tables and other objects contained therein. SQL Server implemented a similar concept prior to SQL Server 2005, when the link between users and schemas was severed such that a schema became more like a namespace, or even a filesystem folder, since a schema remains a securable object. Under either analogy - namespace or folder - the name of the schema can be considered to be a part of the (qualified) name of the table, meaning that moving an object to a new schema is merely a special case of renaming.

The right way

Right-click refactor menu for a table

When we rename a table, we get the usual “refactor preview” showing the changes about to be applied to the project:

Refactor preview for renaming a table

On clicking apply, the relevant objects are updated and a new entry is inserted into the refactorlog file:

<Operation Name="Rename Refactor" Key="cce28c30-adb0-4019-876e-d93cc2ca0011" ChangeDateTime="11/12/2016 14:22:18">
   <Property Name="ElementName" Value="[dbo].[Artist]" />
   <Property Name="ElementType" Value="SqlTable" />
   <Property Name="ParentElementName" Value="[dbo]" />
   <Property Name="ParentElementType" Value="SqlSchema" />
   <Property Name="NewName" Value="[Artiste]" />
 </Operation>

The process for moving a table between schemas is similar, we are presented with a preview of the changes about to be made:

Refactor preview for move to schema

and a new entry is made in the refactorlog file.

<Operation Name="Move Schema" Key="985e03c6-37f8-48c8-8ce8-5ed37fbb7c00" ChangeDateTime="11/12/2016 14:46:31">
  <Property Name="ElementName" Value="[dbo].[Invoice]" />
  <Property Name="ElementType" Value="SqlTable" />
  <Property Name="NewSchema" Value="Sales" />
  <Property Name="IsNewSchemaExternal" Value="False" />
</Operation>

Finally, when we come to deploy our change, the table is moved to the new schema and all the referencing objects are updated:

PRINT N'The following operation was generated from a refactoring log file 985e03c6-37f8-48c8-8ce8-5ed37fbb7c00';

PRINT N'Move object [dbo].[Invoice] to different schema [Sales]';

GO
ALTER SCHEMA [Sales] TRANSFER [dbo].[Invoice];

GO
ALTER VIEW [dbo].[InvoicesWithLineTotals] 
	AS SELECT I.[Invoice_Id],
	InvoiceTotal
	FROM [Sales].Invoice AS I CROSS APPLY dbo.CalculateInvoiceTotal(I.[Invoice_Id]);
GO
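
One thing not shown in the excerpt above: if the [Sales] schema is new to the project, the deployment script also has to create it before anything can be transferred into it, with something along these lines appearing earlier in the script:

CREATE SCHEMA [Sales]
    AUTHORIZATION [dbo];

GO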

The wrong way

As with columns, if we rename a table by editing the .sql file directly, we get a potentially undesirable outcome, namely that a new table will be created with the new name, possibly at the expense of the current table. If we are lucky we get a validation error that stops the project from building:

Error from renaming a table in the sql file

If we are less lucky we may get some warnings (remember that deferred name resolution means that a missing table is only a warning, rather than an error, in a stored procedure), but at deploy time we will get a new table created with the new name, and possibly even a DROP TABLE for the existing table if we have the appropriate options set. (By default, SSDT won’t drop objects from the database unless we specify “Drop objects in target but not in source”.)

PRINT N'Dropping [dbo].[PlaylistTrack]...';
GO
DROP TABLE [dbo].[PlaylistTrack];
GO
PRINT N'Creating [dbo].[Playlist_Track]...';
GO
CREATE TABLE [dbo].[Playlist_Track] (
    [PlaylistId] INT NOT NULL,
    [TrackId]    INT NOT NULL,
    CONSTRAINT [PK_PlaylistTrack] PRIMARY KEY NONCLUSTERED ([PlaylistId] ASC, [TrackId] ASC)
);

Renaming Programmable Objects (Views, Functions, Stored Procedures, other miscellany in sys.sql_modules)

Refactoring Databases, p 117

As noted in the discussion of dropping programmable objects, operations on these objects carry substantially less risk of catastrophic data loss and consequent unemployment than similar operations involving tables and columns.

It is still important to use the Refactor → Rename and Refactor → Move to Schema techniques to rename these objects rather than directly editing the .sql files, so that an entry is written to the .refactorlog file. This will ensure that the existing object is altered rather than a new one created at the expense of the old one.

<Operation Name="Rename Refactor" Key="fc4b928d-9b00-4028-9cac-5859ba9b666c" ChangeDateTime="11/12/2016 22:50:20">
  <Property Name="ElementName" Value="[dbo].[ChangeTrackPriceByFactor]" />
  <Property Name="ElementType" Value="SqlProcedure" />
  <Property Name="ParentElementName" Value="[dbo]" />
  <Property Name="ParentElementType" Value="SqlSchema" />
  <Property Name="NewName" Value="[ChangeTrackPriceByMultiplier]" />
</Operation>

  1. It isn’t magic; this doesn’t work with dynamic SQL, for instance. [return]

Database Pruning

After a period of time in use, most databases, like many systems that have grown in an organic manner, can benefit from some judicious pruning. This can take the form of removing columns or even tables no longer required to support the application, or that store redundant - and hence possibly erroneous - copies of information stored elsewhere. Equally there may be stored procedures, functions, and even triggers that contain out-of-date versions of application logic.

The gardener's assistant; a practical and scientific exposition of the art of gardening in all its branches (1910)

Dropping a column

Refactoring Databases, p 72

Since SSDT operates in a declarative manner, each table is defined in a CREATE TABLE script, and deleting a column is as simple as deleting the relevant line from the script. However, there are a couple of features of SSDT that are relevant here. If the column is referenced by any other database objects such as views, functions, or stored procedures, SSDT will display an error:

SSDT Broken Reference Error

The full text of the error message reads SQL71501: Procedure [dbo.UpdateInvoiceBillingAddress] has an unresolved reference to object [dbo].[Invoice].[BillingPostalCode], and the offending reference in the stored procedure1 has acquired a “red squiggly”. There’s no “builtin” refactoring action defined in the right-click context menu for deletions, presumably as it’s difficult to programmatically determine the “intent” of deleting a column.

Refactoring Context Menu

What is possible, however, is to use the “find all references” tool before deleting the column to enable us to take appropriate action.

Find All References
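
“Find All References” only knows about the objects in the project; if we also want to check a live database before pruning, a query against sys.sql_expression_dependencies is a rough server-side equivalent. This is just a sketch using the [BillingPostalCode] column from the example above; note that column-level dependency tracking is best-effort and won’t see dynamic SQL.

-- List modules in the live database that reference the column we want to drop.
SELECT OBJECT_SCHEMA_NAME(d.referencing_id) + N'.' + OBJECT_NAME(d.referencing_id)
           AS referencing_object
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_id = OBJECT_ID(N'dbo.Invoice')
  AND d.referenced_minor_name = N'BillingPostalCode';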

Dropping a table

Refactoring Databases, p 77

This is where things get more serious. The same technique of using the “Find All References” tool to assess the damage we’re about to do applies here, but there is a subtle difference in what happens when we actually brandish the pruning shears. It turns out that removing an entire table referenced by a stored procedure is only worthy of a warning or five rather than an error. This is due to an oddity of deferred name resolution for stored procedures, namely that it is permitted to reference a non-existent table in the text of a stored procedure, but not permitted to reference a non-existent column in a table that does exist.

Removing a table is only a warning

Interestingly, if we convert our stored procedure to a user-defined function2, the warnings turn into errors. It turns out deferred name resolution doesn’t work at all for functions, presumably because a function, being “purely functional” and free of side effects, is less likely to reference objects that exist at runtime but not at creation time.

Functions don't support deferred name resolution
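
The difference is easy to reproduce outside SSDT with a pair of throwaway objects referencing a table that doesn’t exist; the procedure creates without complaint, the function does not:

-- Deferred name resolution: this procedure is created successfully even
-- though dbo.DoesNotExist is missing; it would only fail when executed.
CREATE PROCEDURE dbo.SelectFromMissingTable
AS
    SELECT * FROM dbo.DoesNotExist;
GO
-- No such luck for a function: this CREATE fails immediately with
-- "Invalid object name 'dbo.DoesNotExist'".
CREATE FUNCTION dbo.SelectFromMissingTableFn()
RETURNS TABLE
AS
    RETURN (SELECT * FROM dbo.DoesNotExist);
GO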

Dropping Programmable Objects (Views, Functions, Stored Procedures, other miscellany in sys.sql_modules)

Refactoring Databases, p 79

On the face of it this is simpler, as there is no data being thrown out with the bathwater.

Dropping Triggers

You should do this without hesitation. It’s 2016.

Dropping views and functions

Generally, this is unproblematic. Since these don’t support deferred name resolution, you can get error checking in SSDT.

Name checking in views

Pulling the trigger3

There’s one more thing to consider, which is what happens when we come to deploy our changes. Whether we do this by publishing from Visual Studio or at the command line using sqlpackage.exe, if we are dropping a table or a column that contains data we will get an error along the lines of

(48,1): SQL72014: .Net SqlClient Data Provider: Msg 50000, Level 16, State 127, Line 6 Rows were detected. The schema update is terminating because data loss might occur.
(43,0): SQL72045: Script execution error.  The executed script:
IF EXISTS (SELECT TOP 1 1
           FROM   [dbo].[PlaylistTrack])
    RAISERROR (N'Rows were detected. The schema update is terminating because data loss might occur.', 16, 127)
        WITH NOWAIT;

The IF EXISTS check gets inserted into the deployment script for every table that is being dropped, and for every table containing a column that is being dropped. It is worth noting that since the check is for EXISTS (SELECT TOP 1 1), the deployment will be blocked even if the column we are dropping contains only NULL values - I have found this to be mildly irritating in the past, particularly for “inadvertently” created columns.
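
If the plan is to override the block because the doomed column is believed to be empty, it only takes a moment to confirm that before flipping the switch; a quick count along these lines (using the column from the earlier example) does the job:

-- Confirm the column we are about to drop really holds no data before
-- allowing the deployment to proceed with BlockOnPossibleDataLoss=False.
SELECT COUNT(*) AS NonNullRows
FROM dbo.Invoice
WHERE BillingPostalCode IS NOT NULL;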

To inhibit this behaviour and allow our potentially destructive change to proceed, we need to specify this at deploy time, the mechanism for which depends on the method we are using to deploy our project.

  • For projects deployed using the Visual Studio “Publish” dialog, uncheck “Block Incremental Deployment if Data Loss might Occur” in the “Advanced Publish Settings” dialog available by clicking “Advanced” in the “Publish” dialog.

  • For projects deployed using sqlpackage.exe, we need to specify the parameter /p:BlockOnPossibleDataLoss=False

  • For projects that use a publish profile to specify deployment options, we need to add the element <BlockOnPossibleDataLoss>False</BlockOnPossibleDataLoss> to the .publish.xml file.

In each case, the default is “true”, meaning potentially destructive changes are blocked by default. This is for the benefit of those “enterprise” customers that deploy direct to production with no testing - remember that these people aren’t our problem but they are the SSDT development team’s problem, since they are paying to keep the lights on at SSDT HQ!

In general, this should always be set to false, meaning “allow potentially destructive changes”. This is unproblematic as long as production isn’t the first environment where you deploy your changes.

Similarly, the default behaviour of SSDT is not to drop objects that are in the target (i.e. the database) but not in the source (i.e. the database project). To allow these changes to be applied, we need to do one of the following.

  • For projects deployed using the Visual Studio “Publish” dialog, check “Drop objects in target but not in source” in the “Advanced Publish Settings” dialog available by clicking “Advanced” in the “Publish” dialog.

  • For projects deployed using sqlpackage.exe, we need to specify the parameter /p:DropObjectsNotInSource=True

  • For projects that use a publish profile to specify deployment options, we need to add the element <DropObjectsNotInSource>True</DropObjectsNotInSource> to the .publish.xml file.

A note on “Drop objects not in source”

If this option is selected, SSDT will drop all objects from the target database that are not defined in the project. Rather inconveniently, this includes users and permissions - including, crucially, the CONNECT permission - and all the other things we need to be present for our application (or us!) to be able to connect to the database. Rather than specify all these items as part of the project, it is often simpler to ignore them at deployment time using the publish profile. The most important ones are probably users, permissions, roles, and role memberships, but anyone with a more obscure security model (Application Roles??) might want to investigate some of the other options.

Advanced Publish Settings showing drop objects

The difference between the “Drop” and “Ignore” settings is that the former apply only to the target - so selecting object types here prevents objects of that type from being dropped. The “ignore” settings allow us to specify that objects of the selected types are not dropped in the target if they are absent from the source, but also not created or modified in the target even if they are present in the source.

As above, these settings can also be specified as command-line arguments to sqlpackage.exe or as xml in a publish profile.


  1. This procedure isn’t in the original Chinook database, I added it for the purpose of this example. [return]
  2. which it should have been in the first place, as it doesn’t modify any data! [return]
  3. No, not that kind of trigger, this refers to the metaphorical kind. [return]

Refactoring Databases with SSDT

The book Refactoring Databases by Scott Ambler and Pramod Sadalage, first published over ten years ago, has become something of a modern classic in the field of agile database delivery. The authors give a definition (in fact taken from an earlier book) of a database refactoring as

…a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics.

The book is divided into two sections, the first being a discussion of agile database development techniques, placing database refactoring in a wider technical and organisational context. This material, intended to be read in order, is recommended reading for anyone struggling to improve the working practices associated with database delivery in any organisation, irrespective of the tools being used.

The second is a collection of named “refactorings” along with the steps required to implement each one, and is structured as a reference work rather than as a continuous narrative. There are online versions of this catalog maintained at the websites of Scott Ambler and Pramod Sadalage respectively.

This article marks the start of a series that examines a number of these named “refactorings” and looks at how they can be implemented using Microsoft® SQL Server and Microsoft® Visual Studio, in particular through the use of SQL Server Data Tools, known as “SSDT”.

The code examples in the book use Oracle with bits of Java and Hibernate thrown in as appropriate, so the procedures here will differ where the behaviour of SQL Server differs in some relevant way. In addition, the “declarative” development paradigm encouraged by SSDT hides a lot of the detail of the DDL behind the scenes, meaning that the steps outlined in the book to achieve each refactoring may not all occur in the same places.

That said, it is worth referring to the discussions of each individual refactoring contained in the book, since these provide a checklist of considerations before embarking on any change, including motivations, tradeoffs, schema update and data migration mechanics, and required application changes, most of which are outside the scope of SSDT. I will attempt to provide page numbers where appropriate, though I realise these are less useful for those with access to only an electronic copy of the book.

In general, the applicability of SSDT is restricted to managing schema changes, with a couple of extensibility points - namely Pre-deployment and Post-deployment scripts - to manage data movements.

This series does presuppose some experience with SSDT; an overview can be obtained from MSDN here, particularly the material relating to Project-Oriented Offline Database Development. In the event that I ever get around to writing any introductory material of my own I’ll come back and add links here as appropriate. Most of the examples use the “Chinook” sample database available from codeplex. The SSDT project I am using is on Github, but note that I’ve made no attempt to make the commits sync with the articles in the series, and there are no guarantees, express or implied, that any given commit will actually be buildable.

Verschlimmbesserung

Welcome to this, my new site. I’ll be adding content about Continuous Integration, database systems including but not limited to Microsoft® SQL Server and development tools including but not limited to Microsoft® Visual Studio. The site itself is provisioned in a moderately interesting way, which you can read about here.