arapaima.uk: Data, Databases, Delivery

Recovering from the "La-La-Land Moment"

How SSDT can help with restoring a SQL Server database to “just before that last deployment”

For as long as I can remember, SSDT and its predecessors have had the option to “Back up database before deployment”, currently available in the “Advanced Publish Settings” dialog, among other places. Regrettably, I’ve never really had much use for this particular option. Whilst restoring from backup might be a valid strategy for recovering from some kinds of deployment disaster, this could add a great deal of time to the deployment process, assuming a database of non-trivial size.

In any case, there are really only two kinds of database - the ones where you don’t care about the data, such as development and test environments, and the ones (one?) where you do. In the former case, backups don’t matter anyway, and in the latter case it is to be hoped that there is already some kind of backup solution in place, ideally managed by someone who isn’t you.

Further, the “Back up database before deployment” option appears not to offer much control over how the database is backed up. The backup will be created in the default backup directory of the instance, and will be a full rather than a copy-only backup. This has the potential to endanger the “real” disaster recovery plans if those include differential database backups: the SSDT-generated backup will reset the differential base, meaning that this backup, which the DBAs don’t even know about, will be needed - along with the subsequent differentials - should the database ever have to be restored.

It’s likely that the “real” backup strategy will involve some combination of full, differential, and transaction log backups, and if you’re especially lucky will be managed by some kind of “Enterprise Backup Solution”, which will allow rapid restores to any point in time, just as soon as you figure out where it is that the “Enterprise Backup Solution” keeps the backup files for your database.

So, assuming the worst case scenario has arisen, your newly deployed release has been writing rubbish data to the database for the last few hours, the “war room” has been convened and someone you’ve never heard of called the “Problem Management Executive Consultant” has decided that the right course of action is to “Restore the database to immediately before the deployment”, how are you to decide when this was?

Hopefully you know from your release logs what time the database was deployed, but what if there were a better way? And what if you aren’t sure what time zone the deployment server is in, or whether it uses daylight saving time? Well, one solution involves a somewhat neglected feature of SQL Server known as Marked Transactions. (See, it doesn’t even get its own page in MSDN!)

Marked Transactions

The syntax BEGIN TRANSACTION tran-name WITH MARK 'Mark Description' will record the name of the transaction in the transaction log, along with the date, time, LSN, etc. The description is optional, but that gets saved too.

We can see this in action using the pubs database, which if you’re younger than a certain age, sadly isn’t what you think.

Note that if you’re playing along at home, there are a couple of extra prerequisites: the database needs to be using the full (or bulk-logged) recovery model, and a full backup must already have been taken - that is, the database must not be operating in the pseudo-simple recovery model.
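If you want to check these prerequisites before trying it out, something along the following lines will do; this is just an illustrative sketch, and the backup path is an assumption to adjust for your own environment.

-- Check the current recovery model of the pubs database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'pubs';

-- If required, switch to the full recovery model and take a full backup
-- so that the log chain (and any marks in it) can be restored later.
-- The backup path below is illustrative only.
ALTER DATABASE pubs SET RECOVERY FULL;
BACKUP DATABASE pubs TO DISK = N'C:\Temp\pubs_full.bak' WITH INIT;

With that in place, we can create a marked transaction: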

USE pubs;
GO

BEGIN TRANSACTION update_auths WITH MARK 'update authors entry'
UPDATE authors
SET phone = '408 496-7223'
WHERE au_id = '172-32-1176'
COMMIT TRANSACTION

If we take a peek at the transaction log we can see our marked transaction there:

SELECT  [Transaction ID] ,
        [Current LSN] ,
        [Transaction Name] ,
        [Operation] ,
        Description ,
        [Transaction SID] ,
        sp.name
FROM    fn_dblog(NULL, NULL) AS f
        JOIN sys.server_principals sp ON f.[Transaction SID] = sp.sid
WHERE   [Transaction Name] = 'update_auths'
Transaction ID:   0000:000004ee
Current LSN:      00000022:000005f2:0001
Transaction Name: update_auths
Operation:        LOP_BEGIN_XACT
Description:      2017/03/16 18:09:58:327;update_auths;0x0105000000000005150000004cca9a3fa9173a6eba0c5dc9e9030000
Transaction SID:  0x0105000000000005150000004CCA9A3FA9173A6EBA0C5DC9E9030000
name:             ARAPAIMA\Arapaima

The logmarkhistory table in msdb also stores a list of our marked transactions:

SELECT * FROM msdb..logmarkhistory
database_name: pubs
mark_name:     update_auths
description:   update authors entry
user_name:     ARAPAIMA\Arapaima
lsn:           34000000152200003
mark_time:     2017-03-16 18:09:58.327

So, it seems we can use this functionality to give us a named point in the transaction log to restore to in the event of career-threatening disaster.

The thing to note is that we need to update something in the database where we’re creating the mark, but it doesn’t really matter what we update. It’s often useful to have a Releases table in our databases where we can store the version number, date, and other fun facts about our database, so this is as good a candidate as any.

CREATE TABLE [dbo].[Releases]
(
    ReleaseVersion VARCHAR(20) NOT NULL PRIMARY KEY,
    DateDeployed DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    --Not taking any chances with the time zone!
)

We need to create this table and add it to our database project.

I’ve created a proc to do the inserting:

CREATE PROCEDURE [dbo].[spFireInTheHole]
	@ReleaseVersion varchar(20)
AS
BEGIN TRANSACTION @ReleaseVersion WITH MARK @ReleaseVersion 
  INSERT INTO dbo.Releases(ReleaseVersion) VALUES (@ReleaseVersion);
COMMIT TRANSACTION
RETURN 0

And now for the tricky part…

The first time we deploy the database project after adding these objects, they won’t yet be present in the target database, so we won’t be able to call them from the pre-deployment script. This means we need to wrap the call in IF EXISTS checks in the pre-deployment script, so that the procedure isn’t called when it doesn’t exist. The drawback of this approach, naturally, is that “Release 0.0.1” won’t be recorded in your database for posterity. In my experience of such matters, the gap between Release 0.0.1 and Release 0.0.2 is normally measured in minutes rather than years, so I’m not particularly concerned about this.

Finally, we are going to use a sqlcmd variable to hold the release number; this means we can pass it in at deployment time. Most deployment tools know how to do this; for this example we’ll just pass it on the command line.

The pre-deployment script

/*
 Pre-Deployment Script Template							
--------------------------------------------------------------------------------------
 This file contains SQL statements that will be executed before the build script.	
 Use SQLCMD syntax to include a file in the pre-deployment script.			
 Example:      :r .\myfile.sql								
 Use SQLCMD syntax to reference a variable in the pre-deployment script.		
 Example:      :setvar TableName MyTable							
               SELECT * FROM [$(TableName)]					
--------------------------------------------------------------------------------------
*/

IF EXISTS (SELECT * FROM sys.tables WHERE name = 'Releases')
AND
EXISTS (SELECT * FROM sys.procedures WHERE name = 'spFireInTheHole')
BEGIN
    EXEC spFireInTheHole @ReleaseVersion = '$(ReleaseVersion)';
END

Now, after we build our dacpac, we supply the version number at deploy time:

sqlpackage.exe /Action:Publish /SourceFile:pubs.dacpac /TargetServerName:(local) /TargetDatabaseName:pubs /v:ReleaseVersion=0.2.0

If we run a few “Releases”, we can see the marks building up in msdb..logmarkhistory:

database_name mark_name description user_name lsn mark_time
pubs 0.0.2 0.0.2 ARAPAIMA\Arapaima 34000000172100002 2017-03-16 20:06:40.490
pubs 0.0.3 0.0.3 ARAPAIMA\Arapaima 34000000172200003 2017-03-16 20:07:14.673
pubs 0.0.3 0.0.3 ARAPAIMA\Arapaima 34000000172300002 2017-03-16 20:08:16.183
pubs 0.0.4 0.0.4 ARAPAIMA\Arapaima 34000000172400003 2017-03-16 20:08:39.760

In this particular setup, you’ll get an error if you try to supply the same version number more than once, as ReleaseVersion is the primary key of the Releases table.
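This is easy to demonstrate by calling the procedure twice with the same version number; the snippet below is illustrative only.

EXEC dbo.spFireInTheHole @ReleaseVersion = '0.5.0';

-- Calling it again with the same value raises a PRIMARY KEY violation
-- on dbo.Releases, so the duplicate release number is rejected.
EXEC dbo.spFireInTheHole @ReleaseVersion = '0.5.0';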

Disaster Strikes

So, the career-threatening disaster has happened, and we need to restore our database to its state immediately prior to release 0.6.0. If there hasn’t been a log backup since the deployment, the first step is to make one.

Then, with a bit of DBA magic, we can restore our database to just before the deployment started. I generated these scripts from SQL Server Management Studio; unless you’ve got a competent adult handy, I suggest you do the same. The crucial bit is in the last line, where we specify STOPBEFOREMARK, which is exactly what we want to do.

USE [master]
BACKUP LOG [pubs] TO  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs_LogBackup_2017-03-16_21-05-56.bak' WITH NOFORMAT, NOINIT,  NAME = N'pubs_LogBackup_2017-03-16_21-05-56', NOSKIP, NOREWIND, NOUNLOAD,  NORECOVERY ,  STATS = 5
RESTORE DATABASE [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 2,  NORECOVERY,  NOUNLOAD,  STATS = 5
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 3,  NORECOVERY,  NOUNLOAD,  STATS = 10
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 4,  NORECOVERY,  NOUNLOAD,  STATS = 10
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 5,  NOUNLOAD,  STATS = 10,  STOPBEFOREMARK = N'0.6.0' AFTER N'2017-03-16T20:46:07'
GO

And with that, the database is restored to its state immediately prior to the deployment - you can check this by looking at the Releases table - and all that is left to do is blame the guy who left last month for the “rogue code” that “crept” into the release. This is normally possible by editing old git commit messages and force pushing the master branch, but that’s a topic for another day.

A script to download and install the Pester testing framework for PowerShell

I finally got around to looking into Pester for test-driven development with PowerShell. In case it’s useful to anyone, and so that I don’t lose it, I created a script to download the latest release version, store it in the user’s PowerShell modules folder, and run a quick test to make sure everything is working. The script is stored in a GitHub Gist and is reproduced below. Some of the cmdlets require a fairly recent version of PowerShell, possibly even 5.0. The steps are roughly as follows:

  • Figure out what the latest release number is.
  • Assemble the correct url to download the zip file.
  • Download and “Unblock” the zip file.
  • Figure out where the user’s modules folder is and create it if it doesn’t exist. I had to exclude the vscode modules folder for this to work for me.
  • Unzip the downloaded file to the right place.
  • Import the newly installed modules.
  • Scaffold a test with New-Fixture. The name is randomised to minimise the chance of overwriting something that’s already there.
  • Hack the generated files about a bit.
  • Run the test.
  • Clean up the generated files.
  • Profit!

The script

Testing whether a branch exists before checking out in Jenkins Pipeline

For reasons, I recently found myself in a scenario where I needed to test whether a branch existed before checking it out, falling back to a sensible default - such as master - if it didn’t. From the command line, this is a simple matter of git branch -l | grep myBranch, but I needed to do this from the context of a Jenkins pipeline job.

Preliminaries

For simplicity, I’m creating a local repo I can point my Jenkins job at, right inside the JENKINS_HOME folder.

jenkins@d99f4f77acdb:/$ cd $JENKINS_HOME
jenkins@d99f4f77acdb:~$ mkdir testBranch ; cd testBranch
jenkins@d99f4f77acdb:~/testBranch$ git init
Initialized empty Git repository in /var/jenkins_home/testBranch/.git/
jenkins@d99f4f77acdb:~/testBranch$ echo "Half a league, half a league, Half a league onward." > myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git add .
jenkins@d99f4f77acdb:~/testBranch$ git commit -m "Initial Commit"
[master (root-commit) 7ef4b3b] Initial Commit
 1 file changed, 1 insertion(+)
 create mode 100644 myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git checkout -b thisBranchExists
Switched to a new branch 'thisBranchExists'
jenkins@d99f4f77acdb:~/testBranch$ echo "All in the valley of Death Rode the six hundred" >> myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git commit -am "a change"
jenkins@d99f4f77acdb:~/testBranch$ git checkout master
Switched to branch 'master'
jenkins@d99f4f77acdb:~/testBranch$ git log --oneline --decorate --all --graph
* 550587f (thisBranchExists) a change
* 7ef4b3b (HEAD, master) Initial Commit

With that out of the way, we have a repo with two branches, one of which is a single commit ahead of the other. Our goal, in the pipeline script, is to checkout the thisBranchExists branch, if it exists, or to fall back to master if it doesn’t.

The Pipeline script

I’ve created a simple script in the Pipeline UI to demonstrate the use of resolveScm to figure out whether the branch we want exists. resolveScm will go through the contents of targets in order, and return the first one that matches a branch name in the repo. We then check out the branch and print the contents of the file to the console with cat.

node {
    def b = resolveScm source: [$class: 'GitSCMSource', credentialsId: '',
        excludes: '', id: '_', ignoreOnPushNotifications: false, includes: '*',
        remote: 'file://$JENKINS_HOME/testBranch'], targets: ['thisBranchExists', 'master']
    checkout b
    sh 'cat myFile.txt'
}

The Happy path

When we run this job we get the following output, indicating that our branch was located successfully. The contents of the file after the second commit are printed to the console.

Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchExists, master]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch master
Checking branch thisBranchExists
Found thisBranchExists at revision 550587fa43778989f50ca314feea38fdee50b21c
[Pipeline] checkout
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url file:///var/jenkins_home/testBranch # timeout=10
Fetching upstream changes from file:///var/jenkins_home/testBranch
 > git --version # timeout=10
 > git fetch --tags --progress file:///var/jenkins_home/testBranch +refs/heads/*:refs/remotes/origin/*
Checking out Revision 550587fa43778989f50ca314feea38fdee50b21c (thisBranchExists)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 550587fa43778989f50ca314feea38fdee50b21c
 > git rev-list 550587fa43778989f50ca314feea38fdee50b21c # timeout=10
[Pipeline] sh
[checkout_branch_if_exists] Running shell script
+ cat myFile.txt
Half a league, half a league, Half a league onward.
All in the valley of Death Rode the six hundred
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

The “less-happy” path

Let’s see what happens if we change our list of targets to ['thisBranchDoesNotExist', 'master']. This time we don’t find our “preferred” branch, so we fall back to master and print the contents of the file as it was after the first commit.

[Pipeline] node
Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchDoesNotExist, master]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch thisBranchExists
Checking branch master
Done.
Found master at revision 7ef4b3b3eaf296528e47cdf8946545b38f49fb95
[Pipeline] checkout
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url file:///var/jenkins_home/testBranch # timeout=10
Fetching upstream changes from file:///var/jenkins_home/testBranch
 > git --version # timeout=10
 > git fetch --tags --progress file:///var/jenkins_home/testBranch +refs/heads/*:refs/remotes/origin/*
Checking out Revision 7ef4b3b3eaf296528e47cdf8946545b38f49fb95 (master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7ef4b3b3eaf296528e47cdf8946545b38f49fb95
 > git rev-list 550587fa43778989f50ca314feea38fdee50b21c # timeout=10
[Pipeline] sh
[checkout_branch_if_exists] Running shell script
+ cat myFile.txt
Half a league, half a league, Half a league onward.
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

The Unhappy path

Finally, let’s see what happens if none of the branches we specify in targets exist in the repo. Regrettably, the output is rather shorter, as the step fails if no matching branches are found.

Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchDoesNotExist, thisBranchDoesNotExistEither]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch master
Checking branch thisBranchExists
Done.
Could not find any matching branch%n
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: Could not find any matching branch
Finished: FAILURE

What's in a name?

Continuing our horticultural theme, in this article we’ll look at the built-in support in SSDT for renaming database objects including tables, columns, and programmable objects, as well as peering into the details of how these changes are managed at deployment time.

That which we call a rose / By any other name would smell as sweet

Renaming columns

Refactoring Databases, p 109

The easy way

We can rename a column just by right-clicking in the CREATE TABLE script and selecting Refactor → Rename.

Renaming the InvoiceId Column by right-clicking in the editor window

Under normal circumstances, renaming the Primary Key column of a table such as “Invoices” would be a recipe for disaster, but SSDT can help to ease such changes by automatically updating all references to the column to use the new name. In this case we are renaming the column InvoiceId to Invoice_Id, and by specifying the option to preview the changes, we can see a list of all the objects that reference this column by its old name.

SSDT shows a preview of which objects will be updated to refer to the new name

There’s something of note here, which is that only the InvoiceId column from the Invoices table is being renamed; any other columns called InvoiceId (such as the one in the InvoiceLine table) are unaffected. The foreign key constraint on that particular column, however, is updated to use the new name of the referenced column.

What this demonstrates is that there is something more than global search and replace going on here; SSDT is using its in-memory model of the database to determine which changes need to be made1.

The refactorlog

When we click apply, two things happen. The first is that all the references to this column are updated to use the new name. The second is that a new file appears in the solution, with the extension .refactorlog.

<?xml version="1.0" encoding="utf-8"?>
<Operations Version="1.0" xmlns="http://schemas.microsoft.com/sqlserver/dac/Serialization/2012/02">
  <Operation Name="Rename Refactor" Key="209f3afd-7195-401f-853f-aa3a906d39db" ChangeDateTime="11/08/2016 20:02:31">
  <Property Name="ElementName" Value="[dbo].[Invoice].[InvoiceId]" />
  <Property Name="ElementType" Value="SqlSimpleColumn" />
  <Property Name="ParentElementName" Value="[dbo].[Invoice]" />
  <Property Name="ParentElementType" Value="SqlTable" />
  <Property Name="NewName" Value="[Invoice_Id]" />
  </Operation>
</Operations>

This file is how sqlpackage.exe (or SSDT publish, or DacFx.Deploy) will determine at deploy time that we are renaming this column from InvoiceId to Invoice_Id, rather than dropping the InvoiceId column and creating a new column called Invoice_Id. We can see in the XML that this specifies the column and table name, and the precise action to perform. This is known, in the jargon, as “preserving the intent” of the refactoring. If we build a project containing a .refactorlog file and examine the resulting .dacpac, we can see that the regular .dacpac contents have been joined by a refactor.xml file.

$ unzip -l Refactoring.Chinook.dacpac
Archive:  Refactoring.Chinook.dacpac
  Length      Date    Time    Name
---------  ---------- -----   ----
    69711  2016-11-10 05:44   model.xml
      606  2016-11-10 05:44   refactor.xml
      203  2016-11-10 05:44   DacMetadata.xml
     1118  2016-11-10 05:44   Origin.xml
      175  2016-11-10 05:44   [Content_Types].xml
---------                     -------
    71813                     5 files

The contents of this file are the same as the .refactorlog file from our solution.

When we publish the project we see the following output:

The following operation was generated from a refactoring log file 209f3afd-7195-401f-853f-aa3a906d39db
Rename [dbo].[Invoice].[InvoiceId] to Invoice_Id
Caution: Changing any part of an object name could break scripts and stored procedures.

Altering [dbo].[InvoicesWithLineTotals]...
Altering [dbo].[UpdateInvoiceBillingAddress]...
Update complete.

The publish action has read the refactorlog file and taken the appropriate action. In addition, the “key” for this refactoring has been stored in a new table in our database called dbo.__RefactorLog, which gets created the first time we deploy a dacpac containing a refactor.xml file:

-- Refactoring step to update target server with deployed transaction logs
IF NOT EXISTS (SELECT OperationKey FROM [dbo].[__RefactorLog] WHERE OperationKey = '209f3afd-7195-401f-853f-aa3a906d39db')
INSERT INTO [dbo].[__RefactorLog] (OperationKey) values ('209f3afd-7195-401f-853f-aa3a906d39db')

On subsequent deployments, this table is read and any refactorings recorded here are skipped from the deployment.
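If you’re curious about what has been recorded, the table contains nothing more than the operation keys from the refactor log; a quick look confirms it.

-- Each row is the key of a refactoring operation that has already been applied
SELECT OperationKey
FROM dbo.__RefactorLog;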

Appearances can be deceptive

There’s another UI wrinkle here which is worth examining, so we will rename another column, this time using the table designer.

Another entry has been added to the refactorlog file, and it appears as if the references to this column elsewhere in the model have been updated (note the red tick showing that PlaylistTrack.sql and InvoiceLine.sql have been modified).

Renaming a column using the table designer

However, when we go to build the project, we get an error:

SQL71501: Procedure: [dbo].[ChangeTrackPriceByFactor] has an unresolved reference to object [dbo].[Track].[TrackId].	

(Alternatively, the error will appear in the SSDT UI as soon as the IntelliSense catches up.) There was a stored procedure referencing this column by name, which is now causing the build to fail (remember that deferred name resolution works for table names but not for column names).

This is because of a detail of how foreign keys - and primary keys, for that matter - are maintained by SQL Server itself. If we look at the contents of sys.foreign_key_columns, there isn’t a column name in sight (sys.index_columns looks much the same):

constraint_object_id constraint_column_id parent_object_id parent_column_id referenced_object_id referenced_column_id
1045578763 1 885578193 3 565577053 1
917578307 1 565577053 3 597577167 1
965578478 1 725577623 2 629577281 1
933578364 1 629577281 13 661577395 1

If we fix the reference in our stored procedure and go on to generate a publish script, all we see for this change is

PRINT N'The following operation was generated from a refactoring log file 8a905288-76de-4cc4-aad9-c6dddf081a17';

PRINT N'Rename [dbo].[Track].[TrackId] to Track_Id';


GO
EXECUTE sp_rename @objname = N'[dbo].[Track].[TrackId]', @newname = N'Track_Id', @objtype = N'COLUMN';


GO
PRINT N'Altering [dbo].[ChangeTrackPriceByFactor]...';


GO
ALTER PROCEDURE [dbo].[ChangeTrackPriceByFactor]
	@TrackID int,
	@Factor NUMERIC(10, 2)
AS
	UPDATE Track SET UnitPrice *= @Factor WHERE Track_Id = @TrackID;

RETURN 0
GO
-- Refactoring step to update target server with deployed transaction logs
IF NOT EXISTS (SELECT OperationKey FROM [dbo].[__RefactorLog] WHERE OperationKey = '8a905288-76de-4cc4-aad9-c6dddf081a17')
INSERT INTO [dbo].[__RefactorLog] (OperationKey) values ('8a905288-76de-4cc4-aad9-c6dddf081a17')

GO

The refactorlog is updated to include this change, but there are no changes deployed to any of the tables that referenced the TrackId column via foreign keys. This is because foreign keys - as noted above - don’t really use column names; they use object and column ids, so it is sufficient to rename the referenced column with sp_rename. Stored procedures, triggers, and other programmable objects, however, do reference columns and tables by name - in sys.sql_modules - so these references aren’t updated automatically.

However, SSDT itself takes into consideration that when we rename a column referenced by a foreign (or primary) key, the definition required to create the constraint from scratch will need to be updated, which is why the files containing the referencing tables are all updated.

This behaviour may seem inconsistent, but it is in fact consistent with the behaviour of sp_rename itself, which is to say that constraints and indexes aren’t broken by sp_rename, but stored procedures, triggers, etc. are.
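A quick way to convince yourself of this - as a sketch only, against a throwaway copy of the database rather than anything you care about - is to rename a column with sp_rename and then look for the old name in sys.sql_modules:

-- Constraints and indexes follow the rename, because they reference the
-- column by id rather than by name...
EXEC sp_rename @objname = N'dbo.Track.TrackId', @newname = N'Track_Id', @objtype = N'COLUMN';

-- ...but module definitions are stored as text in sys.sql_modules, so any
-- procedure or trigger that mentioned the old name still contains it.
SELECT OBJECT_NAME(object_id) AS referencing_module
FROM sys.sql_modules
WHERE definition LIKE '%TrackId%';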

The wrong way

In contrast, renaming a column by editing the Transact-SQL file directly delivers the promised disaster, as SSDT will attempt to drop the column with the old name and create a new column with the new name.

In the best-case scenario we get a validation error that stops the project from building, assuming the renamed column is referenced by some other object in the project.

Renaming a column by editing the .sql file

In all other scenarios, the script executed at deploy time is as follows:

PRINT N'Altering [dbo].[Genre]...';

GO
ALTER TABLE [dbo].[Genre] DROP COLUMN [Name];

GO
ALTER TABLE [dbo].[Genre]
    ADD [GenreName] NVARCHAR (120) NULL;

The only safety net left at this point is the BlockOnPossibleDataLoss deploy-time option, which is enabled by default. This will stop the deployment from proceeding.

Msg 50000, Level 16, State 127, Line 48
Rows were detected. The schema update is terminating because data loss might occur.
** An error was encountered during execution of batch. Exiting.

Renaming tables

Refactoring Databases, p 113

There are two options in the refactoring context menu that are relevant to naming tables; “Rename” and “Move to Schema”. In some RDBMSs, notably Oracle, the notion of a schema is tightly coupled to the notion of a user, such that the user account in question “owns” the tables and other objects contained therein. SQL Server implemented a similar concept prior to SQL Server 2005, when the link between users and schemas was severed such that a schema became more like a namespace, or even a filesystem folder, since a schema remains a securable object. Under either analogy - namespace or folder - the name of the schema can be considered to be a part of the (qualified) name of the table, meaning that moving an object to a new schema is merely a special case of renaming.

The right way

Right-click refactor menu for a table

When we rename a table, we get the usual “refactor preview” showing the changes about to be applied to the project:

Refactor preview for renaming a table

On clicking apply, the relevant objects are updated and a new entry is inserted into the refactorlog file:

<Operation Name="Rename Refactor" Key="cce28c30-adb0-4019-876e-d93cc2ca0011" ChangeDateTime="11/12/2016 14:22:18">
   <Property Name="ElementName" Value="[dbo].[Artist]" />
   <Property Name="ElementType" Value="SqlTable" />
   <Property Name="ParentElementName" Value="[dbo]" />
   <Property Name="ParentElementType" Value="SqlSchema" />
   <Property Name="NewName" Value="[Artiste]" />
 </Operation>

The process for moving a table between schemas is similar. We are presented with a preview of the changes about to be made:

Refactor preview for move to schema

and a new entry is made in the refactorlog file.

<Operation Name="Move Schema"Key="985e03c6-37f8-48c8-8ce8-5ed37fbb7c00"ChangeDateTime="11/12/2016 14:46:31">
  <Property Name="ElementName" Value="[dbo].[Invoice]" />
  <Property Name="ElementType" Value="SqlTable" />
  <Property Name="NewSchema" Value="Sales" />
  <Property Name="IsNewSchemaExternal" Value="False" />
</Operation>

Finally, when we come to deploy our change, the table is moved to the new schema and all the referencing objects are updated:

PRINT N'The following operation was generated from a refactoring log file 985e03c6-37f8-48c8-8ce8-5ed37fbb7c00';

PRINT N'Move object [dbo].[Invoice] to different schema [Sales]';

GO
ALTER SCHEMA [Sales] TRANSFER [dbo].[Invoice];

GO
ALTER VIEW [dbo].[InvoicesWithLineTotals] 
	AS SELECT I.[Invoice_Id],
	InvoiceTotal
	FROM [Sales].Invoice AS I CROSS APPLY dbo.CalculateInvoiceTotal(I.[Invoice_Id]);
GO

The wrong way

As with columns, if we rename a table by editing the .sql file directly, we get a potentially undesirable outcome, namely that a new table will be created with the new name, possibly at the expense of the current table. If we are lucky we get a validation error that stops the project from building:

Error from renaming a table in the sql file

If we are less lucky we may get some warnings (remember that deferred name resolution means that a missing table is only a warning rather than an error in a stored procedure), but at deploy time we will get a new table created with the new name, and possibly even a DROP TABLE for the existing table, assuming we have the appropriate options set. (By default, SSDT won’t drop objects from the database unless we specify “Drop objects in target but not in source”).

PRINT N'Dropping [dbo].[PlaylistTrack]...';
GO
DROP TABLE [dbo].[PlaylistTrack];
GO
PRINT N'Creating [dbo].[Playlist_Track]...';
GO
CREATE TABLE [dbo].[Playlist_Track] (
    [PlaylistId] INT NOT NULL,
    [TrackId]    INT NOT NULL,
    CONSTRAINT [PK_PlaylistTrack] PRIMARY KEY NONCLUSTERED ([PlaylistId] ASC, [TrackId] ASC)
);

Renaming Programmable Objects (Views, Functions, Stored Procedures, and other miscellany in sys.sql_modules)

Refactoring Databases, p 117

As noted in the discussion of dropping programmable objects, operations on these objects carry substantially less risk of catastrophic data loss and consequent unemployment than similar operations involving tables and columns.

It is still important to use the Refactor → Rename and Refactor → Move to Schema techniques to rename these objects rather than directly editing the .sql files, so that an entry is written to the .refactorlog file. This will ensure that the existing object is altered rather than a new one created at the expense of the old one.

<Operation Name="Rename Refactor" Key="fc4b928d-9b00-4028-9cac-5859ba9b666c" ChangeDateTime="11/12/2016 22:50:20">
  <Property Name="ElementName" Value="[dbo].[ChangeTrackPriceByFactor]" />
  <Property Name="ElementType" Value="SqlProcedure" />
  <Property Name="ParentElementName" Value="[dbo]" />
  <Property Name="ParentElementType" Value="SqlSchema" />
  <Property Name="NewName" Value="[ChangeTrackPriceByMultiplier]" />
</Operation>

  1. It isn’t magic: this doesn’t work with dynamic SQL, for instance. [return]

Database Pruning

After a period of time in use, most databases, like many systems that have grown in an organic manner, can benefit from some judicious pruning. This can take the form of removing columns or even tables that are no longer required to support the application, or that store redundant - and hence possibly erroneous - copies of information stored elsewhere. Equally, there may be stored procedures, functions, and even triggers that contain out-of-date versions of application logic.

The gardener's assistant; a practical and scientific exposition of the art of gardening in all its branches (1910)

Dropping a column

Refactoring Databases, p 72

Since SSDT operates in a declarative manner, each table is defined in a CREATE TABLE script, and deleting a column is as simple as deleting the relevant line from the script. However, there are a couple of features of SSDT that are relevant here. If the column is referenced by any other database objects such as views, functions, or stored procedures, SSDT will display an error:

SSDT Broken Reference Error

The full text of the error message reads SQL71501: Procedure [dbo.UpdateInvoiceBillingAddress] has an unresolved reference to object [dbo].[Invoice].[BillingPostalCode], and the offending reference in the stored procedure1 has acquired a “red squiggly”. There’s no “built-in” refactoring action defined in the right-click context menu for deletions, presumably as it’s difficult to programmatically determine the “intent” of deleting a column.

Refactoring Context Menu

What is possible, however, is to use the “find all references” tool before deleting the column to enable us to take appropriate action.

Find All References

Dropping a table

Refactoring Databases, p 77

This is where things get more serious. The same technique of using the “Find All References” tool to assess the damage we’re about to do applies here, but there is a subtle difference in what happens when we actually brandish the pruning shears. It turns out that removing an entire table referenced by a stored procedure is only worthy of a warning or five rather than an error. This is due to an oddity of deferred name resolution for stored procedures, namely that it is permitted to reference a non-existent table in the text of a stored procedure, but not permitted to reference a non-existent column in a table that does exist.

Removing a table is only a warning

Interestingly, if we convert our stored procedure to a user-defined function2, the warnings turn into errors. It turns out that deferred name resolution doesn’t work at all for functions, presumably because, being “purely functional” and free of side effects, they are less likely to reference objects that exist at runtime but not at creation time.

Functions don't support deferred name resolution
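Outside SSDT, the same behaviour is easy to reproduce directly against a scratch database; the object names below are made up purely for the illustration.

-- Deferred name resolution: a procedure referencing a table that doesn't
-- exist is created successfully, since the reference is only resolved at run time...
CREATE PROCEDURE dbo.pReferencesMissingTable
AS
    SELECT * FROM dbo.ThisTableDoesNotExist;
GO

-- ...but a function referencing the same missing table fails at CREATE time
-- with an "Invalid object name" error.
CREATE FUNCTION dbo.fnReferencesMissingTable()
RETURNS int
AS
BEGIN
    RETURN (SELECT COUNT(*) FROM dbo.ThisTableDoesNotExist);
END;
GO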

Dropping Programmable Objects (Views, Functions, Stored Procedures, other miscellany in sys.sql_modules)

Refactoring Databases, p 79

On the face of it this is simpler, as there is no data being thrown out with the bathwater.

Dropping Triggers

You should do this without hesitation. It’s 2016.

Dropping views and functions

Generally, this is unproblematic. Since these don’t support deferred name resolution, you can get error checking in SSDT.

Name checking in views

Pulling the trigger3

There’s one more thing to consider, which is what happens when we come to deploy our changes. Whether we do this by publishing from Visual Studio or at the command line using sqlpackage.exe, if we are dropping a table or a column that contains data we will get an error along the lines of

(48,1): SQL72014: .Net SqlClient Data Provider: Msg 50000, Level 16, State 127, Line 6 Rows were detected. The schema update is terminating because data loss might occur.
(43,0): SQL72045: Script execution error.  The executed script:
IF EXISTS (SELECT TOP 1 1
           FROM   [dbo].[PlaylistTrack])
    RAISERROR (N'Rows were detected. The schema update is terminating because data loss might occur.', 16, 127)
        WITH NOWAIT;

The IF EXISTS check is inserted into the deployment script for every table being dropped, and for every table containing a column being dropped. It is worth noting that since the check is simply for EXISTS (SELECT TOP 1 1), the deployment will be blocked even if we are dropping a column that only contains NULL values - I have found this to be mildly irritating in the past, particularly for “inadvertently” created columns.
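If you want to satisfy yourself that the column you are about to drop really is empty before overriding the block, a quick count does the trick; the table and column names below are taken from the earlier example.

-- Zero rows here means dropping dbo.Invoice.BillingPostalCode loses no data,
-- even though the generated EXISTS (SELECT TOP 1 1 ...) check would still block it.
SELECT COUNT(*) AS NonNullValues
FROM dbo.Invoice
WHERE BillingPostalCode IS NOT NULL;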

To inhibit this behaviour and allow our potentially destructive change to proceed, we need to specify this at deploy time, the mechanism for which depends on the method we are using to deploy our project.

  • For projects deployed using the Visual Studio “Publish” dialog, uncheck “Block Incremental Deployment if Data Loss might Occur” in the “Advanced Publish Settings” dialog available by clicking “Advanced” in the “Publish” dialog.

  • For projects deployed using sqlpackage.exe, we need to specify the parameter /p:BlockOnPossibleDataLoss=False

  • For projects that use a publish profile to specify deployment options, we need to add the element <BlockOnPossibleDataLoss>False</BlockOnPossibleDataLoss> to the .publish.xml file.

In each case, the default is “true”, meaning potentially destructive changes are blocked by default. This is for the benefit of those “enterprise” customers that deploy direct to production with no testing - remember that these people aren’t our problem but they are the SSDT development team’s problem, since they are paying to keep the lights on at SSDT HQ!

In general, this should always be set to false, meaning “allow potentially destructive changes”. This is unproblematic as long as production isn’t the first environment where you deploy your changes.

Similarly, the default behaviour of SSDT is not to drop objects that are in the target (i.e. the database) but not in the source (i.e. the database project). To allow these changes to be applied, we need to do one of the following.

  • For projects deployed using the Visual Studio “Publish” dialog, check “Drop objects in target but not in source” in the “Advanced Publish Settings” dialog available by clicking “Advanced” in the “Publish” dialog.

  • For projects deployed using sqlpackage.exe, we need to specify the parameter /p:DropObjectsNotInSource=True

  • For projects that use a publish profile to specify deployment options, we need to add the element <DropObjectsNotInSource>True</DropObjectsNotInSource> to the .publish.xml file.

A note on “Drop objects not in source”

If this option is selected, SSDT will drop all objects from the target database that are not defined in the project. Rather inconveniently, this includes users and permissions - including, crucially, the CONNECT permission - and all the other things we need to be present for our application (or us!) to be able to connect to the database. Rather than specify all these items as part of the project, it is often simpler to ignore them at deployment time using the publish profile. The most important ones are probably users, permissions, roles, and role memberships, but anyone with a more obscure security model (Application Roles??) might want to investigate some of the other options.

Advanced Publish Settings showing drop objects

The difference between the “Drop” and “Ignore” settings is that the former apply only to the target - so selecting object types here prevents objects of that type from being dropped. The “ignore” settings allow us to specify that objects of the selected types are not dropped in the target if they are absent from the source, but also not created or modified in the target even if they are present in the source.

As above, these settings can also be specified as command-line arguments to sqlpackage.exe or as xml in a publish profile.


  1. This procedure isn’t in the original Chinook database, I added it for the purpose of this example. [return]
  2. which it should have been in the first place, as it doesn’t modify any data! [return]
  3. No, not that kind of trigger, this refers to the metaphorical kind. [return]

Refactoring Databases with SSDT

The book Refactoring Databases by Scott Ambler and Pramod Sadalage, first published over ten years ago, has become something of a modern classic in the field of agile database delivery. The authors give a definition (in fact taken from an earlier book) of a database refactoring as

…a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics.

The book is divided into two sections, the first being a discussion of agile database development techniques, placing database refactoring in a wider technical and organisational context. This material, intended to be read in order, is recommended reading for anyone struggling to improve the working practices associated with database delivery in any organisation, irrespective of the tools being used.

The second is a collection of named “refactorings” along with the steps required to implement each one, and is structured as a reference work rather than as a continuous narrative. There are online versions of this catalog maintained at the websites of Scott Ambler and Pramod Sadalage respectively.

This article marks the start of a series that examines a number of these named “refactorings” and looks at how they can be implemented using Microsoft® SQL Server and Microsoft® Visual Studio, in particular through the use of SQL Server Data Tools, known as “SSDT”.

The code examples in the book use Oracle with bits of Java and Hibernate thrown in as appropriate, so the procedures here will differ where the behaviour of SQL Server differs in some relevant way. In addition, the “declarative” development paradigm encouraged by SSDT hides a lot of the detail of the DDL behind the scenes, meaning that the steps outlined in the book to achieve each refactoring may not all occur in the same places.

That said, it is worth referring to the discussions of each individual refactoring contained in the book, since these provide a checklist of considerations before embarking on any change, including motivations, tradeoffs, schema update and data migration mechanics, and required application changes, most of which are outside the scope of SSDT. I will attempt to provide page numbers where appropriate, though I realise these are less useful for those with access to only an electronic copy of the book.

In general, the applicability of SSDT is restricted to managing schema changes, with a couple of extensibility points - namely Pre-deployment and Post-deployment scripts - to manage data movements.

This series does presuppose some experience with SSDT; an overview can be obtained from MSDN here, particularly the material relating to Project-Oriented Offline Database Development. In the event that I ever get around to writing any introductory material of my own I’ll come back and add links here as appropriate. Most of the examples use the “Chinook” sample database available from codeplex. The SSDT project I am using is on Github, but note that I’ve made no attempt to make the commits sync with the articles in the series, and there are no guarantees, express or implied, that any given commit will actually be buildable.

Verschlimmbesserung

Welcome to this, my new site. I’ll be adding content about Continuous Integration, database systems including but not limited to Microsoft® SQL Server and development tools including but not limited to Microsoft® Visual Studio. The site itself is provisioned in a moderately interesting way, which you can read about here.