arapaima.uk Data, Databases, Delivery

Expose VSTS secrets as environment variables with this one weird trick...

Config as environment variables

I’m a big fan of the Twelve-Factor App “methodology”1 for building and deploying applications, and whilst much of it is geared towards web apps in Heroku-esque environments, I think the principles - or “factors” - are well worth bearing in mind when considering the delivery of other types of application.

Factor 3 of the 12 reads as follows:

An app’s config is everything that is likely to vary between deploys (staging, production, developer environments, etc). This includes:

  • Resource handles to the database, Memcached, and other backing services
  • Credentials to external services such as Amazon S3 or Twitter
  • Per-deploy values such as the canonical hostname for the deploy

There are a number of benefits to this approach; the main ones I can think of are:

  • The obvious one about not having credentials stored in source control. Nobody does this anymore, right?
  • If environment-specific information such as server names is stored in source control, then changes in the infrastructure will result in new commits in the source repo, meaning that the commit history will no longer merely “tell the story” of the application, but will also contain numerous sub-plots regarding the infrastructure.

The downside

There is some debate over whether environment variables are really the best place for secret information, and there are definitely customers for whom this approach would be considered too high risk. However, I wouldn’t have thought these included the customers where the credentials are currently stored with the application source code!

The main alternatives generally revolve around storing credentials somewhere where the infrastructure automation tools - Ansible, etc. - can see them and using these tools to deploy a file which the applications can read.

Never mind the downside, on with the weird trick…

I used the following example of the “config as environment variables” approach in a recent talk about SSDT and VSTS, using a PowerShell script to read config values from environment variables and deploy a dacpac to a SQL Azure database using SQL Authentication.

$serverName=$env:Chinook_ServerName
$dbName = $env:Chinook_DatabaseName
$dbUser = $env:Chinook_DbUser
$dbPassword = $env:Chinook_DbPassword

$dacFxDll='C:\Program Files (x86)\Microsoft SQL Server\140\DAC\bin\Microsoft.SqlServer.Dac.dll'

Add-Type -Path $dacFxDll
$dacServices = New-Object Microsoft.SqlServer.Dac.DacServices "server=$serverName;User ID=$dbUser;Password=$dbPassword;"

$dacpacPath=Join-Path -Path $PSScriptRoot -ChildPath "\bin\Debug\ChinookDb.dacpac"
$publishProfilePath = Join-Path -Path $PSScriptRoot -ChildPath "CommonSettings.publish.xml"

$dacpac = [Microsoft.SqlServer.Dac.DacPackage]::Load($dacpacPath)
$dacProfile = [Microsoft.SqlServer.Dac.DacProfile]::Load($publishProfilePath)

$dacServices.Deploy($dacpac, $dbName, $true, $dacProfile.DeployOptions )

What this script does, in brief, is load the server name, database name, and credentials from environment variables, and then deploy a .dacpac using this information in addition to a publish profile (CommonSettings.publish.xml) that defines some deployment configuration common to all environments.

The main advantage of this approach is that the same deployment script can be used without modification in all environments, from the developer’s desktop through the various testing environments and on to UAT and Production.

So, for deployments from the desktop or other unmanaged environments, it is easy to specify these values by setting environment variables.
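
For local deployments, these variables could be set for the current PowerShell session before running the script. The values below are purely illustrative, and the script file name is an assumption.

$env:Chinook_ServerName   = "chinookdemo.database.windows.net"
$env:Chinook_DatabaseName = "Chinook"
$env:Chinook_DbUser       = "chinook_deploy"
$env:Chinook_DbPassword   = "NotARealPassword!"

# Run the deployment script shown above (file name assumed)
.\Deploy.ps1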

However, in VSTS, things are a little more complicated. It’s possible to use private agent queues to allocate specific build agents - which could have these variables set in advance - to specific environments, but what if we just want to use the hosted queue?

VSTS Release Management allows us to specify variables for each environment in our Release Definition. It’s fairly common to see these used as parameters to deployment tasks, but what is possibly less obvious from this interface is that these values are surfaced as environment variables in the build process.

Variables pane in VSTS Release Management

This means that our Powershell script above can still work unmodified - except, that is, for the password. This is masked in the screenshot as it is defined as a “secret” variable. Secret variables are not exposed as environment variables, but can only be passed as arguments to our build steps. What muddies the water slightly is that within living memory secret variables were exposed as environment variables, but this behaviour was “fixed” some time in 2016.

Finally, the weird trick

In the above example, there is an empty environment variable Chinook_DbPassword for each environment, and a corresponding secret variable that contains the actual password.

We need to add an additional task to our release definition to read the secret variable and surface it as an environment variable. This can be done through the VSTS logging commands2, which are worth checking out as they enable a number of “weird tricks” in addition to this particular one.

As per the documentation, Logging Commands are invoked by writing the command to standard output, which in the case of PowerShell is done via Write-Host.

Release Definition showing PowerShell step to decrypt secret variable

In accordance with the “rules”, we pass the secret variable as an argument to the script, then use the ##vso[task.setvariable] command to set the value of the non-secret environment variable to the value of the argument.
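
As a rough sketch (the parameter name is my own invention, but the ##vso[task.setvariable] syntax is the documented logging command), the inline script for the “Read Env Var” step might look like this:

param([string]$dbPassword)

# Re-set the non-secret variable to the value of the secret passed in as an argument.
# Subsequent steps in the release will then see it as the environment variable CHINOOK_DBPASSWORD.
Write-Host "##vso[task.setvariable variable=Chinook_DbPassword;]$dbPassword"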

In the logs for the release, we can see the non-secret variables being set, with Chinook_DbPassword set to blank ([]).

2017-10-13T12:00:12.1607197Z Environment variables available are below.  
...
...
    [AGENT_HOMEDIRECTORY] --> [C:\LR\mms\Services\Mms\Provisioner\TaskAgent\agents\2.123.0]
...    
...
    [CHINOOK_DATABASENAME] --> [Chinook]
    [CHINOOK_DBPASSWORD] --> []
    [CHINOOK_DBUSER] --> [arapaima]
    [CHINOOK_SERVERNAME] --> [vstsdemochinook.database.windows.net]
...
...

When we get to the logs for the “Read Env Var” step:

2017-10-13T12:00:14.5784670Z ##[section]Starting: Read Env Var
2017-10-13T12:00:14.5954676Z ==============================================================================
2017-10-13T12:00:14.5954676Z Task         : PowerShell
2017-10-13T12:00:14.5954676Z Description  : Run a PowerShell script
2017-10-13T12:00:14.5954676Z Version      : 1.2.3
2017-10-13T12:00:14.5954676Z Author       : Microsoft Corporation
2017-10-13T12:00:14.5954676Z Help         : [More Information]
2017-10-13T12:00:14.5954676Z ==============================================================================
2017-10-13T12:00:14.6704451Z ##[command]. 'd:\a\_temp\2fa5955a-1363-464a-bed7-aed0cbea2c96.ps1' ********
2017-10-13T12:00:15.5054467Z ##[section]Finishing: Read Env Var

we can see that the secret variable is passed as a parameter to the inline script, but masked with asterisks in the release logs.

The “Deploy Dacpac” step contains a single action, namely running the same Powershell script as was used in every other environment:

2017-10-13T12:01:30.0026563Z ##[section]Starting: Deploy Dacpac
2017-10-13T12:01:30.0036563Z ==============================================================================
2017-10-13T12:01:30.0036563Z Task         : PowerShell
2017-10-13T12:01:30.0036563Z Description  : Run a PowerShell script
2017-10-13T12:01:30.0036563Z Version      : 1.2.3
2017-10-13T12:01:30.0036563Z Author       : Microsoft Corporation
2017-10-13T12:01:30.0036563Z Help         : [More Information]
2017-10-13T12:01:30.0036563Z ==============================================================================
2017-10-13T12:01:30.0076572Z ##[command]. 'd:\a\r1\a\ChinookDb-CI\BuildOutput\Deploy.ps1' 
2017-10-13T12:02:10.3410313Z ##[section]Finishing: Deploy Dacpac

This isn’t restricted to Release Definitions; the same technique works in Build Definitions too. This is possibly a less common scenario - as builds shouldn’t normally contain environment-specific information - except in cases like database deployment where it’s generally necessary to deploy the database somewhere before we can do any automated testing.


  1. Their words, not mine! [return]
  2. I can’t find this page on docs.microsoft.com, so the link is to the Github source. [return]

Combining tSQLt mocks with Visual Studio SQL Server Unit Tests

This came up in a question after a recent talk about database unit testing; I’ve done something similar on a client project in the past, and it was in my “old” talk about testing. I thought I’d write it down here in case it’s useful to anyone, not least the person who was asking the question.

A .zip file of the complete solution can be downloaded from here.

For many years, Visual Studio Database Projects - in SSDT as well as in its predecessors - have included an additional template for generating SQL Server Unit Tests.

SQL Server Unit Tests

SQL Server Unit Tests live in a SQL Server Unit Test Class, which is contained in an ordinary (.NET) Unit Test Project.

There is some additional configuration scoped to the Test Project, namely the connection string(s) that will be used to execute the tests, and optionally the name of a database project to deploy before every test run.

The Test Class is an ordinary test class with some boilerplate code supplied as well as a designer for creating SQL Server Unit Tests.

The advantages of using Visual Studio to create our SQL Server Unit Tests are that the outputs of the test are produced in .trx format, which is well understood by many CI tools, not least the ones from Microsoft, and that the tests can be run by the test runner built into Visual Studio.

The SQL Server Unit Test Designer

The test designer supports up to three Transact SQL scripts per test, called Pre-Test, Test, and Post-Test, which are run in the order you would expect. There is also the option to create class-scoped scripts for Test Initialize and Test Cleanup, which are run before and after every individual test in the class.

The SQL Server Unit Test Designer

For each of these scripts, there is the option to add test assertions, which are referred to as “Test Conditions” in this framework. The built-in test conditions are not particularly flexible, but writing custom test conditions is a topic for another day.

At the time of writing, it is almost exactly ten years since the inventor of NUnit (and of other things) decided that common SetUp and TearDown methods probably weren’t such a good idea after all, mainly as they make it difficult to see what a test is actually doing.

This is complicated further in SQL Server Unit Tests, as the Pre-Test and Post-Test scripts don’t use the same connection as the main Test script, which can lead to even more unexpected results.

So, this example won’t make use of the Pre-Test and Post-Test scripts, or of their class-scoped equivalents.

Finally, there is no support for mocks or fakes of any kind in the Visual Studio SQL Server unit testing framework.

Fortunately, there is another testing framework that provides just such support, namely tSQLt.

tSQLt

In the .NET/Java/whatever worlds, after deciding on a Unit Test Framework to use - NUnit, xUnit, MSTest, etc. - we are still faced with a further decision regarding what mocking framework to use, for instance Moq, NSub, Rhino, the Microsoft one that nobody’s heard of because it’s only in VS Enterprise, etc. etc.

In the case of tSQLt, the Unit Test framework and the mocking framework are bundled together into a single package.

However, this doesn’t mean that these components can’t be used in isolation from one another.

Crossing the streams

The present example will demonstrate the use of the mocking facilities of tSQLt in conjunction with the test designer and test runner built into Visual Studio.

There are a few ways of getting the tSQLt objects deployed to where they are needed for testing. The way I use most often is basically this one: we create a .dacpac of just the tSQLt objects (or use one we made earlier!), and create a second database project with a Database Reference of type “Same Database” to the project we are trying to test, and a reference to our tSQLt .dacpac. The .dacpac file needs to be somewhere in our source folders, as it will be added by path. We also need a reference to the master database, as this is required to build the tSQLt project.

Solution Explorer View showing tsqlt projects

In the above illustration, KebabTestDemo is the application we are testing, KebabTestDemo.tSQLt is the database project that contains only the references to our tSQLt dacpac and to master, and KebabTestDemo.Tests is the Unit Test project that contains our SQL Server Unit Test.

In the “SQL Server Test Configuration” dialog, we specify the connection string to use for running our tests. This information is stored in the app.config for the test assembly, meaning it is scoped to the project rather than to the individual test classes.

The SQL Server Test Configuration dialog

This dialog also allows us to specify that we want to deploy a database project at the start of every test run, so that all of our latest changes get included. The keen-eyed will notice, however, that we can only specify one project here, whereas we have two database projects in our solution. Normally I’d just leave this blank and try to remember to deploy my updated project before every test run in Visual Studio, and hope that the CI Server “remembers” to deploy its projects before its own test runs.

However, there is a solution, outlined in an MSDN blog post from 2009 that allows us to take advantage of this “automatic deployment” feature from Visual Studio.

In short, we subclass the SqlDatabaseTestService class used in SqlDatabaseSetup.cs to allow us to deploy not one, but two projects from the InitializeAssembly method.

class KebabDbTestService : SqlDatabaseTestService
{
    public void DeployCompositeProject()
    {
        DeployDatabaseProject(@"..\..\..\KebabTestDemo\KebabTestDemo.sqlproj", "Release", "System.Data.SqlClient", GetConnectionString());
        DeployDatabaseProject(@"..\..\..\KebabTestDemo.tSQLt\KebabTestDemo.tSQLt.sqlproj", "Release", "System.Data.SqlClient", GetConnectionString());

    }

    private static string GetConnectionString()
    {
        var config = (SqlUnitTestingSection) ConfigurationManager.GetSection("SqlUnitTesting");

        return config.ExecutionContext.ConnectionString;
    }
}

Note that this pays no attention to the app.config setting that tells us what projects to deploy, so caution is advised!

We then call our new code from the InitializeAssembly method in SqlDatabaseSetup:

// Setup the test database based on setting in the
// configuration file
//SqlDatabaseTestClass.TestService.DeployDatabaseProject();
//SqlDatabaseTestClass.TestService.GenerateData();

var service = new KebabDbTestService();

service.DeployCompositeProject();

(The commented-out code is the previous contents of this method.)

Transactions

One of the features of tSQLt is that all the procedures for running unit tests wrap every individual test in a transaction which is rolled back at the end of the test, meaning that the database is in the same state at the end of the test as at the beginning. This is unquestionably a good thing, as it means that the tests are all independent of one another, and that we don’t need to think about test “teardown” actions.

In the case of Visual Studio unit tests, we need to add this support manually. There are a few ways of doing this documented on MSDN, of which I’ll consider two.

The first is to insert BEGIN TRANSACTION and ROLLBACK TRANSACTION at the beginning and end of every test script. Whilst this is effective, you need to remember to do it every time. My preferred method requires further doctoring of the C# code behind the designer so that every test is wrapped in a TransactionScope. The only thing to remember here is that the Distributed Transaction Co-ordinator, better known as MSDTC, must be running on the machine where the test is executed, whether this is on your desktop or on a build server.

The only changes we make are the ones highlighted below; we add a reference to System.Transactions and a member variable of type System.Transactions.TransactionScope. We then initialise this variable in TestInitialize() and call its Dispose() method in TestCleanup(), which will throw away the transaction without committing it.

using System.Transactions;

namespace KebabTestDemo.Tests
{
    [TestClass()]
    public class KebabOrderLineTests : SqlDatabaseTestClass
    {
        TransactionScope _t;
        public KebabOrderLineTests()
        {
            InitializeComponent();
        }

        [TestInitialize()]
        public void TestInitialize()
        {
            _t = new TransactionScope();
            base.InitializeTest();
        }
        [TestCleanup()]
        public void TestCleanup()
        {
            base.CleanupTest();
            _t.Dispose();
        }

Now when we run our tests, each individual test will be wrapped in a transaction, which will be disposed of (i.e. rolled back) at the end of the test.

The Test Script

The test script now consists of only three lines: faking the table used in the test, calling the procedure, and selecting the results.

The completed Visual Studio Unit Test with tSQLt Mocks

The results will be processed by the “Test Condition” at the bottom of the picture; this is a “Data Checksum” condition, which is the only way to validate a multi-column, multi-row result using the built-in test conditions.

The checksum is configured using the following dialog; we have to select a database connection to use (in this case, it doesn’t really matter what that connection is), followed by a query that will return the same result set (including column names) as we expect the result set of the test to return. We then click “Retrieve” to execute the query, retrieve the results, and populate the checksum value (in this case -1371852473, visible in the screenshot above).

The Data Checksum Test Configuration Dialog

Running the Test

Having got to here, we are ready to run our test from the Visual Studio Test Explorer. This will re-deploy our project(s) and run the test, wrapped in a TransactionScope.

ScreenShot of passing test

Conclusion

This is a bit fiddly to set up, and even more fiddly to set up on a CI Server. However, if you have some reason why you want or need to use the built-in testing facilities in Visual Studio, then hopefully this article has demonstrated a way to take advantage of the tSQLt mocking framework.

If you download the sample project, then the test should build and run the first time, once you set the connection string in the “SQL Server Test Configuration” dialog.

Automatically provisioning a brand new environment for every feature branch using VSTS and AzureRM

It’s fairly uncontentious to suggest that, all else being equal, providing each developer with an individual “sandbox”, or private development environment, is a worthwhile endeavour.

Often, these can be provisioned on the developers’ individual desktops, but when the application involves PaaS services such as databases, message queues, and other cloud-based services, things become more complicated. It’s generally possible to emulate most things on the desktop, but there are often small gaps in this emulation, not least in the communication and authentication protocols that link the services together.

The rest of this article will discuss how to use Visual Studio Team Services in conjunction with Azure Resource Manager (henceforward AzureRM) templates to automatically provision a new environment for every branch created in source control, and automatically destroy the environment using a service hook to Azure Automation when the branch is merged to master and deleted.

I stumbled on this technique whilst working on a proof of concept involving the data-related Azure services, such as Data Factory, Data Lake, and Azure SQL Data Warehouse, but the present example consists of a serverless Azure Function that writes to a Cosmos DB PaaS Database.

This technique should generalise to most Azure PaaS services, and yes, I’m sure you can do similar things with AWS and friends but I haven’t had cause to think about them for the time being.

The application code

The project I’m using for this example can be downloaded from this Github repo. The Github version of the code is mirrored from the version in VSTS using the Git Tasks extension for VSTS.

The repo contains a Visual Studio (2017) solution with four projects:

  • FavouriteKebab - this is the Azure Function App
  • KebabDbResourceGroup - this is the AzureRM template that defines our infrastructure
  • KebabTests - contains a single test written with the MSTest framework
  • HelperScripts - a Powershell Script project containing a single script used in the build pipeline

Solution Explorer View

The next Instagram?

The details of what the app actually does aren’t particularly important for this example, but in brief it is expecting a JSON structure:

{
"name": "Bill",
"favouriteKebab":"Adana"
}

in the body of a POST message, which it will then store in the Cosmos DB database.

[FunctionName("KebabPost")]
public static HttpResponseMessage Run(
[HttpTrigger(AuthorizationLevel.Anonymous, "post")]
HttpRequestMessage req,
[DocumentDB("kebabDb", "kebabPreferences",
ConnectionStringSetting = "DB_CONNECTION", CreateIfNotExists =true)] out dynamic nameAndKebabType, 
    TraceWriter log)
{

    log.Info("C# HTTP trigger function processed a request.");

    // Get request body
    dynamic data = req.Content.ReadAsAsync<object>().Result;

    // Set name to query string or body data
    nameAndKebabType = new {
    name = data?.name,
    favouriteKebab = data?.favouriteKebab
};
    if (data?.name == null || data?.favouriteKebab == null)
    {
        return req.CreateResponse(HttpStatusCode.BadRequest, 
            "Name and kebab are both required!");
    }

    else
    {
        return req.CreateResponse(HttpStatusCode.OK);
    }
}

The name of the DocumentDB database and collection will be the same in every environment, so I’ve just hardcoded them in the file ("kebabDb" and "kebabPreferences"). The other thing to note is that I am passing CreateIfNotExists=true in the binding options for the DocumentDB collection; it turns out to be non-trivial to create the database and the collection from an AzureRM template, so we let the app create them at runtime if required.

The test project contains a single test, which exercises the behaviour described above:

[TestMethod]
public void KebabPost_ValidInput_StoresValuesInDb()
{
string content = "{ 'name':'Barry' , 'favouriteKebab':'Kofte' }";

using (var client = new HttpClient())
{
    var response = client.PostAsync(functionUrl,
        new StringContent(content, Encoding.UTF8, "application/json")).Result;
    
}

using (var client = new DocumentClient(docDbUri, docDbKey))
{
    IQueryable<KebabPrefs> kebabPrefsQuery = client
    .CreateDocumentQuery<KebabPrefs>(
        UriFactory
        .CreateDocumentCollectionUri("kebabDb", "kebabPreferences"))
        .Where(kp => kp.name == "Barry");

    Assert.AreEqual("Kofte", kebabPrefsQuery.ToList()
        .First().favouriteKebab);


    }

Unit Testing enthusiasts will have noticed that this isn’t really a Unit Test at all; there are no fakes or mocks, and we are using the “real” dependencies for everything.

If our Azure function were a bit bigger, it would be worth splitting out the “core” functionality into a separate assembly that could be meaningfully unit tested, but in the interests of brevity I’ve not done that here.

Integration tests of this kind are pretty common in PaaS projects, and a glance at the full source code of the test project will reveal many of the common “characteristics” of coded integration tests; acres of boilerplate code to read config files and set up the initial conditions, POCOs not present in the main application whose only purpose is to hold data for our test, etc. etc. Still, all of this fragility and maintenance difficulty is probably a price worth paying for increased confidence in our deployment pipeline.

The AzureRM Template

The full template is a bit too long to reproduce here, so I’ll call out some edited highlights.

  "parameters": {
    "database_account_name": {
      "defaultValue": "[concat('kebabdac', uniqueString(resourceGroup().id))]",
      "type": "string"
    },
    "hostingPlanName": {
      "defaultValue": "[concat('kebabhp', uniqueString(resourceGroup().id))]" ,
      "type": "string"
    },
    "functionAppName": {
      "defaultValue": "[concat('kebabApp', uniqueString(resourceGroup().id))]",
      "type": "string"
    },
    "storageAccountName": {
      "defaultValue": "[concat('kebabstor', uniqueString(resourceGroup().id))]",
      "type": "string"
    }
  }

The thing to note is that the template takes four parameters, each of which has a default value.

Each of the defaults consists of a prefix that identifies the resource type, combined with the result of the function uniqueString(resourceGroup().id). These unique strings are required because some objects in Azure (storage accounts or web apps, for example) need globally unique names, since the name is incorporated into a URL of the form https://myapp.azurewebsites.net or similar. I can never remember which objects do require unique names and which ones don’t, so I tend to use uniqueString() for everything.

The template goes on to create a storage account, a Cosmos DB account, a Function App, and all the supporting bits these require. These end up with some pretty funky names thanks to the use of uniqueString(), but since you never really need to type these names anywhere this is more of an aesthetic consideration than a practical problem:

Azure resource names generated with uniqueString

At the end of the template is an outputs section:

  "outputs": {
    "functionAppName": {
      "type": "string",
      "value": "[parameters('functionAppName')]"

    },
    "docDbUri": {
      "type": "string",
      "value": "[Concat('https://',parameters('database_account_name'),'.documents.azure.com:443')]"
    },
    "docDbKey": {
      "type": "string",
      "value": "[listKeys(resourceId('Microsoft.DocumentDb/databaseAccounts', parameters('database_account_name')), '2015-04-08').primaryMasterKey]"
    }
  }

This defines three outputs, the Function App name, the Cosmos DB uri, and the Cosmos DB key. This is necessary as we didn’t specify these values ourselves, so we need AzureRM to tell us what it just created in order that we can use these names for other tasks in our build process.

The Build Definition

The Build Steps in VSTS

This is a fairly standard workflow, with a couple of things of note. We build the solution at the beginning and save the artifacts, but before we run our integration test(s?), we run the AzureRM template to “Create or Update” a resource group, and then deploy our Function App inside this resource group. This is so that we can run our integration tests with the real dependencies. The “trick”, such as it is, is in the AzureRM deployment step:

AzureRM Deploy step showing parameters.

Because we pass the name of the branch we are building as the name of the resource group, the first time we build any given branch we get a brand new environment with a Cosmos DB, a Function App, etc. etc., exclusively for that branch.
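
As an aside, a rough PowerShell equivalent of what the Azure Resource Group Deployment step is doing might look like the sketch below; the template path and the location are assumptions, and the branch name is available to scripts as the BUILD_SOURCEBRANCHNAME environment variable.

# Create (or update) a resource group named after the branch, then deploy the template into it.
$branchName = $env:BUILD_SOURCEBRANCHNAME

New-AzureRmResourceGroup -Name $branchName -Location "North Europe" -Force   # location is illustrative
New-AzureRmResourceGroupDeployment -ResourceGroupName $branchName `
    -TemplateFile ".\KebabDbResourceGroup\azuredeploy.json"                  # path assumed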

After the template is deployed, we need a bit of Powershell to retrieve the outputs from the template.

param ([string]$resourceGroup)

$outputs = (Get-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroup | Sort Timestamp -Descending | Select -First 1).Outputs

$outputs.Keys | % { Write-Host ("##vso[task.setvariable variable="+$_+";]"+$outputs[$_].Value) }

What this does is to find the most recent deployment for the specified resource group - i.e. the one we just did - and map the outputs of the template to build variables in VSTS by name. This means that for every output in the template, we need to define a build variable with exactly the same name, which indeed we have done:

Build variables view

The values are all empty, as they will be assigned by the above script during each build.
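
To make this concrete, for the functionAppName output the script above would write a line like the following to standard output (the value shown is made up), which VSTS interprets as an instruction to set the build variable of the same name:

Write-Host "##vso[task.setvariable variable=functionAppName;]kebabAppdkgf3a7b2xyz"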

We can then use these variables in the Function App deployment step to tell it where to deploy our Function:

FunctionAppDeployment

as well as in the Integration Testing step, where we use these variables to overwrite the values in the runsettings file used by the vstest test runner.

runsettings

The final step is just to update the Github mirror of the VSTS repo; in most corporate environments this won’t be necessary!

The Build Trigger

https://s3-eu-west-1.amazonaws.com/aksidjenakfjg/vsts-azurerm/buildtrigger.png

This build will be run for every new commit on every branch. A moment’s reflection will reveal that if the branch is created through the UI on the server, the build will be triggered immediately, as there will be a new branch with a new (to that branch) commit. The first build will always be the slowest, as this is the one that will create the resource group from scratch.

Equally, a new build will be triggered every time the developer pushes new commits to the feature branch, following which the artifacts will be deployed and the integration tests run, in the private environment dedicated to that branch.

Subsequent builds will be faster; the time penalty associated with creating the resource group from scratch is around two minutes in this particular case, using the hosted queue (2:32 to create from scratch, 0:38 to revalidate on subsequent builds). The overall build time is around 5 minutes the first time, and 3 minutes on subsequent runs. (since there’s only one integration test, the slowest part is the nuget restore!).

Branch Policy

In addition to enforcing a build on every commit, we can force another build at the point a pull request is made from our feature branch to master:

Master branch policy in vsts

This will “pre-merge” our feature branch into master, and trigger a new build. This will have the side effect of creating a new resource group called “merge”, in which the tests will be run again. However, since the same branch name is used for every pull-request build, the “merge” resource group will only be created once, and won’t be modified unless the AzureRM template changes.

Merging a Pull Request

Merging a PR

This screenshot shows the “moment of truth” immediately prior to merging a pull request. The “pre-merged” branch has been built successfully, and when we click “Complete Merge” the source branch will be deleted. The resource group, however, will still be hanging around incurring charges, and this could become an expensive business if we are creating a new environment for every single feature and never tearing them down.

Tearing down the environment

This is achieved by calling an Azure Automation runbook in a Service Hook from VSTS.

To create the hook, we first need to define our Automation runbook and give it a web endpoint. The integration of Azure Automation and source control is a “work in progress”, so I’ve stored the body of the runbook in a gist.

This script will parse the JSON payload from VSTS, find the source branch name (from the refspec), and delete the resource group with that name. This runbook is associated with an endpoint url, which we provide to VSTS in the Web hook configuration.
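
A minimal sketch of what such a runbook might look like is shown below; the payload property names and the Run As connection name are assumptions on my part, not a copy of the actual gist.

param([object]$WebhookData)

# The VSTS service hook delivers its payload as JSON in the request body
$payload = ConvertFrom-Json -InputObject $WebhookData.RequestBody

# The source branch arrives as a refspec, e.g. refs/heads/myFeature; take the last segment
$branchName = ($payload.resource.sourceRefName -split '/')[-1]

# Authenticate using the Automation account's Run As connection, then delete the resource group
$conn = Get-AutomationConnection -Name "AzureRunAsConnection"
Add-AzureRmAccount -ServicePrincipal -TenantId $conn.TenantId `
    -ApplicationId $conn.ApplicationId -CertificateThumbprint $conn.CertificateThumbprint

Remove-AzureRmResourceGroup -Name $branchName -Force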

The runbook requires an Azure Automation Account; I have created this in a separate resource group, not “managed” from VSTS.

We set the event to track to “Pull request merge commit created” - there’s no “branch deleted” event available here. Remember, the branch gets deleted when the PR is merged, assuming the box is ticked. Notably, we are only firing this event on merges to master.

Service Hook p1

On the second page of the dialog, we just supply the endpoint url for our Automation runbook.

Service Hook p2

Release Management

Once the branch is merged to master and its environment torn down, then what? We release the feature, of course:

Release Management in VSTS

This Release definition is triggered on every new build of the master branch. Since the branch is protected by a policy, these will only occur when a pull request has been created, successfully built, reviewed, and merged.

Release Trigger

As soon as the release is triggered, we do a deployment to QA. The “little man with a clock” next to the “Production” environment indicates that there is a manual approval required before a release can be deployed to production.

QA trigger

The deployment process

As is typical, the Release Definition is much simpler than the Build Definition. In this instance it consists of only two steps, deploying the Azure RM template and deploying the Function to our Function App.

Release tasks in VSTS

What’s notable though, is that we are using the same Azure RM template for QA and Production as we used for the ephemeral development environments. The only difference is that instead of allowing the parameters to default to their “unique string” values, we explicitly set all these parameters to “well known” values in the step definition:

Azure RM Release Step Definition

as well as in the step that deploys the function to our function app:

Azure Function Release Step Definition

The Production environment definition is exactly the same as the QA one, except that the “well known” values for the resource names are different.

This means that it doesn’t matter if the QA and Prod environments are maintained by a different team, or if they are created in a different subscription or in a different tenant; as long as we know the values to plug into our template, we can override our defaults with the “permanent” names.

The “permanent” resource names still need to respect the rules for uniqueness though, so if these names are important it is probably wise to create these resources in advance to avoid disappointment at deployment time.

Summary

If you’re still reading, you’ll be aware that this has been a somewhat “epic” article. The scenario outlined above allows us to automatically create and tear down a new environment for every single feature. Under normal circumstances, there will be four “persistent” resource groups, namely “QA” and “Production”, as well as “merge” - used for building and testing “pre-merged” pull requests, and “master” - used to rebuild and retest “master” after every merge commit.

The extra resource group for Azure Automation is also visible in this screenshot, as is the resource group for a feature branch, which will be automatically torn down when the branch is merged.

resource groups in the Azure Portal

All of the above is still a work in progress, if there are any glaring errors or omissions please do get in touch via the comments or via the contact form on the site.

Using Vagrant to create a Virtual Machine running SQL Server on CentOS Linux

In my mind, the ability to do this kind of thing is the really big “win” with SQL Server on Linux. In their own words,

Vagrant is a tool for building and managing virtual machine environments in a single workflow. With an easy-to-use workflow and focus on automation, Vagrant lowers development environment setup time, increases production parity, and makes the “works on my machine” excuse a relic of the past.

I’ve been doing presentations at conferences and meetups for a few years, so I’m familiar with the pain of setting up “Demo VMs” for SQL Server on Windows. I also have less than fond memories of setting up and maintaining multiple VMs on a single host to reproduce client problems with clustering, replication, etc.; fortunately I don’t tend to get involved with this kind of thing too much these days, so it’s not in the scope of this example.

Anyway, with the advent of SQL Server on Linux, it struck me that Vagrant, which I’d used for some other stuff in the “day job” might be a way out of this spiral of despair.

The promise is that having set up a couple of config files, we can stand up a new VM with everything configured and in the right place, merely by typing vagrant up at a command prompt, and remove it again by typing vagrant destroy. If we want to keep it around for another day, vagrant halt is at our disposal. We can even ssh to our new VM by typing vagrant ssh, without any fiddling around with keys or passwords.
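
To make the workflow concrete, the day-to-day commands (all run from the folder containing the Vagrantfile) are:

vagrant up        # create and provision the VM, or start it if it already exists
vagrant ssh       # open an ssh session to the VM
vagrant halt      # shut the VM down, keeping it for later
vagrant destroy   # delete the VM entirely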

Preamble

Yes, I’m aware it’s possible to manage Windows guests with Vagrant, even from a Windows host, but my experience of trying this is that it’s a world of pain, not least because the Windows box files are 800lb gorillas. The most complete attempt I’ve seen has been by Chocolatey creator Rob Reynolds, who I suppose has (or had) some fairly unique needs in this area.

My own use case is further complicated by the fact that I’ve been using Fedora Linux on the desktop for the last few years, which doesn’t really support Virtualbox - though I’m aware it does mostly work - but prefers kvm-based virtual machines which can be managed through a number of utilities, including the supposedly idiot-proof Gnome Boxes. There are some slightly outdated results from Phoronix that suggest there may be some performance gains from using KVM as opposed to Virtualbox.

Anyway, on with the show…

There are three supported platforms (plus Docker) listed on the installation guide for SQL Server on Linux, namely Ubuntu, Suse, and RedHat.

With that in mind, I decided to create a CentOS VM, given that it’s “quite like” RedHat and also has a vendor-created image in the Vagrant Catalog that supports libvirt (almost all boxes, including the Ubuntu ones, are VirtualBox only). Suse also has vendor images for libvirt, but I haven’t used Suse Linux since the Spice Girls were topping the charts and wasn’t inclined to investigate further.

The Vagrantfile

The Vagrantfile is a file (called, simply, Vagrantfile) that tells Vagrant what the VM you want to create should look like.

In this case it’s fairly simple, but I’ve left the autogenerated comments in the version in the github repo that accompanies this post (they’re useful if, as is almost certain, you want to reproduce this on a slightly different platform to me, i.e. Virtualbox!), so I’ve extracted the highlights to a gist here.

We specify the “base box” - in this case CentOS 7 - on which our VM is going to be based. After that, we just specify the ways in which our VM is going to diverge from this base. In our case, we are setting up a forwarded port for 1433, so that clients on the host can connect to the SQL Server as if it were “local”.

We’re increasing the memory allocation for the VM from the default (which I think is 512MB) to 4GB - the minimum requirement for SQL Server on Linux is 3.25GB, and I have read that the installer will fail if less than this is available.

After that, we specify a shell script to run to set up everything else; in Vagrant-speak this is known as a “provisioner”.

Vagrant doesn’t have to use shell scripts for configuration, it supports a number of alternative provisioners, such as the usual suspects of Ansible, Chef, and Puppet, in addition to a couple of others.

The advantage of these latter approaches, of course, is that they are idempotent, making the scripts easy to build up over time; I intend to revisit this example, probably with Ansible, once SQL Server on Linux is generally available and the installation procedure is a bit less “dynamic”.

The shell script fetches updates with yum (for those more accustomed to Debian-derivatives such as Ubuntu this is like apt), then adds the Microsoft repository definitions to yum’s configuration. It also installs a package called tcping from another repo called epel (which stands for Extra Packages for Enterprise Linux), which we’re going to need in our script. Having got everything downloaded, we install the SQL Server client and server, which are separate packages.

Not only are they separate packages, they require different mechanisms for accepting the terms of the EULA; one requires an environment variable ACCEPT_EULA=y, and the other requires a parameter accept-eula!

We also pass the top-secret sa password as an environment variable; this is required to run the installation silently.

Having done all that, we wait for the service to start before proceeding. This is why tcping was required; it’s among the simplest ways to figure out whether there’s anything listening on a given port.

Finally, we restore a database from a backup and run a script to install the tSQLt unit testing framework. By default, Vagrant will rsync the folder containing the Vagrantfile to a /vagrant folder inside our VM, so we can simply put any files we need for provisioning (or for anything else) inside this folder. In this case, I’ve extracted the contents of the tSQLt.zip from the tSQLt downloads page.

Originally, I had the database backup copied from the original repo here too, but had to replace this with a call to curl; I’m glad to say I’d never run into the Github file size limit before! Obviously this makes the re-provisioning process a bit slower; it’s best to have these large file dependencies somewhere local if you can manage it.

What we didn’t do

There’s nothing in the (abbreviated) Vagrantfile about networking (other than the forwarded port), storage, logins, cores, etc. etc. All these things are configurable, but the point of this approach is that we trust Vagrant to “do the right thing” unless we specify otherwise.

The Moment of Truth

vagrant up

Profit

vscode on fedora connected to linux on centos in vagrant!

To be clear, this is a picture of Visual Studio Code, running on Fedora Linux 25 (the host), connected to SQL Server running on CentOS 7.3 (the guest) in a virtual machine provisioned by Vagrant. The mssql extension for Visual Studio Code is a very recent alpha; this was to do with .NET Core compatibility on “modern” versions of Linux.

The VM is a “regular” VM, you can see it here in the Fedora Virtual Machine Manager application with the console open showing the SQL Server process:

vagrant vm opened in virt-manager showing the sql server process

The Wrap

I was slightly surprised that all this worked as well as it did; not so much the Vagrant part as the SQL Server on Linux part, which is certainly more complete than when I last looked at it. In particular, I wasn’t expecting to be able to install tSQLt on Linux - I did have to make a change to the SetClrEnabled.sql script that is distributed with tSQLt to turn off clr strict security, but apart from that it all went pretty smoothly. I have a presentation about tSQLt to do next month, which was one of the motivations for this exercise, and I’ll certainly be setting aside some time in the next day or two to see if everything else works the way one might expect. If you want to try this out at home, and assuming your setup is roughly like mine (kvm rather than Virtualbox, vagrant already working, etc, etc):

mkdir hereGoesNothing && cd hereGoesNothing;
git clone https://github.com/gavincampbell/VagrantSqlServerCentosEtc .
vagrant up

If your setup is different in relevant ways, there will be some extra steps between the clone and the vagrant up, where you install things and hack away at the Vagrantfile.

Enabling per-branch configuration in a Jenkins Multibranch Pipeline

For reasons, you might want your Jenkins Multibranch Pipeline jobs to do a different thing depending on which branch is being built.

Fortunately, the multibranch plugin provides us with a built-in variable BRANCH_NAME, which we can use to figure out which branch we are currently building.

In such scenarios, it’s not a bad idea to create a minimal Jenkinsfile at the repo root that contains just enough logic to figure out which branch we are on, and then call another groovy script that contains the actual build definition:

In this script we load an external groovy file based on the current branch, and then call the function defined therein. In this particular case, the run_build() functions don’t do anything particularly exciting, but they probably do enough to demonstrate this mechanism.

The script for the master branch:

The script for any other branch:

The most important line in each of these scripts is the return this at the end; as documented, this is required for the functions to be callable from the outer script. The checkout scm step in the root Jenkinsfile is also required, as without it the rest of the scripts won’t be fetched from the repo. All three of these files are in the root of the repo; this is because gist doesn’t support folders. In “real life”, it’s probably a good idea to create a separate folder for these scripts, and provide the path to the load function.

Trying this out at home

If you don’t have a Jenkins server handy, you can create a new one with docker run -p 8080:8080 jenkinsci/jenkins:lts1 and connect to it on http://localhost:8080. After that, it’s just a question of creating a new job of type “Multibranch Pipeline” and specifying an appropriate name.

In the “Branch Sources” section, add a source of type “Git” (not “Github”!), and provide it with the path to the source repo, in this case the gist.

Using a gist as a source repo?

It might be a lesser known fact about Github that gists are really just repos with a few extra rules that can be forked and cloned like any other Github repo. What it doesn’t say in that linked page is that you can also create branches, even if these new branches aren’t visible in the Gist UI. So, if you create a fork of https://gist.github.com/gavincampbell/480843552e43efa84c60f9bb4840d6c1, you should end up with an identical gist in your own Github account. You can then paste the url of this gist into the Jenkins job definition as shown:

Gist url in git scm step

The other thing to note here is the checkbox for “Scan Multibranch Pipeline Triggers”; since we aren’t configuring any push notifications from our git repo, we need to get Jenkins to scan the repo periodically to look for any new branches, or new commits in existing branches. (In my particular case, since the Jenkins instance is just an ephemeral Docker image, there’s no route to it anyway.)

When you save the job configuration, Jenkins will scan the source repo, and create the first pipeline job:

pipeline job for master branch

Creating a new branch

As noted above, there’s nothing in the gist UI to support creating new branches. However, if we clone the gist repo to a local folder we can create the branch and push it back to the origin. You can substitute the url to your own fork of the gist in place of mine (there’s only one branch in mine, and you don’t have permission to create more!). The downside of this approach of using a gist rather than a “proper” repo is that by default you get a long guid instead of a readable folder name.

$ mkdir j && cd j;
$ git clone https://gist.github.com/gavincampbell/480843552e43efa84c60f9bb4840d6c1
Cloning into '480843552e43efa84c60f9bb4840d6c1'...
remote: Counting objects: 32, done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 32 (delta 7), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (32/32), done.
Checking connectivity... done.
$ cd 480843552e43efa84c60f9bb4840d6c1/
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
$ git checkout -b fancyFeature
Switched to a new branch 'fancyFeature'
$ git push --set-upstream origin fancyFeature

Total 0 (delta 0), reused 0 (delta 0)
To https://gist.github.com/gavincampbell/480843552e43efa84c60f9bb4840d6c1
 * [new branch]      fancyFeature -> fancyFeature
Branch fancyFeature set up to track remote branch fancyFeature from origin.

The next time the repo is scanned, the new job will be created in Jenkins:

job created for new branch

When the branch is deleted, the job will disappear too:

$ git push origin --delete fancyFeature

To https://gist.github.com/gavincampbell/480843552e43efa84c60f9bb4840d6c1
 - [deleted]         fancyFeature

Looking in the “Multibranch Pipeline Log” confirms this:

[Thu May 25 22:40:36 UTC 2017] Finished branch indexing. Indexing took 1.7 sec
Evaluating orphaned items in jenkins-branch-conditions
Will remove fancyFeature as it is #1 in the list
Finished: SUCCESS

  1. Depending on your system configuration, this step may have one or more prerequisites! [return]

Automating SSDT build and deployment with Jenkins from a local git repo

This is a short illustration of using a local installation of Jenkins on Windows to build an SSDT project from a local git repo and deploy it to a SQL Server on the same machine.

This is probably useful for a quick demonstration or to understand how the various moving parts fit together, but possibly less applicable to “Real Life” production environments.

There are no build agents and no git remotes; all the action takes place on the Jenkins master, and the git repo is local to the same machine. No “git management” software such as GitHub/VSTS/GitLab/BitBucket/etc. is involved (except for some slight cheating regarding the build definition itself).

Getting set up

I am using Windows 10 with SQL Server 2016 Developer Edition and Visual Studio 2015 Professional. Most of this stuff should work with most editions and versions of SQL Server and Visual Studio, but some of the paths will vary here and there. I am using SQL Server Data Tools Version 14.0.61021.0, released on October 26, 2016.

Installing Jenkins

I used the windows installer linked here (note that this link will trigger the download immediately). At the time of writing this is version 2.46.1. In the initial setup, I selected the option to “Install Recommended Plugins”; this installs more than is required for this example.

Further Jenkins twiddling

The MSBuild plugin needs to be installed separately (as it isn’t “recommended”!), which can be done from http://localhost:8080/pluginManager/available or by clicking through Jenkins -> Manage Jenkins -> Manage Plugins -> Available. We need to tell Jenkins where it can find MSBuild.exe; this is done on http://localhost:8080/configureTools/ (Jenkins -> Manage Jenkins -> Global Tool Configuration). You can find out where MSBuild is installed by opening the “Developer Command Prompt for Visual Studio” and typing

C:\Program Files (x86)\Microsoft Visual Studio 14.0>where msbuild
C:\Program Files (x86)\MSBuild\14.0\Bin\MSBuild.exe
C:\Windows\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe

I used the first of these, as this is the one installed by Visual Studio 20151, and should avoid having to re-read the somewhat convoluted StackOverflow thread here.

I already had git installed and in the path, so there was no need to configure this separately in Jenkins.

$ git --version
git version 2.11.0.windows.3

Compromising your SQL Server 2

By default, the Jenkins service will run as “Local System” on Windows. In order to allow this account - which will authenticate to SQL Server as the machine account - to deploy the database, I made “Local System” an sa of the SQL Server:

EXEC sp_addsrvrolemember 'NT AUTHORITY\SYSTEM', 'sysadmin';

There are many circumstances in which this wouldn’t be a good idea, if any of them apply in your scenario you should make appropriate adjustments to the scripts that follow.

Find something to build

I started by cloning a repo I had prepared earlier, containing a copy of the Chinook database.

C:\Projects>git clone https://github.com/arapaima-uk/Chinook.JenkinsDemo.git
Cloning into 'Chinook.JenkinsDemo'...
remote: Counting objects: 26, done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 26 (delta 5), reused 26 (delta 5), pack-reused 0
Unpacking objects: 100% (26/26), done.

C:\Projects>tree
Folder PATH listing
Volume serial number is 000000B7 806E:E890
C:.
└───Chinook.JenkinsDemo
    └───Chinook.JenkinsDemo
        └───dbo
            └───Tables

C:\Projects>

Having done that, I then removed the reference to the GitHub remote with

C:\Projects\Chinook.JenkinsDemo>git remote rm origin

From here on, everything is local (geddit?).

Creating the Jenkins job

The first step is to create the Jenkins job that will build our project into a dacpac, and deploy it to a local SQL Server.

The job definition is contained in the following Jenkinsfile, which consists of three fairly self-explanatory stages. The first stage, git checkout, checks out our master branch from the local repo (indicated by the file:// prefix). The second stage calls the MSBuild tool we defined earlier, taking advantage of the fact that our project is very simple to provide very few parameters on the command line. The third calls out to the shell to call sqlpackage.exe, again with very few parameters. This last stage is the one that requires whatever account Jenkins is running under to be able to authenticate to the SQL Server (though there are alternatives involving storing credentials in Jenkins).

Conveniently, since this is stored in a GitHub gist, and a gist is a git repo, I can create a new pipeline job, specify “Pipeline Script from SCM” in the build definition, then provide the url of the gist as the repository url.

Note that this repo contains a single file, namely the build definition - all the project files are coming from the git repo on our local machine. If you want to copy and paste the complete url, it’s in the build output log reproduced below.

Screenshot of Jenkins showing pipeline from SCM

One more detail is that the job needs to be set to “Poll SCM”, but the schedule can be left empty (ignore the warning). This isn’t required to test the build, but will be required later on to trigger the build on a commit to our git repo.

Jenkins Pipeline Job showing Poll SCM Setting with empty schedule

Testing the build

We should now be able to trigger the build from the Jenkins dashboard, and see some output like the following (under “Console Output”)

Started by user arapaima
Obtained Jenkinsfile from git https://gist.github.com/gavincampbell/e3dfa25abf427752e919eb2110f03852
[Pipeline] node
Running on master in C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo
[Pipeline] {
[Pipeline] stage
[Pipeline] { (git checkout)
[Pipeline] git
 > git.exe rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git.exe config remote.origin.url file:///C:/Projects/Chinook.JenkinsDemo # timeout=10
Fetching upstream changes from file:///C:/Projects/Chinook.JenkinsDemo
 > git.exe --version # timeout=10
 > git.exe fetch --tags --progress file:///C:/Projects/Chinook.JenkinsDemo +refs/heads/*:refs/remotes/origin/*
 > git.exe rev-parse "refs/remotes/origin/master^{commit}" # timeout=10
 > git.exe rev-parse "refs/remotes/origin/origin/master^{commit}" # timeout=10
Checking out Revision 8ccbac95d2edd4ce0cbf14ec9f5f3f7ac2868eac (refs/remotes/origin/master)
 > git.exe config core.sparsecheckout # timeout=10
 > git.exe checkout -f 8ccbac95d2edd4ce0cbf14ec9f5f3f7ac2868eac
 > git.exe branch -a -v --no-abbrev # timeout=10
 > git.exe branch -D master # timeout=10
 > git.exe checkout -b master 8ccbac95d2edd4ce0cbf14ec9f5f3f7ac2868eac
 > git.exe rev-list 8ccbac95d2edd4ce0cbf14ec9f5f3f7ac2868eac # timeout=10
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Build Dacpac from SQLProj)
[Pipeline] tool
[Pipeline] bat
[BuildDeploySsdtFromLocalRepo] Running batch script

C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo>"C:\Program Files (x86)\MSBuild\14.0\Bin\MSBuild.exe"  /p:Configuration=Release 
Microsoft (R) Build Engine version 14.0.25420.1
Copyright (C) Microsoft Corporation. All rights reserved.

Building the projects in this solution one at a time. To enable parallel build, please add the "/m" switch.
Build started 05/04/2017 00:55:50.
Project "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo.sln" on node 1 (default targets).
ValidateSolutionConfiguration:
  Building solution configuration "Release|Any CPU".
Project "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo.sln" (1) is building "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo\Chinook.JenkinsDemo.sqlproj" (2) on node 1 (default targets).
GenerateSqlTargetFrameworkMoniker:
Skipping target "GenerateSqlTargetFrameworkMoniker" because all output files are up-to-date with respect to the input files.
CoreCompile:
Skipping target "CoreCompile" because all output files are up-to-date with respect to the input files.
SqlBuild:
Skipping target "SqlBuild" because all output files are up-to-date with respect to the input files.
CopyFilesToOutputDirectory:
  Chinook.JenkinsDemo -> C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo\bin\Release\Chinook.JenkinsDemo.dll
SqlPrepareForRun:
  Chinook.JenkinsDemo -> C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo\bin\Release\Chinook.JenkinsDemo.dacpac
Done Building Project "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo\Chinook.JenkinsDemo.sqlproj" (default targets).
Done Building Project "C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo\Chinook.JenkinsDemo.sln" (default targets).

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:00.53
[Pipeline] stash
Stashed 1 file(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Deploy Dacpac to SQL Server)
[Pipeline] unstash
[Pipeline] bat
[BuildDeploySsdtFromLocalRepo] Running batch script

C:\Program Files (x86)\Jenkins\workspace\BuildDeploySsdtFromLocalRepo>"C:\Program Files (x86)\Microsoft SQL Server\130\DAC\bin\sqlpackage.exe" /Action:Publish /SourceFile:"Chinook.JenkinsDemo\bin\Release\Chinook.JenkinsDemo.dacpac" /TargetServerName:(local) /TargetDatabaseName:Chinook 
Publishing to database 'Chinook' on server '(local)'.
Initializing deployment (Start)
Initializing deployment (Complete)
Analyzing deployment plan (Start)
Analyzing deployment plan (Complete)
Updating database (Start)
Update complete.
Updating database (Complete)
Successfully published database.
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

The “stage view” of the build should be showing green:

passing builds in Jenkins

Triggering the build from the git repo

Many approaches to building a project automatically on each commit rely on the CI server, in this case Jenkins, polling the source control system at some pre-defined interval.

However, if the geo-political events of 2016 are any guide, polling doesn’t always lead to the outcome we expect, so in this case we’ll create a git “hook” to trigger the build on every commit to master.

A git hook is a script that runs in response to a specific event in git. There are a bunch of events that can fire hooks; they can be used to enforce commit message conventions, run static code analysis, prevent force-pushes, and all kinds of other arbitrary busywork. The event of interest here is the post-commit hook, which runs after a commit has been added. In “Real Life”, hooks are normally added to the server copy of the repo, but since there’s no remote in our setup, everything has to be local.

Notably, even though we’re on Windows in this example, hooks are executed by the msysgit subsystem, so much of the usual UNIX shell stuff will work.

Hooks are stored in the .git\hooks folder of our repo (the .git folder is usually hidden on Windows), in files named after the hook they implement. So, our file is called post-commit (no extension) and contains the #!/bin/sh “shebang” plus a single line that uses curl to hit a specific URL on the Jenkins server, which causes our Jenkins job to poll its source repo as defined in the build pipeline (i.e. file:///C:/Projects/Chinook.JenkinsDemo).

C:\Projects\Chinook.JenkinsDemo\.git\hooks>type post-commit
#!/bin/sh
curl http://localhost:8080/git/notifyCommit?url=file:///C:/Projects/Chinook.JenkinsDemo

Note for git beginners: hooks aren’t part of the repository’s history, so this file won’t be there when you clone the repo - you’ll need to add it yourself!

Now, all that remains is to make an arbitrary change to our project and add a commit.

Visual Studio showing changed table and commit dialog

If everything is working the pipeline job should be triggered, and the new build will also log that it was “Started by an SCM Change”.

Build Triggered by SCM Change

To clarify what’s happened here: we made a change and committed it to our local repo, the post-commit hook called the URL in Jenkins, and Jenkins then “polled” the git repo, found a new commit, and triggered the pipeline job.

If you commit the change from the git command line, you may see output along the lines of:

$ git commit -am "add shoesize to artist table"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   135  100   135    0     0   9000      0 --:--:-- --:--:-- --:--:--  9000Scheduled polling of BuildDeploySsdtFromLocalRepo
No Git consumers using SCM API plugin for: file:///C:/Projects/Chinook.JenkinsDemo

The “error” message is nothing to worry about; as long as the line “Scheduled polling of myJobName” is present, everything is hunky-dory!

Until next time, happy triggering!

Recovering from the "La-La-Land Moment"

How SSDT can help with restoring a SQL Server database to “just before that last deployment”

For as long as I can remember, SSDT and its predecessors have had the option to “Back up database before deployment”, currently available in the “Advanced Publish Settings” dialog, among other places. Regrettably, I’ve never really had much use for this particular option. Whilst restoring from backup might be a valid strategy for recovering from some kinds of deployment disaster, this could add a great deal of time to the deployment process, assuming a database of non-trivial size.

In any case, there are really only two kinds of database - the ones where you don’t care about the data, such as development and test environments, and the ones (one?) where you do. In the former case, backups don’t matter anyway, and in the latter case it is to be hoped that there is already some kind of backup solution in place, ideally managed by someone who isn’t you.

Further, the “Backup database before deployment” option appears not to offer much control over how the database is backed up. The backup will be created in the default backup directory of the instance, and will be a full rather than a copy-only backup. This has the potential to endanger the “real” disaster recovery plans if these include differential database backups: the SSDT-generated backup will reset the differential base, meaning that this backup - which the DBAs don’t even know about - will be needed, along with the subsequent differentials, should the database ever have to be restored.
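If a pre-deployment safety net is genuinely wanted, one option is to take the backup yourself as a copy-only backup, which leaves the differential base untouched. A minimal sketch - the database name and backup path here are made up for illustration:

-- COPY_ONLY keeps this backup out of the differential chain,
-- so the DBAs' "real" backup strategy is unaffected.
BACKUP DATABASE Chinook
TO DISK = N'C:\Backups\Chinook_PreDeploy.bak'
WITH COPY_ONLY, INIT, STATS = 10;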

It’s likely that the “real” backup strategy will involve some combination of full, differential, and transaction log backups, and if you’re especially lucky will be managed by some kind of “Enterprise Backup Solution”, which will allow rapid restores to any point in time, just as soon as you figure out where it is that the “Enterprise Backup Solution” keeps the backup files for your database.

So, assuming the worst case scenario has arisen, your newly deployed release has been writing rubbish data to the database for the last few hours, the “war room” has been convened and someone you’ve never heard of called the “Problem Management Executive Consultant” has decided that the right course of action is to “Restore the database to immediately before the deployment”, how are you to decide when this was?

Well, hopefully you know from your release logs what time the database was deployed, but what if there was a better way? And what if you aren’t sure what timezone the deployment server is in or whether it uses daylight savings time? Well, one solution to this involves a somewhat neglected feature of SQL Server known as Marked Transactions. (See, it doesn’t even get its own page in MSDN!)

Marked Transactions

The syntax BEGIN TRANSACTION tran-name WITH MARK 'Mark Description' will record the name of the transaction in the transaction log, along with the date, time, LSN, etc. The description is optional, but that gets saved too.

We can see this in action using the pubs database, which if you’re younger than a certain age, sadly isn’t what you think.

Note that if you’re playing along at home, there are a couple of extra considerations: the database needs to be using the full (or bulk-logged) recovery model, and a full backup must already have been taken, i.e. the database is not in the pseudo-simple recovery model.
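If your copy of pubs isn’t already set up this way, something along the lines of the following will do it. This is just a sketch, and the backup path is an assumption - adjust to taste.

-- Switch to the full recovery model so that log records (and marks) are retained...
ALTER DATABASE pubs SET RECOVERY FULL;
-- ...and take a full backup to start the log chain.
BACKUP DATABASE pubs
TO DISK = N'C:\Backups\pubs_full.bak'
WITH INIT;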

USE pubs;
GO

BEGIN TRANSACTION update_auths WITH MARK 'update authors entry'
UPDATE authors SET phone = '408 496-7223' WHERE au_id = '172-32-1176'
COMMIT TRANSACTION

If we take a peek at the transaction log we can see our marked transaction there:

SELECT  [Transaction ID] ,
        [Current LSN] ,
        [Transaction Name] ,
        [Operation] ,
        Description ,
        [Transaction SID] ,
        sp.name
FROM    fn_dblog(NULL, NULL) AS f
        JOIN sys.server_principals sp ON f.[Transaction SID] = sp.sid
WHERE   [transaction Name] = 'update_auths'
The single row returned (transposed here for readability):

Transaction ID:    0000:000004ee
Current LSN:       00000022:000005f2:0001
Transaction Name:  update_auths
Operation:         LOP_BEGIN_XACT
Description:       2017/03/16 18:09:58:327;update_auths;0x0105000000000005150000004cca9a3fa9173a6eba0c5dc9e9030000
Transaction SID:   0x0105000000000005150000004CCA9A3FA9173A6EBA0C5DC9E9030000
name:              ARAPAIMA\Arapaima

The logmarkhistory table in msdb also stores a list of our marked transactions:

SELECT * FROM msdb..logmarkhistory
database_name:  pubs
mark_name:      update_auths
description:    update authors entry
user_name:      ARAPAIMA\Arapaima
lsn:            34000000152200003
mark_time:      2017-03-16 18:09:58.327

So, it seems we can use this functionality to give us a named point in the transaction log to restore to in the event of career-threatening disaster.

The thing to note is that we need to update something in the database where we’re creating the mark, but it doesn’t really matter what we update. It’s often useful to have a Releases table in our databases where we can store the version number, date, and other fun facts about our database, so this is as good a candidate as any.

CREATE TABLE [dbo].[Releases]
(
    ReleaseVersion VARCHAR(20) NOT NULL PRIMARY KEY,
    DateDeployed DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    --Not taking any chances with the time zone!
)

We need to create this table and add it to our database project.

I’ve created a proc to do the inserting:

CREATE PROCEDURE [dbo].[spFireInTheHole]
	@ReleaseVersion varchar(20)
AS
BEGIN TRANSACTION @ReleaseVersion WITH MARK @ReleaseVersion 
  INSERT INTO dbo.Releases(ReleaseVersion) VALUES (@ReleaseVersion);
COMMIT TRANSACTION
RETURN 0

And now for the tricky part…

The first time we deploy the database project after adding these objects, they won’t be present in the target database, so we won’t be able to call them from the pre-deployment script. This means we need to wrap them with IF EXISTS in the pre-deployment script so that they don’t get called if they don’t exist. The drawback of this approach, naturally, is that “Release 0.0.1” won’t be recorded in your database for posterity. In my experience of such matters, the gap between Release 0.0.1 and Release 0.0.2 is normally measured in minutes rather than years, so I’m not particularly concerned about this.

Finally, we are going to use a sqlcmd variable to hold the release number; this means we can pass it in at deployment time. Most deployment tools know how to do this; for this example we’ll just pass it on the command line.

The pre-deployment script

/*
 Pre-Deployment Script Template							
--------------------------------------------------------------------------------------
 This file contains SQL statements that will be executed before the build script.	
 Use SQLCMD syntax to include a file in the pre-deployment script.			
 Example:      :r .\myfile.sql								
 Use SQLCMD syntax to reference a variable in the pre-deployment script.		
 Example:      :setvar TableName MyTable							
               SELECT * FROM [$(TableName)]					
--------------------------------------------------------------------------------------
*/

IF EXISTS(SELECT * FROM sys.tables where name = 'Releases') 
AND 
EXISTS (SELECT * FROM sys.procedures WHERE NAME = 'spFireInTheHole')
BEGIN
	
        EXEC spFireInTheHole @ReleaseVersion = '$(ReleaseVersion)';
END

Now, after we build our dacpac, we supply the version number at deploy time:

sqlpackage.exe /Action:Publish /SourceFile:pubs.dacpac /TargetServerName:(local) /TargetDatabaseName:pubs /v:ReleaseVersion=0.2.0

If we run a few “Releases”, we can see the marks building up in msdb..logmarkhistory:

database_name  mark_name  description  user_name          lsn                mark_time
pubs           0.0.2      0.0.2        ARAPAIMA\Arapaima  34000000172100002  2017-03-16 20:06:40.490
pubs           0.0.3      0.0.3        ARAPAIMA\Arapaima  34000000172200003  2017-03-16 20:07:14.673
pubs           0.0.3      0.0.3        ARAPAIMA\Arapaima  34000000172300002  2017-03-16 20:08:16.183
pubs           0.0.4      0.0.4        ARAPAIMA\Arapaima  34000000172400003  2017-03-16 20:08:39.760

In this particular setup, you’ll get an error if you try to supply the same version number more than once, as ReleaseVersion is the primary key of the Releases table.

Disaster Strikes

So, the career-threatening disaster has happened, and we need to restore our database to its state immediately prior to release 0.6.0. If there hasn’t been a log backup since the deployment, the first step is to make one.

Then, with a bit of DBA magic, we can restore our database to just before the deployment started. I generated these scripts from SQL Server Management Studio; unless you’ve got a competent adult handy, I suggest you do the same. The crucial bit is in the last line, where we specify STOPBEFOREMARK, which is exactly what we want to do.

USE [master]
BACKUP LOG [pubs] TO  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs_LogBackup_2017-03-16_21-05-56.bak' WITH NOFORMAT, NOINIT,  NAME = N'pubs_LogBackup_2017-03-16_21-05-56', NOSKIP, NOREWIND, NOUNLOAD,  NORECOVERY ,  STATS = 5
RESTORE DATABASE [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 2,  NORECOVERY,  NOUNLOAD,  STATS = 5
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 3,  NORECOVERY,  NOUNLOAD,  STATS = 10
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 4,  NORECOVERY,  NOUNLOAD,  STATS = 10
GO
RESTORE LOG [pubs] FROM  DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Backup\pubs.bak' WITH  FILE = 5,  NOUNLOAD,  STATS = 10,  STOPBEFOREMARK = N'0.6.0' AFTER N'2017-03-16T20:46:07'
GO

And with that, the database is restored to its state immediately prior to the deployment (you can check this by looking at the Releases table), and all that is left to do is blame the guy that left last month for the “rogue code” that “crept” into the release. This is normally possible by editing old git commit messages and force-pushing the master branch, but that’s a topic for another day.

A script to download and install the Pester testing framework for PowerShell

I finally got around to looking into Pester for test-driven development with PowerShell. In case it’s useful to anyone, and so that I don’t lose it, I created a script to download the latest release version, store it in the user’s PowerShell modules folder, and run a quick test to make sure everything is working. The script is stored in a GitHub Gist and is reproduced below. Some of the cmdlets require a fairly recent version of PowerShell, possibly even 5.0. The steps are roughly as follows:

  • Figure out what the latest release number is.
  • Assemble the correct url to download the zip file.
  • Download and “Unblock” the zip file.
  • Figure out where the user’s modules folder is and create it if it doesn’t exist. I had to exclude the vscode modules folder for this to work for me.
  • Unzip the downloaded file to the right place.
  • Import the newly installed modules.
  • Scaffold a test with New-Fixture. The name is randomised to minimise the chance of overwriting something that’s already there.
  • Hack the generated files about a bit.
  • Run the test.
  • Clean up the generated files.
  • Profit!

The script

Testing whether a branch exists before checking out in Jenkins Pipeline

For reasons, I recently found myself in a scenario where I needed to test whether a branch existed before checking it out, falling back to a sensible default - such as checking out master - if it didn’t. From the command line, this is a simple matter of git branch -l | grep myBranch, but I needed to do this from the context of a Jenkins pipeline job.

Preliminaries

For simplicity, I’m creating a local repo I can point my Jenkins job at, right inside the JENKINS_HOME folder.

jenkins@d99f4f77acdb:/$ cd $JENKINS_HOME
jenkins@d99f4f77acdb:~$ mkdir testBranch ; cd testBranch
jenkins@d99f4f77acdb:~/testBranch$ git init
Initialized empty Git repository in /var/jenkins_home/testBranch/.git/
jenkins@d99f4f77acdb:~/testBranch$ echo "Half a league, half a league, Half a league onward." > myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git add .
jenkins@d99f4f77acdb:~/testBranch$ git commit -m "Initial Commit"
[master (root-commit) 7ef4b3b] Initial Commit
 1 file changed, 1 insertion(+)
 create mode 100644 myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git checkout -b thisBranchExists
Switched to a new branch 'thisBranchExists'
jenkins@d99f4f77acdb:~/testBranch$ echo "All in the valley of Death Rode the six hundred" >> myFile.txt
jenkins@d99f4f77acdb:~/testBranch$ git commit -am "a change"
jenkins@d99f4f77acdb:~/testBranch$ git checkout master
Switched to branch 'master'
jenkins@d99f4f77acdb:~/testBranch$ git log --oneline --decorate --all --graph
* 550587f (thisBranchExists) a change
* 7ef4b3b (HEAD, master) Initial Commit

With that out of the way, we have a repo with two branches, one of which is a single commit ahead of the other. Our goal, in the pipeline script, is to checkout the thisBranchExists branch, if it exists, or to fall back to master if it doesn’t.

The Pipeline script

I’ve created a simple script in the Pipeline UI to demonstrate the use of resolveScm to figure out whether the branch we want exists. resolveScm will go through the contents of targets in order, and return the first one that matches a branch name in the repo. We then checkout the branch and print the contents of the file to the console with cat.

node{
def b = resolveScm source: [$class: 'GitSCMSource', credentialsId: '', 
    excludes: '', id: '_', ignoreOnPushNotifications: false, includes: '*', 
    remote: 'file://$JENKINS_HOME/testBranch'], targets: ['thisBranchExists', 'master']
checkout b
sh 'cat myFile.txt'
   
}

The Happy path

When we run this job we get the following output, indicating that our branch was located successfully. The contents of the file after the second commit are printed to the console.

Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchExists, master]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch master
Checking branch thisBranchExists
Found thisBranchExists at revision 550587fa43778989f50ca314feea38fdee50b21c
[Pipeline] checkout
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url file:///var/jenkins_home/testBranch # timeout=10
Fetching upstream changes from file:///var/jenkins_home/testBranch
 > git --version # timeout=10
 > git fetch --tags --progress file:///var/jenkins_home/testBranch +refs/heads/*:refs/remotes/origin/*
Checking out Revision 550587fa43778989f50ca314feea38fdee50b21c (thisBranchExists)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 550587fa43778989f50ca314feea38fdee50b21c
 > git rev-list 550587fa43778989f50ca314feea38fdee50b21c # timeout=10
[Pipeline] sh
[checkout_branch_if_exists] Running shell script
+ cat myFile.txt
Half a league, half a league, Half a league onward.
All in the valley of Death Rode the six hundred
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

The “less-happy” path

Let’s see what happens if we change our list of targets to ['thisBranchDoesNotExist', 'master']. This time we don’t find our “preferred” branch, so we fall back to master and print the contents of the file as it was after the first commit.

[Pipeline] node
Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchDoesNotExist, master]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch thisBranchExists
Checking branch master
Done.
Found master at revision 7ef4b3b3eaf296528e47cdf8946545b38f49fb95
[Pipeline] checkout
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url file:///var/jenkins_home/testBranch # timeout=10
Fetching upstream changes from file:///var/jenkins_home/testBranch
 > git --version # timeout=10
 > git fetch --tags --progress file:///var/jenkins_home/testBranch +refs/heads/*:refs/remotes/origin/*
Checking out Revision 7ef4b3b3eaf296528e47cdf8946545b38f49fb95 (master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7ef4b3b3eaf296528e47cdf8946545b38f49fb95
 > git rev-list 550587fa43778989f50ca314feea38fdee50b21c # timeout=10
[Pipeline] sh
[checkout_branch_if_exists] Running shell script
+ cat myFile.txt
Half a league, half a league, Half a league onward.
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

The Unhappy path

Finally, let’s see what happens if none of the branches we specify in targets exist in the repo. Regrettably, the output is rather shorter, as the step fails if no matching branches are found.

Running on master in /var/jenkins_home/workspace/checkout_branch_if_exists
[Pipeline] {
[Pipeline] resolveScm
Checking for first existing branch from [thisBranchDoesNotExist, thisBranchDoesNotExistEither]...
 > git rev-parse --is-inside-work-tree # timeout=10
Setting origin to file://$JENKINS_HOME/testBranch
 > git config remote.origin.url file://$JENKINS_HOME/testBranch # timeout=10
Fetching & pruning origin...
Fetching upstream changes from origin
 > git --version # timeout=10
 > git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* --prune
Getting remote branches...
Seen branch in repository origin/master
Seen branch in repository origin/thisBranchExists
Seen 2 remote branches
Checking branch master
Checking branch thisBranchExists
Done.
Could not find any matching branch%n
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: Could not find any matching branch
Finished: FAILURE

What's in a name?

Continuing our horticultural theme, in this article we’ll look at the built-in support in SSDT for renaming database objects including tables, columns, and programmable objects, as well as peering into the details of how these changes are managed at deployment time.

That which we call a rose, by any other name would smell as sweet

Renaming columns

Refactoring Databases, p 109

The easy way

We can rename a column just by right-clicking in the CREATE TABLE script and selecting Refactor → Rename.

Renaming the InvoiceId Column by right-clicking in the editor window

Under normal circumstances, renaming the Primary Key column of a table such as “Invoices” would be a recipe for disaster, but SSDT can help to ease such changes by automatically updating all references to the column to use the new name. In this case we are renaming the column InvoiceId to Invoice_Id, and by specifying the option to preview the changes, we can see a list of all the objects that reference this column by its old name.

SSDT shows a preview of which objects will be updated to refer to the new name

There’s something of note here, which is that only the InvoiceId column from the Invoices table is being renamed; any other columns called InvoiceId (such as the one in the InvoiceLine table) are unaffected. The foreign key constraint on that particular column, however, is updated to use the new name of the referenced column.

What this demonstrates is that there is something more than global search and replace going on here; SSDT is using its in-memory model of the database to determine which changes need to be made1.

The refactorlog

When we click apply, two things happen. The first is that all the references to this column are updated to use the new name. The second is that a new file appears in the solution, with the extension .refactorlog.

<?xml version="1.0" encoding="utf-8"?>
<Operations Version="1.0" xmlns="http://schemas.microsoft.com/sqlserver/dac/Serialization/2012/02">
  <Operation Name="Rename Refactor" Key="209f3afd-7195-401f-853f-aa3a906d39db" ChangeDateTime="11/08/2016 20:02:31">
  <Property Name="ElementName" Value="[dbo].[Invoice].[InvoiceId]" />
  <Property Name="ElementType" Value="SqlSimpleColumn" />
  <Property Name="ParentElementName" Value="[dbo].[Invoice]" />
  <Property Name="ParentElementType" Value="SqlTable" />
  <Property Name="NewName" Value="[Invoice_Id]" />
  </Operation>
</Operations>

This file is how sqlpackage.exe (or SSDT publish, or DacFX.Deploy) will determine at deploy time that we are renaming this column from InvoiceId to Invoice_Id rather than dropping the InvoiceId column and creating a new column called Invoice_Id. We can see in the XML that this is specifying the column and table name, and the precise action to perform. This is known, in the jargon, as “preserving the intent” of the refactoring. If we build a project containing a .refactorlog file and examine the resulting .dacpac, we can see that the regular .dacpac contents have been joined by a refactor.xml file.

$ unzip -l Refactoring.Chinook.dacpac
Archive:  Refactoring.Chinook.dacpac
  Length      Date    Time    Name
---------  ---------- -----   ----
    69711  2016-11-10 05:44   model.xml
      606  2016-11-10 05:44   refactor.xml
      203  2016-11-10 05:44   DacMetadata.xml
     1118  2016-11-10 05:44   Origin.xml
      175  2016-11-10 05:44   [Content_Types].xml
---------                     -------
    71813                     5 files

The contents of this file are the same as the .refactorlog file from our solution.

When we publish the project we see the following output:

The following operation was generated from a refactoring log file 209f3afd-7195-401f-853f-aa3a906d39db
Rename [dbo].[Invoice].[InvoiceId] to Invoice_Id
Caution: Changing any part of an object name could break scripts and stored procedures.

Altering [dbo].[InvoicesWithLineTotals]...
Altering [dbo].[UpdateInvoiceBillingAddress]...
Update complete.

The publish action has read the refactorlog file and taken the appropriate action. In addition, the “key” for this refactoring has been stored in a new table in our database called dbo.__RefactorLog, which gets created the first time we deploy a dacpac containing a refactor.xml file:

-- Refactoring step to update target server with deployed transaction logs
IF NOT EXISTS (SELECT OperationKey FROM [dbo].[__RefactorLog] WHERE OperationKey = '209f3afd-7195-401f-853f-aa3a906d39db')
INSERT INTO [dbo].[__RefactorLog] (OperationKey) values ('209f3afd-7195-401f-853f-aa3a906d39db')

On subsequent deployments, this table is read and any refactorings recorded here are skipped from the deployment.
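If you’re curious which refactorings a given target database has already recorded, you can query the table directly - a trivial check along these lines:

-- Operation keys already applied to this database; matching entries
-- in refactor.xml will be skipped on the next deployment.
SELECT OperationKey FROM [dbo].[__RefactorLog];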

Appearances can be deceptive

There’s another UI wrinkle here which is worth examining, so we will rename another column, this time using the table designer.

Another entry has been added to the refactorlog file, and it appears as if the references to this column elsewhere in the model have been updated (note the red tick showing that PlaylistTrack.sql and InvoiceLine.sql have been modified).

Renaming a column using the table designer

However, when we go to build the project, we get an error:

SQL71501: Procedure: [dbo].[ChangeTrackPriceByFactor] has an unresolved reference to object [dbo].[Track].[TrackId].	

(Alternatively, the error will appear in the SSDT UI as soon as the Intellisense catches up.) There was a stored procedure referencing this column by name, which is now causing the build to fail (remember that deferred name resolution works for table names but not for column names).

This is because of a detail of how foreign keys - and primary keys, for that matter - are maintained by SQL Server itself. If we look at the contents of sys.foreign_key_columns, there isn’t a column name in sight (sys.index_columns looks much the same):

constraint_object_id  constraint_column_id  parent_object_id  parent_column_id  referenced_object_id  referenced_column_id
1045578763            1                     885578193         3                 565577053             1
917578307             1                     565577053         3                 597577167             1
965578478             1                     725577623         2                 629577281             1
933578364             1                     629577281         13                661577395             1
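For reference, the rows above come from a plain query of the catalog view, something like:

SELECT  constraint_object_id,
        constraint_column_id,
        parent_object_id,
        parent_column_id,
        referenced_object_id,
        referenced_column_id
FROM    sys.foreign_key_columns;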

If we fix the reference in our stored procedure and go on to generate a publish script, all we see for this change is

PRINT N'The following operation was generated from a refactoring log file 8a905288-76de-4cc4-aad9-c6dddf081a17';

PRINT N'Rename [dbo].[Track].[TrackId] to Track_Id';


GO
EXECUTE sp_rename @objname = N'[dbo].[Track].[TrackId]', @newname = N'Track_Id', @objtype = N'COLUMN';


GO
PRINT N'Altering [dbo].[ChangeTrackPriceByFactor]...';


GO
ALTER PROCEDURE [dbo].[ChangeTrackPriceByFactor]
	@TrackID int,
	@Factor NUMERIC(10, 2)
AS
	UPDATE Track SET UnitPrice *= @Factor WHERE Track_Id = @TrackID;

RETURN 0
GO
-- Refactoring step to update target server with deployed transaction logs
IF NOT EXISTS (SELECT OperationKey FROM [dbo].[__RefactorLog] WHERE OperationKey = '8a905288-76de-4cc4-aad9-c6dddf081a17')
INSERT INTO [dbo].[__RefactorLog] (OperationKey) values ('8a905288-76de-4cc4-aad9-c6dddf081a17')

GO

The refactorlog is updated to include this change, but there are no changes deployed to any of the tables that referenced the TrackId column via foreign keys. This is because foreign keys - as noted above - don’t really use column names; they use object and column IDs, so it is sufficient to rename the referenced column with sp_rename. Stored procedures, triggers, and other programmable objects, however, do reference columns and tables by name - in sys.sql_modules - so these references aren’t updated automatically.

However, SSDT itself takes into consideration that when we rename a column referenced by a foreign (or primary) key, the definition required to create the constraint from scratch will need to be updated, which is why the files containing the referencing tables are all updated.

This behaviour may seem inconsistent, but it is in fact consistent with the behaviour of sp_rename itself, which is to say that constraints and indexes aren’t broken by sp_rename, but stored procedures, triggers, etc. are.
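If you want to double-check for stragglers after such a rename, one crude but serviceable sketch is to search sys.sql_modules for the old column name - 'TrackId' in this example:

-- Modules whose definitions still mention the old name.
-- Crude: a LIKE search will also match comments and unrelated identifiers.
SELECT  OBJECT_SCHEMA_NAME(object_id) AS [schema_name],
        OBJECT_NAME(object_id) AS [module_name]
FROM    sys.sql_modules
WHERE   [definition] LIKE '%TrackId%';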

The wrong way

In contrast, renaming a column by editing the Transact-SQL file directly delivers the promised disaster, as SSDT will attempt to drop the column with the old name and create a new column with the new name.

In the best-case scenario we get a validation error that stops the project from building, assuming the renamed column is referenced by some other object in the project.

Renaming a column by editing the .sql file

In all other scenarios, the script executed at deploy time is as follows:

PRINT N'Altering [dbo].[Genre]...';

GO
ALTER TABLE [dbo].[Genre] DROP COLUMN [Name];

GO
ALTER TABLE [dbo].[Genre]
    ADD [GenreName] NVARCHAR (120) NULL;

The only safety net left at this point is the BlockOnPossibleDataLoss deploy-time option, which is enabled by default; as the message below shows, this will stop the deployment from proceeding if rows are detected in the affected table.

Msg 50000, Level 16, State 127, Line 48
Rows were detected. The schema update is terminating because data loss might occur.
** An error was encountered during execution of batch. Exiting.

Renaming tables

Refactoring Databases, p 113

There are two options in the refactoring context menu that are relevant to naming tables; “Rename” and “Move to Schema”. In some RDBMSs, notably Oracle, the notion of a schema is tightly coupled to the notion of a user, such that the user account in question “owns” the tables and other objects contained therein. SQL Server implemented a similar concept prior to SQL Server 2005, when the link between users and schemas was severed such that a schema became more like a namespace, or even a filesystem folder, since a schema remains a securable object. Under either analogy - namespace or folder - the name of the schema can be considered to be a part of the (qualified) name of the table, meaning that moving an object to a new schema is merely a special case of renaming.

The right way

Right-click refactor menu for a table

When we rename a table, we get the usual “refactor preview” showing the changes about to be applied to the project:

Refactor preview for renaming a table

On clicking apply, the relevant objects are updated and a new entry is inserted into the refactorlog file:

<Operation Name="Rename Refactor" Key="cce28c30-adb0-4019-876e-d93cc2ca0011" ChangeDateTime="11/12/2016 14:22:18">
   <Property Name="ElementName" Value="[dbo].[Artist]" />
   <Property Name="ElementType" Value="SqlTable" />
   <Property Name="ParentElementName" Value="[dbo]" />
   <Property Name="ParentElementType" Value="SqlSchema" />
   <Property Name="NewName" Value="[Artiste]" />
 </Operation>

The process for moving a table between schemas is similar; again, we are presented with a preview of the changes about to be made:

Refactor preview for move to schema

and a new entry is made in the refactorlog file.

<Operation Name="Move Schema"Key="985e03c6-37f8-48c8-8ce8-5ed37fbb7c00"ChangeDateTime="11/12/2016 14:46:31">
  <Property Name="ElementName" Value="[dbo].[Invoice]" />
  <Property Name="ElementType" Value="SqlTable" />
  <Property Name="NewSchema" Value="Sales" />
  <Property Name="IsNewSchemaExternal" Value="False" />
</Operation>

Finally, when we come to deploy our change, the table is moved to the new schema and all the referencing objects are updated:

PRINT N'The following operation was generated from a refactoring log file 985e03c6-37f8-48c8-8ce8-5ed37fbb7c00';

PRINT N'Move object [dbo].[Invoice] to different schema [Sales]';

GO
ALTER SCHEMA [Sales] TRANSFER [dbo].[Invoice];

GO
ALTER VIEW [dbo].[InvoicesWithLineTotals] 
	AS SELECT I.[Invoice_Id],
	InvoiceTotal
	FROM [Sales].Invoice AS I CROSS APPLY dbo.CalculateInvoiceTotal(I.[Invoice_Id]);
GO

The wrong way

As with columns, if we rename a table by editing the .sql file directly, we get a potentially undesirable outcome, namely that a new table will be created with the new name, possibly at the expense of the current table. If we are lucky, we get a validation error that stops the project from building:

Error from renaming a table in the sql file

If we are less lucky we may get some warnings (remember that deferred name resolution means that a missing table is only a warning rather than an error in a stored procedure), but at deploy time we will get a new table created with the new name, and possibly even a DROP TABLE for the existing table, assuming we have the appropriate options set. (By default, SSDT won’t drop objects from the database unless we specify “Drop objects in target but not in source”).

PRINT N'Dropping [dbo].[PlaylistTrack]...';
GO
DROP TABLE [dbo].[PlaylistTrack];
GO
PRINT N'Creating [dbo].[Playlist_Track]...';
GO
CREATE TABLE [dbo].[Playlist_Track] (
    [PlaylistId] INT NOT NULL,
    [TrackId]    INT NOT NULL,
    CONSTRAINT [PK_PlaylistTrack] PRIMARY KEY NONCLUSTERED ([PlaylistId] ASC, [TrackId] ASC)
);

Renaming Programmable Objects (Views, Functions, Stored Procedures, and other miscellany in [sys.sql_modules])

Refactoring Databases, p 117

As noted in the discussion of dropping programmable objects, operations on these objects carry substantially less risk of catastrophic data loss and consequent unemployment than similar operations involving tables and columns.

It is still important to use the Refactor → Rename and Refactor → Move to Schema techniques to rename these objects rather than directly editing the .sql files, so that an entry is written to the .refactorlog file. This will ensure that the existing object is altered rather than a new one created at the expense of the old one.

<Operation Name="Rename Refactor" Key="fc4b928d-9b00-4028-9cac-5859ba9b666c" ChangeDateTime="11/12/2016 22:50:20">
  <Property Name="ElementName" Value="[dbo].[ChangeTrackPriceByFactor]" />
  <Property Name="ElementType" Value="SqlProcedure" />
  <Property Name="ParentElementName" Value="[dbo]" />
  <Property Name="ParentElementType" Value="SqlSchema" />
  <Property Name="NewName" Value="[ChangeTrackPriceByMultiplier]" />
</Operation>

  1. It isn’t magic; this doesn’t work with dynamic SQL, for instance.