Tuesday, July 12, 2016

Script to Copy Data Between Two SQL Server Databases

I recently had a requirement to "copy" the data between two identical database schemas. Actually, the requirement was to "rebuild" a database from the ground up (objects and data). I almost always use Visual Studio SQL Server projects (available after installing SSDT, as well as with the latest versions of Visual Studio) when doing database development, which means I can deploy an "empty" database with more or less the click of a mouse, so I won't get into creating the actual database, only populating it. If you need to copy the database objects to a new database, SQL Server Management Studio has built-in tools for this; check out this TechNet post for more information. You can also generate DML (data population) scripts from SSMS, but that can result in massive files if you have a database with a decent number of rows. My databases were rather large, so I wanted to copy the data from one database to the other without having to explicitly write anything to disk.

This stored procedure handles tables with identity columns as well as computed columns. It can also be limited (via @OnlyPopulateEmptyTablesInd) to populating only tables that currently contain zero rows. You might have different requirements, so feel free to enhance it and post any enhancements that you think would be valuable to others.

CREATE PROCEDURE [dbo].[apUtilCopyDataFromDatabase]
    @SourceLinkedServerName     VARCHAR(128) = NULL,
    @SourceDatabaseName         VARCHAR(128),
    @SourceSchemaName           VARCHAR(128),
    @SourceTableNamePrefix      VARCHAR(128) = '',
    @TargetLinkedServerName     VARCHAR(128) = NULL,
    @TargetDatabaseName         VARCHAR(128) = NULL,
    @TargetSchemaName           VARCHAR(128) = 'dbo',
    @OnlyPopulateEmptyTablesInd BIT = 0
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @SQLCursorData       NVARCHAR(MAX),
            @SQLTruncate         VARCHAR(MAX),
            @SQL                 VARCHAR(MAX),
            @TableName           SYSNAME,
            @SchemaName          VARCHAR(MAX),
            @HasIdentity         BIT,
            @CurrentDatabaseName VARCHAR(128) = DB_NAME(),
            @TargetDatabase      VARCHAR(128) = COALESCE(@TargetDatabaseName, DB_NAME()); --ToDo: Clean up confusing names

    -- Disable all FK constraints in the target database while we load.
    DECLARE @SQLNoCheck NVARCHAR(4000) = 'USE ' + QUOTENAME(@TargetDatabase) + '; EXEC sp_MSforeachtable @command1 = ''ALTER TABLE ? NOCHECK CONSTRAINT ALL''; USE ' + QUOTENAME(@CurrentDatabaseName) + ';';
    EXEC sp_executesql @SQLNoCheck;

    -- Build the fully-qualified source prefix: [LinkedServer].[Database].[Schema]
    SET @SourceDatabaseName = IIF(@SourceLinkedServerName IS NOT NULL, QUOTENAME(@SourceLinkedServerName) + '.', '')
                              + QUOTENAME(@SourceDatabaseName) + '.'
                              + QUOTENAME(@SourceSchemaName);

    -- Build the fully-qualified target prefix: [LinkedServer].[Database].[Schema]
    SET @TargetDatabaseName = IIF(@TargetLinkedServerName IS NOT NULL, QUOTENAME(@TargetLinkedServerName) + '.', '')
                              + IIF(@TargetDatabaseName IS NOT NULL, QUOTENAME(@TargetDatabaseName) + '.', '')
                              + QUOTENAME(@TargetSchemaName);

    -- Cursor over the target tables to populate. When @OnlyPopulateEmptyTablesInd = 1,
    -- only tables with zero rows qualify.
    SET @SQLCursorData = 'DECLARE TableCursor CURSOR GLOBAL FOR
        SELECT
            s.name
            ,t.name
        FROM ' + QUOTENAME(@TargetDatabase) + '.sys.tables t
            JOIN ' + QUOTENAME(@TargetDatabase) + '.sys.schemas s
                ON t.schema_id = s.schema_id
            JOIN ' + QUOTENAME(@TargetDatabase) + '.sys.objects AS o
                ON t.object_id = o.object_id
            JOIN ' + QUOTENAME(@TargetDatabase) + '.sys.partitions AS p
                ON o.object_id = p.object_id
        WHERE
            t.type = ''U''
            AND t.name LIKE ''' + @SourceTableNamePrefix + '%''
        GROUP BY
            s.name
            ,t.name
        HAVING SUM(p.rows) = 0
            OR ' + CAST(@OnlyPopulateEmptyTablesInd AS VARCHAR(1)) + ' = 0;';

    EXEC sp_executesql @SQLCursorData;

    OPEN TableCursor;

    FETCH NEXT FROM TableCursor INTO @SchemaName, @TableName;

    DECLARE @ColumnList VARCHAR(MAX);

    WHILE @@FETCH_STATUS = 0
    BEGIN
        BEGIN TRY
            PRINT CHAR(13) + CHAR(10)
                  + '-----------------------'
                  + QUOTENAME(@SchemaName) + '.'
                  + QUOTENAME(@TableName)
                  + '-----------------------';

            --ToDo: Add logic to determine if there are FK constraints. TRUNCATE if not, DELETE if there are.
            IF (@OnlyPopulateEmptyTablesInd = 0)
            BEGIN
                SET @SQLTruncate = 'DELETE FROM '
                                   + QUOTENAME(@TargetDatabase) + '.'
                                   + QUOTENAME(@SchemaName) + '.'
                                   + QUOTENAME(@TableName) + ';';

                EXEC (@SQLTruncate);

                PRINT CAST(@@ROWCOUNT AS VARCHAR) + ' rows deleted.';
            END

            -- Does the target table contain an identity column?
            DECLARE @SQLHasIdentity NVARCHAR(1000) =
                'USE ' + QUOTENAME(@TargetDatabase) + ';
                 SELECT @val = CAST(SUM(CAST(is_identity AS TINYINT)) AS BIT)
                 FROM ' + QUOTENAME(@TargetDatabase) + '.sys.columns
                 WHERE object_id = OBJECT_ID(QUOTENAME(''' + @SchemaName + ''') + ''.'' + QUOTENAME(''' + @TableName + '''));
                 USE ' + QUOTENAME(@CurrentDatabaseName) + ';';

            EXEC sp_executesql @SQLHasIdentity, N'@val BIT OUTPUT', @val = @HasIdentity OUTPUT;

            SET @ColumnList = NULL;

            -- Build the column list, excluding computed columns and audit columns.
            DECLARE @SQLColumnList NVARCHAR(4000) =
                'USE ' + QUOTENAME(@TargetDatabase) + ';
                 SELECT @val = COALESCE(@val + '','', '''') + QUOTENAME(name)
                 FROM ' + QUOTENAME(@TargetDatabase) + '.sys.columns
                 WHERE object_id = OBJECT_ID(QUOTENAME(''' + @SchemaName + ''') + ''.'' + QUOTENAME(''' + @TableName + '''))
                     AND is_computed = 0
                     AND name NOT IN ( ''CreatedDate'', ''CreatedUser'', ''LastUpdatedDate'', ''LastUpdatedUser'' );
                 USE ' + QUOTENAME(@CurrentDatabaseName) + ';';

            EXECUTE sp_executesql @SQLColumnList, N'@val VARCHAR(MAX) OUTPUT', @val = @ColumnList OUTPUT;

            -- IDENTITY_INSERT must reference the same fully-qualified target table as the
            -- INSERT, and each statement needs its own terminator.
            SET @SQL = IIF(@HasIdentity = 1, 'SET IDENTITY_INSERT ' + @TargetDatabaseName + '.' + QUOTENAME(@TableName) + ' ON; ', '')
                       + 'INSERT INTO ' + @TargetDatabaseName + '.'
                       + QUOTENAME(@TableName) + ' (' + @ColumnList + ') SELECT '
                       + @ColumnList + ' FROM '
                       + @SourceDatabaseName + '.'
                       + QUOTENAME(@TableName) + '; '
                       + IIF(@HasIdentity = 1, 'SET IDENTITY_INSERT ' + @TargetDatabaseName + '.' + QUOTENAME(@TableName) + ' OFF;', '');

            EXEC (@SQL);

            PRINT CAST(@@ROWCOUNT AS VARCHAR) + ' rows inserted.';
        END TRY
        BEGIN CATCH
            PRINT ERROR_MESSAGE();
        END CATCH

        FETCH NEXT FROM TableCursor INTO @SchemaName, @TableName;
    END

    CLOSE TableCursor;

    DEALLOCATE TableCursor;

    -- Re-enable all FK constraints in the target database.
    DECLARE @SQLCheck NVARCHAR(4000) = 'USE ' + QUOTENAME(@TargetDatabase) + '; EXEC sp_MSforeachtable @command1 = ''ALTER TABLE ? CHECK CONSTRAINT ALL''; USE ' + QUOTENAME(@CurrentDatabaseName) + ';';
    EXEC sp_executesql @SQLCheck;

    RETURN 0;
END
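To give a sense of how it is called, here is a hypothetical invocation; the database names are placeholders, not from any real environment:

```sql
-- Hypothetical example: copy data for every dbo table from SourceDb
-- into the matching tables in TargetDb, but only touch empty tables.
EXEC [dbo].[apUtilCopyDataFromDatabase]
     @SourceLinkedServerName     = NULL,        -- source lives on this instance
     @SourceDatabaseName         = 'SourceDb',  -- placeholder names
     @SourceSchemaName           = 'dbo',
     @SourceTableNamePrefix      = '',          -- no prefix filter: all tables
     @TargetLinkedServerName     = NULL,
     @TargetDatabaseName         = 'TargetDb',
     @TargetSchemaName           = 'dbo',
     @OnlyPopulateEmptyTablesInd = 1;           -- skip tables that already have rows
```

Passing a linked server name for either side makes the generated statements use four-part names, so the same procedure can pull from or push to a remote instance.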


Tuesday, June 21, 2016

Visual Studio, TFS, Git, & GitHub

For those unfamiliar with TFVC, it is one of the two version control engines shipped with Team Foundation Server, available with Visual Studio Team Services, and available on Microsoft's GitHub "rival", CodePlex. Many developers assume TFVC when they refer to using TFS, but that assumption went out the window with TFS 2013, which began shipping with the option to use either TFVC or Git as the version control repository.

There are many sites, such as git-scm.com, that do a great job of describing Git and what makes it so powerful, so I won't get into specifics here, but I will say that it excels over most other version control platforms when it comes to large, distributed projects. One such project is the Linux operating system. In fact, it was Linux's creator, Linus Torvalds, who developed Git to support the large, distributed nature of Linux development. Even with smaller projects, developers are embracing Git, partially due to applications like GitHub that easily allow sharing and collaboration across the globe.

Git does such a great job that Microsoft itself is moving away from TFVC in favor of Git for many of its own projects. It's even hosting many of its public (open source) projects on GitHub including the .NET Core project.

...which brings us to GitHub. GitHub is not Git and Git is not GitHub. Git is a standalone version control application whereas GitHub is a web-based Git repository hosting service. GitHub allows users to create and share Git repositories and interact with them via the GitHub.com website, APIs, command line tools, IDE plugins, etc.

Back to Microsoft and its integrations with Git and GitHub...

1) CodePlex offers a Git version control option (it is the default):

2) TFS and Visual Studio Team Services offer a Git option (also the default option):

3) Visual Studio 2015 provides local Git repository integration as well as
GitHub integration:


If you landed here, it's because you spend at least part of your time developing on the Microsoft stack and are interested in Git. If you know and like Git and didn't know if or how it integrated with Microsoft development tools, you now know enough to find the resources you need to get started. If you've only heard about Git in passing and work with Microsoft development tools, I would urge you to do some research on Git and GitHub prior to starting your next project so that you can make an informed decision as to which version control system to use and whether you want to use a hosted repository service like GitHub.

Useful Links
  1. Interesting Wired article that gives some history of Git and GitHub
  2. The Register article about Microsoft moving projects to Git
  3. Visual Studio GitHub extension for all versions of Visual Studio prior to Visual Studio 2015 Update 1.
  4. posh-git PowerShell Interface Install Overview which is also part of the GitHub Desktop app install
  5. Combining TFVC and Git repos in the same TFS project blog post




Tuesday, May 31, 2016

Multiple Rows of Tabs/Docs in SSMS

For a long time I felt like a crazy person for constantly closing documents in SSMS just because I could not easily access the documents that no longer had a visible tab. I would find myself resizing Object Explorer, maximizing SSMS, etc. just to be able to see all the tabs. I'm sure I'm not the only one. In fact, Sergey Vlasov has created a Visual Studio plugin to deal with exactly this. Since SSMS is built on top of the Visual Studio shell, the plugin also works for those who work exclusively in SSMS.

Visual Studio Gallery Link


Thank you Sergey!

Wednesday, March 5, 2014

SSIS Lookup Transformation Using Salesforce.com as a Data Source

Pragmatic Works offers a Salesforce.com Source component as part of its Task Factory SSIS component library. Using this component along with the out-of-the-box SSIS Cache Transform, it is possible to populate a Lookup Transformation with Salesforce.com data.

*Note: this post assumes you have, at a minimum, installed the trial version of the Pragmatic Works Task Factory product.

Example scenario where this is useful: We have a data warehouse that contains customer account data. Each customer account can have a broker that is responsible for servicing the account (each account can have 0 to 1 brokers, and each broker can have 0 to many accounts). We want to write this data warehouse account data to a Salesforce.com Account object. The Salesforce.com Account object also has a relationship to a Broker object. In order for us to maintain this relationship in Salesforce.com, it is necessary to pass the internal Salesforce.com Broker Id when writing the data to Salesforce.com. Since we do not maintain the Salesforce.com Broker Id in our data warehouse, we must look it up from Salesforce.com using a Broker Number that is a unique broker identifier in our data warehouse. Since we cannot use the Salesforce.com Connection Manager in a Lookup Transformation, we will use an approach that leverages the Cache Transform to allow us to have a Lookup Transformation that uses Salesforce.com data.

Let's take a look at the Control Flow of our package. 

As you can see, this simple package contains two Data Flows. The first (PopulateLookupCache) populates the Cache Transform and the second (OleDBSourceToSalesforce) writes data to Salesforce.com. I've highlighted three of the components necessary to make this happen. The first is the PopulateLookupCache Data Flow and the second and third are Connection Managers of types CACHE and SALESFORCE.

Let's look at the PopulateLookupCache Data Flow.

This Data Flow uses the Pragmatic Works Salesforce.com Source component to retrieve data from Salesforce.com using a SOQL query and populates a Cache Transform (LookupCache) with the internal Salesforce.com Id and the external Id (BrokerNumber). Below is the Cache Connection Manager definition. Note the Index Position value of 1 for the BrokerNumber column. This means that BrokerNumber will be the column used to perform the lookup when we define our Lookup Transformation.
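As a sketch, the SOQL query feeding the source component might look something like the following. The Broker__c object and BrokerNumber__c field names are hypothetical placeholders; you would substitute the object and custom-field API names from your own Salesforce.com org:

```sql
SELECT Id, BrokerNumber__c FROM Broker__c
```

Id is the internal Salesforce.com record Id we want to retrieve, and BrokerNumber__c would hold the external identifier that matches the data warehouse's BrokerNumber.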

Now that we've populated our Cache Transform we can move on to our primary Data Flow that will perform our lookup using our cached Salesforce.com data.

In this data flow we retrieve Account data from a SQL Server database using our OleDBSource Connection Manager to write data to the Account object in Salesforce.com. In between our source and destination components we have 1) a Data Conversion transformation to convert our source system's BrokerNumber from non-unicode (VARCHAR/STR) to unicode (NVARCHAR/WSTR) and 2) our broker lookup that uses Salesforce.com data to retrieve the Account's Salesforce.com internal Broker Id using our source system's broker number.

Let's take a look at our lookup. As you can see, we're using the Cache connection manager connection type as opposed to the OLE DB connection manager connection type that we're so used to. This allows us to use the Salesforce.com data that we wrote to the Cache Transform in our previous data flow.

Next, we set the Connection to our LookupCache Connection Manager.

Once we've set our data source we can configure the lookup columns.

We join on our data warehouse BrokerNumber to retrieve the Salesforce Broker Id that we alias as Broker__c since this is the name of our Salesforce.com Account object's broker field.

As you can see from this post, using the Pragmatic Works Task Factory Salesforce.com Source component and the out-of-the-box Cache Transform component, it is possible to populate a Lookup Transformation with Salesforce.com data even though the Lookup Transformation does not directly support a Salesforce.com Connection Manager.


Friday, February 28, 2014

RapidMiner jTDS Network error: IOException: Connection refused: connect

Network error: IOException: Connection refused: connect

There are quite a number of reasons why you might get this error. However, assuming the following:
  1. You're using jTDS to connect to a SQL Server instance
  2. You're trying to use Windows authentication (not passing SQL Server credentials)
  3. You've installed the ntlmauth.dll library (link to download page).
Confirm the following via SQL Server Configuration Manager (Start->Run->SQLServerManager11.msc):

1.  The TCP/IP protocol is enabled.

2.  Your SQL Server instance is listening on the SQL Server default port of 1433 -or- you have entered the correct port in your jTDS configuration/connection string.

3.  The SQL Server Browser service is running.

If you confirmed and/or changed your configuration based on the above recommendations, restarted your SQL Server instance, and still cannot successfully connect, refer to the jTDS SourceForge FAQ page.
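For reference, a jTDS connection URL for this scenario might look something like the following; the server, port, database, and instance names are placeholders. With ntlmauth.dll on the library path and no user/password supplied, jTDS attempts Windows (SSO) authentication:

```text
jdbc:jtds:sqlserver://myserver:1433/MyDatabase
jdbc:jtds:sqlserver://myserver/MyDatabase;instance=SQLEXPRESS
```

The first form targets the default instance on an explicit port; the second relies on the SQL Server Browser service (item 3 above) to resolve the named instance's port.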