Friday, November 21, 2008

Brute Force Data Find Method for SQL Server

On several occasions, I have been tasked with finding some miscellaneous piece of information in a database. Usually, it is some Third party database that I am not that familiar with. For example, someone says, “I need to know where all the occurrences of a particular person’s email appears in this database.” My first approach was to try and find all the fields that might have EMAIL or MAIL or something like that in the title. I quickly found out that not all database designers are created equal. Standards are lacking and consistency is foreign word. To that end, I decided to create a brute force method for finding data. I may not be the quickest but it let’s someone else do the work; namely the database engine.

I am going to be using the Information_Schema views for this exercise. Since they are ANSI standard views, they should be compatible with future versions of SQL server as well. I try to stay away from system tables as there is no guarantee that they will remain consistent between versions. The way I decided to approach this was to query the Information_Schema.Columns view in order to retrieve a complete listing of tables and columns in the database. I use the results of this view to build another script which will actually do the work. This script I will call the “build” script. We will use the build script to build the “query” script.

The build script will consist of 3 parts; 1) create the table variable to hold the results, 2) populate that table with the results and 3) retrieve the values from the variable. Part 1 looks like this.

Build Script Part One
Set NoCount ON
Print 'Set NoCount On'
Print ''
Print 'Declare @Results Table '
Print '('
Print ' TableName varchar(50),'
Print ' FieldName varChar(50),'
Print ' FieldLength int,'
Print ' DataValue varChar(2000)'
Print ')'
Print ''

If we set the results screen in query analyzer to TEXT, then when we run this, it will generate the declaration much like you see it here. This will build the first part of the query script to declare the table value we will then populate. Part 2 will generate the statements to populate this. It looks like this.

Build Script Part Two
Select 'Insert Into @Results Select ''' + Table_Name + ''' as TableName, ''' +
Column_Name + ''' AS FieldName, ' +
Convert(varchar,Character_maximum_length) + ' as FieldLength, [' +
Column_Name + '] as DataValue ' +
'From [' + Table_Schema + '].[' + Table_Name + '] ' +
'Where Upper([' + Column_Name + ']) Like Upper(''%Prestwick%'')'
From Information_Schema.Columns

Where Data_Type in ('char', 'varchar','nvarchar','nchar') And
Character_maximum_length >= 9
and Table_Schema = 'dbo'

Let’s examine this code bit by bit. We are going to generate Insert statements. According to our @Results table from above, we are going to need 4 pieces of information. We will need the Table Name, the Field Name, the Field Length, and the actual data value where we found what we are looking for. As you can see from the code above, that is exactly what we are extracting from the Information_Schema.Columns view. We are using the meta data in the database to build insert statements that will return to query the data. On line 6, we specify the selection criteria for the “query” script. To eliminate and CASE issues, we will convert the value of the column as well as the value of the TextToFind to Upper case. You can massage this part to better fit your needs. If you know the value is exact, you can change the ‘LIKE’ to an ‘=’. In this case, I am looking for an address that contains the word Prestwick. I need to know the table and column. In the last part of the build query , I use the meta data in the columns view to eliminate some of the field. If I know that I am looking for text, I can immediately eliminate all the numeric, bit and date fields. If I know the text is at least 12 characters, there is no need to query any Char(6) fields. When this portion is run agains AdventureWorksDW database, here are some of the results.

Insert Into @Results Select 'DimCustomer' as TableName, 'CustomerAlternateKey' AS FieldName, 15 as FieldLength, [CustomerAlternateKey] as DataValue From [dbo].[DimCustomer] Where Upper([CustomerAlternateKey]) Like Upper('%Prestwick%')

Insert Into @Results Select 'DimCustomer' as TableName, 'FirstName' AS FieldName, 50 as FieldLength, [FirstName] as DataValue From [dbo].[DimCustomer] Where Upper([FirstName]) Like Upper('%Prestwick%')

Insert Into @Results Select 'DimCustomer' as TableName, 'MiddleName' AS FieldName, 50 as FieldLength, [MiddleName] as DataValue From [dbo].[DimCustomer] Where Upper([MiddleName]) Like Upper('%Prestwick%')

Insert Into @Results Select 'DimCustomer' as TableName, 'LastName' AS FieldName, 50 as FieldLength, [LastName] as DataValue From [dbo].[DimCustomer] Where Upper([LastName]) Like Upper('%Prestwick%')

Insert Into @Results Select 'DimCustomer' as TableName, 'Suffix' AS FieldName, 10 as FieldLength, [Suffix] as DataValue From [dbo].[DimCustomer] Where Upper([Suffix]) Like Upper('%Prestwick%')

Of course there are more records that fit this criteria but for now, I am only showing a few for example. Now that we have the declaration and the insert statements generated, the last part of the script is the retrieval of those records. That looks like this.

Build Script Part Three
Print ''
Print 'Select Distinct * From @Results'

If we take all three sections of the Build script and combine them together and run it, we can generate a script that will query each potential field in the database for the value you are looking for. Remember, to generate the “query” script, you will need to set the query analyzer to return text results, not grid results. The next step is to copy the contents of the results of the Build script and paste them into a new query window. If I run the entire script on the AdventureWorksDW database, I can now find out which tables and fields contain this value. The actual Query script that this generates will look at over 140 fields and let you know what it finds. The results of my script are as follows.


TableName FieldName FieldLength DataValue
----------------- ------------- ----------- ---------------------
DimCustomer AddressLine1 120 1588 Prestwick Drive
DimCustomer AddressLine1 120 1841 Prestwick Drive
DimCustomer AddressLine1 120 4152 Prestwick Drive
DimCustomer AddressLine1 120 4998 Prestwick Ave.
DimCustomer AddressLine1 120 5259 Prestwick Dr.
DimCustomer AddressLine1 120 6208 Prestwick Dr.
DimCustomer AddressLine1 120 6709 Prestwick Ave
DimCustomer AddressLine1 120 7568 Prestwick Court
DimCustomer AddressLine1 120 8079 Prestwick Drive
DimCustomer AddressLine1 120 9271 Prestwick Ave.
DimCustomer AddressLine1 120 9875 Prestwick Court
ProspectiveBuyer AddressLine1 120 1841 Prestwick Drive
ProspectiveBuyer AddressLine1 120 8079 Prestwick Drive
ProspectiveBuyer AddressLine1 120 9875 Prestwick Court
vTargetMail AddressLine1 120 1588 Prestwick Drive
vTargetMail AddressLine1 120 1841 Prestwick Drive
vTargetMail AddressLine1 120 4152 Prestwick Drive
vTargetMail AddressLine1 120 4998 Prestwick Ave.
vTargetMail AddressLine1 120 5259 Prestwick Dr.
vTargetMail AddressLine1 120 6208 Prestwick Dr.
vTargetMail AddressLine1 120 6709 Prestwick Ave
vTargetMail AddressLine1 120 7568 Prestwick Court
vTargetMail AddressLine1 120 8079 Prestwick Drive
vTargetMail AddressLine1 120 9271 Prestwick Ave.
vTargetMail AddressLine1 120 9875 Prestwick Court


From here I can see that the value ‘Prestwick’ appears in 3 different tables. This gives me a headstart on doing the rest of my research.