Thursday, August 7, 2008

Adding Regular Expressions (Regex) to SQL Server 2005

It is very easy to add Regular Expressions or other custom functionality to SQL Server 2005 because it support CLR. This means all you have to do it create a custom Assembly, mark it up with some attributes (similar to the way you do web services), and click the deploy button, and that is it. Microsoft really does make it easy using Visual Studio 2005 (and 2008 I assume). All you have to do it create a SQL Server Project. You can use VB.NET or C#, just pick the appropriate SQL Server Project. It pretty much walks you through creating the project. After it is done, you will need to right-click the project | Add | User-Defined Function... Give it whatever name you want. It gives you a simple stub. Just build and deploy. It deploys the assembly to the database it helps you with initially, and makes User-Defined functions (that call the assembly). You can then call your function like any other User-Defined function. The name and parameters show up in the User-Defined functions section under Database | Programmability | Functions | Scalar-valued Functions. It was also recommended by someone (see references) to in execute the following SQL (I only did it the first time I deployed) to enable CLR and install required support. sp_configure 'clr enabled',1 reconfigure There is one VERY important thing you need add to any method you want to be able to access from SQL. You must add the attribute [SqlFunction]. The method must also be public and static I believe. The parameters and return value have to be SQL types like: SqlChars, SqlString, SqlInt32, etc. You can use standard C# and VB.NET types everywhere within your method, but the parameters and return value MUST be SQL types. Below is my implementation (or at least part what I wrote and part an adaptation of parts from what other people wrote... see references) of three key Regular Expression methods I think are very useful.
  • RegexMatch - returns 1 if pattern can be found in input, else 0
  • RegexReplace - replaces all matches in input with a specified string
  • RegexSelectOne - returns the first, second, third, etc match that can be found in the input
  • RegexSelectAll - returns all matches delimited by separator that can be found in the input
Examples of how to use them in SQL:
  • select dbo.RegexMatch( N'123-45-6749', N'^\d{3}-\d{2}-\d{4} Returns 1 in this case since the phone number pattern is matched
  • select dbo.RegExReplace('Remove1All3Letters7','[a-zA-Z]','') Returns 137 since all alpha characters where replaced with no character
  • select dbo.RegexSelectOne('123-45-6749xxx222-33-4444', '\d{3}-\d{2}-\d{4}', 0) Returns 123-45-6789 since first match was specifed. If last parameter was 1 then the second match (222-33-4444) would be returned.
  • select dbo.RegexSelectAll('123-45-6749xxx222-33-4444', '\d{3}-\d{2}-\d{4}', '|') Returns 123-45-6749|222-33-4444
The actual implementation of this is nothing special other than the conversion of SQL Types. The complete source code (no project since it is so specific to your environment) to the implementation is available for download here.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Text;

public partial class UserDefinedFunctions
{

    public static readonly RegexOptions Options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline;

    [SqlFunction]
    public static SqlBoolean RegexMatch(SqlChars input, SqlString pattern)
    {
        Regex regex = new Regex(pattern.Value, Options);
        return regex.IsMatch(new string(input.Value));
    }

    [SqlFunction]
    public static SqlString RegexReplace(SqlString expression, SqlString pattern, SqlString replace)
    {
        if (expression.IsNull || pattern.IsNull || replace.IsNull)
            return SqlString.Null;

        Regex r = new Regex(pattern.ToString());
        return new SqlString(r.Replace(expression.ToString(), replace.ToString()));
    }

    // returns the matching string. Results are separated by 3rd parameter
    [SqlFunction]
    public static SqlString RegexSelectAll(SqlChars input, SqlString pattern, SqlString matchDelimiter)
    {
        Regex regex = new Regex(pattern.Value, Options);
        Match results = regex.Match(new string(input.Value));

        StringBuilder sb = new StringBuilder();
        while (results.Success)
        {
            sb.Append(results.Value);

            results = results.NextMatch();

            // separate the results with newline|newline
            if (results.Success)
            {
                sb.Append(matchDelimiter.Value);
            }
        }

        return new SqlString(sb.ToString());

    }

    // returns the matching string
    // matchIndex is the zero-based index of the results. 0 for the 1st match, 1, for 2nd match, etc
    [SqlFunction]
    public static SqlString RegexSelectOne(SqlChars input, SqlString pattern, SqlInt32 matchIndex)
    {
        Regex regex = new Regex(pattern.Value, Options);
        Match results = regex.Match(new string(input.Value));

        string resultStr = "";
        int index = 0;

        while (results.Success)
        {
            if (index == matchIndex)
            {
                resultStr = results.Value.ToString();
            }

            results = results.NextMatch();
            index++;

        }

        return new SqlString(resultStr);

    }

};

What I used to get started. http://weblogs.sqlteam.com/jeffs/archive/2007/04/27/SQL-2005-Regular-Expression-Replace.aspx How to optimize regex calls: http://blogs.msdn.com/sqlclr/archive/2005/06/29/regex.aspx More Information: http://msdn.microsoft.com/en-us/magazine/cc163473.aspx I found this after writing this, but it explains other details I did not. http://www.codeproject.com/KB/string/SqlRegEx.aspx?display=PrintAll

6 comments:

Adam said...

Just geeks, I'm hoping for some help!!

Everything I've read about CLR says it's super easy. I've downloaded the C# code from Microsoft (http://msdn.microsoft.com/en-us/magazine/cc163473.aspx), but my Visual Studio 2005 doesn't have the option to create a SQL Server Project or even the "Languages" drop down on the left.

Do I need to install C# in VS somehow?!

Thanks,
Adam

Brent V said...

Adam,

Did you install Visual Studio 2005 explicitly? When you do you select the languages you want to have installed. By default C# is one of them. What version of Visual Studio 2005 are you running? You can check it by choosing About Microsoft Visual Studio under the Help menu (which is one of the menus in Visual Studio 2005). You should have something like Professional Edition or maybe you downloaded the FREE Express version.

My guess is that you still need to install a full version of Visual Studio 2005. I suspect you are probably looking at the Visual Studio 2005 that gets installed with MS SQL Server 2005. This is NOT what you need. It is really just a stripped down version of VS2005, and does not have support for C#, websites, etc that the full versions have.

You can if you have the MS SQL Server version of VS2005 installed by doing the same as I described above, but you will see that the name is Microsoft Visual Studio 2005, but it will not say anything else like Express or Professional Edition. You will also only see 3 products installed in that same window. You will likely see, SQL Server Analysis Services, SQL Server Integration Services, SQL Server Reporting Services. If C# was included in the installation, you would see Microsoft Visual C# 2005 in this same list.

Assuming I am correct, I recommend VS2005 Professional Edition since that is what I am using and I know it works. You can try the Visual C# 2008 Express Edition, but I have not tried it and suspect it does not support the SQL Server project type. You can download it here (http://www.microsoft.com/eXPress/download/). In theory, you could do this from any Library project I would imagine, but deploying manually is not for the faint of heart and I would not recommend it for someone not familiar with how to do so unless you find some docs on it. Visual Studio Profession 2005 does the deployment with a very nice user interface.

Let me know how it goes.

I hope you find this useful.

Brent

Anonymous said...

You don't need Studio to register. Use the Express version, create a Dll project called RegExForSQL. Add a class file. Replace the class with the UserDefinedFunctions above.
Compile the project

Copy the dll to an installation location. For instance c:\RegExForSQL

Use the following to register the DLL in you SQL database update the bold areas before running.



EXEC sp_configure 'clr enabled', '1'

GO

reconfigure
Use [DatabaseName]

GO
CREATE ASSEMBLY asmRexExp from 'C:\RegExForSQL\RegExForSQL.dll' WITH
PERMISSION_SET = SAFE



GO

--RegexMatch(SqlChars input, SqlString pattern) CREATE FUNCTION
USE [ScratchPad]
GO

/****** Object: UserDefinedFunction [dbo].[RegexMatch] Script Date:
11/03/2009 18:34:40 ******/
CREATE FUNCTION [dbo].[RegexMatch](@input [nvarchar](max), @pattern
[nvarchar](max))
RETURNS [bit] WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [asmRexExp].[RegExForSQL.UserDefinedFunctions].[RegexMatch]
GO


GO




/* Returns a comma separated string of found objects
* Example usage SELECT [Message], dbo.RegexReplace(
'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|
[01]?[0-9][0-9]?)\b', [Message], 'new text') as [NewText] from tblSample
* C# function --SqlString RegexReplace(SqlString expression, SqlString
pattern, SqlString replace)
*
*/
USE [ScratchPad]
GO

/****** Object: UserDefinedFunction [dbo].[RegexReplace] Script Date:
11/03/2009 18:34:20 ******/
CREATE FUNCTION [dbo].[RegexReplace](@expression [nvarchar](max), @pattern
[nvarchar](max), @replace [nvarchar](max))
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [asmRexExp].[RegExForSQL.UserDefinedFunctions].[RegexReplace]
GO


GO



/* Returns a comma separated string of found objects
* Example usage SELECT [Message], dbo.RegexSelectAll([Message],
'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[
01]?[0-9][0-9]?)\b', ',') as [IPAddress] from tblSample
* C# function --SqlString RegexSelectAll(SqlChars input, SqlString pattern,
SqlString matchDelimiter)
*
*/

GO

/****** Object: UserDefinedFunction [dbo].[RegexSelectAll] Script Date:
11/03/2009 18:34:00 ******/
CREATE FUNCTION [dbo].[RegexSelectAll](@input [nvarchar](max), @pattern
[nvarchar](max), @matchDelimiter [nvarchar](max))
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME
[asmRexExp].[RegExForSQL.UserDefinedFunctions].[RegexSelectAll]
GO


GO




/* Returns finding matchIndex of a zero based index
* Example usage SELECT [Message], dbo.RegexSelectOne([Message],
'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[
01]?[0-9][0-9]?)\b', 0) as [IPAddress] from tblSample
* C# function --SqlString RegexSelectOne(SqlChars input, SqlString pattern,
SqlInt32 matchIndex)
*
*/

GO

/****** Object: UserDefinedFunction [dbo].[RegexSelectOne] Script Date:
11/03/2009 18:33:34 ******/
CREATE FUNCTION [dbo].[RegexSelectOne](@input [nvarchar](max), @pattern
[nvarchar](max), @matchIndex [int])
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME
[asmRexExp].[RegExForSQL.UserDefinedFunctions].[RegexSelectOne]
GO

Brent V said...

Anonymous,

Thank you very much for sharing that wealth of knowledge. I knew it was possible, but I never had a need to figure out all the code. Good work!

Thank you,

Brent

Daniel said...

Thanks for the informative post and the links posted.

Darrell Robert Parker said...

Great stuff, got me past a roadblock with using regular sql functions to do a complex match.

Turned out that Visual Studio 2008 was also giving me issues with the server explorer, used devenv /setup in Visual studio command prompt to fix that and then your code worked perfectly.

I will be loading up my sql database with a bunch of regular expression functions after this.

Thanks again