Thursday, August 7, 2008

Adding Regular Expressions (Regex) to SQL Server 2005

It is very easy to add Regular Expressions or other custom functionality to SQL Server 2005 because it support CLR. This means all you have to do it create a custom Assembly, mark it up with some attributes (similar to the way you do web services), and click the deploy button, and that is it. Microsoft really does make it easy using Visual Studio 2005 (and 2008 I assume). All you have to do it create a SQL Server Project. You can use VB.NET or C#, just pick the appropriate SQL Server Project. It pretty much walks you through creating the project. After it is done, you will need to right-click the project | Add | User-Defined Function... Give it whatever name you want. It gives you a simple stub. Just build and deploy. It deploys the assembly to the database it helps you with initially, and makes User-Defined functions (that call the assembly). You can then call your function like any other User-Defined function. The name and parameters show up in the User-Defined functions section under Database | Programmability | Functions | Scalar-valued Functions. It was also recommended by someone (see references) to in execute the following SQL (I only did it the first time I deployed) to enable CLR and install required support. sp_configure 'clr enabled',1 reconfigure There is one VERY important thing you need add to any method you want to be able to access from SQL. You must add the attribute [SqlFunction]. The method must also be public and static I believe. The parameters and return value have to be SQL types like: SqlChars, SqlString, SqlInt32, etc. You can use standard C# and VB.NET types everywhere within your method, but the parameters and return value MUST be SQL types. Below is my implementation (or at least part what I wrote and part an adaptation of parts from what other people wrote... see references) of three key Regular Expression methods I think are very useful.
  • RegexMatch - returns 1 if pattern can be found in input, else 0
  • RegexReplace - replaces all matches in input with a specified string
  • RegexSelectOne - returns the first, second, third, etc match that can be found in the input
  • RegexSelectAll - returns all matches delimited by separator that can be found in the input
Examples of how to use them in SQL:
  • select dbo.RegexMatch( N'123-45-6749', N'^\d{3}-\d{2}-\d{4} Returns 1 in this case since the phone number pattern is matched
  • select dbo.RegExReplace('Remove1All3Letters7','[a-zA-Z]','') Returns 137 since all alpha characters where replaced with no character
  • select dbo.RegexSelectOne('123-45-6749xxx222-33-4444', '\d{3}-\d{2}-\d{4}', 0) Returns 123-45-6789 since first match was specifed. If last parameter was 1 then the second match (222-33-4444) would be returned.
  • select dbo.RegexSelectAll('123-45-6749xxx222-33-4444', '\d{3}-\d{2}-\d{4}', '|') Returns 123-45-6749|222-33-4444
The actual implementation of this is nothing special other than the conversion of SQL Types. The complete source code (no project since it is so specific to your environment) to the implementation is available for download here.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Text;

public partial class UserDefinedFunctions
{

    public static readonly RegexOptions Options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline;

    [SqlFunction]
    public static SqlBoolean RegexMatch(SqlChars input, SqlString pattern)
    {
        Regex regex = new Regex(pattern.Value, Options);
        return regex.IsMatch(new string(input.Value));
    }

    [SqlFunction]
    public static SqlString RegexReplace(SqlString expression, SqlString pattern, SqlString replace)
    {
        if (expression.IsNull || pattern.IsNull || replace.IsNull)
            return SqlString.Null;

        Regex r = new Regex(pattern.ToString());
        return new SqlString(r.Replace(expression.ToString(), replace.ToString()));
    }

    // returns the matching string. Results are separated by 3rd parameter
    [SqlFunction]
    public static SqlString RegexSelectAll(SqlChars input, SqlString pattern, SqlString matchDelimiter)
    {
        Regex regex = new Regex(pattern.Value, Options);
        Match results = regex.Match(new string(input.Value));

        StringBuilder sb = new StringBuilder();
        while (results.Success)
        {
            sb.Append(results.Value);

            results = results.NextMatch();

            // separate the results with newline|newline
            if (results.Success)
            {
                sb.Append(matchDelimiter.Value);
            }
        }

        return new SqlString(sb.ToString());

    }

    // returns the matching string
    // matchIndex is the zero-based index of the results. 0 for the 1st match, 1, for 2nd match, etc
    [SqlFunction]
    public static SqlString RegexSelectOne(SqlChars input, SqlString pattern, SqlInt32 matchIndex)
    {
        Regex regex = new Regex(pattern.Value, Options);
        Match results = regex.Match(new string(input.Value));

        string resultStr = "";
        int index = 0;

        while (results.Success)
        {
            if (index == matchIndex)
            {
                resultStr = results.Value.ToString();
            }

            results = results.NextMatch();
            index++;

        }

        return new SqlString(resultStr);

    }

};

What I used to get started. http://weblogs.sqlteam.com/jeffs/archive/2007/04/27/SQL-2005-Regular-Expression-Replace.aspx How to optimize regex calls: http://blogs.msdn.com/sqlclr/archive/2005/06/29/regex.aspx More Information: http://msdn.microsoft.com/en-us/magazine/cc163473.aspx I found this after writing this, but it explains other details I did not. http://www.codeproject.com/KB/string/SqlRegEx.aspx?display=PrintAll

275 comments:

«Oldest   ‹Older   401 – 275 of 275
«Oldest ‹Older   401 – 275 of 275   Newer› Newest»