IFormattable: SQL Optimization: Substring Joins

Friday, December 21, 2007

SQL Optimization: Substring Joins

The DBA at my current client has some mad T-SQL skills. He took a naïve join implementation I had written and improved its performance by three orders of magnitude.

My initial query looked something like this:

SELECT A.*, B.* FROM A JOIN B ON A.Message LIKE '%' + B.Identifier + '%'

Obviously this isn't the best situation in the first place. We don't have a clean relation between these two tables, instead we've got to pick out an identifier (GUID/uniqueidentifier) from a larger text field. Notwithstanding some obvious ETL or trigger possibilities, the statement above seems the simplest solution. The only problem is it is slooow.

The DBA had a great optimization which was essentially to tease out the identifier from the field as you would if you were writing a trigger to create a join column. Putting this in your query allows you to join on this derived column in your code. The performance implications are well above what I would have guessed them to be, operating–as I am–from naïveté. Here's the generalized solution based on our example above:

SELECT B.*, J.* FROM B JOIN ( SELECT A2.*, CAST(SUBSTRING(A2.Message, A2.GuidStart, A2.GuidEnd - A2.GuidStart) AS uniqueidentifier) AS JoinGuid FROM ( SELECT A1.*, CHARINDEX('TokenAfterGuid', A1.Message, A1.Start) as GuidEnd FROM ( SELECT A.*, CHARINDEX('TokenBeforeGuid:', A.Message) + LEN('TokenBeforeGuid:') + 1 AS GuidStart FROM A ) AS A1 ) AS A2 ) ON B.Guid = J.JoinGuid

Of course, you would expand the wildcard (*) references. This is really a great technique considering the performance ramifications. Certainly in our case, where the query was for a view, this was a wonderful improvement. Obviously, the best option from a performance stand point would be to tease out the uniqueidentifier in an INSERT/UPDATE trigger, create an index on the new column, and join the tables on that; however, in situations where you don't have the option of doing ETL or triggers this can be useful.

IFormattable

Friday, December 21, 2007

SQL Optimization: Substring Joins

Tags

Previous Articles

Who is IFormattable?

Who's Reading IFormattable?

Subscribe via email