Using OR condition + function in join appears to confuse SQL Server's query optimizer -
i struggling debug performance on particular query. query this:
select count(*) dbo.user d inner join dbo.distinct_first_name dfn on ( [dbo].jw(dfn.first_name, 'john') > 0.8 , (d.first_name = dfn.first_name or d.nick_name = dfn.first_name or d.middle_name = dfn.first_name) )
the query runs jaro winkler filter on distinct first name table (containing approx 15k rows) , inner joins against user table produce result set. defined, takes around 1 minute run approx 500k rows in user table.
here's know:
1) jaro winkler filter instant (0.1s itself)
2) if change user clause include 1 of columns (i.e. remove ors) takes 0.4s
3) if change 3 queries, , run them back, takes approx 2s
4) if change jaro winkler filter 0.99 (so there's 1 result) makes no substantive difference in query execution time
5) if replace jaro winkler filter equality operation (dfn.first_name = 'john') total query time reduced 4s
(all timings on slow virt; real life performance better.)
so, reason, combination of function , ors confusing query optimizer. execution plan not informative; says 90% of query spent on:
<relop nodeid="63" physicalop="clustered index seek" logicalop="clustered index seek" estimaterows="1.69029" estimateio="0.003125" estimatecpu="0.000158859" avgrowsize="17" estimatedtotalsubtreecost="71.4311" tablecardinality="15958" parallel="0" estimaterebinds="448881" estimaterewinds="0.504024" estimatedexecutionmode="row"> <outputlist> <columnreference database="[mydb]" schema="[dbo]" table="[distinct_first_name]" alias="[dfn]" column="first_name" /> </outputlist> <runtimeinformation> <runtimecountersperthread thread="0" actualrows="857936" actualendofscans="859454" actualexecutions="859454" /> </runtimeinformation> <indexscan ordered="1" scandirection="forward" forcedindex="0" forceseek="0" forcescan="0" noexpandhint="0" storage="rowstore">
splitting query option, since in sproc, , can redesign schema little, i'm stumped what's bogging down. ideas?
first of all, both first_name_alphaonly
, nick_name_alphaonly
non-persisted computed columns, cardinality est off, , multiplied.
then, there 857.936 individual clustered index seeks on distinct_first_name
table, , after filter including jw
function applied.
creating indexes on computed columns help. filtering on distinct_first_name
prior join (into #temp table) also. , it's advice on turning ors union alls.
optimizer, afaik, never rearange ors unions itself. believe it's called playing safe.
Comments
Post a Comment