文章出處

簡介

    Microsoft SQL Server在大多數情況下都能夠創建非常高效的執行計劃。但是在某些特定的查詢類型會給查詢優化器帶來麻煩。本篇文章闡述了其中4種情況,這些情況通常包含了需要SQL Server傳遞多個數據集、產生沒有統計信息的中間結果集,抑或是中間結果集的統計信息無法被正確計算。

    將單獨的查詢語句分解為多個語句或是多個步驟,來使得SQl Server生成完全不同的執行計劃,或是為中間結果集創建統計信息。分解語句,而不是使用提示的好處是使得SQL Server可以根據數據的變化,動態的生成高效的執行計劃。

     雖然本篇文章所探討的查詢類型大多是基于ETL或是報表,但同樣也可以在其他類型的查詢中找到。

     本篇文章主要闡述了下述4中查詢模式:

 

Where字句中包含了OR邏輯

    在這種情況下,在OR或Join操作符兩端的條件涉及了不同的表。該類語句可以通過使用UNION操作符來替代WHERE或ON子句中的OR操作符。

 

中間解決集存在聚合

    該類查詢是與聚合后的數據集進行連接,這往往會導致低效的查詢性能。該類問題可以通過將聚合后的查詢結果存入臨時表來解決。

 

大量復雜的Join

    該類查詢模式包含了大量連接,尤其是連接條件是不等連接,由于統計信息隨著表連接的增多精度逐漸下降,這會導致低效的查詢性能。解決這類情況可以通過分解查詢,并將中間解決存入臨時表解決。

 

在Where或Join中包含了Case子句

    該類查詢在WHERE或JOIN子句中包含了CASE子句,這會導致難以有效的利用統計信息,從而造成查詢性能低效。解決該類問題可以使用T-SQL的IF語句來替代CASE子句將語句執行流程提煉出來。

 

    理解上述4種查詢類型的概念可以更好的幫助解決其他類似情況導致的低效查詢。因此可以重寫查詢來獲得更高效穩定的查詢性能。

 

 

 

下列使用OR子句的例子不會引起性能問題.

WHERE a.col1 = @val1 OR a.col1 = @val2 …           

    這種查詢模式僅僅是WHERE COL1 IN (@val1,@val2)的另一種寫法。使用這種模式不會產生低效的查詢計劃。但是,值得注意的是WHERE子句中的每一個值都會使得SQL Server從B-Tree的根節點找到葉子節點。

    該查詢模式的關鍵是OR操作符兩邊的列都會被計算出值。在上述例子中,該列是col1。如果col1上存在索引,則在該索引上執行查找操作就可以滿足所有條件。如果索引足夠小,甚至可能出現索引掃描。同樣的指導原則適用于覆蓋索引和其他索引。

 

WHERE a.col1 = @val1 OR a.col2 = @val2 … 

    在這種查詢模式中,同一個表的兩列中的值需要被估算出來,如果在這兩列上都存在索引,其中一個索引的第一列是COL1,另一個索引的第一列是COL2,如果數據量大,則SQL Server使用索引來獲取數據(如圖1所示)方式為UNION兩個索引查找的值,而如果數據量比較小,則SQL Server會使用掃描。

1

圖1.

 

WHERE a.col1 = @val1 OR a.col2 IN (SELECT col2 FROM tab2)

    通過分析該模式可以看出,SQL Server重寫了IN子句,將IN變為和表2進行了連接,并使用了不同的索引來獲取滿足條件的行。并且在最后對結果集進行排序來消除重復行(如圖2)。這同樣也是高效的執行計劃,對該類語句應用本文所探討的技術并不會產生更好的執行計劃。

2

圖2.

 

會引起問題的OR用法

    在上述例子中,OR操作符兩邊的條件都是在同一個表之內,因此,執行計劃不會出現問題。但如果,OR兩端的操作符引用了不同的表時,則會出現問題。

 

WHERE a.col1 = @val1 OR b.col2 = @val2

    上述查詢所代表的查詢模式是會導致問題的。將該查詢分解為兩步可以生成成本更小的執行計劃。請注意在該查詢模式中涉及到了兩個表。在OR操作符兩端的兩個表的條件中涉及到不同的兩個表。

注意:為了簡便起見,例子中僅僅使用了等號作為條件。然而,即使使用了不等式或諸如BETWEEN的變量作為條件也可以出現相同的情況。

 

    下述是附件1中所描述的會導致問題的示例查詢。

DECLARE @minEmp INT
 
DECLARE @maxEmp INT
 
SET @minEmp= 100
 
SET @maxEmp= 200
 
 
 
SELECT e.* FROM HumanResources.Employee e
 
LEFT JOIN Adventureworks.Person.Contact c ON e.EmployeeID=c.ContactID
 
WHERE e.EmployeeID BETWEEN @minEmp and @maxEmp
 
      OR c.EmailAddress IN('sabria0@adventure-works.com','teresa0@adventure-works.com','shaun0@adventure-works.com')
 
 

代碼清單1.會導致問題的示例查詢

 

    現在可以看到OR兩端的條件涉及到了HumanResources.Employee以及Person.Contact表。現在我們來看在 Person.Contacts上存在的索引。

 

表1.AdventureWork

索引名稱 索引描述 索引鍵
AK_Contact_rowguid 非聚集,唯一 rowguid
IX_Contact_EmailAddress 非聚集 EmailAddress
PK_Contact_ContactID 非聚集,唯一,主鍵 ContactID

 

    在Person.Contact(EmailAddress)上存在的索引可以被用于獲取查詢條件中滿足郵件地址的行。在Employee表上的主鍵和聚集索引是EmployeeID。按理來說SQL Server應該從兩個表上的索引通過低成本的索引查找獲得。然而,執行查詢所獲得的執行計劃卻顯示了對兩個索引使用了索引掃描,如圖4所示。

4

圖4.

 

    此時如果將STATISTIC IO和STATISTIC TIME啟用的話,結果如下:

Table 'Contact'. Scan count 1, logical reads 569, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 1, logical reads 6963, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

代碼清單2.

 

    結果顯示了總共7532個邏輯讀以及94ms的CPU時間。該計劃或許在小系統且執行不頻繁時不會引起注意,然而,如果表中數據增長到一定數量,且查詢的復雜度增長時,該類查詢將會成為將系統資源損耗殆盡的罪魁禍首。

    為了降低查詢成本,可以將該類查詢分為更容易讓SQL Server解釋的多個小查詢。您可以重寫查詢,將Where子句中的條件進行分解,使得單個查詢條件只涉及一個表。

    對于如例子所示的示例,可以很容易看出兩個表進行UNION操作可以替代上述出問題的查詢。如下面代碼所示:

DECLARE @minEmp INT

DECLARE @maxEmp INT

SET @minEmp = 100

SET @maxEmp = 200





SELECT e.*FROM HumanResources.Employee e

LEFT JOIN Adventureworks.Person.Contact c ON e.EmployeeID = c.ContactID

WHERE EmployeeID BETWEEN @minEmp and @maxEmp

UNION

SELECT e.*FROM HumanResources.Employee e

LEFT JOIN Adventureworks.Person.Contact c ON e.EmployeeID = c.ContactID

WHERE

c.EmailAddress in('sabria0@adventure-works.com','teresa0@adventure-works.com','shaun0@adventure-works.com')

代碼清單3.

 

    雖然功能上代碼清單3和代碼清單1一致,但SQL Server對于這兩種查詢語句的處理方式則完全不同。在該查詢計劃中,使用UNION來代替OR。使用UNION之后,SQL Server可以通過索引掃描查找,而不再是索引掃描,因此降低了成本,如圖5所示。

5

圖5.

 

    結果如代碼清單4:

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Employee'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Contact'. Scan count 3, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

代碼清單4.

 

    因此可以很容易看出,邏輯讀下降到26,CPU時間低于1MS。

 

查詢反模式2:由聚合產生的中間結果集

    諸如GROUP BY或DISTINCT之類的關鍵字會產生與原表行數不同的中間結果集。如果這些中間結果集還與其他表或數據集進行連接,則中間結果集上不會存在統計信息。因此SQL Server只能通過原數據集的統計信息估算基數,但是這種方式的精度隨著數據集的增多和查詢復雜度的增加變得越來越不盡人意。任何不準確的基數預估都可能導致低效的查詢計劃。

    如果不準確的基數預估是源于過期的統計信息,則更新表或索引的統計信息就能夠產生更高效的查詢計劃。但是,如果查詢語句需要生成中間結果集以供后續步驟使用的話,這種中間結果集上不會存在統計信息,雖然可以從原數據集中獲取統計信息,但這種方式很難生成高效的執行計劃。本文的附件C和附件D的例子就是這類查詢無法使用統計信息而導致的低效執行計劃。在生產系統中,該類查詢往往出現在ETL或報表作業中。值得注意的是如需做出該類DEMO則需要足夠量的數據。代碼清單5包含了兩個可以作為DEMO來演示該情況和解決方案的查詢。

    下述步驟可以幫您準備好演示。

1.執行下述代碼

/*

Do this all in tempdb to get the auto-cleanup after restart if all is forgotten

*/



USE tempdb

GO

SET NOCOUNT ON

SET STATISTICS IO OFF

SET STATISTICS TIME OFF

GO



/*

Cleanup so multiple runs can be done if necessary

*/



IF OBJECT_ID('dbo.Customers') IS NOT NULL

DROP TABLE dbo.Customers

IF OBJECT_ID('dbo.InternetOrders') IS NOT NULL

DROP TABLE dbo.InternetOrders

IF OBJECT_ID('dbo.StoreOrders') IS NOT NULL

DROP TABLE dbo.StoreOrders

IF OBJECT_ID('dbo.InternetQuotes') IS NOT NULL

DROP TABLE dbo.InternetQuotes

IF OBJECT_ID('dbo.StoreQuotes') IS NOT NULL

DROP TABLE dbo.StoreQuotes

IF OBJECT_ID('dbo.SurveyResults') IS NOT NULL

DROP TABLE dbo.SurveyResults

IF OBJECT_ID('dbo.SurveyDetails') IS NOT NULL

DROP TABLE dbo.SurveyDetails

IF OBJECT_ID('dbo.TransactionType3') IS NOT NULL

DROP TABLE dbo.TransactionType3

IF OBJECT_ID('dbo.TransactionType4') IS NOT NULL

DROP TABLE dbo.TransactionType4

IF OBJECT_ID('dbo.TransactionType5') IS NOT NULL

DROP TABLE dbo.TransactionType5

IF OBJECT_ID('dbo.TransactionType6') IS NOT NULL

DROP TABLE dbo.TransactionType6



/*

create tables for customers, internet orders, and store orders

*/



CREATE TABLE dbo.Customers

(

customerID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, customerName VARCHAR(30)NOT NULL

, otherStuff NCHAR(100) NULL

)

GO



CREATE TABLE dbo.InternetOrders

(

customerID INT NOT NULL

, orderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX InternetOrders_customerID on InternetOrders(customerID) INCLUDE(orderTotal)

CREATE INDEX InternetOrders_OrderDate ON dbo.InternetOrders(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE storeOrders

(

customerID INT NOT NULL

, storeOrderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX storeOrders_customerID ON storeOrders(customerID) INCLUDE(orderTotal)

CREATE INDEX StoreOrders_OrderDate ON dbo.StoreOrders(orderDate) INCLUDE(CustomerID, orderTotal)

GO



CREATE TABLE dbo.InternetQuotes

(

customerID INT NOT NULL

, quoteID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, quoteTotal MONEY NOT NULL

, quoteDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX InternetQuotes_customerID on InternetQuotes(customerID) INCLUDE(quoteTotal)

CREATE INDEX Internetquotes_OrderDate ON dbo.InternetQuotes(quoteDate) INCLUDE(CustomerID, quoteTotal)



GO



CREATE TABLE dbo.StoreQuotes

(

customerID INT NOT NULL

, storeQuoteID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, quoteTotal MONEY NOT NULL

, quoteDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX StoreQuotes_customerID on StoreQuotes(customerID) INCLUDE(quoteTotal)

CREATE INDEX StoreQuotes_OrderDate ON dbo.StoreQuotes(quoteDate) INCLUDE(CustomerID, quoteTotal)



GO



CREATE TABLE dbo.TransactionType3

(

customerID INT NOT NULL

, orderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX TransactionType3_customerID on dbo.TransactionType3(customerID) INCLUDE(orderTotal)

CREATE INDEX TransactionType3_OrderDate ON dbo.TransactionType3(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE TransactionType4

(

customerID INT NOT NULL

, storeOrderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX TransactionType4_customerID ON dbo.TransactionType4(customerID) INCLUDE(orderTotal)

CREATE INDEX TransactionType4_OrderDate ON dbo.TransactionType4(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE dbo.TransactionType5

(

customerID INT NOT NULL

, orderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX TransactionType5_customerID on dbo.TransactionType5(customerID) INCLUDE(orderTotal)

CREATE INDEX TransactionType5_OrderDate ON dbo.TransactionType5(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE TransactionType6

(

customerID INT NOT NULL

, storeOrderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX TransactionType6_customerID ON dbo.TransactionType6(customerID) INCLUDE(orderTotal)

CREATE INDEX TransactionType6_OrderDate ON dbo.TransactionType6(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE dbo.SurveyResults

(

contactID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, customerID INT NULL

, partnerID INT NULL

, aggResults TINYINT NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX SurveyReults_CustomerID ON dbo.SurveyResults(CustomerID)

GO

CREATE TABLE dbo.SurveyDetails

(

surveyID INT NOT NULL

, questionNbr TINYINT NOT NULL

, customerID INT NOT NULL

, rating TINYINT NOT NULL

, surveyDate DATETIME NOT NULL

, verbatim NCHAR(500) NULL

)

GO



/*

Populate these tables to form the first part of the query

*/



CREATE TABLE #firstNamePart

(

namePart NVARCHAR(14)

)

GO

CREATE TABLE #secondNamePart

(

namePart NVARCHAR(14)

)



INSERT INTO #firstNamePart VALUES (N'Some')

INSERT INTO #firstNamePart VALUES (N'Another')

INSERT INTO #firstNamePart VALUES (N'Different')

INSERT INTO #firstNamePart VALUES (N'Contoso')

INSERT INTO #firstNamePart VALUES (N'Similar')

INSERT INTO #firstNamePart VALUES (N'Dissimilar')

INSERT INTO #firstNamePart VALUES (N'My')

INSERT INTO #firstNamePart VALUES (N'Your')

INSERT INTO #firstNamePart VALUES (N'Their')

INSERT INTO #firstNamePart VALUES (N'Somebody''s')

INSERT INTO #firstNamePart VALUES (N'This')

INSERT INTO #firstNamePart VALUES (N'That')

INSERT INTO #firstNamePart VALUES (N'Varied')



INSERT INTO #secondNamePart VALUES (N'Inc.')

INSERT INTO #secondNamePart VALUES (N'LLC')

INSERT INTO #secondNamePart VALUES (N'Hobby')

INSERT INTO #secondNamePart VALUES (N'Unlimited')

INSERT INTO #secondNamePart VALUES (N'Limited')

INSERT INTO #secondNamePart VALUES (N'Musings')

INSERT INTO #secondNamePart VALUES (N'Manufacturing')

INSERT INTO #secondNamePart VALUES (N'Exploration')

INSERT INTO #secondNamePart VALUES (N'Enterprise')

INSERT INTO #secondNamePart VALUES (N'Services')

INSERT INTO #secondNamePart VALUES (N'Attempts')

INSERT INTO #secondNamePart VALUES (N'Dreams')

INSERT INTO #secondNamePart VALUES (N'Ideas')



-- populate customer



INSERT INTO dbo.Customers(customerName, otherStuff)

SELECT a.namePart +N' '+ b.namePart,N'otherStuff'

FROM #firstNamePart a CROSS JOIN #secondNamePart b



INSERT INTO dbo.Customers(customerName, otherStuff)

SELECT a.namePart +N' '+ b.namePart,N'otherStuff'

FROM #firstNamePart a CROSS JOIN #secondNamePart b



GO

DROP TABLE #firstNamePart

DROP TABLE #secondNamePart

GO



-- populate the internetOrders and storeOrders tables:



DECLARE @customerID INT -- as we go through

DECLARE @orderTotal MONEY

DECLARE @orderDate DATETIME

DECLARE @numRecords SMALLINT

DECLARE @ct SMALLINT



DECLARE crs CURSOR FOR SELECT customerID from dbo.Customers

OPEN crs

FETCH NEXT FROM crs INTO @customerID



WHILE @@FETCH_STATUS= 0

BEGIN

-- internet orders

SET @numRecords =RAND()* 10000

SET @ct = 0

WHILE @ct < @numRecords

BEGIN

SET @orderTotal =RAND()* 10000

SET @orderDate =DATEADD(dd,RAND()* 1500,'2008-01-01 00:00:00.000')

INSERT INTO dbo.InternetOrders(customerID, orderTotal, orderDate, otherDetails)

VALUES (@customerID, @orderTotal, @orderDate,'Other Details')

SET @ct = @ct + 1

END

-- set up store orders

SET @numRecords =RAND()* 1000

SET @ct = 0

WHILE @ct < @numRecords

BEGIN

SET @orderTotal =RAND()* 10000

SET @orderDate =DATEADD(dd,RAND()* 1500,'2008-01-01 00:00:00.000')

INSERT INTO dbo.StoreOrders(customerID, orderTotal, orderDate, otherDetails)

VALUES (@customerID, @orderTotal, @orderDate,'Other Details')

SET @ct = @ct + 1

END

INSERT INTO dbo.SurveyResults(customerID, aggResults, otherDetails)

VALUES (@customerID, @customerID % 5,N'Other Details')

FETCH NEXT FROM crs INTO @customerID

END



CLOSE CRS

DEALLOCATE CRS



/*

Populate the quote tables with sample data by duplicating the sales data

Also populate TransactionType3 and TransactionType4

*/



INSERT INTO dbo.InternetQuotes(customerID, quoteDate, quoteTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.InternetOrders



INSERT INTO dbo.StoreQuotes(customerID, quoteDate, quoteTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.storeOrders



INSERT INTO dbo.TransactionType3(customerID, orderDate, orderTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.InternetOrders



INSERT INTO dbo.TransactionType4(customerID, orderDate, orderTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.storeOrders



INSERT INTO dbo.TransactionType5(customerID, orderDate, orderTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.InternetOrders



INSERT INTO dbo.TransactionType6(customerID, orderDate, orderTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.storeOrders

GO



/*

Populate SurveyDetails with sample data for 50 questions



*/



DECLARE @questionNbr TINYINT

DECLARE @surveyID INT



SET @questionNbr = 1

WHILE @questionNbr < 51

BEGIN

INSERT INTO dbo.SurveyDetails(surveyID, questionNbr, customerID, rating, surveyDate, verbatim)

SELECT 1, @questionNbr, customerID, customerID % 5,'2008-01-01',N'Feedback from the customer'

FROM dbo.Customers

INSERT INTO dbo.SurveyDetails(surveyID, questionNbr, customerID, rating, surveyDate, verbatim)

SELECT 2, @questionNbr, customerID, customerID % 5,'2008-01-01',N'Feedback from the customer'

FROM dbo.Customers



SET @questionNbr = @questionNbr + 1

END

GO



/*

Update all statistics to be sure they are all in the best possible shape

*/



UPDATESTATISTICS dbo.Customers WITHFULLSCAN

UPDATESTATISTICS dbo.InternetOrders WITHFULLSCAN

UPDATESTATISTICS dbo.storeOrders WITHFULLSCAN

UPDATESTATISTICS dbo.InternetQuotes WITHFULLSCAN

UPDATESTATISTICS dbo.StoreQuotes WITHFULLSCAN

UPDATESTATISTICS dbo.TransactionType3 WITHFULLSCAN

UPDATESTATISTICS dbo.TransactionType4 WITHFULLSCAN

UPDATESTATISTICS dbo.TransactionType5 WITHFULLSCAN

UPDATESTATISTICS dbo.TransactionType6 WITHFULLSCAN

UPDATESTATISTICS dbo.SurveyResults WITHFULLSCAN

代碼清單5.

 

2.分別在兩個窗口執行代碼清單6和代碼清單7

set statistics io on 
set statistics time on

SELECT T1.customerName, R.ContactID, R.AggResults

, D.surveyId, D.questionNbr, D.rating, D.verbatim

FROM

(

SELECT Tab1.customerID, Tab1.customerName, Tab1.StoreOrderTotal, Tab1.InternetOrderTotal

, Tab1.TotalOrders, Tab2.InternetQuoteTotal, Tab2.StoreQuoteTotal, Tab2.TotalQuote

FROM

(

SELECT A.customerID, a.customerName, a.orderTotal as InternetOrderTotal, b.orderTotal as StoreOrderTotal,

TotalOrders = a.orderTotal + b.orderTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.InternetOrders i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.StoreOrders s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.orderTotal + b.orderTotal > 100000.00

) Tab1 JOIN

(

SELECT A.customerID, a.customerName, a.quoteTotal as InternetQuoteTotal, b.quoteTotal as StoreQuoteTotal,

TotalQuote = a.quoteTotal + b.quoteTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.quoteTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.InternetQuotes i ON c.customerID = i.customerID

WHERE i.quoteDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.quoteTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.quoteTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.StoreQuotes s ON c.customerID = s.customerID

WHERE s.quoteDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.quoteTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.quoteTotal + b.quoteTotal > 100000.00

) Tab2 ON Tab1.customerID = Tab2.customerID

) T1 JOIN

(

SELECT Tab1.customerID, Tab1.customerName, Tab1.StoreOrderTotal, Tab1.InternetOrderTotal

, Tab1.TotalOrders, Tab2.InternetQuoteTotal, Tab2.StoreQuoteTotal, Tab2.TotalQuote

FROM

(

SELECT A.customerID, a.customerName, a.orderTotal as InternetOrderTotal, b.orderTotal as StoreOrderTotal,

TotalOrders = a.orderTotal + b.orderTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.TransactionType3 i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.TransactionType4 s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.orderTotal + b.orderTotal > 100000.00

) Tab1 JOIN

(

SELECT A.customerID, a.customerName, a.quoteTotal as InternetQuoteTotal, b.quoteTotal as StoreQuoteTotal,

TotalQuote = a.quoteTotal + b.quoteTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.TransactionType5 i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.TransactionType6 s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.quoteTotal + b.quoteTotal > 100000.00

) Tab2 ON Tab1.customerID = Tab2.customerID

) T2 ON T1.customerID = T2.customerID

LEFT OUTER JOIN dbo.SurveyResults R on T1.customerID = R.customerID

LEFT OUTER JOIN dbo.SurveyDetails D on T1.customerID = D.customerID

WHERE T1.TotalOrders > 10000.00 AND T2.TotalQuote > 100000.00

代碼清單6.

SELECT T1.customerID, T1.customerName 

INTO #temp

FROM

(

SELECT Tab1.customerID, Tab1.customerName, Tab1.StoreOrderTotal, Tab1.InternetOrderTotal

, Tab1.TotalOrders, Tab2.InternetQuoteTotal, Tab2.StoreQuoteTotal, Tab2.TotalQuote

FROM

(

SELECT A.customerID, a.customerName, a.orderTotal as InternetOrderTotal, b.orderTotal as StoreOrderTotal,

TotalOrders = a.orderTotal + b.orderTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.InternetOrders i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.StoreOrders s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.orderTotal + b.orderTotal > 100000.00

) Tab1 JOIN

(

SELECT A.customerID, a.customerName, a.quoteTotal as InternetQuoteTotal, b.quoteTotal as StoreQuoteTotal,

TotalQuote = a.quoteTotal + b.quoteTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.quoteTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.InternetQuotes i ON c.customerID = i.customerID

WHERE i.quoteDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.quoteTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.quoteTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.StoreQuotes s ON c.customerID = s.customerID

WHERE s.quoteDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.quoteTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.quoteTotal + b.quoteTotal > 100000.00

) Tab2 ON Tab1.customerID = Tab2.customerID

) T1 JOIN

(

SELECT Tab1.customerID, Tab1.customerName, Tab1.StoreOrderTotal, Tab1.InternetOrderTotal

, Tab1.TotalOrders, Tab2.InternetQuoteTotal, Tab2.StoreQuoteTotal, Tab2.TotalQuote

FROM

(

SELECT A.customerID, a.customerName, a.orderTotal as InternetOrderTotal, b.orderTotal as StoreOrderTotal,

TotalOrders = a.orderTotal + b.orderTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.TransactionType3 i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.TransactionType4 s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.orderTotal + b.orderTotal > 100000.00

) Tab1 JOIN

(

SELECT A.customerID, a.customerName, a.quoteTotal as InternetQuoteTotal, b.quoteTotal as StoreQuoteTotal,

TotalQuote = a.quoteTotal + b.quoteTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.TransactionType5 i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.TransactionType6 s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.quoteTotal + b.quoteTotal > 100000.00

) Tab2 ON Tab1.customerID = Tab2.customerID

) T2 ON T1.customerID = T2.customerID

WHERE T1.TotalOrders > 10000.00 AND T2.TotalQuote > 100000.00











SELECT T1.customerName, R.ContactID, R.AggResults, D.surveyId, d.QuestionNbr, D.Rating, D.verbatim FROM #temp T1

LEFT OUTER JOIN dbo.SurveyResults R on T1.customerID = R.customerID

LEFT OUTER JOIN dbo.SurveyDetails D on T1.customerID = D.customerID





drop table #temp

代碼清單7.

 

3.啟用IO和TIME計數

4.包含實際的執行計劃

 

    在完成代碼清單5后,運行代碼清單6兩次。第一次執行會包含解析,執行時間以及物理讀的時間,因此主要觀察第二次執行時間,在第二次執行結束后,查看實際的執行計劃。對其中估計的行數和實際的行數進行對比。

    值得注意的是,在數據流的開始,實際的行數和估計的行數相差很小。隨著每一個數據集集和聚合的中間結果集做連接后,估計的行數變得越來越不準確。

    隨著由右向左的數據流向觀察,直至最左邊,你會發現估計的行數急劇下降,雖然實際的行數卻沒有變。最終,估計的行數變為1,而實際的行數是300,如圖6所示。

6

圖6.

 

    當SQL Server低估了操作符中需要處理的數據集的行數時,會導致查詢分析器做出次優的決定,比如說低效的連接順序。最簡單的例子是,錯誤的低估行數會導致中間結果集作為loop join的外層表,見圖7。

7

圖7.

    在循環嵌套連接中,對應外層表的每一行,都需要在內存表執行相應的操作。在本例中,SQL Server選擇循環嵌套連接是由于外層結果集的估計行數是1。這意味著僅僅需要在內層結果集執行一次操作。低估行數導致性能噩夢是由于當內部操作是索引或表掃描時出現,如圖8所示。此時估計的行數是1,而實際的行數則為334。

    8

圖8.

 

    因為上述估計的不準確,SQL Server認為做掃描而不是使用索引查找或其他連接策略是最優的選擇。如果統計信息準確,則對SurveyDetails表的掃描只占了18%的執行成本,但實際上該表掃描執行了334次而不是估計的1次。因此該成本要遠高于預計的值。在執行計劃編譯時所有的統計信息都是最新,但sql server同時也需要中間結果集來做出最有的選擇。

 

解決該類問題

    為了解決上述問題,可以使用臨時表存儲中間結果集來使得SQL Server計算統計信息后重編譯。

注意:這里必須使用臨時表,不能使用表變量。因為在臨時表中插入數據后就為數據創建統計信息。但表變量中沒有這些統計信息。但如果表變量是在option(recompile)中使用的,那么表變量的統計信息會在編譯時獲得。

     代碼清單7和代碼清單6得到的結果是一樣的,但方式不同。首先是select…into,這種方式將聚合后的數據存入臨時表。一旦結果集保存到臨時表后,SQL Server就可以創建和更新統計信息并且重編譯執行計劃來使得代碼清單6中低效執行計劃的出現。然后接下來的查詢用于獲得最終結果,該查詢會將臨時表中的數據和其他兩個表進行連接,如圖9所示。

9

圖9.

    可以看出,SQL Server分為多個步驟來使得查詢更加高效,最主要的事將物理連接類型變為hash join,從而避免了多次掃描。

注意:隨著查詢復雜度的增加,有很多種不同的方式可以分解查詢來提高效率。比如說,聚合后的數據可以插入臨時表,你可以查看執行計劃來找到實際行數和估計行數嚴重不符的部分,最好的選擇是估計只有一行,但實際有多行的部分。

 

性能提升是否明顯?

    由于測試數據的隨機性,下面測試結果的數字或許會略微不同,但性能的提升一定會很明顯。表2展示了IO和CPU時間的差異。

表2.對比兩個查詢的性能

測試標準 第一個查詢 第二個查詢 性能提升百分比
邏輯讀 1,622,398 11,685 99.90%
CPU時間 4,914 1,139 76.82%
消耗時間 5,278 2,803 46.89%

 

    數據越多,查詢越復雜,則性能提升越明顯。上述數據已經足夠證明分解查詢帶來的性能提升。

    雖然上述例子中的聚合數據集是來自衍生表,但同樣也可以來自CTE和邏輯視圖。通過同樣的步驟,使用臨時表存儲中間結果集就可以利用統計信息來生成高效的執行計劃。

 

其他辦法

    代碼清單6中的查詢提示了索引缺失,創建索引可以消除表掃描從而提升性能,至少在少量數據集中是這樣,這類查詢通常來說是用于ETL或報表,執行的次數往往很少,因此創建和維護索引的成本需要考慮,因此需要考慮trade off。

    另一種辦法是使用hash join提示,但這并不總是最好的選擇。使用臨時表存儲中間結果集允許查詢優化器根據數據的大小選擇最好的執行計劃。這意味著可以獲得更穩定的性能收益,而使用連接提示則性能變得不穩定。

    還有一種辦法是為中間結果集創建索引視圖,此時統計信息在索引視圖上進行維護。但是,索引視圖在數據修改時的開銷也不小,這也是鎖爭搶的來源之一,因此需要將多種可能的解決方案進行評估。

 

其他需要分解的復雜查詢

    需要分解的復雜查詢不僅僅包含之前討論的兩種,雖然查詢優化器一直在提升,但仍然可能某些特定查詢需要分解來獲得性能提升。

包含大量連接的查詢

    隨著每次連接,初始數據集中的數據不斷被過濾,因此在估計基數時引入了更多因此,每次單表數據中出現的小異常都會隨著連接的增多被放大。

    因此大量連接的查詢應該被分解為多個查詢,將中間結果存入臨時表,然后使用臨時表和數據集進行連接。

在WHERE和Join子句中出現的CASE子句

    這類查詢的一個例子包括“WHERE col1 = CASE @val1 WHEN 1 THEN ‘val1’ WHEN 2 THEN ‘val2’ WHEN 3 THEN col1”這樣的子句使得獲取估計行數變得困難,甚至不可能。任何時候估計的行數不準確都會導致低效的查詢計劃。

    使用T-SQL的條件語句IF…ELSE來將這類查詢分解為多個查詢可以獲得更準確的估計行數。

 

小結

    通常來說,很少需要只使用一個復雜的查詢語句獲取所有結果。在諸如需要連接聚合結果的情況下,在單獨一個查詢中完成所有的步驟會導致不準確的預計基數,基數估計的偏差會導致低效的查詢,在單獨一個查詢語句中,SQL Server無法再查詢執行過程中暫停,在中間結果集上計算統計信息,并隨之調整執行計劃。

    將查詢分解并將中間結果集存入臨時表使得SQL Server可以做到單獨查詢中做不到的事,即為中間結果集創建統計信息并根據這些信息重新生成執行計劃。使用UNION和IF邏輯可使得SQL Server生成更高效的執行計劃來得到同樣的結果。

    在SQL Server查詢語句的設計中,高性能不僅取決于獲取到的結果,還取決于如何獲取結果。雖然查詢優化器在大多數情況下能夠很好的完成工作,但也有些情況下無法計算出統計信息,因此需要改變獲取數據的方式從而獲得穩定的高性能。本篇文章討論了將大且復雜的語句分解為多個部分所需的技術。

 

 

---------------------------------------------------

原文鏈接:http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx


文章列表


不含病毒。www.avast.com
arrow
arrow
    全站熱搜
    創作者介紹
    創作者 大師兄 的頭像
    大師兄

    IT工程師數位筆記本

    大師兄 發表在 痞客邦 留言(0) 人氣()