KnowledgeBase: January 2015

Use TABLOCK to boost your INSERT INTO … SELECT performance

Something that we do frequently in an analytics application is loading data from one table to another. Personally I recommend SSIS for such tasks. But there are chances SSIS is not suitable, then we usually use INSERT INTO … SELECT.

I don’t recommend SELECT … INTO because you can’t control which file group the destination table goes to. And it is very important when I have to use table partition.

So if we need use T-SQL to move data from one place to another one. It is recommended to use TABLOCK option on the destination table. With a few other constraints, the query can be executed with minimal logging.

The few constraints are:

The database must be in bulk logged recovery model.
The destination table must be empty or without clustered index.
There is no non-clustered index on the destination table.

Maybe it is difficult to match the second and third constraint. But we have an alternative which is partition switch. We can partition the destination table. Load the data from source into an empty table with same schema except the index objects (usually I keep the clustered index). After load, we can do a partition switch.

Here is two sample queries:

INSERT INTO dbo.FactDevicePerformance_History
([TimeKey], [DeviceKey], [reads per sec], [read hits per sec],
[seq reads per sec], [seq read hits per sec], [writes per sec],
[write hits per sec], [Kbytes read per sec], [Kbytes written per sec])
SELECT h.* FROM AIR.dbo.FactDevicePerformance_History h
INNER JOIN AIR.dbo.DimTime t ON h.TimeKey = t.TimeKey
WHERE t.[Date] BETWEEN ‘2011-11-1′ AND ‘2011-11-30′

INSERT INTO dbo.FactDevicePerformance_History WITH (TABLOCK)
([TimeKey], [DeviceKey], [reads per sec], [read hits per sec],
[seq reads per sec], [seq read hits per sec], [writes per sec],
[write hits per sec], [Kbytes read per sec], [Kbytes written per sec])
SELECT h.* FROM AIR.dbo.FactDevicePerformance_History h
INNER JOIN AIR.dbo.DimTime t ON h.TimeKey = t.TimeKey
WHERE t.[Date] BETWEEN ‘2011-1-1′ AND ‘2011-1-31′

You can find the only difference is query hint “TABLOCK”. But first query runs 21 seconds, the second one runs only 8 seconds. First query increase the log file from 0.5MB to 672MB. The second one increase the log file from 0.5MB to 3.37MB. There are 2.3 million records moved in the query.

This is what called minimal logging since SQL Server 2005. Anyway you still need to compare these two approaches before you implement. Because with minimal logging SQL Server will force data pages to be flushed to disk before the transaction commits. So if the IO system for data file is not very fast, or the data pages affected by the operation are not sequentially arranged, you might see worse performance with minimal logging.

KnowledgeBase

Thursday, January 29, 2015

WITH (TABLOCK)

Use TABLOCK to boost your INSERT INTO … SELECT performance

Tuesday, January 27, 2015

Time variable

Tuesday, January 6, 2015

Beginning and End of month calculation