Caching result sets

(For Sam) I wanted to performance tune a stored-procedure that was just one big SELECT statement (used to return all current Orders).

The code was just about as optimum as it could get, and returned around 8,000 rows each time, taking about 35 seconds to do so.

I saved the output over a few consecutive days and noticed (crucially) that most of the rows were the same each day.

My big-idea then, was to pre-cache (and pre-format) the results on “Day One”, and just append new rows to that going forward.

The final working stored-procedure contained 5 labeled areas:-

 - (1. Create and fill a cache-table if there isn't one)
 - 2. Save a thin version of the current data to a temp-table
 - 3. Add only NEW data to the cache-table
 - 4. Remove DELETED data from the cache-table
 - 5. Output the cache-table

1. If the cache-table didn’t exist, run the original query, but saving INTO a cache-table. Mostly this step was not executed, but I wanted the stored-procedure to be complete.

There was a DateTime column in the results set that was guaranteed to be unique. I made this the primary-key of the cache-table.

2. In a separate window, I stripped back the original query until just the DateTime column was returned. Unnecessarily, I added code to the top to delete any temp-table called “#thin” if it already existed (my habit). Then I added code to save the stripped back query results INTO a temp-table … called “#thin”.

This step would run every time, and the output could be compared with the old data (in the cache-table) to add any new rows, and knock off any old ones.

3. The original query was executed but with a WHERE clause added, like WHERE prod.DateTime not in (SELECT DateTime FROM #thin). The 2 or 3 (fat) rows returned from this step were appended to the cache-table.

4. A simple DELETE removed any rows from the cache-table where the DateTime was not in the #thin table.

5. The Cache-table was SELECT’ed in full as the stored-procedures output. Which typically ran in around 7 seconds. Despite the extra round-trip to the database.

Testing. After a day or two compare the old / new result sets in spreadsheet tabs and adjust indexing accordingly (As always, full responsibility lies with the implementer).

Addendum. To help performance I later changed Step-3 from …

WHERE prod.DateTime not in (SELECT DateTime FROM #thin)

… to …

LEFT JOIN cache.table cac ON cac.DateTime = prod.DateTime
WHERE cac.DateTime IS NULL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s