I recently learned that WITH’s twin, if only in name, lives on as a SQL clause. Even though most of my time is spent doing application development, I’m no stranger to database development. I do enjoy diving into a DBMS and whipping up some nice SQL code, stored procedures and the like. But WITH was a clause I’d done without because I had no idea it existed in the SQL world.
Recently a coworker asked me to look at a very frustrating DB2 statement that wasn’t compiling or executing as it should. It seemed that there were problems buried somewhere deep in the layers of nested statements required because temporary tables were not going to be something we could use here. After admitting defeat, we called in an architect who suggested the WITH clause. What? Neither of us had heard of a WITH clause for SQL. With my curiosity peaked, I decided to investigate this WITH clause further to see what I had been missing out on.
It turns out that WITH can be pretty useful in the right situations, simplifying code and providing a more efficient way to access subqueries and temporary table structures. I found that it was adopted in the SQL:1999 standard as a way to write recursive queries and create named subquery blocks.
Some of the major DBMS players jumped on this right away, but with slightly different implementations. Oracle chose not to implement the recursive functionality of the WITH clause because they already had this functionality using the PL/SQL CONNECT BY clause:
SELECT product_id, product_name, parent_id FROM tbl_products START WITH product_id=50 CONNECT BY parent_id = PRIOR product_id;
Using the WITH clause in DB2 or SQL Server to achieve the same result looks like this:
WITH recursiveProducts (product_id, product_name, category_id) AS (SELECT parent.product_id, parent.product_name, parent.parent_id FROM tbl_products parent WHERE parent.product_id=50 UNION ALL SELECT child.product_id, child.product_name, child.parent_id FROM recursiveProducts parent, tbl_products child WHERE child.parent_id = parent.product_id) SELECT product_id, product_name, product_id FROM recursiveProducts;
Obviously not as clean or simple as Oracle’s implementation, but it works.
PostgreSQL requires the use of an additional keyword “RECURSIVE” to make this statement work:
WITH RECURSIVE recursiveProducts (product_id, product_name, category_id) AS (SELECT parent.product_id, parent.product_name, parent.parent_id FROM tbl_products parent WHERE parent.product_id=50 UNION ALL SELECT child.product_id, child.product_name, child.parent_id FROM recursiveProducts parent, tbl_products child WHERE child.parent_id = parent.product_id) SELECT product_id, product_name, product_id FROM recursiveProducts;
Sadly some other “major” DBMSs (I’m looking at you MySQL) do not support the WITH function or recursive queries at all, so at least we’ve got something here.
While knowing how to write a recursive SQL statement is a great thing, its not something I’ve really had a need for very often. The other function of the WITH clause, however, is something I could see myself using somewhat often now that I’m familiar with it. The ability to use WITH to create named subquery blocks, or even pseudo temp tables, is supported by most (but not all – still looking at you, MySQL), such as Oracle, PostgreSQL, DB2 and SQL Server. As an added bonus, this also becomes a more efficient way to reuse the same subquery.
The optimizer in Oracle recognizes that the statement in the WITH block only needs to be executed once, as opposed to each time its used in the statement. For example, this statement, not using WITH, will execute the same subquery twice:
SELECT product_id, (SELECT AVG(current_inventory) FROM product) FROM product WHERE current_inventory > (SELECT AVG(current_inventory) FROM product);
Using WITH, we get the same results, but Oracle knows to only execute the subquery once.
WITH product Avg AS (SELECT AVG(current_inventory) avg_inventory FROM product) SELECT product_id, (SELECT avg_inventory FROM productAVG) FROM product WHERE current_inventory > (SELECTavg_inventory FROM productAVG);
DB2 handles things a little differently, using the WITH clause as a way to create a pseudo temporary table, saving the overhead of creating and dropping a regular view that would only be used one time. Additionally, during statement preparation, DB2 does not need to access the catalog for the view, saving additional overhead.
Without WITH, we get a pretty inefficient statement that creates a true temporary table. Not only do we need to define the table and its columns, it still needs to be populated with data. This can be an expensive set of operations depending on the size of your temporary table, the amount of data being stored and how many temporary tables you’ll end up using.
declare global temporary table product Avg (avg_inventory numeric(8,2)) on commit preserve rows not logged ; insert into SESSION.productAVG (avg_inventory) SELECT AVG(current_inventory) avg_inventory FROM product SELECT prd.product_id, avg.avg_inventory FROM product prd, SESSION.productAVG avg WHERE prd.current_inventory > avg.avg_inventory
By using the WITH clause instead, we’re not accessing the catalog nor are we having to use the SESSION schema. Plus we don’t have to remember to drop the table when we’re done. This table structure falls out of scope and is gone forever once the statement is executed. Because DB2 treats this as a standard temporary table, you’re able to use it in exactly the same way.
WITH productAVG as (SELECT AVG(current_inventory) avg_inventory FROM product) SELECT prd.product_id, pAvg.avg_inventory FROM product prd,.productAvg pAvg WHERE prd.current_inventory > pAvg.avg_inventory
I’ve just outlined the basic uses here. There are many ways in which these can be used as building blocks for simplifying potentially complex SQL statements, all while keeping code as efficient as possible. I am excited to find excuses to put this clause to use now that I know it exists. The nightmares of SQL horrors past may never have occurred had I learned of this earlier.
A word of caution, though: I do encourage anyone looking to make use of the WITH clause to take the time to research if it is supported (seriously MySQL, I do actually like you) and how it is implemented for the DBMS you are using. In my limited research I have found that each platform handles WITH a bit differently, from limiting functionality to differences in how the underlying structures are defined and created by WITH. Good luck!
— Scott Peters, firstname.lastname@example.org