What’s up with WITH?

Scott Peters Databases, Technology Snapshot 1 Comment

Ah, the WITH statement. A (mostly) loved holdover from such languages as Python and VB(A) that still finds some questionable use today in JavaScript. It was easy to use and could save quite a bit of coding time. I do, at times, wish it was available in Java. But this blog is not about that WITH, nor am I pining for Java to take on some old VB traits (GoTo, anyone?).

I recently learned that WITH’s twin, if only in name, lives on as a SQL clause. Even though most of my time is spent doing application development, I’m no stranger to database development. I do enjoy diving into a DBMS and whipping up some nice SQL code, stored procedures and the like. But WITH was a clause I’d done without because I had no idea it existed in the SQL world.

Recently a coworker asked me to look at a very frustrating DB2 statement that wasn’t compiling or executing as it should. It seemed that there were problems buried somewhere deep in the layers of nested statements required because temporary tables were not going to be something we could use here. After admitting defeat, we called in an architect who suggested the WITH clause. What? Neither of us had heard of a WITH clause for SQL. With my curiosity peaked, I decided to investigate this WITH clause further to see what I had been missing out on.

It turns out that WITH can be pretty useful in the right situations, simplifying code and providing a more efficient way to access subqueries and temporary table structures. I found that it was adopted in the SQL:1999 standard as a way to write recursive queries and create named subquery blocks.

Some of the major DBMS players jumped on this right away, but with slightly different implementations. Oracle chose not to implement the recursive functionality of the WITH clause because they already had this functionality using the PL/SQL CONNECT BY clause:

SELECT product_id, product_name, parent_id
FROM tbl_products
START WITH product_id=50
CONNECT BY parent_id  =  PRIOR product_id;

Using the WITH clause in DB2 or SQL Server to achieve the same result looks like this:

WITH recursiveProducts
     (product_id, product_name, category_id)  AS
(SELECT parent.product_id,
     parent.product_name,
     parent.parent_id
FROM tbl_products parent
WHERE parent.product_id=50
UNION ALL
SELECT   child.product_id,
     child.product_name,
     child.parent_id
FROM recursiveProducts parent, tbl_products child
WHERE child.parent_id  =  parent.product_id)
SELECT product_id, product_name, product_id
FROM recursiveProducts;

Obviously not as clean or simple as Oracle’s implementation, but it works.

PostgreSQL requires the use of an additional keyword “RECURSIVE” to make this statement work:

WITH RECURSIVE recursiveProducts
    (product_id, product_name, category_id)  AS
(SELECT parent.product_id,
     parent.product_name,
     parent.parent_id
FROM tbl_products parent
WHERE parent.product_id=50
UNION ALL
SELECT   child.product_id,
     child.product_name,
     child.parent_id
FROM recursiveProducts parent, tbl_products child
WHERE child.parent_id  =  parent.product_id)
SELECT product_id, product_name, product_id
FROM recursiveProducts;

Sadly some other “major” DBMSs (I’m looking at you MySQL) do not support the WITH function or recursive queries at all, so at least we’ve got something here.

While knowing how to write a recursive SQL statement is a great thing, its not something I’ve really had a need for very often. The other function of the WITH clause, however, is something I could see myself using somewhat often now that I’m familiar with it. The ability to use WITH to create named subquery blocks, or even pseudo temp tables, is supported by most (but not all – still looking at you, MySQL), such as Oracle, PostgreSQL, DB2 and SQL Server. As an added bonus, this also becomes a more efficient way to reuse the same subquery.

The optimizer in Oracle recognizes that the statement in the WITH block only needs to be executed once, as opposed to each time its used in the statement. For example, this statement, not using WITH, will execute the same subquery twice:

SELECT product_id,   (SELECT AVG(current_inventory) FROM product)
FROM product
WHERE current_inventory  >  (SELECT AVG(current_inventory)
     FROM product);

Using WITH, we get the same results, but Oracle knows to only execute the subquery once.

WITH product Avg  AS  (SELECT AVG(current_inventory)  avg_inventory
     FROM product)
SELECT product_id,  (SELECT avg_inventory FROM productAVG)
FROM product
WHERE current_inventory  >  (SELECTavg_inventory
     FROM productAVG);

DB2 handles things a little differently, using the WITH clause as a way to create a pseudo temporary table, saving the overhead of creating and dropping a regular view that would only be used one time. Additionally, during statement preparation, DB2 does not need to access the catalog for the view, saving additional overhead.

Without WITH, we get a pretty inefficient statement that creates a true temporary table. Not only do we need to define the table and its columns, it still needs to be populated with data. This can be an expensive set of operations depending on the size of your temporary table, the amount of data being stored and how many temporary tables you’ll end up using.

declare global temporary table product Avg
     (avg_inventory numeric(8,2))
     on commit preserve rows not logged  ;

insert into SESSION.productAVG  (avg_inventory)
     SELECT AVG(current_inventory)  avg_inventory
          FROM product

SELECT prd.product_id,  avg.avg_inventory
FROM product prd, SESSION.productAVG avg
WHERE prd.current_inventory  > avg.avg_inventory

By using the WITH clause instead, we’re not accessing the catalog nor are we having to use the SESSION schema. Plus we don’t have to remember to drop the table when we’re done. This table structure falls out of scope and is gone forever once the statement is executed. Because DB2 treats this as a standard temporary table, you’re able to use it in exactly the same way.

WITH productAVG as (SELECT AVG(current_inventory) avg_inventory
FROM product)
SELECT prd.product_id, pAvg.avg_inventory
FROM product prd,.productAvg pAvg
WHERE prd.current_inventory  >  pAvg.avg_inventory

I’ve just outlined the basic uses here. There are many ways in which these can be used as building blocks for simplifying potentially complex SQL statements, all while keeping code as efficient as possible. I am excited to find excuses to put this clause to use now that I know it exists. The nightmares of SQL horrors past may never have occurred had I learned of this earlier.

A word of caution, though: I do encourage anyone looking to make use of the WITH clause to take the time to research if it is supported (seriously MySQL, I do actually like you) and how it is implemented for the DBMS you are using. In my limited research I have found that each platform handles WITH a bit differently, from limiting functionality to differences in how the underlying structures are defined and created by WITH. Good luck!

— Scott Peters, asktheteam@keyholesoftware.com


About the Author
Scott Peters

Scott Peters


Share this Post

Comments 1

  1. Great post. Two points of note:

    First, subqueries in the FROM clause can often provide the same advantages as noon-recursive WITH in most major RDBMS’s. You may see these called by several different names, but using your example query it would look like this:

    SELECT product_id, product_avg.inventory
    FROM product,
    (SELECT AVG(current_inventory) as inventory
    FROM product) as product_avg
    WHERE current_inventory > product_avg.inventory;

    This should perform the same as the WITH version – but I emphasize “should” and will come back to that momentarily.

    The choice between FROM subqueries and non-recursive WITH, if your DBMS supports both, often boils down to what seems more readable. When non-recursive WITH really makes a difference is if you need the same pseudo-view in unrelated parts of a more complex query (though some SQL engines may choke on WITH in exactly that circumstance).

    The second point (and why I emphasize “should” preform the same) is that relative performance of equivalent queries is highly dependent on the DBMS (and in particular on the optimizer). Since SQL is meant to be descriptive rather than procedural, ideally refactoring wouldn’t affect performance; but in practice optimizers aren’t that magical. If a query gives you performance problems, sometimes all you can do is test various versions until one works well (and hope the next DBMS upgrade doesn’t change performance characteristics too much).

    In particular note that many cost-based optimizers have problems with pseudo-views both in WITH and in FROM subqueries due to lack of statistics, so for large datasets it’s possible that a real temp table will outperform after all.

Leave a Reply