SQL Run

Database Programming

8. Subqueries

In this chapter

Understanding Subqueries

Subquery Syntax

Subqueries vs. Joins

Simple and Correlated Subqueries

Qualifying Column Names in Subqueries

Nulls in Subqueries

Using Subqueries as Column Expressions

Comparing a Subquery Value by Using a Comparison Operator

Testing Set Membership with IN

Comparing All Subquery Values with ALL

Comparing Some Subquery Values with ANY

Testing Existence with EXISTS

Comparing Equivalent Queries

To this point, I’ve used a single SELECT statement to retrieve data from one or more tables. This chapter describes nested queries, which let you retrieve or modify data based on another query’s result.

A subquery, or subselect, is a SELECT statement embedded in another SQL statement. You can nest a subquery in:

In general, you can use a subquery anywhere an expression is allowed, but your DBMS might restrict where they can appear. This chapter covers subqueries nested in a SELECT statement or another subquery; Chapter 10 covers subqueries embedded in INSERT, UPDATE, and DELETE statements.

Understanding Subqueries

This section defines some terms and introduces subqueries by giving an example of a SELECT statement that contains a simple subquery. Subsequent sections explain the types of subqueries and their syntax and semantics.

Suppose that you want to list the names of the publishers of biographies. The naive approach is to write two queries: one query to retrieve the IDs of all the biography publishers (Listing 8.1 and Figure 8.1) and a second query that uses the first query’s result to list the publisher names (Listing 8.2 and Figure 8.2).

Listing 8.1List the biography publishers. See Figure 8.1 for the result.

SELECT pub_id
  FROM titles
  WHERE type = 'biography';

Figure 8.1Result of Listing 8.1. You can add DISTINCT to the SELECT clause of Listing 8.1 to list the publishers only once; see “Eliminating Duplicate Rows with DISTINCT” in Chapter 4.

pub_id
------
P01
P03
P01
P01

Listing 8.2This query uses the result of Listing 8.1 to list the names of the biography publishers. See Figure 8.2 for the result.

SELECT pub_name
  FROM publishers
  WHERE pub_id IN ('P01', 'P03');

Figure 8.2Result of Listing 8.2.

pub_name
-------------------
Abatis Publishers
Schadenfreude Press

A better way is to use an inner join (Listing 8.3 and Figure 8.3); see “Creating an Inner Join with INNER JOIN” in Chapter 7.

Listing 8.3List the names of the biography publishers by using an inner join. See Figure 8.3 for the result.

SELECT DISTINCT pub_name
  FROM publishers p
  INNER JOIN titles t
    ON p.pub_id = t.pub_id
  WHERE t.type = 'biography';

Figure 8.3Result of Listing 8.3.

pub_name
-------------------
Abatis Publishers
Schadenfreude Press

Another alternative is to use a subquery (Listing 8.4 and Figure 8.4). The subquery in Listing 8.4 is highlighted. A subquery also is called an inner query, and the statement containing a subquery is called an outer query. In other words, an enclosed subquery is an inner query of an outer query. Remember that a subquery can be nested in another subquery, so inner and outer are relative terms in statements with multiple nested subqueries.

Listing 8.4List the names of the biography publishers by using a subquery. See Figure 8.4 for the result.

SELECT pub_name
  FROM publishers
  WHERE pub_id IN
    (SELECT pub_id
       FROM titles
       WHERE type = 'biography');

Figure 8.4Result of Listing 8.4.

pub_name
-------------------
Abatis Publishers
Schadenfreude Press

I’ll explain how a DBMS executes subqueries in “Simple and Correlated Subqueries” later in this chapter, but for now, all that you need to know is that in Listing 8.4, the DBMS processes the inner query (highlighted text) first and then uses its interim result to run the outer query (plain text) and get the final result. The IN keyword that introduces the subquery tests for list membership and works like IN in “List Filtering with IN” in Chapter 4. Note that the inner query in Listing 8.4 is the same query as Listing 8.1, and the outer query is the same query as Listing 8.2.

Tips for Subqueries

Subquery Syntax

The syntax of a subquery is the same as that of a normal SELECT statement (see Chapters 4 through 7) except for the following differences:

In practice, a subquery usually appears in a WHERE clause that takes one of these forms:

test_expr is a literal value, a column name, an expression, or a scalar subquery; op is a comparison operator (=, <>, <, <=, >, or >=); and subquery is a simple or correlated subquery. I’ll cover each of these forms later in this chapter. You can use these subquery forms in a HAVING clause, too.

Tips for Subquery Syntax

Subqueries vs. Joins

In “Understanding Subqueries” earlier in this chapter, Listings 8.3 and 8.4 showed two equivalent queries: one used a join, and the other used a subquery. Many subqueries can be formulated alternatively as joins. In fact, a subquery is a way to relate one table to another without actually doing a join.

Because subqueries can be hard to use and debug, you might prefer to use joins, but you can pose some questions only as subqueries. In cases where you can use subqueries and joins interchangeably, you should test queries on your DBMS to see whether a performance difference exists between a statement that uses a subquery and a semantically equivalent statement that uses a join. For example, the query

SELECT MAX(table1.col1)
  FROM table1
  WHERE table1.col1 IN
    (SELECT table2.col1
       FROM table2);

usually will run faster than

SELECT MAX(table1.col1)
  FROM table1
  INNER JOIN table2
    ON table1.col1 = table2.col1;

For more information, see “Comparing Equivalent Queries” later in this chapter.

The following syntax diagrams show some equivalent statements that use subqueries and joins. These two statements are equivalent (IN subquery):

SELECT *
  FROM table1
  WHERE id IN
    (SELECT id FROM table2);

and (inner join):

SELECT DISTINCT table1.*
  FROM table1
  INNER JOIN table2
    ON table1.id = table2.id;

See Listings 8.5a and 8.5b and Figure 8.5 for an example.

Listing 8.5aThis statement uses a subquery to list the authors who live in the same city in which a publisher is located. See Figure 8.5 for the result.

SELECT au_id, city
  FROM authors
  WHERE city IN
    (SELECT city FROM publishers);

Listing 8.5bThis statement is equivalent to Listing 8.5a but uses an inner join instead of a subquery. See Figure 8.5 for the result.

SELECT DISTINCT a.au_id, a.city
  FROM authors a
  INNER JOIN publishers p
    ON a.city = p.city;

Figure 8.5Result of Listings 8.5a and 8.5b.

au_id city
----- -------------
A03   San Francisco
A04   San Francisco
A05   New York

These three statements are equivalent (NOT IN subquery):

SELECT *
  FROM table1
  WHERE id NOT IN
    (SELECT id FROM table2);

and (NOT EXISTS subquery):

SELECT *
  FROM table1
  WHERE NOT EXISTS
    (SELECT *
       FROM table2
       WHERE table1.id = table2.id);

and (left outer join):

SELECT table1.*
  FROM table1
  LEFT OUTER JOIN table2
    ON table1.id = table2.id
  WHERE table2.id IS NULL;

See Listings 8.6a, 8.6b, and 8.6c and Figure 8.6 for an example. IN and EXISTS subqueries are covered later in this chapter.

Listing 8.6aThis statement uses an IN subquery to list the authors who haven’t written (or cowritten) a book. See Figure 8.6 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors
  WHERE au_id NOT IN
    (SELECT au_id FROM title_authors);

Listing 8.6bThis statement is equivalent to Listing 8.6a but uses an EXISTS subquery instead of an IN subquery. See Figure 8.6 for the result.

SELECT au_id, au_fname, au_lname
FROM authors a
  WHERE NOT EXISTS
    (SELECT *
       FROM title_authors ta
       WHERE a.au_id = ta.au_id);

Listing 8.6cThis statement is equivalent to Listings 8.6a and 8.6b but uses a left outer join instead of a subquery. See Figure 8.6 for the result.

SELECT a.au_id, a.au_fname, a.au_lname
  FROM authors a
  LEFT OUTER JOIN title_authors ta
    ON a.au_id = ta.au_id
  WHERE ta.au_id IS NULL;

Figure 8.6Result of Listings 8.6a, 8.6b, and 8.6c.

au_id au_fname au_lname
----- -------- -----------
A07   Paddy    O'Furniture

You also can write a self-join as a subquery (Listings 8.7a and 8.7b and Figure 8.7). For information about self-joins, see “Creating a Self-Join” in Chapter 7.

Listing 8.7aThis statement uses a subquery to list the authors who live in the same state as author A04 (Klee Hull). See Figure 8.7 for the result.

SELECT au_id, au_fname, au_lname, state
  FROM authors
  WHERE state IN
    (SELECT state
       FROM authors
       WHERE au_id = 'A04');

Listing 8.7bThis statement is equivalent to Listing 8.7a but uses an inner join instead of a subquery. See Figure 8.7 for the result.

SELECT a1.au_id, a1.au_fname,
    a1.au_lname, a1.state
  FROM authors a1
  INNER JOIN authors a2
    ON a1.state = a2.state
  WHERE a2.au_id = 'A04';

Figure 8.7Result of Listings 8.7a and 8.7b.

au_id au_fname au_lname state
----- -------- -------- -----
A03   Hallie   Hull     CA
A04   Klee     Hull     CA
A06            Kellsey  CA

Favor subqueries if you’re comparing an aggregate value to other values (Listing 8.8 and Figure 8.8). Without a subquery, you’d need two SELECT statements to list all the books with the highest price: one query to find the highest price and a second query to list all the books selling for that price. For information about aggregate functions, see Chapter 6.

Listing 8.8List all books whose price equals the highest book price. See Figure 8.8 for the result.

SELECT title_id, price
  FROM titles
  WHERE price =
    (SELECT MAX(price)
       FROM titles
);

Figure 8.8Result of Listing 8.8.

title_id price
-------- -----
T03      39.95

Use joins when you include columns from multiple tables in the result. Listing 8.5b uses a join to retrieve authors who live in the same city in which a publisher is located. To include the publisher ID in the result, simply add the column pub_id to the SELECT-clause list (Listing 8.9 and Figure 8.9).

You can’t accomplish this same task with a subquery because it’s illegal to include a column in the outer query’s SELECT-clause list from a table that appears in only the inner query:

SELECT a.au_id, a.city, p.pub_id
  FROM authors a
  WHERE a.city IN
    (SELECT p.city
       FROM publishers p);     --Illegal

Listing 8.9List the authors who live in the same city in which a publisher is located, and include the publisher in the result. See Figure 8.9 for the result.

SELECT a.au_id, a.city, p.pub_id
  FROM authors a
  INNER JOIN publishers p
    ON a.city = p.city;

Figure 8.9Result of Listing 8.9.

au_id city          pub_id
----- ------------- ------
A03   San Francisco P02
A04   San Francisco P02
A05   New York      P01

Tips for Subqueries vs. Joins

Simple and Correlated Subqueries

You can use two types of subqueries:

A simple subquery, or noncorrelated subquery, is a subquery that can be evaluated independently of its outer query and is processed only once for the entire statement. All the subqueries in this chapter’s examples so far have been simple subqueries (except Listing 8.6b).

A correlated subquery can’t be evaluated independently of its outer query; it’s an inner query that depends on data from the outer query. A correlated subquery is used if a statement needs to process a table in the inner query for each row in the outer query.

Correlated subqueries have more-complicated syntax and a knottier execution sequence than simple subqueries, but you can use them to solve problems that you can’t solve with simple subqueries or joins. This section gives an example of a simple subquery and a correlated subquery and then describes how a DBMS executes each one. Subsequent sections in this chapter contain more examples of each type of subquery.

Simple Subqueries

A DBMS evaluates a simple subquery by evaluating the inner query once and substituting its result into the outer query. A simple subquery executes prior to, and independent of, its outer query.

Let’s revisit Listing 8.5a from earlier in this chapter. Listing 8.10 (which is identical to Listing 8.5a) uses a simple subquery to list the authors who live in the same city in which a publisher is located; see Figure 8.10 for the result. Conceptually, a DBMS processes this query in two steps as two separate SELECT statements:

  1. The inner query (a simple subquery) returns the cities of all the publishers (Listing 8.11 and Figure 8.11).
  2. The DBMS substitutes the values returned by the inner query in step 1 into the outer query, which finds the author IDs corresponding to the publishers’ cities (Listing 8.12 and Figure 8.12).

Listing 8.10List the authors who live in the same city in which a publisher is located. See Figure 8.10 for the result.

SELECT au_id, city
  FROM authors
  WHERE city IN
    (SELECT city
       FROM publishers);

Figure 8.10Result of Listing 8.10.

au_id city
----- -------------
A03   San Francisco
A04   San Francisco
A05   New York

Listing 8.11List the cities in which the publishers are located. See Figure 8.11 for the result.

SELECT city
  FROM publishers;

Figure 8.11Result of Listing 8.11.

city
-------------
New York
San Francisco
Hamburg
Berkeley

Listing 8.12List the authors who live in one of the cities returned by Listing 8.11. See Figure 8.12 for the result.

SELECT au_id, city
  FROM authors
  WHERE city IN
    ('New York', 'San Francisco',
     'Hamburg', 'Berkeley');

Figure 8.12Result of Listing 8.12.

au_id city
----- -------------
A03   San Francisco
A04   San Francisco
A05   New York

Correlated Subqueries

Correlated subqueries offer a more powerful data-retrieval mechanism than simple subqueries do. A correlated subquery’s important characteristics are:

Listing 8.13 uses a correlated subquery to list the books that have sales better than the average sales of books of its type; see Figure 8.13 for the result. candidate (following titles in the outer query) and average (following titles in the inner query) are alias table names for the table titles, so that the information can be evaluated as though it comes from two different tables (see “Creating a Self-Join” in Chapter 7).

Listing 8.13List the books that have sales greater than or equal to the average sales of books of its type. The correlation variable candidate.type defines the initial condition to be met by the rows of the inner table average. The outer WHERE condition (sales >=) defines the final test that the rows of the inner table average must satisfy. See Figure 8.13 for the result.

SELECT
    candidate.title_id,
    candidate.type,
    candidate.sales
  FROM titles candidate
  WHERE sales >=
    (SELECT AVG(sales)
       FROM titles average
       WHERE average.type = candidate.type);

Figure 8.13Result of Listing 8.13.

title_id type       sales
-------- ---------- -------
T02      history       9566
T03      computer     25667
T05      psychology  201440
T07      biography  1500200
T09      children      5000
T13      history      10467

In Listing 8.13, the subquery can’t be resolved independently of the outer query. It needs a value for candidate.type, but this value is a correlation variable that changes as the DBMS examines different rows in the table candidate. The column average.type is said to correlate with candidate.type in the outer query. The average sales for a book type are calculated in the subquery by using the type of each book from the table in the outer query (candidate). The subquery computes the average sales for this type and then compares it with a row in the table candidate. If the sales in the table candidate are greater than or equal to average sales for the type, then that book is displayed in the result. A DBMS processes this query as follows:

  1. The book type in the first row of candidate is used in the subquery to compute average sales.

    Take the row for book T01, whose type is history, so the value in the column type in the first row of the table candidate is history. In effect, the subquery becomes:

    SELECT AVG(sales)
      FROM titles average
      WHERE average.type = 'history';

    This pass through the subquery yields a value of 6866—the average sales of history books. In the outer query, book T01’s sales of 566 are compared to the average sales of history books. T01’s sales are lower than average, so T01 isn’t displayed in the result.

  2. Next, book T02’s row in candidate is evaluated.

    T02 also is a history book, so the evaluated subquery is the same as in step 1:

    SELECT AVG(sales)
      FROM titles average
      WHERE average.type = 'history';

    This pass through the subquery again yields 6866 for the average sales of history books. Book T02’s sales of 9566 are higher than average, so T02 is displayed in the result.

  3. Next, book T03’s row in candidate is evaluated.

    T03 is a computer book, so this time, the evaluated subquery is:

    SELECT AVG(sales)
      FROM titles average
      WHERE average.type = 'computer';

    The result of this pass through the subquery is average sales of 25667 for computer books. Because book T03’s sales of 25667 equals the average (it’s the only computer book), T03 is displayed in the result.

  4. The DBMS repeats this process until every row in the outer table candidate has been tested.

If you can get the same result by using a simple subquery or a correlated subquery, then use the simple subquery because it probably will run faster. Listings 8.14a and 8.14b show two equivalent queries that list all authors who earn 100 percent (1.0) of the royalty share on a book. Listing 8.14a, which uses a simple subquery, is more efficient than Listing 8.14b, which uses a correlated subquery. In the simple subquery, the DBMS reads the inner table title_authors once. In the correlated subquery, the DBMS must loop through title_authors five times—once for each qualifying row in the outer table authors. See Figure 8.14 for the result.

Why do I say that a statement that uses a simple subquery probably will run faster than an equivalent statement that uses a correlated subquery when a correlated subquery clearly requires more work? Because your DBMS’s optimizer might be clever enough to recognize and reformulate a correlated subquery as a semantically equivalent simple subquery internally before executing the statement. For more information, see “Comparing Equivalent Queries” later in this chapter.

Listing 8.14bThis statement uses a simple subquery to list all authors who earn 100 percent (1.0) royalty on a book. See Figure 8.14 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors
  WHERE au_id IN
    (SELECT au_id
       FROM title_authors
       WHERE royalty_share = 1.0);

Listing 8.14bThis statement is equivalent to Listing 8.14a but uses a correlated subquery instead of a simple subquery. This query probably will run slower than Listing 8.14a. See Figure 8.14 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors
  WHERE 1.0 IN
    (SELECT royalty_share
       FROM title_authors
       WHERE title_authors.au_id =
             authors.au_id);

Figure 8.14Result of Listings 8.14a and 8.14b.

au_id au_fname  au_lname
----- --------- ---------
A01   Sarah     Buchman
A02   Wendy     Heydemark
A04   Klee      Hull
A05   Christian Kells
A06             Kellsey

Tips for Simple and Correlated Subqueries

Qualifying Column Names in Subqueries

Recall from “Qualifying Column Names” in Chapter 7 that you can qualify a column name explicitly with a table name to identify the column unambiguously. In statements that contain subqueries, column names are qualified implicitly by the table referenced in the FROM clause at the same nesting level.

In Listing 8.15a, which lists the names of biography publishers, the column names are qualified implicitly, meaning:

Listing 8.15b shows Listing 8.15a with explicit qualifiers. See Figure 8.15 for the result.

Listing 8.15aThe tables publishers and titles both contain a column named pub_id, but you don’t have to qualify pub_id in this query because of the implicit assumptions about table names that SQL makes. See Figure 8.15 for the result.

SELECT pub_name
  FROM publishers
  WHERE pub_id IN
    (SELECT pub_id
       FROM titles
       WHERE type = 'biography');

Listing 8.15bThis query is equivalent to Listing 8.15a, but with explicit qualification of pub_id. See Figure 8.15 for the result.

SELECT pub_name
  FROM publishers
  WHERE publishers.pub_id IN
    (SELECT titles.pub_id
       FROM titles
       WHERE type = 'biography');

Figure 8.15Result of Listings 8.15a and 8.15b.

pub_name
-------------------
Abatis Publishers
Schadenfreude Press

Tips for Qualified Column Names in Subqueries

Nulls in Subqueries

Beware of nulls; their presence complicates subqueries. If you don’t eliminate them when they’re present, then you might get an unexpected answer.

A subquery can hide a comparison to a null. Recall from “Nulls” in Chapter 3 that nulls don’t equal each other and that you can’t determine whether a null matches any other value. The following example involves a NOT IN subquery (see “Testing Set Membership with IN” later in this chapter). Consider the following two tables, each with one column. The first table is named table1:

col
----
   1
   2

The second table is named table2:

col
----
   1
   2
   3

If I run Listing 8.16 to list the values in table2 that aren’t in table1, then I get Figure 8.16a, as expected.

Listing 8.16List the values in table2 that aren’t in table1. See Figures 8.16a and 8.16b for the result.

SELECT col
  FROM table2
  WHERE col NOT IN
    (SELECT col
       FROM table1);

Figure 8.16aResult of Listing 8.16 when table1 doesn’t contain a null.

col
----
   3

Now add a null to table1:

col
----
   1
   2
NULL

If I rerun Listing 8.16, then I get Figure 8.16b (an empty table), which is correct logically but not what I expected.

Figure 8.16bResult of Listing 8.16 when table1 contains a null. This result is an empty table, which is correct logically but not what I expected.

col
----

Why is the result empty this time? The solution requires some algebra. I can move the NOT outside the subquery condition without changing the meaning of Listing 8.16:

SELECT col
  FROM table2
  WHERE NOT col IN
    (SELECT col FROM table1);

The IN clause determines whether a value in table2 matches any value in table1, so I can rewrite the subquery as a compound condition:

SELECT col
  FROM table2
  WHERE NOT ((col = 1)
         OR  (col = 2)
         OR  (col = NULL));

If I apply De Morgan’s Laws (refer to Table 4.6 in Chapter 4), then this query becomes:

SELECT col
  FROM table2
  WHERE (col <> 1)
    AND (col <> 2)
    AND (col <> NULL);

The final expression, col <> NULL, always is unknown. Refer to the AND truth table (Table 4.3 in Chapter 4), and you’ll see that the entire WHERE search condition reduces to unknown, which always is rejected by WHERE.

To fix Listing 8.16 so that it doesn’t examine the null in table1, add an IS NOT NULL condition to the subquery (see “Testing for Nulls with IS NULL” in Chapter 4):

SELECT col
  FROM table2
  WHERE col NOT IN
    (SELECT col
       FROM table1
       WHERE col IS NOT NULL);

Tips for Nulls in Subqueries

Using Subqueries as Column Expressions

In Chapters 4, 5, and 6, you learned that the items in a SELECT-clause list can be literals, column names, or more-complex expressions. SQL also lets you to embed a subquery in a SELECT-clause list.

A subquery that’s used as a column expression must be a scalar subquery. Recall from Table 8.1 in “Subquery Syntax” earlier in this chapter that a scalar subquery returns a single value (that is, a one-row, one-column result). In most cases, you’ll have to use an aggregate function or restrictive WHERE conditions in the subquery to guarantee that the subquery returns only one row.

The syntax for the SELECT-clause list is the same as you’ve been using all along, except that you can specify a parenthesized subquery as one of the column expressions in the list, as the following examples show.

Listing 8.17 uses two simple subqueries as column expressions to list each biography, its price, the average price of all books (not just biographies), and the difference between the price of the biography and the average price of all books. The aggregate function AVG() guarantees that each subquery returns a single value. See Figure 8.17 for the result. Remember that AVG() ignores nulls when computing an average; see “Calculating an Average with AVG()” in Chapter 6.

Listing 8.17List each biography, its price, the average price of all books, and the difference between the price of the biography and the average price of all books. See Figure 8.17 for the result.

SELECT title_id, price,
    (SELECT AVG(price) FROM titles)
      AS "AVG(price)",
    price - (SELECT AVG(price) FROM titles)
      AS "Difference"
  FROM titles
  WHERE type='biography';

Figure 8.17Result of Listing 8.17.

title_id price   AVG(price) Difference
-------- ------- ---------- ----------
T06        19.95    18.3875     1.5625
T07        23.95    18.3875     5.5625
T10         NULL    18.3875       NULL
T12        12.99    18.3875    -5.3975

Listing 8.18 uses correlated subqueries to list all the authors of each book in one row, as you’d view them in a report or spreadsheet. See Figure 8.18 for the result. Note that in each WHERE clause, SQL qualifies title_id implicitly with the table alias ta referenced in the subquery’s FROM clause; see “Qualifying Column Names in Subqueries” earlier in this chapter. For a more efficient way to implement this query, see “Tips for Column Expressions” later in this section. See Listing 15.8 in Chapter 15 for the reverse of this query.

Listing 8.18List all the authors of each book in one row. See Figure 8.18 for the result.

SELECT title_id,
    (SELECT au_id
       FROM title_authors ta
       WHERE au_order = 1
         AND title_id = t.title_id)
      AS "Author 1",
    (SELECT au_id
       FROM title_authors ta
       WHERE au_order = 2
         AND title_id = t.title_id)
      AS "Author 2",
    (SELECT au_id
       FROM title_authors ta
       WHERE au_order = 3
         AND title_id = t.title_id)
      AS "Author 3"
  FROM titles t;

Figure 8.18Result of Listing 8.18.

title_id Author 1 Author 2 Author 3
-------- -------- -------- --------
T01      A01      NULL     NULL
T02      A01      NULL     NULL
T03      A05      NULL     NULL
T04      A03      A04      NULL
T05      A04      NULL     NULL
T06      A02      NULL     NULL
T07      A02      A04      NULL
T08      A06      NULL     NULL
T09      A06      NULL     NULL
T10      A02      NULL     NULL
T11      A06      A03      A04
T12      A02      NULL     NULL
T13      A01      NULL     NULL

In Listing 8.19, I revisit Listing 7.30 in “Creating Outer Joins with OUTER JOIN” in Chapter 7, but this time, I’m using a correlated subquery instead of an outer join to list the number of books that each author wrote (or cowrote). See Figure 8.19 for the result.

Listing 8.19List the number of books that each author wrote (or cowrote), including authors who have written no books. See Figure 8.19 for the result.

SELECT au_id,
    (SELECT COUNT(*)
       FROM title_authors ta
       WHERE ta.au_id = a.au_id)
      AS "Num books"
  FROM authors a
  ORDER BY au_id;

Figure 8.19Result of Listing 8.19.

au_id Num books
----- ---------
A01           3
A02           4
A03           2
A04           4
A05           1
A06           3
A07           0

Listing 8.20 uses a correlated subquery to list each author and the latest date on which he or she published a book. You should qualify every column name explicitly in a subquery that contains a join to make it clear which table is referenced (even when qualifiers are unnecessary). See Figure 8.20 for the result.

Listing 8.20List each author and the latest date on which he or she published a book. See Figure 8.20 for the result.

SELECT au_id,
    (SELECT MAX(pubdate)
       FROM titles t
       INNER JOIN title_authors ta
         ON ta.title_id = t.title_id
       WHERE ta.au_id = a.au_id)
      AS "Latest pub date"
  FROM authors a;

Figure 8.20Result of Listing 8.20.

au_id Latest pub date
----- ---------------
A01   2000-08-01
A02   2000-08-31
A03   2000-11-30
A04   2001-01-01
A05   2000-09-01
A06   2002-05-31
A07   NULL

Listing 8.21 uses a correlated subquery to compute the running total of all book sales. A running total, or running sum, is a common calculation: For each book, I want to compute the sum of all sales of the books that precede the book. Here, I’m defining precede to mean those books whose title_id comes before the current book’s title_id alphabetically. Note the use of table aliases to refer to the same table in two contexts. The subquery returns the sum of sales for all books preceding the current book, which is denoted by t1.title_id. See Figure 8.21 for the result. See also “Calculating Running Statistics” in Chapter 15.

Listing 8.21Compute the running sum of all book sales. See Figure 8.21 for the result.

SELECT t1.title_id, t1.sales,
    (SELECT SUM(t2.sales)
       FROM titles t2
       WHERE t2.title_id <= t1.title_id)
      AS "Running total"
  FROM titles t1;

Figure 8.21Result of Listing 8.21.

title_id sales   Running total
-------- ------- -------------
T01          566           566
T02         9566         10132
T03        25667         35799
T04        13001         48800
T05       201440        250240
T06        11320        261560
T07      1500200       1761760
T08         4095       1765855
T09         5000       1770855
T10         NULL       1770855
T11        94123       1864978
T12       100001       1964979
T13        10467       1975446

You also can use a subquery in a FROM clause. In “Tips for DISTINCT” in Chapter 6, I used a FROM subquery to replicate a distinct aggregate function. Listing 8.22 uses a FROM subquery to calculate the greatest number of titles written (or cowritten) by any author. See Figure 8.22 for the result. Note that the outer query uses a table alias (ta) and column alias (count_titles) to reference the inner query’s result. See also “Column aliases and WHERE” in Chapter 4.

Listing 8.22Calculate the greatest number of titles written (or cowritten) by any author. See Figure 8.22 for the result.

SELECT MAX(ta.count_titles) AS "Max titles"
  FROM (SELECT COUNT(*) AS count_titles
          FROM title_authors
          GROUP BY au_id) ta;

Figure 8.22Result of Listing 8.22.

Max titles
----------
         4

Tips for Column Expressions

Comparing a Subquery Value by Using a Comparison Operator

You can use a subquery as a filter in a WHERE clause or HAVING clause by using one of the comparison operators (=, <>, <, <=, >, or >=).

The important characteristics of a subquery comparison test are:

The hard part of writing these statements is getting the subquery to return one value, which you can guarantee several ways:

To compare a subquery value:

Listing 8.23 tests the result of a simple subquery for equality to list the authors who live in the state in which Tenterhooks Press is located. Only one publisher is named Tenterhooks Press, so the inner WHERE condition guarantees that the inner query returns a single-valued result. See Figure 8.23 for the result.

Listing 8.23List the authors who live in the state in which the publisher Tenterhooks Press is located. See Figure 8.23 for the result.

SELECT au_id, au_fname, au_lname, state
  FROM authors
  WHERE state =
    (SELECT state
       FROM publishers
       WHERE pub_name = 'Tenterhooks Press');

Figure 8.23Result of Listing 8.23.

au_id au_fname au_lname state
----- -------- -------- -----
A03   Hallie   Hull     CA
A04   Klee     Hull     CA
A06            Kellsey  CA

Listing 8.24 is the same as Listing 8.23 except for the name of the publisher. No publisher named XXX exists, so the subquery returns an empty table (zero rows). The comparison evaluates to null, so the final result is empty. See Figure 8.24 for the result.

Listing 8.24List the authors who live in the state in which the publisher XXX is located. See Figure 8.24 for the result.

SELECT au_id, au_fname, au_lname, state
  FROM authors
  WHERE state =
    (SELECT state
       FROM publishers
       WHERE pub_name = 'XXX');

Figure 8.24Result of Listing 8.24 (an empty table).

au_id au_fname au_lname state
----- -------- -------- -----

Listing 8.25 lists the books with above-average sales. Subqueries introduced with comparison operators often use aggregate functions to return a single value. See Figure 8.25 for the result.

Listing 8.25List the books with above-average sales. See Figure 8.25 for the result.

SELECT title_id, sales
  FROM titles
  WHERE sales >
    (SELECT AVG(sales)
       FROM titles);

Figure 8.25Result of Listing 8.25.

title_id sales
-------- -------
T05       201440
T07      1500200

To list the authors of the books with above-average sales, I’ve added an inner join to Listing 8.25 (Listing 8.26 and Figure 8.26).

Listing 8.26List the authors of the books with above-average sales by using a join and a subquery. See Figure 8.26 for the result.

SELECT ta.au_id, ta.title_id
  FROM titles t
  INNER JOIN title_authors ta
    ON ta.title_id = t.title_id
  WHERE sales >
    (SELECT AVG(sales)
       FROM titles)
  ORDER BY ta.au_id ASC, ta.title_id ASC;

Figure 8.26Result of Listing 8.26.

au_id title_id
----- --------
A02   T07
A04   T05
A04   T07

Recall from the introduction to this chapter that you can use a subquery almost anywhere an expression is allowed, so this syntax is valid:

WHERE (subquery) op (subquery)

The left subquery must return a single value. Listing 8.27 is equivalent to Listing 8.26, but I’ve removed the inner join and instead placed a correlated subquery to the left of the comparison operator. See Figure 8.27 for the result.

Listing 8.27List the authors of the books with above-average sales by using two subqueries. See Figure 8.27 for the result.

SELECT au_id, title_id
  FROM title_authors ta
  WHERE
    (SELECT AVG(sales)
       FROM titles t
       WHERE ta.title_id = t.title_id)
    >
    (SELECT AVG(sales)
       FROM titles)
  ORDER BY au_id ASC, title_id ASC;

Figure 8.27Result of Listing 8.27.

au_id title_id
----- --------
A02   T07
A04   T05
A04   T07

You can include GROUP BY or HAVING clauses in a subquery if you know that the GROUP BY or HAVING clause itself returns a single value. Listing 8.28 lists the books priced higher than the highest-priced biography. See Figure 8.28 for the result.

Listing 8.28List the books priced higher than the highest-priced biography. See Figure 8.28 for the result.

SELECT title_id, price
  FROM titles
  WHERE price >
    (SELECT MAX(price)
       FROM titles
       GROUP BY type
       HAVING type = 'biography');

Figure 8.28Result of Listing 8.28.

title_id price
-------- -----
T03      39.95
T13      29.99

Listing 8.29 uses a subquery in a HAVING clause to list the publishers whose average sales exceed overall average sales. Again, the subquery returns a single value (the average of all sales). See Figure 8.29 for the result.

Listing 8.29List the publishers whose average sales exceed the overall average sales. See Figure 8.29 for the result.

SELECT pub_id, AVG(sales) AS "AVG(sales)"
  FROM titles
  GROUP BY pub_id
  HAVING AVG(sales) >
    (SELECT AVG(sales)
       FROM titles);

Figure 8.29Result of Listing 8.29.

pub_id AVG(sales)
------ ----------
P03     506744.33

Listing 8.30 uses a correlated subquery to list authors whose royalty share is less than the highest royalty share of any coauthor of a book. The outer query selects the rows of title_authors (that is, of ta1) one by one. The subquery calculates the highest royalty share for each book being considered for selection in the outer query. For each possible value of ta1, the DBMS evaluates the subquery and puts the row being considered in the result if the royalty share is less than the calculated maximum. See Figure 8.30 for the result.

Listing 8.30List authors whose royalty share is less than the highest royalty share of any coauthor of a book. See Figure 8.30 for the result.

SELECT ta1.au_id, ta1.title_id,
    ta1.royalty_share
  FROM title_authors ta1
  WHERE ta1.royalty_share <
    (SELECT MAX(ta2.royalty_share)
       FROM title_authors ta2
       WHERE ta1.title_id = ta2.title_id);

Figure 8.30Result of Listing 8.30.

au_id title_id royalty_share
----- -------- -------------
A04   T04               0.40
A03   T11               0.30
A04   T11               0.30

Listing 8.31 uses a correlated subquery to imitate a GROUP BY clause and list all books that have a price greater than the average for books of its type. For each possible value of t1, the DBMS evaluates the subquery and includes the row in the result if the price value in that row exceeds the calculated average. It’s unnecessary to group by type explicitly because the rows for which the average price is calculated are restricted by the subquery’s WHERE clause. See Figure 8.31 for the result.

Listing 8.31List all books that have a price greater than the average for books of its type. See Figure 8.31 for the result.

SELECT type, title_id, price
  FROM titles t1
  WHERE price >
    (SELECT AVG(t2.price)
       FROM titles t2
       WHERE t1.type = t2.type)
  ORDER BY type ASC, title_id ASC;

Figure 8.31Result of Listing 8.31.

type       title_id price
---------- -------- -----
biography  T06      19.95
biography  T07      23.95
children   T09      13.95
history    T13      29.99
psychology T04      12.99

Listing 8.32 uses the same structure as Listing 8.31 to list all the books whose sales are less than the best-selling books of their types. See Figure 8.32 for the result.

Listing 8.32List all the books whose sales are less than the best-selling books of their types. See Figure 8.32 for the result.

SELECT type, title_id, sales
  FROM titles t1
  WHERE sales <
    (SELECT MAX(sales)
       FROM titles t2
       WHERE t1.type = t2.type
         AND sales IS NOT NULL)
  ORDER BY type ASC, title_id ASC;

Figure 8.32Result of Listing 8.32.

type       title_id sales
---------- -------- ------
biography  T06       11320
biography  T12      100001
children   T08        4095
history    T01         566
history    T02        9566
psychology T04       13001
psychology T11       94123

Tips for Subquery Comparison Operators

Testing Set Membership with IN

“List Filtering with IN” in Chapter 4 describes how to use the IN keyword in a WHERE clause to compare a literal, column value, or more-complex expression to a list of values. You also can use a subquery to generate the list.

The important characteristics of a subquery set membership test are:

To test set membership:

Listing 8.33 lists the names of the publishers that have published biographies. The DBMS evaluates this statement in two steps. First, the inner query returns the IDs of the publishers that have published biographies (P01 and P03). Second, the DBMS substitutes these values into the outer query, which finds the names that go with the IDs in the table publishers. See Figure 8.33 for the result.

Listing 8.33List the names of the publishers that have published biographies. See Figure 8.33 for the result.

SELECT pub_name
  FROM publishers
  WHERE pub_id IN
    (SELECT pub_id
       FROM titles
       WHERE type = 'biography');

Figure 8.33Result of Listing 8.33.

pub_name
-------------------
Abatis Publishers
Schadenfreude Press

Here’s the join version of Listing 8.33:

SELECT DISTINCT pub_name
  FROM publishers p
  INNER JOIN titles t
    ON p.pub_id = t.pub_id
  AND type = 'biography';

Listing 8.34 is the same as Listing 8.33, except that it uses NOT IN to list the names of the publishers that haven’t published biographies. See Figure 8.34 for the result. This statement can’t be converted to a join. The analogous not-equal join has a different meaning: It lists the names of publishers that have published some book that isn’t a biography.

Listing 8.34List the names of the publishers that haven’t published biographies. See Figure 8.34 for the result.

SELECT pub_name
  FROM publishers
  WHERE pub_id NOT IN
    (SELECT pub_id
       FROM titles
       WHERE type = 'biography');

Figure 8.34Result of Listing 8.34.

pub_name
-----------------
Core Dump Books
Tenterhooks Press

Listing 8.35 is equivalent to Listing 7.31 in Chapter 7, except that it uses a subquery instead of an outer join to list the authors who haven’t written (or cowritten) a book. See Figure 8.35 for the result.

Listing 8.35List the authors who haven’t written (or cowritten) a book. See Figure 8.35 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors
  WHERE au_id NOT IN
    (SELECT au_id
       FROM title_authors);

Figure 8.35Result of Listing 8.35.

au_id au_fname au_lname
----- -------- -----------
A07   Paddy    O'Furniture

Listing 8.36 lists the names of the authors who have published a book with publisher P03 (Schadenfreude Press). The join to the table authors is necessary to include the authors’ names (not just their IDs) in the result. See Figure 8.36 for the result.

Listing 8.36List the names of the authors who have published a book with publisher P03. See Figure 8.36 for the result.

SELECT DISTINCT a.au_id, au_fname, au_lname
  FROM title_authors ta
  INNER JOIN authors a
    ON ta.au_id = a.au_id
  WHERE title_id IN
    (SELECT title_id
       FROM titles
       WHERE pub_id = 'P03');

Figure 8.36Result of Listing 8.36.

au_id au_fname au_lname
----- -------- ---------
A01   Sarah    Buchman
A02   Wendy    Heydemark
A04   Klee     Hull

A subquery can itself include one or more subqueries. Listing 8.37 lists the names of authors who have participated in writing at least one biography. The innermost query returns the title IDs T06, T07, T10, and T12. The DBMS evaluates the subquery at the next higher level by using these title IDs and returns the author IDs. Finally, the outermost query uses the author IDs to find the names of the authors. See Figure 8.37 for the result.

Listing 8.37List the names of authors who have participated in writing at least one biography. See Figure 8.37 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors
  WHERE au_id IN
    (SELECT au_id
       FROM title_authors
       WHERE title_id IN
         (SELECT title_id
            FROM titles
            WHERE type = 'biography'));

Figure 8.37Result of Listing 8.37.

au_id au_fname au_lname
----- -------- ---------
A02   Wendy    Heydemark
A04   Klee     Hull

Excessive subquery nesting makes a statement hard to read; often, it’s easier to restate the query as a join. Here’s the join version of Listing 8.37:

SELECT DISTINCT a.au_id, au_fname, au_lname
  FROM authors a
  INNER JOIN title_authors ta
    ON a.au_id = ta.au_id
  INNER JOIN titles t
    ON t.title_id = ta.title_id
  WHERE type = 'biography';

Listing 8.38 lists the names of all non-lead authors (au_order > 1) who live in California and who receive less than 50 percent of the royalties for a book. See Figure 8.38 for the result.

Listing 8.38List the names of all ancillary authors who live in California and who receive less than 50 percent of the royalties for a book. See Figure 8.38 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors
  WHERE state = 'CA'
    AND au_id IN
      (SELECT au_id
         FROM title_authors
         WHERE royalty_share < 0.5
           AND au_order > 1);

Figure 8.38Result of Listing 8.38.

au_id au_fname au_lname
----- -------- --------
A03   Hallie   Hull
A04   Klee     Hull

Here’s the join version of Listing 8.38:

SELECT DISTINCT a.au_id, au_fname, au_lname
  FROM authors a
  INNER JOIN title_authors ta
    ON a.au_id = ta.au_id
  WHERE state = 'CA'
    AND royalty_share < 0.5
    AND au_order > 1;

Listing 8.39 lists the names of authors who are coauthors of a book. To determine whether an author is a coauthor or the sole author of a book, examine his or her royalty share for the book. If the royalty share is less than 100 percent (1.0), then the author is a coauthor; otherwise, he or she is the sole author. See Figure 8.39 for the result.

Listing 8.39List the names of authors who are coauthors of a book. See Figure 8.39 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors a
  WHERE au_id IN
    (SELECT au_id
       FROM title_authors
       WHERE royalty_share < 1.0);

Figure 8.39Result of Listing 8.39.

au_id au_fname au_lname
----- -------- ---------
A02   Wendy    Heydemark
A03   Hallie   Hull
A04   Klee     Hull
A06            Kellsey

Listing 8.40 uses a correlated subquery to list the names of authors who are sole authors of a book—that is, authors who earn 100 percent (1.0) of the royalty on a book. See Figure 8.40 for the result. The DBMS considers each row in the outer-query table authors to be a candidate for inclusion in the result. When the DBMS examines the first candidate row in authors, it sets the correlation variable a.au_id equal to A01 (Sarah Buchman), which it substitutes into the inner query:

SELECT royalty_share
  FROM title_authors ta
  WHERE ta.au_id = 'A01';

The inner query returns 1.0, so the outer query evaluates to:

SELECT a.au_id, au_fname, au_lname
  FROM authors a
  WHERE 1.0 IN (1.0)

The WHERE condition is true, so author A01 is included in the result. The DBMS repeats this procedure for every author; see “Simple and Correlated Subqueries” earlier in this chapter.

Listing 8.40List the names of authors who are sole authors of a book. See Figure 8.40 for the result.

SELECT a.au_id, au_fname, au_lname
  FROM authors a
  WHERE 1.0 IN
    (SELECT royalty_share
       FROM title_authors ta
       WHERE ta.au_id = a.au_id);

Figure 8.40Result of Listing 8.40.

au_id au_fname  au_lname
----- --------- ---------
A01   Sarah     Buchman
A02   Wendy     Heydemark
A04   Klee      Hull
A05   Christian Kells
A06             Kellsey

Listing 8.41 lists the names of authors who are both coauthors and sole authors. The inner query returns the author IDs of sole authors, and the outer query compares these IDs with the IDs of the coauthors. See Figure 8.41 for the result.

Listing 8.41List the names of authors who are both coauthors and sole authors. See Figure 8.41 for the result.

SELECT DISTINCT a.au_id, au_fname, au_lname
  FROM authors a
  INNER JOIN title_authors ta
    ON a.au_id = ta.au_id
  WHERE ta.royalty_share < 1.0
    AND a.au_id IN
      (SELECT a.au_id
         FROM authors a
         INNER JOIN title_authors ta
           ON a.au_id = ta.au_id
         AND ta.royalty_share = 1.0);

Figure 8.41Result of Listing 8.41.

au_id au_fname au_lname
----- -------- ---------
A02   Wendy    Heydemark
A04   Klee     Hull
A06            Kellsey

You can rewrite Listing 8.41 as a join or as an intersection. Here’s the join version of Listing 8.41:

SELECT DISTINCT a.au_id, au_fname, au_lname
  FROM authors a
  INNER JOIN title_authors ta1
    ON a.au_id = ta1.au_id
  INNER JOIN title_authors ta2
    ON a.au_id = ta2.au_id
  WHERE ta1.royalty_share < 1.0
    AND ta2.royalty_share = 1.0;

Here’s the intersection version of Listing 8.41 (see “Finding Common Rows with INTERSECT” in Chapter 9):

SELECT DISTINCT a.au_id, au_fname, au_lname
  FROM authors a
  INNER JOIN title_authors ta
    ON a.au_id = ta.au_id
  WHERE ta.royalty_share < 1.0
INTERSECT
SELECT DISTINCT a.au_id, au_fname, au_lname
  FROM authors a
  INNER JOIN title_authors ta
    ON a.au_id = ta.au_id
  WHERE ta.royalty_share = 1.0;

Listing 8.42 uses a correlated subquery to list the types of books published by more than one publisher. See Figure 8.42 for the result.

Listing 8.42List the types of books common to more than one publisher. See Figure 8.42 for the result.

SELECT DISTINCT t1.type
  FROM titles t1
  WHERE t1.type IN
    (SELECT t2.type
       FROM titles t2
       WHERE t1.pub_id <> t2.pub_id);

Figure 8.42Result of Listing 8.42.

type
---------
biography
history

Here’s the self-join version of Listing 8.42:

SELECT DISTINCT t1.type
  FROM titles t1
  INNER JOIN titles t2
    ON t1.type = t2.type
    AND t1.pub_id <> t2.pub_id;

Tips for IN

Comparing All Subquery Values with ALL

You can use the ALL keyword to determine whether a value is less than or greater than all the values in a subquery result.

The important characteristics of subquery comparisons that use ALL are:

To compare all subquery values:

Listing 8.43 lists the authors who live in a city in which no publisher is located. The inner query finds all the cities in which publishers are located, and the outer query compares each author’s city to all the publishers’ cities. See Figure 8.43 for the result.

Listing 8.43List the authors who live in a city in which no publisher is located. See Figure 8.43 for the result.

SELECT au_id, au_lname, au_fname, city
  FROM authors
  WHERE city <> ALL
    (SELECT city
       FROM publishers);

Figure 8.43Result of Listing 8.43.

au_id au_lname    au_fname city
----- ----------- -------- ---------
A01   Buchman     Sarah    Bronx
A02   Heydemark   Wendy    Boulder
A06   Kellsey              Palo Alto
A07   O'Furniture Paddy    Sarasota

You can use NOT IN to replicate Listing 8.43:

SELECT au_id, au_lname, au_fname, city
  FROM authors
  WHERE city NOT IN
    (SELECT city FROM publishers);

Listing 8.44 lists the nonbiographies that are priced less than all the biographies. The inner query finds all the biography prices. The outer query inspects the lowest price in the list and determines whether each non-biography is cheaper. See Figure 8.44 for the result. The price IS NOT NULL condition is required because the price of biography T10 is null. Without this condition, the entire query would return zero rows because it’s impossible to determine whether a price is less than null (see “Nulls” in Chapter 3).

Listing 8.44List the nonbiographies that are cheaper than all the biographies. See Figure 8.44 for the result.

SELECT title_id, title_name
  FROM titles
  WHERE type <> 'biography'
    AND price < ALL
    (SELECT price
       FROM titles
       WHERE type = 'biography'
         AND price IS NOT NULL);

Figure 8.44Result of Listing 8.44.

title_id title_name
-------- --------------------------------
T05      Exchange of Platitudes
T08      Just Wait Until After School
T11      Perhaps It's a Glandular Problem

Listing 8.45 lists the books that outsold all the books that author A06 wrote (or cowrote). The inner query uses a join to find the sales of each book by author A06. The outer query inspects the highest sales figure in the list and determines whether each book sold more copies. See Figure 8.45 for the result. Again, the IS NOT NULL condition is needed in case sales is null for a book by author A06.

Listing 8.45List the books that outsold all the books that author A06 wrote (or cowrote). See Figure 8.45 for the result.

SELECT title_id, title_name
  FROM titles
  WHERE sales > ALL
    (SELECT sales
       FROM title_authors ta
       INNER JOIN titles t
         ON t.title_id = ta.title_id
       WHERE ta.au_id = 'A06'
         AND sales IS NOT NULL);

Figure 8.45Result of Listing 8.45.

title_id title_name
-------- -------------------------
T05      Exchange of Platitudes
T07      I Blame My Mother
T12      Spontaneous, Not Annoying

I can replicate Listing 8.45 by using GROUP BY, HAVING, and MAX() (instead of ALL):

SELECT title_id
  FROM titles
  GROUP BY title_id
  HAVING MAX(sales) >
    (SELECT MAX(sales)
       FROM title_authors ta
       INNER JOIN titles t
         ON t.title_id = ta.title_id
       WHERE ta.au_id = 'A06');

Listing 8.46 uses a correlated subquery in the HAVING clause of the outer query to list the types of books for which the highest sales figure is more than twice the average sales for that type. The inner query is evaluated once for each group defined in the outer query (once for each type of book). See Figure 8.46 for the result.

Listing 8.46List the types of books for which the highest sales figure is more than twice the average sales for that type. See Figure 8.46 for the result.

SELECT t1.type
  FROM titles t1
  GROUP BY t1.type
  HAVING MAX(t1.sales) >= ALL
    (SELECT 2.0 * AVG(t2.sales)
       FROM titles t2
       WHERE t1.type = t2.type);

Figure 8.46Result of Listing 8.46.

type
---------
biography

Tips for ALL

Comparing Some Subquery Values with ANY

ANY works like ALL (see the preceding section) but instead determines whether a value is equal to, less than, or greater than any (at least one) of the values in a subquery result.

The important characteristics of subquery comparisons that use ANY are:

To compare some subquery values:

Listing 8.47 lists the authors who live in a city in which a publisher is located. The inner query finds all the cities in which publishers are located, and the outer query compares each author’s city to all the publishers’ cities. See Figure 8.47 for the result.

Listing 8.47List the authors who live in a city in which a publisher is located. See Figure 8.47 for the result.

SELECT au_id, au_lname, au_fname, city
  FROM authors
  WHERE city = ANY
    (SELECT city
       FROM publishers);

Figure 8.47Result of Listing 8.47.

au_id au_lname au_fname  city
----- -------- --------- -------------
A03   Hull     Hallie    San Francisco
A04   Hull     Klee      San Francisco
A05   Kells    Christian New York

You can use IN to replicate Listing 8.47:

SELECT au_id, au_lname, au_fname, city
FROM authors
  WHERE city IN
    (SELECT city FROM publishers);

Listing 8.48 lists the nonbiographies that are priced less than at least one biography. The inner query finds all the biography prices. The outer query inspects the highest price in the list and determines whether each nonbiography is cheaper. See Figure 8.48 for the result.

Unlike the ALL comparison in Listing 8.44 in the preceding section, the price IS NOT NULL condition isn’t required here, even though the price of biography T10 is null. The DBMS doesn’t determine whether all the price comparisons are true—just whether at least one is true—so the null comparison is ignored.

Listing 8.48List the nonbiographies that are cheaper than at least one biography. See Figure 8.48 for the result.

SELECT title_id, title_name
  FROM titles
  WHERE type <> 'biography'
    AND price < ANY
      (SELECT price
         FROM titles
         WHERE type = 'biography');

Figure 8.48Result of Listing 8.48.

title_id title_name
-------- --------------------------------
T01      1977!
T02      200 Years of German Humor
T04      But I Did It Unconsciously
T05      Exchange of Platitudes
T08      Just Wait Until After School
T09      Kiss My Boo-Boo
T11      Perhaps It's a Glandular Problem

Listing 8.49 lists the books that outsold at least one of the books that author A06 wrote (or cowrote). The inner query uses a join to find the sales of each book by author A06. The outer query inspects the lowest sales figure in the list and determines whether each book sold more copies. See Figure 8.49 for the result. Again, unlike the ALL comparison in Listing 8.45 in the preceding section, the IS NOT NULL condition isn’t needed here.

Listing 8.49List the books that outsold at least one of the books that author A06 wrote (or cowrote). See Figure 8.49 for the result.

SELECT title_id, title_name
  FROM titles
  WHERE sales > ANY
    (SELECT sales
       FROM title_authors ta
       INNER JOIN titles t
         ON t.title_id = ta.title_id
       WHERE ta.au_id = 'A06');

Figure 8.49Result of Listing 8.49.

title_id title_name
-------- -----------------------------------
T02      200 Years of German Humor
T03      Ask Your System Administrator
T04      But I Did It Unconsciously
T05      Exchange of Platitudes
T06      How About Never?
T07      I Blame My Mother
T09      Kiss My Boo-Boo
T11      Perhaps It's a Glandular Problem
T12      Spontaneous, Not Annoying
T13      What Are The Civilian Applications?

I can replicate Listing 8.49 by using GROUP BY, HAVING, and MIN() (instead of ANY):

SELECT title_id
  FROM titles
  GROUP BY title_id
  HAVING MIN(sales) >
    (SELECT MIN(sales)
       FROM title_authors ta
       INNER JOIN titles t
         ON t.title_id = ta.title_id
       WHERE ta.au_id = 'A06');

Tips for ANY

Testing Existence with EXISTS

So far in this chapter, I’ve been using the comparison operators IN, ALL, and ANY to compare a specific test value to values in a subquery result. EXISTS and NOT EXISTS don’t compare values; rather, they simply look for the existence or nonexistence of rows in a subquery result.

The important characteristics of an existence test are:

To test existence:

Listing 8.50 lists the names of the publishers that have published biographies. This query considers each publisher’s ID in turn and determines whether it causes the existence test to evaluate to true. Here, the first publisher is P01 (Abatis Publishers). The DBMS ascertains whether any rows exist in the table titles in which pub_id is P01 and type is biography. If so, then Abatis Publishers is included in the final result. The DBMS repeats the same process for each of the other publisher IDs. See Figure 8.50 for the result. If I wanted to list the names of publishers that haven’t published biographies, then I’d change EXISTS to NOT EXISTS. See Listing 8.33 earlier in this chapter for an equivalent query that uses IN.

Listing 8.50List the names of the publishers that have published biographies. See Figure 8.50 for the result.

SELECT pub_name
  FROM publishers p
  WHERE EXISTS
    (SELECT *
       FROM titles t
       WHERE t.pub_id = p.pub_id
         AND type = 'biography');

Figure 8.50Result of Listing 8.50.

pub_name
-------------------
Abatis Publishers
Schadenfreude Press

Listing 8.51 lists the authors who haven’t written (or cowritten) a book. See Figure 8.51 for the result. See Listing 8.35 earlier in this chapter for an equivalent query that uses NOT IN.

Listing 8.51List the authors who haven’t written (or cowritten) a book. See Figure 8.51 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors a
  WHERE NOT EXISTS
    (SELECT *
       FROM title_authors ta
       WHERE ta.au_id = a.au_id);

Figure 8.51Result of Listing 8.51.

au_id au_fname  au_lname
----- --------- -----------
A07   Paddy     O'Furniture

Listing 8.52 lists the authors who live in a city in which a publisher is located. See Figure 8.52 for the result. See Listing 8.47 earlier in this chapter for an equivalent query that uses =ANY.

Listing 8.52List the authors who live in a city in which a publisher is located. See Figure 8.52 for the result.

SELECT au_id, au_lname, au_fname, city
  FROM authors a
  WHERE EXISTS
    (SELECT *
       FROM publishers p
       WHERE p.city = a.city);

Figure 8.52Result of Listing 8.52.

au_id au_lname au_fname  city
----- -------- --------- -------------
A03   Hull     Hallie    San Francisco
A04   Hull     Klee      San Francisco
A05   Kells    Christian New York

“Finding Common Rows with INTERSECT” in Chapter 9 describes how to use INTERSECT to retrieve the rows that two tables have in common. You also can use EXISTS to find an intersection. Listing 8.53 lists the cities in which both an author and publisher are located. See Figure 8.53 for the result. See Listing 9.8 in Chapter 9 for an equivalent query that uses INTERSECT.

You also can replicate this query with an inner join:

SELECT DISTINCT a.city
  FROM authors a
  INNER JOIN publishers p
    ON a.city = p.city;

Listing 8.53List the cities in which both an author and publisher are located. See Figure 8.53 for the result.

SELECT DISTINCT city
  FROM authors a
  WHERE EXISTS
    (SELECT *
       FROM publishers p
       WHERE p.city = a.city);

Figure 8.53Result of Listing 8.53.

city
-------------
New York
San Francisco

“Finding Different Rows with EXCEPT” in Chapter 9 describes how to use EXCEPT to retrieve the rows in one table that aren’t also in another table. You also can use NOT EXISTS to find a difference. Listing 8.54 lists the cities in which an author lives but a publisher isn’t located. See Figure 8.54 for the result. See Listing 9.9 in Chapter 9 for an equivalent query that uses EXCEPT.

You also can replicate this query with NOT IN:

SELECT DISTINCT city
  FROM authors
  WHERE city NOT IN
    (SELECT city
       FROM publishers);

Or with an outer join:

SELECT DISTINCT a.city
  FROM authors a
  LEFT OUTER JOIN publishers p
    ON a.city = p.city
  WHERE p.city IS NULL;

Listing 8.54List the cities in which an author lives but a publisher isn’t located. See Figure 8.54 for the result.

SELECT DISTINCT city
  FROM authors a
  WHERE NOT EXISTS
    (SELECT *
       FROM publishers p
       WHERE p.city = a.city);

Figure 8.54Result of Listing 8.54.

city
---------
Boulder
Bronx
Palo Alto
Sarasota

Listing 8.55 lists the authors who wrote (or cowrote) three or more books. See Figure 8.55 for the result.

Listing 8.55List the authors who wrote (or cowrote) three or more books. See Figure 8.55 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors a
  WHERE EXISTS
    (SELECT *
       FROM title_authors ta
       WHERE ta.au_id = a.au_id
       HAVING COUNT(*) >= 3);

Figure 8.55Result of Listing 8.55.

au_id au_fname au_lname
----- -------- ---------
A01   Sarah    Buchman
A02   Wendy    Heydemark
A04   Klee     Hull
A06            Kellsey

Listing 8.56 uses two existence tests to list the authors who wrote (or cowrote) both children’s and psychology books. See Figure 8.56 for the result.

Listing 8.56List the authors who wrote (or cowrote) a children’s book and also wrote (or cowrote) a psychology book. See Figure 8.56 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors a
  WHERE EXISTS
    (SELECT *
       FROM title_authors ta
       INNER JOIN titles t
         ON t.title_id = ta.title_id
       WHERE ta.au_id = a.au_id
         AND t.type = 'children')
  AND EXISTS
    (SELECT *
       FROM title_authors ta
       INNER JOIN titles t
         ON t.title_id = ta.title_id
       WHERE ta.au_id = a.au_id
         AND t.type = 'psychology');

Figure 8.56Result of Listing 8.56.

au_id au_fname au_lname
----- -------- --------
A06            Kellsey

Listing 8.57 performs a uniqueness test to determine whether duplicates occur in the column au_id in the table authors. The query prints Yes if duplicate values exist in the column au_id; otherwise, it returns an empty result. See Figure 8.57 for the result. au_id is the primary key of authors, so of course it contains no duplicates.

Listing 8.57Does the column au_id in the table authors contain duplicate values? See Figure 8.57 for the result.

SELECT DISTINCT 'Yes' AS "Duplicates?"
  WHERE EXISTS
    (SELECT *
       FROM authors
       GROUP BY au_id
       HAVING COUNT(*) > 1);

Figure 8.57Result of Listing 8.57.

Duplicates?
-----------

Listing 8.58 shows the same query for the table title_authors, which does contain duplicate au_id values. See Figure 8.58 for the result. You can add grouping columns to the GROUP BY clause to determine whether multiple-column duplicates exist.

Listing 8.58Does the column au_id in the table title_authors contain duplicate values? See Figure 8.58 for the result.

SELECT DISTINCT 'Yes' AS "Duplicates?"
  WHERE EXISTS
    (SELECT *
       FROM title_authors
       GROUP BY au_id
       HAVING COUNT(*) > 1);

Figure 8.58Result of Listing 8.58.

Duplicates?
-----------
Yes

Tips for EXISTS

Comparing Equivalent Queries

As you’ve seen in this chapter and the preceding one, you can express the same query in different ways (different syntax, same semantics). To expand on this point, I’ve written the same query six semantically equivalent ways. Each of the statements in Listing 8.60 lists the authors who have written (or cowritten) at least one book. See Figure 8.60 for the result.

Listing 8.60These six queries are equivalent semantically; they all list the authors who have written (or cowritten) at least one book. See Figure 8.60 for the result.

SELECT DISTINCT a.au_id
  FROM authors a
  INNER JOIN title_authors ta
    ON a.au_id = ta.au_id;

SELECT DISTINCT a.au_id
  FROM authors a, title_authors ta
  WHERE a.au_id = ta.au_id;

SELECT au_id
  FROM authors a
  WHERE au_id IN
    (SELECT au_id
       FROM title_authors);

SELECT au_id
  FROM authors a
  WHERE au_id = ANY
    (SELECT au_id
       FROM title_authors);

SELECT au_id
  FROM authors a
  WHERE EXISTS
    (SELECT *
       FROM title_authors ta
       WHERE a.au_id = ta.au_id);

SELECT au_id
  FROM authors a
  WHERE 0 <
    (SELECT COUNT(*)
       FROM title_authors ta
       WHERE a.au_id = ta.au_id);

Figure 8.60Each of the six statements in Listing 8.60 returns this result.

au_id
-----
A01
A02
A03
A04
A05
A06

The first two queries (inner joins) will run at the same speed as one another. Of the third through sixth queries (which use subqueries), the last one probably is the worst performer. The DBMS will stop processing the other subqueries as soon as it encounters a single matching value. But the subquery in the last statement has to count all the matching rows before it returns either true or false. Your DBMS’s optimizer should run the inner joins at about the same speed as the fastest subquery statement.

You might find this programming flexibility to be attractive, but people who design DBMS optimizers don’t, because they’re tasked with considering all the possible ways to express a query, figuring out which one performs best, and reformulating your query internally to its optimal form. (Entire careers are devoted to solving these types of optimization problems.) If your DBMS has a flawless optimizer, then it will run all six of the queries in Listing 8.60 at the same speed. But that situation is unlikely, so you’ll have to experiment with your DBMS to see which version runs fastest.

Tips for Comparing Equivalent Queries

SQL Tuning

After you learn the basics of SQL, your next step is to tune your SQL statements so that they run efficiently, which means learning about your DBMS’s optimizer. Performance tuning involves some platform-independent general principles, but the most effective tuning relies on the idiosyncrasies of the specific DBMS. Tuning is beyond the scope of this book, but the internet has plenty of discussion groups and articles—search for tuning (or performance or optimization) together with the name of your DBMS.

A good book to get started with is Peter Gulutzan and Trudy Pelzer’s SQL Performance Tuning, which covers eight DBMSs, or Dan Tow’s SQL Tuning, which covers Microsoft SQL Server, Oracle, and Db2. If you look up one of these books on Amazon.com, then you can find other tuning books in the “customers also viewed” list.