In this chapter
Simple and Correlated Subqueries
Qualifying Column Names in Subqueries
Using Subqueries as Column Expressions
Comparing a Subquery Value by Using a Comparison Operator
Testing Set Membership with IN
Comparing All Subquery Values with ALL
To this point, I’ve used a single SELECT statement to retrieve data from one or more tables. This chapter describes nested queries, which let you retrieve or modify data based on another query’s result.
A subquery, or subselect, is a SELECT statement embedded in another SQL statement. You can nest a subquery in:
In general, you can use a subquery anywhere an expression is allowed, but your DBMS might restrict where they can appear. This chapter covers subqueries nested in a SELECT statement or another subquery; Chapter 10 covers subqueries embedded in INSERT, UPDATE, and DELETE statements.
This section defines some terms and introduces subqueries by giving an example of a SELECT statement that contains a simple subquery. Subsequent sections explain the types of subqueries and their syntax and semantics.
Suppose that you want to list the names of the publishers of biographies. The naive approach is to write two queries: one query to retrieve the IDs of all the biography publishers (Listing 8.1 and Figure 8.1) and a second query that uses the first query’s result to list the publisher names (Listing 8.2 and Figure 8.2).
Listing 8.1List the biography publishers. See Figure 8.1 for the result.
SELECT pub_id
FROM titles
WHERE type = 'biography';
Figure 8.1Result of Listing 8.1. You can add DISTINCT to the SELECT clause of Listing 8.1 to list the publishers only once; see “Eliminating Duplicate Rows with DISTINCT” in Chapter 4.
pub_id
------
P01
P03
P01
P01
Listing 8.2This query uses the result of Listing 8.1 to list the names of the biography publishers. See Figure 8.2 for the result.
SELECT pub_name
FROM publishers
WHERE pub_id IN ('P01', 'P03');
Figure 8.2Result of Listing 8.2.
pub_name
-------------------
Abatis Publishers
Schadenfreude Press
A better way is to use an inner join (Listing 8.3 and Figure 8.3); see “Creating an Inner Join with INNER JOIN” in Chapter 7.
Listing 8.3List the names of the biography publishers by using an inner join. See Figure 8.3 for the result.
SELECT DISTINCT pub_name
FROM publishers p
INNER JOIN titles t
ON p.pub_id = t.pub_id
WHERE t.type = 'biography';
Figure 8.3Result of Listing 8.3.
pub_name
-------------------
Abatis Publishers
Schadenfreude Press
Another alternative is to use a subquery (Listing 8.4 and Figure 8.4). The subquery in Listing 8.4 is highlighted. A subquery also is called an inner query, and the statement containing a subquery is called an outer query. In other words, an enclosed subquery is an inner query of an outer query. Remember that a subquery can be nested in another subquery, so inner and outer are relative terms in statements with multiple nested subqueries.
Listing 8.4List the names of the biography publishers by using a subquery. See Figure 8.4 for the result.
SELECT pub_name
FROM publishers
WHERE pub_id IN
(SELECT pub_id
FROM titles
WHERE type = 'biography');
Figure 8.4Result of Listing 8.4.
pub_name
-------------------
Abatis Publishers
Schadenfreude Press
I’ll explain how a DBMS executes subqueries in “Simple and Correlated Subqueries” later in this chapter, but for now, all that you need to know is that in Listing 8.4, the DBMS processes the inner query (highlighted text) first and then uses its interim result to run the outer query (plain text) and get the final result. The IN keyword that introduces the subquery tests for list membership and works like IN in “List Filtering with IN” in Chapter 4. Note that the inner query in Listing 8.4 is the same query as Listing 8.1, and the outer query is the same query as Listing 8.2.
MySQL 4.1 and later support subqueries, but earlier versions don’t. You can’t run the examples in this chapter if you’re using MySQL 4.0 or earlier, but you have a few choices, in order of preference:
The syntax of a subquery is the same as that of a normal SELECT statement (see Chapters 4 through 7) except for the following differences:
Depending on the context in which it’s used, a subquery might be required to return a limited number of rows or columns. The SQL standard categorizes a subquery by the number of rows and columns it returns (Table 8.1). In all cases, the subquery also can return an empty table (zero rows).
Subquery | Rows | Columns |
---|---|---|
Scalar subquery | 1 | 1 |
Row subquery | 1 | ≥1 |
Table subquery | ≥1 | ≥1 |
In practice, a subquery usually appears in a WHERE clause that takes one of these forms:
WHERE test_expr op (subquery)
WHERE test_expr [NOT] IN (subquery)
WHERE test_expr op ALL (subquery)
WHERE test_expr op ANY (subquery)
WHERE [NOT] EXISTS (subquery)
test_expr is a literal value, a column name, an expression, or a scalar subquery; op is a comparison operator (=, <>, <, <=, >, or >=); and subquery is a simple or correlated subquery. I’ll cover each of these forms later in this chapter. You can use these subquery forms in a HAVING clause, too.
In “Understanding Subqueries” earlier in this chapter, Listings 8.3 and 8.4 showed two equivalent queries: one used a join, and the other used a subquery. Many subqueries can be formulated alternatively as joins. In fact, a subquery is a way to relate one table to another without actually doing a join.
Because subqueries can be hard to use and debug, you might prefer to use joins, but you can pose some questions only as subqueries. In cases where you can use subqueries and joins interchangeably, you should test queries on your DBMS to see whether a performance difference exists between a statement that uses a subquery and a semantically equivalent statement that uses a join. For example, the query
SELECT MAX(table1.col1)
FROM table1
WHERE table1.col1 IN
(SELECT table2.col1
FROM table2);
usually will run faster than
SELECT MAX(table1.col1)
FROM table1
INNER JOIN table2
ON table1.col1 = table2.col1;
For more information, see “Comparing Equivalent Queries” later in this chapter.
The following syntax diagrams show some equivalent statements that use subqueries and joins. These two statements are equivalent (IN subquery):
SELECT *
FROM table1
WHERE id IN
(SELECT id FROM table2);
and (inner join):
SELECT DISTINCT table1.*
FROM table1
INNER JOIN table2
ON table1.id = table2.id;
See Listings 8.5a and 8.5b and Figure 8.5 for an example.
Listing 8.5aThis statement uses a subquery to list the authors who live in the same city in which a publisher is located. See Figure 8.5 for the result.
SELECT au_id, city
FROM authors
WHERE city IN
(SELECT city FROM publishers);
Listing 8.5bThis statement is equivalent to Listing 8.5a but uses an inner join instead of a subquery. See Figure 8.5 for the result.
SELECT DISTINCT a.au_id, a.city
FROM authors a
INNER JOIN publishers p
ON a.city = p.city;
Figure 8.5Result of Listings 8.5a and 8.5b.
au_id city
----- -------------
A03 San Francisco
A04 San Francisco
A05 New York
These three statements are equivalent (NOT IN subquery):
SELECT *
FROM table1
WHERE id NOT IN
(SELECT id FROM table2);
and (NOT EXISTS subquery):
SELECT *
FROM table1
WHERE NOT EXISTS
(SELECT *
FROM table2
WHERE table1.id = table2.id);
and (left outer join):
SELECT table1.*
FROM table1
LEFT OUTER JOIN table2
ON table1.id = table2.id
WHERE table2.id IS NULL;
See Listings 8.6a, 8.6b, and 8.6c and Figure 8.6 for an example. IN and EXISTS subqueries are covered later in this chapter.
Listing 8.6aThis statement uses an IN subquery to list the authors who haven’t written (or cowritten) a book. See Figure 8.6 for the result.
SELECT au_id, au_fname, au_lname
FROM authors
WHERE au_id NOT IN
(SELECT au_id FROM title_authors);
Listing 8.6bThis statement is equivalent to Listing 8.6a but uses an EXISTS subquery instead of an IN subquery. See Figure 8.6 for the result.
SELECT au_id, au_fname, au_lname
FROM authors a
WHERE NOT EXISTS
(SELECT *
FROM title_authors ta
WHERE a.au_id = ta.au_id);
Listing 8.6cThis statement is equivalent to Listings 8.6a and 8.6b but uses a left outer join instead of a subquery. See Figure 8.6 for the result.
SELECT a.au_id, a.au_fname, a.au_lname
FROM authors a
LEFT OUTER JOIN title_authors ta
ON a.au_id = ta.au_id
WHERE ta.au_id IS NULL;
Figure 8.6Result of Listings 8.6a, 8.6b, and 8.6c.
au_id au_fname au_lname
----- -------- -----------
A07 Paddy O'Furniture
You also can write a self-join as a subquery (Listings 8.7a and 8.7b and Figure 8.7). For information about self-joins, see “Creating a Self-Join” in Chapter 7.
Listing 8.7aThis statement uses a subquery to list the authors who live in the same state as author A04 (Klee Hull). See Figure 8.7 for the result.
SELECT au_id, au_fname, au_lname, state
FROM authors
WHERE state IN
(SELECT state
FROM authors
WHERE au_id = 'A04');
Listing 8.7bThis statement is equivalent to Listing 8.7a but uses an inner join instead of a subquery. See Figure 8.7 for the result.
SELECT a1.au_id, a1.au_fname,
a1.au_lname, a1.state
FROM authors a1
INNER JOIN authors a2
ON a1.state = a2.state
WHERE a2.au_id = 'A04';
Figure 8.7Result of Listings 8.7a and 8.7b.
au_id au_fname au_lname state
----- -------- -------- -----
A03 Hallie Hull CA
A04 Klee Hull CA
A06 Kellsey CA
Favor subqueries if you’re comparing an aggregate value to other values (Listing 8.8 and Figure 8.8). Without a subquery, you’d need two SELECT statements to list all the books with the highest price: one query to find the highest price and a second query to list all the books selling for that price. For information about aggregate functions, see Chapter 6.
Listing 8.8List all books whose price equals the highest book price. See Figure 8.8 for the result.
SELECT title_id, price
FROM titles
WHERE price =
(SELECT MAX(price)
FROM titles);
Figure 8.8Result of Listing 8.8.
title_id price
-------- -----
T03 39.95
Use joins when you include columns from multiple tables in the result. Listing 8.5b uses a join to retrieve authors who live in the same city in which a publisher is located. To include the publisher ID in the result, simply add the column pub_id to the SELECT-clause list (Listing 8.9 and Figure 8.9).
You can’t accomplish this same task with a subquery because it’s illegal to include a column in the outer query’s SELECT-clause list from a table that appears in only the inner query:
SELECT a.au_id, a.city, p.pub_id
FROM authors a
WHERE a.city IN
(SELECT p.city
FROM publishers p); --Illegal
Listing 8.9List the authors who live in the same city in which a publisher is located, and include the publisher in the result. See Figure 8.9 for the result.
SELECT a.au_id, a.city, p.pub_id
FROM authors a
INNER JOIN publishers p
ON a.city = p.city;
Figure 8.9Result of Listing 8.9.
au_id city pub_id
----- ------------- ------
A03 San Francisco P02
A04 San Francisco P02
A05 New York P01
You can use two types of subqueries:
A simple subquery, or noncorrelated subquery, is a subquery that can be evaluated independently of its outer query and is processed only once for the entire statement. All the subqueries in this chapter’s examples so far have been simple subqueries (except Listing 8.6b).
A correlated subquery can’t be evaluated independently of its outer query; it’s an inner query that depends on data from the outer query. A correlated subquery is used if a statement needs to process a table in the inner query for each row in the outer query.
Correlated subqueries have more-complicated syntax and a knottier execution sequence than simple subqueries, but you can use them to solve problems that you can’t solve with simple subqueries or joins. This section gives an example of a simple subquery and a correlated subquery and then describes how a DBMS executes each one. Subsequent sections in this chapter contain more examples of each type of subquery.
A DBMS evaluates a simple subquery by evaluating the inner query once and substituting its result into the outer query. A simple subquery executes prior to, and independent of, its outer query.
Let’s revisit Listing 8.5a from earlier in this chapter. Listing 8.10 (which is identical to Listing 8.5a) uses a simple subquery to list the authors who live in the same city in which a publisher is located; see Figure 8.10 for the result. Conceptually, a DBMS processes this query in two steps as two separate SELECT statements:
Listing 8.10List the authors who live in the same city in which a publisher is located. See Figure 8.10 for the result.
SELECT au_id, city
FROM authors
WHERE city IN
(SELECT city
FROM publishers);
Figure 8.10Result of Listing 8.10.
au_id city
----- -------------
A03 San Francisco
A04 San Francisco
A05 New York
Listing 8.11List the cities in which the publishers are located. See Figure 8.11 for the result.
SELECT city
FROM publishers;
Figure 8.11Result of Listing 8.11.
city
-------------
New York
San Francisco
Hamburg
Berkeley
Listing 8.12List the authors who live in one of the cities returned by Listing 8.11. See Figure 8.12 for the result.
SELECT au_id, city
FROM authors
WHERE city IN
('New York', 'San Francisco',
'Hamburg', 'Berkeley');
Figure 8.12Result of Listing 8.12.
au_id city
----- -------------
A03 San Francisco
A04 San Francisco
A05 New York
Correlated subqueries offer a more powerful data-retrieval mechanism than simple subqueries do. A correlated subquery’s important characteristics are:
The basic syntax of a query that contains a correlated subquery is:
SELECT outer_columns
FROM outer_table
WHERE outer_column_value IN
(SELECT inner_column
FROM inner_table
WHERE inner_column = outer_column)
Execution always starts with the outer query (plain text). The outer query selects each individual row of outer_table as a candidate row. For each candidate row, the DBMS executes the correlated inner query (highlighted text) once and flags the inner_table rows that satisfy the inner WHERE condition for the value outer_column_value. The DBMS tests the outer WHERE condition against the flagged inner_table rows and displays the flagged rows that satisfy this condition. This process continues until all the candidate rows have been processed.
Listing 8.13 uses a correlated subquery to list the books that have sales better than the average sales of books of its type; see Figure 8.13 for the result. candidate (following titles in the outer query) and average (following titles in the inner query) are alias table names for the table titles, so that the information can be evaluated as though it comes from two different tables (see “Creating a Self-Join” in Chapter 7).
Listing 8.13List the books that have sales greater than or equal to the average sales of books of its type. The correlation variable candidate.type defines the initial condition to be met by the rows of the inner table average. The outer WHERE condition (sales >=) defines the final test that the rows of the inner table average must satisfy. See Figure 8.13 for the result.
SELECT
candidate.title_id,
candidate.type,
candidate.sales
FROM titles candidate
WHERE sales >=
(SELECT AVG(sales)
FROM titles average
WHERE average.type = candidate.type);
Figure 8.13Result of Listing 8.13.
title_id type sales
-------- ---------- -------
T02 history 9566
T03 computer 25667
T05 psychology 201440
T07 biography 1500200
T09 children 5000
T13 history 10467
In Listing 8.13, the subquery can’t be resolved independently of the outer query. It needs a value for candidate.type, but this value is a correlation variable that changes as the DBMS examines different rows in the table candidate. The column average.type is said to correlate with candidate.type in the outer query. The average sales for a book type are calculated in the subquery by using the type of each book from the table in the outer query (candidate). The subquery computes the average sales for this type and then compares it with a row in the table candidate. If the sales in the table candidate are greater than or equal to average sales for the type, then that book is displayed in the result. A DBMS processes this query as follows:
The book type in the first row of candidate is used in the subquery to compute average sales.
Take the row for book T01, whose type is history, so the value in the column type in the first row of the table candidate is history. In effect, the subquery becomes:
SELECT AVG(sales)
FROM titles average
WHERE average.type = 'history';
This pass through the subquery yields a value of 6866—the average sales of history books. In the outer query, book T01’s sales of 566 are compared to the average sales of history books. T01’s sales are lower than average, so T01 isn’t displayed in the result.
Next, book T02’s row in candidate is evaluated.
T02 also is a history book, so the evaluated subquery is the same as in step 1:
SELECT AVG(sales)
FROM titles average
WHERE average.type = 'history';
This pass through the subquery again yields 6866 for the average sales of history books. Book T02’s sales of 9566 are higher than average, so T02 is displayed in the result.
Next, book T03’s row in candidate is evaluated.
T03 is a computer book, so this time, the evaluated subquery is:
SELECT AVG(sales)
FROM titles average
WHERE average.type = 'computer';
The result of this pass through the subquery is average sales of 25667 for computer books. Because book T03’s sales of 25667 equals the average (it’s the only computer book), T03 is displayed in the result.
If you can get the same result by using a simple subquery or a correlated subquery, then use the simple subquery because it probably will run faster. Listings 8.14a and 8.14b show two equivalent queries that list all authors who earn 100 percent (1.0) of the royalty share on a book. Listing 8.14a, which uses a simple subquery, is more efficient than Listing 8.14b, which uses a correlated subquery. In the simple subquery, the DBMS reads the inner table title_authors once. In the correlated subquery, the DBMS must loop through title_authors five times—once for each qualifying row in the outer table authors. See Figure 8.14 for the result.
Why do I say that a statement that uses a simple subquery probably will run faster than an equivalent statement that uses a correlated subquery when a correlated subquery clearly requires more work? Because your DBMS’s optimizer might be clever enough to recognize and reformulate a correlated subquery as a semantically equivalent simple subquery internally before executing the statement. For more information, see “Comparing Equivalent Queries” later in this chapter.
Listing 8.14bThis statement uses a simple subquery to list all authors who earn 100 percent (1.0) royalty on a book. See Figure 8.14 for the result.
SELECT au_id, au_fname, au_lname
FROM authors
WHERE au_id IN
(SELECT au_id
FROM title_authors
WHERE royalty_share = 1.0);
Listing 8.14bThis statement is equivalent to Listing 8.14a but uses a correlated subquery instead of a simple subquery. This query probably will run slower than Listing 8.14a. See Figure 8.14 for the result.
SELECT au_id, au_fname, au_lname
FROM authors
WHERE 1.0 IN
(SELECT royalty_share
FROM title_authors
WHERE title_authors.au_id =
authors.au_id);
Figure 8.14Result of Listings 8.14a and 8.14b.
au_id au_fname au_lname
----- --------- ---------
A01 Sarah Buchman
A02 Wendy Heydemark
A04 Klee Hull
A05 Christian Kells
A06 Kellsey
MySQL 4.0 and earlier don’t support subqueries; see the DBMS tip in “Tips for Subqueries” earlier in this chapter.
In older PostgreSQL versions, convert the floating-point numbers in Listings 8.14a and 8.14b to DECIMAL; see “Converting Data Types with CAST()” in Chapter 5. To run Listings 8.14a and 8.14b, change the floating-point literal in each listing to:
CAST(1.0 AS DECIMAL)
Recall from “Qualifying Column Names” in Chapter 7 that you can qualify a column name explicitly with a table name to identify the column unambiguously. In statements that contain subqueries, column names are qualified implicitly by the table referenced in the FROM clause at the same nesting level.
In Listing 8.15a, which lists the names of biography publishers, the column names are qualified implicitly, meaning:
Listing 8.15b shows Listing 8.15a with explicit qualifiers. See Figure 8.15 for the result.
Listing 8.15aThe tables publishers and titles both contain a column named pub_id, but you don’t have to qualify pub_id in this query because of the implicit assumptions about table names that SQL makes. See Figure 8.15 for the result.
SELECT pub_name
FROM publishers
WHERE pub_id IN
(SELECT pub_id
FROM titles
WHERE type = 'biography');
Listing 8.15bThis query is equivalent to Listing 8.15a, but with explicit qualification of pub_id. See Figure 8.15 for the result.
SELECT pub_name
FROM publishers
WHERE publishers.pub_id IN
(SELECT titles.pub_id
FROM titles
WHERE type = 'biography');
Figure 8.15Result of Listings 8.15a and 8.15b.
pub_name
-------------------
Abatis Publishers
Schadenfreude Press
Beware of nulls; their presence complicates subqueries. If you don’t eliminate them when they’re present, then you might get an unexpected answer.
A subquery can hide a comparison to a null. Recall from “Nulls” in Chapter 3 that nulls don’t equal each other and that you can’t determine whether a null matches any other value. The following example involves a NOT IN subquery (see “Testing Set Membership with IN” later in this chapter). Consider the following two tables, each with one column. The first table is named table1:
col
----
1
2
The second table is named table2:
col
----
1
2
3
If I run Listing 8.16 to list the values in table2 that aren’t in table1, then I get Figure 8.16a, as expected.
Listing 8.16List the values in table2 that aren’t in table1. See Figures 8.16a and 8.16b for the result.
SELECT col
FROM table2
WHERE col NOT IN
(SELECT col
FROM table1);
Figure 8.16aResult of Listing 8.16 when table1 doesn’t contain a null.
col
----
3
Now add a null to table1:
col
----
1
2
NULL
If I rerun Listing 8.16, then I get Figure 8.16b (an empty table), which is correct logically but not what I expected.
Figure 8.16bResult of Listing 8.16 when table1 contains a null. This result is an empty table, which is correct logically but not what I expected.
col
----
Why is the result empty this time? The solution requires some algebra. I can move the NOT outside the subquery condition without changing the meaning of Listing 8.16:
SELECT col
FROM table2
WHERE NOT col IN
(SELECT col FROM table1);
The IN clause determines whether a value in table2 matches any value in table1, so I can rewrite the subquery as a compound condition:
SELECT col
FROM table2
WHERE NOT ((col = 1)
OR (col = 2)
OR (col = NULL));
If I apply De Morgan’s Laws (refer to Table 4.6 in Chapter 4), then this query becomes:
SELECT col
FROM table2
WHERE (col <> 1)
AND (col <> 2)
AND (col <> NULL);
The final expression, col <> NULL, always is unknown. Refer to the AND truth table (Table 4.3 in Chapter 4), and you’ll see that the entire WHERE search condition reduces to unknown, which always is rejected by WHERE.
To fix Listing 8.16 so that it doesn’t examine the null in table1, add an IS NOT NULL condition to the subquery (see “Testing for Nulls with IS NULL” in Chapter 4):
SELECT col
FROM table2
WHERE col NOT IN
(SELECT col
FROM table1
WHERE col IS NOT NULL);
In Chapters 4, 5, and 6, you learned that the items in a SELECT-clause list can be literals, column names, or more-complex expressions. SQL also lets you to embed a subquery in a SELECT-clause list.
A subquery that’s used as a column expression must be a scalar subquery. Recall from Table 8.1 in “Subquery Syntax” earlier in this chapter that a scalar subquery returns a single value (that is, a one-row, one-column result). In most cases, you’ll have to use an aggregate function or restrictive WHERE conditions in the subquery to guarantee that the subquery returns only one row.
The syntax for the SELECT-clause list is the same as you’ve been using all along, except that you can specify a parenthesized subquery as one of the column expressions in the list, as the following examples show.
Listing 8.17 uses two simple subqueries as column expressions to list each biography, its price, the average price of all books (not just biographies), and the difference between the price of the biography and the average price of all books. The aggregate function AVG() guarantees that each subquery returns a single value. See Figure 8.17 for the result. Remember that AVG() ignores nulls when computing an average; see “Calculating an Average with AVG()” in Chapter 6.
Listing 8.17List each biography, its price, the average price of all books, and the difference between the price of the biography and the average price of all books. See Figure 8.17 for the result.
SELECT title_id, price,
(SELECT AVG(price) FROM titles)
AS "AVG(price)",
price - (SELECT AVG(price) FROM titles)
AS "Difference"
FROM titles
WHERE type='biography';
Figure 8.17Result of Listing 8.17.
title_id price AVG(price) Difference
-------- ------- ---------- ----------
T06 19.95 18.3875 1.5625
T07 23.95 18.3875 5.5625
T10 NULL 18.3875 NULL
T12 12.99 18.3875 -5.3975
Listing 8.18 uses correlated subqueries to list all the authors of each book in one row, as you’d view them in a report or spreadsheet. See Figure 8.18 for the result. Note that in each WHERE clause, SQL qualifies title_id implicitly with the table alias ta referenced in the subquery’s FROM clause; see “Qualifying Column Names in Subqueries” earlier in this chapter. For a more efficient way to implement this query, see “Tips for Column Expressions” later in this section. See Listing 15.8 in Chapter 15 for the reverse of this query.
Listing 8.18List all the authors of each book in one row. See Figure 8.18 for the result.
SELECT title_id,
(SELECT au_id
FROM title_authors ta
WHERE au_order = 1
AND title_id = t.title_id)
AS "Author 1",
(SELECT au_id
FROM title_authors ta
WHERE au_order = 2
AND title_id = t.title_id)
AS "Author 2",
(SELECT au_id
FROM title_authors ta
WHERE au_order = 3
AND title_id = t.title_id)
AS "Author 3"
FROM titles t;
Figure 8.18Result of Listing 8.18.
title_id Author 1 Author 2 Author 3
-------- -------- -------- --------
T01 A01 NULL NULL
T02 A01 NULL NULL
T03 A05 NULL NULL
T04 A03 A04 NULL
T05 A04 NULL NULL
T06 A02 NULL NULL
T07 A02 A04 NULL
T08 A06 NULL NULL
T09 A06 NULL NULL
T10 A02 NULL NULL
T11 A06 A03 A04
T12 A02 NULL NULL
T13 A01 NULL NULL
In Listing 8.19, I revisit Listing 7.30 in “Creating Outer Joins with OUTER JOIN” in Chapter 7, but this time, I’m using a correlated subquery instead of an outer join to list the number of books that each author wrote (or cowrote). See Figure 8.19 for the result.
Listing 8.19List the number of books that each author wrote (or cowrote), including authors who have written no books. See Figure 8.19 for the result.
SELECT au_id,
(SELECT COUNT(*)
FROM title_authors ta
WHERE ta.au_id = a.au_id)
AS "Num books"
FROM authors a
ORDER BY au_id;
Figure 8.19Result of Listing 8.19.
au_id Num books
----- ---------
A01 3
A02 4
A03 2
A04 4
A05 1
A06 3
A07 0
Listing 8.20 uses a correlated subquery to list each author and the latest date on which he or she published a book. You should qualify every column name explicitly in a subquery that contains a join to make it clear which table is referenced (even when qualifiers are unnecessary). See Figure 8.20 for the result.
Listing 8.20List each author and the latest date on which he or she published a book. See Figure 8.20 for the result.
SELECT au_id,
(SELECT MAX(pubdate)
FROM titles t
INNER JOIN title_authors ta
ON ta.title_id = t.title_id
WHERE ta.au_id = a.au_id)
AS "Latest pub date"
FROM authors a;
Figure 8.20Result of Listing 8.20.
au_id Latest pub date
----- ---------------
A01 2000-08-01
A02 2000-08-31
A03 2000-11-30
A04 2001-01-01
A05 2000-09-01
A06 2002-05-31
A07 NULL
Listing 8.21 uses a correlated subquery to compute the running total of all book sales. A running total, or running sum, is a common calculation: For each book, I want to compute the sum of all sales of the books that precede the book. Here, I’m defining precede to mean those books whose title_id comes before the current book’s title_id alphabetically. Note the use of table aliases to refer to the same table in two contexts. The subquery returns the sum of sales for all books preceding the current book, which is denoted by t1.title_id. See Figure 8.21 for the result. See also “Calculating Running Statistics” in Chapter 15.
Listing 8.21Compute the running sum of all book sales. See Figure 8.21 for the result.
SELECT t1.title_id, t1.sales,
(SELECT SUM(t2.sales)
FROM titles t2
WHERE t2.title_id <= t1.title_id)
AS "Running total"
FROM titles t1;
Figure 8.21Result of Listing 8.21.
title_id sales Running total
-------- ------- -------------
T01 566 566
T02 9566 10132
T03 25667 35799
T04 13001 48800
T05 201440 250240
T06 11320 261560
T07 1500200 1761760
T08 4095 1765855
T09 5000 1770855
T10 NULL 1770855
T11 94123 1864978
T12 100001 1964979
T13 10467 1975446
You also can use a subquery in a FROM clause. In “Tips for DISTINCT” in Chapter 6, I used a FROM subquery to replicate a distinct aggregate function. Listing 8.22 uses a FROM subquery to calculate the greatest number of titles written (or cowritten) by any author. See Figure 8.22 for the result. Note that the outer query uses a table alias (ta) and column alias (count_titles) to reference the inner query’s result. See also “Column aliases and WHERE” in Chapter 4.
Listing 8.22Calculate the greatest number of titles written (or cowritten) by any author. See Figure 8.22 for the result.
SELECT MAX(ta.count_titles) AS "Max titles"
FROM (SELECT COUNT(*) AS count_titles
FROM title_authors
GROUP BY au_id) ta;
Figure 8.22Result of Listing 8.22.
Max titles
----------
4
Use CASE expressions instead of correlated subqueries to implement Listing 8.18 more efficiently (see “Evaluating Conditional Values with CASE” in Chapter 5):
SELECT title_id,
MIN(CASE au_order WHEN 1
THEN au_id END)
AS "Author 1",
MIN(CASE au_order WHEN 2
THEN au_id END)
AS "Author 2",
MIN(CASE au_order WHEN 3
THEN au_id END)
AS "Author 3"
FROM title_authors
GROUP BY title_id
ORDER BY title_id ASC;
MySQL 4.0 and earlier don’t support subqueries; see the DBMS tip in “Tips for Subqueries” earlier in this chapter.
In Microsoft Access, you must increase the precision of the average-price calculation in Listing 8.17. Use the type-conversion function CDbl() to coerce the average price to a double-precision floating-point number; see the DBMS tip in “Tips for Converting Data Types” in Chapter 5. To run Listing 8.17, change both occurrences of AVG(price)
to CDbl(AVG(price))
.
You can use a subquery as a filter in a WHERE clause or HAVING clause by using one of the comparison operators (=, <>, <, <=, >, or >=).
The important characteristics of a subquery comparison test are:
The hard part of writing these statements is getting the subquery to return one value, which you can guarantee several ways:
To compare a subquery value:
In the WHERE clause of a SELECT statement, type:
WHERE test_expr op (subquery)
test_expr is a literal value, a column name, an expression, or a subquery that returns a single value; op is a comparison operator (=, <>, <, <=, >, or >=); and subquery is a scalar subquery that returns exactly one column and zero or one rows.
If the value returned by subquery satisfies the comparison to test_expr, then the comparison condition evaluates to true. The comparison condition is false if the subquery value doesn’t satisfy the condition, the subquery value is null, or the subquery result is empty (has zero rows).
The same syntax applies to a HAVING clause:
HAVING test_expr op (subquery)
Listing 8.23 tests the result of a simple subquery for equality to list the authors who live in the state in which Tenterhooks Press is located. Only one publisher is named Tenterhooks Press, so the inner WHERE condition guarantees that the inner query returns a single-valued result. See Figure 8.23 for the result.
Listing 8.23List the authors who live in the state in which the publisher Tenterhooks Press is located. See Figure 8.23 for the result.
SELECT au_id, au_fname, au_lname, state
FROM authors
WHERE state =
(SELECT state
FROM publishers
WHERE pub_name = 'Tenterhooks Press');
Figure 8.23Result of Listing 8.23.
au_id au_fname au_lname state
----- -------- -------- -----
A03 Hallie Hull CA
A04 Klee Hull CA
A06 Kellsey CA
Listing 8.24 is the same as Listing 8.23 except for the name of the publisher. No publisher named XXX exists, so the subquery returns an empty table (zero rows). The comparison evaluates to null, so the final result is empty. See Figure 8.24 for the result.
Listing 8.24List the authors who live in the state in which the publisher XXX is located. See Figure 8.24 for the result.
SELECT au_id, au_fname, au_lname, state
FROM authors
WHERE state =
(SELECT state
FROM publishers
WHERE pub_name = 'XXX');
Figure 8.24Result of Listing 8.24 (an empty table).
au_id au_fname au_lname state
----- -------- -------- -----
Listing 8.25 lists the books with above-average sales. Subqueries introduced with comparison operators often use aggregate functions to return a single value. See Figure 8.25 for the result.
Listing 8.25List the books with above-average sales. See Figure 8.25 for the result.
SELECT title_id, sales
FROM titles
WHERE sales >
(SELECT AVG(sales)
FROM titles);
Figure 8.25Result of Listing 8.25.
title_id sales
-------- -------
T05 201440
T07 1500200
To list the authors of the books with above-average sales, I’ve added an inner join to Listing 8.25 (Listing 8.26 and Figure 8.26).
Listing 8.26List the authors of the books with above-average sales by using a join and a subquery. See Figure 8.26 for the result.
SELECT ta.au_id, ta.title_id
FROM titles t
INNER JOIN title_authors ta
ON ta.title_id = t.title_id
WHERE sales >
(SELECT AVG(sales)
FROM titles)
ORDER BY ta.au_id ASC, ta.title_id ASC;
Figure 8.26Result of Listing 8.26.
au_id title_id
----- --------
A02 T07
A04 T05
A04 T07
Recall from the introduction to this chapter that you can use a subquery almost anywhere an expression is allowed, so this syntax is valid:
WHERE (subquery) op (subquery)
The left subquery must return a single value. Listing 8.27 is equivalent to Listing 8.26, but I’ve removed the inner join and instead placed a correlated subquery to the left of the comparison operator. See Figure 8.27 for the result.
Listing 8.27List the authors of the books with above-average sales by using two subqueries. See Figure 8.27 for the result.
SELECT au_id, title_id
FROM title_authors ta
WHERE
(SELECT AVG(sales)
FROM titles t
WHERE ta.title_id = t.title_id)
>
(SELECT AVG(sales)
FROM titles)
ORDER BY au_id ASC, title_id ASC;
Figure 8.27Result of Listing 8.27.
au_id title_id
----- --------
A02 T07
A04 T05
A04 T07
You can include GROUP BY or HAVING clauses in a subquery if you know that the GROUP BY or HAVING clause itself returns a single value. Listing 8.28 lists the books priced higher than the highest-priced biography. See Figure 8.28 for the result.
Listing 8.28List the books priced higher than the highest-priced biography. See Figure 8.28 for the result.
SELECT title_id, price
FROM titles
WHERE price >
(SELECT MAX(price)
FROM titles
GROUP BY type
HAVING type = 'biography');
Figure 8.28Result of Listing 8.28.
title_id price
-------- -----
T03 39.95
T13 29.99
Listing 8.29 uses a subquery in a HAVING clause to list the publishers whose average sales exceed overall average sales. Again, the subquery returns a single value (the average of all sales). See Figure 8.29 for the result.
Listing 8.29List the publishers whose average sales exceed the overall average sales. See Figure 8.29 for the result.
SELECT pub_id, AVG(sales) AS "AVG(sales)"
FROM titles
GROUP BY pub_id
HAVING AVG(sales) >
(SELECT AVG(sales)
FROM titles);
Figure 8.29Result of Listing 8.29.
pub_id AVG(sales)
------ ----------
P03 506744.33
Listing 8.30 uses a correlated subquery to list authors whose royalty share is less than the highest royalty share of any coauthor of a book. The outer query selects the rows of title_authors (that is, of ta1) one by one. The subquery calculates the highest royalty share for each book being considered for selection in the outer query. For each possible value of ta1, the DBMS evaluates the subquery and puts the row being considered in the result if the royalty share is less than the calculated maximum. See Figure 8.30 for the result.
Listing 8.30List authors whose royalty share is less than the highest royalty share of any coauthor of a book. See Figure 8.30 for the result.
SELECT ta1.au_id, ta1.title_id,
ta1.royalty_share
FROM title_authors ta1
WHERE ta1.royalty_share <
(SELECT MAX(ta2.royalty_share)
FROM title_authors ta2
WHERE ta1.title_id = ta2.title_id);
Figure 8.30Result of Listing 8.30.
au_id title_id royalty_share
----- -------- -------------
A04 T04 0.40
A03 T11 0.30
A04 T11 0.30
Listing 8.31 uses a correlated subquery to imitate a GROUP BY clause and list all books that have a price greater than the average for books of its type. For each possible value of t1, the DBMS evaluates the subquery and includes the row in the result if the price value in that row exceeds the calculated average. It’s unnecessary to group by type explicitly because the rows for which the average price is calculated are restricted by the subquery’s WHERE clause. See Figure 8.31 for the result.
Listing 8.31List all books that have a price greater than the average for books of its type. See Figure 8.31 for the result.
SELECT type, title_id, price
FROM titles t1
WHERE price >
(SELECT AVG(t2.price)
FROM titles t2
WHERE t1.type = t2.type)
ORDER BY type ASC, title_id ASC;
Figure 8.31Result of Listing 8.31.
type title_id price
---------- -------- -----
biography T06 19.95
biography T07 23.95
children T09 13.95
history T13 29.99
psychology T04 12.99
Listing 8.32 uses the same structure as Listing 8.31 to list all the books whose sales are less than the best-selling books of their types. See Figure 8.32 for the result.
Listing 8.32List all the books whose sales are less than the best-selling books of their types. See Figure 8.32 for the result.
SELECT type, title_id, sales
FROM titles t1
WHERE sales <
(SELECT MAX(sales)
FROM titles t2
WHERE t1.type = t2.type
AND sales IS NOT NULL)
ORDER BY type ASC, title_id ASC;
Figure 8.32Result of Listing 8.32.
type title_id sales
---------- -------- ------
biography T06 11320
biography T12 100001
children T08 4095
history T01 566
history T02 9566
psychology T04 13001
psychology T11 94123
“List Filtering with IN” in Chapter 4 describes how to use the IN keyword in a WHERE clause to compare a literal, column value, or more-complex expression to a list of values. You also can use a subquery to generate the list.
The important characteristics of a subquery set membership test are:
To test set membership:
In the WHERE clause of a SELECT statement, type:
WHERE test_expr [NOT] IN (subquery)
test_expr is a literal value, a column name, an expression, or a subquery that returns a single value; and subquery is a subquery that returns one column and zero or more rows.
If the value of test_expr equals any value returned by subquery, then the IN condition evaluates to true. The IN condition is false if the subquery result is empty, if no row in the subquery result matches test_expr, or if all the values in the subquery result are null. Specify NOT to negate the condition’s result.
The same syntax applies to a HAVING clause:
HAVING test_expr [NOT] IN (subquery)
Listing 8.33 lists the names of the publishers that have published biographies. The DBMS evaluates this statement in two steps. First, the inner query returns the IDs of the publishers that have published biographies (P01 and P03). Second, the DBMS substitutes these values into the outer query, which finds the names that go with the IDs in the table publishers. See Figure 8.33 for the result.
Listing 8.33List the names of the publishers that have published biographies. See Figure 8.33 for the result.
SELECT pub_name
FROM publishers
WHERE pub_id IN
(SELECT pub_id
FROM titles
WHERE type = 'biography');
Figure 8.33Result of Listing 8.33.
pub_name
-------------------
Abatis Publishers
Schadenfreude Press
Here’s the join version of Listing 8.33:
SELECT DISTINCT pub_name
FROM publishers p
INNER JOIN titles t
ON p.pub_id = t.pub_id
AND type = 'biography';
Listing 8.34 is the same as Listing 8.33, except that it uses NOT IN to list the names of the publishers that haven’t published biographies. See Figure 8.34 for the result. This statement can’t be converted to a join. The analogous not-equal join has a different meaning: It lists the names of publishers that have published some book that isn’t a biography.
Listing 8.34List the names of the publishers that haven’t published biographies. See Figure 8.34 for the result.
SELECT pub_name
FROM publishers
WHERE pub_id NOT IN
(SELECT pub_id
FROM titles
WHERE type = 'biography');
Figure 8.34Result of Listing 8.34.
pub_name
-----------------
Core Dump Books
Tenterhooks Press
Listing 8.35 is equivalent to Listing 7.31 in Chapter 7, except that it uses a subquery instead of an outer join to list the authors who haven’t written (or cowritten) a book. See Figure 8.35 for the result.
Listing 8.35List the authors who haven’t written (or cowritten) a book. See Figure 8.35 for the result.
SELECT au_id, au_fname, au_lname
FROM authors
WHERE au_id NOT IN
(SELECT au_id
FROM title_authors);
Figure 8.35Result of Listing 8.35.
au_id au_fname au_lname
----- -------- -----------
A07 Paddy O'Furniture
Listing 8.36 lists the names of the authors who have published a book with publisher P03 (Schadenfreude Press). The join to the table authors is necessary to include the authors’ names (not just their IDs) in the result. See Figure 8.36 for the result.
Listing 8.36List the names of the authors who have published a book with publisher P03. See Figure 8.36 for the result.
SELECT DISTINCT a.au_id, au_fname, au_lname
FROM title_authors ta
INNER JOIN authors a
ON ta.au_id = a.au_id
WHERE title_id IN
(SELECT title_id
FROM titles
WHERE pub_id = 'P03');
Figure 8.36Result of Listing 8.36.
au_id au_fname au_lname
----- -------- ---------
A01 Sarah Buchman
A02 Wendy Heydemark
A04 Klee Hull
A subquery can itself include one or more subqueries. Listing 8.37 lists the names of authors who have participated in writing at least one biography. The innermost query returns the title IDs T06, T07, T10, and T12. The DBMS evaluates the subquery at the next higher level by using these title IDs and returns the author IDs. Finally, the outermost query uses the author IDs to find the names of the authors. See Figure 8.37 for the result.
Listing 8.37List the names of authors who have participated in writing at least one biography. See Figure 8.37 for the result.
SELECT au_id, au_fname, au_lname
FROM authors
WHERE au_id IN
(SELECT au_id
FROM title_authors
WHERE title_id IN
(SELECT title_id
FROM titles
WHERE type = 'biography'));
Figure 8.37Result of Listing 8.37.
au_id au_fname au_lname
----- -------- ---------
A02 Wendy Heydemark
A04 Klee Hull
Excessive subquery nesting makes a statement hard to read; often, it’s easier to restate the query as a join. Here’s the join version of Listing 8.37:
SELECT DISTINCT a.au_id, au_fname, au_lname
FROM authors a
INNER JOIN title_authors ta
ON a.au_id = ta.au_id
INNER JOIN titles t
ON t.title_id = ta.title_id
WHERE type = 'biography';
Listing 8.38 lists the names of all non-lead authors (au_order > 1) who live in California and who receive less than 50 percent of the royalties for a book. See Figure 8.38 for the result.
Listing 8.38List the names of all ancillary authors who live in California and who receive less than 50 percent of the royalties for a book. See Figure 8.38 for the result.
SELECT au_id, au_fname, au_lname
FROM authors
WHERE state = 'CA'
AND au_id IN
(SELECT au_id
FROM title_authors
WHERE royalty_share < 0.5
AND au_order > 1);
Figure 8.38Result of Listing 8.38.
au_id au_fname au_lname
----- -------- --------
A03 Hallie Hull
A04 Klee Hull
Here’s the join version of Listing 8.38:
SELECT DISTINCT a.au_id, au_fname, au_lname
FROM authors a
INNER JOIN title_authors ta
ON a.au_id = ta.au_id
WHERE state = 'CA'
AND royalty_share < 0.5
AND au_order > 1;
Listing 8.39 lists the names of authors who are coauthors of a book. To determine whether an author is a coauthor or the sole author of a book, examine his or her royalty share for the book. If the royalty share is less than 100 percent (1.0), then the author is a coauthor; otherwise, he or she is the sole author. See Figure 8.39 for the result.
Listing 8.39List the names of authors who are coauthors of a book. See Figure 8.39 for the result.
SELECT au_id, au_fname, au_lname
FROM authors a
WHERE au_id IN
(SELECT au_id
FROM title_authors
WHERE royalty_share < 1.0);
Figure 8.39Result of Listing 8.39.
au_id au_fname au_lname
----- -------- ---------
A02 Wendy Heydemark
A03 Hallie Hull
A04 Klee Hull
A06 Kellsey
Listing 8.40 uses a correlated subquery to list the names of authors who are sole authors of a book—that is, authors who earn 100 percent (1.0) of the royalty on a book. See Figure 8.40 for the result. The DBMS considers each row in the outer-query table authors to be a candidate for inclusion in the result. When the DBMS examines the first candidate row in authors, it sets the correlation variable a.au_id equal to A01 (Sarah Buchman), which it substitutes into the inner query:
SELECT royalty_share
FROM title_authors ta
WHERE ta.au_id = 'A01';
The inner query returns 1.0, so the outer query evaluates to:
SELECT a.au_id, au_fname, au_lname
FROM authors a
WHERE 1.0 IN (1.0)
The WHERE condition is true, so author A01 is included in the result. The DBMS repeats this procedure for every author; see “Simple and Correlated Subqueries” earlier in this chapter.
Listing 8.40List the names of authors who are sole authors of a book. See Figure 8.40 for the result.
SELECT a.au_id, au_fname, au_lname
FROM authors a
WHERE 1.0 IN
(SELECT royalty_share
FROM title_authors ta
WHERE ta.au_id = a.au_id);
Figure 8.40Result of Listing 8.40.
au_id au_fname au_lname
----- --------- ---------
A01 Sarah Buchman
A02 Wendy Heydemark
A04 Klee Hull
A05 Christian Kells
A06 Kellsey
Listing 8.41 lists the names of authors who are both coauthors and sole authors. The inner query returns the author IDs of sole authors, and the outer query compares these IDs with the IDs of the coauthors. See Figure 8.41 for the result.
Listing 8.41List the names of authors who are both coauthors and sole authors. See Figure 8.41 for the result.
SELECT DISTINCT a.au_id, au_fname, au_lname
FROM authors a
INNER JOIN title_authors ta
ON a.au_id = ta.au_id
WHERE ta.royalty_share < 1.0
AND a.au_id IN
(SELECT a.au_id
FROM authors a
INNER JOIN title_authors ta
ON a.au_id = ta.au_id
AND ta.royalty_share = 1.0);
Figure 8.41Result of Listing 8.41.
au_id au_fname au_lname
----- -------- ---------
A02 Wendy Heydemark
A04 Klee Hull
A06 Kellsey
You can rewrite Listing 8.41 as a join or as an intersection. Here’s the join version of Listing 8.41:
SELECT DISTINCT a.au_id, au_fname, au_lname
FROM authors a
INNER JOIN title_authors ta1
ON a.au_id = ta1.au_id
INNER JOIN title_authors ta2
ON a.au_id = ta2.au_id
WHERE ta1.royalty_share < 1.0
AND ta2.royalty_share = 1.0;
Here’s the intersection version of Listing 8.41 (see “Finding Common Rows with INTERSECT” in Chapter 9):
SELECT DISTINCT a.au_id, au_fname, au_lname
FROM authors a
INNER JOIN title_authors ta
ON a.au_id = ta.au_id
WHERE ta.royalty_share < 1.0
INTERSECT
SELECT DISTINCT a.au_id, au_fname, au_lname
FROM authors a
INNER JOIN title_authors ta
ON a.au_id = ta.au_id
WHERE ta.royalty_share = 1.0;
Listing 8.42 uses a correlated subquery to list the types of books published by more than one publisher. See Figure 8.42 for the result.
Listing 8.42List the types of books common to more than one publisher. See Figure 8.42 for the result.
SELECT DISTINCT t1.type
FROM titles t1
WHERE t1.type IN
(SELECT t2.type
FROM titles t2
WHERE t1.pub_id <> t2.pub_id);
Figure 8.42Result of Listing 8.42.
type
---------
biography
history
Here’s the self-join version of Listing 8.42:
SELECT DISTINCT t1.type
FROM titles t1
INNER JOIN titles t2
ON t1.type = t2.type
AND t1.pub_id <> t2.pub_id;
To run Listing 8.41 in Microsoft Access, type:
SELECT DISTINCT a.au_id, au_fname, au_lname
FROM (authors AS a
INNER JOIN title_authors AS ta1
ON a.au_id = ta1.au_id)
INNER JOIN title_authors AS ta2
ON a.au_id = ta2.au_id
WHERE ta1.royalty_share < 1.0
AND ta2.royalty_share = 1.0;
MySQL 4.0 and earlier don’t support subqueries; see the DBMS tip in “Tips for Subqueries” earlier in this chapter.
In older PostgreSQL versions, convert the floating-point numbers in Listings 8.38 through 8.41 to DECIMAL; see “Converting Data Types with CAST()” in Chapter 5. To run Listings 8.38 through 8.41, change the floating-point literals to (Listing 8.38):
CAST(0.5 AS DECIMAL)
and (Listing 8.39):
CAST(1.0 AS DECIMAL)
and (Listing 8.40):
CAST(1.0 AS DECIMAL)
and (Listing 8.41):
CAST(1.0 AS DECIMAL)
(in two places)
Some DBMSs let you test multiple values simultaneously by using this syntax:
SELECT columns
FROM table1
WHERE (col1, col2,..., colN) IN
(SELECT colA, colB,..., colN
FROM table2);
The test expression (left of IN) is a parenthesized list of table1 columns. The subquery returns the same number of columns as there are in the list. The DBMS compares the values in corresponding columns. The following query, for example, works in Oracle, Db2, MySQL, and PostgreSQL:
SELECT au_id, city, state
FROM authors
WHERE (city, state) IN
(SELECT city, state
FROM publishers);
The result lists the authors who live in the same city and state as some publisher:
au_id city state
----- ------------- -----
A03 San Francisco CA
A04 San Francisco CA
A05 New York NY
You can use the ALL keyword to determine whether a value is less than or greater than all the values in a subquery result.
The important characteristics of subquery comparisons that use ALL are:
When ALL is used with <, <=, >, or >=, the comparison is equivalent to evaluating the subquery result’s minimum or maximum value. <ALL means less than every subquery value—in other words, less than the minimum value. >ALL means greater than every subquery value—that is, greater than the maximum value. Table 8.2 shows equivalent ALL expressions and column functions. Listing 8.45 later in this section shows how to replicate a >ALL query by using MAX().
ALL Expression | Column Function |
---|---|
< ALL(subquery) | < MIN(subquery values) |
> ALL(subquery) | > MAX(subquery values) |
Semantic equivalence doesn’t mean that two queries will run at the same speed. For example, the query
SELECT * FROM table1
WHERE col1 > ANY
(SELECT MAX(col1) FROM table2);
usually is faster than
SELECT * FROM table1
WHERE col1 > ALL
(SELECT col1 FROM table2);
For more information, see “Comparing Equivalent Queries” later in this chapter.
To compare all subquery values:
In the WHERE clause of a SELECT statement, type:
WHERE test_expr op ALL (subquery)
test_expr is a literal value, a column name, an expression, or a subquery that returns a single value; op is a comparison operator (=, <>, <, <=, >, or >=); and subquery is a subquery that returns one column and zero or more rows.
The ALL condition evaluates to true if all values in subquery satisfy the ALL condition or if the subquery result is empty (has zero rows). The ALL condition is false if any (at least one) value in subquery doesn’t satisfy the ALL condition or if any value is null.
The same syntax applies to a HAVING clause:
HAVING test_expr op ALL (subquery)
Listing 8.43 lists the authors who live in a city in which no publisher is located. The inner query finds all the cities in which publishers are located, and the outer query compares each author’s city to all the publishers’ cities. See Figure 8.43 for the result.
Listing 8.43List the authors who live in a city in which no publisher is located. See Figure 8.43 for the result.
SELECT au_id, au_lname, au_fname, city
FROM authors
WHERE city <> ALL
(SELECT city
FROM publishers);
Figure 8.43Result of Listing 8.43.
au_id au_lname au_fname city
----- ----------- -------- ---------
A01 Buchman Sarah Bronx
A02 Heydemark Wendy Boulder
A06 Kellsey Palo Alto
A07 O'Furniture Paddy Sarasota
You can use NOT IN to replicate Listing 8.43:
SELECT au_id, au_lname, au_fname, city
FROM authors
WHERE city NOT IN
(SELECT city FROM publishers);
Listing 8.44 lists the nonbiographies that are priced less than all the biographies. The inner query finds all the biography prices. The outer query inspects the lowest price in the list and determines whether each non-biography is cheaper. See Figure 8.44 for the result. The price IS NOT NULL condition is required because the price of biography T10 is null. Without this condition, the entire query would return zero rows because it’s impossible to determine whether a price is less than null (see “Nulls” in Chapter 3).
Listing 8.44List the nonbiographies that are cheaper than all the biographies. See Figure 8.44 for the result.
SELECT title_id, title_name
FROM titles
WHERE type <> 'biography'
AND price < ALL
(SELECT price
FROM titles
WHERE type = 'biography'
AND price IS NOT NULL);
Figure 8.44Result of Listing 8.44.
title_id title_name
-------- --------------------------------
T05 Exchange of Platitudes
T08 Just Wait Until After School
T11 Perhaps It's a Glandular Problem
Listing 8.45 lists the books that outsold all the books that author A06 wrote (or cowrote). The inner query uses a join to find the sales of each book by author A06. The outer query inspects the highest sales figure in the list and determines whether each book sold more copies. See Figure 8.45 for the result. Again, the IS NOT NULL condition is needed in case sales is null for a book by author A06.
Listing 8.45List the books that outsold all the books that author A06 wrote (or cowrote). See Figure 8.45 for the result.
SELECT title_id, title_name
FROM titles
WHERE sales > ALL
(SELECT sales
FROM title_authors ta
INNER JOIN titles t
ON t.title_id = ta.title_id
WHERE ta.au_id = 'A06'
AND sales IS NOT NULL);
Figure 8.45Result of Listing 8.45.
title_id title_name
-------- -------------------------
T05 Exchange of Platitudes
T07 I Blame My Mother
T12 Spontaneous, Not Annoying
I can replicate Listing 8.45 by using GROUP BY, HAVING, and MAX() (instead of ALL):
SELECT title_id
FROM titles
GROUP BY title_id
HAVING MAX(sales) >
(SELECT MAX(sales)
FROM title_authors ta
INNER JOIN titles t
ON t.title_id = ta.title_id
WHERE ta.au_id = 'A06');
Listing 8.46 uses a correlated subquery in the HAVING clause of the outer query to list the types of books for which the highest sales figure is more than twice the average sales for that type. The inner query is evaluated once for each group defined in the outer query (once for each type of book). See Figure 8.46 for the result.
Listing 8.46List the types of books for which the highest sales figure is more than twice the average sales for that type. See Figure 8.46 for the result.
SELECT t1.type
FROM titles t1
GROUP BY t1.type
HAVING MAX(t1.sales) >= ALL
(SELECT 2.0 * AVG(t2.sales)
FROM titles t2
WHERE t1.type = t2.type);
Figure 8.46Result of Listing 8.46.
type
---------
biography
MySQL 4.0 and earlier don’t support subqueries; see the DBMS tip in “Tips for Subqueries” earlier in this chapter.
In older PostgreSQL versions, convert the floating-point numbers in Listing 8.46 to DECIMAL; see “Converting Data Types with CAST()” in Chapter 5. To run Listing 8.46, change the floating-point literal to:
CAST(2.0 AS DECIMAL)
ANY works like ALL (see the preceding section) but instead determines whether a value is equal to, less than, or greater than any (at least one) of the values in a subquery result.
The important characteristics of subquery comparisons that use ANY are:
When ANY is used with <, <=, >, or >=, the comparison is equivalent to evaluating the subquery result’s maximum or minimum value. <ANY means less than at least one subquery value—in other words, less than the maximum value. >ANY means greater than at least one subquery value—that is, greater than the minimum value. Table 8.3 shows equivalent ANY expressions and column functions. Listing 8.49 later in this section shows how to replicate a >ANY query by using MIN().
ANY Expression | Column Function |
---|---|
< ANY(subquery) | < MAX(subquery values) |
> ANY(subquery) | > MIN(subquery values) |
To compare some subquery values:
In the WHERE clause of a SELECT statement, type:
WHERE test_expr op ANY (subquery)
test_expr is a literal value, a column name, an expression, or a subquery that returns a single value; op is a comparison operator (=, <>, <, <=, >, or >=); and subquery is a subquery that returns one column and zero or more rows.
If any (at least one) value in subquery satisfies the ANY condition, then the condition evaluates to true. The ANY condition is false if no value in subquery satisfies the condition or if subquery is empty (has zero rows) or contains all nulls.
The same syntax applies to a HAVING clause:
HAVING test_expr op ANY (subquery)
Listing 8.47 lists the authors who live in a city in which a publisher is located. The inner query finds all the cities in which publishers are located, and the outer query compares each author’s city to all the publishers’ cities. See Figure 8.47 for the result.
Listing 8.47List the authors who live in a city in which a publisher is located. See Figure 8.47 for the result.
SELECT au_id, au_lname, au_fname, city
FROM authors
WHERE city = ANY
(SELECT city
FROM publishers);
Figure 8.47Result of Listing 8.47.
au_id au_lname au_fname city
----- -------- --------- -------------
A03 Hull Hallie San Francisco
A04 Hull Klee San Francisco
A05 Kells Christian New York
You can use IN to replicate Listing 8.47:
SELECT au_id, au_lname, au_fname, city
FROM authors
WHERE city IN
(SELECT city FROM publishers);
Listing 8.48 lists the nonbiographies that are priced less than at least one biography. The inner query finds all the biography prices. The outer query inspects the highest price in the list and determines whether each nonbiography is cheaper. See Figure 8.48 for the result.
Unlike the ALL comparison in Listing 8.44 in the preceding section, the price IS NOT NULL condition isn’t required here, even though the price of biography T10 is null. The DBMS doesn’t determine whether all the price comparisons are true—just whether at least one is true—so the null comparison is ignored.
Listing 8.48List the nonbiographies that are cheaper than at least one biography. See Figure 8.48 for the result.
SELECT title_id, title_name
FROM titles
WHERE type <> 'biography'
AND price < ANY
(SELECT price
FROM titles
WHERE type = 'biography');
Figure 8.48Result of Listing 8.48.
title_id title_name
-------- --------------------------------
T01 1977!
T02 200 Years of German Humor
T04 But I Did It Unconsciously
T05 Exchange of Platitudes
T08 Just Wait Until After School
T09 Kiss My Boo-Boo
T11 Perhaps It's a Glandular Problem
Listing 8.49 lists the books that outsold at least one of the books that author A06 wrote (or cowrote). The inner query uses a join to find the sales of each book by author A06. The outer query inspects the lowest sales figure in the list and determines whether each book sold more copies. See Figure 8.49 for the result. Again, unlike the ALL comparison in Listing 8.45 in the preceding section, the IS NOT NULL condition isn’t needed here.
Listing 8.49List the books that outsold at least one of the books that author A06 wrote (or cowrote). See Figure 8.49 for the result.
SELECT title_id, title_name
FROM titles
WHERE sales > ANY
(SELECT sales
FROM title_authors ta
INNER JOIN titles t
ON t.title_id = ta.title_id
WHERE ta.au_id = 'A06');
Figure 8.49Result of Listing 8.49.
title_id title_name
-------- -----------------------------------
T02 200 Years of German Humor
T03 Ask Your System Administrator
T04 But I Did It Unconsciously
T05 Exchange of Platitudes
T06 How About Never?
T07 I Blame My Mother
T09 Kiss My Boo-Boo
T11 Perhaps It's a Glandular Problem
T12 Spontaneous, Not Annoying
T13 What Are The Civilian Applications?
I can replicate Listing 8.49 by using GROUP BY, HAVING, and MIN() (instead of ANY):
SELECT title_id
FROM titles
GROUP BY title_id
HAVING MIN(sales) >
(SELECT MIN(sales)
FROM title_authors ta
INNER JOIN titles t
ON t.title_id = ta.title_id
WHERE ta.au_id = 'A06');
=ANY is equivalent to IN, but <>ANY isn’t equivalent to NOT IN. If subquery returns the values x, y, and z, then
test_expr <> ANY (subquery)
is equivalent to:
test_expr <> x OR
test_expr <> y OR
test_expr <> z
But
test_expr NOT IN (subquery)
is equivalent to:
test_expr <> x AND
test_expr <> y AND
test_expr <> z
(NOT IN actually is equivalent to <>ALL.)
So far in this chapter, I’ve been using the comparison operators IN, ALL, and ANY to compare a specific test value to values in a subquery result. EXISTS and NOT EXISTS don’t compare values; rather, they simply look for the existence or nonexistence of rows in a subquery result.
The important characteristics of an existence test are:
To test existence:
In the WHERE clause of a SELECT statement, type:
WHERE [NOT] EXISTS (subquery)
subquery is a subquery that returns any number of columns and rows.
If subquery returns one or more rows, then the EXISTS test evaluates to true. If subquery returns zero rows, then the EXISTS test evaluates to false. Specify NOT to negate the test’s result.
The same syntax applies to a HAVING clause:
HAVING [NOT] EXISTS (subquery)
Listing 8.50 lists the names of the publishers that have published biographies. This query considers each publisher’s ID in turn and determines whether it causes the existence test to evaluate to true. Here, the first publisher is P01 (Abatis Publishers). The DBMS ascertains whether any rows exist in the table titles in which pub_id is P01 and type is biography. If so, then Abatis Publishers is included in the final result. The DBMS repeats the same process for each of the other publisher IDs. See Figure 8.50 for the result. If I wanted to list the names of publishers that haven’t published biographies, then I’d change EXISTS to NOT EXISTS. See Listing 8.33 earlier in this chapter for an equivalent query that uses IN.
Listing 8.50List the names of the publishers that have published biographies. See Figure 8.50 for the result.
SELECT pub_name
FROM publishers p
WHERE EXISTS
(SELECT *
FROM titles t
WHERE t.pub_id = p.pub_id
AND type = 'biography');
Figure 8.50Result of Listing 8.50.
pub_name
-------------------
Abatis Publishers
Schadenfreude Press
Listing 8.51 lists the authors who haven’t written (or cowritten) a book. See Figure 8.51 for the result. See Listing 8.35 earlier in this chapter for an equivalent query that uses NOT IN.
Listing 8.51List the authors who haven’t written (or cowritten) a book. See Figure 8.51 for the result.
SELECT au_id, au_fname, au_lname
FROM authors a
WHERE NOT EXISTS
(SELECT *
FROM title_authors ta
WHERE ta.au_id = a.au_id);
Figure 8.51Result of Listing 8.51.
au_id au_fname au_lname
----- --------- -----------
A07 Paddy O'Furniture
Listing 8.52 lists the authors who live in a city in which a publisher is located. See Figure 8.52 for the result. See Listing 8.47 earlier in this chapter for an equivalent query that uses =ANY.
Listing 8.52List the authors who live in a city in which a publisher is located. See Figure 8.52 for the result.
SELECT au_id, au_lname, au_fname, city
FROM authors a
WHERE EXISTS
(SELECT *
FROM publishers p
WHERE p.city = a.city);
Figure 8.52Result of Listing 8.52.
au_id au_lname au_fname city
----- -------- --------- -------------
A03 Hull Hallie San Francisco
A04 Hull Klee San Francisco
A05 Kells Christian New York
“Finding Common Rows with INTERSECT” in Chapter 9 describes how to use INTERSECT to retrieve the rows that two tables have in common. You also can use EXISTS to find an intersection. Listing 8.53 lists the cities in which both an author and publisher are located. See Figure 8.53 for the result. See Listing 9.8 in Chapter 9 for an equivalent query that uses INTERSECT.
You also can replicate this query with an inner join:
SELECT DISTINCT a.city
FROM authors a
INNER JOIN publishers p
ON a.city = p.city;
Listing 8.53List the cities in which both an author and publisher are located. See Figure 8.53 for the result.
SELECT DISTINCT city
FROM authors a
WHERE EXISTS
(SELECT *
FROM publishers p
WHERE p.city = a.city);
Figure 8.53Result of Listing 8.53.
city
-------------
New York
San Francisco
“Finding Different Rows with EXCEPT” in Chapter 9 describes how to use EXCEPT to retrieve the rows in one table that aren’t also in another table. You also can use NOT EXISTS to find a difference. Listing 8.54 lists the cities in which an author lives but a publisher isn’t located. See Figure 8.54 for the result. See Listing 9.9 in Chapter 9 for an equivalent query that uses EXCEPT.
You also can replicate this query with NOT IN:
SELECT DISTINCT city
FROM authors
WHERE city NOT IN
(SELECT city
FROM publishers);
Or with an outer join:
SELECT DISTINCT a.city
FROM authors a
LEFT OUTER JOIN publishers p
ON a.city = p.city
WHERE p.city IS NULL;
Listing 8.54List the cities in which an author lives but a publisher isn’t located. See Figure 8.54 for the result.
SELECT DISTINCT city
FROM authors a
WHERE NOT EXISTS
(SELECT *
FROM publishers p
WHERE p.city = a.city);
Figure 8.54Result of Listing 8.54.
city
---------
Boulder
Bronx
Palo Alto
Sarasota
Listing 8.55 lists the authors who wrote (or cowrote) three or more books. See Figure 8.55 for the result.
Listing 8.55List the authors who wrote (or cowrote) three or more books. See Figure 8.55 for the result.
SELECT au_id, au_fname, au_lname
FROM authors a
WHERE EXISTS
(SELECT *
FROM title_authors ta
WHERE ta.au_id = a.au_id
HAVING COUNT(*) >= 3);
Figure 8.55Result of Listing 8.55.
au_id au_fname au_lname
----- -------- ---------
A01 Sarah Buchman
A02 Wendy Heydemark
A04 Klee Hull
A06 Kellsey
Listing 8.56 uses two existence tests to list the authors who wrote (or cowrote) both children’s and psychology books. See Figure 8.56 for the result.
Listing 8.56List the authors who wrote (or cowrote) a children’s book and also wrote (or cowrote) a psychology book. See Figure 8.56 for the result.
SELECT au_id, au_fname, au_lname
FROM authors a
WHERE EXISTS
(SELECT *
FROM title_authors ta
INNER JOIN titles t
ON t.title_id = ta.title_id
WHERE ta.au_id = a.au_id
AND t.type = 'children')
AND EXISTS
(SELECT *
FROM title_authors ta
INNER JOIN titles t
ON t.title_id = ta.title_id
WHERE ta.au_id = a.au_id
AND t.type = 'psychology');
Figure 8.56Result of Listing 8.56.
au_id au_fname au_lname
----- -------- --------
A06 Kellsey
Listing 8.57 performs a uniqueness test to determine whether duplicates occur in the column au_id in the table authors. The query prints Yes if duplicate values exist in the column au_id; otherwise, it returns an empty result. See Figure 8.57 for the result. au_id is the primary key of authors, so of course it contains no duplicates.
Listing 8.57Does the column au_id in the table authors contain duplicate values? See Figure 8.57 for the result.
SELECT DISTINCT 'Yes' AS "Duplicates?"
WHERE EXISTS
(SELECT *
FROM authors
GROUP BY au_id
HAVING COUNT(*) > 1);
Figure 8.57Result of Listing 8.57.
Duplicates?
-----------
Listing 8.58 shows the same query for the table title_authors, which does contain duplicate au_id values. See Figure 8.58 for the result. You can add grouping columns to the GROUP BY clause to determine whether multiple-column duplicates exist.
Listing 8.58Does the column au_id in the table title_authors contain duplicate values? See Figure 8.58 for the result.
SELECT DISTINCT 'Yes' AS "Duplicates?"
WHERE EXISTS
(SELECT *
FROM title_authors
GROUP BY au_id
HAVING COUNT(*) > 1);
Figure 8.58Result of Listing 8.58.
Duplicates?
-----------
Yes
You also can use COUNT(*) to determine whether a subquery returns at least one row, but COUNT(*) (usually) is less efficient than EXISTS. The DBMS quits processing an EXISTS subquery as soon as it determines whether the subquery returns a row, whereas COUNT(*) forces the DBMS to process the entire subquery. This query is equivalent to Listing 8.52 but runs slower:
SELECT au_id, au_lname, au_fname, city
FROM authors a
WHERE
(SELECT COUNT(*)
FROM publishers p
WHERE p.city = a.city) > 0;
Although I use SELECT COUNT(*) in some of the DBMS-specific subqueries in the DBMS tip in this section, you should be wary of using an aggregate function in a subquery’s SELECT clause. The existence test in Listing 8.59, for example, always is true because COUNT(*) always will return a row (with the value zero here). I could argue that the result, Figure 8.59, is flawed logically because no publisher ID XXX exists.
Listing 8.59Be careful when using aggregate functions in a subquery SELECT clause. See Figure 8.59 for the result.
SELECT pub_id
FROM publishers
WHERE EXISTS
(SELECT COUNT(*)
FROM titles
WHERE pub_id = 'XXX');
Figure 8.59Result of Listing 8.59.
pub_id
------
P01
P02
P03
P04
To run Listings 8.55, 8.57, and 8.58 in Microsoft Access, change SELECT * to SELECT 1. Additionally, in Listing 8.57 add the clause FROM authors
to the outer query, and in Listing 8.58 add the clause FROM title_authors
to the outer query.
To run Listings 8.57 and 8.58 in Oracle, add the clause FROM DUAL
to the outer query; see the DBMS tip in “Tips for Derived Columns” in Chapter 5.
To run Listings 8.55, 8.57, and 8.58 in Db2, change SELECT * to SELECT 1. Additionally, in Listings 8.57 and 8.58, add the clause FROM SYSIBM.SYSDUMMY1
to the outer query; see the DBMS tip in “Tips for Derived Columns” in Chapter 5. For example, change Listing 8.57 to:
SELECT DISTINCT 'Yes' AS "Duplicates?"
FROM SYSIBM.SYSDUMMY1
WHERE EXISTS
(SELECT 1
FROM authors
GROUP BY au_id
HAVING COUNT(*) > 1);
In MySQL, to run Listing 8.57 add the clause FROM authors
to the outer query, and in Listing 8.58 add the clause FROM title_authors
to the outer query. MySQL 4.0 and earlier don’t support subqueries; see the DBMS tip in “Tips for Subqueries” earlier in this chapter.
To run Listings 8.55, 8.57, and 8.58 in PostgreSQL, change SELECT *
to SELECT 1
.
As you’ve seen in this chapter and the preceding one, you can express the same query in different ways (different syntax, same semantics). To expand on this point, I’ve written the same query six semantically equivalent ways. Each of the statements in Listing 8.60 lists the authors who have written (or cowritten) at least one book. See Figure 8.60 for the result.
Listing 8.60These six queries are equivalent semantically; they all list the authors who have written (or cowritten) at least one book. See Figure 8.60 for the result.
SELECT DISTINCT a.au_id
FROM authors a
INNER JOIN title_authors ta
ON a.au_id = ta.au_id;
SELECT DISTINCT a.au_id
FROM authors a, title_authors ta
WHERE a.au_id = ta.au_id;
SELECT au_id
FROM authors a
WHERE au_id IN
(SELECT au_id
FROM title_authors);
SELECT au_id
FROM authors a
WHERE au_id = ANY
(SELECT au_id
FROM title_authors);
SELECT au_id
FROM authors a
WHERE EXISTS
(SELECT *
FROM title_authors ta
WHERE a.au_id = ta.au_id);
SELECT au_id
FROM authors a
WHERE 0 <
(SELECT COUNT(*)
FROM title_authors ta
WHERE a.au_id = ta.au_id);
Figure 8.60Each of the six statements in Listing 8.60 returns this result.
au_id
-----
A01
A02
A03
A04
A05
A06
The first two queries (inner joins) will run at the same speed as one another. Of the third through sixth queries (which use subqueries), the last one probably is the worst performer. The DBMS will stop processing the other subqueries as soon as it encounters a single matching value. But the subquery in the last statement has to count all the matching rows before it returns either true or false. Your DBMS’s optimizer should run the inner joins at about the same speed as the fastest subquery statement.
You might find this programming flexibility to be attractive, but people who design DBMS optimizers don’t, because they’re tasked with considering all the possible ways to express a query, figuring out which one performs best, and reformulating your query internally to its optimal form. (Entire careers are devoted to solving these types of optimization problems.) If your DBMS has a flawless optimizer, then it will run all six of the queries in Listing 8.60 at the same speed. But that situation is unlikely, so you’ll have to experiment with your DBMS to see which version runs fastest.
DBMSs provide tools to let you measure the efficiency of queries. Tables 8.4 and 8.5 list the commands that time queries and show their execution plans.
DBMS | Command |
---|---|
Access | Not available |
SQL Server | SET STATISTICS TIME ON |
Oracle | SET TIMING ON |
Db2 | db2batch |
MySQL | The mysql command-line utility prints execution times by default. |
PostgreSQL | \timing |
DBMS | Command |
---|---|
Access | Not available |
SQL Server | SET SHOWPLAN_TEXT ON |
Oracle | EXPLAIN PLAN |
Db2 | EXPLAIN or db2expln |
MySQL | EXPLAIN |
PostgreSQL | EXPLAIN |
SQL Tuning
After you learn the basics of SQL, your next step is to tune your SQL statements so that they run efficiently, which means learning about your DBMS’s optimizer. Performance tuning involves some platform-independent general principles, but the most effective tuning relies on the idiosyncrasies of the specific DBMS. Tuning is beyond the scope of this book, but the internet has plenty of discussion groups and articles—search for tuning (or performance or optimization) together with the name of your DBMS.
A good book to get started with is Peter Gulutzan and Trudy Pelzer’s SQL Performance Tuning, which covers eight DBMSs, or Dan Tow’s SQL Tuning, which covers Microsoft SQL Server, Oracle, and Db2. If you look up one of these books on Amazon.com, then you can find other tuning books in the “customers also viewed” list.