In this chapter
Recall from Chapter 2 that set theory is fundamental to the relational model. But whereas mathematical sets are unchanging, database sets are dynamic—they grow, shrink, and otherwise change over time. This chapter covers the following SQL set operators, which combine the results of two SELECT statements into one result:
These set operations aren’t joins, but you can mix and chain them to combine two or more tables.
A UNION operation combines the results of two queries into a single result that has the rows returned by both queries. (This operation differs from a join, which combines columns from two tables.) A UNION expression removes duplicate rows from the result; a UNION ALL expression doesn’t remove duplicates.
Unions are simple, but they have some restrictions:
To combine rows:
Type:
select_query1
UNION [ALL]
select_query2;
select_query1 and select_query2 are SELECT queries. The number and the order of the columns must be identical in both queries, and the data types of corresponding columns must be compatible. Duplicate rows are eliminated from the result unless ALL is specified.
Listing 9.1 lists the states where authors and publishers are located. By default, UNION removes duplicate rows from the result. See Figure 9.1 for the result.
Listing 9.1List the states where authors and publishers are located. See Figure 9.1 for the result.
SELECT state FROM authors
UNION
SELECT state FROM publishers;
Figure 9.1Result of Listing 9.1.
state
-----
NULL
CA
CO
FL
NY
Listing 9.2 is the same as Listing 9.1 except that it includes the ALL keyword, so all rows are included in the results, and duplicates aren’t removed. See Figure 9.2 for the result.
Listing 9.2List the states where authors and publishers are located, including duplicates. See Figure 9.2 for the result.
SELECT state FROM authors
UNION ALL
SELECT state FROM publishers;
Figure 9.2Result of Listing 9.2.
state
-----
NY
CO
CA
CA
NY
CA
FL
NY
CA
NULL
CA
Listing 9.3 lists the names of all the authors and publishers. The AS clause in the first query names the column in the result. The ORDER BY clause uses a relative column position instead of a column name to sort the result. See Figure 9.3 for the result.
Listing 9.3List the names of all the authors and publishers. See Figure 9.3 for the result.
SELECT au_fname || ' ' || au_lname AS "Name"
FROM authors
UNION
SELECT pub_name
FROM publishers
ORDER BY 1 ASC;
Figure 9.3Result of Listing 9.3.
Name
-------------------
Kellsey
Abatis Publishers
Christian Kells
Core Dump Books
Hallie Hull
Klee Hull
Paddy O'Furniture
Sarah Buchman
Schadenfreude Press
Tenterhooks Press
Wendy Heydemark
Listing 9.4 expands on Listing 9.3 and defines the extra column Type to identify which table each row came from. The WHERE conditions retrieve the authors and publishers from New York state only. See Figure 9.4 for the result.
Listing 9.4List the names of all the authors and publishers located in New York state, sorted by type and then by name. See Figure 9.4 for the result.
SELECT
'author' AS "Type",
au_fname || ' ' || au_lname AS "Name",
state
FROM authors
WHERE state = 'NY'
UNION
SELECT
'publisher',
pub_name,
state
FROM publishers
WHERE state = 'NY'
ORDER BY 1 ASC, 2 ASC;
Figure 9.4Result of Listing 9.4.
Type Name state
--------- ----------------- -----
author Christian Kells NY
author Sarah Buchman NY
publisher Abatis Publishers NY
Listing 9.5 adds a third query to Listing 9.4 to retrieve the titles of books published in New York state also. See Figure 9.5 for the result.
Listing 9.5List the names of all the authors and publishers located in New York state and the titles of books published in New York state, sorted by type and then by name. See Figure 9.5 for the result.
SELECT
'author' AS "Type",
au_fname || ' ' || au_lname AS "Name"
FROM authors
WHERE state = 'NY'
UNION
SELECT
'publisher',
pub_name
FROM publishers
WHERE state = 'NY'
UNION
SELECT
'title',
title_name
FROM titles t
INNER JOIN publishers p
ON t.pub_id = p.pub_id
WHERE p.state = 'NY'
ORDER BY 1 ASC, 2 ASC;
Figure 9.5Result of Listing 9.5.
Type Name
--------- --------------------------
author Christian Kells
author Sarah Buchman
publisher Abatis Publishers
title 1977!
title How About Never?
title Not Without My Faberge Egg
title Spontaneous, Not Annoying
Listing 9.6 is similar to Listing 9.5 except that it lists the counts of each author, publisher, and book, instead of their names. See Figure 9.6 for the result.
Listing 9.6List the counts of all the authors and publishers located in New York state and the titles of books published in New York state, sorted by type. See Figure 9.6 for the result.
SELECT
'author' AS "Type",
COUNT(au_id) AS "Count"
FROM authors
WHERE state = 'NY'
UNION
SELECT
'publisher',
COUNT(pub_id)
FROM publishers
WHERE state = 'NY'
UNION
SELECT
'title',
COUNT(title_id)
FROM titles t
INNER JOIN publishers p
ON t.pub_id = p.pub_id
WHERE p.state = 'NY'
ORDER BY 1 ASC;
Figure 9.6Result of Listing 9.6.
Type Count
--------- -----
author 2
publisher 1
title 4
In Listing 9.7, I revisit Listing 5.30 in “Evaluating Conditional Values with CASE” in Chapter 5. But instead of using CASE to change book prices and simulate if-then logic, I use multiple UNION queries. See Figure 9.7 for the result.
Listing 9.7Raise the price of history books by 10 percent and psychology books by 20 percent, and leave the prices of other books unchanged. See Figure 9.7 for the result.
SELECT title_id, type, price,
price * 1.10 AS "New price"
FROM titles
WHERE type = 'history'
UNION
SELECT title_id, type, price, price * 1.20
FROM titles
WHERE type = 'psychology'
UNION
SELECT title_id, type, price, price
FROM titles
WHERE type NOT IN ('psychology', 'history')
ORDER BY type ASC, title_id ASC;
Figure 9.7Result of Listing 9.7.
title_id type price New price
-------- ---------- ------- ---------
T06 biography 19.95 19.95
T07 biography 23.95 23.95
T10 biography NULL NULL
T12 biography 12.99 12.99
T08 children 10.00 10.00
T09 children 13.95 13.95
T03 computer 39.95 39.95
T01 history 21.99 24.19
T02 history 19.95 21.95
T13 history 29.99 32.99
T04 psychology 12.99 15.59
T05 psychology 6.95 8.34
T11 psychology 7.99 9.59
UNION Commutativity
In theory, the order in which the SELECT queries (tables) occur in a union should make no speed difference. But in practice your DBMS might run
small_table1
UNION
small_table2
UNION
big_table;
faster than
small_table1
UNION
big_table
UNION
small_table2;
because of the way the optimizer merges intermediate results and removes duplicate rows. Experiment.
Don’t use UNION where a compound condition will suffice:
SELECT DISTINCT * FROM mytable
WHERE col1 = 1 AND col2 = 2;
usually is faster than
SELECT * FROM mytable
WHERE col1 = 1;
UNION
SELECT * FROM mytable
WHERE col2 = 2;
If you mix UNION and UNION ALL in a single statement, then use parentheses to specify order of evaluation. Take these two statements, for example:
SELECT * FROM table1
UNION ALL
(SELECT * FROM table2
UNION
SELECT * FROM table3);
and:
(SELECT * FROM table1
UNION ALL
SELECT * FROM table2)
UNION
SELECT * FROM table3;
The first statement eliminates duplicates in the union of table2 and table3 but doesn’t eliminate duplicates in the union of that result and table1. The second statement includes duplicates in the union of table1 and table2 but eliminates duplicates in the subsequent union with table3, so ALL has no effect on the final result of this statement.
In Microsoft Access and Microsoft SQL Server, use + to concatenate strings (see “Concatenating Strings with ||” in Chapter 5). To run Listings 9.3 through 9.5, change the concatenation expressions to:
au_fname + ' ' + au_lname
In MySQL, use CONCAT() to concatenate strings (see “Concatenating Strings with ||” in Chapter 5). To run Listings 9.3 through 9.5, change the concatenation expressions to:
CONCAT(au_fname, ' ', au_lname)
In older PostgreSQL versions, convert the floating-point numbers in Listing 9.7 to DECIMAL; see “Converting Data Types with CAST()” in Chapter 5. To run Listing 9.7, change new-price calculations to:
price * CAST((1.10) AS DECIMAL)
price * CAST((1.20) AS DECIMAL)
An INTERSECT operation combines the results of two queries into a single result that has all the rows common to both queries. Intersections have the same restrictions as unions; see “Combining Rows with UNION” earlier in this chapter.
To find common rows:
Type:
select_query1
INTERSECT
select_query2;
select_query1 and select_query2 are SELECT queries. The number and the order of the columns must be identical in both queries, and the data types of corresponding columns must be compatible. Duplicate rows are eliminated from the result.
Listing 9.8 uses INTERSECT to list the cities in which both an author and a publisher are located. See Figure 9.8 for the result.
Listing 9.8List the cities in which both an author and a publisher are located. See Figure 9.8 for the result.
SELECT city
FROM authors
INTERSECT
SELECT city
FROM publishers;
Figure 9.8Result of Listing 9.8.
city
-------------
New York
San Francisco
It’s helpful to think of UNION as logical OR and INTERSECTION as logical AND; see “Combining and Negating Conditions with AND, OR, and NOT” in Chapter 4. If you want to know, for example, which products are supplied by vendor A or vendor B, then type:
SELECT product_id
FROM vendor_a_product_list
UNION
SELECT product_id
FROM vendor_b_product_list;
If you want to know which products are supplied by vendor A and vendor B, then type:
SELECT product_id
FROM vendor_a_product_list
INTERSECT
SELECT product_id
FROM vendor_b_product_list;
If your DBMS doesn’t support INTERSECT, then you can replicate it with an INNER JOIN or an EXISTS subquery. Each of the following statements is equivalent to Listing 9.8 (inner join):
SELECT DISTINCT authors.city
FROM authors
INNER JOIN publishers
ON authors.city = publishers.city;
or (EXISTS subquery):
SELECT DISTINCT city
FROM authors
WHERE EXISTS
(SELECT *
FROM publishers
WHERE authors.city = publishers.city;)
An EXCEPT operation, also called a difference, combines the results of two queries into a single result that has the rows that belong to only the first query. To contrast INTERSECT and EXCEPT, A INTERSECT B contains rows from table A that are duplicated in table B, whereas A EXCEPT B contains rows from table A that aren’t duplicated in table B. Differences have the same restrictions as unions; see “Combining Rows with UNION” earlier in this chapter.
To find different rows:
Type:
select_query1
EXCEPT
select_query2;
select_query1 and select_query2 are SELECT queries. The number and the order of the columns must be identical in both queries, and the data types of corresponding columns must be compatible. Duplicate rows are eliminated from the result.
Listing 9.9 uses EXCEPT to list the cities in which an author lives but a publisher isn’t located. See Figure 9.9 for the result.
Listing 9.9List the cities in which an author lives but a publisher isn’t located. See Figure 9.9 for the result.
SELECT city
FROM authors
EXCEPT
SELECT city
FROM publishers;
Figure 9.9Result of Listing 9.9.
city
---------
Boulder
Bronx
Palo Alto
Sarasota
Don’t use EXCEPT where a compound condition will suffice.
SELECT * FROM mytable
WHERE col1 = 1 AND NOT col2 = 2;
usually is faster than
SELECT * FROM mytable
WHERE col1 = 1;
EXCEPT
SELECT * FROM mytable
WHERE col2 = 2;
If your DBMS doesn’t support EXCEPT, then you can replicate it with an outer join, a NOT EXISTS subquery, or a NOT IN subquery. Each of the following statements is equivalent to Listing 9.9 (outer join):
SELECT DISTINCT a.city
FROM authors a
LEFT OUTER JOIN publishers p
ON a.city = p.city
WHERE p.city IS NULL;
or (NOT EXISTS subquery):
SELECT DISTINCT city
FROM authors
WHERE NOT EXISTS
(SELECT *
FROM publishers
WHERE authors.city = publishers.city);
or (NOT IN subquery):
SELECT DISTINCT city
FROM authors
WHERE city NOT IN
(SELECT city
FROM publishers);
Microsoft Access and MySQL don’t support EXCEPT. To run Listing 9.9, use one of the equivalent queries given in the preceding tip.
In Oracle, the EXCEPT operator is called MINUS. To run Listing 9.9, type:
SELECT city FROM authors
MINUS
SELECT city FROM publishers;