In this chapter
Retrieving Columns with SELECT and FROM
Creating Column Aliases with AS
Eliminating Duplicate Rows with DISTINCT
This chapter introduces SQL’s workhorse—the SELECT statement. Most SQL work involves retrieving and manipulating data by using this one (albeit complex) statement. SELECT retrieves rows, columns, and derived values from one or more tables in a database; its syntax is:
SELECT columns
FROM tables
[JOIN joins]
[WHERE search_condition]
[GROUP BY grouping_columns]
[HAVING search_condition]
[ORDER BY sort_columns];
SELECT, FROM, ORDER BY, and WHERE are covered in this chapter, GROUP BY and HAVING in Chapter 6, and JOIN in Chapter 7. By convention, I call only a SELECT statement a query because it returns a result set. DBMS documentation and other books might refer to any SQL statement as a query. Although SELECT is powerful, it’s not dangerous: you can’t use it to add, change, or delete data or database objects. (The dangerous stuff starts in Chapter 10.)
Tip: Recall that italic_type denotes a variable in code that must be replaced with a value, and brackets indicate an optional clause or item; see “About This Book” in the Introduction for typographic and syntax conventions.
In its simplest form, a SELECT statement retrieves columns from a table; you can retrieve one column, multiple columns, or all columns. The SELECT clause lists the columns to display, and the FROM clause specifies the table from which to draw the columns.
To retrieve a column from a table:
Type:
SELECT column
FROM table;
column is a column name, and table is the name of the table that contains column (Listing 4.1 and Figure 4.1).
Listing 4.1List the cities in which the authors live. See Figure 4.1 for the result.
SELECT city
FROM authors;
Figure 4.1Result of Listing 4.1.
city
-------------
Bronx
Boulder
San Francisco
San Francisco
New York
Palo Alto
Sarasota
To retrieve multiple columns from a table:
Type:
SELECT columns
FROM table;
columns is two or more comma-separated column names, and table is the name of the table that contains columns (Listing 4.2 and Figure 4.2).
Columns are displayed in the same order in which they’re listed in columns, not the order in which they’re defined in table.
Listing 4.2List each author’s first name, last name, city, and state. See Figure 4.2 for the result.
SELECT au_fname, au_lname, city, state
FROM authors;
Figure 4.2Result of Listing 4.2.
au_fname au_lname city state
--------- ----------- ------------- -----
Sarah Buchman Bronx NY
Wendy Heydemark Boulder CO
Hallie Hull San Francisco CA
Klee Hull San Francisco CA
Christian Kells New York NY
Kellsey Palo Alto CA
Paddy O'Furniture Sarasota FL
To retrieve all columns from a table:
Type:
SELECT *
FROM table;
table is the name of the table (Listing 4.3 and Figure 4.3).
Columns are displayed in the order in which they’re defined in table.
Listing 4.3List all the columns in the table authors. See Figure 4.3 for the result.
SELECT *
FROM authors;
Figure 4.3Result of Listing 4.3.
au_id au_fname au_lname phone address city state zip ----- --------- ----------- ------------ -------------------- ------------- ----- ----- A01 Sarah Buchman 718-496-7223 75 West 205 St Bronx NY 10468 A02 Wendy Heydemark 303-986-7020 2922 Baseline Rd Boulder CO 80303 A03 Hallie Hull 415-549-4278 3800 Waldo Ave, #14F San Francisco CA 94123 A04 Klee Hull 415-549-4278 3800 Waldo Ave, #14F San Francisco CA 94123 A05 Christian Kells 212-771-4680 114 Horatio St New York NY 10014 A06 Kellsey 650-836-7128 390 Serra Mall Palo Alto CA 94305 A07 Paddy O'Furniture 941-925-0752 1442 Main St Sarasota FL 34236
I use NULL to indicate a null in a table or result; see “Nulls” in Chapter 3 (Listing 4.4 and Figure 4.4).
Listing 4.4List each publisher’s city, state, and country. See Figure 4.4 for the result.
SELECT city, state, country
FROM publishers;
Figure 4.4Result of Listing 4.4. The column state doesn’t apply to Germany. NULL specifies a null, which is distinct from an “invisible” value such as an empty string or a string of spaces.
city state country
------------- ----- -------
New York NY USA
San Francisco CA USA
Hamburg NULL Germany
Berkeley CA USA
Results Are Unformatted
All results display raw, unformatted values. Monetary amounts lack currency signs, for example, and numbers might have too many decimal places. Reporting tools—not data-retrieval tools—format data, although DBMSs have nonstandard functions that let you format numbers and datetimes in query results. See Microsoft SQL Server’s datename() function or MySQL’s date_format() function, for example.
In the query results so far, I’ve allowed the DBMS to use default values for column headings. (A column’s default heading in a result is the source column’s name in the table definition.) You can use the AS clause to create a column alias. A column alias is an alternative name (identifier) that you specify to control how column headings are displayed in a result. Use column aliases if column names are cryptic, hard to type, too long, or too short.
A column alias immediately follows a column name in the SELECT clause of a SELECT statement. Enclose the alias in single or double quotes if it’s a reserved keyword or if it contains spaces, punctuation, or special characters. You can omit the quotes if the alias is a single non-reserved word that contains only letters, digits, or underscores. If you want a particular column to retain its default heading, then omit its AS clause.
To create column aliases:
Type:
SELECT
column1 [AS] alias1,
column2 [AS] alias2,
...
columnN [AS] aliasN
FROM table;
column1, column2,..., columnN are column names; alias1, alias2,..., aliasN are their corresponding column aliases; and table is the name of the table that contains column1, column2,....
Listing 4.5 shows the syntactic variations of the AS clause. Figure 4.5 shows the result of Listing 4.5.
Listing 4.5The AS clause specifies a column alias to display in results. This statement shows alternative constructions for AS syntax. In your programs, pick one construction and use it consistently. See Figure 4.5 for the result.
SELECT
au_fname AS "First name",
au_lname AS 'Last name',
city AS City,
state,
zip 'Postal code'
FROM authors;
Figure 4.5Result of Listing 4.5.
First name Last name City state Postal code
---------- ----------- ------------- ----- -----------
Sarah Buchman Bronx NY 10468
Wendy Heydemark Boulder CO 80303
Hallie Hull San Francisco CA 94123
Klee Hull San Francisco CA 94123
Christian Kells New York NY 10014
Kellsey Palo Alto CA 94305
Paddy O'Furniture Sarasota FL 34236
In standard SQL and most DBMSs, the keyword AS is optional, but you should include it and surround aliases with double quotes to make your SQL code more portable and readable. With these syntactic conventions, Listing 4.5 is equivalent to:
SELECT
au_fname AS "First name",
au_lname AS "Last name",
city AS "City",
state,
zip AS "Postal code"
FROM authors;
SELECT SUM(sales) AS "Sum" FROM titles;
uses the reserved word SUM as a column alias, for example. For information about keywords, see “SQL Syntax” and “Identifiers” in Chapter 3.Microsoft Access and PostgreSQL require the AS keyword for column references.
Oracle and Db2 display unquoted column names and aliases in uppercase.
sqlplus (Oracle’s command-line processor) truncates column aliases to the number of characters specified in the table’s column definitions. The column alias "Postal code" displays as Posta in a CHAR(5) column, for example.
DBMSs have restrictions on embedded spaces, punctuation, and special characters in aliases; search your DBMS documentation for SELECT or AS.
Columns often contain duplicate values, and it’s common to want a result that lists each duplicate only once. If I type Listing 4.6 to list the states where the authors live, then the result, Figure 4.6, contains unneeded duplicates.
Listing 4.6List the states in which the authors live. See Figure 4.6 for the result.
SELECT state
FROM authors;
Figure 4.6Result of Listing 4.6. This result contains unneeded duplicates of CA and NY.
state
-----
NY
CO
CA
CA
NY
CA
FL
The DISTINCT keyword eliminates duplicate rows from a result. Note that the columns of a DISTINCT result form a candidate key (unless they contain nulls).
To eliminate duplicate rows:
Type:
SELECT DISTINCT columns
FROM table;
columns is one or more comma-separated column names, and table is the name of the table that contains columns (Listing 4.7 and Figure 4.7).
Listing 4.7List the distinct states in which the authors live. The keyword DISTINCT eliminates duplicate rows in the result. See Figure 4.7 for the result.
SELECT DISTINCT state
FROM authors;
Figure 4.7Result of Listing 4.7. This result has no CA or NY duplicates.
state
-----
NY
CO
CA
FL
If the SELECT DISTINCT clause contains more than one column, then the values of all the specified columns combined determine the uniqueness of rows. The result of Listing 4.8 is Figure 4.8, which contains a duplicate row that has two columns. The result of Listing 4.9 is Figure 4.9, which eliminates the two-column duplicate.
Listing 4.8List the cities and states in which the authors live. See Figure 4.8 for the result.
SELECT city, state
FROM authors;
Figure 4.8Result of Listing 4.8. This result contains a duplicate row for San Francisco, California.
city state
------------- -----
Bronx NY
Boulder CO
New York NY
Palo Alto CA
San Francisco CA
San Francisco CA
Sarasota FL
Listing 4.9List the distinct cities and states in which the authors live. See Figure 4.9 for the result.
SELECT DISTINCT city, state
FROM authors;
Figure 4.9Result of Listing 4.9. It’s the city–state combination that’s considered to be unique, not the value in any single column.
city state
------------- -----
Bronx NY
Boulder CO
New York NY
Palo Alto CA
San Francisco CA
Sarasota FL
The SELECT statement syntax includes the optional ALL keyword. You rarely see ALL in practice because it denotes the default behavior: display all rows, including duplicates.
SELECT columns FROM table;
is equivalent to:
SELECT ALL columns FROM table;
The syntax diagram is:
SELECT [ALL | DISTINCT] columns
FROM table;
SELECT
DISTINCT
*
FROM
table;
and SELECT
*
FROM
table;
return identical results because all rows are unique.Rows in a query result are unordered, so you should view the order in which rows appear as being arbitrary. This situation arises because the relational model posits that row order is irrelevant for table operations. You can use the ORDER BY clause to sort rows by a specified column or columns in ascending (lowest to highest) or descending (highest to lowest) order; for details, see “Sort Order” in this section. The ORDER BY clause always is the last clause in a SELECT statement.
To sort by a column:
Type:
SELECT columns
FROM table
ORDER BY sort_column [ASC | DESC];
columns is one or more comma-separated column names, sort_column is the name of the column on which to sort the result, and table is the name of the table that contains columns and sort_column. (sort_column doesn’t have to be in listed in columns.) Specify ASC for an ascending sort or DESC for a descending sort. If no sort direction is specified, then ASC is assumed (Listings 4.10 and 4.11, Figures 4.10 and 4.11).
Listing 4.10List the authors’ first names, last names, cities, and states, sorted by ascending last name. ORDER BY performs ascending sorts by default, so the ASC keyword is optional. See Figure 4.10 for the result.
SELECT au_fname, au_lname, city, state
FROM authors
ORDER BY au_lname ASC;
Figure 4.10Result of Listing 4.10. This result is sorted in ascending last-name order.
au_fname au_lname city state
--------- ----------- ------------- -----
Sarah Buchman Bronx NY
Wendy Heydemark Boulder CO
Hallie Hull San Francisco CA
Klee Hull San Francisco CA
Christian Kells New York NY
Kellsey Palo Alto CA
Paddy O'Furniture Sarasota FL
Listing 4.11List the authors’ first names, last names, cities, and states, sorted by descending first name. The DESC keyword is required. See Figure 4.11 for the result.
SELECT au_fname, au_lname, city, state
FROM authors
ORDER BY au_fname DESC;
Figure 4.11Result of Listing 4.11. This result is sorted in descending first-name order. The first name of the author Kellsey is an empty string ('') and sorts last (or first in ascending order).
au_fname au_lname city state
--------- ----------- ------------- -----
Wendy Heydemark Boulder CO
Sarah Buchman Bronx NY
Paddy O'Furniture Sarasota FL
Klee Hull San Francisco CA
Hallie Hull San Francisco CA
Christian Kells New York NY
Kellsey Palo Alto CA
Sort Order
Sorting numeric and datetime values is unambiguous; sorting character strings is complex. A DBMS uses a collating sequence, or collation, to determine the order in which characters are sorted. The collation defines the order of precedence for every character in your character set. Your character set depends on the language that you’re using—European languages (a Latin character set), Hebrew (the Hebrew alphabet), or Chinese (ideographs), for example. The collation also determines case sensitivity (is ‘A’ < ‘a’?), accent sensitivity (is ‘A’ < ‘Á’?), width sensitivity (for multibyte or Unicode characters), and other factors such as linguistic practices. The SQL standard doesn’t define particular collations and character sets, so each DBMS uses its own sorting strategy and default collation. DBMSs provide commands or tools that display the current collation and character set. Run the command exec
sp_helpsort
in Microsoft SQL Server, for example. Search your DBMS documentation for collation or sort order.
To sort by multiple columns:
Type:
SELECT columns
FROM table
ORDER BY
sort_column1 [ASC | DESC],
sort_column2 [ASC | DESC],
...
sort_columnN [ASC | DESC];
columns is one or more comma-separated column names; sort_column1, sort_column2,..., sort_columnN are the names of the columns on which to sort the result; and table is the name of the table that contains columns and the sort columns. (The sort columns don’t have to be in listed in columns.) Rows are sorted first by sort_column1; then rows that have equal values in sort_column1 are sorted by the values in sort_column2, and so on. For each sort column, specify ASC for an ascending sort or DESC for a descending sort. If no sort direction is specified, then ASC is assumed (Listing 4.12 and Figure 4.12).
Listing 4.12List the authors’ first names, last names, cities, and states, sorted by descending city within ascending state. See Figure 4.12 for the result.
SELECT au_fname, au_lname, city, state
FROM authors
ORDER BY
state ASC,
city DESC;
Figure 4.12Result of Listing 4.12.
au_fname au_lname city state
--------- ----------- ------------- -----
Hallie Hull San Francisco CA
Klee Hull San Francisco CA
Kellsey Palo Alto CA
Wendy Heydemark Boulder CO
Paddy O'Furniture Sarasota FL
Christian Kells New York NY
Sarah Buchman Bronx NY
Sorting by Substrings
To sort results by specific parts of a string, use the functions described in “Extracting a Substring with SUBSTRING()” in Chapter 5. For example, this query sorts by the last four characters of phone:
SELECT au_id, phone
FROM authors
ORDER BY substr(phone, length(phone)-3);
This query works for Oracle, Db2, MySQL, and PostgreSQL. In Microsoft SQL Server, use substring(phone, len(phone)-3, 4)
. In Microsoft Access, use Mid(phone, len(phone)-3, 4)
.
You can specify relative column-position numbers instead of column names in ORDER BY. The position numbers refer to the columns in the result, not the original table. Using column positions saves typing, but the resulting code is unclear and invites mistakes if you reorder the columns in the SELECT clause.
To sort by relative column positions:
Type:
SELECT columns
FROM table
ORDER BY
sort_num1 [ASC | DESC],
sort_num2 [ASC | DESC],
...
sort_numN [ASC | DESC];
columns is one or more comma-separated column names; and sort_num1, sort_num2,..., sort_numN are integers between 1 and the number of columns in columns, inclusive. Each integer specifies the relative position of a column in columns. table is the name of the table that contains columns. (The sort numbers can’t refer to a column that’s not listed in columns.) The sort order is the same order described in “To sort by multiple columns” earlier in this section (Listing 4.13 and Figure 4.13).
Listing 4.13List each author’s first name, last name, city, and state, sorted first by ascending state (column 4 in the SELECT clause) and then by descending last name within each state (column 2). See Figure 4.13 for the result.
SELECT au_fname, au_lname, city, state
FROM authors
ORDER BY 4 ASC, 2 DESC;
Figure 4.13Result of Listing 4.13.
au_fname au_lname city state
--------- ----------- ------------- -----
Kellsey Palo Alto CA
Hallie Hull San Francisco CA
Klee Hull San Francisco CA
Wendy Heydemark Boulder CO
Paddy O'Furniture Sarasota FL
Christian Kells New York NY
Sarah Buchman Bronx NY
Sorting and Nulls
Sorting is one of the situations where SQL departs from the idea that a null isn’t equal to any other value, including another null. (The logical comparison NULL = NULL is unknown, not true.) When nulls are sorted, they all are considered to be equal to one another.
The SQL standard leaves it up to the DBMS to decide whether nulls are either greater than or less than all non-null values. Microsoft Access, Microsoft SQL Server, and MySQL treat nulls as the lowest possible values (Listing 4.14 and Figure 4.14). Oracle, Db2, and PostgreSQL treat nulls as the highest possible values. See also “Nulls” in Chapter 3.
In Oracle and PostgreSQL, use NULLS FIRST or NULLS LAST with ORDER BY to control null-sorting behavior. For other DBMSs, create a derived column (see Chapter 5) that flags nulls—CASE WHEN column IS NULL THEN 0 ELSE 1 END AS is_null
, for example—and add it as the first column (with ASC or DESC) in the ORDER BY clause.
Listing 4.14Nulls in a sort column are listed first or last, depending on the DBMS. See Figure 4.14 for the result.
SELECT pub_id, state, country
FROM publishers
ORDER BY state ASC;
Figure 4.14Result of Listing 4.14. This result is sorted by ascending state. The DBMS in which I ran this query treats nulls as the lowest possible values, so the row with the null state is listed first. A DBMS that treats nulls as the highest possible values would list the same row last.
pub_id state country
------ ----- -------
P03 NULL Germany
P04 CA USA
P02 CA USA
P01 NY USA
You can sort by columns that aren’t listed in the SELECT clause (Listing 4.15 and Figure 4.15). This technique won’t work for relative column positions.
Listing 4.15zip doesn’t appear in the list of columns to retrieve. See Figure 4.15 for the result.
SELECT city, state
FROM authors
ORDER BY zip ASC;
Figure 4.15Result of Listing 4.15. This result is sorted by ascending zip code (postal code). Rows might appear to be in random order if you sort by an undisplayed column, confusing your end user.
city state
------------- -----
New York NY
Bronx NY
Sarasota FL
Boulder CO
San Francisco CA
San Francisco CA
Palo Alto CA
You can specify column aliases instead of column names in ORDER BY (Listing 4.16 and Figure 4.16). See “Creating Column Aliases with AS” earlier in this chapter.
Listing 4.16This query uses column aliases in the ORDER BY clause. See Figure 4.16 for the result.
SELECT
au_fname AS "First Name",
au_lname AS "Last Name",
state
FROM authors
ORDER BY
state ASC,
"Last Name" ASC,
"First Name" ASC;
Figure 4.16Result of Listing 4.16.
First Name Last Name state
---------- ----------- -----
Hallie Hull CA
Klee Hull CA
Kellsey CA
Wendy Heydemark CO
Paddy O'Furniture FL
Sarah Buchman NY
Christian Kells NY
To sort based on conditional logic, add a CASE expression to the ORDER BY clause (see “Evaluating Conditional Values with CASE” in Chapter 5). For example, this query sorts by price if type is “history”; otherwise, it sorts by sales:
SELECT title_id, type, price, sales
FROM titles
ORDER BY CASE WHEN type = 'history'
THEN price ELSE sales END;
You can sort by the results of expressions; Chapter 5 describes how to create expressions by using functions and operators (Listing 4.17 and Figure 4.17).
Listing 4.17This query sorts by an expression. See Figure 4.17 for the result. I’ve created a column alias for the expression because it would be cumbersome to repeat the expression in the ORDER BY clause and because it creates a more meaningful column label in the result.
SELECT
title_id,
price,
sales,
price * sales AS "Revenue"
FROM titles
ORDER BY "Revenue" DESC;
Figure 4.17Result of Listing 4.17. This result lists titles by descending revenue (the product of price and sales).
title_id price sales Revenue
-------- ----- ------- -----------
T07 23.95 1500200 35929790.00
T05 6.95 201440 1400008.00
T12 12.99 100001 1299012.99
T03 39.95 25667 1025396.65
T11 7.99 94123 752042.77
T13 29.99 10467 313905.33
T06 19.95 11320 225834.00
T02 19.95 9566 190841.70
T04 12.99 13001 168882.99
T09 13.95 5000 69750.00
T08 10.00 4095 40950.00
T01 21.99 566 12446.34
T10 NULL NULL NULL
DBMSs restrict the columns that can appear in an ORDER BY clause, depending on data type. For example, in Microsoft SQL Server, you can’t sort by nvarchar(max), varchar(max), and varbinary(max) columns; and in Oracle, you can’t sort by blob, clob, nclob, and bfile columns. Search your DBMS documentation for SELECT or ORDER BY.
In Microsoft Access you can’t use an expression’s column alias in ORDER BY. To run Listing 4.17, either retype the expression in the ORDER BY clause:
ORDER BY price * sales DESC
or use the relative column position:
ORDER BY 4 DESC
Sorting Speed
The three factors that most affect sorting speed are, in order of importance:
Always restrict a sort to the minimum number of rows needed. Running times of sorting routines don’t scale linearly with the number of rows sorted—so sorting 10n rows takes much more than 10 times longer than sorting n rows. Also try to reduce the number of sorted columns and the columns’ data-type lengths in the table definition, if possible.
The result of each SELECT statement so far has included every row in the table (for the specified columns). You can use the WHERE clause to filter unwanted rows from the result. This filtering capability gives the SELECT statement its real power. In a WHERE clause, you specify a search condition that has one or more conditions that need to be satisfied by the rows of a table. A condition, or predicate, is a logical expression that evaluates to true, false, or unknown. Rows for which the condition is true are included in the result; rows for which the condition is false or unknown are excluded. (An unknown result, which arises from nulls, is described in the next section.) SQL provides operators that express different types of conditions (Table 4.1). Operators are symbols or keywords that specify actions to perform on values or other elements.
Condition | SQL Operators |
---|---|
Comparison | =, <>, <, <=, >, >= |
Pattern matching | LIKE |
Range filtering | BETWEEN |
List filtering | IN |
Null testing | IS NULL |
SQL’s comparison operators compare two values and evaluate to true, false, or unknown (Table 4.2).
Operator | Description |
---|---|
= | Equal to |
<> | Not equal to |
< | Less than |
<= | Less than or equal to |
> | Greater than |
>= | Greater than or equal to |
The data type determines how values are compared:
Compare only identical or similar data types. If you try to compare values that have different data types, then your DBMS might:
Return an error
or
Compare the values unequally and return a result with no rows
or
To filter rows by making a comparison:
Type:
SELECT columns
FROM table
WHERE test_column op value;
columns is one or more comma-separated column names, and table is the name of the table that contains columns.
In the search condition, test_column is the name of a column in table. (test_column doesn’t have to be listed in columns.) op is one of the comparison operators listed in Table 4.2, and value is a value that’s compared with the value in test_column (Listings 4.18, 4.19, and 4.20, Figures 4.18, 4.19, and 4.20).
Listing 4.18This List the authors whose last name is not Hull. See Figure 4.18 for the result.
SELECT au_id, au_fname, au_lname
FROM authors
WHERE au_lname <> 'Hull';
Figure 4.18Result of Listing 4.18.
au_id au_fname au_lname
----- --------- -----------
A01 Sarah Buchman
A02 Wendy Heydemark
A05 Christian Kells
A06 Kellsey
A07 Paddy O'Furniture
Listing 4.19List the titles for which there is no signed contract. See Figure 4.19 for the result.
SELECT title_name, contract
FROM titles
WHERE contract = 0;
Figure 4.19Result of Listing 4.19.
title_name contract
-------------------------- --------
Not Without My Faberge Egg 0
Listing 4.20List the titles published in 2001 and later. See Figure 4.20 for the result.
SELECT title_name, pubdate
FROM titles
WHERE pubdate >= DATE '2001-01-01';
Figure 4.20Result of Listing 4.20.
title_name pubdate
---------------------------- ----------
Exchange of Platitudes 2001-01-01
Just Wait Until After School 2001-06-01
Kiss My Boo-Boo 2002-05-31
The right and left sides of the comparison can also be complex expressions. The general form of a comparison is:
expr1 op expr2
expr1 and expr2 are expressions. An expression is any valid combination of column names, literals, functions, and operators that resolves to a single value (per row). Chapter 5 covers expressions in more detail (Listing 4.21 and Figure 4.21).
Listing 4.21List the titles that generated more than $1 million in revenue. This search condition uses an arithmetic expression. See Figure 4.21 for the result.
SELECT
title_name,
price * sales AS "Revenue"
FROM titles
WHERE price * sales > 1000000;
Figure 4.21Result of Listing 4.21.
title_name Revenue
----------------------------- -----------
Ask Your System Administrator 1025396.65
Exchange of Platitudes 1400008.00
I Blame My Mother 35929790.00
Spontaneous, Not Annoying 1299012.99
WHERE test_column IS NULL
(WHERE test_column = NULL
is incorrect); see “Testing for Nulls with IS NULL” later in this chapter. See also “Nulls” in Chapter 3.For speed, fold your constants into a minimal number of expressions. For example, change
WHERE col1 + 2 <= 10
to
WHERE col1 <= 8
The best practice is to put only simple column references to the left of the = and more-complex expressions to the right.
Your DBMS’s collation determines whether string comparisons are case insensitive (‘A’ = ‘a’) or case sensitive (‘A’ ≠ ‘a’). Microsoft Access, Microsoft SQL Server, Db2, and MySQL perform case-insensitive comparisons by default. Oracle and PostgreSQL perform case-sensitive comparisons by default. In general, case-sensitive comparisons are slightly faster than case-insensitive ones. See also “Changing String Case with UPPER() and LOWER()” in Chapter 5.
Case sensitivity can vary by context. MySQL comparisons are case insensitive in WHERE comparisons but are case sensitive in string-related functions, for example.
In Microsoft Access date literals, omit the DATE keyword and surround the literal with # characters instead of quotes. To run Listing 4.20, change the date in the WHERE clause to #2001-01-01#.
In Microsoft SQL Server and Db2 date literals, omit the DATE keyword. To run Listing 4.20, change the date in the WHERE clause to '2001-01-01'.
In older PostgreSQL versions, to compare a value in a NUMERIC or DECIMAL column with a real (floating-point) number, convert the real number to NUMERIC or DECIMAL explicitly. See “Converting Data Types with CAST()” in Chapter 5.
Some DBMSs support the comparison operator != as a synonym for <> (not equal). You should use <> to keep your code portable.
Column Aliases and WHERE
If you alias a column in a SELECT clause (see “Creating Column Aliases with AS” earlier in this chapter), then you can’t reference it in the WHERE clause. The following query fails because the WHERE clause is evaluated before the SELECT clause, so the alias copies_sold doesn’t yet exist when the WHERE clause is evaluated:
-- Wrong
SELECT sales AS copies_sold
FROM titles
WHERE copies_sold > 100000;
Instead, use a subquery (Chapter 8) in the FROM clause, which is evaluated before the WHERE clause:
-- Correct
SELECT *
FROM (SELECT sales AS copies_sold
FROM titles) ta
WHERE copies_sold > 100000;
This solution works not only for columns aliases but also for aggregate functions, scalar subqueries, and window functions referenced in WHERE clauses. Note that in the latter query, the subquery is aliased ta (a table alias). All DBMSs accept table aliases, but not all require them. See also “Using Subqueries as Column Expressions” in Chapter 8.
You can specify multiple conditions in a single WHERE clause to, say, retrieve rows based on the values in multiple columns. You can use the AND and OR operators to combine two or more conditions into a compound condition. AND, OR, and a third operator, NOT, are logical operators. Logical operators, or boolean operators, are operators designed to work with truth values: true, false, and unknown.
If you’ve programmed in other languages (or studied propositional logic), then you’re familiar with the two-value logic (2VL) system. In two-value logic, the result of a logical expression is either true or false. 2VL assumes perfect knowledge, in which all propositions are known to be true or false. Databases model real data, however, and our knowledge of the world is imperfect—that’s why we use nulls to represent unknown values (see “Nulls” in Chapter 3).
2VL is insufficient to represent knowledge gaps, so SQL uses three-value logic (3VL). In three-value logic, the result of a logical expression is true, false, or unknown. If the result of a compound condition is false or unknown, then the row is excluded from the result. (To retrieve rows with nulls, see “Testing for Nulls with IS NULL” later in this chapter.)
The AND operator’s important characteristics are:
Table 4.3 shows the possible outcomes when you combine two conditions with AND. The table’s left column shows the truth values of the first condition, the top row shows the truth values of the second condition, and each intersection shows the AND outcome. This type of table is called a truth table.
AND | True | False | Unknown |
---|---|---|---|
True | True | False | Unknown |
False | False | False | False |
Unknown | Unknown | False | Unknown |
AND is commutative (independent of order):
WHERE condition1 AND condition2
is equivalent to
WHERE condition2 AND condition1
See Listings 4.22 and 4.23, and Figures 4.22 and 4.23, for some AND examples.
Listing 4.22List the biographies that sell for less than $20. See Figure 4.22 for the result.
SELECT title_name, type, price
FROM titles
WHERE type = 'biography' AND price < 20;
Figure 4.22Result of Listing 4.22.
title_name type price
------------------------- --------- -----
How About Never? biography 19.95
Spontaneous, Not Annoying biography 12.99
Listing 4.23List the authors whose last names begin with one of the letters H through Z and who don’t live in California. See Figure 4.23 for the result.
SELECT au_fname, au_lname
FROM authors
WHERE au_lname >= 'H'
AND au_lname <= 'Zz'
AND state <> 'CA';
Figure 4.23Result of Listing 4.23. Remember that the results of string comparisons depend on the DBMS’s collating sequence; see “Sorting Rows with ORDER BY” earlier in this chapter.
au_fname au_lname
--------- -----------
Wendy Heydemark
Christian Kells
Paddy O'Furniture
The OR operator’s important characteristics are:
Table 4.4 shows the OR truth table.
OR | True | False | Unknown |
---|---|---|---|
True | True | True | True |
False | True | False | Unknown |
Unknown | True | Unknown | Unknown |
See Listings 4.24 and 4.25, and Figures 4.24 and 4.25, for some OR examples.
Listing 4.24List the authors who live in New York State, Colorado, or San Francisco. See Figure 4.24 for the result.
SELECT au_fname, au_lname, city, state
FROM authors
WHERE (state = 'NY')
OR (state = 'CO')
OR (city = 'San Francisco');
Figure 4.24Result of Listing 4.24.
au_fname au_lname city state
--------- --------- ------------- -----
Sarah Buchman Bronx NY
Wendy Heydemark Boulder CO
Hallie Hull San Francisco CA
Klee Hull San Francisco CA
Christian Kells New York NY
Listing 4.25 shows the effect of nulls in conditions. You might expect the result, Figure 4.25, to display all the rows in the table publishers. But the row for publisher P03 (located in Germany) is missing because it contains a null in the column state. The null causes the result of both of the OR conditions to be unknown, so the row is excluded from the result. To test for nulls, see “Testing for Nulls with IS NULL” later in this chapter.
Listing 4.25List the publishers that are located in California or are not located in California. This example is contrived to show the effect of nulls in conditions. See Figure 4.25 for the result.
SELECT pub_id, pub_name, state, country
FROM publishers
WHERE (state = 'CA')
OR (state <> 'CA');
Figure 4.25Result of Listing 4.25. Publisher P03 is missing because its state is null.
pub_id pub_name state country
------ ----------------- ----- -------
P01 Abatis Publishers NY USA
P02 Core Dump Books CA USA
P04 Tenterhooks Press CA USA
The NOT operator’s important characteristics are:
Table 4.5 shows the NOT truth table.
Condition | NOT Condition |
---|---|
True | False |
False | True |
Unknown | Unknown |
In comparisons, place NOT before the column name or expression
WHERE NOT state = 'CA' --Correct
and not before the operator (even though it sounds better when read):
WHERE state NOT = 'CA' --Illegal
NOT acts on one condition. To negate two or more conditions, repeat the NOT for each condition. To list titles that are not biographies and are not priced less than $20, for example, type
SELECT title_id, type, price
FROM titles
WHERE NOT type = 'biography'
AND NOT price < 20; --Correct
and not
SELECT title_id, type, price
FROM titles
WHERE NOT type = 'biography'
AND price < 20; --Wrong
The latter clause is legal but returns the wrong result. See “Tips for AND, OR, and NOT” in this section to learn ways to express equivalent NOT conditions.
In comparisons, using NOT often is a matter of style. The following two clauses are equivalent:
WHERE NOT state = 'CA'
and
WHERE state <> 'CA'
See Listings 4.26 and 4.27, and Figures 4.26 and 4.27, for some NOT examples.
Listing 4.26List the authors who don’t live in California. See Figure 4.26 for the result.
SELECT au_fname, au_lname, state
FROM authors
WHERE NOT (state = 'CA');
Figure 4.26Result of Listing 4.26.
au_fname au_lname state
--------- ----------- -----
Sarah Buchman NY
Wendy Heydemark CO
Christian Kells NY
Paddy O'Furniture FL
Listing 4.27List the titles whose price is not less than $20 and that have sold more than 15000 copies. See Figure 4.27 for the result.
SELECT title_name, sales, price
FROM titles
WHERE NOT (price < 20)
AND (sales > 15000);
Figure 4.27Result of Listing 4.27.
title_name sales price
----------------------------- ------- -----
Ask Your System Administrator 25667 39.95
I Blame My Mother 1500200 23.95
You can combine the three logical operators in a compound condition. Your DBMS uses SQL’s precedence rules to determine which operators to evaluate first. Precedence is covered in “Determining the Order of Evaluation” in Chapter 5, but for now you need know only that when you use multiple logical operators in a compound condition, NOT is evaluated first, then AND, and finally OR. You can override this order with parentheses: everything in parentheses is evaluated first. When parenthesized conditions are nested, the innermost condition is evaluated first. Under the default precedence rules, the condition
p AND NOT q OR r
is equivalent to
(p AND (NOT q)) OR r
It’s wise to use parentheses, rather than rely on the default evaluation order, to make the evaluation order clear.
If I want to list history and biography titles priced less than $20, for example, then Listing 4.28 won’t work. AND is evaluated before OR, so the query is evaluated as follows:
Listing 4.28This query won’t work if I want to list history and biography titles less than $20 because AND has higher precedence than OR. See Figure 4.28 for the result.
SELECT title_id, type, price
FROM titles
WHERE type = 'history'
OR type = 'biography'
AND price < 20;
Figure 4.28Result of Listing 4.28. This result contains two history titles priced more than $20, which isn’t what I want.
title_id type price
-------- --------- -----
T01 history 21.99
T02 history 19.95
T06 biography 19.95
T12 biography 12.99
T13 history 29.99
To fix this query, add parentheses to force evaluation of OR first. Listing 4.29 is evaluated as follows:
Listing 4.29To fix Listing 4.28, add parentheses to force OR to be evaluated before AND. See Figure 4.29 for the result.
SELECT title_id, type, price
FROM titles
WHERE (type = 'history'
OR type = 'biography')
AND price < 20;
Figure 4.29Result of Listing 4.29.
title_id type price
-------- --------- -----
T02 history 19.95
T06 biography 19.95
T12 biography 12.99
Dissecting WHERE Clauses
If your WHERE clause isn’t working, then you can debug it by displaying the result of each condition individually. To see the result of each comparison in Listing 4.29, for example, put each comparison expression in the SELECT clause’s output column list, along with the values you’re comparing:
SELECT
type,
type = 'history' AS "Hist?",
type = 'biography' AS "Bio?",
price,
price < 20 AS "<20?"
FROM titles;
This query runs on Microsoft Access, MySQL, and PostgreSQL. If your DBMS interprets the = symbol as an assignment operator rather than as a comparison operator, then you must substitute equivalent expressions for the logical comparisons. In Oracle, for example, you can replace type = 'history'
with INSTR(type, 'history')
. The query’s result is:
type Hist? Bio? price <20?
---------- ----- ---- ----- ----
history 1 0 21.99 0
history 1 0 19.95 1
computer 0 0 39.95 0
psychology 0 0 12.99 1
psychology 0 0 6.95 1
biography 0 1 19.95 1
biography 0 1 23.95 0
children 0 0 10.00 1
children 0 0 13.95 1
biography 0 1 NULL NULL
psychology 0 0 7.99 1
biography 0 1 12.99 1
history 1 0 29.99 0
The comparison columns display zero if the comparison is false, nonzero if it’s true, or null if it’s unknown.
If your search condition contains only AND operators, then your query will run faster if you put the conditions least likely to be true first. If col1 = 'A' is less likely than col2 = 'B', then
WHERE col1 = 'A' AND col2 = 'B'
is faster than
WHERE col2 = 'B' AND col1 = 'A'
because the DBMS won’t bother to evaluate the second expression if the first is false. For search conditions that contain only OR operators, do the reverse: put the most likely conditions first. If the conditions are equally likely, then put the least complex expression first.
This logic depends on your DBMS’s optimizer reading WHERE clauses from left to right, which most do. Oracle’s cost-based optimizer (as opposed to its rule-based optimizer), however, reads right to left.
It’s a common error to type
WHERE state = 'NY' OR 'CA' --Illegal
instead of
WHERE state = 'NY' OR state = 'CA'
It’s easy to translate a correctly phrased spoken-language statement into an incorrect SQL statement. If you say, “List the books priced less than $10 and more than $30,” then the and suggests the use of the AND operator:
SELECT title_name, price
FROM titles
WHERE price < 10 AND price > 30; --Wrong
This query returns no rows, however, because it’s impossible for a book to be priced less than $10 and more than $30 simultaneously, as AND logic commands. The logical meaning of OR finds books that meet any of the criteria, not all the criteria at the same time:
WHERE price < 10 OR price > 30 --Correct
Table 4.6 shows alternative ways of expressing the same condition. The first equivalency is double negation, the second two are De Morgan’s Laws, and the final two are the distributive laws.
This Condition | Is Equivalent To |
---|---|
NOT (NOT p) | p |
NOT (p AND q) | (NOT p) OR (NOT q) |
NOT (p OR q) | (NOT p) AND (NOT q) |
p AND (q OR r) | (p AND q) OR (p AND r) |
p OR (q AND r) | (p OR q) AND (p OR r) |
Some DBMSs support the exclusive-or (or xor) logical operator, which yields true only if exactly one of its operands is true:
p XOR q
is equivalent to
(p AND (NOT q)) OR ((NOT p) AND q)
Re-expressing Conditions
You must master the laws in Table 4.6 to become a competent programmer in SQL (or any language). They’re especially useful when you want to re-express conditions to make queries run faster. For example, the statement
SELECT * FROM mytable
WHERE col1 = 1
AND NOT (col1 = col2 OR col3 = 3);
is equivalent to
SELECT * FROM mytable
WHERE col1 = 1
AND col2 <> 1
AND col3 <> 3;
but the latter one will run faster if your DBMS’s optimizer isn’t smart enough to re-express the former internally. (The condition col1 = col2 is more expensive computationally than comparing col1 and col2 to literal values.)
You also can use the laws to change a condition into its opposite. For example, the reverse of the condition
WHERE (col1 = 'A') AND (col2 = 'B')
is
WHERE (col1 <> 'A') OR (col2 <> 'B')
In this case, it would have easier just to negate the entire original expression with NOT:
WHERE NOT ((col1 = 'A') AND (col2 = 'B'))
But this simple approach won’t work with complex conditions involving multiple ANDs, ORs, and NOTs.
Here’s a problem to solve: look at only the first code line below and see whether you can repeatedly apply equivalency rules to push the NOT operators inward until they apply to only the individual expressions p, q, and r:
NOT ((p AND q) OR (NOT p AND r))
= NOT (p AND q) AND NOT (NOT p AND r)
= (NOT p OR NOT q) AND (p OR NOT r)
The preceding examples retrieved rows based on the exact value of a column or columns. You can use LIKE to retrieve rows based on partial information. LIKE is useful if you don’t know an exact value (“The author’s last name is Kel-something”) or you want to retrieve rows with similar values (“Which authors live in the San Francisco Bay Area?”). The LIKE condition’s important characteristics are:
LIKE uses a pattern that values are matched against. A pattern is a quoted string that contains the literal characters to match and any combination of wildcards. Wildcards are special characters used to match parts of a value. Table 4.7 lists the wildcard operators, and Table 4.8 lists some example patterns.
Operator | Matches |
---|---|
% | A percent sign matches any string of zero or more characters. |
_ |
An underscore matches any one character. |
Pattern | Matches |
---|---|
'A%' | Matches a string of length ≥ 1 that begins with A, including the single letter A. Matches ‘A’, ‘Anonymous’, and ‘AC/DC’. |
'%s' | Matches a string of length ≥ 1 that ends with s, including the single letter s. A string with trailing spaces (after the s) won’t match. Matches ‘s’, ‘Victoria Falls’, and ‘DBMSs’. |
'%in%' | Matches a string of length ≥ 2 that contains in anywhere. Matches ‘in’, ‘inch’, ‘Pine’, ‘linchpin’, and ‘lynchpin’. |
'____ ' |
Matches any four-character string. Matches ‘ABCD’, ‘I am’, and ‘Jack’. |
'Qua__ ' |
Matches any five-character string that begins with Qua. Matches ‘Quack’, ‘Quaff’, and ‘Quake’. |
'_ re_ ' |
Matches any four-character string that has re as its second and third characters. Matches ‘Tree’, ‘area’, and ‘fret’. |
'_ re%' |
Matches a string of length ≥ 3 that begins with any character and has re as its second and third characters. Matches ‘Tree’, ‘area’, ‘fret’, ‘are’, and ‘fretful’. |
'%re_ ' |
Matches a string of length ≥ 3 that has re as the second and third characters from its end and ends with any character. Matches ‘Tree’, ‘area’, ‘fret’, ‘red’, and ‘Blood red’. |
To filter rows by matching a pattern:
Type:
SELECT columns
FROM table
WHERE test_column [NOT] LIKE 'pattern';
columns is one or more comma-separated column names, and table is the name of the table that contains columns.
In the search condition, test_column is the name of a column in table (test_column doesn’t have to be listed in columns), and pattern is the pattern that’s compared with the value in test_column. pattern is a string like one of the examples listed in Table 4.8. To retrieve rows with values that don’t match pattern, specify NOT LIKE (Listings 4.30 through 4.33, Figures 4.30 through 4.33).
Listing 4.30List the authors whose last names begin with Kel. See Figure 4.30 for the result.
SELECT au_fname, au_lname
FROM authors
WHERE au_lname LIKE 'Kel%';
Figure 4.30Result of Listing 4.30.
au_fname au_lname
--------- --------
Christian Kells
Kellsey
Listing 4.31List the authors whose last names have ll (el-el) as the third and fourth characters. See Figure 4.31 for the result.
SELECT au_fname, au_lname
FROM authors
WHERE au_lname LIKE '__ll%';
Figure 4.31Result of Listing 4.31.
au_fname au_lname
--------- --------
Hallie Hull
Klee Hull
Christian Kells
Kellsey
Listing 4.32List the authors who live in the San Francisco Bay Area (that is, with postal codes that begin with 94). See Figure 4.32 for the result.
SELECT au_fname, au_lname, city, state, zip
FROM authors
WHERE zip LIKE '94___';
Figure 4.32Result of Listing 4.32.
au_fname au_lname city state zip
-------- -------- ------------- ----- -----
Hallie Hull San Francisco CA 94123
Klee Hull San Francisco CA 94123
Kellsey Palo Alto CA 94305
Listing 4.33List the authors who live outside the 212, 415, and 303 area codes. This example shows three alternative patterns for excluding telephone numbers. You should favor the first alternative because single-character matches (_) are faster than multiple-character ones (%). See Figure 4.33 for the result.
SELECT au_fname, au_lname, phone
FROM authors
WHERE phone NOT LIKE '212-___-____'
AND phone NOT LIKE '415-___-%'
AND phone NOT LIKE '303-%';
Figure 4.33Result of Listing 4.33.
au_fname au_lname phone
-------- ----------- ------------
Sarah Buchman 718-496-7223
Kellsey 650-836-7128
Paddy O'Furniture 941-925-0752
You can search for values that contain the special wildcard characters. Use the ESCAPE keyword to specify an escape character that you can use to search for a percent sign or underscore as a literal character. Immediately precede a wildcard character with an escape character to strip the wildcard of its special meaning. If the escape character is !, for example, then !% in a pattern searches values for a literal %. (Unescaped wildcards still have their special meaning.) The escape character can’t be part of the value that you’re trying to retrieve; if you’re searching for '50% OFF!', then choose an escape character other than !. Table 4.9 shows some examples of escaped and unescaped patterns; the designated escape character is !.
Pattern | Matches |
---|---|
'100%' | Unescaped. Matches 100 followed by a string of zero or more characters. |
'100!%' | Escaped. Matches ‘100%’. |
'_ op' |
Unescaped. Matches ‘top’, ‘hop’, ‘pop’, and so on. |
'!_ op' |
Escaped. Matches ‘_op’. |
To match a wildcard character:
Type:
SELECT columns
FROM table
WHERE test_column [NOT] LIKE 'pattern'
ESCAPE 'escape_char';
The syntax is the same as the SELECT statement in “To filter rows by matching a pattern,” earlier in this section, except for the ESCAPE clause. escape_char is a single character. Any character in pattern that follows escape_char is interpreted literally; escape_char itself is not considered to be part of the search pattern (Listing 4.34 and Figure 4.34).
Listing 4.34List the titles that contain percent signs. Only the % that follows the escape character ! has its literal meaning; the other two percent signs still act as wildcards. See Figure 4.34 for the result.
SELECT title_name
FROM titles
WHERE title_name LIKE '%!%%' ESCAPE '!';
Figure 4.34Result of Listing 4.34. An empty result. No title names contain a % character.
title_name
----------------------------------------
The NOT that can precede LIKE is independent of the NOT that can precede test_column (see “The NOT Operator” earlier in this chapter). The clause
WHERE phone NOT LIKE '212-%'
is equivalent to
WHERE NOT phone LIKE '212-%'
You even can write this silly double negation, which retrieves everyone with a 212 area code:
WHERE NOT phone NOT LIKE '212-%'
In the simplest case in which a pattern contains no wildcards, LIKE works like an = comparison (and NOT LIKE works like <>). In many cases
WHERE city LIKE 'New York'
is equivalent to
WHERE city = 'New York'
But these comparisons will differ if your DBMS takes trailing spaces into account for LIKE but not for =. If that’s not important, then the = form is usually faster than LIKE.
Microsoft Access doesn’t support the ESCAPE clause. Instead, surround a wildcard character with brackets to render it a literal character. To run Listing 4.34, replace the WHERE clause with:
WHERE title_name LIKE '%[%]%'
Some DBMSs let you use regular expressions to match patterns. Microsoft SQL Server, for example, supports a limited variant of POSIX-style regular expressions. The [] wildcard matches any single character within a range or set, and the [^] wildcard matches any single character not within a range or set. Table 4.10 lists some examples. The SQL standard uses the SIMILAR operator for regex matching. Regex support varies by DBMS; search your DBMS documentation for LIKE, regular expressions, or pattern matching.
Some DBMSs let you use LIKE to search numeric and datetime columns.
Pattern | Matches |
---|---|
'[a-c]at' | Matches ‘bat’ and ‘cat’ but not ‘fat’. |
'[bcf]at' | Matches ‘bat’, ‘cat’, and ‘fat’ but not ‘eat’. |
'[^c]at' | Matches ‘bat’ and ‘fat’ but not ‘cat’. |
'se[^n]%' | Matches strings of length ≥ 2 that begin with se and whose third character isn‘t n. |
Use BETWEEN to determine whether a given value falls within a specified range. The BETWEEN condition’s important characteristics are:
BETWEEN is a convenient, shorthand clause that you can replicate by using AND.
WHERE test_column BETWEEN
low_value AND high_value
is equivalent to:
WHERE (test_column >= low_value)
AND (test_column <= high_value)
BETWEEN specifies an inclusive range, in which the high value and low value are included in the search. To specify an exclusive range, which excludes endpoints, use > and < comparisons instead of BETWEEN:
WHERE (test_column > low_value)
AND (test_column < high_value)
To filter rows by using a range:
Type:
SELECT columns
FROM table
WHERE test_column [NOT] BETWEEN
low_value AND high_value;
columns is one or more comma-separated column names, and table is the name of the table that contains columns.
In the search condition, test_column is the name of a column in table (test_column doesn’t have to be listed in columns), and low_value and high_value specify the endpoints of the range that is compared with the value in test_column. low_value must be less than or equal to high_value, and both values must be the same as or comparable to the data type of test_column. To match values that lie outside the range, specify NOT BETWEEN (Listings 4.35, 4.36, and 4.37, Figures 4.35, 4.36, and 4.37).
Listing 4.35List the authors who live outside the postal range 20000–89999. See Figure 4.35 for the result.
SELECT au_fname, au_lname, zip
FROM authors
WHERE zip NOT BETWEEN '20000' AND '89999';
Figure 4.35Result of Listing 4.35.
au_fname au_lname zip
--------- -------- -----
Sarah Buchman 10468
Hallie Hull 94123
Klee Hull 94123
Christian Kells 10014
Kellsey 94305
Listing 4.36List the titles priced between $10 and $19.95, inclusive. See Figure 4.36 for the result.
SELECT title_id, price
FROM titles
WHERE price BETWEEN 10 AND 19.95;
Figure 4.36Result of Listing 4.36.
title_id price
-------- -----
T02 19.95
T04 12.99
T06 19.95
T08 10.00
T09 13.95
T12 12.99
Listing 4.37List the titles published in 2000. See Figure 4.37 for the result.
SELECT title_id, pubdate
FROM titles
WHERE
pubdate BETWEEN DATE '2000-01-01'
AND DATE '2000-12-31';
Figure 4.37Result of Listing 4.37.
title_id pubdate
-------- ----------
T01 2000-08-01
T03 2000-09-01
T06 2000-07-31
T11 2000-11-30
T12 2000-08-31
Listing 4.38 shows how to rewrite Listing 4.36 with an exclusive range, which doesn’t include the $10 and $19.95 endpoints. See Figure 4.38 for the result.
Listing 4.38List the titles priced between $10 and $19.95, exclusive. See Figure 4.38 for the result.
SELECT title_id, price
FROM titles
WHERE (price > 10)
AND (price < 19.95);
Figure 4.38Result of Listing 4.38.
title_id price
-------- -----
T04 12.99
T09 13.95
T12 12.99
Specifying a character range often needs some thought. Suppose you want to search for last names that begin with the letter F. The following clause won’t work because it will retrieve someone whose last name is the letter G (is the letter G, not starts with the letter G):
WHERE last_name BETWEEN 'F' AND 'G'
This next clause shows the correct way to specify the ending point (in most cases):
WHERE last_name BETWEEN 'F' AND 'Fz'
In older PostgreSQL versions, convert the floating-point numbers in Listings 4.36 and 4.38 to DECIMAL; see “Converting Data Types with CAST()” in Chapter 5. To run Listings 4.36 and 4.38, change the floating-point literals to:
CAST(19.95 AS DECIMAL)
In Microsoft Access date literals, omit the DATE keyword and surround the literal with # characters instead of quotes. To run Listing 4.37, change the dates in the WHERE clause to #2000-01-01# and #2000-12-31#.
In Microsoft SQL Server and Db2 date literals, omit the DATE keyword. To run Listing 4.37, change the dates in the WHERE clause to '2000-01-01' and '2000-12-31'.
In some DBMSs, low_value can exceed high_value; search your DBMS documentation for WHERE or BETWEEN.
Use IN to determine whether a given value matches any value in a specified list. The IN condition’s important characteristics are:
IN is a convenient, shorthand clause that you can replicate by using OR.
WHERE test_column IN
(value1, value2, value3)
is equivalent to:
WHERE (test_column = value1)
OR (test_column = value2)
OR (test_column = value3)
To filter rows by using a list:
Type:
SELECT columns
FROM table
WHERE test_column [NOT] IN
(value1, value2,...);
columns is one or more comma-separated column names, and table is the name of the table that contains columns.
In the search condition, test_column is the name of a column in table (test_column doesn’t have to be listed in columns), and value1, value2,... are one or more comma-separated values that are compared with the value in test_column. The list values can appear in any order and must be the same as or comparable to the data type of test_column. To match values that aren’t in the list, specify NOT IN (Listings 4.39, 4.40, and 4.41, Figures 4.39, 4.40, and 4.41).
Listing 4.39List the authors who don’t live in New York State, New Jersey, or California. See Figure 4.39 for the result.
SELECT au_fname, au_lname, state
FROM authors
WHERE state NOT IN ('NY', 'NJ', 'CA');
Figure 4.39Result of Listing 4.39.
au_fname au_lname state
-------- ----------- -----
Wendy Heydemark CO
Paddy O'Furniture FL
Listing 4.40List the titles for which advances of $0, $1000, or $5000 were paid. See Figure 4.40 for the result.
SELECT title_id, advance
FROM royalties
WHERE advance IN (0.00, 1000.00, 5000.00);
Figure 4.40Result of Listing 4.40.
title_id advance
-------- -------
T02 1000.00
T08 0.00
T09 0.00
Listing 4.41List the titles published on the first day of the year 2000, 2001, or 2002. See Figure 4.41 for the result.
SELECT title_id, pubdate
FROM titles
WHERE pubdate IN
(DATE '2000-01-01',
DATE '2001-01-01',
DATE '2002-01-01');
Figure 4.41Result of Listing 4.41.
title_id pubdate
-------- ----------
T05 2001-01-01
For speed, list the most likely values first. If you’re testing U.S. addresses, for example, then list the most populous states first:
WHERE state IN ('CA', 'TX', 'NY', 'FL',...,'VT', 'DC', 'WY')
The search condition
WHERE col1 BETWEEN 1 AND 5
AND col1 <> 3
usually is faster than
WHERE col1 IN (1, 2, 4, 5)
NOT IN is equivalent to combining tests for inequality with AND. This statement is equivalent to Listing 4.39:
SELECT au_fname, au_lname, state
FROM authors
WHERE state <> 'NY'
AND state <> 'NJ'
AND state <> 'CA';
In Microsoft Access date literals, omit the DATE keyword and surround the literal with # characters instead of quotes. To run Listing 4.41, change the WHERE clause to:
WHERE pubdate IN
(#1/1/2000#,
#1/1/2001#,
#1/1/2002#)
In Microsoft SQL Server and Db2 date literals, omit the DATE keyword. To run Listing 4.41, change the WHERE clause to:
WHERE pubdate IN
('2000-01-01',
'2001-01-01',
'2002-01-01')
In older PostgreSQL versions, convert the floating-point number in Listing 4.40 to DECIMAL; see “Converting Data Types with CAST()” in Chapter 5. To run Listing 4.40, change the WHERE clause to:
WHERE advance IN
(CAST( 0.00 AS DECIMAL),
CAST(1000.00 AS DECIMAL),
CAST(5000.00 AS DECIMAL))
Recall from “Nulls” in Chapter 3 that nulls represent missing or unknown values. This situation causes a problem: LIKE, BETWEEN, IN, and other WHERE-clause conditions can’t find nulls because unknown values don’t satisfy specific conditions. A null matches no value—not even other nulls. You can’t use = or <> to test whether a value is null.
In the table publishers, for example, note that publisher P03 has a null in the column state because that column doesn’t apply to Germany (Listing 4.42 and Figure 4.42). I can’t use complementary comparisons to select the null because null is neither California nor not-California; it’s undefined (Listings 4.43 and 4.44, Figures 4.43 and 4.44).
Listing 4.42List the locations of all the publishers. See Figure 4.42 for the result.
SELECT pub_id, city, state, country
FROM publishers;
Figure 4.42Result of Listing 4.42. The column state doesn’t apply to the publisher located in Germany.
pub_id city state country
------ ------------- ----- -------
P01 New York NY USA
P02 San Francisco CA USA
P03 Hamburg NULL Germany
P04 Berkeley CA USA
Listing 4.43List the publishers located in California. See Figure 4.43 for the result.
SELECT pub_id, city, state, country
FROM publishers
WHERE state = 'CA';
Figure 4.43Result of Listing 4.43. This result doesn’t include publisher P03.
pub_id city state country
------ ------------- ----- -------
P02 San Francisco CA USA
P04 Berkeley CA USA
Listing 4.44List the publishers located outside California (the wrong way—see Listing 4.45 for the correct way). See Figure 4.44 for the result.
SELECT pub_id, city, state, country
FROM publishers
WHERE state <> 'CA';
Figure 4.44Result of Listing 4.44. This result doesn’t include publisher P03 either. The conditions state = 'CA' and state <> 'CA' aren’t complementary after all; nulls don’t match any value and so can’t be selected by using the types of conditions I’ve covered so far.
pub_id city state country
------ -------- ----- -------
P01 New York NY USA
To avert disaster, SQL provides IS NULL to determine whether a given value is null. The IS NULL condition’s important characteristics are:
To retrieve rows with nulls or non-null values:
Type:
SELECT columns
FROM table
WHERE test_column IS [NOT] NULL;
columns is one or more comma-separated column names, and table is the name of the table that contains columns.
In the search condition, test_column is the name of a column in table. (test_column doesn’t have to be listed in columns.) To match non-null values, specify NOT NULL (Listings 4.45 and 4.46, Figures 4.45 and 4.46).
Listing 4.45List the publishers located outside California (the correct way). See Figure 4.45 for the result.
SELECT pub_id, city, state, country
FROM publishers
WHERE state <> 'CA'
OR state IS NULL;
Figure 4.45Result of Listing 4.45. Now publisher P03 is in the result.
pub_id city state country
------ -------- ----- -------
P01 New York NY USA
P03 Hamburg NULL Germany
Listing 4.46List the biographies whose (past or future) publication dates are known. See Figure 4.46 for the result.
SELECT title_id, type, pubdate
FROM titles
WHERE type = 'biography'
AND pubdate IS NOT NULL;
Figure 4.46Result of Listing 4.46. Without the IS NOT NULL condition, this result would have included title T10.
title_id type pubdate
-------- --------- ----------
T06 biography 2000-07-31
T07 biography 1999-10-01
T12 biography 2000-08-31
Nulls cause rows to be excluded from results only if a column containing nulls is a test column in a WHERE condition. The following query, for example, retrieves all the rows in the table publishers (refer to Figure 4.42) because the null in the column state isn’t compared with anything:
SELECT pub_id, city, state, country
FROM publishers
WHERE country <> 'Canada';
To forbid nulls in a column, see “Forbidding Nulls with NOT NULL” in Chapter 11.
It bears repeating that a null isn’t the same as an empty string (''). In the table authors, for example, the column au_fname contains an empty string for author A06 (last name of Kellsey). The WHERE condition to find the first name is
WHERE au_fname = ''
and not
WHERE au_fname IS NULL