SQL Run

Database Programming

4. Retrieving Data from a Table

In this chapter

Retrieving Columns with SELECT and FROM

Creating Column Aliases with AS

Eliminating Duplicate Rows with DISTINCT

Sorting Rows with ORDER BY

Filtering Rows with WHERE

Combining and Negating Conditions with AND, OR, and NOT

Matching Patterns with LIKE

Range Filtering with BETWEEN

List Filtering with IN

Testing for Nulls with IS NULL

This chapter introduces SQL’s workhorse—the SELECT statement. Most SQL work involves retrieving and manipulating data by using this one (albeit complex) statement. SELECT retrieves rows, columns, and derived values from one or more tables in a database; its syntax is:

SELECT columns
  FROM tables
  [JOIN joins]
  [WHERE search_condition]
  [GROUP BY grouping_columns]
  [HAVING search_condition]
  [ORDER BY sort_columns];

SELECT, FROM, ORDER BY, and WHERE are covered in this chapter, GROUP BY and HAVING in Chapter 6, and JOIN in Chapter 7. By convention, I call only a SELECT statement a query because it returns a result set. DBMS documentation and other books might refer to any SQL statement as a query. Although SELECT is powerful, it’s not dangerous: you can’t use it to add, change, or delete data or database objects. (The dangerous stuff starts in Chapter 10.)

Tip: Recall that italic_type denotes a variable in code that must be replaced with a value, and brackets indicate an optional clause or item; see “About This Book” in the Introduction for typographic and syntax conventions.

Retrieving Columns with SELECT and FROM

In its simplest form, a SELECT statement retrieves columns from a table; you can retrieve one column, multiple columns, or all columns. The SELECT clause lists the columns to display, and the FROM clause specifies the table from which to draw the columns.

To retrieve a column from a table:

Listing 4.1List the cities in which the authors live. See Figure 4.1 for the result.

SELECT city
  FROM authors;

Figure 4.1Result of Listing 4.1.

city
-------------
Bronx
Boulder
San Francisco
San Francisco
New York
Palo Alto
Sarasota

To retrieve multiple columns from a table:

Listing 4.2List each author’s first name, last name, city, and state. See Figure 4.2 for the result.

SELECT au_fname, au_lname, city, state
  FROM authors;

Figure 4.2Result of Listing 4.2.

au_fname  au_lname    city          state
--------- ----------- ------------- -----
Sarah     Buchman     Bronx         NY
Wendy     Heydemark   Boulder       CO
Hallie    Hull        San Francisco CA
Klee      Hull        San Francisco CA
Christian Kells       New York      NY
          Kellsey     Palo Alto     CA
Paddy     O'Furniture Sarasota      FL

To retrieve all columns from a table:

Listing 4.3List all the columns in the table authors. See Figure 4.3 for the result.

SELECT *
  FROM authors;

Figure 4.3Result of Listing 4.3.

au_id au_fname  au_lname    phone        address              city          state zip
----- --------- ----------- ------------ -------------------- ------------- ----- -----
A01   Sarah     Buchman     718-496-7223 75 West 205 St       Bronx         NY    10468
A02   Wendy     Heydemark   303-986-7020 2922 Baseline Rd     Boulder       CO    80303
A03   Hallie    Hull        415-549-4278 3800 Waldo Ave, #14F San Francisco CA    94123
A04   Klee      Hull        415-549-4278 3800 Waldo Ave, #14F San Francisco CA    94123
A05   Christian Kells       212-771-4680 114 Horatio St       New York      NY    10014
A06             Kellsey     650-836-7128 390 Serra Mall       Palo Alto     CA    94305
A07   Paddy     O'Furniture 941-925-0752 1442 Main St         Sarasota      FL    34236

Tips for Retrieving Columns

Results Are Unformatted

All results display raw, unformatted values. Monetary amounts lack currency signs, for example, and numbers might have too many decimal places. Reporting tools—not data-retrieval tools—format data, although DBMSs have nonstandard functions that let you format numbers and datetimes in query results. See Microsoft SQL Server’s datename() function or MySQL’s date_format() function, for example.

Creating Column Aliases with AS

In the query results so far, I’ve allowed the DBMS to use default values for column headings. (A column’s default heading in a result is the source column’s name in the table definition.) You can use the AS clause to create a column alias. A column alias is an alternative name (identifier) that you specify to control how column headings are displayed in a result. Use column aliases if column names are cryptic, hard to type, too long, or too short.

A column alias immediately follows a column name in the SELECT clause of a SELECT statement. Enclose the alias in single or double quotes if it’s a reserved keyword or if it contains spaces, punctuation, or special characters. You can omit the quotes if the alias is a single non-reserved word that contains only letters, digits, or underscores. If you want a particular column to retain its default heading, then omit its AS clause.

To create column aliases:

Listing 4.5 shows the syntactic variations of the AS clause. Figure 4.5 shows the result of Listing 4.5.

Listing 4.5The AS clause specifies a column alias to display in results. This statement shows alternative constructions for AS syntax. In your programs, pick one construction and use it consistently. See Figure 4.5 for the result.

SELECT
    au_fname AS "First name",
    au_lname AS 'Last name',
    city AS City,
    state,
    zip 'Postal code'
  FROM authors;

Figure 4.5Result of Listing 4.5.

First name Last name   City          state Postal code
---------- ----------- ------------- ----- -----------
Sarah      Buchman     Bronx         NY    10468
Wendy      Heydemark   Boulder       CO    80303
Hallie     Hull        San Francisco CA    94123
Klee       Hull        San Francisco CA    94123
Christian  Kells       New York      NY    10014
           Kellsey     Palo Alto     CA    94305
Paddy      O'Furniture Sarasota      FL    34236

In standard SQL and most DBMSs, the keyword AS is optional, but you should include it and surround aliases with double quotes to make your SQL code more portable and readable. With these syntactic conventions, Listing 4.5 is equivalent to:

SELECT
    au_fname AS "First name",
    au_lname AS "Last name",
    city     AS "City",
    state,
    zip      AS "Postal code"
  FROM authors;

Tips for Column Aliases

Eliminating Duplicate Rows with DISTINCT

Columns often contain duplicate values, and it’s common to want a result that lists each duplicate only once. If I type Listing 4.6 to list the states where the authors live, then the result, Figure 4.6, contains unneeded duplicates.

Listing 4.6List the states in which the authors live. See Figure 4.6 for the result.

SELECT state
  FROM authors;

Figure 4.6Result of Listing 4.6. This result contains unneeded duplicates of CA and NY.

state
-----
NY
CO
CA
CA
NY
CA
FL

The DISTINCT keyword eliminates duplicate rows from a result. Note that the columns of a DISTINCT result form a candidate key (unless they contain nulls).

To eliminate duplicate rows:

Listing 4.7List the distinct states in which the authors live. The keyword DISTINCT eliminates duplicate rows in the result. See Figure 4.7 for the result.

SELECT DISTINCT state
  FROM authors;

Figure 4.7Result of Listing 4.7. This result has no CA or NY duplicates.

state
-----
NY
CO
CA
FL

If the SELECT DISTINCT clause contains more than one column, then the values of all the specified columns combined determine the uniqueness of rows. The result of Listing 4.8 is Figure 4.8, which contains a duplicate row that has two columns. The result of Listing 4.9 is Figure 4.9, which eliminates the two-column duplicate.

Listing 4.8List the cities and states in which the authors live. See Figure 4.8 for the result.

SELECT city, state
  FROM authors;

Figure 4.8Result of Listing 4.8. This result contains a duplicate row for San Francisco, California.

city          state
------------- -----
Bronx         NY
Boulder       CO
New York      NY
Palo Alto     CA
San Francisco CA
San Francisco CA
Sarasota      FL

Listing 4.9List the distinct cities and states in which the authors live. See Figure 4.9 for the result.

SELECT DISTINCT city, state
  FROM authors;

Figure 4.9Result of Listing 4.9. It’s the city–state combination that’s considered to be unique, not the value in any single column.

city          state
------------- -----
Bronx         NY
Boulder       CO
New York      NY
Palo Alto     CA
San Francisco CA
Sarasota      FL

Tips for DISTINCT

Sorting Rows with ORDER BY

Rows in a query result are unordered, so you should view the order in which rows appear as being arbitrary. This situation arises because the relational model posits that row order is irrelevant for table operations. You can use the ORDER BY clause to sort rows by a specified column or columns in ascending (lowest to highest) or descending (highest to lowest) order; for details, see “Sort Order” in this section. The ORDER BY clause always is the last clause in a SELECT statement.

To sort by a column:

Listing 4.10List the authors’ first names, last names, cities, and states, sorted by ascending last name. ORDER BY performs ascending sorts by default, so the ASC keyword is optional. See Figure 4.10 for the result.

SELECT au_fname, au_lname, city, state
  FROM authors
  ORDER BY au_lname ASC;

Figure 4.10Result of Listing 4.10. This result is sorted in ascending last-name order.

au_fname  au_lname    city          state
--------- ----------- ------------- -----
Sarah     Buchman     Bronx         NY
Wendy     Heydemark   Boulder       CO
Hallie    Hull        San Francisco CA
Klee      Hull        San Francisco CA
Christian Kells       New York      NY
          Kellsey     Palo Alto     CA
Paddy     O'Furniture Sarasota      FL

Listing 4.11List the authors’ first names, last names, cities, and states, sorted by descending first name. The DESC keyword is required. See Figure 4.11 for the result.

SELECT au_fname, au_lname, city, state
  FROM authors
  ORDER BY au_fname DESC;

Figure 4.11Result of Listing 4.11. This result is sorted in descending first-name order. The first name of the author Kellsey is an empty string ('') and sorts last (or first in ascending order).

au_fname  au_lname    city          state
--------- ----------- ------------- -----
Wendy     Heydemark   Boulder       CO
Sarah     Buchman     Bronx         NY
Paddy     O'Furniture Sarasota      FL
Klee      Hull        San Francisco CA
Hallie    Hull        San Francisco CA
Christian Kells       New York      NY
          Kellsey     Palo Alto     CA

Sort Order

Sorting numeric and datetime values is unambiguous; sorting character strings is complex. A DBMS uses a collating sequence, or collation, to determine the order in which characters are sorted. The collation defines the order of precedence for every character in your character set. Your character set depends on the language that you’re using—European languages (a Latin character set), Hebrew (the Hebrew alphabet), or Chinese (ideographs), for example. The collation also determines case sensitivity (is ‘A’ < ‘a’?), accent sensitivity (is ‘A’ < ‘Á’?), width sensitivity (for multibyte or Unicode characters), and other factors such as linguistic practices. The SQL standard doesn’t define particular collations and character sets, so each DBMS uses its own sorting strategy and default collation. DBMSs provide commands or tools that display the current collation and character set. Run the command exec sp_helpsort in Microsoft SQL Server, for example. Search your DBMS documentation for collation or sort order.

To sort by multiple columns:

Listing 4.12List the authors’ first names, last names, cities, and states, sorted by descending city within ascending state. See Figure 4.12 for the result.

SELECT au_fname, au_lname, city, state
  FROM authors
  ORDER BY
    state ASC,
    city  DESC;

Figure 4.12Result of Listing 4.12.

au_fname  au_lname    city          state
--------- ----------- ------------- -----
Hallie    Hull        San Francisco CA
Klee      Hull        San Francisco CA
          Kellsey     Palo Alto     CA
Wendy     Heydemark   Boulder       CO
Paddy     O'Furniture Sarasota      FL
Christian Kells       New York      NY
Sarah     Buchman     Bronx         NY

Sorting by Substrings

To sort results by specific parts of a string, use the functions described in “Extracting a Substring with SUBSTRING()” in Chapter 5. For example, this query sorts by the last four characters of phone:

SELECT au_id, phone
  FROM authors
  ORDER BY substr(phone, length(phone)-3);

This query works for Oracle, Db2, MySQL, and PostgreSQL. In Microsoft SQL Server, use substring(phone, len(phone)-3, 4). In Microsoft Access, use Mid(phone, len(phone)-3, 4).

You can specify relative column-position numbers instead of column names in ORDER BY. The position numbers refer to the columns in the result, not the original table. Using column positions saves typing, but the resulting code is unclear and invites mistakes if you reorder the columns in the SELECT clause.

To sort by relative column positions:

Listing 4.13List each author’s first name, last name, city, and state, sorted first by ascending state (column 4 in the SELECT clause) and then by descending last name within each state (column 2). See Figure 4.13 for the result.

SELECT au_fname, au_lname, city, state
  FROM authors
  ORDER BY 4 ASC, 2 DESC;

Figure 4.13Result of Listing 4.13.

au_fname  au_lname    city          state
--------- ----------- ------------- -----
          Kellsey     Palo Alto     CA
Hallie    Hull        San Francisco CA
Klee      Hull        San Francisco CA
Wendy     Heydemark   Boulder       CO
Paddy     O'Furniture Sarasota      FL
Christian Kells       New York      NY
Sarah     Buchman     Bronx         NY

Sorting and Nulls

Sorting is one of the situations where SQL departs from the idea that a null isn’t equal to any other value, including another null. (The logical comparison NULL = NULL is unknown, not true.) When nulls are sorted, they all are considered to be equal to one another.

The SQL standard leaves it up to the DBMS to decide whether nulls are either greater than or less than all non-null values. Microsoft Access, Microsoft SQL Server, and MySQL treat nulls as the lowest possible values (Listing 4.14 and Figure 4.14). Oracle, Db2, and PostgreSQL treat nulls as the highest possible values. See also “Nulls” in Chapter 3.

In Oracle and PostgreSQL, use NULLS FIRST or NULLS LAST with ORDER BY to control null-sorting behavior. For other DBMSs, create a derived column (see Chapter 5) that flags nulls—CASE WHEN column IS NULL THEN 0 ELSE 1 END AS is_null, for example—and add it as the first column (with ASC or DESC) in the ORDER BY clause.

Listing 4.14Nulls in a sort column are listed first or last, depending on the DBMS. See Figure 4.14 for the result.

SELECT pub_id, state, country
  FROM publishers
  ORDER BY state ASC;

Figure 4.14Result of Listing 4.14. This result is sorted by ascending state. The DBMS in which I ran this query treats nulls as the lowest possible values, so the row with the null state is listed first. A DBMS that treats nulls as the highest possible values would list the same row last.

pub_id state country
------ ----- -------
P03    NULL  Germany
P04    CA    USA
P02    CA    USA
P01    NY    USA

Tips for ORDER BY

Sorting Speed

The three factors that most affect sorting speed are, in order of importance:

Always restrict a sort to the minimum number of rows needed. Running times of sorting routines don’t scale linearly with the number of rows sorted—so sorting 10n rows takes much more than 10 times longer than sorting n rows. Also try to reduce the number of sorted columns and the columns’ data-type lengths in the table definition, if possible.

Filtering Rows with WHERE

The result of each SELECT statement so far has included every row in the table (for the specified columns). You can use the WHERE clause to filter unwanted rows from the result. This filtering capability gives the SELECT statement its real power. In a WHERE clause, you specify a search condition that has one or more conditions that need to be satisfied by the rows of a table. A condition, or predicate, is a logical expression that evaluates to true, false, or unknown. Rows for which the condition is true are included in the result; rows for which the condition is false or unknown are excluded. (An unknown result, which arises from nulls, is described in the next section.) SQL provides operators that express different types of conditions (Table 4.1). Operators are symbols or keywords that specify actions to perform on values or other elements.

Table 4.1Types of Conditions
Condition SQL Operators
Comparison =, <>, <, <=, >, >=
Pattern matching LIKE
Range filtering BETWEEN
List filtering IN
Null testing IS NULL

SQL’s comparison operators compare two values and evaluate to true, false, or unknown (Table 4.2).

Table 4.2Comparison Operators
Operator Description
= Equal to
<> Not equal to
< Less than
<= Less than or equal to
> Greater than
>= Greater than or equal to

The data type determines how values are compared:

Compare only identical or similar data types. If you try to compare values that have different data types, then your DBMS might:

To filter rows by making a comparison:

Listing 4.18This List the authors whose last name is not Hull. See Figure 4.18 for the result.

SELECT au_id, au_fname, au_lname
  FROM authors
  WHERE au_lname <> 'Hull';

Figure 4.18Result of Listing 4.18.

au_id au_fname  au_lname
----- --------- -----------
A01   Sarah     Buchman
A02   Wendy     Heydemark
A05   Christian Kells
A06             Kellsey
A07   Paddy     O'Furniture

Listing 4.19List the titles for which there is no signed contract. See Figure 4.19 for the result.

SELECT title_name, contract
  FROM titles
  WHERE contract = 0;

Figure 4.19Result of Listing 4.19.

title_name                 contract
-------------------------- --------
Not Without My Faberge Egg        0

Listing 4.20List the titles published in 2001 and later. See Figure 4.20 for the result.

SELECT title_name, pubdate
  FROM titles
  WHERE pubdate >= DATE '2001-01-01';

Figure 4.20Result of Listing 4.20.

title_name                   pubdate
---------------------------- ----------
Exchange of Platitudes       2001-01-01
Just Wait Until After School 2001-06-01
Kiss My Boo-Boo              2002-05-31

The right and left sides of the comparison can also be complex expressions. The general form of a comparison is:

expr1 op expr2

expr1 and expr2 are expressions. An expression is any valid combination of column names, literals, functions, and operators that resolves to a single value (per row). Chapter 5 covers expressions in more detail (Listing 4.21 and Figure 4.21).

Listing 4.21List the titles that generated more than $1 million in revenue. This search condition uses an arithmetic expression. See Figure 4.21 for the result.

SELECT
    title_name,
    price * sales AS "Revenue"
  FROM titles
  WHERE price * sales > 1000000;

Figure 4.21Result of Listing 4.21.

title_name                    Revenue
----------------------------- -----------
Ask Your System Administrator  1025396.65
Exchange of Platitudes         1400008.00
I Blame My Mother             35929790.00
Spontaneous, Not Annoying      1299012.99

Tips for WHERE

Column Aliases and WHERE

If you alias a column in a SELECT clause (see “Creating Column Aliases with AS” earlier in this chapter), then you can’t reference it in the WHERE clause. The following query fails because the WHERE clause is evaluated before the SELECT clause, so the alias copies_sold doesn’t yet exist when the WHERE clause is evaluated:

-- Wrong
SELECT sales AS copies_sold
  FROM titles
  WHERE copies_sold > 100000;

Instead, use a subquery (Chapter 8) in the FROM clause, which is evaluated before the WHERE clause:

-- Correct
SELECT *
  FROM (SELECT sales AS copies_sold
        FROM titles) ta
  WHERE copies_sold > 100000;

This solution works not only for columns aliases but also for aggregate functions, scalar subqueries, and window functions referenced in WHERE clauses. Note that in the latter query, the subquery is aliased ta (a table alias). All DBMSs accept table aliases, but not all require them. See also “Using Subqueries as Column Expressions” in Chapter 8.

Combining and Negating Conditions with AND, OR, and NOT

You can specify multiple conditions in a single WHERE clause to, say, retrieve rows based on the values in multiple columns. You can use the AND and OR operators to combine two or more conditions into a compound condition. AND, OR, and a third operator, NOT, are logical operators. Logical operators, or boolean operators, are operators designed to work with truth values: true, false, and unknown.

If you’ve programmed in other languages (or studied propositional logic), then you’re familiar with the two-value logic (2VL) system. In two-value logic, the result of a logical expression is either true or false. 2VL assumes perfect knowledge, in which all propositions are known to be true or false. Databases model real data, however, and our knowledge of the world is imperfect—that’s why we use nulls to represent unknown values (see “Nulls” in Chapter 3).

2VL is insufficient to represent knowledge gaps, so SQL uses three-value logic (3VL). In three-value logic, the result of a logical expression is true, false, or unknown. If the result of a compound condition is false or unknown, then the row is excluded from the result. (To retrieve rows with nulls, see “Testing for Nulls with IS NULL” later in this chapter.)

The AND Operator

The AND operator’s important characteristics are:

See Listings 4.22 and 4.23, and Figures 4.22 and 4.23, for some AND examples.

Listing 4.22List the biographies that sell for less than $20. See Figure 4.22 for the result.

SELECT title_name, type, price
  FROM titles
  WHERE type = 'biography' AND price < 20;

Figure 4.22Result of Listing 4.22.

title_name                type      price
------------------------- --------- -----
How About Never?          biography 19.95
Spontaneous, Not Annoying biography 12.99

Listing 4.23List the authors whose last names begin with one of the letters H through Z and who don’t live in California. See Figure 4.23 for the result.

SELECT au_fname, au_lname
  FROM authors
  WHERE au_lname >= 'H'
    AND au_lname <= 'Zz'
    AND state <> 'CA';

Figure 4.23Result of Listing 4.23. Remember that the results of string comparisons depend on the DBMS’s collating sequence; see “Sorting Rows with ORDER BY” earlier in this chapter.

au_fname  au_lname
--------- -----------
Wendy     Heydemark
Christian Kells
Paddy     O'Furniture

The OR Operator

The OR operator’s important characteristics are:

See Listings 4.24 and 4.25, and Figures 4.24 and 4.25, for some OR examples.

Listing 4.24List the authors who live in New York State, Colorado, or San Francisco. See Figure 4.24 for the result.

SELECT au_fname, au_lname, city, state
  FROM authors
  WHERE (state = 'NY')
     OR (state = 'CO')
     OR (city  = 'San Francisco');

Figure 4.24Result of Listing 4.24.

au_fname  au_lname  city          state
--------- --------- ------------- -----
Sarah     Buchman   Bronx         NY
Wendy     Heydemark Boulder       CO
Hallie    Hull      San Francisco CA
Klee      Hull      San Francisco CA
Christian Kells     New York      NY

Listing 4.25 shows the effect of nulls in conditions. You might expect the result, Figure 4.25, to display all the rows in the table publishers. But the row for publisher P03 (located in Germany) is missing because it contains a null in the column state. The null causes the result of both of the OR conditions to be unknown, so the row is excluded from the result. To test for nulls, see “Testing for Nulls with IS NULL” later in this chapter.

Listing 4.25List the publishers that are located in California or are not located in California. This example is contrived to show the effect of nulls in conditions. See Figure 4.25 for the result.

SELECT pub_id, pub_name, state, country
  FROM publishers
  WHERE (state =  'CA')
     OR (state <> 'CA');

Figure 4.25Result of Listing 4.25. Publisher P03 is missing because its state is null.

pub_id pub_name          state country
------ ----------------- ----- -------
P01    Abatis Publishers NY    USA
P02    Core Dump Books   CA    USA
P04    Tenterhooks Press CA    USA

The NOT Operator

The NOT operator’s important characteristics are:

See Listings 4.26 and 4.27, and Figures 4.26 and 4.27, for some NOT examples.

Listing 4.26List the authors who don’t live in California. See Figure 4.26 for the result.

SELECT au_fname, au_lname, state
  FROM authors
  WHERE NOT (state = 'CA');

Figure 4.26Result of Listing 4.26.

au_fname  au_lname    state
--------- ----------- -----
Sarah     Buchman     NY
Wendy     Heydemark   CO
Christian Kells       NY
Paddy     O'Furniture FL

Listing 4.27List the titles whose price is not less than $20 and that have sold more than 15000 copies. See Figure 4.27 for the result.

SELECT title_name, sales, price
  FROM titles
  WHERE NOT (price < 20)
    AND (sales > 15000);

Figure 4.27Result of Listing 4.27.

title_name                    sales   price
----------------------------- ------- -----
Ask Your System Administrator   25667 39.95
I Blame My Mother             1500200 23.95

Using AND, OR, and NOT Together

You can combine the three logical operators in a compound condition. Your DBMS uses SQL’s precedence rules to determine which operators to evaluate first. Precedence is covered in “Determining the Order of Evaluation” in Chapter 5, but for now you need know only that when you use multiple logical operators in a compound condition, NOT is evaluated first, then AND, and finally OR. You can override this order with parentheses: everything in parentheses is evaluated first. When parenthesized conditions are nested, the innermost condition is evaluated first. Under the default precedence rules, the condition

p AND NOT q OR r

is equivalent to

(p AND (NOT q)) OR r

It’s wise to use parentheses, rather than rely on the default evaluation order, to make the evaluation order clear.

If I want to list history and biography titles priced less than $20, for example, then Listing 4.28 won’t work. AND is evaluated before OR, so the query is evaluated as follows:

  1. Find all the biography titles less than $20.
  2. Find all the history titles (regardless of price).
  3. List both sets of titles in the result (Figure 4.28).

Listing 4.28This query won’t work if I want to list history and biography titles less than $20 because AND has higher precedence than OR. See Figure 4.28 for the result.

SELECT title_id, type, price
  FROM titles
  WHERE type  = 'history'
     OR type  = 'biography'
    AND price < 20;

Figure 4.28Result of Listing 4.28. This result contains two history titles priced more than $20, which isn’t what I want.

title_id type      price
-------- --------- -----
T01      history   21.99
T02      history   19.95
T06      biography 19.95
T12      biography 12.99
T13      history   29.99

To fix this query, add parentheses to force evaluation of OR first. Listing 4.29 is evaluated as follows:

  1. Find all the biography and history titles.
  2. Of the titles found in step 1, keep the ones priced less than $20.
  3. List the subset of titles in the result (Figure 4.29).

Listing 4.29To fix Listing 4.28, add parentheses to force OR to be evaluated before AND. See Figure 4.29 for the result.

SELECT title_id, type, price
  FROM titles
  WHERE (type  = 'history'
     OR  type  = 'biography')
    AND  price < 20;

Figure 4.29Result of Listing 4.29.

title_id type      price
-------- --------- -----
T02      history   19.95
T06      biography 19.95
T12      biography 12.99

Dissecting WHERE Clauses

If your WHERE clause isn’t working, then you can debug it by displaying the result of each condition individually. To see the result of each comparison in Listing 4.29, for example, put each comparison expression in the SELECT clause’s output column list, along with the values you’re comparing:

SELECT
    type,
    type = 'history' AS "Hist?",
    type = 'biography' AS "Bio?",
    price,
    price < 20 AS "<20?"
  FROM titles;

This query runs on Microsoft Access, MySQL, and PostgreSQL. If your DBMS interprets the = symbol as an assignment operator rather than as a comparison operator, then you must substitute equivalent expressions for the logical comparisons. In Oracle, for example, you can replace type = 'history' with INSTR(type, 'history'). The query’s result is:

type       Hist? Bio?  price  <20?
---------- ----- ----  -----  ----
history        1    0  21.99     0
history        1    0  19.95     1
computer       0    0  39.95     0
psychology     0    0  12.99     1
psychology     0    0   6.95     1
biography      0    1  19.95     1
biography      0    1  23.95     0
children       0    0  10.00     1
children       0    0  13.95     1
biography      0    1   NULL  NULL
psychology     0    0   7.99     1
biography      0    1  12.99     1
history        1    0  29.99     0

The comparison columns display zero if the comparison is false, nonzero if it’s true, or null if it’s unknown.

Tips for AND, OR, and NOT

Re-expressing Conditions

You must master the laws in Table 4.6 to become a competent programmer in SQL (or any language). They’re especially useful when you want to re-express conditions to make queries run faster. For example, the statement

SELECT * FROM mytable
  WHERE col1 = 1
    AND NOT (col1 = col2 OR col3 = 3);

is equivalent to

SELECT * FROM mytable
  WHERE col1 = 1
    AND col2 <> 1
    AND col3 <> 3;

but the latter one will run faster if your DBMS’s optimizer isn’t smart enough to re-express the former internally. (The condition col1 = col2 is more expensive computationally than comparing col1 and col2 to literal values.)

You also can use the laws to change a condition into its opposite. For example, the reverse of the condition

WHERE (col1 = 'A') AND (col2 = 'B')

is

WHERE (col1 <> 'A') OR (col2 <> 'B')

In this case, it would have easier just to negate the entire original expression with NOT:

WHERE NOT ((col1 = 'A') AND (col2 = 'B'))

But this simple approach won’t work with complex conditions involving multiple ANDs, ORs, and NOTs.

Here’s a problem to solve: look at only the first code line below and see whether you can repeatedly apply equivalency rules to push the NOT operators inward until they apply to only the individual expressions p, q, and r:

NOT ((p AND q) OR (NOT p AND r))
= NOT (p AND q) AND NOT (NOT p AND r)
= (NOT p OR NOT q) AND (p OR NOT r)

Matching Patterns with LIKE

The preceding examples retrieved rows based on the exact value of a column or columns. You can use LIKE to retrieve rows based on partial information. LIKE is useful if you don’t know an exact value (“The author’s last name is Kel-something”) or you want to retrieve rows with similar values (“Which authors live in the San Francisco Bay Area?”). The LIKE condition’s important characteristics are:

To filter rows by matching a pattern:

Listing 4.30List the authors whose last names begin with Kel. See Figure 4.30 for the result.

SELECT au_fname, au_lname
  FROM authors
  WHERE au_lname LIKE 'Kel%';

Figure 4.30Result of Listing 4.30.

au_fname  au_lname
--------- --------
Christian Kells
          Kellsey

Listing 4.31List the authors whose last names have ll (el-el) as the third and fourth characters. See Figure 4.31 for the result.

SELECT au_fname, au_lname
  FROM authors
  WHERE au_lname LIKE '__ll%';

Figure 4.31Result of Listing 4.31.

au_fname  au_lname
--------- --------
Hallie    Hull
Klee      Hull
Christian Kells
          Kellsey

Listing 4.32List the authors who live in the San Francisco Bay Area (that is, with postal codes that begin with 94). See Figure 4.32 for the result.

SELECT au_fname, au_lname, city, state, zip
  FROM authors
  WHERE zip LIKE '94___';

Figure 4.32Result of Listing 4.32.

au_fname au_lname city          state zip
-------- -------- ------------- ----- -----
Hallie   Hull     San Francisco CA    94123
Klee     Hull     San Francisco CA    94123
         Kellsey  Palo Alto     CA    94305

Listing 4.33List the authors who live outside the 212, 415, and 303 area codes. This example shows three alternative patterns for excluding telephone numbers. You should favor the first alternative because single-character matches (_) are faster than multiple-character ones (%). See Figure 4.33 for the result.

SELECT au_fname, au_lname, phone
  FROM authors
  WHERE phone NOT LIKE '212-___-____'
    AND phone NOT LIKE '415-___-%'
    AND phone NOT LIKE '303-%';

Figure 4.33Result of Listing 4.33.

au_fname au_lname    phone
-------- ----------- ------------
Sarah    Buchman     718-496-7223
         Kellsey     650-836-7128
Paddy    O'Furniture 941-925-0752

You can search for values that contain the special wildcard characters. Use the ESCAPE keyword to specify an escape character that you can use to search for a percent sign or underscore as a literal character. Immediately precede a wildcard character with an escape character to strip the wildcard of its special meaning. If the escape character is !, for example, then !% in a pattern searches values for a literal %. (Unescaped wildcards still have their special meaning.) The escape character can’t be part of the value that you’re trying to retrieve; if you’re searching for '50% OFF!', then choose an escape character other than !. Table 4.9 shows some examples of escaped and unescaped patterns; the designated escape character is !.

Table 4.9Escaped and Unescaped Patterns
Pattern Matches
'100%' Unescaped. Matches 100 followed by a string of zero or more characters.
'100!%' Escaped. Matches ‘100%’.
'_op' Unescaped. Matches ‘top’, ‘hop’, ‘pop’, and so on.
'!_op' Escaped. Matches ‘_op’.

To match a wildcard character:

Listing 4.34List the titles that contain percent signs. Only the % that follows the escape character ! has its literal meaning; the other two percent signs still act as wildcards. See Figure 4.34 for the result.

SELECT title_name
  FROM titles
  WHERE title_name LIKE '%!%%' ESCAPE '!';

Figure 4.34Result of Listing 4.34. An empty result. No title names contain a % character.

title_name
----------------------------------------

Tips for LIKE

Range Filtering with BETWEEN

Use BETWEEN to determine whether a given value falls within a specified range. The BETWEEN condition’s important characteristics are:

To filter rows by using a range:

Listing 4.35List the authors who live outside the postal range 20000–89999. See Figure 4.35 for the result.

SELECT au_fname, au_lname, zip
  FROM authors
  WHERE zip NOT BETWEEN '20000' AND '89999';

Figure 4.35Result of Listing 4.35.

au_fname  au_lname zip
--------- -------- -----
Sarah     Buchman  10468
Hallie    Hull     94123
Klee      Hull     94123
Christian Kells    10014
          Kellsey  94305

Listing 4.36List the titles priced between $10 and $19.95, inclusive. See Figure 4.36 for the result.

SELECT title_id, price
  FROM titles
  WHERE price BETWEEN 10 AND 19.95;

Figure 4.36Result of Listing 4.36.

title_id price
-------- -----
T02      19.95
T04      12.99
T06      19.95
T08      10.00
T09      13.95
T12      12.99

Listing 4.37List the titles published in 2000. See Figure 4.37 for the result.

SELECT title_id, pubdate
  FROM titles
  WHERE
    pubdate BETWEEN DATE '2000-01-01'
                AND DATE '2000-12-31';

Figure 4.37Result of Listing 4.37.

title_id pubdate
-------- ----------
T01      2000-08-01
T03      2000-09-01
T06      2000-07-31
T11      2000-11-30
T12      2000-08-31

Tips for BETWEEN

List Filtering with IN

Use IN to determine whether a given value matches any value in a specified list. The IN condition’s important characteristics are:

To filter rows by using a list:

Listing 4.39List the authors who don’t live in New York State, New Jersey, or California. See Figure 4.39 for the result.

SELECT au_fname, au_lname, state
  FROM authors
  WHERE state NOT IN ('NY', 'NJ', 'CA');

Figure 4.39Result of Listing 4.39.

au_fname au_lname    state
-------- ----------- -----
Wendy    Heydemark   CO
Paddy    O'Furniture FL

Listing 4.40List the titles for which advances of $0, $1000, or $5000 were paid. See Figure 4.40 for the result.

SELECT title_id, advance
  FROM royalties
  WHERE advance IN (0.00, 1000.00, 5000.00);

Figure 4.40Result of Listing 4.40.

title_id advance
-------- -------
T02      1000.00
T08         0.00
T09         0.00

Listing 4.41List the titles published on the first day of the year 2000, 2001, or 2002. See Figure 4.41 for the result.

SELECT title_id, pubdate
  FROM titles
  WHERE pubdate IN
    (DATE '2000-01-01',
     DATE '2001-01-01',
     DATE '2002-01-01')
;

Figure 4.41Result of Listing 4.41.

title_id pubdate
-------- ----------
T05      2001-01-01

Tips for IN

Testing for Nulls with IS NULL

Recall from “Nulls” in Chapter 3 that nulls represent missing or unknown values. This situation causes a problem: LIKE, BETWEEN, IN, and other WHERE-clause conditions can’t find nulls because unknown values don’t satisfy specific conditions. A null matches no value—not even other nulls. You can’t use = or <> to test whether a value is null.

In the table publishers, for example, note that publisher P03 has a null in the column state because that column doesn’t apply to Germany (Listing 4.42 and Figure 4.42). I can’t use complementary comparisons to select the null because null is neither California nor not-California; it’s undefined (Listings 4.43 and 4.44, Figures 4.43 and 4.44).

Listing 4.42List the locations of all the publishers. See Figure 4.42 for the result.

SELECT pub_id, city, state, country
  FROM publishers;

Figure 4.42Result of Listing 4.42. The column state doesn’t apply to the publisher located in Germany.

pub_id city          state country
------ ------------- ----- -------
P01    New York      NY    USA
P02    San Francisco CA    USA
P03    Hamburg       NULL  Germany
P04    Berkeley      CA    USA

Listing 4.43List the publishers located in California. See Figure 4.43 for the result.

SELECT pub_id, city, state, country
  FROM publishers
  WHERE state = 'CA';

Figure 4.43Result of Listing 4.43. This result doesn’t include publisher P03.

pub_id city          state country
------ ------------- ----- -------
P02    San Francisco CA    USA
P04    Berkeley      CA    USA

Listing 4.44List the publishers located outside California (the wrong way—see Listing 4.45 for the correct way). See Figure 4.44 for the result.

SELECT pub_id, city, state, country
  FROM publishers
  WHERE state <> 'CA';

Figure 4.44Result of Listing 4.44. This result doesn’t include publisher P03 either. The conditions state = 'CA' and state <> 'CA' aren’t complementary after all; nulls don’t match any value and so can’t be selected by using the types of conditions I’ve covered so far.

pub_id city     state country
------ -------- ----- -------
P01    New York NY    USA

To avert disaster, SQL provides IS NULL to determine whether a given value is null. The IS NULL condition’s important characteristics are:

To retrieve rows with nulls or non-null values:

Listing 4.45List the publishers located outside California (the correct way). See Figure 4.45 for the result.

SELECT pub_id, city, state, country
  FROM publishers
  WHERE state <> 'CA'
     OR state IS NULL;

Figure 4.45Result of Listing 4.45. Now publisher P03 is in the result.

pub_id city     state country
------ -------- ----- -------
P01    New York NY    USA
P03    Hamburg  NULL  Germany

Listing 4.46List the biographies whose (past or future) publication dates are known. See Figure 4.46 for the result.

SELECT title_id, type, pubdate
  FROM titles
  WHERE type = 'biography'
    AND pubdate IS NOT NULL;

Figure 4.46Result of Listing 4.46. Without the IS NOT NULL condition, this result would have included title T10.

title_id type      pubdate
-------- --------- ----------
T06      biography 2000-07-31
T07      biography 1999-10-01
T12      biography 2000-08-31

Tips for IS NULL