In this chapter
A transaction is a sequence of one or more SQL statements executed as a single logical unit of work. The DBMS considers a transaction to be an indivisible, all-or-nothing proposition: it executes all the transaction’s statements as a group, or it executes none of them.
For example, suppose that a bank customer transfers $500 from her savings account to her checking account. This operation consists of two separate actions, executed sequentially:
Figure 14.1 shows the two SQL statements for this transaction. Now imagine that the DBMS fails—power outage, system crash, hardware problem—after it executes the first statement but before the second. The accounts would be out of balance without your knowledge. Accusations of malfeasance and prison time would soon follow.
To avoid legal problems, use a transaction to guarantee that both SQL statements are performed to maintain the accounts in proper balance. When something prevents one of the statements in a transaction from executing, the DBMS undoes (rolls back) the other statements of the transaction. If no error occurs, then the changes are made permanent (committed).
Figure 14.1Two SQL statements are needed when a banking customer transfers money from savings to checking.
UPDATE savings_accounts
SET balance = balance - 500.00
WHERE account_number = 208998628;
UPDATE checking_accounts
SET balance = balance + 500.00
WHERE account_number = 786783165;
To learn how transactions work, you need to learn a few terms:
Commit.Committing a transaction makes all data modifications performed since the start of the transaction a permanent part of the database. After a transaction is committed, all changes made by the transaction become visible to other users and are guaranteed to be permanent if a crash or other failure occurs.
Roll back.Rolling back a transaction retracts any of the changes resulting from the SQL statements in the transaction. After a transaction is rolled back, the affected data are left unchanged, as though the SQL statements in the transaction were never executed.
Transaction log.The transaction log file, or just log, is a serial record of all modifications that have occurred in a database via transactions. The transaction log records the start of each transaction, the changes to the data, and enough information to undo or redo the changes made by the transaction (if necessary later). The log grows continually as transactions occur in the database.
Although it’s the DBMS’s responsibility to ensure the physical integrity of each transaction, it’s your responsibility to start and end transactions at points that enforce the logical consistency of the data, according to the rules of your organization or business. A transaction should contain only the SQL statements necessary to make a consistent change—no more and no fewer. Data in all referenced tables must be in a consistent state before the transaction begins and after it ends.
When you’re designing and executing transactions, some important considerations are:
Concurrency Control
To humans, computers appear to carry out two or more processes at the same time. In reality, computer operations occur not concurrently, but in sequence. The illusion of simultaneity appears because a microprocessor works with much smaller time slices than people can perceive. In a DBMS, concurrency control is a group of strategies that prevents loss of data integrity caused by interference between two or more users trying to access or change the same data simultaneously.
DBMSs use locking strategies to ensure transactional integrity and database consistency. Locking restricts data access during read and write operations; thus, it prevents users from reading data that are being changed by other users and prevents multiple users from changing the same data at the same time. Without locking, data can become logically incorrect, and statements executed against those data can return unexpected results. Occasionally you’ll end up in a deadlock, where you and another user, each having locked a piece of data needed for the other’s transaction, attempt to get a lock on each other’s piece. Most DBMSs can detect and resolve deadlocks by rolling back one user’s transaction so that the other can proceed (otherwise, you’d both wait forever for the other to release the lock). Locking mechanisms are very sophisticated; search your DBMS documentation for locking.
Concurrency transparency is the appearance from a transaction’s perspective that it’s the only transaction operating on the database. A DBMS isolates a transaction’s changes from changes made by any other concurrent transactions. Consequently, a transaction never sees data in an intermediate state; either it sees data in the state they were in before another concurrent transaction changed them, or it sees the data after the other transaction has completed. Isolated transactions let you reload starting data and replay (roll forward) a series of transactions to end up with the data in the same state they were in after the original transactions were executed.
For a transaction to be executed in all-or-nothing fashion, the transaction’s boundaries (starting and ending points) must be clear. These boundaries let the DBMS execute the statements as one atomic unit of work. A transaction can start implicitly with the first executable SQL statement or explicitly with the START TRANSACTION statement. A transaction ends explicitly with a COMMIT or ROLLBACK statement (it never ends implicitly). You can’t roll back a transaction after you commit it.
Oracle and Db2 transactions always start implicitly, so those DBMSs have no statement that marks the start of a transaction. In Microsoft Access, Microsoft SQL Server, MySQL, and PostgreSQL, you can (or must) start a transaction explicitly by using the BEGIN statement. The SQL:1999 standard introduced the START TRANSACTION statement—long after these DBMSs already were using BEGIN to start transactions, so the extended BEGIN syntax varies by DBMS. MySQL and PostgreSQL support START TRANSACTION (as a synonym for BEGIN).
To start a transaction explicitly:
In Microsoft Access or Microsoft SQL Server, type:
BEGIN TRANSACTION;
or
In MySQL or PostgreSQL, type:
START TRANSACTION;
To commit a transaction:
Type:
COMMIT;
To roll back a transaction:
Type:
ROLLBACK;
The SELECT statements in Listing 14.1 show that the UPDATE operations are performed by the DBMS and then undone by a ROLLBACK statement. See Figure 14.2 for the result.
Listing 14.1Within a transaction block, UPDATE operations (like INSERT and DELETE operations) are never final. See Figure 14.2 for the result.
SELECT SUM(pages), AVG(price) FROM titles;
BEGIN TRANSACTION;
UPDATE titles SET pages = 0;
UPDATE titles SET price = price * 2;
SELECT SUM(pages), AVG(price) FROM titles;
ROLLBACK;
SELECT SUM(pages), AVG(price) FROM titles;
Figure 14.2Result of Listing 14.1. The results of the SELECT statements show that the DBMS cancelled the transaction.
SUM(pages) AVG(price)
---------- ----------
5107 18.3875
SUM(pages) AVG(price)
---------- ----------
0 36.7750
SUM(pages) AVG(price)
---------- ----------
5107 18.3875
Listing 14.2 shows a more practical example of a transaction. I want to delete the publisher P04 from the table publishers without generating a referential-integrity error. Because some of the foreign-key values in titles point to publisher P04 in publishers, I first need to delete the related rows from the tables titles, titles_authors, and royalties. I use a transaction to be certain that all the DELETE statements are executed. If only some of the statements were successful, then the data would be left inconsistent. (For information about referential-integrity checks, see “Specifying a Foreign Key with FOREIGN KEY” in Chapter 11.)
Listing 14.2Use a transaction to delete publisher P04 from the table publishers and delete P04’s related rows in other tables.
BEGIN TRANSACTION;
DELETE FROM title_authors
WHERE title_id IN
(SELECT title_id
FROM titles
WHERE pub_id = 'P04');
DELETE FROM royalties
WHERE title_id IN
(SELECT title_id
FROM titles
WHERE pub_id = 'P04');
DELETE FROM titles
WHERE pub_id = 'P04';
DELETE FROM publishers
WHERE pub_id = 'P04';
COMMIT;
ACID
ACID is an acronym that summarizes the properties of a transaction:
Atomicity.Either all of a transaction’s data modifications are performed, or none of them are.
Consistency.A completed transaction leaves all data in a consistent state that maintains all data integrity. A consistent state satisfies all defined database constraints. (Note that consistency isn’t necessarily preserved at any intermediate point within a transaction.)
Isolation.A transaction’s effects are isolated (or concealed) from those of all other transactions. See “Concurrency Control” earlier in this chapter.
Durability.After a transaction completes, its effects are permanent and persist even if the system fails.
Transaction theory is a big topic, separate from the relational model. A good (if sometimes dated) reference is Transaction Processing: Concepts and Techniques by Jim Gray and Andreas Reuter.
It’s faster to UPDATE multiple columns with a single SET clause than to use multiple UPDATEs. For example, the query
UPDATE mytable
SET col1 = 1
col2 = 2
col3 = 3
WHERE col1 <> 1
OR col2 <> 2
OR col3 <> 3;
is better than three UPDATE statements because it decreases logging (although it increases locking).
For long transactions, you can set arbitrary intermediate markers, called savepoints, to divide a transaction into smaller parts. Savepoints let you roll back changes made from the current point in the transaction to a location earlier in the transaction (provided that the transaction hasn’t been committed). Imagine a session in which you’ve made a complex series of uncommitted INSERTs, UPDATEs, and DELETEs and then realize that the last few changes are incorrect or unnecessary. You can use savepoints to avoid resubmitting every statement. Microsoft Access doesn’t support savepoints. For Oracle, Db2, MySQL, and PostgreSQL, use the statement
SAVEPOINT savepoint_name;
For Microsoft SQL Server, use the statement
SAVE TRANSACTION savepoint_name;
See your DBMS documentation for information about savepoint locking subtleties and how to COMMIT or ROLLBACK to a particular savepoint.
In Microsoft Access, you can’t execute transactions in a SQL View window or via DAO; you must use the Microsoft Jet OLE DB Provider and ADO.
Oracle and Db2 transactions begin implicitly. To run Listings 14.1 and 14.2 in Oracle and Db2, omit the statement BEGIN TRANSACTION;
.
To run Listings 14.1 and 14.2 in MySQL, change the statement BEGIN TRANSACTION;
to START TRANSACTION;
(or to BEGIN;
).
MySQL supports transactions through InnoDB and NDB tables; search the MySQL documentation for transactions.
Microsoft SQL Server, Oracle, MySQL, and PostgreSQL support the statement SET TRANSACTION to set the characteristics of the upcoming transaction. Db2 transaction characteristics are controlled via server-level and connection initialization settings.