When to use SELECT ... FOR UPDATE?
Solution 1
The only portable way to achieve consistency between rooms and tags, and to make sure rooms are never returned after they have been deleted, is to lock them with SELECT FOR UPDATE.
However, in some systems locking is a side effect of concurrency control, and you achieve the same results without specifying FOR UPDATE explicitly.
To solve this problem, Thread 1 should SELECT id FROM rooms FOR UPDATE, thereby preventing Thread 2 from deleting from rooms until Thread 1 is done. Is that correct?
This depends on the concurrency control your database system is using.
MyISAM in MySQL (and several other old systems) does lock the whole table for the duration of a query.
In SQL Server, SELECT queries place shared locks on the records / pages / tables they have examined, while DML queries place update locks (which later get promoted to exclusive or demoted to shared locks). Exclusive locks are incompatible with shared locks, so either the SELECT or the DELETE query will block until the other session commits.
In databases which use MVCC (like Oracle, PostgreSQL, MySQL with InnoDB), a DML query creates a copy of the record (in one way or another) and generally readers do not block writers and vice versa. For these databases, a SELECT FOR UPDATE would come in handy: it would block either the SELECT or the DELETE query until the other session commits, just as SQL Server does.
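As a rough sketch of the locking approach on an MVCC database, Thread 1's work from the question could be wrapped in a single transaction like the one below. The syntax is PostgreSQL/MySQL style (Oracle starts transactions implicitly, so the BEGIN would be dropped there), and the interleaving described in the comments is the behaviour one would expect under READ COMMITTED, not something verified against a particular version.

```sql
-- Session 1 (Thread 1): lock the rooms before reading their tags.
BEGIN;
SELECT id FROM rooms FOR UPDATE;   -- takes row locks on every room returned

-- Session 2 (Thread 2), running concurrently:
--   DELETE FROM room_tags WHERE room_id = 1;  -- not blocked: different table
--   DELETE FROM rooms WHERE id = 1;           -- waits until session 1 ends,
--                                             -- so session 2 cannot commit yet

-- Session 1 still sees the committed tag rows for room 1.
SELECT tags.name
FROM room_tags
JOIN tags ON tags.id = room_tags.tag_id
WHERE room_tags.room_id = 1;

COMMIT;                            -- releases the locks; session 2 can proceed
```

Note that the lock on rooms does not protect room_tags by itself. The sketch relies on Thread 2 deleting both tables in one transaction, which cannot commit while its DELETE on rooms is waiting.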
When should one use REPEATABLE_READ transaction isolation versus READ_COMMITTED with SELECT ... FOR UPDATE?
Generally, REPEATABLE READ does not forbid phantom rows (rows that appeared or disappeared in another transaction, rather than being modified).
In Oracle and earlier PostgreSQL versions (before 9.1), REPEATABLE READ is actually a synonym for SERIALIZABLE. Basically, this means that the transaction does not see changes made after it has started. So in this setup, the last Thread 1 query will return the room as if it has never been deleted (which may or may not be what you wanted). If you don't want to show the rooms after they have been deleted, you should lock the rows with SELECT FOR UPDATE.
In InnoDB, REPEATABLE READ and SERIALIZABLE are different things: readers in SERIALIZABLE mode set next-key locks on the records they evaluate, effectively preventing concurrent DML on them. So you don't need a SELECT FOR UPDATE in SERIALIZABLE mode, but you do need it in REPEATABLE READ or READ COMMITTED.
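For InnoDB specifically, the two options could look roughly like this. It is a sketch in MySQL syntax, assuming the rooms table from the question; SET TRANSACTION without GLOBAL or SESSION applies only to the next transaction started in the session.

```sql
-- Option A: SERIALIZABLE, no explicit locking clause.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT id FROM rooms;              -- InnoDB treats this as a shared locking read
-- A concurrent DELETE FROM rooms WHERE id = 1 now waits for this transaction.
COMMIT;

-- Option B: READ COMMITTED (or REPEATABLE READ) with an explicit lock.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT id FROM rooms FOR UPDATE;   -- without FOR UPDATE this would be a
                                   -- non-locking snapshot read
-- A concurrent DELETE FROM rooms WHERE id = 1 now waits for this transaction.
COMMIT;
```

Option A takes shared locks and Option B exclusive ones, so two readers using Option B also block each other, while two readers using Option A do not.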
Note that the standard on isolation modes does prescribe that you don't see certain quirks in your queries but does not define how (with locking, with MVCC, or otherwise).
When I say "you don't need SELECT FOR UPDATE" I really should have added "because of side effects of certain database engine implementation".
Solution 2
Short answers:
Q1: Yes.
Q2: Doesn't matter which you use.
Long answer:
A select ... for update will (as it implies) select certain rows but also lock them as if they have already been updated by the current transaction (or as if the identity update had been performed). This allows you to update them again in the current transaction and then commit, without another transaction being able to modify these rows in any way.
Another way of looking at it: it is as if the following two statements are executed atomically:
select * from my_table where my_condition;
update my_table set my_column = my_column where my_condition;
Since the rows matching my_condition are locked, no other transaction can modify them in any way, and hence the transaction isolation level makes no difference here.
Note also that transaction isolation level is independent of locking: setting a different isolation level doesn't allow you to get around locking and update rows in a different transaction that are locked by your transaction.
What transaction isolation levels do guarantee (at different levels) is the consistency of data while transactions are in progress.
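To illustrate the point that row locks apply regardless of isolation level, here is a sketch of two sessions touching the same row. The id column and the integer type of my_column are assumptions made for this example; my_table and my_column are the placeholders used above.

```sql
-- Session A
BEGIN;
SELECT * FROM my_table WHERE id = 1 FOR UPDATE;   -- row 1 is now locked by A

-- Session B (whatever its isolation level)
BEGIN;
UPDATE my_table SET my_column = 0 WHERE id = 1;   -- waits: the row is locked by A

-- Session A
UPDATE my_table SET my_column = my_column + 1 WHERE id = 1;
COMMIT;                                           -- releases A's lock

-- Session B: the waiting UPDATE resumes only now.
COMMIT;                                           -- or ROLLBACK if the UPDATE was rejected
```

What happens to session B once it resumes does depend on the isolation level and the engine: under READ COMMITTED it typically applies its update on top of A's change, while at stricter levels some systems (PostgreSQL, for example) reject it with a serialization error instead.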
Comments
-
Gili almost 3 years
Please help me understand the use-case behind SELECT ... FOR UPDATE.
Question 1: Is the following a good example of when SELECT ... FOR UPDATE should be used?
Given:
- rooms[id]
- tags[id, name]
- room_tags[room_id, tag_id]
- room_id and tag_id are foreign keys
The application wants to list all rooms and their tags, but needs to differentiate between rooms with no tags versus rooms that have been removed. If SELECT ... FOR UPDATE is not used, what could happen is:
- Initially:
  - rooms contains [id = 1]
  - tags contains [id = 1, name = 'cats']
  - room_tags contains [room_id = 1, tag_id = 1]
- Thread 1: SELECT id FROM rooms;
  - returns [id = 1]
- Thread 2: DELETE FROM room_tags WHERE room_id = 1;
- Thread 2: DELETE FROM rooms WHERE id = 1;
- Thread 2: [commits the transaction]
- Thread 1: SELECT tags.name FROM room_tags, tags WHERE room_tags.room_id = 1 AND tags.id = room_tags.tag_id;
  - returns an empty list
Now Thread 1 thinks that room 1 has no tags, but in reality the room has been removed. To solve this problem, Thread 1 should SELECT id FROM rooms FOR UPDATE, thereby preventing Thread 2 from deleting from rooms until Thread 1 is done. Is that correct?
Question 2: When should one use SERIALIZABLE transaction isolation versus READ_COMMITTED with SELECT ... FOR UPDATE?
Answers are expected to be portable (not database-specific). If that's not possible, please explain why.
-
Quassnoi almost 11 years
Which RDBMS are you using?
-
Gili almost 11 years
@Quassnoi, as mentioned at the bottom of the question, I am looking for a portable (not database-specific) solution.
-
Billy ONeal almost 11 years
Are the options REPEATABLE_READ and READ_COMMITTED even portable options? The only results I get for those are for MSSQL server.
-
Gili almost 11 years
@BillyONeal, these isolation levels are defined by the SQL standard, so yes they are portable.
-
Quassnoi almost 11 years
@BillyONeal: note that isolation modes guarantee that you don't see quirks they don't allow, but say nothing about the quirks they do allow. This means that setting, say, READ COMMITTED mode does not define whether or not you will actually see records committed by another transaction: it only makes sure you will never see uncommitted records.
-
Chris Saxon almost 11 years
A select ... for update on rooms will still allow room_tags to be deleted because they are separate tables. Did you mean to ask whether the for update clause will prevent deletions from rooms?
-
Gili almost 11 years
@ChrisSaxon, yes. Thanks for the correction!
-
luochen1990 almost 3 years
"SELECT tags.name FROM room_tags, tags WHERE room_tags.tag_id = 1 AND tags.id = room_tags.tag_id;", do you mean "WHERE room_tags.room_id = 1" instead of "tag_id = 1"?
-
Gili almost 3 years
@luochen1990 good catch. Fixed.
-
Colin 't Hart almost 11 years
The last point is the crux of the matter, I think: "you don't need a SELECT FOR UPDATE in serializable mode, but do need them in REPEATABLE READ or READ COMMITTED".
-
Gili almost 11 years
You're right. The second question should have asked when SERIALIZABLE should be used versus READ_COMMITTED with SELECT ... FOR UPDATE. Can you please update your answer to reflect this updated question?
-
Quassnoi almost 11 years
@Gili: "you don't need a SELECT FOR UPDATE in serializable mode", with InnoDB. With the other MVCC systems, the two are synonyms and you do need SELECT FOR UPDATE.
-
Gili almost 11 years
I think "What transaction isolation levels do guarantee [...] is the consistency of data once transactions are completed." incorrectly implies that isolation levels don't affect what happens during a transaction. I recommend revising this section and providing more detail about how they impact what you see (or don't see) during a transaction.
-
Gili almost 11 years
I find Colin's post answers my specific questions better than your answer but I appreciate all the references you provided. I will accept an answer that best combines the two (specific answers on top, supporting references below).
-
Gili almost 11 years
I find your post answers my specific questions better than Quassnoi's but I appreciate all the references he provided. I will accept an answer that best combines the two (specific answers on top, supporting references below).
-
Gili almost 11 years
"This depends on the concurrency control your database system is using": I think you're splitting hairs. All cases that you list below say that the room isn't deleted between the SELECT and the end of the transaction. So, shouldn't the answer simply be "Yes", with the supporting references below?
-
Quassnoi almost 11 years
@Gili: in Oracle, PostgreSQL and MySQL with InnoDB in READ COMMITTED mode, the room may be deleted in another transaction after a SELECT without FOR UPDATE. Actually it can even in SQL Server, if the DELETE query starts after the first SELECT completes.
-
Gili almost 11 years
Yes, but we're not talking about SELECT without FOR UPDATE. The question specifically asks about SELECT with FOR UPDATE. I think you are also splitting hairs in the answer to the second question: "If you don't want to show the rooms after they have been deleted, you should lock the rows with SELECT FOR UPDATE". When choosing between SERIALIZABLE or READ_COMMITTED with SELECT ... FOR UPDATE, either approach will fix the application but the database might vary how this gets implemented under the hood. Correct me if I'm wrong.
-
Quassnoi almost 11 years
@Gili: in your first question, you ask whether FOR UPDATE "should" be used. Per RFC definition of "should", you are asking about "valid reasons in particular circumstances to ignore a particular item", with "full implications understood and carefully weighed before choosing a different course". I'm trying to explain such implications so that you can understand and carefully weigh them. In some systems, you can get away without specifying FOR UPDATE. Say, in SQL Server one does not simply use SELECT FOR UPDATE, not without declaring a cursor.
-
Gili almost 11 years
So in part 1 you're saying MVCC systems need SELECT ... FOR UPDATE while some other systems are atomic without it. And in part 2 you're saying SELECT ... FOR UPDATE is necessary for READ_COMMITTED or REPEATABLE_READ but not SERIALIZABLE. Is that correct? If so, please try rephrasing the answer in a more concise way. I'm having problems seeing the forest for the trees, if you know what I mean.
-
Chao over 9 years
Locking and isolation are interchangeably complicated. So are there any books to get the knowledge about that?
-
zambotn about 2 years
Instead of using SELECT FOR UPDATE can't I just open a transaction every time I need multiple instructions to be atomically executed? It sounds more straightforward to me...
-
Quassnoi about 2 years
@zambotn: the definition of "atomically" comes with a whole lot of fine print, which this question and answer are all about.
-
zambotn about 2 years
@Quassnoi fair enough... To me "atomically" means basically SERIALIZABLE (instructions are executed as if there is no other instruction executed while my critical zone is active). The purpose of my question was understanding how this example can be close to a real-world scenario. Does SELECT FOR UPDATE work so much better than a SERIALIZABLE transaction in this example?
-
Quassnoi about 2 years
@zambotn: you'll have to define "better" (in a separate post, not in comments). Is it ok if the delete locks? Is it ok if the room is returned by the select after it's been deleted in a concurrent transaction and the delete has been committed? And so on.