mysql / deterministic / reads sql data
Can I use the MySQL function characteristic DETERMINISTIC
in
combination with READS SQL DATA
and do I want to?
TL;DR
If the following two groups of statements are the same to you, you want
the DETERMINISTIC
characteristic on your FUNCTION, even if you have
READS SQL DATA
.
SET @id = (SELECT my_func());
SELECT * FROM my_large_table WHERE id = @id;
-- versus
SELECT * FROM my_large_table WHERE id = my_func();
(All of this is tested with MySQL 5.7.16 and some was also tested with MySQL cluster 5.6.)
First, some background
You may or may not have run into this MySQL error at one point:
You do not have the SUPER privilege and binary logging is enabled (you
*might* want to use the less safe log_bin_trust_function_creators
variable)
You may specify certain characteristics to MySQL FUNCTIONs/PROCEDUREs when creating them:
CREATE [DEFINER = { user | CURRENT_USER }] PROCEDURE sp_name ([proc_parameter[,...]]) [characteristic ...] routine_body CREATE [DEFINER = { user | CURRENT_USER }] FUNCTION sp_name ([func_parameter[,...]]) RETURNS type [characteristic ...] routine_body ... characteristic: COMMENT 'string' | LANGUAGE SQL | [NOT] DETERMINISTIC | { CONTAINS SQL | NO SQL | READS SQL DATA | MODIFIES SQL DATA } | SQL SECURITY { DEFINER | INVOKER }
Source: http://dev.mysql.com/doc/refman/5.7/en/create-procedure.html
For instance. We can create a function that returns 4.
mysql> CREATE FUNCTION test_4() RETURNS INT
DETERMINISTIC
NO SQL
RETURN 4;
mysql> select test_4()\G
*************************** 1. row ***************************
test_4(): 4
Not very useful, but it illustrates the most basic layout of a MySQL
function. We added the DETERMINISTIC
and NO SQL
characteristics
because the function always returns the same output for the same input
(deterministic) and it contains no SQL statements (no sql).
DETERMINISTIC characteristic, and how it affects replication
If you happen to create such a function as a non-privileged user on a
slavable MySQL server — i.e. one that creates binlogs that a MySQL
slave can use to replicate the statements (using the log_bin
mysqld
setting) — you would run into the “You do not have the SUPER privilege
and binary logging is enabled” error.
Why?
Because MySQL could use the characteristics from the functions to determine how a statement should be replicated. In this case, the characteristics might tell the replication: if this function is used in a statement, we can execute the same statement on the slave and keep consistent slave records.
To test this out, we create a few different functions and check how they would be replicated.
-- A real deterministic function.
CREATE FUNCTION f_deterministic() RETURNS INT
DETERMINISTIC RETURN 4;
-- A non-deterministic function.
CREATE FUNCTION f_non_deterministic() RETURNS FLOAT
DETERMINISTIC RETURN RAND();
-- A real deterministic function that modifies SQL data.
-- This one inserts a record into test_abc.
CREATE TABLE test_abc(id INT PRIMARY KEY AUTO_INCREMENT, value FLOAT NOT NULL);
DELIMITER ;;
CREATE FUNCTION f_modifies_sql_data() RETURNS FLOAT
DETERMINISTIC MODIFIES SQL DATA
BEGIN
INSERT INTO test_abc (value) VALUES (5);
RETURN 5;
END
;;
DELIMITER ;
If you run it, you get (something like) this:
mysql> SELECT f_deterministic(), f_non_deterministic(), f_modifies_sql_data();
+-------------------+-----------------------+-----------------------+
| f_deterministic() | f_non_deterministic() | f_modifies_sql_data() |
+-------------------+-----------------------+-----------------------+
| 4 | 0.7306850552558899 | 5 |
+-------------------+-----------------------+-----------------------+
If you use binlog-format = STATEMENT
this you get this warning:
Unsafe statement written to the binary log using statement format since
BINLOG_FORMAT = STATEMENT.
The binlog contains this row; statement based replication as expected:
SELECT `testdb`.`f_modifies_sql_data`()
For ROW
and MIXED
binlog formats, the binlog looks like this:
$ mysqlbinlog /var/log/mysql/mysql-bin.000005 --base64-output=decode-rows --verbose
...
### INSERT INTO `testdb`.`test_abc`
### SET
### @1=2
### @2=5
This was unexpected. On this MySQL 5.7.16, for the
MODIFIES SQL DATA
characteristic, the DETERMINISTIC
property is
ignored for the binlogs and instead of replicating the query, it
replicates the altered rows.
Note that that is not a bad thing. This keeps things consistent even
if the function is mislabeled and should’ve been NON DETERMINISTIC
.
Lying in the characteristics
What if we lie to MySQL? And tell it that the function does not modify SQL data?
No problem for MIXED based replication. With DETERMINISTIC NO SQL
the
INSERT statement is still propagated in the binlog.
This appears that the MIXED/ROW based replication is unaffected by labeling/mislabeling of the function characteristics. That’s one thing less to worry about. (For the tested MySQL version only! YMMV!)
Then, is there another reason to get the characteristics right?
Yes there is: query optimizations.
Query optimizations and DETERMINISTIC
Consider this table:
mysql> CREATE TABLE test_seq (id INT PRIMARY KEY NOT NULL, value FLOAT NOT NULL);
mysql> INSERT INTO test_seq VALUES (1,-1), (2, -2), (3, -3), (4, -4), (5, -5), (6, -6);
mysql> SELECT * FROM test_seq;
+----+-------+
| id | value |
+----+-------+
| 1 | -1 |
| 2 | -2 |
| 3 | -3 |
| 4 | -4 |
| 5 | -5 |
| 6 | -6 |
+----+-------+
What happens if we do this query:
SELECT * FROM test_seq WHERE id = f_modifies_sql_data();
Does it return 5
? Yes it does. But it also inserts records into
test_abc
, because we told the function to do so.
mysql> DELETE FROM test_abc;
mysql> SELECT * FROM test_seq WHERE id = f_modifies_sql_data();
+----+-------+
| id | value |
+----+-------+
| 5 | -5 |
+----+-------+
mysql> SELECT * FROM test_abc;
+----+-------+
| id | value |
+----+-------+
| 8 | 5 |
| 9 | 5 |
+----+-------+
Apparently that function was called twice. We would expect once, but a
second time is okay. But what if we relabel it as NOT DETERMINISTIC
?
mysql> DELIMITER ;;
mysql> CREATE FUNCTION f_modifies_sql_data_nondet() RETURNS FLOAT
NOT DETERMINISTIC MODIFIES SQL DATA
BEGIN
INSERT INTO test_abc (value) VALUES (5);
RETURN 5;
END
;;
ERROR 1418 (HY000): This function has none of DETERMINISTIC, NO SQL, or READS
SQL DATA in its declaration and binary logging is enabled (you *might* want to
use the less safe log_bin_trust_function_creators variable)
Heh, a different error. Lets enable log_bin_trust_function_creators
.
This time it writes.
mysql> DELETE FROM test_abc;
mysql> SELECT * FROM test_seq WHERE id = f_modifies_sql_data_nondet();
mysql> SELECT COUNT(*) FROM test_abc\G
*************************** 1. row ***************************
COUNT(*): 6
That’s right! Six records. One record inserted for every comparison
against id
. Here you clearly see the difference between
DETERMINISTIC
and NOT DETERMINISTIC
:
When used in a where_condition, a DETERMINISTIC
function is
called once (maybe twice) while a NOT DETERMINISTIC
function is
checked for every row.
Another clear example would be:
SELECT COUNT(*) FROM test_seq WHERE RAND() > 0.5;
That could return
any one value {0, 1, 2, 3, 4, 5, 6}. But when we wrap RAND()
in a
DETERMINISTIC
-labelled function, the result can only be in {0, 6}.
Makes sense? Yes.
DETERMINISTIC and READS SQL DATA
On to the confusing bits: the internet does not agree on whether
DETERMINISTIC
and READS SQL DATA
can be combined. But as you might
realize at this point, this can be quite a useful combination:
First your custom FUNCTION does its magic and looks up a indexed value. Then you look up the record based on that indexes value, in a potentially huge table. You don’t want the function to be called for every record.
mysql> EXPLAIN SELECT * FROM test_seq WHERE id = f_modifies_sql_data();
+----+----------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+----------+-------+---------------+---------+---------+-------+------+----------+-------+
| 1 | test_seq | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
+----+----------+-------+---------------+---------+---------+-------+------+----------+-------+
mysql> EXPLAIN SELECT * FROM test_seq WHERE id = f_modifies_sql_data_nondet();
+----+----------+------+---------------+------+---------+------+------+----------+-------------+
| id | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+----------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | test_seq | ALL | NULL | NULL | NULL | NULL | 6 | 16.67 | Using where |
+----+----------+------+---------------+------+---------+------+------+----------+-------------+
The difference between instant lookup and an uber-slow where lookup over all your records.
Caching is not an issue
Lastly, one could imagine that MySQL used the DETERMINISTIC
characteristic to cache the result (ignoring the READS SQL DATA
). This
does not appear to be the case. (Again, tested with MySQL 5.7.16,
YMMV! Although I would expect them not to change semantics here
lightly.)
If anything, your function is called too often. It’s even called for every resultant row if used in the select_expression:
mysql> DELETE FROM test_abc;
mysql> SELECT id, f_modifies_sql_data() FROM test_seq;
...
mysql> SELECT COUNT(*) FROM test_abc\G
*************************** 1. row ***************************
COUNT(*): 6
Moral of the story: yes, your function can safely be labelled
DETERMINISTIC
even if it READS SQL DATA
.
The only things you should worry about are other non-derministic
functions (RAND, UUID, …), limits (LIMIT) and an out-of-sync database.
But if your database is out of sync, you have more pressing issues to
worry about. And MIXED/ROW based replication appears to handle all of
that properly anyway. As we’ve seen, it safely replicates
MODIFIES SQL DATA
functions in all cases.
This matches with the statements made on Binary Logging of Stored Programs: In general, the issues described here result when binary logging occurs at the SQL statement level. If you use row-based binary logging, the log contains changes made to individual rows as a result of executing SQL statements.
That also means that the warnings/errors related to the
log_bin_trust_function_creators
can be safely ignored when you use
anything other than STATEMENT based binary logging.