Why am I seeing "COLLATION 'xxx' is not valid for CHARACTER SET 'yyy'"
Why have a COLLATE clause when comparing a BIGINTs compared to literal numbers? Remove the COLLATE clause in the UPDATE statement. -- This is the main solution, as per OP's comments.
Is the code inside a Stored Routine that was build with latin1? Do SHOW CREATE PROCEDURE (or FUNCTION) to see if that were the case. If so, then DROP and reCREATE it with utf8mb4 in force.
It is risky to change character_set_filesystem and character_set_server. Change them back.
Mason G. Zhwiti
Mason G. Zhwiti is a pseudonym to protect my employer.
Updated on June 07, 2022Comments
-
Mason G. Zhwiti almost 2 years
I am on MySQL 5.6.22 (InnoDB) on Amazon RDS. I have attempted to set all of my tables, columns, connection and database charset and collation settings to utf8mb4 / utf8mb4_unicode_ci. I can find no evidence anywhere that anything has charset
latin1
, yet when I execute the following code (either via node-mysql, or directly in "Sequel Pro" app on my Mac):update MyTable m set m.Column8 = 1 where m.Column3 = 26 and m.Column4 = 76 collate utf8mb4_unicode_ci
I get this error message:
COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'
I cannot find anything set to
latin1
in my configuration.Output of
show variables like "char%"
:character_set_client utf8mb4 character_set_connection utf8mb4 character_set_database utf8mb4 character_set_filesystem utf8mb4 character_set_results utf8mb4 character_set_server utf8mb4 character_set_system utf8 character_sets_dir /rdsdbbin/mysql-5.6.22.R1/share/charsets/
Output of
show variables like "collation%"
:collation_connection utf8mb4_unicode_ci collation_database utf8mb4_unicode_ci collation_server utf8mb4_unicode_ci
MyTable's CREATE TABLE info is:
CREATE TABLE `MyTable` ( `Column1` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `Column2` varchar(12) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `Column3` bigint(20) unsigned NOT NULL, `Column4` bigint(20) unsigned NOT NULL, `Column5` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6), `Column6` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6), `Column7` varchar(112) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '', `Column8` tinyint(1) unsigned NOT NULL, `Column9` decimal(16,14) DEFAULT NULL, `Column10` decimal(17,14) DEFAULT NULL, `Column11` bigint(20) unsigned DEFAULT NULL, `Column12` bigint(20) unsigned DEFAULT NULL, `Column13` timestamp(6) NULL DEFAULT NULL, `Column14` timestamp(6) NULL DEFAULT NULL, `Column15` tinyint(4) NOT NULL DEFAULT '1', `Column16` tinyint(4) NOT NULL DEFAULT '1', `Column17` varchar(15) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `Column18` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `Column19` bigint(20) unsigned DEFAULT NULL, PRIMARY KEY (`Column1`), KEY `IX_Reevues_Column3` (`Column3`), KEY `IX_Reevues_Column4` (`Column4`), KEY `IX_Reevues_Column6` (`Column6`), KEY `IX_Reevues_Column8` (`Column8`), KEY `IX_Reevues_Column2` (`Column2`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-
Mason G. Zhwiti about 9 yearsThis SQL statement was the simplest version of my real statement that I could show that still triggered the error. The real statement normally does contain a comparison of two string values, so that's why the COLLATE statement was originally added. Are you suggesting that if no string values are being compared, the COLLATE statement should be removed, or it will generate this type of error? As for #2, I don't see any stored routines with latin1, but I did find some with utf8 instead of utf8mb4, so thank you for that tip. Can you elaborate on point #3?
-
Rick James about 9 yearsYou simplified it too much. Numbers don't need collation. Were the string values in columns or literals?
-
Rick James about 9 years#3 -- I don't know what impact it has. I have essentially never heard of needing to change them, nor seen anyone do them.
-
Mason G. Zhwiti about 9 yearsYou may be onto something here. In going through the history on this call, it looks like it was a copy/paste from a different call that was comparing a string value literal with a string column. COLLATE was there at the time for that, but then later the string comparison was removed, and the COLLATE remained. Do you think if no string columns were involved in the comparison, this is the type of error you'd end up seeing when using COLLATE?
-
Rick James about 9 yearsIt's still a mystery as to where "latin1" came from. Oh... Where is the UPDATE coming from? mysql commandline tool? phpmyadmin? PHP code? etc?
-
Rick James about 9 yearsOops, I see now: " node-mysql, or directly in "Sequel Pro" app " -- check the settings in them.
-
Mason G. Zhwiti about 9 yearsYeah that's what I can't figure out, I've checked all these settings, overriding defaults when necessary, and can't find any references to latin1 anywhere. However, removing COLLATE from this call at least causes the warning to go away, and I'm not expecting any ill effects at this point since I see now this SQL call no longer compares strings.
-
Rick James about 9 yearsHmmm... my character_set_server = 'latin1', but yours is not. You might change that back -- to see what happens.
-
eggyal about 9 years@MasonG.Zhwiti: As to where
latin1
came from, I have filed a bug report. -
Mason G. Zhwiti about 9 years@eggyal Thank you, so this confirms it: my issue is that I accidentally used COLLATE in a situation where there are no string comparisons, and the numbers being compared were being convert to strings on the fly, in the 'latin1' set, likely a bug in MySQL which you filed. If you want to post that as a separate answer, I'd like to split the bounty between you and Rick James. Thanks guys!
-
Rick James about 9 yearsAha! mysql>
SELECT 1 > 2 COLLATE utf8mb4_unicode_ci;
yieldsERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'
. -
Rick James about 9 yearsAre you saying that the over-simplification of the question led to a red herring??
-
eggyal about 9 years@MasonG.Zhwiti: Yes, that's correct. You can't split bounties—award away to Rick. :)
-
Mason G. Zhwiti about 9 years@RickJames Oh, no I was really seeing this in the field, too. ;-) But I was wrong in thinking I was still doing a string comparison in my real query. So my example code was right to highlight an incorrect use of COLLATE. ;) Going to award bounty now. Can you highlight in your answer that this specific issue was in fact the answer, and maybe link to eggyal's bug report? Thanks guys!
-
Mason G. Zhwiti about 9 yearsNeed to wait 4 hours to award.
-
Janus Bahs Jacquet over 3 yearsSo this is five years old, but apparently still a bug. I’m currently being hit by this bug in a similar case where collation + numbers is sort of inevitable. I’m trying to find exact duplicates in a table where there are lots of heavily accented characters. The table is utf8mb4, so I need to collate by utf8mb4_bin to avoid, say, *dʰéh₁(i)-lio- and *dʰéh₁(i̯)-lio- being considered the same (they should be considered different). So I do
SELECT word FROM table GROUP BY word HAVING COUNT(word) > 1
– and get hit with this, because the COUNT() value is an int being compared with collation. :-/ -
Rick James over 3 years@JanusBahsJacquet - Be aware that
i̯
is actually 2 characters, the letteri
and a non-spacing COMBINING INVERTED BREVE BELOW.'i' = 'i̯' COLLATE utf8mb4_unicode_520_ci
is TRUE for most collations, but not _general_ci and _bin. -
Janus Bahs Jacquet over 3 years@RickJames That was just a random example – there are many more where both forms are single characters. I asked a question about it and someone suggested
GROUP BY HEX(word)
as a workaround – that way, no explicit collation is needed. -
Rick James over 3 years@JanusBahsJacquet - Are you using 5.6 like the OP? If not, perhaps it is worth starting a new Question. Note also that
COUNT(word)
counts the word as long asword IS NOT NULL
. Perhaps you wantCOUNT(DISTINCT word)
, which counts the number of distinct values ofword
. (Then we get into dissecting howDISTINCT
works withCOLLATIONs
.) -
Janus Bahs Jacquet over 3 years@RickJames I already did ask a new question, that’s where I got that comment. :-) Yes, unfortunately on 5.6 and unable to upgrade; seems the bug was fixed in 5.7, according to the MySQL bug log.
word
is neverNULL
, so within the grouper result set, it’s fine that all non-null instances are counted. -
Rick James over 3 years@JanusBahsJacquet - Could you provide a link to that other Question.
-
Janus Bahs Jacquet over 3 yearsHere it is: stackoverflow.com/questions/64231720 (copy-pasting is a pain on the iOS app – doesn’t work half the time, freezes the app the other half…).