Why am I seeing "COLLATION 'xxx' is not valid for CHARACTER SET 'yyy'"

mysql sql collation

14,595

Why have a COLLATE clause when comparing a BIGINTs compared to literal numbers? Remove the COLLATE clause in the UPDATE statement. -- This is the main solution, as per OP's comments.
Is the code inside a Stored Routine that was build with latin1? Do SHOW CREATE PROCEDURE (or FUNCTION) to see if that were the case. If so, then DROP and reCREATE it with utf8mb4 in force.
It is risky to change character_set_filesystem and character_set_server. Change them back.

14,595

Author by

Mason G. Zhwiti

Mason G. Zhwiti is a pseudonym to protect my employer.

Updated on June 07, 2022

Comments

Mason G. Zhwiti almost 2 years

I am on MySQL 5.6.22 (InnoDB) on Amazon RDS. I have attempted to set all of my tables, columns, connection and database charset and collation settings to utf8mb4 / utf8mb4_unicode_ci. I can find no evidence anywhere that anything has charset latin1, yet when I execute the following code (either via node-mysql, or directly in "Sequel Pro" app on my Mac):

update MyTable m 
set m.Column8 = 1
where m.Column3 = 26 and m.Column4 = 76
collate utf8mb4_unicode_ci

I get this error message:

COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'

I cannot find anything set to latin1 in my configuration.

Output of show variables like "char%":

character_set_client        utf8mb4
character_set_connection    utf8mb4
character_set_database      utf8mb4
character_set_filesystem    utf8mb4
character_set_results       utf8mb4
character_set_server        utf8mb4
character_set_system        utf8
character_sets_dir          /rdsdbbin/mysql-5.6.22.R1/share/charsets/

Output of show variables like "collation%":

collation_connection        utf8mb4_unicode_ci
collation_database          utf8mb4_unicode_ci
collation_server            utf8mb4_unicode_ci

MyTable's CREATE TABLE info is:

CREATE TABLE `MyTable` (
  `Column1` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `Column2` varchar(12) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Column3` bigint(20) unsigned NOT NULL,
  `Column4` bigint(20) unsigned NOT NULL,
  `Column5` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
  `Column6` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
  `Column7` varchar(112) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `Column8` tinyint(1) unsigned NOT NULL,
  `Column9` decimal(16,14) DEFAULT NULL,
  `Column10` decimal(17,14) DEFAULT NULL,
  `Column11` bigint(20) unsigned DEFAULT NULL,
  `Column12` bigint(20) unsigned DEFAULT NULL,
  `Column13` timestamp(6) NULL DEFAULT NULL,
  `Column14` timestamp(6) NULL DEFAULT NULL,
  `Column15` tinyint(4) NOT NULL DEFAULT '1',
  `Column16` tinyint(4) NOT NULL DEFAULT '1',
  `Column17` varchar(15) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Column18` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Column19` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`Column1`),
  KEY `IX_Reevues_Column3` (`Column3`),
  KEY `IX_Reevues_Column4` (`Column4`),
  KEY `IX_Reevues_Column6` (`Column6`),
  KEY `IX_Reevues_Column8` (`Column8`),
  KEY `IX_Reevues_Column2` (`Column2`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Mason G. Zhwiti about 9 years

This SQL statement was the simplest version of my real statement that I could show that still triggered the error. The real statement normally does contain a comparison of two string values, so that's why the COLLATE statement was originally added. Are you suggesting that if no string values are being compared, the COLLATE statement should be removed, or it will generate this type of error? As for #2, I don't see any stored routines with latin1, but I did find some with utf8 instead of utf8mb4, so thank you for that tip. Can you elaborate on point #3?
Rick James about 9 years

You simplified it too much. Numbers don't need collation. Were the string values in columns or literals?
Rick James about 9 years

#3 -- I don't know what impact it has. I have essentially never heard of needing to change them, nor seen anyone do them.
Mason G. Zhwiti about 9 years

You may be onto something here. In going through the history on this call, it looks like it was a copy/paste from a different call that was comparing a string value literal with a string column. COLLATE was there at the time for that, but then later the string comparison was removed, and the COLLATE remained. Do you think if no string columns were involved in the comparison, this is the type of error you'd end up seeing when using COLLATE?
Rick James about 9 years

It's still a mystery as to where "latin1" came from. Oh... Where is the UPDATE coming from? mysql commandline tool? phpmyadmin? PHP code? etc?
Rick James about 9 years

Oops, I see now: " node-mysql, or directly in "Sequel Pro" app " -- check the settings in them.
Mason G. Zhwiti about 9 years

Yeah that's what I can't figure out, I've checked all these settings, overriding defaults when necessary, and can't find any references to latin1 anywhere. However, removing COLLATE from this call at least causes the warning to go away, and I'm not expecting any ill effects at this point since I see now this SQL call no longer compares strings.
Rick James about 9 years

Hmmm... my character_set_server = 'latin1', but yours is not. You might change that back -- to see what happens.
eggyal about 9 years

@MasonG.Zhwiti: As to where latin1 came from, I have filed a bug report.
Mason G. Zhwiti about 9 years

@eggyal Thank you, so this confirms it: my issue is that I accidentally used COLLATE in a situation where there are no string comparisons, and the numbers being compared were being convert to strings on the fly, in the 'latin1' set, likely a bug in MySQL which you filed. If you want to post that as a separate answer, I'd like to split the bounty between you and Rick James. Thanks guys!
Rick James about 9 years

Aha! mysql> SELECT 1 > 2 COLLATE utf8mb4_unicode_ci; yields ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'.
Rick James about 9 years

Are you saying that the over-simplification of the question led to a red herring??
eggyal about 9 years

@MasonG.Zhwiti: Yes, that's correct. You can't split bounties—award away to Rick. :)
Mason G. Zhwiti about 9 years

@RickJames Oh, no I was really seeing this in the field, too. ;-) But I was wrong in thinking I was still doing a string comparison in my real query. So my example code was right to highlight an incorrect use of COLLATE. ;) Going to award bounty now. Can you highlight in your answer that this specific issue was in fact the answer, and maybe link to eggyal's bug report? Thanks guys!
Mason G. Zhwiti about 9 years

Need to wait 4 hours to award.
Janus Bahs Jacquet over 3 years

So this is five years old, but apparently still a bug. I’m currently being hit by this bug in a similar case where collation + numbers is sort of inevitable. I’m trying to find exact duplicates in a table where there are lots of heavily accented characters. The table is utf8mb4, so I need to collate by utf8mb4_bin to avoid, say, *dʰéh₁(i)-lio- and *dʰéh₁(i̯)-lio- being considered the same (they should be considered different). So I do SELECT word FROM table GROUP BY word HAVING COUNT(word) > 1 – and get hit with this, because the COUNT() value is an int being compared with collation. :-/
Rick James over 3 years

@JanusBahsJacquet - Be aware that i̯ is actually 2 characters, the letter i and a non-spacing COMBINING INVERTED BREVE BELOW. 'i' = 'i̯' COLLATE utf8mb4_unicode_520_ci is TRUE for most collations, but not _general_ci and _bin.
Janus Bahs Jacquet over 3 years

@RickJames That was just a random example – there are many more where both forms are single characters. I asked a question about it and someone suggested GROUP BY HEX(word) as a workaround – that way, no explicit collation is needed.
Rick James over 3 years

@JanusBahsJacquet - Are you using 5.6 like the OP? If not, perhaps it is worth starting a new Question. Note also that COUNT(word) counts the word as long as word IS NOT NULL. Perhaps you want COUNT(DISTINCT word), which counts the number of distinct values of word. (Then we get into dissecting how DISTINCT works with COLLATIONs.)
Janus Bahs Jacquet over 3 years

@RickJames I already did ask a new question, that’s where I got that comment. :-) Yes, unfortunately on 5.6 and unable to upgrade; seems the bug was fixed in 5.7, according to the MySQL bug log. word is never NULL, so within the grouper result set, it’s fine that all non-null instances are counted.
Rick James over 3 years

@JanusBahsJacquet - Could you provide a link to that other Question.
Janus Bahs Jacquet over 3 years

Here it is: stackoverflow.com/questions/64231720 (copy-pasting is a pain on the iOS app – doesn’t work half the time, freezes the app the other half…).