▼
I read Ben Cartwright Cox' (extensive) blog post Grave flaws in BGP Error handling and then saw his talk about the same topic at NLNOG on Youtube.
The necessary background
In addition to the "well-known" BGP path attributes that we all know (because the RFC says we must) and love (because they make the internet work), it's also possible define new attributes to provide new functionality. These can be "transitive" attributes, which means that a BGP router that doesn't recognize them propagates them to its BGP neighbors unchanged.
The ability to create new optional transitive attributes has allowed us to run BGP version 4 for three decades without having to bump the version number because we had to make backward-incompatible changes that would make adoption all but impossible.
For instance, 32-bit autonomous system numbers were added as the 16-bit BGP AS numbers started to run out. In addition to the well-known (mandatory) 16-bit AS path, an optional 32-bit AS path was added. If a router in the middle didn't understand the 32-bit AS path, it would update the 16-bit AS path and propagate the 32-bit AS path unchanged.
The next 32-bit capable BGP router can then add back the AS numbers from the 16-bit path that are missing from the 32-bit path, and 32-bit AS numbers work even if routers in the middle don't understand them. (They just see "23456".)
Of course you can read all about BGP attributes in my book Internet Routing with BGP. It's even in the sample chapters! (Page 15.)
The error handling issue
In Ben's blog post, he talks about a Brazilian network included a malformed version of a still experimental attribute. All the big routers in the core of the internet don't run experiments, so they just saw an attribute they didn't recognize, and propagated it as per the transitive setting. Eventually BGP updates with the broken attribute arrived at routers that did understand the attribute, but saw that it was broken.
So as per the original BGP spec, they tore down the BGP session towards the router that sent them the broken attribute. And then, after a short delay, tried to set up a new BGP session towards that neighboring router. Only to encounter the same error again and tearing down the BGP session again. And so on.
Which is probably not wat you want. Which is nicely explained in RFC 7606, published in 2015, which suggests to treat such errors as if the neighboring router had asked to withdraw the route containing the offending path attribute. So if a neighbor tells me prefixes 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 are reachable through them, and 172.16.0.0/12 has a broken attribute, I just act as if my neighbor had told me that 172.16.0.0/12 is not reachable through them. But I don't bring down the BGP session so 10.0.0.0/8 and 192.168.0.0/16 remain reachable through the neighbor in question.
Ben seems to be rather annoyed that many router vendors don't implement the RFC 7606 behavior, implement it but don't enable it by default, and/or don't have a bug bounty program to reward security researchers for pointing out these deficiencies. He spent a good amount of time evaluating different implementations and then "fuzzing" attributes to see what would happen, So that's somewhat understandable. Here is his score card from his presentation slides:
My take
I agree that the RFC 7606 handling by default is what you want. I also agree that changing a default here, something router vendors loathe to do, shouldn't be problematic.
However, these are pretty obscure errors. This is not an internet extinction level issue.
For my own network, I would strongly prefer a mechanism to turn off handling of these often rather frivolous new attributes. Both to avoid being bitten by buggy implementations elsewhere, but also to avoid inflating BGP messages. As BGP updates propagate, the AS paths (the 16- and 32-bit versions) increase in length, so an update that was just under the limit at some point will exceed the maximum size of 4096 bytes at some point, and then definitely bad things will happen.
However, it's important that new transitive attributes aren't filtered out wholesale, as that would make it impossible to add new features to BGP. I'm not sure if there is a workable way to put a stop to frivolous BGP path attributes being injected into the global routing system while at the same time not robbing BGP of its forward compatibility with future new innovations.
Permalink - posted 2023-10-02
▼
Eind augustus liet Pieter Omtzigt weten inderdaad mee te gaan doen aan de verkiezingen van 22 november met een nieuwe partij: Nieuw Sociaal Contract. Dat zal schrikken geweest zijn voor de andere partijen. Maar veel bleef en blijft onduidelijk: een lijst komt pas over een week en een verkiezingsprogramma zelfs pas eind oktober. Ook heeft hij nog geen uitsluitsel gegeven of hij wel in alle kieskringen mee gaat doen.
Maar vorige week zondag in het programma Buitenhof zei hij dat hij zou willen meeregeren. In dat geval ligt het voor de hand dat je zoveel mogelijk zetels probeert te halen, zodat meer coalities met jou mogelijk worden en minder coalities zonder jou. Ik verwacht dus dat hij gewoon met een lijst met 40 of 50 kandidaten komt.
Een politiek duider suggereerde dat Omtzigt met een toch vrij conservatief programma gaat komen. Bij de algemene politieke beschouwingen gaf hij aan dat hij de arbeids- en studiemigratie stevig wil beperken, met een totaal van 50.000 netto immigranten per jaar. Daarnaast maakt Omtzigt een groot punt van bestaanszekerheid, en hij staat voor kritiek op (de houding van) het kabinet. Dat kan hem heel goed populair maken bij een groep kiezers die relatief slecht bediend wordt: sociaal-conservatieve en economisch-linkse kiezers.
Ondertussen lijkt Caroline van der Plas niet echt lekker in de wedstrijd te zitten. Ze beschuldigt de pers ervan haar nummer 2, Mona Keijzer, ten onrechte "premierskandidaat" te noemen, terwijl dat woord gewoon in de uitnodiging stond. Ze haalt ook nog twee ex-FvD'ers en een ex-PVV'er binnen, en lijkt dus duidelijk aan de rechterkant van het politieke spectrum geland te zijn. En ze is regelmatig kribbig en kortaf in de media. Nog interessanter: bij de algemene politieke beschouwingen nam ze het op voor arbeidsmigratie, we hebben die mensen toch nodig om het land (lees: de kassen en de slachthuizen) te laten draaien. Met dit standpunt kan ze zomaar een hoop kiezers richting Omtzigt jagen.
Ondertussen slaat Dilan Yeşilgöz de gebruikelijke rechtse VVD-taal uit maar zegt ook dat er in het verleden fouten gemaakt zijn, zonder echter duidelijk over die fouten en helemaal over de oplossingen te zijn. Het partijprogramma van de VVD is erg vaag, het wordt wachten op de doorrekening door het CPB om te zien wat de pijnpunten zijn. Ondertussen heeft ze de deur naar regeren met de PVV open gezet.
Als er straks een meerderheid mogelijk is voor VVD, PVV en BBB (die ook Wilders niet uitsluiten) dan kan dit een briljante zet blijken te zijn omdat ze dan links en rechts tegen elkaar uit kan spelen. Maar als zo'n rechtse coalitie niet mogelijk is, dan heeft ze wel de nadelen en niet de voordelen. De nadelen bestaan in elk geval niet uit een opstand onder VVD-leden. Die zitten er niet echt mee. Maar de stemmers? Er zullen vast wat VVD-stemmers zijn die niks zien in een coalitie met Wilders. (Zeker als ze oud genoeg zijn om zich ditzelfde experiment ruim tien jaar geleden te herinneren, waar Wilders wegliep toen er moeilijke beslissingen genomen moesten worden.) En de mensen die juist enorm vóór regeringsdeelname van de PVV zijn zullen misschien strategisch op de PVV in plaats van op de VVD stemmen.
Ondertussen wist Geert Wilders Thierry Baudet te laten zeggen dat Baudet erg veel ziet in samenzweringstheoriën over de aanslagen op 11 september 2001 en de maanlandingen. Wat Wilders vervolgens niet in dank afgenomen werd door een deel van zijn aanhangers, die nu misschien zullen vertrekken. En er zullen ook vast sowieso wat PVV-kiezers naar BBB en/of NSC gaan.
Ondertussen valt de SP door de mand als een partij die alleen maar cadeautjes uitdeelt, zelfs als die grotendeels bij de rijken terechtkomen: zij steunen een plan van de VVD om het aflopen van de huidige accijnsverlaging op diesel en benzine een jaar uit te stellen. De economen vegen de vloer aan met deze maatregel: ongericht, schadelijk voor het klimaat. Maar ja, minstens één van de V's in VVD staat voor vrrroem! Bizar dat de SP niet ziet dat je met datzelfde geld veel meer kan betekenen voor de lage en midden-inkomens.
En dan de Partij voor de Dieren. Toch vrij pijnlijk als je bestuur je fractievoorzitter tot iedereens verassing niet de lijsttrekker wil maken, minder dan drie maanden voor de verkiezingen. Vervolgens vertrekt dat bestuur, mede dankzij druk van de advoca(a)t(en) van voornoemde fractievoorzitter Esther Ouwehand. Maar de integriteitsmeldingen waar het allemaal mee begon blijven liggen. Ouwehand lijkt bezig geweest te zijn een bestuurslid op nogal onfrisse wijze te wippen, en er waren ook aanzienlijke aanmerkingen op haar leidinggeven. Verassend ook dat Ouwehand de problemen volledig buiten haarzelf legt. De gestaalde dierenliefhebbers zullen zich hierdoor niet laten afleiden, maar nieuwe aanhang trekken met interne ruzie wordt lastig.
Bij de ABP werden twee moties ingediend om het minimumloon (en daarmee AOW en uitkeringen) te verhogen met 1 - 2 %. Groenlinks/PvdA onderbouwden dit financieel met onder meer een aantal belastingverhogingen. Caroline van der Plas diende een erg vage motie van vergelijkbare strekking om het minimumloog "enigszins" te verhogen en dit te betalen "uit de uitgavenkant". Mede-ondertekend door Pieter Omtzigt.
De rest van de kamer was met stomheid geslagen, en kon alleen maar lachen bij de toelichting van Van der Plas. Wat hier gebeurd lijkt te zijn is dat BBB en NSC de belastingverhogingen die dit alles moesten dekken in de motie van GL/PvdA en anderen (die aangenomen werd) niet steunden maar wilden laten zien dat ze wel voor verhoging van het minimumloon waren. Van der Plas kreeg de beschuldiging gratis bier te verkopen. Omtzigt kreeg geen verbaasd gelach over zich heen, maar zijn reputatie van degelijkheid liep wel een flinke deuk op.
Conclusies
Ok, dat was een hoop tekst. Wat concludeer ik uit deze ontwikkelingen?
In de peilingen zien we drie groepen partijen: VVD, GL/PvdA en NSC die consequent minstens 20 zetels peilen, PVV en BBB die minder dan 20 maar meer dan 10 scoren, en alle anderen die onder de 10 blijven. Gezien bovenstaand verwacht ik daarin geen veranderingen.
De grote vraag wordt wie het beter gaat doen dan de huidige peilingen en wie slechter. Slechter is bij allen mogelijk. Ik verwacht dat VVD en GL/PvdA niet veel lager dan 25 zetels uit zullen komen, die hebben een duidelijke kiezersgroep die weinig reden heeft om te vertrekken, zelfs als alles niet perfect is.
Vooralsnog wordt het optreden van Yeşilgös positief beoordeeld, maar ik ben benieuwd hoe ze overeind blijft bij tegenwind en in de debatten. Frans Timmermans heeft al wat tegenwind maar heeft laten zien verkiezingen te kunnen winnen en is tot nu toe redelijk onzichtbaar.
Mijn inschatting is dat Pieter Omtzigt aan zijn maximum potentieel zit met de huidige bijna 30 zetels in de peilingen, en eigenlijk alleen maar teleur kan stellen. Misschien nauwelijks, misschien aanzienlijk.
Als inderdaad VVD, GL/PvdA en NSC ergens tussen de 25 en de 30 zetels scoren en PVV en BBB zo'n 15, dan zal veel, zo niet alles, afhangen van de voorkeuren van Pieter Omtzigt. Getalsmatig is dan VVD+NSC+GL/PvdA het makkelijkst. Wellicht dat Frans Timmermans dit ziet zitten als hij de grootste wordt en dus premier kan worden. Maar juniorpartner in een coalitie met twee rechtsere partijen? Dat lijkt me een stuk moeilijker.
Als dat het niet wordt betekent het dat Omtzigt ofwel over rechts ofwel over links moet. In beide gevallen waarschijnlijk met een minderheidskabinet. Over rechts zou VVD+NSC+BBB worden met zo'n 70 zetels. Tel daar de SGP bij op en je hebt zo goed als een meerderheid en je zal toch wel iets naar andere partijen moeten luisteren om dingen door de eerste kamer te krijgen.
De andere optie is GL/PvdA+NSC. Wellicht dat D66, ChristenUnie, Volt, Partij voor de Dieren en/of SP hier aan mee zouden willen doen. Maar zelfs als ze niet meedoen ligt het voor de hand dat ze vaak mee zullen stemmen.
Het kan nog bijna alle kanten op. Alleen eigenlijk geen kanten met Geert Wilders erin.
Permalink - posted 2023-09-24
▼
20 years ago today, the TV show Mythbusters aired its first regular episode on the Discovery Channel, after three earlier pilot episodes. And soon after, I had found a new favorite TV show.
What Mythbusters did between 2003 and 2016 was investigate whether (urban) myths, viral videos, movie stunts and idioms had some truth to them or not. The usual outcomes were: confirmed (results replicated without issue), plausible (results replicated but possibly some liberties taken) and busted (results could not be reasonably duplicated). If a myth was busted, they would often see what it took to replicate the results, going to extremes as needed.
I think the first episode I saw was "penny drop". That was the seventh episode, with the main myth being that a penny (one cent coin) dropped from the Empire State Building would kill you if it hit you down on the street. The big question here was how fast a penny would be traveling after being dropped from the iconic skyscraper.
At this point, we only had the two main hosts, Jamie Hyneman and Adam Savage. Adam had built a rig with a tube with holes in it and air being blown in from the bottom with the airspeed in the tube going down with increasing height from 100 km/h to about 60 km/h. As the pennies stayed in the tube, their terminal velocity must match a speed between those two numbers.
With the always exuberant Adam annoyed by the usually stoic Jamie:
Jamie:
Works for me.
Adam:
This is excited Jamie? This is a world first going out on television that we showed exactly how fast a penny goes. No math, no ideas, no timing off a building, we got it right here in the lab.
Jamie:
It's great. Good job.
Turns out those falling pennies aren't lethal. Adam and Jamie both have one fired at their hand with no damage, and...
Adam:
Come on, shoot me in the ass, come on. I can take it.
The interaction between the hosts was what made Mythbusters so funny.
That dynamic wouldn't change very much over the following almost decade and a half. For most of the run of the show, the "build team" consisting of Grant Imahara, Kari Byron and Tony Belleci would usually bust separate myths, but sometimes the main and secondary hosts would team up.
They really caught lighning in a bottle here. The interactions between the hosts were almost always fun, and as experienced builders with some notion of the scientific method, the Mythbusters weren't completely lost when it came to attacking myths, but they were also not so knowledgeable that they could predict the results very well in advance.
A few more quotes:
Adam:
I reject your reality and substitute my own.
Jamie:
I love it when common materials are dangerous.
Narrator Robert Lee:
This thing that looks like a maximum security bird house, is actually the vital fulcrum.
Adam, about to go through a carwash with Jamie on the roof of his car:
As long as we're testing methods for removing unwanted pests from the outside of your car, we couldn't finish this without going through the carwash.
Tory:
It's not the prettiest thing but I think it's going to work.
Kari: That’s what they said when they hired me.
Great stuff.
Permalink - posted 2023-09-23
▼
After looking at the SQLite Unicode behavior, it's now time to do the same for MySQL. Coincidentally, I'm currently migrating some old databases that were created in the very early 2000s to a more modern environment. I think those old databases were from the MySQL 3.x days, before MySQL gained any sort of Unicode support. Those old tables are thus still in the latin1 (ISO 8859-1) character set.
For a long time that worked fine for me as these tables contain data in English and Dutch, and Dutch only needs a few accents, which are all present in Latin 1. However... at some point it started becoming more convenient to use UTF-8 for the text in those databases. So I did. I just never told MySQL about the switch.
In hindsight, that was not a terrible decision, as it avoided complexities at a time when UTF-8 support was still immature. But these days, there's really no excuse to do anything other than make an entire workflow UTF-8 clean, barring any showstoppers.
So I migrated an old table to the test environment for the new system. And got some really weird results: on web pages "CO₂" showed up as "COâ‚‚", but in MySQL it showed up correct, be it that the number of characters is off (I only got 6 while I specified 8):
mysql> select substr(article, 1587, 8) from muart where id = 753;
+--------------------------+
| substr(article, 1587, 8) |
+--------------------------+
| CO₂ ui |
+--------------------------+
Further digging by looking at the raw data:
mysql> select hex(substr(article, 1587, 8)) from muart where id = 753;
+-------------------------------+
| hex(substr(article, 1587, 8)) |
+-------------------------------+
| 434FC3A2E2809AE2809A207569 |
+-------------------------------+
Fortunately, it's not necessary to decode this manually, there is an UTF-8 Decoder web tool for that. The decoder shows that COâ‚‚ is correct. So why is MySQL showing me CO₂? That's a problem I hadn't heard of before, but is apparently not uncommon:
Double-encoded UTF-8.
This happens when you take UTF-8, such as "CO₂", and then pretend it's another encoding (usually Latin 1) and convert that to UTF-8. So what had happened is that as I was importing my data in MySQL, the MySQL command line client would send UTF-8 to the MySQL server process, but the server would think that was Latin 1 and convert it to UTF-8. Then when I did a query, the server would convert the UTF-8 back to what it thought was Latin 1 for the convenience of the client, and the client then showed this as UTF-8 so everything looked good, but was actually stored in the database incorrectly.
Fortunately, the two related problems were both easy enough to fix. First, make sure the client and server agree on the character encoding. Let's first check what the server's original settings are:
mysql> SHOW SESSION VARIABLES LIKE 'character_set%';
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+
Wow. But fortunately we don't have to fix all of those individually. We can simply do:
mysql --default_character_set=utf8mb4
And then:
mysql> SHOW SESSION VARIABLES LIKE 'character_set%';
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+
Much better.
And convert the double-encoded UTF-8 into something more readable:
mysql> UPDATE muart SET title=CONVERT(CAST(CONVERT(title USING latin1) AS binary) USING utf8mb4);
Keen-eyed observers may have noted that MySQL has two versions of UTF-8: utf8mb3 and utf8mb4. This goes back to the early days of Unicode, where the idea was that all characters would fit into 16 bits. That results in a maximum of 3 bytes of UTF-8. But it soon became clear that 16 bits wasn't enough. So now it's 21 bits. UTF-8 can handle that just fine, but those characters that need 17 - 21 bits result in 4-byte UTF-8 sequences. So when dealing with MySQL, when you think "UTF-8", type "utf8mb4".
Permalink - posted 2023-09-21
https://en.wikipedia.org/wiki/European_ordering_rules
But as the transition from 8-bit characters to 16-bit characters was already well underway in the Windows world and in languages such as Java and Javascript,
# mysql --default_character_set=utf8mb4 muada
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or g.
Your MySQL connection id is 98
Server version: 8.0.34 MySQL Community Server - GPL
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> SHOW SESSION VARIABLES LIKE 'character_set%';
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+
DUCET
https://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table
https://www.unicode.org/reports/tr10/
http://www.unicode.org/Public/UCA/latest/allkeys.txt
https://en.wikipedia.org/wiki/European_ordering_rules
https://en.wikipedia.org/wiki/ISO/IEC_14651
mysql> insert into test (id, text) values (5, concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)));
Query OK, 1 row affected (0.00 sec)
mysql> insert into test (id, text) values (6, concat('Zoe', char(12*16 + 12, 8*16 + 8 using utf8mb4)));
mysql> insert into test (id, text) values (7, 'Zoe');
mysql> select * from test order by text collate utf8mb4_bin;
+----+-------+
| id | text |
+----+-------+
| 1 | ZOE |
| 4 | ZoA |
| 7 | Zoe |
| 6 | Zoë |
| 3 | Zoy |
| 5 | Zoë |
| 2 | zoe |
+----+-------+
7 rows in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_bin) as sortweight from test order by
text colla
+----+-------+----------------------------+
| id | text | sortweight |
+----+-------+----------------------------+
| 1 | ZOE | 0x00005A00004F000045 |
| 4 | ZoA | 0x00005A00006F000041 |
| 7 | Zoe | 0x00005A00006F000065 |
| 6 | Zoë | 0x00005A00006F000065000308 |
| 3 | Zoy | 0x00005A00006F000079 |
| 5 | Zoë | 0x00005A00006F0000EB |
| 2 | zoe | 0x00007A00006F000065 |
+----+-------+----------------------------+
mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test order by text collate utf8mb4_0900_ai_ci;
+----+-------+------------------------+
| id | text | sortweight |
+----+-------+------------------------+
| 4 | ZoA | 0x1F211DDD1C47 |
| 1 | ZOE | 0x1F211DDD1CAA |
| 2 | zoe | 0x1F211DDD1CAA |
| 5 | Zoë | 0x1F211DDD1CAA |
| 6 | Zoë | 0x1F211DDD1CAA |
| 7 | Zoe | 0x1F211DDD1CAA |
| 3 | Zoy | 0x1F211DDD1F0B |
+----+-------+------------------------+
mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test order by text collate utf8mb4_0900_ai_ci desc;
+----+-------+------------------------+
| id | text | sortweight |
+----+-------+------------------------+
| 3 | Zoy | 0x1F211DDD1F0B |
| 1 | ZOE | 0x1F211DDD1CAA |
| 2 | zoe | 0x1F211DDD1CAA |
| 5 | Zoë | 0x1F211DDD1CAA |
| 6 | Zoë | 0x1F211DDD1CAA |
| 7 | Zoe | 0x1F211DDD1CAA |
| 4 | ZoA | 0x1F211DDD1C47 |
+----+-------+------------------------+
mysql> select *, weight_string(text collate utf8mb4_0900_as_ci) as sortweight from test order by text collate utf8mb4_0900_as_ci;
+----+-------+------------------------------------+
| id | text | sortweight |
+----+-------+------------------------------------+
| 4 | ZoA | 0x1F211DDD1C470000002000200020 |
| 1 | ZOE | 0x1F211DDD1CAA0000002000200020 |
| 2 | zoe | 0x1F211DDD1CAA0000002000200020 |
| 7 | Zoe | 0x1F211DDD1CAA0000002000200020 |
| 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B |
| 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B |
| 3 | Zoy | 0x1F211DDD1F0B0000002000200020 |
+----+-------+------------------------------------+
mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test order by text collate utf8mb4_0900_as_cs;
+----+-------+--------------------------------------------------------+
| id | text | sortweight |
+----+-------+--------------------------------------------------------+
| 4 | ZoA | 0x1F211DDD1C4700000020002000200000000800020008 |
| 2 | zoe | 0x1F211DDD1CAA00000020002000200000000200020002 |
| 7 | Zoe | 0x1F211DDD1CAA00000020002000200000000800020002 |
| 1 | ZOE | 0x1F211DDD1CAA00000020002000200000000800080008 |
| 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 |
| 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 |
| 3 | Zoy | 0x1F211DDD1F0B00000020002000200000000800020002 |
+----+-------+--------------------------------------------------------+
ê
mysql> select *, weight_string(text collate utf8mb4_0900_bin) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_bin;
Empty set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_as_cs;
Empty set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_ai_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_cs;
ERROR 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_cs'
mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_ci;
+----+-------+------------------------+
| id | text | sortweight |
+----+-------+------------------------+
| 1 | ZOE | 0x1F211DDD1CAA |
| 2 | zoe | 0x1F211DDD1CAA |
| 5 | Zoë | 0x1F211DDD1CAA |
| 6 | Zoë | 0x1F211DDD1CAA |
| 7 | Zoe | 0x1F211DDD1CAA |
+----+-------+------------------------+
5 rows in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_ci order by text utfmb4_0900_as_cs;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'utfmb4_0900_as_cs' at line 1
mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_ci order by text collate utfmb4_0900_as_cs;
ERROR 1273 (HY000): Unknown collation: 'utfmb4_0900_as_cs'
mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_ci order by text collate utf8mb4_0900_as_cs;
+----+-------+------------------------+
| id | text | sortweight |
+----+-------+------------------------+
| 2 | zoe | 0x1F211DDD1CAA |
| 7 | Zoe | 0x1F211DDD1CAA |
| 1 | ZOE | 0x1F211DDD1CAA |
| 5 | Zoë | 0x1F211DDD1CAA |
| 6 | Zoë | 0x1F211DDD1CAA |
+----+-------+------------------------+
5 rows in set (0.00 sec)
mysql>
mysql>
mysql>
mysql>
mysql> select *, weight_string(text collate utf8mb4_0900_bin) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_bin;
+----+------+------------------------+
| id | text | sortweight |
+----+------+------------------------+
| 5 | Zoë | 0x5A6FC3AB |
+----+------+------------------------+
1 row in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_as_cs;
+----+-------+--------------------------------------------------------+
| id | text | sortweight |
+----+-------+--------------------------------------------------------+
| 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 |
| 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 |
+----+-------+--------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_ai_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_ai_cs;
ERROR 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_cs'
mysql>
mysql>
mysql>
mysql>
mysql> select *, weight_string(text collate utf8mb4_0900_bin) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_bin;
+----+------+------------------------+
| id | text | sortweight |
+----+------+------------------------+
| 5 | Zoë | 0x5A6FC3AB |
+----+------+------------------------+
1 row in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_as_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_as_ci;
+----+-------+------------------------------------+
| id | text | sortweight |
+----+-------+------------------------------------+
| 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B |
| 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B |
+----+-------+------------------------------------+
2 rows in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_as_cs;
+----+-------+--------------------------------------------------------+
| id | text | sortweight |
+----+-------+--------------------------------------------------------+
| 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 |
| 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 |
+----+-------+--------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_bin) as sortweight from test where text = 'zoe' collate utf8mb4_0900_bin;
+----+------+------------------------+
| id | text | sortweight |
+----+------+------------------------+
| 2 | zoe | 0x7A6F65 |
+----+------+------------------------+
1 row in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test w
here text = 'zoe' collate utf8mb4_0900_as_cs;
+----+------+------------------------------------------------+
| id | text | sortweight |
+----+------+------------------------------------------------+
| 2 | zoe | 0x1F211DDD1CAA00000020002000200000000200020002 |
+----+------+------------------------------------------------+
1 row in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_as_ci) as sortweight from test where text = 'zoe' collate utf8mb4_0900_as_ci;
+----+------+--------------------------------+
| id | text | sortweight |
+----+------+--------------------------------+
| 1 | ZOE | 0x1F211DDD1CAA0000002000200020 |
| 2 | zoe | 0x1F211DDD1CAA0000002000200020 |
| 7 | Zoe | 0x1F211DDD1CAA0000002000200020 |
+----+------+--------------------------------+
3 rows in set (0.00 sec)
mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = 'zoe' collate utf8mb4_0900_ai_ci;
+----+-------+------------------------+
| id | text | sortweight |
+----+-------+------------------------+
| 1 | ZOE | 0x1F211DDD1CAA |
| 2 | zoe | 0x1F211DDD1CAA |
| 5 | Zoë | 0x1F211DDD1CAA |
| 6 | Zoë | 0x1F211DDD1CAA |
| 7 | Zoe | 0x1F211DDD1CAA |
+----+-------+------------------------+
mysql> show collation like '%utf8mb4_uni%';
+------------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 | PAD SPACE |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE |
+------------------------+---------+-----+---------+----------+---------+---------------+
mysql> show collation like '%utf8mb4_0900%';
+--------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+--------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD |
| utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 | NO PAD |
| utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 | NO PAD |
| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD |
+--------------------+---------+-----+---------+----------+---------+---------------+
Permalink - posted 2023-09-16