UTF-8 and Textpattern
From time to time, questions about “funny characters” pop up in the forums. Very often they are related to character set issues. I’ve been working on and off on an article that explains exactly what Textpattern is doing and how you can find out whether you have any problems. You can find the article in Textbook: Unicode Support in Textpattern. It’s not finished yet and you don’t have to read it all, but at least take a look at the diagnostics part, which will help you troubleshoot.
For new users: if you are installing Textpattern, you don’t have to worry about a thing. Textpattern will do all the right things (and it has done so since 1.0rc5).
For upgraders from older versions, things are OK most of the time, but not always. Under certain circumstances (MySQL version 4.1 and a certain set of defaults), you could end up in a situation where your tables were using a different charset from the one PHP was using. And it seems we were not the only ones relying on defaults: while researching the problem I took a look at four popular open source CMS and weblog systems, and all of them are still doing it that (potentially problematic) way now. (I’ve written to some and will notify the rest as well.)
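To make the mismatch concrete, here is a minimal Python sketch of what goes wrong when UTF-8 bytes are read back as if they were Latin-1. It only simulates the effect with Python's own codecs and does not talk to MySQL or Textpattern; the function name is made up for this illustration, and MySQL's latin1 differs from Python's "latin-1" codec for a handful of byte values, so treat it as an approximation.

```python
# Illustration only: simulates a charset mismatch in plain Python.
# The function name is hypothetical and not part of Textpattern or MySQL.

def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


original = "caf\u00e9"  # "café"
garbled = misread_utf8_as_latin1(original)

# The single character "é" is two bytes in UTF-8 (0xC3 0xA9).
# Read as Latin-1, each byte becomes its own character.
print(garbled)  # cafÃ©
```

This kind of two-character sequence in place of one accented letter is the typical shape of the “funny characters” people report.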
So if you have MySQL 4.1, are an upgrader, and find out that you have mismatching character set values (as explained in the diagnostics section of the article in Textbook), what can you do? Well, as you can see, that part is still kind of missing from the article. Part of the reason is that there is no automated way to deal with this: depending on how, when and for how long the issue has been there, an automated attempt to fix it might make it worse. The other part is that I am not sure what the best way to deal with it is: there are a lot of different situations you might start out with and a lot of potential solutions one might try. It’s very time consuming to test and document them all, and given that only a few people are actually having issues with this, the question is whether it’s worth investing so much time in it.
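A small Python sketch may help show why a blind, automated repair is risky. It assumes the damage is of the "UTF-8 read as Latin-1" kind and uses a made-up helper, `try_repair`, that reverses exactly one round of it; this is not the procedure from the forums and not anything Textpattern does, just an illustration of how rows in different states react differently to the same fix.

```python
# Illustration only: try_repair is a hypothetical helper, not a Textpattern
# or MySQL function. It reverses one round of "UTF-8 read as Latin-1".

def try_repair(text):
    """Return the text with one round of mis-decoding undone, or None."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the text has characters outside Latin-1, or its bytes are
        # not valid UTF-8, so this particular repair does not apply.
        return None


correct = "caf\u00e9"                             # stored correctly
once = correct.encode("utf-8").decode("latin-1")  # garbled once
twice = once.encode("utf-8").decode("latin-1")    # garbled twice

print(try_repair(once) == correct)  # one pass fixes a row garbled once
print(try_repair(twice) == once)    # a row garbled twice is still garbled
print(try_repair(correct))          # a correct row cannot be "repaired"
```

A table can contain all three kinds of rows at once, depending on when each article was saved, so a single pass is too little for some rows and inapplicable to others. Worse, text that legitimately contains a sequence such as "Ã©" would be silently changed by this kind of repair.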
But not all hope is lost! Fortunately mamash has explained a procedure in the forums:
http://forum.textpattern.com/viewtopic.php?pid=72812#p72812
that might help in such a situation. So if you do have the outlined problem and can try the solution posted there, please report your experiences in the forums. Be as specific as you can: document what the diagnostics were showing before, what kind of problems you were having (including your actions, the expected behaviour and the actual behaviour), and what you did that solved it (or did not solve it).

http://bugs.mysql.com/bug.php?id=9948
has a big part of the responsibility for the problems people are having. It was the reason server defaults stopped being used for the connection charset. Apparently they’ve fixed it now as well.
— Sencer · Sep 3, 08:36 AM · #
The problem is on the MySQL side of the fence: articles that were stored using an older version of Textpattern and/or MySQL are now “helpfully” reinterpreted by the character set handling in MySQL. Textpattern tries its best to ensure MySQL doesn’t mangle anything, but that’s quite a task given the permutations involved (different Textpattern and MySQL versions, and articles that might have been stored using a different combination from the one that is fetching them).
— Alex · Sep 5, 01:35 AM · #