I thought you were going to get some sleep!

Don't worry; it'll wait a little while... I've got most of it sorted now. In the end I did have to make the same change to mysql.inc.php on the live system as on the test one. Without it, the text in some, but not all, of the posts was being encoded differently on output, which is weird: e.g. the apostrophe stored as e2 80 99 in the database came out as a C1 control character (c2 92). Setting the character encoding on the database connection seems to have fixed the randomness in output encoding.
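For what it's worth, the e2 80 99 to c2 92 change is exactly what you would get if the curly apostrophe were squeezed into a single Windows-1252 byte and that byte were then read as ISO Latin 1 and re-encoded as utf8. I haven't confirmed that this is what the old connection was doing; the two-step path below is an assumption, and it's in Python purely for illustration (the forum code itself is PHP):

```python
# Sketch only: reproduces the byte sequences seen in the posts under an
# ASSUMED conversion path (utf8 -> cp1252 -> read as latin-1 -> utf8).

def hex_bytes(raw: bytes) -> str:
    """Format bytes as space-separated lowercase hex, e.g. 'e2 80 99'."""
    return " ".join(f"{b:02x}" for b in raw)

apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe

# What the database holds for that character.
stored = apostrophe.encode("utf-8")

# Assumed step 1: the character is converted to a single-byte charset
# that behaves like Windows-1252, where the curly apostrophe is 0x92.
single_byte = apostrophe.encode("cp1252")

# Assumed step 2: that byte is interpreted as ISO-8859-1, where 0x92 is
# the C1 control character U+0092, and then re-encoded as UTF-8.
misread = single_byte.decode("latin-1")
output = misread.encode("utf-8")

print(hex_bytes(stored))       # e2 80 99
print(hex_bytes(single_byte))  # 92
print(hex_bytes(output))       # c2 92
```

If that is the mechanism, it would also explain why only some posts were affected: plain 7-bit ASCII text comes through every one of those steps unchanged.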

One new post was affected by the upgrade, prior to the change to mysql.inc.php, but it was a user reporting an issue with the encoding and including a sample, so I'm not fixing that post (the database shows that the string seems to have been double-encoded on input, probably utf8-as-latin1 to utf8). The others seem to be OK.
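To show what I mean by double-encoded, here is a rough sketch of the utf8-as-latin1-to-utf8 round trip and how it could be reversed. The function names are made up for this example, and it assumes strict ISO-8859-1; a server whose "latin1" is really Windows-1252 would produce different bytes for the 0x80 to 0x9f range.

```python
# Sketch only: hypothetical helpers illustrating double encoding, assuming
# the misinterpretation used strict ISO-8859-1 (latin-1).

def double_encode(text: str) -> bytes:
    """Encode as UTF-8, misread those bytes as latin-1, encode as UTF-8 again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")

def repair_double_encoding(raw: bytes) -> str:
    """Undo double_encode().

    Raises UnicodeEncodeError or UnicodeDecodeError if the input was not
    actually double-encoded, so a failure means "leave this one alone".
    """
    return raw.decode("utf-8").encode("latin-1").decode("utf-8")

sample = "it\u2019s"              # "it's" with a curly apostrophe
broken = double_encode(sample)

print(" ".join(f"{b:02x}" for b in broken))
# 69 74 c3 a2 c2 80 c2 99 73

print(repair_double_encoding(broken) == sample)
# True
```

The three original bytes e2 80 99 each get treated as a separate character and encoded again, which is why one apostrophe turns into six bytes.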

The rest of the text in the database, as far as I can tell, is in utf8 after the export and re-import. I still don't know how mysqldump converted it, and that worries me somewhat: I don't know whether it analysed the non-7-bit-ASCII characters to work out the likely encoding and converted accordingly, or whether it just assumed ISO Latin 1 and proceeded on that basis. So far, however, I've spotted no other issues.

If I do, and they're extensive, I'll add your function to the codebase and call it appropriately.

Last edited by Pak Chan; 10/12/2015 11:06 AM. Reason: Added response to function.