|
Joined: Aug 2004
Posts: 469
Addict
|
UTF-8 seems like the standard nowadays, and work was being done on this before the ownership change-over. Is the move to finally get everything normalized on UTF-8 now on the cards?
|
|
|
|
Joined: Dec 2003
Posts: 6,628 Likes: 85
|
I know it is still planned, along with some major table cleanup.
SD did state that it would be an import, not a normal update.
So I assume he wants to get as many changes as possible done first so we don't have to import again.
Blue Man Group There is no such thing as stupid questions. Just stupid answers
|
|
|
|
Joined: Aug 2004
Posts: 469
Addict
|
Just thought I'd bring back this issue after a year to see how things are going. Are we anywhere close to finally moving to UTF-8?
|
|
|
|
Joined: Apr 2004
Posts: 1,973 Likes: 154
|
I might not understand your question completely, but:
1. In 2006, I completely converted the collation of one of my larger forum's SQL tables to "utf8_general_ci" (UTF-8 Unicode) and never ran into any problems, not even password problems. Then again, I never allowed extended character sets in user names or passwords, and those forums are 100% English, even though they cater mostly to just USA/CANADA/UK/AU/JAPAN.
2. In 2010, as part of making that same forum much more SEO friendly, I updated the charset in the language files / header meta tags from "iso-8859-1" to "utf-8" @ Control Panel > Languages > Language Editor > generic.php > CHARSET
No problems whatsoever.
|
|
|
|
Joined: Jun 2006
Posts: 16,366 Likes: 126
|
It's easy to change the settings in the db and the charset used, but the data already in the db may need to be converted; this is especially true for languages that use characters outside of a Latin character set...
|
|
|
|
Joined: Apr 2004
Posts: 1,973 Likes: 154
|
Backup. Backup. And then backup again.
Then CONVERT your MySQL table. Don't just change the collation, actually convert it. Then REPAIR your tables (also through SQL) so your tables are optimized as they are rebuilt (this is also what the SQL REPAIR command does). Next, rebuild your tables in Control Panel > Content Rebuilder. Probably overkill, but I did it so I wouldn't lose any sleep worrying about not having done it.
If you are certain that you have not introduced any extended character sets to your database, you most likely will not have any problems.
I have had roughly 4,000 users browsing that forum every day since about 2003. When there is a problem, I hear about it. I haven't heard anything about characters displaying funny or posts not being editable because of these changes.
|
|
|
|
Joined: Aug 2004
Posts: 469
Addict
|
Thanks guys.
I need to convert from ISO-8859-2 to UTF-8. The latter seems to have become the standard today, so converting makes sense.
Any ideas on how to run a character conversion on our posts table?
Also, I understand our database size will balloon as an effect of the change. Any idea what kind of increase we're looking at, percentage-wise?
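On the size question, a rough way to estimate it: ASCII characters take one byte in both encodings, while every non-ASCII ISO-8859-2 letter becomes two bytes in UTF-8. So text can at worst double, and posts that are mostly plain letters, digits and markup grow far less. The sketch below only measures text bytes; it says nothing about MySQL index or row overhead, and the sample sentence is just an illustration.

```python
from typing import Tuple


def encoded_sizes(text: str, legacy: str = "iso-8859-2") -> Tuple[int, int]:
    """Return (bytes in the legacy charset, bytes in UTF-8) for the same text."""
    return len(text.encode(legacy)), len(text.encode("utf-8"))


def growth_percent(text: str, legacy: str = "iso-8859-2") -> float:
    """Percentage increase in byte size when the text is stored as UTF-8."""
    old, new = encoded_sizes(text, legacy)
    return 100.0 * (new - old) / old if old else 0.0


# A Polish pangram: 17 characters, 9 of them non-ASCII.
sample = "Zażółć gęślą jaźń"
print(encoded_sizes(sample))            # (17, 26)
print(encoded_sizes("Hello, world"))    # (12, 12), pure ASCII does not grow
print(round(growth_percent(sample), 1))
```

Running something like this over a sample of real posts would give a more honest percentage than any rule of thumb.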
|
|
|
|
Joined: Dec 2003
Posts: 6,628 Likes: 85
|
Question: Assuming I convert my tables to utf8_general_ci, what do I change the generic language file setting to? Currently the CHARSET field is iso-8859-1.
|
|
|
|
Joined: Jun 2006
Posts: 16,366 Likes: 126
|
@Ruben, UTF-8
The real issue is whether one has posts in the table containing characters outside plain ASCII (accented letters, currency symbols and so on); those become multibyte in UTF-8 and may need to be converted to display correctly in the new character set.
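To get a feel for whether a given post is affected, one can look at its raw bytes. The following is a minimal sketch, not part of UBB.threads: the function name is made up for illustration, and it assumes you have the column contents as raw bytes. Pure ASCII needs no conversion; bytes that already decode as UTF-8 are probably fine; anything else is most likely legacy single-byte data.

```python
def classify_post_bytes(raw: bytes) -> str:
    """Classify raw column bytes as 'ascii', 'utf-8' or 'legacy'."""
    if all(b < 0x80 for b in raw):
        return "ascii"    # identical in Latin-1 and UTF-8, nothing to convert
    try:
        raw.decode("utf-8")
        return "utf-8"    # already valid UTF-8
    except UnicodeDecodeError:
        return "legacy"   # single-byte encoded data that needs converting


print(classify_post_bytes(b"Price: 20 pounds"))                 # ascii
print(classify_post_bytes("Price: £20".encode("iso-8859-1")))   # legacy
print(classify_post_bytes("Price: £20".encode("utf-8")))        # utf-8
```

This is a heuristic only: a few Latin-1 byte sequences happen to be valid UTF-8 as well, so a "utf-8" result is not proof.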
|
|
|
|
Joined: Dec 2003
Posts: 6,628 Likes: 85
|
Oh okay. I was not sure because the language file has an iso-8859-1 name, whereas in phpMyAdmin it is latin1_swedish_ci.
|
|
|
|
Joined: Aug 2004
Posts: 469
Addict
|
Quote: 1. In 2006, I completely converted one of my larger forum's sql tables collation to "utf8_general_ci" (UTF-8 Unicode) and never ran in to any problems - not even password problems.
I can imagine password problems would be the worst-case scenario, with users unable to log in. Just a few questions, if I may, about how this worked out in your case:
1. How large was your forum at the time of the conversion (let's say GB-wise for a MySQL dump, to get a general picture)?
2. How did you go about doing the conversion - what exact script/command line/etc. did you use for your particular charset conversion?
3. How did you run the actual conversion? Was it via something like phpMyAdmin or did it run as just a SQL database query?
4. Did the entire conversion script/process take a long time to run?
|
|
|
|
Joined: Jun 2006
Posts: 16,366 Likes: 126
|
We save passwords as an MD5 hash, so there shouldn't be any problems storing/converting them, as the hash is just hexadecimal digits (0-9, a-f).
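A small illustration of why the stored hash survives a conversion, and of the one case where logins could still break. This is a sketch under an assumption: that the forum hashes the raw bytes the browser submits, so the page charset decides which bytes get hashed. The function name and passwords are made up for the example.

```python
import hashlib


def md5_hex(password: str, page_charset: str) -> str:
    """MD5 of the password bytes as a browser would submit them in page_charset."""
    return hashlib.md5(password.encode(page_charset)).hexdigest()


ascii_pw = "correct horse"
accented_pw = "pässword"

# The stored hash is plain ASCII hex, so its bytes are the same in either charset.
stored = md5_hex(ascii_pw, "iso-8859-1")
print(stored.encode("iso-8859-1") == stored.encode("utf-8"))    # True

# ASCII-only passwords hash identically whatever the page charset is.
print(md5_hex(ascii_pw, "iso-8859-1") == md5_hex(ascii_pw, "utf-8"))    # True

# A password with an accented letter produces different bytes, hence a different hash.
print(md5_hex(accented_pw, "iso-8859-1") == md5_hex(accented_pw, "utf-8"))    # False
```

That last case matches the earlier remark about never allowing extended characters in user names or passwords.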
|
|
|
|
Joined: Apr 2004
Posts: 1,973 Likes: 154
|
For the web page display, you can easily do "modification 2", which I also mentioned above: update the charset in the language files / header meta tags from "iso-8859-1" to "utf-8" @ Control Panel > Languages > Language Editor > generic.php > CHARSET
This has no effect on the database and anyone can do this. It simply tells the browser/crawler what character set to expect/display the page as, rather than leaving that decision up to the browser/crawler.
---
As for the questions you're asking now, regarding the database:
1. Mine is roughly 800MB as a complete mysqldump FORUM.sql file.
CLOSE YOUR BOARD BEFORE YOU START WORKING ON IT!!!
a) I used PuTTY to ssh in to my server and used the following format to back up my DB: mysqldump -u USERNAME -p ubbt_forums > ubbt.sql (it prompts for the password)
DETAILS @ https://www.ubbcentral.com/forums/ubbthreads.php/topics/191156#Post191156
b) I then made a duplicate of the database on my server, using phpMyAdmin:
i) select the database
ii) go to its "Operations" tab at the top
iii) in the "Copy database to:" field, type the name of your backup database, such as "ubbt-BAK" - be sure that "Structure and data", "CREATE DATABASE before copying" and "Add AUTO_INCREMENT value" are all checked ON -- others in that category are checked OFF. Click "Go" when ready.
NOTE: if anything goes wrong for you, just delete your current database and rename your backup to what your working database was named, ie; remove the "-BAK" from its name.
2 & 3. Prepare for "I got my backups. No need to cross fingers. Let's just get this done" mode:
After you've confirmed that you have backups and that everything looks right, go back to the "ubbt" database's "Operations" menu (you're probably still there)
i) In the "Collation:" group, choose "utf8_general_ci" from the drop-down menu. Click "Go".
ii) From the left table listing/column, go in to the first table of your database, "ubbt_ADDRESS_BOOK" for example.
iii) Click "Operations" from the top tab group.
iv) In the "Table options" group, choose the "utf8_general_ci" collation. Do not change any other options. Click "Go".
Repeat this step for the other 64 tables in your "ubbt" database.
4. See item "iv" above. It took me about 10 minutes to totally complete that step for my entire ubbt database.
REPAIR your tables (also through phpMyAdmin) so your tables are optimized as they are rebuilt (this is also what the SQL REPAIR command does).
i) Select the database name from the top ("ubbt")
ii) When all the tables are listed, click on "Check All" on the bottom left. Choose "Repair table" from the drop-down list.
Next, inside your UBB.Threads control panel, rebuild your tables.
i) In Control Panel > Content Rebuilder, rebuild Posts, Topics, Forums, Signatures, and Private Messages.
This last step is probably overkill, but I did it so I wouldn't lose any sleep worrying about not having done it.
Finally, log in and visit a few forums and their posts to test whether everything is as you expect it to be. If things go smoothly, no one will notice anything. If there are hiccups, you have a backup to revert to.
One side effect that you/your members MIGHT come across is that some single byte characters (such as Swedish å, ä, ö, the temperature degree symbol, 1st/2nd/3rd...etc symbols, and most notoriously, Microsoft Word's backwards-single-quote " ' ", to name a few) will look like a black-box-single-character. This has no effect on your forum's function. It's only a display issue that you may run across once in a blue moon. If it bothers you, edit the post to replace the black-box-single-character with its equivalent character (ie; replace a backwards-single-quote with a standard single-quote). Done.
---
If you read this post and don't understand what I've written, do NOT perform the update to your site. Have a professional take care of that task for you. Take what I've written only as a "quick & dirty summary" of the steps one can take to accomplish this task, not as an absolute guideline. Again, if you read this post and don't understand what I've written, do NOT perform the update to your site. Have a professional take care of that task for you.
Last edited by id242; 03/22/2014 7:11 AM.
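For anyone who would rather run SQL than click through 65 tables: MySQL has an `ALTER TABLE ... CONVERT TO CHARACTER SET` statement that changes the table default and also transcodes the existing text columns, which goes further than setting the collation alone. The sketch below only builds the statements as strings, it does not connect to anything. The helper name is made up, and "ubbt_POSTS" is a hypothetical table name used for illustration (only "ubbt_ADDRESS_BOOK" is mentioned above). It also assumes the data in the columns really is in the charset the columns are currently labelled with; if not, the conversion can mangle it, so test on the backup copy first.

```python
from typing import Iterable, List


def convert_statements(
    tables: Iterable[str],
    charset: str = "utf8",
    collation: str = "utf8_general_ci",
) -> List[str]:
    """Build one ALTER TABLE ... CONVERT TO CHARACTER SET statement per table."""
    statements = []
    for table in tables:
        # Quote the identifier with backticks, doubling any backtick inside it.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements


# "ubbt_POSTS" is a hypothetical name; list your real tables here.
for statement in convert_statements(["ubbt_ADDRESS_BOOK", "ubbt_POSTS"]):
    print(statement)
```

The printed statements can then be reviewed and pasted into the SQL tab or a mysql prompt.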
|
|
|
|
Joined: Jun 2006
Posts: 16,366 Likes: 126
|
Contents of this thread have resulted in the wiki Article Converting to UTF8.
|
|
|
|
Joined: Jul 2006
Posts: 4,057
|
Black Diamonds <?>
I have a virtual server set up now so I can try out the conversion, following the wiki conversion information (Click me) and this for the black diamonds (Click Me).
I've done that, and I have diamonds in place of £. I've followed the Black Diamond link and double-checked everything twice. What I can say is that if I put the generic.php character set back to iso-8859-1, the diamonds go away and £ shows as expected.
Original generic.php = iso-8859-1
Original Database = latin1_swedish_ci
New generic.php = utf-8
New Database = utf8_general_ci
I've rebuilt Posts, Topics, Forums, Signatures, and Private Messages, and cleared the cache. For reference, my Posts rebuild (850MiB) took 3hrs in version 7.5.9, which is quicker than 7.5.8 if I remember correctly.
I've tried viewing the topics in IE and it's showing the same diamonds, just in case Chrome was playing up. My diamonds have a ? in them, if that's any different.
Detail info:
My virtual server
Web Server Apache/2.2.3
PHP Version 5.1.6
MySQL Version 5.0.95
Forums 7.5.9
My live forum
Web Server Apache/2
PHP Version 5.3.29
MySQL Version 5.5.40
Forums 7.5.8
I don't think there is an issue with the versions, as everything is up and running as expected. My virtual server is running a copy of my live forums, with the conf file edited to work locally.
Summary
If I change the character set back to iso-8859-1 in generic.php, it displays as expected. However, my goal is to convert and have the correct result in anticipation of the next release. We have a for-sale section and the "£" is used quite a lot, so I can't just leave diamonds. My test database has come from a higher version of MySQL, if that has any bearing on the results.
It's not a problem to try again; I just want to be sure I'm not missing something, e.g. versions? Or should it work, and I should just try again? Thanks for any feedback.
BOOM !! Version v7.6.1.1 People who inspire me Isaac ME Gizmo
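The symptom described above (a diamond with a ? where £ should be, going away when the page is set back to iso-8859-1) is what you would expect if the stored bytes are still Latin-1 while the page claims UTF-8. A minimal sketch of that mechanism, with made-up function names; it illustrates the encoding behaviour and is not a diagnosis of this particular database:

```python
def as_browser_sees(raw: bytes, page_charset: str) -> str:
    """Decode stored bytes the way a browser would for the declared charset."""
    return raw.decode(page_charset, errors="replace")


def transcode(raw: bytes, src: str = "iso-8859-1", dst: str = "utf-8") -> bytes:
    """Re-encode stored bytes from the old charset to the new one."""
    return raw.decode(src).encode(dst)


stored = "£20".encode("iso-8859-1")              # b'\xa320', one byte for the pound sign

print(as_browser_sees(stored, "iso-8859-1"))     # £20
print(as_browser_sees(stored, "utf-8"))          # U+FFFD replacement character, then 20

fixed = transcode(stored)                        # b'\xc2\xa320', two bytes for the pound sign
print(as_browser_sees(fixed, "utf-8"))           # £20
```

A lone byte 0xA3 is not valid UTF-8, so the browser substitutes U+FFFD, which most fonts draw as the black diamond with a question mark.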
|
|
|
|
Joined: Jun 2006
Posts: 16,366 Likes: 126
|
Quoting this post at UBBDev:
As for your issue with converting to UTF8, aren't some of the characters used on your forum outside plain ASCII, i.e. multibyte in UTF-8? If so, you can't just switch over to UTF8 by changing the charset setting, because the bytes already stored for those characters are Latin-1, not UTF-8, until the data itself is converted. We've written a Wiki article regarding this issue at UTF-8 vs Latin-1 (ISO-8859-1), which also has links to several character set related issues.
|
|
|
|
Joined: Jul 2006
Posts: 4,057
|
What do you mean by multibyte, G? As above, I followed your guide, but to answer correctly I need to understand multibyte first. It's an English forum; no Swedish or funky characters are used. Your advice is appreciated, and as above I followed your wiki guide. It did work but showed the black diamonds, and the fix did not work for the black diamonds. Multibyte I can't remember reading about... Thanks for your help.
|
|
|
|
Joined: Jun 2006
Posts: 16,366 Likes: 126
|
Characters such as the Euro symbol, etc., are "multibyte" in UTF-8. Your best bet is reading through the wiki articles linked in the comments section of the article I linked you to; they delve in depth into which characters are in each character set.
Examples of nonstandard or multibyte characters:
the euro symbol
any German symbol that isn't your standard a-z 0-9
the British pound symbol
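To make "multibyte" concrete, here is a small sketch that prints the bytes each of those example characters takes in UTF-8 and in two single-byte charsets. The helper name is made up for illustration. Note that the euro symbol has no code point in ISO-8859-1 at all; it exists in Windows-1252 (cp1252), which is what many "Latin-1" pages and databases actually contain.

```python
from typing import Dict, Optional


def byte_report(ch: str) -> Dict[str, Optional[bytes]]:
    """Bytes for one character in UTF-8 and two legacy charsets (None if absent)."""
    report: Dict[str, Optional[bytes]] = {"utf-8": ch.encode("utf-8")}
    for legacy in ("iso-8859-1", "cp1252"):
        try:
            report[legacy] = ch.encode(legacy)
        except UnicodeEncodeError:
            report[legacy] = None   # the character does not exist in this charset
    return report


for ch in "a£ü€":
    print(ch, byte_report(ch))

# a: one byte everywhere
# £: b'\xc2\xa3' in UTF-8, b'\xa3' in both legacy charsets
# ü: b'\xc3\xbc' in UTF-8, b'\xfc' in both legacy charsets
# €: b'\xe2\x82\xac' in UTF-8, absent from ISO-8859-1, b'\x80' in cp1252
```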
|
|
|
|
Joined: Jul 2006
Posts: 4,057
|
Thanks for the feedback, Gizmo. Yeah, it's the "£". As I have a for-sale forum/section, the "£" British pound sign is used a lot, so the black diamonds do stand out more in this part of the forum.
I will dig deeper and see if I can find a way around it.
Cheers
|
|
|
|
Joined: Jun 2006
Posts: 626
Addict
|
As this thread is 4 years old, is all the information here still OK???
|
|
|
|
Joined: Jun 2006
Posts: 16,366 Likes: 126
|
Most items on the topic will be the same regardless of your version; character sets are not defined by UBB.threads, they're global standards. Wikipedia: UTF8
|
|
|
|
|
|
|