We've been plagued by ISO to UTF-8 issues as we have been using UBB.Threads since 2002. Current installs still contain a mix of ISO and UTF-8 character sets. I've been editing files for years now, and I was wondering if there would be a chance to get rid of this issue once and for all, especially as I see some things are being consolidated and it means I've had a hard time finding the new places or different ways things are set. I've figured it out for some of the stuff, mainly:
adding header('Content-type: text/html; charset=ISO-8859-1'); to admin.inc.php
Setting $ubbt_lang['CHARSET'] = "ISO-8859-1"; in the language configs
Setting a HTML header in our outer wrapper
But this was after a lot of trying to move the database to UTF-8 format which didn't work.
I can still see stuff happening in the mailers, stating ISO formats so I'm really puzzled a bit by how this will all keep working.
don't modify admin.inc.php The language string is set in languages/english/generic.php or if use another language in their directory/generic.php The problem with ubb.threads < V7.7x is the old way to connect to the databse and hardcoded mailer code pages. With ubb.threads > 7.7x it should be possible to enhance the sql connect with codepage like this: To get true utf8 support, you need the following line after mysqli_connect: mysqli_set_charset( $this->dbh, "utf8" ); Ref: https://www.ubbcentral.com/forums/ubbthreads.php/topics/262249/re-copy-and-paste#Post262249
Then the next problem: your database ist running utf-8 but if you have a very old ubb.threads version the tables was created in iso (swedish latin-1 or something like this, not in utf-8) This can fixed easy in phpmyadmin. Set the collation to utf-8 for each table.
At this point all new text is stored in utf-8 and all old text show a black square ? for a non utf-8 char. The next step ist working inside the table and replace all non utf-8 chars with the correct utf-8 chars. This could be pain and need some experience with phpmyadmin, export and import tables and find the proper chars. I have done it with my board and now I hope I have no problems again.
I use the following way: make a backup first! Close the board and do a testposting with all special chars do you use. then change the collation and do a second testposting You find this posting at the end of the ubbt_POSTS table. take a look at the posts and you see how your chars are stored. For example I write ae = ä , ue=ü in my post to identify the chars. Then I export the table, use notepad ++ as editor and do a search/replace for this chars, save it and import it again. It can be edited inside phpmyadmin to but I dont know the propper sql commands so I do it this way Works for me, my board now runs in utf-8 on all files. I write it down here too: https://www.ubbcentral.com/forums/ubbthreads.php/topics/261912/char-coding-utf-8-and-older-forums
All this happen only to old boards. In the past at one point new setups create the tables allready in utf-8 and this user dont know anything about our trouble
At this point all new text is stored in utf-8 and all old text show a black square ? for a non utf-8 char. The next step ist working inside the table and replace all non utf-8 chars with the correct utf-8 chars. This could be pain and need some experience with phpmyadmin, export and import tables and find the proper chars. I have done it with my board and now I hope I have no problems again.
The SQL statement you used in the linked thread is already converting all existing ISO-8859-1 data to UTF-8:
Code
ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
You see broken characters because your default database connection is still using Latin1 as default. So basically, you replaced all UTF-8 characters with Latin1 encoded ones.
In attachment a screenshot of the encoding problem using an utf8_general_ci table. The first posted is actually using the wrong (default) Latin1 encoding while the second posting is using the proper UTF-8 encoding. See the difference? This is how UTF-8 data is supposed to look like in a database.
To enable UTF-8 database encoding support, you need to add the mysqli_set_charset( $this->dbh, "utf8" ); line to function connect in libs/mysqli.inc.php. However, this will only work with UTF-8 encoded UTF-8 data in utf8_general_ci tables. Any Latin1 encoded UTF-8 data will break in this mode.
I setup a xampp version of my board on my local computer to test it. After bringing my local version to work I modify the mysqli.inc.php to see what happen. Now I understand whats going on. In my screenshot you see 2 postings. The first one is without mofify the file, the second one with your modification.
Now I have to find a quick way to change my 'wrong' stored chars to the propper one. Can this be done with a sql command in phpmyadmin? This would be the easy way. If not I have to use export / search&replace / import again.
Now the big questions: will this come in new ubb.threads version by default and how can every admin handle the char encoding change for old postings? Should be a maintenance tool inside the control panel do this job? Could be do a lot trouble...
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8");
$result = $db -> query( "
SELECT
POST_ID,
convert(cast(convert(POST_SUBJECT using latin1) as binary) using utf8),
convert(cast(convert(POST_BODY using latin1) as binary) using utf8),
convert(cast(convert(POST_DEFAULT_BODY using latin1) as binary) using utf8)
FROM
ubbt_POSTS;
" );
while( list( $post_id, $post_subject, $post_body, $post_default_body ) = $result -> fetch_row() )
{
$db -> query( "
UPDATE
ubbt_POSTS
SET
POST_SUBJECT = '" . $db -> escape_string( $post_subject ) . "',
POST_BODY = '" . $db -> escape_string( $post_body ) . "',
POST_DEFAULT_BODY = '" . $db -> escape_string( $post_default_body ) . "'
WHERE
POST_ID = '" . $post_id . "';
" );
echo "Processing post #".$post_id."<br>";
}
$db ->close();
?>
This script is reading and converting the data in the POST_SUBJECT, POST_BODY, POST_DEFAULT_BODY fields to proper UTF-8 and then writes it back to the database. You need to replace DATABASE_USERNAME, DATABASE_PASSWORD, and DATABASE_NAME with your MySQL username, password, and database name.
But please try it first out on a test installation and make a backup of your database before running it
Thanks again. For my board there is memory problem while converting POSTS. The script must do it in steps or you get: Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 24576 bytes) in ....
4) do another backup (I use mysqldumper) and store it local an your computer, logout the forum 5) unpack the backup if you got .gz file, you will get a big text file with your database content 6) use notepad++ or an other editor which can edit utf-8 files for search and replace for each char that need to be convert. Please check casesensitiv for the search! Repeat for each char over the complete textfile. To get a list of chars who need to be convert search for ***!marker!*** and you find your testposting. In this line you find the wrong chars. Copy the line to a second editor window, delete not needed text and you have a source for copy&paste for your search in the db dump. It could look like this
7) save it 8) modify the mysqli.inc.php and add the line
Code
mysqli_set_charset( $this->dbh, "utf8" );
under the line
Code
$this->dbh = mysqli_connect($config['DATABASE_SERVER'], $config['DATABASE_USER'], $config['DATABASE_PASSWORD'], $config['DATABASE_NAME']) or die("Problem occured in connection");
and
Code
$this->dbh = mysqli_pconnect("p:" . $config['DATABASE_SERVER'], $config['DATABASE_USER'], $config['DATABASE_PASSWORD'], $config['DATABASE_NAME']) or die("Problem occured in connection");
(line 29 and line 31) 9) import the database file in your forum. If the file is > 2MB please use a tool such as mysqldump. You find a version who works with pphp7.2 here: https://github.com/DSB/MySQLDumper 10) login the forum, check it and reopen it for public
Note: please take care of your language file generic.php! please check it for the language encoding on top of the file. Change it to utf-8 if you use iso in the past.
This takes 5-10 minutes in all if you have mysqldumper installed and know how to handle the files and editor. Its easy. The only thing that makes problems: searching for a ' is not easy. It is stored as \' but this combination is used in several places and it is not allowed to be changed.
*NOTE* without any warranty
Last edited by Zarzal; 02/13/20192:58 PM. Reason: no warranty!
Using --skip-set-charset --default-character-set=latin1 will use a Latin1 connection for the SQL dump, so the dump contains the original UTF-8 characters. Then re-import the dump in UTF-8. This method works also for Latin1 tables with UTF-8 data. In this case, you need to adjust CHARSET=latin1 to CHARSET=utf8 in the SQL dump before re-importing the data.
I think I'll try and copy the board a database to a new server, then try my own way to convert the exported ISO to UTF. See what it does there. I just feel in one of the next releases it's just all going to break...