Previous Thread
Next Thread
Print Thread
Hop To
#262300 02/10/2019 5:31 PM
Joined: Dec 2003
Posts: 117
K
Member
Member
K Offline
Joined: Dec 2003
Posts: 117
Hi!

We've been plagued by ISO to UTF-8 issues as we have been using UBB.Threads since 2002. Current installs still contain a mix of ISO and UTF-8 character sets. I've been editing files for years now, and I was wondering if there would be a chance to get rid of this issue once and for all, especially as I see some things are being consolidated and it means I've had a hard time finding the new places or different ways things are set. I've figured it out for some of the stuff, mainly:


adding
header('Content-type: text/html; charset=ISO-8859-1');
to admin.inc.php

Setting
$ubbt_lang['CHARSET'] = "ISO-8859-1";
in the language configs

Setting a HTML header in our outer wrapper


But this was after a lot of trying to move the database to UTF-8 format which didn't work.

I can still see stuff happening in the mailers, stating ISO formats so I'm really puzzled a bit by how this will all keep working.

Joined: Jun 2006
Posts: 956
Old Hand
Old Hand
Joined: Jun 2006
Posts: 956
don't modify admin.inc.php
The language string is set in languages/english/generic.php or if use another language in their directory/generic.php
The problem with ubb.threads < V7.7x is the old way to connect to the databse and hardcoded mailer code pages.
With ubb.threads > 7.7x it should be possible to enhance the sql connect with codepage like this:
To get true utf8 support, you need the following line after mysqli_connect:
mysqli_set_charset( $this->dbh, "utf8" );
Ref: https://www.ubbcentral.com/forums/ubbthreads.php/topics/262249/re-copy-and-paste#Post262249

Then the next problem: your database ist running utf-8 but if you have a very old ubb.threads version the tables was created in iso (swedish latin-1 or something like this, not in utf-8) This can fixed easy in phpmyadmin. Set the collation to utf-8 for each table.

At this point all new text is stored in utf-8 and all old text show a black square ? for a non utf-8 char. The next step ist working inside the table and replace all non utf-8 chars with the correct utf-8 chars. This could be pain and need some experience with phpmyadmin, export and import tables and find the proper chars. I have done it with my board and now I hope I have no problems again.

I use the following way: make a backup first! Close the board and do a testposting with all special chars do you use. then change the collation and do a second testposting You find this posting at the end of the ubbt_POSTS table. take a look at the posts and you see how your chars are stored. For example I write ae = ä , ue=ü in my post to identify the chars. Then I export the table, use notepad ++ as editor and do a search/replace for this chars, save it and import it again. It can be edited inside phpmyadmin to but I dont know the propper sql commands so I do it this way smile Works for me, my board now runs in utf-8 on all files. I write it down here too:
https://www.ubbcentral.com/forums/ubbthreads.php/topics/261912/char-coding-utf-8-and-older-forums

All this happen only to old boards. In the past at one point new setups create the tables allready in utf-8 and this user dont know anything about our trouble

Last edited by Zarzal; 02/11/2019 11:05 AM.

my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Joined: Jul 2006
Posts: 116
Likes: 4
P
Member
Member
P Offline
Joined: Jul 2006
Posts: 116
Likes: 4
Originally Posted by Zarzal
At this point all new text is stored in utf-8 and all old text show a black square ? for a non utf-8 char. The next step ist working inside the table and replace all non utf-8 chars with the correct utf-8 chars. This could be pain and need some experience with phpmyadmin, export and import tables and find the proper chars. I have done it with my board and now I hope I have no problems again.
The SQL statement you used in the linked thread is already converting all existing ISO-8859-1 data to UTF-8:
Code
ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
You see broken characters because your default database connection is still using Latin1 as default. So basically, you replaced all UTF-8 characters with Latin1 encoded ones.

In attachment a screenshot of the encoding problem using an utf8_general_ci table. The first posted is actually using the wrong (default) Latin1 encoding while the second posting is using the proper UTF-8 encoding. See the difference? This is how UTF-8 data is supposed to look like in a database.

To enable UTF-8 database encoding support, you need to add the mysqli_set_charset( $this->dbh, "utf8" ); line to function connect in libs/mysqli.inc.php. However, this will only work with UTF-8 encoded UTF-8 data in utf8_general_ci tables. Any Latin1 encoded UTF-8 data will break in this mode.

This article is a good summary and explains the problem:
https://www.whitesmith.co/blog/latin1-to-utf8/

Attachments
latin1utf8.png

Joined: Jun 2006
Posts: 956
Old Hand
Old Hand
Joined: Jun 2006
Posts: 956
Phillip, thank you for the explanation!

I setup a xampp version of my board on my local computer to test it. After bringing my local version to work I modify the mysqli.inc.php to see what happen. Now I understand whats going on. In my screenshot you see 2 postings. The first one is without mofify the file, the second one with your modification.

Now I have to find a quick way to change my 'wrong' stored chars to the propper one. Can this be done with a sql command in phpmyadmin? This would be the easy way. If not I have to use export / search&replace / import again.

Now the big questions: will this come in new ubb.threads version by default and how can every admin handle the char encoding change for old postings? Should be a maintenance tool inside the control panel do this job? Could be do a lot trouble...

Regards
Zarzal
Attachments
utf8.jpg


my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Joined: Jul 2006
Posts: 116
Likes: 4
P
Member
Member
P Offline
Joined: Jul 2006
Posts: 116
Likes: 4
Something like this should fix the problem:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8");

$result = $db -> query( "
    SELECT 
    POST_ID, 
    convert(cast(convert(POST_SUBJECT using latin1) as binary) using utf8), 
    convert(cast(convert(POST_BODY using latin1) as binary) using utf8), 
    convert(cast(convert(POST_DEFAULT_BODY using latin1) as binary) using utf8) 
    FROM 
    ubbt_POSTS;
" );

while( list( $post_id, $post_subject, $post_body, $post_default_body ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_POSTS 
        SET 
        POST_SUBJECT = '" . $db -> escape_string( $post_subject ) . "',
        POST_BODY = '" . $db -> escape_string( $post_body ) . "',
        POST_DEFAULT_BODY = '" . $db -> escape_string( $post_default_body ) . "'
        WHERE
        POST_ID = '" . $post_id . "';
    " );
    
    echo "Processing post #".$post_id."<br>";
}

$db ->close();
?>
This script is reading and converting the data in the POST_SUBJECT, POST_BODY, POST_DEFAULT_BODY fields to proper UTF-8 and then writes it back to the database. You need to replace DATABASE_USERNAME, DATABASE_PASSWORD, and DATABASE_NAME with your MySQL username, password, and database name.

But please try it first out on a test installation and make a backup of your database before running it wink

Joined: Jun 2006
Posts: 956
Old Hand
Old Hand
Joined: Jun 2006
Posts: 956
Thanks again, I will give it a try on my local test system next weekend and report what happen to me smile

ubb use the following tables on my system who need to be converted:

ubbt_POSTS
ubbt_TOPICS

ubbt_PRIVATE_MESSAGE_POSTS
ubbt_PRIVATE_MESSAGE_TOPICS

ubbt_PROFILE_COMMENTS

maybe there some other with such entrys but this can be done by hand (example ubbt_USER_PROFILE for signature text)

Zarzal

Last edited by Zarzal; 02/13/2019 11:08 AM.

my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Joined: Jul 2006
Posts: 116
Likes: 4
P
Member
Member
P Offline
Joined: Jul 2006
Posts: 116
Likes: 4
The first script was for ubbt_POSTS. Here the rest:

ubbt_TOPICS:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$result = $db -> query( "
    SELECT 
    TOPIC_ID, 
    convert(cast(convert(TOPIC_SUBJECT using latin1) as binary) using utf8), 
    convert(cast(convert(TOPIC_LAST_POSTER_NAME using latin1) as binary) using utf8), 
    convert(cast(convert(TOPIC_POSTER_NAME using latin1) as binary) using utf8) 
    FROM 
    ubbt_TOPICS;
" );

while( list( $topic_id, $topic_subject, $topic_last_poster_name, $topic_poster_name ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_TOPICS 
        SET 
        TOPIC_SUBJECT = '" . $db -> escape_string( $topic_subject ) . "',
        TOPIC_LAST_POSTER_NAME = '" . $db -> escape_string( $topic_last_poster_name ) . "',
        TOPIC_POSTER_NAME = '" . $db -> escape_string( $topic_poster_name ) . "'
        WHERE
        TOPIC_ID = '" . $topic_id . "';
    " );
    
    echo "Processing topic #".$topic_id."<br>";
}

$db -> close();
?>
ubbt_PRIVATE_MESSAGE_TOPICS:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$result = $db -> query( "
    SELECT 
    TOPIC_ID, 
    convert(cast(convert(TOPIC_SUBJECT using latin1) as binary) using utf8), 
    convert(cast(convert(TOPIC_LAST_POSTER_NAME using latin1) as binary) using utf8)
    FROM 
    ubbt_PRIVATE_MESSAGE_TOPICS;
" );

while( list( $topic_id, $topic_subject, $topic_last_poster_name ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_PRIVATE_MESSAGE_TOPICS
        SET 
        TOPIC_SUBJECT = '" . $db -> escape_string( $topic_subject ) . "',
        TOPIC_LAST_POSTER_NAME = '" . $db -> escape_string( $topic_last_poster_name ) . "'
        WHERE
        TOPIC_ID = '" . $topic_id . "';
    " );
    
    echo "Processing PM topic #".$topic_id."<br>";
}

$db -> close();
?>
ubbt_PRIVATE_MESSAGE_POSTS:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$result = $db -> query( "
    SELECT 
    POST_ID, 
    convert(cast(convert(POST_BODY using latin1) as binary) using utf8), 
    convert(cast(convert(POST_DEFAULT_BODY using latin1) as binary) using utf8)
    FROM 
    ubbt_PRIVATE_MESSAGE_POSTS;
" );

while( list( $post_id, $post_body, $post_default_body ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_PRIVATE_MESSAGE_POSTS
        SET 
        POST_BODY = '" . $db -> escape_string( $post_body ) . "',
        POST_DEFAULT_BODY = '" . $db -> escape_string( $post_default_body ) . "'
        WHERE
        POST_ID = '" . $post_id . "';
    " );
    
    echo "Processing PM post #".$post_id."<br>";
}

$db -> close();
?>
ubbt_PROFILE_COMMENTS:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$result = $db -> query( "
    SELECT 
    COMMENT_ID, 
    convert(cast(convert(COMMENT_BODY using latin1) as binary) using utf8), 
    convert(cast(convert(COMMENT_DEFAULT_BODY using latin1) as binary) using utf8)
    FROM 
    ubbt_PROFILE_COMMENTS;
" );

while( list( $comment_id, $comment_body, $comment_default_body ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_PROFILE_COMMENTS
        SET 
        COMMENT_BODY = '" . $db -> escape_string( $comment_body ) . "',
        COMMENT_DEFAULT_BODY = '" . $db -> escape_string( $comment_default_body ) . "'
        WHERE
        COMMENT_ID = '" . $comment_id . "';
    " );
    
    echo "Processing comment #".$comment_id."<br>";
}

$db -> close();
?>

Joined: Jun 2006
Posts: 956
Old Hand
Old Hand
Joined: Jun 2006
Posts: 956
Thanks again. For my board there is memory problem while converting POSTS. The script must do it in steps or you get:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 24576 bytes) in ....

I will search for an other way to do it.


my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Joined: Jun 2006
Posts: 956
Old Hand
Old Hand
Joined: Jun 2006
Posts: 956
Ok, here my way step-by-step to do it on my test-board

1) close board
2) do a backup and store it in a save place
3) do a testposting with you admin account with all chars you need to convert. Example:
Code
***!marker!***
ae = ä
oe = ö
ue = ü
sz = ß
Ae = Ä
Oe = Ö
Ue = Ü
euro = €
grad = °
hoch 2 = ²
hoch 3 = ³
4) do another backup (I use mysqldumper) and store it local an your computer, logout the forum
5) unpack the backup if you got .gz file, you will get a big text file with your database content
6) use notepad++ or an other editor which can edit utf-8 files for search and replace for each char that need to be convert. Please check casesensitiv for the search! Repeat for each char over the complete textfile. To get a list of chars who need to be convert search for ***!marker!*** and you find your testposting. In this line you find the wrong chars. Copy the line to a second editor window, delete not needed text and you have a source for copy&paste for your search in the db dump. It could look like this
Code
ae = ä
oe = ö
ue = ü
sz = ß
Ae = Ä
Oe = Ö
Ue = Ü
euro = €
apo = \'
grad = °
hoch 2 = ²
hoch 3 = ³
7) save it
8) modify the mysqli.inc.php and add the line
Code
mysqli_set_charset( $this->dbh, "utf8" );
under the line
Code
$this->dbh = mysqli_connect($config['DATABASE_SERVER'], $config['DATABASE_USER'], $config['DATABASE_PASSWORD'], $config['DATABASE_NAME']) or die("Problem occured in connection");
and
Code
$this->dbh = mysqli_pconnect("p:" . $config['DATABASE_SERVER'], $config['DATABASE_USER'], $config['DATABASE_PASSWORD'], $config['DATABASE_NAME']) or die("Problem occured in connection");
(line 29 and line 31)
9) import the database file in your forum. If the file is > 2MB please use a tool such as mysqldump. You find a version who works with pphp7.2 here: https://github.com/DSB/MySQLDumper
10) login the forum, check it and reopen it for public

Note: please take care of your language file generic.php! please check it for the language encoding on top of the file. Change it to utf-8 if you use iso in the past.

This takes 5-10 minutes in all if you have mysqldumper installed and know how to handle the files and editor. Its easy.
The only thing that makes problems: searching for a ' is not easy. It is stored as \' but this combination is used in several places and it is not allowed to be changed.

*NOTE* without any warranty smile

Last edited by Zarzal; 02/13/2019 2:58 PM. Reason: no warranty!

my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Joined: Jul 2006
Posts: 116
Likes: 4
P
Member
Member
P Offline
Joined: Jul 2006
Posts: 116
Likes: 4
Originally Posted by Zarzal
Thanks again. For my board there is memory problem while converting POSTS. The script must do it in steps or you get:
Something like this should do the trick:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$offset = filter_input( INPUT_GET, "offset", FILTER_SANITIZE_NUMBER_INT );
$offset_sql = ( $offset ) ? " OFFSET " . $db -> escape_string( $offset ) : "" ;

$result = $db -> query( "
    SELECT 
    POST_ID, 
    convert(cast(convert(POST_SUBJECT using latin1) as binary) using utf8), 
    convert(cast(convert(POST_BODY using latin1) as binary) using utf8), 
    convert(cast(convert(POST_DEFAULT_BODY using latin1) as binary) using utf8) 
    FROM 
    ubbt_POSTS
    LIMIT 100" . $offset_sql. ";
" );

if( $result -> num_rows )
{
    
    while( list( $post_id, $post_subject, $post_body, $post_default_body ) = $result -> fetch_row() )
    {
        $db -> query( "
            UPDATE 
            ubbt_POSTS 
            SET 
            POST_SUBJECT = '" . $db -> escape_string( $post_subject ) . "',
            POST_BODY = '" . $db -> escape_string( $post_body ) . "',
            POST_DEFAULT_BODY = '" . $db -> escape_string( $post_default_body ) . "'
            WHERE
            POST_ID = '" . $post_id . "';
        " );

        echo "Processing post #".$post_id."<br>";
    }

    $offset = $offset + 100;
    echo "<script>setTimeout(function() {window.location.href = \"" . $_SERVER['PHP_SELF'] . "?offset=" . $offset . "\"}, 2000);</script>";

}
else
{
    echo "Finish";
}

$db ->close();
exit;
?>
I found in the meantime an easier way based on https://www.whitesmith.co/blog/latin1-to-utf8/. However, this requires shell access to run the mysqldump tool directly:
Code
mysqldump -uUSERNAME -pPASSWORD --skip-set-charset --default-character-set=latin1 DATABASENAME > database.sql

Using --skip-set-charset --default-character-set=latin1 will use a Latin1 connection for the SQL dump, so the dump contains the original UTF-8 characters. Then re-import the dump in UTF-8. This method works also for Latin1 tables with UTF-8 data. In this case, you need to adjust CHARSET=latin1 to CHARSET=utf8 in the SQL dump before re-importing the data.

Joined: Dec 2003
Posts: 117
K
Member
Member
K Offline
Joined: Dec 2003
Posts: 117
I think I'll try and copy the board a database to a new server, then try my own way to convert the exported ISO to UTF. See what it does there. I just feel in one of the next releases it's just all going to break...


Link Copied to Clipboard
ShoutChat
Comment Guidelines: Do post respectful and insightful comments. Don't flame, hate, spam.
Recent Topics
spam issues
by ECNet - 03/19/2024 11:45 PM
Looking for a forum
by azr - 03/15/2024 11:26 PM
Editing Links in Post
by Outdoorking - 03/15/2024 9:31 AM
Question on barkrowler and the like
by Mors - 02/29/2024 6:51 PM
Member Permissions Help
by domspeak - 02/27/2024 6:31 PM
Who's Online Now
3 members (rootman, Gizmo, Nightcrawler), 562 guests, and 186 robots.
Key: Admin, Global Mod, Mod
Random Gallery Image
Latest Gallery Images
Los Angeles
Los Angeles
by isaac, August 6
3D Creations
3D Creations
by JAISP, December 30
Artistic structures
Artistic structures
by isaac, August 29
Stones
Stones
by isaac, August 19
Powered by UBB.threads™ PHP Forum Software 8.0.0
(Preview build 20230217)