Site Links
Home
Features
Documentation
Pricing & Order
Members Area
Support Options
UBBDev.com
UBBWiki.com
Who's Online Now
3 registered members (Conrad, AstroCat, driv), 72 guests, and 214 spiders.
Key: Admin, Global Mod, Mod
Member Spotlight
Posts: 112
Joined: June 2011
Show All Member Profiles 
Top Posters(30 Days)
isaac 14
Ruben 5
Morgan 4
M4TT 4
Gizmo 1
fidel 1
Latest Photos
Trees
Retreats
Rooms
Big stuff
Test
Previous Thread
Next Thread
Print Thread
The ISO and UTF-8 issues #262300 02/10/19 05:31 PM
Joined: Dec 2003
Posts: 117
Kayjey Offline OP
Member
Hi!

We've been plagued by ISO to UTF-8 issues as we have been using UBB.Threads since 2002. Current installs still contain a mix of ISO and UTF-8 character sets. I've been editing files for years now, and I was wondering if there would be a chance to get rid of this issue once and for all, especially as I see some things are being consolidated and it means I've had a hard time finding the new places or different ways things are set. I've figured it out for some of the stuff, mainly:


adding
header('Content-type: text/html; charset=ISO-8859-1');
to admin.inc.php

Setting
$ubbt_lang['CHARSET'] = "ISO-8859-1";
in the language configs

Setting a HTML header in our outer wrapper


But this was after a lot of trying to move the database to UTF-8 format which didn't work.

I can still see stuff happening in the mailers, stating ISO formats so I'm really puzzled a bit by how this will all keep working.

Express Hosting
Re: The ISO and UTF-8 issues [Re: Kayjey] #262308 02/11/19 10:58 AM
Joined: Jun 2006
Posts: 955
Zarzal Offline
Old Hand
don't modify admin.inc.php
The language string is set in languages/english/generic.php or if use another language in their directory/generic.php
The problem with ubb.threads < V7.7x is the old way to connect to the databse and hardcoded mailer code pages.
With ubb.threads > 7.7x it should be possible to enhance the sql connect with codepage like this:
To get true utf8 support, you need the following line after mysqli_connect:
mysqli_set_charset( $this->dbh, "utf8" );
Ref: https://www.ubbcentral.com/forums/ubbthreads.php/topics/262249/re-copy-and-paste#Post262249

Then the next problem: your database ist running utf-8 but if you have a very old ubb.threads version the tables was created in iso (swedish latin-1 or something like this, not in utf-8) This can fixed easy in phpmyadmin. Set the collation to utf-8 for each table.

At this point all new text is stored in utf-8 and all old text show a black square ? for a non utf-8 char. The next step ist working inside the table and replace all non utf-8 chars with the correct utf-8 chars. This could be pain and need some experience with phpmyadmin, export and import tables and find the proper chars. I have done it with my board and now I hope I have no problems again.

I use the following way: make a backup first! Close the board and do a testposting with all special chars do you use. then change the collation and do a second testposting You find this posting at the end of the ubbt_POSTS table. take a look at the posts and you see how your chars are stored. For example I write ae = ä , ue=ü in my post to identify the chars. Then I export the table, use notepad ++ as editor and do a search/replace for this chars, save it and import it again. It can be edited inside phpmyadmin to but I dont know the propper sql commands so I do it this way smile Works for me, my board now runs in utf-8 on all files. I write it down here too:
https://www.ubbcentral.com/forums/ubbthreads.php/topics/261912/char-coding-utf-8-and-older-forums

All this happen only to old boards. In the past at one point new setups create the tables allready in utf-8 and this user dont know anything about our trouble

Last edited by Zarzal; 02/11/19 11:05 AM.

my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Re: The ISO and UTF-8 issues [Re: Zarzal] #262331 02/13/19 02:49 AM
Joined: Jul 2006
Posts: 95
Philipp Offline
Journeyman
Originally Posted by Zarzal
At this point all new text is stored in utf-8 and all old text show a black square ? for a non utf-8 char. The next step ist working inside the table and replace all non utf-8 chars with the correct utf-8 chars. This could be pain and need some experience with phpmyadmin, export and import tables and find the proper chars. I have done it with my board and now I hope I have no problems again.

The SQL statement you used in the linked thread is already converting all existing ISO-8859-1 data to UTF-8:
Code
ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

You see broken characters because your default database connection is still using Latin1 as default. So basically, you replaced all UTF-8 characters with Latin1 encoded ones.

In attachment a screenshot of the encoding problem using an utf8_general_ci table. The first posted is actually using the wrong (default) Latin1 encoding while the second posting is using the proper UTF-8 encoding. See the difference? This is how UTF-8 data is supposed to look like in a database.

To enable UTF-8 database encoding support, you need to add the mysqli_set_charset( $this->dbh, "utf8" ); line to function connect in libs/mysqli.inc.php. However, this will only work with UTF-8 encoded UTF-8 data in utf8_general_ci tables. Any Latin1 encoded UTF-8 data will break in this mode.

This article is a good summary and explains the problem:
https://www.whitesmith.co/blog/latin1-to-utf8/

Attached Files latin1utf8.png
Re: The ISO and UTF-8 issues [Re: Kayjey] #262332 02/13/19 05:41 AM
Joined: Jun 2006
Posts: 955
Zarzal Offline
Old Hand
Phillip, thank you for the explanation!

I setup a xampp version of my board on my local computer to test it. After bringing my local version to work I modify the mysqli.inc.php to see what happen. Now I understand whats going on. In my screenshot you see 2 postings. The first one is without mofify the file, the second one with your modification.

Now I have to find a quick way to change my 'wrong' stored chars to the propper one. Can this be done with a sql command in phpmyadmin? This would be the easy way. If not I have to use export / search&replace / import again.

Now the big questions: will this come in new ubb.threads version by default and how can every admin handle the char encoding change for old postings? Should be a maintenance tool inside the control panel do this job? Could be do a lot trouble...

Regards
Zarzal

Attached Files utf8.jpg

my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Re: The ISO and UTF-8 issues [Re: Kayjey] #262333 02/13/19 09:16 AM
Joined: Jul 2006
Posts: 95
Philipp Offline
Journeyman
Something like this should fix the problem:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8");

$result = $db -> query( "
    SELECT 
    POST_ID, 
    convert(cast(convert(POST_SUBJECT using latin1) as binary) using utf8), 
    convert(cast(convert(POST_BODY using latin1) as binary) using utf8), 
    convert(cast(convert(POST_DEFAULT_BODY using latin1) as binary) using utf8) 
    FROM 
    ubbt_POSTS;
" );

while( list( $post_id, $post_subject, $post_body, $post_default_body ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_POSTS 
        SET 
        POST_SUBJECT = '" . $db -> escape_string( $post_subject ) . "',
        POST_BODY = '" . $db -> escape_string( $post_body ) . "',
        POST_DEFAULT_BODY = '" . $db -> escape_string( $post_default_body ) . "'
        WHERE
        POST_ID = '" . $post_id . "';
    " );
    
    echo "Processing post #".$post_id."<br>";
}

$db ->close();
?>

This script is reading and converting the data in the POST_SUBJECT, POST_BODY, POST_DEFAULT_BODY fields to proper UTF-8 and then writes it back to the database. You need to replace DATABASE_USERNAME, DATABASE_PASSWORD, and DATABASE_NAME with your MySQL username, password, and database name.

But please try it first out on a test installation and make a backup of your database before running it wink

Re: The ISO and UTF-8 issues [Re: Kayjey] #262334 02/13/19 10:51 AM
Joined: Jun 2006
Posts: 955
Zarzal Offline
Old Hand
Thanks again, I will give it a try on my local test system next weekend and report what happen to me smile

ubb use the following tables on my system who need to be converted:

ubbt_POSTS
ubbt_TOPICS

ubbt_PRIVATE_MESSAGE_POSTS
ubbt_PRIVATE_MESSAGE_TOPICS

ubbt_PROFILE_COMMENTS

maybe there some other with such entrys but this can be done by hand (example ubbt_USER_PROFILE for signature text)

Zarzal

Last edited by Zarzal; 02/13/19 11:08 AM.

my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Re: The ISO and UTF-8 issues [Re: Kayjey] #262336 02/13/19 11:57 AM
Joined: Jul 2006
Posts: 95
Philipp Offline
Journeyman
The first script was for ubbt_POSTS. Here the rest:

ubbt_TOPICS:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$result = $db -> query( "
    SELECT 
    TOPIC_ID, 
    convert(cast(convert(TOPIC_SUBJECT using latin1) as binary) using utf8), 
    convert(cast(convert(TOPIC_LAST_POSTER_NAME using latin1) as binary) using utf8), 
    convert(cast(convert(TOPIC_POSTER_NAME using latin1) as binary) using utf8) 
    FROM 
    ubbt_TOPICS;
" );

while( list( $topic_id, $topic_subject, $topic_last_poster_name, $topic_poster_name ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_TOPICS 
        SET 
        TOPIC_SUBJECT = '" . $db -> escape_string( $topic_subject ) . "',
        TOPIC_LAST_POSTER_NAME = '" . $db -> escape_string( $topic_last_poster_name ) . "',
        TOPIC_POSTER_NAME = '" . $db -> escape_string( $topic_poster_name ) . "'
        WHERE
        TOPIC_ID = '" . $topic_id . "';
    " );
    
    echo "Processing topic #".$topic_id."<br>";
}

$db -> close();
?>

ubbt_PRIVATE_MESSAGE_TOPICS:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$result = $db -> query( "
    SELECT 
    TOPIC_ID, 
    convert(cast(convert(TOPIC_SUBJECT using latin1) as binary) using utf8), 
    convert(cast(convert(TOPIC_LAST_POSTER_NAME using latin1) as binary) using utf8)
    FROM 
    ubbt_PRIVATE_MESSAGE_TOPICS;
" );

while( list( $topic_id, $topic_subject, $topic_last_poster_name ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_PRIVATE_MESSAGE_TOPICS
        SET 
        TOPIC_SUBJECT = '" . $db -> escape_string( $topic_subject ) . "',
        TOPIC_LAST_POSTER_NAME = '" . $db -> escape_string( $topic_last_poster_name ) . "'
        WHERE
        TOPIC_ID = '" . $topic_id . "';
    " );
    
    echo "Processing PM topic #".$topic_id."<br>";
}

$db -> close();
?>

ubbt_PRIVATE_MESSAGE_POSTS:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$result = $db -> query( "
    SELECT 
    POST_ID, 
    convert(cast(convert(POST_BODY using latin1) as binary) using utf8), 
    convert(cast(convert(POST_DEFAULT_BODY using latin1) as binary) using utf8)
    FROM 
    ubbt_PRIVATE_MESSAGE_POSTS;
" );

while( list( $post_id, $post_body, $post_default_body ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_PRIVATE_MESSAGE_POSTS
        SET 
        POST_BODY = '" . $db -> escape_string( $post_body ) . "',
        POST_DEFAULT_BODY = '" . $db -> escape_string( $post_default_body ) . "'
        WHERE
        POST_ID = '" . $post_id . "';
    " );
    
    echo "Processing PM post #".$post_id."<br>";
}

$db -> close();
?>

ubbt_PROFILE_COMMENTS:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$result = $db -> query( "
    SELECT 
    COMMENT_ID, 
    convert(cast(convert(COMMENT_BODY using latin1) as binary) using utf8), 
    convert(cast(convert(COMMENT_DEFAULT_BODY using latin1) as binary) using utf8)
    FROM 
    ubbt_PROFILE_COMMENTS;
" );

while( list( $comment_id, $comment_body, $comment_default_body ) = $result -> fetch_row() )
{
    $db -> query( "
        UPDATE 
        ubbt_PROFILE_COMMENTS
        SET 
        COMMENT_BODY = '" . $db -> escape_string( $comment_body ) . "',
        COMMENT_DEFAULT_BODY = '" . $db -> escape_string( $comment_default_body ) . "'
        WHERE
        COMMENT_ID = '" . $comment_id . "';
    " );
    
    echo "Processing comment #".$comment_id."<br>";
}

$db -> close();
?>

Re: The ISO and UTF-8 issues [Re: Kayjey] #262337 02/13/19 01:33 PM
Joined: Jun 2006
Posts: 955
Zarzal Offline
Old Hand
Thanks again. For my board there is memory problem while converting POSTS. The script must do it in steps or you get:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 24576 bytes) in ....

I will search for an other way to do it.


my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Re: The ISO and UTF-8 issues [Re: Kayjey] #262339 02/13/19 02:45 PM
Joined: Jun 2006
Posts: 955
Zarzal Offline
Old Hand
Ok, here my way step-by-step to do it on my test-board

1) close board
2) do a backup and store it in a save place
3) do a testposting with you admin account with all chars you need to convert. Example:
Code
***!marker!***
ae = ä
oe = ö
ue = ü
sz = ß
Ae = Ä
Oe = Ö
Ue = Ü
euro = €
grad = °
hoch 2 = ²
hoch 3 = ³

4) do another backup (I use mysqldumper) and store it local an your computer, logout the forum
5) unpack the backup if you got .gz file, you will get a big text file with your database content
6) use notepad++ or an other editor which can edit utf-8 files for search and replace for each char that need to be convert. Please check casesensitiv for the search! Repeat for each char over the complete textfile. To get a list of chars who need to be convert search for ***!marker!*** and you find your testposting. In this line you find the wrong chars. Copy the line to a second editor window, delete not needed text and you have a source for copy&paste for your search in the db dump. It could look like this
Code
ae = ä
oe = ö
ue = ü
sz = ß
Ae = Ä
Oe = Ö
Ue = Ü
euro = €
apo = \'
grad = °
hoch 2 = ²
hoch 3 = ³

7) save it
8) modify the mysqli.inc.php and add the line
Code
mysqli_set_charset( $this->dbh, "utf8" );
under the line
Code
$this->dbh = mysqli_connect($config['DATABASE_SERVER'], $config['DATABASE_USER'], $config['DATABASE_PASSWORD'], $config['DATABASE_NAME']) or die("Problem occured in connection");
and
Code
$this->dbh = mysqli_pconnect("p:" . $config['DATABASE_SERVER'], $config['DATABASE_USER'], $config['DATABASE_PASSWORD'], $config['DATABASE_NAME']) or die("Problem occured in connection");
(line 29 and line 31)
9) import the database file in your forum. If the file is > 2MB please use a tool such as mysqldump. You find a version who works with pphp7.2 here: https://github.com/DSB/MySQLDumper
10) login the forum, check it and reopen it for public

Note: please take care of your language file generic.php! please check it for the language encoding on top of the file. Change it to utf-8 if you use iso in the past.

This takes 5-10 minutes in all if you have mysqldumper installed and know how to handle the files and editor. Its easy.
The only thing that makes problems: searching for a ' is not easy. It is stored as \' but this combination is used in several places and it is not allowed to be changed.

*NOTE* without any warranty smile

Last edited by Zarzal; 02/13/19 02:58 PM. Reason: no warranty!

my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
Ich kann bei Fragen zu UBBthreads in Deutsch weiterhelfen oder es zumindest versuchen
Re: The ISO and UTF-8 issues [Re: Zarzal] #262354 02/14/19 09:49 AM
Joined: Jul 2006
Posts: 95
Philipp Offline
Journeyman
Originally Posted by Zarzal
Thanks again. For my board there is memory problem while converting POSTS. The script must do it in steps or you get:

Something like this should do the trick:
Code
<?php
$db = new mysqli( "localhost", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME" );
$db -> set_charset( "utf8" );

$offset = filter_input( INPUT_GET, "offset", FILTER_SANITIZE_NUMBER_INT );
$offset_sql = ( $offset ) ? " OFFSET " . $db -> escape_string( $offset ) : "" ;

$result = $db -> query( "
    SELECT 
    POST_ID, 
    convert(cast(convert(POST_SUBJECT using latin1) as binary) using utf8), 
    convert(cast(convert(POST_BODY using latin1) as binary) using utf8), 
    convert(cast(convert(POST_DEFAULT_BODY using latin1) as binary) using utf8) 
    FROM 
    ubbt_POSTS
    LIMIT 100" . $offset_sql. ";
" );

if( $result -> num_rows )
{
    
    while( list( $post_id, $post_subject, $post_body, $post_default_body ) = $result -> fetch_row() )
    {
        $db -> query( "
            UPDATE 
            ubbt_POSTS 
            SET 
            POST_SUBJECT = '" . $db -> escape_string( $post_subject ) . "',
            POST_BODY = '" . $db -> escape_string( $post_body ) . "',
            POST_DEFAULT_BODY = '" . $db -> escape_string( $post_default_body ) . "'
            WHERE
            POST_ID = '" . $post_id . "';
        " );

        echo "Processing post #".$post_id."<br>";
    }

    $offset = $offset + 100;
    echo "<script>setTimeout(function() {window.location.href = \"" . $_SERVER['PHP_SELF'] . "?offset=" . $offset . "\"}, 2000);</script>";

}
else
{
    echo "Finish";
}

$db ->close();
exit;
?>

I found in the meantime an easier way based on https://www.whitesmith.co/blog/latin1-to-utf8/. However, this requires shell access to run the mysqldump tool directly:
Code
mysqldump -uUSERNAME -pPASSWORD --skip-set-charset --default-character-set=latin1 DATABASENAME > database.sql


Using --skip-set-charset --default-character-set=latin1 will use a Latin1 connection for the SQL dump, so the dump contains the original UTF-8 characters. Then re-import the dump in UTF-8. This method works also for Latin1 tables with UTF-8 data. In this case, you need to adjust CHARSET=latin1 to CHARSET=utf8 in the SQL dump before re-importing the data.

Re: The ISO and UTF-8 issues [Re: Kayjey] #262359 02/16/19 01:32 PM
Joined: Dec 2003
Posts: 117
Kayjey Offline OP
Member
I think I'll try and copy the board a database to a new server, then try my own way to convert the exported ISO to UTF. See what it does there. I just feel in one of the next releases it's just all going to break...


ShoutChat Box
Today's Birthdays
No Birthdays
Recent Topics
Avatar Pic?
by AstroCat - 04/23/19 10:44 AM
Geo Targeting Display of Custom Islands?
by PianoWorld - 04/15/19 01:08 PM
How-To Guides and Technical Articles
by Morgan - 04/06/19 08:46 AM
i am not a robot.
by Anonymous - 03/29/19 07:18 PM
What directory are avatar images saved in?
by CyberChild - 03/29/19 12:57 PM
Forum Statistics
Forums35
Topics35,272
Posts192,341
Members12,158
Most Online978
Jun 24th, 2007
Random Image
Powered by UBB.threads™ PHP Forum Software 7.7.2
(Snapshot build 20190420)