#262227 02/03/2019 6:12 PM
Joined: Jun 2006
Posts: 319
Enthusiast
Hi all,

I have just had my forum updated to the latest version, 7.71, and was advised to change to the new UTF-8 format. I agreed, and was told that old posts could show black diamonds while new content should display without them.

What I was not told was that if members keep their documents in Microsoft Word and then copy and paste from them into the forum, the black diamonds would show up in their posts after the update.

I am trying to find out how this can be worked around (correctly, with the new format) because I don't want to have to edit each post.

A few members (Historian especially) use copy and paste to post on the forum (to save time). They even have some of their documents published in print (that is why they use Word).

Here is an example of a post after the install.
https://www.outdoorking.com/forum/u...ott-bonnar-brochure-c1970.html#Post95889

Any help with this would be appreciated.

Joined: Apr 2004
Posts: 1,945
Likes: 145
UBB.threads Developer
Edit /english/generic.php

$ubbt_lang['CHARSET'] = "utf-8";
replace with
$ubbt_lang['CHARSET'] = "iso-8859-1";

Your pages are telling the browser to display the content as Unicode (UTF-8), but your content is stored in a Western encoding (ISO-8859-1).
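To see where the black diamonds come from, here is a minimal Python sketch (Python is used only for illustration; the forum itself is PHP). It assumes the stored text is in Windows-1252, which is what Word-style "smart" punctuation usually ends up as on Western systems. The sample string and variable names are made up for the example.

```python
# Hypothetical sample text with Word-style "smart" punctuation:
# a curly apostrophe (U+2019) and curly double quotes (U+201C, U+201D).
sample = "It\u2019s a \u201ctest\u201d"

# Bytes as a legacy Western (Windows-1252) system would store them.
legacy_bytes = sample.encode("cp1252")

# A page declared as UTF-8 makes the browser decode those bytes as UTF-8.
# Bytes 0x92, 0x93 and 0x94 are not valid UTF-8 on their own, so each one
# becomes U+FFFD, the replacement character drawn as a black diamond.
shown_as_utf8 = legacy_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in gives the text back.
shown_as_cp1252 = legacy_bytes.decode("cp1252")

print(shown_as_utf8)
print(shown_as_cp1252)
```

Plain ASCII letters survive either way, which is why only the quotes and other special characters in a post turn into diamonds.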

https://wordtothewise.com/2010/03/which-is-better-utf-8-or-iso/
Quote
Someone asked today on a mailing list whether they should be using UTF-8 or “ISO” encoding for sending email. What’s the best choice depends on some of the details of the situation, but here’s the answer I gave:

UTF-8 will work for pretty much anything, as it’s just an 8 bit encoding scheme for Unicode (which is supposed to be the one character encoding to rule them all). It’s well supported in most languages and development environments – Windows has been native UTF-16 under the covers since the mid 90s, for instance – and typical messages that use mainstream glyphs should render well from utf-8 in most western MUAs and browsers.

There are still a very few old or broken clients out there that will not handle UTF-8 well but (outside the asian language market, where there’s still some non-ASCII, non-Unicode legacy usage) they’re typically ones that don’t really handle any character set encoding well and the only thing safe to send to them is either plain ASCII or whichever ASCII superset their OS happens to support natively (which is probably an argument for sending Windows-1252 codepage, but not a terribly strong one).

The various extended ASCIIs (such as ISO-8859-*) will only work for messages that are written solely using characters from that character set. If you have even one character in a message that cannot be expressed in ISO-8859-1, then you can’t use ISO-8859-1 to send that message.

ISO-8859-1 (aka Latin1) is fairly sloppy in some respects – it has no apostrophe, nor single quotes, for instance – but it can handle an awful lot of languages, from Kurdish to Swahili. It can’t handle Dutch, Estonian, Finnish, Hungarian and Welsh particularly well, nor can it show the Euro symbol (ISO-8859-14 or -15 are needed for some characters there).

A common problem is that many people (and the software they write) think that Windows uses Latin1. It doesn’t, it uses Windows-1252. If you accept messages written on Windows, using the Windows-1252 code page, and throw them out on the wire as ISO-8859-1 what you end up with is not quite right. It mostly works, as the two codepages overlap quite a bit, but they have different glyphs in the 0x80-0x9f range. So if you use single or double quotes (“smart quotes”), or the Euro symbol, or ellipses, or bullet, or the trademark symbol in your message they’ll be garbled. This is so common that some mail clients and web browsers will actually treat a document that claims to be ISO-8859-1 as Windows-1252, but that’s a bug workaround and not something it’s really safe to rely on.
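The 0x80-0x9f difference described in the quote can be checked with a short Python sketch. The byte values below are just a few examples picked from that range; nothing here is specific to UBB.threads.

```python
# A few bytes from the 0x80-0x9F range where the two code pages differ.
byte_values = bytes([0x80, 0x85, 0x93, 0x94, 0x99])

# Windows-1252 maps them to printable characters:
# euro sign, ellipsis, curly double quotes, trademark sign.
as_cp1252 = byte_values.decode("cp1252")

# ISO-8859-1 maps the same bytes to invisible C1 control characters.
as_latin1 = byte_values.decode("latin-1")
latin1_is_controls = all(0x80 <= ord(ch) <= 0x9F for ch in as_latin1)

# The euro sign cannot be written in ISO-8859-1 at all.
try:
    "\u20ac".encode("latin-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

print(as_cp1252)
print(latin1_is_controls, euro_fits_latin1)
```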


Current developer of UBB.threads PHP Forum Software
Current Release: UBBT 7.7.5 // Preview: UBBT 8.0.0
isaac @ id242.com // my forum @ CelicaHobby.com
Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I provided the following previously:
Quote
UTF
UTF is a family of multi-byte encoding schemes that can represent Unicode code points, which can be representative of up to 2^31 [roughly 2 billion] characters. UTF-8 is a flexible encoding system that uses between 1 and 4 bytes to represent the first 2^21 [roughly 2 million] code points.

Long story short: any character with a code point/ordinal representation of 127 or below, aka 7-bit-safe ASCII, is represented by the same 1-byte sequence as in most other single-byte encodings. Any character with a code point above 127 is represented by a sequence of two or more bytes, with the particulars of the encoding best explained here.
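As a quick illustration of the 1-to-4-byte rule, this Python sketch counts the UTF-8 bytes for four sample characters. The characters and dictionary keys are arbitrary picks for the example.

```python
# One sample character per UTF-8 length class.
samples = {
    "A": "A",                # U+0041, plain ASCII
    "o-umlaut": "\u00f6",    # U+00F6
    "euro": "\u20ac",        # U+20AC
    "emoji": "\U0001F600",   # U+1F600
}

# Number of bytes each character takes once encoded as UTF-8.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# ASCII is byte-for-byte identical in UTF-8 and ISO-8859-1.
ascii_same = "A".encode("utf-8") == "A".encode("latin-1")

print(byte_lengths)
print(ascii_same)
```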


ISO-8859
ISO-8859 is a family of single-byte encoding schemes used to represent alphabets that can be represented within the range of 128 to 255. These various alphabets are defined as "parts" in the format ISO-8859-n, the most familiar of these likely being ISO-8859-1 aka 'Latin-1'. As with UTF-8, 7-bit-safe ASCII remains unaffected regardless of the encoding family used.

The drawback to this encoding scheme is its inability to accommodate languages comprised of more than 128 symbols, or to safely display more than one family of symbols at one time. As well, ISO-8859 encodings have fallen out of favor with the rise of UTF: the ISO "Working Group" in charge of them disbanded in 2004, leaving maintenance up to its parent subcommittee.
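The "one family of symbols at a time" limit can be shown with a small Python sketch: the same high byte means a different letter in each ISO-8859 part, while an ASCII byte stays the same. The byte 0xE4 and the three parts are arbitrary choices for illustration.

```python
# The same single byte, decoded under three different ISO-8859 parts.
high_byte = bytes([0xE4])
decoded = {
    charset: high_byte.decode(charset)
    for charset in ("iso-8859-1", "iso-8859-5", "iso-8859-7")
}
# Latin-1 gives a-umlaut, part 5 a Cyrillic letter, part 7 a Greek letter.

# An ASCII byte decodes to the same character in every part.
ascii_byte = bytes([0x41])
ascii_decoded = {
    ascii_byte.decode(cs) for cs in ("iso-8859-1", "iso-8859-5", "iso-8859-7")
}

print(decoded)
print(ascii_decoded)
```

Because the byte alone does not say which part it belongs to, a page can only pick one part, and text from the others comes out as the wrong letters.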


ISO-8859-1 is a legacy standard from back in the 1980s (which was abandoned in 2004). It can only represent 256 characters, so it is only suitable for some languages of the Western world. Even for many supported languages, some characters are missing. If you create a text file in this encoding and try to copy/paste some Chinese characters into it, you will see weird results. So in other words, don't use it. Unicode has taken over the world and UTF-8 is pretty much the standard these days, unless you have some legacy reasons (like HTTP headers, which need to be compatible with everything).

If you want to KEEP using ISO-8859-1 you can, but you will lose out on any benefits of a current, actively developed character set.


I am a Web Development Contractor, I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Joined: Jun 2006
Posts: 956
Old Hand
Outdoorking, changing the string in generic.php is only one part of it. You say that you just updated your forum. How old is your first installation? Take a closer look at your database with phpMyAdmin. The database itself should be running in utf-8, but if you look at the tables you may find collation settings other than utf-8. In that case UBB.threads stores all characters in that code page, and if you change the string in generic.php you will end up with black question marks.

Now you can do 3 things:
1) set the generic.php string to your encoding (maybe iso-something). All will be fine, but you may run into trouble at some point in the future
2) set generic.php to utf-8 and change the collation of each table to utf-8 too. This results in proper settings for new postings only, and shows black question marks for old postings because they are stored in another code page.
3) do step 2 and then go through each table's content and convert the encoding inside the table to utf-8. This is something that only users who know phpMyAdmin and how to handle table content should do. I wrote something about this here:
https://www.ubbcentral.com/forums/u...coding-utf-8-and-older-forums#Post261912
That's my way of doing it. There may be better ways, but it works for me. It leaves out user profiles, signatures and forum intro text, but those can be changed step by step in the admin section whenever you see one of the nasty black ?
In my example I only changed German umlauts like öäü but forgot to search for €, `´ and '
These need to be changed too.

But remember: do a backup first! And do a backup!
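As a rough illustration of what "changing the coding inside the table" means at the byte level, here is a Python sketch. It is not the phpMyAdmin procedure from the linked post. The helper name to_utf8 is made up, and it assumes the old content was written as Windows-1252. It works on raw byte strings, for example from an exported dump, and leaves anything that is already valid UTF-8 untouched.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw as UTF-8 bytes, assuming non-UTF-8 input is Windows-1252."""
    try:
        # Already valid UTF-8 (this includes plain ASCII): keep as-is.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise treat the bytes as Windows-1252 and re-encode them.
        # errors="replace" covers the few byte values cp1252 leaves undefined.
        return raw.decode("cp1252", errors="replace").encode("utf-8")


old_row = "Gr\u00fc\u00dfe \u20ac".encode("cp1252")   # legacy bytes
new_row = "Gr\u00fc\u00dfe \u20ac".encode("utf-8")    # already UTF-8

print(to_utf8(old_row) == new_row)   # legacy row gets converted
print(to_utf8(new_row) == new_row)   # UTF-8 row is left alone
```

This is only a heuristic: a few Windows-1252 byte sequences happen to be valid UTF-8 as well, so spot-check the result, and take the backup first as advised above.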


my board: http://www.dragonclan-forum.de
my hobby: http://www.biker-reise.de
I can help with questions about UBBthreads in German, or at least try
Zarzal #262249 02/04/2019 12:05 PM
Joined: Jul 2006
Posts: 116
Likes: 4
Member
Originally Posted by Zarzal
2) set generic.php to utf-8 and change the collation of each table to utf-8 too. This results in proper settings for new postings only, and shows black question marks for old postings because they are stored in another code page.
The old postings are actually stored in utf8, while the new postings are stored as latin1 in a table with a utf8 collation.

The problem here is that the MySQL client is still connecting with the wrong default character set (latin1) to the database server:
$this->dbh = mysqli_connect($config['DATABASE_SERVER'], $config['DATABASE_USER'], $config['DATABASE_PASSWORD'], $config['DATABASE_NAME']) or die("Problem occured in connection");

To get true utf8 support, you need the following line after mysqli_connect:
mysqli_set_charset( $this->dbh, "utf8" );

This will switch the client character set to utf8. Any utf8 data will now display fine, but latin1 data in utf8 tables may come out broken.
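Why a mismatched connection character set garbles text can be sketched in Python. This is a simulation of the byte conversions only, not real database code. It assumes the application sends UTF-8 while the connection is declared as latin1, and it uses Python's cp1252 codec as a stand-in for MySQL's latin1. The sample word is arbitrary.

```python
original = "Gr\u00fc\u00dfe"

# 1. The application sends UTF-8 bytes over the connection.
sent = original.encode("utf-8")

# 2. The connection is declared as latin1, so the server reads each byte
#    as one Western character.
misread = sent.decode("cp1252")

# 3. The server converts that text to utf8 for storage in a utf8 table.
stored = misread.encode("utf-8")

# 4. Read back over a real utf8 connection, the double encoding shows.
seen = stored.decode("utf-8")

# 5. Reversing steps 2 and 3 recovers the original text.
repaired = seen.encode("cp1252").decode("utf-8")

print(seen)
print(repaired)
```

Rows written this way look fine as long as they are read back over the same mismatched connection, which is why the damage only shows up once the client character set is corrected.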

Newer database server setups, such as MariaDB 10.3 on the upcoming Debian Linux 10 release, will use utf8mb4 as the new default character set for both client and server.

