DB Character set change in phorum 5.2.8

Posted by chris 
September 01, 2008 02:15PM
Hello,

After upgrading a Phorum installation from 5.1 to 5.2.8, I noticed a small change in the mysqli code that seems to cause a problem for me.
Please correct me if I'm wrong.

In 5.1 a single 'set names utf8' was issued (if that was the configured charset; it was for me, I always use utf8).
In 5.2 there is an additional statement, "set character set utf8", which ruins multilingual content for me unless I comment it out!

I tested it some more; here is how MySQL behaves.

Language: SQL
mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> SET CHARACTER SET utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

As you can see, issuing "set character set utf8" changes character_set_connection back to latin1. This means that any character latin1 cannot represent will be saved as a question mark (????).

According to the MySQL docs:
Quote

SET CHARACTER SET is similar to SET NAMES but sets character_set_connection and collation_connection to character_set_database and collation_database.

Obviously the default db collation and charset in this case is latin1. Now, I'm not a MySQL guru, so if this is resolved through some Phorum option, I'm sorry, do let me know. But as far as I can see, I need to comment out this line.
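
To make the difference between the two statements concrete, here is a minimal sketch that models the documented behaviour in Python. It is only a toy model of the session variables, not a MySQL client: the function names and the starting dictionary are made up for illustration, and collations are left out.

```python
# Simplified model of the session variables; it does not talk to a server.

def set_names(session, charset):
    """SET NAMES x: client, connection and results all become x."""
    updated = dict(session)
    updated["character_set_client"] = charset
    updated["character_set_connection"] = charset
    updated["character_set_results"] = charset
    return updated


def set_character_set(session, charset):
    """SET CHARACTER SET x: client and results become x, but the
    connection charset is reset to the database default."""
    updated = dict(session)
    updated["character_set_client"] = charset
    updated["character_set_results"] = charset
    updated["character_set_connection"] = session["character_set_database"]
    return updated


# Hypothetical starting state: a database whose default charset is latin1.
latin1_db = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_database": "latin1",
    "character_set_results": "latin1",
}

after_names = set_names(latin1_db, "utf8")
after_both = set_character_set(after_names, "utf8")

print(after_names["character_set_connection"])  # utf8
print(after_both["character_set_connection"])   # latin1
```

In this model the second statement only undoes the first when character_set_database differs from the requested charset, which matches the two SHOW VARIABLES listings above.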

Chris
Re: DB Character set change in phorum 5.2.8
September 02, 2008 11:32AM
That sounds like you were running a mixed charset before, which was straightened out in 5.2.x.
Which charset do you have in your language file?


Thomas Seifert
Phorum Development Team / Mysnip-Solutions.de
Custom Phorum and general software development
worry-free Phorum Hosting
Re: DB Character set change in phorum 5.2.8
September 02, 2008 02:04PM
Hello Thomas,
In 5.2, $PHORUM['DBCONFIG']['charset'] is set to 'utf8'.
In 5.1 there was no such option, I think, but Phorum always used a 'set names utf8' command (great) :)
And of course $PHORUM["DATA"]["CHARSET"] = "UTF-8" as well (in both Phorum versions).

It seems to me that the "set character set utf8" command issued by Phorum in the latest version would
not work as expected in a db running with latin1 as its default encoding. Then again, I find it strange that I'm
the only one with this problem (?).

Let me illustrate with another example. Consider a test table exactly like phorum_messages, but with fewer fields,
just for testing this out:

Language: SQL
mysql> CREATE TABLE `phorum_messages_test` (
         `message_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
         `body` mediumtext NOT NULL,
         PRIMARY KEY (`message_id`)
       ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO phorum_messages_test (body) VALUES ('non-latin1: ελληνικά');
Query OK, 1 row affected (0.02 sec)

mysql> SET CHARACTER SET utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO phorum_messages_test (body) VALUES ('non-latin1: ελληνικά');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM phorum_messages_test;
+------------+----------------------+
| message_id | body                 |
+------------+----------------------+
|          1 | non-latin1: ελληνικά |
|          2 | non-latin1: ???????? |
+------------+----------------------+
2 rows in set (0.00 sec)

This is exactly what happens in Phorum 5.2.8 on my system with the added "set character set utf8" command. Older posts are in proper utf8; newer ones are not saved properly.
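
The question marks are what a lossy conversion to latin1 looks like. As a rough sketch of that step (an analogy in Python, not MySQL's actual conversion code; the helper name is hypothetical and Python's latin-1 codec stands in for MySQL's latin1):

```python
def store_via_connection(text, connection_charset):
    """Convert text to the connection charset, replacing every
    character that charset cannot represent with '?'."""
    encoded = text.encode(connection_charset, errors="replace")
    return encoded.decode(connection_charset)


body = "non-latin1: ελληνικά"

print(store_via_connection(body, "utf-8"))    # non-latin1: ελληνικά
print(store_via_connection(body, "latin-1"))  # non-latin1: ????????
```

The ASCII part of the string survives either way; only the Greek characters are lost, and once they have been replaced the original text cannot be recovered from the stored row.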



Edited 1 time(s). Last edit at 09/02/2008 02:07PM by chris.
Re: DB Character set change in phorum 5.2.8
September 02, 2008 03:02PM
Then something is completely broken in your case.
Phorum 5.1.x never sent the "set names" command. That must be something you added manually.
In 5.2 we added correct utf8 support, which includes changing the connection character set.


Thomas Seifert
Re: DB Character set change in phorum 5.2.8
September 02, 2008 03:58PM
Thomas,
There is nothing wrong with my data as saved in the db, I assure you. All characters are displayed normally both on the web and on the console. If there were any problem, I would be spending my time correcting all my data, and it's a lot of data. :)

Obviously, if there was no 'set names utf8' before, then I added it; I didn't remember. Maybe I added it for a good reason, just as you said you added proper utf8 support in 5.2. At worst it was simply redundant. :)

But! Let's totally forget about the old version, shall we?
You made me try a brand new installation of Phorum 5.2.8.
Same box, same MySQL server, completely new database for a fresh installation.

config.php:
$PHORUM['DBCONFIG']['charset'] = 'utf8';

english.php:
$PHORUM["DATA"]['CHARSET']="UTF-8";

Installed OK, made a test post with non-latin1 characters -> Data saved wrong! (question marks).

Commented out the line:
// mysqli_query( $conn,"SET CHARACTER SET {$PHORUM['DBCONFIG']['charset']}");

Made 2nd post -> Data saved correctly in utf8 and displayed properly.
See attached screenshot.

Chris



Edited 2 time(s). Last edit at 09/02/2008 04:03PM by chris.


Re: DB Character set change in phorum 5.2.8
September 02, 2008 04:06PM
What is the character set for your database?

Edit: you suggested above that it was latin1. I agree that there might be a problem with that setup, based on the piece of documentation that you pasted in your first message.

There were probably also good reasons to include both SET CHARACTER SET and SET NAMES in the connection code, so I don't really like the idea of simply throwing that away. To the other devs: do you remember the specifics?


Maurice Makaay
Phorum Development Team



Edited 1 time(s). Last edit at 09/02/2008 04:26PM by Maurice Makaay.
Re: DB Character set change in phorum 5.2.8
September 02, 2008 04:26PM
Hello Maurice,
The default MySQL encoding is latin1.

And that's my take on why I think this happens. See my very first post on the effects of
the "set character set" command.

Chris
Re: DB Character set change in phorum 5.2.8
September 02, 2008 05:00PM
The character set combinations in MySQL can get quite tricky, but they're very versatile. Personally, I don't see any reason for using the "set character set" command, and it's the first time I've seen this problem. Assuming a full utf8 environment (db and client), 'set names' ensures that the client, connection and results variables are all utf8, and that's all I would ever use.

Cheers
Re: DB Character set change in phorum 5.2.8
September 02, 2008 06:48PM
Well, and it's the first time we have seen this problem.
I guess it's because your database itself is set to latin1 (not the server, but the database).

Try it with:
Language: SQL
ALTER DATABASE <your phorum database name> DEFAULT CHARACTER SET utf8;


Thomas Seifert
Re: DB Character set change in phorum 5.2.8
September 02, 2008 07:42PM
Thomas,
Yes, naturally that resolves the problem. The only question is: what if someone on a shared host (yikes) can't control this?

Language: SQL
mysql> ALTER DATABASE phorum52 DEFAULT CHARACTER SET utf8;
Query OK, 1 row affected (0.00 sec)

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SET CHARACTER SET utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Also, in this case "set character set utf8" behaves exactly like "set names utf8". So my question is: why try to change the
MySQL character set vars twice?

Another question is what happens to existing installations with various different settings if "set character set" were to be removed.
Questions, questions. :)

Chris
Re: DB Character set change in phorum 5.2.8
September 02, 2008 07:45PM
As you need to create the database, you can surely change the character set of the database.
If you can't, then you can get in contact with your host.
Most are running utf8 by default now anyway.

We are not going to remove the "set character set". The more that is defined for the connection, the less can go wrong.


Thomas Seifert
Re: DB Character set change in phorum 5.2.8
September 02, 2008 08:01PM
I stand by the opinion that "set names" is the proper way to go, however.
"Set character set" breaks this if the db is in a different charset, and you can't rely on the db default anyway. Apart from needing the permissions to alter the db default, there may be other apps/tables coexisting that may be affected. And I see no good reason to depend on it.

set names = works in all cases flawlessly.
set character set = works, but depends on the default db charset.

Your call :)

Cheers



Edited 1 time(s). Last edit at 09/02/2008 08:08PM by chris.