How to switch the Phorum database to support Unicode?

Posted by Ulf Dunkel 
How to switch the Phorum database to support Unicode?
March 28, 2008 05:51AM
I am running Phorum 5.1.25 and want to upgrade soon to the new stable release 5.2.7.

I wonder if I can automatically convert the existing database to support Unicode. It shows "latin1_swedish_ci" right now, which was not set by me. Is there any magic button I can press to have the database converted to UTF-8 automatically?

Thanks in advance, Ulf Dunkel

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 06:03AM
Quote

Is there any magic button I can press to have the database converted to UTF-8 automatically?

Easy answer to that: no.

There are multiple steps involved, as all data itself needs to be converted and this gets more tricky with serialized arrays.


Thomas Seifert
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 06:30AM
Hi Thomas.

I have now done it manually, like this:

1. Created a backup database of my P5 database.
2. Converted all collations in the P5 database to utf-8.
3. Exported the database as .sql dump to my hard drive.
4. Deleted the P5 database and recreated it.
5. Imported the .sql dump to create all tables new.
6. Then I updated the @header() call in my lib.php so that these pages are sent with a UTF-8 charset in the HTTP header.
7. Because the (Unicode) data were still displayed incorrectly, although the browser now selected UTF-8, I made a fix in

\include\db\mysql.php:

function phorum_db_mysql_connect()
{
    $PHORUM = $GLOBALS["PHORUM"];

    // Reuse a single connection for the whole request.
    static $conn;

    if (empty($conn))
    {
        $conn = mysql_connect($PHORUM["DBCONFIG"]["server"], $PHORUM["DBCONFIG"]["user"], $PHORUM["DBCONFIG"]["password"], true);
        mysql_select_db($PHORUM["DBCONFIG"]["name"], $conn);
        // Added line: make the client connection talk UTF-8 to the server.
        mysql_query("SET NAMES utf8", $conn);
    }

    return $conn;
}

Now everything is fine on my side (or almost everything, as the old Czech message data weren't converted properly, but this is a minor issue for me).

Enjoy: [www.icalamus.net]

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 06:46AM
I hope you really, really tested your Phorum now, as lots of serialized data can be broken this way. But maybe you are lucky and none of your serialized data contains Unicode characters.


Thomas Seifert
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 06:48AM
What the heck is 'serialized data'? :-)

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 06:55AM
PHP arrays stored in a serialized format ( [www.php.net] ), which contains the length of the stored strings and so on.
It's used in multiple places in Phorum, e.g. for thread info.


Thomas Seifert
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 06:57AM
If you do get problems with serialized data, then check out the conversion script that I am working on. I already used it successfully on my own database test conversion. If you have some weird problems, then this might save you: convert_to_utf8.php.


Maurice Makaay
Phorum Development Team
my blog linkedin profile secret sauce
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 07:50AM
I could not find any problems until now. What should I try to see if there are any problems with serialized data in my Phorum instance?

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 09:43AM
Run the script ;-) It should leave non-problematic fields as they are and would inform you about fields that are updated because of wrong data.

It's really hard to give you a case to test it out. The case would be to find serialized data that contains special characters. An example could be threads where the last author has a special character in his name. But I cannot give you a "test these" shortlist I'm afraid.

The problem is that in serialized data, PHP records things like "the next string is 5 bytes long". If those 5 characters include special characters and you switch to UTF-8, a single special character may be replaced by multiple bytes in the UTF-8 version. So after the conversion, the string could really be 9 bytes long in the database, for example. The fix is to update the serialized data accordingly, so that it says "the next string is 9 bytes long". This is exactly what my script targets.
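The length problem described above can be reproduced in a few lines. This is only a sketch, not necessarily how convert_to_utf8.php works; it assumes the mbstring extension is available, and the 'author' key and the sample name are made up for illustration.

```php
<?php
// Illustration only: a Latin-1 encoded name as it might sit in a serialized field.
$latin1 = "J\xF6rg";                        // "Jörg" in ISO-8859-1, 4 bytes
$serialized = serialize(array('author' => $latin1));
// $serialized now declares the name as s:4 (4 bytes).

// Converting the raw serialized text to UTF-8 turns \xF6 into two bytes
// (\xC3\xB6) but leaves the "s:4" length prefix untouched.
$converted = mb_convert_encoding($serialized, 'UTF-8', 'ISO-8859-1');
$broken = @unserialize($converted);         // false: 4 bytes declared, 5 present

// Safer order: unserialize first, convert the values, then serialize again
// so that PHP recomputes every length prefix.
$data = unserialize($serialized);
foreach ($data as $key => $value) {
    if (is_string($value)) {
        $data[$key] = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
    }
}
$fixed = serialize($data);                  // the name is now declared as s:5

var_dump($broken);                          // bool(false)
```

Real fields such as the message meta data can contain nested arrays, so an actual conversion would have to walk the structure recursively; the flat loop here only keeps the sketch short.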


Maurice Makaay
Phorum Development Team
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 12:36PM
Quote
mmakaay
Run the script ;-)

Is it possible that this script requires Phorum 5.2.7? It won't run with 5.1.25:

Fatal error: Call to undefined function: phorum_db_interact() in /.../phorum/convert_to_utf8.php

Any hints?

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 12:52PM
Are the only serialized fields in the database the meta field in messages and pm_messages tables, profile_fields in settings table, and moderator_data in users table?
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 01:17PM
Yes, you are right. The script is 5.2 only.


Maurice Makaay
Phorum Development Team
Re: How to switch the Phorum database to support Unicode?
March 28, 2008 01:20PM
The fields that I take care of in my script are the ones that I'm aware of. These are (copied from the code):

"moderator_data in {$PHORUM['user_table']}"
"settings_data in {$PHORUM['user_table']}"
"meta data in {$PHORUM['pm_messages_table']}"
"settings in {$PHORUM['settings_table']}"
"meta data in {$PHORUM['message_table']}"
"forum_path data in {$PHORUM['forums_table']}"
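For anyone who wants to check these fields by hand, one rough test is whether each stored value still unserializes. The helper below is a hypothetical sketch (the name is_broken_serialized is not part of Phorum); how the values are fetched from the tables is left out on purpose.

```php
<?php
// Hypothetical helper: returns true when a stored value can no longer be
// unserialized, which is the typical symptom of a stale length prefix.
function is_broken_serialized($value)
{
    if ($value === null || $value === '') {
        return false;                       // nothing stored, nothing to break
    }
    if ($value === serialize(false)) {
        return false;                       // a legitimately stored boolean false
    }
    // unserialize() returns false on failure; @ hides the notice it raises.
    return @unserialize($value) === false;
}

// A value whose declared length (4) no longer matches its UTF-8 bytes (5).
$stale = 'a:1:{s:6:"author";s:4:"J' . "\xC3\xB6" . 'rg";}';
var_dump(is_broken_serialized($stale));     // bool(true)
```

Note that a plain, non-serialized string is also reported as broken, so the check only makes sense on columns that are known to hold serialized data.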


Maurice Makaay
Phorum Development Team
Re: How to switch the Phorum database to support Unicode?
April 26, 2008 09:03AM
I have finally found some time to upgrade my Phorums from 5.1.25 to 5.2.7 and have also run convert_to_utf8.php. I backed up everything before and after (in different backups of course) and checked the exported .sql files for any remaining Latin-1 umlauts like "äöüßÄÖÜ" etc.

All data in the databases are definitely UTF-8 encoded now. Browsers also show the pages (still using one of your new default templates) in UTF-8.

But why do I still see strange characters in the browser, while they are displayed correctly when I force the browser to use ISO-8859-1 or similar?

Check them out:
Calamus SL Phorum
iCalamus Phorum


Any hints?

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
April 26, 2008 10:59AM
I have researched further. Check this test message:

Test Message for German Umlauts in UTF-8

The umlauts are shown well in the browser:
äöü ß àâá èêé ìîí òôó ùôú
ÄÖÜ ß ÀÂÁ ÈÊÉ ÌÎÍ ÒÔÓ ÙÛÚ
In the MySQL database, they are shown like this instead (phpMyAdmin shows UTF-8 encoding!):
äöü ß à âá èêé ìîí òôó ùôú
ÄÖÜ ß ÀÂÁ ÈÊÉ ÌÎÍ ÒÔÓ ÙÛÚ
In a pure ANSI editor, they are shown like this:
äöü ß àâá èêé ìîí òôó 
ùôú\r\nÄÖÜ ß ÀÂÁ ÈÊÉ ÌÎÍ ÒÔÓ ÙÛÚ
I wonder if they are really stored and read out in UTF-8 or rather in UTF-16, which would explain why they look so strange in phpMyAdmin. And it would explain why my UTF-8 encoded messages are shown wrong in Phorum.
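For reference, the "Ã¤" pattern is what UTF-8 bytes look like when some layer reads them as Latin-1 and encodes them to UTF-8 a second time; on its own it does not point to UTF-16. A minimal sketch of that effect, assuming the mbstring extension is available:

```php
<?php
$utf8 = "\xC3\xA4\xC3\xB6\xC3\xBC";         // "äöü" as correct UTF-8 (6 bytes)

// Treating those bytes as Latin-1 and encoding them to UTF-8 again is what
// a mismatched connection or page charset effectively does.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $double, "\n";                         // shows as "Ã¤Ã¶Ã¼" on a UTF-8 terminal

// Reversing the extra step recovers the original bytes.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8);              // bool(true)
```

Which layer applies the extra step (database connection, stored data, or the charset the page declares) has to be found by checking each setting in turn.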

Can someone please check this?

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
April 26, 2008 12:17PM
What do you have in config.php for the charset setting?


Thomas Seifert
Re: How to switch the Phorum database to support Unicode?
April 26, 2008 01:33PM
Much ado about nothing, Thomas.
You pointed me to the right place. Thanks a bunch!

I wondered if Phorum 5.2.7 really required PHP5 but it seems to work almost fine with PHP4.

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
April 26, 2008 07:28PM
Quote
Ulf Dunkel
I wondered if Phorum 5.2.7 really required PHP5 but it seems to work almost fine with PHP4.

The main thing is that we don't test it with PHP 4. So, there are no guarantees what it will do.

Brian - Cowboy Ninja Coder - Personal Blog - Twitter
Re: How to switch the Phorum database to support Unicode?
April 27, 2008 04:48AM
Quote
mmakaay
If you do get problems with serialized data, then check out the conversion script that I am working on. I already used it successfully on my own database test conversion. If you have some weird problems, then this might save you: convert_to_utf8.php.

Hi Maurice.
Am I right that the "Forum description" field isn't converted by your script yet?
I stumbled over the Latvian "ā" when I updated my forums - see them right here: iCalamus Forums

The "ā" in the Latvian forum is shown as "?" whatever I do. Unfortunately I cannot access my phpmyadmin this weekend.

Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode?
April 27, 2008 11:34AM
"Forum description" is not a serialized data field. That one should be taken care of by converting the table itself to UTF-8.


Maurice Makaay
Phorum Development Team
Re: How to switch the Phorum database to support Unicode?
October 30, 2017 11:40AM
Quote
Maurice Makaay
If you do get problems with serialized data, then check out the conversion script that I am working on. I already used it successfully on my own database test conversion. If you have some weird problems, then this might save you: convert_to_utf8.php.

Since I migrated the forum to UTF-8, the subject of a message is lost if it contains accented characters.
How can I correct this?

Cactus : [www.cactuspro.com]
Re: How to switch the Phorum database to support Unicode?
August 06, 2018 11:00AM
Thanks for the script!

Quote
Ulf Dunkel
Run the script ;-)

Is it possible that this script requires Phorum 5.2.7? It won't run with 5.1.25:

Fatal error: Call to undefined function: phorum_db_interact() in /.../phorum/convert_to_utf8.php

Any hints?

As a simple solution, you could copy mysql.php and the mysql folder from 5.2 into your installation, run the script, and then restore the old mysql.php. It worked for me, so I could switch to UTF-8 without having to deal with dozens of issues from upgrading to 5.2.

You may also want to comment out the block that converts the tables, since it's better to do that manually from the MySQL console (it can also take a lot of time, ~5-10 min for my small Phorum instance):

SET @DATABASE_NAME = 'phorum';
SELECT CONCAT('ALTER TABLE `', t.`TABLE_SCHEMA`, '`.`', t.`TABLE_NAME`, '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;') AS sqlcode
FROM `information_schema`.`TABLES` t
WHERE t.`TABLE_COLLATION` = 'cp1251_general_ci' AND t.`TABLE_SCHEMA` = @DATABASE_NAME
ORDER BY 1 LIMIT 0, 999;
# Then execute all the statements printed by this query.