How to switch the Phorum database to support Unicode?
Posted by Ulf Dunkel
March 28, 2008 05:51AM |
Registered: 21 years ago Posts: 146 |
I am running Phorum 5.1.25 and want to upgrade soon to the new stable release 5.2.7.
I wonder if I can automatically convert the existing database to support Unicode. It shows "latin1_swedish_ci" right now, which was not set by me. Is there any magic button I can press to have the database converted to UTF-8 automatically?
Thanks in advance, Ulf Dunkel
Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode? March 28, 2008 06:03AM |
Admin Registered: 20 years ago Posts: 9,240 |
March 28, 2008 06:30AM |
Registered: 21 years ago Posts: 146 |
Hi Thomas.
I have now done it manually, like this:
1. Created a backup of my P5 database.
2. Converted all collations in the P5 database to UTF-8.
3. Exported the database as an .sql dump to my hard drive.
4. Deleted the P5 database and recreated it.
5. Imported the .sql dump to recreate all tables.
6. Then I updated the @header() call in my lib.php so that these pages are served with UTF-8 encoding declared in the HTTP response.
7. Because the (Unicode) data was still displayed incorrectly, even though the browser selected the right UTF-8 encoding, I applied a fix in
\include\db\mysql.php:
    function phorum_db_mysql_connect()
    {
        $PHORUM = $GLOBALS["PHORUM"];
        static $conn;
        if (empty($conn)) {
            $conn = mysql_connect(
                $PHORUM["DBCONFIG"]["server"],
                $PHORUM["DBCONFIG"]["user"],
                $PHORUM["DBCONFIG"]["password"],
                true
            );
            mysql_select_db($PHORUM["DBCONFIG"]["name"], $conn);
            // Tell MySQL that this connection sends and expects UTF-8.
            mysql_query("SET NAMES utf8");
        }
        return $conn;
    }
Now everything is fine on my side (or almost everything: the old Czech message data wasn't converted properly, but this is a minor issue for me).
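Step 6 above can be sketched roughly as follows. This is only an illustration under assumptions: the thread does not show the actual lib.php, and the helper name utf8_content_type() is made up here.

```php
<?php
// Hypothetical helper: builds the header line that declares UTF-8 output.
function utf8_content_type($mime = 'text/html')
{
    return 'Content-Type: ' . $mime . '; charset=UTF-8';
}

// Headers must be sent before any page output; the "@" mirrors the
// @header() call mentioned in step 6.
if (!headers_sent()) {
    @header(utf8_content_type());
}
```

The header only tells the browser how to decode the page. The data coming from MySQL must really be UTF-8 as well, which is what the SET NAMES line in step 7 takes care of.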
Enjoy: [www.icalamus.net]
Regards, Ulf Dunkel
Re: How to switch the Phorum database to support Unicode? March 28, 2008 06:46AM |
Admin Registered: 20 years ago Posts: 9,240 |
March 28, 2008 06:48AM |
Registered: 21 years ago Posts: 146 |
Re: How to switch the Phorum database to support Unicode? March 28, 2008 06:55AM |
Admin Registered: 20 years ago Posts: 9,240 |
PHP arrays are stored in a serialized format ( [www.php.net] ), which records the length of each stored string and so on.
It's used in multiple places in Phorum, e.g. for thread info.
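To make the length bookkeeping concrete, here is a minimal sketch of what PHP's serialize() produces. The array keys and the author name are made up for illustration; they are not Phorum's actual thread info fields.

```php
<?php
// Hypothetical thread info; the keys are illustrative, not Phorum's real schema.
$threadInfo = array(
    'last_author' => "J\xC3\xB6rg",   // "Jörg" encoded as UTF-8 ("ö" takes 2 bytes)
    'replies'     => 3,
);

$serialized = serialize($threadInfo);

// Every string is stored as s:<byte length>:"<bytes>";
echo $serialized, "\n";
// a:2:{s:11:"last_author";s:5:"Jörg";s:7:"replies";i:3;}
```

Note that the name is recorded with length 5 although it has only four characters: the count is in bytes, not characters.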
Thomas Seifert
March 28, 2008 06:57AM |
Admin Registered: 18 years ago Posts: 8,532 |
If you do get problems with serialized data, then check out the conversion script that I am working on. I already used it successfully on my own database test conversion. If you have some weird problems, then this might save you: convert_to_utf8.php.
Maurice Makaay
Phorum Development Team



March 28, 2008 07:50AM |
Registered: 21 years ago Posts: 146 |
March 28, 2008 09:43AM |
Admin Registered: 18 years ago Posts: 8,532 |
Run the script ;-) It should leave non-problematic fields as they are and would inform you about fields that are updated because of wrong data.
It's really hard to give you a case to test it out. The case would be to find serialized data that contains special characters. An example could be threads where the last author has a special character in his name. But I cannot give you a "test these" shortlist I'm afraid.
The problem is that in serialized data, PHP writes down things like "the next string contains 5 characters" (the count is really a number of bytes). If those 5 characters include special characters and you switch to UTF-8, then there is a chance that a single special character is replaced by multiple bytes in the UTF-8 version. So the string could really be 9 bytes long in the database after the conversion, for example. The fix is to update the serialized data accordingly, so that it says "the next string contains 9 characters". This is exactly what my script targets.
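The effect can be reproduced in a few lines. This is a minimal sketch, not the code from convert_to_utf8.php: the sample data and the helper name fix_serialized_lengths() are assumptions made for illustration, the mbstring extension is assumed to be available, and the regular expression only copes with string values that do not themselves contain the sequence `";`.

```php
<?php
// Serialize a string while it is still Latin-1: "Jörg" is 4 bytes there.
$latin1     = "J\xF6rg";
$serialized = serialize(array('last_author' => $latin1));

// Converting the stored text to UTF-8 turns "ö" into two bytes, but the
// recorded length still says 4, so unserialize() fails and returns false.
$converted = mb_convert_encoding($serialized, 'UTF-8', 'ISO-8859-1');
$broken    = @unserialize($converted);

// Hypothetical helper: recompute every s:<len>:"..."; prefix from the
// actual byte length of the string that follows it.
function fix_serialized_lengths($data)
{
    return preg_replace_callback(
        '/s:\d+:"(.*?)";/s',
        function ($m) {
            return 's:' . strlen($m[1]) . ':"' . $m[1] . '";';
        },
        $data
    );
}

$repaired = unserialize(fix_serialized_lengths($converted));

var_dump($broken === false);                          // bool(true)
var_dump($repaired['last_author'] === "J\xC3\xB6rg"); // bool(true)
```

Parsing the serialized structure properly is more robust than a regular expression for real data; the regex is used here only to keep the sketch short.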
Maurice Makaay
Phorum Development Team



March 28, 2008 12:36PM |
Registered: 21 years ago Posts: 146 |
Re: How to switch the Phorum database to support Unicode? March 28, 2008 12:52PM |
Registered: 17 years ago Posts: 340 |
March 28, 2008 01:17PM |
Admin Registered: 18 years ago Posts: 8,532 |
Yes, you are right. The script is 5.2 only.
Maurice Makaay
Phorum Development Team



March 28, 2008 01:20PM |
Admin Registered: 18 years ago Posts: 8,532 |
The fields that my script takes care of are the ones that I'm aware of. These are (copied from the code):
"moderator_data in {$PHORUM['user_table']}"
"settings_data in {$PHORUM['user_table']}"
"meta data in {$PHORUM['pm_messages_table']}"
"settings in {$PHORUM['settings_table']}"
"meta data in {$PHORUM['message_table']}"
"forum_path data in {$PHORUM['forums_table']}"
Maurice Makaay
Phorum Development Team



April 26, 2008 09:03AM |
Registered: 21 years ago Posts: 146 |
I have finally found some time to upgrade my Phorums from 5.1.25 to 5.2.7 and also ran convert_to_utf8.php. I backed up everything before and after (in separate backups, of course) and checked the exported .sql files for any remaining Latin-1 umlauts like "äöüßÄÖÜ" etc.
All data in the databases is definitely UTF-8 encoded now. Browsers also show the pages (still using one of your new default templates) in UTF-8.
But why do I still see strange characters in the browser, while they are displayed correctly when I force the browser to use ISO-8859-1 or a similar encoding?
Check them out:
Calamus SL Phorum
iCalamus Phorum
Any hints?
Regards, Ulf Dunkel
April 26, 2008 10:59AM |
Registered: 21 years ago Posts: 146 |
I have researched further. Check this test message:
Test Message for German Umlauts in UTF-8
The umlauts are shown well in the browser:
äöü ß àâá èêé ìîí òôó ùôú ÄÖÜ ß ÀÂÁ ÈÊÉ ÌÎÍ ÒÔÓ ÙÛÚ
In the MySQL database, they are shown like this instead (phpMyAdmin shows UTF-8 encoding!):
äöü ß à âá èêé ìîà òôó ùôú ÄÖÜ ß ÀÂà ÈÊÉ ÌÎà ÒÔÓ ÙÛÚ
In a pure ANSI editor, they are shown like this:
äöü ß àâá èêé ìîàòôó ùôú\r\nÄÖÜ ß ÀÂàÈÊÉ ÌÎàÒÃâ€Ãƒâ€œ ÙÛÚ
I wonder if they are really stored and read out in UTF-8 or rather in UTF-16, which would explain why they look so strange in phpMyAdmin. And it would explain why my UTF-8 encoded messages are shown wrong in Phorum.
Can someone please check this?
Regards, Ulf Dunkel
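A sequence like "Ã¤" in place of "ä" is what UTF-8 text looks like after it has been encoded to UTF-8 a second time. That is one possible explanation for the output above, not a confirmed diagnosis for this installation. A minimal sketch, assuming the mbstring extension is available, that reproduces the pattern and shows that it is reversible in the simple case:

```php
<?php
$utf8 = "\xC3\xA4";   // "ä" correctly encoded as UTF-8 (2 bytes)

// Misread those two bytes as Latin-1 and convert them to UTF-8 again:
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $double, "\n";            // displays as "Ã¤" in a UTF-8 terminal
echo bin2hex($double), "\n";   // c383c2a4

// Reversing the extra step restores the original bytes.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8); // bool(true)
```

Double encoding of this kind typically appears when a client sends UTF-8 over a connection that MySQL treats as latin1, which is the situation the SET NAMES utf8 fix earlier in this thread is meant to avoid.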
Re: How to switch the Phorum database to support Unicode? April 26, 2008 12:17PM |
Admin Registered: 20 years ago Posts: 9,240 |
April 26, 2008 01:33PM |
Registered: 21 years ago Posts: 146 |
April 26, 2008 07:28PM |
Admin Registered: 22 years ago Posts: 4,495 |
Quote
Ulf Dunkel
I wondered if Phorum 5.2.7 really required PHP5 but it seems to work almost fine with PHP4.
The main thing is that we don't test it with PHP 4. So, there are no guarantees what it will do.
Brian - Cowboy Ninja Coder - Personal Blog - Twitter
April 27, 2008 04:48AM |
Registered: 21 years ago Posts: 146 |
Quote
mmakaay
If you do get problems with serialized data, then check out the conversion script that I am working on. I already used it successfully on my own database test conversion. If you have some weird problems, then this might save you: convert_to_utf8.php.
Hi Maurice.
Am I right that the "Forum description" field isn't converted by your script yet?
I stumbled over the Latvian "ā" when I updated my forums - see them right here: iCalamus Forums
The "ā" in the Latvian forum is shown as "?" whatever I do. Unfortunately I cannot access my phpMyAdmin this weekend.
Regards, Ulf Dunkel
April 27, 2008 11:34AM |
Admin Registered: 18 years ago Posts: 8,532 |
"Forum description" is not a serialized data field. That one should be taken care of by converting the table itself to UTF-8.
Maurice Makaay
Phorum Development Team



October 30, 2017 11:40AM |
Registered: 17 years ago Posts: 131 |
Quote
Maurice Makaay
If you do get problems with serialized data, then check out the conversion script that I am working on. I already used it successfully on my own database test conversion. If you have some weird problems, then this might save you: convert_to_utf8.php.
After I migrated the forum to UTF-8, the subject of a message is lost if it contains accented characters.
How can I correct this?
Cactus : [www.cactuspro.com]
Re: How to switch the Phorum database to support Unicode? August 06, 2018 11:00AM |
Registered: 4 years ago Posts: 1 |
Thanks for the script!
Quote
Ulf Dunkel
Run the script ;-)
Is it possible that this script requires Phorum 5.2.7? It won't run with 5.1.25:
Fatal error: Call to undefined function: phorum_db_interact() in /.../phorum/convert_to_utf8.php
Any hints?
As a simple solution, you could copy mysql.php and the mysql folder from 5.2 into your installation, run the script, and then restore the old mysql.php. It worked for me, so I could convert to UTF-8 without having to deal with dozens of issues from an upgrade to 5.2.
Also, you may want to comment out the block that converts the tables, since it's better to do that manually from the mysql console (it can also take a lot of time, ~5-10 min for my small Phorum instance):
    SET @DATABASE_NAME = 'phorum';

    SELECT CONCAT(
             'ALTER TABLE `', t.`TABLE_SCHEMA`, '`.`', t.`TABLE_NAME`,
             '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
           ) AS sqlcode
    FROM `information_schema`.`TABLES` t
    WHERE t.`TABLE_COLLATION` = 'cp1251_general_ci'
      AND t.`TABLE_SCHEMA` = @DATABASE_NAME
    ORDER BY 1
    LIMIT 0, 999;

    # then execute all the queries which will be printed by this query