not able to do search in chinese ?
Posted by sfinder
not able to do search in chinese ? April 06, 2006 05:30PM |
Registered: 18 years ago Posts: 4 |
I tested Phorum 5.1 and it looks like it doesn't support searching for Chinese characters. There are some Chinese posts on this forum, but the search function just returns a "no result found" message. Please try searching for ÖÐ . These 2 characters appear in the post [www.phorum.org], but the search engine returns nothing.
Let me know if this forum only works for Latin text.
Thanks
Edited 2 time(s). Last edit at 04/07/2006 10:30AM by sfinder.
Re: no able to do search in chinese ? April 07, 2006 07:19AM |
Admin Registered: 21 years ago Posts: 9,240 |
You can't search for single characters, only for whole WORDS.
For example, I searched for the string
（好主意！） from the same post you quoted ...
[www.phorum.org]
Thomas Seifert
Re: not able to do search in chinese ? April 07, 2006 12:12PM |
Registered: 18 years ago Posts: 4 |
Could you please tell me how you set up this support forum? I can do a 2-word Chinese search on this forum, but I am not able to do the same search on my own forum, which I just downloaded from your website.
1) Does this have anything to do with the DEFAULT CHARSET setting in the database? I use the installation default, "latin1".
2) Could you please share the language file used on this forum? I suspect the problem may be caused by settings in my language file.
The following shows the first few lines of my language file:
<?php
$language="Chinese (GB)";
// uncomment this to hide this language from the user-select-box
//$language_hide=1;
// check the php-docs for the syntax of these entries (http://www.php.net/manual/en/function.strftime.php)
// One tip, don't use T for showing the time zone as users can change their time zone.
$PHORUM['long_date']="%Y年%m月%d日%A%H时%M分%S秒";
$PHORUM['short_date']="%y年%m月%d日%H:%M";
// locale setting for localized times/dates
// see that page: [www.w3.org]
// for the needed string
$PHORUM['locale']='chinese';//"ZH";
// charset for use in converting html into safe valid text
// also used in the header template for the <xml> and for
// the Content-Type header. for a list of supported charsets, see
// [www.php.net]
// you may also need to set a meta tag with a charset in it.
$PHORUM["DATA"]['CHARSET']="GB2312";
// some languages need additional meta tags
// to set encoding, etc.
$PHORUM["DATA"]['LANG_META']='<meta http-equiv="content-type" content="text/html; charset=GB2312">';
// encoding set for outgoing mails
$PHORUM["DATA"]["MAILENCODING"]="8bit";
Re: not able to do search in chinese ? April 07, 2006 12:21PM |
Admin Registered: 21 years ago Posts: 9,240 |
Re: not able to do search in chinese ? April 07, 2006 05:04PM |
Registered: 18 years ago Posts: 4 |
The search behaviour for Chinese characters is broken. It only works when searching for a whole Chinese sentence. The forum search tool treats the whole Chinese sentence as one word, which makes searching in Chinese very difficult. It is not full-text searching. I guess someone has to look into the search code and find out how to perform Unicode searching.
Use the following phrase as an example:
语言不通
These 4 characters are in the following sentence:
但可能语言不通，
But when you search for 语言不通 , no result is returned.
Re: not able to do search in chinese ? April 07, 2006 06:35PM |
Admin Registered: 19 years ago Posts: 8,532 |
As we found out in the support chat, Chinese sentences do not contain whitespace. Therefore Phorum does not recognize the words in them when using full text searching. The proposed solution was to drop the full text searching algorithm and go for the slower search method by changing include/db/config.php (set use_ft to zero).
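To illustrate the whitespace point, here is a minimal sketch in plain PHP. It is not Phorum's or MySQL's actual search code: the function names are made up for this example, and it simply assumes that a word-based search splits text on whitespace and compares whole words.

```php
<?php
// Hypothetical helper: split text into "words" on whitespace only.
// This is an assumption for illustration; a real full text parser has more rules.
function whitespace_tokens($text)
{
    return preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Word-level match: the needle must equal one whole token.
function word_match($text, $needle)
{
    return in_array($needle, whitespace_tokens($text), true);
}

// Substring match: the needle may appear anywhere in the text.
function substring_match($text, $needle)
{
    return strpos($text, $needle) !== false;
}

$sentence = '但可能语言不通，'; // no whitespace, so it is a single token
$needle   = '语言不通';

var_dump(count(whitespace_tokens($sentence)));  // the whole sentence is one token
var_dump(word_match($sentence, $needle));       // word-based search misses it
var_dump(substring_match($sentence, $needle));  // substring search finds it
```

The substring approach is what makes the slower, non-full-text search usable for Chinese, at the cost of speed.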
Maurice Makaay
Phorum Development Team
my blog linkedin profile secret sauce
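For reference, a sketch of what that change might look like. The key name mysql_use_ft is taken from the module code posted later in this thread; everything else about the file layout is an assumption, so compare it with your own include/db/config.php and leave your other settings untouched.

```php
<?php
// Assumed fragment of include/db/config.php. Only this one setting is relevant here;
// in your file it may appear as an entry inside the $PHORUM["DBCONFIG"] array instead.
$PHORUM["DBCONFIG"]["mysql_use_ft"] = 0; // 0 = do not use full text search, fall back to the slower method
```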
Re: not able to do search in chinese ? April 12, 2006 06:17PM |
Registered: 20 years ago Posts: 687 |
It would also be possible to analyse the search string and only revert to use_ft=0 when it is Chinese.
Here is the code I use to detect Chinese if that is helpful.
function isChineseCharacter($unicode)
{
    // takes one character for input
    // thanks to www.mdbg.net for this function
    // note: uniord() is not a PHP built-in; it has to be defined elsewhere
    $unicodeValue = uniord($unicode);
    $charIsChinese = false;
    if (($unicodeValue >= 0x3400 && $unicodeValue < 0xa000)
        || ($unicodeValue >= 0xf900 && $unicodeValue < 0xfb00)
        || ($unicodeValue >= 0x20000 && $unicodeValue < 0x30000)) {
        $charIsChinese = true;
    }
    return $charIsChinese;
}
Phorum code usage : (note, this only checks first character, which seems reasonable but could be modified if necessary)
if (isChineseCharacter(mb_substr($search, 0, 1))) {
    $use_ft = false; // query is Chinese, revert to old searching style
} else {
    $use_ft = true; // query is not Chinese, we can use Full Text Search
}
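The uniord() helper called above is not shown in this thread. The following is only a guess at what such a helper could look like, assuming the input is UTF-8 and the mbstring extension is available; the original implementation may well differ.

```php
<?php
// Hypothetical stand-in for the uniord() helper that the post relies on.
// Returns the Unicode code point of the first character of a UTF-8 string,
// or 0 for an empty string.
function uniord($char)
{
    $first = mb_substr($char, 0, 1, 'UTF-8');
    if ($first === '' || $first === false) {
        return 0;
    }
    // Convert to 4-byte big-endian UCS-4 and read it as an unsigned 32-bit integer.
    $ucs4  = mb_convert_encoding($first, 'UCS-4BE', 'UTF-8');
    $parts = unpack('N', $ucs4);
    return $parts[1];
}

printf("%X\n", uniord("\xE4\xB8\xAD")); // U+4E2D, inside the 0x3400..0xA000 range tested above
printf("%X\n", uniord("a"));            // U+0061, outside all three ranges
```

Passing the encoding explicitly avoids depending on mbstring's internal encoding setting, which the mb_substr() call in the usage snippet does rely on.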
/\dam
--
My notable Phorum sites:
Movie Deaths Database - "review comments" system mostly powered by Phorum
Learn Chinese! - integrated forum quiz
Re: not able to do search in chinese ? April 12, 2006 06:21PM |
Registered: 20 years ago Posts: 687 |
Quote
mmakaay
As we found out in the support chat, Chinese sentences do not contain whitespace. Therefore Phorum does not recognize the words in it using full text searching.
Yes, isn't that so annoying?! ;-)
Parsing Chinese is incredibly laborious for the same reason - my current best attempt is here: CantoDict Parser
/\dam
Re: not able to do search in chinese ? April 12, 2006 08:11PM |
Admin Registered: 19 years ago Posts: 8,532 |
Thanks for the suggestion. It might help people get the most out of their search function on mixed Chinese/English forums.
BTW: I think that code would be perfect to turn into a mod.
Maurice Makaay
Phorum Development Team
my blog linkedin profile secret sauce
Re: not able to do search in chinese ? April 12, 2006 10:41PM |
Admin Registered: 23 years ago Posts: 4,495 |
Yeah, a common hook mod could check phorum_page and $PHORUM["args"]["search"] and set the ft flag accordingly.
Brian - Cowboy Ninja Coder - Personal Blog - Twitter
Re: not able to do search in chinese ? April 13, 2006 05:54AM |
Registered: 20 years ago Posts: 687 |
I'll be pretty much compelled to make such a module myself for my site when I go to 5.1, which I will of course share here.
The only problem is I'm not going to have time to do this for at least the next few weeks, and it could be longer :-(
/\dam
I figured out. don't use the full text search in the phorum_db_search function April 24, 2006 02:19AM |
Registered: 18 years ago Posts: 3 |
Re: I figured out. don't use the full text search in the phorum_db_search function November 30, 2006 11:49AM |
Registered: 20 years ago Posts: 687 |
I have (finally) written a module to disable fulltext searching for Chinese, but Brian's solution doesn't seem to be working for me.
This is my code:
if (phorum_page == "search") {
    if (phorum_isChineseCharacter(mb_substr($PHORUM["args"]["search"], 0, 1))) {
        $PHORUM["DBCONFIG"]["mysql_use_ft"] = 0; // disable full text searching
        // debug message
        echo "<br/>use_ft changed to : " . $PHORUM["DBCONFIG"]["mysql_use_ft"];
    }
}
Note the debug message, which gets output as "0", but the search method does not get changed.
Any ideas why this may be? Is it possible the flag is getting re-read from the config file after my module is run? I've tried using both the "common" and "search" hooks.
Thanks,
/\dam
Re: I figured out. don't use the full text search in the phorum_db_search function November 30, 2006 11:55AM |
Admin Registered: 21 years ago Posts: 9,240 |
Re: not able to do search in chinese ? November 30, 2006 12:12PM |
Registered: 20 years ago Posts: 687 |
Great, thanks Thomas.
I was incorrectly using:
$PHORUM=$GLOBALS["PHORUM"];
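The corrected line is not shown in this thread, so the following is only a sketch of the likely cause: assigning $GLOBALS["PHORUM"] to a local variable copies the array, so changes to the copy never reach the global settings. That would explain a debug message printing 0 while the search method stays the same. The function names are invented for the example, and using `global $PHORUM;` is an assumed fix, not a confirmed one.

```php
<?php
// Stand-in for Phorum's global settings array (illustrative values only).
$GLOBALS['PHORUM'] = array('DBCONFIG' => array('mysql_use_ft' => 1));

// Mirrors the incorrect line: $PHORUM here is a local COPY of the global array.
function disable_ft_on_copy()
{
    $PHORUM = $GLOBALS['PHORUM'];
    $PHORUM['DBCONFIG']['mysql_use_ft'] = 0;
    return $PHORUM['DBCONFIG']['mysql_use_ft']; // 0, but only in the local copy
}

// Assumed fix: bind the local name to the global variable itself.
function disable_ft_on_global()
{
    global $PHORUM;
    $PHORUM['DBCONFIG']['mysql_use_ft'] = 0;
    return $PHORUM['DBCONFIG']['mysql_use_ft'];
}

var_dump(disable_ft_on_copy());                          // local value after the change
var_dump($GLOBALS['PHORUM']['DBCONFIG']['mysql_use_ft']); // global value is untouched
var_dump(disable_ft_on_global());                        // local name now refers to the global
var_dump($GLOBALS['PHORUM']['DBCONFIG']['mysql_use_ft']); // global value has changed
```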
The module seems to work using either the common or search hooks. Any reason to favour one over the other? I would have thought "search" would make more sense?
/\dam
Re: not able to do search in chinese ? November 30, 2006 12:21PM |
Registered: 20 years ago Posts: 687 |
The latest version of my module may be found here: [www.phorum.org]
/\dam
Re: not able to do search in chinese ? November 30, 2006 01:57PM |
Admin Registered: 19 years ago Posts: 8,532 |
If you do not have other stuff in the common hook for this module, then using the search hook would be best practice. That way, the module won't be loaded at every request, but only when doing a search.
The common hook would be more appropriate in case there were a lot of places where the search function would get called. But that's not the case.
Maurice Makaay
Phorum Development Team
my blog linkedin profile secret sauce
Re: not able to do search in chinese ? November 30, 2006 02:11PM |
Registered: 20 years ago Posts: 687 |
Thanks Maurice,
The version I uploaded uses the "search" hook.
/\dam
Re: not able to do search in chinese ? December 03, 2006 09:08AM |
Registered: 17 years ago Posts: 1 |
Re: not able to do search in chinese ? December 04, 2006 12:21PM |
Registered: 20 years ago Posts: 687 |
It only works with Unicode, I think.
You could hack my script, though, so that it treats the query as "Chinese" whenever the first character isn't standard ASCII. This would mean Full Text searching would work for A-Z, a-z, 0-9 etc. but would be disabled for everything else.
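A rough sketch of that hack, for what it's worth. The function names are made up here, and it only looks at the first byte of the query, which is above 0x7F for the lead byte of any Big5, GB2312 or UTF-8 multi-byte character.

```php
<?php
// Hypothetical check: does the search string start with a byte outside 7-bit ASCII?
function starts_with_non_ascii($search)
{
    if ($search === '') {
        return false;
    }
    return ord($search[0]) > 0x7F;
}

// Returns the value to use for the full text flag: 0 for non-ASCII queries, 1 otherwise.
function choose_use_ft($search)
{
    return starts_with_non_ascii($search) ? 0 : 1;
}

var_dump(choose_use_ft('phorum'));   // plain ASCII query keeps full text search
var_dump(choose_use_ft("\xA4\xA4")); // Big5 bytes: full text search is switched off
```

Note that this also disables Full Text searching for accented Latin characters, which may or may not matter on a given forum.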
As an aside, I used Big5 for my Chinese learning site for many years, but have never regretted changing to Unicode (UTF-8). The advantages are huge:
- Users can post in simplified or traditional characters.
- Far better MySQL / PHP support.
- Much more accessible for modern web browsers.
- You can have posts in other languages, such as Japanese, Arabic etc.
Your users won't notice any difference if you change. Even if they use Big5 when they input, the browser should detect that the page is unicode so the correct data will end up in your database.
The only problem I had was in converting all the old posts to the new encoding. That was a pain - I had to write a custom script to do it :-(
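The custom conversion script is not shown here. As a very rough idea of the core step only, assuming the mbstring extension is available, converting one Big5 string to UTF-8 could look like this. The function name is invented, and a real migration would also have to loop over the message tables and deal with posts that are already UTF-8 or contain invalid bytes.

```php
<?php
// Hypothetical helper: convert a single Big5-encoded string to UTF-8.
function big5_to_utf8($text)
{
    return mb_convert_encoding($text, 'UTF-8', 'BIG-5');
}

$big5 = "\xA4\xA4\xA4\xE5";     // the two characters 中文 as Big5 bytes
$utf8 = big5_to_utf8($big5);
echo bin2hex($utf8), "\n";      // hex dump of the UTF-8 result
```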
/\dam
Re: chinese search bug? for phorum-5.1.16a December 25, 2006 10:05AM |
Registered: 17 years ago Posts: 1 |
Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\include\format_functions.php on line 49
Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\mods\bbcode\bbcode.php on line 73
Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\mods\html\html.php on line 10
Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\mods\smileys\smileys.php on line 35
Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\include\format_functions.php on line 109
Warning: Cannot modify header information - headers already sent by (output started at C:\xampplite\htdocs\phorum\include\format_functions.php:49) in C:\xampplite\htdocs\phorum\cache\tpl-default-header-toplevel_stage2-0fb4d293d1d5e59a82defdabfa167fdf.php on line 5