
not able to do search in chinese ?

Posted by sfinder 
not able to do search in chinese ?
April 06, 2006 05:30PM
I tested Phorum 5.1, and it looks like it doesn't support searching for Chinese characters. There are some Chinese posts on this forum, but the search function just returns a "no result found" message. Please try searching for 中: it appears in the post [www.phorum.org], but the search engine returns nothing.

Let me know if this forum only works for Latin text.

Thanks



Edited 2 time(s). Last edit at 04/07/2006 10:30AM by sfinder.
Re: not able to do search in chinese ?
April 07, 2006 07:19AM
You can't search for single characters, only for whole WORDS.
For example, I searched for the string
(好主意!) from the same post you quoted ...
[www.phorum.org]


Thomas Seifert
Re: not able to do search in chinese ?
April 07, 2006 12:12PM
Could you please tell me how you set up this support forum? I can search for 2 Chinese words on this forum, but I can't do the same search on my own forum, which I just downloaded from your website.
1) Does this have anything to do with the DEFAULT CHARSET setting in the database? I use the installation default, "latin1".
2) Could you please share the language file used on this forum? I suspect the problem may be caused by settings in my language file.

The following shows the first few lines of my language file.
<?php

$language="Chinese (GB)";
// uncomment this to hide this language from the user-select-box
//$language_hide=1;

// check the php-docs for the syntax of these entries (http://www.php.net/manual/en/function.strftime.php)
// One tip, don't use T for showing the time zone as users can change their time zone.
$PHORUM['long_date']="%Y年%m月%d日%A%H时%M分%S秒";
$PHORUM['short_date']="%y年%m月%d日%H:%M";

// locale setting for localized times/dates
// see that page: [www.w3.org]
// for the needed string
$PHORUM['locale']='chinese';//"ZH";
// charset for use in converting html into safe valid text
// also used in the header template for the <xml> and for
// the Content-Type header. for a list of supported charsets, see
// [www.php.net]
// you may also need to set a meta tag with a charset in it.
$PHORUM["DATA"]['CHARSET']="GB2312";

// some languages need additional meta tags
// to set encoding, etc.
$PHORUM["DATA"]['LANG_META']='<meta http-equiv="content-type" content="text/html; charset=GB2312">';

// encoding set for outgoing mails
$PHORUM["DATA"]["MAILENCODING"]="8bit";
Re: not able to do search in chinese ?
April 07, 2006 12:21PM
This forum is running the default English language file, nothing else.


Thomas Seifert
Re: not able to do search in chinese ?
April 07, 2006 05:04PM
The search behaviour for Chinese characters is broken. It only works when searching for a whole Chinese sentence: the forum search tool treats the whole sentence as a single word, which makes searching in Chinese very difficult. It is not really full text searching. I guess someone has to look into the search code and find out how to perform Unicode searching.

Use the following phrase as an example:
语言不通
These 4 characters are in the following sentence:
但可能语言不通,

But if you search for 语言不通, no result is returned.
Re: not able to do search in chinese ?
April 07, 2006 06:35PM
As we found out in the support chat, Chinese sentences do not contain whitespace. Therefore Phorum's full text searching does not recognize the individual words in them. The proposed solution was to drop the full text searching algorithm and fall back to the slower search method by changing include/db/config.php (set use_ft to zero).
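
As a rough sketch of that change, assuming the setting lives in the $PHORUM["DBCONFIG"] array under the key mysql_use_ft (the key name used in a later post in this thread; the array below is a simplified stand-in, not the real config file):

```php
<?php
// Simplified stand-in for the array defined in include/db/config.php.
// Only "mysql_use_ft" matters here; "type" is a placeholder entry.
$PHORUM["DBCONFIG"] = array(
    "type"         => "mysql",
    // 1 = use MySQL full text search, 0 = fall back to the slower search
    "mysql_use_ft" => 0,
);
```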


Maurice Makaay
Phorum Development Team
my blog linkedin profile secret sauce
Re: not able to do search in chinese ?
April 12, 2006 06:17PM
It would also be possible to analyse the search string and only revert to use_ft=0 when it is Chinese.
Here is the code I use to detect Chinese if that is helpful.

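function isChineseCharacter($unicode){
	// Takes one character as input and returns true if it is a Chinese character.
	// Thanks to www.mdbg.net for this function.
	// Note: uniord() is not a PHP built-in. It has to be defined separately and
	// must return the Unicode code point of the character (PHP 7.2+ offers mb_ord()).
	$unicodeValue = uniord($unicode);
	// Ranges checked: U+3400..U+9FFF, U+F900..U+FAFF, U+20000..U+2FFFF
	return ($unicodeValue>=0x3400 && $unicodeValue<0xa000)
		|| ($unicodeValue>=0xf900 && $unicodeValue<0xfb00)
		|| ($unicodeValue>=0x20000 && $unicodeValue<0x30000);
}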
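
The function above relies on a uniord() helper that PHP does not provide. A minimal sketch of such a helper, assuming the input is a single, valid UTF-8 encoded character (this is an illustration, not the mdbg.net original):

```php
<?php
// Returns the Unicode code point of one UTF-8 encoded character.
// Assumes valid UTF-8; returns 0 for an empty string.
function uniord($char)
{
    if ($char === "") {
        return 0;
    }
    $b0 = ord($char[0]);
    if ($b0 < 0x80) {
        // 1-byte sequence (ASCII)
        return $b0;
    }
    if ($b0 < 0xE0) {
        // 2-byte sequence
        return (($b0 & 0x1F) << 6) | (ord($char[1]) & 0x3F);
    }
    if ($b0 < 0xF0) {
        // 3-byte sequence (most Chinese characters)
        return (($b0 & 0x0F) << 12)
            | ((ord($char[1]) & 0x3F) << 6)
            | (ord($char[2]) & 0x3F);
    }
    // 4-byte sequence (supplementary planes)
    return (($b0 & 0x07) << 18)
        | ((ord($char[1]) & 0x3F) << 12)
        | ((ord($char[2]) & 0x3F) << 6)
        | (ord($char[3]) & 0x3F);
}
```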

Phorum code usage : (note, this only checks first character, which seems reasonable but could be modified if necessary)

if (isChineseCharacter(mb_substr($search,0,1))){
    $use_ft=false; // query is Chinese, revert to old searching style
}
else{
    $use_ft=true; // query is not Chinese, we can use Full Text Search
}
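
If checking only the first character turns out to be too coarse (for example, a query that starts with a Latin word), one possible variation is to test the whole string. The sketch below assumes the search string is UTF-8 and uses PCRE's Unicode script property \p{Han}, which does not cover exactly the same ranges as the function above; the name containsChinese is made up for this example:

```php
<?php
// Hypothetical helper: true if any character in $search is a Han ideograph.
function containsChinese($search)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_match() returns false on invalid UTF-8, which yields false here.
    return preg_match('/\p{Han}/u', $search) === 1;
}

// Keep full text search only for queries without Chinese characters.
$use_ft = !containsChinese("hello world");
```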


/\dam

--
My notable Phorum sites:
Movie Deaths Database - "review comments" system mostly powered by Phorum
Learn Chinese! - integrated forum quiz
Re: not able to do search in chinese ?
April 12, 2006 06:21PM
Quote
mmakaay
As we found out in the support chat, Chinese sentences do not contain whitespace. Therefore Phorum does not recognize the words in it using full text searching.

Yes, isn't that so annoying?! ;-)
Parsing Chinese is incredibly laborious for the same reason - my current best attempt is here: CantoDict Parser

/\dam

Re: not able to do search in chinese ?
April 12, 2006 08:11PM
Thanks for the suggestion. It might help people get the most out of their search function on mixed Chinese/English forums.

BTW: I think that code would be perfect to turn into a mod.


Maurice Makaay
Phorum Development Team
Re: not able to do search in chinese ?
April 12, 2006 10:41PM
Yeah, a common hook mod could check phorum_page and $PHORUM["args"]["search"] and set the ft flag accordingly.

Brian - Cowboy Ninja Coder - Personal Blog - Twitter
Re: not able to do search in chinese ?
April 13, 2006 05:54AM
I'll be pretty much compelled to make such a module myself for my site when I go to 5.1, which I will of course share here.
The only problem is I'm not going to have time to do this for at least the next few weeks, and it could be longer :-(

/\dam

I figured out. don't use the full text search in the phorum_db_search function
April 24, 2006 02:19AM
My site: www.xlogit.com/bbs is running in Chinese and is fully searchable in Chinese (almost; there are sometimes unrecognizable characters).

--sorry, I didn't see the previous posts which have already mentioned this solution.



Edited 1 time(s). Last edit at 04/24/2006 12:11PM by xlogit.
Re: I figured out. don't use the full text search in the phorum_db_search function
November 30, 2006 11:49AM
I have (finally) written a module to disable fulltext searching for Chinese, but Brian's solution doesn't seem to be working for me.
This is my code:

if (phorum_page == "search"){
	if (phorum_isChineseCharacter(mb_substr($PHORUM["args"]["search"],0,1))){
		$PHORUM["DBCONFIG"]["mysql_use_ft"] = 0; // disable full text searching
		// debug message
		echo "<br/>use_ft changed to :  " . $PHORUM["DBCONFIG"]["mysql_use_ft"];
	}

}

Note the debug message, which gets output as "0", but the search method does not get changed.
Any ideas why this may be? Is it possible the flag is getting re-read from the config file after my module is run? I've tried using both the "common" and "search" hooks

Thanks,

/\dam

Re: I figured out. don't use the full text search in the phorum_db_search function
November 30, 2006 11:55AM
Try it with
global $PHORUM;
at the very start of the function.
Otherwise you only change it locally.


Thomas Seifert
Re: not able to do search in chinese ?
November 30, 2006 12:12PM
Great, thanks Thomas.
I was incorrectly using:
$PHORUM=$GLOBALS["PHORUM"];
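
The difference between the two can be seen in a small standalone sketch. The $PHORUM array here is a stand-in with a single setting and the function names are made up for the example; this is not Phorum's own code:

```php
<?php
// Stand-in for Phorum's global settings array.
$GLOBALS["PHORUM"] = array("DBCONFIG" => array("mysql_use_ft" => 1));

// Assignment copies the array, so the change is lost when the function returns.
function disableFtOnCopy()
{
    $PHORUM = $GLOBALS["PHORUM"];
    $PHORUM["DBCONFIG"]["mysql_use_ft"] = 0;
}

// "global" binds the local name to the global variable itself.
function disableFtOnGlobal()
{
    global $PHORUM;
    $PHORUM["DBCONFIG"]["mysql_use_ft"] = 0;
}

disableFtOnCopy();
$afterCopy = $GLOBALS["PHORUM"]["DBCONFIG"]["mysql_use_ft"];   // unchanged

disableFtOnGlobal();
$afterGlobal = $GLOBALS["PHORUM"]["DBCONFIG"]["mysql_use_ft"]; // updated
```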

The module seems to work using either the common or search hooks. Any reason to favour one over the other? I would have thought "search" would make more sense?

/\dam

Re: not able to do search in chinese ?
November 30, 2006 12:21PM
The latest version of my module may be found here: [www.phorum.org]

/\dam

Re: not able to do search in chinese ?
November 30, 2006 01:57PM
If you do not have other stuff in the common hook for this module, then using the search hook would be best practice. That way, the module won't be loaded at every request, but only when doing a search.

The common hook would be more appropriate in case there were a lot of places where the search function would get called. But that's not the case.


Maurice Makaay
Phorum Development Team
Re: not able to do search in chinese ?
November 30, 2006 02:11PM
Thanks Maurice,
The version I uploaded uses the "search" hook.

/\dam

Re: not able to do search in chinese ?
December 03, 2006 09:08AM
Thanks sheik for the mod.

I have a question: does the script only search for Chinese characters in Unicode? I am running a test forum in traditional Chinese (Big5) and the script does not seem to work well with it...
Re: not able to do search in chinese ?
December 04, 2006 12:21PM
It only works with Unicode, I think.
You could hack my script, though, to treat the query as "Chinese" whenever the first character isn't standard ASCII. Full text searching would then work for A-Z, a-z, 0-9 etc. but would be disabled for everything else.
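
A sketch of that hack, assuming a multi-byte encoding such as Big5, GB2312 or UTF-8 in which every non-ASCII character starts with a byte above 0x7F (the function name is made up for this example):

```php
<?php
// Hypothetical helper: true when the first byte is outside 7-bit ASCII.
function startsWithNonAscii($search)
{
    return $search !== "" && ord($search[0]) > 0x7F;
}

// Keep full text search only when the query starts with plain ASCII.
$use_ft = !startsWithNonAscii("phorum");
```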

As an aside, I used Big5 for my Chinese learning site for many years, but have never regretted changing to Unicode UTF-8. The advantages are huge:
- Users can post in simplified or traditional characters.
- Far better MySQL / PHP support.
- Much more accessible for modern web browsers.
- You can have posts in other languages, such as Japanese, Arabic etc.

Your users won't notice any difference if you change. Even if they use Big5 when they input, the browser should detect that the page is unicode so the correct data will end up in your database.
The only problem I had was in converting all the old posts to the new encoding. That was a pain - I had to write a custom script to do it :-(
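
The conversion script itself is not shown here. For a single string, the core step might look like the sketch below, assuming PHP's iconv extension is available and the stored text really is Big5. The function name is hypothetical, and a real migration would also have to loop over the database tables and deal with bytes that fail to convert:

```php
<?php
// Hypothetical helper: convert one Big5 encoded string to UTF-8.
function big5ToUtf8($text)
{
    // iconv() returns false if $text contains bytes that are not valid Big5.
    return iconv("BIG5", "UTF-8", $text);
}

// Big5 bytes for the two characters of "中文".
$converted = big5ToUtf8("\xA4\xA4\xA4\xE5");
```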

/\dam

Re: chinese search bug? for phorum-5.1.16a
December 25, 2006 10:05AM
Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\include\format_functions.php on line 49

Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\mods\bbcode\bbcode.php on line 73

Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\mods\html\html.php on line 10

Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\mods\smileys\smileys.php on line 35

Warning: Invalid argument supplied for foreach() in C:\xampplite\htdocs\phorum\include\format_functions.php on line 109

Warning: Cannot modify header information - headers already sent by (output started at C:\xampplite\htdocs\phorum\include\format_functions.php:49) in C:\xampplite\htdocs\phorum\cache\tpl-default-header-toplevel_stage2-0fb4d293d1d5e59a82defdabfa167fdf.php on line 5