Alternate search backend with sphinx fulltext search

Posted by Thomas Seifert 
Re: Alternate search backend with sphinx fulltext search
September 13, 2006 11:06AM
Thomas, I checked the charset and it is set to "sbcs".
But I can't search for French special characters.

Examples: Passé, été, août, ...
Re: Alternate search backend with sphinx fulltext search
September 13, 2006 11:18AM
I tried it on my installation and I can search for special characters using ISO-8859-1. The only problem is that the excerpts code does not yet support sbcs. Therefore, the special characters are not shown at all.

If the search doesn't turn up anything on your system, then maybe the data that the indexer collects from MySQL is in UTF-8, which would confuse the indexing, since it expects single-byte characters. This is just a hunch, and I have no solution for you if that is the case.
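
To illustrate the hunch above, here is a minimal Python sketch (not part of the module or of Sphinx) of what happens when UTF-8 bytes are read by something that assumes one byte per character. The sample word is taken from the examples earlier in the thread; everything else is an assumption for illustration only.

```python
# Illustration only: UTF-8 data seen through a single-byte (ISO-8859-1) lens.
word = "été"

utf8_bytes = word.encode("utf-8")         # each "é" takes two bytes in UTF-8
latin1_bytes = word.encode("iso-8859-1")  # one byte per character

# What a consumer expecting ISO-8859-1 "sees" when it is handed UTF-8 bytes.
misread = utf8_bytes.decode("iso-8859-1")

print(len(utf8_bytes), len(latin1_bytes))  # 5 3
print(misread)                             # Ã©tÃ©
```

If the index contained the misread form while the search term arrived as real ISO-8859-1, the two would not match, which would fit the symptom of getting no results.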

About the path and index names. The pathnames aren't really important. Things will work without _msg too. You can use anything you like there. The double underscore in the module is my fault. I use different index names and I had to change back the index names in the module before packaging. That's where the typo entered the arena. I'll fix this in the zipfile right away.

Edit: while fixing the wrong index name, I also changed the path in the example config to match the message index name.


Maurice Makaay
Phorum Development Team



Edited 1 time(s). Last edit at 09/13/2006 11:21AM by mmakaay.
Re: Alternate search backend with sphinx fulltext search
September 14, 2006 10:30AM
Hi Maurice,

In the database I have French characters like é, è, ê, etc.
I don't know how the indexer collects the data from MySQL.

In the conf file I put: sbcs
Re: Alternate search backend with sphinx fulltext search
September 14, 2006 11:16AM
Sorry, I can't help you with solving that problem. Maybe you could check out the Sphinx website and maybe post a question in the forums if you can't find a solution there.

Looking at the forums, I found this one. Maybe you could use
sql_query_pre = SET CHARACTER_SET_RESULTS=latin1
in Sphinx's config to make sure that the data retrieved from MySQL is Latin-1 and not UTF-8. If your MySQL server is currently returning the data to Sphinx as UTF-8, then this just might work.


Maurice Makaay



Edited 1 time(s). Last edit at 09/14/2006 11:20AM by mmakaay.
Re: Alternate search backend with sphinx fulltext search
September 27, 2006 04:03PM
Hi Maurice,

This solution doesn't work. :(
SET CHARACTER_SET_RESULTS latin1



Edited 1 time(s). Last edit at 09/27/2006 04:03PM by momo.
Re: Alternate search backend with sphinx fulltext search
November 29, 2006 04:36PM
sheik,

I'm willing to work towards Chinese support in Sphinx - provided that somebody would like to volunteer and help me with beta-testing. I don't speak Chinese myself unfortunately.

Andrew Aksyonoff,
[sphinxsearch.com]
Re: Alternate search backend with sphinx fulltext search
November 30, 2006 06:31PM
Hi Andrew,
As I'm sure you know, searching for Chinese is really a problem of searching without any delimiters between words.
The easy solution is to revert to brute force pattern matching when Chinese is entered as a search term.
I have just coded a rudimentary solution for Phorum here: [www.phorum.org]
(note, Phorum already supports wildcard text searching, so all I had to do was detect Chinese and temporarily disable Full Text searching).
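
A rough sketch of the detection step described above, in Python rather than Phorum's PHP. This is not the code linked in the post: the function names and the "wildcard"/"fulltext" labels are hypothetical, and the check only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def contains_cjk(term):
    """Return True if any character lies in the basic CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in term)


def choose_search_mode(term):
    # Hypothetical labels: fall back to pattern matching for Chinese input,
    # keep fulltext search for everything else.
    return "wildcard" if contains_cjk(term) else "fulltext"


print(choose_search_mode("été"))       # fulltext
print(choose_search_mode("学习中文"))  # wildcard
```

A term that mixes Latin and Chinese characters also takes the wildcard path here, since a single ideograph is enough to trigger the fallback.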

You are more than welcome to use my code, provided you keep the existing acknowledgements attached (as I already used code kindly sent to me by a user of my site).

Best Regards,

/\dam

--
My notable Phorum sites:
Movie Deaths Database - "review comments" system mostly powered by Phorum
Learn Chinese! - integrated forum quiz
Re: Alternate search backend with sphinx fulltext search
November 30, 2006 07:41PM
Andrew is the developer of Sphinx (sphinxsearch), which is a standalone fulltext indexing engine (the one used for this module). I don't think he's asking about disabling fulltext search, just for hints on how to get fulltext matching in Chinese to work.


Thomas Seifert
Phorum Development Team / Mysnip-Solutions.de
Custom Phorum and general software development
worry-free Phorum Hosting
Re: Alternate search backend with sphinx fulltext search
November 30, 2006 07:52PM
It's incredibly complex to get full text matching in Chinese to work :-(

You need a huge lookup table with tens of thousands of compound words (hundreds of thousands if you want to be able to search for names, places, slang, etc.).
Then throw in the wonderful complication of simplified vs. traditional characters, and the "common" shorthand that continually creeps into the language, and the problem becomes even more fun.
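
To make the lookup-table idea concrete, here is a toy Python sketch of one simple dictionary-based approach, greedy forward longest-match segmentation. It is only an illustration of the general idea, not something Sphinx or Phorum is known to use; the four-word lexicon and the function name are made up for the example.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward longest-match segmentation against a word list."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Tiny made-up lexicon; a real one would need the table sizes mentioned above.
lexicon = {"中国", "人民", "中国人", "银行"}
print(segment("中国人民银行", lexicon))  # ['中国人', '民', '银行']
```

The greedy match grabs 中国人 first and leaves 民 stranded, although 中国 / 人民 / 银行 is the intended reading. Even with every word present in the table, picking the right split is not trivial.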

In summary, spaces between words are actually pretty cool from a programming viewpoint, we're lucky to have them! ;-)

/\dam

Re: Alternate search backend with sphinx fulltext search
December 03, 2006 02:44PM
I uploaded version 0.9.1 of the module, which works ONLY with Sphinx 0.9.7.
I also added support for delta indexes to lower the load caused by full reindexing.
I'm using the sphinx_search module in production on my forums now. A full reindex is done once a day, and the delta indexes are updated every half hour.

Reindexing is done with the following command:
/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx_phorum.conf --rotate --all

Delta-Indexes are updated with:
/usr/local/sphinx/bin/indexer --quiet --config /usr/local/sphinx/etc/sphinx_phorum.conf --rotate phorum5_msg_d phorum5_author_delta


That currently gives these two versions of the module:
- 0.9 is written for and works with Sphinx 0.9.6
- 0.9.1 is written for and works with Sphinx 0.9.7 and delta indexes (it works without them too)
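
For anyone who wants to drive the two reindexing commands above from a script (for example from a scheduler), here is a small Python sketch. The paths, flags and index names are copied from the commands in this post; the function names are hypothetical, and how often each variant runs is left to whatever scheduler you use.

```python
import subprocess

# Paths and index names as given in the commands above.
INDEXER = "/usr/local/sphinx/bin/indexer"
CONFIG = "/usr/local/sphinx/etc/sphinx_phorum.conf"
DELTA_INDEXES = ["phorum5_msg_d", "phorum5_author_delta"]


def indexer_command(full):
    """Build the argument list for a full reindex or a delta-only update."""
    if full:
        return [INDEXER, "--config", CONFIG, "--rotate", "--all"]
    return [INDEXER, "--quiet", "--config", CONFIG, "--rotate"] + DELTA_INDEXES


def run_indexer(full):
    """Run the indexer and return its exit code (hypothetical helper)."""
    return subprocess.run(indexer_command(full), check=False).returncode
```

Calling run_indexer(True) once a day and run_indexer(False) every half hour would mirror the schedule described above.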


Thomas Seifert