Alternate search backend with sphinx fulltext search
Posted by Thomas Seifert
Re: Alternate search backend with sphinx fulltext search | September 13, 2006 05:06AM | Registered: 18 years ago, Posts: 117
September 13, 2006 05:18AM | Admin | Registered: 19 years ago, Posts: 8,532
I tried it on my installation and I can search for special characters using ISO-8859-1. The only problem is that the excerpts code does not yet support single-byte character sets (SBCS), so the special characters are not shown there at all.
If the search doesn't turn up anything on your system, then maybe the data that the indexer collects from MySQL is in UTF-8, which would confuse the indexing, since it expects single-byte characters. This is just a hunch, and I don't have a solution for you if that turns out to be the case.
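If you want to verify that, one thing you could try (just a possible check, not something the module does) is to see which character set the MySQL connection used by the indexer reports:

    -- run this with the same account/connection settings the Sphinx indexer uses
    SHOW VARIABLES LIKE 'character_set%';

If character_set_results comes back as utf8 while the index expects latin1, that mismatch would explain it.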
About the path and index names: the pathnames aren't really important, and things will work without _msg too, so you can use anything you like there. The double underscore in the module is my fault. I use different index names myself and had to change the index names back in the module before packaging; that's where the typo crept in. I'll fix this in the zipfile right away.
Edit: while fixing the wrong index name, I also changed the path in the example config to match the message index name.
Maurice Makaay
Phorum Development Team
Edited 1 time(s). Last edit at 09/13/2006 05:21AM by mmakaay.
Re: Alternate search backend with sphinx fulltext search | September 14, 2006 04:30AM | Registered: 18 years ago, Posts: 117
September 14, 2006 05:16AM | Admin | Registered: 19 years ago, Posts: 8,532
Sorry, I can't help you with solving that problem. Maybe you could check the Sphinx website and post a question in its forums if you can't find a solution there.
Looking at the forums, I found this one. Maybe you could use
sql_query_pre = SET CHARACTER_SET_RESULTS = latin1
in Sphinx's config, to make sure that the data retrieved from MySQL is latin-1 and not UTF-8. If your MySQL server is currently returning the data to Sphinx as UTF-8, this just might work.
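For illustration, here is a rough sketch of where that directive would sit in a Sphinx source section (the source name, credentials, table and column names are only examples, not the module's shipped config):

    source phorum5_msg
    {
        type          = mysql
        sql_host      = localhost
        sql_user      = phorum
        sql_pass      = secret
        sql_db        = phorum
        # force latin-1 results before the main indexing query runs
        sql_query_pre = SET CHARACTER_SET_RESULTS = latin1
        sql_query     = SELECT message_id, author, subject, body FROM phorum_messages
    }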
Maurice Makaay
Phorum Development Team
Edited 1 time(s). Last edit at 09/14/2006 05:20AM by mmakaay.
Re: Alternate search backend with sphinx fulltext search | September 27, 2006 10:03AM | Registered: 18 years ago, Posts: 117
Re: Alternate search backend with sphinx fulltext search | November 29, 2006 10:36AM | Registered: 16 years ago, Posts: 1
sheik,
I'm willing to work towards Chinese support in Sphinx, provided that somebody volunteers to help me with beta-testing. Unfortunately, I don't speak Chinese myself.
Andrew Aksyonoff,
[sphinxsearch.com]
November 30, 2006 12:31PM | Registered: 20 years ago, Posts: 687
Hi Andrew,
As I'm sure you know, searching for Chinese is really a problem of searching without any delimiters between words.
The easy solution is to revert to brute force pattern matching when Chinese is entered as a search term.
I have just coded a rudimentary solution for Phorum here: [www.phorum.org]
(Note: Phorum already supports wildcard text searching, so all I had to do was detect Chinese and temporarily disable full-text searching.)
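For anyone wondering what the detection boils down to, here is a minimal sketch (a hypothetical illustration, not the code posted above; the variable names are made up):

    <?php
    // Detect CJK ideographs in a UTF-8 search string.
    // U+4E00..U+9FFF is the main CJK Unified Ideographs block; extend the
    // ranges if you also want to catch kana, CJK punctuation, and so on.
    function contains_cjk($text)
    {
        return preg_match('/[\x{4e00}-\x{9fff}]/u', $text) === 1;
    }

    $search_terms = "中文搜索";      // example input
    if (contains_cjk($search_terms)) {
        $use_fulltext = false;       // fall back to the wildcard/LIKE search path
    }
    ?>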
You are more than welcome to use my code, provided you keep the existing acknowledgements attached (as I already used code kindly sent to me by a user of my site).
Best Regards,
/\dam
--
My notable Phorum sites:
Movie Deaths Database - "review comments" system mostly powered by Phorum
Learn Chinese! - integrated forum quiz
Re: Alternate search backend with sphinx fulltext search | November 30, 2006 01:41PM | Admin | Registered: 20 years ago, Posts: 9,240
November 30, 2006 01:52PM | Registered: 20 years ago, Posts: 687
It's incredibly complex to get full text matching in Chinese to work :-(
You need a huge lookup table with tens of thousands of compound words (hundreds of thousands if you want to be able to search for names, places, slang, and so on).
Then throw in the wonderful complication of simplified vs. traditional characters, plus the "common" shorthand that continually creeps into the language, and the problem becomes even more fun.
In summary, spaces between words are actually pretty cool from a programming viewpoint; we're lucky to have them! ;-)
/\dam
--
My notable Phorum sites:
Movie Deaths Database - "review comments" system mostly powered by Phorum
Learn Chinese! - integrated forum quiz
Re: Alternate search backend with sphinx fulltext search | December 03, 2006 08:44AM | Admin | Registered: 20 years ago, Posts: 9,240
I uploaded 0.9.1, which works ONLY with Sphinx 0.9.7.
I also added the use of delta indexes to lower the load of full reindexing.
I'm now using the sphinx_search module in production on my forums; a full reindex is done once a day and the delta indexes are updated every half hour.
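To give an idea of what such a delta index involves, here is a rough sketch of a delta source definition (the names, columns and counter table are only examples, not the packaged config). It indexes only the messages added since the last full run, so it stays small and cheap to rebuild; the counter table would have to be filled by a sql_query_pre in the main source whenever the full reindex runs.

    source phorum5_msg_d
    {
        type      = mysql
        sql_host  = localhost
        sql_user  = phorum
        sql_pass  = secret
        sql_db    = phorum
        # only pick up messages newer than the id recorded at the last full reindex
        sql_query = SELECT message_id, author, subject, body FROM phorum_messages WHERE message_id > (SELECT max_id FROM sph_counter WHERE counter_id = 1)
    }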
Reindexing is done with the following command:
/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx_phorum.conf --rotate --all
Delta indexes are updated with:
/usr/local/sphinx/bin/indexer --quiet --config /usr/local/sphinx/etc/sphinx_phorum.conf --rotate phorum5_msg_d phorum5_author_delta
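That schedule could be set up with cron along these lines (the exact times are just an example):

    # full reindex once a day
    30 3 * * *    /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx_phorum.conf --rotate --all
    # delta indexes every half hour
    */30 * * * *  /usr/local/sphinx/bin/indexer --quiet --config /usr/local/sphinx/etc/sphinx_phorum.conf --rotate phorum5_msg_d phorum5_author_delta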
That currently gives these two versions:
- 0.9 is written for and works with Sphinx 0.9.6
- 0.9.1 is written for and works with Sphinx 0.9.7 and delta indexes (it works without them too)
Thomas Seifert