Alternate search backend with sphinx fulltext search
Posted by Thomas Seifert
All files from this thread
File Name | File Size | Posted by | Date | ||
---|---|---|---|---|---|
sphinx_search-0.9.zip | 8.2 KB | open | download | Thomas Seifert | 10/13/2006 | Read message |
sphinx_search-0.9.2.zip | 9.7 KB | open | download | Thomas Seifert | 04/24/2007 | Read message |
Alternate search backend with sphinx fulltext search July 13, 2006 08:16AM |
Admin Registered: 22 years ago Posts: 9,240 |
This is a new search-backend based on sphinx (http://sphinxsearch.com/).
It works with phorum-5.1.16 and later.
It completely bypasses the internal mysql-fulltext-search
You *won't* be able to run this in a shared hosting environment as you need to install sphinx on the server and run its indexer regularly.
Here is its README content with some more information:
Thomas Seifert
Edited 11 time(s). Last edit at 04/24/2007 07:04PM by ts77.
It works with phorum-5.1.16 and later.
It completely bypasses the internal mysql-fulltext-search
You *won't* be able to run this in a shared hosting environment as you need to install sphinx on the server and run its indexer regularly.
Here is its README content with some more information:
Module for using sphinx as fulltext search-engine sphinx can be downloaded and installed from [sphinxsearch.com] mod written by Thomas Seifert (thomas@phorum.org) v0.5 - Jul 13, 2006 - first implementation, without admin interface v0.7 - Sep 07, 2006 - received some more testing and hence ... - fix for date-order v0.8 - Sep 13, 2006 by Maurice Makaay (maurice@phorum.org) - little hack in SetGroups of the sphinxapi.php script to not deny group 0 (is_int(0) is false, but we can have 0 for announcements) - if no results are found, return empty results immediately (before, no results would trigger undefined index warnings) - changed example config in README to use settings inheritance v0.9 - Oct 13, 2006 by Maurice Makaay (maurice@phorum.org) - Return search control to the Phorum search script in case an unhandled type of search is encountered (e.g. the USER_ID search type for Phorum 5.2). v0.9.1 - Dec 02,2006 by Thomas Seifert (thomas@phorum.org) - Updated to sphinx 0.9.7, added delta indexes and a host-ip v0.9.2 - Apr 24,2007 by Thomas Seifert (thomas@phorum.org) - limited results to 1000 - some security for the forum_id works with phorum-versions >= 5.1.16 ----------------------------------------------------- To get this to work you will need to install sphinx on your server. This will not be possible in a shared hosting environment and no, there is no workaround for that. some caveats: - data is stored in both phorum_search-table (not needed here) and in the sphinx filesystem - this doubles the sizes needed for search even though the search-table is not needed. You can clear it out every now and then. - search-data is not updated in real-time - sphinx reindexes the data if the indexer is called for the index, which does a full reindexing of the sphinx index. There is a workaround available in sphinx [sphinxsearch.com] which works essentially with two indexes. A large one and smaller one with only the deltas to the full index and which can be reindexed therefore more often, but I don't use that method yet. Still, its not realtime. For me the full indexing of 350,000 messages took around 5 minutes. I'd run the full indexing around once a day with those times. Before you can run a search through sphinx you will need to setup sphinx correctly and do at least one full indexing of your phorum-data. Try the searching on the command-line using search from the sphinx install first. Also you will need to start searchd from sphinx so that the mod can connect to it for searching. The relevant parts from my sphinx.conf: You will need to change *AT LEAST* the database settings, i.e. username, password, database-name and maybe the table name for the messages table used. If you change the index names you will need to change them in the sphinx_search too. ATTENTION: The delta indexes require MySQL-4.1 or higher because of the use of subqueries and they need a separate table for holding the last indexed ID, its structure is the following: CREATE TABLE sph_counter ( counter_id int(11) unsigned NOT NULL default '0', `type` enum('author','message') NOT NULL default 'message', max_doc_id int(11) NOT NULL, PRIMARY KEY (counter_id,`type`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; ------------------------------------------------------------------------------------- source phorum5_base { # data source type # for now, known types are 'mysql' and 'xmlpipe' # MUST be defined type = mysql # whether to strip HTML strip_html = 0 # what HTML attributes to index if stripping HTML index_html_attrs = ################################################################# # some straightforward parameters for 'mysql' source type sql_host = localhost sql_user = username sql_pass = password sql_db = database sql_port = 3306 # optional, default is 3306 # optional, default is empty sql_group_column = forum_id sql_date_column = datestamp } index phorum5_base { # morphology # default is not to use any morphology = none # stopwords file stopwords = # minimum word length min_word_len = 1 # charset encoding type # known types are 'sbcs' (Single Byte CharSet) and 'utf-8' charset_type = sbcs path = /usr/local/sphinx/var/data/foo } source phorum5_msg : phorum5_base { sql_query_pre = REPLACE INTO sph_counter SELECT 1, 'message', MAX(message_id) FROM phorum_messages WHERE status=2 sql_query = \ SELECT message_id, forum_id, datestamp, author, subject, body \ FROM phorum_messages \ WHERE message_id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 and type = 'message' ) and status=2 } source phorum5_msg_delta : phorum5_base { # main document fetch query # you can specify any number of fields # # mandatory sql_query = \ SELECT message_id, forum_id, datestamp, author, subject, body \ FROM phorum_messages \ WHERE message_id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 and type = 'message' ) and status=2 } index phorum5_msg : phorum5_base { # which document source to index source = phorum5_msg # this is path and index file name without extension # files <indexpath>.spi/spd/spr will be created by indexer path = /usr/local/sphinx/var/data/phorum5_msg } index phorum5_msg_d : phorum5_base { # which document source to index source = phorum5_msg_delta # this is path and index file name without extension # files <indexpath>.spi/spd/spr will be created by indexer path = /usr/local/sphinx/var/data/phorum5_msg_d } source phorum5_author : phorum5_base { sql_query_pre = REPLACE INTO sph_counter SELECT 1, 'author', MAX(message_id) FROM phorum_messages WHERE status=2 sql_query = \ SELECT message_id, forum_id, datestamp, author \ FROM p50_000001_messages \ WHERE message_id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 and type = 'author' ) and status=2 } source phorum5_author_delta : phorum5_base { # main document fetch query # you can specify any number of fields # # mandatory sql_query = \ SELECT message_id, forum_id, datestamp, author \ FROM phorum_messages \ WHERE message_id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 and type = 'author' ) and status=2 } index phorum5_author : phorum5_base { source = phorum5_author path = /usr/local/sphinx/var/data/phorum5_author } index phorum5_author_delta : phorum5_base { source = phorum5_author_delta path = /usr/local/sphinx/var/data/phorum5_author_delta } --------------------------------------------------------------------------------------------- Keep in mind that I just started with Sphinx, there might be bugs, problems and probably easier solutions.
Thomas Seifert
Edited 11 time(s). Last edit at 04/24/2007 07:04PM by ts77.
Re: Alternate search backend with sphinx fulltext search September 06, 2006 03:17AM |
Registered: 19 years ago Posts: 117 |
Hello Thomas,
I spent all the night testing your mod and sphinx fulltext search. :)
It's running but i've still "create temporary table phorum_search_9344ef8fb4a8c..." process in mysql.
Sphinx should replace the old search system ?
Why we still have this mysql process when we make a search ?
I hope that sphinx search will help my server overload. :)
Thanks Thomas.
PS : I have search queries in query.log.
I spent all the night testing your mod and sphinx fulltext search. :)
It's running but i've still "create temporary table phorum_search_9344ef8fb4a8c..." process in mysql.
Sphinx should replace the old search system ?
Why we still have this mysql process when we make a search ?
I hope that sphinx search will help my server overload. :)
Thanks Thomas.
PS : I have search queries in query.log.
Re: Alternate search backend with sphinx fulltext search September 06, 2006 04:24AM |
Admin Registered: 22 years ago Posts: 9,240 |
Re: Alternate search backend with sphinx fulltext search September 06, 2006 04:29AM |
Registered: 19 years ago Posts: 117 |
Yes the mod is enabled.
I've just truncate phorum_search table.
Now i cannot make a search on message. It's answering me no results. It seems to work with the old search system.
But on author search it's running fine. It gives me many results but still using phorum_search tmp table.
Also i have the log of my queries on sphinx.
I've just truncate phorum_search table.
Now i cannot make a search on message. It's answering me no results. It seems to work with the old search system.
But on author search it's running fine. It gives me many results but still using phorum_search tmp table.
Also i have the log of my queries on sphinx.
Re: Alternate search backend with sphinx fulltext search September 06, 2006 04:32AM |
Admin Registered: 22 years ago Posts: 9,240 |
Re: Alternate search backend with sphinx fulltext search September 06, 2006 04:37AM |
Registered: 19 years ago Posts: 117 |
Yesssss !
It's running now.
I've fixed the problem.
The message search file wasn't named correctly.
You should change this line on sphinx.conf:
path = /usr/local/sphinx/var/data/phorum5_HEAD
by
path = /usr/local/sphinx/var/data/phorum5_HEAD_msg
Search with author still using tmp table. I'm looking why...
It's running now.
I've fixed the problem.
The message search file wasn't named correctly.
You should change this line on sphinx.conf:
path = /usr/local/sphinx/var/data/phorum5_HEAD
by
path = /usr/local/sphinx/var/data/phorum5_HEAD_msg
Search with author still using tmp table. I'm looking why...
Re: Alternate search backend with sphinx fulltext search September 06, 2006 06:55AM |
Admin Registered: 22 years ago Posts: 9,240 |
Re: Alternate search backend with sphinx fulltext search September 06, 2006 08:08AM |
Registered: 19 years ago Posts: 117 |
Re: Alternate search backend with sphinx fulltext search September 07, 2006 05:06AM |
Registered: 19 years ago Posts: 117 |
Re: Alternate search backend with sphinx fulltext search September 07, 2006 05:10AM |
Registered: 21 years ago Posts: 687 |
Thomas, does Sphinx still rely on spaces between words to perform the full text search?
In other words, do think it will still fail to index Chinese (blocks of characters with no spaces)
Thank you,
/\dam
--
My notable Phorum sites:
Movie Deaths Database - "review comments" system mostly powered by Phorum
Learn Chinese! - integrated forum quiz
In other words, do think it will still fail to index Chinese (blocks of characters with no spaces)
Thank you,
/\dam
--
My notable Phorum sites:
Movie Deaths Database - "review comments" system mostly powered by Phorum
Learn Chinese! - integrated forum quiz
Sorry, only registered users may post in this forum.