Firefox PHP

Alternate search backend with sphinx fulltext search

Posted by Thomas Seifert 
All files from this thread

File Name File Size   Posted by Date  
sphinx_search-0.9.zip 8.2 KB open | download Thomas Seifert 10/13/2006 Read message
sphinx_search-0.9.2.zip 9.7 KB open | download Thomas Seifert 04/24/2007 Read message
Alternate search backend with sphinx fulltext search
July 13, 2006 08:16AM
This is a new search-backend based on sphinx (http://sphinxsearch.com/).
It works with phorum-5.1.16 and later.

It completely bypasses the internal mysql-fulltext-search
You *won't* be able to run this in a shared hosting environment as you need to install sphinx on the server and run its indexer regularly.

Here is its README content with some more information:
Module for using sphinx as fulltext search-engine
sphinx can be downloaded and installed from
[sphinxsearch.com]

mod written by Thomas Seifert (thomas@phorum.org)

v0.5 - Jul 13, 2006
     - first implementation, without admin interface
v0.7 - Sep 07, 2006
     - received some more testing and hence ...
     - fix for date-order
v0.8 - Sep 13, 2006 by Maurice Makaay (maurice@phorum.org)
     - little hack in SetGroups of the sphinxapi.php script to not deny
       group 0 (is_int(0) is false, but we can have 0 for announcements)
     - if no results are found, return empty results immediately (before,
       no results would trigger undefined index warnings)
     - changed example config in README to use settings inheritance
v0.9 - Oct 13, 2006 by Maurice Makaay (maurice@phorum.org)
     - Return search control to the Phorum search script in case an
       unhandled type of search is encountered (e.g. the USER_ID search
       type for Phorum 5.2).
v0.9.1 - Dec 02,2006 by Thomas Seifert (thomas@phorum.org)
       - Updated to sphinx 0.9.7, added delta indexes and a host-ip
	
v0.9.2 - Apr 24,2007 by Thomas Seifert (thomas@phorum.org)
       - limited results to 1000
       - some security for the forum_id

works with phorum-versions >= 5.1.16
-----------------------------------------------------

To get this to work you will need to install sphinx on your server.
This will not be possible in a shared hosting environment and no, there
is no workaround for that.

some caveats:
  - data is stored in both phorum_search-table (not needed here) and in the
    sphinx filesystem - this doubles the sizes needed for search even though
    the search-table is not needed. You can clear it out every now and then.
  - search-data is not updated in real-time - sphinx reindexes the data if the indexer
    is called for the index, which does a full reindexing of the sphinx index.
    There is a workaround available in sphinx [sphinxsearch.com]
    which works essentially with two indexes. A large one and smaller one with only the deltas
    to the full index and which can be reindexed therefore more often, but I don't use that method yet.
    Still, its not realtime.
    For me the full indexing of 350,000 messages took around 5 minutes.
    I'd run the full indexing around once a day with those times.


Before you can run a search through sphinx you will need to setup sphinx correctly and do at least one full
indexing of your phorum-data.
Try the searching on the command-line using search from the sphinx install first.
Also you will need to start searchd from sphinx so that the mod can connect to it for searching.

The relevant parts from my sphinx.conf:

You will need to change *AT LEAST* the database settings, i.e. username, password, database-name and maybe
the table name for the messages table used.
If you change the index names you will need to change them in the sphinx_search too.

ATTENTION: 

The delta indexes require MySQL-4.1 or higher because of the use of subqueries and they need a separate table 
for holding the last indexed ID, its structure is the following:

CREATE TABLE sph_counter (
  counter_id int(11) unsigned NOT NULL default '0',
  `type` enum('author','message') NOT NULL default 'message',
  max_doc_id int(11) NOT NULL,
  PRIMARY KEY  (counter_id,`type`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;



-------------------------------------------------------------------------------------
source phorum5_base
{
        # data source type
        # for now, known types are 'mysql' and 'xmlpipe'
        # MUST be defined
        type                            = mysql

        # whether to strip HTML
        strip_html                      = 0

        # what HTML attributes to index if stripping HTML
        index_html_attrs        =

        #################################################################

        # some straightforward parameters for 'mysql' source type
        sql_host                = localhost
        sql_user                = username
        sql_pass                = password
        sql_db                  = database
        sql_port                = 3306    # optional, default is 3306


        # optional, default is empty
        sql_group_column        = forum_id
        sql_date_column         = datestamp


}

index phorum5_base
{

        # morphology
        # default is not to use any
        morphology              = none

        # stopwords file
        stopwords               =

        # minimum word length
        min_word_len            = 1

        # charset encoding type
        # known types are 'sbcs' (Single Byte CharSet) and 'utf-8'
        charset_type            = sbcs

        path                    = /usr/local/sphinx/var/data/foo

}

source phorum5_msg : phorum5_base
{
        sql_query_pre           = REPLACE INTO sph_counter SELECT 1, 'message', MAX(message_id) FROM phorum_messages WHERE status=2
        sql_query               = \
                SELECT message_id, forum_id, datestamp, author, subject, body \
                FROM phorum_messages \
                WHERE message_id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 and type = 'message' ) and status=2

}
source phorum5_msg_delta : phorum5_base
{

        # main document fetch query
        # you can specify any number of fields
        #
        # mandatory
        sql_query               = \
                SELECT message_id, forum_id, datestamp, author, subject, body \
                FROM phorum_messages \
                WHERE message_id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 and type = 'message' ) and status=2

}

index phorum5_msg : phorum5_base
{
        # which document source to index
        source                  = phorum5_msg

        # this is path and index file name without extension
        # files <indexpath>.spi/spd/spr will be created by indexer
        path                    = /usr/local/sphinx/var/data/phorum5_msg

}

index phorum5_msg_d : phorum5_base
{
        # which document source to index
        source                  = phorum5_msg_delta

        # this is path and index file name without extension
        # files <indexpath>.spi/spd/spr will be created by indexer
        path                    = /usr/local/sphinx/var/data/phorum5_msg_d

}

source phorum5_author : phorum5_base
{
        sql_query_pre           = REPLACE INTO sph_counter SELECT 1, 'author', MAX(message_id) FROM phorum_messages WHERE status=2
        sql_query               = \
                SELECT message_id, forum_id, datestamp, author \
                FROM p50_000001_messages \
                WHERE message_id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 and type = 'author' ) and status=2
}

source phorum5_author_delta : phorum5_base
{

        # main document fetch query
        # you can specify any number of fields
        #
        # mandatory
        sql_query               = \
                SELECT message_id, forum_id, datestamp, author \
                FROM phorum_messages \
                WHERE message_id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 and type = 'author' ) and status=2

}

index phorum5_author : phorum5_base
{
        source = phorum5_author
        path = /usr/local/sphinx/var/data/phorum5_author
}



index phorum5_author_delta : phorum5_base
{
        source = phorum5_author_delta
        path = /usr/local/sphinx/var/data/phorum5_author_delta
}



---------------------------------------------------------------------------------------------

Keep in mind that I just started with Sphinx, there might be bugs, problems and probably easier solutions.


Thomas Seifert



Edited 11 time(s). Last edit at 04/24/2007 07:04PM by ts77.
Attachments:
open | download - sphinx_search-0.9.zip (8.2 KB)
open | download - sphinx_search-0.9.2.zip (9.7 KB)
Re: Alternate search backend with sphinx fulltext search
September 06, 2006 03:17AM
Hello Thomas,

I spent all the night testing your mod and sphinx fulltext search. :)
It's running but i've still "create temporary table phorum_search_9344ef8fb4a8c..." process in mysql.
Sphinx should replace the old search system ?
Why we still have this mysql process when we make a search ?

I hope that sphinx search will help my server overload. :)

Thanks Thomas.

PS : I have search queries in query.log.
Re: Alternate search backend with sphinx fulltext search
September 06, 2006 04:24AM
momo, yes, it bypasses the phorum-search.
are you sure that mod is really enabled?
your results or the create temporary table seems to suggest that the original search is still used.


Thomas Seifert
Re: Alternate search backend with sphinx fulltext search
September 06, 2006 04:29AM
Yes the mod is enabled.

I've just truncate phorum_search table.
Now i cannot make a search on message. It's answering me no results. It seems to work with the old search system.

But on author search it's running fine. It gives me many results but still using phorum_search tmp table.

Also i have the log of my queries on sphinx.
Re: Alternate search backend with sphinx fulltext search
September 06, 2006 04:32AM
can you contact me by pm?
I'd like to take a look at your phorum.
ftp-access would be nice.


Thomas Seifert
Re: Alternate search backend with sphinx fulltext search
September 06, 2006 04:37AM
Yesssss !
It's running now.

I've fixed the problem.

The message search file wasn't named correctly.

You should change this line on sphinx.conf:
path = /usr/local/sphinx/var/data/phorum5_HEAD

by
path = /usr/local/sphinx/var/data/phorum5_HEAD_msg


Search with author still using tmp table. I'm looking why...
Re: Alternate search backend with sphinx fulltext search
September 06, 2006 06:55AM
fixed by direct contact. its a problem in phorum-5.1.15, will be fixed in 5.1.16.
contact me if you need that fix earlier.


Thomas Seifert
Re: Alternate search backend with sphinx fulltext search
September 06, 2006 08:08AM
Thanks to Thomas who help me on debugging sphinx install on phorum.

Now it's running very nice. The search is more powerfull with sphinx than mysql phorum search.
On a busy phorum you should use it instead of phorum_search table. It's a very nice mod. :)
Re: Alternate search backend with sphinx fulltext search
September 07, 2006 05:06AM
Hello Thomas,

I have a little problem with sphinx search. Results are not ordered by date desc.
Maybe because i'm running phorum with french date format ? (DD/MM/YYYY)

Do you know this problem ?

Thank you Thomas.
Re: Alternate search backend with sphinx fulltext search
September 07, 2006 05:10AM
Thomas, does Sphinx still rely on spaces between words to perform the full text search?
In other words, do think it will still fail to index Chinese (blocks of characters with no spaces)

Thank you,

/\dam

--
My notable Phorum sites:
Movie Deaths Database - "review comments" system mostly powered by Phorum
Learn Chinese! - integrated forum quiz
Sorry, only registered users may post in this forum.

Click here to login