MOD_SEARCHM DESCRIPTION
=======================

An Apache2 DSO module html file search engine based on the Swish-e C API, returning results by replacing tags in a user-supplied html template. Persons with Swish-e knowledge and the ability to generate a Swish-e index file should find the searchm interface familiar. Persons without Swish-e knowledge can get started quickly. Direct support for Fedora Core 2 and FreeBSD 5.3-RELEASE.

MOD_SEARCHM REQUIREMENTS
========================

Apache 2 (Developed using 2.0.49; earlier versions are untested with mod_searchm.)
swish-e-2.4.2 (Earlier Swish-e versions are not compatible with mod_searchm.)
mod_usertrack, optional, for state management/search thread control.
For SEARCHM_DBM option: Apache2 server with DBM support. Tested only with the distributed apr-util sdbm backend.
For SEARCHM_BDB option: Berkeley DB4 (4.2.52 or above) (www.sleepycat.com).
For SEARCHM_MYSQL option: MySQL 4.1.11 or above (www.mysql.com).
Tested with PreFork and Worker MPMs.

MOD_SEARCHM OPERATION - OVERVIEW
================================

The module handles all requests that set the module-handler to mod-searchm. The module determines its information from the query string portion of the request URL. See "Searchm Query Parameters" for more information.

Once the request is received, the parameters are parsed and mod_searchm uses the Swish-e index file(s) specified by the index= parameter(s). The query string that will be used is obtained from the query= parameter(s).

Once the Swish-e index file query information is determined, mod_searchm uses the Swish-e C API to generate the results of the query. If no hits are produced, then the file specified by the configuration directive SearchmNoHits is returned.
Otherwise, the file specified by the configuration directive SearchmBegin is sent as the beginning of the returned html page; then the file specified by SearchmResult is sent, once for each returned result; then the file specified by SearchmEnd is sent to complete the returned html page. When a file is returned, all tags are replaced by their associated values. See Searchm Tags for more information.

The mod_searchm module requires that Swish-e was used to create the index file database(s). See the Swish-e documentation for the details.

If mod_usertrack is installed and activated for the directory/location being served, and the requesting User Agent accepts cookies, and the request is formatted correctly, and mod_searchm recognizes the cookie sent by the requesting User Agent as the one set by the mod_usertrack CookieName httpd configuration directive, then mod_searchm will associate that cookie with the search thread.

BUILDING MOD_SEARCHM - SETUP
============================

The build system uses a modified Makefile generated by 'apxs -g' and requires GNU Make 3.79.1 or above. If 'apxs -g -n searchm' successfully creates a simple searchm module on your system, and make (or gmake) builds the simple module, then a mod_searchm supplied Makefile should work properly. Within the simple module, the .deps file and the Makefile should be inspected carefully for values to set within the mod_searchm supplied or custom Makefile.inc.

The build system uses two make files, a primary Makefile, and a settings Makefile.inc. Hopefully, only settings within Makefile.inc will require alteration.

Supported systems contain a Makefile.inc within the corresponding dist/DIST directories, and the Makefile setup target may be used to automatically copy necessary files into the source directory. Missing files may be created using touch(1).
For example (FreeBSD 5.3-RELEASE)

    $ gmake DIST=freebsd setup

For example (FC2)

    $ make DIST=FC setup

Otherwise, a custom Makefile.inc must be created from an existing Makefile.inc. Depending on your system, the .deps file from the generated sample module may be required. In most cases, using

    $ touch .deps

inside the mod_searchm project directory is acceptable.

Some systems require setting top_srcdir and top_builddir. FreeBSD 5.3-RELEASE uses /usr/local/share/apache2 while FC2 requires no modification. See the Makefile.inc comments for additional parameters.

After running the setup target, edit Makefile.inc as necessary. Note that the clean target removes Makefile.inc.

BUILDING MOD_SEARCHM - Building and installing the DSO
======================================================

As a non-root user,

    make (or gmake)

As root,

    make (or gmake) install

As a non-root user who is part of the APACHE_GID group,

    make (or gmake) swish

will create an index.swish-e Swish-e index file using the installed swish-e.conf file.

For non-supported distributions, use one of the mod_searchm.conf configuration files from a supported dist as a guide for setting the mod_searchm Apache2 configuration directives.

BUILDING MOD_SEARCHM - Uninstalling
===================================

As root,

    make (or gmake) uinstall

will completely erase mod_searchm from your system, including any and all files placed in the mod_searchm directories. Understand the Makefile uinstall process prior to using it.

THE SWISH-E INDEX
=================

Creating a Swish-e index file is best described in the Swish-e documentation. Supported systems have sample Swish-e configuration files (swish-e.conf, swish-e.1.conf, and swish-e.db4.conf) as part of the mod_searchm distribution.

On FC2, if your Swish-e package was the author's RPM, then that configuration file may be used directly.

On FreeBSD 5.3-RELEASE, the swish-e.conf configuration file assumes that the swish-e-2.4.2 package was installed.
Also, for FreeBSD, add a user to group www and run the swish-e binary as that user.

MOD_SEARCHM SEARCH THREADS
==========================

A mod_searchm search thread is defined as the sequence of requests resulting from initiating a search. A user agent/browser/remote client initiates a search thread via the search HTML form. All submitted information is stored in a database, and used by subsequent search thread requests. An error terminates the search thread. A successfully initiated search continues the search thread, now identified by a unique request identification number.

The information for a search thread is stored within a database, so a requery only has to return the unique request identification number and the additional information required to continue the search thread (the page number).

Associated with each search thread is a last access time. When the time since the last access exceeds a preset limit, the search thread expires and is deleted from the database, and any further requests for that search thread will return an error.

MOD_SEARCHM AND MOD_USERTRACK
=============================

Mod_searchm recognizes mod_usertrack cookies. If mod_usertrack is active for the directory/location being served, then that cookie will be associated with a created search thread and stored in the database.

How the cookie is used by mod_searchm is complicated by many factors: whether the requesting remote client/User Agent/browser accepts cookies; whether the client deletes a cookie while in the middle of a search thread; whether mod_searchm recognizes the mod_usertrack cookie; whether the mod_usertrack configuration sets a session cookie or a cookie with an expiration date; and whether the client restarts.

For now, if a cookie is associated with a search thread AND a cookie was sent by the remote client/user agent/browser, they must match.
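
The expiry rule and the cookie rule described above can be sketched in C. This is only an illustration under stated assumptions: the search_thread_t record, its field names, and the two helper functions are hypothetical and are not taken from the mod_searchm source, and the idle limit is passed in as a plain number of seconds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical record for one search thread; the field names are
 * illustrative and are not taken from the mod_searchm source. */
typedef struct {
    const char *cookie;      /* cookie stored at creation, or NULL if none */
    time_t      last_access; /* time of the last successful request */
} search_thread_t;

/* A thread expires once its idle time exceeds the preset limit. */
int thread_expired(const search_thread_t *t, time_t now, double limit_secs)
{
    assert(t != NULL);
    return difftime(now, t->last_access) > limit_secs;
}

/* The cookie rule: only when BOTH a stored cookie and a sent cookie
 * exist must they match; in every other case the request passes. */
int cookie_ok(const search_thread_t *t, const char *sent_cookie)
{
    assert(t != NULL);
    if (t->cookie == NULL || sent_cookie == NULL)
        return 1;
    return strcmp(t->cookie, sent_cookie) == 0;
}
```

A request handler following this sketch would reject a requery when thread_expired() is true or cookie_ok() is false, and would otherwise update last_access.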
Typically, the mod_usertrack directives that need to be set in a mod-searchm handled directory/location are: CookieStyle, CookieTracking, and CookieName. CookieName should be set to a simple ASCII string (isalpha characters only), without any white space or PCRE characters, etc. CookieTracking must be ON (obviously). CookieStyle settings are a matter of preference.

MOD_SEARCHM OPERATION
=====================

Once built and installed, the mod_searchm Apache2 configuration directives must be set prior to operation. See "Mod_Searchm Operations" section for more information.

Most swish-e(1) functionality is available in mod_searchm. Basically, mod_searchm allows index files to be "queried", based upon parameters stated on the query/command line, and results for a successful search are returned in three "phases." Results for an unsuccessful search (no hits/matches or user input error conditions) are returned separately, and the three "phases" are not run.

Phase one is the replacement of all tags in the file identified by the mod_searchm Apache2 configuration directive SearchmBegin setting. This is typically the top portion of the returned html page, and usually contains the <head>
section and the beginning of the <body> section.

Phase two is the replacement of all tags in the file identified by the mod_searchm Apache2 configuration directive SearchmResult setting FOR EACH RESULT RETURNED. This file is typically the important part of the <body> section.

Phase three is the replacement of all the tags in the file identified by the mod_searchm Apache2 configuration directive SearchmEnd setting. This file typically contains the end of the <body> section, and completes the returned html page.

If no hits/results are generated, invalid search terms are given, or no index file was given, the file identified by the mod_searchm Apache2 configuration directive SearchmNoHits is returned (with tag replacement), and the three phases are not run.

If another error happens, processing stops at the point of the error, and a page indicating the error is returned.

MOD_SEARCHM OPERATION Returning the results
===========================================

See the SearchmBegin, SearchmResult, SearchmEnd, and SearchmNoHits mod_searchm Apache2 configuration directives.

The SearchmBegin, SearchmResult, and SearchmEnd files combine to form a complete html document. All tags within the files are processed and the appropriate text is substituted. During phase one (begin), the SearchmBegin file is returned. During phase two (result), the SearchmResult file is sent once for each result being returned. During phase three (end), SearchmEnd is sent once.

If the Swish-e index file query does not create any results (nohits), then the SearchmNoHits file (a complete html page) is returned.

MOD_SEARCHM OPERATION DATABASE
==============================

A created search thread is assigned a request identification, which is embedded via the searchm_reqid tag within returned documents. Subsequent requests containing this reqid will be associated with the search thread.

When a search thread is created, all the search parameter information is stored within a database.
Subsequent requests of a search thread need only provide the request identification and a new page number. Each successful request of a search thread updates the last request time. When a successful request is completed, the database is purged if the SEARCHM_PURGE_TIME limit has been exceeded. A database purge deletes all expired search threads. Manually purging a database is discouraged, but possible.

MOD_SEARCHM OPERATION DATABASE DBM
==================================

The DBM database option is the easiest to implement, and is part of the Apache API. If SEARCHM_DBM is defined when building mod_searchm (via -DSEARCHM_DBM), mod_searchm will use the Apache API DBM services for database requirements.

    INCLUDES=-DSEARCHM_DBM

Relevant Apache2 build time configuration options: --with-dbm, --with-gdbm, --with-ndbm, and --with-berkeley-db. Options may be seen with:

    cd srclib/apr-util
    ./configure --help

The options determine which database backend will be used to implement the default apr-util dbm and which DBM database types will be available at runtime.

A missing --with-dbm, or a --with-dbm=yes setting, will build the sdbm included in the apr-util distribution (srclib/apr-util/dbm/sdbm) and use it as the default DBM backend. Any other valid --with-dbm value will build with the indicated database backend as the default. The location of that database backend must be specified by one of the --with-gdbm, --with-ndbm, or --with-berkeley-db options.

Each of the --with-gdbm, --with-ndbm, or --with-berkeley-db options will build the associated DBM support into Apache2. In addition, if the Apache2 build process detects gdbm, ndbm, or berkeley-db in standard locations, it will automatically build support for them. However, mod_searchm only uses the default DBM backend.

Fedora Core Apache2 apr-util uses the apr-util included sdbm as the default dbm. The gdbm and berkeley-db backends are also available.
The FreeBSD 5.3-RELEASE Apache2 apr-util uses the included sdbm as the default dbm and includes no other built-in DBM support.

The mod_searchm Apache2 configuration directive SearchmDBM sets the dbm name. Depending on the default Apache2 DBM backend, this may not be a true filename. Many DBM packages append suffixes for separate required support files. See the apr-util apr_dbm_open function for more information.

If APR_ENOTIMPL ( (70023) This function has not been implemented on this platform.) is in the error log, then the required DBM backend support was not built into Apache2.

Manually purging the database is possible if the server is shut down, or the server is not processing the mod-searchm handler. Predicting when the server is not processing a mod_searchm handled request is tricky at best. Depending on the DBM database backend, simply erasing all the mod_searchm DBM files is the best solution for deleting the database, but only do so with the server shut down.

The DBM database option requires that all requests of a search thread be processed by the single server machine upon which the DBM database is physically located. A remote DBM database may not be shared among servers.

MOD_SEARCHM OPERATION DATABASE DB4
==================================

If SEARCHM_BDB is defined when building mod_searchm (via -DSEARCHM_BDB), mod_searchm will use SleepyCat DB4 for database requirements.

    INCLUDES=-DSEARCHM_BDB

The SearchmBDB configuration directive may be used to set the environment home and the database file information. Within the database file, the DB_SEARCHM database contains the information associated with valid search threads.

MOD_SEARCHM OPERATION DATABASE MYSQL
====================================

If SEARCHM_MYSQL is defined when building mod_searchm (via -DSEARCHM_MYSQL), mod_searchm will use MySQL for database requirements.

    INCLUDES=-DSEARCHM_MYSQL

The SearchmMySQL configuration directive may be used to set the connection parameters.
The database is purged, when necessary, as a two-step process. All rows to be purged are marked with atime equal to '0'. Once all rows to be purged are marked, a DELETE of all atime='0' rows occurs. If the server crashes during the purge cycle, some rows will be marked as deleted, but will not have been deleted.

    GRANT ALL PRIVILEGES ON SEARCHM.* TO 'bob'@'localhost';
    GRANT ALL PRIVILEGES ON SEARCHM.* TO 'bob'@'waiter';
    GRANT ALL PRIVILEGES ON SEARCHM.* TO 'bob'@'waiter.localzone';
    CREATE DATABASE SEARCHM;
    USE SEARCHM;
    CREATE TABLE searchm (
        reqid      CHAR(32) ASCII NOT NULL PRIMARY KEY,
        cookie     VARCHAR(255) NULL,
        structure  SMALLINT UNSIGNED NULL,
        display    SMALLINT(5) UNSIGNED NULL,
        swishsort  VARCHAR(255) NULL,
        swishquery TEXT NOT NULL,
        ui_index   SMALLINT(5) UNSIGNED NOT NULL,
        swishindex TEXT NOT NULL,
        ui_limit   SMALLINT(5) NULL,
        swishlimit TEXT NULL,
        atime      VARCHAR(32) ASCII NOT NULL);

    reqid:      ASCII string of all HEX digits, no terminating '\0'. Exactly 32 chars. One reqid of SEARCHM_KEY_PURGETIME is used to store the last purge time.
    cookie:     A string of up to 255 characters. A '\0' terminates the string.
    ui_index:   The number of swishindex strings. 0 is invalid.
    swishindex: A '\0' separated list of strings.
    swishsort:  A string of up to 255 characters. May be NULL. A '\0' terminates the string.
    structure:  The SwishSetStructure value. May be NULL and/or 0.
    swishquery: The SwishExecute query string. Terminated by a '\0'.
    ui_limit:   The number of swishlimit entries. May be NULL and/or 0.
    swishlimit: Contains ui_limit entries. An entry is: name'\0'type'\0'limit_hi'\0'limit_low'\0'.
    display:    The number of results to display per page. May be NULL and/or 0.

MOD_SEARCHM OPERATION FileNames
===============================

Filename issues are complicated by, to name a few, OS dependencies, file system dependencies, and file name restrictions.
Mod_searchm allows filename and path specifications in the SearchmIndexDir, SearchmLockFile, and SearchmDBM (for the DBM database option) directives, and in the command line/query_string parameter index= value.

For SearchmStrict, the safest and most portable setting, index= may only be a filename, relative to the URL directory/location. The actual path/filename of the physical index, created internally, should be portable.

For SearchmRelative, index= may be a relative path/filename, or may be a filename. For the relative case, the SearchmIndexDir and index= value are merged to create the actual file path specification, which is OS dependent. If the index= value is a filename only, then the behavior is the same as for SearchmStrict.

For SearchmAbsolute, the most versatile and least portable setting, index= may be an absolute /path/filename, or a relative path/filename, or a filename. For the absolute case, the /path/filename is OS dependent.

As a general rule, SearchmStrict is the best setting. Additional testing on non-unix systems/file systems is required.

MOD_SEARCHM WEIGHTED LIST SORTING
=================================

Weighted list sorting attempts to sort an unordered list of items into a sorted element list. Each item in an item list has an associated numerical weight N, where N is an integer value. The first item in the item list is assigned item 0. The item list is then sorted into a numerical sequence of consecutive elements, with the first element assigned as element 0.

Items of lesser weight precede items of greater weight in the sorted element list. The sort order of items from the item list with equal weights is not defined or predictable, except that items of lesser weight will precede the items of equal weight, and items of greater weight will follow the items of equal weight.

Unique item lists may not contain items assigned equal weights. Non-unique item lists may contain items assigned equal weights.
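
A minimal C sketch of weighted list sorting follows. It is an illustration only, not the mod_searchm implementation: the weighted_item_t type, the weighted_sort function, and the max_elems parameter are hypothetical names. The max_elems cap models the rule, stated further on in this section, that items of greater weight are excluded when the sorted list cannot hold every item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative item type: the original item number plus its weight. */
typedef struct {
    int item;
    int weight;
} weighted_item_t;

/* qsort() comparator: ascending weight, no tie-breaking. */
static int cmp_weight(const void *a, const void *b)
{
    const weighted_item_t *x = a;
    const weighted_item_t *y = b;
    return (x->weight > y->weight) - (x->weight < y->weight);
}

/* Sorts items[0..n) in place by ascending weight and returns the number
 * of elements kept: at most max_elems, so the heaviest items are the
 * ones dropped. qsort() is not stable, so items of equal weight end up
 * in no guaranteed order, which matches the rule described above. */
size_t weighted_sort(weighted_item_t *items, size_t n, size_t max_elems)
{
    assert(items != NULL || n == 0);
    if (n > 1)
        qsort(items, n, sizeof items[0], cmp_weight);
    return n < max_elems ? n : max_elems;
}
```

A caller would read only the first weighted_sort() elements of the array after the call. Callers that need a predictable order must give every item a distinct weight.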
To force the item numbers of the weighted list to correspond exactly to the element numbers of the sorted list, each item weight must equal the item number, and the item list must be unique.

To create a 'sorted' element list that has no predictable order, all items in the item list should have equal weights, which implies the item list must be non-unique.

When sorting, if the number of items in the item list exceeds the number of elements the sorted list may contain, then items of greater weight are excluded from the sorted list.

Example 1

    Item      0   1   2   3   4
    Weight    0  24   0  16   3

is sorted into a sequence of elements

    Element   0   1   2   3   4
    Weight    0   0   3  16  24

Elements 0 and 1 are weight 0, but exactly which element is item 0 or item 2 from the list is not predictable.

Example 2

    Item      0   1   2   3   4
    Weight    0   1   2   3   4

is sorted into a sequence of elements

    Element   0   1   2   3   4
    Weight    0   1   2   3   4

For this example all item numbers correspond exactly to element numbers.

Example 3

    Item      0   1   2   3   4
    Weight    0   0   0   0   0

is sorted into a sequence of elements

    Element   0   1   2   3   4
    Weight    0   0   0   0   0

For this example all items have equal weights, resulting in a 'sorted' list without any predictable order.

Some mod_searchm command/query line parameters use weighted list sorting: index, triplets/sub-queries (metatag, query, and conj), and sort pairs (sort and order).

Getting Started Tutorial (FC)
=============================

Install the author's swish-e RPM. The Swish-e index will be in /var/swish-e, and will be named index.swish-e. Because of the Swish-e index file location, set the mod_searchm configuration directive SearchmAbsolute ON and SearchmStrict OFF. This will allow the parameter index= value to be an absolute file specification. As a general rule, anticipate using mod_searchm with SearchmStrict ON. When accessing the search.html page, enter /var/swish-e/index.swish-e into the index= field.
Otherwise, use the swish target in the mod_searchm project Makefile to generate the Swish-e index file index.swish-e. Set SearchmAbsolute OFF, SearchmRelative OFF, and SearchmStrict ON. The default value in the search.html index= field should be correct.

From the localhost, access the http://localhost/test/searchm/search.html file. Adjust the search.html form field index as required. Hit submit.

What is going on? The search.html page generates a request to mod_searchm instructing it to search the index.swish-e index for the word "swish-e". The results are requested to be displayed 10 per page and page 1 to be displayed. Because page numbers start at 0, this is actually the second page. The results will be sorted by rank, in descending order (highest rank first).

MOD_SEARCHM TAGS AND TAG REPLACEMENT
====================================

Tags are of the form: where tag is either searchm_tag or swishtag. The entire tag is case sensitive. The tag may NOT span a line. Backslash line continuation within a searchm tag is not recognized or supported.

When a tag is replaced in one of the SearchmBegin, SearchmResult, SearchmEnd, or SearchmNoHits files, the entire tag is removed, and the value of that tag is substituted. Mod_Searchm determines the value of the tag based on the swish-e result properties and the parameters calculated by mod_searchm during the request.

All tag values are strings, which may represent an integer or an actual string. Integer values are returned as a string containing only digits. String values may contain alphanumeric characters.

For example, the searchm_qpage tag is usually replaced by an integer. The actual substitution is a string containing only numeric characters. However, if a value was not given on the query/command-line, then the string "none" will be returned. This distinguishes between a value of "0" given on the query/command-line, and a missing page= parameter.
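
The distinction between "0" and "none" can be sketched in C as follows. This is a minimal illustration under stated assumptions: format_int_tag and its has_value flag are hypothetical names, not functions from the mod_searchm source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes the replacement text for an integer-valued tag into buf.
 * has_value is 0 when the parameter was absent from the query line,
 * in which case the text is "none" rather than a digit string.
 * Returns the value reported by snprintf(). */
int format_int_tag(char *buf, size_t size, int has_value, unsigned value)
{
    assert(buf != NULL && size > 0);
    if (!has_value)
        return snprintf(buf, size, "%s", "none");
    return snprintf(buf, size, "%u", value);
}
```

With this shape, a page=0 parameter yields the text "0", while a request with no page= parameter yields "none", so a template author can tell the two cases apart.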
Invalid tags are sent without alteration. Design Consideration: Invalid tags are replaced with the string "Invalid Tag." For now, an invalid tag will probably be treated as a comment by the rendering browser.

MOD_SEARCHM TAGS SERVER CONFIGURATION
=====================================

searchm_cbegin
    The configuration directive SearchmBegin setting. Valid after successful server configuration for request.

searchm_cresult
    The configuration directive SearchmResult setting. Valid after successful server configuration for request.

searchm_cend
    The configuration directive SearchmEnd setting. Valid after successful server configuration for request.

searchm_cnohits
    The configuration directive SearchmNoHits setting. Valid after successful server configuration for request.

searchm_cabsolute integer
    The SearchmAbsolute setting for this query. Valid after successful server configuration for request.

searchm_crelative integer
    The SearchmRelative setting for this query. Valid after successful server configuration for request.

searchm_cstrict integer
    The SearchmStrict setting for this query. Valid after successful server configuration for request.

searchm_cindexdir string
    The SearchmIndexDir server setting for this query. May be none. Valid after successful server configuration for request.

searchm_date string
    The time the server processes the request. Printed as a ctime string. Valid after successful server configuration for request. Note: Using this tag may carry security risks, especially for security that depends on time skews or time differences.

MOD_SEARCHM TAGS COMMAND/QUERY LINE
===================================

searchm_qpage integer
    The page= parameter value, if any was given on the command/query line. "none" if not given on the command/query line. Valid after successful command/query line parsing.

searchm_qdisplay integer
    The display= parameter value, if any was given on the command/query line. "none" if not given on the command/query line.
    A requery will always equal "none". Valid after successful command/query line parsing.

searchm_qqueryN string
    For sub-query N, the queryN given on the command/query line. If an empty value given, then an empty string is returned. "none" if no queryN given. A requery will always equal "none". Valid after successful command/query line parsing.

searchm_qmetatagN string
    For sub-query N, the metatagN value given on the command/query line. If an empty value given, then an empty string is returned. "none" if no metatagN given. A requery will always equal "none". Valid after successful command/query line parsing.

searchm_qconjN string
    For sub-query N, the conjN value given on the command/query line. If an empty value given, then an empty string is returned. "none" if no conjN given. A requery will always equal "none". Valid after successful command/query line parsing.

searchm_qsortN string
    The string parsed from the query/command line sortN value. If an empty value given, then an empty string is returned. "none" if none given. A requery will always equal "none". Valid after successful command/query line parsing.

searchm_qorderN string
    The sort order parsed from the query/command line orderN value. One of "asc", "desc", "none". If an empty value given, then an empty string is returned. "none" if order is not given on the command/query line. A requery will always equal "none". Valid after successful command/query line parsing.

searchm_qindexN string
    The complete filename of the Nth index file element from the sorted list. The value of N is bounded by 0 <= N < SEARCHM_MAX_INDEX. A requery will always equal "none". Valid after successful command/query line parsing.

searchm_subqueryN string
    For sub-query N, the triplets metatagN, queryN, and conjN combined. "none" if sub-query N not created. A requery will always equal "none". Valid after successful command/query line parsing.

searchm_query string
    The actual string used in the SwishExecute function.
    Valid after successful command/query line parsing.

searchm_within string
    A space separated string representing the structure value passed to SwishSetStructure. "none" if nothing set. Valid after successful command/query line parsing.

searchm_sort string
    The string passed to SwishSetSort. This string is created from sort pairs. "none" if invalid or none given. Valid after successful command/query line parsing.

searchm_index string
    The string passed to the Swish-e C API SwishInit function. Valid after successful command/query line parsing.

searchm_qcmdstring string
    The command/query line as received for this request. All visible '+' chars are changed to ' '. Valid after request validated.

searchm_reqid string
    A unique ID (encoded for URL) to identify a specific search thread. Will remain constant among all returned pages for a specific search thread. Valid after successful results.

searchm_limitN string
    A yes value means that limitN was passed to SwishSetSearchLimit. A no value means that limitN was not given, or was not passed to SwishSetSearchLimit.

searchm_low_limitN string
    The actual char * low parameter passed to SwishSetSearchLimit. "none" if searchm_limitN "no".

searchm_up_limitN string
    The actual char * hi parameter passed to SwishSetSearchLimit. "none" if searchm_limitN "no".

searchm_prop_limitN string
    The actual char * property parameter passed to SwishSetSearchLimit. "none" if searchm_limitN "no".

MOD_SEARCHM TAGS PRE NO HITS DETERMINATION
==========================================

searchm_dbnameN string
    The IndexName field in SWISH-CONFIG for index file element N. The value of N is bounded by 0 <= N < SEARCHM_MAX_INDEX. "none" if none set. Valid after successful SwishInit.

searchm_dbdescN string
    The IndexDescription field in SWISH-CONFIG for index file element N. The value of N is bounded by 0 <= N < SEARCHM_MAX_INDEX. "none" if none set. Valid after successful SwishInit.
searchm_dbpointerN string
    The IndexPointer field in SWISH-CONFIG for index file element N. The value of N is bounded by 0 <= N < SEARCHM_MAX_INDEX. "none" if none set. Valid after successful SwishInit.

searchm_dbadminN string
    The IndexAdmin field in SWISH-CONFIG for index file element N. The value of N is bounded by 0 <= N < SEARCHM_MAX_INDEX. "none" if none set. Valid after successful SwishInit.

MOD_SEARCHM CALCULATED TAGS AFTER HITS
======================================

searchm_display integer
    The calculated number of results displayed per page. Not necessarily always equal to searchm_qdisplay. Valid after successful SwishExecute.

searchm_page integer
    The page number being displayed. Page numbers start at 1. Not necessarily always equal to searchm_qpage. Valid after successful SwishExecute.

searchm_hits integer
    The total number of results generated by the search/query/request. Valid after successful SwishExecute.

searchm_pagenext integer
    The next page number. Assumes page numbers start at 1. Same as searchm_total_pages if there is no next page. Valid after successful SwishExecute. Design Consideration: If currently viewing the last page, then pagenext would point to the first page, instead of the last page.

searchm_pageprev integer
    The previous page number. Assumes page numbers start at 1. Same as searchm_pagefirst if there is no previous page. Valid after successful SwishExecute. Design Consideration: If currently viewing the first page, then pageprev would point to the last page, instead of the first page.

searchm_total_pages integer
    The total number of pages (full and/or partial) required to display the results. Assumes page numbers start at 1. Valid after successful SwishExecute.

searchm_full_pages integer
    The total number of full pages required to display the results. Assumes page numbers start at 1. If the last page is a partial page, then this will be one less than the total. If the last page is a full page, then this will be equal to the total. Valid after successful SwishExecute.
searchm_start integer
    The start record for the page. Assumes record numbers start at 1. Valid after successful SwishExecute.

searchm_end integer
    The end record for the page. Assumes record numbers start at 1. Valid after successful SwishExecute.

searchm_last_page integer
    The number of results to be displayed on the last page. If zero, then the last page is a full page. Valid after successful SwishExecute.

searchm_hilitem_string string
    The term list for swishdefault and NULL metanames converted into a QUERY_STRING suitable for passing to mod_hilitem. If the search does not use swishdefault or a NULL metaname, then this is an empty string. Valid during results phase.

searchm_parsed_words string
    SwishParsedWords output for this result. Valid during results phase.

searchm_removed_words string
    SwishRemovedWords output for this result. Valid during results phase.

searchm_description string
    The modified Swish-e auto property swishdescription tag. Valid during results phase. An expensive tag to implement. Calling more than once per result is not encouraged.

MOD_SEARCHM ERROR TAG
=====================

searchm_error string
    The error, if any. Only valid in NoHits Phase. Currently defined if the request was missing search terms, or if the request was missing an index file to search, or if the request attempted to search too many index files, or if the request resulted in nohits.

SWISH-E TAGS
============

Swish-e tags are explicitly defined as a Swish-e document auto property, or as a Swish-e document user property. During result phase replacement file processing, any tag that is not a valid searchm_tag is assumed to be either a Swish-e document auto property or a Swish-e document user property. Furthermore, a set of Swish-e index files searched during a query is not required to contain exactly the same Swish-e document auto properties and/or Swish-e document user properties.
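
Returning to the paging tags listed under MOD_SEARCHM CALCULATED TAGS AFTER HITS, their arithmetic can be sketched in C. This is an illustration under stated assumptions, not the module's code: paging_t and compute_paging are hypothetical names, page numbers are taken as 1-based, the first page (searchm_pagefirst) is assumed to be 1, and at least one hit is assumed because the three phases only run when hits exist.

```c
#include <assert.h>

/* Hypothetical container mirroring the calculated paging tags. */
typedef struct {
    unsigned total_pages; /* full and partial pages */
    unsigned full_pages;  /* pages holding exactly `display` results */
    unsigned last_page;   /* results on a partial last page, 0 if full */
    unsigned start;       /* first record on the page, 1-based */
    unsigned end;         /* last record on the page, 1-based */
    unsigned pagenext;    /* stays at total_pages on the last page */
    unsigned pageprev;    /* stays at 1 on the first page (assumed) */
} paging_t;

paging_t compute_paging(unsigned hits, unsigned display, unsigned page)
{
    paging_t p;
    assert(hits > 0 && display > 0);
    p.full_pages  = hits / display;
    p.last_page   = hits % display;
    p.total_pages = p.full_pages + (p.last_page ? 1 : 0);
    assert(page >= 1 && page <= p.total_pages);
    p.start    = (page - 1) * display + 1;
    p.end      = page * display < hits ? page * display : hits;
    p.pagenext = page < p.total_pages ? page + 1 : p.total_pages;
    p.pageprev = page > 1 ? page - 1 : 1;
    return p;
}
```

For 25 hits shown 10 per page, page 3 covers records 21 through 25, there are 2 full pages out of 3 in total, and the last page holds 5 results.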
SWISH-E TAGS DOCUMENT AUTO PROPERTIES
=====================================

Swish-e defines document auto properties. Mod_searchm defines alternate specific searchm_tags for some Swish-e document auto properties. If the Swish-e auto property is unavailable, then 'none' is returned. The swishrank, swishtitle, swishdocsize, and swishdbfile tags are default for Swish-e index files. The searchm_description tag value is the modified swishdescription value.

Example: The swishtitle tag is a Swish-e document auto property. During result phase replacement file processing it will be assumed to be a document property because it is not a searchm_tag. The tag will be replaced by the title of the document creating the result, or none.

Example: The keywords metaname is neither a searchm_tag nor a Swish-e auto property. It is assumed to be a user property. If keywords is available for the document, the tag will be replaced by the Swish-e keywords value. Otherwise, it will be considered to be an invalid tag.

SWISH TAGS SWISHDESCRIPTION
===========================

Some Swish-e tags are dependent on the type of file being returned. The 'swishdescription' tag (see StoreDescription in SWISH-CONFIG) will return the description for index files with the description present. If the swishdescription tag is unavailable, then 'none' is returned, instead of an invalid tag.

MOD_SEARCHM SEARCHM_DESCRIPTION
===============================

Mod_searchm defines a 'searchm_description' tag. If the 'swishdescription' tag is available, mod_searchm intercepts the results of the 'swishdescription' tag and processes the data as follows:

Only the range of text that contains the search words is returned. Only the swishdefault and NULL metanames are ranged.

If no search terms match, then up to the first SEARCHM_DESCRIPTION_MAX characters of swishdescription are returned. Design Consideration: If no search terms match, then "none" is returned.

If a PCRE error happened, then that error is returned.
"none" is returned if an internal error happened. Currently, the implementation is minimal. Supports only ASCII documents. Supports only results being rendered in an HTML document. The compile time SEARCHM_DESCRIPTION_PRE, SEARCHM_DESCRIPTION_POST, and SEARCHM_DESCRIPTION_MAX values may be adjusted for some refinement of the output. The SEARCHM_DESC_HTML_HILITE_BEGIN and SEARCHM_DESC_HTML_HILITE_END compile time options (see searchm.h) may be used to enclose the matched search terms within html tags. The default is and . The searchm_description_handler function processes the swishdescription value for the searchm_description tag. Hopefully, the function parameters/ arguments are sufficient for site/ installation specific implementation and customization. Based on SwishParsedWords, an internal metaname list is created which associates search terms/words and search terms/phrases for a specific result with metanames. Based on the metaname "swishdefault" and an assumed default (NUL), a terms list is created, and is the p_tl parameter of the searchm_description_handler function. The terms list contains a list of terms, along with the compiled RE. An empty (p_tl->tl==NULL) terms list indicates no terms for this result are associated with the swishdefault and/or NULL metanames. See searchm_termelem_t, searchm_termlist_t for more information. The terms in the list may not correspond directly with the terms entered for the query. Due to the final RE constructed, some terms may be duplicates, and are dropped. Terms of less than two characters are removed. 

    char * searchm_description_handler(
        apr_pool_t * pool,            /* request pool */
        apr_pool_t * ptemp,           /* temporary pool, valid only during
                                         this result */
        const char * desc,            /* the complete Swish-e
                                         swishdescription property for
                                         this result */
        const char ** stopword_list,  /* the SwishParsedWords result,
                                         allocated from ptemp */
        const char ** removed_list,   /* the SwishRemovedStopwords result,
                                         allocated from ptemp */
        p_searchm_termlist_t const p_tl);  /* the terms list */

Returns the string to replace the searchm_description tag.

MOD_SEARCHM/ MOD_HILITEM INTEGRATION
====================================

From the terms list for the "swishdefault" and/or "NULL" metaname, a
searchm_hilitem_string tag is created. Searches not using a swishdefault
and/or NULL metaname will result in an empty searchm_hilitem_string tag.

Current Implementation: The searchm_hilitem_string is an ampersand
separated list of search term regular expressions, suitable to be passed
to mod_hilitem. Each RE has 3 subexpressions, and the second (middle) is
the subexpression used to match the term.

RE template:

    (^|[[:space:]])([^[:space:]]*TERM[^[:space:]]*)([[:space:]]|$)

Ex: swishdefault=(greenacres bluecanyon redrock heaven) creates the
searchm_hilitem_string

    (^|[[:space:]])([^[:space:]]*greenacres[^[:space:]]*)([[:space:]]|$)&
    (^|[[:space:]])([^[:space:]]*bluecanyon[^[:space:]]*)([[:space:]]|$)&
    (^|[[:space:]])([^[:space:]]*redrock[^[:space:]]*)([[:space:]]|$)&
    (^|[[:space:]])([^[:space:]]*heaven[^[:space:]]*)([[:space:]]|$)

Note that the actual string is an encoded version, which uses hex
triplets to encode all non-isalnum characters. Later modifications may
pass only the terms, and assume mod_hilitem will construct the actual
RE.

Ex: swishtitle=(swish) results in an empty searchm_hilitem_string.

MOD_SEARCHM SWISH TAGS HEADER VALUES
====================================

For a particular version of Swish-e, certain index file header values
are available.
As a side note, this implies that the Swish-e version used to create the
index files must match the Swish-e C API version used to search them.
The SwishHeaderNames function returns a list of available header names.
The prehits tag searchm_header_names returns a comma separated list of
all available header names. During the results phase (ONLY!), the header
values are available as searchm_hv_NAME tags, where NAME is the actual
header name with any spaces replaced by an ASCII underscore '_'. A bare
searchm_hv_ is an invalid tag.

Example: The searchm_hv_Maintained_by tag is replaced by the value of
the Swish-e header named "Maintained by".

TERMS
=====

Http request:
    The request generated and submitted by a client, typically by a
    form utilizing the POST method. For client convenience, the GET
    method avoids re-submit issues. However, restrictions on the length
    of a GET request may cause data loss or an unsuccessful query. In
    almost all situations, GET should also work.

URL QUERY_STRING:
    The query string portion of the request URL.

command/query line:
    The query_string portion of the request URL.

request:
    The http request in general.

query:
    The set of conditions under which the swish-e index files are
    searched.

parameter:
    The first portion of a parameter=value pair in the query string
    portion of the URL. Pairs are typically separated by an ASCII
    ampersand (&). Referred to as, for example, parameter index=.

parameter value:
    The second portion of a parameter=value pair in the query string
    portion of the URL. Typically referred to as, for example,
    parameter index= value.

fragment:
    Not used in and ignored by mod_searchm.

MOD_SEARCHM QUERY PARAMETERS
============================

Any unrecognized parameter is an error.

display=positive_integer
    The number of results to display on a page. display=0 defaults to
    all results on one page. The actual number of results displayed per
    page may be different. Multiple definitions are not accepted.

page=positive_integer
    The page number to display. Page numbers start at page 1. The
    actual page number displayed may be different. Multiple definitions
    are ignored.

within= [file | title | head | body | comments | header | emphasized | meta | all]
    Search inside the structure. Multiple specifications accumulate.
    Only the last definition of duplicate within=value pairs is
    accepted. An incorrect value is ignored. See SwishSetStructure in
    the Swish-e C API docs.

indexN=file_name
    An index within which to search. This may open certain security
    holes. Interpretation depends on the mod_searchm Apache2
    configuration directives. Empty values are ignored. The N assigns
    the file_name a weight, and determines the order of the index
    file(s). -(INT_MAX-1)