I just ran an advanced search looking for the old, mod, and version words (search.php?keywords=old+mod+version&ter ... mit=Search)
Result came back ignoring old and mod, and flooding me with 50+ pages of posts containing the "version" word (see below).
Would it be possible to set up the forum search to accept any string with length >= 2 during a search, please? I'm a heavy search user, and I really need to be able to search even common words.
Forum search discards too many common words
Koub - Please consider English is not my native language.
Re: Forum search discards too many common words
Happened around the last forum update: viewtopic.php?p=661790#p661790.
Re: Forum search discards too many common words
Let me offer some insight into the search situation. phpBB offers three search backends that we can use: phpBB Native Fulltext, MySQL Fulltext, and Sphinx Fulltext.
We were using phpBB Native Fulltext for years. There were reports that it had multiple issues, such as not supporting words with underscores as well as often failing to find all results.
In January this year, I switched the backend to use MySQL Fulltext. This backend appears to be reliable in finding posts, and it performs better with quoted queries (supports underscores and phrases), but as this thread notes, when not using quotes it does have a minimum length of 4 characters per word which doesn't appear possible to change. You are able to search for the phrase "old mod version".
The third option is Sphinx, an open source search server. Its website is a bit dated as it features a "Chat on Skype" link on the homepage, but it did receive one fresh release this year, so it's not completely abandoned. What its advantages are and what exciting new issues it would surface are unclear to me without setting it up to try... so if there is interest I can bump the priority.
EDIT: There is a MySQL config variable to change the minimum word length too. I'm not sure if phpBB will understand if it's changed but it's something I will attempt.
ovo
Re: Forum search discards too many common words
I've set the MySQL variables ft_min_word_len and innodb_ft_min_token_size to 2, confirmed that the setting is applied on the server, and rebuilt the post index, but so far I can't see a difference in the search results...
ovo
Re: Forum search discards too many common words
Ah the good old days when my search for the error message "there are no trains" returned no results because all the words were stripped out as common! Happy to report that at least now I get results when I search using quotation marks.
My own personal Factorio super-power - running out of power.
Re: Forum search discards too many common words
Sanqui wrote: Thu Sep 25, 2025 3:23 pm
The third option is Sphinx, an open source search server. Its website is a bit dated as it features a "Chat on Skype" link on the homepage, but it did receive one fresh release this year, so it's not completely abandoned. What its advantages are and what exciting new issues it would surface are unclear to me without setting it up to try... so if there is interest I can bump the priority.
Sphinx / searchd are basically "finished software", so it is not surprising that the release cadence has slowed since 2001. Please note that Version 3 and above are no longer GPL Licensed or open source, and it is seemingly maintained by a single Developer / Corporation. The Version 2 codebase was forked in 2017 to become Manticore Search. I have never tested it with phpBB, but it reportedly works well. Ready-to-install packages are provided for most major Linux Distributions. Be aware that some configuration options have been Deprecated or Renamed, which may lead to unexpected startup errors.
The biggest problem I have seen with Sphinx installations has been the Indexer choking the system on Disk Access - this was in the era before Solid State Disks were commonplace.
Re: Forum search discards too many common words
Sanqui wrote: Thu Sep 25, 2025 4:02 pm
I've set the MySQL variables ft_min_word_len and innodb_ft_min_token_size to 2, confirmed that the setting is applied on the server, and rebuilt the post index, but so far I can't see a difference in the search results...
Doesn't seem to change anything; the words old and mod are still ignored in my search.
Note that I'm searching for any combination of the words old, mod, and version anywhere in the post, and not for the string "old mod version".
Koub - Please consider English is not my native language.
Re: Forum search discards too many common words
The 3 letter words seem to be already ignored by the forum software, not even being sent to the mysql search engine. Otherwise it wouldn't be indicated as "ignored". There must be some setting to increase the length within the forum software as well. Or it read from the mysql config, cached the value and didn't update when the index length was modified within mysql.
Re: Forum search discards too many common words
Tertius wrote: Thu Sep 25, 2025 8:01 pm
There must be some setting to increase the length within the forum software as well,
fulltext_mysql.php:
Code:
    // check word length
    $clean_len = utf8_strlen(str_replace('*', '', $clean_word));
    if (($clean_len < $this->config['fulltext_mysql_min_word_len']) || ($clean_len > $this->config['fulltext_mysql_max_word_len']))
    {
        $this->common_words[] = $word;
        unset($this->split_words[$i]);
    }
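To make the effect of that check concrete, here is a rough Python rendering of the same length filter. It is only a sketch, not phpBB code: the function name split_search_words, the default min_len of 4, and the max_len of 254 are my assumptions for illustration, and it measures the raw word where the PHP measures the cleaned word. It suggests that the words are dropped by the forum-side limit before MySQL is ever queried, which would explain why changing only the MySQL variables made no visible difference.

```python
def split_search_words(words, min_len=4, max_len=254):
    """Hypothetical Python rendering of the quoted PHP length check.

    min_len / max_len stand in for the forum-side config values
    fulltext_mysql_min_word_len / fulltext_mysql_max_word_len;
    the defaults used here are assumptions, not confirmed values.
    """
    kept, common = [], []
    for word in words:
        # Wildcards do not count towards the length, as in the PHP snippet.
        clean_len = len(word.replace("*", ""))
        if clean_len < min_len or clean_len > max_len:
            common.append(word)  # these end up reported as "ignored"
        else:
            kept.append(word)
    return kept, common


# With a forum-side minimum of 4, only "version" survives.
print(split_search_words(["old", "mod", "version"]))
# With a forum-side minimum of 2, all three words are kept.
print(split_search_words(["old", "mod", "version"], min_len=2))
```

If this reading is right, the value the forum software holds for fulltext_mysql_min_word_len would need to drop to 2 as well before old and mod reach the MySQL index at all.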
PHP is a hell of a drug.