Forum search discards too many common words

Koub
Global Moderator
Posts: 7987
Joined: Fri May 30, 2014 8:54 am

Post by Koub »

I just ran an advanced search for the words old, mod, and version (search.php?keywords=old+mod+version&ter ... mit=Search).

The results came back ignoring old and mod, and flooded me with 50+ pages of posts containing the word "version" (see below).
2025-09-25 15_33_10-Factorio Forums - Search et 26 pages de plus - Profil 1 – Microsoft Edge.jpg (12.4 KiB)
Would it be possible to set up the forum search to accept any string of length >= 2, please? I'm a heavy search user, and I really need to be able to search for even common words.
Koub - Please consider English is not my native language.
Loewchen
Global Moderator
Posts: 10394
Joined: Wed Jan 07, 2015 5:53 pm

Post by Loewchen »

Happened around the last forum update: viewtopic.php?p=661790#p661790.
Sanqui
Factorio Staff
Posts: 375
Joined: Mon May 07, 2018 7:22 pm

Post by Sanqui »

Let me offer some insight into the search situation. phpBB offers three search backends that we can use: phpBB Native Fulltext, MySQL Fulltext, and Sphinx Fulltext.

We were using phpBB Native Fulltext for years. There were reports that it had multiple issues, such as not supporting words with underscores as well as often failing to find all results.

In January this year, I switched the backend to MySQL Fulltext. This backend appears to be reliable in finding posts, and it performs better with quoted queries (it supports underscores and phrases), but as this thread notes, unquoted searches have a minimum word length of 4 characters, which doesn't appear possible to change. You can still search for the quoted phrase "old mod version".

The third option is Sphinx, an open source search server. Its website is a bit dated, as it features a "Chat on Skype" link on the homepage, but it did receive one fresh release this year, so it's not completely abandoned. What its advantages are, and what exciting new issues it would surface, is unclear to me without setting it up to try... so if there is interest I can bump the priority.

EDIT: There is a MySQL config variable to change the minimum word length too. I'm not sure if phpBB will understand if it's changed but it's something I will attempt.
ovo
Sanqui
Factorio Staff
Posts: 375
Joined: Mon May 07, 2018 7:22 pm

Post by Sanqui »

I've set the MySQL variables ft_min_word_len and innodb_ft_min_token_size to 2, confirmed that the setting is applied on the server, and rebuilt the post index, but so far I can't see a difference in the search results...
ovo
Amarula
Filter Inserter
Posts: 637
Joined: Fri Apr 27, 2018 1:29 pm

Post by Amarula »

Ah the good old days when my search for the error message "there are no trains" returned no results because all the words were stripped out as common! Happy to report that at least now I get results when I search using quotation marks.
My own personal Factorio super-power - running out of power.
eugenekay
Filter Inserter
Posts: 726
Joined: Tue May 15, 2018 2:14 am

Post by eugenekay »

Sanqui wrote: Thu Sep 25, 2025 3:23 pm The third option is Sphinx, an open source search server. Its website is a bit dated, as it features a "Chat on Skype" link on the homepage, but it did receive one fresh release this year, so it's not completely abandoned. What its advantages are, and what exciting new issues it would surface, is unclear to me without setting it up to try... so if there is interest I can bump the priority.
Sphinx / searchd are basically "finished software", so it is not surprising that the release cadence has slowed since 2001. Please note that Version 3 and above are no longer GPL licensed or open source, and it is seemingly maintained by a single developer / corporation. The Version 2 codebase was forked in 2017 to become Manticore Search. I have never tested it with phpBB, but it reportedly works well. Ready-to-install packages are provided for most major Linux distributions. Be aware that some configuration options have been deprecated or renamed, which may lead to unexpected startup errors.

The biggest problem I have seen with Sphinx installations has been the Indexer choking the system on Disk Access - this was in the era before Solid State Disks were commonplace.
Koub
Global Moderator
Posts: 7987
Joined: Fri May 30, 2014 8:54 am

Post by Koub »

Sanqui wrote: Thu Sep 25, 2025 4:02 pm I've set the MySQL variables ft_min_word_len and innodb_ft_min_token_size to 2, confirmed that the setting is applied on the server, and rebuilt the post index, but so far I can't see a difference in the search results...
It doesn't seem to change anything: the words old and mod are still ignored in my search:
2025-09-25 21_47_38-Window.jpg (15.2 KiB)
Note that I'm searching for any combination of the words old, mod, and version anywhere in the post, and not for the string "old mod version".
Koub - Please consider English is not my native language.
Tertius
Smart Inserter
Posts: 1430
Joined: Fri Mar 19, 2021 5:58 pm

Post by Tertius »

The 3-letter words seem to be ignored by the forum software itself, without even being sent to the MySQL search engine. Otherwise they wouldn't be reported as "ignored". There must be some setting for the minimum length within the forum software as well. Or it read the value from the MySQL config, cached it, and didn't update when the minimum word length was changed within MySQL.
eugenekay
Filter Inserter
Posts: 726
Joined: Tue May 15, 2018 2:14 am

Post by eugenekay »

Tertius wrote: Thu Sep 25, 2025 8:01 pm There must be some setting for the minimum length within the forum software as well.
fulltext_mysql.php:

Code: Select all

// check word length
$clean_len = utf8_strlen(str_replace('*', '', $clean_word));
if (($clean_len < $this->config['fulltext_mysql_min_word_len']) || ($clean_len > $this->config['fulltext_mysql_max_word_len']))
{
	$this->common_words[] = $word;
	unset($this->split_words[$i]);
}
The fulltext_mysql_min_word_len configuration value (in phpBB) has a default value of 4. The computed array of "common words" is later used by the actual search function to build the "error message" shown in the screenshot.
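
To illustrate the effect of that check, here is a minimal Python sketch of the same length filter applied to the query from the first post. It is not phpBB code: the function name split_common_words is hypothetical, the max_len default of 84 is an assumed placeholder, and the sketch only strips the * wildcard rather than reproducing phpBB's full word cleaning.

```python
def split_common_words(words, min_len=4, max_len=84):
    """Split search words into (kept, common) using a length check.

    Mirrors the shape of the quoted phpBB check: a word whose length,
    ignoring '*' wildcards, falls outside [min_len, max_len] is treated
    as a "common word" and dropped from the query.
    max_len=84 is an assumption made for this example.
    """
    kept, common = [], []
    for word in words:
        clean_len = len(word.replace("*", ""))  # wildcards do not count
        if clean_len < min_len or clean_len > max_len:
            common.append(word)   # reported back to the user as ignored
        else:
            kept.append(word)     # passed on to the fulltext query
    return kept, common


# With a minimum length of 4, only "version" survives.
print(split_common_words(["old", "mod", "version"]))
# → (['version'], ['old', 'mod'])

# With a minimum length of 2, all three words are kept.
print(split_common_words(["old", "mod", "version"], min_len=2))
# → (['old', 'mod', 'version'], [])
```

If the forum behaves like this sketch, lowering the MySQL variables alone would not help as long as the phpBB-side minimum stays at 4.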

PHP is a hell of a drug.