Releases: parpalak/rose
v3.0.0
In this release, the mechanism for highlighting found words in snippets has been rewritten. The previous implementation identified these words using regular expressions, which required the stemmer to be able to reverse a stem back into possible irregular words. After formatting preservation was implemented in snippets, formatting and highlighting started to conflict. The new mechanism makes highlighting more consistent.
The major version has been increased for this release because the IrregularWordsStemmerInterface has been removed as unnecessary. If you didn't have your own stemmer implementation, you can safely update to this version. There are no other backward compatibility breaks with the previous version.
Full Changelog: v2.1.0...v3.0.0
v2.1.0
This release includes several indexing and searching improvements, as well as enhancements for developers:
- External Transaction Support for Indexing: Indexing can now be run within an external transaction, allowing Rose to avoid starting its own transaction.
- PdoStorage::drop(): a new method has been added to remove all index tables. This feature is useful if your software that utilizes Rose needs to be uninstallable.
- The DomExtractor class has been made more extensible.
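The external transaction option from the first item can be pictured with a toy indexer. This is a minimal Python sketch using the standard sqlite3 module, not Rose's PHP API: `index_item`, the `manage_transaction` flag, and both tables are hypothetical names chosen for illustration.

```python
import sqlite3


def index_item(conn, item_id, text, manage_transaction=True):
    """Toy indexer storing one row per word (hypothetical, not Rose's API)."""
    if manage_transaction:
        conn.execute("BEGIN")  # the indexer owns the transaction
    try:
        conn.execute("DELETE FROM toy_index WHERE item_id = ?", (item_id,))
        conn.executemany(
            "INSERT INTO toy_index (item_id, word) VALUES (?, ?)",
            [(item_id, word) for word in text.lower().split()],
        )
    except Exception:
        if manage_transaction:
            conn.execute("ROLLBACK")
        raise
    if manage_transaction:
        conn.execute("COMMIT")


# isolation_level=None: the sqlite3 module never opens transactions implicitly,
# so every BEGIN/COMMIT/ROLLBACK below is explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE articles (id TEXT PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE toy_index (item_id TEXT, word TEXT)")

# The application owns the transaction: saving the article and indexing it
# succeed or fail together.
conn.execute("BEGIN")
conn.execute("INSERT INTO articles VALUES (?, ?)", ("a1", "hello search world"))
index_item(conn, "a1", "hello search world", manage_transaction=False)
conn.execute("ROLLBACK")  # undoes the article row and its index rows

# Standalone use: the indexer starts and commits its own transaction.
index_item(conn, "a2", "standalone item")
```

In this sketch, rolling back the outer transaction also discards the index rows, which is the point of letting the caller own the transaction. SQLite does not allow a second BEGIN inside an open transaction, so the flag has to be turned off in the first case.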
Full Changelog: v2.0.4...v2.1.0
v2.0.4
- Fixed an issue with searching words containing underscores. Previously, underscores were treated as word separators and replaced with spaces during search, even though they were considered part of words during indexing. Now underscores are considered part of words during search. This makes sense, as underscores have special meanings in programming and other technical contexts.
- Improved support for formatting spanning multiple sentences when generating snippets.
- Snippets can now consist of sentences with two words; previously, the minimum size was three words.
- Fixed issues with URL decoding for images, which led to incorrect addresses for some images in recommendations.
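The underscore mismatch described in the first item can be reproduced with a toy tokenizer. This is a minimal Python sketch, not Rose's actual code: the regular expression and the three function names are assumptions made for illustration.

```python
import re

# \w matches letters, digits, and the underscore, so "array_map" is one token.
WORD = re.compile(r"\w+")


def index_tokens(text):
    """Tokenization at indexing time: underscores stay inside words."""
    return WORD.findall(text.lower())


def query_tokens_old(query):
    """Old search behavior: underscores are replaced with spaces first."""
    return WORD.findall(query.lower().replace("_", " "))


def query_tokens_new(query):
    """New search behavior: same rule as indexing."""
    return WORD.findall(query.lower())


indexed = set(index_tokens("Call array_map on the list"))

old_hits = [t for t in query_tokens_old("array_map") if t in indexed]
new_hits = [t for t in query_tokens_new("array_map") if t in indexed]
```

Under these assumptions the old query is split into "array" and "map", neither of which was indexed, while the new query keeps "array_map" whole and finds it.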
v2.0.3
v2.0.2
v2.0.1
v2.0
New features
In this version, performance has been optimized for searching among a large number of items (for example, when more than 10,000 items are indexed and more than 1,000 of them match the search criteria). Two changes have been made to achieve this:
- Refactored the storage of information about words in the title and keywords. The full-text index is now used to store them. Therefore, the keyword_index and keyword_multiple_index tables are no longer needed in the database index.
- The consideration of external relevance ratios has been moved into the main query of the full-text search. In the old version, applying a limit to queries did not provide any optimization because the external relevance was taken into account in PHP code after the query was executed.
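The effect of moving the relevance ratio into the query can be illustrated on a toy table. This is a minimal Python sketch with sqlite3, not Rose's real schema or query: the `hits` table, its columns, and the assumption that the ratio is multiplied into the text relevance are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (item_id TEXT, text_relevance REAL, ratio REAL)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("a", 10.0, 1.0),
        ("b", 9.0, 1.0),
        ("c", 6.0, 2.0),  # boosted by its ratio: final score 12.0
        ("d", 3.0, 1.0),
    ],
)


def top_limit_before_ratio(limit):
    """Unsafe shortcut: LIMIT on text relevance alone drops boosted items."""
    rows = conn.execute(
        "SELECT item_id FROM hits ORDER BY text_relevance DESC LIMIT ?", (limit,)
    ).fetchall()
    return [r[0] for r in rows]


def top_ratio_in_app(limit):
    """Old style: fetch every match, apply the ratio in application code."""
    rows = conn.execute("SELECT item_id, text_relevance, ratio FROM hits").fetchall()
    rows.sort(key=lambda r: r[1] * r[2], reverse=True)
    return [r[0] for r in rows[:limit]]


def top_ratio_in_query(limit):
    """New style: the ratio is part of the query, so LIMIT is safe."""
    rows = conn.execute(
        "SELECT item_id FROM hits ORDER BY text_relevance * ratio DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [r[0] for r in rows]
```

In this sketch, the application-side version must load all matching rows before it can pick the top results, whereas the in-query version returns the same ranking while transferring only `limit` rows.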
Bugs
- Fixed the missing highlighting in snippets for some English words (e.g., those ending in "y").
Backward compatibility breaking changes
Due to the refactoring, the calculation of the impact of keywords on relevance has changed. There may be slight changes in the sorting order for queries that include keywords.
Some internal interfaces have been changed, e.g. StorageReadInterface, StorageWriteInterface, IrregularWordsStemmerInterface. If you used a custom storage or a stemmer, code adjustments may be required when updating to the current version.
In other cases, re-indexing the content is necessary and sufficient when updating. However, the keyword_index and keyword_multiple_index tables have to be removed manually so that they no longer occupy space.
v1.1
New features
- Improved algorithm for splitting text into sentences to generate snippets.
- Snippets can now retain basic formatting: bold, italic, superscript, and subscript.
- Added support for PostgreSQL and SQLite databases, in addition to MySQL/MariaDB.
Bugs
- Fixed deprecations in PHP 8.2.
- Fixed the inability to work in disabled emulation mode for prepared statements (PDO::ATTR_EMULATE_PREPARES) with MySQL.
v1.0
- Dropped support for PHP versions prior to 7.4.
- Requires MySQL 5.7+ and MariaDB 10.2+ for database operations.
New Features
- Introducing the PdoStorage::getSimilar() method for recommendation systems. It finds other indexed items similar to the provided indexed item.
- Added support for storing image information in metadata.
- Revamped the mechanism for extracting text from HTML pages. Custom extractors can now be created for other formats.
API Changes Breaking Backward Compatibility
- The approach to influencing relevance has changed. Instead of passing the relevance ratio through the ResultSet::setRelevanceRatio() method, it should now be set during indexing using Indexable::setRelevanceRatio().
- Snippet information is now saved during indexing, eliminating the need to call SnippetBuilder for query execution. Consequently, additional storage space will be required for storing snippets in the database.
v0.4
Release features
- Updated DB structure (PdoStorage::erase() call is required on updates):
  - Optimized indexing speed and index disk usage in DB (~1.5 times).
  - Added storing some meta-information (currently word count) for indexing texts.
- Revised algorithm for calculating relevance. Now the following factors are taken into account:
  - The abundance of words for calculating pairwise relevance (proximity relevance).
  - The size of indexed text (see below).
- Improved algorithm of choosing sentences for snippets (the abundance of words is taken into account, see #20).
- Refinements in Russian stemmer.
The size of the indexed text affects relevance
In this release, the size of the indexed text itself has some impact on relevance. Texts of medium size (300 to 350 words) are preferred, although factors like the number of occurrences and word frequency are more important. This is based on the assumption that a text that is too short cannot fully convey a thought or concept, and a text that is too long may contain multiple thoughts or concepts. This is how word count affects the increase in relevance:
[Chart: increase in relevance as a function of word count]