Releases · parpalak/rose

04 Sep 18:44

parpalak

v3.0.0

3fd37b5

v3.0.0 Latest

In this release, the mechanism for highlighting found words in snippets has been rewritten. The previous implementation identified these words using regular expressions, which required the stemmer to be able to map a stem back to its possible irregular word forms. Once formatting preservation was implemented in snippets, formatting and highlighting started to conflict. The new highlighting is more consistent.
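
The release notes do not describe the new mechanism in detail. As a rough illustration only, and not Rose's actual code, the following Python sketch shows one way highlighting can work without reversing stems: each word of the snippet is stemmed and compared with the stems of the query. The toy stemmer, the IRREGULAR table and the <b> tag are assumptions made for this example.

```python
import re

# Toy stand-in for a real stemmer, with one irregular form (assumption).
IRREGULAR = {"mice": "mouse"}


def stem(word: str) -> str:
    """Reduce a word to a crude stem; only forward stemming is needed."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def highlight(text: str, query: str) -> str:
    """Wrap every word of text whose stem matches a query stem in <b> tags."""
    query_stems = {stem(w) for w in re.findall(r"\w+", query)}

    def mark(match: re.Match) -> str:
        word = match.group(0)
        return f"<b>{word}</b>" if stem(word) in query_stems else word

    # The regex only splits the text into words; matching is done on stems.
    return re.sub(r"\w+", mark, text)


highlighted = highlight("Two mice were searching.", "mouse search")
print(highlighted)
```

Because the comparison happens on stems, the stemmer never has to produce "mice" from "mouse", which is consistent with the reverse-mapping interface no longer being needed.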

The major version has been increased for this release because the IrregularWordsStemmerInterface has been removed due to being unnecessary. If you didn't have your own stemmer implementation, you can safely update to this version. There are no other backward compatibility breaks with the previous version.

Full Changelog: v2.1.0...v3.0.0

04 Sep 17:15

parpalak

v2.1.0

251778d

v2.1.0

This release includes several indexing and searching improvements, as well as enhancements for developers:

External Transaction Support for Indexing: Indexing can now be run within an external transaction, allowing Rose to avoid starting its own transaction.
PdoStorage::drop(): a new method that removes all index tables. This is useful when the software that embeds Rose must support uninstallation.
The DomExtractor class has been made more extensible.
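
The external transaction support can be pictured with a small sketch. This is not Rose's PHP implementation; it is a Python/sqlite3 illustration of the general pattern, in which the indexing routine opens its own transaction only when the caller has not already opened one. The index_item function and the word_index table are hypothetical names.

```python
import sqlite3


def index_item(conn: sqlite3.Connection, item_id: str, words: list) -> None:
    """Write index rows, opening a transaction only if the caller has not."""
    owns_transaction = not conn.in_transaction
    if owns_transaction:
        conn.execute("BEGIN")
    try:
        conn.execute("DELETE FROM word_index WHERE item_id = ?", (item_id,))
        conn.executemany(
            "INSERT INTO word_index (item_id, word) VALUES (?, ?)",
            [(item_id, w) for w in words],
        )
    except Exception:
        if owns_transaction:
            conn.execute("ROLLBACK")
        raise
    if owns_transaction:
        conn.execute("COMMIT")


# isolation_level=None gives manual transaction control in sqlite3.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE word_index (item_id TEXT, word TEXT)")

# Standalone call: the function manages its own transaction.
index_item(conn, "page-1", ["hello", "world"])

# External transaction: the caller decides whether the work is kept.
conn.execute("BEGIN")
index_item(conn, "page-2", ["rolled", "back"])
conn.execute("ROLLBACK")

rows = conn.execute("SELECT DISTINCT item_id FROM word_index").fetchall()
print(rows)
```

The point of the pattern is that the caller can combine indexing with its own writes and commit or roll back everything together.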

Full Changelog: v2.0.4...v2.1.0

30 Apr 20:57

parpalak

v2.0.4

7fb7c2a

v2.0.4

Fixed an issue with searching words containing underscores. Previously, underscores were treated as word separators and replaced with spaces during search, even though they were considered part of words during indexing. Now underscores are considered part of words during search. This makes sense, as underscores have special meanings in programming and other technical contexts.
Improved support for formatting spanning multiple sentences when generating snippets.
Snippets can now consist of sentences with two words; previously, the minimum size was three words.
Fixed issues with URL decoding for images, which led to incorrect addresses for some images in recommendations.
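
The underscore fix comes down to using the same tokenization rule for indexing and for searching. The following Python sketch is an illustration under that assumption and is not taken from Rose; the tiny in-memory index and all function names are hypothetical.

```python
import re

# \w matches letters, digits and the underscore, so "max_allowed_packet"
# stays a single token.
WORD = re.compile(r"\w+")


def tokenize(text: str) -> list:
    return [w.lower() for w in WORD.findall(text)]


index = {}


def add(doc_id: str, text: str) -> None:
    """Index a document using the shared tokenizer."""
    for word in tokenize(text):
        index.setdefault(word, set()).add(doc_id)


def search(query: str) -> set:
    """Return documents that contain every token of the query."""
    sets = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*sets) if sets else set()


def search_splitting_underscores(query: str) -> set:
    """Mimic the old behaviour: underscores become spaces before searching."""
    return search(query.replace("_", " "))


add("doc1", "Set the max_allowed_packet option")
add("doc2", "The max speed allowed")

consistent = search("max_allowed_packet")
mismatched = search_splitting_underscores("max_allowed_packet")
print(consistent, mismatched)
```

With the mismatched rule the query is split into "max", "allowed" and "packet", none of which was indexed for doc1, so the document that literally contains the term is not found.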

23 Mar 14:43

parpalak

v2.0.3

19976f4

v2.0.3

Fixed "garbage" output for queries such as "..".
Fixed highlighting for words containing hyphens.

16 Dec 21:05

parpalak

v2.0.2

d00e67d

v2.0.2

Allowed versions 2.0 and 3.0 of psr/log in composer.json to ensure compatibility.

29 Nov 09:54

parpalak

v2.0.1

5c8179d

v2.0.1

Fixed an unserialize() warning for ArrayStorage in PHP 8.3.
Fixed broken recommendations in v2.0 when an indexed item has only a title.

17 Nov 18:12

parpalak

v2.0

68e2f03

v2.0

New features
In this version, performance has been optimized for searching among a large number of items (for example, when more than 10,000 items are indexed and more than 1,000 of them match the search criteria). Two changes have been made to achieve this:

The way information about words in titles and keywords is stored has been refactored. The full-text index is now used to store them, so the keyword_index and keyword_multiple_index tables are no longer needed in the database index.
External relevance ratios are now taken into account in the main query of the full-text search. In the old version, applying LIMIT to queries did not provide any optimization, because the external relevance was applied in PHP code after the query had been executed.
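
The second change can be illustrated with a simplified sketch. It uses Python and SQLite purely for demonstration; the item and hit tables, their columns and the scoring expression are assumptions and do not reflect Rose's real schema or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY, relevance_ratio REAL);
    CREATE TABLE hit (item_id INTEGER, score REAL);
    INSERT INTO item VALUES (1, 1.0), (2, 3.0), (3, 1.0);
    INSERT INTO hit VALUES (1, 5.0), (2, 2.0), (3, 4.0);
    """
)

# Old style: every matching row is fetched, the external ratio is applied
# in application code, and only then is the result cut to the page size.
all_rows = conn.execute(
    "SELECT h.item_id, h.score, i.relevance_ratio "
    "FROM hit h JOIN item i ON i.id = h.item_id"
).fetchall()
ranked = sorted(all_rows, key=lambda row: row[1] * row[2], reverse=True)
old_top = [row[0] for row in ranked[:2]]

# New style: the ratio is part of the ORDER BY expression, so LIMIT lets
# the database return only the rows that are actually needed.
new_top = [
    row[0]
    for row in conn.execute(
        "SELECT h.item_id FROM hit h JOIN item i ON i.id = h.item_id "
        "ORDER BY h.score * i.relevance_ratio DESC LIMIT 2"
    )
]

print(old_top, new_top)
```

Both variants return the same top items, but only the second one avoids transferring and sorting the full result set outside the database.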

Bugs

Fixed the missing highlighting in snippets for some English words (e.g., those ending in "y").

Backward compatibility breaking changes
Due to the refactoring, the calculation of the impact of keywords on relevance has changed. There may be slight changes in the sorting order for queries that include keywords.

Some internal interfaces have been changed, e.g. StorageReadInterface, StorageWriteInterface, IrregularWordsStemmerInterface. If you used a custom storage or a stemmer, code adjustments may be required when updating to the current version.

In other cases, re-indexing the content is all that is required when updating. However, the keyword_index and keyword_multiple_index tables have to be removed manually so that they no longer occupy space.

17 Nov 09:47

parpalak

v1.1

67c95db

v1.1

New features

Improved algorithm for splitting text into sentences to generate snippets.
Snippets can now retain basic formatting: bold, italic, superscript, and subscript.
Added support for PostgreSQL and SQLite databases, in addition to MySQL/MariaDB.

Bugs

Fixed deprecations in PHP 8.2.
Fixed a failure with MySQL when emulation of prepared statements (PDO::ATTR_EMULATE_PREPARES) is disabled.

25 May 20:34

parpalak

v1.0

f1082e8

v1.0

Dropped support for PHP versions prior to 7.4.
Requires MySQL 5.7+ or MariaDB 10.2+ for database operations.

New Features

Introducing the PdoStorage::getSimilar() method for recommendation systems. It finds other indexed items similar to the provided indexed item.
Added support for storing image information in metadata.
Revamped the mechanism for extracting text from HTML pages. Custom extractors can now be created for other formats.

API Changes Breaking Backward Compatibility

The way to influence relevance has changed. Instead of passing the relevance ratio through the ResultSet::setRelevanceRatio() method, it should now be set during indexing using Indexable::setRelevanceRatio().
Snippet information is now saved during indexing, eliminating the need to call SnippetBuilder for query execution. Consequently, additional storage space will be required for storing snippets in the database.

06 Jan 12:14

parpalak

v0.4

48207ce

v0.4

Release features

Updated DB structure (a PdoStorage::erase() call is required when updating):
1. Optimized indexing speed and index disk usage in the DB (~1.5 times).
2. Added storage of some meta-information (currently the word count) for indexed texts.
Revised algorithm for calculating relevance. The following factors are now taken into account:
1. The abundance of words when calculating pairwise relevance (proximity relevance).
2. The size of the indexed text (see below).
Improved algorithm for choosing sentences for snippets (the abundance of words is taken into account, see #20).
Refinements to the Russian stemmer.

The size of the indexed text affects relevance
In this release, the size of the indexed text itself has some impact on relevance. Texts of medium size (300 to 350 words) are preferred, although factors like the number of occurrences and word frequency are more important. This is based on the assumption that a text that is too short cannot fully convey a thought or concept, and a text that is too long may contain multiple thoughts or concepts. This is how word count affects the increase in relevance:
