2 changes: 1 addition & 1 deletion README.md
@@ -84,7 +84,7 @@ Because there is no official TIGER reference implementation, we compare on Beaut
Following the original setup, we report NDCG (N) and Hit Rate (H) with @5, @10, and additionally @20 cutoffs.


-| Model | Dataset | N@5 | N@10 | N@20 | H@5 | H@10 | N@20 |
+| Model | Dataset | N@5 | N@10 | N@20 | H@5 | H@10 | H@20 |
|--------|---------|-------------|-------------|-------------|-------------|-------------|-------------|
| SASRec | Beauty | 0.02087 | 0.02718 | 0.03447 | 0.03197 | 0.051647 | 0.08071 |
| TIGER | Beauty | **0.02524** | **0.03191** | **0.03940** | **0.03756** | **0.05822** | **0.08800** |
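
Note on the column names: the corrected header matters because N@k and H@k are different quantities. A minimal sketch of how the two could be computed, assuming a single held-out target item per user (an assumption about the evaluation protocol, not something this diff confirms). The function names `hit_at_k`, `ndcg_at_k`, and `evaluate` are illustrative and are not taken from the repository.

```python
import math


def hit_at_k(ranked_items, target, k):
    """Return 1.0 if the target appears in the top-k of the ranking, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """NDCG@k for one relevant item: 1 / log2(rank + 1) with a 1-based rank.

    With exactly one relevant item the ideal DCG is 1, so no extra
    normalisation is needed.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)


def evaluate(rankings, targets, cutoffs=(5, 10, 20)):
    """Average both metrics over users for every cutoff."""
    n = len(targets)
    metrics = {}
    for k in cutoffs:
        pairs = list(zip(rankings, targets))
        metrics[f"N@{k}"] = sum(ndcg_at_k(r, t, k) for r, t in pairs) / n
        metrics[f"H@{k}"] = sum(hit_at_k(r, t, k) for r, t in pairs) / n
    return metrics
```

Under this definition H@k is always at least N@k for the same k, which is consistent with the values in the table.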
4 changes: 2 additions & 2 deletions notebooks/DatasetProcessing.ipynb
@@ -33,7 +33,7 @@
"interactions_dataset_path = '../data/Beauty/Beauty_5.json'\n",
"metadata_path = '../data/Beauty/metadata.json'\n",
"\n",
-"interactions_output_path = '../data/Beauty/inter_new.json'\n",
+"interactions_output_path = '../data/Beauty/inter.json'\n",
Author:

This one is odd.
First, even https://zenodo.org/records/17351848 hosts both inter and inter_new.
Second, the two files differ (I checked the diff), so there may have been a reason for keeping both.

Minor: without a VPN, downloading anything from https://zenodo.org/records/17351848 was practically impossible (extremely slow).

"embeddings_output_path = '../data/Beauty/content_embeddings.pkl'"
]
},
@@ -287,7 +287,7 @@
"metadata": {},
"outputs": [],
"source": [
-"data = get_data(pl.from_pandas(df), item_ids_mapping)"
+"data = get_data(pl.from_pandas(df), item_ids_mapping_df)"
Author:

When I ran this cell, it complained that it was being given a dict instead of a DataFrame.
The function signature and the old IRec version both use the DataFrame as well.

]
Author:

Another interesting point: I could not build the embeddings with this tokenizer at all. I got:

OSError: huggyllama/llama-7b does not appear to have a file named model-00001-of-00002.safetensors. Checkout 'https://huggingface.co/huggyllama/llama-7b/tree/main' for available files.

Perhaps this is because I ran it from DataSphere and a VPN is needed here (the files do seem to be there, they are just very large).
I ran it with these dependencies:

%pip install polars==1.33.1
%pip install numpy==2.3.3
%pip install pyarrow==17.0.0
%pip install pandas==2.2.3
%pip install transformers==4.56.1
%pip install sentencepiece==0.2.1

},
{
4 changes: 4 additions & 0 deletions tiger/modeling/trainer/trainer.py
@@ -79,6 +79,10 @@ def train(self):
         LOGGER.debug('Start training...')

         while (step_num < 200_000):
+            if self._epoch_cnt is not None and epoch_num >= self._epoch_cnt:
+                LOGGER.debug(
+                    'Reached the maximum number of epochs ({}). Finish training'.format(self._epoch_cnt))
+                break
Author:

Without this check, passing max_epoch_cnt through the config had no effect.

             if best_epoch + self._epochs_threshold < epoch_num:
                 LOGGER.debug(
                     'There is no progress during {} epochs. Finish training'.format(self._epochs_threshold))
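
Note: the three stopping conditions in this hunk (step budget, epoch cap, no-progress patience) can be exercised in isolation with a small stand-in loop. This is a minimal sketch, not the repository's trainer: the `train` function, the `evaluate` callback, and `steps_per_epoch` are hypothetical, and only the order of the checks mirrors the diff.

```python
def train(evaluate, steps_per_epoch, epoch_cnt=None, epochs_threshold=3,
          max_steps=200_000):
    """Run a dummy training loop and report why it stopped.

    evaluate(epoch_num) is a hypothetical callback returning a validation
    score (higher is better). Returns (epochs_completed, stop_reason).
    """
    step_num = 0
    epoch_num = 0
    best_epoch = 0
    best_score = float("-inf")

    while step_num < max_steps:
        # Epoch cap: only active when epoch_cnt is set.
        if epoch_cnt is not None and epoch_num >= epoch_cnt:
            return epoch_num, "max_epochs"
        # Early stopping: no improvement for more than epochs_threshold epochs.
        if best_epoch + epochs_threshold < epoch_num:
            return epoch_num, "no_progress"

        step_num += steps_per_epoch  # stand-in for one epoch of optimisation
        score = evaluate(epoch_num)
        if score > best_score:
            best_score, best_epoch = score, epoch_num
        epoch_num += 1

    return epoch_num, "max_steps"
```

For example, with a score that keeps improving, `train(lambda e: e, 10, epoch_cnt=5)` stops at the epoch cap rather than running until the step budget is exhausted, which is the behaviour the added check is meant to restore.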