Commit 6c8245c (parent 5fa4207): 17 changed files with 549,532 additions and 1 deletion.

@@ -0,0 +1,5 @@
Li Yu <buaayuli@ruc.edu.cn>
Wei Gong <gongwei808@hotmail.com>
Dongsong Zhang <dzhang15@uncc.edu>
Yu Ding <dinggyu@163.com>
Ze Fu <zf2@uncc.edu>

@@ -0,0 +1,20 @@
Copyright 2024, Li Yu, Wei Gong, Dongsong Zhang and Yu Ding

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -1 +1,59 @@
# 2023.0131
[![INFORMS Journal on Computing Logo](https://INFORMSJoC.github.io/logos/INFORMS_Journal_on_Computing_Header.jpg)](https://pubsonline.informs.org/journal/ijoc)

# From Interaction to Prediction: A Multi-Interactive Attention based Approach to Product Rating Prediction

This archive is distributed in association with the [INFORMS Journal on
Computing](https://pubsonline.informs.org/journal/ijoc) under the [MIT License](LICENSE).

The software and data in this repository are a snapshot of the software and data
that were used in the research reported on in the paper
[From Interaction to Prediction: A Multi-Interactive Attention based Approach to Product Rating Prediction](https://doi.org/10.1287/ijoc.2023.0131) by Yu L, Gong W, Zhang D, and Ding Y.

## Cite

To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.

https://doi.org/10.1287/ijoc.2023.0131

https://doi.org/10.1287/ijoc.2023.0131.cd

Below is the BibTeX for citing this snapshot of the repository.

```
@misc{Yu2024,
  author =        {Li Yu and Wei Gong and Dongsong Zhang and Yu Ding},
  publisher =     {INFORMS Journal on Computing},
  title =         {From Interaction to Prediction: A Multi-Interactive Attention based Approach to Product Rating Prediction},
  year =          {2024},
  doi =           {10.1287/ijoc.2023.0131.cd},
  url =           {https://github.com/INFORMSJoC/2023.0131},
  note =          {Available for download at https://github.com/INFORMSJoC/2023.0131},
}
```

## Description
Despite increasing research on product rating prediction, very few studies have considered user-item interaction relationships at multiple levels. To address this critical limitation, we propose a novel Rating Prediction method based on Multi-Interaction Attention (RPMIA), which learns user-item interaction relationships at three levels simultaneously from online consumer reviews to predict product ratings with reasonable interpretability. Specifically, RPMIA first deploys a multi-head cross-attention mechanism to capture the interaction between the contexts of items and users. It then uses a bi-layer gate-based mechanism to extract the aspects of users and items, and a self-attention mechanism to learn their interaction at the aspect level.
Finally, the aspects of users and items are coupled via joint attention to form meaningful user-item aspect pairs (UIAP). A multi-task predictor that integrates a factorization machine and a feedforward neural network generates the rating prediction. We empirically evaluated RPMIA on seven real-world datasets. The results demonstrate that RPMIA consistently and significantly outperforms state-of-the-art methods. We also conducted a user study to assess the interpretability of RPMIA.
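
The description does not spell out how the factorization-machine (FM) half of the predictor is wired into RPMIA, so the sketch below only illustrates the standard second-order FM score in plain Python. `fm_predict` and `fm_predict_naive` are hypothetical names, not functions from `src/RPMIA.py`, and how RPMIA builds the feature vector `x` is not shown here. The sketch uses the usual O(k·n) rewrite of the pairwise term and checks it against the explicit double loop.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector.

    x:  list of n feature values
    w0: global bias
    w:  list of n linear weights
    V:  n rows of k latent factors (one row per feature)
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # O(k*n) identity for the pairwise term:
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += s * s - s_sq
    return linear + 0.5 * pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score via the explicit O(k*n^2) double loop, useful as a check."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * x[i] * x[j]
    return score
```

For two active features with factor rows `[1, 2]` and `[3, 4]`, both functions return their dot product, 11, since the linear weights are zero in that case.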

<img width="841" alt="" src="figs/model.png">

## Data and instructions to run RPMIA

We used seven publicly available [online consumer review (OCR) datasets](http://jmcauley.ucsd.edu/data/amazon/) collected from Amazon.com on different products: musical instruments, office products, digital music, grocery and gourmet food, video games, tools and home improvement, and sports and outdoors products. These datasets have been widely used for recommendation evaluation in previous studies. They consist of consumers' product ratings, ranging from 1 to 5, and the corresponding textual reviews. Each consumer and each item has at least 5 reviews.

All the datasets are in the folder `dataset/`:

- `dataset_D`: Digital Music
- `dataset_G`: Grocery and Gourmet Food
- `dataset_M`: Musical Instruments
- `dataset_O`: Office Products
- `dataset_S`: Sports and Outdoors
- `dataset_T`: Tools and Home Improvement
- `dataset_V`: Video Games

In this repository, we use `dataset_G` as an example.

The code `src/RPMIA.py` implements the RPMIA model. It defines five modules: an embedding module, a context-aware module, an aspect-aware module, a UIAP-aware module, and a prediction module.
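
The Description characterizes the first interaction level as multi-head cross-attention between user and item contexts, which presumably corresponds to the context-aware module. The plain-Python sketch below shows generic multi-head scaled dot-product cross-attention, with user-side word vectors as queries and item-side vectors as keys and values. It is an illustration under stated assumptions, not the code in `src/RPMIA.py`: the learned per-head query/key/value projections used in practice are omitted, the attention direction (user to item) is an assumption, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention.

    Each query attends over all keys; the output for a query is the
    softmax-weighted average of the value vectors.
    """
    scale = math.sqrt(len(queries[0]))
    dim_v = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(wt * v[j] for wt, v in zip(weights, values)) for j in range(dim_v)])
    return out


def multi_head_cross_attention(queries, keys, values, num_heads):
    """Split the feature dimension into num_heads chunks, attend per chunk, concatenate.

    queries: m vectors (e.g. user-review word states); keys/values: n vectors
    (e.g. item-review word states). All vectors share the same length d.
    Learned per-head projections are deliberately omitted in this sketch.
    """
    d = len(queries[0])
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    h = d // num_heads
    head_outputs = []
    for i in range(num_heads):
        cols = slice(i * h, (i + 1) * h)
        head_outputs.append(attend([q[cols] for q in queries],
                                   [k[cols] for k in keys],
                                   [v[cols] for v in values]))
    # Concatenate the heads' outputs for each query position.
    return [[x for head in head_outputs for x in head[p]] for p in range(len(queries))]
```

When every key is identical, each query assigns uniform weights, so every output row equals the mean of the value vectors; this is a quick way to sanity-check the shapes.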
The package `_auxiliaryTools/` handles data preprocessing. `ExtractData.py` loads the dataset from files, and `Evaluation.py` defines a function named `get_test_list_mask` that splits the test data (user IDs, item IDs, ratings, etc.), together with the related user reviews, item reviews, and mask information, into batches of a given size and returns multiple lists containing the batches of each data type.
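
`ExtractData.py` reads tab-separated interaction files (`TrainInteraction.out`, `TestInteraction.out`) with one user ID, item ID, and rating per line, and keeps only positive ratings when building the training matrix. A minimal standard-library sketch of that parsing step follows; it uses a plain dict in place of the `scipy.sparse.dok_matrix` the original builds and skips blank lines. `parse_interactions`, `to_sparse_dict`, and the toy input are hypothetical.

```python
import io


def parse_interactions(lines):
    """Parse "<user_id>\\t<item_id>\\t<rating>" lines into [user, item, rating] triples."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        triples.append([int(fields[0]), int(fields[1]), float(fields[2])])
    return triples


def to_sparse_dict(triples):
    """Keep only positive ratings, keyed by (user, item), like the training matrix."""
    return {(user, item): rating for user, item, rating in triples if rating > 0}


if __name__ == "__main__":
    toy_file = io.StringIO("0\t1\t5.0\n1\t2\t3.0\n")  # hypothetical toy content
    print(to_sparse_dict(parse_interactions(toy_file)))
```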

Use the following command to run the code:
```
python src/RPMIA.py
```

## Prerequisites (please install the following packages before you run our RPMIA model)
- tensorflow 1.15
- python 3.6
- numpy 1.19.2
- pandas 1.1.5
- scipy (imported by `_auxiliaryTools/ExtractData.py`; no version is pinned)

@@ -0,0 +1,28 @@
import math


def get_test_list_mask(batch_size, test_rating, user_reviews, item_reviews, user_masks, item_masks):
    # Split the test interactions into consecutive batches of at most batch_size.
    # Each element of test_rating is [user_id, item_id, rating]; reviews and masks
    # are looked up by user ID and item ID. Returns seven lists of batches.
    user_test_batchs, item_test_batchs, user_input_test_batchs, item_input_test_batchs, user_mask_test_batchs, item_mask_test_batchs, rating_input_test_batchs = [], [], [], [], [], [], []
    for count in range(int(math.ceil(len(test_rating) / float(batch_size)))):
        user_test, item_test, user_input_test, item_input_test, user_mask_test, item_mask_test, rating_input_test = [], [], [], [], [], [], []
        for idx in range(batch_size):
            index = (count * batch_size + idx)
            if (index >= len(test_rating)):
                break  # the last batch holds the remainder
            rating = test_rating[index]
            user_test.append(rating[0])
            item_test.append(rating[1])
            user_input_test.append(user_reviews.get(rating[0]))
            item_input_test.append(item_reviews.get(rating[1]))
            user_mask_test.append(user_masks.get(rating[0]))
            item_mask_test.append(item_masks.get(rating[1]))
            rating_input_test.append([rating[2]])
        user_test_batchs.append(user_test)
        item_test_batchs.append(item_test)
        user_input_test_batchs.append(user_input_test)
        item_input_test_batchs.append(item_input_test)
        user_mask_test_batchs.append(user_mask_test)
        item_mask_test_batchs.append(item_mask_test)
        rating_input_test_batchs.append(rating_input_test)
        # print(count, len(item_input_test_batchs[count]))
    return user_test_batchs, item_test_batchs, user_input_test_batchs, item_input_test_batchs, user_mask_test_batchs, item_mask_test_batchs, rating_input_test_batchs
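
The batching loop above produces ceil(len(test_rating) / batch_size) batches, with the remainder in the last one. A compact, slicing-based sketch of the same idea follows; it groups each batch into a dict rather than returning seven parallel lists, and `make_batches` and its dict keys are hypothetical, not part of `Evaluation.py`.

```python
import math


def make_batches(test_rating, batch_size, user_reviews, item_reviews, user_masks, item_masks):
    """Group [user, item, rating] triples into consecutive batches of at most batch_size."""
    num_batches = int(math.ceil(len(test_rating) / float(batch_size)))
    batches = []
    for b in range(num_batches):
        chunk = test_rating[b * batch_size:(b + 1) * batch_size]  # last chunk may be shorter
        batches.append({
            "users": [r[0] for r in chunk],
            "items": [r[1] for r in chunk],
            "user_reviews": [user_reviews.get(r[0]) for r in chunk],
            "item_reviews": [item_reviews.get(r[1]) for r in chunk],
            "user_masks": [user_masks.get(r[0]) for r in chunk],
            "item_masks": [item_masks.get(r[1]) for r in chunk],
            "ratings": [[r[2]] for r in chunk],  # each rating wrapped in a list, as above
        })
    return batches
```

Grouping per batch keeps the fields of one batch together, which can make it harder to index the seven lists out of step; the original's parallel lists are equally valid for feeding a TensorFlow 1.x session.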

@@ -0,0 +1,112 @@
import numpy as np
import scipy.sparse as sp
import pandas as pd
from time import time
import os


class Dataset(object):
    """Extract the dataset from the preprocessed files."""

    def __init__(self, max_length, path):
        self.word_id_dict = self.load_word_dict(path + "WordDict.out")
        print("wordId_dict finished")
        # Reviews are padded with the word id len(word_id_dict).
        self.userReview_dict, self.userMask_dict = self.load_reviews(max_length, len(self.word_id_dict), path + "UserReviews.out")
        self.itemReview_dict, self.itemMask_dict = self.load_reviews(max_length, len(self.word_id_dict), path + "ItemReviews.out")
        print("load reviews finished")
        self.num_users, self.num_items = len(self.userReview_dict), len(self.itemReview_dict)
        self.trainMtrx = self.load_ratingFile_as_mtrx(path + "TrainInteraction.out")
        self.testRatings = self.load_ratingFile_as_list(path + "TestInteraction.out")

    def load_word_dict(self, path):
        # Each line: "<word>\t<word_id>"
        wordId_dict = {}

        with open(path, "r") as f:
            line = f.readline().replace("\n", "")
            while line != "":
                arr = line.split("\t")
                wordId_dict[arr[0]] = int(arr[1])
                line = f.readline().replace("\n", "")

        return wordId_dict

    def load_reviews(self, max_doc_length, padding_word_id, path):
        # Each line: "<entity_id>\t<space-separated words>". Unknown words are skipped,
        # each document is truncated to max_doc_length and right-padded with
        # padding_word_id; the mask is 1.0 for real words and 0.0 for padding.
        entity_review_dict = {}
        entity_mask_dict = {}

        with open(path, "r") as f:
            line = f.readline().replace("\n", "")
            while line != "":
                review = []
                mask = []
                arr = line.split("\t")
                entity = int(arr[0])
                word_list = arr[1].split(" ")

                for word in word_list:
                    if word == "" or word not in self.word_id_dict:
                        continue
                    review.append(self.word_id_dict.get(word))
                    mask.append(1.0)
                    if (len(review) >= max_doc_length):
                        break
                if (len(review) < max_doc_length):
                    review, mask = self.padding_word(max_doc_length, padding_word_id, review, mask)
                entity_review_dict[entity] = review
                entity_mask_dict[entity] = mask
                line = f.readline().replace("\n", "")
        return entity_review_dict, entity_mask_dict

    def padding_word(self, max_size, max_word_idx, review, mask):
        review.extend([max_word_idx] * (max_size - len(review)))
        mask.extend([0.0] * (max_size - len(mask)))
        return review, mask

    def load_ratingFile_as_mtrx(self, file_path):
        # Each line: "<user_id>\t<item_id>\t<rating>"; only positive ratings are stored.
        mtrx = sp.dok_matrix((self.num_users, self.num_items), dtype=np.float32)
        with open(file_path, "r") as f:
            line = f.readline().strip()
            while line != "":
                arr = line.split("\t")
                user, item, rating = int(arr[0]), int(arr[1]), float(arr[2])
                if (rating > 0):
                    mtrx[user, item] = rating
                line = f.readline()

        return mtrx

    def load_ratingFile_as_list(self, file_path):
        # Each line: "<user_id>\t<item_id>\t<rating>"; returns [[user, item, rating], ...].
        rateList = []

        with open(file_path, "r") as f:
            line = f.readline()
            while line != "":
                arr = line.split("\t")
                user, item = int(arr[0]), int(arr[1])
                rate = float(arr[2])
                rateList.append([user, item, rate])
                line = f.readline()

        return rateList


if __name__ == "__main__":
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"
    latent_dim = 25  # denoted as k
    word_latent_dim = 300  # denoted as d in the paper
    window_size = 3  # denoted as c in the paper
    max_doc_length = 300

    # loading data
    firTime = time()
    dataSet = Dataset(max_doc_length, "output/")
    word_dict, user_reviews, item_reviews, user_masks, item_masks, train, testRatings = dataSet.word_id_dict, dataSet.userReview_dict, dataSet.itemReview_dict, dataSet.userMask_dict, dataSet.itemMask_dict, dataSet.trainMtrx, dataSet.testRatings
    secTime = time()

    word_dict_df = pd.DataFrame([word_dict])
    user_reviews_df = pd.DataFrame([user_reviews])
    item_reviews_df = pd.DataFrame([item_reviews])

    num_users, num_items = train.shape
    print("load data: %.3fs" % (secTime - firTime))
    print(num_users, num_items)
Empty file.