Add initial code for paper (#1)
superrabbit11223344 authored Nov 6, 2024
1 parent 5fa4207 commit 6c8245c
Showing 17 changed files with 549,532 additions and 1 deletion.
5 changes: 5 additions & 0 deletions AUTHORS
@@ -0,0 +1,5 @@
Li Yu <buaayuli@ruc.edu.cn>
Wei Gong <gongwei808@hotmail.com>
Dongsong Zhang <dzhang15@uncc.edu>
Yu Ding <dinggyu@163.com>
Ze Fu <zf2@uncc.edu>
20 changes: 20 additions & 0 deletions LICENSE
@@ -0,0 +1,20 @@
Copyright 2024, Li Yu, Wei Gong, Dongsong Zhang and Yu Ding

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

60 changes: 59 additions & 1 deletion README.md
@@ -1 +1,59 @@
# 2023.0131
[![INFORMS Journal on Computing Logo](https://INFORMSJoC.github.io/logos/INFORMS_Journal_on_Computing_Header.jpg)](https://pubsonline.informs.org/journal/ijoc)

# From Interaction to Prediction: A Multi-Interactive Attention based Approach to Product Rating Prediction

This archive is distributed in association with the [INFORMS Journal on
Computing](https://pubsonline.informs.org/journal/ijoc) under the [MIT License](LICENSE).

The software and data in this repository are a snapshot of the software and data
that were used in the research reported on in the paper
[From Interaction to Prediction: A Multi-Interactive Attention based Approach to Product Rating Prediction](https://doi.org/10.1287/ijoc.2023.0131) by Yu L, Gong W, Zhang D, and Ding Y.


## Cite

To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.

https://doi.org/10.1287/ijoc.2023.0131

https://doi.org/10.1287/ijoc.2023.0131.cd

Below is the BibTeX for citing this snapshot of the repository.

```
@misc{Yu2024,
author = {Li Yu and Wei Gong and Dongsong Zhang and Yu Ding},
publisher = {INFORMS Journal on Computing},
title = {From Interaction to Prediction: A Multi-Interactive Attention based Approach to Product Rating Prediction},
year = {2024},
doi = {10.1287/ijoc.2023.0131.cd},
url = {https://github.com/INFORMSJoC/2023.0131},
note = {Available for download at https://github.com/INFORMSJoC/2023.0131},
}
```

## Description
Despite increasing research on product rating prediction, very few studies have considered user-item interaction relationships at multiple levels. To address this critical limitation, we propose a novel Rating Prediction method based on Multi-Interaction Attention (RPMIA) that learns user-item interaction relationships at three levels simultaneously from online consumer reviews to predict product ratings with reasonable interpretability. Specifically, RPMIA first deploys a multi-head cross-attention mechanism to capture the interaction between the contexts of items and users. It then uses a bi-layer gate-based mechanism to extract aspects of users and items, and a self-attention mechanism to learn their interaction at the aspect level.
Finally, the aspects of users and items are coupled into meaningful user-item aspect pairs (UIAP) via a joint attention mechanism. A multi-task predictor that integrates a factorization machine and a feedforward neural network generates the rating prediction. We have empirically evaluated RPMIA on seven real-world datasets. The results demonstrate that RPMIA consistently and significantly outperforms state-of-the-art methods. We also conducted a user study to assess the interpretability of RPMIA.
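The interaction levels above all build on attention. As a rough illustration only, and not the authors' TensorFlow implementation in `src/RPMIA.py`, here is a minimal NumPy sketch of the scaled dot-product cross attention underlying the context-level interaction; all names and shapes are illustrative:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: (m, d) array, e.g. user context vectors
    keys/values: (n, d) arrays, e.g. item context vectors
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (m, n) interaction scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ values                         # (m, d) attended contexts

rng = np.random.default_rng(0)
user_ctx = rng.normal(size=(4, 8))   # 4 user context vectors
item_ctx = rng.normal(size=(6, 8))   # 6 item context vectors
out = cross_attention(user_ctx, item_ctx, item_ctx)
print(out.shape)  # (4, 8)
```

A multi-head variant splits the `d` dimension into several independent heads and concatenates their outputs; the repository's model applies this between user and item review contexts.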

<img width="841" alt="RPMIA model architecture" src="figs/model.png">

## Data and instructions to run RPMIA

We used seven publicly available [OCR datasets](http://jmcauley.ucsd.edu/data/amazon/) collected from Amazon.com on different products, including musical instruments, office products, digital music, grocery & gourmet food, video games, tools & home improvement, and sports & outdoors products, which have been widely used for recommendation evaluation in previous studies. These datasets consist of consumers' product ratings ranging from 1 to 5 and the corresponding textual reviews. Each consumer or item has at least 5 reviews. All the datasets are in folder `dataset/`, where `dataset_D` -> Digital Music, `dataset_G` -> Grocery and Gourmet Food, `dataset_M` -> Musical Instruments, `dataset_O` -> Office Products, `dataset_S` -> Sports and Outdoors, `dataset_T` -> Tools and Home Improvement, and `dataset_V` -> Video Games. In this repository, we use `dataset_G` as an example.
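The folder-to-category mapping above can be sketched in code (the folder names come from this repository; the `dataset_path` helper is purely illustrative):

```python
# Map each folder under dataset/ to its Amazon product category.
DATASETS = {
    "dataset_D": "Digital Music",
    "dataset_G": "Grocery and Gourmet Food",
    "dataset_M": "Musical Instruments",
    "dataset_O": "Office Products",
    "dataset_S": "Sports and Outdoors",
    "dataset_T": "Tools and Home Improvement",
    "dataset_V": "Video Games",
}

def dataset_path(code):
    """Return the data folder for a one-letter dataset code, e.g. 'G'."""
    folder = "dataset_" + code
    if folder not in DATASETS:
        raise KeyError("unknown dataset code: %r" % code)
    return "dataset/" + folder + "/"

print(dataset_path("G"))  # dataset/dataset_G/
```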

The code `src/RPMIA.py` implements the RPMIA model. The implementation defines five modules: an embedding module, a context-aware module, an aspect-aware module, a UIAP-aware module, and a prediction module.

The package `_auxiliaryTools/` implements data preprocessing for the datasets. `ExtractData.py` extracts a dataset from file, and `Evaluation.py` defines a function named `get_test_list_mask` that splits the test data (user IDs, item IDs, ratings, etc.) and the associated user reviews, item reviews, and mask information into batches of a given size, returning one list of batches per data type.

Use the following command to run the code:
```
python src/RPMIA.py
```


## Prerequisites (please install the following packages before running our RPMIA model)
- tensorflow 1.15
- python 3.6
- numpy 1.19.2
- pandas 1.1.5
28 changes: 28 additions & 0 deletions _auxiliaryTools/Evaluation.py
@@ -0,0 +1,28 @@
import math


def get_test_list_mask(batch_size, test_rating, user_reviews, item_reviews, user_masks, item_masks):
    """Batch the test ratings and the matching review/mask data."""
    user_test_batchs, item_test_batchs, user_input_test_batchs, item_input_test_batchs, user_mask_test_batchs, item_mask_test_batchs, rating_input_test_batchs = [], [], [], [], [], [], []
    for count in range(int(math.ceil(len(test_rating) / float(batch_size)))):
        user_test, item_test, user_input_test, item_input_test, user_mask_test, item_mask_test, rating_input_test = [], [], [], [], [], [], []
        for idx in range(batch_size):
            index = count * batch_size + idx
            if index >= len(test_rating):
                break
            rating = test_rating[index]
            user_test.append(rating[0])
            item_test.append(rating[1])
            user_input_test.append(user_reviews.get(rating[0]))
            item_input_test.append(item_reviews.get(rating[1]))
            user_mask_test.append(user_masks.get(rating[0]))
            item_mask_test.append(item_masks.get(rating[1]))
            rating_input_test.append([rating[2]])
        user_test_batchs.append(user_test)
        item_test_batchs.append(item_test)
        user_input_test_batchs.append(user_input_test)
        item_input_test_batchs.append(item_input_test)
        user_mask_test_batchs.append(user_mask_test)
        item_mask_test_batchs.append(item_mask_test)
        rating_input_test_batchs.append(rating_input_test)
        # print(count, len(item_input_test_batchs[count]))
    return user_test_batchs, item_test_batchs, user_input_test_batchs, item_input_test_batchs, user_mask_test_batchs, item_mask_test_batchs, rating_input_test_batchs
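A toy run of this batching logic, with the function condensed and repeated so the example is standalone (the data values are made up): with 5 test ratings and a batch size of 2, it produces ceil(5/2) = 3 batches, the last holding the single remaining interaction.

```python
import math

# Condensed re-statement of get_test_list_mask from _auxiliaryTools/Evaluation.py.
def get_test_list_mask(batch_size, test_rating, user_reviews, item_reviews, user_masks, item_masks):
    batches = [[], [], [], [], [], [], []]
    for count in range(int(math.ceil(len(test_rating) / float(batch_size)))):
        cur = [[], [], [], [], [], [], []]
        for idx in range(batch_size):
            index = count * batch_size + idx
            if index >= len(test_rating):
                break
            u, i, r = test_rating[index]
            cur[0].append(u)                      # user IDs
            cur[1].append(i)                      # item IDs
            cur[2].append(user_reviews.get(u))    # user review word ids
            cur[3].append(item_reviews.get(i))    # item review word ids
            cur[4].append(user_masks.get(u))      # user review masks
            cur[5].append(item_masks.get(i))      # item review masks
            cur[6].append([r])                    # ratings
        for b, c in zip(batches, cur):
            b.append(c)
    return tuple(batches)

# Toy data: 5 test interactions; reviews/masks keyed by user or item id.
ratings = [[0, 0, 5.0], [0, 1, 3.0], [1, 0, 4.0], [1, 1, 2.0], [2, 0, 1.0]]
reviews = {0: [7, 8], 1: [9, 9], 2: [5, 5]}
masks = {0: [1.0, 1.0], 1: [1.0, 0.0], 2: [1.0, 1.0]}
out = get_test_list_mask(2, ratings, reviews, reviews, masks, masks)
print(len(out[0]))  # 3 batches of user ids
print(out[0][-1])   # last batch holds the one remaining user id: [2]
```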
112 changes: 112 additions & 0 deletions _auxiliaryTools/ExtractData.py
@@ -0,0 +1,112 @@
import numpy as np
import scipy.sparse as sp
import pandas as pd
from time import time
import os

class Dataset(object):
    """Extract a dataset from file."""

    def __init__(self, max_length, path):
        self.word_id_dict = self.load_word_dict(path + "WordDict.out")
        print("wordId_dict finished")
        self.userReview_dict, self.userMask_dict = self.load_reviews(max_length, len(self.word_id_dict), path + "UserReviews.out")
        self.itemReview_dict, self.itemMask_dict = self.load_reviews(max_length, len(self.word_id_dict), path + "ItemReviews.out")
        print("load reviews finished")
        self.num_users, self.num_items = len(self.userReview_dict), len(self.itemReview_dict)
        self.trainMtrx = self.load_ratingFile_as_mtrx(path + "TrainInteraction.out")
        self.testRatings = self.load_ratingFile_as_list(path + "TestInteraction.out")

    def load_word_dict(self, path):
        wordId_dict = {}
        with open(path, "r") as f:
            line = f.readline().replace("\n", "")
            while line is not None and line != "":
                arr = line.split("\t")
                wordId_dict[arr[0]] = int(arr[1])
                line = f.readline().replace("\n", "")
        return wordId_dict

    def load_reviews(self, max_doc_length, padding_word_id, path):
        entity_review_dict = {}
        entity_mask_dict = {}
        with open(path, "r") as f:
            line = f.readline().replace("\n", "")
            while line is not None and line != "":
                review = []
                mask = []
                arr = line.split("\t")
                entity = int(arr[0])
                word_list = arr[1].split(" ")
                for i in range(len(word_list)):
                    if word_list[i] == "" or word_list[i] is None or word_list[i] not in self.word_id_dict:
                        continue
                    review.append(self.word_id_dict.get(word_list[i]))
                    mask.append(1.0)
                    if len(review) >= max_doc_length:
                        break
                if len(review) < max_doc_length:
                    review, mask = self.padding_word(max_doc_length, padding_word_id, review, mask)
                entity_review_dict[entity] = review
                entity_mask_dict[entity] = mask
                line = f.readline().replace("\n", "")
        return entity_review_dict, entity_mask_dict

    def padding_word(self, max_size, max_word_idx, review, mask):
        review.extend([max_word_idx] * (max_size - len(review)))
        mask.extend([0.0] * (max_size - len(mask)))
        return review, mask

    def load_ratingFile_as_mtrx(self, file_path):
        mtrx = sp.dok_matrix((self.num_users, self.num_items), dtype=np.float32)
        with open(file_path, "r") as f:
            line = f.readline().strip()
            while line is not None and line != "":
                arr = line.split("\t")
                user, item, rating = int(arr[0]), int(arr[1]), float(arr[2])
                if rating > 0:
                    mtrx[user, item] = rating
                line = f.readline()
        return mtrx

    def load_ratingFile_as_list(self, file_path):
        rateList = []
        with open(file_path, "r") as f:
            line = f.readline()
            while line is not None and line != "":
                arr = line.split("\t")
                user, item = int(arr[0]), int(arr[1])
                rate = float(arr[2])
                rateList.append([user, item, rate])
                line = f.readline()
        return rateList


if __name__ == "__main__":
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"
    latent_dim = 25        # denoted as k in the paper
    word_latent_dim = 300  # denoted as d in the paper
    window_size = 3        # denoted as c in the paper
    max_doc_length = 300

    # loading data
    firTime = time()
    dataSet = Dataset(max_doc_length, "output/")
    word_dict, user_reviews, item_reviews, user_masks, item_masks, train, testRatings = dataSet.word_id_dict, dataSet.userReview_dict, dataSet.itemReview_dict, dataSet.userMask_dict, dataSet.itemMask_dict, dataSet.trainMtrx, dataSet.testRatings
    secTime = time()

    word_dict_df = pd.DataFrame([word_dict])
    user_reviews_df = pd.DataFrame([user_reviews])
    item_reviews_df = pd.DataFrame([item_reviews])

    num_users, num_items = train.shape
    print("load data: %.3fs" % (secTime - firTime))
    print(num_users, num_items)
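The padding convention used by `load_reviews` and `padding_word` above, shown standalone (the function name and values here are illustrative): reviews shorter than `max_doc_length` are padded with the out-of-vocabulary id `len(word_id_dict)`, and the mask marks real tokens with 1.0 and padding with 0.0.

```python
def pad_review(max_size, padding_word_id, review):
    """Pad a list of word ids to max_size and build the matching 0/1 mask."""
    mask = [1.0] * len(review)
    review = review + [padding_word_id] * (max_size - len(review))
    mask = mask + [0.0] * (max_size - len(mask))
    return review, mask

# A 3-word review padded to length 6 with a vocabulary of 100 words:
review, mask = pad_review(6, 100, [12, 7, 45])
print(review)  # [12, 7, 45, 100, 100, 100]
print(mask)    # [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

The mask lets the attention layers ignore padding positions when computing interaction weights.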

Empty file added _auxiliaryTools/__init__.py