
Commit

Merge branch 'feature/hierarchical_recommender' into 'main'
Add HierarchicalRecommender

See pull request #60
shashist committed Nov 11, 2024
2 parents 1d3d813 + 93ec302 commit 4e95eea
Showing 11 changed files with 2,196 additions and 73 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -193,7 +193,8 @@ To build RePlay from sources please use the [instruction](CONTRIBUTING.md#instal
10. [10_bert4rec_example.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/10_bert4rec_example.ipynb) - An example of using transformer-based BERT4Rec model to generate recommendations.
11. [11_sasrec_dataframes_comparison.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/11_sasrec_dataframes_comparison.ipynb) - A speed comparison of different dataframe backends (pandas, Polars, PySpark) for data processing during SASRec training.
12. [12_neural_ts_exp.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/12_neural_ts_exp.ipynb) - An example of using the Neural Thompson Sampling bandit model (based on the Wide&Deep architecture).
13. [13_personalized_bandit_comparison.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/13_personalized_bandit_comparison.ipynb) - A comparison of context-free and contextual bandit models.
14. [14_hierarchical_recommender.ipynb](https://github.com/sb-ai-lab/RePlay/blob/main/examples/14_hierarchical_recommender.ipynb) - An example of using HierarchicalRecommender with user-disjoint LinUCB.

### Videos and papers
* **Video guides**:
16 changes: 16 additions & 0 deletions docs/pages/modules/models.rst
@@ -33,6 +33,8 @@ ___________________
"Wrapper for implicit (Experimental)", "Python CPU"
"Wrapper for LightFM (Experimental)", "Python CPU"
"RL-based CQL Recommender (Experimental)", "PySpark"
"ULinUCB (Experimental)", "Python CPU"
"Hierarchical Recommender (Experimental)", "PySpark"

To get more info on how to choose base model, please see this :doc:`page </pages/useful_data/algorithm_selection>`.

@@ -294,6 +296,11 @@ NeuralTS (Experimental)
.. autoclass:: replay.experimental.models.NeuralTS
:special-members: __init__

ULinUCB Recommender (Experimental)
``````````````````````````````````
.. autoclass:: replay.experimental.models.ULinUCB
:special-members: __init__
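For intuition, the per-user disjoint LinUCB rule that ULinUCB-style models build on can be sketched in pure Python. This is a minimal illustration of the generic algorithm, not RePlay's implementation; the class name, `alpha`, and the 2-dimensional context are all assumptions for the sketch.

```python
import math

def mat_inv_2x2(A):
    """Inverse of a 2x2 matrix (the sketch fixes the context dimension to 2)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

class DisjointLinUCBArm:
    """One arm of a disjoint LinUCB bandit: each arm keeps its own ridge-regression state."""
    def __init__(self, d, alpha):
        self.alpha = alpha
        # A starts as the identity (ridge prior), b as the zero vector.
        self.A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def ucb(self, x):
        A_inv = mat_inv_2x2(self.A)
        theta = mat_vec(A_inv, self.b)                 # ridge estimate of arm weights
        bonus = math.sqrt(dot(x, mat_vec(A_inv, x)))   # exploration width
        return dot(theta, x) + self.alpha * bonus

    def update(self, x, reward):
        # A += x x^T, b += r * x  (standard disjoint LinUCB update)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A[i][j] += x[i] * x[j]
            self.b[i] += reward * x[i]

# Two arms, 2-dimensional context; only arm 1 ever pays off.
arms = [DisjointLinUCBArm(d=2, alpha=0.5) for _ in range(2)]
for _ in range(50):
    x = [1.0, 1.0]
    arms[0].update(x, 0.0)
    arms[1].update(x, 1.0)
chosen = max(range(2), key=lambda a: arms[a].ucb([1.0, 1.0]))
print(chosen)  # → 1
```

After 50 observed rewards the estimated payoff of arm 1 dominates the exploration bonus of arm 0, so the bandit exploits arm 1.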

CQL Recommender (Experimental)
```````````````````````````````````
Conservative Q-Learning (CQL) algorithm is a SAC-based data-driven deep reinforcement learning algorithm,
@@ -305,6 +312,15 @@ which achieves state-of-the-art performance in offline RL problems.
:special-members: __init__
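The conservative term that distinguishes CQL from plain SAC-style Q-learning can be written out on a tabular Q-function: the penalty E_s[logsumexp_a Q(s, a) − Q(s, a_data)] pushes Q-values down on actions outside the logged data. A minimal sketch of that penalty alone (our illustration, not RePlay's CQL code; the Q-table is made up):

```python
import math

def cql_penalty(q_values, data_actions):
    """Conservative regularizer on a tabular Q-function:
    mean over states of  logsumexp_a Q(s, a) - Q(s, a_data).
    It is positive whenever probability mass leaks to unlogged actions."""
    total = 0.0
    for q_row, a in zip(q_values, data_actions):
        lse = math.log(sum(math.exp(q) for q in q_row))
        total += lse - q_row[a]
    return total / len(q_values)

# Two states, three actions; the logged action is 0 in both states.
q = [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
penalty = cql_penalty(q, [0, 0])
print(penalty)
```

In full CQL this term is added (scaled by a weight) to the usual Bellman error; minimizing the sum keeps the learned Q-function pessimistic about actions the behavior policy never took.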


Hierarchical models
___________________

Hierarchical Recommender (Experimental)
```````````````````````````````````````
.. autoclass:: replay.experimental.models.HierarchicalRecommender
:special-members: __init__
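The idea behind a hierarchical recommender (the benchmark below labels it "as HCB") is a tree of bandits: each internal node runs its own policy over its children, and a recommendation is a root-to-leaf walk. A pure-Python sketch using plain UCB1 at every node, under an assumed two-level category/item structure; this illustrates the scheme, not RePlay's implementation:

```python
import math
import random

class UCB1:
    """Classic UCB1 policy over a fixed set of child indices."""
    def __init__(self, n):
        self.counts = [0] * n
        self.sums = [0.0] * n

    def select(self):
        # Play each child once before using the confidence bound.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        total = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda i: self.sums[i] / self.counts[i]
                   + math.sqrt(2 * math.log(total) / self.counts[i]))

    def update(self, i, reward):
        self.counts[i] += 1
        self.sums[i] += reward

# Two-level tree: the root picks a category, each category picks an item.
# Rewards are Bernoulli with made-up item probabilities (category 1, item 1 is best).
random.seed(0)
item_probs = [[0.1, 0.2], [0.3, 0.8]]
root = UCB1(2)
leaves = [UCB1(2), UCB1(2)]
for _ in range(2000):
    cat = root.select()
    item = leaves[cat].select()
    reward = 1.0 if random.random() < item_probs[cat][item] else 0.0
    leaves[cat].update(item, reward)    # leaf bandit learns the item payoff
    root.update(cat, reward)            # root bandit learns the category payoff
best_cat = max(range(2), key=lambda c: root.sums[c] / max(root.counts[c], 1))
best_item = max(range(2),
                key=lambda i: leaves[best_cat].sums[i] / max(leaves[best_cat].counts[i], 1))
print(best_cat, best_item)
```

The payoff of splitting the decision this way is that each node only ever ranks its own children, so the per-step arm set stays small even when the item catalog is large.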


Wrappers and other models with distributed inference
____________________________________________________
Wrappers for popular recommendation libraries and algorithms
1 change: 1 addition & 0 deletions docs/pages/useful_data/algorithm_selection.md
@@ -32,6 +32,7 @@ The same goes for new items.
|Wilson Recommender |Collaborative | binary ratings | + | - |
|UCB |Collaborative | binary ratings | + | + |
|KL-UCB |Collaborative | binary ratings | + | + |
|LinUCB |Collaborative | binary ratings | + | - |
|Random Recommender |Collaborative | converted to unary ratings | + | + |
|K-Nearest Neighbours |Collaborative | converted to unary ratings | + | - |
|Alternating Least Squares |Collaborative | implicit feedback | - | - |
2 changes: 2 additions & 0 deletions docs/pages/useful_data/res_1m.csv
@@ -8,8 +8,10 @@ PopRec,0.645,0.157,0.39,0.244,0.034,0.118,12.3
MultVAE,0.64,0.151,0.396,0.238,0.031,0.123,26.977
NeuroMF,0.627,0.111,0.318,0.193,0.257,0.235,350.737
ADMM SLIM,0.591,0.084,0.304,0.159,0.367,0.237,77.647
Hierarchical Recommender (as HCB),0.566,0.081,0.243,0.154,0.030,0.195,278.18
Word2Vec,0.515,0.072,0.244,0.138,0.145,0.24,25.133
Wilson,0.414,0.045,0.181,0.092,0.017,0.262,10.034
RandomRec (popular),0.382,0.028,0.142,0.069,0.654,0.318,6.827
uLinUCB,0.211,0.014,0.076,0.036,0.008,0.385,35.986
RandomRec (uniform),0.183,0.009,0.068,0.026,0.961,0.537,7.898
ALS (Explicit),0.093,0.005,0.033,0.013,0.266,0.684,13.876
79 changes: 7 additions & 72 deletions examples/13_personalized_bandit_comparison.ipynb
@@ -162,57 +162,10 @@
},
{
"cell_type": "code",
"execution_count": 6,
"id": "69aa3572",
"execution_count": null,
"id": "f6917809",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"24/10/03 17:31:28 WARN Utils: Your hostname, sudakovcom-MS-7D48 resolves to a loopback address: 127.0.1.1; using 10.255.173.26 instead (on interface enp3s0)\n",
"24/10/03 17:31:28 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address\n",
"Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties\n",
"Setting default log level to \"WARN\".\n",
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
"24/10/03 17:31:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n",
"24/10/03 17:31:28 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).\n"
]
},
{
"data": {
"text/html": [
"\n",
" <div>\n",
" <p><b>SparkSession - hive</b></p>\n",
" \n",
" <div>\n",
" <p><b>SparkContext</b></p>\n",
"\n",
" <p><a href=\"http://localhost:4040\">Spark UI</a></p>\n",
"\n",
" <dl>\n",
" <dt>Version</dt>\n",
" <dd><code>v3.2.4</code></dd>\n",
" <dt>Master</dt>\n",
" <dd><code>local[*]</code></dd>\n",
" <dt>AppName</dt>\n",
" <dd><code>pyspark-shell</code></dd>\n",
" </dl>\n",
" </div>\n",
" \n",
" </div>\n",
" "
],
"text/plain": [
"<pyspark.sql.session.SparkSession at 0x78a215982bb0>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"spark = State().session\n",
"spark"
@@ -674,13 +627,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"03-Oct-24 17:32:03, replay, INFO: Columns with ids of users or items are present in mapping. The dataframe will be treated as an interactions log.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"03-Oct-24 17:32:03, replay, INFO: Columns with ids of users or items are present in mapping. The dataframe will be treated as an interactions log.\n",
"/home/sudakovcom/Desktop/RePlayHDILab2024/.conda/lib/python3.8/site-packages/pyspark/sql/pandas/conversion.py:471: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n",
" arrow_data = [[(c, t) for (_, c), t in zip(pdf_slice.iteritems(), arrow_types)]\n"
]
@@ -1082,13 +1029,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"03-Oct-24 17:32:47, replay, INFO: Column with ids of users or items is absent in mapping. The dataframe will be treated as a users'/items' features dataframe.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"03-Oct-24 17:32:47, replay, INFO: Column with ids of users or items is absent in mapping. The dataframe will be treated as a users'/items' features dataframe.\n",
"/home/sudakovcom/Desktop/RePlayHDILab2024/.conda/lib/python3.8/site-packages/pyspark/sql/pandas/conversion.py:471: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n",
" arrow_data = [[(c, t) for (_, c), t in zip(pdf_slice.iteritems(), arrow_types)]\n"
]
@@ -1445,13 +1386,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"03-Oct-24 17:33:02, replay, INFO: Column with ids of users or items is absent in mapping. The dataframe will be treated as a users'/items' features dataframe.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"03-Oct-24 17:33:02, replay, INFO: Column with ids of users or items is absent in mapping. The dataframe will be treated as a users'/items' features dataframe.\n",
"/home/sudakovcom/Desktop/RePlayHDILab2024/.conda/lib/python3.8/site-packages/pyspark/sql/pandas/conversion.py:471: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.\n",
" arrow_data = [[(c, t) for (_, c), t in zip(pdf_slice.iteritems(), arrow_types)]\n"
]
@@ -4198,7 +4133,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.20"
"version": "3.9.16"
}
},
"nbformat": 4,
