Music Recommendation according to the mood of the listener on a scale of 1-5 #301

Open · wants to merge 6 commits into base: main
1 change: 1 addition & 0 deletions Music recommendation feature.ipynb
@@ -0,0 +1 @@
{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.10.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[{"sourceId":8726191,"sourceType":"datasetVersion","datasetId":5236926}],"isInternetEnabled":false,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Content Based Recommendation Systems\n\nA recommendation system (or recommender system) is a class of machine learning that uses data to help predict, narrow down, and find what people are looking for among an exponentially growing number of options.\n\nRecommendation systems are divided into three:\n\n* Collaborative Filtering\n* Content Based RS\n* Hybrid Models\n\nIn this notebook we are going to discuss Content Based RS.\n\n## Content Based Recommendation Systems\n\n* Content-based filtering methods are based on a description of the item and a profile of the user's preferences. These methods are best suited to situations where there is known data on an item (name, location, description, etc.), but not on the user. Content-based recommenders treat recommendation as a user-specific classification problem and learn a classifier for the user's likes and dislikes based on an item's features.\n* It is used to models such as TF_IDF and Word2Vec in order to capture similarity.\n* It is very powerful that a item adding newly is recommend. \n* A key issue with content-based filtering is whether the system can learn user preferences from users' actions regarding one content source and use them across other content types. When the system is limited to recommending content of the same type as the user is already using, the value from the recommendation system is significantly less than when other content types from other services can be recommended.\n* To overcome this, most content-based recommender systems now use some form of the hybrid system.\n* Content-based recommender systems can also include opinion-based recommender systems. ","metadata":{"_uuid":"211893b3-b34f-45c2-8bba-ae96b890ff26","_cell_guid":"64ffe843-0f51-4bf5-9d01-3479b120064d","trusted":true}},{"cell_type":"markdown","source":"## What is TF-IDF?\n\nTF-IDF stands for Term Frequency Inverse Document Frequency of records. It can be defined as the calculation of how relevant a word in a series or corpus is to a text. The meaning increases proportionally to the number of times in the text a word appears but is compensated by the word frequency in the corpus (data-set).\n\nTF-IDF is a weight factor which a word display important into document and had been calculated with statistics method. TF-IDF method use to a lot domains (sentiment analysis, RS, stop words filterin etc.) This method is divided into two. Fistly we will analyze Term Frequency(TF).\n\n### Term Frequency\n\nIn document d, the frequency represents the number of instances of a given word t. Therefore, we can see that it becomes more relevant when a word appears in the text, which is rational. Since the ordering of terms is not significant, we can use a vector to describe the text in the bag of term models. 
```python
# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# Plotting
import matplotlib.pyplot as plt
import seaborn as sb

# Text vectorization and similarity
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

import warnings
warnings.filterwarnings('ignore')

# List the input files available in the Kaggle environment
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved
# as output when you create a version using "Save & Run All".
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of
# the current session.
```
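As a quick aside before loading the data, here is a minimal, self-contained sketch of the TF-IDF plus cosine-similarity idea described above, on a made-up toy corpus (the song descriptions and the use of `TfidfVectorizer` here are illustrative only; the notebook itself fits a `CountVectorizer` on the genre column further below):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: one short text description per song (made-up examples).
docs = [
    "upbeat energetic pop dance track",
    "slow melancholic acoustic ballad",
    "energetic electronic dance anthem",
]

# TF-IDF weights for every term in every document.
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)   # shape: (n_docs, n_terms), sparse

# Cosine similarity between the first song and all songs.
sims = cosine_similarity(weights[0], weights)[0]
print(sims)  # the first entry is 1.0 (the song compared with itself)
```

The third track scores higher than the second because it shares the terms "energetic" and "dance" with the first, while the second shares none.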
dataset.","metadata":{"_uuid":"f815a9d3-982d-4783-8ff2-a97788b32b76","_cell_guid":"d0583ba6-7426-4f13-b439-4f1a1a4f063c","trusted":true}},{"cell_type":"code","source":"\n\ndata=pd.read_csv(\"/kaggle/input/musicaldata/musicaldata.csv\")\ndata.head(4000)","metadata":{"_uuid":"7835eb94-3a34-4253-a5d9-5ed64b45283d","_cell_guid":"0ff8ee63-f975-46ed-8346-450fef2d00f1","collapsed":false,"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2024-07-18T06:43:37.960867Z","iopub.execute_input":"2024-07-18T06:43:37.961541Z","iopub.status.idle":"2024-07-18T06:43:38.037706Z","shell.execute_reply.started":"2024-07-18T06:43:37.961497Z","shell.execute_reply":"2024-07-18T06:43:38.036554Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"> I wanted to make a suggestion depends on the cast members, description and \"listed_in\" which contains the shows type but there is another column named \"type\" it is a bit confusing I know. >","metadata":{"_uuid":"f47c0286-051f-44b7-8f3b-dff2636c57a2","_cell_guid":"5e4f0150-982c-40c2-ae42-df0e5b50ca14","trusted":true}},{"cell_type":"markdown","source":"Drop nan values on these columns to make a proper matrix which contains linear_kernel values of selected strings.","metadata":{"_uuid":"9e742cf3-6450-474e-b320-42275dea7b1b","_cell_guid":"4d9511c6-3b3c-4a66-b1f8-b77c1eb16472","trusted":true}},{"cell_type":"code","source":"data.shape","metadata":{"_uuid":"86c52013-2fe9-4582-826e-83e63d83a71c","_cell_guid":"546f44f4-1e9f-4316-b67a-4147d1cc61ac","collapsed":false,"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2024-07-18T06:43:38.039377Z","iopub.execute_input":"2024-07-18T06:43:38.040377Z","iopub.status.idle":"2024-07-18T06:43:38.048032Z","shell.execute_reply.started":"2024-07-18T06:43:38.040335Z","shell.execute_reply":"2024-07-18T06:43:38.046782Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"data.info()","metadata":{"_uuid":"c20641aa-17af-4e43-ab22-671baaf3af06","_cell_guid":"5ccfef9a-1c31-4211-977d-71895d623c2e","collapsed":false,"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2024-07-18T06:43:38.049427Z","iopub.execute_input":"2024-07-18T06:43:38.050029Z","iopub.status.idle":"2024-07-18T06:43:38.081228Z","shell.execute_reply.started":"2024-07-18T06:43:38.049999Z","shell.execute_reply":"2024-07-18T06:43:38.079843Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"data.isnull().sum()","metadata":{"_uuid":"488e4a21-2e98-4b64-b4b7-1f3a8774d197","_cell_guid":"8782b655-413f-407e-b6b0-95f80f0c86c4","collapsed":false,"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2024-07-18T06:43:38.085155Z","iopub.execute_input":"2024-07-18T06:43:38.085578Z","iopub.status.idle":"2024-07-18T06:43:38.097471Z","shell.execute_reply.started":"2024-07-18T06:43:38.085546Z","shell.execute_reply":"2024-07-18T06:43:38.096046Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"data.dropna(inplace = True)\ndata.isnull().sum().plot.bar()\nplt.show()\n","metadata":{"execution":{"iopub.status.busy":"2024-07-18T06:43:38.098926Z","iopub.execute_input":"2024-07-18T06:43:38.099358Z","iopub.status.idle":"2024-07-18T06:43:38.469804Z","shell.execute_reply.started":"2024-07-18T06:43:38.099322Z","shell.execute_reply":"2024-07-18T06:43:38.468707Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"data = data.drop(['track id'], axis = 
```python
# Rank the songs by mood so that higher-mood tracks come first.
data = data.sort_values(by=[' mood'], ascending=False)
data
```

```python
%%capture
# Fit a bag-of-words vectorizer on the genre column; it is reused later to vectorize
# the " mother tongue" text of individual songs.
song_vectorizer = CountVectorizer(lowercase=False)
song_vectorizer.fit(data[' genre'])
```

```python
# Keep the top 8407 rows after sorting by mood.
data = data.sort_values(by=[' mood'], ascending=False).head(8407)
```

```python
def get_similarities(song_genre, datas):
    # Vectors for the input song: bag-of-words over its " mother tongue" text
    # and its numeric features (mood, age, ...).
    text_array1 = song_vectorizer.transform(
        datas[datas[' genre'] == song_genre][' mother tongue']).toarray()
    num_array1 = datas[datas[' genre'] == song_genre].select_dtypes(include=np.number).to_numpy()

    # Store a similarity score for each row of the dataset.
    sim = []
    for idx, row in datas.iterrows():
        genre = row[' genre']

        # Vectors for the current song.
        text_array2 = song_vectorizer.transform(
            datas[datas[' genre'] == genre][' mother tongue']).toarray()
        num_array2 = datas[datas[' genre'] == genre].select_dtypes(include=np.number).to_numpy()

        # Cosine similarity of the text features and of the numeric features.
        text_sim = cosine_similarity(text_array1, text_array2)[0][0]
        num_sim = cosine_similarity(num_array1, num_array2)[0][0]
        sim.append(text_sim + num_sim)

    return sim
```
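The loop above re-vectorizes the matched rows for every single row in the dataset, which is slow on thousands of rows. Below is a sketch of a faster variant that precomputes the feature matrices once and compares the query song against every row directly. It is not part of the original notebook, it assumes `data` is not re-indexed after the matrices are built, and its scores can differ slightly from the loop, which always compares against the first row of each genre group:

```python
# Precompute the text and numeric matrices once, then compare the input row
# against every row in a single cosine_similarity call.
text_matrix = song_vectorizer.transform(data[' mother tongue']).toarray()
num_matrix = data.select_dtypes(include=np.number).to_numpy()

def get_similarities_vectorized(song_genre, datas=data):
    mask = (datas[' genre'] == song_genre).to_numpy()
    if not mask.any():
        return None
    # As in the loop version, the first row matching the requested genre is the query.
    text_sim = cosine_similarity(text_matrix[mask][:1], text_matrix)[0]
    num_sim = cosine_similarity(num_matrix[mask][:1], num_matrix)[0]
    return text_sim + num_sim
```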
```python
def recommend_songs(song_genre, datas=data):
    # Base case: the requested genre does not occur in the dataset,
    # so fall back to five random suggestions.
    if datas[datas[' genre'] == song_genre].shape[0] == 0:
        print('This song is not so popular')
        for song in datas.sample(n=5)[' genre'].values:
            print(song)
        return

    datas['similarity_factor'] = get_similarities(song_genre, datas)

    # Rank by similarity first, then by mood.
    datas.sort_values(by=['similarity_factor', ' mood'],
                      ascending=[False, False],
                      inplace=True)

    # The top rows are the input song itself (highest similarity), so skip them.
    display(datas[[' genre', ' age', ' mother tongue']][2:7])
```

```python
recommend_songs('classical')
```
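To connect this more directly to the 1-5 mood scale in the PR title, one could filter the candidate pool by the listener's reported mood before recommending. A purely illustrative, hypothetical helper follows; the function name, the exact 1-5 semantics of " mood", and the tolerance parameter are assumptions, not part of the notebook:

```python
def recommend_for_mood(mood_level, song_genre, datas=data, tol=0):
    # Hypothetical helper: keep only songs whose mood score is within `tol`
    # of the listener's mood on the assumed 1-5 scale, then reuse recommend_songs.
    subset = datas[(datas[' mood'] - mood_level).abs() <= tol].copy()
    if subset.empty:
        print(f'No songs found for mood level {mood_level}')
        return
    recommend_songs(song_genre, subset)

# Example: a listener reporting mood 4 who likes classical music.
recommend_for_mood(4, 'classical')
```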
As shown above, the model returns the recommendations I wanted, ranked by the mood of songs across different genres.

# Conclusion

* In this notebook I worked through a content-based recommendation system.
* Content-based RS models are strong at recommending newly added items.
* TF-IDF and Word2Vec models are commonly used when designing content-based RS.
* The TF-IDF method expresses how important a word is within a document relative to the corpus.
* A recommendation model can be built by combining TF-IDF weights with the cosine distance.
1 change: 1 addition & 0 deletions notebook10298254f6.ipynb

Large diffs are not rendered by default.