Merge pull request #2 from Jayshah6699/main

PR to update forked repo
Jayshah6699 · Feb 2, 2021 · 92d216a
2 parents 91d6d61 + 0d1b4e3
commit 92d216a

Showing 4,157 changed files with 1,479,433 additions and 22,879 deletions.
diff --git a/.github/config.yml b/.github/config.yml
@@ -0,0 +1,31 @@
+# Configuration for welcome - https://github.com/behaviorbot/welcome
+
+# Configuration for new-issue-welcome - https://github.com/behaviorbot/new-issue-welcome
+# Comment to be posted on first-time issues
+
+newIssueWelcomeComment: >
+  Hello there!👋 Welcome to the project!🚀⚡
+  
+  
+  Thank you and congrats🎉 for opening your very first issue in this project. The goal of this project is to bring together, in a single place, data science projects that combine clean datasets with high-accuracy models to solve real-world problems.
+  Please adhere to our [Code of Conduct](https://github.com/Jayshah6699/datascience-mashup/blob/main/CODE_OF_CONDUCT.md).
+  Please don't start working on the issue until you have been assigned to it. 😄
+  
+  
+# Configuration for new-pr-welcome - https://github.com/behaviorbot/new-pr-welcome
+# Comment to be posted on PRs from first-time contributors in your repository
+
+newPRWelcomeComment: >
+  Hello there!👋 Welcome to the project!💖
+  
+  
+  Thank you and congrats🎉 for opening your first pull request. The goal of this project is to bring together, in a single place, data science projects that combine clean datasets with high-accuracy models to solve real-world problems.
+  Please make sure you have followed our [Contributing Guidelines](https://github.com/Jayshah6699/datascience-mashup/blob/main/CONTRIBUTING.md). 🙌🙌 We will get back to you as soon as we can. 😄
+  
+  
+# Configuration for first-pr-merge - https://github.com/behaviorbot/first-pr-merge
+# Comment to be posted on pull requests merged by a first-time user
+
+firstPRMergeComment: >
+  Congrats on merging your first pull request! 🎉 All the best for your amazing open source journey ahead 🚀.
+  
diff --git a/Bitcoin_Prediction/Bitcoin_prediction.png b/Bitcoin_Prediction/Bitcoin_prediction.png
diff --git a/Bitcoin_Prediction/README.md b/Bitcoin_Prediction/README.md
@@ -0,0 +1,9 @@
+# Dataset
+
+* CSV files for select Bitcoin exchanges covering January 2012 to December 2020, with minute-by-minute updates of OHLC (Open, High, Low, Close) prices, volume in BTC and in the indicated currency, and the weighted Bitcoin price.
+
+* Timestamps are in Unix time. Timestamps without any trades or activity have their data fields filled with NaNs.
+
+* Link - https://www.kaggle.com/mczielinski/bitcoin-historical-data
+
+![Bitcoin-Prediction](https://github.com/AmanSingh0-0/datascience-mashup/raw/main/Bitcoin_Prediction/Bitcoin_prediction.png)
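The README notes that timestamps are in Unix time and that minutes without trades have NaN-filled fields. A minimal standard-library sketch of handling both while loading is below. The column names (`Timestamp`, `Close`, and so on) and the three sample rows are assumptions for illustration only, so check them against the header of the actual Kaggle CSV. With pandas, `pd.to_datetime(df["Timestamp"], unit="s")` and `df.dropna()` cover the same two steps.

```python
import csv
import io
import math
from datetime import datetime, timezone

# Hypothetical sample in the layout the README describes; the column names are
# an assumption, so compare them with the header of the real Kaggle CSV.
SAMPLE_CSV = """Timestamp,Open,High,Low,Close,Volume_(BTC),Volume_(Currency),Weighted_Price
1325376000,4.39,4.39,4.39,4.39,0.455,2.0,4.39
1325376060,NaN,NaN,NaN,NaN,NaN,NaN,NaN
1325376120,4.40,4.41,4.39,4.40,1.5,6.6,4.40
"""


def load_minute_bars(text):
    """Return (utc_datetime, close) pairs, skipping minutes with no trades."""
    bars = []
    for row in csv.DictReader(io.StringIO(text)):
        close = float(row["Close"])  # float("NaN") parses to a NaN float
        if math.isnan(close):
            continue  # minute with no trades or activity: fields are NaN
        # Unix time = seconds elapsed since 1970-01-01 00:00:00 UTC
        ts = datetime.fromtimestamp(int(row["Timestamp"]), tz=timezone.utc)
        bars.append((ts, close))
    return bars


if __name__ == "__main__":
    for ts, close in load_minute_bars(SAMPLE_CSV):
        print(ts.isoformat(), close)
```

Skipping the NaN rows is only one option; for a time-series model it may be preferable to forward-fill the price instead, so that the minute grid stays regular.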
diff --git a/Bitcoin_Prediction/bitcoin-prediction.ipynb b/Bitcoin_Prediction/bitcoin-prediction.ipynb
diff --git a/Butterfly Classification/Butterfly Classification.ipynb b/Butterfly Classification/Butterfly Classification.ipynb
diff --git a/Butterfly Classification/ButterflyClassification.ipynb b/Butterfly Classification/ButterflyClassification.ipynb
diff --git a/Butterfly Classification/DS_Store b/Butterfly Classification/DS_Store
diff --git a/Butterfly Classification/README.md b/Butterfly Classification/README.md
@@ -0,0 +1,6 @@
+# Dataset
+* This dataset contains images and textual descriptions for ten categories (species) of butterflies.
+
+* The image dataset comprises 832 images in total, with between 55 and 100 images per category. Images were collected from Google Images by querying with the scientific (Latin) name of each species, for example "Danaus plexippus", and manually filtered to keep those depicting the butterfly of interest.
+
+* Link - http://www.josiahwang.com/dataset/leedsbutterfly/leedsbutterfly_dataset_v1.0.zip
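Because the per-category counts range from 55 to 100 images, a purely random train/test split can leave the smaller species under-represented in the test set. A minimal stratified split in plain Python is sketched below. `stratified_split` is a hypothetical helper, and the labels are assumed to be one species name (or category ID) per image. In scikit-learn, `train_test_split(..., stratify=labels)` provides the same behavior.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split image indices so every category contributes about
    `test_fraction` of its own images to the test set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # keep at least one test image per category
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    # Hypothetical labels: a 55-image species and a 100-image species
    labels = ["species_a"] * 55 + ["species_b"] * 100
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
```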
diff --git a/Butterfly Classification/butterfly_classification.ipynb b/Butterfly Classification/butterfly_classification.ipynb
diff --git a/Credit Card Fraud Detection/README.md b/Credit Card Fraud Detection/README.md
@@ -0,0 +1,7 @@
+### Dataset Details
+* The dataset contains transactions made by credit cards in September 2013 by European cardholders.
+* This dataset presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly imbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
+
+* It contains only numerical input variables, which are the result of a PCA transformation. Due to confidentiality issues, the original features and further background information about the data are not provided. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes the value 1 in case of fraud and 0 otherwise.
+
+* Link - https://www.kaggle.com/mlg-ulb/creditcardfraud
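The class imbalance matters when reading accuracy scores on this dataset: a classifier that labels every transaction as non-fraud is already right almost all the time. The arithmetic below uses only the counts stated in the dataset description (492 frauds out of 284,807 transactions).

```python
# Counts taken from the dataset description above
total_transactions = 284_807
fraud_cases = 492
valid_cases = total_transactions - fraud_cases  # non-fraud (Class 0) rows

# Share of the positive (fraud) class
fraud_rate = fraud_cases / total_transactions

# Accuracy of a trivial classifier that always predicts "not fraud" (Class 0):
# it is right on every valid transaction and wrong on every fraud.
baseline_accuracy = valid_cases / total_transactions

print(f"fraud rate:        {fraud_rate:.4%}")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
```

Any model trained on this data should therefore be compared against a baseline of about 0.998 accuracy, not against 0.5.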
diff --git a/Credit Card Fraud Detection/credit-card-fraud-detection.ipynb b/Credit Card Fraud Detection/credit-card-fraud-detection.ipynb
@@ -0,0 +1 @@
+{"cells":[{"metadata":{"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","trusted":true},"cell_type":"code","source":"import pandas as pd \nfrom sklearn.model_selection import train_test_split \nfrom sklearn.ensemble import RandomForestClassifier ","execution_count":14,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"data = pd.read_csv(\"../input/creditcardfraud/creditcard.csv\") ","execution_count":15,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"data.head(5) ","execution_count":16,"outputs":[{"output_type":"execute_result","execution_count":16,"data":{"text/plain":"   Time        V1        V2        V3        V4        V5        V6        V7  \\\n0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599   \n1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803   \n2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461   \n3   1.0 -0.966272 -0.185226  1.792993 -0.863291 -0.010309  1.247203  0.237609   \n4   2.0 -1.158233  0.877737  1.548718  0.403034 -0.407193  0.095921  0.592941   \n\n         V8        V9  ...       V21       V22       V23       V24       V25  \\\n0  0.098698  0.363787  ... -0.018307  0.277838 -0.110474  0.066928  0.128539   \n1  0.085102 -0.255425  ... -0.225775 -0.638672  0.101288 -0.339846  0.167170   \n2  0.247676 -1.514654  ...  0.247998  0.771679  0.909412 -0.689281 -0.327642   \n3  0.377436 -1.387024  ... -0.108300  0.005274 -0.190321 -1.175575  0.647376   \n4 -0.270533  0.817739  ... 
-0.009431  0.798278 -0.137458  0.141267 -0.206010   \n\n        V26       V27       V28  Amount  Class  \n0 -0.189115  0.133558 -0.021053  149.62      0  \n1  0.125895 -0.008983  0.014724    2.69      0  \n2 -0.139097 -0.055353 -0.059752  378.66      0  \n3 -0.221929  0.062723  0.061458  123.50      0  \n4  0.502292  0.219422  0.215153   69.99      0  \n\n[5 rows x 31 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Time</th>\n      <th>V1</th>\n      <th>V2</th>\n      <th>V3</th>\n      <th>V4</th>\n      <th>V5</th>\n      <th>V6</th>\n      <th>V7</th>\n      <th>V8</th>\n      <th>V9</th>\n      <th>...</th>\n      <th>V21</th>\n      <th>V22</th>\n      <th>V23</th>\n      <th>V24</th>\n      <th>V25</th>\n      <th>V26</th>\n      <th>V27</th>\n      <th>V28</th>\n      <th>Amount</th>\n      <th>Class</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0.0</td>\n      <td>-1.359807</td>\n      <td>-0.072781</td>\n      <td>2.536347</td>\n      <td>1.378155</td>\n      <td>-0.338321</td>\n      <td>0.462388</td>\n      <td>0.239599</td>\n      <td>0.098698</td>\n      <td>0.363787</td>\n      <td>...</td>\n      <td>-0.018307</td>\n      <td>0.277838</td>\n      <td>-0.110474</td>\n      <td>0.066928</td>\n      <td>0.128539</td>\n      <td>-0.189115</td>\n      <td>0.133558</td>\n      <td>-0.021053</td>\n      <td>149.62</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0.0</td>\n      <td>1.191857</td>\n      <td>0.266151</td>\n      <td>0.166480</td>\n      <td>0.448154</td>\n      <td>0.060018</td>\n      <td>-0.082361</td>\n      <td>-0.078803</td>\n      
<td>0.085102</td>\n      <td>-0.255425</td>\n      <td>...</td>\n      <td>-0.225775</td>\n      <td>-0.638672</td>\n      <td>0.101288</td>\n      <td>-0.339846</td>\n      <td>0.167170</td>\n      <td>0.125895</td>\n      <td>-0.008983</td>\n      <td>0.014724</td>\n      <td>2.69</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>1.0</td>\n      <td>-1.358354</td>\n      <td>-1.340163</td>\n      <td>1.773209</td>\n      <td>0.379780</td>\n      <td>-0.503198</td>\n      <td>1.800499</td>\n      <td>0.791461</td>\n      <td>0.247676</td>\n      <td>-1.514654</td>\n      <td>...</td>\n      <td>0.247998</td>\n      <td>0.771679</td>\n      <td>0.909412</td>\n      <td>-0.689281</td>\n      <td>-0.327642</td>\n      <td>-0.139097</td>\n      <td>-0.055353</td>\n      <td>-0.059752</td>\n      <td>378.66</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>1.0</td>\n      <td>-0.966272</td>\n      <td>-0.185226</td>\n      <td>1.792993</td>\n      <td>-0.863291</td>\n      <td>-0.010309</td>\n      <td>1.247203</td>\n      <td>0.237609</td>\n      <td>0.377436</td>\n      <td>-1.387024</td>\n      <td>...</td>\n      <td>-0.108300</td>\n      <td>0.005274</td>\n      <td>-0.190321</td>\n      <td>-1.175575</td>\n      <td>0.647376</td>\n      <td>-0.221929</td>\n      <td>0.062723</td>\n      <td>0.061458</td>\n      <td>123.50</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2.0</td>\n      <td>-1.158233</td>\n      <td>0.877737</td>\n      <td>1.548718</td>\n      <td>0.403034</td>\n      <td>-0.407193</td>\n      <td>0.095921</td>\n      <td>0.592941</td>\n      <td>-0.270533</td>\n      <td>0.817739</td>\n      <td>...</td>\n      <td>-0.009431</td>\n      <td>0.798278</td>\n      <td>-0.137458</td>\n      <td>0.141267</td>\n      <td>-0.206010</td>\n      <td>0.502292</td>\n      <td>0.219422</td>\n      <td>0.215153</td>\n      <td>69.99</td>\n      <td>0</td>\n    </tr>\n  
</tbody>\n</table>\n<p>5 rows × 31 columns</p>\n</div>"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"valid = data[data['Class'] == 0]\nprint('Valid Transactions: {}'.format(len(data[data['Class'] == 0])))","execution_count":17,"outputs":[{"output_type":"stream","text":"Valid Transactions: 284315\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"fraud = data[data['Class'] == 1] \nprint('Fraud Cases: {}'.format(len(data[data['Class'] == 1]))) ","execution_count":18,"outputs":[{"output_type":"stream","text":"Fraud Cases: 492\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"X = data.drop(['Class'], axis = 1) \nY = data[\"Class\"] \nx = X.values \ny = Y.values","execution_count":21,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"X_train, X_test, Y_train, Y_Test = train_test_split(x, y, test_size = 0.25, random_state = 128)","execution_count":26,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"model = RandomForestClassifier() \nmodel.fit(X_train, Y_train) ","execution_count":27,"outputs":[{"output_type":"execute_result","execution_count":27,"data":{"text/plain":"RandomForestClassifier()"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"Y_Pred = model.predict(X_test)","execution_count":29,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"from sklearn.metrics import accuracy_score","execution_count":30,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"acc = accuracy_score(Y_Test, Y_Pred) \nprint(\"The accuracy is {}\".format(acc)) ","execution_count":33,"outputs":[{"output_type":"stream","text":"The accuracy is 0.9995646189713772\n","name":"stdout"}]}],"metadata":{"kernelspec":{"name":"python3","display_name":"Python 
3","language":"python"},"language_info":{"name":"python","version":"3.7.6","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"}},"nbformat":4,"nbformat_minor":4}
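The notebook above evaluates the random forest with accuracy only (0.9995646189713772), which sits close to the roughly 0.9983 (284,315 / 284,807) that always predicting "not fraud" would score. A minimal standard-library sketch of fraud-class confusion counts, precision, recall and F1 is below. `fraud_metrics` is a hypothetical helper and the toy label vectors are illustrative, not results from the notebook. Inside the notebook, `sklearn.metrics.confusion_matrix(Y_Test, Y_Pred)` and `sklearn.metrics.classification_report(Y_Test, Y_Pred)` report the same quantities, and passing `stratify=y` to `train_test_split` keeps the fraud ratio equal in the train and test sets.

```python
def fraud_metrics(y_true, y_pred, positive=1):
    """Confusion counts plus precision/recall/F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged cases that are fraud
    recall = tp / (tp + fn) if tp + fn else 0.0  # frauds that were caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels (1 = fraud): 1000 transactions, 4 of them fraudulent
    y_true = [0] * 996 + [1] * 4
    # A model that catches 2 of the frauds and raises 1 false alarm
    y_pred = [0] * 995 + [1] + [1, 1, 0, 0]
    m = fraud_metrics(y_true, y_pred)
    accuracy = (m["tp"] + m["tn"]) / len(y_true)
    print(f"accuracy={accuracy:.3f} precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f}")
```

In this toy case accuracy is 0.997 even though half of the frauds are missed, which is why recall on the fraud class is the more telling number for a dataset this imbalanced.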
diff --git a/Diabetes-Prediction-master/Diabetes_Prediction.ipynb b/Diabetes-Prediction-master/Diabetes_Prediction.ipynb