Course design and report for the Data Mining and User Portrait course, including experiments and reports on three Kaggle datasets and a short paper (with its plagiarism-check report)
Analysis of Credit Card Fraud
The dataset contains 284,807 transactions made by European credit cardholders over two days in September 2013, of which 492 (0.172%) are fraudulent. The original features have been mapped by a PCA transformation into 28 numerical attributes, V1, V2, ..., V28; only the transaction time and amount have not undergone PCA. The output variable is binary: 1 marks a fraudulent transaction and 0 a normal one.
Data: https://www.kaggle.com/mlg-ulb/creditcardfraud
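The class imbalance in this dataset is worth quantifying before any modeling. The following is a minimal sketch that uses only the two counts quoted in the description; the function names are illustrative and not part of the course code. It shows why plain accuracy is a misleading metric here: a model that never flags fraud is still almost always "right".

```python
# Counts taken from the dataset description.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def fraud_rate(n_fraud, n_total):
    """Fraction of all transactions that are fraudulent."""
    return n_fraud / n_total


def always_normal_accuracy(n_fraud, n_total):
    """Accuracy of a trivial model that labels every transaction as normal."""
    return (n_total - n_fraud) / n_total


rate = fraud_rate(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)
baseline_accuracy = always_normal_accuracy(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)

# The trivial model never flags a fraud, so its recall on the fraud class is 0.
baseline_fraud_recall = 0 / FRAUD_TRANSACTIONS

print(f"fraud rate: {rate:.4%}")
print(f"accuracy of always predicting normal: {baseline_accuracy:.4%}")
print(f"fraud recall of that model: {baseline_fraud_recall:.0%}")
```

For this reason, metrics that focus on the minority class (precision, recall, or the area under the precision-recall curve) are usually a better fit for this kind of data than accuracy.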
Recommendation of Bank Products
The bank product recommendation competition provides one and a half years of customer data from Santander Bank. The task is to predict, from the products that customers have held in the past (such as credit cards, savings accounts, and checkbooks), which products they are likely to purchase and use in the future.
Data: kaggle competitions download -c santander-product-recommendation
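One simple way to frame the recommendation task is to compare each customer's holdings in two consecutive months and treat the newly added products as the target. The sketch below illustrates that idea on hypothetical toy data: the customer ids, product names, and the popularity-based `recommend` helper are all assumptions made for illustration, not the competition's actual schema or the method used in this course project.

```python
from collections import Counter


def added_products(previous, current):
    """Products held in the current month that were not held in the previous one."""
    return current - previous


# Hypothetical monthly snapshots: customer id -> set of products held.
month_1 = {
    "c1": {"savings_account"},
    "c2": {"credit_card"},
    "c3": set(),
}
month_2 = {
    "c1": {"savings_account", "credit_card"},
    "c2": {"credit_card"},
    "c3": {"credit_card", "checkbook"},
}

# Target for each customer: what they newly acquired between the two months.
additions = {
    cid: added_products(month_1.get(cid, set()), held)
    for cid, held in month_2.items()
}

# Popularity baseline: count how often each product was newly added.
popularity = Counter(p for added in additions.values() for p in added)


def recommend(held, k=2):
    """Recommend the k most frequently added products the customer lacks."""
    ranked = [p for p, _ in popularity.most_common() if p not in held]
    return ranked[:k]


print(additions)
print(recommend({"savings_account"}))
```

A popularity baseline like this ignores customer features entirely, so it mainly serves as a reference point against which a trained model can be compared.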
Loan Default Prediction (Kaggle Competition)
The loan default prediction competition data are personal financial transaction records that have been standardized and anonymized. The dataset contains 200,000 samples, each with nearly 800 attribute variables, and the samples are independent of one another. Each sample is labeled as defaulted or not defaulted; for a defaulted loan, the loss is recorded as well. The loss lies between 0 and 100 and represents the loss rate of the loan, so a loan that did not default has a loss of 0. The task is to model and predict the default loss of personal loans from the features of the samples. The data come from Imperial College London.
Data: kaggle competitions download -c loan-default-prediction
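Because a loss of 0 means no default, the default label can be derived directly from the loss, and the problem can be split into two stages: first decide whether a loan defaults, then estimate the loss only for predicted defaults. The sketch below shows that decomposition on made-up numbers. The toy values, the helper names, and the use of mean absolute error as the score are assumptions for illustration and are not taken from the course report.

```python
def is_default(loss):
    """A loan counts as defaulted exactly when its loss rate is above 0."""
    return 1 if loss > 0 else 0


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted losses."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def combine(default_flags, loss_if_default):
    """Two-stage prediction: predicted loss where a default is flagged, else 0."""
    return [l if d == 1 else 0 for d, l in zip(default_flags, loss_if_default)]


# Hypothetical true loss rates (0 to 100) for six loans.
true_loss = [0, 0, 12, 0, 100, 3]
true_default = [is_default(x) for x in true_loss]

# Hypothetical outputs of a classifier (stage 1) and a regressor (stage 2).
predicted_default = [0, 0, 1, 0, 1, 0]
predicted_loss_if_default = [5, 8, 10, 2, 60, 4]

two_stage = combine(predicted_default, predicted_loss_if_default)
all_zero = [0] * len(true_loss)

print("two-stage MAE:", mean_absolute_error(true_loss, two_stage))
print("all-zero baseline MAE:", mean_absolute_error(true_loss, all_zero))
```

Since most loans do not default, predicting 0 for every sample is a natural baseline, and a two-stage model is only worthwhile if it beats that baseline.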
This work still has many shortcomings. If time allows, I would welcome suggestions from peers.