MoleculeNet benchmark dataset & MolMapNet dataset
Updated Mar 29, 2022 · HTML
A Python toolkit for file processing, text cleaning, and data splitting.
Data-Splitter is a Python script designed to split a large CSV file containing data into three different formats: JSON, a database table, and another CSV file. The script ensures a random distribution of data across the three output formats based on custom-defined ratios.
Splitting an image dataset into train, validation, and test sets.
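As a rough illustration of a train/validation/test split, the sketch below shuffles a list of file names and cuts it by ratio. It is not taken from the repository above; the function name `split_dataset`, the 80/10/10 ratios, and the file names are assumptions made for the example.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and cut them into train/val/test lists by ratio."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)                      # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)       # seeded for reproducible splits
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]           # the remainder goes to the test set
    return train, val, test


# Hypothetical file names, used only for illustration.
files = [f"img_{i:03d}.png" for i in range(100)]
train, val, test = split_dataset(files)
```

Fixing the seed keeps the split stable between runs, which matters when results from different experiments are compared.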
Comparative Analysis of Data Protection Mechanisms in Public Clouds
This project focuses on cleaning and analyzing a loan application dataset to gain insights into the factors influencing loan defaults. Through systematic data cleaning, visualization, and merging with previous application data, it provides a robust foundation for further predictive modeling.
Analyzed customer churn using transaction data. Built ML model to predict lapses. Dataset includes customer status, collection/redemption info, and program tenure. Delivered business presentation outlining modeling approach, findings, and churn reduction strategies.
ML model for Crop Detection
In this project, I used logistic regression, a supervised machine learning algorithm, to predict whether a person has diabetes based on features such as age, blood pressure, glucose level, and body mass index. I used Python and popular libraries such as Pandas, Scikit-Learn, and Matplotlib to perform the model building.
A basic Python script to split a .dat file into individual sample files.
Julia package for "FDR Control via Data Splitting for Testing-after-Clustering (arXiv: 2410.06451)"
Predicting company bankruptcy using various machine learning models. The dataset is sourced from Kaggle: Company Bankruptcy Prediction.
Split a dataset into subsets of specified sizes such that each subset preserves the original label distribution, with additional stratification on UMAP-based pseudo-labels. This method ensures splits are balanced both by true labels and by the data’s underlying manifold structure.
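The label-preserving part of such a split can be sketched as follows. This is a simplified stand-in that stratifies on the true labels only; it does not compute UMAP pseudo-labels and is not the repository's implementation. The name `stratified_split` and the toy labels are assumptions for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each label keeps roughly the same share."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    # Sorting assumes the labels are mutually comparable (e.g. all strings).
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])       # per-label share for the test set
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy imbalanced labels: 80 of class "a", 20 of class "b".
labels = ["a"] * 80 + ["b"] * 20
train_idx, test_idx = stratified_split(labels)
```

Splitting within each label group, rather than over the whole dataset, is what keeps the minority class represented in both subsets.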
Apply DUPLEX data split to the given dataset and return training and test datasets. REF: Snee, R. D. (1977). Validation of regression models: methods and examples. Technometrics, 19(4), 415-428.
Pattern Recognition - ECE - TTU - Spring 2021
Final project for the DBA program from partner Ruangguru x Studi Independen Bersertifikat Kampus Merdeka, batch 2.
Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space that has unknown correlations between its variables; such an adaptation may substantially improve neural network performance.
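For context, the classic Kennard-Stone selection with plain Euclidean distance can be sketched in pure Python as below. This is the unadapted baseline, not the repository's TensorFlow code or its adaptation; the function name `kennard_stone` and the sample points are assumptions for the example.

```python
import math


def kennard_stone(points, n_select):
    """Pick n_select point indices that spread evenly over the data (Euclidean)."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Step 1: start from the two points that are farthest apart.
    best = (-1.0, 0, 1)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d > best[0]:
                best = (d, i, j)
    selected = [best[1], best[2]]
    remaining = set(range(n)) - set(selected)

    # Distance from each remaining point to its nearest selected point.
    min_dist = {
        k: min(math.dist(points[k], points[s]) for s in selected)
        for k in remaining
    }

    # Step 2: repeatedly add the point farthest from the selected set.
    while len(selected) < n_select:
        nxt = max(sorted(remaining), key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], math.dist(points[k], points[nxt]))
    return selected


# Toy one-dimensional data embedded in 2D, for illustration only.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (5, 0)]
selected = kennard_stone(points, 3)
```

Because every step relies on `math.dist`, correlated or differently scaled variables distort the selection, which is the limitation the description above refers to.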
A sample model for predicting an individual's systolic level from age, cholesterol, and blood pressure.
An application that splits a single Excel file into multiple files