Updated FeatureClassifier: partial update, still integrate to 1e4 #40

Closed
wants to merge 133 commits into from
133 commits
9468319
Merged new feature series method, as well as temp model file for comp…
Ethadhani Jul 10, 2024
82bd82e
fixed discrepancy between strongest MMR calculations, as well as fixe…
Ethadhani Jul 10, 2024
0118284
update __init__ to include alternative featureKclassifier method
Ethadhani Jul 11, 2024
fce5dd2
Made old and new version of spock entirely compatible and producing i…
Ethadhani Jul 12, 2024
c8be93c
check point, will try implementing multiprocessing
Ethadhani Jul 12, 2024
b9b771d
fixed multi processing and increased speed
Ethadhani Jul 12, 2024
31bab28
removed redundant code
Ethadhani Jul 12, 2024
6d9e700
fixed num of data collections
Ethadhani Jul 12, 2024
79e990b
Worked on comparison pipeline
Ethadhani Jul 12, 2024
23e0ca8
Cleaned up comments
Ethadhani Jul 12, 2024
1aee18d
PIPELINE: CompairFeatGen.... and compairHelper.py contain the pipelin…
Ethadhani Jul 12, 2024
155561c
remove old feature classifier file, this version is non functional
Ethadhani Jul 12, 2024
08fed33
working version: replaced old feature generation method with new meth…
Ethadhani Jul 12, 2024
180bb51
remove comparison files of old and new method
Ethadhani Jul 15, 2024
68b8c9c
new train doc
Ethadhani Jul 15, 2024
d093f3b
New training data generation process
Ethadhani Jul 16, 2024
d07061f
remove empty file
Ethadhani Jul 16, 2024
16e46b9
Started working on new model training
Ethadhani Jul 16, 2024
b44a675
created new spock training file, and removed old notebooks
Ethadhani Jul 16, 2024
8251f23
rename training doc
Ethadhani Jul 16, 2024
44f7164
added source of data
Ethadhani Jul 16, 2024
8968144
added old rebound method
Ethadhani Jul 17, 2024
419d212
added support for old rebound megno
Ethadhani Jul 17, 2024
5888138
removed redundant file and fixed pool
Ethadhani Jul 17, 2024
108a263
made spock change the simulation in place if using old rebound versio…
Ethadhani Jul 17, 2024
9159c20
made changes to change simulation in place if using old rebound, will…
Ethadhani Jul 17, 2024
1226c82
not the last commit but the one before that, has the wrong descriptio…
Ethadhani Jul 17, 2024
c69cc4d
made spock data feature generation compatible with old rebound, speci…
Ethadhani Jul 17, 2024
a9ccb8c
fixed try except to handle unstable system
Ethadhani Jul 17, 2024
e6df5ff
Generated old rebound training data
Ethadhani Jul 17, 2024
d7a692b
rename old rebound data generation
Ethadhani Jul 17, 2024
5610094
trained spock with old rebound and compared to trained version with n…
Ethadhani Jul 17, 2024
7c061e0
updated to create more compatibility between old and new rebound, cha…
Ethadhani Jul 17, 2024
21e1ff4
Replace old model with new spock model
Ethadhani Jul 17, 2024
55ceb5a
re generated training data for spock with old and new rebound
Ethadhani Jul 17, 2024
0871cb4
update model
Ethadhani Jul 22, 2024
fb28376
small alterations and checks in gen training data
Ethadhani Jul 22, 2024
411edeb
update model training/model
Ethadhani Jul 22, 2024
98089ef
fix collision error in simsetup
Ethadhani Jul 22, 2024
fc99452
changes to old rebound train
Ethadhani Jul 22, 2024
0fa118b
update model and training
Ethadhani Jul 22, 2024
56a9a55
update gen training with documentation
Ethadhani Jul 22, 2024
c0f9412
Fixed predict_proba to return the probability of the system being sta…
Ethadhani Jul 26, 2024
577f509
Updated example notebooks, ran with new rebound+spock
Ethadhani Jul 29, 2024
b1f6c41
Fixed comments and removed internal testing doc
Ethadhani Jul 31, 2024
ac808b4
Old rebound comparison files, showing that we can use the new version…
Ethadhani Jul 31, 2024
d958018
make compatible with old spock
Ethadhani Jul 31, 2024
9934b27
Passed featureclassifier tests
Ethadhani Jul 31, 2024
fb07f0a
re ran examples with new model
Ethadhani Jul 31, 2024
cdbb2a6
removed comment to make more compatible
Ethadhani Jul 31, 2024
fcdd1c7
remove line
Ethadhani Jul 31, 2024
a801821
re added old files used for comparison and non feature classifier models
Ethadhani Jul 31, 2024
0a593b7
made file same as main spock
Ethadhani Jul 31, 2024
e3815f1
fixed thread pool
Ethadhani Jul 31, 2024
278bbef
added reproducibility note
Ethadhani Jul 31, 2024
fedcb16
added integration to Tmax option for featureclassifier, as well as cr…
Ethadhani Aug 1, 2024
214a180
added doc string and made it so that if tmax<1e4, intigration gets ru…
Ethadhani Aug 2, 2024
69931ad
generate Tmax train data and trainTmax model test
Ethadhani Aug 2, 2024
41a6fae
test
Ethadhani Aug 2, 2024
b7ebb30
test
Ethadhani Aug 2, 2024
d3a6b6a
test
Ethadhani Aug 2, 2024
c799d63
test
Ethadhani Aug 2, 2024
a40b8d6
test
Ethadhani Aug 2, 2024
e26e6c5
cap names
Ethadhani Aug 2, 2024
0989c34
added Tmax in feature return
Ethadhani Aug 2, 2024
b7d5365
changed function var name to make more sense
Ethadhani Aug 2, 2024
ac708d6
delete test.bin
Ethadhani Aug 2, 2024
3512ae5
made compatible with old spock and added Tmax feature (not used in mo…
Ethadhani Aug 2, 2024
0c4835d
Tmax as binary choice, no longer makes if tmax<1e4 then tmax=1e4
Ethadhani Aug 2, 2024
9f39102
delete test and update trainTmax
Ethadhani Aug 2, 2024
2962c10
added systems for testing not clean data
Ethadhani Aug 2, 2024
c36b1eb
Updated secular time scale calculation and changed function location
Ethadhani Oct 11, 2024
2708a1b
changed predict_stable pipeline to not use pandas df and use numpy ar…
Ethadhani Oct 11, 2024
f8fe33d
changed to using lambda to map data collection on simulations
Ethadhani Oct 11, 2024
0d1ae3c
updated data run function to default to 1e5 orbits if int based on se…
Ethadhani Oct 11, 2024
f04253d
changed specific feature functions to static
Ethadhani Oct 11, 2024
fe095ed
Updated comments and doc strings
Ethadhani Oct 11, 2024
3e451b8
Merge remote-tracking branch 'upstream/master'
Ethadhani Oct 12, 2024
1fedcff
removed static method conflict
Ethadhani Oct 12, 2024
8693df9
Ran tests varying int dt and integrating to one or two times Tsec
Ethadhani Oct 17, 2024
a557b04
Updated how predict_stable passes features to XGBoost to match old spock
Ethadhani Oct 20, 2024
0dc55fe
Updated choice to use 1*Tsec with dt=0.05
Ethadhani Oct 20, 2024
20b3a85
Generated data for integrating to one secular time scale
Ethadhani Oct 21, 2024
d080c0a
Integrating to one secular time scale model training and comparison t…
Ethadhani Oct 21, 2024
308fd23
Re named trainTmax to train and compare Tsec
Ethadhani Oct 21, 2024
472c4e0
Fixed file names and spelling mistake
Ethadhani Oct 21, 2024
2a79a42
IMPORTANT: change info & tests
Ethadhani Oct 21, 2024
02b3730
Removed Tmax choice
Ethadhani Oct 21, 2024
298d55a
Removed old test/change files and info
Ethadhani Oct 21, 2024
b356b75
renamed data generation file
Ethadhani Oct 21, 2024
bfdc583
updated comment
Ethadhani Oct 21, 2024
9edb0de
Updated model
Ethadhani Oct 21, 2024
2aa97d0
added Tsec to features for predict_stable, note, previous commit will…
Ethadhani Oct 21, 2024
3e63e8c
Updated examples
Ethadhani Oct 21, 2024
a45f4a1
removed binary file with FC model, it has been replaced with JSON
Ethadhani Oct 21, 2024
3fc359e
Normalized Tsec with period of inner most planet
Ethadhani Oct 21, 2024
025bca8
ran jupyter examples with adjusted Tsec
Ethadhani Oct 21, 2024
e4f3117
changed min period finder to look at abs
Ethadhani Oct 21, 2024
38cd8c1
fixed formatting
Ethadhani Oct 21, 2024
6801c9c
added warning for if Tsec>1e6 and we do not integrate to Tsec
Ethadhani Oct 22, 2024
4de89af
Updated style for ClassifierSeries
Ethadhani Oct 31, 2024
e8b2877
Updated features.py style
Ethadhani Oct 31, 2024
63b4ae7
Small style fixes
Ethadhani Nov 12, 2024
095dc34
Generated data integrated to 1e4
Ethadhani Nov 18, 2024
753536d
Generated Tsec Data
Ethadhani Nov 18, 2024
7634afa
removed old models
Ethadhani Nov 18, 2024
6d09da3
re trained and updated model with Tsec integration and feature
Ethadhani Nov 25, 2024
53804f1
Example of generating data using old rebound for pre cleaned data as…
Ethadhani Nov 26, 2024
8f45664
comparison of testing only on systems that are stable after short int
Ethadhani Nov 26, 2024
d681815
clean data verses not clean data comparison
Ethadhani Nov 26, 2024
a512504
updated rebound comparison
Ethadhani Nov 26, 2024
1ec64c7
Tsec Ablation study
Ethadhani Nov 26, 2024
5c1b193
remove old rebound dirty data generation example
Ethadhani Nov 26, 2024
12836b5
Made comparisons print AUC and FPR
Ethadhani Nov 27, 2024
20a0f74
re trained and updated model in accordance to original paper hyperpar…
Ethadhani Nov 27, 2024
2746c81
small style change
Ethadhani Nov 29, 2024
7586b9a
Update notebook examples
Ethadhani Nov 29, 2024
9757fec
Updated example notebooks
Ethadhani Nov 29, 2024
ae2bcf3
fixed print typo and re ran
Dec 5, 2024
deccbdf
reviewed comments, updating documentation and style
Dec 10, 2024
b73b36f
updated scikit-learn requirment due to depreciation warning, scikit-l…
Dec 10, 2024
fc68aad
un changed sklearn dependency
Dec 10, 2024
dad1d29
updated model trained with XGBoost version 2.1
Dec 10, 2024
5713f12
reran all examples using updated XGBoost version
Dec 10, 2024
790bd3b
deleted model file
Dec 19, 2024
f1210e1
re trained model using unstable XGBoost 3.0.0 to try alleviate sklear…
Dec 19, 2024
c103aa1
test: upload retrained model using unstable XGBoost 3.0.0 to try alle…
Dec 19, 2024
5c1cde4
re trained model using older (~2.1) XGBoost and changed saved name to…
Dec 23, 2024
d92153e
changed scikit-learn requirement until xgboost bug is patched
Dec 23, 2024
ccde5ab
SPOCK FC version with fully integrated Tsec
Dec 23, 2024
a22d7b2
fixed dependency
Dec 23, 2024
5c4e548
SPOCK FC version with fully integrated Tsec
Dec 23, 2024
086be72
SPOCK 1e4 integrated version
Dec 23, 2024
261 changes: 261 additions & 0 deletions generate_training_data/generate_FC_data.ipynb
@@ -0,0 +1,261 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Generate SPOCK training data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import spock\n",
"import random\n",
"import numpy as np\n",
"import rebound\n",
"import pandas as pd\n",
"from spock import simsetup\n",
"from spock import FeatureClassifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the dataset."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"#specify the data path\n",
"#We will be using cleaned data generated from the original spock initial conditions data, filtered according to https://github.com/Ethadhani/SPOCKcleanData.git\n",
"datapath = '../../cleanData/csvs/resonant/'\n",
"initial = pd.read_csv(datapath+'clean_initial_conditions.csv')\n",
"labels = pd.read_csv(datapath+'clean_labels.csv')\n",
"#drop junk column\n",
"initial = initial.drop('Unnamed: 0', axis = 1)\n",
"#merge labels and initial conditions based on runstring\n",
"Initialdataset = initial.set_index('runstring').join(labels.set_index('runstring'))\n",
"Initialdataset = Initialdataset.drop('Unnamed: 0', axis = 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can define a function that, given a row index and a table of initial conditions, returns the corresponding rebound simulation."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def get_sim(row, dataset):\n",
"    '''Given a row index and a data sheet of initial conditions, returns the corresponding simulation.\n",
"\n",
"    Arguments:\n",
"        row: index of the row that holds the system you would like to create\n",
"            the columns of a row are, in order:\n",
"            [index, 'p0m', 'p0x', 'p0y', 'p0z', 'p0vx', 'p0vy', 'p0vz', 'p1m', 'p1x', 'p1y',\n",
"            'p1z', 'p1vx', 'p1vy', 'p1vz', 'p2m', 'p2x', 'p2y', 'p2z', 'p2vx',\n",
"            'p2vy', 'p2vz', 'p3m', 'p3x', 'p3y', 'p3z', 'p3vx', 'p3vy', 'p3vz']\n",
"\n",
"        dataset: the DataFrame that contains your initial conditions\n",
"\n",
"    return: a rebound simulation with the specified initial conditions, or None if the row cannot be read'''\n",
"    try:\n",
"        data = dataset.loc[row]\n",
"        sim = rebound.Simulation()\n",
"        # G = 4*pi^2 corresponds to units of AU, years, and solar masses\n",
"        sim.G=4*np.pi**2\n",
"        sim.add(m=data['p0m'], x=data['p0x'], y=data['p0y'], z=data['p0z'], vx=data['p0vx'], vy=data['p0vy'], vz=data['p0vz'])\n",
"        sim.add(m=data['p1m'], x=data['p1x'], y=data['p1y'], z=data['p1z'], vx=data['p1vx'], vy=data['p1vy'], vz=data['p1vz'])\n",
"        sim.add(m=data['p2m'], x=data['p2x'], y=data['p2y'], z=data['p2z'], vx=data['p2vx'], vy=data['p2vy'], vz=data['p2vz'])\n",
"        sim.add(m=data['p3m'], x=data['p3x'], y=data['p3y'], z=data['p3z'], vx=data['p3vx'], vy=data['p3vy'], vz=data['p3vz'])\n",
"        return sim\n",
"    except Exception:\n",
"        print(\"Error reading initial condition {0}\".format(row))\n",
"        return None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now generate the set of system row indices"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"#generates the indexes of the systems\n",
"systemNum = range(Initialdataset.shape[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can record the feature column names and instantiate the feature classifier."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"col = ['EMcrossnear', 'EMfracstdnear', 'EPstdnear', 'MMRstrengthnear', 'EMcrossfar', 'EMfracstdfar', 'EPstdfar', 'MMRstrengthfar', 'MEGNO', 'MEGNOstd', 'InitialStable']"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"spock = FeatureClassifier()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then define helper functions that map spock.generate_features over the systems: each row index is turned into the corresponding simulation, which is then passed to the feature generator."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def getList(features):\n",
"    '''Helper function which isolates the data list from the generate_features return'''\n",
"    return list(features[0][0].values())+[features[1]]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def getFeat(num):\n",
"    '''When given the index of a row, loads the initial conditions and returns the spock generated features'''\n",
"    #gets features based on index num\n",
"    sim = get_sim(num,initial)\n",
"    return spock.generate_features(sim)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'4.3.2'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rebound.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now map getFeat over the rows of the initial conditions DataFrame. This creates each simulation and generates its spock features."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"from multiprocessing import Pool\n",
"\n",
"if __name__ == \"__main__\":\n",
"    #pool.map blocks until every system is processed; the with block cleans up the workers\n",
"    with Pool() as pool:\n",
"        features = pool.map(getFeat,systemNum)\n",
"#formats the data correctly\n",
"formattedFeat = pd.DataFrame(np.array(list(map(getList,features))), columns = col)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then join the generated features with the corresponding labels"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"dataset = pd.DataFrame.join(formattedFeat,labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then save the new training data spreadsheet."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"dataset.to_csv(datapath+'1e4Data.csv')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ethadhani",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
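
The flatten-and-join steps in the notebook above can be sketched without rebound or spock installed. This is a minimal sketch under stated assumptions: `fake_generate_features` is a hypothetical stand-in whose return shape (a one-element list holding an ordered dict of features, followed by a stability flag) is inferred only from how `getList` indexes the result, and the `Stable` label column and the runstring values are made up for illustration.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

# Column order taken from the notebook: ten features plus the InitialStable flag.
COLUMNS = ['EMcrossnear', 'EMfracstdnear', 'EPstdnear', 'MMRstrengthnear',
           'EMcrossfar', 'EMfracstdfar', 'EPstdfar', 'MMRstrengthfar',
           'MEGNO', 'MEGNOstd', 'InitialStable']


def fake_generate_features(seed):
    """Hypothetical stand-in for FeatureClassifier.generate_features.

    Assumed return shape (inferred from getList): ([OrderedDict], flag).
    """
    rng = np.random.default_rng(seed)
    feats = OrderedDict((name, rng.random()) for name in COLUMNS[:-1])
    return [feats], True


def flatten(result):
    """Same indexing as the notebook's getList: feature values, then the flag."""
    return list(result[0][0].values()) + [result[1]]


# One flattened row per system, in the same order as the row indices.
results = [fake_generate_features(i) for i in range(3)]
features = pd.DataFrame([flatten(r) for r in results], columns=COLUMNS)

# Made-up labels; the join is positional, so rows must be in the same order.
labels = pd.DataFrame({'runstring': ['a', 'b', 'c'], 'Stable': [1, 0, 1]})
dataset = features.join(labels)
```

Because the join matches on the default integer index, it only pairs features with the right labels when both tables keep the original row order, which `Pool.map` preserves for its results.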
19 changes: 17 additions & 2 deletions generate_training_data/reproducibility.ipynb

Large diffs are not rendered by default.

49 changes: 21 additions & 28 deletions jupyter_examples/ComparingToNbody.ipynb

Large diffs are not rendered by default.

61 changes: 13 additions & 48 deletions jupyter_examples/GiantImpactPhase.ipynb

Large diffs are not rendered by default.

21 changes: 7 additions & 14 deletions jupyter_examples/GridOfStabilityPredictions.ipynb

Large diffs are not rendered by default.

183 changes: 33 additions & 150 deletions jupyter_examples/QuickStart.ipynb

Large diffs are not rendered by default.
