Update clustering.py #37

humbleOldSage · 2023-08-11T08:08:19Z

Changes `clustering.py` to shift its dependency from hlu09's tour_model_extended to the main branch's trip_model. The type of data passed to the fit function still needs to change for this to work; this is marked with a TODO and explained in detail at #35 (comment).

TRB_label_assist/clustering.py

All dependencies of this notebook on the custom branch are removed. There currently seem to be no errors while generating maps in the clustering_examples notebook.

With these changes, no change in e-mission-server should be required.

shankari

Much better....

Have you run the code?
Please indicate "testing done".
Do you get the same graphs as the paper?
Note that the check is not "do I get a graph", it is "do I get the same graph"

TRB_label_assist/clustering.py

humbleOldSage · 2023-08-17T07:24:08Z

> Have you run the code?

Yes.

> Do you get the same graphs as the paper?

Yet to confirm.

> Please indicate "testing done".

Ongoing. My plan is to compare the labels generated by the custom branch against those generated by the master branch. This will verify that the two branches behave the same way.

Is there any other way I can test this?
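One way to picture the label comparison described above is sketched below. It is a minimal illustration, not code from this PR: the function names and the sample labels are hypothetical. It assumes that cluster IDs are arbitrary, so two runs count as equivalent when they group the same trips together even if the IDs are renamed.

```python
def same_partition(labels_a, labels_b):
    """Return True if two label sequences group the trips identically.

    Cluster IDs are arbitrary, so this checks for a one-to-one mapping
    between the two label sets instead of comparing raw values.
    """
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault records the first pairing; later pairings must agree
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


def mismatched_trips(labels_a, labels_b):
    """List the indices whose labels differ when compared value-for-value."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical labels for six trips, one list per branch
custom_branch = [0, 0, 1, 1, 2, -1]
master_branch = [5, 5, 3, 3, 7, -1]

print(same_partition(custom_branch, master_branch))    # True: same grouping, renamed IDs
print(mismatched_trips(custom_branch, master_branch))  # [0, 1, 2, 3, 4]
```

The second helper is stricter and only makes sense when both branches are expected to emit identical IDs.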

humbleOldSage · 2023-08-17T16:11:36Z

> Do you get the same graphs as the paper?

They differ. Let me check why this is happening.

Passes the way of clustering to e-mission-server. It was 'origin-destination' by default. It can now take one of three values: 'origin', 'destination' or 'origin-destination'.
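The effect of the three values can be sketched as follows. This is an illustration under assumptions: the function name, the `clustering_way` parameter name and the trip layout are hypothetical and are not taken from e-mission-server.

```python
VALID_WAYS = ("origin", "destination", "origin-destination")


def select_features(trip, clustering_way="origin-destination"):
    """Pick the coordinates to cluster on for the given mode.

    `trip` is a hypothetical dict holding start/end [lon, lat] pairs.
    """
    if clustering_way not in VALID_WAYS:
        raise ValueError(f"clustering_way must be one of {VALID_WAYS}")
    features = []
    if clustering_way in ("origin", "origin-destination"):
        features.extend(trip["start_loc"])
    if clustering_way in ("destination", "origin-destination"):
        features.extend(trip["end_loc"])
    return features


trip = {"start_loc": [-122.08, 37.39], "end_loc": [-122.03, 37.33]}
print(select_features(trip, "origin"))       # [-122.08, 37.39]
print(select_features(trip, "destination"))  # [-122.03, 37.33]
print(select_features(trip))                 # [-122.08, 37.39, -122.03, 37.33]
```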

humbleOldSage · 2023-08-20T16:56:09Z

Tested. This is running with no errors.
Can confirm this generates the same results.

shankari

This particular change seems fine if it works.

Turned out to be pretty simple after all?!
I would like to see more information in the PR issue that it works (screenshots, information about the model indicating that it works).
Is this the only notebook that is affected by the change? I know that we have a notebook which generates the performance (accuracy/F-score) of various algorithms. I would expect that it would also need to be changed...

TRB_label_assist/clustering.py

humbleOldSage · 2023-08-20T19:15:37Z

> is this the only notebook that is affected by the change? I know that we have a notebook which generates the performance (accuracy/F-score) of various algorithms. I would expect that it would also need to be changed...

Almost all the other notebooks depend on this module.

previous suggestions to improve readability.

This reverts commit 3e19b32.

humbleOldSage · 2023-08-20T20:22:48Z

> I would like to see more information in the PR issue that it works (screenshots, information about the model indicating that it works)

Screenshot from the latest run, showing no errors.

humbleOldSage · 2023-08-20T21:06:14Z

Left is current result. Right is from research paper.
Suburban 50m.

Suburban 100m

Suburban 150m

humbleOldSage · 2023-08-20T22:04:48Z

Left is current result. Right is from research paper.
College 50m.

College 100m

College 150m

Suggestions from previous comments to improve readability.

…VM_decision_boundaries` compatible with changes in `clustering.py` and `mapping.py` files. Also porting these 3 notebooks to trip_model

`cluster_performance.ipynb`, `generate_figs_for_poster` and `SVM_decision_boundaries` now have no dependence on the custom branch. Plots are attached to show that their previous and current outputs do not differ.

TRB_label_assist/SVM_decision_boundaries.ipynb

TRB_label_assist/cluster_performance.ipynb

TRB_label_assist/clustering_examples.ipynb

`Classification_performance` and `regenerate_classification_performance_results.py` are not tested yet, as they would take too long to run. The itertools removal in these two files is tested in other notebooks and it works. Other files, like models.py, will be tested once either of the above two has been run.

This reverts commit bb404e9.

[Partially Tested] Suggested changes implemented bb404e9

`Classification_performance` and `regenerate_classification_performance_results.py` are not tested yet, as they would take too long to run. The itertools removal in these two files is tested in other notebooks and it works. Other files, like models.py, will be tested once either of the above two has been run.

humbleOldSage · 2023-11-07T05:49:35Z

Since this is only partially tested, I'll keep the PR as a draft. As soon as I have completed the final testing, I'll mark it as ready to merge.

shankari

Almost done, just a few minor changes

TRB_label_assist/SVM_decision_boundaries.ipynb

TRB_label_assist/models.py

Fixed names of variables to be more self-explanatory

humbleOldSage · 2023-11-10T04:56:44Z

Not tested. Needs Testing.

shankari

Even smaller cleanups

TRB_label_assist/clustering_examples.ipynb

TRB_label_assist/generate_figs_for_poster.ipynb

TRB_label_assist/get_performance_for_poster.ipynb

1. Change in the models file to match the changes to greedy_similarity_binning in e-mission-server
2. Minor fixes

humbleOldSage · 2023-11-16T20:45:08Z

generate_figs_for_poster.ipynb:

plot after latest testing

snap from the research paper :

plot after latest testing

snaps from the research paper :

humbleOldSage · 2023-11-16T21:31:35Z

generate_figs_for_poster.ipynb :

On the left are plots from the current testing; on the right are images from runs of the notebook with @hlu109's custom branch:

naive fixed-width clustering from the first user's data

150m

50m

100m

DBSCAN without SVM: home cluster with a blue cluster to the south that was merged in

DBSCAN + SVM: home cluster and blue cluster to the south have been separated

humbleOldSage · 2023-11-16T21:51:56Z

clustering_examples.ipynb:

Left is current test result. Right is from research paper.
Suburban 50m.

Suburban 100m

Suburban 150m

Left is current result. Right is from research paper.
College 50m.

College 100m

College 150m

humbleOldSage · 2023-11-16T22:08:39Z

SVM_decision_boundaries.ipynb:

On the left are plots from the current test; on the right are plots from old runs:

humbleOldSage · 2023-11-16T22:31:18Z

get_cluster_performance.ipynb :

For each pair, the top one is the result of the current test and the bottom one is the result from older runs:

humbleOldSage · 2023-11-16T22:55:53Z

All model results :

shankari

@humbleOldSage I don't think you have addressed several of the prior review comments. They are fairly simple, so you might have just missed them - please check the review history carefully.

Please make sure that all comments are addressed before marking as ready for review.

TRB_label_assist/clustering_examples.ipynb

TRB_label_assist/models.py

shankari · 2023-11-19T01:34:54Z

TRB_label_assist/models.py

@@ -378,13 +405,19 @@ def _distance_helper(self, tripa, tripb, loc_type):

            copied from the Similarity class on the e-mission-server. 


Now that we are no longer on a custom branch, can't the distance calculation in e-mission-server be re-used here? Why do we need copy-pasted code? I am generally against copy-pasting, in keeping with DRY. Since this change has already had multiple revisions, I am OK with deferring this to the next PR, but I want to make sure that it is not forgotten.

We are indeed using e-mission-server.

Line 416:

dist= ecc.calDistance([pta_lon,pta_lat],[ptb_lon,ptb_lat])

ecc here is from e-mission-server:

import emission.core.common as ecc

That's an old comment from Hannah that we should remove.
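For readers without e-mission-server at hand, the kind of calculation that a call such as `ecc.calDistance([lon, lat], [lon, lat])` stands for can be sketched with a haversine great-circle distance. This is a stand-in written for illustration, assuming points are `[lon, lat]` pairs in degrees and the result is in meters. It is not the server's implementation, and `haversine_distance` is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_distance(point_a, point_b):
    """Great-circle distance in meters between two [lon, lat] points in degrees."""
    lon_a, lat_a = point_a
    lon_b, lat_b = point_b
    d_lat = math.radians(lat_b - lat_a)
    d_lon = math.radians(lon_b - lon_a)
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(math.radians(lat_a)) * math.cos(math.radians(lat_b))
         * math.sin(d_lon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


home = [-122.0800, 37.3900]
# 0.001 degrees of latitude north of home
nearby = [-122.0800, 37.3910]
print(round(haversine_distance(home, nearby)))  # 111
```

A 50m, 100m or 150m radius check then reduces to comparing this value against the threshold.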

While predicting in greedy_similarity_binning.py on e-mission-server, the flow goes like this:

predict -> _nearest_bin -> similar (in e-mission-server/emission/analysis/modelling/similarity/similarity_metric.py) -> similarity (in e-mission-server/emission/analysis/modelling/similarity/od_similarity.py) -> ecc.calDistance.

And so currently we are using this ecc.calDistance.

I think that you have misunderstood the comment. The comment is not about ecc.calDistance - it is clear that calDistance is from e-mission-server.
This says that _distance_helper is copied from e-mission-server, which it is:

$ grep -r distance_helper emission/
emission//analysis/modelling/tour_model_first_only_orig/similarity.py: if not self.distance_helper(a, b):
emission//analysis/modelling/tour_model_first_only_orig/similarity.py: def distance_helper(self, a, b):
emission//analysis/modelling/tour_model/similarity.py: if not self.distance_helper(a, b):
emission//analysis/modelling/tour_model/similarity.py: def distance_helper(self, a, b):
emission//incomplete_tests/TestSimilarity.py: self.assertTrue(sim.distance_helper(b,c))

However, the implementation does seem to be a bit different, and I don't see a function with this name in trip_model. There must be an equivalent in trip_model to calculate distances between trips, which we should reuse here.

Other than ecc.calDistance, there is just pre-processing in _distance_helper, in the form of coordinate extraction:

    # tripa is taken from the test dataframe.
    # tripb is taken from the stored bin list.
    pta_lat = tripa[[loc_type + '_lat']]
    pta_lon = tripa[[loc_type + '_lon']]
    if loc_type == 'start':
        ptb_lat = tripb[1]
        ptb_lon = tripb[0]
    elif loc_type == 'end':
        ptb_lat = tripb[3]
        ptb_lon = tripb[2]
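The extraction step quoted above can be exercised on its own with plain Python data. In this sketch a dict stands in for the dataframe row, the binned trip is assumed to be laid out as `[start_lon, start_lat, end_lon, end_lat]` (which is what the indices in the snippet imply), and `extract_points` is a hypothetical name.

```python
def extract_points(tripa, tripb, loc_type):
    """Return ([lon, lat], [lon, lat]) for the test trip and the binned trip.

    tripa: mapping with '<loc_type>_lon' / '<loc_type>_lat' keys
           (a stand-in for a dataframe row)
    tripb: sequence laid out as [start_lon, start_lat, end_lon, end_lat]
    """
    if loc_type not in ("start", "end"):
        raise ValueError("loc_type must be 'start' or 'end'")
    pta = [tripa[loc_type + "_lon"], tripa[loc_type + "_lat"]]
    offset = 0 if loc_type == "start" else 2
    ptb = [tripb[offset], tripb[offset + 1]]
    return pta, ptb


test_row = {"start_lon": -122.08, "start_lat": 37.39,
            "end_lon": -122.03, "end_lat": 37.33}
binned_trip = [-122.081, 37.391, -122.031, 37.331]

print(extract_points(test_row, binned_trip, "start"))  # ([-122.08, 37.39], [-122.081, 37.391])
print(extract_points(test_row, binned_trip, "end"))    # ([-122.03, 37.33], [-122.031, 37.331])
```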

We do have an extract_features function (at e-mission-server/emission/analysis/modelling/similarity/od_similarity.py) on e-mission-server that extracts the latitude and longitude of trips, but it works only for Entry-type data (since data frames are not used in e-mission-server).

> There must be an equivalent in trip_model to calculate distances between trips, which we should reuse here.

For the reason above, there isn't. However, this is what we can do to use the _nearest_bin function (e-mission-server/emission/analysis/modelling/trip_model/greedy_similarity_binning.py), which is the closest to _distance_helper:

1. Convert tripa (the test trip) to Entry-type data using df_row_to_entry (in e-mission-server/emission/storage/timeseries/builtin_timeseries.py).
2. Pass this entry to the _nearest_bin function.

Let me know if this works, and I'll test this.
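The two steps proposed above can be pictured with the following self-contained sketch. Everything in it is a simplified stand-in: `row_to_entry` and `nearest_bin` are hypothetical functions that only mimic the roles of `df_row_to_entry` and `_nearest_bin`, the entry layout is an assumption, and the distance is a short-range planar approximation, not the server's calculation.

```python
import math


def row_to_entry(row):
    """Hypothetical stand-in for df_row_to_entry: wrap a flat row as an entry-like dict."""
    return {
        "data": {
            "start_loc": {"coordinates": [row["start_lon"], row["start_lat"]]},
            "end_loc": {"coordinates": [row["end_lon"], row["end_lat"]]},
        }
    }


def entry_features(entry):
    """Flatten an entry into [start_lon, start_lat, end_lon, end_lat]."""
    data = entry["data"]
    return data["start_loc"]["coordinates"] + data["end_loc"]["coordinates"]


def planar_distance_m(p, q):
    """Equirectangular approximation in meters; adequate for short distances."""
    mean_lat = math.radians((p[1] + q[1]) / 2)
    dx = math.radians(q[0] - p[0]) * math.cos(mean_lat)
    dy = math.radians(q[1] - p[1])
    return 6371000 * math.hypot(dx, dy)


def nearest_bin(entry, bins, radius_m):
    """Return the id of the first bin whose trips all lie within radius_m, else None."""
    feats = entry_features(entry)
    for bin_id, members in bins.items():
        if all(
            planar_distance_m(feats[:2], m[:2]) <= radius_m
            and planar_distance_m(feats[2:], m[2:]) <= radius_m
            for m in members
        ):
            return bin_id
    return None


# One stored bin holding a single trip: [start_lon, start_lat, end_lon, end_lat]
bins = {"0": [[-122.0800, 37.3900, -122.0300, 37.3300]]}
row = {"start_lon": -122.0801, "start_lat": 37.3901,
       "end_lon": -122.0301, "end_lat": 37.3301}

print(nearest_bin(row_to_entry(row), bins, radius_m=100))  # prints 0
```

The conversion happens once per test trip, so the binning code itself only ever sees entry-shaped data.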

From #37 (comment)

> Since this change has already had multiple revisions, I am OK with deferring this to the next PR, but I want to make sure that it is not forgotten.

I am fine with returning to this later. However, eventually, we should simplify the codebase to either use only dataframes, or use only entries, or, if we are going to support some level of mix and match, have the utility functions support both combinations.

Can you please file an issue for this so that we don't forget it?
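As a rough illustration of the "support both combinations" option mentioned above, a utility can branch on the shape of its input. The sketch below is hypothetical: `get_point` and both data layouts are assumptions made for illustration, not an existing e-mission helper.

```python
def get_point(trip, loc_type):
    """Return [lon, lat] for loc_type ('start' or 'end') from either representation.

    Entry-like: {"data": {"start_loc": {"coordinates": [lon, lat]}, ...}}
    Row-like:   {"start_lon": ..., "start_lat": ..., ...}
    """
    if loc_type not in ("start", "end"):
        raise ValueError("loc_type must be 'start' or 'end'")
    if "data" in trip:
        # Entry-like input: coordinates are nested under data.<loc_type>_loc
        return list(trip["data"][loc_type + "_loc"]["coordinates"])
    # Row-like input: coordinates are flat columns
    return [trip[loc_type + "_lon"], trip[loc_type + "_lat"]]


entry_like = {"data": {"start_loc": {"coordinates": [-122.08, 37.39]},
                       "end_loc": {"coordinates": [-122.03, 37.33]}}}
row_like = {"start_lon": -122.08, "start_lat": 37.39,
            "end_lon": -122.03, "end_lat": 37.33}

print(get_point(entry_like, "end"))  # [-122.03, 37.33]
print(get_point(row_like, "end"))    # [-122.03, 37.33]
```

Callers such as a distance helper would then not need to know which representation they were handed.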

Filed here: #39.

Minor Fixes to improve readability.

shankari · 2023-11-23T06:11:02Z

@humbleOldSage two more comments.

Improved readability

shankari · 2023-11-25T19:07:26Z

Squash-merging since this is 21 commits for some fairly simple changes.
@humbleOldSage please account for this while making any future changes.

* Update clustering.py Changes in clustering.py file to shift dependency from hlu09's tour_model_extended to main branch trip_model. Still need to change type of data being passed to fit function for this to work.
* moving clustering_examples.ipynb to trip_model All dependencies of this notebook from custom branch are removed. There currently seem to be no errors while generating maps in clustering_examples notebook.
* Removing changes in builtimeseries.py With these changes, no change in e-mission-server should be required.
* Changes to support TRB_Label_Assist passing way of clustering to the e-mission-server. It was 'origin-destination' by default. Now can take one of three values, 'origin', 'destination' or 'origin-destination'.
* suggestions previous suggestions to improve readability.
* Revert "suggestions" This reverts commit 3e19b32.
* Improving readability Suggestions from previous comments to improve readability.
* making `cluster_performance.ipynb`, `generate_figs_for_poster` and `SVM_decision_boundaries` compatible with changes in `clustering.py` and `mapping.py` files. Also porting these 3 notebooks to trip_model `cluster_performance.ipynb`, `generate_figs_for_poster` and `SVM_decision_boundaries` now have no dependence on the custom branch. Results of plots are attached to show no difference in their previous and current outputs.
* Unified Interface for fit function Unified Interface for fit function across all models. Passing 'Entry' Type data from the notebooks till the Binning functions. Default set to 'none'.
* Fixing `models.py` to support `regenerate_classification_performance_results.py` Prior to this update, `NaiveBinningClassifier` in 'models.py' had dependencies on both of tour model and trip model. Now, this classifier is completely dependent on trip model. All the other notebooks (except `classification_performance.ipynb`) were tested as well and they are working as usual. Other minor fixes to support previous changes.
* [PARTIALLY TESTED] Single database read and Code Cleanup 1. removed mentions of `tour_model` or `tour_model_first_only`. 2. removed two reads from database. 3. Removed notebook outputs (this could be the reason a few diffs are too big to view)
* Delete TRB_label_assist/first_trial_results/cv results DBSCAN+SVM (destination).csv not required.
* Reverting Notebook Reverting notebooks to initial state, since running on the browser messed up the cell index numbers. This was causing unnecessary git diffs even when no changes were made. Running on VS code should resolve this. Will do the subsequent changes on VS code and commit again.
* [Partially Tested]Handled Whitespaces Whitespaces corrected.
* [Partially Tested] Suggested changes implemented `Classification_performance` and `regenerate_classification_performance_results.py` are not tested yet as they would take too long to run. The itertools removal in these two files is tested in other notebooks and it works. Other files, like models.py will be tested once any of the above two are run.
* Revert "[Partially Tested] Suggested changes implemented" This reverts commit bb404e9.
* [Partially Tested] Suggested changes implemented [Partially Tested] Suggested changes implemented bb404e9 `Classification_performance` and `regenerate_classification_performance_results.py` are not tested yet as they would take too long to run. The itertools removal in these two files is tested in other notebooks and it works. Other files, like models.py will be tested once any of the above two are run.
* Minor variable fixes Fixed names of variables to be more self-explanatory
* [TESTED] All the notebooks and files are tested 1. Change in models file a.t. changes in greedy_similarity_binning in e-mission-server 2. Minor fixes
* Minor Fixes Minor Fixes to improve readability.
* Minor Fixes in models.py Improved readability

Update clustering.py

431b33d

Changes in clustering.py file to shift dependency from hlu09's tour_model_extended to main branch trip_model. Still need to change type of data being passed to fit function for this to work.

shankari reviewed Aug 11, 2023

TRB_label_assist/clustering.py (outdated)

moving clustering_examples.ipynb to trip_model

36065b4

All dependencies of this notebook on the custom branch are removed. There currently seem to be no errors while generating maps in the clustering_examples notebook.

humbleOldSage requested a review from shankari August 16, 2023 00:21

Removing changes in builtimeseries.py

97406c4

With these changes, no change in e-mission-server should be required.

humbleOldSage mentioned this pull request Aug 16, 2023

Update greedy_similarity_binning.py e-mission/e-mission-server#930

Closed

shankari requested changes Aug 16, 2023

Changes to support TRB_Label_Assist

88988d3

Passes the way of clustering to e-mission-server. It was 'origin-destination' by default. It can now take one of three values: 'origin', 'destination' or 'origin-destination'.

humbleOldSage requested a review from shankari August 20, 2023 16:59

shankari requested changes Aug 20, 2023

TRB_label_assist/clustering.py

humbleOldSage marked this pull request as draft August 20, 2023 19:31

humbleOldSage added 2 commits August 20, 2023 15:36

suggestions

3e19b32

previous suggestions to improve readability.

Revert "suggestions"

0899ee4

This reverts commit 3e19b32.

Improving readability

667ab24

Suggestions from previous comments to improve readability.

humbleOldSage marked this pull request as ready for review August 20, 2023 23:08

humbleOldSage requested a review from shankari August 20, 2023 23:09

humbleOldSage marked this pull request as draft August 22, 2023 03:25

humbleOldSage marked this pull request as ready for review August 22, 2023 18:45

humbleOldSage marked this pull request as draft August 22, 2023 20:17

humbleOldSage commented Aug 22, 2023

TRB_label_assist/SVM_decision_boundaries.ipynb

humbleOldSage commented Aug 23, 2023

TRB_label_assist/cluster_performance.ipynb (outdated)

humbleOldSage commented Aug 23, 2023

TRB_label_assist/clustering_examples.ipynb (outdated)

shankari mentioned this pull request Nov 3, 2023

Survey Assist Using RF e-mission/e-mission-server#938

Open

humbleOldSage added 3 commits November 7, 2023 00:24

Revert "[Partially Tested] Suggested changes implemented"

97475ef

This reverts commit bb404e9.

humbleOldSage requested a review from shankari November 7, 2023 05:49

shankari requested changes Nov 9, 2023

Minor variable fixes

2a39b12

Fixed names of variables to be more self-explanatory

shankari requested changes Nov 10, 2023

[TESTED] All the notebooks and files are tested

e0beb0e

1. Change in the models file to match the changes to greedy_similarity_binning in e-mission-server
2. Minor fixes

humbleOldSage requested a review from shankari November 16, 2023 22:16

humbleOldSage marked this pull request as ready for review November 16, 2023 22:45

shankari reviewed Nov 19, 2023

TRB_label_assist/clustering_examples.ipynb (outdated)

TRB_label_assist/models.py

shankari reviewed Nov 19, 2023

Minor Fixes

c8c3883

Minor Fixes to improve readability.

humbleOldSage requested a review from shankari November 22, 2023 00:51

Minor Fixes in models.py

9225572

Improved readability

shankari approved these changes Nov 25, 2023

shankari merged commit 8d27847 into e-mission:master Nov 25, 2023

humbleOldSage mentioned this pull request Nov 26, 2023

👔 Switch the survey assist to use the random forest model e-mission/e-mission-docs#972

Open
