API Migration Automation script - patch 2 #1031

Closed

Conversation

@niccolopaganini
Contributor

  1. Converted the .ipynb file to Python scripts (12 in total, each serving its own purpose; explained in README.md)
  2. Added a shell script that executes the aforementioned Python scripts
  3. Included a README.md file explaining the whole process
  4. Included a screenshots directory for the README
  5. Included an archive folder containing the .ipynb file (with updated code) and the YML file, kept in case it is needed for future reference

@niccolopaganini
Contributor Author

Mentioned in the issue, but also noting here for reference: spotty internet connection (I couldn't connect my work computer to the campus wifi); I will try to update the error logs sometime tonight.

@shankari (Contributor) left a comment


This is a lot better than the previous notebook, but it still needs some cleanup.
High-level comments below.

Contributor

  1. why is environment.yml in the archive folder? It is still valid and needs to be activated for the script to run, correct?
  2. Please make sure to pin the dependencies for all packages!
  3. The e-mission environment already has several of these dependencies; have you considered just adding on to the existing environment with a separate file, similar to https://github.com/e-mission/e-mission-server/blob/master/setup/environment36.notebook.additions.yml? What are the pros and cons of that approach?

```
- _anaconda_depends
- python=3.11
- os
- csv
```
Contributor

why do you need csv when you have pandas?

```
- requests
- bs4
- re
- subprocess
```
Contributor

why do you need subprocess?

Comment on lines 72 to 78
```
links.py -> links = get_links("https://developer.android.com/sdk/api_diff/33/changes/alldiffs_index_changes")
```
_Link should look something like this:_
```
https://developer.android.com/sdk/api_diff/33/changes/alldiffs_index_changes
```
Contributor

It seems like it would be much better for this to be a command line argument instead of requiring code edits?
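For illustration, a minimal sketch of what that could look like with `argparse` (`get_links` is taken from the snippet above; the import and argument name are assumptions, not the PR's actual code):

```
import argparse

from links import get_links  # assumed helper from the PR's links.py

# Hypothetical CLI wrapper so the diff URL no longer requires a code edit.
parser = argparse.ArgumentParser(description="Fetch Android API diff links")
parser.add_argument(
    "diff_url",
    help="e.g. https://developer.android.com/sdk/api_diff/33/changes/alldiffs_index_changes",
)
args = parser.parse_args()
links = get_links(args.diff_url)
```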

Comment on lines 80 to 83
**3. Update locations for `output.py` accordingly**
```
match_csv_java(".../e-mission-phone/plugins", ".../e-mission-phone/bin/API_Migration_scripts/simplify_classes.csv")
```
Contributor

why do we need to update this location given that the default value is a relative path? Are you thinking that people may have different names for the plugins directory? If so, why and how?
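One way to avoid the edit entirely would be to give the function defaults relative to the repo root; a rough sketch (the paths and behavior here are assumptions, not the PR's actual implementation):

```
import os

def match_csv_java(plugins_dir="plugins",
                   csv_path="bin/API_Migration_scripts/simplify_classes.csv"):
    # Defaults are relative to the repo root, so nothing needs to be
    # edited as long as the script is run from e-mission-phone/.
    plugins_dir = os.path.abspath(plugins_dir)
    csv_path = os.path.abspath(csv_path)
    print(f"Matching {csv_path} against Java sources in {plugins_dir}")
```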

Comment on lines 89 to 91
```
chmod +x run_them_py_and_delete_em_csvs.sh
```
Contributor

Why do you need this if you are going to use bash to run the script anyway?

Contributor

This won't actually run with the current dependencies (which don't include any of the jupyter notebook modules).
And it is possible to edit the Python scripts directly anyway.

So is it worth keeping the notebook around?

Contributor

I have not encountered this kind of script structure before, where you essentially run 10 different python files, each of which creates a new csv. It almost seems like you took every cell from the notebook and made it a separate python file. While modularity is important, this seems extreme, and I'm concerned that the fragmentation will make it harder for people to understand the code and the flow.

Have you considered making each of these a function in a shared file, and then creating a `__main__` that calls all the functions? You can then pass the dataframe around the functions instead of writing and reading csvs at every stage.

If you have, please explain your rationale for this approach.
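A rough sketch of that shape, with invented stage names purely for illustration (the real stages would come from the existing scripts):

```
import pandas as pd

def scrape_changes(url):
    # Stage 1: fetch the API diff page and return the raw rows.
    return pd.DataFrame({"class": [], "change": []})

def simplify_classes(raw_df):
    # Stage 2: normalize class names; takes and returns a DataFrame,
    # so no intermediate CSV is written.
    return raw_df

def match_against_plugins(simplified_df, plugins_dir):
    # Stage 3: cross-reference the changes against the Java sources.
    return simplified_df

if __name__ == "__main__":
    url = "https://developer.android.com/sdk/api_diff/33/changes/alldiffs_index_changes"
    result = match_against_plugins(simplify_classes(scrape_changes(url)), "plugins")
    print(result)
```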

Contributor

If something can be represented as text, it should be, to make it easier to search, copy/paste, etc.
In this case, this can be replaced by

```
$ ls -al
...
```

Contributor

Ditto

@niccolopaganini
Contributor Author

  1. Addressed the issues regarding fragmentation (going from 12 scripts to 4 "head" python scripts), which keeps the modularity without the fragmentation
  2. Paths have been changed to be relative
  3. The API changes link previously had to be edited in the code; now it can be supplied as input by the user.
  4. Pending: README needs to be updated & setup script needs to be created.

@niccolopaganini
Contributor Author

4. README needs to be updated ✅

@shankari
Contributor

shankari commented Sep 24, 2023

Addressed the issues regarding fragmentation (going from 12 scripts to 4 "head" python scripts), which keeps the modularity without the fragmentation

What is your rationale for even needing 4 scripts? Why not just combine into one script?
As I pointed out, you can still have separate methods in the one script.

@niccolopaganini
Contributor Author

niccolopaganini commented Oct 3, 2023

Not sure what happened, but when I deleted the repo on GitHub, the local dir was also deleted. I restored it from a recent HEAD and worked on it from there. I have now (hopefully) addressed all the comments. With this push:

  1. I have boiled it down to one Python script
  2. Parallelized the process, so instead of taking 4 minutes to run, it now takes 28 seconds on average to execute everything
  3. Two bash files
    • One to execute everything
    • One to remove the rest once the work is done
  4. Updated README
  5. Also removed all the CSV nonsense, so a person using this will just have to refer to one file

README.md Outdated
Contributor

why is this file included here?


```
links = []
for a in soup.find_all("a"):
    if a.has_attr("href") and a["href"].startswith("/sdk/api_diff/33/changes/"):
```
Contributor

this is still hardcoded
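For illustration, the prefix could be derived from the user-supplied API level instead of baking in "33" (the `api_level` prompt is an assumption; `soup` is the parsed page from the surrounding code):

```
# Build the href prefix from the API level supplied by the user,
# rather than hardcoding 33.
api_level = input("Android API level (e.g. 33): ").strip()
prefix = f"/sdk/api_diff/{api_level}/changes/"

links = []
for a in soup.find_all("a"):
    if a.has_attr("href") and a["href"].startswith(prefix):
        links.append(a["href"])
```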

Comment on lines 9 to 15
```
if [ -e classes.csv ]; then
    rm classes.csv
fi

if [ -e modified_csv.csv ]; then
    rm modified_csv.csv
fi
```
Contributor

why do we have to write csv files anymore, given that all the functions are in the same file? We can just pass the dataframes around and avoid the I/O overhead.

@@ -0,0 +1,248 @@
```
#API changes identifier
import os
import csv
```
Contributor

I had this in an earlier comment. Why do we need csv instead of just using pandas?
#1031 (comment)
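For comparison, a pandas equivalent of the csv.reader loop quoted later in this review, assuming the same single-column unique_packages.csv layout:

```
import pandas as pd

# Read the one-column CSV and strip whitespace in a single vectorized
# call, replacing the csv.reader loop.
unique_packages = set(
    pd.read_csv("unique_packages.csv", header=None)[0].str.strip()
)
```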

Contributor

Also, where is the environment.yml with the new packages needed by this code?

Comment on lines 209 to 214
```
unique_packages = set()
with open("unique_packages.csv", "r") as unique_file:
    reader = csv.reader(unique_file)
    for row in reader:
        if row:
            unique_packages.add(row[0].strip())  # Remove leading/trailing whitespace
```
Contributor

Suggested change
```diff
-unique_packages = set()
-with open("unique_packages.csv", "r") as unique_file:
-    reader = csv.reader(unique_file)
-    for row in reader:
-        if row:
-            unique_packages.add(row[0].strip())  # Remove leading/trailing whitespace
+output_df = match_csv_java(....)
+unique_packages = get_unique_packages(output_df)
```
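The `get_unique_packages` helper in the suggestion doesn't exist yet; a minimal sketch, assuming the output DataFrame has a `package` column:

```
def get_unique_packages(output_df):
    # Dedupe and strip whitespace in one pass; the "package" column
    # name is an assumption for illustration.
    return set(output_df["package"].str.strip())
```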

Comment on lines 118 to 121
with open("unique_packages.csv", "w") as f:
csvwriter = csv.writer(f)
for package in packages:
csvwriter.writerow([package])
Contributor

Suggested change
```diff
-with open("unique_packages.csv", "w") as f:
-    csvwriter = csv.writer(f)
-    for package in packages:
-        csvwriter.writerow([package])
+return packages
```

Comment on lines 175 to 177
```
with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(get_changed_content, links)
```
Contributor

Good job on experimenting with Python concurrency.

Although from a design perspective, this is a bit of overkill for what should be a super simple script.
Four minutes is not that long for a script that is run once a year, and concurrency can add complexity that makes simple scripts harder to maintain.

However, this appears to be embarrassingly parallel, so the complexity seems manageable. I will not request a change for now, but will just note that simplicity is also a virtue, especially for simple scripts that are not part of the production system.
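For reference, the serial equivalent is a one-liner (names taken from the snippet above), which is part of why the tradeoff is close:

```
# Serial version: one fetch at a time; slower, but the simplest to read and debug.
results = [get_changed_content(link) for link in links]
```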
