These instructions assume you are using Homebrew, so some changes will be needed if you are working on another system.
Install a Python distribution that is compatible with Tkinter; we use python-tk@3.11:
brew install python-tk@3.11
Clone the Git Repo:
git clone https://github.com/osullik/GESTALT.git
Navigate to the root of that project ~/GESTALT/
cd GESTALT
Create a virtual environment (the packages in requirements.txt are installed into it in a later step):
python -m venv gestalt_env
You now need to link Tkinter to your venv. This Stack Overflow post is helpful, but assuming you installed as above, use (adjusting the path for your version):
cd gestalt_env/lib
ln -s /opt/homebrew/Cellar/python-tk@3.11/3.11.4
[Optional] If you are unsure of your versions, use the following command to list the Python versions installed through Homebrew:
brew list | grep "python"
Change back to the root directory:
cd ../..
Then activate your virtual environment:
source gestalt_env/bin/activate
Install the requirements into the venv:
pip install -r code/requirements.txt
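Before moving on, it can be worth confirming that Tkinter is importable from inside the activated venv. The following is a minimal sketch, not part of GESTALT; the function name `tkinter_available` is made up for illustration. Run it with the venv's `python`:

```python
import sys


def tkinter_available() -> bool:
    """Return True if the tkinter module can be imported by this interpreter."""
    try:
        import tkinter  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


# Show which interpreter is being checked, then report the result.
print("Interpreter:", sys.executable)
if tkinter_available():
    print("Tkinter is importable.")
else:
    print("Tkinter is NOT importable; re-check the symlink step above.")
```

If the interpreter path printed does not point inside gestalt_env, the venv is probably not active.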
With all this in place you're ready to start working with GESTALT!
To demonstrate the utility of GESTALT and allow for the replication of our results, we offer two bash scripts, one per dataset, that build and execute GESTALT end to end.
To get the files from Flickr you need API keys. API access can be requested here. You then need to add the keys to your local environment variables with the commands:
export flickr_key=<your_api_key_here>
export flickr_secret=<your_secret_key_here>
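A missing key is easy to overlook until a script fails part-way through. As a rough sketch (the helper `missing_flickr_vars` is hypothetical and not part of the GESTALT code), the two variables exported above can be checked like this:

```python
import os

# The two environment variable names used in the export commands above.
REQUIRED_VARS = ("flickr_key", "flickr_secret")


def missing_flickr_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_flickr_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Both Flickr variables are set.")
```

Note that `export` only affects the current shell session, so the check needs to run in the same terminal that will run the scripts.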
To use the end-to-end scripts, navigate to the scripts folder from the root directory. IMPORTANT: The scripts use relative directory addressing and all assume that they are being executed from within the GESTALT/scripts directory:
cd scripts
sh 99a_ingestSwanValley.sh
sh 99b_IngestDC.sh
Note that after you have run the shell scripts for the first time you may want to comment out the lines that invoke the querying of Flickr, since the results are stored after a single run.
Note that depending on which experiment you are running you will need to modify the paths in the 'individual' scripts below.
Navigate to the scripts folder from the root directory. IMPORTANT: The scripts use relative directory addressing and all assume that they are being executed from within the GESTALT/scripts directory:
cd scripts
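Because the scripts rely on relative paths, running one from the wrong directory tends to fail in confusing ways. A small illustrative check (the function `in_scripts_dir` is our own name, and it only looks at the directory name, which is an assumption about how you cloned the repo):

```python
from pathlib import Path


def in_scripts_dir(cwd=None) -> bool:
    """Return True if the given (or current) directory is named 'scripts'."""
    if cwd is None:
        cwd = Path.cwd()
    return Path(cwd).name == "scripts"


if in_scripts_dir():
    print("OK: running from a 'scripts' directory.")
else:
    print("Warning: expected to be in GESTALT/scripts, but cwd is", Path.cwd())
```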
To extract the KML files, run the following command (note: you can add or remove KML files by following the instructions in the script):
sh 10_ingestKML.sh
The JSON outputs of the KML files will reside in GESTALT/data/output/dataCollection
To extract all objects within a bounding box from the Overpass API, use the following (noting that you can edit the bounding box in the shell file):
sh 20_queryAllObjects.sh
The JSON output of the query will reside in GESTALT/data/output/dataCollection/osm_:bbox:_allobjects.json (where :bbox: is the bounding box the objects are found in).
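Since the bounding box is baked into the file name, it can be easier to locate the output by pattern than to type the name out. This is a hedged sketch: the helper names are hypothetical, and the only thing taken from the text above is the `osm_..._allobjects.json` naming pattern. Nothing is assumed about the structure of the JSON itself.

```python
import json
from pathlib import Path


def find_allobjects_files(data_dir):
    """Return all files named osm_<bbox>_allobjects.json in data_dir, sorted."""
    return sorted(Path(data_dir).glob("osm_*_allobjects.json"))


def load_json(path):
    """Load one JSON file and return the parsed content."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Example usage; the directory is the output location given above,
# relative to the directory that contains GESTALT.
for path in find_allobjects_files("GESTALT/data/output/dataCollection"):
    print(path.name)
```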
To get the files from Flickr you need API keys. API access can be requested here. You then need to add the keys to your local environment variables with the commands:
export flickr_key=<your_api_key_here>
export flickr_secret=<your_secret_key_here>
Then you can query Flickr and download all relevant images using:
sh 30_queryFlickr.sh
To extract the objects from them, run:
sh 40_ingestFlickrObjects.sh
To extract the locations with specific search terms from the OpenStreetMap Overpass API, use the following (noting that you can edit the bounding box and search terms in the shell file):
sh 50_queryLocations.sh
The JSON output of the query will reside in GESTALT/data/output/dataCollection/:osmsearchTermList:.json (where :osmsearchTermList: is the concatenated list of all search terms).
This completes the data collection phase.
The ownership assignment analyses the collected data, and determines which objects belong to which location. To run these steps, do:
sh 60a_assign_kmeans.sh
sh 60b_assign_dbscan.sh
These will output their results to GESTALT/data/output/ownershipAssignment/KMEANS_PredictedLocations.csv and GESTALT/data/output/ownershipAssignment/DBSCAN_PredictedLocations.csv respectively.
The concept mapping takes the predicted location of each object and creates a grid representation of the location to be used in searching. To execute it, run:
sh 61_createConceptMaps.sh
To activate the search functions run:
sh 70_searchGestalt.sh
python ../code/UI.py
We provide a Jupyter Notebook configured to allow you to explore the results of clustering. Note that it will only work after you have completed the clustering step. Open a new terminal and navigate back to your GESTALT directory.
First, we need to make sure Jupyter can get to the gestalt_env virtual environment by making it into a kernel. Use the following command:
ipython kernel install --name "gestalt_env" --user
Then launch Jupyter Lab with:
jupyter lab
Once it has launched, you will need to select the gestalt_env kernel and launch the Clustering.ipynb file.
To run the visualizations, simply change the data directory to match either your SV or DC path to the data/<DC/SV>/output/ownershipAssignment directory and run all the cells in order to the bottom.
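As an illustration of what the data directory should look like for each dataset, here is a small sketch that builds the path from a dataset code. The function `ownership_dir` and its arguments are hypothetical; the notebook may name its variable differently.

```python
from pathlib import Path

# The two dataset codes used in the data directory: DC and SV.
DATASETS = ("DC", "SV")


def ownership_dir(repo_root, dataset):
    """Build <repo_root>/data/<dataset>/output/ownershipAssignment."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    return Path(repo_root) / "data" / dataset / "output" / "ownershipAssignment"


print(ownership_dir("GESTALT", "SV"))
```

Pasting the printed path into the notebook's data directory setting should then point it at the clustering results for that dataset.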
We have built-in checks to prevent directory and file errors, but in the event that there are issues, you may need to manually create some of the data directories below.
├── README.md
├── code
│   ├── Clustering.ipynb
│   ├── ClusteringMetrics.py
│   ├── UI.py
│   ├── conceptMapping.py
│   ├── dataCollection.py
│   ├── experimentVariables.py
│   ├── experiments.py
│   ├── gestalt.py
│   ├── ownershipAssignment.py
│   ├── queryFlickr.py
│   ├── requirements.txt
│   ├── search.py
│   └── tests.py
├── data
│   ├── DC
│   │   ├── input
│   │   ├── output
│   │   │   ├── concept_mapping
│   │   │   ├── dataCollection
│   │   │   └── ownershipAssignment
│   │   └── photos
│   │       └── -77.120248_38.791086_-76.911012_38.995732
│   └── SV
│       ├── input
│       │   └── Swan_Valley.kml
│       ├── output
│       │   └── output
│       │       ├── concept_mapping
│       │       ├── dataCollection
│       │       └── ownershipAssignment
│       └── photos
│           └── 115.96168231510637_-31.90009882641578_116.05029961853784_-31.77307863942101
├── labeled_queries
│   ├── obj-loc.txt
│   └── obj-obj.txt
└── scripts
    ├── 10_ingestKML.sh
    ├── 20_queryAllObjects.sh
    ├── 30_queryFlickr.sh
    ├── 40_ingestFlickrObjects.sh
    ├── 50_queryLocations.sh
    ├── 51_queryAllLocations.sh
    ├── 60a_assign_kmeans.sh
    ├── 60b_assign_dbscan.sh
    ├── 61_createConceptMaps.sh
    ├── 70_searhGestalt.sh
    ├── 99_experiments.sh
    ├── 99a_ingestSwanValley.sh
    └── 99b_IngestDC.sh
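If the data directories in the tree above are missing, they can be created by hand or with a short script. The following is a sketch under one stated assumption: both datasets use the layout shown for DC (input, output with three subdirectories, and photos). The function name `ensure_data_dirs` is ours.

```python
from pathlib import Path

DATASETS = ("DC", "SV")

# Subdirectories per dataset, following the DC branch of the tree (assumption:
# SV uses the same layout).
SUBDIRS = (
    "input",
    "output/concept_mapping",
    "output/dataCollection",
    "output/ownershipAssignment",
    "photos",
)


def ensure_data_dirs(repo_root):
    """Create any missing data directories and return all of their paths."""
    paths = []
    for dataset in DATASETS:
        for sub in SUBDIRS:
            path = Path(repo_root) / "data" / dataset / sub
            # parents=True creates intermediate directories;
            # exist_ok=True makes the call safe to repeat.
            path.mkdir(parents=True, exist_ok=True)
            paths.append(path)
    return paths
```

Calling `ensure_data_dirs(".")` from the repository root leaves existing directories and their contents untouched.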
Contains all files relevant to papers, presentations etc. Primarily LaTeX. Includes the bibliography file.
Contains all code relevant to solving the problem at hand. Primarily C, little bit of Python. Possibly some assembly, depending how unlucky we get.
Contains TRIVIAL code examples for testing code only. All large data files are stored externally.
Check whether you have git installed:
git --version
If you don't, then install it (I recommend using Homebrew, since we're all of the superior Mac User subspecies):
brew install git
If you don't have homebrew then install it with:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Once you have git up and running you want to use the git clone command to get the repo on your local system:
git clone https://github.com/osullik/GESTALT.git
Once you've got it replicated on your local you can get to work.
You can add single files, multiple files or directories all at once (I caution against using the * wildcard operator). For example, if we were working on test.py and code.py we might use:
git add code.py
git add test.py
or, we could also do
git add code.py test.py
or, assuming that they are the only two changed files in the current directory
git add .
Once you have added the file to the staging area you need to commit it (that is, on your local system, you create an entry in the version control system). I recommend using the -m flag to add a message to the commit. The message should be descriptive and meaningful (imagine you, in a panic at 3am before a demo, sorting through 100 commit messages that say "updated code" trying to find the one you actually want):
git commit -m "Added the hello_world unit test to test.py and the hello_world function to code.py. The unit test passes"
To push those changes to the GitHub repository itself, use:
git push
Sometimes you may get an error when running the push command saying that there are changes you don't yet have on your local machine and that you need to reconcile those first. It looks something like this:
! [rejected] main -> main (fetch first)
error: failed to push some refs to 'https://github.com/osullik/GESTALT.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
This means that someone else (or you) has modified the github repo and it no longer matches yours. To reconcile those you want to use:
git pull
All going well, it should not have any merge conflict (i.e. two people editing the same file at the same time). If you manage to hit a merge conflict ping me on here and I'll come help you. It might ask you to add a merge message, if so, just use the same comment you used for your git commit.
If in doubt, check the status with:
git status
My rule of thumb is to commit whenever you have a new unit-test passing, so that you've got it as a working 'checkpoint'. But more often is better than not often enough.
For JabRef, you can install it with Homebrew using:
brew install --cask jabref
We'll use git to our advantage, but will work through how we want to handle branching, merges etc and document here.
We should aim for Test-Driven development. In general:
- identify a problem
- write a test that captures that problem and currently fails
- write code to solve the problem
- pass the test
- refactor the code to solve the general case.
If not fully test-driven, we should at least produce a unit test for each function that we write. As we mature we can add regression testing, integration testing etc.
Writing unit tests is important for ensuring that our code functions as intended. In a mature state, we should aim to write our tests first and then use the tests to guide how we write our code!
I've found a good little library for testing that I'll step you through how to use. This assumes that we're working in a directory CodeDir following our common directory structure, that is, with the files: myCode.py, test.py
CodeDir/
    myCode.py
    test.py
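To make the test-first workflow concrete, here is a minimal sketch. The testing library is not named above, so this assumes Python's built-in `unittest` module, which may not be the library the author has in mind. The `hello_world` function is a hypothetical example; in the layout above it would live in myCode.py, and the test class would live in test.py and import it with `from myCode import hello_world`. Both are shown in one block here so the example runs on its own.

```python
import unittest


# --- would live in myCode.py -------------------------------------------
def hello_world():
    """Hypothetical function under test."""
    return "Hello, World!"


# --- would live in test.py ---------------------------------------------
class TestHelloWorld(unittest.TestCase):
    """Written first; it fails until hello_world returns the right string."""

    def test_returns_greeting(self):
        self.assertEqual(hello_world(), "Hello, World!")

    def test_returns_string(self):
        self.assertIsInstance(hello_world(), str)
```

With the two-file layout, the tests can be run from inside CodeDir with `python -m unittest test.py`.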
We will use VSCode Live Share to conduct pair programming.
To install VSCode I recommend (as always) using homebrew with:
brew install --cask visual-studio-code
Note that specifying --cask indicates to Homebrew that this will be a standalone app, usually with a GUI.
Once installed you can launch VSCode from the command line with:
code
Once opened, you'll need to install two extensions to VSCode by navigating to the extensions menu (or use CMD + SHIFT + X) and searching for:
- Live Share
To share a file, open your file of interest in the editor and look in the bottom left of the VSCode screen. There will be an option to "live share". Click it. You may need to sign in using github. You then have a link you can send to people to start collaborating!
We'll use Google Drive at the link above initially, with a TO-DO of finding a better version control system for datasets.
ALEEZA
- Get Github Account
- Send Kent / Nicole Github Handle
- Read GESTALT Paper
- Confirm whether or not you want to do this project
KENT
- Add Aleeza to Github
- Refactor build scripts & Test from scratch
- Send docco of Inputs & outputs
- Send Django Docco
- Send resources for getting started with Git.
NICOLE
- Create Lab Slack
- Arrange Meeting with Prof Samet
- Arrange Weekly Check-in time