10am-12pm on Monday 17 March 2025 at the Alan Turing Institute, https://ai-uk.turing.ac.uk/
- PDF parsing - see PDF parsing challenge
- Improve matching of items by semantic similarity - see matching challenge
- Add new data sources to Harmony Discovery
- Add new sources to Harmony
- Identify topics of items in Discovery
- Deduplicate items if they come from multiple sources
- Plus any ideas you might have!
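For the deduplication task, one possible starting point is to normalise each item's text and keep the first occurrence of each normalised form. The sketch below is illustrative only and is not Harmony's implementation: the function names `normalise` and `deduplicate_items`, the `(source, text)` tuple layout and the sample items are all assumptions made for this example.

```python
import re


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate_items(items):
    """Keep the first item seen for each normalised text.

    `items` is a list of (source, text) tuples (a hypothetical layout).
    Returns a list of (text, sources) pairs, where `sources` lists every
    source in which the item appeared.
    """
    seen = {}  # normalised text -> (original text, list of sources)
    for source, text in items:
        key = normalise(text)
        if key not in seen:
            seen[key] = (text, [source])
        elif source not in seen[key][1]:
            seen[key][1].append(source)
    return list(seen.values())


# Made-up sample items, for illustration only.
items = [
    ("Study A", "I feel nervous."),
    ("Study B", "i feel  nervous"),
    ("Study B", "How tall are you?"),
]
unique_items = deduplicate_items(items)
for text, sources in unique_items:
    print(text, sources)
```

Exact matching on normalised text will miss paraphrases, so a fuller solution would probably combine it with the semantic similarity matching mentioned above.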
We have some UX issues that could be fixed - see UX testing report.
Other ideas requested by research psychologists: can we add groups of similar items to the export, e.g. everything to do with height across 5 studies? This could also be another view in the tool's visualisation.
Short demo: https://www.youtube.com/watch?v=cEZppTBj1NI
Presentation at Melbourne Children's LifeCourse Initiative seminar: https://www.youtube.com/watch?v=ZPY-fPsVIE4
You can use Windows, Linux or Mac. We have made some videos to help you install Python and Harmony:
🎬Video for Windows · 🎬 Video for Linux/Mac · 🎬 Video on how to install the front end locally
Here are the steps to get started:
- First clone the repository from GitHub. If you're not familiar with Git and GitHub, we recommend you watch a tutorial on Git first (example: https://www.youtube.com/watch?v=USjZcfj8yxE)
- Install Python 3.11
- Install PyCharm
- Install Jupyter Notebook
- Run the example Colab notebook
- We recommend Anaconda and Jupyter Notebook
- Then you can run `pip install harmonydata` to install Harmony once Python has been installed.
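To confirm that the installation worked, a short script like the one below can report the Python version and whether the `harmonydata` distribution is present. This is a minimal sketch using only the standard library. The helper name `installed_version` is made up for this example, and the check against 3.11 simply mirrors the version recommended in the steps above.

```python
import sys
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# The steps above recommend Python 3.11; other versions are only flagged.
matches_recommended = sys.version_info[:2] == (3, 11)
print("Python version:", sys.version.split()[0])
if not matches_recommended:
    print("Note: the setup steps recommend Python 3.11")

version = installed_version("harmonydata")
if version is None:
    print("harmonydata is not installed; try: pip install harmonydata")
else:
    print("harmonydata version:", version)
```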
We try to keep our code clean and consistent. If one person uses spaces and another uses tabs, it becomes hard to manage the code and keep track of changes. Please follow these general principles for consistency.
When developing and pushing changes:
- Please use a PEP 8 linter. PEP 8 is Python's style guide: it sets rules for things like indentation, whitespace and line length, and in general provides consistency for the formatting of human-readable code and comments. If everyone formats their code differently, things become hard to manage, as it's hard to tell whether a change is a functional change or a formatting change. Imagine if a newspaper article switched between British and American spelling every sentence, and between formal tone and textspeak! Let's keep things consistent!
- Please run unit tests before pushing. We use test driven development, and every commit gets tested automatically by GitHub and will get a green tick or red cross depending on whether the tests pass or fail. All the repos have tests in a folder called `tests`. You can run them on your computer, and GitHub Actions will run them when you commit. They will tell you if you break any functionality.
- Check your PR hasn't got any extra files made by your IDE that shouldn't be committed, such as .vscode. It's a common mistake for beginners to bulk commit the entire contents of a directory, including files which are not part of the project. For example, Mac puts extra hidden files inside folders when you open them in the file browser. Try not to let them clutter our code base. They make code hard to manage and in some cases can break the tool.
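One way to catch stray IDE and operating system files before committing is to scan your working copy for well-known names. The script below is a rough sketch, not part of Harmony. The `UNWANTED_NAMES` set and the `find_stray_files` helper are assumptions for illustration, and the list of names is not exhaustive. It reports what is on disk rather than what is staged, so `git status` remains the authoritative check.

```python
from pathlib import Path

# Names commonly created by editors and operating systems (an assumed,
# non-exhaustive list; adjust it to your own setup).
UNWANTED_NAMES = {".vscode", ".idea", ".DS_Store", "__pycache__", ".ipynb_checkpoints"}


def find_stray_files(root):
    """Return sorted relative paths under root whose name is in UNWANTED_NAMES."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.name in UNWANTED_NAMES
    )


if __name__ == "__main__":
    # Scan the current directory and list anything that looks out of place.
    for stray in find_stray_files("."):
        print("Not part of the project:", stray)
```

If any of these show up in a PR, removing them from the commit and listing them in a `.gitignore` file keeps them out of future commits.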