feat(#59): doc step
h1alexbel committed Sep 20, 2024
1 parent a0d3c7e commit 30ecc25
Showing 1 changed file with 47 additions and 0 deletions.
47 changes: 47 additions & 0 deletions README.md
@@ -173,6 +173,48 @@ just datasets

You should see all seven files in the `sr-data/experiment` directory.

### Cluster

We apply [clustering][Cluster analysis] to the
[datasets created previously](#create-datasets). We use the following algorithms:

* [KMeans]
* [Agglomerative clustering]
* [DBSCAN]
* [GMM]

Each algorithm generates a set of clusters for each dataset.

To run this:

```bash
just cluster
```

You should see the following directories inside the `experiment`
directory:

* `kmeans`
* `agglomerative`
* `dbscan`
* `gmm`
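
As a quick sanity check, the expected layout can be verified with a short
script. This is a minimal sketch, not part of the project: the helper name
`missing_algorithm_dirs` is hypothetical, and the location of the `experiment`
directory is passed in by the caller.

```python
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_DIRS = ("kmeans", "agglomerative", "dbscan", "gmm")


def missing_algorithm_dirs(experiment: Path) -> list[str]:
    """Return the expected algorithm directories that are absent.

    Hypothetical helper for illustration; `experiment` is whatever path
    holds the clustering output on your machine.
    """
    return [name for name in EXPECTED_DIRS if not (experiment / name).is_dir()]
```

An empty list means that all four directories are present.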

Each directory has subdirectories named after the datasets: `e5`, `embedv3`,
`scores+sbert`, etc. Each subdirectory should contain a `clusters` directory
with files listing the clustered repositories. Each file is named after its
cluster identifier (for instance, `0.txt` for cluster `0`) and lists
repositories in `OWNER/REPO` format, one per line:

```text
Faceplugin-ltd/FaceRecognition-LivenessDetection-Android
LxxxSec/CTF-Java-Gadget
flutter-youni/flutter_youni_gromore
ax1sX/RouteCheck-Alpha
darksolopic/PasswordManagerGUI
borjavb/bq-lineage-tool
...
```
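
To inspect the results programmatically, the cluster files can be read back
into a dictionary. The sketch below relies only on the layout described above
(one `.txt` file per cluster, one `OWNER/REPO` per line). The function name
`load_clusters` is hypothetical and not part of the project.

```python
from pathlib import Path


def load_clusters(clusters_dir: Path) -> dict[str, list[str]]:
    """Map each cluster identifier (the file stem) to its repositories.

    Hypothetical helper for illustration: reads every `*.txt` file in
    `clusters_dir` and skips blank lines.
    """
    clusters: dict[str, list[str]] = {}
    for path in sorted(clusters_dir.glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        clusters[path.stem] = [line.strip() for line in lines if line.strip()]
    return clusters
```

For example, `load_clusters(Path("experiment/kmeans/e5/clusters"))` would
return the KMeans clusters for the `e5` dataset, assuming the command is run
from the directory that contains `experiment`.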

## How to contribute

Make sure that you have [Python 3.10+], [just], and [npm] installed on your
@@ -194,3 +236,8 @@ just full
[S-BERT]: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
[E5]: https://huggingface.co/intfloat/e5-large
[Embedv3]: https://cohere.com/blog/introducing-embed-v3
[KMeans]: https://en.wikipedia.org/wiki/K-means_clustering
[Cluster analysis]: https://en.wikipedia.org/wiki/Cluster_analysis
[Agglomerative clustering]: https://en.wikipedia.org/wiki/Hierarchical_clustering
[DBSCAN]: https://en.wikipedia.org/wiki/DBSCAN
[GMM]: https://en.wikipedia.org/wiki/Mixture_model
