@@ -3,3 +3,5 @@ Cargo.lock
.vscode/
/**/results/
docker/query.sh

src/tests/output
This file was deleted.
@@ -0,0 +1,5 @@
data/*
!data/raw

pyvenv
*/__pycache__
@@ -0,0 +1,76 @@
# Vinted data scraper

A set of Python data scrapers that extract static information from Vinted's web source code, making them a natural complement to a Vinted API wrapper.

## Install & Run

### Requirements

- Python 3
- pip

1. Create and activate a virtual environment:

```bash
python3 -m venv pyvenv/
source pyvenv/bin/activate
```

2. Install the dependencies listed in `requirements.txt`:

```bash
pip3 install -r requirements.txt
```

3. Run the scrapers with `python3 main.py`.

## Extracted data categories

| Element             | Fields Returned                                     |
| ------------------- | --------------------------------------------------- |
| Brands              | Names, Ids                                          |
| Materials           | Id, Name                                            |
| Colors              | Id, Color, Hex Code                                 |
| Sizes               | Id, Title, Size_Type, Category_id                   |
| Categories          | Id, Title, Code, Parent Id, URL, URL EN, Item Count |
| Categories Children | Category Id, Child Id                               |
| Countries           | Id, French_name, local_name, ISO_code, flag_emoji   |

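These tables are meant to be joined against the numeric ids that an API wrapper returns. A minimal sketch of that lookup, assuming a brands-style CSV with `id` and `name` headers; the header names and sample rows below are placeholders, not the scraper's actual output or real Vinted data:

```python
import csv
import io

# Placeholder CSV text; the real file name, headers, and rows are assumptions.
SAMPLE_CSV = """id,name
1,Brand A
2,Brand B
"""


def load_lookup(csv_text: str, key: str = "id", value: str = "name") -> dict[int, str]:
    """Build an id -> name mapping from CSV text, reading columns by header name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row[key]): row[value] for row in reader}


brands = load_lookup(SAMPLE_CSV)
# Translate ids as they might come back from an API wrapper; unknown ids fall back.
print([brands.get(i, "<unknown>") for i in (2, 1, 999)])
```

Reading by header name with `csv.DictReader` keeps the lookup independent of column order, which matters because the table above lists the Brands fields as "Names, Ids" while the other elements put the Id first.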
### Materials and sizes

- Available languages: 🇪🇸 🇫🇷 🇺🇸 (Spanish, French, English)
- More languages can be added by including the corresponding HTML file in `data/raw/materials` or `data/raw/sizes`.

### Categories (catalogs)

- **Debug mode:** builds the full category decision tree.
- **Exec mode:** outputs two CSV files:
  - `categories.csv`: a table of all available categories and their attributes.
  - `categories_children.csv`: the relationship between each category and its list of child categories.

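Because `categories_children.csv` stores the hierarchy as parent/child links, the tree has to be rebuilt before it can be walked. A sketch, assuming the rows have already been read into `(category_id, child_id)` integer pairs; the ids below are made up, and the hierarchy is assumed to be acyclic:

```python
from collections import defaultdict

# Hypothetical (category_id, child_id) pairs; real ids come from the CSV.
PAIRS = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]


def build_children(pairs):
    """Group child ids under their parent id: {parent: [child, ...]}."""
    children = defaultdict(list)
    for parent, child in pairs:
        children[parent].append(child)
    return children


def descendants(children, root):
    """Return every category below `root` in depth-first pre-order.

    Uses an explicit stack instead of recursion; assumes no cycles.
    """
    result = []
    stack = list(reversed(children.get(root, [])))
    while stack:
        node = stack.pop()
        result.append(node)
        # Push children reversed so they are visited in their original order.
        stack.extend(reversed(children.get(node, [])))
    return result


tree = build_children(PAIRS)
print(descendants(tree, 1))  # [2, 4, 5, 3, 6]
```

An iterative walk avoids Python's recursion limit on deep trees; if the data could ever contain a cycle, a `visited` set would be needed.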
## Performance

Timings as reported by `time`:

- Without the brand search:

```bash
real    0m1,941s
user    0m1,225s
sys     0m0,037s
```

- Brand validation process:

```bash
real    14m14,211s
user    0m19,753s
sys     0m1,229s
```

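In the brand validation run, user plus sys CPU time is about 21 s out of more than 14 minutes of wall-clock time, so the process spends most of its time waiting, most likely on network I/O (an inference from the numbers, not something the timings prove). A small sketch that parses the `time` output above, including its comma decimal separator, and computes the CPU share:

```python
import re

# `time` output copied from the brand validation run above.
TIME_OUTPUT = """real 14m14,211s
user 0m19,753s
sys 0m1,229s"""


def parse_time(output: str) -> dict[str, float]:
    """Convert lines like 'real 14m14,211s' into seconds, keyed by label."""
    seconds = {}
    for line in output.splitlines():
        m = re.fullmatch(r"\s*(real|user|sys)\s+(\d+)m(\d+)[.,](\d+)s\s*", line)
        if m:
            label, mins, secs, frac = m.groups()
            seconds[label] = int(mins) * 60 + float(f"{secs}.{frac}")
    return seconds


t = parse_time(TIME_OUTPUT)
cpu_share = (t["user"] + t["sys"]) / t["real"]
print(f"CPU busy {cpu_share:.1%} of wall-clock time")  # CPU busy 2.5% of wall-clock time
```

Accepting either `.` or `,` as the decimal separator lets the same parser handle `time` output from different locales.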
## Authors

[Álvaro Cabo](https://github.com/alvarocabo)

[Pepe Márquez](https://github.com/pxp9)
Large diffs are not rendered by default.