Indexing - Ran out of memory? Resume Indexing? #146
Replies: 3 comments
-
Hey @jared252016, how's it going? Thanks for the discussion. I'm glad that you're considering pupyl for this. A quick workaround for the memory issue is to turn off extreme mode:

```python
from pupyl.search import PupylImageSearch

SEARCH = PupylImageSearch(extreme_mode=False)
```

but a better solution was just proposed on #148. Databases are stored on the path defined by the `data_dir` parameter of `PupylImageSearch(data_dir: str)` at instantiation. For instance:

```python
from pupyl.search import PupylImageSearch

SEARCH = PupylImageSearch(data_dir='~/pupyl')
```

will create the database under `~/pupyl`.
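For context, here's a minimal sketch combining both suggestions, assuming the `index()` method shown in pupyl's README; the directory paths are just placeholders:

```python
from pupyl.search import PupylImageSearch

# Keep the database in an explicit directory and disable extreme
# mode to reduce memory pressure during indexing.
SEARCH = PupylImageSearch(data_dir='~/pupyl', extreme_mode=False)

# Index a collection of images; pupyl accepts local or remote URIs.
SEARCH.index('/path/to/images')
```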
-
Hi @jared252016, the first issue that you reported, about #147, has been addressed in #149.
-
@jared252016, the resume indexing feature was merged and it's part of the new release.
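In case it helps, resuming would then look something like this: a minimal sketch, assuming the feature works by re-opening the same `data_dir` (paths are placeholders):

```python
from pupyl.search import PupylImageSearch

# Point at the same data_dir used by the interrupted run; the
# existing database is re-opened instead of created from scratch.
SEARCH = PupylImageSearch(data_dir='~/pupyl')

# Re-running index() continues over the collection, skipping items
# that were already indexed (assumption based on this thread).
SEARCH.index('/path/to/images')
```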
-
A few questions about using Pupyl...
I'm looking at using this tool to start a free reverse image search service. I had been running my crawler for quite some time, looking into ways to compare images beyond plain hashes, when I stumbled upon this project. I have about 7 TB of media, much of which is probably video rather than strictly pictures, but when I tried to index just one folder it ran out of memory. The last count was 263,154 items. That took approximately 3 days, so I definitely don't want to start from scratch.
So, as the title says: does this resume indexing where it left off? And where is the actual database stored? Surely it uses more than just files scattered across hundreds of folders?
Also, is there any way to improve performance?
If this takes off, I will happily donate to the pupyl project. I'm not able to use other reverse image search APIs, since the sites I'm indexing are unique and often not indexed by the others.