Slow Import #2
Comments
@lunetics actually, when the import runs in smaller chunks it takes longer. I originally flushed the import every 100 entries, but the overall import took much longer than with the 'all at once' approach. By using one huge block we trade a large memory footprint for speed: the db indexes are rebuilt once and the import runs as a single transaction. |
I managed to cut the import time almost in half by using small batched inserts and detaching / clearing unused or old entities.
Without detach / clear: geonames:load:localities --no-debug -env=prod -v AF
With detach / clear: geonames:load:localities --no-debug -env=prod -v AF
Also, there is no (unique) index on the geonames_id column in MySQL. Adding one helps a lot, because otherwise the import slows down: there is a select on the id for every entry.
I just added this piece of code right before each iteration of the while loop here: https://github.com/Josiah/JJsGeonamesBundle/blob/master/Import/LocalityImporter.php#L619
'repository' => get_class($localityRepository),
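]);
// Every 200 lines, flush pending writes and release the processed entities to keep memory down.
if ($lineNumber % 200 == 0) {
    foreach ($managers as $manager) {
        $manager->flush();
        foreach ($entities as $entity) {
            $manager->detach($entity);
            // Note: Doctrine's clear() expects an entity class name rather than an instance; detach() above is what releases this entity.
            $manager->clear($entity);
            unset($entity);
        }
    }
    unset($entities);
}
} |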
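To illustrate the batching idea from the comment above, here is a minimal, self-contained sketch of "flush and clear every N lines". It is not the bundle's code: `BatchManager`, `InMemoryManager` and `importInBatches` are hypothetical names made up for this example, and the in-memory fake only mimics the `persist` / `flush` / `clear` calls so the snippet runs without a database.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for the entity manager calls this sketch relies on. */
interface BatchManager
{
    public function persist(object $entity): void;
    public function flush(): void;
    public function clear(): void;
}

/** In-memory fake (an assumption for this example) so it runs without a database. */
final class InMemoryManager implements BatchManager
{
    /** @var object[] entities persisted since the last flush */
    private array $pending = [];
    public int $flushCount = 0; // number of flush() calls
    public int $stored = 0;     // entities "written" so far
    public int $managed = 0;    // entities currently held in memory

    public function persist(object $entity): void
    {
        $this->pending[] = $entity;
        $this->managed++;
    }

    public function flush(): void
    {
        $this->stored += count($this->pending);
        $this->pending = [];
        $this->flushCount++;
    }

    public function clear(): void
    {
        $this->managed = 0;
    }
}

/**
 * Persist one entity per row, flushing and clearing every $batchSize rows
 * so that memory use stays bounded. Returns the number of rows processed.
 */
function importInBatches(iterable $rows, BatchManager $manager, int $batchSize = 200): int
{
    $lineNumber = 0;
    foreach ($rows as $row) {
        $lineNumber++;
        // Hypothetical entity shape: just the id and the name of the place.
        $entity = (object) ['geonameId' => (int) $row[0], 'name' => $row[1]];
        $manager->persist($entity);

        if ($lineNumber % $batchSize === 0) {
            $manager->flush(); // write the current batch
            $manager->clear(); // release the processed entities
        }
    }
    // Write whatever is left over from the last, partial batch.
    $manager->flush();
    $manager->clear();

    return $lineNumber;
}
```

One caveat when carrying this over to a real Doctrine entity manager: calling `clear()` without arguments detaches every managed entity, including reference entities loaded earlier in the import, so those would have to be re-fetched or detached selectively as in the snippet above.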
Interesting, I guess that I was wrong! Can you submit a PR? I'll merge it straight away. |
Still working on it; I'm looking to improve this already great bundle a little more. You are very advanced, though, and I still need to understand how your structuring of repositories works :) Another way could be to load the file raw into MySQL via LOAD DATA INFILE and process / link the entities afterwards (LOAD DATA INFILE is really fast). I'm also looking to load the alternate names into the database and to have some unified way to interact with geonames_ids. |
I experimented a little. Shouldn't it be possible to import / save in smaller chunks, so that the entity manager could be cleared every xy parsed entries? That should speed the import up.
Any idea?