Slow Import #2

Open
lunetics opened this issue Nov 13, 2013 · 4 comments

@lunetics
Collaborator

I experimented a bit. Shouldn't it be possible to import/save in smaller chunks, so that the EntityManager could be cleared every N parsed entries? That should speed up the import.

Any idea?

@ghost ghost assigned Josiah Nov 13, 2013
@Josiah
Owner

Josiah commented Nov 13, 2013

@lunetics Actually, importing in smaller chunks takes longer. I originally flushed the import every 100 entries, but the overall import took much longer than with the 'all at once' approach.

By using one huge block, we trade a large memory footprint for speed: the database indexes are rebuilt only once, and the whole import runs as a single transaction.
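For reference, the chunked strategy being debated here boils down to the loop below. This is a minimal, framework-free PHP sketch: importInBatches is a hypothetical helper (not part of the bundle), and the two callbacks stand in for Doctrine's persist() and flush() followed by clear()/detach(). The batch size is the knob: a larger value means fewer flushes but a bigger unit of work held in memory, which is the trade-off measured in this thread.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pass each row to $persist, and call $flushAndClear
 * after every $batchSize rows plus once more for a trailing partial batch.
 * Returns the number of flushes performed.
 */
function importInBatches(iterable $rows, int $batchSize, callable $persist, callable $flushAndClear): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be >= 1');
    }

    $pending = 0;
    $flushes = 0;
    foreach ($rows as $row) {
        $persist($row);
        if (++$pending === $batchSize) {
            $flushAndClear();
            $flushes++;
            $pending = 0;
        }
    }

    if ($pending > 0) {
        // Write the final, smaller batch.
        $flushAndClear();
        $flushes++;
    }

    return $flushes;
}

// Stand-in callbacks; with Doctrine these would wrap $em->persist($entity)
// and $em->flush() followed by $em->clear() or $em->detach($entity).
$buffer = [];
$written = [];
$flushes = importInBatches(
    range(1, 450),
    200,
    function (int $row) use (&$buffer): void {
        $buffer[] = $row;
    },
    function () use (&$buffer, &$written): void {
        $written = array_merge($written, $buffer);
        $buffer = []; // analogous to clearing the unit of work
    }
);
```

With 450 rows and a batch size of 200, the helper flushes three times (200, 200 and a final 50).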

@lunetics
Collaborator Author

I managed to cut the import time by about a third by using small batched inserts and detaching/clearing old entities that are no longer needed:

Without detach / clear

geonames:load:localities --no-debug -env=prod -v AF
AF (Afghanistan) data saved
Imported in 59.005122 seconds.

With detach / clear

geonames:load:localities --no-debug -env=prod -v AF
AF (Afghanistan) data saved
Imported in 39.573471 seconds.

Also, there is no (unique) index on the geonames_id column in MySQL. Adding one helps a lot: the import runs a SELECT on that ID for every entry, so without an index it keeps slowing down as the table grows.
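If the entities are mapped with Doctrine annotations, the index can be declared in the mapping instead of being added by hand. A sketch under assumptions: the Locality class, the locality table and the property names below are hypothetical stand-ins, not the bundle's actual mapping; unique=true on @ORM\Column is standard Doctrine ORM mapping and makes the schema tool create a UNIQUE index on that column.

```php
<?php
use Doctrine\ORM\Mapping as ORM;

/**
 * Hypothetical entity: class, table and property names are illustrative only.
 *
 * @ORM\Entity
 * @ORM\Table(name="locality")
 */
class Locality
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue
     */
    private $id;

    /**
     * unique=true lets the schema tool emit a UNIQUE index, so the per-entry
     * lookup by geonames ID can use the index instead of scanning the table.
     *
     * @ORM\Column(name="geonames_id", type="integer", unique=true)
     */
    private $geonamesId;
}
```

Running php app/console doctrine:schema:update --dump-sql should then list the index statement, which can be applied with --force.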

I added the following code at the end of each iteration of the while loop here:

https://github.com/Josiah/JJsGeonamesBundle/blob/master/Import/LocalityImporter.php#L619

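                'repository'   => get_class($localityRepository),
            ]);

            // Every 200 lines: write the pending inserts, then detach the
            // entities persisted since the last batch ($entities is assumed
            // to collect them) so the unit of work stays small.
            if ($lineNumber % 200 == 0) {
                foreach ($managers as $manager) {
                    $manager->flush();
                    foreach ($entities as $entity) {
                        // detach() is a no-op for entities this manager does not
                        // manage. clear() expects a class name, not an entity,
                        // so it is not called per entity here.
                        $manager->detach($entity);
                    }
                }

                // Reset the batch; unset($entity) inside the loop would only
                // drop the loop variable, not the array entry.
                $entities = [];
            }
        }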

@Josiah
Owner

Josiah commented Nov 13, 2013

Interesting, I guess that I was wrong!

Can you submit a PR? I'll merge it straight away.

@lunetics
Collaborator Author

Still working on it; I'm looking to improve this already great bundle a bit more. Your code is quite advanced, though, and I still need to understand how you structured the repositories :)

Another option could be to load the file directly into MySQL with LOAD DATA INFILE and process/link the entities afterwards (LOAD DATA INFILE is extremely fast).
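A rough sketch of the first half of that idea, assuming a hypothetical staging table named geonames_staging whose columns mirror the 19 tab-separated fields of a geonames country dump. The helper below only builds the LOAD DATA LOCAL INFILE statement; linking the staged rows to entities afterwards (for example with INSERT ... SELECT) is not shown. Columns prefixed with @ are read into user variables and therefore skipped. LOCAL INFILE also has to be allowed on both the MySQL server (local_infile) and the client (PDO::MYSQL_ATTR_LOCAL_INFILE).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build a LOAD DATA statement for a tab-separated dump.
 * $quotedPath must already be an SQL string literal (e.g. from PDO::quote()),
 * and $table is assumed to be a trusted, constant identifier.
 */
function buildGeonamesLoadSql(string $quotedPath, string $table, array $columns): string
{
    return sprintf(
        "LOAD DATA LOCAL INFILE %s INTO TABLE %s CHARACTER SET utf8mb4 "
        . "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' (%s)",
        $quotedPath,
        $table,
        implode(', ', $columns)
    );
}

// Field order of a geonames country dump; "@" columns are discarded.
$columns = [
    'geonames_id', 'name', 'ascii_name', '@alternate_names', 'latitude', 'longitude',
    'feature_class', 'feature_code', 'country_code', '@cc2', 'admin1_code',
    'admin2_code', '@admin3_code', '@admin4_code', 'population', '@elevation',
    '@dem', 'timezone', 'modification_date',
];
$sql = buildGeonamesLoadSql("'/tmp/AF.txt'", 'geonames_staging', $columns);

// Against a real server (not executed here), roughly:
// $pdo = new PDO($dsn, $user, $pass, [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);
// $rows = $pdo->exec(buildGeonamesLoadSql($pdo->quote($path), 'geonames_staging', $columns));
```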

I'm also looking into loading the alternate names into the database and providing a unified way to work with geonames_id values.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants