
Caching Resume Downloads #77

Open
RedbackThomson opened this issue Aug 31, 2018 · 12 comments
Labels
enhancement New feature or request

Comments

@RedbackThomson
Contributor

Most sponsors will want to download all of the resumes without applying any filters. It would be nice to hash the list of IDs against a pre-prepared version of the .zip so that if two sponsors both try to download all the resumes, they only have to be downloaded and zipped once. This could either be done in a new Mongo collection, or we could add something like a Redis server to the stack (although that would be cleared on server restart).
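A minimal sketch of that keying idea (in TypeScript, since TS comes up later in the thread; the function name is hypothetical):

```ts
import { createHash } from 'crypto';

// Build a deterministic cache key from the requested user IDs.
// Sorting first means the same set of IDs always hashes the same,
// no matter what order the client sends them in.
function zipCacheKey(userIds: string[]): string {
  const canonical = [...userIds].sort().join(',');
  return createHash('sha256').update(canonical).digest('hex');
}
```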

@RedbackThomson RedbackThomson added the enhancement New feature or request label Aug 31, 2018
@kanhegaonkarsaurabh kanhegaonkarsaurabh self-assigned this Sep 3, 2018
@kanhegaonkarsaurabh
Contributor

@RedbackThomson Okay, this seems interesting, but I don't think I quite understand the problem we're trying to solve. I get the solution options we have, but what does this solve exactly?

@RedbackThomson
Contributor Author

So at the moment, if a sponsor wants to zip up all the resumes, the server has to go through the process of downloading them all, zipping them, and uploading the zip back to S3. Not only does this take a lot of time, but it is expensive for the server to be doing constantly. Since most sponsors will just want "all the resumes", we should have that zip file ready at all times to serve to them.
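For reference, a rough sketch of the flow being described (bucket names and key layout are assumptions, not what the codebase actually uses):

```ts
import archiver from 'archiver';
import { S3 } from 'aws-sdk';
import { PassThrough } from 'stream';

// Pull each resume down from S3, stream them into a zip, and upload
// the archive back to S3. This is the work the cache would avoid.
async function buildResumeZip(s3: S3, userIds: string[], zipKey: string): Promise<void> {
  const archive = archiver('zip');
  const body = new PassThrough();
  archive.pipe(body);

  // Start the upload while the archive is still being written.
  const upload = s3.upload({ Bucket: 'resume-zips', Key: zipKey, Body: body }).promise();

  for (const id of userIds) {
    const obj = await s3.getObject({ Bucket: 'resumes', Key: `${id}.pdf` }).promise();
    archive.append(obj.Body as Buffer, { name: `${id}.pdf` });
  }
  await archive.finalize();
  await upload;
}
```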

@kanhegaonkarsaurabh
Contributor

Ahhh, got it. So I think we can use a Redis-based cache layer with Node and MongoDB. But as far as I remember, Redis values have an upper limit of 512 MB for optimized performance. Would the zip be smaller than that? As a rough estimate, if each PDF is about 2 MB, we could cache around 250 resumes at a time.
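If we went the Redis route, the cache write might look roughly like this (a sketch only, assuming the node-redis v4 client; the key prefix and TTL are made up):

```ts
import { createClient } from 'redis';

// Store a built zip under its ID-set hash. The TTL keeps stale
// archives from sitting in memory forever (one hour is arbitrary).
async function cacheZip(key: string, zip: Buffer): Promise<void> {
  const client = createClient();
  await client.connect();
  await client.set(`resume-zip:${key}`, zip, { EX: 60 * 60 });
  await client.quit();
}
```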

@kanhegaonkarsaurabh
Contributor

Oh, and do we have a size limit on uploaded resumes? I'll check the codebase, but maybe you know it off the top of your head.

@RedbackThomson
Contributor Author

I agree that we could do a Redis cache layer, but it seems like a lot of work and overhead (we would now have to host and manage a Redis instance, albeit probably just a Heroku add-on), so it's not something I would choose without a lot of consideration. Maybe if we could use the Redis layer for other things then it would be worth it, but for now this is the only use case.

@RedbackThomson
Contributor Author

limits: {fileSize: 5 * 1024 * 1024}

Currently we have a 5MB upload limit.
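For context, that looks like a multer option; a minimal sketch of where it would sit (assuming multer handles the upload, and the field name is hypothetical):

```ts
import multer from 'multer';

// Reject any resume upload larger than 5 MB; multer aborts the
// request with a LIMIT_FILE_SIZE error when the cap is exceeded.
const upload = multer({
  limits: { fileSize: 5 * 1024 * 1024 },
});

// Typical usage on an Express route:
// app.post('/resume', upload.single('resume'), handler);
```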

@RedbackThomson
Contributor Author

RedbackThomson commented Sep 4, 2018

@subhankar-panda I think if we were to start doing TS stuff, this would be a good first project. I can't see it needing to modify many of the other files.

@kanhegaonkarsaurabh
Contributor

True. Redis is a big integration for a single task like this. I mean, creating a new Mongo collection and persisting the resumes there also sounds like an efficient and doable approach.

@kanhegaonkarsaurabh
Contributor

So I'll get started on this issue then. If you and @subhankar-panda want me to use TS for these files, I can read up on it and get started. I don't know the codebase in depth, so I don't really have anything to say on whether we should use TS for this or not.

@subhankar-panda
Contributor

This seems like a fairly isolated feature and is a good opportunity to get our hands dirty with TS - go for it.

@subhankar-panda
Contributor

How / when would the zip file be updated?

@RedbackThomson
Contributor Author

I was thinking that we could take a hash of all the user IDs (the user sends all the IDs in their request), and if we don't have that hash already cached, then we build the zip as normal.
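Tying the pieces together, a hedged sketch of that check-then-build flow (reusing the hypothetical `zipCacheKey` and `buildResumeZip` helpers sketched above, with a made-up `resume_zips` collection and URL scheme):

```ts
import { Db } from 'mongodb';
import { S3 } from 'aws-sdk';

// Serve the cached zip if this exact ID set has been built before;
// otherwise build it, record it, and return the fresh URL.
async function handleDownload(db: Db, s3: S3, userIds: string[]): Promise<string> {
  const key = zipCacheKey(userIds);
  const cached = await db.collection('resume_zips').findOne({ key });
  if (cached) {
    return cached.url;
  }

  const zipKey = `zips/${key}.zip`;
  await buildResumeZip(s3, userIds, zipKey);
  const url = `https://resume-zips.s3.amazonaws.com/${zipKey}`; // assumed URL scheme
  await db.collection('resume_zips').insertOne({ key, url });
  return url;
}
```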
