Updating data_acquisition.py to post to job_server #220

Open
wants to merge 4 commits into
base: develop

frostyshadows
Collaborator

🔨 Changes

Instead of adding jobs directly to the database, the script now sends a POST request to the job server, which adds them. This way we can run the script on AWS Lambda.
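A minimal sketch of the approach described above, assuming the job server accepts one job per JSON POST. The /jobs path, the function names build_job_request and post_job, and the payload shape are illustrative assumptions, not taken from this PR.

```python
import json
import urllib.request

# Assumed endpoint: the PR only states that the job server runs on localhost:5000.
JOB_SERVER_URL = "http://localhost:5000/jobs"  # hypothetical path


def build_job_request(job, url=JOB_SERVER_URL):
    """Build a JSON POST request for one job without sending it."""
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_job(job, url=JOB_SERVER_URL):
    """Send the job to the job server and return the HTTP status code."""
    with urllib.request.urlopen(build_job_request(job, url), timeout=10) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the payload easy to check without a running server.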

:squirrel: Testing instructions

Have the job server running on localhost:5000. Run the script (pipenv run python3 data_acquisition.py) and confirm that jobs are added to your local database.

📄 Relevant screenshots or documentation links

📋 Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Collaborator

@cowmanjoe cowmanjoe left a comment

Looks good! Left a comment. Also, I think it would be good if there was a log message indicating how many jobs were inserted.
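One possible shape for that log message, as a sketch only. It assumes the script collects the HTTP status code of each POST and that a 2xx response means the job was inserted; the logger name and the log_insert_summary helper are hypothetical.

```python
import logging

logger = logging.getLogger("data_acquisition")  # hypothetical logger name


def log_insert_summary(status_codes):
    """Log and return how many POSTs got a 2xx response from the job server."""
    # Assumption: any 2xx status means the job server inserted the job.
    inserted = sum(1 for code in status_codes if 200 <= code < 300)
    logger.info("Inserted %d of %d jobs", inserted, len(status_codes))
    return inserted
```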

One concern I have, which you can't really address here, is that the ZipRecruiter jobs seem to come back with a different URL every time for the same job. It appears they are tagging the URL with some kind of unique ID, maybe because they want to count the number of clicks from that link. This is a concern because it undermines our plan to block duplicate jobs with the unique link index. I'm not sure how we get around this; maybe some analysis on the other fields. Anyway, it's out of the scope of this PR.
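As a rough illustration of "analysis on the other fields", duplicates could be detected with a key derived from fields that stay stable across requests instead of the link. This is a sketch under assumptions: the choice of title, company name, and location as the identifying fields is a guess, and job_fingerprint is a hypothetical helper, not part of this PR.

```python
import hashlib


def job_fingerprint(title, company_name, location):
    """Hash normalized fields so the same posting maps to the same key."""
    # Lowercase and collapse whitespace so cosmetic differences do not matter.
    parts = [" ".join(str(p).lower().split()) for p in (title, company_name, location)]
    # Join with a separator that is unlikely to occur inside the fields themselves.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

A unique index on such a fingerprint would ignore the changing URL, at the cost of also rejecting genuinely distinct postings that share all three fields.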

"longitude": 0.0,
"company_name": job["hiring_company"]["name"],
"start_date": None,
"salary_min": job["salary_min"]
Collaborator

I think we should use salary_min_annual here, because I believe salary_min can be hourly or monthly.
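The suggested change might look like the following sketch. It assumes the ZipRecruiter response carries a salary_min_annual key, as the comment implies, and uses a hypothetical salary_fields helper; a missing key yields None rather than an error.

```python
def salary_fields(job):
    """Build the salary part of the payload from the annualized figure."""
    # Assumption: salary_min_annual is always a yearly amount, unlike salary_min.
    return {"salary_min": job.get("salary_min_annual")}
```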

2 participants