
Add support for proxies #976

Closed
3 tasks done
Siddhesh-Agarwal opened this issue May 15, 2024 · 8 comments · Fixed by #997 or #1002
Assignees
Labels
gssoc GSSoC 2024

Comments

@Siddhesh-Agarwal
Contributor

Describe the feature

As a web scraping library, it would be useful to support proxies when sending requests. I propose a RequestConfig class where we can set the request timeout, proxies, redirect behaviour, etc.

Add Screenshots

The class would look something like this:

class RequestConfig:
    timeout: int = 10           # request timeout in seconds
    proxy: dict[str, str] = {}  # scheme -> proxy URL mapping
    redirect: bool = True       # whether to follow redirects

It should also be passable to the different scrapers.
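A minimal sketch of how the proposed class could be instantiated. Using a dataclass with a `default_factory` avoids sharing one mutable dict across instances; the names mirror the sketch above and are not the library's final API:

```python
from dataclasses import dataclass, field


@dataclass
class RequestConfig:
    timeout: int = 10                                     # seconds
    proxy: dict[str, str] = field(default_factory=dict)   # scheme -> proxy URL
    redirect: bool = True                                 # follow redirects


# Per-call overrides; the proxy mapping follows requests' scheme -> URL form.
config = RequestConfig(timeout=5, proxy={"https": "http://10.0.0.1:8080"})
```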

Record

  • I agree to follow this project's Code of Conduct
  • I'm a GSSoC'24 contributor
  • I want to work on this issue

Hi there! Thanks for opening this issue. We appreciate your contribution to this open-source project. We aim to respond or assign your issue as soon as possible.

@nikhil25803
Member

Go ahead @Siddhesh-Agarwal

Note

  • Please create a separate module for this, following the folder and project structure (if one already exists, just add your features as functions in the same module).
  • Do not use the Selenium web driver, as it is not compatible with all devices and cloud platforms.
  • Before making any changes, check whether the module you want to add already exists. If it does, add your functionality as a method only; otherwise, make a separate module and class for it.

All the best 👨‍💻

@Siddhesh-Agarwal
Contributor Author

@nikhil25803 Here is how I am thinking of solving this problem:

  • Create a config folder in src/scrape_up
  • Create the class in that folder

Here is the part I am unsure about:

  • Should I add a config parameter to all scraping classes? If not, how else could the config be used?
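One way the first option could look, as a hypothetical sketch (`ExampleScraper` and this cut-down `RequestConfig` are illustrative names only, not scrape_up's actual API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestConfig:
    timeout: int = 10  # seconds


class ExampleScraper:
    """Each scraper accepts an optional config and falls back to defaults."""

    def __init__(self, config: Optional[RequestConfig] = None):
        self.config = config or RequestConfig()


scraper = ExampleScraper(RequestConfig(timeout=5))
```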

@nikhil25803
Member

Yes, go ahead. But for now, add the request timeout feature only.

@Siddhesh-Agarwal
Contributor Author

Hey, rewriting every request in the project each time a feature is added doesn't seem very maintainable. Instead, I am adding a new get function that wraps requests.get() and accepts a RequestConfig.
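A sketch of what such a wrapper might look like, assuming the requests library and the proposed RequestConfig (the field names here are from the sketch above, not the merged implementation):

```python
from dataclasses import dataclass, field
from typing import Optional

import requests


@dataclass
class RequestConfig:
    timeout: int = 10                                     # seconds
    proxy: dict[str, str] = field(default_factory=dict)   # scheme -> proxy URL
    redirect: bool = True                                 # follow redirects


def get(url: str, config: Optional[RequestConfig] = None) -> requests.Response:
    """Single entry point: scrapers call this instead of requests.get directly."""
    config = config or RequestConfig()
    return requests.get(
        url,
        timeout=config.timeout,
        proxies=config.proxy or None,   # empty dict -> let requests use defaults
        allow_redirects=config.redirect,
    )
```

With this indirection, new config fields only require changing the wrapper, not every call site.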

@Siddhesh-Agarwal mentioned this issue May 17, 2024
@Siddhesh-Agarwal
Contributor Author

@nikhil25803 I have added support for the timeout, allow_redirect, and headers parameters for now; I will add support for proxies if you think everything up to this point is fine.

@Siddhesh-Agarwal
Contributor Author

Hey @nikhil25803, can I create a PR for proxy support as well? It is a small 7-8 line addition in one file.
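For reference, requests already takes proxies as a scheme-to-URL mapping, so the addition can be as small as forwarding one field. A sketch (the proxy addresses below are placeholders):

```python
import requests

# requests expects a mapping of URL scheme -> proxy URL (placeholder addresses).
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

session = requests.Session()
# Every request made on this session is now routed through the proxies.
session.proxies.update(proxies)
```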

@nikhil25803
Member

Yeah go ahead @Siddhesh-Agarwal
