Scraper-specific breaking changes:
- Proxy usage is now required.
## FAQ
### What is Web Scraper and what can it do?
Web Scraper is a versatile tool for extracting structured data from web pages using JavaScript code. It loads web pages in a browser, renders dynamic content, and allows you to extract data that can be stored in various formats such as JSON, XML, or CSV.
### How can I use Web Scraper?
You can use Web Scraper either manually through a user interface or programmatically [using the API](https://apify.com/apify/web-scraper/api). To get started, specify the web pages to load and provide a JavaScript function, called the Page function, that extracts data from those pages.
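
The following is a minimal sketch of the request that starts a run programmatically. It assumes the Apify API v2 endpoint `POST /v2/acts/{actorId}/runs`, the actor ID `apify~web-scraper`, and an API token passed as the `token` query parameter. The helper `buildRunRequest` is hypothetical, and the input field names should be checked against the actor's input schema.

```javascript
// Hypothetical helper: builds (but does not send) the HTTP request that
// starts a Web Scraper run through the Apify API v2.
function buildRunRequest(input, token) {
  // Assumed endpoint: POST /v2/acts/{actorId}/runs, where "~" separates
  // the username from the actor name in the actor ID.
  const url = new URL('https://api.apify.com/v2/acts/apify~web-scraper/runs');
  url.searchParams.set('token', token);
  return {
    url: url.toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // The request body is the actor input serialized as JSON.
      body: JSON.stringify(input),
    },
  };
}

// Example input: the pages to load plus the Page function as a string.
const input = {
  startUrls: [{ url: 'https://example.com' }],
  pageFunction: `async function pageFunction(context) {
    return { url: context.request.url };
  }`,
};

const { url, init } = buildRunRequest(input, 'MY_API_TOKEN');
// To start the run (Node.js 18+ provides a global fetch):
// const response = await fetch(url, init);
```

Keeping request construction separate from sending it makes the input easy to inspect before any run is started.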
### What are the costs associated with using Web Scraper?
The average usage cost for Web Scraper can be found on the pricing page under the [Detailed pricing breakdown](https://apify.com/pricing) section. The cost estimates are based on averages and may vary depending on the complexity of the pages you scrape.
### Are there any limitations to using Web Scraper?
Web Scraper is designed to be user-friendly and generic, which can make it slower and less flexible than more specialized solutions. It runs a resource-intensive Chromium browser, and it supports only client-side JavaScript code.
### Can I control the crawling behavior of Web Scraper?
Yes. You can specify start URLs and define link selectors, glob patterns, and pseudo-URLs that determine which page links the scraper follows. This enables both recursive crawling of entire websites and targeted extraction of data from specific pages.
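
A sketch of these crawl settings follows, written as a JavaScript object in the shape of the Web Scraper input. The field names (`startUrls`, `linkSelector`, `globs`, `pseudoUrls`) are assumptions to verify in the actor's input schema. The `pseudoUrlToRegExp` helper is hypothetical: it only illustrates the pseudo-URL idea that text in square brackets is a regular expression and everything else matches literally. It is not Apify's implementation and may differ in details such as case sensitivity.

```javascript
// Crawl settings in the shape of the Web Scraper input (assumed field names).
const crawlInput = {
  startUrls: [{ url: 'https://www.example.com/' }],
  // CSS selector for the links the scraper considers following.
  linkSelector: 'a[href]',
  // Restrict followed links with a glob pattern...
  globs: [{ glob: 'https://www.example.com/products/**' }],
  // ...and/or a pseudo-URL, where [...] encloses a regular expression.
  pseudoUrls: [{ purl: 'https://www.example.com/category/[(\\w|-)+]' }],
};

// Hypothetical helper: converts a pseudo-URL into an anchored RegExp.
function pseudoUrlToRegExp(purl) {
  // Splitting with a capturing group keeps the [...] parts in the result.
  const parts = purl.split(/(\[[^\]]*\])/);
  const source = parts
    .map((part) =>
      part.startsWith('[') && part.endsWith(']')
        ? part.slice(1, -1) // regex part: use as-is
        : part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), // literal part: escape
    )
    .join('');
  return new RegExp(`^${source}$`);
}

const categoryPattern = pseudoUrlToRegExp(crawlInput.pseudoUrls[0].purl);
// categoryPattern.test('https://www.example.com/category/shoes-men') is true;
// '.../category/shoes/page2' does not match because "/" is outside (\w|-).
```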
### How can I extract data from web pages using Web Scraper?
To extract data from web pages, provide a JavaScript function called the Page function. Web Scraper executes this function in the context of each loaded web page, so you can use client-side libraries such as jQuery to query the DOM and extract the data you need.
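
Here is a minimal Page function sketch. It assumes jQuery injection is enabled in the input, so that `context.jQuery` is available, and that the function receives a `context` object exposing the current `request`. The selectors and output fields are illustrative only.

```javascript
// Minimal Page function sketch; runs in the browser for each loaded page.
async function pageFunction(context) {
  // Assumes jQuery injection is enabled, exposing jQuery on the context.
  const $ = context.jQuery;
  // Read text from the loaded page's DOM.
  const title = $('title').first().text().trim();
  const heading = $('h1').first().text().trim();
  // The returned object is stored as one item in the dataset.
  return {
    url: context.request.url,
    title,
    heading,
  };
}
```

Returning a plain object keeps the Page function focused on extraction, while Web Scraper handles storing the result.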
### Is it possible to use proxies with Web Scraper?
Yes, you can configure proxies for Web Scraper. You have the option to use [Apify Proxy](https://apify.com/proxy), custom HTTP proxies, or SOCKS5 proxies. Proxies can help prevent detection by target websites and provide additional anonymity.
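
The sketch below shows the two proxy options as input objects. It assumes the input field `proxyConfiguration` with the keys `useApifyProxy` and `proxyUrls`; verify these in the actor's input schema. The proxy hosts and credentials are placeholders, and `hasSupportedScheme` is a hypothetical pre-flight check limited to the proxy types named above.

```javascript
// Option 1: use Apify Proxy.
const withApifyProxy = {
  proxyConfiguration: { useApifyProxy: true },
};

// Option 2: use your own HTTP or SOCKS5 proxies (placeholder URLs).
const withCustomProxies = {
  proxyConfiguration: {
    useApifyProxy: false,
    proxyUrls: [
      'http://user:password@proxy1.example.com:8000',
      'socks5://proxy2.example.com:1080',
    ],
  },
};

// Hypothetical check: accept only the HTTP and SOCKS5 proxy schemes.
function hasSupportedScheme(proxyUrl) {
  return ['http:', 'socks5:'].includes(new URL(proxyUrl).protocol);
}
```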
### How can I handle authentication and login for websites with Web Scraper?
Web Scraper supports logging into websites by transferring cookies. You can set initial cookies in the “Initial cookies” field, which allows the scraper to use your session credentials. Cookies have a limited lifetime, so you may need to update them periodically.
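
One way to prepare the “Initial cookies” value is to convert a `Cookie` header copied from the browser's developer tools. The helper `cookieHeaderToInitialCookies` below is hypothetical, and the object shape (`name`, `value`, `domain`, `path`) is an assumption based on the Puppeteer cookie format; check the field's description in the actor input for the exact expected format.

```javascript
// Hypothetical helper: converts "name=value; name2=value2" into cookie
// objects for the "Initial cookies" field (assumed Puppeteer-style shape).
function cookieHeaderToInitialCookies(header, domain) {
  return header
    .split(';')
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes('='))
    .map((pair) => {
      // Split on the first "=" only, because cookie values may contain "=".
      const index = pair.indexOf('=');
      return {
        name: pair.slice(0, index),
        value: pair.slice(index + 1),
        domain,
        path: '/',
      };
    });
}

const initialCookies = cookieHeaderToInitialCookies(
  'session_id=abc123; theme=dark; token=a=b',
  '.example.com',
);
```

Because session cookies expire, a helper like this is most useful when you refresh the cookies periodically.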
### How can I customize the behavior of Web Scraper?
Web Scraper provides advanced configuration options, such as pre-navigation and post-navigation hooks, that let you fine-tune the scraper’s behavior and perform additional actions during the scraping process.
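
The following sketch shows what navigation hooks might look like. It assumes that pre-navigation hooks receive `(crawlingContext, gotoOptions)`, where `gotoOptions` is passed to the browser's navigation call, and that post-navigation hooks receive `(crawlingContext)` with `request` and `log` properties. In the actor input, the hooks fields hold such an array as JavaScript source text. The `waitUntil` value is a Puppeteer navigation option, and the timeout is illustrative.

```javascript
// Runs before each navigation: adjust how the page is loaded.
const preNavigationHooks = [
  async (crawlingContext, gotoOptions) => {
    // Wait until network activity settles before continuing.
    gotoOptions.waitUntil = 'networkidle2';
    // Allow slow pages up to 60 seconds (value in milliseconds).
    gotoOptions.timeout = 60000;
  },
];

// Runs after each navigation: for example, log which URL was loaded.
const postNavigationHooks = [
  async (crawlingContext) => {
    const { request, log } = crawlingContext;
    log.info(`Loaded ${request.url}`);
  },
];
```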
### How can I access and export the data scraped by Web Scraper?
The data scraped by Web Scraper is stored in a dataset. You can access and export this data in formats such as JSON, XML, CSV, or Excel. The results can be downloaded using the Apify API or through the Apify Console. Check out the Apify [API reference docs](https://docs.apify.com/api/v2) for full details.
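
As a sketch of downloading results through the API, the hypothetical helper below builds a URL for the Apify API v2 endpoint `GET /v2/datasets/{datasetId}/items`, using its `format` query parameter. `DATASET_ID` is a placeholder for the ID of the run's dataset, and the optional `token` is needed only when the dataset is not publicly accessible.

```javascript
// Hypothetical helper: builds a download URL for a dataset's items.
function datasetItemsUrl(datasetId, format, token) {
  const url = new URL(
    `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`,
  );
  // Format values include json, xml, csv, and xlsx (Excel).
  url.searchParams.set('format', format);
  if (token) url.searchParams.set('token', token);
  return url.toString();
}

const csvUrl = datasetItemsUrl('DATASET_ID', 'csv');
```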