Provide chromote based scraping (#362)
New `read_html_live()` reads HTML into a real, live, HTML browser, meaning that you can scrape HTML generated by JavaScript. It returns a `LiveHTML` object which you can also use to simulate user interactions with the page, like clicking, typing, and scrolling.

Fixes #245.
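
As an illustration of the workflow the commit message describes, here is a minimal sketch. It assumes an rvest build that includes this commit, the suggested chromote package, and a local Chrome or Chromium install. The helper name `scrape_live()` and its arguments are hypothetical and not part of the commit; only `read_html_live()` and the `html_elements()` method for `LiveHTML` come from the diff.

```r
# Hypothetical helper (not part of rvest): load a page in a live browser
# and return the text of the elements matching a CSS selector.
scrape_live <- function(url, css) {
  # chromote is only in Suggests, so check for it explicitly.
  if (!requireNamespace("rvest", quietly = TRUE) ||
      !requireNamespace("chromote", quietly = TRUE)) {
    stop("This sketch needs the rvest and chromote packages.")
  }

  # read_html_live() returns a LiveHTML object backed by a real browser,
  # so content rendered by JavaScript is available.
  page <- rvest::read_html_live(url)

  # Dispatches to the html_elements() method for LiveHTML that this
  # commit registers in NAMESPACE.
  nodes <- rvest::html_elements(page, css)

  rvest::html_text2(nodes)
}
```

A call such as `scrape_live("https://example.com", "h1")` (placeholder URL) would launch the browser session on demand. The interaction features mentioned above (clicking, typing, scrolling) are exposed on the `LiveHTML` object itself; their exact method names are not shown in this diff, so they are left out of the sketch.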
hadley authored Feb 1, 2024
1 parent ee093b1 commit 6b9b5da
Showing 22 changed files with 1,050 additions and 9 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -14,3 +14,4 @@
^LICENSE\.md$
^vignettes/articles$
^CRAN-SUBMISSION$
^data-raw$
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -23,6 +23,7 @@ Imports:
tibble,
xml2 (>= 1.3)
Suggests:
chromote,
covr,
knitr,
R6,
@@ -41,4 +42,4 @@ Config/testthat/parallel: true
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.0
+RoxygenNote: 7.3.1
5 changes: 5 additions & 0 deletions NAMESPACE
@@ -3,14 +3,17 @@
S3method(cookies,rvest_session)
S3method(format,rvest_field)
S3method(headers,rvest_session)
S3method(html_element,LiveHTML)
S3method(html_element,default)
S3method(html_element,rvest_session)
S3method(html_elements,LiveHTML)
S3method(html_elements,default)
S3method(html_elements,rvest_session)
S3method(html_form,rvest_session)
S3method(html_form,xml_document)
S3method(html_form,xml_node)
S3method(html_form,xml_nodeset)
S3method(html_table,LiveHTML)
S3method(html_table,rvest_session)
S3method(html_table,xml_document)
S3method(html_table,xml_node)
@@ -25,6 +28,7 @@ S3method(print,rvest_session)
S3method(read_html,rvest_session)
S3method(status_code,rvest_session)
export("%>%")
export(LiveHTML)
export(back)
export(follow_link)
export(forward)
@@ -50,6 +54,7 @@ export(is.session)
export(jump_to)
export(minimal_html)
export(read_html)
export(read_html_live)
export(repair_encoding)
export(session)
export(session_back)
5 changes: 5 additions & 0 deletions NEWS.md
@@ -1,5 +1,10 @@
# rvest (development version)

* New `read_html_live()` reads HTML into a real, live, HTML browser, meaning
that you can scrape HTML generated by javascript. It returns a `LiveHTML`
object which you can also use to simulate user interactions with the page,
like clicking, typing, and scrolling (#245).

* `html_table()` discards rows without cells (@epiben, #360).

# rvest 1.0.3