-
new_from($source)
Create a new HTMLScraper object from the passed source.
$source can be of type DOMNodeList, DOMNode or string.
Returns:
| Type | Description |
| --- | --- |
| array | When $source is an instance of DOMNodeList, returns an array of HTMLScraper objects. |
| HTMLScraper | When $source is an instance of DOMNode or a string. |
-
CSS_to_Xpath(string $path) : string
Translates a CSS selector to an XPath expression.
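To give a feel for what such a translation involves, here is a minimal sketch. It is not the library's implementation: `css_to_xpath_sketch` is a hypothetical helper that only understands a single compound selector (an optional tag name followed by `#id` and `.class` parts), and the real method may support far more.

```php
<?php
// Hypothetical helper, not the library's CSS_to_Xpath(): it only understands one
// compound selector made of an optional tag name plus #id and .class parts.
function css_to_xpath_sketch(string $selector): string
{
    $pattern = '/^([a-zA-Z][\w-]*|\*)?((?:[#.][\w-]+)*)$/';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        throw new InvalidArgumentException("Unsupported selector: $selector");
    }

    // No tag name means "any element".
    $tag = ($m[1] ?? '') !== '' ? $m[1] : '*';
    $xpath = '//' . $tag;

    // Turn each #id or .class part into an XPath predicate.
    preg_match_all('/([#.])([\w-]+)/', $m[2] ?? '', $parts, PREG_SET_ORDER);
    foreach ($parts as [, $kind, $name]) {
        if ($kind === '#') {
            $xpath .= "[@id='$name']";
        } else {
            // Match a whole class token, not a substring of another class name.
            $xpath .= "[contains(concat(' ', normalize-space(@class), ' '), ' $name ')]";
        }
    }
    return $xpath;
}

// Usage with PHP's built-in DOM extension.
$doc = new DOMDocument();
$doc->loadHTML('<ul><li class="item big">a</li><li class="items">b</li></ul>', LIBXML_NOERROR);
$xp = new DOMXPath($doc);
$matches = $xp->query(css_to_xpath_sketch('li.item'));
```

The `concat(' ', normalize-space(@class), ' ')` idiom is the usual way to test for a whole class token in XPath 1.0, so `li.item` does not accidentally match an element whose class is `items`.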
-
__toString() : string
Magic method that converts the HTMLScraper into a string containing the HTML code of the loaded document.
-
textContent() : string
Get the textContent of the loaded HTML document.
-
load_HTML_str(string $source, int $options = NULL) : bool
Load HTML from a string.
$options
Used for passing LIBXML constant flags. LIBXML_NOERROR | LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED is always applied (even when $options is NULL).
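The flag handling described above can be pictured with PHP's built-in `DOMDocument::loadHTML()`. This is only a sketch under the assumption that the caller's flags are OR-ed with the mandatory ones; `ALWAYS_APPLIED` and `merge_libxml_options` are hypothetical names, not part of HTMLScraper.

```php
<?php
// The three flags the description says are always applied.
const ALWAYS_APPLIED = LIBXML_NOERROR | LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED;

// Hypothetical helper: merge the caller's flags (or NULL) with the mandatory ones.
function merge_libxml_options(?int $options): int
{
    return ALWAYS_APPLIED | ($options ?? 0);
}

// Load an HTML fragment with only the mandatory flags in effect.
$doc = new DOMDocument();
$ok = $doc->loadHTML('<p>Hello</p>', merge_libxml_options(null));
```

With LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD set, libxml should not wrap the fragment in implied html/body elements or add a default doctype, which is typically what you want when scraping fragments.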
Returns TRUE on success and FALSE on failure.
-
load_HTML_file(string $filename, int $options = NULL, array $context = NULL) : bool
Load HTML from a file.
  - $options: see $options in HTMLScraper->load_HTML_str().
  - $context: see $context in stream_context_create().

Returns TRUE on success and FALSE on failure.
-
xpath(string $expr, int ...$items)
Get the DOMNode(s) that match the passed XPath path expression.
$items
Index of the DOMNode to be returned from the DOMNodeList matching the XPath path expression.
It is 0-indexed (i.e. to get the first node use 0, for the second node use 1, and so on).
Negative values can be used to reference list items from the end (i.e. use -1 for the last node, -2 for the second-to-last node, and so on).
If an invalid index is used, NULL is returned (e.g. if only two nodes match the XPath path expression, then using 3 will return NULL).
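The indexing rules above, together with the return types documented for this method, can be sketched on top of PHP's built-in DOMXPath. `pick_items_sketch` is a hypothetical function written for illustration; it mirrors the documented behaviour but is not taken from the library's source.

```php
<?php
// Hypothetical illustration of the ...$items rules: 0-based indexes,
// negative indexes counted from the end, NULL for out-of-range indexes.
function pick_items_sketch(DOMNodeList $list, int ...$items)
{
    if ($list->length === 0) {
        return null;                      // nothing matched
    }
    if (count($items) === 0) {
        return $list;                     // no index given: the whole list
    }
    $picked = [];
    foreach ($items as $i) {
        $index = $i < 0 ? $list->length + $i : $i;
        $inRange = $index >= 0 && $index < $list->length;
        $picked[] = $inRange ? $list->item($index) : null;
    }
    // One index yields a single node (or NULL); several yield an array.
    return count($picked) === 1 ? $picked[0] : $picked;
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>a</li><li>b</li><li>c</li></ul>', LIBXML_NOERROR);
$xp = new DOMXPath($doc);
$list = $xp->query('//li');

$last  = pick_items_sketch($list, -1);    // the last <li>
$mixed = pick_items_sketch($list, 0, 3);  // first <li>, then NULL (index 3 is out of range)
```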
Returns:
| Type | Description |
| --- | --- |
| NULL | When no nodes match the XPath path expression |
| DOMNodeList | When no ...$items are passed |
| DOMNode | When only one ...$items is passed |
| array | When more than one ...$items are passed. The array contains DOMNode or NULL |

Returns the DOMNodeList (or the DOMNode when an $items index is specified) that matches the specified XPath path expression.
-
querySelector(string $selector, int ...$items)
Same as HTMLScraper->xpath() except that it uses a CSS selector instead of an XPath path expression.
-
xpath_extract($mapper, string $expr, int ...$items)
Find DOMNode(s) in the same way as in HTMLScraper->xpath(), then extract data from the DOMNode(s) as specified by the $mapper.
$mapper
It can be any one of the strings specified below, or a function that takes a DOMNode and returns any extracted value.
| Mapper Value | Description |
| --- | --- |
| 'innerHTML' | Maps DOMNode to its innerHTML |
| 'outerHTML' | Maps DOMNode to its outerHTML |
| 'textContent' | Maps DOMNode to its textContent |
| 'textContentTrim' | Maps DOMNode to its textContent without any whitespace at the beginning or at the end |
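As an illustration of what the mapper values amount to, the following sketch applies a mapper to a single node using only PHP's DOM extension. `apply_mapper_sketch` is a hypothetical stand-in, and the way it serializes innerHTML and outerHTML is an assumption about the behaviour rather than the library's own code.

```php
<?php
// Hypothetical stand-in for the mapper handling described above.
function apply_mapper_sketch($mapper, DOMNode $node)
{
    if (is_string($mapper)) {
        switch ($mapper) {
            case 'outerHTML':
                // Serialize the node itself, including its own tag.
                return $node->ownerDocument->saveHTML($node);
            case 'innerHTML':
                // Serialize each child and concatenate the results.
                $html = '';
                foreach ($node->childNodes as $child) {
                    $html .= $node->ownerDocument->saveHTML($child);
                }
                return $html;
            case 'textContent':
                return $node->textContent;
            case 'textContentTrim':
                return trim($node->textContent);
        }
    }
    if (is_callable($mapper)) {
        return $mapper($node);            // custom extraction function
    }
    throw new InvalidArgumentException('Unknown mapper');
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="x">  Hi <b>there</b>  </div>', LIBXML_NOERROR);
$div = $doc->getElementsByTagName('div')->item(0);

$trimmed = apply_mapper_sketch('textContentTrim', $div);
$inner   = apply_mapper_sketch('innerHTML', $div);
$outer   = apply_mapper_sketch('outerHTML', $div);
$custom  = apply_mapper_sketch(function (DOMNode $n) { return $n->nodeName; }, $div);
```

The named mappers are checked before `is_callable()` so that a mapper string is never mistaken for the name of a global function.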
-
querySelector_extract($mapper, string $selector, int ...$items)
Same as HTMLScraper->xpath_extract() except that it uses a CSS selector instead of an XPath path expression.
-
innerHTML(DOMNode &$node) : string
Returns the innerHTML of the passed DOMNode.
-
outerHTML(DOMNode &$node) : string
Returns the outerHTML of the passed DOMNode.
-
xpath(DOMNode &$node, string $expr, int ...$items)
Similar to HTMLScraper->xpath() except that it works on a DOMNode instead of the HTMLScraper's DOMDocument.
-
querySelector(DOMNode &$node, string $selector, int ...$items)
Similar to DOMNodeHelper::xpath() except that it uses a CSS selector instead of an XPath path expression.
-
getChildNode(DOMNode &$node, int ...$indexes)
Get one or more child nodes of the DOMNode.
$indexes
See $items in HTMLScraper->xpath().
Returns:
| Type | Description |
| --- | --- |
| DOMNodeList | When no ...$indexes are passed |
| DOMNode | When only one ...$indexes is passed |
| array | When more than one ...$indexes are passed. The array contains DOMNode or NULL |
-
getChildElements(DOMNode &$node, int ...$indexes) : array
Same as DOMNodeHelper::getChildNode() except that it works on child elements instead of child nodes.
-
remove_self(DOMNode &$node)
Removes the DOMNode from its parent DOMDocument.
-
filter_child_elements_xpath(DOMNode &$node, string ...$exprs)
Removes the child elements of the passed DOMNode that match the passed XPath path expression(s).
-
filter_child_elements_querySelector(DOMNode &$node, string ...$selectors)
Removes the child elements of the passed DOMNode that match the passed CSS selector(s).
-
filter_child_elements_index(DOMNode &$node, int ...$indexes)
Removes the child elements of the passed DOMNode specified by the ...$indexes.
$indexes
See $items in HTMLScraper->xpath().
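Index-based removal has one subtlety worth illustrating: removing nodes while walking the live child list shifts the remaining indexes. The sketch below resolves all indexes against a snapshot first and only then removes. `remove_child_elements_by_index_sketch` is a hypothetical function built on PHP's DOM extension, and whether the library takes the same approach is an assumption.

```php
<?php
// Hypothetical illustration: remove child elements by index, resolving every
// index against a snapshot so earlier removals do not shift later ones.
function remove_child_elements_by_index_sketch(DOMNode $node, int ...$indexes): void
{
    // Snapshot of child elements only (text nodes and comments are skipped).
    $elements = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $elements[] = $child;
        }
    }

    $count = count($elements);
    $targets = [];
    foreach ($indexes as $i) {
        $index = $i < 0 ? $count + $i : $i;   // negative counts from the end
        if ($index >= 0 && $index < $count) {
            $targets[$index] = $elements[$index]; // keyed by index: duplicates collapse
        }
    }

    foreach ($targets as $element) {
        $node->removeChild($element);
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>a</li><li>b</li><li>c</li><li>d</li></ul>', LIBXML_NOERROR);
$ul = $doc->getElementsByTagName('ul')->item(0);

// Remove the first and the last <li>; out-of-range indexes are ignored here.
remove_child_elements_by_index_sketch($ul, 0, -1, 9);
```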