WWW::Mechanize::PhantomJS - automate the PhantomJS browser
use WWW::Mechanize::PhantomJS;
my $mech = WWW::Mechanize::PhantomJS->new();
$mech->get('http://google.com');
$mech->eval_in_page('alert("Hello PhantomJS")');
my $png= $mech->content_as_png();
-
autodie
Control whether HTTP errors are fatal.
autodie => 0, # make HTTP errors non-fatal
The default is to have HTTP errors fatal, as that makes debugging much easier than expecting you to actually check the results of every action.
-
port
Specify the port where PhantomJS should listen
port => 8910
-
log
Specify the log level of PhantomJS
log => 'OFF' # Also INFO, WARN, DEBUG
-
launch_exe
Specify the path to the PhantomJS executable.
The default is phantomjs as found via $ENV{PATH}. You can also provide this information from the outside by setting $ENV{PHANTOMJS_EXE}.
-
phantomjs_arg
Additional command line arguments to phantomjs (see phantomjs -h).
phantomjs_arg => ['--frobnitz-the-foo=bar']
-
launch_ghostdriver
Filename of the ghostdriver Javascript code to launch. The default is the file distributed with this module.
launch_ghostdriver => "devel/my/ghostdriver/main.js",
-
launch_arg
Specify additional parameters to the Ghostdriver script.
launch_arg => [ "--some-new-parameter=foo" ],
"--webdriver=$port",
'--webdriver-logfile=/tmp/webdriver',
'--webdriver-loglevel=DEBUG',
'--debug=true',
Note: these set config.xxx values in ghostdriver/config.js
-
cookie_file
Cookies are not directly persisted. If you pass in a path here, that file will be used to store or retrieve cookies.
-
ignore_ssl_errors
If you want phantomjs to ignore SSL errors, pass a true value here.
-
driver
A premade Selenium::Remote::Driver object.
-
report_js_errors
If set to 1, the object checks for Javascript errors after each request and warns about them. Useful for testing with use warnings qw(fatal).
print $mech->phantomjs_version;
Returns the version of the PhantomJS executable that is used.
print $mech->ghostdriver_version;
Returns the version of the ghostdriver script that is used.
my $selenium= $mech->driver;
Access the Selenium::Remote::Driver instance connecting to PhantomJS.
print for $mech->js_alerts();
An interface to the Javascript Alerts
Returns the list of alerts
$mech->clear_js_alerts();
Clears all saved alerts
print $_->{message}
for $mech->js_errors();
An interface to the Javascript Error Console
Returns the list of errors in the JEC
Maybe this should be called js_messages or js_console_messages instead.
$mech->clear_js_errors();
Clears all Javascript messages from the console
Records a confirmation (which is "1" or "ok" by default), to be used whenever javascript fires a confirm dialog. If message is not found, the answer is "cancel".
my ($value, $type) = $mech->eval( '2+2' );
Evaluates the given Javascript fragment in the context of the web page. Returns a pair of value and Javascript type.
This allows access to variables and functions declared "globally" on the web page.
This method is special to WWW::Mechanize::PhantomJS.
$mech->eval_in_phantomjs(<<'JS', "Foobar/1.0");
this.settings.userAgent= arguments[0]
JS
Evaluates Javascript code in the context of PhantomJS.
This allows you to modify properties of PhantomJS.
my @links = $mech->selector('a');
$mech->highlight_node(@links);
print $mech->content_as_png();
Convenience method that marks all nodes in the arguments with
background: red;
border: solid black 1px;
display: block; /* if the element was display: none before */
This is convenient if you need visual verification that you've got the right nodes.
There currently is no way to restore the nodes to their original visual state except reloading the page.
$mech->get( $url, ':content_file' => $tempfile );
Retrieves the URL $url.
It returns a faked HTTP::Response object for interface compatibility with WWW::Mechanize. It seems that Selenium and thus Selenium::Remote::Driver have no concept of HTTP status code and thus no way of returning the HTTP status code.
Recognized options:
:content_file - filename to store the data in
no_cache - if true, bypass the browser cache
$mech->get_local('test.html');
Shorthand method to construct the appropriate file:// URI and load it into PhantomJS. Relative paths will be interpreted as relative to $0.
This method accepts the same options as ->get().
This method is special to WWW::Mechanize::PhantomJS but could also exist in WWW::Mechanize through a plugin.
Warning: PhantomJS does not handle local files well. In particular, subframes do not get loaded properly.
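As a rough illustration of the path handling described above, the following sketch builds a file:// URI from a relative path. The helper local_file_uri is hypothetical and not part of the module; it assumes Unix-style paths, does no URI escaping, and reads "relative to $0" as "relative to the directory of the running script".

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);

# Hypothetical helper, not part of the module: turn a possibly relative
# path into a file:// URI, resolving it against $basedir (by default the
# directory of the running script, $0). Assumes Unix-style paths and does
# no URI escaping.
sub local_file_uri {
    my ($path, $basedir) = @_;
    $basedir = dirname(File::Spec->rel2abs($0)) unless defined $basedir;
    my $absolute = File::Spec->rel2abs($path, $basedir);
    return "file://$absolute";
}

# Resolve against an explicit base directory
print local_file_uri('test.html', '/var/www'), "\n";
```

An absolute path is passed through unchanged, so only relative paths depend on the base directory.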
not implemented
Selenium currently does not allow a raw POST message and the code for constructing a form on the fly is not working so this method is not implemented.
$mech->post( 'http://example.com',
params => { param => "Hello World" },
headers => {
"Content-Type" => 'application/x-www-form-urlencoded',
},
charset => 'utf-8',
);
Sends a POST request to $url.
A Content-Length header will be automatically calculated if it is not given.
The following options are recognized:
headers - a hash of HTTP headers to send. If not given, the content type will be generated automatically.
data - the raw data to send, if you've encoded it already.
$mech->add_header(
'X-WWW-Mechanize-PhantomJS' => "I'm using it",
Encoding => 'text/klingon',
);
This method sets up custom headers that will be sent with every HTTP(S) request that PhantomJS makes.
Note that currently, we only support one value per header.
$mech->delete_header( 'User-Agent' );
Removes HTTP headers from the agent's list of special headers. Note that PhantomJS may still send a header with its default value.
$mech->reset_headers();
Removes all custom headers and makes PhantomJS send its defaults again.
my $response = $mech->response(headers => 0);
Returns the current response as a HTTP::Response object.
$mech->get('http://google.com');
print "Yay"
if $mech->success();
Returns a boolean telling whether the last request was successful. If there hasn't been an operation yet, returns false.
This is a convenience function that wraps $mech->res->is_success.
$mech->get('http://google.com');
print $mech->status();
# 200
Returns the HTTP status code of the response. This is a 3-digit number like 200 for OK, 404 for not found, and so on.
$mech->back();
Goes one page back in the page history.
Returns the (new) response.
$mech->forward();
Goes one page forward in the page history.
Returns the (new) response.
print "We are at " . $mech->uri;
Returns the current document URI.
Returns the document object as a WebElement.
This is WWW::Mechanize::PhantomJS specific.
print $mech->content;
print $mech->content( format => 'html' ); # default
print $mech->content( format => 'text' ); # identical to ->text
This always returns the content as a Unicode string. It tries to decode the raw content according to its input encoding. This currently only works for HTML pages, not for images etc.
Recognized options:
-
format - the format of the content to return
The allowed values are html and text. The default is html.
print $mech->text();
Returns the text of the current HTML content. If the content isn't HTML, $mech will die.
print "The content is encoded as ", $mech->content_encoding;
Returns the encoding that the content is in. This can be used to convert the content from UTF-8 back to its native encoding.
$mech->update_html($html);
Writes $html into the current document. This is mostly implemented as a convenience method for HTML::Display::MozRepl.
print $mech->base;
Returns the URL base for the current page.
The base is either specified through a base tag or is the current URL.
This method is specific to WWW::Mechanize::PhantomJS.
print $mech->content_type;
Returns the content type of the currently loaded document
print $mech->is_html();
Returns true/false on whether our content is HTML, according to the HTTP headers.
print "We are on page " . $mech->title;
Returns the current document title.
print $_->text . " -> " . $_->url . "\n"
for $mech->links;
Returns all links in the document as WWW::Mechanize::Link objects.
Currently accepts no parameters. See ->xpath or ->selector when you want more control.
my @text = $mech->selector('p.content');
Returns all nodes matching the given CSS selector. If $css_selector is an array reference, it returns all nodes matched by any of the CSS selectors in the array.
This takes the same options that ->xpath does.
This method is implemented via WWW::Mechanize::Plugin::Selector.
print $_->{innerHTML} . "\n"
for $mech->find_link_dom( text_contains => 'CPAN' );
A method to find links, like WWW::Mechanize's ->find_links method. This method returns DOM objects from PhantomJS instead of WWW::Mechanize::Link objects.
Note that PhantomJS might have reordered the links or frame links in the document so the absolute numbers passed via n might not be the same between WWW::Mechanize and WWW::Mechanize::PhantomJS.
The supported options are:
-
text and text_contains and text_regex
Match the text of the link as a complete string, substring or regular expression.
Matching as a complete string or substring is a bit faster, as it is done in the XPath engine of PhantomJS.
-
id and id_contains and id_regex
Matches the id attribute of the link completely or in part.
-
name and name_contains and name_regex
Matches the name attribute of the link.
-
url and url_regex
Matches the URL attribute of the link (href, src or content).
-
class - the class attribute of the link
-
n - the (1-based) index. Defaults to returning the first link.
-
single - If true, ensure that only one element is found. Otherwise croak or carp, depending on the autodie parameter.
-
one - If true, ensure that at least one element is found. Otherwise croak or carp, depending on the autodie parameter.
The method croaks if no link is found. If the single option is true, it also croaks when more than one link is found.
print $_->text . "\n"
for $mech->find_link( text_contains => 'CPAN' );
A method quite similar to WWW::Mechanize's method.
The options are documented in ->find_link_dom.
Returns a WWW::Mechanize::Link object.
This defaults to not look through child frames.
print $_->text . "\n"
for $mech->find_all_links( text_regex => qr/google/i );
Finds all links in the document.
The options are documented in ->find_link_dom.
Returns them as list or an array reference, depending on context.
This defaults to not look through child frames.
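The "list or array reference, depending on context" behaviour can be pictured with Perl's wantarray. The function matching_items below is a hypothetical stand-in that filters plain strings; it is not the module's link finder and only illustrates the return convention.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a link finder: keeps the items matching $re.
sub matching_items {
    my ($re, @items) = @_;
    my @found = grep { $_ =~ $re } @items;
    # List context gets the list, scalar context an array reference
    return wantarray ? @found : \@found;
}

my @as_list = matching_items(qr/google/i, 'Google Maps', 'CPAN', 'google.de');
my $as_ref  = matching_items(qr/google/i, 'Google Maps', 'CPAN', 'google.de');

print scalar(@as_list), "\n";   # number of matches from the list
print scalar(@$as_ref), "\n";   # same count, via the reference
```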
print $_->{innerHTML} . "\n"
for $mech->find_all_links_dom( text_regex => qr/google/i );
Finds all matching link-like DOM nodes in the document.
The options are documented in ->find_link_dom.
Returns them as list or an array reference, depending on context.
This defaults to not look through child frames.
$mech->follow_link( xpath => '//a[text() = "Click here!"]' );
Follows the given link. Takes the same parameters that find_link_dom uses.
Note that ->follow_link will only try to follow link-like things like A tags.
my $link = $mech->xpath('//a[@id="clickme"]', single => 1);
# croaks if there is no link or more than one link found
my @para = $mech->xpath('//p');
# Collects all paragraphs
my @para_text = $mech->xpath('//p/text()', type => $mech->xpathResult('STRING_TYPE'));
# Collects all paragraphs as text
Runs an XPath query in PhantomJS against the current document.
If you need more information about the returned results, use the ->xpathEx() function.
The options allow the following keys:
-
document - document in which the query is to be executed. Use this to search a node within a specific subframe of $mech->document.
-
frames - if true, search all documents in all frames and iframes. This may or may not conflict with node. This will default to the frames setting of the WWW::Mechanize::PhantomJS object.
-
node - node relative to which the query is to be executed. Note that you will have to use a relative XPath expression as well. Use .//foo instead of //foo
-
single - If true, ensure that only one element is found. Otherwise croak or carp, depending on the autodie parameter.
-
one - If true, ensure that at least one element is found. Otherwise croak or carp, depending on the autodie parameter.
-
maybe - If true, ensure that at most one element is found. Otherwise croak or carp, depending on the autodie parameter.
-
all - If true, return all elements found. This is the default. You can use this option if you want to use ->xpath in scalar context to count the number of matched elements, as it will otherwise emit a warning for each usage in scalar context without any of the above restricting options.
-
any - no error is raised, no matter if an item is found or not.
-
type - force the return type of the query.
type => $mech->xpathResult('ORDERED_NODE_SNAPSHOT_TYPE'),
WWW::Mechanize::PhantomJS tries a best effort in giving you the appropriate result of your query, be it a DOM node or a string or a number. In the case you need to restrict the return type, you can pass this in.
The allowed strings are documented in the MDN. Interesting types are
ANY_TYPE (default, uses whatever things the query returns)
STRING_TYPE
NUMBER_TYPE
ORDERED_NODE_SNAPSHOT_TYPE
Returns the matched results.
You can pass in a list of queries as an array reference for the first parameter. The result will then be the list of all elements matching any of the queries.
This is a method that is not implemented in WWW::Mechanize.
In the long run, this should go into a general plugin for WWW::Mechanize.
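The interplay of the single, one, maybe and any options with autodie can be sketched as a plain count check. The helper check_count is hypothetical and only mirrors the rules as documented above; the module's own implementation may differ.

```perl
use strict;
use warnings;
use Carp qw(croak carp);

# Hypothetical helper: validate the number of matches against the
# single / one / maybe / any options. Not the module's actual code.
sub check_count {
    my ($count, %options) = @_;
    return 1 if $options{any};              # never complain
    my $problem;
    if ($options{single} and $count != 1) {
        $problem = "Expected exactly one element, found $count";
    } elsif ($options{one} and $count < 1) {
        $problem = "Expected at least one element, found none";
    } elsif ($options{maybe} and $count > 1) {
        $problem = "Expected at most one element, found $count";
    }
    return 1 unless defined $problem;
    # autodie decides between a fatal error and a warning
    if ($options{autodie}) { croak $problem } else { carp $problem }
    return 0;
}

# With autodie, a violated restriction is fatal and can be trapped
my $ok = eval { check_count(0, one => 1, autodie => 1) };
print "failed: $@" unless $ok;
```

Without any of the restricting options the check always passes, which corresponds to the default "all" behaviour.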
my @text = $mech->by_id('_foo:bar');
Returns all nodes matching the given ids. If $id is an array reference, it returns all nodes matched by any of the ids in the array.
This method is equivalent to calling ->xpath:
$self->xpath(qq{//*[\@id="$_"]}, %options)
It is convenient when your element ids get mistaken for CSS selectors.
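To make the equivalence concrete, here is a small sketch that expands one id or a list of ids into XPath queries of the form shown above. The helper xpath_for_ids is hypothetical, and it assumes the ids contain no double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: build one XPath query per id, mirroring the
# expression shown above. Assumes the ids contain no double quotes.
sub xpath_for_ids {
    my ($ids) = @_;
    my @ids = ref $ids eq 'ARRAY' ? @$ids : ($ids);
    return map { qq{//*[\@id="$_"]} } @ids;
}

# Ids like these would be misread as pseudo-classes or class names
# if they were passed as CSS selectors
print "$_\n" for xpath_for_ids(['_foo:bar', 'main.content']);
```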
$mech->click( 'go' );
$mech->click({ xpath => '//button[@name="go"]' });
Has the effect of clicking a button (or other element) on the current form. The first argument is the name of the button to be clicked. The second and third arguments (optional) allow you to specify the (x,y) coordinates of the click.
If there is only one button on the form, $mech->click() with no arguments simply clicks that one button.
If you pass in a hash reference instead of a name, the following keys are recognized:
-
selector - Find the element to click by the CSS selector
-
xpath - Find the element to click by the XPath query
-
dom - Click on the passed DOM element
You can use this to click on arbitrary page elements. There is no convenient way to pass x/y co-ordinates with this method.
-
id - Click on the element with the given id
This is useful if your document ids contain characters that do look like CSS selectors. It is equivalent to
xpath => qq{//*[\@id="$id"]}
Returns a HTTP::Response object.
As a deviation from the WWW::Mechanize API, you can also pass a hash reference as the first parameter. In it, you can specify the parameters to search much like for the find_link calls.
$mech->click_button( name => 'go' );
$mech->click_button( input => $mybutton );
Has the effect of clicking a button on the current form by specifying its name, value, or index. Its arguments are a list of key/value pairs. Exactly one of name, number, input or value must be specified in the keys.
name - name of the button
value - value of the button
input - DOM node
id - id of the button
number - number of the button
If you find yourself wanting to specify a button through its selector or xpath, consider using ->click instead.
print $mech->current_form->{name};
Returns the current form.
This method is incompatible with WWW::Mechanize.
It returns the DOM <form> object and not a HTML::Form instance.
The current form will be reset by WWW::Mechanize::PhantomJS on calls to ->get() and ->get_local(), and on calls to ->submit() and ->submit_with_fields.
Prints a dump of the forms on the current page to $fh. If $fh is not specified or is undef, it dumps to STDOUT.
$mech->form_name( 'search' );
Selects the current form by its name. The options are identical to those accepted by the "$mech->xpath" method.
$mech->form_id( 'login' );
Selects the current form by its id attribute.
The options are identical to those accepted by the "$mech->xpath" method.
This is equivalent to calling
$mech->by_id($id,single => 1,%options)
$mech->form_number( 2 );
Selects the number-th form. The options are identical to those accepted by the "$mech->xpath" method.
$mech->form_with_fields(
'user', 'password'
);
Find the form which has the listed fields.
If the first argument is a hash reference, it's taken as options to ->xpath.
See also "$mech->submit_form".
my @forms = $mech->forms();
When called in a list context, returns a list of the forms found in the last fetched page. In a scalar context, returns a reference to an array with those forms.
The options are identical to those accepted by the "$mech->selector" method.
The returned elements are the DOM <form> elements.
$mech->field( user => 'joe' );
$mech->field( not_empty => '', [], [] ); # bypass JS validation
Sets the field with the name given in $selector to the given value. Returns the value.
The method understands very basic CSS selectors in the value for $selector, like the HTML::Form find_input() method.
A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
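The prefix rules can be written down as a tiny classifier. The function classify_selector is hypothetical and only restates the rules above; how the module itself parses selectors is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: map a field selector to the attribute it matches
# and the value to look for, following the prefix rules described above.
sub classify_selector {
    my ($selector) = @_;
    return (id    => $1) if $selector =~ /^#(.*)/s;     # '#foo'
    return (class => $1) if $selector =~ /^\.(.*)/s;    # '.foo'
    return (name  => $1) if $selector =~ /^\^?(.*)/s;   # '^foo' or 'foo'
}

for my $selector ('#login', '.wide', '^user', 'password') {
    my ($attribute, $value) = classify_selector($selector);
    print "$selector matches $attribute '$value'\n";
}
```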
By passing the array reference @pre_events, you can indicate which Javascript events you want to be triggered before setting the value. @post_events contains the events you want to be triggered after setting the value.
By default, the events set in the constructor for pre_events and post_events are triggered.
print $mech->value( 'user' );
Returns the value of the field given by $selector_or_name or of the DOM element passed in.
The legacy form of
$mech->value( name => value );
is also still supported but will likely be deprecated in favour of the ->field method.
For fields that can have multiple values, like a select field, the method is context sensitive and returns the first selected value in scalar context and all values in list context.
Allows fine-grained access to getting/setting a value with a different API. Supported keys are:
pre
post
name
value
in addition to all keys that $mech->xpath supports.
$mech->submit;
Submits the form. Note that this does not fire the onClick event and thus also does not fire any Javascript handlers attached to it. Maybe you want to use $mech->click instead.
The default is to submit the current form as returned by $mech->current_form.
$mech->submit_form(
with_fields => {
user => 'me',
pass => 'secret',
}
);
This method lets you select a form from the previously fetched page, fill in its fields, and submit it. It combines the form_number/form_name, set_fields and click methods into one higher level call. Its arguments are a list of key/value pairs, all of which are optional.
-
form => $mech->current_form()
Specifies the form to be filled and submitted. Defaults to the current form.
-
fields => \%fields
Specifies the fields to be filled in the current form
-
with_fields => \%fields
Probably all you need for the common case. It combines a smart form selector and data setting in one operation. It selects the first form that contains all fields mentioned in \%fields. This is nice because you don't need to know the name or number of the form to do this.
(calls "$mech->form_with_fields()" and "$mech->set_fields()").
If you choose this, the form_number, form_name, form_id and fields options will be ignored.
$mech->set_fields(
user => 'me',
pass => 'secret',
);
This method sets multiple fields of the current form. It takes a list of field name and value pairs. If there is more than one field with the same name, the first one found is set. If you want to select which of the duplicate field to set, use a value which is an anonymous array which has the field value and its number as the 2 elements.
my @frames = $mech->expand_frames();
Expands the frame selectors (or 1 to match all frames) into their respective PhantomJS nodes according to the current document. All frames will be visited in breadth first order.
This is mostly an internal method.
my $last_frame= $mech->current_frame;
# Switch frame somewhere else
# Switch back
$mech->activate_container( $last_frame );
Returns the currently active frame as a WebElement.
This is mostly an internal method.
See also
http://code.google.com/p/selenium/issues/detail?id=4305
Frames are currently not really supported.
my $png_data = $mech->content_as_png();
# Create scaled-down 480px wide preview
my $png_data = $mech->content_as_png(undef, { width => 480 });
Returns the given tab or the current page rendered as PNG image.
All parameters are optional.
-
\%coordinates
If the coordinates are given, that rectangle will be cut out. The coordinates should be a hash with the four usual entries, left, top, width, height.
This method is specific to WWW::Mechanize::PhantomJS.
Currently, the data transfer between PhantomJS and Perl is done Base64-encoded.
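To illustrate what Base64-encoded transfer means for the caller, the sketch below round-trips the 8-byte PNG signature through MIME::Base64 and checks the decoded bytes. It uses only the signature as stand-in data rather than a real screenshot, and looks_like_png is a hypothetical helper, not a method of the module.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Every PNG file starts with this 8-byte signature
my $png_signature = "\x89PNG\x0D\x0A\x1A\x0A";

# Stand-in for what travels between the processes: Base64 text, not raw
# bytes. (Only the signature is used here, not a complete image.)
my $transferred = encode_base64($png_signature, '');

# Decoding restores the raw bytes that would be written to a .png file
my $raw = decode_base64($transferred);

# Hypothetical helper: cheap sanity check on decoded image data
sub looks_like_png {
    my ($bytes) = @_;
    return substr($bytes, 0, 8) eq $png_signature ? 1 : 0;
}

print looks_like_png($raw) ? "PNG data\n" : "not PNG data\n";
```

When writing such data to disk, the filehandle should be put into binary mode with binmode so the bytes are not altered.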
print Dumper $mech->viewport_size;
$mech->viewport_size({ width => 1388, height => 792 });
Returns (or sets) the new size of the viewport (the "window").
my $shiny = $mech->selector('#shiny', single => 1);
my $i_want_this = $mech->element_as_png($shiny);
Returns PNG image data for a single element
my $shiny = $mech->selector('#shiny', single => 1);
my $i_want_this= $mech->render_element(
element => $shiny,
format => 'pdf',
);
Returns the data for a single element or writes it to a file. It accepts all options of ->render_content.
my $shiny = $mech->selector('#shiny', single => 1);
my ($pos) = $mech->element_coordinates($shiny);
print $pos->{left},',', $pos->{top};
Returns the page-coordinates of the $element in pixels as a hash with four entries, left, top, width and height.
This function might get moved into another module more geared towards rendering HTML.
my $pdf_data = $mech->render_content( format => 'pdf' );
$mech->render_content(
format => 'jpg',
filename => '/path/to/my.jpg',
);
Returns the current page rendered in the specified format as a bytestring or stores the current page in the specified filename.
The filename must be absolute. We are dealing with external processes here!
This method is specific to WWW::Mechanize::PhantomJS.
Currently, the data transfer between PhantomJS and Perl is done through a temporary file, so directly using the filename option may be faster.
my $pdf_data = $mech->content_as_pdf();
$mech->content_as_pdf(
filename => '/path/to/my.pdf',
);
Returns the current page rendered in PDF format as a bytestring.
This method is specific to WWW::Mechanize::PhantomJS.
Currently, the data transfer between PhantomJS and Perl is done through a temporary file, so directly using the filename option may be faster.
These are methods that are available but exist mostly as internal helper methods. Use of these is discouraged.
my $query = $mech->element_query(['input', 'select', 'textarea'],
{ name => 'foo' });
Returns the XPath query that searches for all elements with tagNames in @elements having the attributes %attributes. The @elements will form an or condition, while the attributes will form an and condition.
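The or/and combination can be sketched as follows. The function build_element_query is a hypothetical reimplementation for illustration only: the exact query text the module generates may differ, and the sketch assumes at least one attribute is given and that values contain no double quotes.

```perl
use strict;
use warnings;

# Hypothetical reimplementation for illustration only; the query the
# module really generates may differ in detail.
sub build_element_query {
    my ($elements, $attributes) = @_;
    # Any of the tag names may match ...
    my $tags = join ' or ',
        map { sprintf 'local-name(.)="%s"', lc $_ } @$elements;
    # ... but every attribute has to match
    my $attrs = join ' and ',
        map { sprintf '@%s="%s"', $_, $attributes->{$_} }
        sort keys %$attributes;
    return ".//*[($tags) and $attrs]";
}

print build_element_query(['input', 'select', 'textarea'], { name => 'foo' }), "\n";
```

Sorting the attribute names keeps the generated query stable between runs, which makes it easier to compare in tests.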
Returns the Javascript fragment to turn a Selenium::Remote::PhantomJS id back to a Javascript object.
As this module is in a very early stage of development, there are many incompatibilities. The main thing is that only the most needed WWW::Mechanize methods have been implemented by me so far.
At least the following methods are unsupported:
-
->find_all_inputs
This function is likely best implemented through $mech->selector.
-
->find_all_submits
This function is likely best implemented through $mech->selector.
-
->images
This function is likely best implemented through $mech->selector.
-
->find_image
This function is likely best implemented through $mech->selector.
-
->find_all_images
This function is likely best implemented through $mech->selector.
These functions are unlikely to be implemented because they make little sense in the context of PhantomJS.
-
->clone
-
->credentials( $username, $password )
-
->get_basic_credentials( $realm, $uri, $isproxy )
-
->clear_credentials()
-
->put
I have no use for it
-
->post
I have no use for it
- Add limit parameter to ->xpath() to allow an early exit-case when searching through frames.
- Implement download progress
- Install the PhantomJS executable
- http://phantomjs.org - the PhantomJS homepage
- https://github.com/detro/ghostdriver - the ghostdriver homepage
- WWW::Mechanize - the module whose API grandfathered this module
- WWW::Scripter - another WWW::Mechanize-workalike with Javascript support
- WWW::Mechanize::Firefox - a similar module with a visible application
The public repository of this module is https://github.com/Corion/www-mechanize-phantomjs.
The public support forum of this module is https://perlmonks.org/.
I've given a talk about this module at Perl conferences:
German Perl Workshop 2014, German
Please report bugs in this module via the RT CPAN bug queue at https://rt.cpan.org/Public/Dist/Display.html?Name=WWW-Mechanize-PhantomJS or via mail to www-mechanize-phantomjs-Bugs@rt.cpan.org.
Max Maischein corion@cpan.org
Copyright 2014 by Max Maischein corion@cpan.org.
This module is released under the same terms as Perl itself.
This distribution includes a modified copy of the ghostdriver code, which is released under the same terms as the ghostdriver code itself. The terms of the ghostdriver code are the BSD license, as found at https://github.com/detro/ghostdriver/blob/master/LICENSE.BSD:
Copyright (c) 2014, Ivan De Marino <http://ivandemarino.me>
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The ghostdriver code includes the Selenium WebDriver fragments.