Improvements for captcha detection (WIP) #65

Closed
wants to merge 47 commits into from
47 commits
8a9fb06
mousedown and mouseup action
jifeon Dec 9, 2015
ba6bd7c
snapshot actions
jifeon Dec 17, 2015
8854d7a
getters and setters
jifeon Dec 17, 2015
c3a8202
catching errors for rules
jifeon Dec 18, 2015
9893626
added new version
maZahaca Dec 20, 2015
a685dbe
killing phantom harder
jifeon Dec 21, 2015
51e504d
smarter error handling for actions
jifeon Dec 24, 2015
70c9e8c
force tearDown phantom on fail
jifeon Dec 25, 2015
ee0f85e
trim transformation
jifeon Dec 25, 2015
3378a0f
added version 0.2.0-alpha2
maZahaca Dec 26, 2015
880b4aa
VALUE rule type
jifeon Dec 26, 2015
89ff19f
Merge remote-tracking branch 'origin/feature-goose-babe' into feature…
jifeon Dec 26, 2015
13ec477
pluck, pick and get transformations
jifeon Jan 1, 2016
7a887be
snapshot verification added
jifeon Jan 4, 2016
2505216
store env options
jifeon Jan 6, 2016
5e18860
Fixed proxyRotator function to promised
maZahaca Jan 10, 2016
987e65d
fixed proxy-rotator for promised function
maZahaca Jan 11, 2016
c29269a
merge of proxy-rotator
maZahaca Jan 11, 2016
f0d4810
added new version
maZahaca Jan 11, 2016
56edc09
PhantomEnvironment refactored
jifeon Jan 14, 2016
5bf99e3
Merge remote-tracking branch 'origin/feature-goose-babe' into feature…
jifeon Jan 14, 2016
0619a94
fixed silent errors
maZahaca Jan 15, 2016
c5b822e
memory leaks fixed
jifeon Jan 16, 2016
79e49ef
Merge remote-tracking branch 'origin/feature-goose-babe' into feature…
jifeon Jan 16, 2016
6ac33f7
googse babe alpha 5
jifeon Jan 16, 2016
9f966d0
Added rule.child -> child index in the selected childNodes
maZahaca Jan 16, 2016
1f7beb1
graceful shutdown
jifeon Jan 17, 2016
ddfba68
Merge remote-tracking branch 'origin/feature-goose-babe' into feature…
jifeon Jan 17, 2016
5fb5d1c
alpha 7
jifeon Jan 17, 2016
731c33e
Cleaning DOM after scroll parsing
maZahaca Jan 22, 2016
58ff81e
Fixed phantom resource timeout
maZahaca Jan 24, 2016
0493f60
alpha 10
jifeon Jan 24, 2016
68092fe
alpha 11
jifeon Jan 24, 2016
664e5cf
kill phantom harder
jifeon Jan 26, 2016
1ffee08
alpha 13
jifeon Jan 28, 2016
c23593f
* Fixed 'provideCollection' -> 'provideRules' & action.collection -> …
maZahaca Jan 29, 2016
515b97e
Fixed couple of bugs
maZahaca Jan 30, 2016
d01af67
Merge pull request #67 from redco/feature-optimized-action-provide-rules
maZahaca Jan 30, 2016
475ab5e
new version
maZahaca Jan 30, 2016
fc3476c
alpha 15
jifeon Jan 31, 2016
236c3d5
alpha 16
jifeon Feb 9, 2016
260cf9e
Added rule.virtual and rule.set
maZahaca Feb 9, 2016
94aba50
alpha 18
jifeon Feb 13, 2016
9196ab2
alpha 19
jifeon Feb 13, 2016
684fc9c
alpha 20
jifeon Feb 16, 2016
bebdbcc
alpha 21
jifeon Feb 17, 2016
46a99f9
Fixed position error behavior
maZahaca Mar 4, 2016
196 changes: 186 additions & 10 deletions lib/Actions.js
@@ -10,10 +10,17 @@ function Actions (options) {
Actions.prototype = {
TYPES: {
CLICK: 'click',
MOUSE_DOWN: 'mousedown',
MOUSE_UP: 'mouseup',
WAIT: 'wait',
WAIT_FOR_VISIBLE: 'waitForVisible',
WAIT_FOR_PATTERN: 'waitForPattern',
WAIT_FOR_PAGE: 'waitForPage',
TYPE: 'type',
CONDITION: 'conditionalActions',
- EXIST: 'exist'
+ EXIST: 'exist',
BACK: 'back',
PROVIDE_COLLECTION: 'provideCollection'
Member

What is it for?

Contributor Author

Sometimes we do not know the parsing rules until the actions have been performed. We also have conditional actions, and the last action can return a collection to parse; that collection is attached to the rule the actions are performed for.
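
As a rough illustration of the answer above, a rule using this action might look like the sketch below. Only `type`, `conditions`, `actions`, `elseActions` and `collection` appear in this diff; the `scope` and `name` fields and every selector are assumptions made up for the example.

```javascript
// Hypothetical rule: which fields get parsed depends on what the page shows after the click.
const rule = {
  scope: '.product', // assumed field name and selector
  actions: [
    { type: 'click', scope: '.show-details' },
    {
      type: 'conditionalActions',
      conditions: [{ type: 'exist', scope: '.details-table' }],
      // The last action of the chosen branch provides the collection to parse.
      actions: [{
        type: 'provideCollection',
        collection: [{ name: 'price', scope: '.details-table .price' }]
      }],
      elseActions: [{
        type: 'provideCollection',
        collection: [{ name: 'price', scope: '.summary .price' }]
      }]
    }
  ]
};
```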

},

/**
@@ -32,6 +39,22 @@ Actions.prototype = {
return this.performActions(actions, parentSelector);
},

/**
* Perform post actions for a parsing rule
* @param {Rule} rule
* @param {string} parentSelector
* @returns {Promise}
*/
performPostActionsForRule: function (rule, parentSelector) {
var actions = rule.postActions;

if (!actions) {
return vow.resolve();
}

return this.performActions(actions, parentSelector);
},

/**
* Perform array of actions
* @param {Array} actions
@@ -76,34 +99,68 @@ Actions.prototype = {
debug('Perform action %o for generated selector %s', action, selector);

var waitingForPage;
- if (action.waitForPage) {
+ if (action.waitForPage || action.type === this.TYPES.BACK) {
waitingForPage = this.waitForPage(action.waitForPageTimeout);
} else {
waitingForPage = vow.resolve();
}

var casesPromise;
if (action.cases) {
casesPromise = this._performCases(action.cases, parentSelector);
}

var actionPromise;
switch (action.type) {
case this.TYPES.CLICK:
actionPromise = this.click(selector);
break;

case this.TYPES.MOUSE_DOWN:
actionPromise = this.mousedown(selector);
break;

case this.TYPES.MOUSE_UP:
actionPromise = this.mouseup(selector);
break;

case this.TYPES.WAIT:
actionPromise = this.waitElement(selector, action.timeout);
break;

case this.TYPES.WAIT_FOR_VISIBLE:
actionPromise = this.waitElementIsVisible(selector, action.timeout);
break;

case this.TYPES.WAIT_FOR_PATTERN:
actionPromise = this.waitForPattern(selector, action.pattern, action.timeout);
break;

case this.TYPES.WAIT_FOR_PAGE:
actionPromise = this.waitForPage(action.timeout);
break;

case this.TYPES.TYPE:
actionPromise = this.type(selector, action.text);
break;

case this.TYPES.CONDITION:
- actionPromise = this.performConditionalActions(selector, action.conditions, action.actions);
+ actionPromise = this.performConditionalActions(selector, action.conditions, action.actions, action.elseActions);
break;

case this.TYPES.EXIST:
actionPromise = this.exist(selector);
break;

case this.TYPES.BACK:
actionPromise = this.back();
break;

case this.TYPES.PROVIDE_COLLECTION:
debug('Providing collection %o', action.collection);
actionPromise = vow.resolve(action.collection);
Member

I think it would be better to allow returning not only a collection but a full {Rule}. That would make it possible to use this kind of action to provide any rule type with its own scope, or to use the parent scope.

Contributor Author

Yep, agreed, will change it.

break;

default:
var customAction = this._customActions[action.type];
if (!customAction) {
@@ -115,7 +172,46 @@ Actions.prototype = {
}

return vow.all([actionPromise, waitingForPage]).spread(function (result) {
- return result;
+ return casesPromise || result;
});
},

_performCases: function (cases, parentSelector) {
debug('handle several cases in parallel %o', cases);
Member

This is a very great improvement!!! 🐗


var wonCase = null;
var promises = cases.map(function (actions, caseNumber) {
var beginningPromise = this._performAction(actions[0], parentSelector);
Member

actions[0] looks suspicious

Contributor Author

Let's consider a situation where we need the cases feature. Say we click somewhere and one of two things can happen: 1) a new page is opened, or 2) a popup is shown.

Depending on which one happened, we want to perform different actions. To handle this we have to start waiting for both cases before the click; otherwise we risk attaching the page-load handlers too late.

Compare two code pieces:

actions.reduce(
  (promise, action) => promise.then(() => performAction(action)), 
  vow.resolve()
);
click();

and

const firstAction = performAction(actions[0])
actions.slice(1).reduce(
  (promise, action) => promise.then(() => performAction(action)), 
  firstAction
);
click();

In the first snippet click() is invoked before the first action starts; in the second, after it. Got it?
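
The ordering difference can be checked with a small self-contained sketch. It uses native Promises instead of vow, and `makeRecorder`, `lazyChain` and `eagerChain` are hypothetical names introduced only for this demo; each function returns a snapshot of the log taken right after `click()`.

```javascript
// Hypothetical stand-ins for performAction and click that record call order.
function makeRecorder() {
  const log = [];
  return {
    log,
    performAction(action) { log.push('start ' + action); return Promise.resolve(); },
    click() { log.push('click'); }
  };
}

// First snippet: every action, including the first, runs in a later promise callback.
function lazyChain(actions) {
  const r = makeRecorder();
  actions.reduce((promise, action) => promise.then(() => r.performAction(action)), Promise.resolve());
  r.click();
  return r.log.slice(); // snapshot taken right after click()
}

// Second snippet: the first action is started synchronously, before click().
function eagerChain(actions) {
  const r = makeRecorder();
  const firstAction = r.performAction(actions[0]);
  actions.slice(1).reduce((promise, action) => promise.then(() => r.performAction(action)), firstAction);
  r.click();
  return r.log.slice();
}

console.log(lazyChain(['waitForPage', 'parse']));  // [ 'click' ]
console.log(eagerChain(['waitForPage', 'parse'])); // [ 'start waitForPage', 'click' ]
```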

Member

I understand the benefit; I'm just concerned about calling actions[0] without any default value. The chain could be empty.

Contributor Author

Yep, looks like we need to filter out empty chains.
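
One possible way to do that filtering, as a sketch only; `nonEmptyCases` is a hypothetical helper, not something this PR defines.

```javascript
// Hypothetical helper: drop chains that have no actions before racing them.
function nonEmptyCases(cases) {
  return (cases || []).filter(function (actions) {
    return Array.isArray(actions) && actions.length > 0;
  });
}

console.log(nonEmptyCases([[{ type: 'click' }], [], null]).length); // 1
```

With such a filter in front of the map, `actions[0]` is always defined inside the loop.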

return actions
.slice(1)
Member

slice(1) looks suspicious. What if actions is empty?!

Contributor Author

if you have empty case actions you are f_cken mother f_cker. And even in this case everything will work like a charm. [].slice(1) returns [];

.reduce(function (promise, action, i, array) {
return promise.then(function () {
if (wonCase !== null && array !== cases[wonCase]) {
return vow.reject('Failed actions chain');
}

if (action.trueCase) {
Member

What does it mean?

Contributor Author

Imagine we have 3 concurrent action chains:

  1. A B C D
  2. a b c d
  3. 1 2 3 4

At some step of each chain we can tell that the chain has won the race (the needed page has loaded, or a popup with the needed content is shown). It is not necessarily the first step; it differs from chain to chain.

  1. A B C D <- action "C" is detector of true case
  2. a b c d <- action "a" is detector of true case
  3. 1 2 3 4 <- end of chain is detector of true case

Once such an action happens we must reject all the other chains, because they could affect the parsing process.
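
The race described above can be sketched with native Promises. This is not the PR's code: `raceCases` and the `{ run, trueCase }` step shape are assumptions for the demo, and `Promise.any` needs a runtime that provides it (for example Node.js 15 or later).

```javascript
// Hypothetical sketch: each chain is an array of steps { run: Function, trueCase?: boolean }.
function raceCases(cases) {
  let wonCase = null;
  const promises = cases.map((steps, caseNumber) => {
    if (steps.length === 0) {
      return Promise.reject(new Error('Empty chain'));
    }
    // Start the first step of every chain right away (the executor runs synchronously).
    const first = new Promise(resolve => resolve(steps[0].run()));
    const chain = steps.slice(1).reduce((promise, step) => promise.then(() => {
      if (wonCase !== null && wonCase !== caseNumber) {
        throw new Error('Another chain already won');
      }
      if (step.trueCase) {
        wonCase = caseNumber; // this step is the detector of the true case
      }
      return step.run();
    }), first);
    return chain.then(result => {
      if (wonCase === null) {
        wonCase = caseNumber; // reaching the end of a chain also wins
      }
      return result;
    });
  });
  return Promise.any(promises).then(() => promises[wonCase]);
}

// Usage: the second chain reaches its detector first, so the first chain is rejected.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
const demo = raceCases([
  [{ run: () => delay(30) }, { run: () => 'popup', trueCase: true }],
  [{ run: () => delay(5) }, { run: () => 'page', trueCase: true }]
]);
demo.then(result => console.log(result));
```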

wonCase = caseNumber;
debug('Won case with actions %o', cases[wonCase]);
Member

It's forever promise battle 👍

}

return this._performAction(action, parentSelector);
}, this);
}.bind(this), beginningPromise)
.then(function (results) {
if (wonCase === null) {
wonCase = caseNumber;
debug('Won case with actions %o', cases[wonCase]);
}
return results;
}, function (reason) {
debug('Chain %o was reject with reason %s', actions, reason);
throw reason;
});
}, this);

return vow.any(promises).then(function () {
return promises[wonCase];
});
},

@@ -148,10 +244,47 @@ Actions.prototype = {
}, [selector], timeout, interval);
},

/**
* Wait until an element is on the page and visible
* @param {string} selector
* @param {number} [timeout]
* @param {number} [interval]
* @returns {Promise}
*/
waitElementIsVisible: function (selector, timeout, interval) {
debug('._waitElementIsVisible() ' + selector);
return this.wait(/* @covignore */ function (selector) {
var nodes = Array.prototype.slice.call(Sizzle(selector), 0);
Member

What if nothing on the page matches the selector passed to Sizzle(selector)?

Contributor Author

nodes would be an empty array, so some() would return false.

return nodes.some(function (node) {
return node.offsetWidth !== 0 && node.offsetHeight !== 0;
Member

I don't really understand how this code can detect an element's visibility on the page.

Contributor Author

Member

Hah, that was rough

Contributor Author

To be honest, I was surprised too ))
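
For context: offsetWidth and offsetHeight are both 0 for an element that has no layout box (for example one with display: none), which is why the check in the diff works as a cheap visibility test. It does not catch visibility: hidden or opacity: 0. A minimal sketch with plain objects standing in for DOM nodes (`anyVisible` is a hypothetical name):

```javascript
// Hypothetical helper mirroring the check in the diff: at least one matched node has a non-zero box.
function anyVisible(nodes) {
  return Array.prototype.slice.call(nodes, 0).some(function (node) {
    return node.offsetWidth !== 0 && node.offsetHeight !== 0;
  });
}

console.log(anyVisible([]));                                      // false
console.log(anyVisible([{ offsetWidth: 0, offsetHeight: 0 }]));   // false
console.log(anyVisible([{ offsetWidth: 120, offsetHeight: 40 }])); // true
```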

});
}, function (visible) {
return visible;
}, [selector], timeout, interval);
},

/**
* Wait until an element's content matches the pattern
* @param {string} selector
* @param {string} pattern
* @param {number} [timeout]
* @param {number} [interval]
* @returns {Promise}
*/
waitForPattern: function (selector, pattern, timeout, interval) {
debug('._waitForPattern() %s on selector %s', pattern, selector);
return this.wait(/* @covignore */ function (selector) {
var nodes = Sizzle(selector);
return nodes.length && nodes[0].textContent || '';
Member

Keep in mind the new attr feature. textContent is no longer the only way to get information from a node.

Contributor Author

Agreed, we need to make getting content from a node more universal.
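
One way this could look, as a sketch only: `readNodeContent` and `contentMatches` are hypothetical helpers, and the optional `attr` argument is an assumption about how the attr feature might be passed in.

```javascript
// Hypothetical helper: read either an attribute or the text content of a node.
function readNodeContent(node, attr) {
  if (!node) {
    return '';
  }
  if (attr) {
    return node.getAttribute(attr) || '';
  }
  return node.textContent || '';
}

// Hypothetical helper: test the chosen content against a pattern string.
function contentMatches(node, pattern, attr) {
  return new RegExp(pattern).test(readNodeContent(node, attr));
}

// Usage with a plain object standing in for a DOM node.
const fakeNode = {
  textContent: 'Price: 10',
  getAttribute: function (name) { return name === 'href' ? '/item/5' : null; }
};
console.log(contentMatches(fakeNode, '\\d+'));            // true
console.log(contentMatches(fakeNode, '^/item/', 'href')); // true
```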

}, function (text) {
return text.match(pattern) !== null;
}, [selector], timeout, interval);
},

/**
* Wait until the result of evalFunction satisfies checkerFunction
* @param {Function} evalFunction
- * @param {Function} checkerFunction
+ * @param {Function} [checkerFunction]
* @param {Array} [args]
* @param {number} [timeout]
* @param {number} [interval]
@@ -161,11 +294,25 @@ Actions.prototype = {
var deferred = vow.defer();
args = args || [];
timeout = timeout || 5000;
- interval = interval || 0;
+ interval = interval || 10;

checkerFunction = checkerFunction || function (result) {
return !!result
};

var errback = function (msg) {
clearTimeout(timeoutId);
clearInterval(intervalId);
deferred.reject(new Error('Error during _wait with args ' + args.toString() + ': ' + msg));
};

var timeoutId = setTimeout(function () {
this._env.removeErrback(errback);
clearInterval(intervalId);
deferred.reject(new Error('Timeout for _wait with arguments: ' + args.toString()));
- }, timeout);
+ }.bind(this), timeout);

this._env.addErrback(errback);

var evalArgs = args.slice(0);
evalArgs.push(evalFunction);
@@ -176,9 +323,10 @@ Actions.prototype = {
if (checkerFunction.apply(null, arguments)) {
clearTimeout(timeoutId);
clearInterval(intervalId);
this._env.removeErrback(errback);
deferred.resolve();
}
- });
+ }, this);
}.bind(this), interval);

return deferred.promise();
@@ -215,6 +363,26 @@ Actions.prototype = {
});
},

/**
* Perform mousedown on the element matched by selector
* @param {string} selector
* @returns {Promise}
*/
mousedown: function (selector) {
debug('mousedown on %s', selector);
return this._env.mousedown(selector);
},

/**
* Perform mouseup on the element matched by selector
* @param {string} selector
* @returns {Promise}
*/
mouseup: function (selector) {
debug('mouseup on %s', selector);
return this._env.mouseup(selector);
},

/**
* Type text to the element
* @param {string} selector
@@ -249,15 +417,16 @@ Actions.prototype = {
* @param {string} selector
* @param {Array} conditions
* @param {Array} actions
* @param {Array} [elseActions]
* @returns {Promise}
*/
- performConditionalActions: function (selector, conditions, actions) {
+ performConditionalActions: function (selector, conditions, actions, elseActions) {
return this
.performActions(conditions, selector)
.then(function (result) {
if (!result) {
debug('Conditional actions failed with result %s, skip %o', result, actions);
- return;
+ return elseActions ? this.performActions(elseActions, selector) : false;
}

debug('Conditional actions return %s, go with real some', result);
@@ -274,6 +443,13 @@ Actions.prototype = {
return this._env.evaluateJs(selector, /* @covignore */ function (selector) {
return Sizzle(selector).length > 0;
});
},

/**
* Navigates to previous page
*/
back: function () {
return this._env.back();
}
};

5 changes: 5 additions & 0 deletions lib/BrowserEnvironment.js
@@ -22,6 +22,11 @@ BrowserEnvironment.prototype = _.create(Environment.prototype, /**@lends Browser

var result = evalFunc.apply(null, args);
return vow.resolve(result);
},

// todo: write tests
back: function () {
window.history.back();
}
});

24 changes: 24 additions & 0 deletions lib/Environment.js
@@ -3,6 +3,8 @@ var vow = require('vow'),

function Environment(options) {
debug('Initializing...');

this._errbacks = [];
Member

What is that?

Contributor Author

An errback is a callback for errors )) In this particular case these are errbacks for evaluateJs functions. Sometimes we need to know that something has gone wrong. For example, if we use wait and the code fails on the first iteration, we should stop waiting. We add the errback before the wait and remove it when the wait completes.
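
The add-before, remove-after lifecycle can be sketched as follows. This uses native Promises and a made-up in-memory environment; `createEnv`, `reportError`, `errbackCount` and `waitFor` are hypothetical names, and having the errback remove itself is a choice of this sketch.

```javascript
// Hypothetical minimal environment: keeps errbacks and can report an error to them.
function createEnv() {
  let errbacks = [];
  return {
    addErrback(fn) { errbacks.push(fn); },
    removeErrback(fn) { errbacks = errbacks.filter(e => e !== fn); },
    reportError(msg) { errbacks.slice().forEach(fn => fn(msg)); },
    errbackCount() { return errbacks.length; }
  };
}

// Poll evalFunction until it returns a truthy value, an error is reported, or the timeout fires.
function waitFor(env, evalFunction, timeout = 5000, interval = 10) {
  return new Promise((resolve, reject) => {
    let intervalId;
    let timeoutId;
    const cleanup = () => {
      clearTimeout(timeoutId);
      clearInterval(intervalId);
      env.removeErrback(errback);
    };
    const errback = msg => {
      cleanup();
      reject(new Error('Error during wait: ' + msg));
    };
    env.addErrback(errback); // registered before polling starts
    timeoutId = setTimeout(() => {
      cleanup();
      reject(new Error('Timeout during wait'));
    }, timeout);
    intervalId = setInterval(() => {
      if (evalFunction()) {
        cleanup();
        resolve();
      }
    }, interval);
  });
}
```

An error reported while waiting rejects the promise immediately instead of letting it run until the timeout.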

}

Environment.prototype = {
@@ -50,6 +52,28 @@ Environment.prototype = {
*/
waitForPage: function (timeout) {
throw new Error('You must redefine waitForPage method in child environment');
},

back: function () {
Member

Why do we need to add those functions to Env?
Before, it was enough to add them in Actions.

Contributor Author

For the browser env it's just window.history.back(); for phantom it's page.goBack(). Phantom's functions are more robust because they emulate real user actions.

throw new Error('You must redefine back method in child environment');
},

mousedown: function () {
throw new Error('You must redefine mousedown method in child environment');
},

mouseup: function () {
throw new Error('You must redefine mouseup method in child environment');
},

addErrback: function (errback) {
this._errbacks.push(errback);
},

removeErrback: function (errback) {
this._errbacks = this._errbacks.filter(function (e) {
return e !== errback;
});
}
};
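
A sketch of the base/child split discussed in the review: the base environment only declares the methods, and each child overrides what it supports. `abstractMethod` and `FakeBrowserEnvironment` are hypothetical names for this example; generating the stubs from the method name keeps each error message tied to its own method.

```javascript
// Hypothetical helper: build a stub whose error names the method it guards.
function abstractMethod(name) {
  return function () {
    throw new Error('You must redefine ' + name + ' method in child environment');
  };
}

function Environment() {}
Environment.prototype = {
  back: abstractMethod('back'),
  mousedown: abstractMethod('mousedown'),
  mouseup: abstractMethod('mouseup')
};

// A child environment overrides only what it supports.
function FakeBrowserEnvironment(history) {
  this._history = history; // stand-in for window.history
}
FakeBrowserEnvironment.prototype = Object.create(Environment.prototype);
FakeBrowserEnvironment.prototype.back = function () {
  this._history.back();
};

// Usage with a fake history object.
let backCalls = 0;
const env = new FakeBrowserEnvironment({ back: function () { backCalls += 1; } });
env.back();
console.log(backCalls); // 1
```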
