Fix scrolling timeout issue #678

Open
wants to merge 3 commits into
base: 1.13
from


otsch
Contributor

@otsch otsch commented Jan 12, 2025

This change addresses an issue where the maximum scrolling distance may change between calculating it, sending the scroll message, and verifying the new position. If verifying the position times out, the code now checks if the scrolling distance has changed. If it has, scrolling is retried once.

I encountered this issue on pages where an overlay pops up. The problem occurs randomly, approximately once every 10 executions of the same code. I suspect it happens when the overlay is rendered precisely between calculating the maximum possible scroll distance and verifying it after sending the scroll message.

This change appears to have completely resolved the issue in my specific case. However, I haven’t added a test case, as I don't know how to reliably reproduce this behavior.

I also reduced the wait time for verifying the new position, as 30 seconds seemed unnecessarily long.

I created this pull request directly without opening an issue first. If you’d prefer that I create an issue as well, please let me know.

otsch added 3 commits January 12, 2025 16:46
This change addresses a potential issue where the maximum scrolling
distance may change between calculating it, sending the scroll message,
and verifying the new position. If verifying the position times out, the
code now checks if the scrolling distance has changed. If it has,
scrolling is retried once.

Additionally, the maximum wait time for verifying the new position has
been reduced from 30 seconds to 3 seconds.
@enricodias
Member

Looking at your code, it seems you simply try to scroll one more time if you hit a timeout and the mouse has moved on the screen? If so, nothing prevents the issue from happening a second time; it would just be less likely.

@otsch
Contributor Author

otsch commented Jan 16, 2025

@enricodias
If I understand correctly, Utils::tryWithTimeout(3_000_000, $this->waitForScroll($targets['x'], $targets['y'])); verifies whether the scrolling process was actually successful, i.e., whether the page is now at the expected scroll position. However, if the maximum scrollable distance retrieved via getScrollDistancesAndTargets() or getMaximumDistance() changes (because something new was rendered) before the scroll action could be performed, the verification cannot succeed. In such a case, I would re-check the distances.
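
To make the race concrete, here is a minimal sketch with made-up numbers. `clampScrollDistance()` is a hypothetical stand-in for the clamping step, not the library's actual code; it only shows why a target computed from a stale maximum can never be verified.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the clamping step (not the library's code):
 * limits a requested scroll distance to what the page allows right now.
 */
function clampScrollDistance(int $requested, int $currentY, int $maximumY): int
{
    $targetY = max(0, min($maximumY, $currentY + $requested));

    return $targetY - $currentY;
}

// Distance and target as computed before the overlay renders.
$distance = clampScrollDistance(800, 0, 500); // clamped to the maximum
$expectedY = 0 + $distance;                   // verification waits for this pageY

// The overlay renders; the page can now only reach a smaller pageY.
$reachableY = 0 + clampScrollDistance($distance, 0, 200);

// The stale target can never be reached, so waiting only ends in a timeout.
$verificationCanSucceed = ($reachableY === $expectedY);
```

With these numbers the expected position is 500 but the page stops at 200, so the verification step can only time out.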

In my case, due to the overlay that opens, the maximum distance always ends up being 0. In this situation, scrolling is not repeated. If something is rendered that only reduces the maximum distance but doesn’t set it to 0, an attempt to scroll again will be made.
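
The three outcomes described above can be sketched as a small pure function. `decideAfterTimeout()` and its return values are hypothetical names used only for illustration; the sketch mirrors the behavior as described in this comment, not necessarily the exact code in the PR.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical decision helper, not part of chrome-php. Compares the
 * distances computed before the timed-out scroll with the recomputed ones.
 *
 * @param array{x: int, y: int} $previous
 * @param array{x: int, y: int} $recomputed
 */
function decideAfterTimeout(array $previous, array $recomputed): string
{
    if ($previous === $recomputed) {
        return 'rethrow'; // nothing changed, so the timeout is a real failure
    }

    if ($recomputed['x'] === 0 && $recomputed['y'] === 0) {
        return 'skip'; // nothing left to scroll, e.g. an overlay opened
    }

    return 'retry'; // the maximum shrank but is not zero: scroll again once
}

$unchanged = decideAfterTimeout(['x' => 0, 'y' => 500], ['x' => 0, 'y' => 500]);
$overlay   = decideAfterTimeout(['x' => 0, 'y' => 500], ['x' => 0, 'y' => 0]);
$shrunk    = decideAfterTimeout(['x' => 0, 'y' => 500], ['x' => 0, 'y' => 200]);
```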

Here is a simplified example of the code I use to load a page and scroll to the bottom:

<?php

use HeadlessChromium\BrowserFactory;
use HeadlessChromium\Exception\CommunicationException;
use HeadlessChromium\Exception\CommunicationException\ResponseHasError;
use HeadlessChromium\Exception\NoResponseAvailable;
use HeadlessChromium\Exception\OperationTimedOut;
use HeadlessChromium\Page;

include __DIR__ . '/vendor/autoload.php';

class ScrollDownPage
{
    /**
     * @throws Exception
     */
    public function invoke(Page $page): ?string
    {
        $distance = $this->waitAndGetMaxScrollingDistance($page);

        if ($distance === 0) { // Retry once
            $distance = $this->waitAndGetMaxScrollingDistance($page);
        }

        if ($distance > 0) {
            $scrollingEvents = 0;

            while ($distance > 0 && $scrollingEvents < 1_000) {
                $page->mouse()->scrollDown($distance);

                $distance = $this->waitAndGetMaxScrollingDistance($page);

                $scrollingEvents++;
            }
        } else {
            throw new Exception('Scrolling down failed. Couldn’t scroll down even once.');
        }

        return $page->getHtml();
    }

    /**
     * @throws OperationTimedOut
     * @throws CommunicationException
     * @throws NoResponseAvailable
     * @throws ResponseHasError
     */
    private function waitAndGetMaxScrollingDistance(Page $page): int
    {
        $distance = $this->getMaxYScrollingDistance($page);

        if ($distance > 0) {
            return $distance;
        }

        for ($i = 1; ($i * 50_000) <= 1_000_000; $i++) {
            usleep(50_000);

            $distance = $this->getMaxYScrollingDistance($page);

            if ($distance > 0) {
                return $distance;
            }
        }

        return 0;
    }

    /**
     * @throws CommunicationException
     * @throws CommunicationException\ResponseHasError
     * @throws NoResponseAvailable
     * @throws OperationTimedOut
     */
    private function getMaxYScrollingDistance(Page $page): int
    {
        $scrollableArea = $page->getLayoutMetrics()->getCssContentSize();

        $visibleArea = $page->getLayoutMetrics()->getCssVisualViewport();

        $maximumY = $scrollableArea['height'] - $visibleArea['clientHeight'];

        return (int) $maximumY - (int) $visibleArea['pageY'];
    }
}

$url = 'https://www.example.com/something';

$browserFactory = new BrowserFactory('chromium');

$browser = $browserFactory->createBrowser(['windowSize' => [1920, 1000], 'headless' => false]);

$page = $browser->createPage();

$page->navigate($url)->waitForNavigation();

// Scroll down
$html = (new ScrollDownPage())->invoke($page);
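
As a worked example of the arithmetic in getMaxYScrollingDistance(), the following sketch runs the same calculation on plain arrays so it needs no browser. The function name `remainingScrollDistance()` and all numbers are made up; the last case assumes an overlay that leaves the content no taller than the viewport.

```php
<?php

declare(strict_types=1);

/**
 * Same arithmetic as getMaxYScrollingDistance(), applied to plain arrays.
 * The array keys mirror the ones used in the class above.
 */
function remainingScrollDistance(array $contentSize, array $visualViewport): int
{
    $maximumY = $contentSize['height'] - $visualViewport['clientHeight'];

    return (int) $maximumY - (int) $visualViewport['pageY'];
}

// 5000px of content, 1000px viewport, already scrolled 1500px.
$midPage = remainingScrollDistance(['height' => 5000], ['clientHeight' => 1000, 'pageY' => 1500]);

// Scrolled all the way down: nothing left.
$atBottom = remainingScrollDistance(['height' => 5000], ['clientHeight' => 1000, 'pageY' => 4000]);

// Assumed overlay case: content is no taller than the viewport.
$locked = remainingScrollDistance(['height' => 1000], ['clientHeight' => 1000, 'pageY' => 0]);
```

Here `$midPage` is 2500, while `$atBottom` and `$locked` are both 0, which is the value that ends the scrolling loop.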

If you’d like to try it yourself, you might want to modify the scroll() method in the Mouse class to add log output:

    private function scroll(int $distanceY, int $distanceX = 0): self
    {
        $this->page->assertNotClosed();

        // make sure the mouse is on the screen
        $this->move($this->x, $this->y);

        [$distances, $targets] = $this->getScrollDistancesAndTargets($distanceY, $distanceX);

        error_log('try to scroll distances: ' . var_export($distances, true));

        // scroll
        $this->sendScrollMessage($distances);

        try {
            // wait until the scroll is done
            Utils::tryWithTimeout(3_000_000, $this->waitForScroll($targets['x'], $targets['y']));
        } catch (\HeadlessChromium\Exception\OperationTimedOut $exception) {
            error_log('failed to verify scroll distances');

            // Maybe the possible max scroll distances changed in the meantime.
            $prevDistances = $distances;

            [$distances, $targets] = $this->getScrollDistancesAndTargets($distanceY, $distanceX);

            error_log('new distances: ' . var_export($distances, true));

            if ($prevDistances === $distances) {
                throw $exception;
            }

            if (0 !== $distanceY || 0 !== $distanceX) { // Try with the new values.
                $this->sendScrollMessage($distances);

                // wait until the scroll is done
                Utils::tryWithTimeout(3_000_000, $this->waitForScroll($targets['x'], $targets['y']));
            }
        }

        // set new position after move
        $this->x += $distances['x'];
        $this->y += $distances['y'];

        return $this;
    }

My reasoning was this: the maximum possible scroll distance is already determined and replaces the user-requested distance before the message is sent to the browser, so it doesn’t seem desirable to throw an exception when the page can’t be scrolled the full distance (or at all). I also found it a bit excessive to keep verifying the position for 30 seconds, especially since verification may often fail because of overlays, which unfortunately are everywhere on the internet nowadays.
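
For illustration, a bounded wait can be sketched as a simple polling loop. `pollUntil()` is a hypothetical helper written for this example and is not the library's Utils::tryWithTimeout(); it only shows how the timeout value caps how long a failing verification blocks.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: polls $condition every $intervalMicroSec until it
 * returns true or $timeoutMicroSec has elapsed. Returns whether the
 * condition was met in time.
 */
function pollUntil(callable $condition, int $timeoutMicroSec, int $intervalMicroSec = 10_000): bool
{
    $deadline = hrtime(true) + $timeoutMicroSec * 1_000; // hrtime() is in nanoseconds

    do {
        if ($condition()) {
            return true;
        }

        usleep($intervalMicroSec);
    } while (hrtime(true) < $deadline);

    return false;
}

// A condition that becomes true on the third poll is met well within 3 seconds.
$polls = 0;
$reached = pollUntil(function () use (&$polls): bool {
    return ++$polls >= 3;
}, 3_000_000);

// A condition that never becomes true gives up after roughly 50 ms.
$gaveUp = pollUntil(fn (): bool => false, 50_000);
```

With a 30-second timeout the second call would block for the full 30 seconds before reporting the same failure.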

@GrahamCampbell GrahamCampbell changed the base branch from 1.12 to 1.13 February 7, 2025 18:07