Releases: nadar/crawler

1.7.1 (5. April 2022)

  • Added a catch for Throwable when parsing PDFs; also updated to the latest version of smalot/pdfparser. A sketch of the pattern follows below.
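
A minimal sketch of the defensive pattern this release describes, using the smalot/pdfparser API; the surrounding crawler integration, the `$pdfBinary` variable, and the fallback behavior are illustrative, not the library's actual code:

```php
<?php

use Smalot\PdfParser\Parser;

// Wrap PDF parsing in a catch for \Throwable so a single malformed
// PDF cannot abort the whole crawl.
try {
    $document = (new Parser())->parseContent($pdfBinary);
    $text = $document->getText();
} catch (\Throwable $e) {
    $text = ''; // treat unparseable PDFs as empty content and move on
}
```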

1.7.0 (10. August 2021)

  • #20 Improved tag stripping in the HTML parser to generate cleaner, more readable output when $stripTags is enabled. Markup like <p>foo</p><p>bar</p> is now handled as foo bar instead of foobar; see the sketch below.
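
A plain-PHP sketch of the technique (not the library's actual implementation): pad tags with a space before stripping them, then collapse whitespace, so adjacent block elements no longer run together.

```php
<?php

// Sketch only: approximates the improved $stripTags behavior.
$html = '<p>foo</p><p>bar</p>';

// Insert a space before each tag, strip tags, then collapse whitespace.
$text = trim(preg_replace('/\s+/', ' ', strip_tags(str_replace('<', ' <', $html))));

echo $text; // "foo bar" instead of the pre-1.7.0 "foobar"
```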

1.6.2 (16. April 2021)

  • #18 Fixed an issue with pages that contain UTF-8 characters in the title tag.

1.6.1 (16. April 2021)

  • #17 Fixed an issue where the crawler group was not generated correctly.

1.6.0 (16. March 2021)

  • #15 Links with rel="nofollow" are no longer followed by default. This can be configured via the HtmlParser::$ignoreRels property; see the example below.
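
For example, to opt back into following rel="nofollow" links, the $ignoreRels property named above can be emptied. The namespace and the assumed default value are assumptions; only the property name comes from this note:

```php
<?php

use Nadar\Crawler\Parsers\HtmlParser; // namespace assumed

$parser = new HtmlParser();

// Emptying $ignoreRels should make the parser follow rel="nofollow"
// links again (default assumed to contain 'nofollow').
$parser->ignoreRels = [];
```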

1.5.0 (13. January 2021)

  • #14 Pass the status code of the response into the parsers and process only HTML and PDF responses with code 200 (OK), as illustrated below.
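
Conceptually, the rule looks like this; the variable name and branch bodies are illustrative, not the library's exact API:

```php
<?php

// Illustrative only: parsers now receive the status code and skip
// anything that is not 200 (OK).
if ($statusCode === 200) {
    // hand the body to the HTML/PDF parsers
} else {
    // e.g. 404 or 500 responses are no longer parsed
}
```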

1.4.0 (13. January 2021)

  • #13 The new Crawler method getCycles() returns the number of times the run() method was called; see the sketch below.
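
A sketch of how the new method can be used; the construction of $crawler is deliberately omitted, and only run() and getCycles() come from this note:

```php
<?php

// Sketch: $crawler setup is elided; run() and getCycles() are the
// methods named in this release note.
$crawler->run();
$crawler->run();

echo $crawler->getCycles(); // 2, since run() was called twice
```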

1.3.0 (20. December 2020)

  • #10 Added a relative URL check to the Url class.
  • #8 Merge the path of a URL when only a query param is provided; see the illustration after this list.
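
A plain-PHP illustration of the #8 path-merge rule (not the library's Url class): a link consisting only of a query string inherits the path of the page it was found on.

```php
<?php

// Illustration only (requires PHP 8 for str_starts_with): resolve a
// query-only link against the page it was found on.
function mergeQueryOnlyLink(string $pageUrl, string $link): string
{
    if (str_starts_with($link, '?')) {
        // keep scheme/host/path of the page, swap in the new query
        return strtok($pageUrl, '?') . $link;
    }

    return $link;
}

echo mergeQueryOnlyLink('https://example.com/blog/post?old=1', '?page=2');
// https://example.com/blog/post?page=2
```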

1.2.1 (17. December 2020)

  • #9 Fixed an issue where the CRAWL_IGNORE tag had no effect. The array value for found links, which equals the link title, is now trimmed.

1.2.0 (14. November 2020)

  • #7 By default, response content larger than 5 MB is no longer passed to the parsers. To turn this behavior off, use 'maxSize' => false, or raise the limit, e.g. 'maxSize' => 15000000 (15 MB); the value must be provided in bytes. The main goal is to keep the PDF parser from running into very high memory consumption. This restriction does not stop the crawler from downloading the URL (whether or not it exceeds the maxSize definition); it only prevents memory leaks when the parsers start to interact with the response content. See the configuration example below.
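
The two configurations mentioned above, side by side. The option name and values come from this note; where exactly the options array is passed (e.g. the Crawler constructor) is an assumption:

```php
<?php

// Values taken from this release note; where the options array is
// consumed is an assumption.
$options = [
    'maxSize' => 15000000, // raise the limit to 15 MB (value in bytes)
];

// ... or disable the size check entirely:
$options = [
    'maxSize' => false,
];
```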