All notable changes to `spatie/crawler` will be documented in this file.
- Fix setParsableMimeTypes() by @superpenguin612 in #470
Full Changelog: https://github.com/spatie/crawler/compare/8.2.2...8.2.3
- Check original URL against depth tree when visited link is a redirect by @superpenguin612 in #467
- @superpenguin612 made their first contribution in #467
Full Changelog: https://github.com/spatie/crawler/compare/8.2.0...8.2.1
- Fix wording in documentation by @adamtomat in #460
- Add Laravel/Illuminate 11 Support by @Jubeki in #461
Full Changelog: https://github.com/spatie/crawler/compare/8.1.0...8.2.0
- feat: custom link parser by @Velka-DEV in #458
- @Velka-DEV made their first contribution in #458
Full Changelog: https://github.com/spatie/crawler/compare/8.0.4...8.1.0
- allow Browsershot v4
- Fix return type by @riesjart in #452
- @riesjart made their first contribution in #452
Full Changelog: https://github.com/spatie/crawler/compare/8.0.2...8.0.3
- Define only needed methods in observer implementation by @buismaarten in #449
- @buismaarten made their first contribution in #449
Full Changelog: https://github.com/spatie/crawler/compare/8.0.1...8.0.2
- Check if rel attribute contains nofollow by @robbinbenard in #445
- @robbinbenard made their first contribution in #445
Full Changelog: https://github.com/spatie/crawler/compare/8.0.0...8.0.1
- add linkText to crawl observer methods
- upgrade dependencies
- support Laravel 10
- Feat/convert phpunit tests to pest by @mansoorkhan96 in #401
- Add the ability to change the default baseUrl scheme by @arnissolle in #402
- @arnissolle made their first contribution in #402
Full Changelog: https://github.com/spatie/crawler/compare/7.1.1...7.1.2
- Fix issue #395 by @BrokenSourceCode in #396
- @BrokenSourceCode made their first contribution in #396
Full Changelog: https://github.com/spatie/crawler/compare/7.1.0...7.1.1
- allow Laravel 9 collections
- Keep only guzzlehttp/psr7 v2.0 by @flangofas in #392
- @flangofas made their first contribution in #392
Full Changelog: https://github.com/spatie/crawler/compare/7.0.4...7.0.5
- allow psr7 v2
- change response type hint (#371)
- require PHP 8+
- drop support for PHP 7.x
- convert syntax to PHP 8
- no API changes have been made
- bugfix: infinite loops when a CrawlProfile prevents crawling (#358)
- add `setCurrentCrawlLimit` and `setTotalCrawlLimit`
- internal refactors
- add support for PHP 8.0
- tweak variable naming in `ArrayCrawlQueue` (#326)
- improve chunked reading of response
- move observer / profiles / queues to separate namespaces
- typehint all the things
- use laravel/collections instead of tightenco package
- remove support for anything below PHP 7.4
- remove all deprecated functions and classes
- treat connection exceptions as request exceptions
- fix: method and property name error (#311)
- add crawler option to allow crawl links with rel="nofollow" (#310)
- only crawl links that are completely parsed
- fix curl streaming responses (#295)
- add `setParseableMimeTypes()` (#293)
- fix LinkAdder not receiving the updated DOM (#292)
- allow tightenco/collect 7 (#282)
- respect maximum response size when checking Robots Meta tags (#281)
- allow Guzzle 7
- allow symfony 5 components
- allow tightenco/collect 6.0 and up (#261)
- fix crash when `CrawlRequestFailed` receives an exception other than `RequestException`
- case-insensitive user agent bugfix (#249)
- fix bugs in `hasAlreadyBeenProcessed`
THIS VERSION CONTAINS A CRITICAL BUG, DO NOT USE
- added `ArrayCrawlQueue`; this is now the default queue
- deprecated `CollectionCrawlQueue`
- Make user agent configurable (#246)
- `delayBetweenRequests` now uses `int` instead of `float` everywhere
- remove incorrect docblock
- handle relative paths after redirects correctly
- add `getUrls` and `getPendingUrls`
- Respect maximumDepth in combination with robots (#181)
- Properly handle `noindex,follow` urls.
- added capability of crawling links with rel= next or prev
- add `setDelayBetweenRequests`
- fix an issue where the node in the depth tree could be null
- improve performance by only building the depth tree when needed
- handlers will get html after JavaScript has been processed
- refactor to improve extendability
- always add links to pool if robots shouldn't be respected
- refactor of internals
- make it possible to override `$defaultClientOptions`
- Bump minimum required version of `spatie/robots-txt` to `1.0.1`.
- Respect robots.txt
- improved extensibility by removing php native type hinting of url, queue and crawler pool Closures
- do not follow links that have attribute `rel` set to `nofollow`
- Support both `Illuminate`'s and `Tighten`'s `Collection`.
- fix bugs when installing into a Laravel app
- the `CrawlObserver` and `CrawlProfile` are upgraded from interfaces to abstract classes
- don't crawl `tel:` links
- fix endless loop
- add `setCrawlObservers`, `addCrawlObserver`
- fix `setMaximumResponseSize` (someday we'll get this right)
CONTAINS BUGS, DO NOT USE THIS VERSION
- fix `setMaximumResponseSize`
CONTAINS BUGS, DO NOT USE THIS VERSION
- fix `setMaximumResponseSize`
CONTAINS BUGS, DO NOT USE THIS VERSION
- add `setMaximumResponseSize`
- fix for exception being thrown when encountering a malformatted url
- use `\Psr\Http\Message\UriInterface` for all urls
- use Puppeteer
- drop support for PHP 7.0
- allow symfony 4 crawler
- added the ability to change the crawl queue
- more performance improvements
- performance improvements
- add `CrawlSubdomains` profile
- add crawl count limit
- add depth limit
- add JavaScript execution
- fix deps for PHP 7.2
- add `EmptyCrawlObserver`
- refactor to make use of Symfony Crawler's `link` function
- fix bugs around relative urls
- add `CrawlInternalUrls`
- make sure the passed client options are being used
- second attempt to fix detection of redirects
- fix detection of redirects
- fix the default timeout of 5 seconds
- set a default timeout of 5 seconds
- fix for non responding hosts
- fix for the accidental crawling of mailto-links
- improve performance by concurrent crawling
- make it possible to determine on which url a url was found
- Ignore `tel:` links when crawling
- Added `path`, `segment` and `segments` functions to `Url`
- Updated the required version of Guzzle to a secure version
- Fixed a bug where the crawler would not take query strings into account
- Fixed a bug where the crawler tries to follow JavaScript links
- Add support for DomCrawler 3.x
- Fix for normalizing relative links when using non-80 ports
- Add support for custom ports
- Lower required php version to 5.5
- Make URLs case sensitive
- First release