Fix setParsableMimeTypes() #470

Merged: 2 commits, Jul 31, 2024

superpenguin612
Contributor

Addresses issues #352 and #369, and PR #432.

The setParsableMimeTypes option does not actually prevent a URL from being crawled; it only prevents the URLs found on that page from being crawled.

If executeJavascript is enabled as well, the setParsableMimeTypes option stops working entirely. As it turns out, the way mime types are ignored conflicts with Browsershot's JavaScript execution, which overwrites the $body variable:

if ($this->crawler->mayExecuteJavaScript()) {
    $body = $this->getBodyAfterExecutingJavaScript($crawlUrl->url);
    $response = $response->withBody(Utils::streamFor($body));
}

This PR prevents pages with non-parsable mime types from being crawled at all, updates the mime type test to better reflect what this feature should accomplish, and adds a test to validate that the issue is also fixed when executing JavaScript.
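
For illustration, here is a minimal sketch of the kind of check this relies on: decide from the Content-Type header alone, before touching the body, so that replacing the body later (for example after JavaScript rendering) cannot bypass the decision. The function name `isParsableMimeType` and the "empty list means no restriction" rule are assumptions made for this example, not the crawler's actual code.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of spatie/crawler): decide whether a response
 * should be parsed, based only on its Content-Type header value.
 */
function isParsableMimeType(string $contentType, array $parsableMimeTypes): bool
{
    // Assumption: an empty allow-list means "no restriction".
    if ($parsableMimeTypes === []) {
        return true;
    }

    // Drop parameters such as "; charset=utf-8" and normalise case.
    $mimeType = strtolower(trim(explode(';', $contentType)[0]));

    foreach ($parsableMimeTypes as $allowed) {
        if ($mimeType === strtolower(trim($allowed))) {
            return true;
        }
    }

    return false;
}

// The decision depends on the header only, never on the (replaceable) body.
var_dump(isParsableMimeType('text/html; charset=UTF-8', ['text/html'])); // bool(true)
var_dump(isParsableMimeType('application/pdf', ['text/html']));          // bool(false)
```

Because the check never reads the body, it gives the same answer whether or not the body has been swapped out by a rendering step.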

@freekmurze freekmurze merged commit c659f2f into spatie:main Jul 31, 2024
10 checks passed
@freekmurze
Member

Thanks!
