-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities. #29334
Conversation
Test build #126972 has finished for PR 29334 at commit
|
Hi @dongjoon-hyun @srowen @Fokko and @HyukjinKwon, you have been involved with jackson upgrade previously, and would like to have your opinion around this upgrade too. Basically, 2.6.x is no longer maintained, and since we are planning to continue to release 2.4.x line, it might be good to upgrade jackson to a maintained release too. Thanks ! |
Once the discussion around this is settled, will start working on releasing 2.4.7. |
Test build #126975 has finished for PR 29334 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ScrapCodes You can read the thread here on updating the Spark 3.0 branch to the latest version of Jackson: #21596 There were already some issues with None
handling in Scala.
With Jackson, it is important to check where the vulnerability lies. Sometimes, it is in code-paths that aren't used by Spark, for example, plugins that are not used. I think it is up to the Spark community to decide on this. But, looking at the behavioral changes mentioned in #21596 I think we should be cautious.
@@ -224,6 +224,10 @@ | |||
<groupId>org.scala-lang</groupId> | |||
<artifactId>scala-library</artifactId> | |||
</dependency> | |||
<dependency> | |||
<groupId>org.scala-lang</groupId> | |||
<artifactId>scala-reflect</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this one needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required because, jackson-scala-module underwent this change:
FasterXML/jackson-module-scala@d6071ac#diff-fdc3abdfd754eeb24090dbd90aeec2ce
And Spark-core uses scala-reflect library at one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
org.apache.spark.deploy.security.HBaseDelegationTokenProvider
Yeah, the problem is that there are behavior changes. You can see one in the UT failures. It outputs an explicit null for fields that are null. I think it can be disabled, but not sure what else we'd have to fix where to maintain behavior. I recall it wasn't trivial when I looked at it a while ago. |
I also agree with @srowen 's opinion. However, if we have a clean patch passing all UTs, we may reconsider. |
Test build #127034 has finished for PR 29334 at commit
|
Jenkins, retest this please. |
…ity vulnerabilities.
…ackson scala module it is not transitive.
9cd3d9a
to
7cff657
Compare
Test build #127037 has finished for PR 29334 at commit
|
Test build #127039 has finished for PR 29334 at commit
|
Jenkins, retest this please.
Failing due to corrupted .m2 directory. |
Jenkins, retest this please. |
Test build #127049 has finished for PR 29334 at commit
|
Jenkins, retest this please. |
Test build #127050 has finished for PR 29334 at commit
|
@shaneknapp and @HyukjinKwon I am sorry for calling for your attention, but jenkins is failing with corrupted m2 directory again. I was also wondering, why github could not trigger the UTs, does it think that this is just a build patch? |
@Fokko, @srowen and @dongjoon-hyun Thank for giving me the feedback. I agree, with you guys. But, I wanted to give this patch a try - can it be done in a clean way? This patch is almost the same as one mentioned at, #21596 Also, the behaviour change was due to a bug in jackson 2.7.x. I have tested, it worked fine. Somehow the Jenkins, does not pass due to corrupted m2 cache. Do you think, that it is worth a try? |
Alright, one more FasterXML/jackson-databind#2798, Shall we consider 2.10.x ? change is the same and the later is free from whole store house of CVEs. |
@ScrapCodes, yes, the m2 is corrupted in Jenkins machine. In the master, this dependency check is being skipped in Jenkins and GitHub Actions build runs instead. In other branches, they are not done yet (see SPARK-32249). I was hesitant about porting GitHub actions back yet but maybe we should go and port it back. For the meanwhile, maybe we can try to remove |
@HyukjinKwon Thanks for looking in to it, and it is my mistake, I did not know that github actions are not ported to other branches yet. I am not 100% sure that they should be ported or not, does it effect our total resource capacity on github actions? |
Yes, I think it does. That was one of reasons why I was hesitant. FYI, there was a bit of discussions and updates about resources at SPARK-32264. Given that the PRs (to |
For the PR itself, I agree with @srowen's and @dongjoon-hyun comments at #29334 (comment) and #29334 (comment). Sorry for late responses. |
Alrighty, then I will skip this for 2.4.7 release, even though I still feel that this might be safe and good for people in general, provided jackson 2.6.7 had last release in 2017 (ignoring the micro CVE backport release in 2019). If we do need to revisit this at a later point, we can always take care of it in the next release. Thanks !
We can ignore this, as it does not include the code paths(org.jsecurity and com.pastdev.httpcomponents) spark uses (hopefully ) |
Jenkins, retest this please. |
Test build #127337 has finished for PR 29334 at commit
|
Jenkins, retest this please. |
Test build #127378 has finished for PR 29334 at commit
|
Jenkins, retest this please. |
Test build #127397 has finished for PR 29334 at commit
|
Jenkins, retest this please. |
Test build #127398 has finished for PR 29334 at commit
|
Jenkins, retest this please. |
Test build #127476 has finished for PR 29334 at commit
|
@HyukjinKwon and @dongjoon-hyun , it seems the test passed, with just the basic changes which are very similar to what was done earlier to migrate to Jackson 2.9.x #21596. Do you want to reconsider your opinion now? |
I'm still uneasy about it because client applications will inherit this dependency change too. Willing to be convinced otherwise but I know this was the root of all the reluctance to update this before 3.0 |
Is it difficult to suggest, to upgrade to a maintained version of a library which has otherwise got, so many security vulnerabilities? Even if those security vulnerabilities have no impact, at least one could save himself form swarm of alert generate by so called "security analysing" software. Currently used version(2.6), has not been released since 2017, except a micro release(in 2019) that fixed only known CVEs at that point. Lastly, change required to upgrade to latest jackson is just (s/NON_NULL/NON_ABSENT), can we make it simpler for our client application to upgrade. |
We can try shading it. That would remove the classpath issue at the cost of some complexity. |
Sorry for being late, kinda busy the last few weeks. If we decide to shade Jackson, we must be very careful that it isn't part of any public APIs. As mentioned before, not all the alerts generated by Jackson, are vulnerabilities to Spark, since many of them are code paths that aren't used by Spark. I'm mostly with Sean and would like to encourage users to move to 3.0. I've checked and there aren't many follow up PR's that patch issues caused by the Jackson update, so maybe we're making this bigger than it is. |
A quick note that might be relevant on question of CVEs affecting Spark. I wrote this a while ago: to give full description of these issues. I will also note that:
I hope this helps. |
Thank you @cowtowncoder, @srowen and @Fokko. Indeed, the Security vulnerabilities seems to serve the purpose of generating the false alarm at least in this case, since we are not using those code paths that affected. And, if some client application depends on Spark and uses jackson-databind, they need to deal with security issues on their own. Best thing to do is upgrade to 3.0, but it is sort of difficult to upgrade for folks who have recently upgraded to Spark 2.4.x . This is also the reason we are still maintaining the release version 2.4.x. Lot of great suggestions have chimed in, shading the jar comes with it's own set of complexity. I am not absolutely sure, but If we cannot upgrade as is, I had suggest we can re-consider this later. Thanks again everyone for chiming in and providing valuable suggestions. |
What changes were proposed in this pull request?
Update pom, with the latest release of jackson to 2.9.10 and 2.9.10.5. Which have fixes for all the reported CVEs so far - as per github advisories page.
Why are the changes needed?
Currently Apache Spark branch-2.4 depends on jackson release, which is not maintained (as per comment in FasterXML/jackson-databind#2186 (comment)).
There are some CVEs reported on the issue SPARK-32495 , but my understanding is since the release is not a maintained version, the advisory info is not up to date.
And it is also possible, that there are vulnerabilities which apply to higher version and are not yet reported for the un-maintained release.
Does this PR introduce any user-facing change?
Yes, the downstream users of spark who continue to depend on the older jackson release, will have to upgrade too.
If the package is affected with high severity security vulnerability, it might still be a good step.
How was this patch tested?
Existing tests.