-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
decode method doesn't work proper for some strings #814
Comments
It's not exactly wrong as some parsers will decode HTML entities without the trailing semicolon. So this test is technically ambiguous. Remember we are trying to encode/decode/canonicalize in a way that makes attacks difficult or impossible. We are not necessarily trying to strictly implement a spec. The real world implementations are more complex than specs. Still, this issue keeps coming up over the years. Especially in the case where the semicolon is present it seems like we are producing the result least likely to be right. Simply switching to using the longest (instead of shortest) possible match would fix a lot. |
@planetlevel , Agreed with statement "Simply switching to using the longest (instead of shortest) possible match would fix a lot." |
@mukesh4804 wrote:
Hard to say, but I can almost guarantee that it will be at least 6 months or longer should we decide to prioritize a fix for this. We tend to prioritize fixes that have security implications and also focus on ones that have the broadest impact to the security community. Not that many ESAPI clients use the decoders in the first place. In fact, the OWASP Java Encoder project does not even offer decoders! They just generally are not needed. The implementation of the ESAPI output encoders were intentionally written to provide a "safe harbor" to users who used a popular set of badly broken and vulnerable browsers from a certain company whose first goal was definitely not security and who played fast and loose with the W3C specs. It might be argued that the "specs are the only things that (should) matter", but attackers don't care about adherence to specs, but only what the actual behavior is. Thus this "safe harbor", tended in the direction of aggressively encoding things. As a result, I suspect that the decoders were designed to match so that the respective encoder / decoder acted as inverse operations. So this is only speculation but that may be why the decoders seem to be operating on shortest match rather than the longest match. That said, I think the ESAPI team would address this as a low priority bug. Its hard to see (outside of contrived cases of course) how such decoder bugs not following the W3C specs to the letter is going to cause a security problem. For output encoding, certainly. But most often when I see decoders being used, that raises a red flag in my mind. From a security perspective at least, decoders really shouldn't be necessary. Decoding should be left to the purview of the end user's browsers. Of all the hundreds of security code reviews that I've done, I can never really recall where an ESAPI decoder was actually needed. At best, it was unnecessary and mostly harmless. Hence, to me, this is a Low priority. And given that it's a low priority, it honestly may never get fixed unless someone creates a PR to address it. So, it this is important to you, I'd say submit a PR for it (see the CONTRIBUTING-TO-ESAPI.txt file). But even then, it's likely that it will still be 6 months or so until the next ESAPI release unless we need to patch some security issue. |
Describe the bug
decode method doesn't work proper for some strings
Specify what ESAPI version(s) you are experiencing this bug in
2.5.2
To Reproduce
https://github.com/ESAPI/esapi-java-legacy
I am using the test folder of above git repository .
public class HTMLEntityCodecTest {
Codec codec = new HTMLEntityCodec();
}
Expected behavior
true
decode method should return # instead of νm .
Here '&num' encoded should be returning '#', but instead its returning 'vm' that is 'v' for '&nu' and 'm' for 'm'
https://dev.w3.org/html5/html-author/charref
Screenshots
If applicable, add screenshots to help explain your problem.
[NOTE: Please do NOT just ask general questions here as they will not be answered. Instead, please use the GitHub Discussions and create a new topic under 'Questions and Answers".
Please delete any irrelevant portion of this template before submitting your GitHub issue. Thanks.]
Platform environment (please complete the following information):
Additional context
Add any other context about the problem here.
If known, please select the label corresponding to the affected ESAPI component.
The text was updated successfully, but these errors were encountered: