decode method doesn't work proper for some strings #814

mukesh4804 · 2023-11-24T06:56:08Z

Describe the bug
decode method doesn't work proper for some strings

Specify what ESAPI version(s) you are experiencing this bug in
2.5.2

To Reproduce
https://github.com/ESAPI/esapi-java-legacy
I am using the test folder of above git repository .

public class HTMLEntityCodecTest {
Codec codec = new HTMLEntityCodec();

@Test
public void testEntityDecoding(){
    assertEquals("#", codec.decode("&num;"));
    
}

}

Expected behavior
true

decode method should return # instead of νm .

Here '&num' encoded should be returning '#', but instead its returning 'vm' that is 'v' for '&nu' and 'm' for 'm'

https://dev.w3.org/html5/html-author/charref

Screenshots
If applicable, add screenshots to help explain your problem.
[NOTE: Please do NOT just ask general questions here as they will not be answered. Instead, please use the GitHub Discussions and create a new topic under 'Questions and Answers".
Please delete any irrelevant portion of this template before submitting your GitHub issue. Thanks.]

Platform environment (please complete the following information):

OS: [e.g. iOS]
Browser [e.g. chrome, safari]
JDK version used with ESAPI

Additional context
Add any other context about the problem here.
If known, please select the label corresponding to the affected ESAPI component.

The text was updated successfully, but these errors were encountered:

planetlevel · 2023-11-24T23:51:24Z

It's not exactly wrong as some parsers will decode HTML entities without the trailing semicolon.

So this test is technically ambiguous. Remember we are trying to encode/decode/canonicalize in a way that makes attacks difficult or impossible. We are not necessarily trying to strictly implement a spec. The real world implementations are more complex than specs.

Still, this issue keeps coming up over the years. Especially in the case where the semicolon is present it seems like we are producing the result least likely to be right.

Simply switching to using the longest (instead of shortest) possible match would fix a lot.

mukesh4804 · 2023-11-25T08:39:17Z

@planetlevel , Agreed with statement "Simply switching to using the longest (instead of shortest) possible match would fix a lot."
Do we expect any tentative timeline for the fix?

kwwall · 2023-11-25T16:47:51Z

@mukesh4804 wrote:

Do we expect any tentative timeline for the fix?

Hard to say, but I can almost guarantee that it will be at least 6 months or longer should we decide to prioritize a fix for this. We tend to prioritize fixes that have security implications and also focus on ones that have the broadest impact to the security community. Not that many ESAPI clients use the decoders in the first place. In fact, the OWASP Java Encoder project does not even offer decoders! They just generally are not needed.

The implementation of the ESAPI output encoders were intentionally written to provide a "safe harbor" to users who used a popular set of badly broken and vulnerable browsers from a certain company whose first goal was definitely not security and who played fast and loose with the W3C specs. It might be argued that the "specs are the only things that (should) matter", but attackers don't care about adherence to specs, but only what the actual behavior is. Thus this "safe harbor", tended in the direction of aggressively encoding things. As a result, I suspect that the decoders were designed to match so that the respective encoder / decoder acted as inverse operations. So this is only speculation but that may be why the decoders seem to be operating on shortest match rather than the longest match.

That said, I think the ESAPI team would address this as a low priority bug. Its hard to see (outside of contrived cases of course) how such decoder bugs not following the W3C specs to the letter is going to cause a security problem. For output encoding, certainly. But most often when I see decoders being used, that raises a red flag in my mind. From a security perspective at least, decoders really shouldn't be necessary. Decoding should be left to the purview of the end user's browsers. Of all the hundreds of security code reviews that I've done, I can never really recall where an ESAPI decoder was actually needed. At best, it was unnecessary and mostly harmless. Hence, to me, this is a Low priority.

And given that it's a low priority, it honestly may never get fixed unless someone creates a PR to address it. So, it this is important to you, I'd say submit a PR for it (see the CONTRIBUTING-TO-ESAPI.txt file). But even then, it's likely that it will still be 6 months or so until the next ESAPI release unless we need to patch some security issue.

mukesh4804 added the bug label Nov 24, 2023

kwwall added the Priority-Low label Nov 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

decode method doesn't work proper for some strings #814

decode method doesn't work proper for some strings #814

mukesh4804 commented Nov 24, 2023 •

edited by kwwall

Loading

planetlevel commented Nov 24, 2023

mukesh4804 commented Nov 25, 2023

kwwall commented Nov 25, 2023

decode method doesn't work proper for some strings #814

decode method doesn't work proper for some strings #814

Comments

mukesh4804 commented Nov 24, 2023 • edited by kwwall Loading

planetlevel commented Nov 24, 2023

mukesh4804 commented Nov 25, 2023

kwwall commented Nov 25, 2023

mukesh4804 commented Nov 24, 2023 •

edited by kwwall

Loading