Use of escape.Markdown for #text elements #7

chamilad · 2019-10-16T21:46:18Z

Hello,

I'm using your library for a markdown generation tool for static site generators. The Rule interface is just perfect!

As I read through the code, the use of escape for #text elements mostly seems like a problem to me. Would you be able to explain why it was used in the first place? I couldn't understand why certain characters needed to be escaped.

Thanks!

JohannesKaufmann · 2020-03-23T19:34:10Z

@chamilad great that you like the library!

If a snippet whose text is literally **Not Strong** gets run through the library without escaping, it might produce **Not Strong**, which Markdown renders as bold text. That would not be what we are expecting. These side effects happen with quite a few characters ("*" for bold, "_" for italic, "-" for list items, four leading space characters accidentally creating a code block, ...).

When a header (eg. <h3>) contains any new lines in its body, it will split the header contents
over multiple lines, breaking the header in Markdown (because in Markdown, a header just
starts with #'s and anything on the next line is not part of the header). Since in HTML
and Markdown all white space is treated the same, I chose to replace line endings with spaces.
-> lunny/html2md#6

With escaping, this input will generate this output which is not perfect but close to the original.
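To make the idea concrete, here is a minimal sketch of backslash-escaping text before it is written out as Markdown. It is not the library's escape.Markdown implementation; the helper name escapeText and the small set of escaped characters are assumptions chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// markdownEscaper backslash-escapes a few characters that Markdown would
// otherwise interpret as formatting. The character set is an assumption
// for this sketch, not the list the library actually uses.
var markdownEscaper = strings.NewReplacer(
	`\`, `\\`, // a literal backslash must be escaped too
	`*`, `\*`, // bold / italic markers
	`_`, `\_`, // italic markers
)

// escapeText is a hypothetical helper that returns text with the
// characters above escaped, so they render literally.
func escapeText(text string) string {
	return markdownEscaper.Replace(text)
}

func main() {
	// Text that should stay literal instead of turning bold.
	fmt.Println(escapeText("**Not Strong**")) // prints \*\*Not Strong\*\*
}
```

Because the replacer makes a single pass over the input, the backslashes it inserts are not escaped a second time. Backslashes already present in the input are doubled, though.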

@chamilad if you send me some snippets that behave unexpectedly, I'm happy to add some test cases and fix that.

As background information: this library was designed to have whole websites piped through it, meaning it is supposed to handle some weird edge cases.

estyrke · 2021-02-02T06:33:15Z

Hi there! First, thanks for a great library! Second, I have an example that behaves unexpectedly:

The document I'm converting contains maths equations such as $L’ = (1+n \cdot C) \cdot L$. Amazingly, this almost works out of the box since the $$ syntax is apparently used in some Markdown flavors as well. However, I get $L’ = (1+n \\cdot C) \\cdot L$ , i.e. the backslashes before cdot are escaped. I would need them "raw": $L’ = (1+n \cdot C) \cdot L$ .

If this is a corner case that breaks something else, then I'm happy to just write my own rule to override the default one, just thought I'd mention this.

JohannesKaufmann · 2021-02-02T09:38:34Z

@estyrke Yeah, you are right, that is a bug. Unfortunately, it's not that easy to fix.

I have thought about a new approach that might make escaping more reliable (also resolving #19), but that requires a substantial refactor. And I don't have time for that at the moment 🤷‍♂️

For now, you can create a custom rule for "span" and register it using AddRules.

Then check whether the element has the classname “tex2jax_process” using selec.HasClass.

If it has, return selec.Text() instead of content. That gets you the original text that is not escaped.

If it does not have the classname, return nil which is then going to run the default rule.
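The decision logic of such a rule can be sketched without the library. In the sketch below, the span type and the function mathSpanReplacement are hypothetical stand-ins: in real code the same checks would sit inside the replacement function of the rule registered with AddRules, using selec.HasClass and selec.Text() on the selection the library passes in.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a hypothetical stand-in for the parts of the selection the rule
// needs: the class attribute and the raw, unescaped text.
type span struct {
	class string // value of the class attribute
	text  string // what selec.Text() would return
}

// hasClass reports whether name is one of the space-separated classes.
func (s span) hasClass(name string) bool {
	for _, c := range strings.Fields(s.class) {
		if c == name {
			return true
		}
	}
	return false
}

// mathSpanReplacement mirrors the steps above. For a span marked
// "tex2jax_process" it ignores the escaped content and returns the raw
// text. Otherwise it returns nil so that the default rule would run.
func mathSpanReplacement(content string, s span) *string {
	if s.hasClass("tex2jax_process") {
		raw := s.text
		return &raw
	}
	return nil
}

func main() {
	math := span{class: "tex2jax_process", text: `$(1+n \cdot C) \cdot L$`}
	if out := mathSpanReplacement(`$(1+n \\cdot C) \\cdot L$`, math); out != nil {
		fmt.Println(*out) // raw text, backslashes not doubled
	}

	plain := span{class: "note", text: "hello"}
	fmt.Println(mathSpanReplacement("hello", plain) == nil) // prints true
}
```

Returning a pointer lets nil mean "no opinion, fall back to the default rule", which is distinct from returning an empty string.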

Let me know if you have any problems...

JohannesKaufmann mentioned this issue Mar 28, 2020

html not suport. #11

Closed

JohannesKaufmann added the docs improve documentation label May 17, 2022
