Possibly invalid test: definition-list.docx #10394

Open
StephanMeijer opened this issue Nov 19, 2024 · 5 comments

@StephanMeijer
Contributor

StephanMeijer commented Nov 19, 2024

Explain the problem.

I have observed that in the word/document.xml file of the definition_list.docx test document — available at test/docx/definition_list.docx and test/docx/golden/definition_list.docx — some paragraphs have two styles applied simultaneously. This is illustrated in the following snippet:

[Image: word/document.xml of definition-list.docx]

According to the OOXML standard (ECMA-376), a paragraph should have only one style, defined via the <w:pStyle> element within its properties. The standard does not indicate, suggest, or provide examples where multiple styles are applied directly to a single paragraph.

This raises a concern that the definition_list.docx file may be invalid per the OOXML specification due to multiple styles being assigned to a single paragraph.
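One way to check a document for this condition is to count the <w:pStyle> children of each paragraph's <w:pPr>. The following is a minimal sketch, not part of pandoc or its test suite: the function name `paragraphs_with_multiple_styles` and the `SAMPLE` fragment are hypothetical, and the fragment is a hand-written illustration rather than the actual content of the test file.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraphs_with_multiple_styles(document_xml):
    """Return the style ids of every <w:p> that carries more than one <w:pStyle>."""
    root = ET.fromstring(document_xml)
    offenders = []
    for para in root.iter(f"{{{W_NS}}}p"):
        styles = [
            el.get(f"{{{W_NS}}}val")
            for el in para.findall("w:pPr/w:pStyle", NS)
        ]
        if len(styles) > 1:
            offenders.append(styles)
    return offenders


# Hypothetical fragment for illustration only; NOT copied from the test file.
SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Definition"/><w:pStyle w:val="SourceCode"/></w:pPr>
      <w:r><w:t>-- haskell code</w:t></w:r>
    </w:p>
    <w:p>
      <w:pPr><w:pStyle w:val="Definition"/></w:pPr>
      <w:r><w:t>definition 2</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

print(paragraphs_with_multiple_styles(SAMPLE))
```

To run the same check against a real file, the XML can be read with `zipfile.ZipFile(path).read("word/document.xml")` and passed to the function, since `ET.fromstring` accepts bytes as well as text.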

Microsoft Word interprets this paragraph to be of style "SourceCode", not "Definition" as illustrated in the following screenshot:

[Screenshot illustrating Microsoft Word]

Pandoc version?
Latest.

@StephanMeijer
Contributor Author

StephanMeijer commented Nov 20, 2024

This could be a valid replacement: definition_list.docx

@jgm
Owner

jgm commented Nov 20, 2024

Interesting:

 % pandoc -f native -t docx | pandoc -f docx -t native
[ DefinitionList
    [ ( [ Str "term1" ]
      , [ [ Para [ Str "definition" , Space , Str "1" ]
          , CodeBlock ( "" , [ "hs" ] , [] ) "-- haskell code"
          ]
        ]
      )
    , ( [ Str "term2" ]
      , [ [ Para [ Str "definition" , Space , Str "2" ] ] ]
      )
    ]
]
^D
[ DefinitionList
    [ ( [ Str "term1" ]
      , [ [ Para [ Str "definition" , Space , Str "1" ] ] ]
      )
    ]
, CodeBlock ( "" , [] , [] ) "-- haskell code"
, DefinitionList
    [ ( [ Str "term2" ]
      , [ [ Para [ Str "definition" , Space , Str "2" ] ] ]
      )
    ]
]

A definition list with a code block in the definition won't survive a round-trip through docx. I think this is because we can't add multiple styles on paragraphs. We keep track of the extent of the definition list by looking for the Definition style, but the code block needs a SourceCode style.

Not sure what to do about this. We can replace the invalid test file, but then we'll get a test failure, because the test exercises exactly this case!
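The splitting seen in the round-trip output can be reproduced with a toy model of style-based grouping. This is only a sketch under stated assumptions and is not pandoc's actual reader code: the function `group_blocks` is hypothetical, paragraphs are reduced to (style, text) pairs with exactly one style each, and the term style name "DefinitionTerm" is an assumption (only "Definition" and "SourceCode" are named in this thread).

```python
def group_blocks(paragraphs):
    """Toy reader: group (style, text) paragraphs into blocks by style alone."""
    blocks = []
    current = None  # definition list being built: list of (term, [definitions])
    for style, text in paragraphs:
        if style == "DefinitionTerm":  # assumed style name for terms
            if current is None:
                current = []
                blocks.append(("DefinitionList", current))
            current.append((text, []))
        elif style == "Definition" and current:
            # A definition paragraph extends the most recent term.
            current[-1][1].append(text)
        else:
            # Any other style ends the definition list, because a paragraph
            # cannot be "Definition" and "SourceCode" at the same time.
            current = None
            kind = "CodeBlock" if style == "SourceCode" else "Para"
            blocks.append((kind, text))
    return blocks


paragraphs = [
    ("DefinitionTerm", "term1"),
    ("Definition", "definition 1"),
    ("SourceCode", "-- haskell code"),
    ("DefinitionTerm", "term2"),
    ("Definition", "definition 2"),
]

for block in group_blocks(paragraphs):
    print(block)
```

With one style per paragraph, the code paragraph interrupts the run of "Definition" paragraphs, so the model yields a definition list, a top-level code block, and a second definition list, which has the same shape as the second native output above.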

@StephanMeijer
Contributor Author

On our side, in our converter, we changed both the input and the output of the test. I think that is actually the intent behind the test, and I recommend doing the same.

@jgm
Owner

jgm commented Nov 20, 2024

I think this test is meant to test a case where you have a code block inside a definition list. Pandoc simply can't handle that, it seems.

@StephanMeijer
Contributor Author

But having a "code block inside a definition list" simply cannot occur. A paragraph is either styled as a block of code or as a definition (term). It cannot have two styles. Therefore, Pandoc's current behaviour is valid, but the input for the test is not.
