Malformed mapping can make index snapshots not restorable or mountable (searchable snapshots) #84146

lucabelluccini · 2022-02-17T19:18:17Z

Elasticsearch Version

7.14.2, 7.17.0 (probably earlier too)

Installed Plugins

No response

Java Version

bundled

OS Version

not relevant

Problem Description

If a user accidentally ingests JSON documents with malformed bodies (for example, field names consisting only of NUL characters), the mappings generated by dynamic mapping can make a snapshot of the index fail on restore or mount.

This can also happen during ILM (when the index moves to a phase that mounts it as a searchable snapshot) or during a normal snapshot restore operation.

Steps to Reproduce

DELETE myverybadindex

PUT myverybadindex
{
  "mappings": {
    "properties": {
      "query": {
        "properties": {
          "1": {
            "type": "text"
          },
          "\u0000": {
            "type": "text"
          },
          "\u0000\u0000\u0000\u0000": {
            "type": "text"
          }
        }
      }
    }
  }
}

POST _snapshot/found-snapshots/myverybadsnapshot
{
  "indices": "myverybadindex",
  "include_global_state": false
}

GET _snapshot/found-snapshots/myverybadsnapshot

POST /_snapshot/found-snapshots/myverybadsnapshot/_mount?wait_for_completion=true
{
  "index": "myverybadindex", 
  "renamed_index": "myverybadindex-mounted",
  "ignore_index_settings": [ "index.refresh_interval" ] 
}

POST /_snapshot/found-snapshots/myverybadsnapshot/_restore
{
  "indices": "myverybadindex",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_index_$1"
}

Both the mount and the restore operations fail with:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "i_o_exception",
        "reason" : "Duplicate field '\u0000'\n at [Source: (org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat$DeserializeMetaBlobInputStream); line: -1, column: 431]"
      }
    ],
    "type" : "i_o_exception",
    "reason" : "Duplicate field '\u0000'\n at [Source: (org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat$DeserializeMetaBlobInputStream); line: -1, column: 431]"
  },
  "status" : 500
}

Logs (if relevant)

No response

elasticmachine · 2022-02-17T19:19:05Z

Pinging @elastic/es-search (Team:Search)

elasticmachine · 2022-02-17T19:19:05Z

Pinging @elastic/es-distributed (Team:Distributed)

kunisen · 2022-02-18T02:20:25Z

[1]

I wonder if we have an easy way to fix this issue?

IIUC, we need to

find the index having bad mapping
delete those bad indices
take snapshot again
mount or restore

Is this enough, and is there a good way to find indices with bad mappings?
(I wonder whether this is limited to Unicode or extends to other patterns, which might not be easy to check for until the restore or mount fails.)

[2]

I also feel it would be great to block this at the index creation stage, because it doesn't make logical sense for an index to be fine to snapshot but impossible to restore or mount.

I had a chat with @Leaf-Lin. The reasoning seems to be that it's not great to prevent users from creating fields whose names are based on Unicode, because users working in a different language would have field names that are completely normal to them but that ES is unable to process correctly.

However, this causes inconsistent behavior between index creation and snapshot restore/mount, and ideally the two should be aligned.

Is there a way around this? E.g. apply some internal encoding to avoid using Unicode directly?
(URL encoding, for instance, is widely used in many applications.)
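As an illustration of the encoding idea only, here is a minimal Python sketch that percent-encodes field names so that control characters never appear literally. The helper names `encode_field_name` and `decode_field_name` are hypothetical, and nothing here reflects how Elasticsearch stores field names internally.

```python
from urllib.parse import quote, unquote


def encode_field_name(name: str) -> str:
    """Percent-encode every character outside the URL-unreserved set."""
    # safe="" means even "/" is encoded; letters, digits and "_.-~" stay as-is.
    return quote(name, safe="")


def decode_field_name(encoded: str) -> str:
    """Reverse encode_field_name."""
    return unquote(encoded)
```

Under this scheme a single NUL becomes `%00` and two NULs become `%00%00`, so the encoded names stay distinct and printable.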

DaveCTurner · 2022-02-18T09:40:51Z

This seems to be a SMILE bug, or at least something that's not supported properly in SMILE. The following test fails for SMILE (and CBOR) but passes for JSON and YAML.

diff --git a/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java b/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
index 96b93568c66..9ab3cce8aa8 100644
--- a/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
+++ b/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
@@ -140,6 +140,7 @@ public abstract class BaseXContentTestCase extends ESTestCase {
         expectUnclosedException(() -> BytesReference.bytes(builder().startObject().field("foo")));

         assertResult("{'foo':'bar'}", () -> builder().startObject().field("foo").value("bar").endObject());
+        assertResult("{'\\u0000':'','\\u0000\\u0000':''}", () -> builder().startObject().field("\0", "").field("\0\0", "").endObject());
     }

     public void testNullField() throws IOException {

The trouble is that the SMILE parser treats these field names as short ASCII strings which get cached to avoid unnecessary instantiation, but the cache is keyed by an integer representation of the string and both of these strings map to 0.
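The collision can be illustrated with a toy model. The Python sketch below is not Jackson's code: `pack_key`, `NameCache`, and `parse_field_names` are hypothetical names, and big-endian byte packing is an assumption chosen for simplicity. It only shows why a cache keyed by an integer with no length information cannot tell NUL-only names apart.

```python
from typing import Dict, List, Set


def pack_key(name: str) -> int:
    """Pack the bytes of a short ASCII name into one integer (big-endian)."""
    key = 0
    for byte in name.encode("ascii"):
        key = (key << 8) | byte
    return key


class NameCache:
    """Toy cache keyed only by the packed integer, with no length information."""

    def __init__(self) -> None:
        self._names: Dict[int, str] = {}

    def intern(self, name: str) -> str:
        # Return the name already cached under this key, or store this one.
        return self._names.setdefault(pack_key(name), name)


def parse_field_names(names: List[str]) -> List[str]:
    """Intern each name and reject duplicates, as a strict parser would."""
    cache = NameCache()
    seen: Set[str] = set()
    result: List[str] = []
    for name in names:
        cached = cache.intern(name)
        if cached in seen:
            raise ValueError(f"Duplicate field {cached!r}")
        seen.add(cached)
        result.append(cached)
    return result
```

With the field names from the reproduction, `parse_field_names(["1", "\u0000", "\u0000\u0000\u0000\u0000"])` raises `ValueError`, because the third name is looked up under key 0 and comes back as the already-seen `"\u0000"`.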

I don't think this is a general Unicode problem; it only affects field names that are made up of some short sequence of NUL bytes. I have reported this at FasterXML/jackson-dataformats-binary#312.

Can we perhaps forbid field names containing NUL bytes entirely? Are they ever anything but a mistake?
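To locate existing indices whose mappings already contain such field names, one option is to scan the mappings client-side. The Python sketch below assumes its input is the `mappings` object of an index (the shape used in the reproduction above); `iter_field_paths` and `find_nul_fields` are hypothetical helper names, and fetching the mappings from a cluster is left out.

```python
from typing import Any, Dict, Iterator, List

NUL = "\u0000"


def iter_field_paths(properties: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every field under a mapping's "properties"."""
    for name, definition in properties.items():
        path = f"{prefix}.{name}" if prefix else name
        yield path
        if not isinstance(definition, dict):
            continue
        # Object fields nest children under "properties"; multi-fields under "fields".
        for key in ("properties", "fields"):
            children = definition.get(key)
            if isinstance(children, dict):
                yield from iter_field_paths(children, path)


def find_nul_fields(mappings: Dict[str, Any]) -> List[str]:
    """Return the paths of all fields whose path contains a NUL character."""
    properties = mappings.get("properties", {})
    return [path for path in iter_field_paths(properties) if NUL in path]
```

For the mapping in the reproduction, this returns the two paths under `query` whose last segment is made of NUL characters and skips `query.1`.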

lucabelluccini · 2022-02-18T10:24:37Z

Thank you David for the prompt analysis.

> Can we perhaps forbid field names containing NUL bytes entirely? Are they ever anything but a mistake?

It could be a nice feature.
Most times I've seen it, a client (Fluentd or other products) was trying to index garbage data.
I would always allow an escape hatch (an index setting or a cluster setting?).

SharpEdgeMarshall · 2022-06-23T13:22:39Z

Any news on this? We have an index that cannot be restored because of this bug.

DaveCTurner · 2022-07-06T10:17:52Z

The Jackson bug is fixed upstream, but a fixed version (≥2.14.0) is yet to be released.

javanna · 2024-06-13T12:47:54Z

Starting with Elasticsearch 8.6, we upgraded Jackson to 2.14. This should be fixed now.

lucabelluccini added the >bug, needs:triage, :Distributed Coordination/Snapshot/Restore, and :Search Foundations/Mapping labels Feb 17, 2022

elasticmachine added the Team:Distributed and Team:Search labels Feb 17, 2022

DaveCTurner mentioned this issue Feb 18, 2022: Short NUL-only keys incorrectly detected as duplicates in CBOR and SMILE FasterXML/jackson-dataformats-binary#312 (Closed)

javanna removed the needs:triage label Feb 21, 2022

javanna closed this as completed Jun 13, 2024

javanna added the Team:Search Foundations label and removed the Team:Search label Jul 16, 2024
