Malformed mapping can make index snapshots not restorable or mountable (searchable snapshots) #84146
Comments
Pinging @elastic/es-search (Team:Search) |
Pinging @elastic/es-distributed (Team:Distributed) |
[1]I wonder if we have an easy way to fix this issue? IIUC, we need to
Is this enough, and is there a good way to find indices with bad mappings? [2] I also feel it would be better to block this at the index creation stage, because it doesn't make logical sense for an index to be "OK to snapshot" but "NG to restore/mount". I had a chat with @Leaf-Lin: the reason behind the current behavior seems to be that it's not great to prevent users from creating fields with Unicode names, because users working in a different language would have field names that are completely normal to them but that ES is unable to process correctly. However, given that this causes inconsistent behavior between "index creation" and "snapshot/restore", it would be best to get the two aligned. Is there a way around this? E.g. add some internal "encoding logic" to avoid using Unicode directly? |
This seems to be a SMILE bug, or at least something that's not supported properly in SMILE. The following test fails for SMILE (and CBOR) but passes for JSON and YAML.
diff --git a/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java b/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
index 96b93568c66..9ab3cce8aa8 100644
--- a/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
+++ b/server/src/test/java/org/elasticsearch/common/xcontent/BaseXContentTestCase.java
@@ -140,6 +140,7 @@ public abstract class BaseXContentTestCase extends ESTestCase {
expectUnclosedException(() -> BytesReference.bytes(builder().startObject().field("foo")));
assertResult("{'foo':'bar'}", () -> builder().startObject().field("foo").value("bar").endObject());
+ assertResult("{'\\u0000':'','\\u0000\\u0000':''}", () -> builder().startObject().field("\0", "").field("\0\0", "").endObject());
}
public void testNullField() throws IOException {
The trouble is that the SMILE parser treats these field names as short ASCII strings, which get cached to avoid unnecessary instantiation, but the cache is keyed by an integer representation of the string and both of these strings map to
I don't think this is a general Unicode problem; it's only going to affect field names that are made up of some short sequence of NUL bytes. I have reported this at FasterXML/jackson-dataformats-binary#312. Can we perhaps forbid field names containing NUL bytes entirely? Are they ever anything but a mistake? |
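To make the collision described above concrete, here is a minimal sketch of how a cache keyed by a packed integer can confuse two NUL-only names. It is not Jackson's actual code: the class `QuadKeySketch` and the methods `pack` and `intern` are hypothetical, and the packing scheme (up to four ASCII bytes shifted into one `int`, with the length left out of the key) is an assumption chosen only to illustrate the failure mode.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT Jackson's implementation.
class QuadKeySketch {

    // Pack up to four ASCII bytes into one int, first byte most significant.
    // The name's length is not part of the key, which is the flaw illustrated here.
    static int pack(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.US_ASCII);
        if (bytes.length > 4) {
            throw new IllegalArgumentException("sketch only handles names of up to 4 bytes");
        }
        int key = 0;
        for (byte b : bytes) {
            key = (key << 8) | (b & 0xFF);
        }
        return key;
    }

    // Cache lookup keyed only by the packed int: the first name stored under a key wins.
    static String intern(Map<Integer, String> cache, String name) {
        return cache.computeIfAbsent(pack(name), k -> name);
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>();
        intern(cache, "\0");                      // caches the one-NUL name under key 0
        String second = intern(cache, "\0\0");    // also key 0, so the cached name comes back

        System.out.println(pack("\0") == pack("\0\0")); // true: both names pack to 0
        System.out.println(second.length());            // 1: the two-NUL name was replaced
        System.out.println(pack("a") == pack("aa"));    // false: non-NUL names stay distinct
    }
}
```

In this sketch the second field name is silently replaced by the first, so a parsed object would appear to contain the same field twice. Only names made entirely of NUL bytes collide under this scheme, which matches the observation that this is not a general Unicode problem.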
Thank you David for the prompt analysis.
It could be a nice feature. |
Any news on this? We have an index that cannot be restored because of this bug. |
The Jackson bug is fixed upstream, but a fixed version (≥2.14.0) is yet to be released. |
Starting with Elasticsearch 8.6, we upgraded Jackson to 2.14, so this should be fixed now. |
Elasticsearch Version
7.14.2, 7.17.0 (probably earlier too)
Installed Plugins
No response
Java Version
bundled
OS Version
not relevant
Problem Description
If a user accidentally ingests JSON documents with weird or malformed bodies, the mappings generated by dynamic mapping can make a snapshot of the index fail on restore.
This can also happen during ILM (when the index is moved to a phase that mounts it as a searchable snapshot) or during a normal snapshot restore operation.
Steps to Reproduce
Both mounting and restore operations end up with:
Logs (if relevant)
No response