WARNING: THIS IS WORK IN PROGRESS
The Elastic Common Schema (ECS) defines a common set of fields for ingesting data into Elasticsearch. A common schema helps you correlate data from sources like logs and metrics or IT operations analytics and security analytics.
ECS is still under development and backward compatibility is not guaranteed. Any feedback on the general structure, missing fields, or existing fields is appreciated. For contributions please read the Contributing Guide.
The current version of ECS is 0.1.0.
ECS defines these fields.
- Base fields
- Agent fields
- Cloud fields
- Container fields
- Destination fields
- Device fields
- Error fields
- Event fields
- File fields
- Geo fields
- Host fields
- HTTP fields
- Log fields
- Network fields
- Organization fields
- Operating System fields
- Process fields
- Service fields
- Source fields
- URL fields
- User fields
- User agent fields
The base set contains all fields which are on the top level. These fields are common across all types of events.
The agent fields contain the data about the agent/client/shipper that created the event.
Examples: In the case of Beats for logs, the agent.name is filebeat. For APM, it is the agent running in the app/service. The agent information does not change if data is sent through queuing systems like Kafka, Redis, or processing systems such as Logstash or APM Server.
Fields related to the cloud or infrastructure the events are coming from.
Examples: If Metricbeat is running on an EC2 host and fetches data from its host, the cloud info contains the data about this machine. If Metricbeat runs on a remote machine outside the cloud and fetches data from a service running in the cloud, the field contains cloud data from the machine the service is running on.
Container fields are used for meta information about the specific container that is the source of information. These fields help correlate data based on containers from any runtime.
Destination fields describe details about the destination of a packet/event.
Device fields are used to provide additional information about the device that is the source of the information. This could be a firewall, network device, etc.
These fields can represent errors of any kind. Use them for errors that happen while fetching events or in cases where the event itself contains an error.
Field | Description | Type | Multi Field | Example |
---|---|---|---|---|
error.id | Unique identifier for the error. | keyword | ||
error.message | Error message. | text | ||
error.code | Error code describing the error. | keyword | ||
The event fields are used for context information about the data itself.
File fields provide details about each file.
Geo fields can carry data about a specific location related to an event or geo information for an IP field.
Host fields provide information related to a host. A host can be a physical machine, a virtual machine, or a Docker container.
Normally the host information relates to the machine on which the event was generated or collected, but these fields can be used differently if needed.
Fields related to HTTP requests and responses.
Fields which are specific to log events.
Fields related to network data.
The organization fields enrich data with information about the company or entity the data is associated with. These fields help you arrange or filter data stored in an index by one or multiple organizations.
Field | Description | Type | Multi Field | Example |
---|---|---|---|---|
organization.name | Organization name. | text | ||
organization.id | Unique identifier for the organization. | keyword | ||
The OS fields contain information about the operating system. These fields are often used inside other prefixes, such as host.os.* or user_agent.os.*.
These fields contain information about a process. These fields can help you correlate metrics information with a process id/name from a log message. The process.pid often stays in the metric itself and is copied to the global field for correlation.
The service fields describe the service for or from which the data was collected. These fields help you find and correlate logs for a specific service and version.
Source fields describe details about the source of the event.
URL fields provide a complete URL, with scheme, host, and path. The URL object can be reused in other prefixes, such as host.url.*. Keep the structure consistent whenever you use URL fields.
Field | Description | Type | Multi Field | Example |
---|---|---|---|---|
url.href | Full URL. url.href is a [multi field](https://www.elastic.co/guide/en/elasticsearch/reference/6.2/multi-fields.html#_multi_fields_with_multiple_analyzers). The analyzed text field lets you run a query against part of the URL without splitting up the URL at ingest time; the non-analyzed value is available in url.href.keyword. | text | | https://elastic.co:443/search?q=elasticsearch#top |
url.href.keyword | The full URL. This is a non-analyzed field that is useful for aggregations. | keyword | 1 | |
url.scheme | Scheme of the request, such as "https". Note: The : is not part of the scheme. | keyword | | https |
url.host.name | Hostname of the request, such as "example.com". For correlation, this field can be copied into the host.name field. | keyword | | elastic.co |
url.port | Port of the request, such as 443. | integer | | 443 |
url.path | Path of the request, such as "/search". | text | ||
url.path.keyword | URL path. A non-analyzed field that is useful for aggregations. | keyword | 1 | |
url.query | The query field describes the query string of the request, such as "q=elasticsearch". The ? is excluded from the query string. If a URL contains no ?, there is no query field. If there is a ? but no query, the query field exists with an empty string. The exists query can be used to differentiate between the two cases. | text | ||
url.query.keyword | URL query part. A non-analyzed field that is useful for aggregations. | keyword | 1 | |
url.fragment | Portion of the URL after the #, such as "top". The # is not part of the fragment. | keyword | ||
url.username | Username of the request. | keyword | ||
url.password | Password of the request. | keyword | ||
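As a rough illustration of the table above, the following Python sketch splits a URL into url.* fields using the standard library. The helper name url_to_ecs is hypothetical and not part of ECS; it only shows one way the documented rules (no : in the scheme, no ? in the query, no # in the fragment, query present but empty when the URL ends in ?) could be applied.

```python
from urllib.parse import urlsplit


def url_to_ecs(href: str) -> dict:
    """Hypothetical helper: map a URL string onto the url.* fields."""
    parts = urlsplit(href)
    url = {"href": href}
    if parts.scheme:
        url["scheme"] = parts.scheme  # urlsplit already drops the ":"
    if parts.hostname:
        url["host"] = {"name": parts.hostname}
    if parts.port is not None:
        url["port"] = parts.port  # integer, as in the table
    if parts.path:
        url["path"] = parts.path
    # urlsplit returns "" both when "?" is missing and when the query is
    # empty, so check for the "?" ourselves (ignoring the fragment).
    if "?" in href.split("#", 1)[0]:
        url["query"] = parts.query
    if parts.fragment:
        url["fragment"] = parts.fragment  # urlsplit already drops the "#"
    if parts.username is not None:
        url["username"] = parts.username
    if parts.password is not None:
        url["password"] = parts.password
    return {"url": url}


event = url_to_ecs("https://elastic.co:443/search?q=elasticsearch#top")
```

With this sketch, a URL ending in a bare ? yields an empty url.query, while a URL without ? has no url.query key at all, which mirrors the distinction the table describes.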
The user fields describe information about the user that is relevant to the event. Fields can have one entry or multiple entries. If a user has more than one id, provide an array that includes all of them.
The user_agent fields normally come from a browser request. They often show up in web service logs coming from the parsed user agent string.
These are examples of how ECS fields can be used in different use cases. Most use cases contain not only ECS fields but also additional fields that are not in ECS, in order to describe the full use case. The fields which are not in ECS are in italic.
Contributions of additional use cases on top of ECS are welcome.
- The document MUST have the @timestamp field.
- The data type defined for an ECS field MUST be used.
- It SHOULD have the field event.version to define which version of ECS it uses.
- As many fields as possible should be mapped to ECS.
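The first and third rules above can be checked mechanically. The sketch below is only an illustration: check_event is a hypothetical helper, not an ECS tool, and it assumes the event is a nested Python dict.

```python
def check_event(event: dict) -> list:
    """Hypothetical check for the @timestamp and event.version rules."""
    problems = []
    if "@timestamp" not in event:
        problems.append("MUST: @timestamp is missing")
    event_object = event.get("event")
    if not isinstance(event_object, dict) or "version" not in event_object:
        problems.append("SHOULD: event.version is missing")
    return problems


ok = check_event({"@timestamp": "2018-06-01T12:00:00Z", "event": {"version": "0.1.0"}})
bad = check_event({"message": "no timestamp here"})
```

An empty list means neither rule was violated. The data type rule is not covered here, because checking it would need the full ECS field definitions.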
Writing fields
- All fields must be lower case
- Combine words using underscore
- No special characters except _
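A minimal sketch of how the writing rules could be checked for a custom field name. It assumes that digits are allowed and that dots only separate prefixes; neither assumption is stated above. The function name is hypothetical, and base fields such as @timestamp are deliberately outside its scope.

```python
import re

# One name segment: lower-case letters, digits (an assumption), underscore.
_SEGMENT = re.compile(r"[a-z0-9_]+")


def follows_writing_rules(field: str) -> bool:
    """Return True if every dot-separated segment is lower case and
    uses no special character other than the underscore."""
    return all(_SEGMENT.fullmatch(part) for part in field.split("."))


checks = {
    name: follows_writing_rules(name)
    for name in ("host.ip", "requests_per_sec", "Host.IP", "user-agent")
}
```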
Naming fields
- Present tense. Use present tense unless field describes historical information.
- Singular or plural. Use singular and plural names properly to reflect the field content. For example, use requests_per_sec rather than request_per_sec.
- General to specific. Organise the prefixes from general to specific to allow grouping fields into objects with a prefix like host.*.
- Avoid repetition. Avoid stuttering of words. If part of the field name is already in the prefix, do not repeat it. Example: host.host_ip should be host.ip.
- Use prefixes. Fields must be prefixed except for the base fields. For example, all host fields are prefixed with "host.". See dot notation in FAQ for more details.
- Do not use abbreviations. (A few exceptions like ip exist.)
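The "avoid repetition" rule lends itself to a simple heuristic. The sketch below is an assumption about how one might detect stuttering, not an official ECS check: it flags a field when a word in its last segment repeats one of its prefixes.

```python
def has_stutter(field: str) -> bool:
    """Heuristic: does the last segment repeat a word from a prefix?"""
    parts = field.split(".")
    prefixes = set(parts[:-1])
    words = parts[-1].split("_")
    return any(word in prefixes for word in words)


stuttering = has_stutter("host.host_ip")
clean = has_stutter("host.ip")
```

This only compares whole words, so it will miss partial repetitions and should be treated as a first filter rather than a definitive test.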
Elasticsearch can index text multiple ways:
- text indexing allows for full text search, or searching arbitrary words that are part of the field.
- keyword indexing allows for much faster exact match and prefix search, and allows for aggregations (what Kibana visualizations are built on).
In some cases, only one type of indexing makes sense for a field.
However there are cases where both types of indexing can be useful, and we want to index both ways. As an example, log messages can sometimes be short enough that it makes sense to sort them by frequency (that's an aggregation). They can also be long and varied enough that full text search can be useful on them.
Whenever both types of indexing are helpful, we use multi-fields indexing. The convention used is the following:
- foo: text indexing. The top level of the field (its plain name) is used for full text search.
- foo.raw: keyword indexing. The nested field has suffix .raw and is what you will use for aggregations.
  - Performance tip: when filtering your stream in Kibana (or elsewhere), if you are filtering for an exact match or doing a prefix search, both the text and keyword fields can be used, but doing so on the keyword field (named .raw) will be much faster and less memory intensive.
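The convention above could be expressed in an Elasticsearch mapping roughly as follows. This is a sketch written as Python dicts that serialize to JSON. The field name foo is the placeholder used above, the search strings are made up, and the snippet only builds request bodies without contacting a cluster.

```python
import json

# Mapping sketch: "foo" is analyzed text, "foo.raw" is the keyword multi-field.
mapping = {
    "properties": {
        "foo": {
            "type": "text",
            "fields": {"raw": {"type": "keyword"}},
        }
    }
}

# Full text search goes against the plain field name.
full_text = {"query": {"match": {"foo": "connection refused"}}}

# Exact match filtering goes against the .raw keyword field.
exact_match = {"query": {"term": {"foo.raw": "connection refused"}}}

mapping_json = json.dumps(mapping, indent=2)
```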
Keyword only fields
The fields that only make sense as type keyword are not named foo.raw. Instead, the plain field (foo) will be of type keyword, with no nested field.
Despite the fact that IDs are often integers in various systems, this is not
always the case. Since we want to make it possible to map as many data sources
to ECS as possible, we default to using the keyword type for IDs.
The benefits to a user adopting these fields and names in their clusters are:
- Data correlation. Ability to easily correlate data from the same or different sources, including:
  - data from metrics, logs, and APM
  - data from the same machines/hosts
  - data from the same service
- Ease of recall. Improved ability to remember commonly used field names (because there is a single set, not a set per data source)
- Ease of deduction. Improved ability to deduce field names (because the field naming follows a small number of rules with few exceptions)
- Reuse. Ability to re-use analysis content (searches, visualizations, dashboards, alerts, reports, and ML jobs) across multiple data sources
- Future proofing. Ability to use any future Elastic-provided analysis content in your environment without modifications
The rename processor can help you resolve field conflicts. For example, imagine that you already have a field called "user", but ECS employs user as an object. You can use the rename processor at ingest time to rename your field to the matching ECS field. If your field does not match ECS, you can rename your field to user.value instead.
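As a sketch of the rename described above, the pipeline definition below uses the rename processor's field and target_field options. It has not been run against a cluster here, so treat it as a starting point. The apply_rename function is a hypothetical local stand-in that only handles a top-level source field; it exists to show the intended result, not how Elasticsearch implements the processor.

```python
import copy

# Ingest pipeline body (would be sent to Elasticsearch as JSON).
pipeline = {
    "description": "Move a conflicting plain user field under user.value",
    "processors": [
        {"rename": {"field": "user", "target_field": "user.value"}}
    ],
}


def apply_rename(doc: dict, field: str, target_field: str) -> dict:
    """Local illustration of the rename result (top-level source field only)."""
    out = copy.deepcopy(doc)
    if field not in out:
        return out
    value = out.pop(field)
    node = out
    *parents, leaf = target_field.split(".")
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate objects
    node[leaf] = value
    return out


rename = pipeline["processors"][0]["rename"]
renamed = apply_rename({"user": "nicolas"}, rename["field"], rename["target_field"])
```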
Events may contain fields in addition to ECS fields. These fields can follow the ECS naming and writing rules, but this is not a requirement.
There are two common key formats for ingesting data into Elasticsearch:
- Dot notation: user.firstname: Nicolas, user.lastname: Ruflin
- Underline notation: user_firstname: Nicolas, user_lastname: Ruflin
For ECS we decided to use the dot notation. Here's some background on this decision.
Ingesting user.firstname: Nicolas and user.lastname: Ruflin is identical to ingesting the following JSON:
"user": {
"firstname": "Nicolas",
"lastname": "Ruflin"
}
In Elasticsearch, user is represented as an object datatype. In the case of the underline notation, both are just string datatypes.
NOTE: ECS does not use nested datatypes, which are arrays of objects.
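The equivalence between dotted keys and nested objects can be sketched in a few lines of Python. The expand_dots helper is hypothetical and only illustrates the idea; it is not how Elasticsearch or Beats implement it.

```python
def expand_dots(flat: dict) -> dict:
    """Turn dotted keys into nested objects; other keys stay as they are."""
    nested = {}
    for dotted, value in flat.items():
        node = nested
        *parents, leaf = dotted.split(".")
        for key in parents:
            node = node.setdefault(key, {})  # one object per prefix
        node[leaf] = value
    return nested


dot = expand_dots({"user.firstname": "Nicolas", "user.lastname": "Ruflin"})
underline = expand_dots({"user_firstname": "Nicolas", "user_lastname": "Ruflin"})
```

Here dot becomes a single user object with two keys, while underline stays two independent top-level string fields.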
With dot notation, each prefix in Elasticsearch is an object. Each object can have parameters that control how fields inside the object are treated. In the context of ECS, for example, these parameters would allow you to disable dynamic property creation for certain prefixes.
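For instance, a mapping along the following lines would turn off dynamic property creation for one prefix. This is a sketch as a Python dict that serializes to JSON. The user.firstname and user.lastname fields are the illustrative names from this FAQ, and typing them as keyword is an assumption.

```python
import json

mapping = {
    "properties": {
        "user": {
            "type": "object",
            # Keys that are not listed under "properties" are not added
            # to the mapping for this prefix.
            "dynamic": False,
            "properties": {
                "firstname": {"type": "keyword"},
                "lastname": {"type": "keyword"},
            },
        }
    }
}

body = json.dumps(mapping)  # Python False is serialized as JSON false
```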
Individual objects give you more flexibility on both the ingest and the event sides. In Elasticsearch, for example, you can use the remove processor to drop complete objects instead of selecting each key inside. You don't have to know ahead of time which keys will be in an object.
In Beats, you can simplify the creation of events. For example, you can treat each object as an object (or struct in Golang), which makes constructing and modifying each part of the final event easier.
In Elasticsearch, each key can only have one type. For example, if user is an object, you can't use it as a keyword type in the same index, like {"user": "nicolas ruflin"}. This restriction can be an issue in certain datasets. For the ECS data itself, this is not an issue because all fields are predefined.
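To spot this kind of conflict before indexing, one could scan sample documents for keys that appear both as an object and as a plain value. The sketch below is a hypothetical pre-check, not an Elasticsearch feature, and it only distinguishes objects from non-objects.

```python
def find_object_conflicts(docs: list) -> list:
    """Return dotted paths used both as an object and as a plain value."""
    seen = {}

    def walk(obj: dict, prefix: str) -> None:
        for key, value in obj.items():
            path = prefix + key
            kind = "object" if isinstance(value, dict) else "value"
            seen.setdefault(path, set()).add(kind)
            if isinstance(value, dict):
                walk(value, path + ".")

    for doc in docs:
        walk(doc, "")
    return sorted(path for path, kinds in seen.items() if len(kinds) > 1)


conflicts = find_object_conflicts([
    {"user": {"firstname": "Nicolas", "lastname": "Ruflin"}},
    {"user": "nicolas ruflin"},
])
```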
Mixing the underline notation with the ECS dot notation is not a problem. As long as there are no conflicts, they can coexist in the same document.