Related links:
- RFC 7231, Section 6, which covers HTTP response codes
- List of HTTP Status Codes on Wikipedia
Here we cover only status codes with particular meanings within Manta or that are commonly used within Manta.
Code | HTTP | Meaning in Manta |
---|---|---|
100-continue | - | The client requested extra initial validation, and the server has not yet rejected the request. |
200 | | Most commonly used for successful GETs |
201 | | Most commonly used for creating jobs and multipart uploads (not object PUT operations) |
204 | | Used for successful directory creations, directory removals, object uploads, object deletes, snaplink creation, and a handful of other operations |
400 | | The client sent an invalid HTTP request (e.g., an incorrect MD5 checksum) |
401 | | The client sent an invalid or unsupported signature, or it did not send any signature. |
403 | | The client failed to authenticate, or it authenticated and was not allowed to access the resource. |
408 | | The server did not receive a complete request from the client within a reasonable timeout. |
409 | | The client sent an invalid combination of parameters for an API request. |
412 | | The client issued a conditional request and the conditions were not true. (For example, this could have been a PUT-if-the-object-does-not-already-exist, and the object already existed.) |
413 | | The client attempted a streaming upload and sent more bytes than were allowed based on the |
429 | | The client is being rate-limited by the server because it issued too many requests in too short a period. |
499 | (not in HTTP) | The 499 status is used to indicate that the client appeared to abandon the request. (In this case, it’s not possible to send a response. The 499 code is used for internal logging and statistics.) This was originally used in nginx. |
500 | | Catch-all code for a failure to process this request. |
502 | | Historically, this code was emitted by Manta when requests took more than two minutes to complete. This was an artifact of the load balancer. Modern versions of Manta report this as a 503. |
503 | | This code generally indicates that the system is overloaded and cannot process more work. In practice, this currently means that a particular metadata shard’s queue is full, that Muskie took too long to respond to the request, or that there aren’t enough working storage nodes with enough disk space to satisfy this upload. |
507 | | The Manta deployment is out of physical disk space for new objects. See [_not_enough_free_space_for_mb] for details. |
Generally:
- Responses with status codes of 400 or above (400 through 599) are often called errors. In many cases, though, 400-level errors do not indicate that anything is wrong. For example, a 404 may be normal behavior for a client that checks for the existence of a particular object before doing some other operation.
- For errors (except for 499), the response body should contain a JSON object containing more information: a Manta-specific error code and message.
- Generally, 400-level codes (i.e., codes from 400 to 499) indicate that the request failed due to something within the client’s control.
- Generally, 500-level codes (i.e., codes from 500 to 599) indicate a server-side failure.
See also: [_investigating_an_increase_in_error_rate].
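The general rules above can be sketched as a small classifier. This is only an illustration, not code from Manta: the function names and category labels are hypothetical, and the JSON field names `code` and `message` are assumed from the description of the error body rather than confirmed here.

```python
import json


def classify_status(code):
    """Return a coarse category for a status code, following the rules above."""
    if code == 499:
        # Not a real HTTP status: recorded when the client abandoned the request.
        return "client-abandoned"
    if 100 <= code < 200:
        return "informational"
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client-error"
    if 500 <= code < 600:
        return "server-error"
    return "unknown"


def describe_error(status, body):
    """Summarize an error response.

    Assumes (hypothetically) that the JSON body carries "code" and "message".
    """
    category = classify_status(status)
    try:
        details = json.loads(body)
    except ValueError:
        details = {}
    if not isinstance(details, dict):
        details = {}
    return "%d (%s): %s: %s" % (
        status,
        category,
        details.get("code", "<no code>"),
        details.get("message", "<no message>"),
    )
```

For example, `describe_error(404, '{"code": "ExampleError", "message": "example"}')` yields a one-line summary that is convenient when scanning many failed requests. Remember that a "client-error" category does not necessarily mean anything is wrong.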
Related links:
- List of HTTP Header Fields on Wikipedia
Here we cover only headers with particular meanings within Manta or that are commonly used within Manta.
| Header | Request/Response | Origin | Meaning |
|---|---|---|---|
| | Both | HTTP | |
| | Both | HTTP | MD5 checksum of the body of a request or response. It’s essential that clients and servers validate this on receipt. |
| | Both | HTTP, Manta | Describes the type (i.e., MIME type) of the body of the request or response. Manta understands a special content-type for directories called |
| | Both | HTTP | The time when the request or response was generated. This is often useful when debugging for putting together a timeline. |
| | Both | HTTP | |
| any header starting with | Both | Manta | Arbitrary user-provided headers. |
| | Response | Manta | For GET or HEAD requests on directories, this header indicates how many items are in the directory. |
| | Both | Manta | A unique identifier for this request. This can be used to locate details about a request in Manta logs. Clients may specify this header on requests, in which case Manta will use the requested id. Otherwise, Manta will generate one and provide it with the response. |
| | Response | Manta | A unique identifier for the frontend instance that handled this request. Specifically, this identifies the "webapi" zone that handled the request. |
HTTP allows clients to specify a header called Expect: 100-continue to request that the server validate the request headers before the client sends the rest of the request. For example, suppose a client wants to upload a 10 GiB object to /foo/stor/bar/obj1, but /foo/stor/bar does not exist. With Expect: 100-continue, the server can immediately send a "404 Not Found" response (because the parent directory doesn’t exist). Without this header, HTTP would require that the client send the entire 10 GiB request.
When Expect: 100-continue is specified with the request headers, then the client waits for a 100-continue response before proceeding to send the body of the request.
We mention this behavior because error handling for requests that do not use 100-continue can be surprising. For example, when the client doesn’t specify this header, the server might still choose to send a 400 or 500-level response immediately, but it must still wait for the client to send the whole request. There have been bugs in the past where the server did not read the rest of the request, resulting in a memory leak and a timeout from the client’s perspective (because the client has no reason to read a response before it has even finished sending the request, if it didn’t use 100-continue).
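The client-side decision described above can be sketched as follows. This is a simplified illustration rather than a full HTTP client: the function names and the "send-body"/"skip-body" labels are hypothetical, and the sketch only looks at the first status line the server sends after receiving the request headers.

```python
def parse_status_code(status_line):
    """Extract the numeric status from a line like 'HTTP/1.1 100 Continue'."""
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError("not an HTTP status line: %r" % status_line)
    return int(parts[1])


def next_step_after_headers(status_line):
    """Decide what a client that sent 'Expect: 100-continue' should do next.

    Returns "send-body" when the server sent the interim 100 response, and
    "skip-body" when the server already sent a final response (such as a 404),
    in which case sending the body would be wasted work.
    """
    code = parse_status_code(status_line.strip())
    if code == 100:
        return "send-body"
    return "skip-body"
```

With the 10 GiB upload example, a first line of "HTTP/1.1 404 Not Found" maps to "skip-body", so the client never transmits the object.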
In order to frame HTTP requests and responses, one of two modes must be used:
- A request or response can specify a content-length header that indicates exactly how many bytes of data will be contained in the body; or
- A request or response can specify transfer-encoding: chunked, which indicates that the body will be sent in chunks, each of which is preceded by a size.
Manta treats these two modes a little differently. If an upload request has a content-length, then Manta ensures that the storage nodes chosen to store the data have enough physical space available. Requests with transfer-encoding: chunked are called streaming uploads. For these uploads, the server assumes a maximum content length and uses it to validate that storage nodes have enough physical space. The maximum content length for a streaming upload can be overridden using the max-content-length header.
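As a rough illustration of the two framing modes, the sketch below picks the headers for an upload and shows how a chunked body is framed on the wire. The helper names are hypothetical; the chunk format (hexadecimal size, CRLF, data, CRLF, then a zero-length chunk to finish) is the standard HTTP/1.1 chunked encoding, not something specific to Manta.

```python
def upload_headers(size=None, max_content_length=None):
    """Choose framing headers for an upload.

    With a known size, use a fixed-size request. Otherwise use a streaming
    upload, optionally overriding the server's assumed maximum size.
    """
    if size is not None:
        return {"content-length": str(size)}
    headers = {"transfer-encoding": "chunked"}
    if max_content_length is not None:
        headers["max-content-length"] = str(max_content_length)
    return headers


def encode_chunked(chunks):
    """Frame an iterable of byte strings using HTTP/1.1 chunked encoding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk marks the end of the body, so skip empties.
            continue
        out += b"%x\r\n" % len(chunk)  # chunk size in hexadecimal
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk with no trailers
    return bytes(out)
```

Because each chunk announces its own size and the body ends with an explicit zero-length chunk, a reader can still tell a complete streaming body from a truncated one.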
See also the next section on [_validating_the_contents_of_requests_and_responses].
It’s critical that clients and servers validate the body of responses and requests. Some types of corruption are impossible to report any other way.
Corrupted requests and responses can manifest in a number of ways:
- the sender may stop sending after too few bytes
- the sender may send EOF after sending too few bytes
- the sender may send too many bytes
- the body may have the right number of bytes, but have incorrect bytes
Importantly, because of the two modes of transfer described above (under [_streaming_vs_fixed_size_requests]), the reader of a request or response always knows how many bytes to expect. In the cases above:
- If the sender stops sending bytes after too few bytes (but the socket is still open for writes in both directions), then the reader will fail the operation due to a timeout. For example, if the client does this, then the server will report a 408 error. The client must implement a timeout for this case to cover the case where the server fails in this way.
- If the sender sends EOF after too few bytes, this would be a bad request or response. If a client did this, then the server would report a 400 error. The client must implement a check for this case to cover the case where the server fails in this way. At this point in the HTTP operation, the client may have already read a successful response (i.e., a 200), and it needs to be sophisticated enough to treat it as an error anyway.
- If the sender sends too many bytes, then the request or response would be complete, but the next request or response would likely be invalid.
- When possible, clients and servers should generally send a Content-MD5 header. This allows the remote side to compute an MD5 checksum on the body and verify that the correct bytes were sent. For object downloads, Manta always stores the MD5 computed from the original upload and it always provides the Content-MD5 header on responses. If clients provide a Content-MD5 header on uploads, then Manta always validates the body it receives against it. When both of these mechanisms are used by both client and server, a client can be sure of end-to-end integrity.
Note: It’s been noted that MD5 checksums are deprecated for security purposes due to the risk of collisions. While they are likely not appropriate for security, MD5 collisions remain rare enough for MD5 checksums to be used for basic integrity checks.
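A minimal sketch of the client-side checks described above follows. It assumes the whole body is already in memory, which is unrealistic for large objects but keeps the example short. The function and exception names are hypothetical; the Content-MD5 value is the base64 encoding of the raw MD5 digest, as defined for that header.

```python
import base64
import hashlib


class BodyValidationError(Exception):
    """Raised when a received body does not match what the headers promised."""


def content_md5(body):
    """Compute the value of a Content-MD5 header for the given bytes."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def validate_body(body, expected_length=None, expected_md5=None):
    """Check a fully received body against content-length and Content-MD5.

    This should run even after a successful status (such as 200) was read,
    because truncation and corruption can only be detected at the end.
    """
    if expected_length is not None and len(body) != expected_length:
        raise BodyValidationError(
            "expected %d bytes, received %d" % (expected_length, len(body)))
    if expected_md5 is not None and content_md5(body) != expected_md5:
        raise BodyValidationError("Content-MD5 mismatch")
    return True
```

A streaming implementation would instead feed each chunk to `hashlib.md5().update()` and count bytes as they arrive, then perform the same two comparisons once the body ends.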
Below is a summary of the most relevant fields for an audit log entry. (Note
that Muskie sometimes writes out log entries unrelated to the completion of an
HTTP request. Only log entries with "audit": true
represent completion of an
HTTP request. Other log entries have other fields.)
JSON property | Example value | Meaning |
---|---|---|
|
|
If |
|
26 |
Time in milliseconds between when Muskie started processing this request and when the response headers were sent. This is commonly called time to first byte. See also building a request timeline. This should generally match the |
|
|
Manta-defined token that describes the type of operation. In this case, |
|
See specific properties below. |
Object describing the incoming request |
|
|
HTTP method for this request (specified by the client) |
|
|
URL (path) provided for this request (specified by the client) |
|
{
"accept": "*/*",
"x-request-id": "a080d88b-8e42-4a98-a6ec-12e1b0dbf612",
"date": "Tue, 01 Aug 2017 03:03:13 GMT",
"authorization": "Signature keyId=\"/poseidon/keys/ef:0e:27:45:c5:95:4e:92:ba:ab:03:17:e5:3a:60:14\",algorithm=\"rsa-sha256\",headers=\"date\",signature=\"...\"",
"user-agent": "restify/1.4.1 (ia32-sunos; v8/3.14.5.9; OpenSSL/1.0.1i) node/0.10.32",
"accept-version": "~1.0",
"host": "manta.staging.joyent.us",
"connection": "keep-alive",
"x-forwarded-for": "::ffff:172.27.4.22"
} |
Headers provided with this request (specified by the client). The |
|
{
"login": "poseidon",
"uuid": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
"groups": [ "operators" ],
"user": null
} |
Object describing the account making this request. This is not the same as the owner! Note that this can differ from the owner of the resource ( |
|
|
For authenticated requests, the name of the account that made the request. |
|
|
For authenticated requests, the unique identifier for the account that made the request. |
|
|
For authenticated requests, a list of groups that the caller is part of. Generally, the only interesting group is |
|
|
For authenticated requests from a subuser of the account, the name of the subuser account. |
|
|
Unique identifier for the account that owns the requested resource. This is generally the uuid of the account at the start of the URL (i.e., for a request of |
|
See specific properties below. |
Describes the HTTP response sent by Muskie to the client. |
|
200 |
|
|
{
"last-modified": "Sat, 22 Mar 2014 01:17:01 GMT",
"content-type": "application/x-json-stream; type=directory",
"result-set-size": 1,
"date": "Tue, 01 Aug 2017 03:03:13 GMT",
"server": "Manta",
"x-request-id": "a080d88b-8e42-4a98-a6ec-12e1b0dbf612",
"x-response-time": 26,
"x-server-name": "204ac483-7e7e-4083-9ea2-c9ea22f459fd"
} |
Headers sent in the response from Muskie to the client. Among the most useful is the |
|
|
Identifies the name of the restify route that handled this request. |
JSON property | Example value | Meaning |
---|---|---|
|
|
When present, this indicates the shard that was queried for the metadata for |
|
|
Error associated with this request, if any. See [_details_about_specific_error_messages]. |
|
|
When present, this is the unique identifier for the Manta object identified by |
|
|
When present, this indicates the shard that was queried for the metadata for the parent directory of |
|
|
The (remote) IP address of the client connected to Manta. Note that clients aren’t connected directly to Muskie. When using TLS ("https" URLs), clients connect to |
|
|
The IP address and port of the TCP connection over which this request was received. Generally, Muskie only connects directly to an |
|
{
"earlySetup": 32,
"parseDate": 8,
"parseQueryString": 28,
"handler-3": 127,
"checkIfPresigned": 3,
"enforceSSL": 3,
"ensureDependencies": 5,
"_authSetup": 5,
"preSignedUrl": 3,
"checkAuthzScheme": 4,
"parseAuthTokenHandler": 36,
"signatureHandler": 73,
"parseKeyId": 59,
"loadCaller": 133,
"verifySignature": 483,
"parseHttpAuthToken": 5,
"loadOwner": 268,
"getActiveRoles": 43,
"gatherContext": 27,
"setup": 225,
"getMetadata": 5790,
"storageContext": 8,
"authorize": 157,
"ensureEntryExists": 3,
"assertMetadata": 3,
"getDirectoryCount": 7903,
"getDirectory": 10245
} |
An object describing the time in microseconds for each phase of the request processing pipeline. This is useful for identifying latency. The names in this object are the names of functions inside Muskie responsible for the corresponding phase of request processing. |
|
[ {
"shark": "1.stor.staging.joyent.us",
"result": "ok",
"timeToFirstByte": 2,
"timeTotal": 902,
"_startTime": 1509505866032
}, {
"shark": "2.stor.staging.joyent.us",
"result": "ok",
"timeToFirstByte": 1,
"timeTotal": 870,
"_startTime": 1509505866033
} ] |
This field should be present for Manta requests that make requests to individual storage nodes. The value is an array of storage nodes contacted as part of the request, including the result of this subrequest, when it started, and how long it took. For GET requests, these subrequests are GET requests from individual storage nodes hosting a copy of the object requested. These subrequests happen serially, and we stop as soon as one completes. For PUT requests, the storage node subrequests are PUT requests to individual storage nodes on which a copy of the new object will be stored. If all goes well, you’ll see N sharks contacted (typically 2, but whatever the client’s requested durability level is), all successfully, and the requests will be concurrent with each other. If any of these fail, Manta will try another N sharks, and up to one more set of N. For durability level 2, you may see up to 6 sharks contacted: three sets of two. The sets would be sequential, while each pair in a set run concurrently. |
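
The per-phase timers object and the storage node retry behavior described above lend themselves to quick calculations. The sketch below is illustrative only: the function names are hypothetical, and the caller is expected to pass in the timers object itself (as already extracted from a log entry).

```python
def slowest_phases(timers, count=3):
    """Return the `count` slowest phases as (name, microseconds) pairs."""
    ranked = sorted(timers.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


def max_sharks_contacted(durability_level, max_sets=3):
    """Upper bound on storage nodes contacted for a PUT.

    Manta tries one set of N sharks and, on failure, up to two more sets,
    so the bound is three sets of N by default.
    """
    return durability_level * max_sets


# A few values taken from the example timers object above.
example_timers = {
    "loadCaller": 133,
    "verifySignature": 483,
    "getMetadata": 5790,
    "getDirectoryCount": 7903,
    "getDirectory": 10245,
}
for name, micros in slowest_phases(example_timers, 2):
    print("%-20s %8.3f ms" % (name, micros / 1000.0))
```

Sorting by duration makes it easy to see that, in the example, the directory listing phases dominate the request.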
Bunyan-provided properties
JSON property | Example value | Meaning |
---|---|---|
|
|
ISO 8601 timestamp closest to when the log entry was generated. |
|
|
The hostname of the system that generated the log entry. For us, this is generally a uuid corresponding to the zonename of the Muskie container. |
|
|
The pid of the process that generated the log entry. |
|
|
Bunyan-defined log level. This is a numeric value corresponding to conventional values like |
|
|
For Muskie audit log entries, the message is always |
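
Since only entries with "audit": true describe completed HTTP requests, a first step in most log analysis is to filter on that property. The sketch below assumes the log is newline-delimited JSON (one entry per line, as bunyan writes it); the function name is hypothetical.

```python
import json


def audit_entries(lines):
    """Yield parsed log entries that represent completed HTTP requests."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            # Skip anything that is not valid JSON (e.g., a truncated line).
            continue
        if isinstance(entry, dict) and entry.get("audit") is True:
            yield entry
```

Because the function accepts any iterable of lines, it can be given an open file object directly, for example `audit_entries(open(path))` where `path` is wherever the log file happens to live.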
XXX talk about common stack traces? XXX that should include 503 from 'No storage nodes available for this request'
See also the Manta Tools Overview.
Many of these tools have manual pages or sections in this guide about how to use
them. You can generally view the manual page with man TOOLNAME
in whatever
context you can run the tool.
Tool | Where you run it | Has manual page? | Purpose |
---|---|---|---|
|
headnode GZ or "manta" zones |
Yes |
Run arbitrary commands in various types of Manta zones |
|
headnode GZ or "manta" zones |
Yes |
Open a shell in a particular Manta zone |
|
"webapi" zone |
No |
Fetch metadata for an object (including what shard it’s on) |
|
"moray", "electric-moray" zones |
Yes |
Fetch rows directly from Moray |
|
"moray" zones |
No |
Shows running stats about Moray RPC activity |
"postgres" zones (need to be copied in as needed) |
No |
Report on PostgreSQL activity |
|
|
Anywhere |
Yes |
Format bunyan-format log files. With |
|
Anywhere |
Yes |
|
|
Anywhere |
Yes |
Inspect various properties of a process, including its open files, thread stacks, working directory, signal mask, etc. |
|
Anywhere |
Yes |
Shows information about the networking stack, including open TCP connections and various counters (including error counters). |
|
Anywhere |
Yes |
Shows running stats related to applications' use of the filesystem (e.g., reads and writes) |
|
Anywhere |
Yes |
Shows running stats related to applications' use of CPU and memory |
|
Anywhere |
Yes |
Shows running stats related to system-wide CPU usage |
|
Anywhere |
Yes |
Shows running stats related to zone-wide memory usage |
Anywhere |
No |
Inspect JavaScript-level state in core files from Node.js processes. |
bounce (as in: "bounce a box", "bounce a service") |
Bouncing a box or a service means restarting it. Bouncing a box usually means rebooting a server. Bouncing a service usually means restarting an SMF service (killing any running processes and allowing the system to restart them). |
bound (as in: "CPU-bound", "disk-bound", "I/O-bound") |
A program or a workload is said to be "X-bound" for some resource X when its performance is limited by that resource. For example, the performance of a CPU-bound process is limited by the amount of CPU available to it. "Disk-bound" (or "I/O-bound") usually means that a process or workload is limited by the I/O performance of the storage subsystem, which may be a collection of disks organized into a ZFS pool. |
box |
A box is a physical server (as opposed to a virtual machine or container). |
container/zone/VM |
A container is a lightweight virtualized environment, usually having its own process namespace, networking stack, filesystems, and so on. For most purposes, a container looks like a complete instance of the operating system, but there may be many containers running within one instance of the OS. They generally cannot interact with each other except through narrow channels like the network. The illumos implementation of containers is called zones. SmartOS also runs hardware-based virtual machines inside zones (i.e., a heavyweight hardware-virtualized environment within the lightweight OS-virtualized environment), and while those are technically running in a container, the term container is usually only applied to zones not running a hardware-based virtualization environment. For historical reasons, within Triton and SmartOS, zones are sometimes called VMs, though that term sometimes refers only to the hardware virtualized variety. The three terms are often used interchangeably (and also interchangeably with instance, since most components are deployed within their own container). |
headroom |
Idle capacity for a resource. For example, we say there’s CPU headroom on a box when some CPUs are idle some of the time. This usually means the system is capable of doing more work (at least with respect to this resource). |
instance (general, SAPI) |
Like service, instance can refer to a number of different things, including a member of a SAPI service or SMF service. Most commonly, "instance" is used to refer to a member of a SAPI service. |
latency |
Latency refers to how much time an operation takes. It can apply to any discrete operation: a disk I/O request, a database transaction, a remote procedure call, a system call, establishment of a TCP connection, an HTTP request, and so on. |
out of (as in: "out of CPU") |
We sometimes say a box is out of a resource when that resource is fully utilized (i.e., "out of CPU" when all CPUs are busy). |
pegged, slammed, swamped |
These are all synonyms for being out of some resource. "The CPUs are pegged" means a box has very little CPU headroom (i.e., the CPUs are mostly fully utilized). You can also say "one CPU is pegged" (i.e., that CPU is fully utilized). You might also say "the disks are swamped" (i.e., they’re nearly always busy doing I/O). See also saturated. |
saturated |
A resource is saturated when processes are failing to use the resource because it’s already fully utilized. For example, when CPUs are saturated, threads that are ready to run have to wait in queues. When a network port is saturated, packets are dropped. Similar to pegged, but more precise. |
service (general) |
Service can refer to a SAPI service (see below), an SMF service (see below), or it may be used more generally to describe almost any useful function provided by a software component. As a verb (e.g., "this process is servicing requests"), it usually means "to process [requests]". |
service (SAPI) |
Within SAPI (the Triton facility for managing configuration and deployment of cloud applications like Manta), a service refers to a collection of instances providing similar functionality. It usually describes a type of component (e.g., "storage" or "webapi") that may have many instances. These instances usually share images and configuration, and within SAPI, the service is the place where such configuration is stored. |
service (SMF) |
Within the operating system, an SMF service is a piece of configuration that usually describes long-running programs that should be automatically restarted under various failure conditions. For example, we define an SMF service for "mahi-v2" (our authentication cache) so that the operating system automatically starts the service upon boot and restarts it if the process exits or dumps core. (Within SMF, it’s actually instances of a service that get started, stopped, restarted, and so on. For many services, there’s only one "default" instance, and the terms are often used interchangeably. Usually someone will say "I restarted the mahi-v2 service" rather than "I restarted the sole instance of the mahi-v2 service". However, for some services (notably "muskie", "moray", "electric-moray", and "binder") we do deploy multiple instances, and it may be important to be more precise (e.g., "three of the muskie instances in this zone are in maintenance"). See |
shard |
A shard generally refers to a database that makes up a fraction of a larger logical database. For example, the Manta metadata tier is one logical data store, but it’s divided into a number of equally-sized shards. In sharded systems like this, incoming requests are directed to individual shards in a deterministic way based on some sharding key. (Many systems use a customer id for this purpose. Manta traditionally uses the name of the parent directory of the resource requested.) In Manta, each shard typically uses 2-3 databases for high availability, but these aren’t separate shards because they’re exact copies. Sharding typically refers to a collection of disjoint databases that together make up a much larger dataset. |
tail latency |
When discussing a collection of operations, tail latency refers to the latency of the slowest operations (i.e., the tail of the distribution). This is often quantified using a high-numbered percentile. For example, if the 99th percentile of requests is 300ms, then 99% of requests have latency at most 300ms. As compared with an average or median latency, the 99th percentile better summarizes the latency of the slowest requests. |
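
To make the percentile definition concrete, here is a small sketch using the nearest-rank method, which matches the reading "p% of requests have latency at most this value". The function name is hypothetical, and monitoring systems often use interpolating or approximate methods instead.

```python
def percentile(latencies, p):
    """Return the p-th percentile (integer p, 1 to 100) by nearest rank."""
    if not latencies:
        raise ValueError("no latencies provided")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(latencies)
    # Integer ceiling of p * n / 100: the smallest rank covering p% of values.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# One hundred requests with latencies of 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))
print("median:", percentile(latencies_ms, 50))
print("p99:   ", percentile(latencies_ms, 99))
```

Comparing the median with the 99th percentile on the same data set shows how much slower the tail is than a typical request.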