Skip to content

Latest commit

 

History

History
517 lines (393 loc) · 30.2 KB

0003-quick-reference.adoc

File metadata and controls

517 lines (393 loc) · 30.2 KB

Quick references

Manta HTTP Quick Reference

HTTP Status Codes in Manta

Related links:

Here we cover only status codes with particular meanings within Manta or that are commonly used within Manta.

Code HTTP Meaning in Manta

100-continue

-

The client requested extra initial validation, and the server has not yet rejected the request.

200

OK

Most commonly used for successful GETs

201

Created

Most commonly used for creating jobs and multipart uploads (not object PUT operations)

204

No Content

Used for successful direction creations, directory removals, object uploads, object deletes, snaplink creation, and a handful of other operations

400

Bad Request

The client send an invalid HTTP request (e.g., an incorrect MD5 checksum)

401

Unauthorized

The client sent an invalid or unsupported signature, or it did not send any signature.

403

Forbidden

The client failed to authenticate, or it authenticated and was not allowed to access the resource.

408

Request Timeout

The server did not receive a complete request from the client within a reasonable timeout.

409

Conflict

The client sent an invalid combination of parameters for an API request.

412

Precondition Failed

The client issued a conditional request and the conditions were not true. (For example, this could have been a PUT-if-the-object-does-not-already-exist, and the object already existed.)

413

Request Entity Too Large

The client attempted a streaming upload and sent more bytes than were allowed based on the max-content-length header. See [_request_has_exceeded_bytes] for details.

429

Too Many Requests

The client is being rate-limited by the server because it issued too many requests in too short a period.

499

(not in HTTP)

The 499 status is used to indicate that the client appeared to abandon the request. (In this case, it’s not possible to send a response. The 499 code is used for internal logging and statistics.) This was originally used in nginx.

500

Internal Server Error

Catch-all code for a failure to process this request.

502

Bad Gateway

Historically, this code was emitted by Manta when requests took more than two minutes to complete. This was an artifact of the load balancer. Modern versions of Manta report this as a 503.

503

Service Unavailable

This code generally indicates that the system is overloaded and cannot process more work. In practice, this currently means that a particular metadata shard’s queue is full, that Muskie took too long to respond to the request, or that there aren’t enough working storage nodes with enough disk space to satisfy this upload.

507

Not Enough Space

The Manta deployment is out of physical disk space for new objects. See [_not_enough_free_space_for_mb] for details.

Generally:

  • Responses with status codes above 400 (400 through 599) are often called errors. In many cases, though, 400-level errors do not indicate that anything is wrong. For example, a 404 may be normal behavior for a client that checks for the existence of a particular object before doing some other operation.

  • For errors (except for 499), the response body should contain a JSON object containing more information: a Manta-specific error code and message.

  • Generally, 400-level codes (i.e., codes from 400 to 499) indicate that the request failed due to something within the client’s control.

  • Generally, 500-level codes (i.e., codes from 500 to 599) indicate a server-side failure.

HTTP Headers in Manta

Related links:

Here we cover only headers with particular meanings within Manta or that are commonly used within Manta.

Header Request/Response Origin Meaning

Content-Length

Both

HTTP

See [_streaming_vs_fixed_size_requests].

Content-MD5

Both

HTTP

MD5 checksum of the body of a request or response. It’s essential that clients and servers validate this on receipt.

Content-Type

Both

HTTP, Manta

Describes the type (i.e., MIME type) of the body of the request or response. Manta understands a special content-type for directories called application/json; type=directory, which represents a Manta directory.

Date

Both

HTTP

The time when the request or response was generated. This is often useful when debugging for putting together a timeline.

Transfer-encoding: chunked

Both

HTTP

See [_streaming_vs_fixed_size_requests].

any header starting with m-

Both

Manta

Arbitrary user-provided headers.

Result-Set-Size

Response

Manta

For GET or HEAD requests on directories, this header indicates how many items are in the directory.

x-request-id

Both

Manta

A unique identifier for this request. This can be used to locate details about a request in Matna logs. Clients may specify this header on requests, in which case Manta will use the requested id. Othewrise, Manta will generate one and provide it with the response.

x-server-name

Response

Manta

A unique identifier for the frontend instance that handled this request. Specifically, this identifies the "webapi" zone that handled the request.

Requests using "100-continue"

HTTP allows clients to specify a header called Expect: 100-continue to request that the server validate the request headers before the client sends the rest of it. For example, suppose a client wants to upload a 10 GiB object to /foo/stor/bar/obj1, but /foo/stor/bar does not exist. With Expect: 100-continue, the server can immediately send a "404 Not Found" response (because the parent directory doesn’t exist). Without this header, HTTP would require that the client send the entire 10 GiB request.

When Expect: 100-continue is specified with the request headers, then the client waits for a 100-continue response before proceeding to send the body of the request.

We mention this behavior because error handling for requests that do not use 100-continue can be surprising. For example, when the client doesn’t specify this header, the server might still choose to send a 400 or 500-level response immediately, but it must still wait for the client to send the whole request. There have been bugs in the past where the server did not read the request of the request, resulting in a memory leak and a timeout from the client’s perspective (because the client has no reason to read a response before it has even finished sending the request, if it didn’t use 100-continue).

Streaming vs. fixed-size requests

In order to frame HTTP requests and responses, one of two modes must be used:

  • A request or response can specify a content-length header that indicates exactly how many bytes of data will be contained in the body; or

  • A request or response can specify transfer-encoding: chunked, which indicates that the body will be sent in chunks, each of which is preceded by a size

Manta treats these two modes a little differently. If an upload request has a content-length, then Manta ensures that the storage nodes chosen to store the data have enough physical space available. Requests with transfer-encoding: chunked are called streaming uploads. For these uploads, a maximum content length is assumed by the server that’s used to validate that storage nodes contain enough physical space. The maximum content length for a streaming upload can be overridden using the max-content-length header.

Validating the contents of requests and responses

It’s critical that clients and servers validate the body of responses and requests. Some types of corruption are impossible to report any other way.

Corrupted requests and responses can manifest in a number of ways:

  • the sender may stop sending after too few bytes

  • the sender may send EOF after sending too few bytes

  • the sender may send too many bytes

  • the body may have the right number of bytes, but have incorrect bytes

Importantly, because of the two modes of transfer described above (under [_streaming_vs_fixed_size_requests]), the reader of a request or response always knows how many bytes to expect. In the cases above:

  • If the sender stops sending bytes after too few bytes (but the socket is still open for writes in both directions), then the reader will fail the operation due to a timeout. For example, if the client does this, then the server will report a 408 error. The client must implement a timeout for this case to cover the case where the server fails in this way.

  • If the sender sends EOF after too few bytes, this would be a bad request or response. If a client did this, then the server would report a 400 error. The client must implement a check for this case to cover the case where the server fails in this way. At this point in the HTTP operation, the client may have already read a successful response (i.e., a 200), and it needs to be sophisticated enough to treat it as an error anyway.

  • If the sender sends too many bytes, then the request or response would be complete, but the next request or response would likely be invalid.

  • When possible, clients and servers should generally send a Content-MD5 header. This allows the remote side to compute an MD5 checksum on the body and verify that the correct bytes were sent. For object downloads, Manta always stores the MD5 computed from the original upload and it always provides the Content-MD5 header on responses. If clients provide a Content-MD5 header on uploads, then Manta always validates that it receives it. When both of these mechanisms are used by both client and server, a client can be sure of end-to-end integrity.

Note: It’s been noted that MD5 checksums are deprecated for security purposes due to the risk of collisions. While they are likely not appropriate for security, MD5 collisions remain rare enough for MD5 checksums to be used for basic integrity checks.

Muskie log entry properties

Below is a summary of the most relevant fields for an audit log entry. (Note that Muskie sometimes writes out log entries unrelated to the completion of an HTTP request. Only log entries with "audit": true represent completion of an HTTP request. Other log entries have other fields.)

General Muskie-provided properties

JSON property Example value Meaning

audit

true

If true, this entry describes completion of an HTTP request. Otherwise, this is some other type of log entry, and many of the fields below may not apply.

latency

26

Time in milliseconds between when Muskie started processing this request and when the response headers were sent. This is commonly called time to first byte. See also building a request timeline. This should generally match the x-response-time response header.

operation

getstorage

Manta-defined token that describes the type of operation. In this case, getstorage refers to an HTTP GET from a user’s stor directory.

req

See specific properties below.

Object describing the incoming request

req.method

GET

HTTP method for this request (specified by the client)

req.url

"/poseidon/stor/manta_gc/mako/1.stor.staging.joyent.us?limit=1024"

URL (path) provided for this request (specified by the client)

req.headers

{
    "accept": "*/*",
    "x-request-id": "a080d88b-8e42-4a98-a6ec-12e1b0dbf612",
    "date": "Tue, 01 Aug 2017 03:03:13 GMT",
    "authorization": "Signature keyId=\"/poseidon/keys/ef:0e:27:45:c5:95:4e:92:ba:ab:03:17:e5:3a:60:14\",algorithm=\"rsa-sha256\",headers=\"date\",signature=\"...\"",
    "user-agent": "restify/1.4.1 (ia32-sunos; v8/3.14.5.9; OpenSSL/1.0.1i) node/0.10.32",
    "accept-version": "~1.0",
    "host": "manta.staging.joyent.us",
    "connection": "keep-alive",
    "x-forwarded-for": "::ffff:172.27.4.22"
}

Headers provided with this request (specified by the client). The Date header is particularly useful to note, as this usually reflects the timestamp (on the client) when the client generated the request. This is useful when constructing a request timeline. In particular, problems with the network (timeouts and retransmissions) or queueing any time before Muskie starts processing the request can be identified using this header, provided that the client clock is not too far off from the server clock.

req.caller

{
    "login": "poseidon",
    "uuid": "4d649f41-cf87-ca1d-c2c0-bb6a9004311d",
    "groups": [ "operators" ],
    "user": null
}

Object describing the account making this request. This is not the same as the owner! Note that this can differ from the owner of the resource (req.owner). That commonly happens when the caller uses operator privileges to access objects in someone else’s account or when any user makes an authenticated request to access public data in some other user’s account.

req.caller.login

"poseidon"

For authenticated requests, the name of the account that made the request.

req.caller.uuid

"4d649f41-cf87-ca1d-c2c0-bb6a9004311d"

For authenticated requests, the unique identifier for the account that made the request.

req.caller.groups

[ "operators" ]

For authenticated requests, a list of groups that the caller is part of. Generally, the only interesting group is "operators", which grants the caller privileges to read from and write to any account.

req.caller.user

null

For authenticated requests from a subuser of the account, the name of the subuser account.

req.owner

"4d649f41-cf87-ca1d-c2c0-bb6a9004311d"

Unique identifier for the account that owns the requested resource. This is generally the uuid of the account at the start of the URL (i.e., for a request of "/poseidon/stor", this would be the uuid of the account poseidon).

res

See specific properties below.

Describes the HTTP response sent by Muskie to the client.

res.statusCode

200

HTTP-level status code.

res.headers

{
    "last-modified": "Sat, 22 Mar 2014 01:17:01 GMT",
    "content-type": "application/x-json-stream; type=directory",
    "result-set-size": 1,
    "date": "Tue, 01 Aug 2017 03:03:13 GMT",
    "server": "Manta",
    "x-request-id": "a080d88b-8e42-4a98-a6ec-12e1b0dbf612",
    "x-response-time": 26,
    "x-server-name": "204ac483-7e7e-4083-9ea2-c9ea22f459fd"
}

Headers sent in the response from Muskie to the client. Among the most useful is the x-request-id header, which should uniquely identify this request. You can use this to correlate observations from the client or other parts of the system.

route

"getstorage"

Identifies the name of the restify route that handled this request.

Muskie-provided properties for debugging only

JSON property Example value Meaning

entryShard

"tcp://3.moray.staging.joyent.us:2020"

When present, this indicates the shard that was queried for the metadata for req.url. Unfortunately, this field is not currently present when Muskie fails to fetch metadata, either because of a Moray failure or just because the metadata is missing (i.e., the path doesn’t exist).

err

false

Error associated with this request, if any. See [_details_about_specific_error_messages].

objectId

"bf54fb8a-6cb5-4683-8655-f9ad90b984d4"

When present, this is the unique identifier for the Manta object identified by req.url when the request was made. This is helpful when trying to verify that a request fetched the exact object that you expect (and not another object that had the same name at the time).

parentShard

"tcp://2.moray.staging.joyent.us:2020"

When present, this indicates the shard that was queried for the metadata for the parent directory of req.url. This is only present when the parent metadata was fetched (which is common for PUT requests, but not GET or DELETE requests). Unfortunately, this field is not currently present when Muskie fails to fetch metadata, either because of a Moray failure or just because the metadata is missing (i.e., the path doesn’t exist).

logicalRemoteAddress

"172.27.4.22"

The (remote) IP address of the client connected to Manta. Note that clients aren’t connected directly to Muskie. When using TLS ("https" URLs), clients connect to stud in the loadbalancer component. Stud connects to haproxy in the same container. haproxy in the load balancer container connects to another haproxy instance in the Muskie container. That haproxy instance connects to a Muskie process. The client’s IP is passed through this chain and recorded in logicalRemoteAddress.

remoteAddress, remotePort

"127.0.0.1", 64628

The IP address and port of the TCP connection over which this request was received. Generally, Muskie only connects directly to an haproxy inside the same zone, so the remote address will usually be 127.0.0.1. Neither of these fields is generally interesting except when debugging interactions with the local haproxy.

req.timers

{
    "earlySetup": 32,
    "parseDate": 8,
    "parseQueryString": 28,
    "handler-3": 127,
    "checkIfPresigned": 3,
    "enforceSSL": 3,
    "ensureDependencies": 5,
    "_authSetup": 5,
    "preSignedUrl": 3,
    "checkAuthzScheme": 4,
    "parseAuthTokenHandler": 36,
    "signatureHandler": 73,
    "parseKeyId": 59,
    "loadCaller": 133,
    "verifySignature": 483,
    "parseHttpAuthToken": 5,
    "loadOwner": 268,
    "getActiveRoles": 43,
    "gatherContext": 27,
    "setup": 225,
    "getMetadata": 5790,
    "storageContext": 8,
    "authorize": 157,
    "ensureEntryExists": 3,
    "assertMetadata": 3,
    "getDirectoryCount": 7903,
    "getDirectory": 10245
}

An object describing the time in microseconds for each phase of the request processing pipeline. This is useful for identifying latency. The names in this object are the names of functions inside Muskie responsible for the corresponding phase of request processing.

sharksContacted

[ {
  "shark": "1.stor.staging.joyent.us",
  "result": "ok",
  "timeToFirstByte": 2,
  "timeTotal": 902,
  "_startTime": 1509505866032
}, {
  "shark": "2.stor.staging.joyent.us",
  "result": "ok",
  "timeToFirstByte": 1,
  "timeTotal": 870,
  "_startTime": 1509505866033
} ]

This field should be present for Manta requests that make requests to individual storage nodes. The value is an array of storage nodes contacted as part of the request, including the result of this subrequest, when it started, and how long it took.

For GET requests, these subrequests are GET requests from individual storage nodes hosting a copy of the object requested. These subrequests happen serially, and we stop as soon as one completes.

For PUT requests, the storage node subrequests are PUT requests to individual storage nodes on which a copy of the new object will be stored. If all goes well, you’ll see N sharks contacted (typically 2, but whatever the client’s requested durability level is), all successfully, and the requests will be concurrent with each other. If any of these fail, Manta will try another N sharks, and up to one more set of N. For durability level 2, you may see up to 6 sharks contacted: three sets of two. The sets would be sequential, while each pair in a set run concurrently.

Bunyan-provided properties

JSON property Example value Meaning

time

"2017-08-01T03:03:13.985Z"

ISO 8601 timestamp closest to when the log entry was generated.

hostname

"204ac483-7e7e-4083-9ea2-c9ea22f459fd"

The hostname of the system that generated the log entry. For us, this is generally a uuid corresponding to the zonename of the Muskie container.

pid

79465

The pid of the process that generated the log entry.

level

30

Bunyan-defined log level. This is a numeric value corresponding to conventional values like 'debug', 'info', 'warn', etc. You can filter based on level using the bunyan command.

msg

"handled: 200"

For Muskie audit log entries, the message is always "handled: " followed by the HTTP level status code.

XXX talk about common stack traces? XXX that should include 503 from 'No storage nodes available for this request'

Debugging tools quick reference

See also the Manta Tools Overview.

Many of these tools have manual pages or sections in this guide about how to use them. You can generally view the manual page with man TOOLNAME in whatever context you can run the tool.

Tool Where you run it Has manual page? Purpose

manta-oneach(1)

headnode GZ or "manta" zones

Yes

Run arbitrary commands in various types of Manta zones

manta-login(1)

headnode GZ or "manta" zones

Yes

Open a shell in a particular Manta zone

mlocate

"webapi" zone

No

Fetch metadata for an object (including what shard it’s on)

moray(1) tools

"moray", "electric-moray" zones

Yes

Fetch rows directly from Moray

moraystat.d

"moray" zones

No

Shows running stats about Moray RPC activity

pgsqlstat tools

"postgres" zones (need to be copied in as needed)

No

Report on PostgreSQL activity

bunyan

Anywhere

Yes

Format bunyan-format log files. With -p PID, shows live verbose log entries from a process.

curl

Anywhere

Yes

curl is (among other things) a general-purpose HTTP client. It can be used to make test requests to Manta itself as well as various components within Manta, including authcache and storage.

proc(1) tools (also called the ptools, which includes pfiles, pstack, and others)

Anywhere

Yes

Inspect various properties of a process, including its open files, thread stacks, working directory, signal mask, etc.

netstat(1M)

Anywhere

Yes

Shows information about the networking stack, including open TCP connections and various counters (including error counters).

vfsstat(1M)

Anywhere

Yes

Shows running stats related to applications' use of the filesystem (e.g., reads and writes)

prstat(1M)

Anywhere

Yes

Shows running stats related to applications' use of CPU and memory

mpstat(1M)

Anywhere

Yes

Shows running stats related to system-wide CPU usage

zonememstat(1M)

Anywhere

Yes

Shows running stats related to zone-wide memory usage

mdb_v8

Anywhere

No

Inspect JavaScript-level state in core files from Node.js processes.

Glossary of jargon

bounce (as in: "bounce a box", "bounce a service")

Bouncing a box or a service means restarting it. Bouncing a box usually means rebooting a server. Bouncing a service usually means restarting an SMF service (killing any running processes and allowing the system to restart them).

bound (as in: "CPU-bound", "disk-bound", "I/O-bound")

A program or a workload is said to be "X-bound" for some resource X when its performance is limited by that resource. For example, the performance of a CPU-bound process is limited by the amount of CPU available to it. "Disk-bound" (or "I/O-bound") usually means that a process or workload is limited by the I/O performance of the storage subsystem, which may be a collection of disks organized into a ZFS pool.

box

A box is a physical server (as opposed to a virtual machine or container).

container/zone/VM

A container is a lightweight virtualized environment, usually having its own process namespace, networking stack, filesystems, and so on. For most purposes, a container looks like a complete instance of the operating system, but there may be many containers running within one instance of the OS. They generally cannot interact with each other except through narrow channels like the network. The illumos implementation of containers are called zones. SmartOS also runs hardware-based virtual machines inside zones (i.e., a heavyweight hardware-virtualized environment within the lightweight OS-virtualized environment), and while those are technically running in a container, the term container is usually only applied to zones not running a hardware-based virtualization environment. For historical reasons, within Triton and SmartOS, zones are sometimes called VMs, though that term sometimes refers only to the hardware virtualized variety. The three terms are often used interchangeably (and also interchangeably with instance, since most components are deployed within their own container).

headroom

Idle capacity for a resource. For example, we say there’s CPU headroom on a box when some CPUs are idle some of the time. This usually means the system is capable of doing more work (at least with respect to this resource).

instance (general, SAPI)

Like service, instance can refer to a number of different things, including a member of a SAPI service or SMF service. Most commonly, "instance" to refer to a SAPI service.

latency

Latency refers to how much time an operation takes. It can apply to any discrete operation: a disk I/O request, a database transaction, a remote procedure call, a system call, establishment of a TCP connection, an HTTP request, and so on.

out of (as in: "out of CPU")

We sometimes say a box is out of a resource when that resource is fully utilized (i.e., "out of CPU" when all CPUs are busy).

pegged, slammed, swamped

These are all synonyms for being out of some resource. "The CPUs are pegged" means a box has very little CPU headroom (i.e., the CPUs are mostly fully utilized). You can also say "one CPU is pegged" (i.e., that CPU is fully utilized). You might also say "the disks are swamped" (i.e., they’re nearly always busy doing I/O). See also saturated.

saturated

A resource is saturated when processes are failing to use the resource because it’s already fully utilized. For example, when CPUs are saturated, threads that are ready to run have to wait in queues. When a network port is saturated, packets are dropped. Similar to pegged, but more precise.

service (general)

Service can refer to a SAPI service (see below), an SMF service (see below), or it may be used more generally to describe almost any useful function provided by a software component. As a verb (e.g., "this process is servicing requests"), it usually means "to process [requests]".

service (SAPI)

Within SAPI (the Triton facility for managing configuration and deployment of cloud applications like Manta), a service refers to a collection of instances providing similar functionality. It usually describes a type of component (e.g., "storage" or "webapi") that may have many instances. These instances usually share images and configuration, and within SAPI, the service is the place where such configuration is stored.

service (SMF)

Within the operating system, an SMF service is a piece of configuration that usually describes long-running programs that should be automatically restarted under various failure conditions. For example, we define an SMF service for "mahi-v2" (our authenticationc ache) so that the operating system automatically starts the service upon boot and restarts it if the process exits or dumps core. (Within SMF, it’s actually instances of a service that get started, stopped, restarted, and so on. For many services, there’s only one "default" instance, and the terms are often used interchangeably. Usually someone will say "I restarted the mahi-v2 service" rather than "I restarted the sole instance of the mahi-v2 service". However, for some services (notably "muskie", "moray", "electric-moray", and "binder") we do deploy multiple instances, and it may be important to be more precise (e.g., "three of the muskie instances in this zone are in maintenance"). See smf(5).

shard

A shard generally refers to a database that makes up a fraction of a larger logical database. For example, the Manta metadata tier is one logical data store, but it’s divided into a number of equally-sized shards. In sharded systems like this, incoming requests are directed to individual shards in a deterministic way based on some sharding key. (Many systems use a customer id for this purpose. Manta traditionally uses the name of the parent directory of the resource requested. In Manta, each shard typically uses 2-3 databases for high availability, but these aren’t separate shards because they’re exact copies. Sharding typically refers to a collection of disjoint databases that together make up a much larger dataset.

tail latency

When discussing a collection of operations, tail latency refers to the latency of the slowest operations (i.e., the tail of the distribution). This is often quantified using a high-numbered percentile. For example, if the 99th percentile of requests is 300ms, then 99% of requests have latency at most 300ms. As compared with an average or median latency, the 99th percentile better summarizes the latency of the slowest requests.