Polish troubleshooting docs (#546)
* Polish troubleshooting docs

Signed-off-by: Gao Hongtao <hanahmily@gmail.com>
hanahmily authored Sep 26, 2024
1 parent 75b0638 commit 40ea208
Showing 14 changed files with 534 additions and 139 deletions.
2 changes: 1 addition & 1 deletion bydbctl/internal/cmd/rest.go
@@ -141,7 +141,7 @@ func parseFromYAML(tryParseGroup bool, reader io.Reader) (requests []reqBody, er
data["groups"] = []string{group}
}
} else {
- return nil, errors.WithMessage(errMalformedInput, "absent node: metadata or name&group")
+ return nil, errors.WithMessage(errMalformedInput, "absent node: name or groups")
}
j, err = json.Marshal(data)
if err != nil {
6 changes: 3 additions & 3 deletions docs/README.md
@@ -9,11 +9,11 @@ There’s room to improve the performance and resource usage based on the nature

Here you can learn all you need to know about BanyanDB. Let's get started with it.

- **Guides**. Learn how to install, configure, and use BanyanDB by real-world examples.
- **Installation**. Instructions on how to download and set up the BanyanDB server, Banyand.
- **Clients**. Some native clients to access Banyand.
- **Observability**. Learn how to effectively monitor, diagnose and optimize Banyand.
- **Interacting**. Learn how to interact with Banyand, including schema management, data ingestion, data retrieval, and so on.
- **Operation**. Learn how to operate Banyand, including observability, troubleshooting, and so on.
- **Concept**. Learn the concepts of Banyand. Includes the architecture, data model, and so on.
- **CRUD Operations**. To create, read, update, and delete data points or entities on resources in the schema.

### Useful Links

144 changes: 144 additions & 0 deletions docs/concept/rotation.md
@@ -0,0 +1,144 @@
# Data Rotation

Data rotation is the process of managing the size of data stored in BanyanDB by removing old data and keeping only the most recent data. Data rotation is essential to prevent the database from running out of disk space and to maintain query performance.

## Overview

BanyanDB partitions its data into multiple [**segments**](tsdb.md#segment). These segments are time-based, allowing efficient management of data retention and querying. The `segment_interval` and retention policy (`ttl`) for each [group](../interacting/data-lifecycle.md#measures-and-streams) determine how data is segmented and retained in the database.

## Formulation

To express the relationship between the **number of segments**, the **segment interval**, and the **time-to-live (TTL)** in BanyanDB, we can derive a simple formula.

### General Formula for Number of Segments

The relationship between the number of segments, segment interval, and TTL can be expressed as:

```
S = (T / I) rounded up + 1
```

Where:

- `S` is the **number of segments**.
- `I` is the **segment interval** (in the same unit as the TTL).
- `T` is the **TTL** (time-to-live, in the same unit as the segment interval).

### Explanation

1. **T / I**: This is the number of segment intervals needed to span the TTL. For example, if the TTL is 7 days and the segment interval is 3 days, the TTL spans 7 / 3 ≈ 2.33 segment intervals.

2. **Rounded up**: We round up the result of `T / I` because partial segments still require a full segment to store the data.

3. **+ 1 segment**: We add 1 additional segment to account for the next segment being created to store incoming data as the current period closes.

### General Insights

- **Smaller segment intervals** (e.g., 1 day) lead to a larger number of segments because more segments are needed to cover the TTL.
- **Larger segment intervals** (e.g., 3 days) reduce the number of segments, but you still need 1 additional segment to handle data as it transitions between periods.

Thus, for a fixed TTL, the chosen segment interval determines how many segments exist at the same time.

### Example 1: Segment Interval = 3 Days, TTL = 7 Days

```
S = (7 / 3) rounded up + 1
S = 2.33 rounded up + 1
S = 3 + 1
S = 4
```

| Time (Day) | Action | Number of Segments |
|---------------|---------------------------------------|--------------------|
| Day 1 (00:00) | Segment for Days 1–3 is created | 1 |
| Day 3 (23:00) | New segment for Days 4–6 is created | 2 |
| Day 6 (23:00) | New segment for Days 7–9 is created | 3 |
| Day 9 (23:00) | New segment for Days 10–12 is created | 4 |
| Day 10 (00:00) | Oldest segment for Days 1–3 is removed | 3 |
| Day 12 (23:00) | New segment for Days 13–15 is created | 4 |
| Day 13 (00:00) | Oldest segment for Days 4–6 is removed | 3 |

So, **4 segments** are required to retain data for 7 days with a 3-day segment interval.

### Example 2: Segment Interval = 1 Day, TTL = 7 Days

```
S = (7 / 1) rounded up + 1
S = 7 + 1
S = 8
```

| Time (Day) | Action | Number of Segments |
|---------------|-----------------------------------|--------------------|
| Day 1 (23:00) | New segment for Day 2 is created | 1 |
| Day 2 (00:00) | Oldest segment (if any) removed | 1 |
| Day 2 (23:00) | New segment for Day 3 is created | 2 |
| Day 3 (00:00) | Oldest segment (if any) removed | 2 |
| ... | ... | ... |
| Day 7 (23:00) | New segment for Day 8 is created | 7 |
| Day 8 (00:00) | Oldest segment for Day 1 removed | 7 |
| Day 8 (23:00) | New segment for Day 9 is created | 8 |
| Day 9 (00:00) | Oldest segment for Day 2 removed | 7 |

At any given time, there will be a maximum of **8 segments**: 1 for the new day and 7 for the last 7 days of data.

### Example 3: Segment Interval = 2 Days, TTL = 7 Days

```
S = (7 / 2) rounded up + 1
S = 3.5 rounded up + 1
S = 4 + 1
S = 5
```

So, **5 segments** are required to retain data for 7 days with a 2-day segment interval.

### Generalization for Any Time Unit

To use this formula with time units such as hours and days, make sure **both the segment interval (I)** and **TTL (T)** use the same unit of time. If they don’t, convert one of them so that they match.

#### Steps

1. **Convert both the segment interval and TTL to the same time unit**, if necessary.
   - For example, if the TTL is in days but the segment interval is in hours, convert the TTL to hours (e.g., 3 days = 72 hours).

2. **Apply the formula** to get the number of segments.

### Example 4: Mixed Units (Segment Interval in Hours, TTL in Days)

- **Segment Interval** = 12 hours
- **TTL** = 3 days

First, convert the TTL to hours:

```
3 days = 3 * 24 = 72 hours
```

Now, apply the formula:

```
S = (72 / 12) rounded up + 1
S = 6 + 1
S = 7
```

So, **7 segments** are required to retain data for 3 days with a 12-hour segment interval.


### Example 5: Minimum Number of Segments

```
S = (7 / 8) rounded up + 1
S = 0.875 rounded up + 1
S = 1 + 1
S = 2
```

So, **2 segments** are required to retain data for 7 days with an 8-day segment interval. Two is the minimum number of segments for any TTL and segment interval, and it is reached whenever the TTL does not exceed the segment interval.

## Conclusion

Data rotation is a critical aspect of managing data in BanyanDB. By understanding the relationship between the number of segments, segment interval, and TTL, you can effectively manage data retention and query performance in the database. The formula provided here offers a simple way to calculate the number of segments required based on the chosen segment interval and TTL.

For more information on data management and lifecycle in BanyanDB, refer to the [Data Lifecycle](../interacting/data-lifecycle.md) documentation.
2 changes: 2 additions & 0 deletions docs/interacting/data-lifecycle.md
@@ -48,6 +48,8 @@ More ttl units can be found in the [IntervalRule.Unit](../api-reference.md#inter

You can also manage the Group by other clients such as [Web-UI](./web-ui/schema/group.md) or [Java-Client](java-client.md).

For more details about how they work, please refer to [data rotation](../concept/rotation.md).

## [Property](../concept/data-model.md#properties)

`Property` data provides both [CRUD](./bydbctl/property.md) operations and TTL mechanism.
14 changes: 13 additions & 1 deletion docs/menu.yml
@@ -108,7 +108,19 @@ catalog:
- name: "Cluster Management"
path: "/operation/cluster"
- name: "Troubleshooting"
path: "/operation/troubleshooting"
catalog:
- name: "Error Checklist"
path: "/operation/troubleshooting/error-checklist"
- name: "Troubleshooting Installation"
path: "/operation/troubleshooting/install"
- name: "Troubleshooting Crash"
path: "/operation/troubleshooting/crash"
- name: "Troubleshooting No Data"
path: "/operation/troubleshooting/no-data"
- name: "Troubleshooting Overhead"
path: "/operation/troubleshooting/overhead"
- name: "Troubleshooting Query"
path: "/operation/troubleshooting/query"
- name: "Security"
path: "/operation/security"
- name: "File Format"
18 changes: 18 additions & 0 deletions docs/operation/configuration.md
@@ -21,6 +21,24 @@ There are three bootstrap commands: `data`, `liaison`, and `standalone`. You cou

Below are the available flags for configuring BanyanDB:

### Service Discovery

BanyanDB Liaison reads the endpoints of the data servers from the etcd server. The following flags configure the node host:

`node-host-provider`: the node host provider. Valid values are "hostname", "ip", and "flag"; the default is "hostname".

If the `node-host-provider` is "flag", you can use `node-host` to configure the node host:

```sh
./banyand liaison --node-host=foo.bar.com --node-host-provider=flag
```

If the `node-host-provider` is "hostname", BanyanDB will use the hostname of the server as the node host. The hostname is obtained from the Go standard library function `os.Hostname()`.

If the `node-host-provider` is "ip", BanyanDB will use the IP address of the server as the node host. The IP address is obtained from the Go standard library function `net.Interfaces()`. BanyanDB will use the first non-loopback IPv4 address as the node host.

The official Helm chart sets `node-host-provider` to "ip" by default.
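
The three provider modes can be illustrated with a short Go sketch. This is not BanyanDB's source code; `resolveNodeHost` and `firstNonLoopbackIPv4` are hypothetical names, and rejecting an empty `node-host` in "flag" mode is an assumption made for the example. Only the standard library calls `os.Hostname()` and `net.Interfaces()` come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// firstNonLoopbackIPv4 walks net.Interfaces() and returns the first IPv4
// address that is not a loopback address.
func firstNonLoopbackIPv4() (string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return "", err
	}
	for _, iface := range ifaces {
		addrs, err := iface.Addrs()
		if err != nil {
			continue // skip interfaces whose addresses cannot be read
		}
		for _, addr := range addrs {
			ipNet, ok := addr.(*net.IPNet)
			if !ok {
				continue
			}
			if ip4 := ipNet.IP.To4(); ip4 != nil && !ip4.IsLoopback() {
				return ip4.String(), nil
			}
		}
	}
	return "", errors.New("no non-loopback IPv4 address found")
}

// resolveNodeHost is a hypothetical helper that mirrors the three
// node-host-provider values: "hostname", "ip", and "flag".
func resolveNodeHost(provider, nodeHost string) (string, error) {
	switch provider {
	case "hostname":
		return os.Hostname()
	case "ip":
		return firstNonLoopbackIPv4()
	case "flag":
		// Assumption for this sketch: an empty node-host is treated as an error.
		if nodeHost == "" {
			return "", errors.New("node-host must be set when node-host-provider is \"flag\"")
		}
		return nodeHost, nil
	default:
		return "", fmt.Errorf("unknown node-host-provider %q", provider)
	}
}

func main() {
	host, err := resolveNodeHost("flag", "foo.bar.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("node host:", host)
}
```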

### Liaison & Network

BanyanDB uses gRPC for communication between the servers. The following flags are used to configure the network settings.
2 changes: 2 additions & 0 deletions docs/operation/observability.md
@@ -294,6 +294,8 @@ The read flow is the same as reading data from `measure`, with each metric being

Banyand, the server of BanyanDB, supports profiling automatically. The profiling data is collected by the `pprof` package and can be accessed through the `/debug/pprof` endpoint. The port of the profiling server is `2122` by default.

Refer to the [pprof documentation](https://golang.org/pkg/net/http/pprof/) for more information on how to use the profiling data.
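
As a rough illustration, the Go sketch below downloads a heap profile from the profiling endpoint and saves it to a file. The helper names `profileURL` and `saveProfile`, the `localhost` host, and the output file name are assumptions for the example; the `heap` profile name is one of the standard profiles served by Go's `net/http/pprof` package.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// profileURL builds the URL of a named profile on the default profiling port 2122.
func profileURL(host, profile string) string {
	return fmt.Sprintf("http://%s:2122/debug/pprof/%s", host, profile)
}

// saveProfile downloads the profile at url and writes it to path.
func saveProfile(url, path string) error {
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	out, err := os.Create(path)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	url := profileURL("localhost", "heap")
	fmt.Println("fetching", url)
	if err := saveProfile(url, "heap.pprof"); err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
	}
}
```

The saved file can then be inspected with `go tool pprof heap.pprof`.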

## Query Tracing

BanyanDB supports query tracing, which allows you to trace the execution of a query. The tracing data includes the query plan, execution time, and other useful information. You can enable query tracing by setting the `QueryRequest.trace` field to `true` when sending a query request.
134 changes: 0 additions & 134 deletions docs/operation/troubleshooting.md

This file was deleted.
