diff --git a/docs/developer-guides/caching-task-outputs.md b/docs/developer-guides/caching-task-outputs.md index 580c5265..3718db4d 100644 --- a/docs/developer-guides/caching-task-outputs.md +++ b/docs/developer-guides/caching-task-outputs.md @@ -1,5 +1,6 @@ --- slug: "../faqs/task-cache-output" +description: "Learn how to cache task outputs for quick access." --- # Caching Task Outputs diff --git a/docs/developer-guides/creating-and-managing-gen-ai-prompt-templates.md b/docs/developer-guides/creating-and-managing-gen-ai-prompt-templates.md index ba68e586..5e683659 100644 --- a/docs/developer-guides/creating-and-managing-gen-ai-prompt-templates.md +++ b/docs/developer-guides/creating-and-managing-gen-ai-prompt-templates.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/creating-and-managing-gen-ai-prompt-templates" +description: "Learn how to create prompt templates and use them in Orkes' system LLM tasks." +--- + # Using GenAI Prompt Templates In this guide, we’ll provide a quick overview of Generative AI Prompt Templates and how Orkes makes it easy to leverage the power of Large Language Models (LLMs) natively in their applications. diff --git a/docs/developer-guides/debugging-workflows.md b/docs/developer-guides/debugging-workflows.md index 02c155b6..262187a9 100644 --- a/docs/developer-guides/debugging-workflows.md +++ b/docs/developer-guides/debugging-workflows.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/debugging-workflows" +description: "Learn how to use Orkes Platform to debug workflows and recover from failed executions." 
+--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Install from '@site/src/components/install.mdx'; diff --git a/docs/developer-guides/enabling-cdc-on-conductor-workflows.md b/docs/developer-guides/enabling-cdc-on-conductor-workflows.md index e10a4fa7..60f2cab5 100644 --- a/docs/developer-guides/enabling-cdc-on-conductor-workflows.md +++ b/docs/developer-guides/enabling-cdc-on-conductor-workflows.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/enabling-cdc-on-conductor-workflows" +description: "Learn how to use the CDC (Change Data Capture) pattern to send workflow and task status updates to eventing systems." +--- + # Enabling CDC (Change Data Capture) Change Data Capture (CDC) is a design pattern for tracking changes in the source data and replicating the changes to the target systems. diff --git a/docs/developer-guides/error-handling.md b/docs/developer-guides/error-handling.md new file mode 100644 index 00000000..d308cdc8 --- /dev/null +++ b/docs/developer-guides/error-handling.md @@ -0,0 +1,224 @@ +--- +slug: "/error-handling" +description: "Conductor is built to handle failure and guarantee execution in the long run. Configure various resilience parameters for your workflows and tasks." +--- + +# Handling Failures + +Orkes Conductor automatically handles transient workflow and task failures without the need to write custom code. Various failure-handling configurations can be set ahead of time, which will take effect during execution. + +For tasks, you can configure the following resilience parameters in its task definition: + +* Retries +* Timeouts +* Rate limits + +For workflows, you can configure the following resilience parameters in its workflow definition: + +* Compensation flows (known as failure workflow in Conductor) + +:::note +To deal with workflow failures post-execution, refer to [Debugging Workflows](/developer-guides/debugging-workflows). 
+::: + + +## Message delivery guarantees + +Conductor guarantees at-least-once message delivery, meaning all messages are persistent and will be delivered to task workers one or more times. In the event of failure, a message may be delivered more than once. This guarantee ensures that: +1. If a workflow has started, it will run to completion as long as all its tasks are completed. +2. If a task worker fails due to restarts, crashes, or other issues, the message will be redelivered to another worker node that is alive and responding. + + +## Task retries + +Automatic retries are a key strategy for handling transient task failures. If a task fails to complete, the Conductor server will make the task available for polling again after a given duration. + + +### Retry configuration + +You can configure the retry behavior for a task in its **task definition**. The parameters for defining a task’s retry behavior are: +* Retry count +* Retry logic +* Retry delay seconds +* Backoff scale factor + +| Parameter | Description | Required/ Optional | +| --------- | ----------- | ------------------ | +| retryCount | The maximum number of times that the task will be retried. Default value is 3. | Required. | +| retryLogic | The policy that determines when to schedule each retry. Supported values: `FIXED`, `EXPONENTIAL_BACKOFF`, `LINEAR_BACKOFF`. | Required. | +| retryDelaySeconds | The base duration, in seconds, to wait before the task is made available for polling again. This provides time for the task service to recover from any transient failure before it is retried. Default value is 60.

**Note:** The actual duration depends on the retry policy set in `retryLogic`. | Required. | +| backoffScaleFactor | The value multiplied with `retryDelaySeconds` to determine the interval for the Linear or Exponential Backoff retry. Default value is 1. | Required. | + +**Example** + +```json +// task definition +{ + "name": "someTaskDefName", + ... + "retryCount": 3, + "retryLogic": "FIXED|EXPONENTIAL_BACKOFF|LINEAR_BACKOFF", + "retryDelaySeconds": 1, + "backoffScaleFactor": 1 +} +``` + +### Example retry behavior + +
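As a rough sketch of how the retry parameters might combine, the Python function below computes the wait before each retry. The formulas are taken from the earlier version of this guide (the `docs/error-handling.md` file removed in this diff), with `backoffScaleFactor` standing in for the old `backoffRate`. The function name `retry_delay_seconds` and the 1-based `attempt` argument are assumptions made for illustration, and the server's actual scheduling may differ.

```python
def retry_delay_seconds(retry_logic: str, retry_delay: int,
                        backoff_scale_factor: int, attempt: int) -> int:
    """Return the wait, in seconds, before the given retry attempt.

    Illustrative only: the formulas mirror the earlier error-handling guide,
    and `attempt` is assumed to be 1 for the first retry.
    """
    if retry_logic == "FIXED":
        # Same delay before every retry.
        return retry_delay
    if retry_logic == "LINEAR_BACKOFF":
        # Delay grows linearly with the attempt number.
        return retry_delay * backoff_scale_factor * attempt
    if retry_logic == "EXPONENTIAL_BACKOFF":
        # Delay doubles with every attempt.
        return retry_delay * (2 ** attempt)
    raise ValueError(f"unsupported retryLogic: {retry_logic}")


# Compare the three policies for a base delay of 5 seconds.
for policy in ("FIXED", "LINEAR_BACKOFF", "EXPONENTIAL_BACKOFF"):
    delays = [retry_delay_seconds(policy, 5, 2, n) for n in (1, 2, 3)]
    print(policy, delays)
```

With a fixed policy the delay stays at the base value, while the two backoff policies give a struggling downstream service progressively more time to recover.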

Diagram showing how the Conductor server and worker interact in the event of a retry.

+ + +Based on the retry configuration in the above figure, the following sequence of events will occur in the event of a retry: +1. Worker (W1) polls the Conductor server for task T1 and receives the task. +2. After processing the task, the worker determines that the task execution is a failure and reports to the server with a `FAILED` status after 10 seconds. +3. The server will persist this failed execution of T1. +4. A new task T1 execution is created and scheduled for polling. Based on the retry configuration, the task will be available for polling after 5 seconds. + + +## Task timeouts + +A task timeout can occur if: +* There are no workers available for a given task type. This could be due to longer-than-expected system downtime or a system misconfiguration. +* The worker receives the message but dies before completely processing the task, so the task never reaches completion. +* The worker has completed the task but could not communicate with the Conductor server due to network failures, the server being down, or other issues. + + +### Timeout configuration + +You can configure the timeout behavior for a task in its **task definition** to handle the cases above. The parameters for a task’s timeout behavior are: +* Poll timeout seconds +* Response timeout seconds +* Timeout seconds +* Timeout policy + + +| Parameter | Description | Required/ Optional | +| --------- | ----------- | ------------------ | +| pollTimeoutSeconds | The maximum duration in seconds that a worker has to poll a task before it gets marked as `TIMED_OUT`. When configured with a value > 0, Conductor will wait for the task to be picked up by a worker.

Useful for detecting a backlogged task queue with insufficient workers.

Default value is 3600. | Required. | +| responseTimeoutSeconds | The maximum duration in seconds that a worker has to respond to the server with a status update before it gets marked as `TIMED_OUT`. When configured with a value > 0, Conductor will wait for the worker to return a status update, starting from when the task was picked up.

If a task requires more time to complete, the worker can respond with the `IN_PROGRESS` status.

Default value is 600. | Required. | +| timeoutSeconds | The maximum duration in seconds for the task to reach a terminal state before it gets marked as `TIMED_OUT`. When configured with a value > 0, Conductor will wait for the task to complete, starting from when the task was picked up.

Useful for governing the overall SLA for completion.

Default value is 3600. | Required. | +| timeoutPolicy | The policy for how the Conductor server should handle the timeout. Supported values: `RETRY`, `TIME_OUT_WF`, `ALERT_ONLY`.

**Note:** The `ALERT_ONLY` option should be used only when you have your own metrics monitoring system to send alerts. | Required. | + + +:::note +To configure tasks that never time out, set `timeoutSeconds` and `pollTimeoutSeconds` to 0. +::: + +**Example** + +```json +// task definition +{ + "name": "someTaskDefName", + ... + "pollTimeoutSeconds": 3600, + "responseTimeoutSeconds": 600, + "timeoutSeconds": 3600, + "timeoutPolicy": "TIME_OUT_WF" +} +``` + +### Example timeout behavior + +
Poll timeout +In the figure below, task T1 isn’t polled by the worker within 60 seconds, so Conductor marks it as `TIMED_OUT`. + + +

Diagram showing how the Conductor server and worker interact in the event of a poll timeout.

+ +
+ +
Response timeout + +

Diagram showing how the Conductor server and worker interact in the event of a response timeout.

+ + +Based on the timeout configuration in the above figure, the following sequence of events will occur in the event of a delayed worker response: +1. At 0 seconds, the worker polls the Conductor server for task T1 and receives it. T1 is marked as IN_PROGRESS by the server. +2. The worker starts processing the task, but the worker instance dies during the execution. +3. At 20 seconds (T1’s `responseTimeoutSeconds`), the server marks T1 as TIMED_OUT since the worker has not updated the task within the configured duration. +4. A new instance of task T1 is scheduled based on the retry configuration. +5. At 25 seconds, the retried instance of T1 is available for polling after the `retryDelaySeconds` (5) has elapsed. + +
+ +
Timeout + +

Diagram showing how the Conductor server and worker interact in the event of a timeout.

+ +Based on the timeout configuration in the above figure, the following sequence of events will occur when a task cannot be completed within the given duration: +1. At 0 seconds, a worker polls the Conductor server for task T1 and receives the task. T1 is marked as `IN_PROGRESS` by the server. +2. The worker starts processing the task but is unable to complete it within the response timeout. The worker updates the server with T1 set to an `IN_PROGRESS` status and a callback of 9 seconds. +3. The server puts T1 back in the queue but makes it invisible. The worker continues to poll for the task but does not receive T1 for 9 seconds. +4. After 9 seconds, the worker receives T1 from the server but is still unable to finish processing the task. As such, the worker updates the server again with a callback of 9 seconds. +5. The same cycle repeats for the next few seconds. +6. At 30 seconds (T1 timeout), the server marks T1 as `TIMED_OUT` because it is not in a terminal state after first being moved to `IN_PROGRESS` status. The server schedules a new task based on the retry count. +7. At 32 seconds, the worker finishes processing T1 and updates the server with `COMPLETED` status. The server ignores this update since T1 has already been moved to a terminal status (`TIMED_OUT`). + +
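The three timeout checks described in this section can be sketched as a small Python function. This is not the Conductor server's implementation: the `TimeoutConfig` class, the `breached_timeout` function, and its timestamp arguments are hypothetical names, and the choices to measure the response timeout from the most recent worker update and to treat a value of 0 as "disabled" are assumptions based on the parameter descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeoutConfig:
    # Defaults follow the parameter table in this guide.
    poll_timeout_seconds: int = 3600
    response_timeout_seconds: int = 600
    timeout_seconds: int = 3600


def breached_timeout(cfg: TimeoutConfig, now: int, scheduled_at: int,
                     polled_at: Optional[int] = None,
                     last_update_at: Optional[int] = None) -> Optional[str]:
    """Return the name of the timeout that has been breached, or None."""
    if polled_at is None:
        # Task is still waiting in the queue for a worker to pick it up.
        if 0 < cfg.poll_timeout_seconds <= now - scheduled_at:
            return "pollTimeoutSeconds"
        return None
    # Overall limit, counted from when the task was first picked up.
    if 0 < cfg.timeout_seconds <= now - polled_at:
        return "timeoutSeconds"
    # Response limit, assumed to restart on every IN_PROGRESS update.
    last_seen = polled_at if last_update_at is None else last_update_at
    if 0 < cfg.response_timeout_seconds <= now - last_seen:
        return "responseTimeoutSeconds"
    return None


# Hypothetical values loosely modeled on the three scenarios above.
cfg = TimeoutConfig(poll_timeout_seconds=60,
                    response_timeout_seconds=20,
                    timeout_seconds=30)
print(breached_timeout(cfg, now=60, scheduled_at=0))
print(breached_timeout(cfg, now=20, scheduled_at=0, polled_at=0))
print(breached_timeout(cfg, now=30, scheduled_at=0, polled_at=0,
                       last_update_at=27))
```

The last call models a worker that keeps reporting `IN_PROGRESS`: the response timeout is never breached, but the overall `timeoutSeconds` limit still applies.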
+ + +## Task rate limits + +Rate limits on tasks are a key strategy for managing task load and worker capacity. When the number of tasks scheduled within a given duration exceeds the defined rate limit, the Conductor server will place the additional tasks in a PENDING status. Once an IN_PROGRESS task is completed, the rate limit is freed up, and the server will make the next PENDING task available for polling. + + +### Rate limit configuration + +You can configure the rate limit behavior for a task in its **task definition**. The parameters for defining a task’s rate limit behavior are: +* Rate limit +* Rate limit frequency +* Concurrent executions + +| Parameter | Description | Required/ Optional | +| --------- | ----------- | ------------------ | +| rateLimitPerFrequency | The maximum number of task executions that can be scheduled in a given duration. Default value is 0. | Required. | +| rateLimitFrequencyInSeconds | The duration, in seconds, specified for the rate limit. Default value is 1. | Required. | +| concurrentExecLimit | The number of task executions that can be scheduled concurrently. Default value is 0. | Required. | + + +:::note +To configure tasks with no rate limits, set `rateLimitPerFrequency` and `concurrentExecLimit` to 0. +::: + +**Example** + +```json +// task definition +{ + "name": "someTaskDefName", + ... + "rateLimitPerFrequency": 0, + "rateLimitFrequencyInSeconds": 1, + "concurrentExecLimit": 0 +} +``` + +## Workflow compensation flows + +A compensation flow can be configured to run when a workflow execution fails (FAILED status). Known as a **failure workflow** in Conductor, it must be created in Conductor and added to the main workflow definition. + +When triggered, the failure workflow receives the failed workflow details, such as its workflow ID and tasks, as input. This enables you to implement compensating logic to handle the failure. 
+ + +### Setting a failure workflow + +You can set a failure workflow for a workflow in its **workflow definition**. Before setting the failure workflow, ensure that you have created it first. + +**To set a failure workflow:** +1. Go to **Definitions** > **Workflow**. +2. Select the workflow that you want to add a failure workflow to. +3. In the **Workflow** tab on the right, scroll down to **Failure workflow name** and select the failure workflow from the dropdown box. +4. Select **Save** > **Confirm save**. + +

Configuring failure workflow in UI.

+ +**Example** + +```json +// workflow definition +{ + ... + "failureWorkflow": "" +} +``` diff --git a/docs/developer-guides/event-handler.md b/docs/developer-guides/event-handler.md index 1c70812f..d46b3374 100644 --- a/docs/developer-guides/event-handler.md +++ b/docs/developer-guides/event-handler.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/event-handler" +description: "Learn how to configure an event handler in a Conductor cluster to send and receive events." +--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; diff --git a/docs/developer-guides/getting-started-with-orkes-template-explorer.md b/docs/developer-guides/getting-started-with-orkes-template-explorer.md index 3a24b7e8..13236e8b 100644 --- a/docs/developer-guides/getting-started-with-orkes-template-explorer.md +++ b/docs/developer-guides/getting-started-with-orkes-template-explorer.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/getting-started-with-orkes-template-explorer" +description: "Get started by using the available templates on Orkes Platform to build your Conductor workflows." +--- + # Getting Started with Orkes Template Explorer Orkes Template Explorer offers a versatile collection of pre-designed templates. These templates are not only ready to use right out of the box but also highly customizable to align with your specific enterprise needs. diff --git a/docs/developer-guides/integration-with-cicd.md b/docs/developer-guides/integration-with-cicd.md index 911e3969..ae2d190a 100644 --- a/docs/developer-guides/integration-with-cicd.md +++ b/docs/developer-guides/integration-with-cicd.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/integration-with-cicd" +description: "Find out the best practices for integrating Conductor workflows into your CI/CD processes." 
+--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Install from '@site/src/components/install.mdx'; diff --git a/docs/developer-guides/metrics-and-observability.md b/docs/developer-guides/metrics-and-observability.md index 634a9325..f68594ab 100644 --- a/docs/developer-guides/metrics-and-observability.md +++ b/docs/developer-guides/metrics-and-observability.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/metrics-and-observability" +description: "Discover how to use the Metrics dashboard to get insights into your workflow performance." +--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; diff --git a/docs/developer-guides/monitoring-task-queues.md b/docs/developer-guides/monitoring-task-queues.md index f86e9bc2..fc1fbecb 100644 --- a/docs/developer-guides/monitoring-task-queues.md +++ b/docs/developer-guides/monitoring-task-queues.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/monitoring-task-queues" +description: "Learn how to monitor task queues in Orkes Conductor." +--- + # Monitoring Task Queues When an application or workflow needs to execute a task in the background, it adds tasks to task queues. These queues hold tasks that are pending execution and are processed later by worker services. diff --git a/docs/developer-guides/orchestrating-human-tasks.md b/docs/developer-guides/orchestrating-human-tasks.md index e0077581..a7ee1886 100644 --- a/docs/developer-guides/orchestrating-human-tasks.md +++ b/docs/developer-guides/orchestrating-human-tasks.md @@ -1,3 +1,8 @@ +--- +slug: "/content/developer-guides/orchestrating-human-tasks" +description: "Get started on orchestrating human tasks using Orkes Conductor." 
+--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; diff --git a/docs/developer-guides/passing-inputs-to-task-in-conductor.md b/docs/developer-guides/passing-inputs-to-task-in-conductor.md index 53999a52..ca66b98e 100644 --- a/docs/developer-guides/passing-inputs-to-task-in-conductor.md +++ b/docs/developer-guides/passing-inputs-to-task-in-conductor.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/passing-inputs-to-task-in-conductor" +description: "Learn how to configure variable task inputs and create the right expressions to dynamically reference and pass data between tasks." +--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Install from '@site/src/components/install.mdx'; diff --git a/docs/developer-guides/running-workflows.md b/docs/developer-guides/running-workflows.md index 28902518..2043066d 100644 --- a/docs/developer-guides/running-workflows.md +++ b/docs/developer-guides/running-workflows.md @@ -1,3 +1,7 @@ +--- +slug: "/developer-guides/running-workflows" +description: "This guide focuses on the basics of running workflows in Orkes Conductor, including its task statuses, creating schedules, and sending signals." +--- # Running Conductor Workflows diff --git a/docs/developer-guides/scaling-workers.md b/docs/developer-guides/scaling-workers.md index dde004de..fc575569 100644 --- a/docs/developer-guides/scaling-workers.md +++ b/docs/developer-guides/scaling-workers.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/scaling-workers" +description: "Decide whether to scale the number of workers based on performance metrics like throughput and number of pending tasks." 
+--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Install from '@site/src/components/install.mdx'; diff --git a/docs/developer-guides/scheduling-workflows.md b/docs/developer-guides/scheduling-workflows.md index 08945c4e..591fff89 100644 --- a/docs/developer-guides/scheduling-workflows.md +++ b/docs/developer-guides/scheduling-workflows.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/scheduling-workflows" +description: "Learn how to schedule workflows in Orkes Conductor using cron expressions." +--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Install from '@site/src/components/install.mdx'; diff --git a/docs/developer-guides/schema-validation.md b/docs/developer-guides/schema-validation.md index b1a141ae..658af518 100644 --- a/docs/developer-guides/schema-validation.md +++ b/docs/developer-guides/schema-validation.md @@ -1,5 +1,7 @@ --- sidebar_label: Schema Validation +slug: "/developer-guides/schema-validation" +description: "Learn how to create schemas in Orkes Conductor to enforce your input/output payload for workflows and tasks." --- import Tabs from '@theme/Tabs'; diff --git a/docs/developer-guides/secrets-in-conductor.md b/docs/developer-guides/secrets-in-conductor.md index a53dbc3f..c07f888a 100644 --- a/docs/developer-guides/secrets-in-conductor.md +++ b/docs/developer-guides/secrets-in-conductor.md @@ -1,6 +1,9 @@ --- sidebar_position: 6 +slug: "/developer-guides/secrets-in-conductor" +description: "Learn how to securely pass sensitive variables using secrets or masked inputs." --- + # Using Secrets Sensitive information such as usernames, passwords, API keys, and authorization tokens is often required in workflows. To protect this sensitive data, secrets can be used to hide these values on the user interface. Secrets allow you to securely manage and use sensitive information within workflows without exposing it directly. 
diff --git a/docs/developer-guides/sending-signals-to-workflows.md b/docs/developer-guides/sending-signals-to-workflows.md index c3a4e446..a35231b2 100644 --- a/docs/developer-guides/sending-signals-to-workflows.md +++ b/docs/developer-guides/sending-signals-to-workflows.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/sending-signals-to-workflows" +description: "Learn how to send signals to control a workflow's progress in Orkes Conductor." +--- + # Sending Signals to Workflows import Tabs from '@theme/Tabs'; diff --git a/docs/developer-guides/task-to-domain.md b/docs/developer-guides/task-to-domain.md index 584b76db..b34370aa 100644 --- a/docs/developer-guides/task-to-domain.md +++ b/docs/developer-guides/task-to-domain.md @@ -1,3 +1,9 @@ +--- +sidebar_label: Routing Tasks +slug: "/developer-guides/task-to-domain" +description: "Learn how to route tasks to different sets of workers using the task-to-domain concept." +--- + # Routing Tasks (Task-to-domain) import Tabs from '@theme/Tabs'; diff --git a/docs/developer-guides/unit-and-regression-tests.md b/docs/developer-guides/unit-and-regression-tests.md index 205f543c..d18c65c3 100644 --- a/docs/developer-guides/unit-and-regression-tests.md +++ b/docs/developer-guides/unit-and-regression-tests.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/unit-and-regression-tests" +description: "Learn how to do unit tests and regression tests for Conductor workflows." 
+--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Install from '@site/src/components/install.mdx'; diff --git a/docs/developer-guides/using-environment-variables.md b/docs/developer-guides/using-environment-variables.md index 3c4f7359..db934f10 100644 --- a/docs/developer-guides/using-environment-variables.md +++ b/docs/developer-guides/using-environment-variables.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/using-environment-variables" +description: "Learn how to configure environment variables for global use across multiple workflows." +--- + # Using Environment Variables Environment variables are essential for managing variables that need to be frequently accessed across multiple workflows. By storing these variables globally, they can be reused, making workflows more efficient and easier to manage. @@ -8,7 +13,7 @@ Environment variables can be created and stored in Orkes Conductor, that can be To create an environment variable: -1. From Orkes Conductor cluster, go to the **Definitions > Environment Variable**s from the left menu. +1. From the Orkes Conductor console, go to **Definitions > Environment Variables** from the left menu. 2. Click **+New Environment Variable** from the top-right corner of the page. 3. Provide the following details: @@ -52,7 +57,7 @@ Access to the environment variables can be granted via Groups/Applications in Or To provide explicit permission to Groups: -1. Navigate to **Access Control > Groups** from the left menu on your Orkes Conductor cluster. +1. Navigate to **Access Control > Groups** from the left menu on your Orkes Conductor console. 2. Create a new group or choose an existing group. 3. Under the **Permissions** section, click **+Add Permission**. 4. Under the **Env variables** tab, select the required variables with the required permissions. 
diff --git a/docs/developer-guides/using-llms-in-your-orkes-conductor-workflows.md b/docs/developer-guides/using-llms-in-your-orkes-conductor-workflows.md index 83fc2b1d..3a8b74e8 100644 --- a/docs/developer-guides/using-llms-in-your-orkes-conductor-workflows.md +++ b/docs/developer-guides/using-llms-in-your-orkes-conductor-workflows.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/using-llms-in-your-orkes-conductor-workflows" +description: "Learn how to use Orkes' system LLM tasks, including the steps for integration, access control, and prompt creation." +--- + # Using LLMs In this guide, we’ll provide an overview of Generative AI, Large Language Models (LLMs), and how Orkes makes it easy to leverage the power of LLMs natively in your applications. Whether you’re a developer, product manager, or anyone interested in GenAI powering your business logic, this guide will help you understand the concepts and get started with AI-powered applications in Orkes Conductor. diff --git a/docs/developer-guides/using-vector-databases-in-your-orkes-conductor-workflows.md b/docs/developer-guides/using-vector-databases-in-your-orkes-conductor-workflows.md index 3de2e5dd..f4cd7be8 100644 --- a/docs/developer-guides/using-vector-databases-in-your-orkes-conductor-workflows.md +++ b/docs/developer-guides/using-vector-databases-in-your-orkes-conductor-workflows.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/using-vector-databases-in-your-orkes-conductor-workflows" +description: "Learn how to use vector databases in Orkes' system LLM tasks, including the steps for integration and access control." +--- + # Using Vector Databases In this guide, we’ll provide an overview of Vector Databases and how Orkes makes it easy for developers to use Vector DBs for AI tasks in your applications. We will go through the concepts, where and how vector DBs add capabilities for your AI-powered applications, and how to get started. 
diff --git a/docs/developer-guides/webhook-integration.md b/docs/developer-guides/webhook-integration.md index fb1eee46..7631244f 100644 --- a/docs/developer-guides/webhook-integration.md +++ b/docs/developer-guides/webhook-integration.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/webhook-integration" +description: "Get started on using webhook integrations in Orkes Conductor." +--- + import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Install from '@site/src/components/install.mdx'; diff --git a/docs/developer-guides/workflow-version-behavior-on-execution.md b/docs/developer-guides/workflow-version-behavior-on-execution.md index 20be0c15..23b65da4 100644 --- a/docs/developer-guides/workflow-version-behavior-on-execution.md +++ b/docs/developer-guides/workflow-version-behavior-on-execution.md @@ -1,3 +1,8 @@ +--- +slug: "/developer-guides/workflow-version-behavior-on-execution" +description: "Find out how workflow versions intersect with current and new executions." +--- + # Workflow Versions at Runtime The Conductor [workflow can be versioned](https://orkes.io/content/faqs/workflow-versioning), which means you can have multiple versions of the same workflow. diff --git a/docs/error-handling.md b/docs/error-handling.md deleted file mode 100644 index d68231a8..00000000 --- a/docs/error-handling.md +++ /dev/null @@ -1,89 +0,0 @@ -# Error Handling - -Handling errors is one of the critical aspects of software development, especially while dealing with distributed systems. - -Orkes Conductor allows you to create applications that are resilient against failures - -without having to worry about handling error conditions. - -## Execution Guarantees with Conductor -Conductor is built to offer at-least one delivery guarantee. This means all the messages are persistent, durable and will be delivered to the task workers at least once. -This model ensures two things: -1. 
If a workflow has started, it will complete as long as all the tasks are completed. -2. If a task worker fails due to restarts, the system going down, etc., the message is redelivered to another node that is alive and responding. - -## Handling Timeouts -A timeout can occur if: -1. There are no workers available for a given task type - this could be due to longer downtime in the system or misconfiguration of the system. -2. The worker receives the message and dies before completing the processing of the task, so the task never goes to the completion state. -3. The worker has completed the processing but could not communicate with the Conductor server due to network failures, the Conductor server being down, etc. - -Conductor allows configuring tasks with various timeouts to handle such cases: - -### Poll Timeout Seconds -```json -{ - "taskType": "send_email", - "pollTimeoutSeconds": 10 -} -``` -When configured with a value > 0, the system will wait for this number of seconds for the task to be picked up by a task worker. Useful when you want to detect a backlogged task queue with insufficient workers. - -### Response Timeout Seconds -```json -{ - "taskType": "send_email", - "responseTimeoutSeconds": 10 -} -``` -When configured with a value > 0, the system will wait for this number of seconds from when the task is polled before the worker updates back with a status. The worker can keep the task in the **IN_PROGRESS** state if it requires more time to complete. - -### Timeout Seconds -```json -{ - "taskType": "send_email", - "timeoutSeconds": 30 -} -``` -When configured with a value > 0, the system will wait for this task to complete successfully until this number of seconds from when the task is first polled. We can use this to fail a workflow when a task breaches the overall SLA for completion. - -### Timeout Policies -The policy for timeout dictates how the server should handle the case. - -* **RETRY**: Retries the task again. 
-* **TIME_OUT_WF**: The task status is marked as TIMED_OUT, and the task is terminated. -* **ALERT_ONLY**: Registers a counter and sends an alert. No further action is taken. Use when you have your own metrics monitoring to send alerts. - -```json -{ - "timeoutPolicy": "RETRY" -} -``` - -## Handling Failures -One of the key powers of Conductor is that it allows you to create applications without worrying about failures. However, failures are part of any system. Conductor lets you configure automatic retries when a task fails, allowing you to handle transient errors seamlessly. - -### Task Failures -Failure policies are defined with `retry*` parameters: -#### Retry Logic -```json -{ - "retryLogic": "FIXED|EXPONENTIAL_BACKOFF|LINEAR_BACKOFF", - "retryDelaySeconds": 1, - "backoffRate": 1 -} -``` -* **FIXED**: Reschedule the task after *retryDelaySeconds* -* **EXPONENTIAL_BACKOFF**: Reschedule the task after _retryDelaySeconds * (2 ^ attemptNumber)_ -* **LINEAR_BACKOFF**: Reschedule after *retryDelaySeconds * backoffRate * attemptNumber* - -### Workflow Failures - -Workflow can fail if one of the tasks fails even after retry attempts if terminated by a signal or using a terminate API call. Failed workflow transactions can be compensated using compensating workflows. - -With Conductor, you can specify a name for a failure workflow within your workflow definition, which will be triggered if the main workflow fails. This failure workflow receives the failed workflow's ID and tasks as input, enabling you to implement compensating logic to handle the failure. 
- -In your workflow definition, you can add the workflow name to be run on the failure of your current workflow: - -```json -"failureWorkflow": "", -``` \ No newline at end of file diff --git a/sidebars.js b/sidebars.js index ae397021..00c3de5f 100644 --- a/sidebars.js +++ b/sidebars.js @@ -177,7 +177,7 @@ const sidebars = { }, 'developer-guides/webhook-integration', 'developer-guides/write-workflows-using-code', - 'error-handling' + 'developer-guides/error-handling' ] }, { diff --git a/static/img/dev-guides/handling_failures-add_failure_workflow.png b/static/img/dev-guides/handling_failures-add_failure_workflow.png new file mode 100644 index 00000000..cf49bdc9 Binary files /dev/null and b/static/img/dev-guides/handling_failures-add_failure_workflow.png differ diff --git a/static/img/dev-guides/handling_failures-poll_timeout_example.jpg b/static/img/dev-guides/handling_failures-poll_timeout_example.jpg new file mode 100644 index 00000000..9774ad58 Binary files /dev/null and b/static/img/dev-guides/handling_failures-poll_timeout_example.jpg differ diff --git a/static/img/dev-guides/handling_failures-response_timeout_example.jpg b/static/img/dev-guides/handling_failures-response_timeout_example.jpg new file mode 100644 index 00000000..92207b61 Binary files /dev/null and b/static/img/dev-guides/handling_failures-response_timeout_example.jpg differ diff --git a/static/img/dev-guides/handling_failures-retry_example.jpg b/static/img/dev-guides/handling_failures-retry_example.jpg new file mode 100644 index 00000000..06042db6 Binary files /dev/null and b/static/img/dev-guides/handling_failures-retry_example.jpg differ diff --git a/static/img/dev-guides/handling_failures-timeout_example.jpg b/static/img/dev-guides/handling_failures-timeout_example.jpg new file mode 100644 index 00000000..1c3c5249 Binary files /dev/null and b/static/img/dev-guides/handling_failures-timeout_example.jpg differ