Docs updates
dvra example
monoxgas committed May 10, 2024
1 parent 78b0c6b commit 1c08d90
Showing 27 changed files with 2,358 additions and 868 deletions.
321 changes: 321 additions & 0 deletions docs/home/getting-started.md
@@ -0,0 +1,321 @@
# Getting Started

Rigging is a flexible library built on top of other very flexible libraries. As such, it might take a bit to warm
up to its interfaces given the many ways you can accomplish your goals. However, the code is well documented,
and the topic pages and source are great places to step in and out of as you explore.

??? tip "IDE Setup"

Rigging has been built with full type support which provides clear guidance on what
methods return what types, and when they return those types. It's recommended that you
operate in a development environment which can take advantage of this information.
Your use of Rigging will almost "fall" into place and you won't be guessing about
objects as you work.

## Basic Chats

Let's start with a very basic generation example that doesn't include any parsing features, continuations, etc.
You want to chat with a model and collect its response.

We first need to get a [generator][rigging.generator.Generator] object. We'll use
[`get_generator`][rigging.generator.get_generator] which will resolve an identifier string
to the underlying generator class object.

??? note "API Keys"

The default Rigging generator is [LiteLLM][rigging.generator.LiteLLMGenerator], which
wraps a large number of providers and models. We assume for these examples that you
have API tokens set as environment variables for these models. You can refer to the
[LiteLLM docs](https://docs.litellm.ai/docs/) for supported providers and their key format.
If you'd like, you can change any of the model IDs we use and/or add `,api_key=[sk-1234]` to the
end of any of the generator IDs to specify them inline.
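
    For example, a minimal sketch with an inline key (the key value here is just a placeholder):

    ```py
    import rigging as rg

    # The key below is a placeholder -- substitute your real provider key
    generator = rg.get_generator("gpt-3.5-turbo,api_key=sk-1234")
    ```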

```py hl_lines="3"
import rigging as rg # (1)!

generator = rg.get_generator("claude-3-sonnet-20240229") # (2)!
pending = generator.chat(
[
{"role": "system", "content": "You are a wizard harry."},
{"role": "user", "content": "Say hello!"},
]
)
chat = pending.run()
print(chat.all)
# [
# Message(role='system', parts=[], content='You are a wizard harry.'),
# Message(role='user', parts=[], content='Say hello!'),
# Message(role='assistant', parts=[], content='Hello! How can I help you today?'),
# ]
```

1. You'll see us use this shorthand import syntax throughout our code; it's
totally optional but makes things look nice.
2. This is actually shorthand for `litellm!anthropic/claude-3-sonnet-20240229`, where `litellm`
is the provider. We just default to that generator and you don't have to be explicit. You
can find more information about this in the [generators](../topics/generators.md) docs.
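
If you'd rather be explicit, the fully-qualified identifier from the note above should behave the same way (a small sketch):

```py
import rigging as rg

# Same model as the "claude-3-sonnet-20240229" shorthand, spelled out with the provider prefix
generator = rg.get_generator("litellm!anthropic/claude-3-sonnet-20240229")
```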


Generators have an easy [`chat()`][rigging.generator.Generator.chat] method which you'll
use to initiate conversations. You can supply messages in many different forms: dictionary
objects, full [`Message`][rigging.message.Message] classes, or a simple `str`
which will be converted to a user message.
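
As a quick sketch of two of those forms (the string form is assumed to become a single user message, per the description above):

```py
import rigging as rg

generator = rg.get_generator("claude-3-sonnet-20240229")

# A plain string becomes a single user message
pending = generator.chat("Say hello!")

# The same conversation expressed with dictionaries
pending = generator.chat([{"role": "user", "content": "Say hello!"}])
```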

```py hl_lines="4-9"
import rigging as rg

generator = rg.get_generator("claude-3-sonnet-20240229")
pending = generator.chat( # (1)!
[
{"role": "system", "content": "You are a wizard harry."},
{"role": "user", "content": "Say hello!"},
]
)
chat = pending.run()
print(chat.all)
# [
# Message(role='system', parts=[], content='You are a wizard harry.'),
# Message(role='user', parts=[], content='Say hello!'),
# Message(role='assistant', parts=[], content='Hello! How can I help you today?'),
# ]
```

1. [`generator.chat`][rigging.generator.Generator.chat] is actually just a helper for
[`chat(generator, ...)`][rigging.generator.chat], they do the same thing.
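
In other words, something like this sketch should be equivalent, using the module-level function referenced above:

```py
from rigging.generator import chat, get_generator

generator = get_generator("claude-3-sonnet-20240229")

# Equivalent to generator.chat(...)
pending = chat(generator, [{"role": "user", "content": "Say hello!"}])
```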

??? note "PendingChat vs Chat"

You'll notice we name the result of `chat()` as `pending`. The naming might be confusing,
but chats go through 2 phases. We first stage them into a pending state, where we operate
and prepare them in a "pipeline" of sorts before we actually trigger generation with `run()`.

Calling `.chat()` doesn't trigger any generation, but calling any of these run methods will:

- [rigging.chat.PendingChat.run][]
- [rigging.chat.PendingChat.run_many][]
- [rigging.chat.PendingChat.run_batch][]

In this case, we have nothing additional we want to add to our pending chat, and we are only interested
in generating exactly one response message. We simply call [`.run()`][rigging.chat.PendingChat.run] to
execute the generation process and collect our final [`Chat`][rigging.chat.Chat] object.

```py hl_lines="10-11"
import rigging as rg

generator = rg.get_generator("claude-3-sonnet-20240229")
pending = generator.chat(
[
{"role": "system", "content": "You are a wizard harry."},
{"role": "user", "content": "Say hello!"},
]
)
chat = pending.run()
print(chat.all)
# [
# Message(role='system', parts=[], content='You are a wizard harry.'),
# Message(role='user', parts=[], content='Say hello!'),
# Message(role='assistant', parts=[], content='Hello! How can I help you today?'),
# ]
```

You can read more about Chat objects and their properties [over here][rigging.chat.Chat]. In general, chats
give you access to exactly what messages were passed into a model, and what came out the other side.
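
Continuing the example above, a small sketch of the properties we've used so far (output values are illustrative):

```py
# chat.all holds every message in the conversation, chat.last is the final generated message
print(len(chat.all))      # 3
print(chat.last.role)     # 'assistant'
print(chat.last.content)  # 'Hello! How can I help you today?'
```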

## Conversation

Both `PendingChat` and `Chat` objects give you the freedom to fork off the current state of messages, or to
continue a stream of messages after generation has occurred. In general:

- [`PendingChat.fork`][rigging.chat.PendingChat.fork] will clone the current pending chat and let you maintain
both the new and original object for continued processing.
- [`Chat.fork`][rigging.chat.Chat.fork] will produce a fresh `PendingChat` from all the messages prior to the
previous generation (useful for "going back" in time).
- [`Chat.continue_`][rigging.chat.Chat.continue_] is similar to `fork` (actually a wrapper) which tells `fork` to
include the generated messages as you move on (useful for "going forward" in time).

```py
import rigging as rg

generator = rg.get_generator("gpt-3.5-turbo")
pending = generator.chat([
    {"role": "user", "content": "Hello, how are you?"},
])

# We can fork before generation has occurred
specific = pending.fork("Be specific please.").run()
poetic = pending.fork("Be as poetic as possible").overload(temperature=1.5).run() # (1)!

# We can also continue after generation
next_pending = poetic.continue_(
    {"role": "user", "content": "That's good, tell me a joke"}
)

update = next_pending.run()
```

1. In this case the temperature change will only be applied to the poetic path because `fork` has
created a clone of our pending chat.

## Basic Parsing

Now let's assume we want to ask the model for a piece of information, and we want to make sure
this item conforms to a pre-defined structure. Underneath, Rigging uses [Pydantic XML](https://pydantic-xml.readthedocs.io/)
which itself is built on [Pydantic](https://docs.pydantic.dev/). We'll cover more about
constructing models in a [later section](../topics/models.md), but don't stress the details for now.

??? note "XML vs JSON"

Rigging is opinionated with regard to using XML to weave unstructured data with structured contents
as the underlying LLM generates text responses. A frequent solution to getting "predictable"
outputs from LLMs has been forcing JSON conformant outputs, but we think this is
poor form in the long run. You can read more about this from [Anthropic](https://docs.anthropic.com/claude/docs/use-xml-tags)
who have done extensive research with their models.

We'll skip the long rant, but trust us that XML is a very useful syntax which beats
JSON any day of the week for typical use cases.

To begin, let's define a `FunFact` model which we'll have the LLM fill in. Rigging exposes a
[`Model`][rigging.model.Model] base class which you should inherit from when defining structured
inputs. This is a lightweight wrapper around pydantic-xml's [`BaseXmlModel`](https://pydantic-xml.readthedocs.io/en/latest/pages/api.html#pydantic_xml.BaseXmlModel)
with some added features and functionality to make it easy for Rigging to manage. However, nearly
everything these pydantic-xml models support is also supported in Rigging.

```py hl_lines="3-4"
import rigging as rg

class FunFact(rg.Model):
fact: str # (1)!

chat = rg.get_generator('gpt-3.5-turbo').chat(
f"Provide a fun fact between {FunFact.xml_example()} tags."
).run()

fun_fact = chat.last.parse(FunFact)

print(fun_fact.fact)
# The Eiffel Tower can be 15 cm taller during the summer due to the expansion of the iron in the heat.
```

1. This is what pydantic-xml refers to as a "primitive" model, as it is simply a single
typed value placed between the tags. See more about primitive types, elements, and attributes in the
[Pydantic XML Docs](https://pydantic-xml.readthedocs.io/en/latest/pages/quickstart.html#primitives)
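
If you're curious, here is a hedged sketch of a non-primitive model using pydantic-xml's `element` and `attr` helpers (we're assuming they compose with `rg.Model` just as they do with `BaseXmlModel`; the models section covers the details):

```py
import rigging as rg
from pydantic_xml import attr, element

class Book(rg.Model):
    title: str = element()  # rendered as a <title>...</title> child element
    year: int = attr()      # rendered as a year="..." attribute on the <book> tag
```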

We need to show the target LLM how to format its response, so we'll use the
[`.xml_example()`][rigging.model.Model.xml_example] class method which all models
support. By default this will simply emit empty XML tags for our model:

```xml
Provide a fun fact between <fun-fact></fun-fact> tags.
```

??? note "Customizing Model Tags"

Tags for a model are auto-generated based on the name of the class. You are free
to override these by passing `tag=[value]` into your class definition like this:

```py
class LongNameForThing(rg.Model, tag="short"):
...
```
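
    With that override in place, the emitted example tags should use the short name (illustrative output):

    ```py
    print(LongNameForThing.xml_example())
    # <short></short>
    ```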

We wrap up the generation and extract our parsed object by calling [`.parse()`][rigging.message.Message.parse]
on the [last message][rigging.chat.Chat.last] of our generated chat. This will process the contents
of the message, extract the first matching model which parses successfully, and return it to us as a python
object.

```py hl_lines="10"
import rigging as rg

class FunFact(rg.Model):
fact: str

chat = rg.get_generator('gpt-3.5-turbo').chat(
f"Provide a fun fact between {FunFact.xml_example()} tags."
).run()

fun_fact = chat.last.parse(FunFact)

print(fun_fact.fact) # (1)!
# The Eiffel Tower can be 15 cm taller during the summer due to the expansion of the iron in the heat.
```

1. Because we've defined `FunFact` as a class, the result of `.parse()` is typed to that object. In our
code, all the properties of the fact will be available just as if we had created the object directly.

Notice that we don't have to worry about the model being verbose in its response, as we've communicated
that the text between the `#!xml <fun-fact></fun-fact>` tags is the relevant place to put its answer.
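
For instance, even if the model wraps its answer in extra prose, parsing should still pull out just the structured part (the response content below is illustrative):

```py
print(chat.last.content)
# Sure! Here's one: <fun-fact>Honey never spoils.</fun-fact> Let me know if you'd like another!

print(chat.last.parse(FunFact).fact)
# Honey never spoils.
```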

## Strict Parsing

In the example above, we don't handle the case where the model fails to properly conform to our
desired output structure. If the last message content is invalid in some way, our call to `parse`
will result in an exception from Rigging. Rigging is designed at its core to manage this process,
and we have a few options:

1. We can make the parsing optional by switching to [`.try_parse()`][rigging.message.Message.try_parse]. The type
of the return value will automatically switch to `#!python FunFact | None` and you can handle cases
where parsing failed.
2. We can extend our pending chat with [`.until_parsed_as()`][rigging.chat.PendingChat.until_parsed_as] which will cause the
`run()` function to internally check if parsing is succeeding before returning the chat back to you.

=== "Option 1 - Trying"

```py hl_lines="5"
chat = rg.get_generator('gpt-3.5-turbo').chat(
f"Provide a fun fact between {FunFact.xml_example()} tags."
).run()

fun_fact = chat.last.try_parse(FunFact) # fun_fact might now be None

print(fun_fact or "Failed to get fact")
```

=== "Option 2 - Until"

```py hl_lines="3"
chat = rg.get_generator('gpt-3.5-turbo').chat(
f"Provide a fun fact between {FunFact.xml_example()} tags."
).until_parsed_as(FunFact).run()

fun_fact = chat.last.parse(FunFact) # This call should never fail

print(fun_fact or "Failed to get fact")
```

A couple of comments regarding this structure:

1. We still have to call `parse` on the message despite using `until_parsed_as`. This is
a limitation of type hinting, as we'd have to turn every `PendingChat` and `Chat` into a generic
which could carry types forward. It's a small price for big code-complexity savings.
2. Internally, the generation code inside `PendingChat` will attempt to re-generate until
the LLM correctly produces parsable output, or until a maximum number of "rounds" is reached.
This process is configurable with the arguments to all [`until`][rigging.chat.PendingChat.until_parsed_as]
or [`using`][rigging.chat.PendingChat.using] functions.
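
As a sketch of that configuration, the round limit is passed as an argument to `until_parsed_as` (the parameter name `max_rounds` is our assumption here; check the API reference for the exact signature):

```py
chat = (
    rg.get_generator("gpt-3.5-turbo")
    .chat(f"Provide a fun fact between {FunFact.xml_example()} tags.")
    .until_parsed_as(FunFact, max_rounds=5)  # assumed keyword for the retry limit
    .run()
)
```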

## Parsing Many Models

Assuming we wanted to extend our example to produce a set of interesting facts, we have a couple of options:

1. Simply use [`run_many()`][rigging.chat.PendingChat.run_many] and generate N examples individually
2. Rework our code slightly and let the model provide us multiple facts at once.

=== "Option 1 - Multiple Generations"

```py
chats = rg.get_generator('gpt-3.5-turbo').chat(
f"Provide a fun fact between {FunFact.xml_example()} tags."
).run_many(3)

for chat in chats:
print(chat.last.parse(FunFact).fact)
```

=== "Option 2 - Inline Set"

```py
chat = rg.get_generator('gpt-3.5-turbo').chat(
f"Provide a 3 fun facts each between {FunFact.xml_example()} tags."
).run()

for fun_fact in chat.last.parse_set(FunFact):
print(fun_fact.fact)
```
39 changes: 39 additions & 0 deletions docs/home/introduction.md
@@ -0,0 +1,39 @@
# Rigging

Rigging is a lightweight LLM interaction framework built on Pydantic XML. The goal is to make leveraging LLMs in production pipelines as simple and effective as possible. Here are the highlights:

- **Structured Pydantic models** can be used interchangeably with unstructured text output.
- LiteLLM as the default generator giving you **instant access to a huge array of models**.
- Add easy **tool calling** abilities to models which don't natively support it.
- Store different models and configs as **simple connection strings** just like databases.
- Chat templating, forking, continuations, generation parameter overloads, stripping segments, etc.
- Modern python with type hints, async support, pydantic validation, serialization, etc.

```py
import rigging as rg
from rigging.model import CommaDelimitedAnswer as Answer

chat = rg.get_generator('gpt-4') \
.chat(f"Give me 3 famous authors between {Answer.xml_tags()} tags.") \
.until_parsed_as(Answer) \
.run()

answer = chat.last.parse(Answer)
print(answer.items)

# ['J. R. R. Tolkien', 'Stephen King', 'George Orwell']
```

Rigging is built and maintained by [dreadnode](https://dreadnode.io) where we use it daily for our work.

## Installation
We publish every version to PyPI:
```bash
pip install rigging
```

If you want to build from source:
```bash
cd rigging/
poetry install
```
23 changes: 23 additions & 0 deletions docs/home/principles.md
@@ -0,0 +1,23 @@
# Principles

LLMs are extremely capable machine learning systems, but they operate purely in textual spaces as a byproduct of
their training data. We have access to the compression of a huge repository of human knowledge, but are limited to querying
that information via natural language. Our first inclination is to let these language interfaces drive
our design decisions. We build chat bots and text search, and when it comes time to align them closely
with the rest of our fixed software stack, we quickly get frustrated by their inconsistencies and our limited
control over their outputs.

In software we operate on the principle of known interfaces as the basis for composability. In the functional paradigm, we want our
software functions to operate like mathematical ones, where the same input always produces the same output with no side effects.
Funny enough, LLMs (like all models) also operate that way (minus things like floating point errors), but we intentionally
inject randomness into our sampling process to give them the freedom to explore and produce novel outputs. Therefore we shouldn't
aim for "purity" in the strict sense, but we should aim for consistency in their interface.

Once you start to think of a "prompt", "completion", or "chat interaction" as the temporary textual interface by which we pass in
structured inputs and produce structured outputs, we can begin to link them with traditional software. Many libraries get close to this
idea, but they rarely hold the opinion that programming types and structures, and not text, are the best way to make LLM-based
systems composable.

A core opinion of Rigging is to reframe these language models as tools which use tokens of text in context windows to navigate latent
space and produce probabilities of output tokens, but which do not need to have the data they consume or produce be holistically
constrained to textual spaces in our use of them.