Commit

Added Instructor concept img to docs
ddebowczyk committed Mar 16, 2024
1 parent 7ceaffe commit 4371645
Showing 32 changed files with 629 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -6,4 +6,4 @@ experimental/
.env
vendor
php_errors.log
NOTES.md
composer.lock
Binary file added docs/img/concept.png
1 change: 1 addition & 0 deletions docs/index.md
@@ -12,6 +12,7 @@ Instructor is a library that allows you to extract structured, validated data fr
Instructor for PHP is inspired by the [Instructor](https://jxnl.github.io/instructor/) library for Python created by [Jason Liu](https://twitter.com/jxnlco).


![image](./img/concept.png)


## Instructor in Other Languages
26 changes: 26 additions & 0 deletions notes/NOTES.md
@@ -0,0 +1,26 @@
# NOTES


## Public vs private/protected fields

Document and write tests around the behavior of public vs private/protected fields.


## Research

- Queue-based load leveling
- Throttling
- Circuit breaker
- Producer-consumer / queue-worker
- Rate limiting
- Retry on service failure
- Backpressure
- Batch stage chain
- Request aggregator
- Rolling poller window
- Sparse task scheduler
- Marker and sweeper
- Actor model



3 changes: 3 additions & 0 deletions notes/api/00_API.md
@@ -0,0 +1,3 @@
# API design

> This is an exploration of ideas for the Instructor API. These examples are not final; most do not work yet, and the concepts are subject to change.
42 changes: 42 additions & 0 deletions notes/api/async.md
@@ -0,0 +1,42 @@
# Async

## No streaming

```php
$instructor = new Instructor();
$async = $instructor->request(
    messages: "Jason is 35 years old",
    responseModel: Task::class,
    onDone: function (Task $task) {
        // Completed model
        $this->saveTask($task);
    },
    onError: function (Exception $e) {
        // Handle error
    },
)->async();
// continue execution
```

## With streaming / partials

```php
$instructor = new Instructor();
$async = $instructor->request(
    messages: "Jason is 35 years old",
    responseModel: Task::class,
    onEachUpdate: function (Task $task) {
        // Partially updated model
        $this->updateTask($task);
    },
    onDone: function (Task $task) {
        // Completed model
        $this->saveTask($task);
    },
    onError: function (Exception $e) {
        // Handle error
    },
)->async();
// continue execution
```

86 changes: 86 additions & 0 deletions notes/api/inference.md
@@ -0,0 +1,86 @@
# Inference

Get the task.

```php
$instructor = new Instructor();
$task = $instructor->respond(
    messages: "Jason is 35 years old",
    responseModel: Task::class,
);

$this->updateView($task);
```
or

```php
$instructor = new Instructor();
$task = $instructor->request(
    messages: "Jason is 35 years old",
    responseModel: Task::class,
)->get();

$this->updateView($task);
```
or

```php
$instructor = new Instructor();
$task = $instructor->withRequest(new Request(
    messages: "Jason is 35 years old",
    responseModel: Task::class,
    partials: true
))->get();
```

Get partial updates of task.

```php
$instructor = new Instructor();
$stream = $instructor->request(
    messages: "Jason is 35 years old",
    responseModel: Task::class,
)->stream();

foreach($stream->partial as $taskUpdate) {
    // Partially updated model
    $this->updateView($taskUpdate);
    // Complete model is null until done
    // $stream->complete == null
}
// Only now $stream->complete is set & validated
if($stream->complete) {
    $task = $stream->complete;
}
```
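
The partial/complete contract above can be sketched with a plain PHP generator. This is only an illustration under assumptions: `Task` and `TaskStream` are hypothetical stand-ins defined here, the chunks are simulated arrays rather than LLM output, and `partial()` is a method here whereas the note above uses a `partial` property.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a response model; not part of the library.
final class Task
{
    public function __construct(
        public ?string $name = null,
        public ?int $age = null,
    ) {}
}

// Hypothetical stream wrapper: yields partial models, then exposes the complete one.
final class TaskStream
{
    public ?Task $complete = null;

    /** @param list<array<string, mixed>> $chunks simulated field updates */
    public function __construct(private array $chunks) {}

    /** @return Generator<int, Task> */
    public function partial(): Generator
    {
        $state = [];
        foreach ($this->chunks as $chunk) {
            // Merge the newly received fields into the accumulated state.
            $state = array_merge($state, $chunk);
            yield new Task($state['name'] ?? null, $state['age'] ?? null);
        }
        // Set only after the last chunk has been consumed.
        $this->complete = new Task($state['name'] ?? null, $state['age'] ?? null);
    }
}

$stream = new TaskStream([['name' => 'Jason'], ['age' => 35]]);
$seen = [];
foreach ($stream->partial() as $update) {
    // Record each partial model and whether the complete model is still null.
    $seen[] = [$update->name, $update->age, $stream->complete === null];
}
```

Because the generator assigns `complete` only when it is resumed after the final `yield`, the caller cannot observe a complete model until the loop has finished.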

Get the list of tasks, one by one.

```php
$instructor = new Instructor();
$stream = $instructor->request(
    messages: "Jason is 35 years old",
    responseModel: Sequence::of(Task::class),
)->get();

foreach($stream as $taskUpdate) {
    // Next completed task in the sequence
    $this->updateView($taskUpdate);
}
```

Get the list of tasks, one by one, with partial updates.

```php
$instructor = new Instructor();
$stream = $instructor->request(
    messages: "Jason is 35 years old",
    responseModel: Sequence::of(Task::class),
    partials: true
)->stream();

foreach($stream as $taskUpdate) {
    // Partially updated model
    $this->updateView($taskUpdate);
}
```
44 changes: 44 additions & 0 deletions notes/api/iterables.md
@@ -0,0 +1,44 @@
# Iterable results


## Separate endpoint which returns Iterable

The client iterates over the result and receives partial updates until the iterator is exhausted.
If the model implements an iterable interface, it can be used to return partial updates.

```php
$instructor = new Instructor();
$taskUpdates = $instructor->respond(
    messages: "Notify Jason about the upcoming meeting on Thursday at 10:00 AM",
    responseModel: Task::class,
    stream: true
);
foreach($taskUpdates as $partial) {
    // Partially updated model
    $this->updateView($partial);
}
// After the loop, $partial still holds the last update, i.e. the final task
TaskStore::save($partial);
```



## Separate, optional callback parameter

The client receives the partially updated model via a callback, while `respond()` still returns the complete answer when done.

```php
$instructor = new Instructor();
$task = $instructor->respond(
    messages: "Jason is 35 years old",
    responseModel: Task::class,
    onEachUpdate: function (Task $partial) {
        // Partially updated model
        $this->updateView($partial);
    },
    stream: true
);
// do something with task
TaskStore::save($task);
```
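
A minimal sketch of the callback variant, assuming simulated chunks instead of a real LLM stream. `respondWithUpdates()` is a hypothetical helper written for this note, not the library's `respond()`; it only shows that the callback sees every intermediate state while the return value is the final one.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: merges simulated chunks, calls $onEachUpdate after
 * each one, and returns the final accumulated state.
 *
 * @param list<array<string, mixed>> $chunks
 * @return array<string, mixed>
 */
function respondWithUpdates(array $chunks, ?callable $onEachUpdate = null): array
{
    $state = [];
    foreach ($chunks as $chunk) {
        $state = array_merge($state, $chunk);
        if ($onEachUpdate !== null) {
            // The callback receives the partially updated state.
            $onEachUpdate($state);
        }
    }
    // The caller still gets the complete result as the return value.
    return $state;
}

$updates = [];
$task = respondWithUpdates(
    chunks: [['name' => 'Jason'], ['age' => 35]],
    onEachUpdate: function (array $partial) use (&$updates): void {
        $updates[] = $partial;
    },
);
```

Compared with the iterable variant, the caller does not have to drive a loop to obtain the final model; the cost is that control flow moves into the callback.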

3 changes: 3 additions & 0 deletions notes/done/00_DONE.md
@@ -0,0 +1,3 @@
# DONE

> These are ideas that have been implemented or problems that have been solved.
31 changes: 31 additions & 0 deletions notes/done/custom-schema.md
@@ -0,0 +1,31 @@
## Custom schema generation - not based on class reflection & PHPDoc

### Problem and ideas

Model classes could implement a HasSchemaProvider interface, which would allow for custom schema generation: the rendering logic would skip reflection and use the provided schema instead.

SchemaProvider could be a trait, which would allow for easy implementation.

Example SchemaProvider:

    class SchemaProvider {
        public function schema(): Schema {
            return new Schema([
                'type' => 'object',
                'properties' => [
                    'id' => ['type' => 'integer', 'description' => 'Description'],
                    'name' => ['type' => 'string', 'description' => 'Description'],
                ],
                'required' => ['id', 'name'],
            ]);
        }
    }

### Solution

If a model implements the CanProvideSchema interface, it can fully customize schema generation.

This usually also requires implementing custom deserialization logic via the CanDeserializeJson interface, so you can control how the LLM response JSON is turned into data (and fed into model fields).

You may also need to implement CanTransformResponse to control what you ultimately send back to the caller (e.g. you can return completely different data than the input model).

This is used in the implementation of the Scalar class, which is a universal adapter for scalar values.
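
As a rough sketch of how a model might pair a hand-written schema with its own deserialization: the interfaces `ProvidesSchema` and `DeserializesJson`, their method names, and the `User` class below are all assumptions made for illustration. They mirror the roles described above but are not the library's actual signatures.

```php
<?php
declare(strict_types=1);

// Hypothetical local interfaces; they mirror the roles described above,
// not the library's real method signatures.
interface ProvidesSchema
{
    /** @return array<string, mixed> JSON Schema expressed as a PHP array */
    public function toJsonSchema(): array;
}

interface DeserializesJson
{
    public static function fromJson(string $json): static;
}

final class User implements ProvidesSchema, DeserializesJson
{
    public function __construct(public int $id = 0, public string $name = '') {}

    // The schema is written by hand, so no reflection or PHPDoc parsing is needed.
    public function toJsonSchema(): array
    {
        return [
            'type' => 'object',
            'properties' => [
                'id' => ['type' => 'integer', 'description' => 'User identifier'],
                'name' => ['type' => 'string', 'description' => 'Display name'],
            ],
            'required' => ['id', 'name'],
        ];
    }

    // The model decides how response JSON maps onto its fields.
    public static function fromJson(string $json): static
    {
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        return new static((int) $data['id'], (string) $data['name']);
    }
}

$schema = (new User())->toJsonSchema();
$user = User::fromJson('{"id": 7, "name": "Jason"}');
```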
13 changes: 13 additions & 0 deletions notes/done/custom-validation.md
@@ -0,0 +1,13 @@
# Validation


## Problem and ideas

What about validation in this case? We can already have a `validate()` method in the schema.
Is it enough?


## Solution

Validation can also be customized by implementing the CanSelfValidate interface. It allows you to fully control how the data is validated. At the moment it skips the built-in Symfony Validator logic, so you have to deal with Symfony validation constraints manually.
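
A minimal sketch of a self-validating model, under assumptions: the `ValidatesItself` interface, its `validate()` signature, and the `Person` class are hypothetical and only stand in for the role described above. The checks are written by hand, which is what skipping the Symfony Validator implies.

```php
<?php
declare(strict_types=1);

// Hypothetical interface standing in for the self-validation role described above.
interface ValidatesItself
{
    /** @return list<string> error messages; empty when the model is valid */
    public function validate(): array;
}

final class Person implements ValidatesItself
{
    public function __construct(public string $name, public int $age) {}

    // All rules live in the model; no external validator is consulted.
    public function validate(): array
    {
        $errors = [];
        if (trim($this->name) === '') {
            $errors[] = 'name must not be empty';
        }
        if ($this->age < 0 || $this->age > 150) {
            $errors[] = 'age must be between 0 and 150';
        }
        return $errors;
    }
}

$valid = (new Person('Jason', 35))->validate();
$invalid = (new Person('', -1))->validate();
```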

26 changes: 26 additions & 0 deletions notes/done/observability.md
@@ -0,0 +1,26 @@
# Observability


## Problem and ideas

> Priority: must have

Requirements and solution - to be analyzed

- How to track regular vs streamed responses? Streamed responses are unreadable / meaningless individually. A higher abstraction layer is needed to handle them, e.g. a "folder" with individual chunks of data. A completion ID makes it possible to track incoming chunks under a single context.
- Completion, if streamed, needs extra info on whether it has been completed or disrupted for any reason.


## Solution

You can:
- call wiretap() to get a stream of all internal events
- connect to specific events via onEvent()

This allows you to plug in your preferred logging / monitoring system.

- Performance - timestamps are available on events, which allows you to record the performance of either the full flow or individual steps.
- Errors - can be tracked via onError()
- Validation errors - can be tracked via onEvent()
- Generated data models - can be tracked via onEvent()
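
The difference between a wiretap and a targeted listener can be sketched with a small dispatcher. Everything here is an assumption for illustration: `Event`, `EventHub`, the string event names, and the method signatures are hypothetical and do not describe the library's internals.

```php
<?php
declare(strict_types=1);

// Hypothetical event and hub types for illustration; not the library's classes.
final class Event
{
    public function __construct(
        public string $name,
        public float $timestamp,
        public array $data = [],
    ) {}
}

final class EventHub
{
    /** @var list<callable> listeners that receive every event */
    private array $wiretaps = [];
    /** @var array<string, list<callable>> listeners keyed by event name */
    private array $listeners = [];

    public function wiretap(callable $listener): static
    {
        $this->wiretaps[] = $listener;
        return $this;
    }

    public function onEvent(string $name, callable $listener): static
    {
        $this->listeners[$name][] = $listener;
        return $this;
    }

    public function dispatch(string $name, array $data = []): void
    {
        // Each event carries a timestamp, so listeners can measure durations.
        $event = new Event($name, microtime(true), $data);
        foreach ($this->wiretaps as $listener) {
            $listener($event);
        }
        foreach ($this->listeners[$name] ?? [] as $listener) {
            $listener($event);
        }
    }
}

$log = [];
$validationErrors = [];
$hub = (new EventHub())
    ->wiretap(function (Event $e) use (&$log): void {
        $log[] = $e->name;
    })
    ->onEvent('validation.failed', function (Event $e) use (&$validationErrors): void {
        $validationErrors[] = $e->data['message'];
    });

$hub->dispatch('request.sent');
$hub->dispatch('validation.failed', ['message' => 'age must be positive']);
$hub->dispatch('response.generated');
```

The wiretap listener is the natural place to forward everything to a logger, while the targeted listener keeps validation failures separate for monitoring.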

83 changes: 83 additions & 0 deletions notes/done/partial-updates.md
@@ -0,0 +1,83 @@
## Partial updates

> Priority: should have

If a callback is set, we should be able to provide partial updates to the object and send notifications about the changes.

To achieve this I need a way to generate a skeleton JSON, send it back to the client and then send changes or new versions of the whole object back to the client.

Question: How to make partial updates and streaming / iterables compatible?

### Using events

The library currently dispatches events on every chunk received from the LLM in streaming mode and on every partial update of the response model.

Questions:
1. How does the client receive a partially updated data model? What's the API? Do we want a separate endpoint for the regular `respond()` method vs the partial / streamed one?
2. How do we distinguish between partial updates and collection streaming (getting a stream of instances of the same model)?
3. Can models in streamed collections be partially updated?
4. Is there a need for a separate event when a property is completed, not just updated?


### IDEA: Denormalization of model structure

It may make sense to denormalize the model - instead of nested structure, split it into a series of individual objects with references. Then generate them in a sequence individually (while providing object context). To be tested if this would result in better or worse inference quality, which is ultimately the most important thing.

Splitting into objects would also allow for partial updates.

Further - splitting objects to properties and generating them individually would make streaming partial updates easier.

To be tested: maybe it could work for less capable models with no function calling.

##### Model now

Conceptually, the model is a tree of objects, which is generated in a single pass.

```
Issues[] {
    Issue {
        title: string
        description: string
        type: IssueType {
            value: [technical, commercial, collaboration, other]
        }
        related_quotes: Quote[] {
            Quote {
                text: string
                source: string
                date: ?date
            }
        }
    }
}
```

##### Flattened model

The alternative is treating the model as a series of items - each item is a property of an object, following prescribed structure.

```
issues.issue[0].title
issues.issue[0].description
issues.issue[0].type
issues.issue[0].related_quotes
issues.issue[0].related_quotes.quote[0].text
issues.issue[0].related_quotes.quote[0].source
issues.issue[0].related_quotes.quote[0].date
issues.issue[0].related_quotes.quote[1].text
issues.issue[0].related_quotes.quote[1].source
issues.issue[0].related_quotes.quote[1].date
...
issues.issue[1].title
issues.issue[1].description
issues.issue[1].type
issues.issue[1].related_quotes
issues.issue[1].related_quotes.quote[2].text
issues.issue[1].related_quotes.quote[2].source
issues.issue[1].related_quotes.quote[2].date
issues.issue[1].related_quotes.quote[3].text
issues.issue[1].related_quotes.quote[3].source
issues.issue[1].related_quotes.quote[3].date
...
```
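
To make the flattening idea concrete, here is a small sketch that turns a nested array into path/value pairs. `flattenPaths()` is a hypothetical helper and the sample data is invented. Its path notation (`issues[0].title`) is a simplification of the `issues.issue[0].title` form shown above, and empty arrays produce no entries.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: flattens a nested array into "path => scalar" pairs.
 * List indexes become "[n]" and object keys are joined with ".".
 *
 * @param array<int|string, mixed> $node
 * @return array<string, mixed>
 */
function flattenPaths(array $node, string $prefix = ''): array
{
    $paths = [];
    foreach ($node as $key => $value) {
        if (is_int($key)) {
            $path = "{$prefix}[{$key}]";
        } else {
            $path = $prefix === '' ? $key : "{$prefix}.{$key}";
        }
        if (is_array($value)) {
            // Descend into nested objects and lists.
            $paths = array_merge($paths, flattenPaths($value, $path));
        } else {
            // Leaf property: one item in the flattened series.
            $paths[$path] = $value;
        }
    }
    return $paths;
}

$model = [
    'issues' => [
        [
            'title' => 'Slow API',
            'related_quotes' => [
                ['text' => 'It times out', 'source' => 'call'],
            ],
        ],
    ],
];

$flat = flattenPaths($model);
```

Each resulting entry is a single property that could be generated or updated on its own, which is what makes this shape attractive for streaming partial updates.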