Implementation of the initial ArgRank and DebateGPT prototypes, used in the experiments conducted during AI Safety Camp 8.
This repo has been made public almost a year from the initial commit, time during which its attentional hazardousness has decreased with the launch of ChatGPT. For up-to-date information on follow-up work, please refer to the homepage of the broader research agenda.
from debategpt.inference.core import Debate, distance
d = Debate()
# Advance the debate two full rounds (i.e. each party has one contribution).
d.play(2)
# Advance the debate two steps (i.e. two contributions in total). With two parties, this is equivalent to a full round.
d.step(2)
# Render human-readable debate transcript.
print(d.transcript())
# Introduces propositions in the debate which are not "owned" by any party. These can be seen as observations about the world.
d.establish("The Earth is round.")
# The following splits the debate into parallel branches. Forking can be repeated and interweaved with establishing facts, advancing the debate, etc.
d.fork(4)
# Make specific selections of parts of the debate(s).
sel1 = d.branch(1).party(0).round(0, 2)
sel2 = d.party([0, 1]).round(1)
sel3 = d.round(2).branch([0, 1]).party(0)
# First selector narrows in on two utterances.
assert len(sel1.flattened_props()) == 1 * 1 * 2
# When selector is not specified (e.g. branch here), all elements are considered.
assert len(sel2.flattened_props()) == 4 * 1 * 2
# Selector order doesn't really matter.
assert len(sel3.flattened_props()) == 2 * 1 * 1
# Selectors can then be plugged into the distance function, which averages distances between (ordered) pairs of propositions.
dist1 = distance(sel1, sel2)
dist2 = distance(sel2, sel3)
dist3 = distance(sel1, sel3)
# Extract the argument graph associated with a selector. This can then be used with tools from the `networkx` package.
G = sel1.graph
debategpt.training.orchestrator
manages high-level transcript generation, populates the experience store, and handles weight updates.debategpt.training.reward
implements ArgRank and helps evaluate generated transcripts.debategpt.training.trainer
gets the two above elements to work in concert.scripts/train.py
minimally wraps the trainer with default settings.