CogAGENT: A Multimodal, Knowledgeable and Controllable Toolkit for Building Conversational Agents
A demo system and more information are available at https://synehe.github.io/CogAGENT/
A short illustration video is at https://youtu.be/SE0SEeiAmXI
CogAGENT is a toolkit for building multimodal, knowledgeable and controllable conversational agents. We provide 17 models and integrate a variety of datasets covering the features above. Models and datasets are decoupled and flexibly modularized to make development and research more convenient for users.
This package has the following advantages:
- A multimodal, knowledgeable and controllable conversational framework. We propose a unified framework named CogAGENT, incorporating a Multimodal Module, a Knowledgeable Module and a Controllable Module to conduct multimodal interaction, generate knowledgeable responses and keep replies under control in real scenarios.
- Comprehensive conversational models, datasets and metrics. CogAGENT implements 17 conversational models covering task-oriented dialogue, open-domain dialogue and question-answering tasks. We also integrate some widely used conversational datasets and metrics to verify the performance of models.
- Open-source and modularized conversational toolkit. We release CogAGENT as an open-source toolkit and modularize its conversational agents to provide easy-to-use interfaces. Hence, users can modify the code to build their own customized models or datasets.
- Online dialogue system. We release an online system through which users can interact with our conversational agents. We also provide a video illustrating how to use it.
# clone CogAGENT
git clone git@github.com:CogNLP/CogAGENT.git
# install CogAGENT
cd CogAGENT
pip install -e .
pip install -r requirements.txt
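After installation, a quick import check confirms that the package and its PyTorch backend are available (a minimal sketch; whether a GPU is visible depends on your machine):

```python
# sanity check: CogAGENT and PyTorch import correctly
import torch
import cogagent

print(torch.__version__)           # installed PyTorch version
print(torch.cuda.is_available())   # True if a CUDA device is visible
```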
from cogagent import *
import torch
import torch.nn as nn
import torch.optim as optim
# placeholders: point these to your own output and raw-data directories
datapath = "<path_to_output_dir>"
raw_data_path = "<path_to_wizard_of_wikipedia_raw_data>"

# init the logger, device and experiment result saving dir
device, output_path = init_cogagent(
    device_id=8,
    output_path=datapath,
    folder_tag="run_diffks_on_wow",
)
# choose utterance reader
reader = WoWReader(raw_data_path=raw_data_path)
train_data, dev_data, test_data = reader.read_all()
vocab = reader.read_vocab()
# choose data processor
# In the training phase, no retriever is needed because the knowledge is provided by the dataset
processor = WoWForDiffksProcessor(max_token_len=512, vocab=vocab, debug=False)
train_dataset = processor.process_train(train_data)
dev_dataset = processor.process_dev(dev_data)
test_dataset = processor.process_test(test_data)
# choose response generator
model = DiffKSModel()
metric = BaseKGCMetric(default_metric_name="bleu-4", vocab=vocab)
loss = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)
# Use the provided Trainer class to start the model training process
trainer = Trainer(model, train_dataset, dev_data=dev_dataset, n_epochs=40, batch_size=2,
                  loss=loss, optimizer=optimizer, scheduler=None, metrics=metric,
                  drop_last=False, gradient_accumulation_steps=1, num_workers=5,
                  validate_steps=2000, save_by_metric="bleu-4", save_steps=None,
                  output_path=output_path, grad_norm=1,
                  use_tqdm=True, device=device,
                  fp16_opt_level='O1',
                  )
trainer.train()
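During training, the Trainer writes checkpoints to `output_path`, selecting the best one by `save_by_metric="bleu-4"`. If you prefer to persist and reload the weights yourself, the following is a minimal plain-PyTorch sketch (not part of the CogAGENT API); the file name is a placeholder, and we assume the `device` returned by `init_cogagent` is a standard `torch.device`:

```python
# save the trained weights with plain PyTorch, in addition to the
# checkpoints the Trainer writes to output_path
import torch

torch.save(model.state_dict(), "diffks_wow.pt")  # placeholder file name

# rebuild the model and reload the weights, e.g. for later inference
model = DiffKSModel()
model.load_state_dict(torch.load("diffks_wow.pt", map_location="cpu"))
model.to(device)
model.eval()
```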
Model | Category | Reference |
---|---|---|
SUMBT | Fundamental | SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking |
SC-LSTM | Fundamental | Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems |
BERTNLU | Fundamental | ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems |
MDRG | Fundamental | Towards End-to-End Multi-Domain Dialogue Modelling |
UBAR | Fundamental | UBAR: Towards Fully End-to-End Task-Oriented Dialog System with GPT-2 |
GPT2 for Chinese chitchat | Fundamental | Chinese chitchat |
TransResNet-Ret | Multimodal | Image-Chat: Engaging Grounded Conversations |
MMBERT | Multimodal | Selecting Stickers in Open-Domain Dialogue through Multitask Learning |
MAE | Multimodal | MMCoQA: Conversational Question Answering over Text, Tables, and Images |
PICa | Multimodal | An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA |
LingUNet | Multimodal | Where Are You? Localization from Embodied Dialog |
DiffKS | Knowledgeable | Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation |
KE-Blender | Knowledgeable | Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation |
NPH | Knowledgeable | Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding |
BERTQA | Knowledgeable | Dense Passage Retrieval for Open-Domain Question Answering |
KEMP | Controllable | OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs |
RobertaClassifier | Controllable | On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark |
Dataset | Category | Reference |
---|---|---|
MultiWOZ 2.0 | Fundamental | MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling |
MultiWOZ 2.1 | Fundamental | MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines |
Chinese chitchat Dataset | Fundamental | Chinese chitchat |
MOD | Multimodal | DSTC10-Track1 |
MMConvQA | Multimodal | MMCoQA: Conversational Question Answering over Text, Tables, and Images |
OK-VQA | Multimodal | OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge |
VQAv2 | Multimodal | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering |
WAY | Multimodal | Where Are You? Localization from Embodied Dialog |
Wizard of Wikipedia | Knowledgeable | Wizard of Wikipedia: Knowledge-Powered Conversational Agents |
Holl-E | Knowledgeable | Towards Exploiting Background Knowledge for Building Conversation Systems |
OpenDialKG | Knowledgeable | OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs |
DIASAFETY | Controllable | On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark |
EmpatheticDialogues | Controllable | Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset |