Implementation of a neural network (NN) model for Knowledge Based Question Answering (KBQA) built on pre-trained language models. Supports BERT/RoBERTa/ALBERT.
- Build Knowledge System (KB)
  - Organize the knowledge system based on business characteristics.
  - Extract `(Subject, Predicate, Object)` triples from unstructured input text and store them in a specific way (usually a graph database).
    - For example: "Chow SingChi's movie Kung Fu Hustle was released in 2004" contains two triples, `(Chow SingChi, filmed, Kung Fu Hustle)` and `(Kung Fu Hustle, release time, 2004)`.
- Standard Question Answering (QA)
  - Relational entity extraction
    - Extract `(Subject, Predicate)` from the query statement.
    - For example: "In which year is Kung Fu Hustle released" contains `(Kung Fu Hustle, release time)`.
  - Entity disambiguation
    - Resolve ambiguity caused by entities with the same name or by multiple names for one entity.
    - For example: Chow SingChi and Xing Ye should correspond to the same entity.
  - Relationship linking
    - Link the extracted entities and relationships, and make sure each linked pair is valid in the knowledge system.
    - For example, in the Douban Movie Review scenario, asking "What is the name of Chow SingChi's mother" yields `(Chow SingChi, mother)`, which is illegal because that relationship does not exist in the knowledge system.
  - Response generation
    - Retrieve the legal relational entity pairs from the knowledge system and generate the output.
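The steps above can be sketched with an in-memory dictionary standing in for the graph database. Everything in this block is illustrative: the `KB` layout, the `ALIASES` table, and the function names `disambiguate`, `is_valid`, and `answer` are assumptions made for the example, not part of this project's code.

```python
# Toy knowledge base: {subject: {predicate: object}}, a stand-in for a graph database.
KB = {
    "Chow SingChi": {"filmed": "Kung Fu Hustle"},
    "Kung Fu Hustle": {"release time": "2004"},
}

# Hypothetical alias table used for entity disambiguation.
ALIASES = {"Xing Ye": "Chow SingChi"}


def disambiguate(subject):
    """Map an alias to its canonical entity name (unchanged if no alias is known)."""
    return ALIASES.get(subject, subject)


def is_valid(subject, predicate):
    """Relationship linking: the pair must exist in the knowledge system."""
    return predicate in KB.get(subject, {})


def answer(subject, predicate):
    """Response generation: return the object, or None for an illegal pair."""
    subject = disambiguate(subject)
    if not is_valid(subject, predicate):
        return None
    return KB[subject][predicate]


print(answer("Kung Fu Hustle", "release time"))  # 2004
print(answer("Xing Ye", "filmed"))               # Kung Fu Hustle
print(answer("Chow SingChi", "mother"))          # None
```

The last call returns `None` because `(Chow SingChi, mother)` is not stored in the toy knowledge base, which mirrors the illegal pair described above.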
This project mainly focuses on the relational entity extraction part of the standard question answering (QA) task. KBQA queries typically fall into the following categories:
- One-hop derivation
  - For example: "In which year is Kung Fu Hustle released" contains `(Kung Fu Hustle, release time)`.
- Comparison of derivation results
  - For example: "Is Kung Fu Hustle released earlier than All for the Winner" contains `((Kung Fu Hustle, release time) ~ (All for the Winner, release time))`; both results must be retrieved and then compared.
- Nested derivation
  - For example: "What is the age of Chow SingChi's mother" contains `((Chow SingChi, mother), age)`, which requires a nested query.
- Comparison of nested derivation results
  - For example: "Is Chow SingChi's mother older than Ng Mang Tat" contains `(((Chow SingChi, mother), age) ~ (Ng Mang Tat, age))`.
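One way to picture the four query categories is as nested Python tuples evaluated against a toy knowledge base. This is only a sketch: the `KB` dictionary and the `resolve` and `is_less` helpers are hypothetical, and every stored value except the 2004 release year is a placeholder chosen for illustration (the mother entity and both ages are made up).

```python
# Toy knowledge base. Only the 2004 release year comes from the text above;
# every other value is a placeholder chosen for illustration.
KB = {
    "Kung Fu Hustle": {"release time": 2004},
    "All for the Winner": {"release time": 1990},
    "Chow SingChi": {"mother": "Chow SingChi's mother"},  # placeholder entity
    "Chow SingChi's mother": {"age": 80},                 # made-up value
    "Ng Mang Tat": {"age": 70},                           # made-up value
}


def resolve(query):
    """Resolve a (subject, predicate) pair; the subject may be a nested pair."""
    subject, predicate = query
    if isinstance(subject, tuple):
        subject = resolve(subject)  # nested derivation: resolve the inner hop first
    return KB[subject][predicate]


def is_less(left, right):
    """Comparison of derivation results: resolve both sides, then compare."""
    return resolve(left) < resolve(right)


one_hop = ("Kung Fu Hustle", "release time")
nested = (("Chow SingChi", "mother"), "age")

print(resolve(one_hop))                                          # 2004
print(is_less(one_hop, ("All for the Winner", "release time")))  # False
print(resolve(nested))                                           # 80
print(is_less(("Ng Mang Tat", "age"), nested))                   # True
```

The sketch assumes the entities and relationships have already been extracted and linked correctly, which is exactly the hard part for the last three categories.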
The project handles only the first case, which is the most common. The relationship is obtained through classification over the global semantics, while the entities are obtained through sequence labeling.
For the latter three cases, the entities would first be obtained through sequence labeling, and multiple relationships would then be obtained from the global semantics together with the local semantics of the entities (prior information). The current difficulties are how to accurately link multiple relationships with multiple entities and how to handle the amplified loss under multi-tasking. These cases are hard to handle with neural network models alone.
Methods of one-hop derivation
- Pipeline method: relationship classification and entity extraction are two separate tasks whose losses are calculated independently, without affecting each other.
  - Easy to train, since both are regular NLP tasks: classification and sequence labeling.
  - Slow at inference: the input must be fed into both models.
  - Prone to illegal results caused by model prediction errors, so a relationship linking step is needed.
    - For example: `(Chow SingChi, release time)`
- Joint method: relationship classification and entity extraction interact with each other, sharing the same embedding layer for semantic encoding, and the loss is calculated in a multi-task way.
  - Hard to train: the losses and the gradient descent speeds of the two tasks are not of the same magnitude.
  - Fast at inference: the input is fed into a single model.
  - Illegal results rarely occur because both tasks share the same semantic encoding, which avoids the relationship linking task.
The joint method is used in this project.
- Train the more difficult sequence labeling task first, and keep the downstream weights of the classification task frozen until the validation accuracy of the sequence labeling task reaches a preset threshold.
- The MultiLoss from "Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics" is modified to calculate the combined loss of the classification task and the sequence labeling task. The optimizer is inherited from the previous stage, and both tasks are then trained at the same time.
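To make the uncertainty-based weighting concrete, here is a minimal sketch of the classification-style form from that paper, where each task loss `L_i` is scaled by `exp(-s_i)` and regularized by `0.5 * s_i`, with `s_i = log(sigma_i^2)`. It is not this project's modified MultiLoss, which is not reproduced here. The function name and the loss values are hypothetical, and in real training the `s_i` would be trainable variables rather than fixed numbers.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine task losses as sum_i exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i^2)."""
    total = 0.0
    for loss, s in zip(losses, log_vars):
        total += math.exp(-s) * loss + 0.5 * s
    return total


# Hypothetical loss values: a small classification loss and a much larger CRF loss.
clf_loss, ner_loss = 0.3, 25.0

equal = uncertainty_weighted_loss([clf_loss, ner_loss], [0.0, 0.0])
rebalanced = uncertainty_weighted_loss([clf_loss, ner_loss], [0.0, 3.0])
print(equal)       # approximately 25.3, dominated by the CRF loss
print(rebalanced)  # smaller: the large loss is down-weighted by exp(-3)
```

Raising `s_i` for the task with the larger loss shrinks its contribution, which is one way to address losses that are not of the same magnitude.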
keras_bert_kbqa
├── helper.py # helper for training parameters
├── __init__.py
├── predict.py # load the trained model
├── train.py # train and save model
└── utils
    ├── bert.py # keras implementation of bert model
    ├── callbacks.py # EarlyStopping, ReduceLROnPlateau, TaskSwitch
    ├── decoder.py # Viterbi decoder of sequence labeling task
    ├── graph_builder.py # neo4j graph database processing function, not used
    ├── __init__.py
    ├── metrics.py # Crf_accuracy and Crf_loss of sequence labeling task, support mask
    ├── models.py # support textcnn and dense for classification task, idcnn-crf and bilstm-crf for sequence labeling task
    ├── processor.py # standardize training/validation dataset
    └── tokenizer.py # tokenizer of bert model
- flask == 1.1.1
- keras == 2.3.1
- numpy == 1.18.1
- loguru == 0.4.1
- requests == 2.22.0
- termcolor == 1.1.0
- tensorflow == 1.15.2
- keras_contrib == 2.0.8
- This project mainly focuses on the relational entity extraction part of the standard question answering (QA) task. The rest is implemented in a relatively crude way (without a graph database).
- This example mimics an online case in engineering practice, which suffers from a small amount of query data and from a gap between the data generated by the `template + filling` method and actual data.
- Test results show that with a small amount of data:
  - The generalization error is large, and neural network models alone do not work well. They should be combined with regular expressions and rules.
  - The models are difficult to train.
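As one possible illustration of combining the model with simple rules, the sketch below snaps a predicted entity to the closest known title using `difflib.get_close_matches` from the standard library. This is an assumption about how such a correction step could work, not a description of how `prior_check.txt` is actually used. The `KNOWN_TITLES` list, the `correct_entity` function, and the `cutoff` value are hypothetical.

```python
import difflib

# Hypothetical list of known entity names, for example titles collected from the raw data.
KNOWN_TITLES = ["骗中骗", "安东尼娅家族", "大话西游之大圣娶亲"]


def correct_entity(predicted, known=KNOWN_TITLES, cutoff=0.6):
    """Return the closest known title, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(predicted, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# The model output is missing the last character of the title.
print(correct_entity("大话西游之大圣娶"))  # 大话西游之大圣娶亲
# Nothing in the list resembles this string.
print(correct_entity("不存在的电影"))      # None
```

A rule like this only helps when the entity inventory is known in advance, which is the case for a closed domain such as the movie example.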
examples
├── data
│   ├── build_upload.py # generate training/validation data from raw data
│   ├── data
│   │   ├── database.txt # database generated from raw data for query result retrieval (not using graph database)
│   │   ├── dev_data.txt # validation data
│   │   ├── prior_check.txt # double check, correcting errors in entities obtained by the NN model
│   │   └── train_data.txt # training data
│   ├── origin_data
│   │   └── douban_movies.txt # raw data
│   └── templates
│       ├── neo4j_config.txt # configs of graph database, not used
│       ├── text_templates.txt # templates for generating training/validation data
│       └── utter_search.txt # query result retrieval instructions (crude implementation, not using graph database)
├── deploy # deploy a trained model for use
│   ├── run_deploy.py
│   └── run_deploy.sh
├── models # model save path
│   ├── ALBERT-IDCNN-CRF.h5
│   ├── id_to_label.pkl
│   ├── id_to_tag.pkl
│   ├── label_to_id.pkl
│   ├── model_configs.json
│   └── tag_to_id.pkl
└── train # train a new model
    ├── run_train.py
    ├── run_train.sh
    └── train_config.json # train configs
Each training/validation sample has the form `[text information, category information, sequence labeling information]`, as shown below:
[
    [
        "骗中骗的评分高吗",
        "豆瓣评分",
        "B-title I-title I-title O O O O O"
    ],
    [
        "安东尼娅家族啥时候上映的呀",
        "电影上映时间是什么",
        "B-title I-title I-title I-title I-title I-title O O O O O O O"
    ],
    ...
]
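The sketch below shows how a sample in this format can be read back into entities by walking the BIO tags. It assumes one tag per character, which matches the two samples above. The function name `extract_entities` is hypothetical and is not taken from this project's code.

```python
def extract_entities(text, tags):
    """Collect (entity_type, surface_string) spans from character-level BIO tags."""
    entities = []
    current_type, current_chars = None, []
    for char, tag in zip(text, tags.split()):
        if tag.startswith("B-"):
            # A new entity begins; close the previous one if it is still open.
            if current_type is not None:
                entities.append((current_type, "".join(current_chars)))
            current_type, current_chars = tag[2:], [char]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_chars.append(char)
        else:
            # "O" or an inconsistent "I-" tag ends the current entity.
            if current_type is not None:
                entities.append((current_type, "".join(current_chars)))
            current_type, current_chars = None, []
    if current_type is not None:
        entities.append((current_type, "".join(current_chars)))
    return entities


sample = ["骗中骗的评分高吗", "豆瓣评分", "B-title I-title I-title O O O O O"]
text, label, tags = sample
print(label, extract_entities(text, tags))  # 豆瓣评分 [('title', '骗中骗')]
```

The category string plays the role of the predicate and the extracted span plays the role of the subject, giving the `(Subject, Predicate)` pair for a one-hop query.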
The training parameters are located in `examples/train/train_config.json`:
- The sentence length parameter `max_len` should be adapted to the length of the training/validation text. An excessively large value occupies a large amount of GPU memory and strongly affects the convergence of the sequence labeling task.
- With a small amount of data, the ALBERT model is easier to train than the BERT model, with no significant difference in performance.
- `all_train_threshold` is the validation accuracy of the sequence labeling task at which both the classification task and the sequence labeling task start to be trained together:
  - If it is too small, the sequence labeling task cannot converge, and the classification task is prone to over-fitting.
  - If it is too large, the classification task is prone to under-fitting.
  - The recommended value is between 0.9 and 0.98.
- `clf_type` can be `textcnn` or `dense`:
  - When it is `textcnn`, the remaining parameters are `dense_units`, `dropout_rate`, `filters` and `kernel_size`.
  - When it is `dense`, the remaining parameter is `dense_units`.
- `ner_type` can be `idcnn` or `bilstm`:
  - When it is `idcnn`, the remaining parameters are `filters`, `kernel_size` and `blocks`.
  - When it is `bilstm`, the remaining parameters are `units`, `num_hidden_layers` and `dropout_rate`.
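The role of `all_train_threshold` can be sketched in plain Python as a one-way switch between the two training stages. This is a simplified illustration, not the project's `TaskSwitch` callback, whose actual behavior may differ. The class name, the method name, and the accuracy values are hypothetical.

```python
class TaskSwitchSketch:
    """Switch from NER-only training to joint training once a threshold is reached."""

    def __init__(self, all_train_threshold=0.95):
        self.all_train_threshold = all_train_threshold
        self.train_both = False  # stage 1: classification head stays frozen

    def on_epoch_end(self, ner_val_accuracy):
        # The switch is one-way: once both tasks are trained, they stay trained.
        if not self.train_both and ner_val_accuracy >= self.all_train_threshold:
            self.train_both = True
        return self.train_both


switch = TaskSwitchSketch(all_train_threshold=0.95)
history = [switch.on_epoch_end(acc) for acc in [0.62, 0.88, 0.96, 0.94]]
print(history)  # [False, False, True, True]
```

A threshold that is too low flips the switch before the sequence labeling task has converged, while one that is too high leaves few epochs for the classification task, which matches the trade-off described above.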
python examples/data/build_upload.py # generate all files in examples/data/data
bash examples/train/run_train.sh # train a new model
bash examples/deploy/run_deploy.sh # deploy a trained model for use
Send a request to the API:
import requests

# Replace your_ip and your_port with the address of the deployed service.
r = requests.post(
    "http://your_ip:your_port/query",
    json={"text": "大话西游之大圣娶亲是最近刚上的电影吗"},
)
print(r.text)
Returns:
{
    "text": "大话西游之大圣娶亲是最近刚上的",
    "predicate": "电影上映时间是什么",
    "subject": [
        {
            "title": "大话西游之大圣娶亲"
        }
    ],
    "response": "2014"
}
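A client could unpack a response of this shape as follows. The sketch only relies on the fields shown in the sample above. The `summarize` helper is hypothetical, and the response string is embedded directly so the example runs without a deployed service.

```python
import json

# The sample response from above, embedded as a string for a self-contained example.
raw = """
{
    "text": "大话西游之大圣娶亲是最近刚上的",
    "predicate": "电影上映时间是什么",
    "subject": [{"title": "大话西游之大圣娶亲"}],
    "response": "2014"
}
"""


def summarize(result):
    """Return (titles, predicate, response) from a parsed response dictionary."""
    titles = [item["title"] for item in result.get("subject", []) if "title" in item]
    return titles, result.get("predicate"), result.get("response")


titles, predicate, response = summarize(json.loads(raw))
print(titles, predicate, response)
```

With a live service, `json.loads(raw)` would be replaced by `r.json()` on the `requests` response object.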
- Optimize model structure to make it easier to train.
- Try to handle more complex KBQA scenarios.
- Improve some details.
- Migrate to TensorFlow 2.0.
- Add other BERT variants, such as DistilBERT and TinyBERT.
BERT
ALBERT
- Google_albert_base
- Google_albert_large
- Google_albert_xlarge
- Google_albert_xxlarge
- Xuliang_albert_xlarge
- Xuliang_albert_large
- Xuliang_albert_base
- Xuliang_albert_base_ext
- Xuliang_albert_small
- Xuliang_albert_tiny
RoBERTa
Thanks for all these wonderful works!