Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models (ACM CIKM 2024 Research Track Paper)
The /data/ folder in GenFEND_release_ch and GenFEND_release_en is where the data used for training and testing is stored.
For GenFEND_release_ch, we experiment on Weibo21.
The folder /data/Weibo21/ contains the data with real comments, and the folder /data/Weibo21/Dmeta-embedding-comments-feature/ contains the extracted features of the real comments.
The folder /data/role_virtual_comments/ contains the Weibo21 data with generated comments and the corresponding extracted comment features.
For GenFEND_release_en, we experiment on LLM-mis and GossipCop.
The folder /data/LLM-mis/ contains the LLM-mis data with generated comments, and the folder /data/LLM-mis/bge-large-en-v1.5/ contains the extracted features of the generated comments.
The folder /data/GossipCop/ contains the GossipCop data with real comments, the folder /data/GossipCop/bge-large-en-v1.5/ contains the extracted features of the real comments, and the folder /data/role_virtual_comments/ contains the GossipCop data with generated comments and the corresponding extracted comment features.
Note that train.json, val.json, and test.json contain only a few example instances.
You should prepare the whole dataset in the same format as the example instances, then follow STEP I in the How To Run section to generate the complete dataset.
Download the model files from Dmeta-embedding and put them in the folder GenFEND_release_ch/pretrained_model/Dmeta-embedding/.
Download the model files from bge-large-en-v1.5 and put them in the folder GenFEND_release_en/pretrained_model/bge-large-en-v1.5/.
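Before running anything, it can help to confirm that the folders described above are in place. The following is a minimal sketch and not part of the release: the helper name missing_paths and the idea of a pre-flight check are assumptions made for illustration. Only the folder names are taken from this README.

```python
from pathlib import Path

# Folder names copied from this README; the check itself is a hypothetical helper.
EXPECTED = {
    "GenFEND_release_ch": [
        "data/Weibo21",
        "data/Weibo21/Dmeta-embedding-comments-feature",
        "data/role_virtual_comments",
        "pretrained_model/Dmeta-embedding",
    ],
    "GenFEND_release_en": [
        "data/LLM-mis",
        "data/LLM-mis/bge-large-en-v1.5",
        "data/GossipCop",
        "data/GossipCop/bge-large-en-v1.5",
        "data/role_virtual_comments",
        "pretrained_model/bge-large-en-v1.5",
    ],
}


def missing_paths(root="."):
    """Return the expected folders that do not exist under `root`."""
    root = Path(root)
    missing = []
    for release, folders in EXPECTED.items():
        for folder in folders:
            path = root / release / folder
            if not path.is_dir():
                missing.append(path.as_posix())
    return missing


if __name__ == "__main__":
    # Print one line per folder that still needs to be created or downloaded.
    for path in missing_paths():
        print("missing:", path)
```

Run it from the directory that contains both release folders. An empty result means every folder named in this README exists; it does not verify the contents of those folders.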
STEP I: Comment Encoding
Go to the folder GenFEND_release_ch/data/ or GenFEND_release_en/data/ and run the following commands in order:
python cmts_fea_ext.py
python add_file_index.py
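The two STEP I commands can also be run from a small driver script. This is a sketch under assumptions: the function names step_one_commands and run_step_one are hypothetical, and the only facts taken from this README are the two script names, their order, and the data folders they are run from.

```python
import subprocess
import sys
from pathlib import Path

# Script names and their order come from STEP I of this README.
STEP_ONE_SCRIPTS = ["cmts_fea_ext.py", "add_file_index.py"]


def step_one_commands(python=sys.executable):
    """Build one command per STEP I script, in the documented order."""
    return [[python, script] for script in STEP_ONE_SCRIPTS]


def run_step_one(data_dir):
    """Run both scripts inside `data_dir`, stopping at the first failure."""
    data_dir = Path(data_dir)
    for cmd in step_one_commands():
        # cwd mirrors "go to the folder"; check=True raises if a script exits non-zero.
        subprocess.run(cmd, cwd=data_dir, check=True)
```

For example, run_step_one("GenFEND_release_ch/data") would process the Weibo21 side, and run_step_one("GenFEND_release_en/data") the English datasets.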
STEP II: Training and Testing
To experiment on the Weibo21 dataset, go to the folder GenFEND_release_ch and run one of the following commands:
python main.py --model_name bert_genfend
or
python main.py --model_name defend_genfend
To experiment on the GossipCop dataset, go to the folder GenFEND_release_en and run one of the following commands:
python main.py --root_path './data/GossipCop/' --model_name bert_genfend
or
python main.py --root_path './data/GossipCop/' --model_name defend_genfend
To experiment on the LLM-mis dataset, go to the folder GenFEND_release_en and run the following command:
python main.py --root_path './data/LLM-mis/' --model_name bert_genfend
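The dataset and model combinations above can be summarized in a small lookup table. This is an illustrative sketch, not part of the release: RUNS and build_command are hypothetical names, and the table lists only the combinations shown in this README (other combinations may or may not work).

```python
import shlex

# dataset -> (release folder, --root_path value or None, model names listed in this README)
RUNS = {
    "Weibo21": ("GenFEND_release_ch", None, ["bert_genfend", "defend_genfend"]),
    "GossipCop": ("GenFEND_release_en", "./data/GossipCop/", ["bert_genfend", "defend_genfend"]),
    "LLM-mis": ("GenFEND_release_en", "./data/LLM-mis/", ["bert_genfend"]),
}


def build_command(dataset, model_name):
    """Return (folder to run in, argument list) for one documented run."""
    folder, root_path, models = RUNS[dataset]
    if model_name not in models:
        raise ValueError(f"{model_name} is not listed for {dataset} in the README")
    cmd = ["python", "main.py"]
    if root_path is not None:
        # The Weibo21 commands in this README pass no --root_path.
        cmd += ["--root_path", root_path]
    cmd += ["--model_name", model_name]
    return folder, cmd


if __name__ == "__main__":
    # Print every documented run as a shell one-liner.
    for dataset, (_, _, models) in RUNS.items():
        for model in models:
            folder, cmd = build_command(dataset, model)
            print(f"(cd {folder} && {shlex.join(cmd)})")
```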
MultiSubpp.py serves as a plug-in module that can be integrated with both content-only and comment-based models.
Refer to how MultiSubppModel is used in BERTMtiSppModel.py and dEFENDMtiSppModel.py to see how to integrate it with other models.
@inproceedings{nan2024let,
title={{Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models}},
author={Nan, Qiong and Sheng, Qiang and Cao, Juan and Hu, Beizhe and Wang, Danding and Li, Jintao},
booktitle={Proceedings of the 33rd ACM International Conference on Information and Knowledge Management},
pages={1732--1742},
doi={10.1145/3627673.3679519},
year={2024}
}
- Paper List: LLM-for-misinformation-research: https://github.com/ICTMCG/LLM-for-misinformation-research/
- Tutorial @SIGIR 2024: Preventing and Detecting Misinformation Generated by Large Language Models: https://sigir24-llm-misinformation.github.io/