Application Examples

Nicolay Rusnachenko edited this page Nov 1, 2023 · 9 revisions

Sampling for BERT fine-tuning in sentiment text classification

To sample the RuAttitudes collection for training models in English:

python3 -m arekit_ss.sample --writer jsonl --source ruattitudes --sampler bert --dest_lang en

The command for RuSentRel is nearly identical:

python3 -m arekit_ss.sample --writer jsonl --source rusentrel --sampler bert --dest_lang en

IMPORTANT: You may want to manually BALANCE the resulting sampled data.
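One possible way to balance the sampled data is to downsample every class to the size of the smallest one. The sketch below is only an illustration: the `label` field name and the helper names (`balance_by_label`, `read_jsonl`, `write_jsonl`) are assumptions and not part of arekit_ss, so check the keys in your own JSONL output before using it.

```python
import json
import random
from collections import defaultdict


def balance_by_label(records, label_key="label", seed=42):
    """Downsample every class to the size of the smallest class.

    `label_key` is an assumed field name; adjust it to the actual output.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[label_key]].append(record)
    if not groups:
        return []

    # Every class keeps as many samples as the rarest class has.
    target = min(len(items) for items in groups.values())
    rng = random.Random(seed)

    balanced = []
    for items in groups.values():
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced


def read_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A typical use would be `write_jsonl(balance_by_label(read_jsonl(src)), dst)`, where `src` and `dst` are your own file paths. Downsampling discards data; oversampling the rare classes is an alternative when the collection is small.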

Fact-checking Large Language Models

In this scenario, arekit_ss is treated as a framework for polishing datasets.

The application consists of the following steps:

  1. Sampling with prompting.
  2. Applying an LLM to the sampled prompts.
  3. Gathering the results and analysing them manually.

We use the following commands for sampling:

python3 -m arekit_ss.sample --writer csv --source nerel --sampler prompt --src_lang ru --dest_lang en \
--prompt "For text: '{text}', is the relation of type {label_val} from '{s_val}' towards '{t_val}'? Answer yes or no, and explain why if no." \
--splits train:test
python3 -m arekit_ss.sample --writer csv --source nerel-bio --sampler prompt --dest_lang en \
--prompt "For the text part of the PubMed abstract: '{text}', is the relation of type {label_val} from '{s_val}' towards '{t_val}'? Answer yes or no, and explain why if no." \
--splits train:test
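Steps 2 and 3 of the list above (applying an LLM and gathering the results for manual analysis) could be sketched as follows. This is a minimal illustration under stated assumptions: the `prompt` column name is a guess about the sampled CSV, and `ask_llm` is a hypothetical callable that you supply for your own model; neither is confirmed by arekit_ss.

```python
import csv


def parse_verdict(answer):
    """Map a free-text LLM answer to 'yes', 'no', or 'unknown'."""
    words = answer.strip().lower().split()
    first = words[0].strip(".,:;!\"'") if words else ""
    return first if first in ("yes", "no") else "unknown"


def fact_check(rows, ask_llm, prompt_key="prompt"):
    """Send every prompt to the model and attach its answer and verdict.

    `ask_llm` is a hypothetical callable (str -> str);
    `prompt_key` is an assumed column name.
    """
    results = []
    for row in rows:
        answer = ask_llm(row[prompt_key])
        results.append({**row, "answer": answer, "verdict": parse_verdict(answer)})
    return results


def rows_for_review(results):
    """Keep the rows the model did not confirm; these go to manual analysis."""
    return [r for r in results if r["verdict"] != "yes"]


def read_csv_rows(path):
    """Load the sampled CSV as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Because the prompt asks the model to explain every "no", the rows returned by `rows_for_review` carry the explanation in the `answer` field, which is the natural starting point for manual analysis.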