
llamantino3_anita

"Built with Meta Llama 3".


LLaMAntino-3-ANITA-8B-Inst-DPO-ITA is a model of the LLaMAntino Large Language Models family. The model is an instruction-tuned version of Meta-Llama-3-8B-Instruct (a fine-tuned LLaMA 3 model). This version aims to be a Multilingual Model 🏁 -- EN 🇺🇸 + ITA 🇮🇹 -- that can be further fine-tuned for specific Italian tasks.

The 🌟ANITA project🌟 (Advanced Natural-based interaction for the ITAlian language) aims to provide Italian NLP researchers with an improved model for Italian-language 🇮🇹 use cases.


Model Details

Last Update: 10/05/2024


https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA

| Model | HF | GGUF | EXL2 |
| --- | --- | --- | --- |
| swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA | Link | Link | Link |
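
For local inference with the GGUF build, a minimal sketch with llama-cpp-python might look as follows (the file name is hypothetical; substitute whichever quantization you download from the GGUF link above):

    from llama_cpp import Llama

    # Hypothetical local path to a downloaded GGUF quantization.
    llm = Llama(model_path="./llamantino-3-anita-8b.Q4_K_M.gguf", n_ctx=8192)
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Sei un assistente AI per la lingua Italiana."},
            {"role": "user", "content": "Chi è Carlo Magno?"},
        ],
        max_tokens=512,
        temperature=0.6,
        top_p=0.9,
    )
    print(out["choices"][0]["message"]["content"])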

Repo Structure

  • inference/inference_anita.ipynb Python Notebook for testing the model inference.

  • model_adaptation/finetune_llama3.py Python script to fine-tune the model on a specific task using the Unsloth library.

  • model_adaptation/dpo_llama3.py Python script to optimize the model over user preferences.

  • model_adaptation/job_example.slurm SLURM script to run the model_adaptation process on an HPC architecture (multiple GPUs); a hypothetical sketch follows this list.

  • evaluation/job_evaluation.slurm SLURM script to evaluate the model using EleutherAI/lm-evaluation-harness framework.

  • use_examples/LLama_3_for_Sentiment_Analysis.ipynb Python Notebook for Sentiment Analysis task.

  • use_examples/Llamaindex_LangChain.ipynb Python Notebook for RAG task.

  • use_examples/Topic_Modeling_with_Llama3.ipynb Python Notebook for Topic Modeling.

  • use_examples/SeqRecSys_LLM_Zero_Shot.ipynb Python Notebook for Sequential Recommender Systems.

  • use_examples/User Interface.ipynb Python Notebook for inference through Visual Interface.

  • requirements.txt List of project dependencies.
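
For reference, a minimal SLURM job of the kind model_adaptation/job_example.slurm covers might look like this (a hypothetical sketch; GPU counts, modules, and time limits are placeholders, not values from the repository):

    #!/bin/bash
    #SBATCH --job-name=anita_finetune
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:4          # request 4 GPUs on one node
    #SBATCH --time=24:00:00

    module load cuda              # site-specific; adjust to your HPC environment
    python model_adaptation/finetune_llama3.py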


Specifications

  • Model developers:
    Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy
    SWAP Research Group
  • Variations: The model was supervised fine-tuned (SFT) with QLoRA 4-bit on instruction-based datasets, then aligned with human preferences for helpfulness and safety via DPO on the mlabonne/orpo-dpo-mix-40k dataset (a minimal sketch of this step follows the list below).
  • Input: The model accepts text input only.
  • Language: Multilingual🏁 + Italian 🇮🇹
  • Output: The model generates text and code only.
  • Model Architecture: Llama 3 architecture.
  • Context length: 8K (8,192 tokens).
  • Library Used: Unsloth
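
As an illustration of the alignment step, a minimal, hypothetical sketch with trl's DPOTrainer follows; the repository's actual implementation is model_adaptation/dpo_llama3.py, and the hyperparameters below are placeholders:

    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    base = "meta-llama/Meta-Llama-3-8B-Instruct"  # in practice, the SFT checkpoint
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    # Preference data: pairs of chosen/rejected responses.
    dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

    args = DPOConfig(
        output_dir="anita-dpo",
        beta=0.1,  # strength of the implicit KL penalty toward the reference model
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    )
    trainer = DPOTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        tokenizer=tokenizer,  # processing_class= in newer trl releases
    )
    trainer.train()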

Playground

There are several ways to use the model directly; choose one of the following to get started.

Prompt Template

<|start_header_id|>system<|end_header_id|>

{ SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|>

{ USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ ASSIST Prompt }<|eot_id|>
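
As a quick check, the tokenizer's chat template reproduces this format (a minimal sketch; it assumes only the transformers library shown in the next section):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA")
    messages = [
        {"role": "system", "content": "{ SYS Prompt }"},
        {"role": "user", "content": "{ USER Prompt }"},
    ]
    # add_generation_prompt=True appends the assistant header, so the model's
    # completion fills the { ASSIST Prompt } slot.
    print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))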

Transformers

For direct use with transformers, you can easily get started with the following steps.

  • First, install transformers with pip:

    pip install -U transformers
  • Then, you can start using the model directly.

    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
    )
    
    base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    
    sys = "Sei un an assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \
        "(Advanced Natural-based interaction for the ITAlian language)." \
        " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."
    
    messages = [
        {"role": "system", "content": sys},
        {"role": "user", "content": "Chi è Carlo Magno?"}
    ]
    
    #Method 1
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    for k,v in inputs.items():
        inputs[k] = v.cuda()
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
    results = tokenizer.batch_decode(outputs)[0]
    print(results)
    
    #Method 2
    import transformers
    pipe = transformers.pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,  # return only the newly generated text, not the prompt
        task='text-generation',
        max_new_tokens=512,  # max number of tokens to generate in the output
        temperature=0.6,  # sampling temperature: lower values give more focused answers
        do_sample=True,
        top_p=0.9,
    )
    
    sequences = pipe(messages)
    for seq in sequences:
        print(f"{seq['generated_text']}")
  • Additionally, you can use the model with 4-bit quantization to reduce the required resources. You can start with the code below.

    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )
    
    base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # load weights in 4-bit precision
        bnb_4bit_quant_type="nf4",              # NormalFloat4 (NF4) quantization
        bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bfloat16
        bnb_4bit_use_double_quant=False,        # no nested (double) quantization
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=bnb_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    
    sys = "Sei un an assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \
          "(Advanced Natural-based interaction for the ITAlian language)." \
          " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."
      
    messages = [
        {"role": "system", "content": sys},
        {"role": "user", "content": "Chi è Carlo Magno?"}
    ]
    
    #Method 1
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    for k,v in inputs.items():
        inputs[k] = v.cuda()
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
    results = tokenizer.batch_decode(outputs)[0]
    print(results)
    
    #Method 2
    import transformers
    pipe = transformers.pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,  # return only the newly generated text, not the prompt
        task='text-generation',
        max_new_tokens=512,  # max number of tokens to generate in the output
        temperature=0.6,  # sampling temperature: lower values give more focused answers
        do_sample=True,
        top_p=0.9,
    )
    
    sequences = pipe(messages)
    for seq in sequences:
        print(f"{seq['generated_text']}")

Evaluation

Open LLM Leaderboard:

Evaluated with lm-evaluation-harness for the Open Italian LLMs Leaderboard.
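
The harness can be installed from PyPI (assuming the lm-eval package name; depending on the version, the Italian leaderboard tasks may require installing the harness from the leaderboard's own fork):

    pip install lm-eval

Then run: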

   lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID  --tasks hellaswag_it,arc_it  --device cuda:0 --batch_size auto:2
   lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID  --tasks m_mmlu_it --num_fewshot 5  --device cuda:0 --batch_size auto:2 
| Metric | Value |
| --- | --- |
| Avg. | 0.6160 |
| Arc_IT | 0.5714 |
| Hellaswag_IT | 0.7093 |
| MMLU_IT | 0.5672 |

Unsloth

Unsloth is a great tool that helped us develop the model easily and at a lower cost than expected.
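
As an illustration, loading the model with Unsloth for QLoRA fine-tuning might look like this (a hypothetical sketch assuming Unsloth's FastLanguageModel API; the repository's actual script is model_adaptation/finetune_llama3.py, and the LoRA hyperparameters below are placeholders):

    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA",
        max_seq_length=8192,   # matches the model's 8K context
        load_in_4bit=True,     # QLoRA-style 4-bit loading
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,                  # LoRA rank (placeholder value)
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )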

Citation instructions

@misc{polignano2024advanced,
      title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA}, 
      author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro},
      year={2024},
      eprint={2405.07101},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{basile2023llamantino,
      title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language}, 
      author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
      year={2023},
      eprint={2312.09993},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}