
<p>I'm planning to fine-tune Llama 3.2 11B Vision Instruct on a JSONL dataset of domain-specific question-answer pairs (purely text, no images). The goal is to improve its instruction-following behavior for specialized text tasks while retaining its ability to handle multimodal inputs such as OCR and image-based queries.</p>
<p>I am using Axolotl. Its examples include a sample .yaml file for this model:
<a href="https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/llama-3-vision/lora-11b.yaml">https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/llama-3-vision/lora-11b.yaml</a>
```
base_model: alpindale/Llama-3.2-11B-Vision-Instruct
# optionally might have model_type or tokenizer_type or processor_type
processor_type: AutoProcessor
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name
# these 3 lines are needed for now to handle vision chat templates w images
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false
chat_template: llama3_2_vision
datasets:
  - path: HuggingFaceH4/llava-instruct-mix-vsft
    type: chat_template
    split: train[:1%]
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./outputs/out
adapter: lora
lora_model_dir:
sequence_len: 8192
pad_to_sequence_len: false
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
bf16: true
fp16:
tf32: true
gradient_checkpointing: true
logging_steps: 1
# flash_attention: true # use for text-only mode
sdp_attention: true
warmup_ratio: 0.1
evals_per_epoch: 1
saves_per_epoch: 1
weight_decay: 0.0
# save_first_step: true # uncomment this to validate checkpoint saving works with your config
```
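To see what the lora_target_modules pattern in the example selects, here is a small Python sketch. It assumes the pattern is applied as a full-match regular expression against module names (my understanding of how string targets are handled, not something confirmed here), and the module names in the list are hypothetical examples rather than names read from the real model.

```python
import re

# Pattern copied from lora_target_modules in the example config.
PATTERN = r"model.language_model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj"

# Hypothetical module names, written to resemble what the model might expose.
CANDIDATES = [
    "model.language_model.layers.3.self_attn.q_proj",
    "model.language_model.layers.13.cross_attn.k_proj",
    "model.language_model.layers.10.mlp.gate_proj",
    "model.vision_model.transformer.layers.0.self_attn.q_proj",
]


def is_lora_target(name: str) -> bool:
    """Return True when the whole module name matches the pattern."""
    return re.fullmatch(PATTERN, name) is not None


for name in CANDIDATES:
    print(f"{name}: {is_lora_target(name)}")
```

Under these assumptions only the language-model projections match, so the vision-side module in the list would not receive LoRA weights.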
Based on that, I wrote a similar .yaml file:
```
base_model: alpindale/Llama-3.2-11B-Vision-Instruct
processor_type: AutoProcessor
tokenizer_config: <path_to_custom_tokenizer>
tokenizer_type: AutoTokenizer
# Vision-chat template handling
# skip_prepare_dataset: true
# remove_unused_columns: false
# sample_packing: false
chat_template: llama3_2_vision
datasets:
  - path: <path_to_dataset>
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      system:
        - system
      user:
        - user
      assistant:
        - assistant
train_on_inputs: false
output_dir: <path_to_output_directory>
# Training parameters
sequence_len: 8192
pad_to_sequence_len: false
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
weight_decay: 0.0
warmup_ratio: 0.1
# Precision & performance
bf16: true
fp16:
tf32: true
gradient_checkpointing: true
logging_steps: 1
flash_attention: true # text-only mode
# sdp_attention: true
# Checkpointing
evals_per_epoch: 1
saves_per_epoch: 1
save_first_step: true
save_total_limit: 3
weight_decay: 0.0
special_tokens:
  pad_token: <|end_of_text|>
```
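Note that weight_decay appears twice in this config. PyYAML, for example, keeps the last value without complaint, so it is probably harmless here, but a quick check like the following Python sketch can flag repeated top-level keys. It is a line-based heuristic rather than a real YAML parser, and the helper name duplicate_top_level_keys and the SAMPLE text are made up for illustration.

```python
import re
from collections import Counter


def duplicate_top_level_keys(yaml_text: str) -> list[str]:
    """Return top-level keys that appear more than once, in first-seen order."""
    key_pattern = re.compile(r"^([A-Za-z_][\w-]*)\s*:")
    counts = Counter()
    for line in yaml_text.splitlines():
        # Skip comments, list items, and indented (nested) lines.
        if line.startswith(("#", " ", "\t", "-")):
            continue
        match = key_pattern.match(line)
        if match:
            counts[match.group(1)] += 1
    return [key for key, n in counts.items() if n > 1]


# A shortened stand-in for the config, not the full file.
SAMPLE = """\
learning_rate: 0.0002
weight_decay: 0.0
warmup_ratio: 0.1
save_total_limit: 3
weight_decay: 0.0
"""

print(duplicate_top_level_keys(SAMPLE))
```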
But when I run
axolotl train config.yaml
with processor_type set:
base_model: alpindale/Llama-3.2-11B-Vision-Instruct
processor_type: AutoProcessor
tokenizer_config: <path_to_custom_tokenizer>
tokenizer_type: AutoTokenizer
I get the error
KeyError: 'Indexing with integers is not available when using Python based feature extractors'
But when I remove the processor_type field:
base_model: alpindale/Llama-3.2-11B-Vision-Instruct
tokenizer_config: <path_to_custom_tokenizer>
tokenizer_type: AutoTokenizer
or even use this variant (processor_type kept, tokenizer_type removed, and the vision-chat template lines enabled):</p>
```
base_model: alpindale/Llama-3.2-11B-Vision-Instruct
processor_type: AutoProcessor
tokenizer_config: <path_to_custom_tokenizer>
# Vision-chat template handling
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false
```
I get the error
AttributeError: 'MllamaTextSelfAttention' object has no attribute 'is_causal'
What is going wrong here?
How does one fine-tune this model on text-only data?
Will this fine-tuning cause the model to lose its vision capabilities?
Is there a guide to writing config.yaml files for different models?
Python Version: 3.12
Axolotl Version: Latest
Dataset: a .jsonl file with records of this form:
{
  "messages": [
    {"role": "system", "content": "<system_prompt>"},
    {"role": "user", "content": "<question>"},
    {"role": "assistant", "content": "<answer>"}
  ]
}
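A small Python sketch of how records in this shape could be checked before training. It assumes every message carries a role from the three listed above and plain string content (text only, no image parts); the helper name validate_record and the two sample lines are hypothetical.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i} content is not a string")
    return problems


good = '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}'
bad = '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]}'
print(validate_record(good))
print(validate_record(bad))
```

Running the same function over each line of the real file would show whether any record deviates from the text-only layout.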
The same dataset was previously used to fine-tune Llama 3.1 8B with the following config.yaml:
```
base_model: NousResearch/Meta-Llama-3.1-8B-Instruct
tokenizer_config: <path_to_custom_tokenizer>
tokenizer_type: AutoTokenizer
chat_template: llama3
datasets:
  - path: <path_to_dataset>
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      system:
        - system
      user:
        - user
      assistant:
        - assistant
train_on_inputs: false
output_dir: <path_to_output_directory>
sequence_len: 2048
sample_packing: true
gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5
bf16: auto
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
auto_resume_from_checkpoints: true
save_only_model: false
logging_steps: 1
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 2
saves_per_epoch: 1
save_total_limit: 3
weight_decay: 0.0
special_tokens:
  pad_token: <|end_of_text|>
```
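For comparison, the two configs also differ in effective batch size. A minimal Python sketch of the arithmetic, assuming a single GPU (the GPU count is not stated in this post, so num_gpus is a hypothetical parameter):

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Samples contributing to one optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


# Values copied from the two configs in this post.
vision_config = effective_batch_size(micro_batch_size=1, gradient_accumulation_steps=4)
llama31_config = effective_batch_size(micro_batch_size=2, gradient_accumulation_steps=8)

print(vision_config, llama31_config)
```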
Thank you.
