seyeint/Fine-Tuning_QLoRA
Using QLoRA on the new Phi-3 from Microsoft and testing it.

edit1: I'll merge without quantization later. Just testing things and memory caps.

edit2: merged the adapter both with and without quantization (careful: I'm not resetting GPU memory between the model declarations, so some of it might spill to RAM and slow things down).



Quick context on LoRA, regardless of the framework one is working in:

Rank (R): a measure of the "dimensionality" of the space a transformation (weight matrix) can span.

The max rank is always min(#rows, #columns)... why? Because rank counts linearly independent rows/columns, and that's the most you can have even when every row and column is linearly independent.

Higher rank means more parameters to learn, more memory, and more capacity spent on the concept/semantic space.
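A quick numerical illustration of the min(#rows, #columns) cap (matrix values are arbitrary, just for demonstration):

```python
import numpy as np

# A 4x2 matrix can have at most min(4, 2) = 2 linearly independent columns.
W = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])
print(np.linalg.matrix_rank(W))  # 2 -- capped by min(#rows, #columns)

# Make the second column a multiple of the first: rank drops to 1.
W_dep = np.column_stack([W[:, 0], 2.0 * W[:, 0]])
print(np.linalg.matrix_rank(W_dep))  # 1
```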


Low (Lo): One wants to approximate a subset of these weight matrices (some layers) with lower-rank matrices (check SVD or other factorization methods). [A]

After that, typically a simple linear projection gets us back to the correct dimension that must match the next input dimension. [B]

The rest of the network has its weights frozen during the forward and backward pass.
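The [A]/[B] structure above can be sketched in a few lines of numpy. This is a minimal illustration, not the actual peft implementation: shapes, the alpha hyperparameter, and the zero-init of B follow the usual LoRA convention, but all numbers here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

in_dim, out_dim, r = 8, 8, 2
alpha = 4  # LoRA scaling hyperparameter

W = rng.normal(size=(in_dim, out_dim))   # pre-trained weight, frozen
A = rng.normal(size=(in_dim, r)) * 0.01  # down-projection [A], trainable
B = np.zeros((r, out_dim))               # up-projection [B], zero-initialized

def lora_forward(x):
    # Frozen base path plus scaled low-rank update.
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(1, in_dim))
# With B = 0 at init, the adapted layer reproduces the base model exactly.
assert np.allclose(lora_forward(x), x @ W)
```

Zero-initializing B is what makes fine-tuning start from the pre-trained behavior: the low-rank update contributes nothing until training moves B away from zero.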


Adaptation (A): One adapts the pre-trained model to a more specific task/objective, capturing important relationships from this new environment, compared to the initial general-purpose training.


LoRA: One fine-tunes a large network not simply by freezing some weights and re-training on specific data, but by reducing the computational cost through low-rank approximation matrices, shrinking the trainable parameter count per matrix from $inDim \times outDim$ to $rank \times (inDim + outDim)$.
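Plugging hypothetical numbers into that formula (a 3072×3072 projection, which is roughly Phi-3-mini's hidden size, with rank 8) shows the scale of the savings:

```python
in_dim, out_dim, r = 3072, 3072, 8  # illustrative sizes, rank chosen arbitrarily

full = in_dim * out_dim        # parameters in the full weight matrix
lora = r * (in_dim + out_dim)  # trainable parameters in the A and B factors

print(full)         # 9437184
print(lora)         # 49152
print(full / lora)  # 192.0 -- ~192x fewer trainable parameters per matrix
```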


For an intuitive overview of rank and alpha parameters, check this.

About

QLoRA on Phi-3 and qtzd Phi-3
