---
marp: true
theme: defaults
size: 58140
paginate: true
math: katex
---
Guim Casadellà Cors
Tutor: Joan Serrat
04-2024
Pixel-wise Classification
EasyPortrait
Issue: Hard and costly to annotate
Ground Truth is "free" to obtain
Unsupervised domain adaptation: no target labels are available, so do self-training (a code sketch follows the steps below)
1. train the model with the synthetic source
2. make predictions on real target images
3. select the "best" predictions as ground truth (pseudolabels)
4. retrain the model; go to 2.
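A minimal Python sketch of this self-training loop. The `train`, `predict`, and `confidence` callables, the number of rounds and the keep ratio are illustrative assumptions; the slides only describe the four steps.

```python
from typing import Callable, List, Tuple, Any

def self_training(train: Callable, predict: Callable, confidence: Callable,
                  source_data: List[Tuple[Any, Any]], target_images: List[Any],
                  rounds: int = 3, keep_ratio: float = 0.5):
    """Generic self-training loop; train/predict/confidence are caller-supplied."""
    model = train(source_data)                                   # 1. train on synthetic source
    for _ in range(rounds):
        preds = [predict(model, img) for img in target_images]   # 2. predict on real target images
        ranked = sorted(zip(target_images, preds),
                        key=lambda pair: confidence(pair[1]), reverse=True)
        pseudo = ranked[: int(keep_ratio * len(ranked))]         # 3. keep the "best" predictions as pseudolabels
        model = train(source_data + list(pseudo))                # 4. retrain and repeat
    return model
```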
Active domain adaptation: ask for a few target labels (also sketched in code below)
1. train the model with the synthetic source
2. select a few promising$^*$ target samples to be annotated by a human
3. retrain the model
4. if the annotation budget is exhausted, stop; else go to 2.
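A similar sketch for the active loop. The `train`, `score`, and `annotate` callables and the per-round budget are assumptions used only to make the steps concrete; `annotate` stands in for the human oracle.

```python
from typing import Callable, List, Tuple, Any

def active_adaptation(train: Callable, score: Callable, annotate: Callable,
                      source_data: List[Tuple[Any, Any]], target_pool: List[Any],
                      budget: int, per_round: int = 10):
    """Generic active-DA loop; score(model, sample) rates how 'promising' a sample is."""
    model = train(source_data)                                     # 1. train on synthetic source
    labeled: List[Tuple[Any, Any]] = []
    pool = list(target_pool)
    while budget > 0 and pool:
        order = sorted(range(len(pool)),
                       key=lambda i: score(model, pool[i]), reverse=True)
        picked = set(order[: min(per_round, budget)])              # 2. most promising samples
        labeled += [(pool[i], annotate(pool[i])) for i in picked]  #    human annotation
        pool = [s for i, s in enumerate(pool) if i not in picked]
        budget -= len(picked)
        model = train(source_data + labeled)                       # 3. retrain
    return model                                                   # 4. stop when budget is exhausted
```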
- mIoU: mean Intersection over Union, i.e. the per-class IoU averaged over all classes (a short computation sketch follows).
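A short sketch of how this metric can be computed on integer label maps; the NumPy implementation and the handling of classes absent from both maps are assumptions, not taken from the slides.

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes for integer label maps of the same shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                      # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```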
- $y$ and $x$ are the ground truth and input vectors, respectively.
- The dataset contains $N$ training examples, sampled from a distribution $D$.
- $\theta$ are the parameters describing a neural network.
- $P(y|x, \theta)$ is the probability of predicting $y$ from $x$ given the NN is in state $\theta$.
- $R(x, y)$ is the evaluation of the reward function.
- $\nabla_{\theta}$ denotes the usual gradient computation of a NN.
- Align model predictions and intended usage via reward optimisation.
- REINFORCE's well-known log-derivative trick.
- Learn $P(y|x, \theta) \rightarrow \max_{\theta} \mathbb{E}_{x \sim D} \left[ \mathbb{E}_{y \sim P(\cdot | x, \theta)} R(x, y) \right]$
Approach:
- Model pretraining with maximum-likelihood estimation.
- Model tuning for the task by maximising the related reward with the REINFORCE algorithm.
Tuning computer vision models with task rewards | Full paper link
- Provides a way to estimate the gradient of the expected reward for a given input x:
- $\nabla_{\theta} \mathbb{E}_{y \sim P} \left[ R(x, y) \right] = \mathbb{E}_{y \sim P} \left[ R(x, y) \nabla_{\theta} \log P(y|x; \theta) \right]$
- An unbiased estimate of this gradient is obtained by averaging per-example (sampled) terms of the RHS.
- Implemented in the model's loss function
function batch_loss(θ, x, y):
    # n is the size of a mini-batch.
    return (1/n) * Σᵢ log P(yᵢ | xᵢ; θ)
end function

function step_mle(θ, x, y, α):
    # one gradient-ascent step on the mean log-likelihood
    G_mle := ∇θ batch_loss(θ, x, y)
    return θ + α·G_mle
end function
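A hedged PyTorch rendering of the two MLE functions above, assuming a per-pixel classifier that returns logits of shape (n, classes, H, W); the model, the optimiser and the tensor shapes are assumptions, not from the slides.

```python
import torch.nn.functional as F

def step_mle(model, optimizer, x, y):
    """One maximum-likelihood step; cross-entropy is the negative mean log P(y|x; θ)."""
    logits = model(x)                       # (n, classes, H, W)
    loss = F.cross_entropy(logits, y)       # averages -log P(yᵢ | xᵢ; θ) over pixels and batch
    optimizer.zero_grad()
    loss.backward()                         # gradient of the negative mean log-likelihood
    optimizer.step()                        # θ ← θ - α·∇θ loss, i.e. θ + α·G_mle for plain SGD
    return loss.item()
```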
function batch_loss(θ, x, y, r):
    # r holds the per-example reward advantages used to weight the log-likelihood.
    return (1/n) * Σᵢ rᵢ · log P(yᵢ | xᵢ; θ)
end function

function step_reward(θ, x, α):
    y_sample := batch_sample(θ, x)      # sample predictions from the current model
    y_baseline := batch_sample(θ, x)    # second, independent sample used as baseline
    r := R(x, y_sample) - R(x, y_baseline)
    G_r := ∇θ batch_loss(θ, x, y_sample, r)
    return θ + α·G_r
end function
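And a hedged PyTorch sketch of the reward step, layered on the same assumptions: predictions are sampled per pixel from the softmax, a second independent sample provides the baseline, and `reward_fn(x, y)` stands in for $R(x, y)$.

```python
import torch

def step_reward(model, optimizer, x, reward_fn):
    """One REINFORCE step: weight the log-probability of sampled predictions
    by a baseline-corrected reward. reward_fn(x, y) must return an (n,) tensor."""
    logits = model(x)                                         # (n, C, H, W)
    dist = torch.distributions.Categorical(logits=logits.permute(0, 2, 3, 1))
    y_sample = dist.sample()                                  # (n, H, W) sampled prediction
    y_baseline = dist.sample()                                # second sample used as baseline
    with torch.no_grad():
        r = reward_fn(x, y_sample) - reward_fn(x, y_baseline) # (n,) reward advantage
    logp = dist.log_prob(y_sample).flatten(1).mean(dim=1)     # mean per-example log-prob, (n,)
    loss = -(r * logp).mean()                                 # minimising this ascends the reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```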
- The method can suffer from high variance, which affects overall performance.
- Some variance reduction techniques include (illustrated numerically below):
- Increasing the number of samples (batch size, nº of GPUs): $V_n \propto O(\frac{1}{N})$
- Subtracting a baseline $B$ that does not depend on the sampled $y$: the subtracted term has zero expected gradient, $\mathbb{E}_{y \sim P}\left[ B \, \nabla_{\theta} \log P(y|x; \theta) \right] = 0$; e.g., a rolling mean of recent rewards.
Monte Carlo Gradient Estimation in Machine Learning | Full paper link
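A small numerical illustration of both variance-reduction effects on a toy softmax policy; the probabilities, rewards and sample counts are made up for the demo and are not from the slides or the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.2, 0.5, 0.3])           # toy softmax policy over 3 outcomes
rewards = np.array([0.0, 1.0, 5.0])         # toy reward for each outcome

def grad_estimate(n_samples, baseline=0.0):
    """Score-function estimate of d E[R] / d logit_2 for a softmax policy:
    score(y) = 1{y==2} - p_2, so a constant baseline adds zero bias."""
    y = rng.choice(3, size=n_samples, p=probs)
    score = (y == 2).astype(float) - probs[2]
    return np.mean((rewards[y] - baseline) * score)

def stats(n_samples, baseline=0.0, trials=5000):
    ests = np.array([grad_estimate(n_samples, baseline) for _ in range(trials)])
    return ests.mean(), ests.var()

# 1) More samples: variance shrinks roughly as 1/N (the true gradient here is 0.9).
for n in (1, 16, 256):
    print("N =", n, stats(n))

# 2) Baseline: same mean (unbiased), noticeably smaller variance.
print("no baseline  :", stats(64, baseline=0.0))
print("with baseline:", stats(64, baseline=probs @ rewards))
```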
- Encoder: Extracts features (e.g., ResNet);
- Decoder: Reconstructs output using these features, often with upsampling.
- Widespread use in numerous research papers
- Medium size: fits in the available GPU memory ($\le 12\,\text{GB}$)
- Atrous Spatial Pyramid Pooling (ASPP), sketched below
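A minimal PyTorch sketch of an ASPP block, following the usual design of parallel atrous convolutions plus an image-level pooling branch; the channel sizes and dilation rates are illustrative, not the project's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convs + image-level pooling."""
    def __init__(self, in_ch: int, out_ch: int = 256, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=1 if d == 1 else 3,
                      padding=0 if d == 1 else d, dilation=d, bias=False)
            for d in dilations
        ])
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(dilations) + 1), out_ch, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

# e.g. aspp = ASPP(2048); out = aspp(torch.randn(1, 2048, 32, 32))  # -> (1, 256, 32, 32)
```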
- Implemented a custom decode head MMSegmentation module which inherits from the base decode head.
- Includes reward & baseline computation capabilities when performing a loss forward step in train mode.
- Interacts with the new reward optimisation loss function.
- Implemented in MMSegmentation as a new Loss Function
- $R(x_i, y_i) = 1 - \frac{\sum_{c} IoU_c(x_i, y_i)}{n_{classes}}$, so the weighting drives $loss \to 0$ when $mIoU \to 1$ (a sketch of this computation follows the pseudocode below).
function batch_loss(θ, x, y, r):
    # r holds the per-example reward advantages used to weight the log-likelihood.
    return (1/n) * Σᵢ rᵢ · log P(yᵢ | xᵢ; θ)
end function

function step_reward(θ, x, α):
    y_sample := batch_sample(θ, x)      # sample predictions from the current model
    y_baseline := batch_sample(θ, x)    # second, independent sample used as baseline
    r := R(x, y_sample) - R(x, y_baseline)
    G_r := ∇θ batch_loss(θ, x, y_sample, r)
    return θ + α·G_r
end function
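A hedged sketch of how this reward could be computed from a sampled prediction and the ground truth; the per-image loop, the tensor shapes and the decision to skip classes absent from both maps are assumptions, since the slides only give the formula.

```python
import torch

def reward(pred: torch.Tensor, gt: torch.Tensor, num_classes: int) -> torch.Tensor:
    """R(x, y) = 1 - mIoU(pred, gt), computed per image.
    pred, gt: integer label maps of shape (n, H, W)."""
    rewards = []
    for p, g in zip(pred, gt):
        ious = []
        for c in range(num_classes):
            inter = ((p == c) & (g == c)).sum().float()
            union = ((p == c) | (g == c)).sum().float()
            if union > 0:                              # skip classes absent from both maps
                ious.append(inter / union)
        miou = torch.stack(ious).mean() if ious else torch.tensor(0.0)
        rewards.append(1.0 - miou)                     # higher when mIoU is lower
    return torch.stack(rewards)                        # (n,)
```

Under these assumptions, this function could be plugged in as `reward_fn` in the earlier `step_reward` sketch.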
- Scheduler: Initial Learning Rate, evolution, ...
- Model structure: auxiliary head or not.
- Baseline: definition of $r_b$.
- Steps: which checkpoint to use and how many steps to take.
- Weights: Use of weights in loss computation.
- Best config: $1e{-4} \rightarrow 1e{-6}$
- A lower starting learning rate (LR) to "overcome" the change in the model's structure and loss function.
- Perform some variance reduction by also lowering the final LR.
- Overall better performance than the original trend line.
- The auxiliary head has the roughest start because it faces more changes.
- Both face high variance.
- $C_c$: class counts
- $T_c$: total counts
- $C_w$: class weights
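The slides name these quantities but not the exact formula, so the sketch below uses a common inverse-frequency scheme, $C_w = T_c / (n_{classes} \cdot C_c)$, purely as an assumed stand-in.

```python
import torch

def class_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Inverse-frequency class weights (an assumed scheme, not the project's exact formula).
    labels: integer label map(s); C_c = per-class pixel counts, T_c = total pixel count."""
    C_c = torch.bincount(labels.flatten(), minlength=num_classes).float()
    T_c = C_c.sum()
    C_w = T_c / (num_classes * C_c.clamp(min=1))   # rare classes get larger weights
    return C_w

# e.g. pass the weights to the loss: F.cross_entropy(logits, y, weight=class_weights(y, C))
```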
- Have tried the same configuration on the EasyPortrait dataset, achieving similar results.
- In the future, plan to implement this on the Mapillary Vistas dataset, which starts from a lower mIoU since it's a more challenging task.
- Repeat some of the experiments to establish the mean improvement for each result and its variance.
- The work shown is trying to fine-tune a model that already has extremely high accuracy. It's trying to outperform models that have been extensively tuned. This gives little room for improvement, as changes in the model's behavior make it extremely difficult to optimize its hyperparameters for significant improvements.
- The key paper in the project also doesn't help, as it provides no details on the implementation of the new algorithm. This makes it difficult to apply those ideas to a new problem.
- As seen, variance is a significant problem in this project. However, the proposed approach seems to address it, to some extent.
- Overall, the general goal has been accomplished, as some configurations perform somewhat better than the original model. However, the lack of time and computing power makes it difficult to further tune the model and try to widen the gap through reward optimization.