Hi, thanks for your implementation of image-gpt. I was wondering whether quantizing the input to centroids is an optional preprocessing step for both training and generation, and what the advantages of using centroids are. Thanks again.
From the paper:

An IR of 32^2 × 3 is still quite computationally intensive. While working at even lower resolutions is tempting, prior work has demonstrated human performance on image classification begins to drop rapidly below this size (Torralba et al., 2008). Instead, motivated by early color display palettes, we create our own 9-bit color palette by clustering (R, G, B) pixel values using k-means with k = 512. Using this palette yields an input sequence length 3 times shorter than the standard (R, G, B) palette, while still encoding color faithfully. A similar approach was applied to spatial patches by Ranzato et al. (2014). We call the resulting context length (32^2 or 48^2 or 64^2) the model resolution (MR). Note that this reduction breaks permutation invariance of the color channels, but keeps the model spatially invariant.
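For concreteness, here is a minimal sketch of how such a 512-color palette could be built with k-means. This is illustrative code, not the repo's actual preprocessing: the function name `fit_color_palette`, the scikit-learn dependency, and the pixel subsampling are all assumptions made for the example.

```python
# Minimal sketch (not the official image-gpt preprocessing): build a
# 9-bit (512-color) palette by k-means clustering of (R, G, B) values.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def fit_color_palette(images, k=512, sample_size=1_000_000, seed=0):
    """images: uint8 array of shape (N, H, W, 3) -> (k, 3) float centroids."""
    pixels = images.reshape(-1, 3).astype(np.float32)
    # Subsample pixels so clustering over a large dataset stays tractable.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pixels), size=min(sample_size, len(pixels)), replace=False)
    kmeans = MiniBatchKMeans(n_clusters=k, random_state=seed).fit(pixels[idx])
    return kmeans.cluster_centers_  # the palette: 512 = 2^9 colors
```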
The idea is that by discretizing each pixel's 3 RGB values into a single bin (one of the 512 centroids), you reduce the sequence length by a factor of 3, which greatly reduces compute, since the attention mechanism has quadratic complexity with respect to sequence length.
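And here is a sketch of the quantization step plus the sequence-length arithmetic, reusing the hypothetical `palette` array from the snippet above (again illustrative, not the repo's code):

```python
# Minimal sketch: quantize one image to centroid indices and compare
# sequence lengths. `palette` is the (512, 3) array from the sketch above.
import numpy as np

def quantize(image, palette):
    """image: (H, W, 3) uint8 -> (H*W,) array of nearest-centroid token ids."""
    pixels = image.reshape(-1, 3).astype(np.float32)                # (H*W, 3)
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(-1)  # (H*W, 512) squared distances
    return d2.argmin(axis=1)                                        # token ids in [0, 512)

# Sequence length at a 32x32 model resolution:
#   with the palette:   32 * 32     = 1024 tokens (one token per pixel)
#   raw (R, G, B) ids:  32 * 32 * 3 = 3072 tokens (one token per channel)
# Since attention cost grows roughly as n^2, 1024 vs. 3072 is about a 9x saving.
```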