Could you explain the training details of "Style Embedding Extraction"? 1. In particular, how is the trainable "Q-Former" trained? 2. Could you publish the training code? Thank you.