Part 1: Learning Optical Expansion from Scale Matching ---- Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Part 2: CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching ---- Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark
Part 3: Vision Transformers are Good Mask Auto-Labelers ---- ECON: Explicit Clothed humans Optimized via Normal integration
Part 4: Zero-shot Generative Model Adaptation via Image-specific Prompt Learning ---- ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization
Part 5: Token Boosting for Robust Self-Supervised Visual Transformer Pre-training ---- Mask-guided Matting in the Wild
- NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images
- NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
- NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
- RelightableHands: Efficient Neural Relighting of Articulated Hand Models
- Multi-View Azimuth Stereo via Tangent Space Consistency
- VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization
- InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds
- Compressing Volumetric Radiance Fields to 1 MB