Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
Pages
Posts
portfolio
publications
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Published in 1st Workshop on Green Foundation Models, ECCV, 2024
Experimental evaluation and improvements of data-efficient multi-modal adaptation of single-modality LLMs and perceptual backbones.
Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks
Published in ICLR, 2025
Improvements to QINCo for vector quantization with better encoding, fast approximate decoding, and optimized training, achieving state-of-the-art results in vector compression and billion-scale search.
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
Published in arXiv preprint, 2025
A new pixel diffusion decoder architecture for image tokenization that achieves higher reconstruction quality and faster sampling than KL-VAE through distillation-based single-step decoding.
VUGEN: Visual Understanding priors for GENeration
Published in arXiv preprint, 2025
A framework that leverages the visual understanding priors of pretrained VLMs for efficient and high-quality image generation, achieving superior performance while preserving understanding capabilities.
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Published in arXiv preprint, 2026
Empirical study of native multimodal pretraining using the Transfusion framework, revealing key insights on visual representation, data synergy, world modeling, and MoE scaling.
