cs.CV

WAFFLE: Finetuning Multi-Modal Models for Automated Front-End Development

Software Engineering 2 JAN, 2026

By Shanchao Liang

Instance-Free Domain Adaptive Object Detection

Computer Vision 6 JAN, 2026

By Hengfu Yu

A Unified Formula for Affine Transformations between Calibrated Cameras

Computer Vision 6 JAN, 2026

By Levente Hajder

DreamHome-Pano: Design-Aware and Conflict-Free Panoramic Interior Generation

Computer Vision 6 JAN, 2026

By Lulu Chen

Diffeomorphism-Equivariant Neural Networks

Machine Learning 6 JAN, 2026

By Josephine Elisabeth Oettinger

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

Computer Vision 24 JAN, 2024

By Zijia Zhao

Rethinking Attention: Polynomial Alternatives to Softmax in Transformers

Machine Learning (Stats) 13 JAN, 2026

By Hemanth Saratch

Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering

Artificial Intelligence 6 JAN, 2026

By Changin Choi

AR as an Evaluation Playground: Bridging Metrics and Visual Perception of Computer Vision Models

Computer Vision 6 JAN, 2026

By Ashkan Ganj

Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

Computer Vision 6 JAN, 2026

By Min-Seop Kwak

Concepts in Motion: Temporal Bottlenecks for Interpretable Video Classification

Computer Vision 6 JAN, 2026

By Patrick Knab

Spectral Compressive Imaging via Chromaticity-Intensity Decomposition

Computer Vision 6 JAN, 2026

By Xiaodong Wang

Visual Autoregressive Modeling for Instruction-Guided Image Editing

Multimedia 6 JAN, 2026

By Qingyang Mao

XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge

Machine Learning 6 JAN, 2026

By Yu Zhang

Revisiting Emotions Representation for Recognition in the Wild

Machine Learning 6 JAN, 2026

By Joao Baptista Cardia Neto

Machine Learning for Detection and Severity Estimation of Sweetpotato Weevil Damage in Field and Lab Conditions

Computer Vision 6 JAN, 2026

By Doreen M. Chelangat

Orientation-Robust Latent Motion Trajectory Learning for Annotation-free Cardiac Phase Detection in Fetal Echocardiography

eess.IV 6 JAN, 2026

By Yingyu Yang

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

Artificial Intelligence 6 JAN, 2026

By Shenyuan Gao

Clinical-Prior Guided Multi-Modal Learning with Latent Attention Pooling for Gait-Based Scoliosis Screening

Computer Vision 6 JAN, 2026

By Dong Chen

Gold Exploration using Representations from a Multispectral Autoencoder

Artificial Intelligence 6 JAN, 2026

By Argyro Ts

A Survey of AI-Generated Video Evaluation

Computer Vision 19 JAN, 2026

By Xiao Liu

Bridging the Indoor-Outdoor Gap: Vision-Centric Instruction-Guided Embodied Navigation for the Last Meters

Robotics 6 JAN, 2026

By Yuxiang Zhao

Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO

Computer Vision 6 JAN, 2026

By Yunze Tong

SPDA-SAM: A Self-prompted Depth-Aware Segment Anything Model for Instance Segmentation

Computer Vision 6 JAN, 2026

By Yihan Shang