Type | Title | Homepage | Code | Code Stars |
---|---|---|---|---|
Best Paper | Planning-oriented Autonomous Driving | Link | Github | |
Best Paper | Visual Programming: Compositional visual reasoning without training | Link | Github | |
Best Paper Honorable Mention | DynIBaR: Neural Dynamic Image-Based Rendering | Link | Github | |
Best Student Paper | 3D Registration with Maximal Cliques | Link | Github | |
Best Student Paper Honorable Mention | DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | Link | Github | |
The CVPR 2023 paper information listed below was extracted from the following web pages and saved in the papers_info.json file:
https://openaccess.thecvf.com/CVPR2023?day=all
https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers
If you find errors in the paper information or missing GitHub links, you are welcome to edit the corresponding entries in the papers_info_refined.json file and submit a pull request.
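Before opening a pull request, it can help to list which entries still lack a code link. The sketch below is only an illustration: the layout of papers_info_refined.json is not documented here, so it assumes the file holds a list of records with hypothetical `title` and `github` keys. Adjust the key names to whatever the actual file uses.

```python
import json
from pathlib import Path


def load_papers(path):
    """Load a JSON file that is assumed to contain a list of paper records."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def find_missing_code(papers):
    """Return the titles of records whose "github" field is absent or empty.

    The "title" and "github" key names are assumptions, not the repository's
    documented schema.
    """
    return [p.get("title", "<untitled>") for p in papers if not p.get("github")]


# In-memory sample with made-up records, so the sketch runs without the real file.
sample = [
    {"title": "Example Paper A", "github": "https://example.com/placeholder"},
    {"title": "Example Paper B", "github": ""},
    {"title": "Example Paper C"},
]
print(find_missing_code(sample))
```

With the real file, `find_missing_code(load_papers("papers_info_refined.json"))` would produce the same kind of list, provided the assumed key names match.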
Title | Paper | Code | Github Stars |
---|---|---|---|
YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors | Link | Github | |
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models | Link | Github | |
Co-Training 2L Submodels for Visual Recognition | Link | Github | |
Token Turing Machines | Link | Github | |
How Can Objects Help Action Recognition? | Link | Github | |
GINA-3D: Learning To Generate Implicit Neural Assets in the Wild | Link | Github | |
Images Speak in Images: A Generalist Painter for In-Context Visual Learning | Link | Github | |
Planning-Oriented Autonomous Driving | Link | Github | |
Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks | Link | Github | |
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions | Link | Github | |
DepGraph: Towards Any Structural Pruning | Link | Github | |
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | Link | Github | |
Universal Instance Perception As Object Discovery and Retrieval | Link | Github | |
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° | Link | Github | |
EfficientViT: Memory Efficient Vision Transformer With Cascaded Group Attention | Link | Github | |
Unifying Vision, Text, and Layout for Universal Document Processing | Link | Github | |
ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders | Link | Github | |
FlexiViT: One Model for All Patch Sizes | Link | Github | |
CLIPPO: Image-and-Language Understanding From Pixels Only | Link | Github | |
Neighborhood Attention Transformer | Link | Github | |
SeqTrack: Sequence to Sequence Learning for Visual Object Tracking | Link | Github | |
Deep Learning of Partial Graph Matching via Differentiable Top-K | Link | Github | |
Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation | Link | Github | |
Paint by Example: Exemplar-Based Image Editing With Diffusion Models | Link | Github | |
Cut and Learn for Unsupervised Object Detection and Instance Segmentation | Link | Github | |
Masked Image Modeling With Local Multi-Scale Reconstruction | Link | Github | |
PAniC-3D: Stylized Single-View 3D Reconstruction From Portraits of Anime Characters | Link | Github | |
Learning To Generate Image Embeddings With User-Level Differential Privacy | Link | Github | |
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures | Link | Github | |
InstMove: Instance Motion for Object-Centric Video Segmentation | Link | Github | |
Activating More Pixels in Image Super-Resolution Transformer | Link | Github | |
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking | Link | Github | |
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking | Link | Github | |
OpenGait: Revisiting Gait Recognition Towards Better Practicality | Link | Github | |
Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks | Link | Github | |
All Are Worth Words: A ViT Backbone for Diffusion Models | Link | Github | |
Shape, Pose, and Appearance From a Single Image via Bootstrapped Radiance Field Inversion | Link | Github | |
MAGE: MAsked Generative Encoder To Unify Representation Learning and Image Synthesis | Link | Github | |
Mask-Free Video Instance Segmentation | Link | Github | |
Compressing Volumetric Radiance Fields to 1 MB | Link | Github | |
PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers | Link | Github | |
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network | Link | Github | |
FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction | Link | Github | |
Detecting Everything in the Open World: Towards Universal Object Detection | Link | Github | |
Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning | Link | Github | |
Cross-Domain Image Captioning With Discriminative Finetuning | Link | Github | |
NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360° Views | Link | Github | |
Scaling Language-Image Pre-Training via Masking | Link | Github | |
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation | Link | Github | |
RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation | Link | Github | |
MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors | Link | Github | |
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Link | Github | |
BiFormer: Vision Transformer With Bi-Level Routing Attention | Link | Github | |
All in One: Exploring Unified Video-Language Pre-Training | Link | Github | |
Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation | Link | Github | |
Wavelet Diffusion Models Are Fast and Scalable Image Generators | Link | Github | |
Efficient and Explicit Modelling of Image Hierarchies for Image Restoration | Link | Github | |
3D Registration With Maximal Cliques | Link | Github | |
Prompting Large Language Models With Answer Heuristics for Knowledge-Based Visual Question Answering | Link | Github | |
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks | Link | Github | |
DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets | Link | Github | |
BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points | Link | Github | |
EDICT: Exact Diffusion Inversion via Coupled Transformations | Link | Github | |
Disentangling Writer and Character Styles for Handwriting Generation | Link | Github | |
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Link | Github | |
Conditional Image-to-Video Generation With Latent Flow Diffusion Models | Link | Github | |
Inversion-Based Style Transfer With Diffusion Models | Link | Github | |
Recurrent Vision Transformers for Object Detection With Event Cameras | Link | Github | |
Dense Distinct Query for End-to-End Object Detection | Link | Github | |
Neural Video Compression With Diverse Contexts | Link | Github | |
Spherical Transformer for LiDAR-Based 3D Recognition | Link | Github | |
You Only Segment Once: Towards Real-Time Panoptic Segmentation | Link | Github | |
Referring Image Matting | Link | Github | |
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking | Link | Github | |
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation | Link | Github | |
NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation | Link | Github | |
High-Fidelity 3D GAN Inversion by Pseudo-Multi-View Optimization | Link | Github | |
GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction | Link | Github | |
OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering | Link | Github | |
PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces | Link | Github | |
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation | Link | Github | |
Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation | Link | Github | |
LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs | Link | Github | |
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation | Link | Github | |
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | Link | Github | |
Learning a Sparse Transformer Network for Effective Image Deraining | Link | Github | |
Visual Prompt Multi-Modal Tracking | Link | Github | |
DeepSolo: Let Transformer Decoder With Explicit Points Solo for Text Spotting | Link | Github | |
HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining | Link | Github | |
Learning Visual Representations via Language-Guided Sampling | Link | Github | |
GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning | Link | Github | |
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection | Link | Github | |
NeRF-RPN: A General Framework for Object Detection in NeRFs | Link | Github | |
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation | Link | Github | |
Position-Guided Text Prompt for Vision-Language Pre-Training | Link | Github | |
Query-Centric Trajectory Prediction | Link | Github | |
Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need | Link | Github | |
LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion | Link | Github | |
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training | Link | Github | |
BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection | Link | Github | |
SimpleNet: A Simple Network for Image Anomaly Detection and Localization | Link | Github | |
Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving | Link | Github | |
Slide-Transformer: Hierarchical Vision Transformer With Local Self-Attention | Link | Github | |
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion | Link | Github | |
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking | Link | Github | |
Identity-Preserving Talking Face Generation With Landmark and Appearance Priors | Link | Github | |
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation | Link | Github | |
Delving Into Shape-Aware Zero-Shot Semantic Segmentation | Link | Github | |
Aligning Bag of Regions for Open-Vocabulary Object Detection | Link | Github | |
ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation | Link | Github | |
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | Link | Github | |
Data-Driven Feature Tracking for Event Cameras | Link | Github | |
FeatureBooster: Boosting Feature Descriptors With a Lightweight Neural Network | Link | Github | |
Omni Aggregation Networks for Lightweight Image Super-Resolution | Link | Github | |
Shifted Diffusion for Text-to-Image Generation | Link | Github | |
A Generalized Framework for Video Instance Segmentation | Link | Github | |
Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild | Link | Github | |
LANA: A Language-Capable Navigator for Instruction Following and Generation | Link | Github | |
Learning Generative Structure Prior for Blind Text Image Super-Resolution | Link | Github | |
Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement | Link | Github | |
TriDet: Temporal Action Detection With Relative Boundary Modeling | Link | Github | |
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds | Link | Github | |
Fix the Noise: Disentangling Source Feature for Controllable Domain Translation | Link | Github | |
Multimodal Prompting With Missing Modalities for Visual Recognition | Link | Github | |
Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in Autonomous Driving | Link | Github | |
Enhanced Training of Query-Based Object Detection via Selective Query Recollection | Link | Github | |
Data-Efficient Large Scale Place Recognition With Graded Similarity Supervision | Link | Github | |
Super-Resolution Neural Operator | Link | Github | |
Revisiting Rotation Averaging: Uncertainties and Robust Losses | Link | Github | |
PlaneDepth: Self-Supervised Depth Estimation via Orthogonal Planes | Link | Github | |
Human Guided Ground-Truth Generation for Realistic Image Super-Resolution | Link | Github | |
DynamicDet: A Unified Dynamic Architecture for Object Detection | Link | Github | |
FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation | Link | Github | |
HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes With Iterative Intertwined Regularization | Link | Github | |
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information | Link | Github | |
UniHCP: A Unified Model for Human-Centric Perceptions | Link | Github | |
NeuFace: Realistic 3D Neural Face Rendering From Multi-View Images | Link | Github | |
Adaptive Assignment for Geometry Aware Local Feature Matching | Link | Github | |
Learning To Generate Text-Grounded Mask for Open-World Semantic Segmentation From Only Image-Text Pairs | Link | Github | |
CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation | Link | Github | |
Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection | Link | Github | |
Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision | Link | Github | |
CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search | Link | Github | |
DNF: Decouple and Feedback Network for Seeing in the Dark | Link | Github | |
Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing | Link | Github | |
Scalable, Detailed and Mask-Free Universal Photometric Stereo | Link | Github | |
Learning To Dub Movies via Hierarchical Prosody Models | Link | Github | |
BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation | Link | Github | |
Generic-to-Specific Distillation of Masked Autoencoders | Link | Github | |
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding | Link | Github | |
Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning | Link | Github | |
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? | Link | Github | |
Unifying Short and Long-Term Tracking With Graph Hierarchies | Link | Github | |
Hierarchical Fine-Grained Image Forgery Detection and Localization | Link | Github | |
CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution | Link | Github | |
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting | Link | Github | |
Masked Image Training for Generalizable Deep Image Denoising | Link | Github | |
CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP | Link | Github | |
Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring | Link | Github | |
Multimodal Industrial Anomaly Detection via Hybrid Fusion | Link | Github | |
LinK: Linear Kernel for LiDAR-Based 3D Perception | Link | Github | |
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting | Link | Github | |
Meta Architecture for Point Cloud Analysis | Link | Github | |
CF-Font: Content Fusion for Few-Shot Font Generation | Link | Github | |
ViTs for SITS: Vision Transformers for Satellite Image Time Series | Link | Github | |
ISBNet: A 3D Point Cloud Instance Segmentation Network With Instance-Aware Sampling and Box-Aware Dynamic Convolution | Link | Github | |
A Light Weight Model for Active Speaker Detection | Link | Github | |
Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark | Link | Github | |
DeltaEdit: Exploring Text-Free Training for Text-Driven Image Manipulation | Link | Github | |
Understanding Imbalanced Semantic Segmentation Through Neural Collapse | Link | Github | |
MP-Former: Mask-Piloted Transformer for Image Segmentation | Link | Github | |
Hierarchical Dense Correlation Distillation for Few-Shot Segmentation | Link | Github | |
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Link | Github | |
IFSeg: Image-Free Semantic Segmentation via Vision-Language Model | Link | Github | |
AutoFocusFormer: Image Segmentation off the Grid | Link | Github | |
EqMotion: Equivariant Multi-Agent Motion Prediction With Invariant Interaction Reasoning | Link | Github | |
GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds | Link | Github | |
Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models | Link | Github | |
Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models | Link | Github | |
Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation | Link | Github | |
Two-View Geometry Scoring Without Correspondences | Link | Github | |
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability | Link | Github | |
Learning Semantic Relationship Among Instances for Image-Text Matching | Link | Github | |
LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation | Link | Github | |
Robust Mean Teacher for Continual and Gradual Test-Time Adaptation | Link | Github | |
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders | Link | Github | |
Directional Connectivity-Based Segmentation of Medical Images | Link | Github | |
Zero-Shot Referring Image Segmentation With Global-Local Context Features | Link | Github | |
Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank | Link | Github | |
Dynamic Focus-Aware Positional Queries for Semantic Segmentation | Link | Github | |
Vision Transformer With Super Token Sampling | Link | Github | |
Sampling Is Matter: Point-Guided 3D Human Mesh Reconstruction | Link | Github | |
3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds | Link | Github | |
PROB: Probabilistic Objectness for Open World Object Detection | Link | Github | |
Benchmarking Robustness of 3D Object Detection to Common Corruptions | Link | Github | |
Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images | Link | Github | |
MARLIN: Masked Autoencoder for Facial Video Representation LearnINg | Link | Github | |
ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing | Link | Github | |
Interactive and Explainable Region-Guided Radiology Report Generation | Link | Github | |
SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection | Link | Github | |
Real-Time 6K Image Rescaling With Rate-Distortion Optimization | Link | Github | |
Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring | Link | Github | |
Frequency-Modulated Point Cloud Rendering With Easy Editing | Link | Github | |
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning | Link | Github | |
BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models | Link | Github | |
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling | Link | Github | |
DynaFed: Tackling Client Data Heterogeneity With Global Dynamics | Link | Github | |
Frame Flexible Network | Link | Github | |
GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training | Link | Github | |
Collaboration Helps Camera Overtake LiDAR in 3D Detection | Link | Github | |
CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning | Link | Github | |
RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving | Link | Github | |
Generalized Relation Modeling for Transformer Tracking | Link | Github | |
WildLight: In-the-Wild Inverse Rendering With a Flashlight | Link | Github | |
Equiangular Basis Vectors | Link | Github | |
DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium | Link | Github | |
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-Identification | Link | Github | |
Diversity-Aware Meta Visual Prompting | Link | Github | |
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training | Link | Github | |
Texts as Images in Prompt Tuning for Multi-Label Image Recognition | Link | Github | |
PointConvFormer: Revenge of the Point-Based Convolution | Link | Github | |
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection | Link | Github | |
RILS: Masked Visual Reconstruction in Language Semantic Space | Link | Github | |
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization | Link | Github | |
StyleRes: Transforming the Residuals for Real Image Editing With StyleGAN | Link | Github | |
SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation | Link | Github | |
Learning With Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning | Link | Github | |
Handwritten Text Generation From Visual Archetypes | Link | Github | |
Post-Training Quantization on Diffusion Models | Link | Github | |
DPF: Learning Dense Prediction Fields With Weak Supervision | Link | Github | |
OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer | Link | Github | |
SCPNet: Semantic Scene Completion on Point Cloud | Link | Github | |
Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation | Link | Github | |
Novel Class Discovery for 3D Point Cloud Semantic Segmentation | Link | Github | |
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation With Cross-Scale Distortion Awareness | Link | Github | |
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Link | Github | |
Masked and Adaptive Transformer for Exemplar Based Image Translation | Link | Github | |
DCFace: Synthetic Face Generation With Dual Condition Diffusion Model | Link | Github | |
T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection | Link | Github | |
SMPConv: Self-Moving Point Representations for Continuous Convolution | Link | Github | |
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution | Link | Github | |
A Large-Scale Homography Benchmark | Link | Github | |
GeoMVSNet: Learning Multi-View Stereo With Geometry Perception | Link | Github | |
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression | Link | Github | |
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks | Link | Github | |
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge | Link | Github | |
Rethinking Federated Learning With Domain Shift: A Prototype View | Link | Github | |
Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization | Link | Github | |
Dynamic Coarse-To-Fine Learning for Oriented Tiny Object Detection | Link | Github | |
Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning | Link | Github | |
Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time | Link | Github | |
Guiding Pseudo-Labels With Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation | Link | Github | |
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization | Link | Github | |
Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process | Link | Github | |
A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation From a Single RGB Image | Link | Github | |
DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects | Link | Github | |
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models | Link | Github | |
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation | Link | Github | |
Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation | Link | Github | |
Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark | Link | Github | |
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud | Link | Github | |
Sharpness-Aware Gradient Matching for Domain Generalization | Link | Github | |
Deep Graph-Based Spatial Consistency for Robust Non-Rigid Point Cloud Registration | Link | Github | |
Decoupled Multimodal Distilling for Emotion Recognition | Link | Github | |
Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation | Link | Github | |
An Image Quality Assessment Dataset for Portraits | Link | Github | |
Leveraging Hidden Positives for Unsupervised Semantic Segmentation | Link | Github | |
Semantic-Conditional Diffusion Networks for Image Captioning | Link | Github | |
STMixer: A One-Stage Sparse Action Detector | Link | Github | |
Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset | Link | Github | |
Joint Visual Grounding and Tracking With Natural Language Specification | Link | Github | |
Where Is My Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization | Link | Github | |
Power Bundle Adjustment for Large-Scale 3D Reconstruction | Link | Github | |
Rethinking Domain Generalization for Face Anti-Spoofing: Separability and Alignment | Link | Github | |
A Unified Pyramid Recurrent Network for Video Frame Interpolation | Link | Github | |
Revisiting Reverse Distillation for Anomaly Detection | Link | Github | |
SOOD: Towards Semi-Supervised Oriented Object Detection | Link | Github | |
POEM: Reconstructing Hand in a Point Embedded Multi-View Stereo | Link | Github | |
Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors | Link | Github | |
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation | Link | Github | |
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID | Link | Github | |
Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment | Link | Github | |
Task Residual for Tuning Vision-Language Models | Link | Github | |
Structured Sparsity Learning for Efficient Video Super-Resolution | Link | Github | |
Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior | Link | Github | |
Imitation Learning As State Matching via Differentiable Physics | Link | Github | |
PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration | Link | Github | |
Twin Contrastive Learning With Noisy Labels | Link | Github | |
TarViS: A Unified Approach for Target-Based Video Segmentation | Link | Github | |
Clover: Towards a Unified Video-Language Alignment and Fusion Model | Link | Github | |
Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need | Link | Github | |
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers | Link | Github | |
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images | Link | Github | |
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos | Link | Github | |
Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision | Link | Github | |
Interactive Segmentation As Gaussion Process Classification | Link | Github | |
PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation | Link | Github | |
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization | Link | Github | |
Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo | Link | Github | |
TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets | Link | Github | |
Exploring Discontinuity for Video Frame Interpolation | Link | Github | |
Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections | Link | Github | |
Affordance Grounding From Demonstration Video To Target Image | Link | Github | |
Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection | Link | Github | |
How to Backdoor Diffusion Models? | Link | Github | |
LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising | Link | Github | |
Neuron Structure Modeling for Generalizable Remote Physiological Measurement | Link | Github | |
Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation | Link | Github | |
STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection | Link | Github | |
RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor | Link | Github | |
Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation | Link | Github | |
Learning Federated Visual Prompt in Null Space for MRI Reconstruction | Link | Github | |
Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution | Link | Github | |
Learning Distortion Invariant Representation for Image Restoration From a Causality Perspective | Link | Github | |
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery | Link | Github | |
MSF: Motion-Guided Sequential Fusion for Efficient 3D Object Detection From Point Cloud Sequences | Link | Github | |
CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection | Link | Github | |
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective | Link | Github | |
Polynomial Implicit Neural Representations for Large Diverse Datasets | Link | Github | |
3D-Aware Multi-Class Image-to-Image Translation With NeRFs | Link | Github | |
Masked Motion Encoding for Self-Supervised Video Representation Learning | Link | Github | |
Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning | Link | Github | |
Towards Scalable Neural Representation for Diverse Videos | Link | Github | |
CLOTH4D: A Dataset for Clothed Human Reconstruction | Link | Github | |
Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration | Link | Github | |
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations | Link | Github | |
Robust Test-Time Adaptation in Dynamic Scenarios | Link | Github | |
Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification | Link | Github | |
FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training | Link | Github | |
MOSO: Decomposing MOtion, Scene and Object for Video Prediction | Link | Github | |
ALOFT: A Lightweight MLP-Like Architecture With Dynamic Low-Frequency Transform for Domain Generalization | Link | Github | |
A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others | Link | Github | |
SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency | Link | Github | |
Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data | Link | Github | |
Viewpoint Equivariance for Multi-View 3D Object Detection | Link | Github | |
DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection | Link | Github | |
Regularizing Second-Order Influences for Continual Learning | Link | Github | |
Backdoor Defense via Adaptively Splitting Poisoned Dataset | Link | Github | |
Towards Artistic Image Aesthetics Assessment: A Large-Scale Dataset and a New Method | Link | Github | |
JacobiNeRF: NeRF Shaping With Mutual Information Gradients | Link | Github | |
Accelerating Vision-Language Pretraining With Free Language Modeling | Link | Github | |
Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection | Link | Github | |
PA&DA: Jointly Sampling Path and Data for Consistent NAS | Link | Github | |
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling | Link | Github | |
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity | Link | Github | |
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | Link | Github | |
ZBS: Zero-Shot Background Subtraction via Instance-Level Background Modeling and Foreground Selection | Link | Github | |
Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation | Link | Github | |
AdaptiveMix: Improving GAN Training via Feature Space Shrinkage | Link | Github | |
Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation | Link | Github | |
Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction | Link | Github | |
A Strong Baseline for Generalized Few-Shot Semantic Segmentation | Link | Github | |
FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection | Link | Github | |
Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation | Link | Github | |
Siamese DETR | Link | Github | |
Distribution Shift Inversion for Out-of-Distribution Prediction | Link | Github | |
Towards Unified Scene Text Spotting Based on Sequence Generation | Link | Github | |
CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer | Link | Github | |
Supervised Masked Knowledge Distillation for Few-Shot Transformers | Link | Github | |
MELTR: Meta Loss Transformer for Learning To Fine-Tune Video Foundation Models | Link | Github | |
Unsupervised Inference of Signed Distance Functions From Single Sparse Point Clouds Without Learning Priors | Link | Github | |
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation | Link | Github | |
Adaptive Human Matting for Dynamic Videos | Link | Github | |
Making Vision Transformers Efficient From a Token Sparsification View | Link | Github | |
ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection | Link | Github | |
Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning | Link | Github | |
ACL-SPC: Adaptive Closed-Loop System for Self-Supervised Point Cloud Completion | Link | Github | |
Weakly Supervised Posture Mining for Fine-Grained Classification | Link | Github | |
H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction | Link | Github | |
E2PN: Efficient SE(3)-Equivariant Point Network | Link | Github | |
Audio-Visual Grouping Network for Sound Localization From Mixtures | Link | Github | |
StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping | Link | Github | |
MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset | Link | Github | |
Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation | Link | Github | |
Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation | Link | Github | |
Glocal Energy-Based Learning for Few-Shot Open-Set Recognition | Link | Github | |
Indiscernible Object Counting in Underwater Scenes | Link | Github | |
Curricular Object Manipulation in LiDAR-Based Object Detection | Link | Github | |
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning With Structure-Trajectory Prompted Reconstruction for Person Re-Identification | Link | Github | |
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification | Link | Github | |
HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models | Link | Github | |
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers | Link | Github | |
Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection | Link | Github | |
DAA: A Delta Age AdaIN Operation for Age Estimation via Binary Code Transformer | Link | Github | |
Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution | Link | Github | |
The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks | Link | Github | |
Dynamic Conceptional Contrastive Learning for Generalized Category Discovery | Link | Github | |
Class Adaptive Network Calibration | Link | Github | |
Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation | Link | Github | |
FAC: 3D Representation Learning via Foreground Aware Feature Contrast | Link | Github | |
NICO++: Towards Better Benchmarking for Domain Generalization | Link | Github | |
Bridging Search Region Interaction With Template for RGB-T Tracking | Link | Github | |
Rotation-Invariant Transformer for Point Cloud Matching | Link | Github | |
Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm | Link | Github | |
CXTrack: Improving 3D Point Cloud Tracking With Contextual Information | Link | Github | |
CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment | Link | Github | |
Revisiting Residual Networks for Adversarial Robustness | Link | Github | |
Upcycling Models Under Domain and Category Shift | Link | Github | |
Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video | Link | Github | |
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos | Link | Github | |
NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation | Link | Github | |
Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label Classification | Link | Github | |
Detecting Backdoors in Pre-Trained Encoders | Link | Github | |
Equivalent Transformation and Dual Stream Network Construction for Mobile Image Super-Resolution | Link | Github | |
TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision | Link | Github | |
Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container | Link | Github | |
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution | Link | Github | |
Re-Thinking Federated Active Learning Based on Inter-Class Diversity | Link | Github | |
Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction | Link | Github | |
Federated Incremental Semantic Segmentation | Link | Github | |
Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces | Link | Github | |
Learning Common Rationale To Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems | Link | Github | |
Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition | Link | Github | |
Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data | Link | Github | |
Optimization-Inspired Cross-Attention Transformer for Compressive Sensing | Link | Github | |
Context-Based Trit-Plane Coding for Progressive Image Compression | Link | Github | |
Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation | Link | Github | |
Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection | Link | Github | |
GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency | Link | Github | |
BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation | Link | Github | |
On the Effects of Self-Supervision and Contrastive Alignment in Deep Multi-View Clustering | Link | Github | |
Diverse 3D Hand Gesture Prediction From Body Dynamics by Bilateral Hand Disentanglement | Link | Github | |
sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model | Link | Github | |
Reliability in Semantic Segmentation: Are We on the Right Track? | Link | Github | |
Diversity-Measurable Anomaly Detection | Link | Github | |
ABCD: Arbitrary Bitwise Coefficient for De-Quantization | Link | Github | |
Block Selection Method for Using Feature Norm in Out-of-Distribution Detection | Link | Github | |
Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution | Link | Github | |
Two-Shot Video Object Segmentation | Link | Github | |
MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition | Link | Github | |
Extracting Class Activation Maps From Non-Discriminative Features As Well | Link | Github | |
Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception | Link | Github | |
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos | Link | Github | |
Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction | Link | Github | |
Visual Prompt Tuning for Generative Transfer Learning | Link | Github | |
Improved Test-Time Adaptation for Domain Generalization | Link | Github | |
Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring | Link | Github | |
Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition | Link | Github | |
Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis | Link | Github | |
DiGA: Distil To Generalize and Then Adapt for Domain Adaptive Semantic Segmentation | Link | Github | |
Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation Models | Link | Github | |
SliceMatch: Geometry-Guided Aggregation for Cross-View Pose Estimation | Link | Github | |
DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection | Link | Github | |
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks | Link | Github | |
ScarceNet: Animal Pose Estimation With Scarce Annotations | Link | Github | |
Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric | Link | Github | |
Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction | Link | Github | |
Preserving Linear Separability in Continual Learning by Backward Feature Projection | Link | Github | |
Generalizable Implicit Neural Representations via Instance Pattern Composers | Link | Github | |
Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching | Link | Github | |
Progressive Neighbor Consistency Mining for Correspondence Pruning | Link | Github | |
Trainable Projected Gradient Method for Robust Fine-Tuning | Link | Github | |
Independent Component Alignment for Multi-Task Learning | Link | Github | |
Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit | Link | Github | |
DualVector: Unsupervised Vector Font Synthesis With Dual-Part Representation | Link | Github | |
Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images | Link | Github | |
Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection | Link | Github | |
Partial Network Cloning | Link | Github | |
Ultra-High Resolution Segmentation With Ultra-Rich Context: A Novel Benchmark | Link | Github | |
Object Detection With Self-Supervised Scene Adaptation | Link | Github | |
Generative Bias for Robust Visual Question Answering | Link | Github | |
MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation | Link | Github | |
Coreset Sampling From Open-Set for Fine-Grained Self-Supervised Learning | Link | Github | |
Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures | Link | Github | |
SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence | Link | Github | |
B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution | Link | Github | |
High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors | Link | Github | |
DivClust: Controlling Diversity in Deep Clustering | Link | Github | |
Large-Scale Training Data Search for Object Re-Identification | Link | Github | |
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning | Link | Github | |
CREPE: Can Vision-Language Foundation Models Reason Compositionally? | Link | Github | |
Semi-Supervised Domain Adaptation With Source Label Adaptation | Link | Github | |
StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning | Link | Github | |
Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples | Link | Github | |
ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images | Link | Github | |
PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification | Link | Github | |
DIP: Dual Incongruity Perceiving Network for Sarcasm Detection | Link | Github | |
Weakly Supervised Video Representation Learning With Unaligned Text for Sequential Videos | Link | Github | |
PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer | Link | Github | |
Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyframe Based Motion Interpolation | Link | Github | |
VQACL: A Novel Visual Question Answering Continual Learning Setting | Link | Github | |
RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval | Link | Github | |
PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations | Link | Github | |
MixTeacher: Mining Promising Labels With Mixed Scale Teacher for Semi-Supervised Object Detection | Link | Github | |
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | Link | Github | |
Computationally Budgeted Continual Learning: What Does Matter? | Link | Github | |
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers | Link | Github | |
Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network | Link | Github | |
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition | Link | Github | |
Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization | Link | Github | |
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement | Link | Github | |
DistilPose: Tokenized Pose Regression With Heatmap Distillation | Link | Github | |
Bitstream-Corrupted JPEG Images Are Restorable: Two-Stage Compensation and Alignment Framework for Image Restoration | Link | Github | |
DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks | Link | Github | |
BiCro: Noisy Correspondence Rectification for Multi-Modality Data via Bi-Directional Cross-Modal Similarity Consistency | Link | Github | |
Representation Learning for Visual Object Tracking by Masked Appearance Transfer | Link | Github | |
AnchorFormer: Point Cloud Completion From Discriminative Nodes | Link | Github | |
TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation | Link | Github | |
Proximal Splitting Adversarial Attack for Semantic Segmentation | Link | Github | |
NVTC: Nonlinear Vector Transform Coding | Link | Github | |
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose | Link | Github | |
Enhancing the Self-Universality for Transferable Targeted Attacks | Link | Github | |
Randomized Adversarial Training via Taylor Expansion | Link | Github | |
Long Range Pooling for 3D Large-Scale Scene Understanding | Link | Github | |
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Link | Github | |
Federated Domain Generalization With Generalization Adjustment | Link | Github | |
CoMFormer: Continual Learning in Semantic and Panoptic Segmentation | Link | Github | |
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning | Link | Github | |
MIST: Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering | Link | Github | |
STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition | Link | Github | |
An In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions | Link | Github | |
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions | Link | Github | |
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning | Link | Github | |
Long-Tailed Visual Recognition via Self-Heterogeneous Integration With Knowledge Excavation | Link | Github | |
Bias Mimicking: A Simple Sampling Approach for Bias Mitigation | Link | Github | |
OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields | Link | Github | |
Multi-Level Logit Distillation | Link | Github | |
Real-Time Evaluation in Online Continual Learning: A New Hope | Link | Github | |
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction | Link | Github | |
CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network With Large Input | Link | Github | |
Boosting Video Object Segmentation via Space-Time Correspondence Learning | Link | Github | |
Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation | Link | Github | |
TINC: Tree-Structured Implicit Neural Compression | Link | Github | |
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels | Link | Github | |
DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting | Link | Github | |
Large-Capacity and Flexible Video Steganography via Invertible Neural Network | Link | Github | |
VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization | Link | Github | |
LINe: Out-of-Distribution Detection by Leveraging Important Neurons | Link | Github | |
Neural Transformation Fields for Arbitrary-Styled Font Generation | Link | Github | |
Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning | Link | Github | |
Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation | Link | Github | |
Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation | Link | Github | |
FCC: Feature Clusters Compression for Long-Tailed Visual Recognition | Link | Github | |
Neural Vector Fields: Implicit Representation by Explicit Learning | Link | Github | |
Learning Action Changes by Measuring Verb-Adverb Textual Relationships | Link | Github | |
Make Landscape Flatter in Differentially Private Federated Learning | Link | Github | |
Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization | Link | Github | |
Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning | Link | Github | |
Knowledge Combination To Learn Rotated Detection Without Rotated Annotation | Link | Github | |
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias | Link | Github | |
Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion | Link | Github | |
PointCert: Point Cloud Classification With Deterministic Certified Robustness Guarantees | Link | Github | |
Advancing Visual Grounding With Scene Knowledge: Benchmark and Method | Link | Github | |
Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt | Link | Github | |
3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention | Link | Github | |
Self-Supervised 3D Scene Flow Estimation Guided by Superpoints | Link | Github | |
End-to-End Video Matting With Trimap Propagation | Link | Github | |
Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph Refinement | Link | Github | |
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection | Link | Github | |
RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts | Link | Github | |
Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising | Link | Github | |
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Link | Github | |
MAGVLT: Masked Generative Vision-and-Language Transformer | Link | Github | |
Focused and Collaborative Feedback Integration for Interactive Image Segmentation | Link | Github | |
OpenMix: Exploring Outlier Samples for Misclassification Detection | Link | Github | |
Adaptive Data-Free Quantization | Link | Github | |
VideoTrack: Learning To Track Objects via Video Transformer | Link | Github | |
Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module | Link | Github | |
Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation | Link | Github | |
Contrastive Grouping With Transformer for Referring Image Segmentation | Link | Github | |
Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation | Link | Github | |
3D-POP – An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds With Marker-Based Motion Capture | Link | Github | |
PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering | Link | Github | |
Towards Open-World Segmentation of Parts | Link | Github | |
PCR: Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning | Link | Github | |
Quantum Multi-Model Fitting | Link | Github | |
Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment | Link | Github | |
Practical Network Acceleration With Tiny Sets | Link | Github | |
Feature Alignment and Uniformity for Test Time Adaptation | Link | Github | |
Finding Geometric Models by Clustering in the Consensus Space | Link | Github | |
VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation | Link | Github | |
Meta-Learning With a Geometry-Adaptive Preconditioner | Link | Github | |
Divide and Conquer: Answering Questions With Object Factorization and Compositional Reasoning | Link | Github | |
Physical-World Optical Adversarial Attacks on 3D Face Recognition | Link | Github | |
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning | Link | Github | |
On Calibrating Semantic Segmentation Models: Analyses and an Algorithm | Link | Github | |
Binary Latent Diffusion | Link | Github | |
Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! | Link | Github | |
MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection | Link | Github | |
Behavioral Analysis of Vision-and-Language Navigation Agents | Link | Github | |
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding | Link | Github | |
Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation | Link | Github | |
Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections | Link | Github | |
Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection | Link | Github | |
Non-Contrastive Unsupervised Learning of Physiological Signals From Video | Link | Github | |
Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning | Link | Github | |
Markerless Camera-to-Robot Pose Estimation via Self-Supervised Sim-to-Real Transfer | Link | Github | |
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning | Link | Github | |
PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation | Link | Github | |
Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation | Link | Github | |
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning | Link | Github | |
Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification | Link | Github | |
Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning | Link | Github | |
Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices | Link | Github | |
Introducing Competition To Boost the Transferability of Targeted Adversarial Examples Through Clean Feature Mixup | Link | Github | |
Boosting Verified Training for Robust Image Classifications via Abstraction | Link | Github | |
DaFKD: Domain-Aware Federated Knowledge Distillation | Link | Github | |
Resource-Efficient RGBD Aerial Tracking | Link | Github | |
BiasBed – Rigorous Texture Bias Evaluation | Link | Github | |
Progressive Open Space Expansion for Open-Set Model Attribution | Link | Github | |
Harmonious Feature Learning for Interactive Hand-Object Pose Estimation | Link | Github | |
Masked Images Are Counterfactual Samples for Robust Fine-Tuning | Link | Github | |
MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning | Link | Github | |
CFA: Class-Wise Calibrated Fair Adversarial Training | Link | Github | |
Regularization of Polynomial Networks for Image Recognition | Link | Github | |
SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples | Link | Github | |
Depth Estimation From Indoor Panoramas With Neural Scene Representation | Link | Github | |
Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions | Link | Github | |
EfficientSCI: Densely Connected Network With Space-Time Factorization for Large-Scale Video Snapshot Compressive Imaging | Link | Github | |
GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task | Link | Github | |
Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval | Link | Github | |
Towards Practical Plug-and-Play Diffusion Models | Link | Github | |
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-Localization Using Hierarchies and Scenes | Link | Github | |
PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training | Link | Github | |
From Node Interaction To Hop Interaction: New Effective and Scalable Graph Learning Paradigm | Link | Github | |
Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-Shot Learning With Hyperspherical Embeddings | Link | Github | |
Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning | Link | Github | |
Layout-Based Causal Inference for Object Navigation | Link | Github | |
Ensemble-Based Blackbox Attacks on Dense Prediction | Link | Github | |
Adversarial Robustness via Random Projection Filters | Link | Github | |
NLOST: Non-Line-of-Sight Imaging With Transformer | Link | Github | |
Fast Contextual Scene Graph Generation With Unbiased Context Augmentation | Link | Github | |
Event-Based Blurry Frame Interpolation Under Blind Exposure | Link | Github | |
Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning | Link | Github | |
GradMA: A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting | Link | Github | |
Balanced Product of Calibrated Experts for Long-Tailed Recognition | Link | Github | |
Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions | Link | Github | |
Annealing-Based Label-Transfer Learning for Open World Object Detection | Link | Github | |
Make-a-Story: Visual Memory Conditioned Consistent Story Generation | Link | Github | |
Revisiting Prototypical Network for Cross Domain Few-Shot Learning | Link | Github | |
Perception and Semantic Aware Regularization for Sequential Confidence Calibration | Link | Github | |
Semi-Weakly Supervised Object Kinematic Motion Prediction | Link | Github | |
Image Quality-Aware Diagnosis via Meta-Knowledge Co-Embedding | Link | Github | |
MaLP: Manipulation Localization Using a Proactive Scheme | Link | Github | |
Adjustment and Alignment for Unbiased Open Set Domain Adaptation | Link | Github | |
Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions | Link | Github | |
Sliced Optimal Partial Transport | Link | Github | |
HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions | Link | Github | |
Trap Attention: Monocular Depth Estimation With Manual Traps | Link | Github | |
GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection | Link | Github | |
Learning From Noisy Labels With Decoupled Meta Label Purifier | Link | Github | |
Local Connectivity-Based Density Estimation for Face Clustering | Link | Github | |
Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography | Link | Github | |
Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task Using Artificial Neural Networks | Link | Github | |
A Probabilistic Framework for Lifelong Test-Time Adaptation | Link | Github | |
PointCMP: Contrastive Mask Prediction for Self-Supervised Learning on Point Cloud Videos | Link | Github | |
Deep Polarization Reconstruction With PDAVIS Events | Link | Github | |
Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting | Link | Github | |
Probabilistic Debiasing of Scene Graphs | Link | Github | |
PMR: Prototypical Modal Rebalance for Multimodal Learning | Link | Github | |
Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning | Link | Github | |
HyperCUT: Video Sequence From a Single Blurry Image Using Unsupervised Ordering | Link | Github | |
Document Image Shadow Removal Guided by Color-Aware Background | Link | Github | |
DLBD: A Self-Supervised Direct-Learned Binary Descriptor | Link | Github | |
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning | Link | Github | |
Learning Debiased Representations via Conditional Attribute Interpolation | Link | Github | |
Bayesian Posterior Approximation With Stochastic Ensembles | Link | Github | |
Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning | Link | Github | |
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning | Link | Github | |
Noisy Correspondence Learning With Meta Similarity Correction | Link | Github | |
RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases | Link | Github | |
Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval | Link | Github | |
BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration | Link | Github | |
Are Data-Driven Explanations Robust Against Out-of-Distribution Data? | Link | Github | |
Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection | Link | Github | |
Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning | Link | Github | |
High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition | Link | Github | |
A Bag-of-Prototypes Representation for Dataset-Level Applications | Link | Github | |
Neural Dependencies Emerging From Learning Massive Categories | Link | Github | |
Learning With Noisy Labels via Self-Supervised Adversarial Noisy Masking | Link | Github | |
CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset | Link | Github | |
Balanced Energy Regularization Loss for Out-of-Distribution Detection | Link | Github | |
Being Comes From Not-Being: Open-Vocabulary Text-to-Motion Generation With Wordless Training | Link | Github | |
Masked Representation Learning for Domain Generalized Stereo Matching | Link | Github | |
Where Is My Spot? Few-Shot Image Generation via Latent Subspace Optimization | Link | Github | |
Genie: Show Me the Data for Quantization | Link | Github | |
G-MSM: Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors | Link | Github | |
TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers | Link | Github | |
Hierarchical Prompt Learning for Multi-Task Learning | Link | Github | |
Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising | Link | Github | |
Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration | Link | Github | |
Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization | Link | Github | |
Towards Effective Visual Representations for Partial-Label Learning | Link | Github | |
Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation | Link | Github | |
Black-Box Sparse Adversarial Attack via Multi-Objective Optimisation | Link | Github | |
Spatio-Temporal Pixel-Level Contrastive Learning-Based Source-Free Domain Adaptation for Video Semantic Segmentation | Link | Github | |
Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint | Link | Github | |
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval | Link | Github | |
Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder | Link | Github | |
Towards Bridging the Performance Gaps of Joint Energy-Based Models | Link | Github | |
Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection | Link | Github | |
AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection | Link | Github | |
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning | Link | Github | |
X-Pruner: eXplainable Pruning for Vision Transformers | Link | Github | |
Efficient Mask Correction for Click-Based Interactive Image Segmentation | Link | Github | |
Dynamic Aggregated Network for Gait Recognition | Link | Github | |
Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery | Link | Github | |
Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor | Link | Github | |
Adaptive Plasticity Improvement for Continual Learning | Link | Github | |
Jedi: Entropy-Based Localization and Removal of Adversarial Patches | Link | Github | |
BAAM: Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention Module and Attention-Guided Modeling | Link | Github | |
Leverage Interactive Affinity for Affordance Learning | Link | Github | |
Evolved Part Masking for Self-Supervised Learning | Link | Github | |
CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning | Link | Github | |
High-Fidelity Event-Radiance Recovery via Transient Event Frequency | Link | Github | |
Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures | Link | Github | |
Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns | Link | Github | |
Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for Semi-Weakly Segmentation in Expert-Driven Domains | Link | Github | |
A Soma Segmentation Benchmark in Full Adult Fly Brain | Link | Github | |
KD-DLGAN: Data Limited Image Generation via Knowledge Distillation | Link | Github | |
PIVOT: Prompting for Video Continual Learning | Link | Github | |
Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks | Link | Github | |
L-CoIns: Language-Based Colorization With Instance Awareness | Link | Github | |
Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph | Link | Github | |
Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration | Link | Github | |
Dense Network Expansion for Class Incremental Learning | Link | Github | |
Unsupervised Intrinsic Image Decomposition With LiDAR Intensity | Link | Github | |
Neuralizer: General Neuroimage Analysis Without Re-Training | Link | Github | |
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | Link | Github | |
Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling | Link | Github | |
Modular Memorability: Tiered Representations for Video Memorability Prediction | Link | Github | |
Federated Learning With Data-Agnostic Distribution Fusion | Link | Github | |
Four-View Geometry With Unknown Radial Distortion | Link | Github | |
Manipulating Transfer Learning for Property Inference | Link | Github | |
BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image | Link | Github | |
3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud | Link | Github | |
Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks | Link | Github | |
Towards Professional Level Crowd Annotation of Expert Domain Data | Link | Github | |
Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation | Link | Github | |
Similarity Metric Learning for RGB-Infrared Group Re-Identification | Link | Github | |
On the Difficulty of Unpaired Infrared-to-Visible Video Translation: Fine-Grained Content-Rich Patches Transfer | Link | Github | |
Camouflaged Instance Segmentation via Explicit De-Camouflaging | Link | Github | |
Global Vision Transformer Pruning With Hessian-Aware Saliency | Link | Github | |
DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation | Link | Github | |
ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer | Link | Github | |
AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning | Link | Github | |
Simulated Annealing in Early Layers Leads to Better Generalization | Link | Github | |
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding | Link | Github | |
Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation | Link | Github | |
Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation | Link | Github | |
MEDIC: Remove Model Backdoors via Importance Driven Cloning | Link | Github | |
Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing With Non-Learnable Primitives | Link | Github | |
Adaptive Graph Convolutional Subspace Clustering | Link | Github | |
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language | Link | Github | |
Correlational Image Modeling for Self-Supervised Visual Pre-Training | Link | Github | |
Text With Knowledge Graph Augmented Transformer for Video Captioning | Link | Github | |
Panoptic Video Scene Graph Generation | Link | Github | |
DartBlur: Privacy Preservation With Detection Artifact Suppression | Link | Github | |
IDGI: A Framework To Eliminate Explanation Noise From Integrated Gradients | Link | Github | |
Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity | Link | Github | |
Vector Quantization With Self-Attention for Quality-Independent Representation Learning | Link | Github | |
Privacy-Preserving Representations Are Not Enough: Recovering Scene Content From Camera Poses | Link | Github | |
DETRs With Hybrid Matching | Link | Github | |
GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods | Link | Github | |
AltFreezing for More General Video Face Forgery Detection | Link | Github | |
Heterogeneous Continual Learning | Link | Github | |
EMT-NAS: Transferring Architectural Knowledge Between Tasks From Different Datasets | Link | Github | |
Efficient Movie Scene Detection Using State-Space Transformers | Link | Github | |
Private Image Generation With Dual-Purpose Auxiliary Classifier | Link | Github | |
BASiS: Batch Aligned Spectral Embedding Space | Link | Github | |
A Large-Scale Robustness Analysis of Video Action Recognition Models | Link | Github | |
Neumann Network With Recursive Kernels for Single Image Defocus Deblurring | Link | Github | |
Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning | Link | Github | |
ToThePoint: Efficient Contrastive Learning of 3D Point Clouds via Recycling | Link | Github | |
Self-Supervised Blind Motion Deblurring With Deep Expectation Maximization | Link | Github | |
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning | Link | Github | |
DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling | Link | Github | |
Patch-Craft Self-Supervised Training for Correlated Image Denoising | Link | Github | |
Learning Decorrelated Representations Efficiently Using Fast Fourier Transform | Link | Github | |
AstroNet: When Astrocyte Meets Artificial Neural Network | Link | Github | |
PanoSwin: A Pano-Style Swin Transformer for Panorama Understanding | Link | Github | |
Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge | Link | Github | |
Polarized Color Image Denoising | Link | Github | |