Computer Vision and Pattern Recognition
Authors and titles for recent submissions
Mon, 3 Jun 2024
- [1] arXiv:2405.21075 [pdf, other]
  Title: Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
  Authors: Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun
  Comments: Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [2] arXiv:2405.21074 [pdf, other]
  Title: Latent Intrinsics Emerge from Training to Relight
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [3] arXiv:2405.21070 [pdf, other]
  Title: Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [4] arXiv:2405.21066 [pdf, other]
  Title: Mixed Diffusion for 3D Indoor Scene Synthesis
  Authors: Siyi Hu, Diego Martin Arroyo, Stephanie Debats, Fabian Manhardt, Luca Carlone, Federico Tombari
  Comments: 19 pages, 14 figures. Under review. Code to be released at: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [5] arXiv:2405.21059 [pdf, other]
  Title: Unified Directly Denoising for Both Variance Preserving and Variance Exploding Diffusion Models
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [6] arXiv:2405.21050 [pdf, other]
  Title: Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models
  Authors: Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris N. Metaxas
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [7] arXiv:2405.21048 [pdf, other]
  Title: Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
  Comments: 22 pages, 14 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [8] arXiv:2405.21016 [pdf, other]
  Title: MpoxSLDNet: A Novel CNN Model for Detecting Monkeypox Lesions and Performance Comparison with Pre-trained Models
  Authors: Fatema Jannat Dihan, Saydul Akbar Murad, Abu Jafar Md Muzahid, K. M. Aslam Uddin, Mohammed J.F. Alenazi, Anupam Kumar Bairagi, Sujit Biswas
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [9] arXiv:2405.21013 [pdf, other]
  Title: StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond
  Authors: Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding, Jingdong Wang
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [10] arXiv:2405.20991 [pdf, other]
  Title: Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
  Comments: IEEE Intelligent Vehicles Symposium (IV) 2024
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [11] arXiv:2405.20987 [pdf, other]
  Title: Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging
  Comments: This paper is accepted at the 35th IEEE Irish Signals and Systems Conference (ISSC 2024)
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [12] arXiv:2405.20985 [pdf, other]
  Title: DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [13] arXiv:2405.20980 [pdf, other]
  Title: Neural Gaussian Scale-Space Fields
  Authors: Felix Mujkanovic, Ntumba Elie Nsampi, Christian Theobalt, Hans-Peter Seidel, Thomas Leimkühler
  Comments: 15 pages; SIGGRAPH 2024; project page at this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [14] arXiv:2405.20906 [pdf, ps, other]
  Title: Enhancing Vision Models for Text-Heavy Content Understanding and Interaction
  Comments: 5 pages, 4 figures (including 1 graph)
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [15] arXiv:2405.20892 [pdf, other]
  Title: MALT: Multi-scale Action Learning Transformer for Online Action Detection
  Comments: 8 pages, 3 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [16] arXiv:2405.20881 [pdf, other]
  Title: S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion
  Comments: NeurIPS, under review
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [17] arXiv:2405.20876 [pdf, other]
  Title: Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study
  Comments: 11 pages, 3 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [18] arXiv:2405.20868 [pdf, other]
  Title: Responsible AI for Earth Observation
  Authors: Pedram Ghamisi, Weikang Yu, Andrea Marinoni, Caroline M. Gevaert, Claudio Persello, Sivasakthy Selvakumaran, Manuela Girotto, Benjamin P. Horton, Philippe Rufin, Patrick Hostert, Fabio Pacifici, Peter M. Atkinson
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
- [19] arXiv:2405.20867 [pdf, other]
  Title: Automatic Channel Pruning for Multi-Head Attention
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)
- [20] arXiv:2405.20853 [pdf, other]
  Title: MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
  Authors: Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [21] arXiv:2405.20851 [pdf, other]
  Title: MegActor: Harness the Power of Raw Video for Vivid Portrait Animation
  Authors: Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [22] arXiv:2405.20834 [pdf, other]
  Title: Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
  Authors: Cheng Tan, Jingxuan Wei, Linzhuang Sun, Zhangyang Gao, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li
  Comments: Under review
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [23] arXiv:2405.20829 [pdf, other]
  Title: Rethinking Open-World Semi-Supervised Learning: Distribution Mismatch and Inductive Inference
  Comments: CVPR Workshop on Computer Vision in the Wild (CVinW), 2024
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [24] arXiv:2405.20810 [pdf, other]
  Title: Context-aware Difference Distilling for Multi-change Captioning
  Comments: Accepted by ACL 2024 main conference (long paper)
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [25] arXiv:2405.20797 [pdf, other]
  Title: Ovis: Structural Embedding Alignment for Multimodal Large Language Model
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [26] arXiv:2405.20795 [pdf, other]
  Title: InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [27] arXiv:2405.20791 [pdf, other]
  Title: GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [28] arXiv:2405.20786 [pdf, other]
  Title: Stratified Avatar Generation from Sparse Observations
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [29] arXiv:2405.20764 [pdf, other]
  Title: CoMoFusion: Fast and High-quality Fusion of Infrared and Visible Image with Consistency Model
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [30] arXiv:2405.20750 [pdf, other]
  Title: Diffusion Models Are Innate One-Step Generators
  Comments: 9 pages, 4 figures and 4 tables on the main contents
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [31] arXiv:2405.20743 [pdf, other]
  Title: Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes
  Comments: 15 pages, 3 figures, 5 tables
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [32] arXiv:2405.20735 [pdf, other]
  Title: Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images
  Comments: © 2024 IEEE. Accepted in 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2024
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [33] arXiv:2405.20729 [pdf, other]
  Title: Extreme Point Supervised Instance Segmentation
  Comments: CVPR 2024 Accepted
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [34] arXiv:2405.20721 [pdf, other]
  Title: ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [35] arXiv:2405.20720 [pdf, other]
  Title: Power of Cooperative Supervision: Multiple Teachers Framework for Enhanced 3D Semi-Supervised Object Detection
  Comments: under review
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [36] arXiv:2405.20717 [pdf, other]
  Title: Cyclic image generation using chaotic dynamics
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
- [37] arXiv:2405.20711 [pdf, other]
  Title: Revisiting Mutual Information Maximization for Generalized Category Discovery
  Comments: Preprint version
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [38] arXiv:2405.20687 [pdf, other]
  Title: Conditioning GAN Without Training Dataset
  Authors: Kidist Amde Mekonnen
  Comments: 5 pages, 2 figures, Part of my MSc project course, School Project Course 2022
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
- [39] arXiv:2405.20675 [pdf, other]
  Title: Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling
  Comments: 7 pages, 11 figures, ELLIS Doctoral Symposium 2023 in Helsinki, Finland
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
- [40] arXiv:2405.20674 [pdf, other]
  Title: 4Diffusion: Multi-view Video Diffusion Model for 4D Generation
  Comments: Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [41] arXiv:2405.20672 [pdf, other]
  Title: Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations
  Comments: 22 pages, 15 figures (including appendix)
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [42] arXiv:2405.20669 [pdf, other]
  Title: Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [43] arXiv:2405.20666 [pdf, other]
  Title: MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
  Comments: Accepted by TCSVT 2024
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [44] arXiv:2405.20650 [pdf, other]
  Title: GenMix: Combining Generative and Mixture Data Augmentation for Medical Image Classification
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [45] arXiv:2405.20648 [pdf, other]
  Title: Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [46] arXiv:2405.20643 [pdf, other]
  Title: Learning Gaze-aware Compositional GAN
  Authors: Nerea Aranjuelo, Siyu Huang, Ignacio Arganda-Carreras, Luis Unzueta, Oihana Otaegui, Hanspeter Pfister, Donglai Wei
  Comments: Accepted by ETRA 2024 as Full paper, and as journal paper in Proceedings of the ACM on Computer Graphics and Interactive Techniques
  Journal-ref: Proceedings of the ACM on Computer Graphics and Interactive Techniques, 2024
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [47] arXiv:2405.20633 [pdf, other]
  Title: Action-OOD: An End-to-End Skeleton-Based Model for Robust Out-of-Distribution Human Action Detection
  Comments: Under consideration at Computer Vision and Image Understanding
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [48] arXiv:2405.20614 [pdf, other]
  Title: EPIDetect: Video-based convulsive seizure detection in chronic epilepsy mouse model for anti-epilepsy drug screening
  Authors: Junming Ren, Zhoujian Xiao, Yujia Zhang, Yujie Yang, Ling He, Ezra Yoon, Stephen Temitayo Bello, Xi Chen, Dapeng Wu, Micky Tortorella, Jufang He
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [49] arXiv:2405.20610 [pdf, other]
  Title: Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation
  Comments: 14 pages, 5 figures, submitted to IEEE TPAMI. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [50] arXiv:2405.20607 [pdf, other]
  Title: Textual Inversion and Self-supervised Refinement for Radiology Report Generation
  Authors: Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi Zhang
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [51] arXiv:2405.20606 [pdf, other]
  Title: Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
- [52] arXiv:2405.20596 [pdf, other]
  Title: Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation
  Comments: 10 pages; Accepted by NeurIPS 2023
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [53] arXiv:2405.20584 [pdf, other]
  Title: Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
  Comments: Under review
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [54] arXiv:2405.20510 [pdf, other]
  Title: Physically Compatible 3D Object Modeling from a Single Image
  Authors: Minghao Guo, Bohan Wang, Pingchuan Ma, Tianyuan Zhang, Crystal Elaine Owens, Chuang Gan, Joshua B. Tenenbaum, Kaiming He, Wojciech Matusik
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [55] arXiv:2405.20494 [pdf, other]
  Title: Slight Corruption in Pre-training Data Makes Better Diffusion Models
  Authors: Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj
  Comments: 50 pages, 33 figures, 4 tables
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [56] arXiv:2405.20469 [pdf, other]
  Title: Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images
  Comments: Accepted at CVPR 2024 Workshop: SyntaGen-Harnessing Generative Models for Synthetic Visual Datasets. Project page at this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [57] arXiv:2405.20465 [pdf, other]
  Title: ENTIRe-ID: An Extensive and Diverse Dataset for Person Re-Identification
  Comments: 5 pages, 2024 18th International Conference on Automatic Face and Gesture Recognition (FG)
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [58] arXiv:2405.20462 [pdf, other]
  Title: Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining
  Comments: 16 pages, 9 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [59] arXiv:2405.20459 [pdf, other]
  Title: On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
  Comments: 31 pages, 8 figures
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [60] arXiv:2405.20443 [pdf, ps, other]
  Title: P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [61] arXiv:2405.20364 [pdf, other]
  Title: Learning 3D Robotics Perception using Inductive Priors
  Authors: Muhammad Zubair Irshad
  Comments: Georgia Tech Ph.D. Thesis, December 2023. For more details: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [62] arXiv:2405.20363 [pdf, other]
  Title: LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild
  Comments: 7 pages, 3 figures, 5 tables, CVPR 2024 Workshop on Computer Vision in the Wild
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [63] arXiv:2405.21056 (cross-list from cs.RO) [pdf, other]
  Title: An Organic Weed Control Prototype using Directed Energy and Deep Learning
  Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [64] arXiv:2405.21022 (cross-list from cs.CL) [pdf, other]
  Title: You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
  Comments: Technical report. Yiran Zhong is the corresponding author. The code is available at this https URL
  Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [65] arXiv:2405.20986 (cross-list from cs.LG) [pdf, other]
  Title: Uncertainty Quantification for Bird's Eye View Semantic Segmentation: Methods and Benchmarks
  Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [66] arXiv:2405.20981 (cross-list from cs.AI) [pdf, other]
  Title: Generative Adversarial Networks in Ultrasound Imaging: Extending Field of View Beyond Conventional Limits
  Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [67] arXiv:2405.20971 (cross-list from cs.LG) [pdf, other]
  Title: Amortizing intractable inference in diffusion models for vision, language, and control
  Authors: Siddarth Venkatraman, Moksh Jain, Luca Scimeca, Minsu Kim, Marcin Sendera, Mohsin Hasan, Luke Rowe, Sarthak Mittal, Pablo Lemos, Emmanuel Bengio, Alexandre Adam, Jarrid Rector-Brooks, Yoshua Bengio, Glen Berseth, Nikolay Malkin
  Comments: Code: this https URL
  Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [68] arXiv:2405.20915 (cross-list from cs.LG) [pdf, other]
  Title: Fast yet Safe: Early-Exiting with Risk Control
  Authors: Metod Jazbec, Alexander Timans, Tin Hadži Veljković, Kaspar Sakmann, Dan Zhang, Christian A. Naesseth, Eric Nalisnick
  Comments: 25 pages, 11 figures, 4 tables (incl. appendix)
  Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- [69] arXiv:2405.20910 (cross-list from physics.app-ph) [pdf, other]
  Title: Predicting ptychography probe positions using single-shot phase retrieval neural network
  Subjects: Applied Physics (physics.app-ph); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Data Analysis, Statistics and Probability (physics.data-an)
- [70] arXiv:2405.20838 (cross-list from cs.LG) [pdf, other]
  Title: einspace: Searching for Neural Architectures from Fundamental Operations
  Authors: Linus Ericsson, Miguel Espinosa, Chenhongyi Yang, Antreas Antoniou, Amos Storkey, Shay B. Cohen, Steven McDonagh, Elliot J. Crowley
  Comments: Project page at this https URL
  Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- [71] arXiv:2405.20771 (cross-list from cs.CR) [pdf, other]
  Title: Towards Black-Box Membership Inference Attack for Diffusion Models
  Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [72] arXiv:2405.20759 (cross-list from cs.LG) [pdf, other]
  Title: Information Theoretic Text-to-Image Alignment
  Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [73] arXiv:2405.20725 (cross-list from cs.AI) [pdf, other]
  Title: GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search
  Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [74] arXiv:2405.20719 (cross-list from cs.AI) [pdf, other]
  Title: Climate Variable Downscaling with Conditional Normalizing Flows
  Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)
- [75] arXiv:2405.20693 (cross-list from eess.IV) [pdf, other]
  Title: R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction
  Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [76] arXiv:2405.20685 (cross-list from cs.LG) [pdf, other]
  Title: Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space
  Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [77] arXiv:2405.20628 (cross-list from cs.AI) [pdf, other]
  Title: ToxVidLLM: A Multimodal LLM-based Framework for Toxicity Detection in Code-Mixed Videos
  Comments: ACL Findings 2024
  Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [78] arXiv:2405.20605 (cross-list from cs.LG) [pdf, other]
  Title: Searching for internal symbols underlying deep learning
  Comments: 10 pages, 7 figures, 3 tables and Appendix
  Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [79] arXiv:2405.20559 (cross-list from physics.optics) [pdf, other]
  Title: Universal evaluation and design of imaging systems using information estimation
  Subjects: Optics (physics.optics); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Image and Video Processing (eess.IV); Data Analysis, Statistics and Probability (physics.data-an)
- [80] arXiv:2405.20525 (cross-list from cs.ET) [pdf, other]
  Title: Comparing Quantum Annealing and Spiking Neuromorphic Computing for Sampling Binary Sparse Coding QUBO Problems
  Subjects: Emerging Technologies (cs.ET); Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)
- [81] arXiv:2405.20513 (cross-list from cs.LG) [pdf, other]
  Title: Deep Modeling of Non-Gaussian Aleatoric Uncertainty
  Authors: Aastha Acharya, Caleb Lee, Marissa D'Alonzo, Jared Shamwell, Nisar R. Ahmed, Rebecca Russell
  Comments: 8 pages, 7 figures
  Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [82] arXiv:2405.20501 (cross-list from cs.RO) [pdf, other]
  Title: ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane
  Comments: 8 pages, 14 figures and charts
  Journal-ref: In AAMAS (pp. 1514-1523) 2023
  Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [83] arXiv:2405.20470 (cross-list from cs.RO) [pdf, other]
  Title: STHN: Deep Homography Estimation for UAV Thermal Geo-localization with Satellite Imagery
  Comments: 8 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
  Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [84] arXiv:2405.20431 (cross-list from cs.LG) [pdf, other]
  Title: Exploring the Practicality of Federated Learning: A Survey Towards the Communication Perspective
  Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [85] arXiv:2405.20420 (cross-list from cs.LG) [pdf, other]
  Title: Back to the Basics on Predicting Transfer Performance
  Comments: 15 pages, 3 figures, 2 tables
  Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [86] arXiv:2405.20413 (cross-list from cs.CR) [pdf, other]
  Title: Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
  Comments: 20 pages
  Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [87] arXiv:2405.20392 (cross-list from eess.IV) [pdf, other]
  Title: Can No-Reference Quality-Assessment Methods Serve as Perceptual Losses for Super-Resolution?
  Comments: 4 pages, 3 figures. The first two authors contributed equally to this work
  Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [88] arXiv:2405.20380 (cross-list from cs.AI) [pdf, other]
  Title: Gradient Inversion of Federated Diffusion Models
  Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- [89] arXiv:2405.20355 (cross-list from cs.NE) [pdf, other]
  Title: Enhancing Adversarial Robustness in SNNs with Sparse Gradients
  Comments: accepted by ICML 2024
  Subjects: Neural and Evolutionary Computing (cs.NE); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Fri, 31 May 2024
- [90] arXiv:2405.20343 [pdf, other]
  Title: Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
  Authors: Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma
  Comments: Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [91] arXiv:2405.20340 [pdf, other]
  Title: MotionLLM: Understanding Human Behaviors from Human Motions and Videos
  Comments: MotionLLM version 1.0, project page see this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [92] arXiv:2405.20339 [pdf, other]
  Title: Visual Perception by Large Language Model's Weights
  Authors: Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [93] arXiv:2405.20337 [pdf, other]
  Title: OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
  Comments: Code is available at: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [94] arXiv:2405.20336 [pdf, other]
  Title: RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
  Authors: Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan
  Comments: Project website: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [95] arXiv:2405.20334 [pdf, other]
  Title: VividDream: Generating 3D Scene with Ambient Dynamics
  Comments: Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [96] arXiv:2405.20333 [pdf, other]
  Title: SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos
  Comments: 15 pages, 7 figures, 9 tables, 1 video. Supplementary video available at: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [97] arXiv:2405.20330 [pdf, other]
  Title: 4DHands: Reconstructing Interactive Hands in 4D with Transformers
  Authors: Dixuan Lin, Yuxiang Zhang, Mengcheng Li, Yebin Liu, Wei Jing, Qi Yan, Qianying Wang, Hongwen Zhang
  Comments: More demo videos can be seen at our project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- [98] arXiv:2405.20327 [pdf, other]
  Title: GECO: Generative Image-to-3D within a SECOnd
  Comments: Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [99] arXiv:2405.20325 [pdf, other]
  Title: MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion
  Authors: Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang
  Comments: 23 pages, 18 figures. Project page at this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [100] arXiv:2405.20324 [pdf, other]
  Title: Don't drop your samples! Coherence-aware training benefits Conditional diffusion
  Comments: Accepted at CVPR 2024 as a Highlight. Project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [101] arXiv:2405.20323 [pdf, other]
  Title: $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving
  Authors: Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang
  Comments: Code is available at: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [102] arXiv:2405.20320 [pdf, other]
  Title: Improving the Training of Rectified Flows
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [103] arXiv:2405.20319 [pdf, other]
  Title: ParSEL: Parameterized Shape Editing with Language
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Symbolic Computation (cs.SC)
- [104] arXiv:2405.20310 [pdf, other]
  Title: A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction
  Comments: preprint, under review
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [105] arXiv:2405.20305 [pdf, other]
  Title: Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
  Comments: CVPR 2024
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [106] arXiv:2405.20299 [pdf, other]
  Title: Scaling White-Box Transformers for Vision
  Comments: project page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [107] arXiv:2405.20283 [pdf, other]
  Title: TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [108] arXiv:2405.20282 [pdf, other]
  Title: SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [109] arXiv:2405.20279 [pdf, other]
  Title: CV-VAE: A Compatible Video VAE for Latent Generative Video Models
  Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan
  Comments: Project Page: this https URL
  Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
- [110] arXiv:2405.20259 [pdf, other]
  Title: FaceMixup: Enhancing Facial Expression Recognition through Mixed Face Regularization
  Comments: 29 pages, 9 figures, paper is under review on journal
  Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [111] arXiv:2405.20230 [pdf, other]
-
Title: Feature Fusion for Improved Classification: Combining Dempster-Shafer Theory and Multiple CNN ArchitecturesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [112] arXiv:2405.20224 [pdf, other]
-
Title: EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry ImagesComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [113] arXiv:2405.20222 [pdf, other]
-
Title: MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion ModelSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [114] arXiv:2405.20216 [pdf, other]
-
Title: Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI FeedbackComments: 28 pages, 18 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [115] arXiv:2405.20188 [pdf, other]
-
Title: SPARE: Symmetrized Point-to-Plane Distance for Robust Non-Rigid RegistrationSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [116] arXiv:2405.20161 [pdf, other]
-
Title: Landslide mapping from Sentinel-2 imagery through change detectionComments: to be published in IEEE IGARSS 2024 conference proceedingsSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [117] arXiv:2405.20155 [pdf, other]
-
Title: MotionDreamer: Zero-Shot 3D Mesh Animation from Video Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [118] arXiv:2405.20152 [pdf, other]
-
Title: Uncovering Bias in Large Vision-Language Models at Scale with CounterfactualsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [119] arXiv:2405.20141 [pdf, other]
-
Title: OpenDAS: Domain Adaptation for Open-Vocabulary SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [120] arXiv:2405.20136 [pdf, other]
-
Title: A Multimodal Dangerous State Recognition and Early Warning System for Elderly with Intermittent DementiaComments: 13 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [121] arXiv:2405.20126 [pdf, other]
-
Title: Federated and Transfer Learning for Cancer Detection Based on Image AnalysisSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [122] arXiv:2405.20117 [pdf, other]
-
Title: Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark DetectionComments: 12 pages, 13 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [123] arXiv:2405.20112 [pdf, other]
-
Title: RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [124] arXiv:2405.20109 [pdf, other]
-
Title: FMARS: Annotating Remote Sensing Images for Disaster Management using Foundation ModelsAuthors: Edoardo Arnaudo, Jacopo Lungo Vaschetti, Lorenzo Innocenti, Luca Barco, Davide Lisi, Vanina Fissore, Claudio RossiComments: Accepted at IGARSS 2024, 5 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [125] arXiv:2405.20093 [pdf, other]
-
Title: Rapid Wildfire Hotspot Detection Using Self-Supervised Learning on Temporal Remote Sensing DataSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [126] arXiv:2405.20091 [pdf, other]
-
Title: Visual Attention Analysis in Online LearningComments: Accepted in CEDI 2024 (VII Congreso Español de Informática), A Coruña, SpainSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [127] arXiv:2405.20090 [pdf, other]
-
Title: Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [128] arXiv:2405.20084 [pdf, other]
-
Title: Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation ApproachComments: 15 pages (with references)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [129] arXiv:2405.20081 [pdf, other]
-
Title: NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language ModelsAuthors: Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie WangComments: 14 pages, 5 figures with supplementary materialSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [130] arXiv:2405.20072 [pdf, other]
-
Title: Faces of the Mind: Unveiling Mental Health States Through Facial Expressions in 11,427 AdolescentsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [131] arXiv:2405.20067 [pdf, other]
-
Title: N-Dimensional Gaussians for Fitting of High Dimensional FunctionsComments: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [132] arXiv:2405.20062 [pdf, other]
-
Title: Can the accuracy bias by facial hairstyle be reduced through balancing the training data?Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [133] arXiv:2405.20058 [pdf, other]
-
Title: Enhancing Plant Disease Detection: A Novel CNN-Based Approach with Tensor Subspace Learning and HOWSVD-MDAuthors: Abdelmalik Ouamane, Ammar Chouchane, Yassine Himeur, Abderrazak Debilou, Abbes Amira, Shadi Atalla, Wathiq Mansoor, Hussain Al AhmadComments: 17 pages, 9 figures and 8 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [134] arXiv:2405.20044 [pdf, other]
-
Title: A Point-Neighborhood Learning Framework for Nasal Endoscope Image SegmentationAuthors: Pengyu Jie, Wanquan Liu, Chenqiang Gao, Yihui Wen, Rui He, Pengcheng Li, Jintao Zhang, Deyu MengComments: 10 pages, 10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [135] arXiv:2405.20030 [pdf, other]
-
Title: EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric VideosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [136] arXiv:2405.20025 [pdf, other]
-
Title: From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehaveComments: CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling In conjunction with Computer Vision and Pattern Recognition 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [137] arXiv:2405.20008 [pdf, other]
-
Title: Sharing Key Semantics in Transformer Makes Efficient Image RestorationAuthors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Ming-Hsuan Yang, Nicu SebeComments: 9 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [138] arXiv:2405.19996 [pdf, other]
-
Title: DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the WildSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [139] arXiv:2405.19990 [pdf, other]
-
Title: DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-WorldSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [140] arXiv:2405.19957 [pdf, other]
-
Title: PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian SplattingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [141] arXiv:2405.19949 [pdf, other]
-
Title: Hyper-Transformer for Amodal CompletionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [142] arXiv:2405.19943 [pdf, other]
-
Title: Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution WeightingComments: AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [143] arXiv:2405.19931 [pdf, other]
-
Title: Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural NetworksComments: Preprint. Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [144] arXiv:2405.19921 [pdf, other]
-
Title: MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and MotionSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [145] arXiv:2405.19917 [pdf, other]
-
Title: Multimodal Cross-Domain Few-Shot Learning for Egocentric Action RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [146] arXiv:2405.19914 [pdf, other]
-
Title: Towards RGB-NIR Cross-modality Image Registration and BeyondAuthors: Huadong Li, Shichao Dong, Jin Wang, Rong Fu, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe JiComments: 18 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [147] arXiv:2405.19899 [pdf, other]
-
Title: Open-Set Domain Adaptation for Semantic SegmentationComments: 14 pages, 5 figures, 13 tables, CVPR 2024 PosterSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [148] arXiv:2405.19882 [pdf, other]
-
Title: PixOOD: Pixel-Level Out-of-Distribution DetectionComments: under review at ECCV 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [149] arXiv:2405.19876 [pdf, other]
-
Title: IReNe: Instant Recoloring in Neural Radiance FieldsAuthors: Alessio Mazzucchelli, Adrian Garcia-Garcia, Elena Garces, Fernando Rivas-Manzaneque, Francesc Moreno-Noguer, Adrian Penate-SanchezSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [150] arXiv:2405.19861 [pdf, other]
-
Title: Hierarchical Object-Centric Learning with Capsule NetworksAuthors: Riccardo RenzulliComments: Updated version of my PhD thesis (Nov 2023), with fixed typos. Will keep updated as new typos are discovered!Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [151] arXiv:2405.19854 [pdf, other]
-
Title: RTGen: Generating Region-Text Pairs for Open-Vocabulary Object DetectionComments: Technical reportSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [152] arXiv:2405.19833 [pdf, other]
-
Title: KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree RotationComments: Accepted by CVPR24Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [153] arXiv:2405.19822 [pdf, other]
-
Title: Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline MethodologyAuthors: Frank A. Ruis, Alma M. Liezenga, Friso G. Heslinga, Luca Ballan, Thijs A. Eker, Richard J. M. den Hollander, Martin C. van Leeuwen, Judith Dijk, Wyke HuizingaComments: Submitted to and presented at SPIE Defense + Commercial Sensing 2024, 13 pages, 4 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
- [154] arXiv:2405.19819 [pdf, other]
-
Title: Gated Fields: Learning Scene Reconstruction from Gated VideosSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [155] arXiv:2405.19818 [pdf, other]
-
Title: WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale BenchmarkComments: GitHub project: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [156] arXiv:2405.19817 [pdf, ps, other]
-
Title: Performance Examination of Symbolic Aggregate Approximation in IoT ApplicationsComments: Embedded World Conference, Nuremberg, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [157] arXiv:2405.19794 [pdf, other]
-
Title: Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree CameraComments: CVPR2024 EgoVis WorkshopSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [158] arXiv:2405.19783 [pdf, other]
-
Title: Instruction-Guided Visual MaskingAuthors: Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan ZhanComments: preprint, 21 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [159] arXiv:2405.19775 [pdf, other]
-
Title: Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion NetworkComments: 11 pages, 11 figures, to be published in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [160] arXiv:2405.19773 [pdf, other]
-
Title: VQA Training Sets are Self-play Environments for Generating Few-shot PoolsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [161] arXiv:2405.19769 [pdf, other]
-
Title: All-In-One Medical Image Restoration via Task-Adaptive RoutingComments: This article has been early accepted by MICCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [162] arXiv:2405.19765 [pdf, other]
-
Title: Towards Unified Multi-granularity Text Detection with Interactive AttentionAuthors: Xingyu Wan, Chengquan Zhang, Pengyuan Lyu, Sen Fan, Zihan Ni, Kun Yao, Errui Ding, Jingdong WangComments: ICML 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [163] arXiv:2405.19754 [pdf, other]
-
Title: Mitigating annotation shift in cancer classification using single image generative modelsComments: Preprint of paper accepted at SPIE IWBI 2024 ConferenceSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [164] arXiv:2405.19751 [pdf, other]
-
Title: HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid QuantizationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [165] arXiv:2405.19746 [pdf, other]
-
Title: DenseSeg: Joint Learning for Semantic Segmentation and Landmark Detection Using Dense Image-to-Shape RepresentationAuthors: Ron Keuth, Lasse Hansen, Maren Balks, Ronja Jäger, Anne-Nele Schröder, Ludger Tüshaus, Mattias HeinrichSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [166] arXiv:2405.19745 [pdf, other]
-
Title: GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View SynthesisAuthors: Boming Zhao, Yuan Li, Ziyu Sun, Lin Zeng, Yujun Shen, Rui Ma, Yinda Zhang, Hujun Bao, Zhaopeng CuiComments: Accepted to SIGGRAPH 2024 Conference. Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [167] arXiv:2405.19743 [pdf, other]
-
Title: May the Dance be with You: Dance Generation Framework for Non-HumanoidsAuthors: Hyemin AhnComments: 13 pages, 6 Figures, Rejected at Neurips 2023Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [168] arXiv:2405.19735 [pdf, other]
-
Title: Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing ScenesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [169] arXiv:2405.19732 [pdf, other]
-
Title: Two Optimizers Are Better Than One: LLM Catalyst for Enhancing Gradient-Based OptimizationSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [170] arXiv:2405.19727 [pdf, other]
-
Title: Automatic Dance Video Segmentation for Understanding ChoreographyComments: 9 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [171] arXiv:2405.19726 [pdf, other]
-
Title: Streaming Video Diffusion: Online Video Editing with Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [172] arXiv:2405.19723 [pdf, other]
-
Title: Encoding and Controlling Global Semantics for Long-form Video Question AnsweringComments: Work in progressSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [173] arXiv:2405.19722 [pdf, other]
-
Title: QClusformer: A Quantum Transformer-based Framework for Unsupervised Visual ClusteringAuthors: Xuan-Bac Nguyen, Hoang-Quan Nguyen, Samuel Yen-Chi Chen, Samee U. Khan, Hugh Churchill, Khoa LuuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [174] arXiv:2405.19718 [pdf, other]
-
Title: LED: A Large-scale Real-world Paired Dataset for Event Camera DenoisingComments: Accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [175] arXiv:2405.19716 [pdf, other]
-
Title: Enhancing Large Vision Language Models with Self-Training on Image ComprehensionComments: 19 pages, 14 figures, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [176] arXiv:2405.19712 [pdf, other]
-
Title: HINT: Learning Complete Human Neural Representations from Limited ViewpointsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [177] arXiv:2405.19708 [pdf, other]
-
Title: Text Guided Image Editing with Automatic Concept Locating and ForgettingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [178] arXiv:2405.19707 [pdf, other]
-
Title: DeMamba: AI-Generated Video Detection on Million-Scale GenVideo BenchmarkAuthors: Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, Huaxiong LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [179] arXiv:2405.19695 [pdf, other]
-
Title: Distribution Aligned Semantics Adaption for Lifelong Person Re-IdentificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [180] arXiv:2405.19689 [pdf, other]
-
Title: Uncertainty-aware sign language video retrieval with probability distribution modelingSubjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
- [181] arXiv:2405.19688 [pdf, other]
-
Title: DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric DetailsAuthors: Haitao Cao, Baoping Cheng, Qiran Pu, Haocheng Zhang, Bin Luo, Yixiang Zhuang, Juncong Lin, Liyan Chen, Xuan ChengSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [182] arXiv:2405.19684 [pdf, other]
-
Title: A Comprehensive Survey on Underwater Image Enhancement Based on Deep LearningComments: A survey on the underwater image enhancement taskSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [183] arXiv:2405.19682 [pdf, other]
-
Title: Fully Test-Time Adaptation for Monocular 3D Object DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [184] arXiv:2405.19678 [pdf, other]
-
Title: View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature FieldsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [185] arXiv:2405.19675 [pdf, other]
-
Title: Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents TrainingAuthors: Aisha Urooj Khan, John Garrett, Tyler Bradshaw, Lonie Salkowski, Jiwoong Jason Jeong, Amara Tariq, Imon BanerjeeSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [186] arXiv:2405.19672 [pdf, other]
-
Title: CRIS: Collaborative Refinement Integrated with Segmentation for Polyp SegmentationSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [187] arXiv:2405.19671 [pdf, other]
-
Title: GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene ReconstructionAuthors: Haodong Xiang, Xinghui Li, Xiansong Lai, Wanting Zhang, Zhichao Liao, Kai Cheng, Xueping LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [188] arXiv:2405.19669 [pdf, other]
-
Title: Texture-guided Coding for Deep FeaturesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [189] arXiv:2405.19668 [pdf, other]
-
Title: AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided OptimizationComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [190] arXiv:2405.19659 [pdf, other]
-
Title: CSANet: Channel Spatial Attention Network for Robust 3D Face Alignment and ReconstructionComments: 10 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [191] arXiv:2405.19657 [pdf, other]
-
Title: Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D GaussianComments: 10 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [192] arXiv:2405.19652 [pdf, other]
-
Title: Dual sparse training framework: inducing activation map sparsity via Transformed $\ell1$ regularizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [193] arXiv:2405.19646 [pdf, other]
-
Title: FaceLift: Semi-supervised 3D Facial Landmark LocalizationComments: CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [194] arXiv:2405.19644 [pdf, other]
-
Title: EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery VideosComments: Early accepted by MICCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [195] arXiv:2405.19638 [pdf, other]
-
Title: Learning Robust Correlation with Foundation Model for Weakly-Supervised Few-Shot SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [196] arXiv:2405.19629 [pdf, other]
-
Title: YotoR-You Only Transform One RepresentationComments: 16 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [197] arXiv:2405.19620 [pdf, other]
-
Title: SparseDrive: End-to-End Autonomous Driving via Sparse Scene RepresentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [198] arXiv:2405.19609 [pdf, other]
-
Title: SMPLX-Lite: A Realistic and Drivable Avatar Benchmark with Rich Geometry and Texture AnnotationsAuthors: Yujiao Jiang, Qingmin Liao, Zhaolong Wang, Xiangru Lin, Zongqing Lu, Yuxi Zhao, Hanqing Wei, Jingrui Ye, Yu Zhang, Zhijing ShaoComments: ICME 2024; Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [199] arXiv:2405.19595 [pdf, ps, other]
-
Title: The RSNA Abdominal Traumatic Injury CT (RATIC) DatasetAuthors: Jeffrey D. Rudie, Hui-Ming Lin, Robyn L. Ball, Sabeena Jalal, Luciano M. Prevedello, Savvas Nicolaou, Brett S. Marinelli, Adam E. Flanders, Kirti Magudia, George Shih, Melissa A. Davis, John Mongan, Peter D. Chang, Ferco H. Berger, Sebastiaan Hermans, Meng Law, Tyler Richards, Jan-Peter Grunz, Andreas Steven Kunz, Shobhit Mathur, Sandro Galea-Soler, Andrew D. Chung, Saif Afat, Chin-Chi Kuo, Layal Aweidah, Ana Villanueva Campos, Arjuna Somasundaram, Felipe Antonio Sanchez Tijmes, Attaporn Jantarangkoon, Leonardo Kayat Bittencourt, Michael Brassil, Ayoub El Hajjami, Hakan Dogan, Muris Becircic, Agrahara G. Bharatkumar, Eduardo Moreno Júdice de Mattos Farina, Dataset Curator Group, Dataset Contributor Group, Dataset Annotator Group, Errol ColakComments: 40 pages, 2 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [200] arXiv:2405.19586 [pdf, other]
-
Title: SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied ManipulationAuthors: Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong LiComments: ICML 2024. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
- [201] arXiv:2405.19572 [pdf, other]
-
Title: Blind Image Restoration via Fast Diffusion InversionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [202] arXiv:2405.19569 [pdf, other]
-
Title: Improved Convex Decomposition with Ensembling and Boolean PrimitivesComments: 15 pages, 8 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [203] arXiv:2405.19568 [pdf, other]
-
Title: Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic SegmentationComments: 10 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [204] arXiv:2405.19525 [pdf, other]
-
Title: Lifelong Learning Using a Dynamically Growing Tree of Sub-networks for Domain Generalization in Video Object SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [205] arXiv:2405.19501 [pdf, other]
-
Title: MDS-ViTNet: Improving saliency prediction for Eye-Tracking with Vision TransformerSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [206] arXiv:2405.19465 [pdf, other]
-
Title: RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated AdapterAuthors: Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge LiComments: Accepted by ACL 2024 FindingsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [207] arXiv:2405.19458 [pdf, other]
-
Title: MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter SelectionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [208] arXiv:2405.19450 [pdf, other]
-
Title: FourierMamba: Fourier Learning Integration with State Space Models for Image DerainingSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [209] arXiv:2405.19442 [pdf, other]
-
Title: Large-scale DSM registration via motion averagingComments: 9 FiguresJournal-ref: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. X-1-2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [210] arXiv:2405.19429 [pdf, other]
-
Title: Conformal Recursive Feature EliminationAuthors: Marcos López-De-Castro (1 and 2), Alberto García-Galindo (1 and 2), Rubén Armañanzas (1 and 2) ((1) DATAI - Institute of Data Science and Artificial Intelligence, Universidad de Navarra, Pamplona, Spain,(2) TECNUN School of Engineering, Universidad de Navarra, Donostia-San Sebastian, Spain)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [211] arXiv:2405.19424 [pdf, other]
-
Title: Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based PoliciesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [212] arXiv:2405.19423 [pdf, other]
-
Title: Evaluating Vision-Language Models on Bistable ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [213] arXiv:2405.19413 [pdf, other]
-
Title: VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for AgricultureSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [214] arXiv:2405.19387 [pdf, other]
-
Title: Video Anomaly Detection in 10 Years: A Survey and OutlookSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [215] arXiv:2405.20321 (cross-list from cs.RO) [pdf, other]
-
Title: Vision-based Manipulation from Single Human Video with Open-World Object GraphsSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [216] arXiv:2405.20291 (cross-list from cs.CR) [pdf, other]
-
Title: Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor ActivenessSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [217] arXiv:2405.20271 (cross-list from cs.LG) [pdf, other]
-
Title: ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane ReflectionsComments: Accepted to ICML 2024. Code available at this https URLSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [218] arXiv:2405.20247 (cross-list from cs.AI) [pdf, other]
-
Title: KerasCV and KerasNLP: Vision and Language Power-UpsAuthors: Matthew Watson, Divyashree Shivakumar Sreepathihalli, Francois Chollet, Martin Gorner, Kiranbir Sodhia, Ramesh Sampath, Tirth Patel, Haifeng Jin, Neel Kovelamudi, Gabriel Rasskin, Samaneh Saadat, Luke Wood, Chen Qian, Jonathan Bischof, Ian Stenbit, Abheesht Sharma, Anshuman MishraComments: Submitted to Journal of Machine Learning Open Source SoftwareSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Software Engineering (cs.SE)
- [219] arXiv:2405.20204 (cross-list from cs.CL) [pdf, other]
-
Title: Jina CLIP: Your CLIP Model Is Also Your Text RetrieverAuthors: Andreas Koukounas, Georgios Mastrapas, Michael Günther, Bo Wang, Scott Martens, Isabelle Mohr, Saba Sturua, Mohammad Kalim Akram, Joan Fontanals Martínez, Saahil Ognawala, Susana Guzman, Maximilian Werk, Nan Wang, Han XiaoComments: 4 pages, ICML2024 workshop submissionSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
- [220] arXiv:2405.20180 (cross-list from cs.LG) [pdf, other]
-
Title: Transformers and Slot Encoding for Sample Efficient Physical World ModellingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [221] arXiv:2405.20031 (cross-list from cs.RO) [pdf, other]
-
Title: Structure Gaussian SLAM with Manhattan World HypothesisSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [222] arXiv:2405.19725 (cross-list from quant-ph) [pdf, other]
-
Title: Quantum Visual Feature Encoding RevisitedSubjects: Quantum Physics (quant-ph); Computer Vision and Pattern Recognition (cs.CV)
- [223] arXiv:2405.19703 (cross-list from cs.LG) [pdf, other]
-
Title: Towards a Better Evaluation of Out-of-Domain GeneralizationSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- [224] arXiv:2405.19687 (cross-list from cs.NE) [pdf, other]
-
Title: Autonomous Driving with Spiking Neural NetworksSubjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
- [225] arXiv:2405.19567 (cross-list from cs.AI) [pdf, other]
-
Title: Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical GroundingAuthors: Shenghuan Sun, Gregory M. Goldgof, Alexander Schubert, Zhiqing Sun, Thomas Hartvigsen, Atul J. Butte, Ahmed AlaaComments: Code available at: this https URLSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [226] arXiv:2405.19547 (cross-list from cs.LG) [pdf, other]
-
Title: CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive LearningAuthors: Yiping Wang, Yifang Chen, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon Shaolei DuComments: This paper supersedes our previous VAS paper (arXiv:2402.02055)Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [227] arXiv:2405.19538 (cross-list from cs.CL) [pdf, other]
-
Title: CheXpert Plus: Hundreds of Thousands of Aligned Radiology Texts, Images and PatientsAuthors: Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, Curtis P. LanglotzComments: 13 pagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [228] arXiv:2405.19516 (cross-list from eess.SP) [pdf, other]
-
Title: Enabling Visual Recognition at Radio FrequencySubjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
- [229] arXiv:2405.19492 (cross-list from eess.IV) [pdf, ps, other]
-
Title: TotalSegmentator MRI: Sequence-Independent Segmentation of 59 Anatomical Structures in MR imagesAuthors: Tugba Akinci D'Antonoli, Lucas K. Berger, Ashraya K. Indrakanti, Nathan Vishwanathan, Jakob Weiß, Matthias Jung, Zeynep Berkarda, Alexander Rau, Marco Reisert, Thomas Küstner, Alexandra Walter, Elmar M. Merkle, Martin Segeroth, Joshy Cyriac, Shan Yang, Jakob WasserthalSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [230] arXiv:2405.19461 (cross-list from cs.LG) [pdf, other]
-
Title: Clustering-Based Validation Splits for Domain GeneralisationSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [231] arXiv:2405.19349 (cross-list from eess.SP) [pdf, other]
-
Title: Beyond Isolated Frames: Enhancing Sensor-Based Human Activity Recognition through Intra- and Inter-Frame AttentionSubjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [232] arXiv:2405.19338 (cross-list from eess.SP) [pdf, other]
-
Title: Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV ImagesAuthors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei LiuComments: 17 pages, 8 figures and tablesSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [233] arXiv:2405.15306 (cross-list from cs.CL) [pdf, other]
-
Title: DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZComments: Project page: this https URLSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Thu, 30 May 2024
- [234] arXiv:2405.19335 [pdf, other]
-
Title: X-VILA: Cross-Modality Alignment for Large Language ModelAuthors: Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu YinComments: Technical ReportSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [235] arXiv:2405.19333 [pdf, other]
-
Title: Multi-Modal Generative Embedding ModelAuthors: Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan SunSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [236] arXiv:2405.19331 [pdf, other]
-
Title: NPGA: Neural Parametric Gaussian AvatarsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- [237] arXiv:2405.19326 [pdf, other]
-
Title: Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language ModelsAuthors: Tianrun Chen, Chunan Yu, Jing Li, Jianqi Zhang, Lanyun Zhu, Deyi Ji, Yong Zhang, Ying Zang, Zejian Li, Lingyun SunSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC)
- [238] arXiv:2405.19321 [pdf, other]
-
Title: DGD: Dynamic 3D Gaussians DistillationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [239] arXiv:2405.19315 [pdf, other]
-
Title: Matryoshka Query Transformer for Large Vision-Language ModelsComments: Preprint. Our code and model are publicly available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [240] arXiv:2405.19305 [pdf, other]
-
Title: Real-Time Environment Condition Classification for Autonomous VehiclesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [241] arXiv:2405.19298 [pdf, other]
-
Title: Adaptive Image Quality Assessment via Teaching Large Multimodal Model to CompareAuthors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [242] arXiv:2405.19296 [pdf, other]
-
Title: Neural Isometries: Taming Transformations for Equivariant MLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
- [243] arXiv:2405.19295 [pdf, other]
-
Title: 3D Neural Edge ReconstructionComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [244] arXiv:2405.19283 [pdf, other]
-
Title: Programmable Motion Generation for Open-Set Motion Control TasksComments: Accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [245] arXiv:2405.19237 [pdf, other]
-
Title: ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron PruningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [246] arXiv:2405.19226 [pdf, other]
-
Title: ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex DescriptionsAuthors: Honglin Lin, Siyu Li, Guoshun Nan, Chaoyue Tang, Xueting Wang, Jingxin Xu, Rong Yankai, Zhili Zhou, Yutong Gao, Qimei Cui, Xiaofeng TaoComments: Accepted in ACL 2024 FindingsSubjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [247] arXiv:2405.19209 [pdf, other]
-
Title: VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long VideosAuthors: Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit BansalComments: 20 pages, first three authors contributed equally; Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [248] arXiv:2405.19203 [pdf, other]
-
Title: $E^{3}$Gen: Efficient, Expressive and Editable Avatars GenerationComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [249] arXiv:2405.19201 [pdf, other]
-
Title: Going beyond compositional generalization, DDPMs can produce zero-shot interpolationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
- [250] arXiv:2405.19194 [pdf, other]
-
Title: LOGO: Video Text Spotting with Language Collaboration and Glyph Perception ModelSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [251] arXiv:2405.19186 [pdf, other]
-
Title: MetaToken: Detecting Hallucination in Image Descriptions by Meta ClassificationAuthors: Laura Fieback (1,2), Jakob Spiegelberg (1), Hanno Gottschalk (2) ((1) Volkswagen AG, (2) TU Berlin)Comments: 18 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [252] arXiv:2405.19179 [pdf, other]
-
Title: Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial VehiclesComments: submitted to IROS 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [253] arXiv:2405.19173 [pdf, other]
-
Title: Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature PreservationComments: Accepted at 32nd European Signal Processing Conference (EUSIPCO 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [254] arXiv:2405.19149 [pdf, other]
-
Title: CaLa: Complementary Association Learning for Augmenting Composed Image RetrievalComments: To appear at SIGIR 2024. arXiv admin note: text overlap with arXiv:2309.02169Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
- [255] arXiv:2405.19124 [pdf, other]
-
Title: ACCSAMS: Automatic Conversion of Exam Documents to Accessible Learning Material for Blind and Visually ImpairedComments: Accepted at ICCHP 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [256] arXiv:2405.19117 [pdf, other]
-
Title: ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGsAuthors: Omar Moured, Sara Alzalabny, Anas Osman, Thorsten Schwarz, Karin Muller, Rainer StiefelhagenComments: Accepted at ICCHP 2024. Codes will be available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [257] arXiv:2405.19111 [pdf, other]
-
Title: Alt4Blind: A User Interface to Simplify Charts Alt-Text CreationAuthors: Omar Moured, Shahid Ali Farooqui, Karin Muller, Sharifeh Fadaeijouybari, Thorsten Schwarz, Mohammed Javed, Rainer StiefelhagenComments: Accepted at ICCHP 2024. Codes will be available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
- [258] arXiv:2405.19100 [pdf, other]
-
Title: Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge TransferSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [259] arXiv:2405.19092 [pdf, other]
-
Title: Benchmarking and Improving Detail Image CaptionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [260] arXiv:2405.19076 [pdf, other]
-
Title: Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and DesignAuthors: Markus J. BuehlerSubjects: Computer Vision and Pattern Recognition (cs.CV); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Materials Science (cond-mat.mtrl-sci); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [261] arXiv:2405.19074 [pdf, other]
-
Title: Resurrecting Old Classes with New Data for Exemplar-Free Continual LearningAuthors: Dipam Goswami, Albin Soutif--Cormerais, Yuyang Liu, Sandesh Kamath, Bartłomiej Twardowski, Joost van de WeijerComments: Accepted at CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [262] arXiv:2405.19055 [pdf, other]
-
Title: FUSU: A Multi-temporal-source Land Use Change Segmentation Dataset for Fine-grained Urban Semantic UnderstandingAuthors: Shuai Yuan, Guancong Lin, Lixian Zhang, Runmin Dong, Jinxiao Zhang, Shuang Chen, Juepeng Zheng, Jie Wang, Haohuan FuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [263] arXiv:2405.19009 [pdf, other]
-
Title: Enhancing Vision-Language Model with Unmasked Token AlignmentComments: Accepted by TMLR; Code and models are available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [264] arXiv:2405.19005 [pdf, other]
-
Title: Auto-selected Knowledge Adapters for Lifelong Person Re-identificationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [265] arXiv:2405.18991 [pdf, other]
-
Title: EasyAnimate: A High-Performance Long Video Generation Method based on Transformer ArchitectureComments: 6 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
- [266] arXiv:2405.18959 [pdf, other]
-
Title: Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text RetrievalAuthors: Rui Yang, Shuang Wang, Yingping Han, Yuanheng Li, Dong Zhao, Dou Quan, Yanhe Guo, Licheng JiaoComments: 16 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [267] arXiv:2405.18955 [pdf, other]
-
Title: RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal SupervisionAuthors: Jinzhong Wang, Xuetao Tian, Shun Dai, Tao Zhuo, Haorui Zeng, Hongjuan Liu, Jiaqi Liu, Xiuwei Zhang, Yanning ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [268] arXiv:2405.18945 [pdf, ps, other]
-
Title: WTTFNet: A Weather-Time-Trajectory Fusion Network for Pedestrian Trajectory Prediction in Urban ComplexAuthors: Ho Chun Wu, Esther Hoi Shan Lau, Paul Yuen, Kevin Hung, John Kwok Tai Chui, Andrew Kwok Fai LuiComments: 12 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [269] arXiv:2405.18937 [pdf, other]
-
Title: Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [270] arXiv:2405.18924 [pdf, ps, other]
-
Title: MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script IdentificationAuthors: Miguel A. Ferrer, Abhijit Das, Moises Diaz, Aythami Morales, Cristina Carmona-Duarte, Umapada PalJournal-ref: Cognitive Computation, Volume 16, pages 131 to 157 (2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [271] arXiv:2405.18911 [pdf, other]
-
Title: Exploring Human-in-the-Loop Test-Time Adaptation by Synergizing Active Learning and Model SelectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [272] arXiv:2405.18900 [pdf, other]
-
Title: Spectral Fidelity and Spatial Enhancement: An Assessment and Cascading of Pan-Sharpening Techniques for Satellite ImagerySubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [273] arXiv:2405.18897 [pdf, other]
-
Title: MLAE: Masked LoRA Experts for Parameter-Efficient Fine-TuningComments: Tech reportSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [274] arXiv:2405.18882 [pdf, other]
-
Title: DecomCAM: Advancing Beyond Saliency Maps through Decomposition and IntegrationAuthors: Yuguang Yang, Runtang Guo, Sheng Wu, Yimi Wang, Linlin Yang, Bo Fan, Jilong Zhong, Juan Zhang, Baochang ZhangComments: Accepted by Neurocomputing journalSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [275] arXiv:2405.18880 [pdf, other]
-
Title: EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic VisionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [276] arXiv:2405.18872 [pdf, other]
-
Title: Single image super-resolution based on trainable feature matching attention networkComments: 35 pages, 12 figuresJournal-ref: Pattern Recognition, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [277] arXiv:2405.18863 [pdf, other]
-
Title: Neural Radiance Fields for Novel View Synthesis in Monocular GastroscopyComments: Accepted for EMBC 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [278] arXiv:2405.18861 [pdf, other]
-
Title: Domain-Inspired Sharpness-Aware Minimization Under Domain ShiftsComments: Published as a conference paper at ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [279] arXiv:2405.18857 [pdf, other]
-
Title: SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous DrivingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [280] arXiv:2405.18853 [pdf, other]
-
Title: Supervised Contrastive Learning for Snapshot Spectral Imaging Face Anti-SpoofingComments: We rank first at the Chalearn Snapshot Spectral Imaging Face Anti-spoofing Challenge on CVPR 2024; the paper is accepted by CVPR 2024 workshopSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [281] arXiv:2405.18852 [pdf, other]
-
Title: LetsMap: Unsupervised Representation Learning for Semantic BEV MappingAuthors: Nikhil Gosala, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Paulo Drews-Jr, Wolfram Burgard, Abhinav ValadaComments: 23 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [282] arXiv:2405.18849 [pdf, other]
-
Title: SFANet: Spatial-Frequency Attention Network for Weather ForecastingAuthors: Jiaze Wang, Hao Chen, Hongcan Xu, Jinpeng Li, Bowen Wang, Kun Shao, Furui Liu, Huaxi Chen, Guangyong Chen, Pheng-Ann HengSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [283] arXiv:2405.18842 [pdf, other]
-
Title: Descriptive Image Quality Assessment in the WildSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [284] arXiv:2405.18840 [pdf, other]
-
Title: Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [285] arXiv:2405.18839 [pdf, other]
-
Title: MEGA: Masked Generative Autoencoder for Human Mesh RecoverySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [286] arXiv:2405.18831 [pdf, other]
-
Title: Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering BenchmarksComments: Accepted at 1st Workshop on Multimodalities for 3D Scenes CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [287] arXiv:2405.18816 [pdf, other]
-
Title: Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory MatchingSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [288] arXiv:2405.18812 [pdf, other]
-
Title: MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language ModelComments: 13 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [289] arXiv:2405.18810 [pdf, other]
-
Title: UniPTS: A Unified Framework for Proficient Post-Training SparsityComments: Accepted by CVPR2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [290] arXiv:2405.18808 [pdf, other]
-
Title: BRACTIVE: A Brain Activation Approach to Human Visual Brain LearningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [291] arXiv:2405.18801 [pdf, other]
-
Title: SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [292] arXiv:2405.18800 [pdf, ps, other]
-
Title: Face processing emerges from object-trained convolutional neural networksComments: 31 pages, 5 FiguresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [293] arXiv:2405.18790 [pdf, other]
-
Title: Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature StatisticsComments: Accepted to IEEE Transactions on Multimedia 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
- [294] arXiv:2405.18784 [pdf, other]
-
Title: LP-3DGS: Learning to Prune 3D Gaussian SplattingAuthors: Zhaoliang Zhang, Tianchen Song, Yongjae Lee, Li Yang, Cheng Peng, Rama Chellappa, Deliang FanSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [295] arXiv:2405.18774 [pdf, other]
-
Title: LLaMA-Reg: Using LLaMA 2 for Unsupervised Medical Image RegistrationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [296] arXiv:2405.18770 [pdf, other]
-
Title: Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial AttacksComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
- [297] arXiv:2405.18769 [pdf, other]
-
Title: OUS: Scene-Guided Dynamic Facial Expression RecognitionAuthors: Xinji Mai, Haoran Wang, Zeng Tao, Junxiong Lin, Shaoqi Yan, Yan Wang, Jing Liu, Jiawen Yu, Xuan Tong, Yating Li, Wenqiang ZhangComments: 12 pages, 6 figures, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [298] arXiv:2405.18762 [pdf, other]
-
Title: Inpaint Biases: A Pathway to Accurate and Unbiased Image GenerationComments: Paper accepted in CVPRW 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [299] arXiv:2405.18751 [pdf, other]
-
Title: On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch NormalizationAuthors: Jordi Armengol-Estapé, Vincent Michalski, Ramnath Kumar, Pierre-Luc St-Charles, Doina Precup, Samira Ebrahimi KahouSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [300] arXiv:2405.18750 [pdf, other]
-
Title: T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward FeedbackComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [301] arXiv:2405.18745 [pdf, other]
-
Title: PanoNormal: Monocular Indoor 360° Surface Normal EstimationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [302] arXiv:2405.18737 [pdf, ps, other]
-
Title: WLC-Net: a robust and fast deep-learning wood-leaf classification methodAuthors: Hanlong Li, Pei Wang, Yuhan Wu, Jing Ren, Yuhang Gao, Lingyun Zhang, Mingtai Zhang, Wenxin ChenComments: 41 pages, 14 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [303] arXiv:2405.18734 [pdf, other]
-
Title: PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware HistogramAuthors: Sifan Zhou, Zhihang Yuan, Dawei Yang, Xubin Wen, Xing Hu, Yuguang Shi, Ziyu Zhao, Xiaobo LuComments: 17 pages, 3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [304] arXiv:2405.18721 [pdf, other]
-
Title: Correctable Landmark Discovery via Large Models for Vision-Language NavigationAuthors: Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan LiangComments: Accepted by TPAMI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [305] arXiv:2405.18716 [pdf, other]
-
Title: SketchDeco: Decorating B&W Sketches with ColourAuthors: Chaitat Utintu, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe SongSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [306] arXiv:2405.18715 [pdf, other]
-
Title: NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the WildComments: CVPR 2024, first two authors contributed equally. Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [307] arXiv:2405.18706 [pdf, other]
-
Title: FocSAM: Delving Deeply into Focused Objects in Segmenting AnythingAuthors: You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong JiComments: Accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [308] arXiv:2405.18700 [pdf, other]
-
Title: Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion PredictionComments: Accepted by IEEE Transactions on Image ProcessingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [309] arXiv:2405.18684 [pdf, other]
-
Title: Learning Diffeomorphism for Image Registration with Time-Continuous Networks using Semigroup RegularizationComments: 15 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [310] arXiv:2405.18679 [pdf, other]
-
Title: Vim-F: Visual State Space Model Benefiting from Learning in the Frequency DomainSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [311] arXiv:2405.18677 [pdf, other]
-
Title: Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map FilteringComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [312] arXiv:2405.18672 [pdf, other]
-
Title: LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image ClassificationSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [313] arXiv:2405.18654 [pdf, other]
-
Title: Mitigating Object Hallucination via Data Augmented Contrastive TuningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [314] arXiv:2405.18616 [pdf, ps, other]
-
Title: Wavelet-Based Image Tokenizer for Vision TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [315] arXiv:2405.18606 [pdf, other]
-
Title: Track Initialization and Re-Identification for 3D Multi-View Multi-Object TrackingSubjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
- [316] arXiv:2405.18570 [pdf, other]
-
Title: It's Not a Modality Gap: Characterizing and Addressing the Contrastive GapSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [317] arXiv:2405.18560 [pdf, other]
-
Title: Potential Field Based Deep Metric LearningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [318] arXiv:2405.18541 [pdf, other]
-
Title: Low-Rank Few-Shot Adaptation of Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [319] arXiv:2405.18527 [pdf, other]
-
Title: Task-Driven Uncertainty Quantification in Inverse Problems via Conformal PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [320] arXiv:2405.18525 [pdf, other]
-
Title: REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout AlignmentAuthors: Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [321] arXiv:2405.18524 [pdf, other]
-
Title: Aligning in a Compact Space: Contrastive Knowledge Distillation between Heterogeneous ArchitecturesComments: 12 pages, 3 figures, conference paperSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [322] arXiv:2405.18523 [pdf, other]
-
Title: TripletMix: Triplet Data Augmentation for 3D UnderstandingAuthors: Jiaze Wang, Yi Wang, Ziyu Guo, Renrui Zhang, Donghao Zhou, Guangyong Chen, Anfeng Liu, Pheng-Ann HengSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [323] arXiv:2405.18511 [pdf, other]
-
Title: Feasibility and benefits of joint learning from MRI databases with different brain diseases and modalities for segmentationAuthors: Wentian Xu, Matthew Moffat, Thalia Seale, Ziyun Liang, Felix Wagner, Daniel Whitehouse, David Menon, Virginia Newcombe, Natalie Voets, Abhirup Banerjee, Konstantinos KamnitsasComments: Accepted to MIDL 2024Journal-ref: Proceedings of Machine Learning Research, MIDL 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [324] arXiv:2405.18487 [pdf, other]
-
Title: Anomaly detection for the identification of volcanic unrest in satellite imagerySubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [325] arXiv:2405.18483 [pdf, other]
-
Title: Towards Open Domain Text-Driven Synthesis of Multi-Person MotionsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [326] arXiv:2405.18438 [pdf, other]
-
Title: GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text ContextsAuthors: Zoltán Á. Milacski, Koichiro Niinuma, Ryosuke Kawamura, Fernando de la Torre, László A. JeniComments: 18 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [327] arXiv:2405.18437 [pdf, other]
-
Title: Transductive Zero-Shot and Few-Shot CLIPAuthors: Ségolène Martin (OPIS, CVN), Yunshi Huang (ETS), Fereshteh Shakeri (ETS), Jean-Christophe Pesquet (OPIS, CVN), Ismail Ben Ayed (ETS)Comments: 2024 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2024, Seattle (USA), Washington, United StatesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [328] arXiv:2405.19334 (cross-list from cs.AI) [pdf, other]
-
Title: LLMs Meet Multimodal Generation and Editing: A SurveyAuthors: Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng ChenComments: 51 Pages with 16 Figures, 12 Tables, and 534 References. GitHub Repository at: this https URLSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [329] arXiv:2405.19234 (cross-list from cs.LG) [pdf, other]
-
Title: Forward-Backward Knowledge Distillation for Continual ClusteringSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [330] arXiv:2405.19224 (cross-list from eess.IV) [pdf, other]
-
Title: A study on the adequacy of common IQA measures for medical imagesAuthors: Anna Breger, Clemens Karner, Ian Selby, Janek Gröhl, Sören Dittmer, Edward Lilley, Judith Babar, Jake Beckford, Timothy J Sadler, Shahab Shahipasand, Arthikkaa Thavakumar, Michael Roberts, Carola-Bibiane SchönliebSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [331] arXiv:2405.19204 (cross-list from eess.IV) [pdf, other]
-
Title: Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identificationAuthors: Michail Mamalakis, Héloïse de Vareilles, Shun-Chin Jim Wu, Ingrid Agartz, Lynn Egeland Mørch-Johnsen, Jane Garrison, Jon Simons, Pietro Lio, John Suckling, Graham MurraySubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [332] arXiv:2405.19112 (cross-list from eess.IV) [pdf, other]
-
Title: Reconstructing Interpretable Features in Computational Super-Resolution microscopy via Regularized Latent SearchComments: accepted for publication in Biological ImagingSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [333] arXiv:2405.19098 (cross-list from cs.LG) [pdf, other]
-
Title: Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function PriorComments: ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- [334] arXiv:2405.19097 (cross-list from eess.IV) [pdf, other]
-
Title: A study of why we need to reassess full reference image quality assessment with medical imagesAuthors: Anna Breger, Ander Biguri, Malena Sabaté Landman, Ian Selby, Nicole Amberg, Elisabeth Brunner, Janek Gröhl, Sepideh Hatamikia, Clemens Karner, Lipeng Ning, Sören Dittmer, Michael Roberts, AIX-COVNET Collaboration, Carola-Bibiane SchönliebSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [335] arXiv:2405.19088 (cross-list from cs.CL) [pdf, other]
-
Title: Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous ContradictionsSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [336] arXiv:2405.19085 (cross-list from cs.AI) [pdf, other]
-
Title: Patch-enhanced Mask Encoder Prompt Image GenerationSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [337] arXiv:2405.19081 (cross-list from cs.RO) [pdf, ps, other]
-
Title: Uniform vs. Lognormal Kinematics in Robots: Perceptual Preferences for Robotic MovementsAuthors: Jose J. Quintana, Miguel A. Ferrer, Moises Diaz, Jose J. Feo, Adam Wolniakowski, Konstantsin MiatliukJournal-ref: Applied Sciences Volume 12 Issue 23 (2022)Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [338] arXiv:2405.19079 (cross-list from eess.IV) [pdf, other]
-
Title: On the Influence of Smoothness Constraints in Computed Tomography Motion CompensationAuthors: Mareike Thies, Fabian Wagner, Noah Maul, Siyuan Mei, Mingxuan Gu, Laura Pfaff, Nastassia Vysotskaya, Haijun Yu, Andreas MaierSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [339] arXiv:2405.19035 (cross-list from cs.RO) [pdf, other]
-
Title: A Good Foundation is Worth Many Labels: Label-Efficient Panoptic SegmentationSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- [340] arXiv:2405.18931 (cross-list from stat.ML) [pdf, other]
-
Title: EntProp: High Entropy Propagation for Improving Accuracy and RobustnessAuthors: Shohei EnomotoComments: Accepted to UAI2024Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [341] arXiv:2405.18786 (cross-list from cs.LG) [pdf, other]
-
Title: MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel DependenceSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [342] arXiv:2405.18782 (cross-list from eess.IV) [pdf, other]
-
Title: Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play PriorsSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- [343] arXiv:2405.18756 (cross-list from cs.LG) [pdf, other]
-
Title: Provable Contrastive Continual LearningComments: Accepted by ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP); Machine Learning (stat.ML)
- [344] arXiv:2405.18726 (cross-list from cs.SD) [pdf, other]
-
Title: Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRISubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [345] arXiv:2405.18614 (cross-list from cs.HC) [pdf, other]
-
Title: Augmented Physics: A Machine Learning-Powered Tool for Creating Interactive Physics Simulations from Static DiagramsSubjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [346] arXiv:2405.18533 (cross-list from eess.IV) [pdf, other]
-
Title: Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-MambaComments: Early accepted paper for MICCAI 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [347] arXiv:2405.18498 (cross-list from cs.LG) [pdf, ps, other]
-
Title: The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual TasksSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [348] arXiv:2405.18449 (cross-list from eess.IV) [pdf, other]
-
Title: Adaptive Multiscale Retinal Diagnosis: A Hybrid Trio-Model Approach for Comprehensive Fundus Multi-Disease Detection Leveraging Transfer Learning and Siamese NetworksAuthors: Yavuz Selim InanSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [349] arXiv:2405.18435 (cross-list from eess.IV) [pdf, other]
-
Title: QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation ChallengeAuthors: Hongwei Bran, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, Wenting Chen, Li Cheng, Prasad Dutand, Lara Dular, Mustafa A. Elattar, Ming Feng, Shengbo Gao, Henkjan Huisman, Weifeng Hu, Shubham Innani, Wei Jiat, Davood Karimi, Hugo J. Kuijf, Jin Tae Kwak, Hoang Long Le, Xiang Lia, Huiyan Lin, Tongliang Liu, Jun Ma, Kai Ma, Ting Ma, Ilkay Oksuz, Robbie Holland, Arlindo L. Oliveira, Jimut Bahan Pal, Xuan Pei, Maoying Qiao, Anindo Saha, Raghavendra Selvan, et al. (26 additional authors not shown)Comments: initial technical reportSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Wed, 29 May 2024 (showing first 71 of 152 entries)
- [350] arXiv:2405.18428 [pdf, other]
-
Title: DiG: Scalable and Efficient Diffusion Models with Gated Linear AttentionAuthors: Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang WangComments: Code is released at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [351] arXiv:2405.18426 [pdf, other]
-
Title: GFlow: Recovering 4D World from Monocular VideoComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [352] arXiv:2405.18425 [pdf, other]
-
Title: ViG: Linear-complexity Visual Sequence Learning with Gated Linear AttentionComments: Work in progress. Code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [353] arXiv:2405.18424 [pdf, other]
-
Title: 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian SplattingAuthors: Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan YangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [354] arXiv:2405.18416 [pdf, other]
-
Title: 3D StreetUnveiler with Semantic-Aware 2DGSComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [355] arXiv:2405.18415 [pdf, other]
-
Title: Why are Visually-Grounded Language Models Bad at Image Classification?Authors: Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-LevySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [356] arXiv:2405.18406 [pdf, other]
-
Title: RACCooN: Remove, Add, and Change Video Content with Auto-Generated NarrativesComments: The first two authors contribute equally. Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [357] arXiv:2405.18405 [pdf, other]
-
Title: WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain GeneralizationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [358] arXiv:2405.18387 [pdf, other]
-
Title: A Review and Implementation of Object Detection Models and Optimizations for Real-time Medical Mask Detection during the COVID-19 PandemicJournal-ref: 2022 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Biarritz, France, 2022, pp. 1-6Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [359] arXiv:2405.18383 [pdf, other]
-
Title: Brain Tumor Segmentation (BraTS) Challenge 2024: Meningioma Radiotherapy Planning Automated SegmentationAuthors: Dominic LaBella, Katherine Schumacher, Michael Mix, Kevin Leu, Shan McBurney-Lin, Pierre Nedelec, Javier Villanueva-Meyer, Jonathan Shapey, Tom Vercauteren, Kazumi Chia, Omar Al-Salihi, Justin Leu, Lia Halasz, Yury Velichko, Chunhao Wang, John Kirkpatrick, Scott Floyd, Zachary J. Reitman, Trey Mullikin, Ulas Bagci, Sean Sachdev, Jona A. Hattangadi-Gluth, Tyler Seibert, Nikdokht Farid, Connor Puett, Matthew W. Pease, Kevin Shiue, Syed Muhammad Anwar, Shahriar Faghani, Muhammad Ammar Haider, Pranav Warman, Jake Albrecht, András Jakab, Mana Moassefi, Verena Chung, Alejandro Aristizabal, Alexandros Karargyris, Hasan Kassem, Sarthak Pati, Micah Sheller, Christina Huang, Aaron Coley, Siddharth Ghanta, Alex Schneider, Conrad Sharp, Rachit Saluja, Florian Kofler, Philipp Lohmann, Phillipp Vollmuth, et al. (21 additional authors not shown)Comments: 13 pages, 9 figures, 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [360] arXiv:2405.18368 [pdf, other]
-
Title: The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRIAuthors: Maria Correia de Verdier, Rachit Saluja, Louis Gagnon, Dominic LaBella, Ujjwall Baid, Nourel Hoda Tahon, Martha Foltyn-Dumitru, Jikai Zhang, Maram Alafif, Saif Baig, Ken Chang, Gennaro D'Anna, Lisa Deptula, Diviya Gupta, Muhammad Ammar Haider, Ali Hussain, Michael Iv, Marinos Kontzialis, Paul Manning, Farzan Moodi, Teresa Nunes, Aaron Simon, Nico Sollmann, David Vu, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Rongrong Chai, Verena Chung, Shahriar Faghani, Keyvan Farahani, Anahita Fathi Kazerooni, Eugenio Iglesias, Florian Kofler, Hongwei Li, Marius George Linguraru, Bjoern Menze, Ahmed W. Moawad, Yury Velichko, Benedikt Wiestler, Talissa Altes, Patil Basavasagar, Martin Bendszus, Gianluca Brugnara, Jaeyoung Cho, Yaseen Dhemesh, Brandon K.K. Fields, Filip Garrett, Jaime Gass, Lubomir Hadjiiski, et al. (35 additional authors not shown)Comments: 10 pages, 4 figures, 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [361] arXiv:2405.18361 [pdf, other]
-
Title: Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?Authors: Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [362] arXiv:2405.18330 [pdf, other]
-
Title: Frustratingly Easy Test-Time Adaptation of Vision-Language ModelsComments: Preprint. Work in progressSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [363] arXiv:2405.18326 [pdf, other]
-
Title: VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion TransformersComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [364] arXiv:2405.18322 [pdf, other]
-
Title: SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark EstimationComments: Accepted at CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [365] arXiv:2405.18320 [pdf, other]
-
Title: Self-Supervised Learning Based Handwriting VerificationAuthors: Mihir Chauhan, Mohammad Abuzar Shaikh, Bina Ramamurthy, Mingchen Gao, Siwei Lyu, Sargur SrihariComments: 14 pages, 6 figures, 2 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [366] arXiv:2405.18304 [pdf, other]
-
Title: Multi-modal Generation via Cross-Modal In-Context LearningAuthors: Amandeep Kumar, Muzammal Naseer, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Hisham CholakkalComments: Technical ReportSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [367] arXiv:2405.18302 [pdf, other]
-
Title: Deep Network Pruning: A Comparative Study on CNNs in Face RecognitionAuthors: Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Prayag Tiwari, Josef BigunComments: Submitted to Pattern Recognition LettersSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [368] arXiv:2405.18299 [pdf, other]
-
Title: Deep Learning Innovations for Underwater Waste Detection: An In-Depth AnalysisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [369] arXiv:2405.18295 [pdf, other]
-
Title: Intent3D: 3D Object Detection in RGB-D Scans Based on Human IntentionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [370] arXiv:2405.18258 [pdf, other]
-
Title: Text-only Synthesis for Image CaptioningSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [371] arXiv:2405.18247 [pdf, other]
-
Title: Generating Print-Ready Personalized AI Art Products from Minimal User InputsSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [372] arXiv:2405.18240 [pdf, other]
-
Title: MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any ResolutionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [373] arXiv:2405.18224 [pdf, other]
-
Title: SSLChange: A Self-supervised Change Detection Framework Based on Domain AdaptationComments: This manuscript has been submitted to IEEE TGRS and is under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [374] arXiv:2405.18172 [pdf, other]
-
Title: AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any ScenarioComments: Project website: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [375] arXiv:2405.18156 [pdf, other]
-
Title: VividPose: Advancing Stable Video Diffusion for Realistic Human Image AnimationAuthors: Qilin Wang, Zhengkai Jiang, Chengming Xu, Jiangning Zhang, Yabiao Wang, Xinyi Zhang, Yun Cao, Weijian Cao, Chengjie Wang, Yanwei FuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [376] arXiv:2405.18148 [pdf, other]
-
Title: Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic SegmentationComments: Accepted to WACV 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [377] arXiv:2405.18132 [pdf, other]
-
Title: EG4D: Explicit Generation of 4D Object without Score DistillationAuthors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang LiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [378] arXiv:2405.18131 [pdf, other]
-
Title: Self-Supervised Dual ContouringSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [379] arXiv:2405.18124 [pdf, other]
-
Title: Dual-Path Multi-Scale Transformer for High-Quality Image DerainingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [380] arXiv:2405.18119 [pdf, other]
-
Title: Low-Resource Crop Classification from Multi-Spectral Time Series Using Lossless CompressorsComments: 8 pages, 10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [381] arXiv:2405.18087 [pdf, other]
-
Title: FlowSDF: Flow Matching for Medical Image Segmentation Using Distance TransformsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [382] arXiv:2405.18078 [pdf, other]
-
Title: Edge-guided and Class-balanced Active Learning for Semantic Segmentation of Aerial ImagesComments: 15 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [383] arXiv:2405.18071 [pdf, other]
-
Title: Text Modality Oriented Image Feature Extraction for Detecting Diffusion-based DeepFakeAuthors: Di Yang, Yihao Huang, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Run Wang, Geguang Pu, Yang LiuSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [384] arXiv:2405.18065 [pdf, other]
-
Title: EffoVPR: Effective Foundation Model Utilization for Visual Place RecognitionAuthors: Issar Tzachor, Boaz Lerner, Matan Levy, Michael Green, Tal Berkovitz Shalev, Gavriel Habib, Dvir Samuel, Noam Korngut Zailer, Or Shimshi, Nir Darshan, Rami Ben-AriSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [385] arXiv:2405.18042 [pdf, other]
-
Title: Visualizing the loss landscape of Self-supervised Vision TransformerComments: NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and PracticeSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [386] arXiv:2405.18033 [pdf, other]
-
Title: RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance FieldsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [387] arXiv:2405.18029 [pdf, other]
-
Title: Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers?Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [388] arXiv:2405.18025 [pdf, other]
-
Title: Unveiling the Power of Diffusion Features For Personalized Segmentation and RetrievalSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [389] arXiv:2405.18021 [pdf, other]
-
Title: MULi-Ev: Maintaining Unperturbed LiDAR-Event CalibrationComments: Accepted at CVPR 2024 Workshop on Autonomous Driving. Copyright 2024 IEEESubjects: Computer Vision and Pattern Recognition (cs.CV)
- [390] arXiv:2405.18018 [pdf, other]
-
Title: A Calibration Tool for Refractive Underwater VisionComments: 7 pages, 5 figures, the paper is submitted to the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [391] arXiv:2405.18012 [pdf, other]
-
Title: Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity RecognitionAuthors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick KimSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [392] arXiv:2405.18004 [pdf, other]
-
Title: SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical CaptionsAuthors: Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin GaoSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [393] arXiv:2405.18003 [pdf, other]
-
Title: MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video InfillingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [394] arXiv:2405.17995 [pdf, other]
-
Title: DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive ArchitectureSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [395] arXiv:2405.17991 [pdf, other]
-
Title: VeLoRA: Memory Efficient Training using Rank-1 Sub-Token ProjectionsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [396] arXiv:2405.17965 [pdf, other]
-
Title: AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image CustomizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [397] arXiv:2405.17958 [pdf, other]
-
Title: FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor ScenesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [398] arXiv:2405.17942 [pdf, other]
-
Title: Self-supervised Pre-training for Transferable Multi-modal PerceptionComments: 8 pages. arXiv admin note: substantial text overlap with arXiv:2311.13750Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [399] arXiv:2405.17933 [pdf, other]
-
Title: ToonCrafter: Generative Cartoon InterpolationComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [400] arXiv:2405.17929 [pdf, other]
-
Title: Towards Unified Robustness Against Both Backdoor and Adversarial AttacksSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [401] arXiv:2405.17928 [pdf, other]
-
Title: Relational Self-supervised Distillation with Compact Descriptors for Image Copy DetectionComments: 12 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [402] arXiv:2405.17926 [pdf, other]
-
Title: SarcNet: A Novel AI-based Framework to Automatically Analyze and Score Sarcomere Organizations in Fluorescently Tagged hiPSC-CMsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [403] arXiv:2405.17916 [pdf, other]
-
Title: Boosting General Trimap-free Matting in the Real-World ImageAuthors: Leo Shan, Wenzhang Zhou, Grace ZhaoComments: 13 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [404] arXiv:2405.17913 [pdf, other]
-
Title: OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects SupervisionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [405] arXiv:2405.17905 [pdf, other]
-
Title: Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage DetectionAuthors: Zhengji Li, Xi Xiao, Jiacheng Xie, Yuxiao Fan, Wentao Wang, Gang Chen, Liqiang Zhang, Tianyang WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
- [406] arXiv:2405.17903 [pdf, other]
-
Title: Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based FusionAuthors: Hongze Sun, Rui Liu, Wuque Cai, Jun Wang, Yue Wang, Huajin Tang, Yan Cui, Dezhong Yao, Daqing GuoComments: 16 pages, 7 figures, 9 tables; This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
- [407] arXiv:2405.17901 [pdf, other]
-
Title: Near-Infrared and Low-Rank Adaptation of Vision Transformers in Remote SensingComments: 7 pages, 3 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [408] arXiv:2405.17894 [pdf, other]
-
Title: White-box Multimodal Jailbreaks Against Large Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [409] arXiv:2405.17891 [pdf, other]
-
Title: A Refined 3D Gaussian Representation for High-Quality Dynamic Scene ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [410] arXiv:2405.17886 [pdf, ps, other]
-
Title: Graphomotor and Handwriting Disabilities Rating Scale (GHDRS): towards complex and objective assessmentAuthors: Jiri Mekyska, Katarina Safarova, Tomas Urbanek, Jirina Bednarova, Vojtech Zvoncak, Jana Marie Havigerova, Lukas Cunek, Zoltan Galaz, Jan Mucha, Christine Klauszova, Marcos Faundez-Zanuy, Miguel A. Ferrer, Moises DiazJournal-ref: Australian Journal of Learning Difficulties, Routledge, 1-34, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [411] arXiv:2405.17877 [pdf, other]
-
Title: Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical DiagnosisSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [412] arXiv:2405.17873 [pdf, other]
-
Title: MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision QuantizationAuthors: Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu WangComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [413] arXiv:2405.17872 [pdf, other]
-
Title: HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene ReconstructionComments: 13 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [414] arXiv:2405.17871 [pdf, other]
-
Title: Seeing the Image: Prioritizing Visual Correlation by Contrastive AlignmentSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [415] arXiv:2405.17859 [pdf, other]
-
Title: Adapting Pre-Trained Vision Models for Novel Instance Detection and SegmentationComments: 22 pages, 9 figures, Code is available at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [416] arXiv:2405.17855 [pdf, other]
-
Title: A Deep Neural Network Approach to Fare EvasionAuthors: Johannes van der VyverComments: 4 pages, 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [417] arXiv:2405.17842 [pdf, other]
-
Title: Discriminator-Guided Cooperative Diffusion for Joint Audio and Video GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [418] arXiv:2405.17835 [pdf, other]
-
Title: Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian SplattingComments: Early accepted at MICCAI 2024, 10 pages, 2 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [419] arXiv:2405.17825 [pdf, other]
-
Title: Diffusion Model Patching via Mixture-of-PromptsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [420] arXiv:2405.17824 [pdf, other]
-
Title: mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image AnalysisSubjects: Computer Vision and Pattern Recognition (cs.CV)