Feng Zengxi (冯增喜)
Email:
Affiliation: School of Information and Control Engineering
Journal: Signal, Image and Video Processing
Keywords: Crowd counting · CNN · Transformer · Weakly supervised
Abstract: Crowd counting, the task of estimating the number of individuals present in an image, is essential to surveillance and traffic management, where it informs operational decisions and security measures. It has traditionally relied on Convolutional Neural Networks (CNNs), which excel at local feature extraction but fall short in capturing global context. Conversely, Transformers excel at capturing long-range dependencies but often overlook local detail. Current crowd counting methods depend heavily on precise position-level annotations for supervised training, a process that demands significant time and labor. This has spurred interest in weakly supervised training, in which models learn solely from count-level annotations and which holds considerable practical and research potential. In this study, we propose TCCNet, a novel weakly supervised network that combines CNNs and Transformers for crowd counting. To address the CNN's limitation in global feature extraction, we integrate a Transformer model that improves counting accuracy by capturing extensive contextual information. Equipping the Transformer block with Post Normalization and Scaled Cosine Attention smooths activation values and improves model stability. Moreover, our crowd counting regression block incorporates dilated (inflated) convolutions, which expand the model's receptive field while maintaining spatial resolution, to the benefit of counting accuracy. In extensive experiments on five publicly available datasets, supported by illustrative visualizations, TCCNet accurately identifies crowd regions within images. Our findings highlight the model's strong counting performance, particularly in the weakly supervised setting.
First author: Feng Zengxi (冯增喜)
Paper type: Journal article
Volume: 19(6): 483
ISSN: 1863-1703
Translated work: No
Publication date: 2025-04-15
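
The abstract mentions Scaled Cosine Attention as a way to smooth activation values. The following is a minimal NumPy sketch of the general idea only, not the TCCNet implementation: attention logits are computed from cosine similarity divided by a temperature, so they stay bounded regardless of the magnitude of the queries and keys. The function name, the fixed temperature `tau` (typically a learnable parameter in practice), and the single-head, bias-free form are assumptions made for illustration.

```python
import numpy as np

def scaled_cosine_attention(q, k, v, tau=0.1, eps=1e-12):
    """Single-head attention with cosine-similarity logits (illustrative sketch).

    q, k, v: arrays of shape (tokens, dim).
    tau: temperature; fixed here for simplicity (an assumption).
    """
    # L2-normalise queries and keys so q @ k.T is a cosine similarity in [-1, 1].
    qn = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    kn = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)

    # Bounded logits: each entry lies in [-1/tau, 1/tau].
    logits = (qn @ kn.T) / tau

    # Numerically stable row-wise softmax.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out, weights = scaled_cosine_attention(q, k, v)
```

Because the queries and keys are normalised, rescaling `q` (for example by a factor of 1000) leaves the output essentially unchanged, which is the property that keeps a few large activations from dominating the attention map.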
