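Feng Zengxi (冯增喜)

Master's Supervisor
Name: Feng Zengxi (冯增喜)
Name (Pinyin): fengzengxi
Affiliation: School of Building Services Science and Engineering
Position: Vice Dean, School of Building Services Science and Engineering
Education: Doctoral graduate
Gender: Male
Degree: Doctorate
Title: Associate Professor
Employment status: Active
Main appointment: Faculty member and Vice Dean, School of Building Services Science and Engineering, Xi'an University of Architecture and Technology
Other appointments: Deputy Secretary-General, Technical Committee on Intelligent Buildings and Building Automation, Shaanxi Association of Automation
Alma mater: Xi'an University of Architecture and Technology
Department: School of Building Services Science and Engineering
Discipline: Control Science and Engineering
Other contact information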

Email:

Publications
Exploring crowd counting methodology by integrating CNN and transformer: performance optimization under weak supervision
Posted: 2025-12-20

Affiliation: School of Information and Control Engineering

Journal: SIGNAL IMAGE AND VIDEO PROCESSING

Keywords: Crowd counting · CNN · Transformer · Weakly supervised

Abstract: Crowd counting, the task of estimating the number of individuals present in an image, is essential in surveillance and traffic management, where it informs operational decisions and security measures. It traditionally relies on Convolutional Neural Networks (CNNs), which excel at local feature extraction yet fall short in capturing global context. Conversely, Transformers excel at capturing long-range dependencies but often overlook local details. Current crowd counting methods depend heavily on precise position-level annotations for supervised training, a process that demands significant time and labor. This has spurred interest in weakly supervised training, in which models learn solely from count-level annotations and which holds considerable practical and research potential. In this study, we propose TCCNet, a novel weakly supervised network that combines CNNs and Transformers for crowd counting. To address the CNN's limitation in global feature extraction, we integrate a Transformer to capture extensive contextual information and thereby improve counting accuracy. Augmenting the Transformer block with Post Normalization and Scaled Cosine Attention smooths activation values and improves model stability. Moreover, our crowd counting regression block incorporates inflated convolutions, which expand the model's receptive field while maintaining spatial resolution. Extensive experiments on five publicly available datasets, together with visualizations, show that TCCNet accurately identifies crowd regions within images and achieves strong counting performance, particularly in the weakly supervised setting.

First author: Feng Zengxi (冯增喜)

Paper type: Journal article

Volume: 19(6): 483

ISSN: 1863-1703

Translated work:

Publication date: 2025-04-15