Profile
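Education: 1. 1998-2002: Xi'an University of Architecture and Technology (bachelor's degree); 2. 2002-2005: Xi'an University of Architecture and Technology (master's degree); 3. 2011-2017: Xi'an University of Architecture and Technology (doctorate).
Work experience: 1. 2005 to October 2018: faculty member, School of Information and Control Engineering, Xi'an University of Architecture and Technology; 2. 2018 to present: faculty member and Vice Dean, School of Building Services Science and Engineering, Xi'an University of Architecture and Technology.
Professional service: Deputy Secretary-General, Intelligent Building and Building Automation Technical Committee, Shaanxi Association of Automation.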
冯增喜
Associate Professor
Paper Publications
Exploring crowd counting methodology by integrating CNN and transformer: performance optimization under weak supervision
Release time: 2025-12-20
Affiliation of Author(s):
School of Information and Control Engineering
Journal:
SIGNAL IMAGE AND VIDEO PROCESSING
Key Words:
Crowd counting · CNN · Transformer · Weakly supervised
Abstract:
Crowd counting, the task of estimating the number of individuals in an image, is essential to surveillance and traffic management, where it informs operational decisions and security measures. It has traditionally relied on Convolutional Neural Networks (CNNs), which excel at local feature extraction but fall short in capturing global context. Conversely, Transformers excel at capturing long-range dependencies but often overlook local detail. Current crowd counting methods depend heavily on precise position-level annotations for supervised training, which take significant time and labor to produce. This has spurred interest in weakly supervised training, where models learn solely from count-level annotations, an approach with considerable practical and research potential. In this study, we propose TCCNet, a novel weakly supervised network that combines CNNs and Transformers for crowd counting. To address the CNN's limitation in global feature extraction, we integrate a Transformer that captures extensive contextual information and thereby improves counting accuracy. Augmenting the Transformer block with Post Normalization and Scaled Cosine Attention smooths activation values and improves model stability. Moreover, our crowd counting regression block incorporates dilated convolutions, which expand the model's receptive field while maintaining spatial resolution. Extensive experiments on five publicly available datasets, together with illustrative visualizations, show that TCCNet accurately identifies crowd regions within images and achieves strong counting performance, particularly under weak supervision.
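The Scaled Cosine Attention named in the abstract can be illustrated with a minimal sketch. This is not the TCCNet implementation: it assumes the common formulation in which attention logits are the cosine similarity between a query and a key divided by a temperature. The function names and the fixed `tau` value are hypothetical (the temperature is typically a learned parameter), and learned projections, multiple heads, and position bias are omitted.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_cosine_attention(queries, keys, values, tau=0.1):
    """Single-head attention with logits cos(q, k) / tau.

    tau is a hypothetical fixed temperature chosen for illustration.
    """
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Logits are bounded by 1 / tau regardless of vector magnitudes.
        logits = [cosine(q, k) / tau for k in keys]
        weights = softmax(logits)
        # Each output is the attention-weighted average of the values.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs
```

Because cosine similarity ignores vector length, the logits stay within a fixed range even when activations grow large, which is consistent with the smoothing and stability effect the abstract describes.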
First Author:
Feng Zengxi
Indexed by:
Journal paper
Volume:
19(6): 483
ISSN No.:
1863-1703
Translation or Not:
no
Date of Publication:
2025-04-15
