DOI number:
10.1038/s40494-025-02036-8
Abstract:
This paper proposes a Multimodal Prototype Fusion Network (MPFN) to address the main challenges in paper-cut image classification: artistic abstraction, imbalanced data, and adaptation to unseen categories. The framework comprises two variants: AMPFN, which dynamically fuses multimodal prototypes via cross-modal attention and residual learning, and IMPFN, a training-free model for rapid deployment. Leveraging CLIP for feature extraction, AMPFN achieves 90.71% accuracy (16-shot) on seen classes, while IMPFN attains 84.98% accuracy (16-shot) on unseen classes without any training. Evaluations on paper-cut datasets and on public benchmarks (PACS, ArtDL, CUB-200-2011) show that the framework outperforms existing methods. The approach mitigates data imbalance through n-shot prototypes, reduces computational cost by reusing pre-trained features, and proves robust in fine-grained and abstract art classification. This work offers a scalable solution for cultural heritage digitization and multimodal art analysis.
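The training-free prototype idea behind IMPFN can be sketched as follows. This is an illustrative reconstruction based only on the abstract, not the authors' code: it assumes OpenAI's CLIP package, class image prototypes built as the mean of n-shot CLIP image embeddings, text prototypes from prompted class names (the prompt wording is a guess), and a fixed fusion weight alpha, which is hypothetical; the paper's AMPFN variant instead learns the fusion via cross-modal attention and residual learning.

import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def image_prototype(support_paths):
    # Mean of L2-normalised CLIP image embeddings over the n-shot support set;
    # averaging a fixed number of shots per class is what counters imbalance.
    feats = []
    for path in support_paths:
        img = preprocess(Image.open(path)).unsqueeze(0).to(device)
        f = model.encode_image(img)
        feats.append(f / f.norm(dim=-1, keepdim=True))
    proto = torch.cat(feats).mean(dim=0)
    return proto / proto.norm()

@torch.no_grad()
def text_prototype(class_name):
    # Text prototype from a prompted class name; unseen classes can be
    # covered by this branch alone, since it needs no support images.
    tokens = clip.tokenize([f"a paper-cut artwork of a {class_name}"]).to(device)
    f = model.encode_text(tokens)[0]
    return f / f.norm()

def fused_prototype(img_proto, txt_proto, alpha=0.5):
    # Fixed convex combination stands in for the paper's fusion rule.
    p = alpha * img_proto + (1.0 - alpha) * txt_proto
    return p / p.norm()

@torch.no_grad()
def classify(query_path, prototypes):
    # Nearest-prototype rule: cosine similarity of the query embedding to
    # each fused class prototype; no training step is involved.
    img = preprocess(Image.open(query_path)).unsqueeze(0).to(device)
    q = model.encode_image(img)[0]
    q = q / q.norm()
    return max(prototypes, key=lambda c: (q @ prototypes[c]).item())

Given a dict mapping each class name to its n-shot support image paths, prototypes = {c: fused_prototype(image_prototype(paths), text_prototype(c)) for c, paths in support.items()} yields a classifier immediately, which is the sense in which such a model is training-free.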
Note:
Zhang, X., Chen, D. & Qin, Y. Multimodal prototype fusion network for paper-cut image classification. npj Herit. Sci. 13, 462 (2025). https://doi.org/10.1038/s40494-025-02036-8
First-Level Discipline:
Computer Science and Technology
Links to published journals:
https://doi.org/10.1038/s40494-025-02036-8