DOI number:
10.1038/s40494-025-02036-8
Abstract:
This paper proposes a Multimodal Prototype Fusion Network (MPFN) to address the main challenges in paper-cut image classification: artistic abstraction, imbalanced data, and adaptation to unseen categories. The framework comprises two variants: AMPFN, which dynamically fuses multimodal prototypes via cross-modal attention and residual learning, and IMPFN, a training-free model for rapid deployment. Leveraging CLIP for feature extraction, AMPFN achieves 90.71% accuracy (16-shot) on seen classes, while IMPFN attains 84.98% accuracy (16-shot) on unseen classes without any training. Evaluations on paper-cut datasets and on public benchmarks (PACS, ArtDL, CUB-200-2011) show that the framework outperforms existing methods. The approach mitigates data imbalance through n-shot prototypes, reduces computational cost by reusing pre-trained features, and proves robust in fine-grained and abstract art classification. This work offers a scalable solution for cultural heritage digitization and multimodal art analysis.
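The training-free prototype idea behind IMPFN can be sketched as follows. This is an illustrative reconstruction based only on the abstract, not the authors' code: it assumes OpenAI's CLIP package, class image prototypes built as the mean of n-shot CLIP image embeddings, text prototypes from prompted class names (the prompt wording is a guess), and a fixed fusion weight alpha, which is hypothetical; the paper's AMPFN variant instead learns the fusion via cross-modal attention and residual learning.

import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def image_prototype(support_paths):
    # Mean of L2-normalised CLIP image embeddings over the n-shot support set;
    # averaging a fixed number of shots per class is what counters imbalance.
    feats = []
    for path in support_paths:
        img = preprocess(Image.open(path)).unsqueeze(0).to(device)
        f = model.encode_image(img)
        feats.append(f / f.norm(dim=-1, keepdim=True))
    proto = torch.cat(feats).mean(dim=0)
    return proto / proto.norm()

@torch.no_grad()
def text_prototype(class_name):
    # Text prototype from a prompted class name; unseen classes can be
    # covered by this branch alone, since it needs no support images.
    tokens = clip.tokenize([f"a paper-cut artwork of a {class_name}"]).to(device)
    f = model.encode_text(tokens)[0]
    return f / f.norm()

def fused_prototype(img_proto, txt_proto, alpha=0.5):
    # Fixed convex combination stands in for the paper's fusion rule.
    p = alpha * img_proto + (1.0 - alpha) * txt_proto
    return p / p.norm()

@torch.no_grad()
def classify(query_path, prototypes):
    # Nearest-prototype rule: cosine similarity of the query embedding to
    # each fused class prototype; no training step is involved.
    img = preprocess(Image.open(query_path)).unsqueeze(0).to(device)
    q = model.encode_image(img)[0]
    q = q / q.norm()
    return max(prototypes, key=lambda c: (q @ prototypes[c]).item())

Given a dict mapping each class name to its n-shot support image paths, prototypes = {c: fused_prototype(image_prototype(paths), text_prototype(c)) for c, paths in support.items()} yields a classifier immediately, which is the sense in which such a model is training-free.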
Note:
Zhang, X., Chen, D. & Qin, Y. Multimodal prototype fusion network for paper-cut image classification. npj Herit. Sci. 13, 462 (2025). https://doi.org/10.1038/s40494-025-02036-8
First-Level Discipline:
Computer Science and Technology
Links to published journals:
https://doi.org/10.1038/s40494-025-02036-8