Multi-Scale Feature Fusion Network for Object Recognition in Mountain Farmland Environments
Abstract
Object recognition in mountain farmland environments is a crucial component of intelligent agricultural operations. Because of the diversity of crops, weeds, and other targets in such environments, together with complex scale variations, object recognition faces great challenges, and effectively fusing global and local multi-scale features is particularly critical for addressing them. To this end, this paper proposes a three-branch parallel multi-scale feature fusion network (MFFNet) for object recognition in mountain farmland environments. MFFNet contains specially designed global and local feature extraction modules that capture global and local feature information from the image in parallel. In addition, MFFNet introduces a feature fusion strategy based on channel attention and spatial attention to effectively integrate global and local features of different semantic depths. Experimental results on a self-built mountain farmland image dataset show that the proposed MFFNet outperforms the comparison methods on multiple evaluation metrics.
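As a rough illustration of the fusion strategy described in the abstract, the sketch below assembles a three-branch parallel network whose concatenated branch features are reweighted first by channel attention and then by spatial attention. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the branch designs (two convolutional local branches and a dilated-convolution stand-in for the global branch), the module names (ChannelSpatialFusion, ThreeBranchFusionNet), and all widths and kernel sizes are illustrative choices.

import torch
import torch.nn as nn

class ChannelSpatialFusion(nn.Module):
    """Fuse concatenated branch features with channel attention, then spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze spatial dimensions, predict per-channel weights.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: compress channels, predict a per-pixel weight map.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_att(x)      # reweight channels
        return x * self.spatial_att(x)   # reweight spatial positions

class ThreeBranchFusionNet(nn.Module):
    """Parallel local/global branches fused by channel and spatial attention (illustrative)."""
    def __init__(self, in_ch=3, width=64, num_classes=10):
        super().__init__()
        def conv_branch(k):  # local feature branch with kernel size k
            return nn.Sequential(
                nn.Conv2d(in_ch, width, k, stride=4, padding=k // 2),
                nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            )
        self.local_small = conv_branch(3)   # fine local detail
        self.local_large = conv_branch(7)   # coarser local context
        # Stand-in "global" branch: dilated convolution enlarges the receptive field.
        self.global_branch = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=4, padding=2, dilation=2),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        )
        self.fuse = ChannelSpatialFusion(3 * width)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(3 * width, num_classes),
        )

    def forward(self, x):
        feats = [self.local_small(x), self.local_large(x), self.global_branch(x)]
        fused = self.fuse(torch.cat(feats, dim=1))  # concatenate branches, then attend
        return self.head(fused)

# Usage: logits = ThreeBranchFusionNet()(torch.randn(2, 3, 224, 224))  # -> shape (2, 10)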
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
Zeng Jiexian, Ji Kang. Multi-view aircraft target recognition algorithm based on multi-feature fusion[J]. Journal of Nanchang Hangkong University (Natural Science Edition), 2016, 30(02): 8-15.
Logothetis N K, Sheinberg D L. Visual object recognition[J]. Annual review of neuroscience, 1996, 19: 577-621.
Forsyth D A, Mundy J L, di Gesú V, et al. Object recognition with gradient-based learning[J]. Shape, contour and grouping in computer vision, 1999: 319-345.
Zhou Yanan, Chen Hui, Liu Hongbin. Transactions of the CSAE, 2022, 38(23): 213-222.
Kang Mengzhen, Wang Xiujuan, Hua Jing, et al. Parallel Agriculture: Intelligent Technology Towards Smart Agriculture[J]. Chinese Journal of Intelligent Science and Technology, 2019, 1(2): 107-117.
Yao Z, Yang X, Wang B, et al. Multidimensional beta-diversity across local and regional scales in a Chinese subtropical forest: The role of forest structure[J]. Ecology and Evolution, 2023, 13(10): e10607.
Wang Q J, Zhang S Y, Dong S F, et al. Pest24: A large-scale very small object data set of agricultural pests for multi-target detection[J]. Computers and Electronics in Agriculture, 2020, 175: 105585.
Wang Q, Huang W, Xiong Z, et al. Looking closer at the scene: Multiscale representation learning for remote sensing image scene classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 33(4): 1414-1428.
Arkin E, Yadikar N, Muhtar Y, et al. A survey of object detection based on CNN and transformer[C]//2021 IEEE 2nd international conference on pattern recognition and machine learning (PRML). IEEE, 2021: 99-108.
Maurício J, Domingues I, Bernardino J. Comparing vision transformers and convolutional neural networks for image classification: A literature review[J]. Applied Sciences, 2023, 13(9): 5521.
Khan S, Naseer M, Hayat M, et al. Transformers in vision: A survey[J]. ACM computing surveys (CSUR), 2022, 54(10s): 1-41.
Xu Y, Zhang Q, Zhang J, et al. Vitae: Vision transformer advanced by exploring intrinsic inductive bias[J]. Advances in neural information processing systems, 2021, 34: 28522-28535.
Zhang Y, Liu H, Hu Q. Transfuse: Fusing transformers and cnns for medical image segmentation[C]//Medical image computing and computer assisted intervention–MICCAI 2021: 24th international conference, Strasbourg, France, September 27–October 1, 2021, proceedings, Part I 24. Springer International Publishing, 2021: 14-24.
Wang H, Chen X, Zhang T, et al. CCTNet: Coupled CNN and transformer network for crop segmentation of remote sensing images[J]. Remote Sensing, 2022, 14(9): 1956.
Alzubaidi L, Zhang J, Humaidi A J, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions[J]. Journal of big Data, 2021, 8: 1-74.
Zhang Bo. Artificial intelligence is entering the post deep-learning era[J]. Chinese Journal of Intelligent Science and Technology, 2019, 1(1): 4-6.
Li Y, Chen L, Huang Zhaohong, et al. Plant Leaf Detection Technology Based on Multi-scale Convolutional Neural Network Feature Fusion[J]. Chinese Journal of Intelligent Science and Technology, 2021, 3(3): 304-311.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[J]. Advances in neural information processing systems, 2012, 25.
Zhao X, Wang L, Zhang Y, et al. A review of convolutional neural networks in computer vision[J]. Artificial Intelligence Review, 2024, 57(4): 99.
Khan A, Sohail A, Zahoora U, et al. A survey of the recent architectures of deep convolutional neural networks[J]. Artificial intelligence review, 2020, 53: 5455-5516.
Lindsay G W. Convolutional neural networks as a model of the visual system: Past, present, and future[J]. Journal of cognitive neuroscience, 2021, 33(10): 2017-2031.
Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., 2017: 6000-6010.
Han K, Wang Y, Chen H, et al. A survey on vision transformer[J]. IEEE transactions on pattern analysis and machine intelligence, 2022, 45(1): 87-110.
Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 10012-10022.
Li Y, Miao N, Ma L, et al. Transformer for object detection: Review and benchmark[J]. Engineering Applications of Artificial Intelligence, 2023, 126: 107021.
Li X, Ding H, Yuan H, et al. Transformer-based visual segmentation: A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
Zhu W, Sun J, Wang S, et al. Identifying field crop diseases using transformer-embedded convolutional neural network[J]. Agriculture, 2022, 12(8): 1083.
Niu Z, Zhong G, Yu H. A review on the attention mechanism of deep learning[J]. Neurocomputing, 2021, 452: 48-62.
Hassanin M, Anwar S, Radwan I, et al. Visual attention methods in deep learning: An in-depth survey[J]. Information Fusion, 2024, 108: 102417.
Zhao H, Jia J, Koltun V. Exploring self-attention for image recognition[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 10076-10085.
Chaudhari S, Mithal V, Polatkan G, et al. An attentive survey of attention models[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2021, 12(5): 1-32.
Papadopoulos A, Korus P, Memon N. Hard-attention for scalable image classification[J]. Advances in Neural Information Processing Systems, 2021, 34: 14694-14707.
Guo M H, Xu T X, Liu J J, et al. Attention mechanisms in computer vision: A survey[J]. Computational visual media, 2022, 8(3): 331-368.
Brauwers G, Frasincar F. A general survey on attention mechanisms in deep learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 35(4): 3279-3298.