Context Aware Multimodal Fusion YOLOv5 Framework for Pedestrian Detection under IoT Environment

dc.contributor.author: Shu, Y.
dc.contributor.author: Wang, Y.
dc.contributor.author: Zhang, M.
dc.contributor.author: Yang, J.
dc.contributor.author: Wang, Y.
dc.contributor.author: Wang, J.
dc.contributor.author: Zhang, Y.
dc.coverage.issue: 1
dc.coverage.volume: 34
dc.date.accessioned: 2025-04-10T12:13:02Z
dc.date.available: 2025-04-10T12:13:02Z
dc.date.issued: 2025-04
dc.description.abstract: Pedestrian detection based on deep networks has become a research hotspot in computer vision. With the rapid development of the Internet of Things (IoT) and autonomous driving, deploying pedestrian detection models on mobile devices places higher demands on detection accuracy and real-time performance. Moreover, fully integrating multimodal information can further improve the robustness of the model. To this end, this article proposes a novel multimodal fusion YOLOv5 network for pedestrian detection. Specifically, to improve multi-scale pedestrian detection, we enhance contextual awareness by embedding a multi-head self-attention (MSA) mechanism and graph convolution operations in the existing YOLOv5 framework, while preserving the real-time advantages of YOLOv5 in pedestrian detection tasks. To improve multimodal information fusion, we introduce a joint cross-attention fusion mechanism that enhances knowledge interaction between the modalities (see the illustrative sketch after this record). To validate the effectiveness of the proposed model, we conduct extensive experiments on two multimodal pedestrian detection datasets. The results confirm that our model achieves the best multi-scale pedestrian detection performance and outperforms other multimodal deep models.
dc.format: text
dc.format.extent: 118-131
dc.format.mimetype: application/pdf
dc.identifier.citation: Radioengineering. 2025, vol. 34, iss. 1, pp. 118-131. ISSN 1210-2512
dc.identifier.doi: 10.13164/re.2025.0118
dc.identifier.issn: 1210-2512
dc.identifier.uri: https://hdl.handle.net/11012/250866
dc.language.iso: en
dc.publisher: Radioengineering Society
dc.relation.ispartof: Radioengineering
dc.relation.uri: https://www.radioeng.cz/fulltexts/2025/25_01_0118_0131.pdf
dc.rights: Creative Commons Attribution 4.0 International license
dc.rights.access: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Pedestrian detection
dc.subject: IoT
dc.subject: deep learning
dc.subject: multimodal fusion
dc.subject: YOLOv5
dc.title: Context Aware Multimodal Fusion YOLOv5 Framework for Pedestrian Detection under IoT Environment
dc.type.driver: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
eprints.affiliatedInstitution.faculty: Fakulta elektrotechniky a komunikačních technologií
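
Illustrative sketch

A minimal sketch of the joint cross-attention fusion idea summarized in the abstract, assuming PyTorch and two modality feature maps of matching shape (e.g., RGB and thermal). The class, parameter, and variable names below are hypothetical, not taken from the paper; this is a sketch of the general technique, not the authors' implementation.

import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuse two modality feature maps by letting each modality attend
    to the other, then combining the attended features."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_thermal = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # rgb, thermal: (B, C, H, W) feature maps from two backbone streams.
        b, c, h, w = rgb.shape
        rgb_seq = rgb.flatten(2).transpose(1, 2)      # (B, H*W, C)
        th_seq = thermal.flatten(2).transpose(1, 2)   # (B, H*W, C)

        # Cross-attention: each modality queries the other's keys/values.
        rgb_att, _ = self.attn_rgb(rgb_seq, th_seq, th_seq)
        th_att, _ = self.attn_thermal(th_seq, rgb_seq, rgb_seq)

        # Residual combination of both streams, then back to (B, C, H, W).
        fused = self.norm(rgb_seq + rgb_att + th_seq + th_att)
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    fusion = CrossAttentionFusion(channels=256)
    rgb = torch.randn(2, 256, 20, 20)
    thermal = torch.randn(2, 256, 20, 20)
    print(fusion(rgb, thermal).shape)  # torch.Size([2, 256, 20, 20])

Because each modality serves as the query against the other's keys and values, the fused map carries complementary cues from both streams, which is the knowledge-interaction effect the abstract attributes to the joint cross-attention mechanism.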

Files

Original bundle

Name: 25_01_0118_0131.pdf
Size: 1.96 MB
Format: Adobe Portable Document Format
