A Comprehensive Evaluation of Deep Vision Transformers for Road Extraction from Very-high-resolution Satellite Data

dc.contributor.author: Bolcek, Jan
dc.contributor.author: Gibril, Mohamed Barakat A.
dc.contributor.author: Al-Ruzouq, Rami
dc.contributor.author: Shanableh, Abdallah
dc.contributor.author: Jena, Ratiranjan
dc.contributor.author: Hammouri, Nezar
dc.contributor.author: Sachit, Mourtadha Sarhan
dc.contributor.author: Ghorbanzadeh, Omid
dc.coverage.issue: 9
dc.coverage.volume: 11
dc.date.accessioned: 2025-07-31T12:58:51Z
dc.date.available: 2025-07-31T12:58:51Z
dc.date.issued: 2025-01-02
dc.description.abstract: Transformer-based semantic segmentation architectures excel at extracting road networks from very-high-resolution (VHR) satellite images because of their ability to capture global contextual information. Nonetheless, there is a gap in research regarding their comparative effectiveness, efficiency, and performance in extracting road networks from multicity VHR data. This study evaluates 11 transformer-based models on three publicly available datasets (the DeepGlobe Road Extraction Dataset, the SpaceNet-3 Road Network Detection Dataset, and the Massachusetts Roads Dataset) to assess their performance, efficiency, and complexity in mapping road networks from multicity VHR satellite images. The evaluated models include Unified Perceptual Parsing for Scene Understanding (UperNet) based on the Swin Transformer (UperNet-SwinT) and on the Multi-path Vision Transformer (UperNet-MpViT), the Twins transformer, Segmenter, SegFormer, K-Net based on SwinT, Mask2Former based on SwinT (Mask2Former-SwinT), TopFormer, UniFormer, and PoolFormer. Results show that the models recorded mean F-scores (mF-scores) ranging from 82.22% to 90.70% on the DeepGlobe dataset, 58.98% to 86.95% on the Massachusetts dataset, and 69.02% to 86.14% on the SpaceNet-3 dataset. Mask2Former-SwinT, UperNet-MpViT, and SegFormer were the top performers among the evaluated models. Mask2Former-SwinT demonstrated a strong balance of high performance across the different satellite image datasets and moderate computational efficiency. This investigation aids in selecting the most suitable model for extracting road networks from remote sensing data.
dc.format: text
dc.format.extent: 1-17
dc.format.mimetype: application/pdf
dc.identifier.citation: Science of Remote Sensing. 2025, vol. 11, issue 9, p. 1-17.
dc.identifier.doi: 10.1016/j.srs.2024.100190
dc.identifier.issn: 2666-0172
dc.identifier.orcid: 0009-0008-0271-6543
dc.identifier.other: 193735
dc.identifier.uri: https://hdl.handle.net/11012/255372
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.ispartof: Science of Remote Sensing
dc.relation.uri: https://www.sciencedirect.com/science/article/pii/S2666017224000749
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.access: openAccess
dc.rights.sherpa: http://www.sherpa.ac.uk/romeo/issn/2666-0172/
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Remote sensing
dc.subject: Road extraction
dc.subject: Satellite data
dc.subject: Semantic segmentation
dc.title: A Comprehensive Evaluation of Deep Vision Transformers for Road Extraction from Very-high-resolution Satellite Data
dc.type.driver: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
sync.item.dbid: VAV-193735
sync.item.dbtype: VAV
sync.item.insts: 2025.07.31 14:58:51
sync.item.modts: 2025.07.31 14:32:56
thesis.grantor: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií. Ústav radioelektroniky
Files
Original bundle
Name: 1s2.0S2666017224000749main.pdf
Size: 23.17 MB
Format: Adobe Portable Document Format