Transformer-based Semantic Segmentation for Large-Scale Building Footprint Extraction from Very-High Resolution Satellite Images

dc.contributor.author: Gibril, Mohamed Barakat A.
dc.contributor.author: Al-Ruzouq, Rami
dc.contributor.author: Shanableh, Abdallah
dc.contributor.author: Jena, Ratiranjan
dc.contributor.author: Bolcek, Jan
dc.contributor.author: Zulhaidi Mohd Shafri, Helmi
dc.contributor.author: Ghorbanzadeh, Omid
dc.coverage.issue: 10
dc.coverage.volume: 73
dc.date.accessioned: 2024-05-14T06:45:29Z
dc.date.available: 2024-05-14T06:45:29Z
dc.date.issued: 2024-03-09
dc.description.abstract: Extracting building footprints from extensive very-high spatial resolution (VHSR) remote sensing data is crucial for diverse applications, including surveying, urban studies, population estimation, identification of informal settlements, and disaster management. Although convolutional neural networks (CNNs) are commonly utilized for this purpose, their effectiveness is constrained by limitations in capturing long-range relationships and contextual details due to the localized nature of convolution operations. This study introduces the Masked-attention Mask Transformer (Mask2Former), based on the Swin Transformer, for building footprint extraction from large-scale satellite imagery. To enhance the capture of large-scale semantic information and extract multiscale features, a hierarchical vision transformer with shifted windows (Swin Transformer) serves as the backbone network. An extensive analysis compares the efficiency and generalizability of Mask2Former with four CNN-based models (PSPNet, DeepLabV3+, UPerNet-ConvNeXt, and SegNeXt) and two transformer-based models (UPerNet-Swin and SegFormer) of differing complexity. Results reveal the superior performance of transformer-based models over their CNN-based counterparts, with exceptional generalization across diverse testing areas featuring varying building structures, heights, and sizes. Specifically, Mask2Former with the Swin Transformer backbone achieves a mean intersection over union (mIoU) between 88% and 93%, along with a mean F-score (mF-score) ranging from 91% to 96.35%, across various urban landscapes.
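To make the two headline metrics concrete, the following is a minimal sketch (not from the paper; plain numpy with hypothetical toy labels) of how per-class intersection over union and F-score are computed from a confusion matrix; averaging them over classes gives the mIoU and mF-score reported in the abstract.

# Illustrative sketch: per-class IoU and F-score from a confusion matrix
# for a binary building/background segmentation task. Toy data only.
import numpy as np

def confusion_matrix(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    """Rows index ground-truth classes, columns index predicted classes."""
    mask = (gt >= 0) & (gt < num_classes)
    return np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask].astype(int),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

def iou_and_fscore(cm: np.ndarray, eps: float = 1e-12):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled class c but predicted otherwise
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    fscore = 2 * precision * recall / (precision + recall + eps)
    return iou, fscore

# Hypothetical 2-class maps (0 = background, 1 = building).
gt = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])
pred = np.array([[0, 1, 1, 1], [0, 0, 1, 1]])
cm = confusion_matrix(pred.ravel(), gt.ravel(), num_classes=2)
iou, f = iou_and_fscore(cm)
print(f"mIoU = {iou.mean():.3f}, mF-score = {f.mean():.3f}")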
dc.format: text
dc.format.extent: 4937-4954
dc.format.mimetype: application/pdf
dc.identifier.citation: ADVANCES IN SPACE RESEARCH. 2024, vol. 73, issue 10, p. 4937-4954.
dc.identifier.doi: 10.1016/j.asr.2024.03.002
dc.identifier.issn: 1879-1948
dc.identifier.orcid: 0009-0008-0271-6543
dc.identifier.other: 188212
dc.identifier.uri: https://hdl.handle.net/11012/245513
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.ispartof: ADVANCES IN SPACE RESEARCH
dc.relation.uri: https://www.sciencedirect.com/science/article/pii/S0273117724002205
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.access: openAccess
dc.rights.sherpa: http://www.sherpa.ac.uk/romeo/issn/1879-1948/
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: remote sensing
dc.subject: satellite imagery
dc.subject: Mask2Former
dc.subject: CNN
dc.subject: Swin Transformer
dc.subject: vision transformer
dc.title: Transformer-based Semantic Segmentation for Large-Scale Building Footprint Extraction from Very-High Resolution Satellite Images
dc.type.driver: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
sync.item.dbid: VAV-188212
sync.item.dbtype: VAV
sync.item.insts: 2024.05.14 08:45:29
sync.item.modts: 2024.05.14 08:13:56
thesis.grantor: Brno University of Technology, Faculty of Electrical Engineering and Communication, Department of Radio Electronics
Files
Original bundle
Name: 1s2.0S0273117724002205main.pdf
Size: 11.51 MB
Format: Adobe Portable Document Format