A Multi-Dimensional DNS Domain Intelligence Dataset for Cybersecurity Research
| dc.contributor.author | Hranický, Radek | cs |
| dc.contributor.author | Ondryáš, Ondřej | cs |
| dc.contributor.author | Horák, Adam | cs |
| dc.contributor.author | Pouč, Petr | cs |
| dc.contributor.author | Jeřábek, Kamil | cs |
| dc.contributor.author | Ebert, Tomáš | cs |
| dc.contributor.author | Polišenský, Jan | cs |
| dc.coverage.issue | October | cs |
| dc.coverage.volume | 62 | cs |
| dc.date.accessioned | 2025-10-30T17:05:14Z | |
| dc.date.available | 2025-10-30T17:05:14Z | |
| dc.date.issued | 2026-01-01 | cs |
| dc.description.abstract | The escalating sophistication and frequency of cyber threats require advanced solutions in cybersecurity research. Particularly, phishing and malware detection have become increasingly reliant on data-driven approaches. This paper presents a unique dataset precisely curated to bolster research in network security, focusing on the classification and analysis of internet domains. This dataset contains information for over a million internet domains with detailed labels distinguishing between phishing, malware, and benign traffic. Our dataset is distinctive due to its comprehensive compilation of metainformation derived from multiple sources, including DNS records, TLS handshakes and certificates, WHOIS and RDAP services, IP-related data, and geolocation details. Such rich, multi-dimensional data allows for a deeper analysis and understanding of domain characteristics that are critical in identifying and categorizing cyber threats. The integration of information from diverse sources enhances the dataset's utility, providing a holistic view of each domain's footprint and its potential security implications. The data is formatted in JSON, ensuring versatility, accessibility for researchers, and easy integration into various analytical tools and platforms, facilitating ease of use in statistical analysis, machine learning, and other computational analyses. Our dataset's extensive volume and variety surpass any known publicly available resources in this field, making it an invaluable asset for both academic and practical development and testing of cybersecurity solutions. This paper thoroughly describes the value of the data, details the comprehensive methodology employed in the collection process, and provides a clear description of the data structure. Such documentation is crucial for ensuring that the dataset can be effectively utilized and reapplied in a variety of research contexts. Its structured format and the broad range of included features are critical for developing robust cybersecurity solutions and can be adapted for emerging threats. | en |
| dc.format | text | cs |
| dc.format.extent | 1-13 | cs |
| dc.format.mimetype | application/pdf | cs |
| dc.identifier.citation | Data in Brief. 2026, vol. 62, issue October, p. 1-13. | en |
| dc.identifier.doi | 10.1016/j.dib.2025.112062 | cs |
| dc.identifier.issn | 2352-3409 | cs |
| dc.identifier.orcid | 0000-0001-6315-8137 | cs |
| dc.identifier.orcid | 0009-0007-5400-8584 | cs |
| dc.identifier.orcid | 0000-0002-5317-9222 | cs |
| dc.identifier.orcid | 0009-0000-8525-3194 | cs |
| dc.identifier.other | 194220 | cs |
| dc.identifier.researcherid | KRR-2050-2024 | cs |
| dc.identifier.researcherid | JFA-4159-2023 | cs |
| dc.identifier.scopus | 57189302660 | cs |
| dc.identifier.scopus | 59536362400 | cs |
| dc.identifier.scopus | 57208510810 | cs |
| dc.identifier.uri | https://hdl.handle.net/11012/255608 | |
| dc.language.iso | en | cs |
| dc.relation.ispartof | Data in Brief | cs |
| dc.relation.uri | https://www.sciencedirect.com/science/article/pii/S235234092500784X | cs |
| dc.rights | Creative Commons Attribution 4.0 International | cs |
| dc.rights.access | openAccess | cs |
| dc.rights.sherpa | http://www.sherpa.ac.uk/romeo/issn/2352-3409/ | cs |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | cs |
| dc.subject | Domain | en |
| dc.subject | DNS | en |
| dc.subject | TLS | en |
| dc.subject | WHOIS | en |
| dc.subject | RDAP | en |
| dc.subject | IP | en |
| dc.subject | Geolocation | en |
| dc.subject | Malware | en |
| dc.subject | Phishing | en |
| dc.title | A Multi-Dimensional DNS Domain Intelligence Dataset for Cybersecurity Research | en |
| dc.title.alternative | A Multi-Dimensional DNS Domain Intelligence Dataset for Cybersecurity Research | en |
| dc.type.driver | article | en |
| dc.type.status | Peer-reviewed | en |
| dc.type.version | publishedVersion | en |
| sync.item.dbid | VAV-194220 | en |
| sync.item.dbtype | VAV | en |
| sync.item.insts | 2025.10.30 18:05:13 | en |
| sync.item.modts | 2025.10.30 09:33:12 | en |
| thesis.grantor | Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav informačních systémů | cs |
Files
Original bundle
- Name: 1s2.0S235234092500784Xmain.pdf
- Size: 1.67 MB
- Format: Adobe Portable Document Format
- Description: file 1s2.0S235234092500784Xmain.pdf
