Synthetic Browsing Histories for 50 Countries Worldwide: Datasets for Research, Development, and Education

Date
2025-01-22
Authors
Komosný, Dan
Rehman, Saeed
Ayub, Muhammad Sohaib
Publisher
Springer Nature
Abstract
Browsing histories can be a valuable resource for cybersecurity, research, and testing. Individuals are often reluctant to share their browsing histories online, and the use of personal data requires obtaining signed informed consent. Research shows that anonymized histories can lead to re-identification, nullifying the anonymity promised by informed consent. In this work, we present 500 synthetic browsing histories valid for 50 countries worldwide. The synthetic histories are compiled based on real browsing data using a series of transformation criteria, including website content, popularity, locality, and language, ensuring their validity for the respective countries. Each history maintains the order of webpage accesses and covers a one-month period. The motivation for publishing this dataset arises from the community's call for browsing histories from different countries for research, development, and education. The published synthetic browsing histories can be used for any purpose without legal restrictions.
Citation
Scientific Data. 2025, vol. 12, issue 1, p. 1-11.
https://www.nature.com/articles/s41597-025-04407-z
Document type
Peer-reviewed
Document version
Published version
Language of document
en
Document licence
Creative Commons Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/