Synthetic Browsing Histories for 50 Countries Worldwide: Datasets for Research, Development, and Education

Authors

Komosný, Dan
Rehman, Saeed
Ayub, Muhammad Sohaib

Advisor

Referee

Mark

Journal Title

Journal ISSN

Volume Title

Publisher

Springer Nature

Abstract

Browsing histories can be a valuable resource for cybersecurity, research, and testing. Individuals are often reluctant to share their browsing histories online, and the use of personal data requires obtaining signed informed consent. Research shows that anonymized histories can lead to re-identification, nullifying the anonymity promised by informed consent. In this work, we present 500 synthetic browsing histories valid for 50 countries worldwide. The synthetic histories are compiled based on real browsing data using a series of transformation criteria, including website content, popularity, locality, and language, ensuring their validity for the respective countries. Each history maintains the order of webpage accesses and covers a one-month period. The motivation for publishing this dataset arises from the community's call for browsing histories from different countries for research, development, and education. The published synthetic browsing histories can be used for any purpose without legal restrictions.

Description

Citation

Scientific Data. 2025, vol. 12, issue 1, p. 1-11.
https://www.nature.com/articles/s41597-025-04407-z

Document type

Peer-reviewed

Document version

Published version

Date of access to the full text

Language of document

en

Study field

Committee

Date of acceptance

Defence

Result of defence

Endorsement

Review

Supplemented By

Referenced By

Creative Commons license

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International