Towards identification of network applications in encrypted traffic
Date
2025-09-03
Authors
Burgetová, Ivana
Matoušek, Petr
Ryšavý, Ondřej
Publisher
Springer Nature
Abstract
Network traffic monitoring for security threat detection and network performance management is challenging because most communications are encrypted. This article addresses the problem of identifying the network applications associated with Transport Layer Security (TLS) connections. Three primary approaches to classifying TLS-encrypted traffic are evaluated: fingerprinting methods, Server Name Indication (SNI)-based identification, and machine learning-based classifiers. Each method has its own strengths and limitations: fingerprinting relies on a regularly updated database of known hashes, SNI is vulnerable to obfuscation or missing information, and machine learning techniques require sufficient labeled training data. A comparison of these methods highlights the difficulty of identifying individual applications, since TLS properties are largely shared between applications. Nevertheless, identifying even a set of candidate applications provides valuable insight for network monitoring, and all the considered methods achieve this with high accuracy. To facilitate further research in this area, a novel, publicly available dataset of TLS communications, annotated for popular desktop and mobile applications, has been created. Furthermore, the results of three approaches that refine TLS traffic classification by combining basic classifiers with context are presented. Finally, practical use cases are proposed, and future research directions for improving application identification methods are identified.
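The abstract notes that fingerprinting relies on a database of known hashes and that TLS properties are often shared between applications, so a lookup may return several candidates rather than one. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes the widely used JA3 fingerprint format (MD5 of the ClientHello version, cipher suites, extensions, elliptic curves and point formats, with GREASE values skipped), and it uses hypothetical in-memory dictionaries `fp_db` and `sni_db` in place of a real fingerprint database and SNI rule set. The names `ClientHello`, `ja3_string`, `ja3_hash`, `sni_matches` and `candidate_apps` are likewise hypothetical, and ClientHello parsing is not shown.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Optional

# GREASE values (RFC 8701: 0x0a0a, 0x1a1a, ..., 0xfafa) are skipped by JA3.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


@dataclass
class ClientHello:
    """Hypothetical container for ClientHello fields used by JA3 (parsing not shown)."""
    version: int
    ciphers: list[int]
    extensions: list[int]
    curves: list[int] = field(default_factory=list)
    point_formats: list[int] = field(default_factory=list)
    sni: Optional[str] = None


def ja3_string(ch: ClientHello) -> str:
    """Build the JA3 string: comma-separated fields, dash-separated values."""
    def join(values: list[int]) -> str:
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(ch.version),
        join(ch.ciphers),
        join(ch.extensions),
        join(ch.curves),
        join(ch.point_formats),
    ])


def ja3_hash(ch: ClientHello) -> str:
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(ch).encode("ascii")).hexdigest()


def sni_matches(sni: str, domain: str) -> bool:
    """True if sni equals domain or is a subdomain of it."""
    return sni == domain or sni.endswith("." + domain)


def candidate_apps(
    ch: ClientHello,
    fp_db: dict[str, set[str]],   # hypothetical: JA3 hash -> candidate apps
    sni_db: dict[str, set[str]],  # hypothetical: domain -> candidate apps
) -> set[str]:
    """Return candidate applications for one TLS connection (illustrative rule)."""
    by_fp = fp_db.get(ja3_hash(ch), set())
    by_sni: set[str] = set()
    if ch.sni:
        for domain, apps in sni_db.items():
            if sni_matches(ch.sni, domain):
                by_sni |= apps
    # Narrow down when both sources agree on at least one application.
    both = by_fp & by_sni
    if both:
        return both
    # Otherwise fall back to all candidates from either source.
    return by_fp | by_sni
```

With `fp_db = {ja3_hash(ch): {"AppA", "AppB"}}` and `sni_db = {"example.com": {"AppB", "AppC"}}`, a connection whose SNI is `api.example.com` yields only `AppB`, while the same connection without SNI yields both `AppA` and `AppB`. Intersecting the two candidate sets is one simple, assumed way to combine a fingerprint classifier with SNI context; falling back to the union keeps an answer when SNI is missing or unknown.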
Citation
Annals of Telecommunications. 2025, vol. 2025, issue 9, p. 1-18.
https://link.springer.com/article/10.1007/s12243-025-01114-z
Document type
Peer-reviewed
Document version
Published version
Language of document
en

0000-0002-9947-9837