Italian academics say they’ve developed an algorithm that can detect the patterns of Android app activity inside Tor traffic with an accuracy of 97 percent.
The algorithm isn’t a deanonymization script, as it can’t reveal a user’s real IP address or other identifying details. However, it can reveal whether a Tor user is using a particular Android app.
The work of researchers from the Sapienza University of Rome in Italy builds upon previous research that was able to analyze the TCP packet flows of Tor traffic and distinguish between eight traffic types: browsing, email, chat, audio streaming, video streaming, file transfers, VoIP, and P2P.
For their work, the Italian researchers applied a similar concept of analyzing the TCP packets flowing through a Tor connection to detect patterns specific to certain Android apps.
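To make the idea of "analyzing TCP packets" concrete, here is a minimal Python sketch of how a single flow might be reduced to a handful of summary statistics. The article doesn't say which features the researchers used, so the function name `flow_features`, the chosen statistics, and the packet tuple layout are all assumptions made for illustration.

```python
from statistics import mean

def flow_features(packets):
    """Summarize one flow given (timestamp_seconds, size_bytes, direction) tuples.

    direction is +1 for outgoing and -1 for incoming (a convention assumed here).
    """
    if len(packets) < 2:
        raise ValueError("need at least two packets to compute inter-arrival times")
    packets = sorted(packets)  # order by timestamp
    times = [t for t, _, _ in packets]
    sizes = [s for _, s, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    outgoing = sum(1 for _, _, d in packets if d > 0)
    return {
        "packet_count": len(packets),
        "mean_size": mean(sizes),
        "mean_gap": mean(gaps),
        "outgoing_ratio": outgoing / len(packets),
    }

# Hypothetical four-packet flow; the values are made up for illustration.
example_flow = [(0.0, 512, +1), (0.5, 1024, -1), (1.0, 1024, -1), (1.5, 512, +1)]
features = flow_features(example_flow)
```

None of these statistics require decrypting anything: they rely only on packet timing, size, and direction, which is the kind of metadata that remains observable on an encrypted connection.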
They then developed a machine learning algorithm that they trained with the Tor traffic patterns of ten apps: the Tor Browser Android app, Instagram, Facebook, Skype, uTorrent, Spotify, Twitch, YouTube, DailyMotion, and Replaio Radio.
With the algorithm trained, they were then able to point it at Tor traffic and detect whenever the user was utilizing one of the ten apps. Test results showed an algorithm accuracy of 97.3 percent.
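The train-then-detect workflow can be sketched in a few lines of Python. The example below uses a nearest-centroid classifier on synthetic data as a stand-in; it is not the researchers' algorithm, and the app labels, feature values, and function names are hypothetical.

```python
import math
import random

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(samples):
    """samples: list of (feature_vector, app_label). Returns label -> centroid."""
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], vec))

def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for vec, label in samples if predict(model, vec) == label)
    return hits / len(samples)

# Synthetic data: made-up (mean_size, mean_gap) centers, not measured traffic.
rng = random.Random(0)
centers = {"app_a": (600.0, 0.8), "app_b": (1200.0, 0.1)}

def make_samples(n):
    return [([rng.gauss(cx, 20.0), rng.gauss(cy, 0.02)], label)
            for label, (cx, cy) in centers.items() for _ in range(n)]

model = train(make_samples(50))              # fit on one batch of flows
test_accuracy = accuracy(model, make_samples(50))  # score on a fresh batch
```

The two synthetic apps here are deliberately far apart, so the toy classifier separates them easily. Real traffic is far noisier, and a real pipeline would at least rescale the features so that no single one dominates the distance.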
However, the mechanism they devised isn’t as perfect and efficient as it sounds. For starters, it can only be used when there’s no background traffic noise on the communication channel, meaning it only works when the user is running a single app on their mobile device, and nothing else.
If too many apps are communicating at the same time in the phone’s background, TCP traffic patterns get muddled up, and the algorithm’s accuracy drops.
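A toy Python example shows why overlapping traffic is a problem for this kind of classifier. The two traces below are invented, and `mean_size_and_gap` is a hypothetical helper; the point is only that the statistics of a mixed capture no longer match those of either app on its own.

```python
from statistics import mean

def mean_size_and_gap(packets):
    """packets: list of (timestamp_seconds, size_bytes), in any order."""
    packets = sorted(packets)  # order by timestamp
    sizes = [s for _, s in packets]
    gaps = [b[0] - a[0] for a, b in zip(packets, packets[1:])]
    return mean(sizes), mean(gaps)

# Two made-up traces: a steady stream of large packets and a sparse chatty app.
stream = [(i * 0.25, 1400) for i in range(8)]        # 0.0, 0.25, ..., 1.75
chatty = [(0.125 + i * 0.5, 200) for i in range(4)]  # 0.125, 0.625, 1.125, 1.625

alone = mean_size_and_gap(stream)           # the streaming app by itself
mixed = mean_size_and_gap(stream + chatty)  # both apps on the same channel
```

In this sketch the streaming app alone averages 1,400 bytes per packet, while the mixed capture averages 1,000 bytes with shorter gaps between packets, a profile that belongs to neither app.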
Second, the accuracy of some results remains an issue. For example, streaming-based apps such as Spotify or YouTube produce similar traffic patterns, leading to false positives.
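Confusion between similar apps is usually examined with a confusion matrix, which counts how often each true app is labeled as each predicted app. The Python sketch below uses invented predictions, not results from the study, and the helper names are hypothetical.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; off-diagonal entries are misclassifications."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))

def per_class_recall(matrix, label):
    """Fraction of flows truly from `label` that were predicted as `label`."""
    total = sum(n for (t, _), n in matrix.items() if t == label)
    return matrix[(label, label)] / total if total else 0.0

# Made-up predictions for illustration only; not results from the study.
truth = ["Spotify"] * 4 + ["YouTube"] * 4 + ["Skype"] * 4
preds = ["Spotify", "Spotify", "YouTube", "YouTube",
         "YouTube", "YouTube", "YouTube", "Spotify",
         "Skype", "Skype", "Skype", "Skype"]

matrix = confusion_matrix(truth, preds)
```

In this invented data, the two streaming apps are mistaken for each other while Skype is always recognized. That mirrors the kind of pattern described above: a high overall accuracy can hide weak spots concentrated in a few look-alike apps.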
There’s also an issue with the long “idle” periods of apps such as Facebook, Instagram, and the Tor Browser app, where network activity goes quiet while users read or scroll through the content they’ve loaded.
As future experiments factor in more apps, similar issues are likely to pop up, increasing the chance of false positives and reducing the overall accuracy.