Friday, 19 February 2010

Firm uses typing cadence to finger unauthorized users

Though most users feel anonymous when browsing the Web, their browsers constantly turn over unique information such as a list of installed plugins, screen resolution, and the user agent string. Taken together, such bits of information can uniquely identify many users even without cookies.

But this is now old tech; behavioral analytics firms have already moved on. Cookies, browser signatures, and IP addresses can all help identify particular machines and particular browsers—but how can you tell which human actually sits behind the terminal at a given moment? One way is by measuring the "cadence" of their typing.

Scout Analytics has done just that in order to help its 40 paid-content clients detect and stop those "sharing" their accounts without permission. Imagine that you sell access to an expensive database, so expensive that users are routinely tempted to share their "named accounts" with others in the office rather than pay for additional licenses. You would probably want to "encourage" these users to pay up or stop sharing the account, but it's difficult to know which logins are legitimate and which are not.

Cookies, browsers, and biometrics

That's where a company like Scout comes in. I spoke with Matt Shanahan, VP of Strategy for the company, about a research project Scout just concluded, which tried to figure out exactly when more than one person was using a single named account.

At first, Scout of course tried using cookies to track this information, but this produced terrible data; it suggested that six or seven different devices were being used to access each account, a number that seemed far too high to be plausible. So Scout then added browser data, of the kind highlighted by the EFF's recent Panopticlick project, to prevent problems like cleared cookies. When applied to a data set of 20 million actual logins to paid content sites, this refined technique identified nearly 600,000 unique devices being used for access.
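
To see why browser data helps when cookies get cleared, here is a minimal Python sketch. It is not Scout's code; the attribute names, the SHA-256 fingerprint, and the sample logins are all assumptions made for illustration. The idea is simply that a browser which returns with a fresh cookie still presents the same configuration.

```python
import hashlib
import json


def fingerprint(browser_attrs):
    """Hash a browser's reported attributes into a short, stable ID (illustrative only)."""
    canonical = json.dumps(browser_attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def count_devices(logins, use_fingerprint):
    """Count distinct devices, keyed either by cookie or by browser fingerprint."""
    seen = set()
    for login in logins:
        key = fingerprint(login["browser"]) if use_fingerprint else login["cookie"]
        seen.add(key)
    return len(seen)


# Hypothetical sample data: the second login is the first machine after its cookies were cleared.
office_browser = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1280x800",
    "plugins": ["flash", "pdf"],
}
logins = [
    {"cookie": "c1", "browser": office_browser},
    {"cookie": "c2", "browser": office_browser},
    {"cookie": "c3", "browser": {**office_browser, "screen": "1920x1080"}},
]

cookie_count = count_devices(logins, use_fingerprint=False)
fingerprint_count = count_devices(logins, use_fingerprint=True)
```

Counting by cookie alone treats the cleared-cookie visit as a new device, while counting by fingerprint folds it back into the original machine.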

This produced a more accurate count of "cookied browsers," but not of "actual users." An expensive subscription service might well be accessed by multiple people using the same central office computer, for instance, all using the same login, same browser, and same cookie.

So Scout used some JavaScript timing features to watch how users type when they enter their login credentials for various services. Shanahan says that his algorithms need a minimum of 5 attempts at entering a phrase of at least 12 characters in order to generate a typing "cadence." By watching repeated logins, Scout could soon categorize these cadences into a digital pattern, then assign each pattern a serial number.
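
The two thresholds Shanahan mentions (5 attempts, 12 characters) can be illustrated with a rough Python sketch. Scout's actual algorithm is not described in any detail, so averaging the timings across attempts is purely an assumption here, as are the function and variable names.

```python
from statistics import mean

MIN_ATTEMPTS = 5  # minimum login attempts, per the article
MIN_CHARS = 12    # minimum phrase length, per the article


def build_profile(attempts):
    """Average per-keystroke timings (ms) across attempts, or return None if data is too thin.

    `attempts` is a list of timing vectors, one value per keystroke.
    Averaging is an assumed stand-in for whatever Scout really does.
    """
    usable = [a for a in attempts if len(a) >= MIN_CHARS]
    if len(usable) < MIN_ATTEMPTS:
        return None
    length = min(len(a) for a in usable)
    return [mean(a[i] for a in usable) for i in range(length)]


# Hypothetical timings: four attempts are not enough, five are.
four_attempts = [[100.0] * 12 for _ in range(4)]
five_attempts = [[90.0] * 12, [100.0] * 12, [110.0] * 12, [100.0] * 12, [100.0] * 12]

too_few = build_profile(four_attempts)
profile = build_profile(five_attempts)
```

With fewer than five usable attempts the sketch declines to produce a profile at all, which mirrors the minimum the article reports.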

"As you're typing, you have a cadence and rhythm," Shanahan says, a rhythm that includes how long one holds down various keys and how long it takes to move between keys. Applying the technology to its data set of 20 million logins, Scout pulled out 175,000 unique patterns—thereby identifying 175,000 distinct users, even when they used the same login credentials on the same machine.

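The two ingredients Shanahan describes, how long each key is held and how long it takes to move to the next one, are often called dwell time and flight time. The Python sketch below computes both from key-down and key-up timestamps and compares two samples. The `KeyEvent` type, the sample timestamps, and the mean-absolute-difference comparison are assumptions for illustration, not Scout's method.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str
    down_ms: float  # timestamp when the key was pressed
    up_ms: float    # timestamp when the key was released


def cadence_features(events):
    """Return dwell times (hold durations) followed by flight times (gaps between keys)."""
    dwell = [e.up_ms - e.down_ms for e in events]
    flight = [nxt.down_ms - cur.up_ms for cur, nxt in zip(events, events[1:])]
    return dwell + flight


def cadence_distance(a, b):
    """Mean absolute difference between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical samples of the same three keys typed by two different people.
first = [KeyEvent("a", 0, 80), KeyEvent("b", 120, 190), KeyEvent("c", 260, 350)]
second = [KeyEvent("a", 0, 90), KeyEvent("b", 130, 210), KeyEvent("c", 280, 380)]

first_features = cadence_features(first)
second_features = cadence_features(second)
difference = cadence_distance(first_features, second_features)
```

A real system would need a threshold on that difference to decide whether two samples come from the same typist; the article gives no such figure, so none is assumed here.
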
But only 130,000 users had subscribed to the services in question, meaning that 45,000 of the 175,000 people using the services were free-riding. Even if cookie tracking were 100 percent accurate, it would be off by a factor of 2-4x when it comes to tracking individual users of a service.
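
The arithmetic behind those figures is easy to check. The short Python sketch below uses only the numbers reported in the article ("nearly 600,000" devices is rounded to 600,000); the variable names are mine.

```python
# Figures as reported in the article.
devices_seen = 600_000       # "nearly 600,000" unique devices, rounded
typing_patterns = 175_000    # distinct typing cadences
subscribers = 130_000        # paid subscribers

unlicensed_users = typing_patterns - subscribers
unlicensed_share = unlicensed_users / typing_patterns
devices_per_pattern = devices_seen / typing_patterns

print(f"Unlicensed users: {unlicensed_users}")
print(f"Share of all users: {unlicensed_share:.1%}")
print(f"Devices per typing pattern: {devices_per_pattern:.2f}")
```

Roughly a quarter of the people detected were unlicensed, and the device count runs at about 3.4 times the number of typing patterns. That ratio falls inside the 2-4x range quoted above, though the article does not say how that range was derived.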

These typing patterns aren't quite unique—Shanahan estimates that 1 in 20,000 people share the same pattern—but when you combine that with IP addresses and browser information, it's good enough for its intended purpose.
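
A back-of-the-envelope Python sketch shows why that rate is workable for spotting account sharing but weak as a global identifier. It assumes, and this is only one reading of Shanahan's estimate, that any two people have an independent 1-in-20,000 chance of sharing a pattern. The five-person account is a hypothetical example.

```python
from math import comb

# Assumed reading of the estimate: chance that two given people share a pattern.
PAIR_COLLISION_RATE = 1 / 20_000


def expected_lookalikes(population):
    """Expected number of other people whose pattern matches a given person's."""
    return (population - 1) * PAIR_COLLISION_RATE


def chance_of_any_collision(group_size):
    """Probability that at least one pair in a small group shares a pattern."""
    pairs = comb(group_size, 2)
    return 1 - (1 - PAIR_COLLISION_RATE) ** pairs


lookalikes_in_dataset = expected_lookalikes(175_000)  # across every user in the study
collision_in_office = chance_of_any_collision(5)      # five people on one account
```

Under that assumption, each person in the 175,000-user data set would have around nine lookalikes somewhere in it, yet the odds of two colleagues on the same five-person account being confused with each other are about 1 in 2,000. Adding IP address and browser information narrows the comparison to the small group where the pattern is distinctive.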

What companies do with this information is up to them. Shanahan says some use the soft sell, calling up clients who are allowing multiple users on a single account and reminding them of the terms of service. The goal isn't to browbeat customers, but to convince them to pay up for the additional licenses they appear to need. Other companies might choose the "irritate them into submission" approach, perhaps by resetting the account password whenever multiple unique users access an account. Scout estimates that such information can boost subscription revenues by 10-15 percent.
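
Whichever approach a company takes, the first step is spotting accounts with more typists than licenses. Here is a minimal Python sketch of that step, assuming each login has already been tagged with a pattern serial number; the account names, serial numbers, and one-seat default are hypothetical.

```python
from collections import defaultdict


def flag_shared_accounts(logins, licensed_seats=1):
    """Return accounts whose distinct typing patterns exceed their licensed seats.

    `logins` is an iterable of (account, pattern_serial) pairs.
    """
    patterns_by_account = defaultdict(set)
    for account, pattern_serial in logins:
        patterns_by_account[account].add(pattern_serial)
    return {
        account: len(patterns)
        for account, patterns in patterns_by_account.items()
        if len(patterns) > licensed_seats
    }


# Hypothetical login log: three different typists use the "acme" account.
logins = [
    ("acme", 101),
    ("acme", 101),
    ("acme", 202),
    ("globex", 303),
    ("acme", 404),
]

flagged = flag_shared_accounts(logins)
```

What happens to a flagged account, a friendly phone call or a forced password reset, is a business decision layered on top of a list like this one.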

"The amount that can be known from the network is pretty amazing," Shanahan notes, and he concedes that few users even know that their machines enable tracking of this kind. But he does point out that the patterns created by Scout's software don't identify people; each cadence pattern identifies someone unique, but the software has no idea who the person is.

That may be cold comfort to groups like the EFF, which have long been wary of online tracking schemes. While Panopticlick showed just how easy it was to uniquely track browsers, analytics companies like Scout can already pick out a browser's unique users.

With a bit more work, a court order, and the cooperation of an ISP, the day might not be far off when the old "Hey, that must have been someone else using my computer!" defense comes to an end. On the flip side, such technology could provide evidence that it really was someone else at your machine.

The RIAA no doubt wishes it had access to this technology back when it was still suing file-swappers and meeting this very objection in court.