Friday, 16 April 2010

Imitators dominate innovators in a virtual world

Social learning is something we do every day, in many aspects of our lives. Humans are expert imitators, and our innate ability to learn from others may be partly responsible for our success in inhabiting nearly every corner of the globe. On some levels, it makes sense: individuals can avoid making mistakes by imitating successful tactics used by others, saving valuable time and energy. However, imitation doesn’t always produce positive results. We can make mistakes when attempting to mimic what we’ve seen. Maybe circumstances have changed, and the technique we learned isn’t useful anymore. When does social learning benefit us, and when should we figure it out for ourselves?

A group of researchers set out to answer this question, and published their results in Science last week. To tackle the issue, the researchers set up a computer tournament modeled on Robert Axelrod's 'Prisoner's Dilemma' competitions of the late 1970s. In this type of tournament, entrants submit computerized strategies that compete against each other in a virtual world. Individuals, or "agents," with the most successful strategies survive and reproduce, while less successful strategies die out.

In each round of the social learning tournament, automated agents could choose from 100 behaviors, each of which returned a certain payoff. The payoffs changed over the course of the tournament, simulating changing environmental conditions that might render a behavior more or less useful. In any round, agents could make one of three moves: use a behavior they already knew (Exploit), use asocial learning to test a new behavior by trial-and-error (Innovate), or learn socially by watching a behavior that another agent was performing in that round (Observe). Out of the three possible moves, only Exploit resulted in a payoff; the two learning moves only returned information about how profitable a behavior was under the current environmental conditions. Social learning carried a particular risk: if Observe was played in a round when no other agent was performing a novel behavior, the observing agent learned nothing.
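To make the mechanics concrete, here is a toy Python sketch of a single agent's round. Everything in it (the move probabilities, the payoff distribution, the rate of environmental change) is invented for illustration; the tournament's actual rules were more elaborate.

```python
import random

NUM_BEHAVIORS = 100    # behaviors available in the tournament
CHANGE_PROB = 0.05     # illustrative chance that a payoff is redrawn each round

payoffs = [random.expovariate(1.0) for _ in range(NUM_BEHAVIORS)]

def environment_step():
    """Redraw some payoffs, simulating changing environmental conditions."""
    for b in range(NUM_BEHAVIORS):
        if random.random() < CHANGE_PROB:
            payoffs[b] = random.expovariate(1.0)

class Agent:
    def __init__(self):
        self.repertoire = {}   # behavior -> last payoff this agent saw for it
        self.score = 0.0

    def step(self, observable):
        """Play one move. `observable` is a behavior some other agent exploited
        this round (or None). Only Exploit earns a payoff; Innovate and
        Observe just return information."""
        if self.repertoire and random.random() < 0.7:            # Exploit
            best = max(self.repertoire, key=self.repertoire.get)
            self.score += payoffs[best]
            self.repertoire[best] = payoffs[best]                # refresh memory
        elif observable is not None and random.random() < 0.8:   # Observe
            self.repertoire[observable] = payoffs[observable]    # learn, no payoff
        else:                                                    # Innovate
            b = random.randrange(NUM_BEHAVIORS)
            self.repertoire[b] = payoffs[b]                      # learn, no payoff

# One round for a toy population; a random behavior stands in for
# "something another agent exploited this round."
agents = [Agent() for _ in range(50)]
for agent in agents:
    agent.step(observable=random.randrange(NUM_BEHAVIORS))
environment_step()
```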

Over 10,000 rounds, agents had a constant probability of dying, but their ability to reproduce was based on their “success,” or the total of the payoffs they had received. Each strategy’s final score was determined by its average frequency in the population during the final 2,500 rounds.
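The evolutionary bookkeeping is straightforward, too. A hedged sketch of the birth/death update might look like this; the death probability and the helper names are invented, since the paper's exact constants aren't quoted here.

```python
import random

DEATH_PROB = 0.02   # illustrative; not the tournament's actual constant

def selection_step(agents, strategy_of, spawn):
    """Replace agents that die with offspring of survivors, chosen with
    probability proportional to accumulated payoff. `strategy_of(a)` names
    an agent's strategy and `spawn(s)` builds a fresh agent running it;
    both are hypothetical helpers."""
    weights = [max(a.score, 1e-9) for a in agents]   # avoid all-zero weights
    for i in range(len(agents)):
        if random.random() < DEATH_PROB:
            parent = random.choices(agents, weights=weights)[0]
            agents[i] = spawn(strategy_of(parent))   # offspring inherits strategy
```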

The researchers received strategy submissions from academics, graduate students, and high-schoolers in 16 different countries. A huge variety of disciplines was represented, including computer science, philosophy, neuroscience, and primatology. Entries could be submitted as Matlab functions or in pseudocode: a series of verbal, mathematical, and logical instructions describing how the strategy's decisions should be made.

Out of 104 submitted strategies, one called discountmachine was the runaway winner. The researchers expected that the best strategies would balance Observe and Innovate moves, in order to limit the costs associated with social learning. Surprisingly, discountmachine (as well as the second-place strategy, intergeneration) used almost exclusively social, rather than asocial, learning. The results suggest that social learning was successful because agents playing Observe were learning behaviors that other agents had chosen to play based on their high payoffs; in other words, despite the potential cost, they were consistently learning behaviors with high returns.

The most successful strategies relied most heavily on recently acquired information, since its payoff estimates were the most likely to still be accurate. However, discountmachine went one step further than other strategies, varying how heavily it discounted outdated information based on how quickly the environmental conditions were changing. When the environment was changing rapidly, old information was discounted much more heavily than when conditions were relatively stable.
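The paper doesn't reproduce discountmachine's code, but the core trick (weight each remembered payoff by its age, and steepen the discount when the world looks volatile) fits in a few lines. The mapping from change rate to discount factor below is my own illustrative choice, not the winning entry's actual formula.

```python
def discounted_estimate(observations, change_rate):
    """Estimate a behavior's current payoff from remembered observations.

    observations: list of (age_in_rounds, payoff) pairs for one behavior
    change_rate:  the agent's running estimate of environmental volatility,
                  e.g. the fraction of recently re-sampled behaviors whose
                  payoff no longer matched memory (0.0 stable, 1.0 chaotic)
    """
    gamma = 1.0 - change_rate                 # stable world: gamma near 1
    weights = [gamma ** age for age, _ in observations]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * p for w, (_, p) in zip(weights, observations)) / total

# At change_rate=0.5, a ten-round-old payoff carries ~0.1% of a fresh one's
# weight; at change_rate=0.05 it still carries ~60%.
```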

Even when the researchers ran repeated simulations slowing the rate of environmental change, increasing the probability of social learning errors, and increasing the cost of social learning, the top-ranked strategies still dominated, suggesting that highly social strategies are adaptive across a wide range of conditions. Interestingly, there were a few situations in which social learning didn’t pay. Obviously, playing Observe was only beneficial when there were other agents around to imitate. Additionally, social learning wasn’t as successful when researchers eliminated the possibility of incorrectly imitating a behavior. It seems that this kind of copying error may be a source of behavioral diversity in populations.

Thanks to this tournament, winners Daniel Cownden and Timothy Lillicrap—the graduate students who created discountmachine—are £10,000 richer, and scientists have a much better grasp on when social learning pays, when it doesn’t, and why it is such a successful strategy.

Google Cloud Print: coming to a wireless device near you

The question of how to print from wireless devices has been thrust once again into the limelight recently thanks to the printing-anemic iPad. Longtime notebook and mobile device users are quite familiar with the printing conundrum—cables, drivers and all.

Google has announced that it's looking to address this problem in the form of Cloud Print. Part of the Chromium and Chromium OS projects, Cloud Print aims to allow any type of application to print to any printer. This includes Web, desktop, and mobile apps from any kind of device—potentially, this could be used on a BlackBerry, a Windows machine, a Mac, or even an iPad. (That is in addition to Google's own offerings: "Google Chrome OS will use Google Cloud Print for all printing. There is no print stack and there are no printer drivers on Google Chrome OS!" says the company.)

The devices would make use of a Web-based API to either communicate directly with cloud-aware printers, or to send signals to a proxy in order to communicate with "legacy" printers. Google says it's already developing software for Windows to perform this function, "and will support Mac and Linux later on as well."
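Google hasn't published the final API yet, so any concrete code is guesswork, but the proxy's job is easy to sketch: poll a cloud-side queue for jobs addressed to your printer and hand each one to the local print stack. Every URL and field name below is invented for illustration.

```python
import time
import requests  # third-party HTTP library

# Hypothetical endpoint; the real Cloud Print protocol may look nothing
# like this.
QUEUE_URL = "https://cloudprint.example.com/jobs?printer=home-laserjet"

def run_proxy(send_to_printer):
    """Relay cloud print jobs to a 'legacy' printer. `send_to_printer` is
    whatever local spooler call the host platform provides."""
    while True:
        for job in requests.get(QUEUE_URL).json():
            document = requests.get(job["documentUrl"]).content  # rendered doc
            send_to_printer(document)                            # local print stack
        time.sleep(30)  # polling keeps the sketch simple; a real service
                        # would presumably use push notifications instead
```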

Yes, there are already wireless printers that work over your local network without having to be tethered, but there are downsides to this solution (I say this as a semi-satisfied owner of one). The biggest hurdle is, of course, the fact that you must actually be on the same network in order to print. (Now that I think about it, I can't complete and print an expense report from this coffee shop, for example.) VPN is an option, but that's an extra step that could be eliminated.

Then there's the problem we discussed above: my wireless printer only has drivers for real computers. If I buy concert tickets on my phone or if I compose a document on my iPad, I have to wait till I have access to a computer to print them. These inconveniences could easily be addressed by cloud-based printing.

Google says that the Cloud Print project is still in the early stages, but, as with its other open source projects, the company has made the existing code and documentation open to the public. The documentation indicates that, in order to use Google's Cloud Print, users will have to associate their printers with their Google logins—a detail that might make some privacy advocates squirm. Still, Google stresses that it expects other companies to develop their own cloud printing solutions, so this is likely only the beginning of a trend.

Standalone Solaris subscriptions will soon be history

After its recent acquisition of Sun, enterprise software vendor Oracle began making some significant changes to Solaris licensing policies. Solaris 10, the latest stable version of Sun's UNIX operating system, was previously available for free without official support. Oracle changed the license last month, however, limiting free use to a 90-day trial. The new license is relatively clear, but it leaves a number of questions unanswered.

An Ars reader put some of those questions to his Sun account manager and got clarification about exactly what the new license terms will mean for Solaris users. According to the response that he received, Oracle intends to stop selling standalone Solaris subscriptions. Software support will only be available with hardware contracts. As our reader Eric explains in his blog, "There is no possible way to legally run Solaris on non-Sun servers. Period. End of story."

He also got some clarification about the terms of the 90-day trial and the situations in which it is applicable. He was told that the software will remain free for personal noncommercial uses, but that the free download license is limited to a 90-day trial for commercial production use.

"The license and accompanying entitlement from the web, without a contract and without hardware, only entitle the downloader to no-commercial, nonproduction, or personal use in perpetuity. Production use and evaluation for production are good for 90 days," he was told.

As we explained in our previous coverage, this license covers the latest stable, official Solaris release and doesn't have any impact on OpenSolaris, the open source distribution of the operating system. Oracle has indicated, however, that OpenSolaris might not be getting all of the new Solaris features that are developed in the future. The overdue OpenSolaris 2010.03 release hasn't materialized yet; the delay is likely attributable to the understandable disruption caused by the acquisition.

Thursday, 15 April 2010

BitTorrenting biology, getting the big picture in search

The biosciences, like other branches of research, are being dragged into the digital era. This is partly because traditional media of communication, including journal articles, are migrating online, and partly because high-throughput approaches to biological research are producing staggering amounts of data that can only be stored in digital form. A pair of papers released by PLoS ONE present new approaches to both aspects of digitization; in essence, each simply adapts a tool that is already common outside the field for use by biologists.

BitTorrenting genomes

The paper that describes the BioTorrents project lays out some staggering figures on the scale of the problem, based on numbers obtained from the National Center for Biotechnology Information. In a recent month, the NCBI served the following datasets: 1,000 Genomes (9TB, served 100,000 times), bacterial genomes (52GB, 30,000 downloads), and GenBank (233GB, 10,000 downloads), in addition to tens of thousands of retrievals of smaller datasets. That's a lot of bandwidth by anyone's standards, all of it served by a relatively small portion of the NIH.
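Taking those numbers at face value, a back-of-the-envelope sum shows the scale. The calculation below assumes every download retrieved the complete dataset, which certainly overstates things (many transfers are partial), so treat it as an upper bound rather than a measurement.

```python
GB_PER_TB = 1024

# (name, size in TB, downloads in one month), from the figures above
datasets = [
    ("1,000 Genomes", 9.0, 100000),
    ("Bacterial genomes", 52 / GB_PER_TB, 30000),
    ("GenBank", 233 / GB_PER_TB, 10000),
]

total_tb = sum(size_tb * downloads for _, size_tb, downloads in datasets)
print("~%.0f TB/month upper bound" % total_tb)  # roughly 904,000 TB
```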

As the paper points out, some of this is almost certainly redundant, as some labs are probably grabbing data that another group at the same institute—possibly in the same building—has already obtained. Fortunately, the larger community of Internet users has already figured out a system for sharing the burden of handling large files: BitTorrent.

Although it would be possible to start dumping files onto existing networks, the authors argue that there are two drawbacks to that: those networks are at risk of getting shut down due to copyright violations, and finding biological databases among the other content (read: porn, movies, and TV shows) is a needle-in-a-haystack problem. So they've modified a GPLed client and set up their own server at UC Davis, where they work. Anyone can download, but users have to register to upload data, allowing the administrators to police the site for appropriate content. The server also mirrors uploaded data immediately, to ensure there's at least one copy available at all times.
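For readers who haven't looked under BitTorrent's hood: a .torrent file is just a bencoded dictionary that names a tracker and lists the SHA-1 hash of every piece of the file, which is what lets peers verify chunks no matter who sent them. Here's a minimal illustrative version in Python (this is the standard BitTorrent metainfo format, not BioTorrents' actual code):

```python
import hashlib

def bencode(obj):
    """Tiny bencoder covering the types a .torrent file needs."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode()
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):  # keys must appear in sorted order
        return b"d" + b"".join(
            bencode(k) + bencode(v) for k, v in sorted(obj.items())) + b"e"
    raise TypeError(type(obj))

def make_torrent(path, tracker, piece_len=2**20):
    """Build single-file metainfo; peers check each piece against its SHA-1."""
    data = open(path, "rb").read()
    pieces = b"".join(hashlib.sha1(data[i:i + piece_len]).digest()
                      for i in range(0, len(data), piece_len))
    return bencode({
        "announce": tracker,  # e.g. the project's tracker at UC Davis
        "info": {"name": path, "length": len(data),
                 "piece length": piece_len, "pieces": pieces},
    })
```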

The BioTorrents site enables people to find data based on metadata like categories and license type, and a basic text search is also available. Versioning options and RSS feeds are available for datasets that are frequently updated. Overall, it seems like a well-designed project, and I'm sure the NCBI appreciates having someone else shoulder the bandwidth load.

Making search visual

NCBI also happens to host PubMed, a database that contains abstracts from every significant biomedical journal (and a large portion of the less significant ones, too). Since relatively few of the journals published, at least until recent years, were open access, however, PubMed doesn't allow full-text searching of an article's contents. A team just a bit down Route 80 from the Davis group (at UC Berkeley) has been doing some testing to find out whether biologists are missing out when they use PubMed.

Their project, online at BioText, uses a search engine that's indexed the full contents of a few hundred open access journals. Not only does it handle full text, but it also identifies when terms appear in figure legends, which describe the contents of images. This group's PLoS ONE paper focuses on user testing with a system that identifies relevant images based on the search terms, and displays those. It's a nice system, and the twenty or so biologists they tested it on seemed to like it.
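Under the hood, figure-aware search mostly comes down to treating captions as their own indexed field. A toy inverted index makes the idea concrete; the data and field names below are invented, not BioText's actual schema.

```python
from collections import defaultdict

def build_caption_index(articles):
    """Map each term to the (article, figure) pairs whose caption uses it.
    `articles` maps an article ID to a list of (figure_id, caption) pairs."""
    index = defaultdict(set)
    for article_id, figures in articles.items():
        for figure_id, caption in figures:
            for term in caption.lower().split():
                index[term.strip(".,;:()")].add((article_id, figure_id))
    return index

articles = {
    "pmid-123": [("fig1", "Phylogenetic tree of sampled bacterial genomes.")],
    "pmid-456": [("fig2", "Expression levels across bacterial strains.")],
}
index = build_caption_index(articles)
print(index["bacterial"])  # both figures match; show their thumbnails first
```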

Of course, user satisfaction may not be the right metric, if other studies are to be believed. The paper cites one showing that blank squares improve the use of search results just as much as real images do, and that people tend to rate a result as more relevant simply because it comes with an image. So, although users may be happier with the thumbnails, they aren't necessarily working any more effectively.

Should a service of this sort actually prove more useful, it would be tempting to conclude that open access journals will end up having a greater impact on the scientific discourse, simply because it'll be easier to find things in them. Still, a couple of things may limit that effect. Google Scholar is one of them, although, since the company's scanning operations deal with hardcover issues in university libraries, its index won't be as up-to-date.

There's also a commercial company, PubGet, that wraps PubMed searches in a user interface that will inline any PDFs you have access to. Since most scientists work at institutions with extensive journal access, that means they can easily see the full text of a paper to decide if a search result is relevant. That still doesn't overcome the inability of PubMed to index the full text of a paper, however.

The end result (for now, at least) is that researchers will probably need to use several search tools in order to do an exhaustive check for relevant content. Unlike the open question of whether image thumbnails help or hurt, there's little doubt that this isn't great for productivity.

AT&T network problems: coverage, speed, or none of the above?

AT&T has blamed issues with its network on smartphone users—sometimes pointing to iPhone users in particular—moving tons of data over its network. A new report from ABI Research says that Verizon and Sprint are moving much more data over their networks than AT&T, suggesting that its limited geographical coverage is to blame. However, AT&T says that the data and conclusions aren't very accurate.

The ABI report, titled "US Mobile Operator Traffic Profiles," claims that Verizon's and Sprint's networks carried far more data in 2009 than AT&T's. This is despite the fact that AT&T has more active data devices on its network, and also typically rates as one of the fastest 3G data networks in the US. ABI researcher Dan Shey cited two main reasons for this seemingly counterintuitive finding: 3G data modems, and wider coverage areas.

Both Verizon and Sprint have far more 3G laptop modems and mobile hotspots, like the MiFi, in use on their networks compared to AT&T. "It is laptop mobile data connections that have the most impact on operator data traffic levels," Shey said in a statement.

The study also cited Verizon's and Sprint's wider 3G coverage areas as contributing to their higher levels of data traffic. Certainly, if devices can be used in more areas, then there is a greater potential for higher data use.

However, AT&T told Ars that the study has a number of flaws, and that its conclusions don't square with AT&T's own research. AT&T spokesperson Seth Bloom pointed out that, based on analysis of third-party data, AT&T actually carries about half of all mobile data on its network. It also has twice the number of smartphone customers as its next nearest competitor, and its mobile traffic has increased 5,000 percent over the last three years.

Bloom told Ars that the study overestimates data use from laptop users, and underestimates the impact of smartphone users. The ABI report estimates that laptop users use 25x more data than smartphone users, though AT&T cited a Frost & Sullivan study that put that number closer to 5x. ABI also assumes that iPhones use about 2x-3x more data than the average smartphone, whereas AT&T's own data says that iPhones and other similar phones consume as much as 10x the traffic of other devices. It's worth noting, too, that most data modem plans come with a monthly data cap, usually 5GB, while most smartphone plans have no such restriction.
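The disagreement over multipliers isn't a nitpick; it can flip the conclusion outright. Here's a toy calculation in Python: the device counts and the per-smartphone baseline are invented, and only the 25x-vs-5x laptop ratios and the 2x-3x-vs-10x iPhone ratios come from the dueling estimates above.

```python
def monthly_traffic(smartphones, iphones, laptop_modems,
                    laptop_x, iphone_x, base_gb=0.2):
    """Total traffic in GB, given per-device multipliers over an ordinary
    smartphone's baseline usage (base_gb, an invented figure)."""
    return (smartphones * base_gb            # ordinary smartphones
            + iphones * base_gb * iphone_x   # iPhones counted separately
            + laptop_modems * base_gb * laptop_x)

# Hypothetical mixes: carrier A is iPhone-heavy, carrier B is modem-heavy.
a = dict(smartphones=10e6, iphones=8e6, laptop_modems=1e6)
b = dict(smartphones=8e6, iphones=0, laptop_modems=4e6)

# ABI-style assumptions (laptops 25x, iPhones 2.5x): B carries more.
print(monthly_traffic(**a, laptop_x=25, iphone_x=2.5))  # 11.0 million GB
print(monthly_traffic(**b, laptop_x=25, iphone_x=2.5))  # 21.6 million GB

# AT&T-style assumptions (laptops 5x, iPhones 10x): A carries more.
print(monthly_traffic(**a, laptop_x=5, iphone_x=10))    # 19.0 million GB
print(monthly_traffic(**b, laptop_x=5, iphone_x=10))    # 5.6 million GB
```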

Finally, Bloom said, the numbers just don't seem to add up. "For example, Sprint claimed 48.1 million customers to AT&T’s 85.1 million at the end of the fourth quarter 2009," Bloom told Ars. "If AT&T serves 37 million more customers, and more smartphone customers than any other US carrier, how could Sprint possibly carry more mobile data traffic?"

The impact of the wider geographical area also seems overemphasized. For users who find themselves in more remote areas, that wider coverage is very important. But AT&T's 3G network covers a wide range of urban areas, reaching about 75 percent of the population, and its 2G EDGE network covers far more territory.

ABI Research did not respond to our requests for comment on AT&T's objections to the report's content, but Bloom told Ars that AT&T had not been contacted by ABI to provide any data for the report.

Which network actually carried the biggest number of megabytes hardly matters to the vast majority of consumers, however. With pricing largely the same between carriers, most consumers are either picking the network that has the best coverage and reliability in their area, or the one that has the device they most want.