Friday, 16 April 2010

Imitators dominate innovators in a virtual world

Social learning is something we do every day, in many aspects of our lives. Humans are expert imitators, and our innate ability to learn from others may be partly responsible for our success in inhabiting nearly every corner of the globe. On some levels, it makes sense: individuals can avoid making mistakes by imitating successful tactics used by others, saving valuable time and energy. However, imitation doesn’t always produce positive results. We can make mistakes when attempting to mimic what we’ve seen. Maybe circumstances have changed, and the technique we learned isn’t useful anymore. When does social learning benefit us, and when should we figure it out for ourselves?

A group of researchers set out to answer this question, and published their results in Science last week. To tackle the issue, the researchers set up a computer tournament modeled on Robert Axelrod’s ‘Prisoner’s Dilemma’ competitions of the late 1970s. In this type of tournament, entrants submit computerized strategies that compete against each other in a virtual world. Individuals, or “agents,” with the most successful strategies survive and reproduce, while less successful strategies die out.

In each round of the social learning tournament, automated agents could choose from 100 behaviors, each of which returned a certain payoff. The payoffs changed over the course of the tournament, simulating changing environmental conditions that might render a behavior more or less useful. In any round, agents could make one of three moves: use a behavior they already knew (Exploit), use asocial learning to test a new behavior by trial-and-error (Innovate), or learn socially by watching a behavior that another agent was performing in that round (Observe). Out of the three possible moves, only Exploit resulted in a payoff; the two learning moves would only return information about how profitable the behavior was in the current environmental conditions. Social learning was especially costly; if Observe was played when no other agent was performing a novel behavior, the agent learned nothing.
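To make the three moves concrete, here is a minimal Python sketch of a single agent's turn. It is an illustration only, not the tournament's actual code: the function name `play_move`, the data layout, and the tiny three-behavior world are all assumptions, and it leaves out details such as copying errors.

```python
import random


def play_move(move, repertoire, payoffs, others_playing, rng):
    """Play one round for one agent and return the payoff it earns.

    repertoire     -- dict: behavior index -> payoff the agent last learned
    payoffs        -- list of current payoffs, one per behavior
    others_playing -- behaviors that other agents are exploiting this round
    (All names here are hypothetical, chosen for this sketch.)
    """
    if move == "exploit":
        if not repertoire:
            return 0  # nothing known yet, so nothing to perform
        best = max(repertoire, key=repertoire.get)
        return payoffs[best]  # paid at the current, possibly changed, value
    if move == "innovate":
        # Asocial learning: try a behavior the agent doesn't know yet.
        unknown = [b for b in range(len(payoffs)) if b not in repertoire]
        if unknown:
            learned = rng.choice(unknown)
            repertoire[learned] = payoffs[learned]
        return 0  # learning moves return information, not payoff
    if move == "observe":
        # Social learning: only useful if someone is doing something novel.
        novel = [b for b in others_playing if b not in repertoire]
        if novel:
            learned = rng.choice(novel)
            repertoire[learned] = payoffs[learned]
        return 0
    raise ValueError(f"unknown move: {move}")


rng = random.Random(0)
payoffs = [5, 1, 9]
known = {0: 5}
play_move("observe", known, payoffs, [0], rng)  # nothing novel: learns nothing
play_move("observe", known, payoffs, [2], rng)  # picks up behavior 2
print(play_move("exploit", known, payoffs, [], rng))  # prints 9
```

The first Observe is wasted because the only behavior on display is one the agent already knows, which is the cost described above.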

Over 10,000 rounds, agents had a constant probability of dying, but their ability to reproduce was based on their “success,” or the total of the payoffs they had received. Each strategy’s final score was determined by its average frequency in the population during the final 2,500 rounds.
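The selection and scoring rules can be sketched in a few lines of Python. This is a rough reading of the description above, not the organizers' code: the names `choose_parent` and `final_score` are made up for the example, and it assumes payoffs are non-negative and that a per-round frequency history is available.

```python
import random


def choose_parent(lifetime_payoffs, rng):
    """Pick the agent that reproduces, weighted by total payoff so far.

    lifetime_payoffs -- dict: agent id -> sum of payoffs received (assumed >= 0)
    """
    names = list(lifetime_payoffs)
    weights = [lifetime_payoffs[name] for name in names]
    if sum(weights) <= 0:
        return rng.choice(names)  # nobody has earned anything: pick uniformly
    return rng.choices(names, weights=weights, k=1)[0]


def final_score(frequency_by_round, window=2500):
    """Average a strategy's population share over the last `window` rounds.

    frequency_by_round -- non-empty list of shares (0.0 to 1.0), one per round
    """
    tail = frequency_by_round[-window:]
    return sum(tail) / len(tail)


# A strategy that was absent for 7,500 rounds, then held half the population.
history = [0.0] * 7500 + [0.5] * 2500
print(final_score(history))  # prints 0.5
```

Scoring only the final stretch means a strategy gets no credit for an early lead it later loses.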

The researchers received submissions of agents from academics, graduate students, and high-schoolers from 16 different countries. A huge variety of disciplines were represented, including computer science, philosophy, neuroscience, and primatology. Entries could be submitted as Matlab functions or in pseudocode form, which is a series of verbal, mathematical, and logical instructions of how the decisions should be made.

Out of 104 submitted strategies, one called discountmachine was the runaway winner. The researchers expected that the best strategies would balance Observe and Innovate moves, in order to limit the costs associated with social learning. Surprisingly, discountmachine (as well as the second-place strategy, intergeneration) used almost exclusively social, rather than asocial, learning. The results suggest that social learning was successful because agents playing Observe were learning behaviors that other agents had chosen to play based on their high payoffs; in other words, despite the potential cost, they were consistently learning behaviors with high returns.

The most successful strategies relied more heavily on information that was recently acquired, since knowledge of the payoffs was up-to-date. However, discountmachine went one step further than other strategies, varying the use of outdated information based on how quickly the environmental conditions were changing. When the environment was changing rapidly, old information was discounted much more heavily than when conditions were relatively stable.
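The article doesn't spell out how discountmachine did its discounting, so the following is only one plausible way to express the idea, not the winners' method. It assumes a geometric discount: if the environment changes with probability `change_rate` per round, a payoff seen `age` rounds ago is trusted with weight `(1 - change_rate) ** age`, and the rest of the weight falls back on a typical payoff, `prior_mean`. All of these names are hypothetical.

```python
def discounted_estimate(observed_payoff, age, change_rate, prior_mean):
    """Estimate a behavior's current payoff from an old observation.

    observed_payoff -- payoff seen when the behavior was last learned or used
    age             -- rounds elapsed since that observation
    change_rate     -- assumed per-round probability that the payoff changed
    prior_mean      -- assumed typical payoff, used once the memory is stale
    """
    still_valid = (1.0 - change_rate) ** age
    return still_valid * observed_payoff + (1.0 - still_valid) * prior_mean


# The same five-round-old memory of a payoff of 10, in two environments.
fast = discounted_estimate(10, age=5, change_rate=0.4, prior_mean=2)
slow = discounted_estimate(10, age=5, change_rate=0.01, prior_mean=2)
print(f"fast-changing: {fast:.2f}, stable: {slow:.2f}")
```

In the fast-changing case the estimate collapses toward the typical payoff, while in the stable case the old observation is still taken nearly at face value.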

Even when the researchers ran repeated simulations slowing the rate of environmental change, increasing the probability of social learning errors, and increasing the cost of social learning, the top-ranked strategies still dominated, suggesting that highly social strategies are adaptive across a wide range of conditions. Interestingly, there were a few situations in which social learning didn’t pay. Obviously, playing Observe was only beneficial when there were other agents around to imitate. Additionally, social learning wasn’t as successful when researchers eliminated the possibility of incorrectly imitating a behavior. It seems that this kind of copying error may be a source of behavioral diversity in populations.

Thanks to this tournament, winners Daniel Cownden and Timothy Lillicrap, the graduate students who created discountmachine, are £10,000 richer, and scientists have a much better grasp of when social learning pays, when it doesn’t, and why it is such a successful strategy.

Google Cloud Print: coming to a wireless device near you

The question of how to print from wireless devices has been thrust once again into the limelight recently thanks to the printing-anemic iPad. Longtime notebook and mobile device users are quite familiar with the printing conundrum—cables, drivers and all.

Google has announced that it's looking to address this problem in the form of Cloud Print. Part of the Chromium and Chromium OS projects, Cloud Print aims to allow any type of application to print to any printer. This includes Web, desktop, and mobile apps from any kind of device—potentially, this could be used on a BlackBerry, Windows machines, Macs, or even the iPad. (That is in addition to Google's own offerings: "Google Chrome OS will use Google Cloud Print for all printing. There is no print stack and there are no printer drivers on Google Chrome OS!" says the company.)

The devices would make use of a Web-based API to either communicate directly with cloud-aware printers, or to send signals to a proxy in order to communicate with "legacy" printers. Google says it's already developing software for Windows to perform this function, "and will support Mac and Linux later on as well."

Yes, there are already wireless printers that work over your local network without having to be tethered, but there are downsides to this solution (I say this as a semi-satisfied owner of one). The biggest hurdle is, of course, the fact that you must actually be on the same network in order to print. (For example, now that I think about it, I can't complete and print an expense report from this coffee shop.) VPN is an option, but that's an extra step that could be eliminated.

Then there's the problem we discussed above: my wireless printer only has drivers for real computers. If I buy concert tickets on my phone or if I compose a document on my iPad, I have to wait till I have access to a computer to print them. These inconveniences could easily be addressed by cloud-based printing.

Google says that the Cloud Print project is still in the early stages, but, as with its other open source projects, the company has made the existing code and documentation open to the public. The documentation indicates that, in order to use Google's Cloud Print, users will have to associate their printers with their Google logins, a detail that might make some privacy advocates squirm. Still, Google stresses that it expects other companies to develop their own cloud printing solutions, so this is likely only the beginning of a trend.

Standalone Solaris subscriptions will soon be history

After its recent acquisition of Sun, enterprise software vendor Oracle began making some significant changes to Solaris licensing policies. Solaris 10, the latest stable version of Sun's UNIX operating system, was previously available for free without official support. Oracle changed the license last month, however, limiting it to a 90-day trial. The new license is relatively clear, but it leaves a number of questions unanswered.

An Ars reader put some of those questions to his Sun account manager and got clarification about exactly what the new license terms will mean for Solaris users. According to the response that he received, Oracle intends to stop selling standalone Solaris subscriptions. Software support will only be available with hardware contracts. As our reader Eric explains in his blog, "There is no possible way to legally run Solaris on non-Sun servers. Period. End of story."

He also got some clarification about the terms of the 90-day trial and the situations in which it is applicable. He was told that the software will remain free for personal noncommercial uses, but that the free download license is limited to a 90-day trial for commercial production use.

"The license and accompanying entitlement from the web, without a contract and without hardware, only entitle the downloader to noncommercial, nonproduction, or personal use in perpetuity. Production use and evaluation for production are good for 90 days," he was told.

As we explained in our previous coverage, this license covers the latest stable, official Solaris release and doesn't have any impact on OpenSolaris, the open source distribution of the operating system. Oracle has indicated, however, that OpenSolaris might not be getting all of the new Solaris features that are developed in the future. The overdue OpenSolaris 2010.03 release hasn't materialized yet; the delay is likely attributable to the understandable disruption caused by the acquisition.

Thursday, 15 April 2010

BitTorrenting biology, getting the big picture in search

The biosciences, like other branches of research, are being dragged into the digital era. This is in part because traditional media of communication, including journal articles, are migrating online, and in part because high-throughput approaches to biological research are producing staggering amounts of data that can only be stored in digital form. A couple of papers released by PLoS ONE have presented new approaches to both aspects of digitization that, in essence, simply involve taking tools that are common outside the field and adapting them for use by biologists.

BitTorrenting genomes

The paper that describes the BioTorrents project lays out some staggering figures on the scale of the problem, based on a query posed to someone at the National Center for Biotechnology Information. In a recent month, the NCBI served the following data sets: 1,000 Genomes (9TB, served 100,000 times), Bacterial genomes (52GB, 30,000 downloads), and GenBank (233GB, 10,000 downloads), in addition to tens of thousands of retrievals of smaller datasets. That's a lot of bandwidth by anyone's standards, all of it served by a relatively small portion of the NIH.
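For a sense of scale, the quoted figures can be multiplied out. The Python sketch below assumes every download fetched the complete data set, which the figures don't confirm, and takes 1TB as 1,000GB. Treat the result as a ceiling rather than an estimate of actual traffic; the function name `ceiling_gb` is invented for the example.

```python
# (name, size in GB, downloads in the month), as quoted above.
# Assumption: 1 TB = 1,000 GB, so the 9TB set is entered as 9,000 GB.
DATASETS = [
    ("1,000 Genomes", 9_000, 100_000),
    ("Bacterial genomes", 52, 30_000),
    ("GenBank", 233, 10_000),
]


def ceiling_gb(size_gb, downloads):
    """Upper bound on data served if every download was the full data set."""
    return size_gb * downloads


for name, size_gb, downloads in DATASETS:
    served_tb = ceiling_gb(size_gb, downloads) / 1_000
    print(f"{name}: up to {served_tb:,.0f} TB")

total_gb = sum(ceiling_gb(size, count) for _, size, count in DATASETS)
print(f"Ceiling for the three sets: {total_gb / 1_000:,.0f} TB")
```

Even if most requests only touched a small slice of each collection, the arithmetic shows why spreading the load across peers is attractive.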

As the paper points out, some of this is almost certainly redundant, as some labs are probably grabbing data that another group at the same institute—possibly in the same building—has already obtained. Fortunately, the larger community of Internet users has already figured out a system for sharing the burden of handling large files: BitTorrent.

Although it would be possible to start dumping files onto existing networks, there are two drawbacks to that, the authors argue: those networks are at risk of getting shut down due to copyright violations, and finding biological databases among the other content (read: porn, movies, and TV shows) is a needle-in-a-haystack issue. So, they've modified a GPLed client, and set up their own server at UC Davis, where they work. Anyone can download, but people have to register to upload data, allowing the administrators to police things for appropriate content. The server also mirrors the data immediately, in order to assure there's at least one copy available at all times.

The BioTorrents site enables people to find data based on metadata like categories and license type, and a basic text search is also available. Versioning options and RSS feeds are available for datasets that are frequently updated. Overall, it seems like a well-designed project, and I'm sure the NCBI appreciates having someone else shoulder the bandwidth load.

Making search visual

NCBI also happens to host PubMed, a database that contains the abstracts of articles from every significant biomedical journal (and a large portion of the less significant ones, too). However, since relatively few journals were open access until recent years, it doesn't allow full searching of an article's contents. A team just a bit down Route 80 from the Davis group (at UC Berkeley) has been doing some testing to find out whether biologists are missing out when they use PubMed.

Their project, online at BioText, uses a search engine that's indexed the full contents of a few hundred open access journals. Not only does it handle full text, but it also identifies when terms appear in figure legends, which describe the contents of images. This group's PLoS ONE paper focuses on user testing with a system that identifies relevant images based on the search terms, and displays those. It's a nice system, and the twenty or so biologists they tested it on seemed to like it.

Of course, user satisfaction may not be the right metric, if other studies are to be believed. The paper cites one that showed that blank squares improve the use of search results as much as images do, and that people tend to believe a result is more relevant simply if it comes with an image. So, although the users may be happier with the thumbnails, they are likely to be working less effectively.

Should a service of this sort actually prove more useful, it would be tempting to conclude that open access journals would end up having a greater impact on the scientific discourse, simply because it'll be easier to find things in them. Still, a couple of things may limit the impact. Google Scholar is one of them, although its results won't be as up-to-date, since the company's scanning operations deal with hardcover issues in university libraries.

There's also a commercial company, PubGet, that wraps PubMed searches in a user interface that will inline any PDFs you have access to. Since most scientists work at institutions with extensive journal access, that means they can easily see the full text of a paper to decide if a search result is relevant. That still doesn't overcome the inability of PubMed to index the full text of a paper, however.

The end result (for now, at least) is that researchers will probably need to use several search tools in order to do an exhaustive check for relevant content. Unlike the open question of whether image thumbnails help or hurt, there's little doubt that this isn't great for productivity.