Tuesday, 23 March 2010

Multicore requires OS rewrites? Well, maybe

A Microsoft kernel engineer, Dave Probert, gave a presentation last week outlining his thoughts on how the Windows kernel should evolve to meet the needs of the multicore future ahead of us. Probert complained that current operating systems fail to capitalize on the capabilities of multicore processors and leave users waiting. "Why should you ever, with all this parallel hardware, ever be waiting for your computer?" he asked.

Probert said that a future OS should not look like Windows or Linux currently do. In particular, he targeted the way current OSes share processor cores between multiple applications. He suggested that in multicore OSes, cores would instead be dedicated to particular processes, with the OS acting more as a hypervisor: assigning processes to cores, but then leaving them alone to do their thing. It might then be possible to abandon current abstractions like protected memory, abstractions that are necessary in large part because processor resources are shared between multiple programs and the operating system itself.

The reason for this major change is, apparently, that it will improve the responsiveness of the system. Current OSes don't know which task is the most important, and though there are priority levels within the OS, these are generally imprecise, and they depend on programs setting priorities correctly in the first place. The new approach would purportedly improve responsiveness and provide greater flexibility, and would allow CPUs to "become CPUs again."
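To make the priority point concrete, here's a minimal sketch, assuming a POSIX system where the standard library exposes nice values. A program can voluntarily lower (or simply neglect to set) its own priority, and nothing forces it to do so, which is exactly why priority hints are an unreliable way to decide what's "most important."

```python
# Minimal sketch (POSIX-only): a process adjusting its own scheduling priority.
# Whether this helps responsiveness depends entirely on programs opting in,
# which is the weakness described above.
import os

def demote_self(increment: int = 10) -> int:
    """Raise this process's nice value (i.e. lower its priority) and return the new value."""
    return os.nice(increment)

if __name__ == "__main__":
    print("new nice value:", demote_self())
```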

Probert is an engineer for Microsoft, working on future generations of the Windows kernel. He acknowledged that other engineers at Microsoft did not necessarily agree with his views.

At least, this is what has been claimed; there's one original report from IDG, and that's about the extent of it. The presentation was made at the Microsoft- and Intel-sponsored Universal Parallel Computing Research Center at the University of Illinois at Urbana-Champaign, and the slides unfortunately appear to be available only to university attendees and sponsors. Either the report is missing some key point from the presentation that explains the ideas, or it's just not that surprising that Probert's Microsoft colleagues don't agree with him, because, well, the suggestion just doesn't make a whole lot of sense.

The big reason that you might have to "wait for your computer" is that your computer hasn't finished doing what you asked of it. It's still loading a document, or rendering a Web page, or computing your spreadsheet, or something else. Dedicating cores to specific processes isn't going to change that. The problem is not task-switching overhead (which is negligible, and far, far quicker than human reactions can detect), nor the overhead of protected memory. The problem is much simpler: programs are slow to react because the tasks they're doing take a finite amount of time, and if sufficient time has not elapsed, the task will not be complete!
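As a rough illustration of just how negligible switching is, a sketch along these lines puts an upper bound on the cost by bouncing a token between two threads, forcing the OS to switch back and forth on every round trip. The measured figure also includes the interpreter's own overhead, so the true OS cost is smaller still, and even the inflated number lands in the microsecond range, orders of magnitude below anything a person could perceive.

```python
# Rough upper bound on thread-handoff cost: each round trip forces at least
# two switches. The result also includes Python's own synchronization
# overhead, so the real OS switching cost is lower than what's printed.
import threading
import time

ROUNDS = 20_000
ping, pong = threading.Event(), threading.Event()

def responder():
    for _ in range(ROUNDS):
        ping.wait(); ping.clear()
        pong.set()

t = threading.Thread(target=responder, daemon=True)
t.start()

start = time.perf_counter()
for _ in range(ROUNDS):
    ping.set()
    pong.wait(); pong.clear()
elapsed = time.perf_counter() - start

print(f"~{elapsed / ROUNDS * 1e6:.1f} microseconds per round trip")
```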

It's true that some programs do bad things like failing to respond to user input while they're performing lengthy processing, but that's bad coding, and dedicating cores to processes isn't going to do a thing to prevent it. That problem needs to be fixed by developers themselves. The broader problem—splitting up those tasks so that they can be computed on multiple processors simultaneously, and hence get faster and faster as more cores are available—remains a tough nut to crack, and indeed is one of the problems that the Parallel Computing Research Center is working on.
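For a sense of what that tough nut looks like in practice, here's a minimal sketch (the workload and numbers are illustrative, not from the presentation): a CPU-bound job only gets faster on a multicore machine once the work itself has been divided into independent chunks that separate worker processes can chew on. No amount of core dedication does that splitting for you.

```python
# Minimal sketch of the real problem: the work itself must be split up before
# extra cores help. A CPU-bound job is divided into ranges and farmed out to
# a pool of worker processes; run as a single loop on one core, the same job
# takes roughly N times longer no matter how the OS schedules it.
from multiprocessing import Pool
import os

def count_primes(bounds):
    lo, hi = bounds
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True
    return sum(1 for n in range(lo, hi) if is_prime(n))

if __name__ == "__main__":
    limit, workers = 200_000, os.cpu_count() or 1
    step = limit // workers
    chunks = [(i * step, limit if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(f"{total} primes below {limit} using {workers} cores")
```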

Most peculiar is the alleged claim that this model has "more flexibility" than current models. Current systems can already dedicate processor cores to a task, by having the OS assign a task to the core and then letting it run uninterrupted. But they can also multiplex multiple processes onto a single core to enable running more processes than one has cores (we call this "multitasking").
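That kind of dedication is essentially a one-liner on today's systems. Here's a minimal sketch using the Linux-only affinity call in Python's standard library (Windows offers the same capability through SetProcessAffinityMask): pin a process to one core and it effectively owns that core, while everything else keeps multitasking on the rest.

```python
# Sketch of the "flexibility" already available today (Linux-only API shown):
# pin this process to core 0, dedicating that core to it, while other
# processes continue to be multiplexed across the remaining cores.
import os

pid = os.getpid()
print("allowed cores before:", sorted(os.sched_getaffinity(pid)))

os.sched_setaffinity(pid, {0})   # restrict this process to core 0
print("allowed cores after: ", sorted(os.sched_getaffinity(pid)))
```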

This isn't to say that operating systems won't undergo changes as processors with more and more cores become common. Windows 7, for example, included a raft of changes to improve the scaling of certain system components on large multicore systems. But none of these changes required throwing out everything we have now.

It's not impossible that we might yet have to do just that in order to get useful scaling if and when CPUs routinely ship with dozens or hundreds of cores. But unless something's missing from the explanation, it's hard to see just how a massive single-tasking system is the solution to any of our problems.