Time Management (Computer Metaphors) Part 2 – Polling and Interrupts

Good time management is a bit like computer programming, in some ways at least … How do you keep track of tasks which can't be carried forward until some outside event has taken place? Perhaps you're waiting for a response from a vendor, or a decision from your manager on which option to take. 'Hanging' tasks like this compete for your attention: you can't do anything for the time being, but you don't want to forget about them. Tasks like this aren't really "to do" items; there's no action you can take. Yet on some level, your mind keeps telling you to check their progress over and over. You just can't let them lie and get on with something else.

Polling and the Postman's Knock

At the risk of a slightly contrived analogy, imagine ordering a book from Amazon: a reference book you need before you can carry on with your project. What do you do after you fire off the order? There are only two basic options: keep checking your letterbox until it arrives, or get on with something else and wait for the postman to knock. In computer parlance, the first approach is called polling, and the second is interrupt-based.

Polling for an event is terrible as far as efficiency is concerned. You just keep checking for the event to occur, over and over, without getting anything done in the meantime. You'd only want to do this in unusual situations: perhaps the postman won't come right to your door, and leaves the parcel at the end of your drive instead. In that case you would have to keep looking out of the window for the parcel. The one upside is that since you're continually checking for the arrival, you'll notice quickly if it's been delayed.

Assuming you have other things to do with your time, the polling approach is fairly inept in general. It carries the cynical view that "I can't trust anybody, I have to do everything myself!" It misplaces responsibility, and feels like a generalised form of nagging ("Is it done yet? … Is it done yet? …").

Conversely, the interrupt-based approach applies the inversion of control principle. Rather than taking the burden yourself, it places responsibility where it is due. The postman knows when your parcel has arrived, so he can knock on your door when it's time. This allows you to forget about the order, depending purely on the 'postman's knock', which acts as an interrupt. At least in that case you can get on with something else in the meantime.

However, interrupts have two disadvantages. First, you might forget you ever ordered the parcel, and never notice that it failed to arrive. You had the luxury of getting on with something else, but this distracted you from the pending task, which never completed. Second, when the postman finally knocks, you might be disturbed from what you were doing, tempted to drop everything and read the shiny new tome cover to cover.

Ideally we want an approach to time management which gives us all the upsides, but controls for the risks. "Give me a call when it's ready, but I'll keep one eye open in case that call never comes." Fortunately, patterns to solve this problem exist, deep in the guts of your favourite multitasking operating system.

Processes and I/O

The polling approach may be seen in specialised single-process, real-time or embedded systems. If only one special task is running, and that task is waiting for an event, then there is no harm in spinning in a loop. In typical multitasking systems, however, polling would be expensive and quite unnecessary: other, more important tasks may be able to proceed while one task waits for its data.

Therefore an interrupt-based 'postman's knock' mechanism is favoured. This design lets a multitasking operating system maintain its guarantee that processes are scheduled in priority order (in simple terms). The interrupt-based approach is fair: I submit a request for the data, then I sleep while other things happen.
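The contrast between the two approaches can be sketched in a few lines of code. This is purely illustrative: the names (`poll_for_parcel`, `order_with_callback`) are invented for the parcel analogy, and a background thread stands in for the postman.

```python
import threading
import time

def poll_for_parcel(letterbox, interval=0.01, timeout=0.5):
    """Polling: keep checking the letterbox over and over.

    Nothing else gets done while we wait, and most checks find nothing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if letterbox:                    # anything delivered yet?
            return letterbox.pop()
        time.sleep(interval)             # ...is it done yet? ...is it done yet?
    return None                          # gave up, but at least we saw the delay

def order_with_callback(on_delivery):
    """Interrupt-based: the 'postman' (a background thread here) knocks by
    invoking our callback when the parcel arrives; we stay free meanwhile.
    """
    def postman():
        time.sleep(0.05)                 # delivery happens out of band
        on_delivery("reference book")    # the knock on the door
    threading.Thread(target=postman).start()

# Interrupt style in use: register interest, then get on with other work.
delivered = threading.Event()
order_with_callback(lambda item: delivered.set())
# ... get on with something else here ...
delivered.wait(timeout=1.0)              # the knock finally comes
```

Note that the polling version burns its whole timeout doing nothing useful, while the callback version leaves the main flow free until the event fires.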
Subsequently, I get a nudge once my data is ready. Let's see how this happens in more detail.

When a process reads from disk, that request is sent to the disk controller, then the process is suspended (and placed onto a special Blocked queue). Processes on the Blocked queue are waiting for some outside event to complete, and will get no processing time until then.

The hard disk (which can operate independently of the CPU itself) begins seeking the requested data in the background. Meanwhile, the operating system switches to another process, say updating the screen or listening for mouse clicks. The pending "disk read" is out of mind, giving the luxury of a responsive system in the meantime.

Later on, the hard disk finishes its work. The disk controller is an actor in its own right, and gives the operating system a nudge (or interrupt): "Hey, someone requested some data: here it is."

How does the operating system respond to the interrupt? Instead of immediately switching to the program that was waiting for the data (which might disturb a higher-priority task), the operating system moves the waiting process from the Blocked queue to the Ready queue. The Ready queue contains all the programs that can proceed with their work. Once this is done, the operating system takes the opportunity to pick a new task from the Ready queue according to some priority order. It may or may not be the task which requested the data.

Once the waiting task gets its turn to run, it proceeds at the very next instruction after the call to read, with the pending data now in context. This is an elegant programming model: the application programmer does not have to do anything explicit to deal with the complex mechanism under the hood.

One issue is that a process may have to wait a very long time to receive its data, creating a 'hang'.
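The queue shuffle just described can be sketched as a toy scheduler. This is a hedged illustration, not how any real kernel is written; names like `block_on_io` are invented for the example.

```python
from collections import deque

class ToyScheduler:
    """Minimal sketch of Blocked/Ready queue bookkeeping (illustrative only)."""

    def __init__(self):
        self.ready = deque()      # processes that can proceed right now
        self.blocked = {}         # I/O request id -> process awaiting that data

    def block_on_io(self, process, request_id):
        # The process called read(): suspend it until the data arrives.
        self.blocked[request_id] = process

    def interrupt(self, request_id):
        # The 'postman's knock' from the disk controller: the waiter becomes
        # runnable again, but is NOT switched to immediately.
        process = self.blocked.pop(request_id)
        self.ready.append(process)

    def schedule(self):
        # Pick the next task to run (FIFO here; a real OS would use priorities).
        return self.ready.popleft() if self.ready else None

sched = ToyScheduler()
sched.ready.append("screen-updater")
sched.block_on_io("report-writer", request_id=42)   # waiting on a disk read
sched.interrupt(request_id=42)                      # data arrived: runnable again
# schedule() now runs "screen-updater" first, then "report-writer"
```

The key point the sketch captures is that an interrupt only changes a process's queue membership; choosing who runs next remains a separate scheduling decision.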
What if some remedial action should be taken when a time limit is surpassed? There's no way the waiting process can implement this requirement itself: it's in stasis on the Blocked queue, and will never execute any code until the interrupt comes.

To deal with this, a timeout value has to be passed to the operating system no later than when the call to read is invoked. This suggests that the operating system must do some housekeeping on the Blocked queue; perhaps its items are ordered by a "time to live". This strategy appears to be utilised in the socket timeout parameter often available on calls to read data from a network.

Maintaining your own Blocked and Ready Queues

The scenarios described are similar to situations where a task on your to-do list is blocked on some outside event. Clearly we would want to take an interrupt-based approach in real life. But how do we avoid the twin perils of (i) losing track of the time spent waiting, and (ii) becoming distracted by incoming 'data' as it arrives?

The solution to both of these concerns is the pair of "Waiting For" and "Next Actions" lists described in the Getting Things Done system. The "Waiting For" list is your Blocked queue: a list of items requiring a response from an outside party, each with an optional 'time to live' before something has to be done to expedite it.

Your "Waiting For" list is separate from your "Next Actions" (your very own Ready queue). It doesn't pollute your everyday consciousness; you only review the "Waiting For" list as often as you really need to. This 'housekeeping' prevents items from sliding off the "Waiting For" list and being forgotten.

The second concern is easily solved. When something you are waiting for becomes available, instead of becoming side-tracked by it, you simply move it from the "Waiting For" list to an appropriate entry on the "Next Actions" list. Appropriately recorded, this frees your mind to focus on one task at a time.
You can then either continue what you were doing, or pick something from your "Next Actions" (your Ready queue) according to priority.
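To close the loop on the metaphor, the "Waiting For" list with its optional time-to-live might be modelled like this. The sketch is purely illustrative; Getting Things Done itself prescribes no code, and all the names here are invented.

```python
import time

class WaitingFor:
    """Illustrative 'Waiting For' list: your personal Blocked queue."""

    def __init__(self):
        self.items = {}                        # description -> expedite deadline

    def add(self, description, ttl_days=None):
        # Park the item with an optional 'time to live' before chasing it up.
        deadline = None if ttl_days is None else time.time() + ttl_days * 86400
        self.items[description] = deadline

    def review(self, now=None):
        # Periodic housekeeping: which items are overdue and need expediting?
        now = time.time() if now is None else now
        return [d for d, dl in self.items.items() if dl is not None and dl < now]

    def arrived(self, description, next_actions):
        # The 'interrupt': the outside party responded. Move the item onto the
        # Next Actions list instead of dropping everything to deal with it now.
        self.items.pop(description, None)
        next_actions.append(f"Process: {description}")

waiting = WaitingFor()
waiting.add("Quote from vendor", ttl_days=7)   # chase if no reply in a week
waiting.add("Manager's decision")              # no deadline: just parked
next_actions = []
waiting.arrived("Quote from vendor", next_actions)
# The quote now sits on Next Actions, to be picked up by priority, not impulse.
```

The occasional `review` call is the housekeeping on the Blocked queue; `arrived` is the postman's knock, routed through the Ready queue rather than interrupting you directly.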