Monday, May 24, 2010
I had to edit the HTML of those posts a bit because Blogger, as far as I can tell, is not quite as smart as WordPress when it comes to footnotes in particular. Let me know if anything seems to be broken.
I bet not too many people reading this have ever used a terminal hooked up to a minicomputer, and even fewer have used a forms-based interface on a terminal of that kind. But this kind of human-computer interaction is still a commonplace event in the business world: legacy IBM mini systems are everywhere. (If Y2K couldn’t kill them, nothing can.) People call this model “green-screen technology,” and they mean it pejoratively—it’s something archaic, clunky, and generally inferior to “modern” user interfaces.
That charge is mostly accurate. Though green-screen technology has its place (saying exactly what that place is would take us off-track, but it does have one), there’s no denying that it’s old-fashioned. But let me point out that much of the Web consists of exactly that sort of thing. Web pages with fill-out boxes, check boxes, and radio buttons are even called “forms,” in obvious acknowledgement; so are Visual Basic application windows. The metaphor works, sort of: you fill in a form and then click “submit” (or press “Enter” on your IBM terminal’s keyboard—ever wonder why Windows PC keyboards say “Enter” instead of “Return”? now you know), which is like sending in a (paper) form to an office somewhere. Then you get an answer back—the results of a computation, a database query, etc.1
None of this is shocking. What is shocking is how much modern software exists whose interface is still essentially forms-based, yet which pretends to be interactive. They’re two very, very different interface paradigms. Interactivity in software comes from more than just adding buttons & windows to a forms-based interface. I.e., if your idea of successful HCI consists of a modal window in which the user fills in a bunch of fields and presses a button, whereupon a new modal window pops up containing a report, then you’re not only putting lipstick on a pig but also being just plain dishonest: you’re selling ’70s tech as if it were something new. Way too many commercial products that are essentially prettified database frontends (which isn’t a bad thing in itself) are designed with this mentality—that all you, the user, ever do with a computer is run an offline query (and maybe a batch of them if you’re a power user). ("But I'm not a fish!")
Now think about actual interactivity, the thing that microcomputers give us (or at least were supposed to, back around 1980). This is the state where not just all the data you’re working with but also the operations on that data and their results are fully accessible at all times, within reason. It’s the guiding mentality behind WYSIWYG in word processors, for example, as opposed to typesetting software like nroff or TeX (in which you write your document as a text file with interpolated commands, then submit that file to a program which outputs a proof). Another great example is Excel, which is nothing like programming numerical computations in a traditional programming language (for Excel is a programming tool—it has more in common with friendly interpreted language environments like the old 8-bit BASICs than with much application software). You see all your numbers in front of you, and by clicking in a cell or pressing a magic keystroke you can see all the operations on them (i.e. formulas), or the results of those operations. And you have total freedom to change or transform the data or the operations in real time. There’s no modality to speak of.
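To make the contrast concrete, here is a toy sketch (in Python, with invented names) of the spreadsheet model: formulas are just as visible and editable as the values, and every change is reflected the next time you look, with no mode switch and no submit step.

```python
# Toy model of the spreadsheet idea described above: cells hold either
# literal values or formulas, both editable at any time, and results are
# always current. All names here are invented for illustration.

class Sheet:
    def __init__(self):
        self.cells = {}  # cell name -> literal value or formula (a callable)

    def set(self, name, value):
        self.cells[name] = value  # change data *or* formula, any time

    def get(self, name):
        v = self.cells[name]
        return v(self) if callable(v) else v  # formulas recompute on read

s = Sheet()
s.set("A1", 2)
s.set("A2", 3)
s.set("A3", lambda sheet: sheet.get("A1") + sheet.get("A2"))  # a formula
print(s.get("A3"))  # 5
s.set("A1", 10)     # edit the data; no "recalculate" command needed
print(s.get("A3"))  # 13
```

A real spreadsheet caches results and propagates changes rather than recomputing on every read, but the user-visible property is the same: no modality between entering data and operating on it.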
Again, because this is the critical idea: you can’t just base an interface on pulling stuff out, changing it, and then resubmitting (putting it back in), and call it interactive. True interactivity requires non-modality of not just operations but also data: that is, all the data should be accessible all the time. Jeff Atwood wrote a great blog post about taking incremental search beyond the dataspace into the commandspace (pace Emacs). I’d like to see a lot more development of and experimentation with interfaces that use this kind of dynamic filtering to perform search, Neuromancer-style n-dimensional visualization of the dataspace, or a combination of both. Imagine this: instead of filling out a form and hitting “search,” you type (or click on) your parameters and watch a nebula of data dynamically shade itself as you type, with color and transparency indicating the sets involved and their relevance rating2—sort of a 3-D mixture of Venn diagrams and PivotTables.3 Or... remember the holodeck-furniture-database-search scene from the Star Trek episode “Schisms”?4
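As a minimal sketch of the watch-the-data-shade-itself-as-you-type idea (Python, with invented sample data; a real implementation would drive a visualization rather than print):

```python
# Incremental, non-modal search: every keystroke immediately re-filters the
# data set, instead of fill-out-the-form-and-hit-"search". The tool names
# below are sample data, not anything from a real catalog.

records = ["crosscut saw", "hacksaw", "flare wrench", "Allen wrench",
           "Robertson driver", "screw extractor"]

def refine(query, data):
    """Return the matching records, each with a crude relevance score
    (here: occurrence count) that a real UI might map to color or
    transparency."""
    q = query.lower()
    return [(r, r.lower().count(q)) for r in data if q in r.lower()]

# Simulate typing "wre" one keystroke at a time; the visible set narrows
# with each character, and there is no submit step anywhere.
for i in range(1, 4):
    partial = "wre"[:i]
    print(partial, "->", [name for name, score in refine(partial, records)])
```

The scoring here is deliberately crude; the point is only the interaction loop, in which the query and the result set are continuously visible together.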
- Actually I like to think of this not so much as a “sending a form into a government office” model of computing as a “Wizard of Oz” model. You make your request of the Great and Powerful Oz and hope he gives you back something you can use.[back]
- C’mon, let’s use those alpha channels and all that other pretty stuff that modern graphics hardware can do for something other than another variation on Doom! [back]
- But please don’t call it “drilling down!” That’s not what that means, but I’ll save that for another entry. [back]
- Why is this not required watching for budding interaction designers and database programmers? [back]
In my previous post in this blog I challenged the feasibility, from the interface-design point of view, of running applications in a browser window—on the grounds that applications and data are two different things, and the browser is inherently a tool for viewing data.
Let me add a couple of thoughts to this. First, a general point: it’s important to note that this isn’t just a linguistic or even an epistemological issue, but an ontological one. That is, it’s not just a matter of what kinds of arrangements of bits we call “data” versus “applications” and what kind of tool we use to manipulate them. It’s a question of what that tool is, what it does, to what, and for whom. Think about physical tools, the kind you buy at a hardware store.1 They’re classified according to a huge variety of schemata:
- Some are classified according to the raw material on which they are designed to operate: a crosscut saw for wood, versus a hacksaw for metal.
- Some, according to the physical shape of the artifact they manipulate: an Allen wrench for bolts or screws with a concave hexagonal impression in the head, a Robertson driver for those with a similar but square impression.
- Some, according to the purpose of the artifact they manipulate: a flare wrench for fittings on hydraulic lines.
- Some, according to the operation to be performed, largely independent of the context of the object of the operation: a screw extractor for rotating fasteners whose heads are damaged.
- And often there is an overlapping schema wherein tools are classified according to the general circumstances in which you would use them, hierarchically, with groups and subgroups: there are mechanics’ tools, and there are metric tools, and then there are wrenches with built-in tubing, for opening hydraulic bleed screws without a mess in brake or clutch systems with metric fasteners.2
The point is that these classification schemes aren’t something imposed from outside, as biologists impose the Linnaean taxonomy on the ever-changing and ever-being-discovered sloppiness of the natural world in order to make it a little bit more manageable. The epistemology of a tool guides its ontology: that brake-bleeding wrench was designed specifically for the task, very likely by some mechanic fed up with the inadequate tools he or she had available to do a brake job, and a crosscut saw acquired the form it has not by chance but because generations of woodworkers refined the design to cut certain pieces of lumber in a way that was useful to them. So tools evolve not only with their objects but also with the circumstances of their use, and it’s an oversimplification to say that there is a straightforward correspondence between the tool and its object and thus a clear-cut division of tools by what object they act upon. This isn’t an excuse for the browser as application platform as currently understood, though. Exactly the opposite! The tool that is used to run online applications and explore online databases must be one that is tailored to its job, rather than the clumsy square-peg-in-a-round-hole hack of the browser-as-it-stands.
And this brings us to the other point I want to add to the previous entry. Douglas Hofstadter said of the supposed form/content distinction3 that “content is just fancy form.”4 Are applications, then, just fancy data? It’s tempting to state the question and its counterpoint as opposing theses:
- [First,] Applications are just fancy data: the more complex a data set becomes, the more operations its inherent properties suggest, until some “tipping point” of complexity is reached, at which point those operations can be abstracted away from that data set and applied to others with similar structure. [vs.]
- [Second,] Applications are closely analogous to tools; data, to raw materials and the workpieces made from them: though they may evolve together, the two are fundamentally different.
I don’t think I can, or need to, disprove the second, though I think in the paragraphs above I’ve pointed the way towards some problems with the thesis that make it less appealing than it might at first be.
The first is also intuitively appealing, but it too is problematic. I think there’s an implied argument there, a flawed one: there’s a leap in logic between the premise “data sets inspire operations” and the conclusion “those operations comprise the application.” The premise is true, but valuable and significant operations often emerge from the users of data rather than from the data itself; through a feedback process, these operations become commonplace in ways that the data alone never could have suggested. People made tabular data easier to understand by making graphs of it for hundreds of years before some mad genius at Microsoft came up with PivotTables, and now they’re indispensable. But they sure aren’t inherent in a ledger of handwritten numbers. Nicholson Baker made a perceptive point in one of his semi-autobiographical novels that the designers of sugar packets and windshield wipers didn’t anticipate that people would centrifuge the first to better control the release of the contents and use the second to keep advertising flyers from blowing off of parked cars in the wind, but those behaviors have become integral facets of the use and therefore cultural significance of those artifacts.
And yet not everything you can do with a particular kind of data is something you should do. Word processors replaced typewriters as tools for doing things with paper. You can put various semantically significant symbols on a piece of paper, and you can also make an airplane out of it. Should a word processing program, then, contain a paper-airplane-design feature? Probably not.
The appeal of flawed thesis #1 above when I first started thinking hard about it a few years ago led me to embrace document-centric user interface design. For instance, I mentioned the idea approvingly in a review of Alan Cooper’s book The Inmates Are Running the Asylum in 2001 (an essay that now seems somewhat embarrassingly snarky and strident, but I’m archiving it here as-is anyway rather than trust Amazon to hold on to it for me forever). I still like the idea in theory, but I have serious doubts about the viability of the implementation. As I note in the Cooper review, Jef Raskin’s work in UI design exhibits the most extreme form of document-centricity—no applications at all. In the characteristic systems Raskin pioneered, the screen is a single window into one big document containing text, numbers, pictures, whatever; in theory, any operation can be performed at any point in the document at any time. Not only does the user not need to open a spreadsheet application to total up a column of numbers in the middle of a word-processing document, but he or she simply can’t, because there is no spreadsheet and no word processor; there’s just the numbers and the text. To add up those numbers you’d just select them and invoke a “TOTAL” command of some kind. Don't believe me? Read the description of the Canon Cat interface.
This is supposed to make life with the computer easier, because it does away with modes, the most-feared bugbear of interface design since the early days of the Macintosh. That is, you never have to worry about whether you’re in the “typing mode” or the “calculating mode” (for example), because (again) there is no spreadsheet and no word processor to switch between. But just a few sentences ago I said that there are problems with the implementation of this notion. Here’s the issue: Not all operations can be performed on all types of data. What happens when you try to invoke that “TOTAL” command after selecting a column of words? Will the computer do nothing? Will it spit back at you the total of the ASCII values of the letters in the selected words? (Let’s hope not!) Will it beep? (Ditto.) There’s no good answer. If you’re allowed to perform any operation on any type of data, cases where user input doesn’t make sense are going to be plentiful. (And any interface that makes it easier to make mistakes is obviously not an improvement.)
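The dilemma can be put in miniature with a sketch (Python; the TOTAL command and the selections are hypothetical, not from any real Raskin system):

```python
# A global TOTAL command in a modeless, application-free system must accept
# *any* selection, so it has to pick one of the bad answers enumerated above.

def total(selection):
    if selection and all(isinstance(x, (int, float)) for x in selection):
        return sum(selection)  # the sensible case: a column of numbers
    # A column of words: do nothing? sum the character codes? beep?
    # None of these is good; this sketch just refuses.
    raise ValueError("TOTAL is not meaningful for this selection")

print(total([1, 2, 3.5]))  # 6.5
try:
    total(["alpha", "beta"])  # the case the modeless design has no answer for
except ValueError as err:
    print(err)
```

The moment total behaves differently depending on what is selected, the system has modes again; they have merely moved from the application level down to the individual command.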
The alternative is to allow only those operations that make sense for the particular type of data in question. And that’s just modes again! Suddenly our noble and inspired designers of document-centric interfaces find themselves impaled on the horns of the elephant in the room.5 (Ouch!6)
- The definitive discussion of this is probably somewhere in Wittgenstein, if one were to look hard enough. [back]
- This one is going on my birthday list for sure. [back]
- “Supposed” to other people than just me and Hofstadter. See, for example, this whitepaper. [back]
- In Metamagical Themas. And perhaps the design of a tool is an emergent property of its use—just fancy use! [back]
- The designers of archy, the latest system inspired by Raskin’s work, imply that they solve these problems by making the system smart enough to detect the kind of data being acted upon and perform the correct action. Way to beg the question, guys. [back]
- Sorry about the mixed metaphor and cliche. On the third hand, though, why exactly is modality so dreaded? I’m not the first person to notice that life itself is modal. If you pick up a pencil you’re constrained to a certain, even if fairly large, set of actions: you can write, pick your nose, stab your enemies, or stir a pot of soup with that pencil, but you can’t loosen a bolt, or do a thousand other things for which only other tools are appropriate. (And here we are back at the flawed but oh-so-seductive analogy between physical and virtual tools.) [back]
[Originally posted April 1st, 2007. No foolin’!]
I’ve been thinking of large websites with heavy back ends (Amazon being the canonical example) as applications for a long time now. So I have a bit of a so-what reaction when I hear people talking about a paradigm shift to applications in the browser. I want to ask, don’t you remember what Scott McNealy was saying in every interview in the late ’90s—Sun’s slogan “the network is the computer”? Turns out the people promoting a web-based thin-client model ten years ago were just way ahead of their time; it took technologies like Ajax and proof-of-concept apps like GMail and Google Maps to make the idea concrete. The reason I’m underwhelmed is not so much that something old has been dressed up and called the latest thing (what else is marketing about?), but more that there’s a fundamental change that needs to happen before apps in a browser will work. This isn’t a technological barrier—more precisely, it isn’t just a technological barrier, but also (more challenging!) one of human-computer interaction and design.
The problem is this: as it stands, the web browser as an environment for applications is almost irredeemably broken. We’re used to thinking of the navigation controls (buttons, bookmarks menu, etc.) in the browser as first-class controls, while the widgets in the window are second-class. If you get somewhere you don’t want to be in the browser, you don’t hunt through the window for an escape hatch to the previous page provided by the site designer—you just click the “back” button. (But consider this: does “forward” ever do anything useful or predictable?) But in doing that you’ve made a conscious choice between two different interfaces—that of the browser and that of the page. Which interface does what? Giving the page its own controls is like giving the road its own steering wheel.
(Actually, the “back” button has been broken since day 1 [or at least since the first time I used the Web, in 1994, via Mosaic]. Here’s an example.
Start at page A and click a link to go to page B. Then click a link to go to page C. Then click the “back” button twice to return to A. Click on a link to go to page D. Now try to return to page B via the “back” button. You won’t be able to! As the history menu will indicate, the browser now remembers only that you visited A and then D; B and C are gone. The interface is broken because it’s unnecessarily confusing. The “back” button is trying to serve two different and incompatible purposes: it’s supposed to mean both “undo” and “go to a higher level in the hierarchy.” The latter doesn’t work, because a fundamental principle of Web ontology1 is that the web is a network, not a hierarchy. There’s only incidentally an “up” in hypertextspace! Further, if the browser saved your entire surfing history [for this session], and if “back” also meant “up a level,” what would it mean to click “back” while viewing a child page (e.g. C above)? Would you end up at B or D? 2 Clearly the only workable solution is for “back” to mean “undo,” and for the browser history to show every page visited, all on an equal footing. Or is it workable? It’d be nice for “forward” to mean “redo.” But what does it mean [just to give one of many available troubling examples] to undo the submission of a form?)
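The breakage walked through above falls straight out of the data structure classic browsers use for session history (a sketch in Python; the page names match the example):

```python
# Session history as a truncating stack: going back and then navigating
# somewhere new discards the "forward" tail, which is exactly why page B
# becomes unreachable in the example above.

class History:
    def __init__(self, start):
        self.pages = [start]
        self.pos = 0

    def visit(self, page):
        self.pages = self.pages[:self.pos + 1]  # drop the forward tail
        self.pages.append(page)
        self.pos += 1

    def back(self):
        self.pos = max(0, self.pos - 1)
        return self.pages[self.pos]

h = History("A")
h.visit("B")
h.visit("C")
h.back()      # C -> B
h.back()      # B -> A
h.visit("D")  # B and C are silently forgotten
print(h.pages)  # ['A', 'D']
```

An undo-style history would instead append every navigation, including the backward ones, so nothing visited is ever lost; the cost is that “back” no longer has a simple spatial meaning.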
Perhaps the real problem is deeper. The web browser as such is a tool for accessing data. It may have grown far beyond its origins as a graphical Gopher, but it’s still, at heart, just a way to navigate a topology of discrete records (pages) in a huge (non-relational) database (the ’Net). But now we’re asked to think of the browser also as an environment in which to run applications. Applications and data, despite the promises of object-oriented programming (irrelevant anyway, since that’s a methodology of software architecture, not interface architecture3), are two entirely different kinds of entities. This means that one program that does both is inevitably going to have, as I just noted, an inconsistent, confusing, unfriendly interface. Blurring the distinction between applications and data under present interface standards only makes things worse. Why not remove the controls entirely and make the browser into essentially a sort of terminal emulator window for remote applications? Or why not go all the way in the other direction and make everything you work with on the computer part of a unified, modeless, totally data-centric interface, like Swyftware and the Canon Cat? (Actually, I’m less than half joking with that last rhetorical question—Jef Raskin’s legacy is the only viable hope I’ve yet seen for a truly new and truly better approach to the UI.)
Jesse James Garrett’s whitepaper that introduced the term “Ajax” posed as an important open question “Does Ajax break the back button?” I’d turn that around: Does the back button break Ajax? That is, is the Web 0.9 interface of the browser a vestigial impediment to writing applications that run well (meaning at the same usability level as traditional non-Web-based applications) in the browser window?
- E.g., as articulated in Chapter 1 of the Polar Bear Book. [back]
- The mirror image of this problem afflicts the implementation of the cd command in bash (the standard shell on Linux). If you are currently in directory X and follow symbolic link S to directory Y, then enter the command “cd ..”, you end up not in the parent of Y (call it Z) but in X again! Short of bash’s cd -P option, which resolves the physical path, there is no way to get to Z without explicitly specifying more of its path than just “..” (i.e. “parent of current”). This is broken beyond belief. Look in any documentation for any command-line interface that includes the cd command (MS-DOS, VMS, Unix shells, whatever) and I guarantee you won’t find “cd ..” explained as meaning “undo.” For it to behave as such is horrifyingly inconsistent. “..” means “up one level in the hierarchy.” Symbolic links explicitly break the hierarchy, but that’s OK: they’re understood to be “hyperspace” shortcuts, like the Secret Passage across the board in Clue that takes the player on a third-dimensional trip outside the board-game Flatland. [back]
- And the dangers of the tendency of programmers, and of companies headed by programmers, to conflate the two are legion. [back]
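The cd behavior complained about in footnote 2 can be reproduced from Python, which exposes both interpretations of “..” (the directory names X, Y, Z, S are the ones from the footnote; this sketch builds real directories and a real symlink in a temporary location on a POSIX system):

```python
import os
import tempfile

# Recreate the situation from the footnote: Z/Y is a real directory, and
# X/S is a symbolic link pointing to it.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Z", "Y"))
os.makedirs(os.path.join(root, "X"))
os.symlink(os.path.join(root, "Z", "Y"), os.path.join(root, "X", "S"))

after_cd = os.path.join(root, "X", "S")  # "cd S" leaves you "in" X/S

# bash's default ("logical") cd: ".." is plain string manipulation on the
# path you typed, so it undoes the symlink hop.
logical = os.path.normpath(os.path.join(after_cd, ".."))

# bash's "cd -P" ("physical") behavior: resolve symlinks first, then take
# the real parent.
physical = os.path.realpath(os.path.join(after_cd, ".."))

print(os.path.basename(logical))   # X  (back where you started)
print(os.path.basename(physical))  # Z  (Y's actual parent)
```

The same tool, handed the same “..”, gives two defensible answers; the footnote’s complaint is that the shell silently picked the “undo” one as its default.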