Monday, October 10, 2011

You’ve got a strong back—get a job!

Here’s the current roster of personal projects I’m working on in an effort to become more employable:
  • A realtime, multithreaded Scala program which I'm building in stages but which I envision will probably end up at 10,000-20,000 lines of code, involving realtime computations against a NoSQL database and interaction with a SOAP service.
  • A build-your-own content-management website: I’m tinkering with the Django, Drupal, and Catalyst MVC frameworks, trying to decide which will work best while getting some exposure to PHP 5, Python 3, and Moose in the bargain. This will include a MySQL backend, so I’m brushing up on SQL as well.
  • Some Arduino toys including a couple of retro, 8-bit-style Gameduino games to polish my rusty C skills.
  • A slightly more ambitious microcontroller project, this one using an ARM board and the mbed platform to do some real-world data acquisition and control of a mechanical system (that’s deliberately vague: more details will come when I start to get an idea whether it’ll actually work!).

(A gold star for those who know what classic ’90s movie the title of this post is a quote from.)

Tuesday, October 4, 2011

Software engineering methodology: my, how you’ve grown!

“Hey, is anyone working on foo.c in the Whatsit build? I need to make some changes in it.”

That’s what source control consisted of at my first job as a software engineer, fifteen years ago. Bug tracking took place somewhere between the brain of the lead developer for each project and his email inbox. Code reviews consisted of the smirking face of another programmer appearing over the top of my cube, saying, “I found a place where you dereferenced a null pointer. Want me to show you?”

I learned about the classic academic work in software engineering in school as part of my computer science degree: Fred Brooks, the waterfall model, unit testing, and so on. But as far as I knew in 1996, outside of large companies with an established research culture (Intel, IBM, Xerox) none of those methods were being applied. Neither was the general idea behind them, that software development needed planning, structure, and organization, things that could not be trusted to individual programmers working in isolation.

It’s amazing how much things have changed in the past fifteen years. The classic book The Psychology of Computer Programming was published in the '70s but its effect was purely academic. It took years for the lessons described there to filter down to the level of actual practice, and it wouldn’t have happened without Steve McConnell’s Code Complete. From the latter book grew the agile development movement, a reaction both against the waterfall model (which makes it so easy to postpone testing until it gets pushed off the bottom of the release schedule) and against cowboy development culture (which makes for a host of problems: poor bug tracking, poor source control, reinvention of the wheel as every programmer builds his own libraries, and on and on).

And I never would have expected the massive growth that has taken place in what I’ll call the collaborative programmer culture. Just as the free exchange of information in general has exploded due to the Internet, so has in particular the exchange of information on how to program better. There’s something going on that’s greater than just more message boards and more acceptance of the open-source “more eyes mean fewer bugs” idea. I remember when the entire O’Reilly library fit on just one of those old-fashioned revolving wire book stands at the best technical bookstore in the world; now it takes four or five stands, not just because there are more programming languages and software technologies to cover but also because the coverage of each is deeper (e.g. compare the matched set of Llama and Camel Books that used to make up the entire printed documentation of Perl, to the dozens of books now published on the language). Code Complete spawned an entire genre of “software as a craft” books like the Pragmatic Programmer series.

All these factors mean, I hope, that a new generation of software developers is growing up with the idea that agile development methodology is not just the correct way to write software but the natural, comfortable one. This is probably the biggest change ever, with the greatest lasting effect, in the way software is created—bigger than the widespread adoption of OOP over straight procedural programming, bigger than the shift to dynamic languages and automatic garbage collection, bigger than Linux making a professional development environment available to students. It’ll be interesting to see its long-term effect... has the looming dragon of the Software Crisis finally been slain?

Friday, September 16, 2011

Determinism and market madness

The past few months of misery in the stock market have been so obviously driven by fundamentals that there’s something a little bit lightheaded and silly about saying a rebound is on the way because of technical factors. Consider the following chart, which doesn’t even take into account the political and military complexities in North Africa:

Yes, I know, technical analysis is supposed to (depending on whose explanation you read about why it works) operate independently of events external to the price of the stock, or perhaps to incorporate those factors in such a way that there’s no need to look at anything but the price; but there are philosophical holes in that explanation the size and stickiness of the La Brea Tar Pits. Leaving aside whether those holes can be patched (and, also, my apologies for the layers of mixed metaphor I seem to be building up here like a stale rhetorical napoleon), it’d be nice if we could look at this market death spiral we’re in and see it as something regular and predictable (and finite!). People operate in the everyday world by matching patterns and proceeding as if those patterns will continue, what Sherlock Holmes always called “deductive logic” but is really inductive; I don’t think they do it by following anything remotely like the rules of formal logic (I think modeling the brain as a computer running a program puts the cart pretty far out in front of the horse), but they do do it in a fairly consistent way. By making that distinction, though, I'm at odds with a lot of people in cognitive science, philosophy, and artificial intelligence.

Some two hundred years ago, around the same time that Adam Smith was laying the groundwork for those researchers by asserting that we were all members of the supremely logical species Homo economicus, his empiricist contemporaries elsewhere in Western Europe were coming out pretty strong in favor of a pure determinism: the world as giant clock. It’s a seductive way to think, and I admit to having the following quote from the Marquis de Laplace on my fridge:

We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at any given moment knew all of the forces that animate nature and the mutual positions of the beings that compose it, if this intellect were vast enough to submit the data to analysis, could condense into a single formula the movement of the greatest bodies of the universe and that of the lightest atom; for such an intellect nothing could be uncertain and the future just like the past would be present before its eyes.

The thing about Laplace’s brand of determinism is that it’s compelling on a really fundamental level because it’s how we live our lives day-to-day; if it weren’t so compelling, and weren’t so familiar, then why would the head-scratching, insomnia-inducing philosophical discontinuity between it and the concept of free will even be worth talking about? The steady encroachment, warranted or not, by scientists onto the traditional grounds of metaphysics has muddied the popular conception of what the free-will debate is about, by introducing the idea of quantum particles with unknowable and/or unpredictable states. But, woo-woo stuff like that aside, where determinism impacts our lives isn’t in the question of whether you or I, as conscious beings riding around in our heads and looking out of our eye sockets, act predictably to satisfy our needs, but whether all those other eye-socketed things around us act in a way that we can predict well enough to function. And it seems that they do. (This is a finer-grained version of the truism that we all, like it or not, are in practice practicing Rousseauians: without the Social Contract, how would you ever make it through five minutes on the freeway alive?)

In 1980 I was an avid reader of Creative Computing magazine, which, like other great magazines of the early home-computer culture of the late 1970s and early ’80s, published in each issue complete programs (not just illustrative snippets of code) that you could type into your computer, save, and run. Most were games, but many were serious applications, such as primitive databases; Compute! even published a full-featured word processor. Believe it or not, this was until 1990 or so (I think PC World was the last holdout, or maybe one of the Atari- or Amiga-specific magazines) one of the chief ways that non-commercial software was distributed (the other being BBSs). So, late in 1980 Creative Computing printed a program, with accompanying explanatory article, that was supposed to predict the outcome of that year's Presidential election. As a kid I thought this was magic. Via simulation games that had you playing a bullfighter or king of Sumer, I had gained an intuitive grasp of the way that software could model a small part of the real world; but I couldn't make the leap of generalization from those toy models to one that would, accurately enough to be interesting, simulate the behavior of 100 million adults. (Never mind that like most people under the age of 25 or so, I found the behavior of even a single adult totally incomprehensible, period.)

I'm reconstructing this from memory, of course, and I'd love to see a copy of the magazine again, or have someone email me saying that they wrote that program. I imagine that the algorithms used were predicated on some sort of deterministic optimism about voter behavior in the 1980 race, i.e., an assumption that similar factors were present as in prior elections. (Or were there? Carter’s single term is now remembered mostly for an unusually heated level of strife in the Middle East and an economy so screwed-up that a relatively new term of art from economics became common currency just so people could grasp what was happening.) Probably there was also a dash of statistics added in, since if I remember right the program was supposed to narrow in on its best guess as you entered numbers from the earlier-reporting states, just like the TV networks do. (“As X goes, so goes the nation.”) It’s this latter algorithm, or group of algorithms, that most intrigues me. It’s pattern matching of a sort; and pattern matching, no more and no less, is what technical analysis of capital markets is all about. When we see that A usually precedes B, after a while we begin to expect B when we see A; if we get what we expect, our predictive model is strengthened. It’s a little bit circular, but so is all inductive reasoning.

That circularity becomes a problem only when we mistake an A-prime, or a C, or even a Z, for an A, and have become complacent enough that we mistake the B-prime or D or unrecognizable squiggle that follows for a B—and then act as if a B is guaranteed. On that path lies madness, and something worse than madness: losing money.
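The "A usually precedes B" habit described above can be made concrete with a few lines of Python. This is only a minimal sketch of frequency-based induction, not a reconstruction of the Creative Computing program or of any real technical-analysis method; the class name SuccessorModel and the event labels are made up for illustration.

```python
from collections import Counter


class SuccessorModel:
    """Hypothetical toy model: counts how often one event has followed another."""

    def __init__(self):
        self.pair_counts = Counter()  # (previous, following) -> times seen together
        self.prev_counts = Counter()  # previous -> times seen with any successor

    def observe(self, previous, following):
        """Record one more instance of `following` arriving after `previous`."""
        self.pair_counts[(previous, following)] += 1
        self.prev_counts[previous] += 1

    def expectation(self, previous, following):
        """Observed fraction of the time that `following` came after `previous`."""
        seen = self.prev_counts[previous]
        if seen == 0:
            return 0.0  # never seen before: no basis for any expectation
        return self.pair_counts[(previous, following)] / seen


model = SuccessorModel()
for prev, nxt in [("A", "B"), ("A", "B"), ("A", "B"), ("A", "C")]:
    model.observe(prev, nxt)

print(model.expectation("A", "B"))        # 0.75
print(model.expectation("A-prime", "B"))  # 0.0
```

The model is strengthened every time B follows A, and it has nothing at all to say about an A-prime it has never seen. The trouble starts when we file the A-prime under A anyway and keep trusting the 0.75.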

Thursday, July 7, 2011

Java and the growth of the software business: The more things change...

I’m at a position career-wise where it’s to my advantage to broaden my technical skills before I deepen them any further, so Java is in the to-be-learned queue for real this time (as opposed to in previous posts in which I wrote “I really ought to learn Java” and then went back to banging out backwards ifs in Perl). I’ve been balking at putting Java on the front burner for a while, and for a while I didn’t know why.

Now the reason has finally bubbled to the surface: it’s that Java seemed like a toy back in 1998 or so when I was first programming professionally. Its speed was an issue; the programmers I hung around with were hardcore old-school bare-metal types, and our attitude was that you wrote systems code in C or assembler, applications code in C or C++ (using the Win32 or Microsoft Foundation Classes libraries), and anything else was too slow for serious work. Java’s promise of platform agnosticism seemed like marketing smoke and mirrors, and we couldn’t get past the fact that it was interpreted. (Interpreted? Interpreters were for scripting languages, and you couldn’t write a real application in those, either—more words to eat as Perl and Python matured!)

I’ll get back to the speed issue in a bit. What’s really changed my mind the most about Java is the sheer amount of support for the language in the form of third-party libraries, on top of the staggeringly large, well-thought-out platform that Sun provided from the get-go and Oracle has continued to enlarge and refine. I say “well-thought-out” deliberately; the Microsoft Foundation Classes sounded like a good idea in 1999, but in practice their functionality overlapped with the C++ standard library and the older APIs that Windows programmers were already familiar with, and the typical MFC program ended up being a mashup of all three. (Which is why MS did a good and brave thing when they scrapped all the old cruft of MFC and Win32 and built .NET from scratch.)

The particular example that tipped the balance isn’t so important as such, except that it illustrates the fine-grained libraries that are out there: it’s the jzy3d package, which provides the kind of three-dimensional data visualization that a year ago I was trying to build by hand. And this brings us back to the question of speed: jzy3d is built on top of JOGL, a Java API for OpenGL; and one of the example programs linked from the JOGL site is a port of the game Quake2 that runs in the browser. I was flabbergasted to see that. Quake, for you young’uns, was a ground-breaker in immersive 3-D graphics on its release in 1996; to run it smoothly in high resolution took a high-end CPU (I had a 100 MHz Pentium, pretty good for the time) and a $300 graphics card; it ran best in DOS, needing to hammer on the metal directly for maximum speed; and it’s still a benchmark for low-powered systems (is there a port to the iPhone? I don’t know, but I’d be impressed if there were). As far as I’m concerned, if Java is fast enough to run Quake, then it’s fast enough to run any typical desktop application.

And I think that’s what the majority of the software business thinks, too. Department of Labor stats and the Wikipedia page on software engineering show a doubling in SE jobs since 2004, and that’s despite outsourcing and the 2008 economic crash; what’s more, that statistic doesn’t account for the number of programmers who don’t have the title of software engineer. I’d suspect the true level of growth to be quite a bit greater.

And though I can’t find any information as to what language the majority of these people are coding in most of the time (and if I could I’d suspect it of having been rigged by someone with a vested interest, like Oracle or MS... though here is an attempt from 2006—look at the numbers at the end of each section), I bet Java accounts for much of that growth. The language is just that good, and that effective, and that well-supported. It's enabled many more programming jobs to be created, because it's lowered the barrier to entry. Let me stress that I don’t mean that as a euphemism for “Now you don't have to be as smart to be a programmer, so a lot of stupid people are getting good white-collar jobs that they wouldn't have otherwise.” Rather, a flatter learning curve for mainstream software development is a universal win: it means more jobs, better-working software, and a more robust world economy, thanks to, for one, the continuing adoption of general-purpose hardware running custom software in industries that previously relied on custom and/or archaic hardware for process control. The more things change, the more they change!

Ten years ago people were calling Java “C++ for dummies.” Well, guess what? That isn’t an insult any more. People are coming to realize that C++ can be a real pain in the butt. It’s hard to write maintainable, reliable code in C++. You can do it; I can do it, and I have done so in the past. And I want to make clear that where speed is of the absolute essence, C++ is obviously a better choice (although I think the complexity-vs.-features tradeoff of using C++ vs. good ol’ C ought to be carefully considered in those cases). There's at least one excellent API for user-level software in general use that works best with C++, Qt, and if there's one thing you can say about Qt programs it's that they are snappy! And, undoubtedly no coincidence, the API is a delight, very clean and lightweight; it reminds me of programming BeOS. (With the speed and memory of the average consumer-level computer being what they are these days, it’s ridiculous that we still have to put up with sluggish “click-and-wait” UIs... but that's a topic for another post.)

In practice, most C++ code, I would bet, is written using a subset of the language with which the programmers on any given project feel more comfortable, ignoring the really tricky features (no multiple inheritance, for example; maybe no operator overloading) and using some sort of integrated garbage collection scheme (overloaded new, for example) rather than relying on the programmer to manage dynamically allocated memory. I didn’t realize until I took a good look at Java (if you must know, I’ve been reading the first 200 pages or so of Java in a Nutshell a bit at a time before bed for a while now) that that comes very close to describing what Java syntax is: a slightly more verbose, less complicated version of C++. (And of course it has garbage collection, although not tunable for the needs of real-time programming, as far as I know.) QED: There’s nothing wrong with that. (Really, who could possibly take offense at replacing =0 with abstract, or : with extends?) Unless you need that last bit of speed that only a true compiled-to-assembler language can give, I can’t see anything wrong with Java as a baseline for applications programming.

Monday, June 20, 2011

Is SQL obsolete?

Lately I've been studying databases and SQL. Not long ago I purchased a large corpus of financial data (historical stock market indices), which came with a front-end application to output the numbers in various ways for easy importation into other software. What’s going on behind the scenes, most likely, is that a relational database is being queried and various views are being produced as flat tables—there's nothing surprising there. But using this software has made me want to know more about database theory, and to understand SQL better, since the latter is basically the lingua franca of databases in the corporate world and everywhere else.

Hence the question in the title of this blog entry. RDBMSs are a fact of life (almost universally agreed upon as the best way to store large data collections), but the way we use them has changed: with increased computer power we are now able to query a database in far more complex ways, and far more frequently; this is what makes data mining possible. And general-purpose programming languages have changed equally radically since SQL was developed. In short: if SQL was developed to support a certain kind of database access, a kind that was not easy to do with the programming languages of the time, and a kind that is not much like the kind of access we need now, has its time passed? And if the answer is “yes,” what comes next?

One fact that supports a “yes” vote: not only have programming languages changed, but the environments in which we use them have, as well. The designers of SQL didn’t anticipate point-and-click interfaces, let alone the WWW; they probably expected front-end interfaces of some kind, but were thinking only of the green-screen mainframe fill-and-submit forms of the 1970s. The typical web application is a mess. It’s a Frankenstein’s monster of barely adequate tools assembled into an effective but painfully ugly whole: peek behind the curtain of the Great and Powerful Oz of your typical e-commerce site or a Web-based ERP system sitting on a corporate intranet, and you’ll see dynamic HTML full of JavaScript, presenting to the user some buttons that execute PHP code, which queries an RDBMS using SQL code, which was written inline in the PHP and is then handed off to the server as a string variable. It works, but, again, it’s a crude, ugly mess. This is the reason behind the push towards more integrated development approaches for Web 2.0 applications; I’m thinking of Ruby on Rails as a particularly fine example, though as I mentioned in the previous post here, I’m leaning towards Catalyst or perhaps a Python-based solution for my own work.

Another point: the languages in which better database front-ends can and should be written lend themselves naturally to direct access to the database. I’ll anticipate objections to the point in the last paragraph by saying that SQL has changed over the past 40 years. Stored procedures are a major difference, and one that can, in part, avoid the clunkiness of having to pass every query to the database as a long sequence of string literals. Extensions like Transact-SQL make stored procedures able to process data in a much smarter way, reducing the number of individual queries the front end has to do. One result is a greatly reduced load on the network between the server and the client, which I'm not saying isn't important! But, in terms of abstraction, stored procedures, I argue, are the wrong direction. (T-SQL in particular falls far short of being a modern programming language: it looks like nothing so much as Fortran 77. Wikipedia points out (damning with faint praise!) that it’s Turing-complete. A ball of yarn and a Sharpie are also Turing-complete, but that doesn't mean I'm going to write code to access a database with them.) A far better design is to build the functionality of SQL directly into the front-end language. This is a simple but incredibly powerful idea. Modern programming languages have the capability (which older languages did not) to effortlessly create and manipulate data structures of essentially unlimited complexity. There’s no reason why the programmer couldn’t specify queries and retrieve data via a library that translates native-language idioms into SQL transparently; going a step further, I don’t see why a language library shouldn’t allow access to the RDBMS server directly, exposing the results of a query as a data structure native to the language, without SQL in the middle at all. Neither of these is pie-in-the-sky stuff: for implementations of the former idea, see Rails’ Active Record Query interface, Python’s SQLAlchemy, or Django.
Nobody seems to have actually implemented the complete elimination of SQL yet, but there’s one brief but thought-provoking discussion here. And Microsoft's LINQ goes a long way towards breaking down the distinction between different kinds of data storage, providing a unified interface; as do the various wizards in Visual Studio 2010 which allow you to just drag and drop tables from Access into your project.
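To show what "a library that translates native-language idioms into SQL" might look like at its very smallest, here is a rough Python sketch using only the standard-library sqlite3 module. It is not how Active Record, SQLAlchemy, or Django work internally; the Table class, its where/to_sql/fetch methods, the quotes table, and the sample rows are all invented for illustration.

```python
import sqlite3


class Table:
    """Hypothetical wrapper: turns keyword filters into a SELECT statement."""

    def __init__(self, conn, name, columns):
        self.conn = conn
        self.name = name
        self.columns = tuple(columns)
        self.filters = {}

    def where(self, **conditions):
        """Return a new Table restricted by column=value conditions."""
        # Only known column names may reach the SQL text; values never do.
        unknown = set(conditions) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        clone = Table(self.conn, self.name, self.columns)
        clone.filters = {**self.filters, **conditions}
        return clone

    def to_sql(self):
        """Build the SQL text plus the tuple of values bound to its placeholders."""
        sql = f"SELECT {', '.join(self.columns)} FROM {self.name}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col in self.filters)
        return sql, tuple(self.filters.values())

    def fetch(self):
        """Run the query and expose each row as an ordinary Python dict."""
        sql, params = self.to_sql()
        rows = self.conn.execute(sql, params).fetchall()
        return [dict(zip(self.columns, row)) for row in rows]


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT, year INTEGER, close REAL)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?)",
    [("AAA", 2009, 8.5), ("AAA", 2010, 10.0), ("BBB", 2010, 20.0)],
)

quotes = Table(conn, "quotes", ["symbol", "year", "close"])
query = quotes.where(symbol="AAA", year=2010)
print(query.to_sql())
print(query.fetch())  # [{'symbol': 'AAA', 'year': 2010, 'close': 10.0}]
```

The calling code never sees a line of SQL, and what comes back is a plain list of dicts. SQL is still there in the middle, of course; this is the first of the two ideas above, not the second.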

Part of the reason for the continued use of explicit SQL code constructed by a front-end app and submitted as strings, I suspect (again, sidestepping the speed issue) has to do with security. Stored procedures provide security via encapsulation; explicit code can be more easily checked for attempted exploits such as SQL injection. But the latter is an issue precisely because of the exposure of the database structure at an unnecessarily high level; and the former is what object-oriented languages are all about. (Again, a design feature that SQL needed to provide at the time of its conception but which has now been implemented with far more versatility and elegance in programming languages: OOP and other techniques preach the gospel of “maximize cohesion, minimize coupling” to improve not just design and maintainability of software but also data security.)
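For anyone who hasn't watched SQL injection happen, a small sketch in Python with the standard-library sqlite3 module shows why string-built queries are the problem. The users table, the two function names, and the hostile input are all made up for the example.

```python
import sqlite3

# Made-up table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user_unsafe(conn, name):
    # SQL assembled by string concatenation: the input becomes part of the code.
    sql = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user_safe(conn, name):
    # Placeholder binding: the input is passed as data and never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


hostile = "nobody' OR '1'='1"
print(find_user_unsafe(conn, hostile))  # every row in the table comes back
print(find_user_safe(conn, hostile))    # []
```

The unsafe version hands the database a WHERE clause that the attacker wrote; the safe version treats the same text as nothing more than an odd-looking name that matches nobody.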

I want to make clear that I’m not advocating the elimination of relational databases as such. (Even though arguably most current implementations don’t really follow relational database theory in practice.) What I don’t like is the crude way that we interact with them. I want to see programmer-space access that goes beyond building SQL code with string manipulation operators, then handing it off to the database; I want to see end-user-space access that goes beyond clunky wizards that build a limited subset of SQL queries (always almost, but not quite, the one you really need). The relational model has been adopted almost universally as the standard way to store data sets of arbitrary size, for very good reasons that held true thirty years ago and still hold true today. But let’s make sure our reasons for the way we interact with the relational model still hold true as well.

Sunday, June 19, 2011

Perl Ate My Brain!

Looking back at my earlier posts here I see a lot of optimism about rebooting (an irresistible pun) my ability to write software; a year or so ago, I had a plan to learn some new languages, notably Java, and refresh my knowledge of existing ones, notably C++, by embarking on a few different programming projects. It seems I’ve taken a slightly different route. I’ve done a lot of programming over the past year, but it’s almost all been in Perl. Hardcore Perl fans would say this was inevitable—that the language is so powerful and flexible that, once you start using it, there’s no need to use any other. People with a lower opinion of Perl would be more likely to call it a sickness.

What’s so seductive about Perl? A few things, to start, off the top of my head:
  • The astonishing variety of freely available modules (Perl-speak for libraries) for highly specialized tasks: more than a few times I’ve girt my loins for a difficult bit of coding, only to check CPAN and find that someone else has already done it for me.

  • Flexible built-in data structures and the ability to easily create more elaborate ones when the built-ins won’t do. Perl’s built-in lists, hashes, and references seem to capture the best features of the C++ Standard Template Library, Lisp lists, and hand-built C data structures, with performance (thanks to a highly evolved and optimized interpreter) fast enough for most purposes.

  • Syntax that, once a slight learning curve is overcome, feels natural and is (surprise!) easy to read. Perl’s reputation for looking like line noise comes from two things: the idiomatic use of regular expressions as the most effective way to parse string (and often numeric as well) data, and the way that non-alphanumeric characters are used to mark variable types. The former did encourage “write-only” code in Perl 4, but in the current implementation of the language, you can interpolate comments in regexps to make them as readable as you want. The latter becomes more rational once you realize that the dollar, percent, and at symbols are used to identify a variable’s type not just at declaration, but at use, in context; they aren’t just an uglier cousin of Hungarian notation but a concise expression of an extremely elegant kind of dynamic casting. There are a few rough edges—for example, the backslash and curly braces are overloaded for references (sort of like C pointers) in a way that makes me uncomfortable—but on the other hand, there are only so many non-alphanumeric characters in ASCII. Perhaps Perl 6 should move on to use some interesting Unicode symbols, or (I’m only mostly joking here) something like APL… (In the meantime, the plethora of dollar signs in Perl code gives me a bit of a warm fuzzy feeling, as it reminds me of my childhood programming in 8-bit home computer BASIC, where the dollar sign indicated a string variable.)
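In Perl the commented-regexp trick is done with the /x modifier. Since this post's code sketches are in Python, here is the same idea using Python's analogous re.VERBOSE flag; the pattern itself (matching a date line like the ones heading each post on this blog) and the group names are just an illustration.

```python
import re

# Matches a date such as "Monday, October 10, 2011".
DATE = re.compile(
    r"""
    (?P<weekday>[A-Z][a-z]+day),\s+   # day of the week, then a comma
    (?P<month>[A-Z][a-z]+)\s+         # month name
    (?P<day>\d{1,2}),\s+              # day of the month, then a comma
    (?P<year>\d{4})                   # four-digit year
    """,
    re.VERBOSE,  # ignore whitespace in the pattern and allow # comments
)

match = DATE.fullmatch("Monday, October 10, 2011")
print(match.groupdict())
```

Squeezed onto one line with no comments, the same pattern would look like exactly the kind of line noise that gave Perl its reputation.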

I still fully intend to learn Python. Even though he was contrasting immature versions of both languages, open-source guru Eric S. Raymond’s classic essay on the elegance and ease of Python compared to Perl still makes as big an impression on me now as it did when I read it ten years ago. Perl’s become more elegant since then, and its suitability for large software projects has been greatly enhanced by the Moose project, which transforms Perl’s OOP features to such a degree that it almost creates a separate dialect of the language. On the other hand, Python has loud cheerleaders in the scientific community: my impression is that the NumPy libraries have in just a few years supplanted Fortran as the development environment of choice for numerical analysis of large data sets, something that is a critical programming task for me. And somewhere near the top of my to-do list is to learn one of the new platform-independent Web 2.0 programming frameworks; Ruby on Rails seems to be the most popular, although I privately suspect that’s partly because of its oddly catchy name, but Python-based systems are close behind.

Friday, January 28, 2011

In praise of mailing lists

I've recently been on the hunt for mailing list software that can be set up without superuser access, since my webspace provider (of course) doesn't allow the latter. I don't want to leave a home machine on 24/7 to run a list server if I can avoid it; and if I did, neither of the traditional options, GNU Mailman and Majordomo, would fit the bill: they're way too over-featured and tricky to set up. (A bootable Linux CD that runs a preconfigured listserv and nothing else, analogous to the ones that set up a machine automatically as a firewall or router, would work, but I haven't seen any such.) All I need is a way to set up one email address as a reflector to a list of addresses, i.e. subscribers. I might have to roll my own and set it up as a cron job on the web server; there are Perl modules to get mail using POP3 and send it using SMTP, and I could even with a little bit of effort implement automatic creation of searchable dynamic archive pages.
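A roll-your-own reflector of that kind is small enough to sketch. The version below is in Python rather than Perl, using the standard-library poplib, smtplib, and email modules; it is an untested outline under several assumptions: the function names reflect and run_once are mine, the host names and credentials are whatever your provider gives you, the POP3 server accepts SSL connections, and the SMTP server accepts mail without authentication (many will not).

```python
import poplib
import smtplib
from email import message_from_bytes, policy
from email.utils import parseaddr


def reflect(raw_message, list_address, subscribers):
    """Parse one incoming message; return (message, recipients) for resending."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    # Replies should go back to the list, not only to the original poster.
    del msg["Reply-To"]  # does nothing if the header is absent
    msg["Reply-To"] = list_address
    poster = parseaddr(msg.get("From", ""))[1].lower()
    # Skip the poster's own address so they do not get an echo of their mail.
    recipients = [addr for addr in subscribers if addr.lower() != poster]
    return msg, recipients


def run_once(pop_host, smtp_host, user, password, list_address, subscribers):
    """Fetch waiting mail over POP3 and resend each message over SMTP.

    Intended to be called from a cron job; all arguments are assumptions
    about the hosting setup, not values from any real provider.
    """
    mailbox = poplib.POP3_SSL(pop_host)
    try:
        mailbox.user(user)
        mailbox.pass_(password)
        count, _size = mailbox.stat()
        with smtplib.SMTP(smtp_host) as smtp:
            for number in range(1, count + 1):
                _resp, lines, _octets = mailbox.retr(number)
                msg, recipients = reflect(
                    b"\r\n".join(lines), list_address, subscribers
                )
                if recipients:
                    smtp.send_message(
                        msg, from_addr=list_address, to_addrs=recipients
                    )
                mailbox.dele(number)  # mark as handled; removed on quit()
    finally:
        mailbox.quit()
```

Keeping reflect() free of any network calls means the header rewriting and the subscriber filtering can be tried out on a saved message before the thing is ever pointed at a real mailbox. Subscription handling, bounce processing, and the archive pages are all left out.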

What's interesting is that in the process of some extensive Googling trying to find a prefab solution for what I need, I came across quite a bit of discussion the tone of which was that mailing lists are obsolete: a curious legacy of Web 1.0 that has no place in the new world of message boards let alone social media. For example, academia is one of the traditional strongholds of mailing lists (and if I recall correctly, among the first pieces of advice on Internet etiquette that I read online, some twenty years ago, was a page like this one explaining how to behave on a list); yet this author strongly suggests forgoing a mailing list in favor of a LinkedIn board or Facebook page for people active in nonprofit organizations (whose online-communication needs, I imagine, are similar to those of academics).

I know I'm a bit old-fashioned, but for me mailing lists are an essential tool for the particular way that I use the Internet as a social medium. Reading new messages as they come in is a lot easier than remembering to regularly check a message board, Facebook wall, etc. for new posts. All my mailing lists' messages come to the same Gmail address, and some of the features of Gmail seem tailor-made for mailing lists: in particular, grouping messages by subject (so that threads stay together in a neat, tidy, click-once-to-read-me collection), and automatic labeling (again, it just takes one click to see all the conversations belonging to one list's universe and no others). And while archiving messages for later reference is my responsibility, I can also be sure that the ability to retrieve them is as reliable as my email is in general; with message boards, the server could go down at any time, losing historical posts, or a restructuring of the database could break links to old posts (both have happened to me in the past couple of years, with the result that I now am forced to print out and file a hard copy of every useful message as soon as I see it, with obvious drawbacks).

(Ironically, you can host a mailing list via Facebook—but how many people use that feature, I wonder?)

Maybe I'm not quite such a dinosaur: here's a lovely essay on Slate from only six months ago, arguing that
e-mail lists—especially off-the-record lists—are better than Twitter, Facebook, and Tumblr at fostering a sense of community and generating deep, thoughtful conversations.
That's exactly my experience.