Friday, July 16, 2010

Does your installer do dependency checking?

As I threatened to do a few posts ago, I've been working on some graphical applications written in Perl and using the Simple DirectMedia Layer. I got a bit of practice code written, but not tested, a few days ago—a 3-D plot of a trig function—only to find that the software that allows Perl code to talk to the SDL libraries wasn't actually installed on my machine. OK, I thought, so that's not part of the default Perl install on OS X—I'll just find it and install it.

Easier said than done! First I had to get the entire developer kit (2+ GB of downloads, and a couple of hours to install) from Apple. (I can see excluding the GUI developer tools, Xcode and friends, from a default OS X install, but why the command-line C compiler and a basic set of libraries and header files? They'd take up a negligible amount of disk space.)

Next came installing the actual SDL libraries, then the Perl-to-SDL interface. I first tried Fink, which aims to provide one-stop shopping for .deb packages à la the various GUI frontends to apt in different Linux distros. Fink presents you with a window of available packages (which updates every time you open it, naturally) and downloads the ones you choose, either as an OS X binary if one exists or as source if necessary (then configuring the latter to build under Darwin, compiling it, etc.). Fink has worked well for me in the past, but in this case, though the list of available packages included SDL, it didn't include SDL-Perl.

Fine, I thought, I'll tell Fink to install SDL, then install SDL-Perl via the command-line CPAN installer module included with Perl. The former worked OK; the latter didn't. The CPAN installer module couldn't deal with the many modules that were prerequisites to installing SDL-Perl, so I gave up and started doing everything by hand. This quickly led me into a "dependency hell" wherein every tarball I downloaded seemed to require that something else already be installed.

About here is when I started to feel like banging my head against the keyboard. THIS SHOULDN'T HAPPEN. The problem of package dependencies on Unix-like systems was solved 15 years ago when the first Debian Linux distribution was released. I remember installing Debian 1.something on my first Linux box, which had previously been running Red Hat with its markedly inferior .rpm package system, and marvelling at how smoothly the install process went despite my idiosyncratic selection of software from the packages available.

Then I tried MacPorts, a similar front end to open-source software, mostly command-line and X Window stuff configured for Darwin. After installing it from a .pkg file (a self-installing Mac OS X package format) I typed just one line in a terminal window (MacPorts so far has no GUI front end):

sudo port install p5-sdl_perl
and was rewarded with a couple of hundred lines of beee-you-dee-ful status info scrolling by: extracting this, checksumming that, and compiling the other thing, as MacPorts automagically set up the necessary prerequisites and finally SDL-Perl itself. When I tried to run my Perl script again, it Just Worked.1,2

Moral: Whether your software is the most user-friendly GUI application ever or some kind of arcane tool only to be used by superhackers, if you don't implement (at the very least) dependency checking or (preferably) automatic dependency resolution, you deserve forty lashes with a wet noodle, as Ann Landers used to say.
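For the curious, automatic dependency resolution at its simplest is a depth-first walk of the prerequisite graph that installs each package only after everything it needs. Here is a minimal sketch in Python; the `install_order` function and the `REQUIRES` table are made up for illustration and are not how MacPorts or Debian actually store or resolve their metadata.

```python
def install_order(target, requires):
    """Return packages in an order where every prerequisite precedes its dependent."""
    order = []           # finished packages, prerequisites first
    done = set()         # packages already placed in `order`
    in_progress = set()  # packages on the current path, used to detect cycles

    def visit(pkg):
        if pkg in done:
            return
        if pkg in in_progress:
            raise ValueError(f"circular dependency involving {pkg}")
        if pkg not in requires:
            raise KeyError(f"unknown package: {pkg}")
        in_progress.add(pkg)
        for dep in requires[pkg]:
            visit(dep)
        in_progress.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    visit(target)
    return order


# Hypothetical dependency table (assumption: names and edges are illustrative only).
REQUIRES = {
    "p5-sdl_perl": ["perl5", "libsdl", "libsdl_image"],
    "libsdl_image": ["libsdl", "libpng"],
    "libsdl": [],
    "libpng": ["zlib"],
    "zlib": [],
    "perl5": [],
}

if __name__ == "__main__":
    for pkg in install_order("p5-sdl_perl", REQUIRES):
        print("install", pkg)
```

Mere dependency checking is the same walk with the install step left out: report which prerequisites are missing and stop. A real package manager also has to handle version constraints and conflicts, which this sketch ignores.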

  1. Well, almost—I had to tell Perl where to find the new modules, but that was both trivial and something I needed to know how to do anyway.
  2. Check it out! (Not impressed yet? Hey, it's just proof of concept.)

Wednesday, July 14, 2010

Webscraping awful JavaScript, part II

I edited the previous post to remove the link to the website I'm trying to scrape data from, in order to protect the guilty. What I am dealing with there is about 8000 lines of JavaScript, of which, I think, roughly 6000 lines comprise multiple blocks of code that are identical except for a loop condition, comparison, or other minor change. It's a classic if ugly C idiom, and sometimes unavoidable in a language so primitive. In a higher-level language like Java, C++, or JavaScript, it's the mark of a tyro, especially in JavaScript, whose first-class functions could be passed as arguments to a single copy of the repeated code.1
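The idea of collapsing near-identical blocks into one routine that takes a function argument looks roughly like this. The sketch is in Python rather than JavaScript, and the `select_rows` helper and `QUOTES` data are invented for illustration; nothing here comes from the site in question.

```python
def select_rows(rows, keep):
    """The single copy of the loop; `keep` supplies the comparison that used to differ."""
    return [row for row in rows if keep(row)]


# Made-up sample data (assumption: field names are illustrative only).
QUOTES = [
    {"kind": "call", "strike": 95, "bid": 2.10},
    {"kind": "put", "strike": 95, "bid": 0.40},
    {"kind": "call", "strike": 100, "bid": 0.85},
    {"kind": "put", "strike": 100, "bid": 1.75},
]

# Each of these would have been its own copied-and-tweaked block of code.
calls = select_rows(QUOTES, lambda q: q["kind"] == "call")
cheap = select_rows(QUOTES, lambda q: q["bid"] < 1.00)
at_95 = select_rows(QUOTES, lambda q: q["strike"] == 95)
```

The only thing that varies between the three calls is the small function passed in, which is exactly the part that differed between the duplicated blocks.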

Luckily I can sidestep the mess and just figure out the data-parsing code.

  1. Douglas Crockford, in JavaScript: The Good Parts (hilariously, a little less than one-fifth the length of JavaScript: The Definitive Guide), argues that first-class functions are the best and most important thing about it: they make the language essentially "LISP in C's clothing."

Thursday, July 8, 2010

Reverse-engineering dynamically-created JavaScript

This is interesting: a page I want to webscrape some options price data from appears to be entirely created dynamically by JavaScript code which itself is created dynamically by an unknown CGI backend (probably PHP). That seems a little bit kludgey, but I understand the reasoning behind it; the page is interactive, but there's a lot of data that can potentially be displayed. This way there is one big server hit when the page is first loaded (to get a snapshot of all the options data)—presumably that CGI code is querying a database—and the JS just displays the data or not as the user clicks show/hide for each stat or each block of options prices.

I can get the data I need by parsing the JavaScript code, if I can figure out how that code parses its data strings (i.e. the data "passed" to it by the underlying CGI code) for display; luckily, the JavaScript string-manipulation methods seem to be modeled closely on Perl's.

I'll then have the data in an elaborate Perl data structure, and can manipulate it as I see fit.
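As a rough illustration of the approach, here is a sketch in Python (not the Perl I'd actually use). Everything about the input is an assumption: the variable names, the `;` record separator, and the `|` field separator are invented stand-ins for whatever format the real page's generated JavaScript uses.

```python
import re

# Hypothetical generated JavaScript (assumption: the real page's format is unknown here).
JS_SOURCE = '''
var callData = "95|2.10|2.25;100|0.85|0.95";
var putData = "95|0.40|0.50;100|1.75|1.90";
'''

# Matches: var NAME = "PAYLOAD";
ASSIGNMENT = re.compile(r'var\s+(\w+)\s*=\s*"([^"]*)"\s*;')


def parse_options(js_source):
    """Turn each embedded data string into a list of {strike, bid, ask} records."""
    tables = {}
    for name, payload in ASSIGNMENT.findall(js_source):
        rows = []
        for record in payload.split(";"):
            strike, bid, ask = record.split("|")
            rows.append({"strike": float(strike), "bid": float(bid), "ask": float(ask)})
        tables[name] = rows
    return tables


if __name__ == "__main__":
    for name, rows in parse_options(JS_SOURCE).items():
        print(name, rows)
```

The result is a dictionary of lists of dictionaries, the Python analogue of a Perl hash of arrays of hashes.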

Monday, July 5, 2010

My take on "best practices" for hiring programmers

Do the majority of managers hiring for programming jobs think like Justin James?

I can't find a page to link to right now that sums it all up, but I have the impression that the way the wind is blowing right now in the programming job market, there are a lot more jobs than there are qualified people. The number of computer science graduates is way down now from a peak ten years ago during and shortly after the so-called dot-com boom, and while the slack has somewhat been taken up by people who, however well-intentioned, think they should be instantly hireable because they whizzed through Teach Yourself JavaScript in 24 Hours in only eight, that's little consolation.

As a hiring manager, I assume you'd want first of all to screen the latter group out; optimally, you'd want to select from what remained (perhaps 25% of applicants, charitably) the tiny fraction (less than 1% overall) of truly smart, capable, competent people. But James' approach in the blog post I linked to above seems designed to hire not the great but only the good enough. What I'm bothered by in particular is James' expectations for the candidate's understanding of technologies inside and outside of the mainstream. It's a long quotation but I think it's worth it:

I am not hiring Lisp, Prolog, Erlang, APL, Scheme, Clipper, PowerBuilder, Delphi, Pascal, Perl, Ruby, Python (forgive me for including those four in this list), Fortran, Ada, Algol, PL/1, OCaml, F#, Spec#, Smalltalk, Logo, StarLogo, Haskell, ML, D, Cobra, B, or even COBOL (which is fairly mainstream) developers. If you show these on your resume, I will want to interview you just for the sake of slipping in a few questions about these items. I am serious. As part of my secret geekiness, I am really into obscure and almost obscure languages and technologies. I know that a lot of those items take better-than-industry-average intellect and experience to do; they also provide a set of experiences that gives their practitioners a great angle on problems. While you will never directly use those skills in my shop, you will be using those ways of thinking, and it will give us something to talk about on your first day.

There's more than a whiff of ambivalence here, isn't there? Anything in that list is "obscure" and "you'll never directly use those skills," although "you will be using those ways of thinking." The subtext here is that a good programmer, a hireable programmer, is one who knows the languages and technologies that are currently the most popular (prima facie the stuff that isn't in James' list in the quotation above); anything even a little bit outside of the mainstream isn't a marketable skill, though it might make for good talk around the water cooler. And James is probably about as liberal and accommodating as it gets. Another manager might be more plain-spoken: "Well, it's nice if you know LISP, 'cause that shows you're a geek who programs for fun when he goes home; but you really need to know dot-NET 'cause that's where the money is."

As I suggested a few paragraphs back, I object to this mentality because it places a premium on being a cut-'n'-paste programmer over being one who actually thinks about what he or she is doing. It rewards the good-enough programmer who understands enough of the current flavor-of-the-month technology to get by, rather than something qualitatively different: the great programmer who knows that technology and its limits, but also knows others and their limits. It's like the difference (yeah, I know these programming-as-cabinetry-or-whatever-physical-craft analogies have been done to death, but bear with me) between someone who puts together furniture on an assembly line and someone who hand-crafts each piece individually; maybe it's even the distinction between knowledge and wisdom.

Grand claims? Impractical stuff? Expectations with nothing to do with the reality of the job market? Well, go read Joel Spolsky's blog post on the same subject, and get back to me.

Done? OK. What I wanted you to get from that is that Spolsky also hires programmers, and he takes one sentence to dismiss James' emphasis on job candidates' knowing the flavor-of-the-month tech. For Spolsky, being a good programmer is all about aptitude. What skills should a good programmer have? Wrong question. Distinguish "skills" from "familiarities," and discard the latter; what programmers need, among other things, is a knack for simultaneous application of logic (the real "guts" of programming whatever language or library you're using) and creativity (because the best solution to a software problem is so often outside the box). This doesn't mean lack of hands-on technical knowledge. Quite the contrary: it's a wide knowledge of tools, and of when to use the right one for the job, that makes manifest that famous 10x difference in productivity that a great programmer has over a good one.1 It's what distinguishes software designers from software engineers; programmers from code monkeys; IT from CS (pick your terminology). As in the old saying, it's the state of being a man who's been taught how to fish.

So, let's say I'm a hiring manager. What kind of concrete test can I use in the real world to determine whether an applicant is a great programmer, or has what it takes to become one (probably an even better case, since I'll then have a hand in their professional growth and can shape it to my needs; oh dear, that sounds more sinister than I meant it to!)? Interestingly, Spolsky in the post I linked to above dismisses out of hand the kind of brainteasers that Microsoft interviews used to be (maybe still are?) famous for (so much so that a micro-industry sprang up, with its own corner in the job-searcher section at Barnes & Noble, dedicated to helping you prepare for the inevitable). He does offer a couple of positive suggestions: make the candidate write some real code on the spur of the moment (although I'd be a little less stringent about the requirements than he is, and a little more forgiving of bugs); ask the candidate to discuss (not "solve") a story problem, what Jon Bentley in Programming Pearls referred to as a back-of-the-envelope calculation.2

Those are good, but I can think of a couple of screening tests that perhaps bring Spolsky down to earth—that is, they speak to the candidate's approach to programming, but they also explicitly involve the tools of software development.

  • One would be something like—if hiring for a web frontend position—"What text editor would you use for making some quick changes in a CSS file?" The answer "What's a text editor?" would be an immediate fail. But so would "Notepad." Why? Because it indicates a lack of the programming "gene" in a couple of different ways: a lack of understanding of the toolset available to today's programmer (what? you mean there are other text editors?); an indifference to version control (it won't break anybody else's code if I just make this little change); a lack of sufficient creativity to see that a tool like Notepad could be vastly improved for the specific needs of programmers (source code coloring? paren matching? huh?).
  • Another would be to test the applicant's willingness to use appropriate technology specifically in their code. Again, this is job- and applicant-skill-dependent to an extent, but here's an example: Applicant has recently graduated from a well-regarded CS program. I describe to him/her a scenario in which a client, having been happy with a custom-built interactive software package for manufacturing process control sold to them by our company, now wants to be able to automate the process further by integrating some kind of scripting language. Does the interviewee say, "Well, if it's a command-line tool, a shell script should do it; if it's an interactive text-based application, that's what Expect is for; if it's a graphical application things get more complicated." Or does he/she say, without missing a beat, "Lex and Yacc!"? Or, "Well, how complicated a language are we talking about? Lex and Yacc are probably overkill; I could build a little parser more quickly by hand, and then just traverse an abstract syntax tree in memory... [followed by a flurry of thinking aloud how that might be done]." Any of those three could be right, depending. But anything that smacks of reinventing the wheel would be wrong.3
  • A third would be to ask the applicant to describe a hobby or other spare-time interest that involves problem-solving, and prompt until you hear a sufficient level of detail that indicates that this person has his logic hat on not only when programming but at other times, and that it feels natural. For instance, I like to work on old cars, specifically European cars from the '80s, for fun; and if you get me started I will talk your ear off about how those kinds of cars are modern enough to have electronic engine controls but not modern enough to have the kind of computerized self-diagnostic abilities that new cars have, which presents a challenge, when the car doesn't run right, that is a lot like debugging code. But I'm not sure questions like that about the candidate's life outside the job can be presented in a non-EOE-violating way.

  1. Originally from The Psychology of Computer Programming, I believe, this number has been pretty much taken as fact since the first edition of McConnell's Code Complete. In other words, don't blame me.

  2. Besides the Bentley book (and its sequel, More Programming Pearls), a great source of these is John Paulos' book Innumeracy.

  3. At a former job I had in which several programmers were working on a C++/MFC app, the need arose to do some complicated pattern matching in an input string. One programmer went home and spent three hours after dinner writing some character-by-character algorithms, basically C string manipulation at its goriest. The other added a public-domain regular expression library to the project, then wrote, debugged, and thoroughly tested a regexp to do the same thing—in 15 minutes. There's your order-of-magnitude difference in productivity.
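To give a feel for the contrast in footnote 3, here is a small sketch in Python (not the original C++/MFC code). The pattern itself, two uppercase letters, a hyphen, and four digits, is a made-up stand-in; the actual pattern from that project isn't recorded here.

```python
import re


def is_part_code_by_hand(text):
    """Character-by-character check: two uppercase letters, a hyphen, four digits."""
    if len(text) != 7:
        return False
    for ch in text[:2]:
        if not ("A" <= ch <= "Z"):
            return False
    if text[2] != "-":
        return False
    for ch in text[3:]:
        if not ("0" <= ch <= "9"):
            return False
    return True


# The same rule as a regular expression (hypothetical pattern, for illustration).
PART_CODE = re.compile(r"[A-Z]{2}-[0-9]{4}")


def is_part_code_by_regex(text):
    """Same check, delegated to the regex engine."""
    return PART_CODE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["AB-1234", "ab-1234", "AB-123", "ABC-1234", "AB_1234"]:
        print(sample, is_part_code_by_hand(sample), is_part_code_by_regex(sample))
```

Even for a toy pattern the hand-written version is several times longer, and every change to the rule means touching index arithmetic instead of editing one string.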