Showing posts with label Python. Show all posts

Sunday, June 19, 2011

Perl Ate My Brain!

Looking back at my earlier posts here I see a lot of optimism about rebooting (an irresistible pun) my ability to write software; a year or so ago, I had a plan to learn some new languages, notably Java, and refresh my knowledge of existing ones, notably C++, by embarking on a few different programming projects. It seems I’ve taken a slightly different route. I’ve done a lot of programming over the past year, but it’s almost all been in Perl. Hardcore Perl fans would say this was inevitable—that the language is so powerful and flexible that, once you start using it, there’s no need to use any other. People with a lower opinion of Perl would be more likely to call it a sickness.

What’s so seductive about Perl? A few things, to start, off the top of my head:
  • The astonishing variety of freely available modules (Perl-speak for libraries) for highly specialized tasks: more than a few times I’ve girt my loins for a difficult bit of coding, only to check CPAN and find that someone else has already done it for me.

  • Flexible built-in data structures and the ability to easily create more elaborate ones when the built-ins won’t do. Perl’s built-in lists, hashes, and references seem to capture the best features of the C++ Standard Template Library, Lisp lists, and hand-built C data structures, with performance (thanks to a highly evolved and optimized interpreter) fast enough for most purposes.

  • Syntax that, once a slight learning curve is overcome, feels natural and is (surprise!) easy to read. Perl’s reputation for looking like line noise comes from two things: the idiomatic use of regular expressions as the most effective way to parse string (and often numeric as well) data, and the way that non-alphanumeric characters are used to mark variable types. The former did encourage “write-only” code in Perl 4, but in the current implementation of the language, the /x modifier lets you embed whitespace and comments in regexps to make them as readable as you want. The latter becomes more rational once you realize that the dollar, percent, and at symbols are used to identify a variable’s type not just at declaration, but at use, in context; they aren’t just an uglier cousin of Hungarian notation but a concise expression of an extremely elegant kind of dynamic casting. There are a few rough edges (for example, the backslash and curly braces are overloaded for references, sort of like C pointers, in a way that makes me uncomfortable), but on the other hand, there are only so many non-alphanumeric characters in ASCII. Perhaps Perl 6 should move on to use some interesting Unicode symbols, or (I’m only mostly joking here) something like APL… (In the meantime, the plethora of dollar signs in Perl code gives me a bit of a warm fuzzy feeling, as it reminds me of my childhood programming in 8-bit home computer BASIC, where the dollar sign indicated a string variable.)
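
To make the two points in that last bullet concrete, here is a minimal sketch (mine, not lifted from any real project): a regexp written with the /x modifier so that it can carry whitespace and comments, followed by a few lines showing how the sigil follows what you are fetching rather than how the variable was declared. The sample line, the ticker, and all the variable names are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input line, invented for this example.
my $line = "2011-06-19 IBM 164.44";

# With the x modifier, whitespace in the pattern is ignored and
# everything after a # up to the end of the line is a comment.
my ($date, $symbol, $price) = $line =~ m/
    ^ (\d{4}-\d{2}-\d{2})   # ISO-style date
    \s+ ([A-Z]+)            # ticker symbol
    \s+ (\d+\.\d+)          # price with a decimal point
    $
/x;

# The sigil reflects what you get back, not how the variable was declared.
my @prices = (164.44, 165.07, 163.90);
my %quote  = (symbol => $symbol, last => $price);

my $first = $prices[0];        # $: one scalar element of @prices
my @pair  = @prices[0, 1];     # @: a slice, i.e. a list of elements
my $count = @prices;           # an array in scalar context gives its length
my $sym   = $quote{symbol};    # $: one scalar value out of %quote

print "$date $sym $first $count\n";
```

The same array shows up as $prices[0], @prices[0, 1], and plain @prices depending on whether one value, several values, or a count is wanted, which is the "casting in context" I was getting at.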

I still fully intend to learn Python. Even though he was contrasting immature versions of both languages, open-source guru Eric S. Raymond’s classic essay on the elegance and ease of Python compared to Perl still makes as big an impression on me now as it did when I read it ten years ago. Perl’s become more elegant since then, and its suitability for large software projects has been greatly enhanced by the Moose project, which transforms Perl’s OOP features to such a degree that it almost creates a separate dialect of the language. On the other hand, Python has loud cheerleaders in the scientific community: my impression is that the NumPy libraries have in just a few years supplanted Fortran as the development environment of choice for numerical analysis of large data sets, something that is a critical programming task for me. And somewhere near the top of my to-do list is to learn one of the new platform-independent Web 2.0 programming frameworks; Ruby on Rails seems to be the most popular (although I privately suspect that’s partly because of its oddly catchy name), but Python-based systems are close behind.

Wednesday, June 2, 2010

"Now seems like a good time," I said to myself...

..."to get those rusty programming skills going."


I had found myself wanting to do some analysis in Excel of price behavior of a large list of stocks.


I glanced at the first few pages of Perl and LWP, and then at the Regular Expressions Pocket Reference; I opened Firebug on the Yahoo Finance "summary page" for a stock I was interested in, so that I could see the raw HTML I was dealing with; and wrote the following:[1]



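#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Expects a list of security symbols on standard input, one per line.

print("Symbol\tPrevClo\tOpen\tLast\n");

while (my $sym = <>)
{
    # chomp removes only a trailing newline; chop would eat the last
    # character of a final line that has no newline.
    chomp $sym;
    my $summary = get("http://finance.yahoo.com/q?s=$sym");
    die "Couldn't get Yahoo Finance Quote Summary page for symbol $sym!"
        unless defined $summary;

    # Capture in list context so that a failed match yields undef
    # rather than a stale $1 left over from an earlier match.
    my ($prevclose) = $summary =~ m/>Prev Close:<.*?>(\d+\.\d+)</;
    my ($open) = $summary =~ m/>Open:<.*?>(\d+\.\d+)</;
    my ($last) = $summary =~ m/>Last Trade:<.*?>(\d+\.\d+)</;

    # Print a placeholder for any value that wasn't found on the page.
    for ($prevclose, $open, $last) { $_ = "n/a" unless defined $_; }

    print("$sym\t$prevclose\t$open\t$last\n");
}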



It worked the first time—not bad for not having done any programming whatsoever for about five years and nothing of significant size for ten. (Yes, I know it's not very idiomatic Perl—combining the match regexps and doing a few other things would probably cut the line count in half.) That code took a few hours to produce, but subsequent similar programs to web-scrape other pages took much less time, now that I was in the groove.
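
For the curious, here is one way the three matches might be folded into a single loop. It's a sketch only: the HTML fragment is made up to resemble the labels the script above looks for (it is not copied from Yahoo), the ticker is a placeholder, and the fetch is left out so that it runs offline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up fragment standing in for the fetched page
# (an assumption for illustration, not real Yahoo markup).
my $summary = '<th>Prev Close:</th><td>164.44</td>'
            . '<th>Open:</th><td>164.10</td>'
            . '<th>Last Trade:</th><td>165.07</td>';

# One pattern, applied once per label; a missing field becomes "n/a".
my @labels = ("Prev Close", "Open", "Last Trade");
my @values = map {
    my ($value) = $summary =~ m/>\Q$_\E:<.*?>(\d+\.\d+)</;
    defined $value ? $value : "n/a";
} @labels;

print join("\t", "IBM", @values), "\n";
```

Adding another field then means adding one more string to @labels instead of two more lines of code.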

For example, more exciting was the following, which expects the same list of symbols:




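#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Expects the same list of security symbols on standard input, one per line.

print "<html>";

while (my $sym = <>)
{
    chomp $sym;    # strip the trailing newline only

    print "<font size=5>$sym</font><br>";

    # Save one chart image per time period and reference it from the page.
    foreach my $period ("1d","1w","1m")
    {
        getstore("http://ichart.finance.yahoo.com/z?" .
                 "s=$sym&t=$period&q=l&l=on&z=m&p=e5,e20&" .
                 "a=p12&lang=en-US&region=US",
                 "$sym$period.png");
        print "<img src=\"$sym$period.png\"/>";
    }

    print "<br><br>\n";
}

print "</html>";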



What I'm doing here, if it isn't clear, is scraping a number of security price charts from Yahoo Finance, saving the image files locally, and building a crude but effective web page to make them viewable in one place. Beats looking at each stock by hand for price trends, let me tell you!


Now, all of this may seem like "Hello World" stuff to anyone reading this who's had any programming experience beyond Computer Science 101. But what I think shouldn't be taken for granted here is the amazing ability (taking the first script as the simpler example) to suck an entire web page into a string variable, search that string in a complex way, and output the result in a universally readable (i.e. by humans or other programs) format, all in just a few lines of code. We used to want applications to have built-in programming languages; now (and here's the takeaway!) we have programming languages with built-in applications: very-high-level functionality to do things that only applications used to be able to do. And we can do them in a scriptable, redirectable, programmatic way. Admittedly much of this is due to the straightforward API of the LWP module (Perlspeak for "library"); but I'd argue that that accessibility is a function of the design of Perl; there's obviously stuff going on there behind the scenes that would be much harder to write in a language without such integral support for string manipulations (C, say).


I was weaned as a programmer on 1980s consumer 8-bit machines, the multimedia powerhouses of their day, on which even in a high-level language (built-in BASIC), to do anything interesting you had to twiddle bits. And most of the software I've been paid to write has been low-level stuff in C—device drivers and the like. So I'm easily impressed and easily seduced by VHL (very-high-level) languages that let you do so much with so little typing. Of course there's danger inherent in only knowing high-level languages. When you don't understand what's really going on at the machine level, optimization can be much more difficult, for example. Ironically, though, even as undergraduate computer science programs deemphasize C and assembler skills and move their students towards Java, C#, .NET, PHP, and so on—preparing them more effectively for the kind of web-back-end-database-interface work 9 out of 10 of them will face as new programmers—even as this huge and largely unremarked shift in what it means to be a professional computer programmer takes place, hobbyists tinker with microcontrollers, programmed at as low a level as you want, to recapture some of that early-80s frontier-machine-code feeling. Some will call this retrograde or Luddite-ish but the truth is, I think, that controlling hardware directly with one's code fulfills some kind of deep need in the engineering personality to exercise maximum control over one's immediate universe; and there's nothing wrong with the practical experience gained thus: few programmers will ever write an operating system, true, but there will always be lesser software that needs to run "close to the metal." (A $5 pocket calculator will never run a Java interpreter, for instance. I think...)


Getting back to my own programming for my own use and profit, far more complicated and wonderful things will come in time. I'm comfortable using Perl for this kind of stuff, but have never written a program of any serious size in it. What little user-level software development I've done has been fairly strictly object-oriented code in C++. I only know how to use Perl procedurally; understanding the OOP features of the language, which seem to be highly regarded, would be a good thing to have under my belt.


On the other hand, I have had a strong hankering to learn Python, thanks to what seems to me to be a very elegant syntax. And I have the book A Primer on Scientific Programming with Python, which is an excellent tutorial for Python in general besides describing the appropriate libraries in detail. (I'm quite sure that somewhere on CPAN there's a module to support in Perl the same kind of computations I need to do: numerical integration and differentiation, curve fitting, linear regression, etc.) Lastly, the Beautiful Soup library looks like an even cleaner way to do web scraping.


  1. How the heck do you format code nicely (i.e. not just in a non-proportional font but also indented correctly, lines that overrun the margin indicated clearly, and with symbols correctly escaped) in idiomatic HTML these days? Yeah, I know there's the <pre> tag, but it doesn't help you with lines that run past the right edge of your text frame (or wherever your body text is going), and you still have to festoon your code with &whatever-entity tags to escape all the non-alphanumeric characters.