Thursday, July 8, 2010

Reverse-engineering dynamically-created JavaScript

This is interesting: a page I want to webscrape some options price data from appears to be entirely created dynamically by JavaScript code which itself is created dynamically by an unknown CGI backend (probably PHP). That seems a little bit kludgey, but I understand the reasoning behind it; the page is interactive, but there's a lot of data that can potentially be displayed. This way there is one big server hit when the page is first loaded (to get a snapshot of all the options data)—presumably that CGI code is querying a database—and the JS just displays the data or not as the user clicks show/hide for each stat or each block of options prices.

I can get the data I need by parsing the JavaScript code, if I can figure out how that code parses its data strings (i.e. the data "passed" to it by the underlying CGI code) for display; luckily, the JavaScript string-manipulation methods seem to be modeled closely on Perl's.

I'll then have the data in an elaborate Perl data structure, and can manipulate it as I see fit.

No comments:

Post a Comment