Saturday, July 12, 2008

RegEx for extracting image paths from HTML

Today I am at the Ann Arbor Give Camp, a weekend coding frenzy where about 60 geeks are holed up at Washtenaw Community College building websites and web apps for charitable institutions and other good causes.

One of my tasks was to build a Master Page template from a mockup that had been done in Publisher and was full of VML and Office-specific markup. I needed to extract the images, and right-click > Save As did not work (because of the scripting, VML, etc.). All the images were in the index_files folder, with filenames of the form "image####.gif". So I dumped the code into Expresso and ran a match using this regex:
index_files\/image\d{1,4}\.gif
Voilà! I got a list of matches, which I copied and pasted into an HTML page, and from there I downloaded the images.
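
If you would rather script this than do it by hand in Expresso, here is a minimal sketch in Python (my substitution; Expresso is a .NET regex tool, and this is not what I actually ran). It applies the same pattern, drops duplicate matches (Office exports often reference the same image more than once), and builds a bare-bones HTML page of links like the one I pasted together. The function names extract_image_paths and build_link_page are hypothetical, and it assumes the paths appear in the markup exactly as index_files/imageNNN.gif.

```python
import re

# Same pattern as above. In Python the "\/" escape is unnecessary but harmless.
IMAGE_PATTERN = re.compile(r"index_files\/image\d{1,4}\.gif")


def extract_image_paths(html: str) -> list[str]:
    """Return each matching image path once, in the order first seen."""
    seen: dict[str, None] = {}
    for match in IMAGE_PATTERN.finditer(html):
        seen.setdefault(match.group(0), None)  # dict keeps insertion order
    return list(seen)


def build_link_page(paths: list[str]) -> str:
    """Build a minimal HTML page with one link per image (hypothetical helper)."""
    items = "\n".join(f'<li><a href="{p}">{p}</a></li>' for p in paths)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


if __name__ == "__main__":
    sample = (
        '<v:imagedata src="index_files/image001.gif"/>'
        '<img src="index_files/image002.gif">'
        '<img src="index_files/image001.gif">'
    )
    paths = extract_image_paths(sample)
    print(paths)
    print(build_link_page(paths))
```

On the sample markup this prints the two unique paths, index_files/image001.gif and index_files/image002.gif. Save the generated page next to the index_files folder, open it in a browser, and each image link can then be saved normally.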
