Pull content from an "old school" tabular design webpage
I've been searching the web for the best way to do this, but I haven't found anything. Basically, I'm redesigning an internal website for my company to comply with HTML 4.01 standards and CSS. I need to grab the content and links from the old pages, but the guy who designed them used tables for layout. Some of the pages are long, and I don't have time to retype and re-link all the content.
Is there a way to just grab the content and links without all the stupid table layout crap? I would prefer a script or a program to do this.
Thanks,
Comments
I found a website that covers your suggestion (http://www.rexx.com/~dkuhlman/quixote_htmlscraping.html).
I guess once I get this program down, the actual extraction will be much easier.
Thanks.
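For anyone landing here with the same problem, here is a minimal sketch of the kind of script asked for above. It is not necessarily the approach from the linked page; it only uses Python's standard-library html.parser module. The names TableStripper, strip_tables, and LAYOUT_TAGS are made up for this example, and it assumes each table cell can be treated as its own paragraph, which may not hold for every page. It keeps the text and the links and drops the table markup plus any other tags it doesn't know about (font, center, and so on).

```python
from html import escape
from html.parser import HTMLParser

# Layout-only tags: treated as block boundaries, never emitted.
LAYOUT_TAGS = {"table", "tbody", "thead", "tfoot", "tr", "td", "th"}
# Tags whose contents are dropped entirely.
SKIP_TAGS = {"script", "style"}


class TableStripper(HTMLParser):
    """Hypothetical helper: collect text and links, one block per table cell."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished blocks of text (with <a> tags kept)
        self.links = []       # (href, link text) pairs
        self._current = []    # pieces of the block being built
        self._href = None     # href of the <a> we are inside, if any
        self._link_text = []  # text seen inside the current <a>
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def _flush(self):
        # Join the pieces, collapse whitespace, and store non-empty blocks.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in LAYOUT_TAGS or tag in ("p", "br", "div"):
            self._flush()
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._link_text = []
                self._current.append('<a href="%s">' % escape(href, quote=True))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in LAYOUT_TAGS or tag in ("p", "div"):
            self._flush()
        elif tag == "a" and self._href is not None:
            self._current.append("</a>")
            link_text = " ".join("".join(self._link_text).split())
            self.links.append((self._href, link_text))
            self._href = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        # Data arrives unescaped, so re-escape it for the output markup.
        self._current.append(escape(data, quote=False))
        if self._href is not None:
            self._link_text.append(data)


def strip_tables(html_text):
    """Return (blocks, links) extracted from a table-layout page."""
    parser = TableStripper()
    parser.feed(html_text)
    parser.close()
    parser._flush()  # pick up any trailing text after the last tag
    return parser.blocks, parser.links


if __name__ == "__main__":
    sample = """
    <table border="0"><tr>
      <td><font size="2">Read the <a href="faq.html">FAQ</a> first.</font></td>
      <td>Contact the help desk.</td>
    </tr></table>
    """
    blocks, links = strip_tables(sample)
    for block in blocks:
        print("<p>%s</p>" % block)
    print(links)
```

Each block can then be pasted into the new CSS-based template, and the links list makes it easy to check that nothing was lost. One known limitation: a link that spans more than one table cell would get split across blocks, so pages like that would need a manual pass.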