Pull content from an "old school" tabular design webpage

edited September 2007 in Technology
I've been searching the web for the best way to do this, but I haven't found anything. Basically, I'm redesigning an internal web site for my company to comply with the HTML 4.01 standard and CSS. I need to grab the content and links from the old pages, but the guy who designed them used tables for layout. Some of the pages are long, and I don't have time to retype and re-link all the content.

Is there a way to just grab the content and links without all the stupid table layout crap? I would prefer a script or a program to do this.

Thanks,

Comments

  • Sounds like a job for regular expressions.
  • I looked at the Wikipedia article for Regular Expression. It sounds like it will do the job but I have very little (i.e. none) experience in writing regular expressions. Any hints on how to get started?
  • It's hard. Start by researching "HTML scraping". You are going to need to write programs.
  • Ahhh.... I hate programming.

    I found a website with your suggestion (http://www.rexx.com/~dkuhlman/quixote_htmlscraping.html)

    I guess once I get this program down, the actual extraction will be much easier.

    Thanks.
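
For anyone finding this thread later, the "HTML scraping" approach suggested above can be sketched with a parser instead of hand-written regular expressions. This is only a rough illustration using Python's standard-library html.parser module; the names ContentExtractor and extract and the sample markup are made up for this example, and real pages will probably need extra handling (encodings, images, headings you want to keep).

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect visible text and links, ignoring table layout tags.

    Hypothetical helper for illustration; not taken from the thread.
    """

    SKIP = {"script", "style"}  # contents of these tags are never page text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []       # text fragments in document order
        self.links = []        # (href, link text) pairs
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._href = None      # href of the <a> currently open, if any
        self._link_text = []   # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TABLE> and <table> look the same.
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._link_text = []
        # <table>, <tr>, <td>, <font> and friends are simply ignored.

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._link_text)))
            self._href = None

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._skip_depth:
            return
        self.chunks.append(text)
        if self._href is not None:
            self._link_text.append(text)


def extract(html):
    """Return (text chunks, links) found in an HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks, parser.links


if __name__ == "__main__":
    # Made-up sample of table-based layout markup.
    sample = """
    <TABLE><TR><TD><FONT SIZE="2">Welcome to the <B>intranet</B>.</FONT></TD></TR>
    <TR><TD><A HREF="/hr/forms.html">HR forms</A></TD></TR></TABLE>
    """
    chunks, links = extract(sample)
    print(chunks)
    print(links)
```

The text chunks come back in page order and the links as (href, text) pairs, so they can be pasted into a new HTML 4.01 template without the table markup. Treat it as a starting point and check the output against a couple of real pages before trusting it.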