Sunday, January 29, 2012

A tool to help understand web pages

With a little Ruby, a little GNU, and some reading, I built a small utility to help me understand web pages.

The utility takes the HTML, picks out the links, and draws a map of the web site, showing which pages link to which. (It works only for static pages, not dynamic ones.) The result is a PostScript image in which each page is listed and an arrow runs from each page to every page it links to.
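The steps above can be sketched roughly as follows. This is not the original utility, only a minimal illustration under some assumptions: the method names `extract_links` and `to_dot` and the sample pages are made up, and a regular expression stands in for a proper HTML parser, so it will miss unusual markup.

```ruby
# Sketch only: extract_links and to_dot are hypothetical names,
# not taken from the original utility.

# Return the href targets of the anchor tags in an HTML string.
# A regular expression is a rough stand-in for a real HTML parser.
def extract_links(html)
  html.scan(/<a\s[^>]*?href\s*=\s*["']([^"']+)["']/i)
      .flatten
      .map { |href| href.sub(/#.*\z/, "") } # drop "#fragment" suffixes
      .reject(&:empty?)
      .uniq
end

# Build GraphViz DOT text from a hash of { page name => HTML source }.
def to_dot(pages)
  lines = ["digraph site {"]
  pages.each do |name, html|
    lines << %(  "#{name}";) # list the page even if it has no links
    extract_links(html).each do |target|
      lines << %(  "#{name}" -> "#{target}";)
    end
  end
  lines << "}"
  lines.join("\n")
end

# Made-up sample site for illustration.
pages = {
  "index.html" => '<a href="about.html">About</a> <a href="news.html">News</a>',
  "about.html" => '<a href="index.html">Home</a>',
  "news.html"  => "<p>No links here.</p>"
}
puts to_dot(pages)
```

Saving the output to a file such as site.dot and running `dot -Tps site.dot -o site.ps` should then have GraphViz render the PostScript map.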

I can use this to navigate a site and get a bird's-eye view of it. I can also find 'orphan' web pages: pages that no other page links to.
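Finding orphans comes down to a set difference: all known pages minus the pages that appear as a link target. A small sketch, assuming the links have already been collected into a hash; the `orphans` method, the `root` parameter, and the sample data are hypothetical:

```ruby
require "set"

# Hypothetical helper: given { page => [pages it links to] },
# return the pages that no other page links to.
# The root page is skipped, since visitors reach it directly.
def orphans(links, root: "index.html")
  linked = Set.new
  links.each do |page, targets|
    # A page linking to itself does not count as being linked.
    targets.each { |target| linked << target unless target == page }
  end
  links.keys.reject { |page| page == root || linked.include?(page) }
end

# Made-up sample data for illustration.
site = {
  "index.html" => ["about.html"],
  "about.html" => ["index.html"],
  "old.html"   => ["index.html"]
}
p orphans(site)
```

Here old.html links out to the home page, but nothing links back to it, so it is the only page reported.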

I used a lot of off-the-shelf components: Ruby and its built-in functions for parsing HTML, GNU sort, and GraphViz. The components do the heavy lifting, and save me a lot of time.

It was a good exercise. I learned a lot, and now I have a useful tool!
