`templatemaker` can extract data from similarly formatted strings (or websites)

(holovaty.com)

templatemaker looks like a useful Python library for extracting the data that varies between web pages built from the same template.

From Adrian's blog post:

Well, say you want to get the raw data from a bunch of Web pages that use the same template -- like restaurant reviews on Yelp.com, for instance. You can give templatemaker an arbitrary number of HTML files, and it will create the "template" that was used to create those files. ("Template," in this case, means a string with a number of "holes" in it, where the holes represent the parts of the page that change.) Once you've got the template, you can then give it any HTML file that uses that same template, and it will give you the raw data: "The value for hole 1 is 'July 6, 2007', the value for hole 2 is 'blue'," etc.
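To make the "template with holes" idea concrete, here is a minimal sketch in Python. It is not templatemaker's API or its algorithm: it uses `difflib.SequenceMatcher` from the standard library as a stand-in, learns from only two samples, and compares word-level tokens. The names `learn_template`, `extract`, and `HOLE`, and the sample pages, are all made up for illustration.

```python
import re
from difflib import SequenceMatcher

HOLE = None  # hypothetical marker for a part of the page that changes


def tokenize(text):
    """Split text into word tokens and single non-word characters."""
    return re.findall(r"\w+|\W", text)


def learn_template(sample_a, sample_b):
    """Return a list of literal strings and HOLE markers shared by two samples.

    Assumption: two samples are enough. A real tool would fold in many pages.
    """
    a, b = tokenize(sample_a), tokenize(sample_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    template = []
    pos_a = pos_b = 0
    for block in matcher.get_matching_blocks():
        # Tokens skipped in either sample before this block differ: a hole.
        if block.a > pos_a or block.b > pos_b:
            template.append(HOLE)
        if block.size:
            template.append("".join(a[block.a:block.a + block.size]))
        pos_a = block.a + block.size
        pos_b = block.b + block.size
    return template


def extract(template, text):
    """Return the text that fills each hole, in order."""
    pattern = "".join(
        "(.*?)" if part is HOLE else re.escape(part) for part in template
    )
    match = re.fullmatch(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("text does not fit the template")
    return list(match.groups())


# Made-up sample pages that share one layout.
page_1 = "<h1>Luigi</h1><p>Rating: 4</p>"
page_2 = "<h1>Nopa</h1><p>Rating: 5</p>"

template = learn_template(page_1, page_2)
print(template)
# ['<h1>', None, '</h1><p>Rating: ', None, '</p>']

print(extract(template, "<h1>Zuni Cafe</h1><p>Rating: 3</p>"))
# ['Zuni Cafe', '3']
```

One limitation of this sketch: if two values happen to share a token (the comma in two dates, say), that token is treated as part of the template and the value is split across several holes. Learning from more sample pages is the usual way to reduce that kind of noise.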