摘要 |
An efficient method for parsing HTML pages identifies pages containing a mix of static and dynamic content. The pages are parsed to form abstract syntax trees (ASTs), which are then cached along with the pages. When a later version of a page is retrieved, it is compared against the cached version, and only those portions of the AST that contain different content are reparsed.
|