Main claim
1. A method of collecting learning materials for informal learning, the method comprising:
detecting an addition of an item to a curation;
extracting one or more links from a page referenced by the item;
downloading pages corresponding to the one or more links;
filtering the downloaded pages to generate candidate index pages by excluding a subset of the downloaded pages not having links that point back to the page referenced by the item;
detecting information blocks containing links pointing back to the page referenced by the item in one or more of the candidate index pages;
locating a primary information block in one or more of the candidate index pages based on a page structure analysis;
performing a uniform resource locator (URL) structures analysis of the one or more of the candidate index pages that include the links pointing back to the page referenced by the item in the primary information block;
based at least partially on the URL structures analysis, identifying an appropriate index page from the candidate index pages;
locating a primary information block in the appropriate index page, the primary information block including a portion of the appropriate index page where a majority of substantive information is contained; and
generating an automated extraction rule configured to direct a system to the primary information block of the appropriate index page.
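The filtering and index-page selection steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, only an example under stated assumptions. Pages are assumed to be already downloaded into a dict mapping each URL to its HTML. The claim does not say how the URL structures analysis ranks candidates, so `pick_index_page` uses a hypothetical heuristic: prefer the candidate on the same host whose path shares the most leading segments with the item's URL. Downloading, information-block detection, and extraction-rule generation are omitted, and every function name here is illustrative.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop #fragments so URLs compare cleanly.
                    absolute, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def extract_links(page_url: str, html: str) -> List[str]:
    parser = LinkExtractor(page_url)
    parser.feed(html)
    return parser.links


def candidate_index_pages(item_url: str, downloaded: Dict[str, str]) -> List[str]:
    """Keep only downloaded pages that contain a link back to the item's page."""
    target, _ = urldefrag(item_url)
    return [url for url, html in downloaded.items()
            if target in extract_links(url, html)]


def shared_path_depth(a: str, b: str) -> int:
    """Number of leading path segments two URLs share (0 if the hosts differ)."""
    pa, pb = urlparse(a), urlparse(b)
    if pa.netloc != pb.netloc:
        return 0
    sa = [s for s in pa.path.split("/") if s]
    sb = [s for s in pb.path.split("/") if s]
    depth = 0
    for x, y in zip(sa, sb):
        if x != y:
            break
        depth += 1
    return depth


def pick_index_page(item_url: str, candidates: List[str]) -> Optional[str]:
    """Hypothetical URL-structure heuristic: prefer the candidate sharing the
    deepest path prefix with the item URL; ties go to the shorter URL."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: (shared_path_depth(c, item_url), -len(c)))
```

For example, if the item is `https://example.org/course/lesson-1.html` and both `https://example.org/course/` and `https://example.org/` link back to it, both become candidates. The heuristic then picks `https://example.org/course/` because its path shares the `course` segment with the item. A real system would likely combine such URL signals with the page structure analysis that the claim also recites.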