The present invention provides a system which is able to detect similar web page elements which are described in mark-up language, such that the content of those elements can be captured. Text content may then be sent to a text classifier for further analysis.