A computer-implemented method of extracting key phrases from a document is disclosed comprising the steps of accessing a repository comprising linked subjects, the repository comprising first and second data structures representing the relationship between said subjects using different representation criteria; pruning the first data structure by removing links between subjects based on a further relationship between said subjects in the second data structure; matching phrases in said document to subjects in the pruned first data structure; further pruning the pruned first data structure by removing unmatched subjects that are not linked to matched subjects; determining a ranking for each matched subject; and selecting key phrases using the determined subject rankings. A computer program for implementing the steps of this method when executed on a computer is also disclosed.