Title: Systems and methods for determining the topic structure of a portion of text
Abstract: Systems and methods for determining the topic structure of a document containing text use a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA provides a framework for both text segmentation and topic identification. Using PLSA gives a better representation of the sparse information in a text block, such as a sentence or a sequence of sentences. The topic characterization of each text segment is derived from PLSA parameters that relate words to "topics" (the latent variables of the PLSA model) and "topics" to text segments. A system executing the method exhibits a significant performance improvement. Once determined, the topic structure of a document may be used for document retrieval and/or document summarization.
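The abstract describes selecting segmentation points from similarity values between adjacent text blocks. Below is a minimal Python sketch of that selection step only, under assumptions that are not taken from the patent: each block has already been mapped to a vector of PLSA topic probabilities P(z | block) (PLSA fitting and folding-in are not shown), similarity is the cosine, and a gap between blocks is selected when its similarity is a local minimum whose TextTiling-style "depth" reaches a hypothetical threshold `min_depth`. The function names are illustrative, not the patent's.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_similarities(blocks: List[Sequence[float]]) -> List[float]:
    """Similarity for each gap; gap i lies between block i and block i + 1."""
    return [cosine(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)]


def segmentation_points(blocks: List[Sequence[float]], min_depth: float = 0.2) -> List[int]:
    """Return indices of blocks that start a new segment.

    Assumption: a gap is chosen if its similarity is lower than that of each
    neighbouring gap and its depth (drop from the highest similarity on each
    side) is at least min_depth. Each block is assumed to be a hypothetical
    precomputed PLSA topic vector P(z | block).
    """
    sims = adjacent_similarities(blocks)
    points: List[int] = []
    for i, s in enumerate(sims):
        left_ok = i == 0 or s < sims[i - 1]
        right_ok = i == len(sims) - 1 or s < sims[i + 1]
        if not (left_ok and right_ok):
            continue  # not a local minimum of similarity
        depth = (max(sims[: i + 1]) - s) + (max(sims[i:]) - s)
        if depth >= min_depth:
            points.append(i + 1)  # the block after the gap starts a new segment
    return points


if __name__ == "__main__":
    # Two blocks dominated by topic 0, then two dominated by topic 1.
    blocks = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    print(segmentation_points(blocks))
```

With two blocks dominated by one topic followed by two dominated by another, the only selected point is block index 2. Requiring a minimum depth, rather than accepting every local minimum, is one common way to suppress small fluctuations in similarity.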
Publication number: US2003182631(A1); Publication date: 2003.09.25
Application number: US20020103053; Filing date: 2002.03.22
Applicant: XEROX CORPORATION; Inventors: TSOCHANTARIDIS IOANNIS; BRANTS THORSTEN H.; CHEN FRANCINE R.
Classification: G06F17/27; G06F17/30; (IPC1-7): G06F17/24; Main classification: G06F17/27