Title: DETERMINING SEGMENTS FOR DOCUMENTS
Abstract: A document is received for segmentation. The document includes multiple atomic textual units in a sequence. These units may correspond to sentences, phrases, paragraphs, concept phrases, chapters, etc. A distance function is selected that determines a distance between one set of atomic textual units and another set of atomic textual units. The distance between the sets is large for sets that are dissimilar, and small for sets that are similar. The distance function is applied to the atomic textual units to partition them into multiple segments, while maintaining the sequence of the atomic textual units.
Publication number: US2016070692(A1)  Publication date: 2016.03.10
Application number: US201414482015  Filing date: 2014.09.10
Applicant: Microsoft Corporation  Inventors: Kenthapadi Krishnaram;Kannan Anitha;Gollapudi Sreenivas
Classification: G06F17/27  Main classification: G06F17/27
Agency:  Agent:
Main claim: 1. A method comprising: receiving a document comprising a plurality of atomic textual units, by a computing device; receiving a distance function by the computing device, wherein the distance function takes as an input a first sequential subset of the plurality of atomic textual units and a second sequential subset of the plurality of atomic textual units and outputs a distance between the first and the second sequential subsets; and determining a plurality of segments of the document using the distance function by the computing device, wherein each segment includes a sequential subset of the plurality of atomic textual units.
Address: Redmond WA US
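
The abstract and main claim describe segmentation only in general terms: a distance function compares two sequential subsets of atomic textual units, and the document is divided into segments that preserve the original order. The following Python sketch is one possible illustration of that idea, not the method disclosed in the application. Everything specific in it is an assumption made for the example: the names `jaccard_distance` and `segment`, the choice of Jaccard distance over word sets, the fixed segment count `k`, and the objective of maximizing the summed distance between adjacent segments through exhaustive search.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Units = Sequence[str]
Distance = Callable[[Units, Units], float]


def jaccard_distance(a: Units, b: Units) -> float:
    """Hypothetical distance: 1 minus the Jaccard similarity of the word sets.

    Large for dissimilar subsets, small for similar ones.
    """
    words_a = {w.lower() for unit in a for w in unit.split()}
    words_b = {w.lower() for unit in b for w in unit.split()}
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(union)


def segment(units: Units, k: int, distance: Distance) -> List[List[str]]:
    """Split `units` into k contiguous segments, keeping their order.

    Assumed objective: maximize the sum of distances between adjacent
    segments. The search is exhaustive over all boundary placements,
    so it is only practical for short documents.
    """
    n = len(units)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(units)")

    best_score = float("-inf")
    best_cuts: tuple = ()
    # Each choice of k - 1 cut positions defines one candidate segmentation.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        parts = [units[bounds[i]:bounds[i + 1]] for i in range(k)]
        score = sum(distance(parts[i], parts[i + 1]) for i in range(k - 1))
        if score > best_score:
            best_score, best_cuts = score, cuts

    bounds = (0, *best_cuts, n)
    return [list(units[bounds[i]:bounds[i + 1]]) for i in range(k)]


if __name__ == "__main__":
    sentences = ["cats purr", "cats sleep", "stocks fell", "stocks rose"]
    print(segment(sentences, 2, jaccard_distance))
    # [['cats purr', 'cats sleep'], ['stocks fell', 'stocks rose']]
```

Because `segment` accepts any callable with the same two-argument shape, a different distance function (for example, one based on concept phrases or embeddings) could be substituted without changing the search itself.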