Phrase clustering, Application No. US201213596678 - 传众 Patent Search

Title of invention	Phrase clustering
Abstract	Systems and associated methods for enhanced concept understanding in large document collections through phrase clustering are described. Embodiments take as input an initial set of phrases and estimate centroids using a clustering process. Embodiments then generate new phrases around each of the current centroids using the current phrases. These new phrases are added to the current set, and the clustering process is iterated. Upon convergence, embodiments finalize clusters based on phrases of any given length.
Publication No.	US8880526(B2)	Publication date	2014.11.04
Application No.	US201213596678	Filing date	2012.08.28
Applicant	International Business Machines Corporation	Inventors	Bhattacharya Indrajit; Godbole Shantanu Ravindra; Sharma Akshit
Classification	G06F17/30	Main classification	G06F17/30
Agency	Ference & Associates LLC	Agent	Ference & Associates LLC
Main claim	1. A method for phrase based clustering comprising: utilizing at least one processor to execute computer code configured to perform the steps of: accessing a collection of items to be clustered; receiving an initial set of phrases as input; clustering the collection of items to be clustered using the initial set of phrases to create centroids; generating a new set of phrases around the centroids; adding the new set of phrases to the initial set of phrases to produce a combined set of phrases; and re-clustering the collection of items to be clustered using the combined set of phrases; wherein said generating of a new set of phrases around the centroids comprises: finding high weight words in a context vector for a centroid; finding existing phrases that appear around words of a centroid; and pruning phrases that do not have high weight for at least one of the words of the centroid; said pruning comprising: generating a higher-order phrase via combining two lower-order phrases, each of the higher-order phrase and the two lower-order phrases comprising a context vector; and employing a monotonicity property, wherein the higher-order phrase has high weight for a word in its context vector if both of the lower order phrases individually each have high weight for the at least one word in their context vectors.
Address	Armonk, NY, US
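
The pruning step in the main claim can be illustrated with a short sketch. It is not the patented implementation; it only shows one plausible reading of the claim. The function names, the dict-based context vectors, the fixed `threshold`, and the rule for joining two phrases (the first phrase minus its first word must equal the second phrase minus its last word, similar to Apriori candidate generation) are all assumptions introduced here. The monotonicity property is used only as a filter: a higher-order candidate is kept when both lower-order phrases have high weight for the same centroid word.

```python
from itertools import permutations


def high_weight_words(context_vector, threshold):
    """Return the words whose weight in a context vector meets the threshold."""
    return {word for word, weight in context_vector.items() if weight >= threshold}


def prune_and_combine(phrase_vectors, centroid_words, threshold):
    """Prune weak phrases, then propose higher-order phrases (hypothetical helper).

    phrase_vectors maps a phrase (tuple of words) to its context vector
    (dict of word -> weight). centroid_words are the high-weight words of
    one centroid. Both representations are assumptions for this sketch.
    """
    # Keep only phrases with high weight for at least one centroid word.
    strong = {}
    for phrase, vector in phrase_vectors.items():
        hits = high_weight_words(vector, threshold) & set(centroid_words)
        if hits:
            strong[phrase] = hits

    # Combine two surviving lower-order phrases into one higher-order phrase.
    candidates = {}
    for left, right in permutations(strong, 2):
        # Assumed join rule: the phrases must overlap on all but one word.
        if left[1:] != right[:-1]:
            continue
        # Monotonicity filter: both parts are strong for the same word.
        shared = strong[left] & strong[right]
        if shared:
            candidates[left + right[-1:]] = shared
    return strong, candidates


# Toy data, invented for illustration only.
phrase_vectors = {
    ("credit",): {"payment": 0.9, "billing": 0.7},
    ("card",): {"payment": 0.8, "refund": 0.2},
    ("weather",): {"rain": 0.9, "payment": 0.1},
}
strong, candidates = prune_and_combine(phrase_vectors, {"payment", "billing"}, 0.5)
print(sorted(strong))      # surviving lower-order phrases
print(sorted(candidates))  # proposed higher-order phrases
```

In this toy run, "weather" is pruned because it has no high-weight centroid word, while "credit" and "card" are joined in both orders because each has high weight for "payment". A real system would still need to compute the context vector of each proposed phrase from the document collection before re-clustering.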