CONTENT BASED SIMILARITY DETECTION, Application No. US201414219613 - 传众 Patent Search


Title of invention	CONTENT BASED SIMILARITY DETECTION
Abstract	Content Based Similarity Detection. A computer implemented method includes computing a hash of each word in a collection of books to produce a numerical integer token using a reduced representation, and computing an Inverse Document Frequency (IDF) vector comprising the number of books the token appears in, for every token in the collection of books. The method also includes creating a token occurrence count vector for each book in the collection and normalizing the token occurrence count vector using the IDF vector to create a Term Frequency-Inverse Document Frequency (TF-IDF) vector. Further, the method includes reducing each TF-IDF vector by using random projections to obtain a final signature representing each book in the collection, creating at least two similarity scores between a target book and a list of candidate books and, using a trained machine learning algorithm, determining whether each of the list of candidate books is similar to the target book.
Publication No.	US2015154497(A1)	Publication date	2015.06.04
Application No.	US201414219613	Filing date	2014.03.19
Applicant	Kobo Incorporated	Inventors	BRAZIUNAS Darius;CHRISTENSEN Jordan;GIVONI Inmar Ella;ISAAC Neil
Classification	G06N5/04;G06F17/30;G06N99/00	Main classification	G06N5/04
Agency		Agent
Main claim	1. A computer implemented method comprising: computing a hash of each word in a collection of books to produce a numerical integer token using a reduced representation; computing an Inverse Document Frequency (IDF) vector comprising the number of books said token appears in, for every token in said collection of books; creating a token occurrence count vector for each said book in said collection; normalizing said token occurrence count vector using said IDF vector to create a Term Frequency-Inverse Document Frequency (TF-IDF) vector; reducing each said TF-IDF vector by using random projections to obtain a final signature representing each said book in said collection; creating at least two similarity scores between a target book and a list of candidate books; and using a trained machine learning algorithm, determining whether each of said list of candidate books is similar to said target book.
Address	Toronto CA
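
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the record does not state the hash function, the size of the reduced token space, the IDF weighting formula, the signature length, or which two similarity scores are used. The sketch assumes CRC32 modulo a hypothetical NUM_BUCKETS for hashing, a log(N / df) weighting, Gaussian random projections to a hypothetical SIGNATURE_DIM, and cosine similarity plus sign agreement as the two scores. The final step, a trained machine learning algorithm that decides similarity from the scores, is left out because the record does not say which model is used.

```python
import math
import random
import re
import zlib
from collections import Counter

NUM_BUCKETS = 2 ** 18   # assumed size of the reduced token space
SIGNATURE_DIM = 64      # assumed length of the final signature


def tokens(text):
    """Hash each word to an integer token in [0, NUM_BUCKETS)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [zlib.crc32(w.encode("utf-8")) % NUM_BUCKETS for w in words]


def document_frequency(books):
    """For every token, count the number of books it appears in."""
    df = Counter()
    for book_tokens in books:
        df.update(set(book_tokens))
    return df


def tfidf(book_tokens, df, num_books):
    """Scale raw token counts by log(N / df); this weighting is an assumption."""
    counts = Counter(book_tokens)
    return {t: c * math.log(num_books / df[t]) for t, c in counts.items()}


def projection_row(token):
    """Deterministic Gaussian row of the projection matrix for one token."""
    rng = random.Random(token)
    return [rng.gauss(0.0, 1.0) for _ in range(SIGNATURE_DIM)]


def signature(vec):
    """Project a sparse TF-IDF vector down to SIGNATURE_DIM dimensions."""
    sig = [0.0] * SIGNATURE_DIM
    for token, weight in vec.items():
        for k, r in enumerate(projection_row(token)):
            sig[k] += weight * r
    return sig


def cosine(a, b):
    """Cosine similarity of two signatures (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sign_agreement(a, b):
    """Fraction of signature components whose signs match."""
    return sum((x >= 0) == (y >= 0) for x, y in zip(a, b)) / len(a)


def similarity_scores(texts, target_index):
    """Return (cosine, sign_agreement) for every candidate against the target."""
    books = [tokens(t) for t in texts]
    df = document_frequency(books)
    sigs = [signature(tfidf(b, df, len(books))) for b in books]
    target = sigs[target_index]
    return [
        (cosine(target, s), sign_agreement(target, s))
        for i, s in enumerate(sigs)
        if i != target_index
    ]
```

Calling similarity_scores(texts, 0) on a list of book texts returns one pair of scores per candidate book. In a full system these pairs would be the input features to the trained classifier named in the claim.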
