DATA SORTING FOR LANGUAGE PROCESSING SUCH AS POS TAGGING, Application No. US201514804802 - 传众 Patent Search

Title of invention	DATA SORTING FOR LANGUAGE PROCESSING SUCH AS POS TAGGING
Abstract	Technology is disclosed that improves language coverage by selecting sentences to be used as training data for a language processing engine. The technology accomplishes the selection of a number of sentences by obtaining a group of sentences, computing a score for each sentence, sorting the sentences based on their scores, and selecting a number of sentences with the highest scores. The scores can be computed by dividing a sum of frequency values of unseen words (or n-grams) in the sentence by a length of the sentence. The frequency values can be based on posts in one or more particular domains, such as the public domain, the private domain, or other specialized domains.
Publication No.	US2017024376(A1)	Publication date	2017.01.26
Application No.	US201514804802	Filing date	2015.07.21
Applicant	Facebook, Inc.	Inventor	Eck Matthias Gerhard
Classification	G06F17/28;G06N5/02	Main classification	G06F17/28
Agency		Agent
Main claim	1. A method for obtaining engine training data that has high coverage comprising: receiving a set of potential training data snippets comprising one or more n-grams; for each selected snippet of two or more of the potential training data snippets, computing a snippet score for the selected snippet by: identifying one or more n-grams of the selected snippet as unseen n-grams; obtaining a frequency value for the identified unseen n-grams; computing a sum of the obtained frequency values; computing a length value of the selected snippet; and computing the snippet score for the selected snippet by dividing the sum of the obtained frequency values by the length value of the selected snippet; sorting the set of potential training data snippets, as sorted snippets, based on the computed snippet scores; selecting, based on snippet locations in the sorted snippets, one or more of the potential training data snippets as the engine training data; and storing the engine training data in a memory, wherein the engine training data is used by an engine to perform automated language processing functions.
Address	Menlo Park CA US
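
The scoring and selection steps described in the abstract and main claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes whitespace tokenization, single words as the n-grams, a fixed set of already-seen words, a precomputed word-frequency table, and that each distinct unseen word is counted once per snippet. The names `snippet_score`, `select_snippets`, `seen`, and `freq` are hypothetical and chosen only for illustration.

```python
def snippet_score(tokens, seen, freq):
    """Sum of frequency values of unseen words, divided by snippet length.

    Assumptions (not from the source): words are the n-grams, each distinct
    unseen word is counted once, and words missing from `freq` contribute 0.
    """
    if not tokens:
        return 0.0
    unseen = {t for t in tokens if t not in seen}
    return sum(freq.get(t, 0) for t in unseen) / len(tokens)


def select_snippets(snippets, seen, freq, k):
    """Sort snippets by descending score and keep the top k."""
    ranked = sorted(
        snippets,
        key=lambda s: snippet_score(s.split(), seen, freq),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical frequency values and seen-word set, for illustration only.
    freq = {"the": 100, "cat": 10, "sat": 5, "zebra": 2, "quantum": 8}
    seen = {"the"}
    candidates = ["the cat sat", "the zebra", "quantum cat"]
    print(select_snippets(candidates, seen, freq, k=2))
```

In this sketch the set of seen words stays fixed during selection. Whether the unseen set is updated as snippets are chosen is not stated in the text above, so the sketch leaves that out.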

Patents you may be interested in

VASO-OCCLUSIVE DEVICES HAVING EXPANDABLE FIBERS

DEVICES FOR MANIPULATING TISSUE AND RELATED METHODS

SURGICAL STAPLING LOADING UNIT WITH STROKE COUNTER AND LOCKOUT

SURGICAL STAPLER CARTRIDGE WITH COMPRESSION FEATURES AT STAPLE DRIVER EDGES

METHOD AND APPARATUS FOR MEASURING PHYSICAL CONDITION BY USING HEART RATE RECOVERY RATE

METHOD AND SYSTEM FOR MONITORING OXYGENATION LEVELS OF A COMPARTMENT FOR DETECTING CONDITIONS OF A COMPARTMENT SYNDROME

Method and a System for Monitoring Oxygen Level of an Environment

Systems and Methods for Rehabilitating the Hand

PHYSIOLOGY SENSING DEVICE AND INTELLIGENT TEXTILE

HEADGEAR FOR DRY ELECTROENCEPHALOGRAM SENSORS

GAZE TRACKING VARIATIONS USING DYNAMIC LIGHTING POSITION

PARAMETER-BASED CONTROL OF A LUMEN TRAVELING DEVICE

Two-Piece Foam Piston Pump

METHOD AND COMPUTER PROGRAM FOR CONTROLLING A FRYER, AND FRYER ARRANGED FOR CARRYING OUT SUCH METHOD

Semi-Continuous Apparatus for Creating an Extract from Coffee or Other Extractable Materials

Celebratory Glass Top and Method of Use

NECK PILLOW WITH COMPARTMENT FOR BLANKET

Anchorable Beach Towel and Storage Pouch

Freezer Cabinet