Title of invention SCALABLE, MEMORY-EFFICIENT MACHINE LEARNING AND PREDICTION FOR ENSEMBLES OF DECISION TREES FOR HOMOGENEOUS AND HETEROGENEOUS DATASETS
Abstract Optimization of machine intelligence utilizes a systemic process through a plurality of computer architecture manipulation techniques that take unique advantage of efficiencies therein to minimize clock cycles and memory usage. The present invention is an application of machine intelligence which overcomes speed and memory issues in learning ensembles of decision trees in a single-machine environment. Such an application of machine intelligence includes inlining relevant statements by integrating function code into a caller's code, ensuring a contiguous buffering arrangement for necessary information to be compiled, and defining and enforcing type constraints on programming interfaces that access and manipulate machine learning data sets.
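The contiguous buffering and inlining named in the abstract can be pictured with a minimal C++ sketch. Everything in it is an assumption made for illustration: the `Triple` record, the helper names `pack_triples`, `goes_left`, and `left_weight`, and the choice of a threshold test are hypothetical and are not taken from the patent text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: one (feature value, label, weight) entry per instance.
struct Triple {
    float feature;
    int   label;
    float weight;
};

// Gather the selected rows into one contiguous buffer, so a later scan walks
// memory sequentially instead of reading three separate arrays.
inline std::vector<Triple> pack_triples(const std::vector<float>& feature,
                                        const std::vector<int>& label,
                                        const std::vector<float>& weight,
                                        const std::vector<std::size_t>& rows) {
    assert(feature.size() == label.size() && label.size() == weight.size());
    std::vector<Triple> out;
    out.reserve(rows.size());
    for (std::size_t r : rows) {
        assert(r < feature.size());
        out.push_back(Triple{feature[r], label[r], weight[r]});
    }
    return out;
}

// Small helper marked inline; a compiler may fold its body into the loop
// below, which avoids a call per instance.
inline bool goes_left(const Triple& t, float threshold) {
    return t.feature <= threshold;
}

// Total weight of instances with the given label that fall on the left side
// of the (assumed) threshold test.
inline float left_weight(const std::vector<Triple>& buf, float threshold, int label) {
    float sum = 0.0f;
    for (const Triple& t : buf) {
        if (goes_left(t, threshold) && t.label == label) sum += t.weight;
    }
    return sum;
}
```

Whether a compiler actually inlines `goes_left` depends on its optimization settings; the `inline` keyword is only a hint in this sketch.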
Publication number US2014337269(A1) Publication date 2014.11.13
Application number US201414272263 Filing date 2014.05.07
Applicant WISE IO, INC. Inventor EADS DAMIAN RYAN
Classification G06N5/02 Main classification G06N5/02
Agency Agent
Principal claim 1. A method comprising: implementing, within a single-machine computing environment comprised of hardware and software components that include at least one processor, the steps of: integrating function code into a caller's code to inline relevant statements so that repetitive pushing and popping of a collection of variables having different variable characteristic types, variable data types, and variable group storage characteristics to and from a stack at each compilation is eliminated, wherein a data structure in which a set of heterogeneous data is comprised of a collection of variables having different variable characteristic types, variable data types, and variable group storage characteristics is defined so that multiple columns with exactly the same variable characteristic types and variable data types are grouped together as a variable group, where each variable is represented by a canonical index, and each variable group has a variable group index, and each variable within a variable group has a within group index; instantiating a subsample of data structures representing a subset of instances of a set of heterogeneous data, wherein the subsample of data structures reside outside of the variable group to reduce complication from storing the collection of variables having different variable characteristic types, variable data types, and variable group storage characteristics; defining an intermediate data structure to represent the collection of variables contiguously to place bytes required for compilation in a contiguous arrangement so that fewer pages of data are pulled from memory to blocks in a plurality of caches that include a weight cache, a label cache, a feature cache, and a triple cache; and representing each variable characteristic type by a type that describes a variable data type in one or more template arguments to enable pattern matching at a node induction and statistics computation time on each type so that an appropriate instantiation of a routine is directed for each combination of variable characteristic type and variable data type.
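The indexing scheme and the template-based type matching recited in the claim can be sketched in C++ as follows. This is an illustration under stated assumptions rather than the claimed implementation: the class `VariableIndex`, the tag types `Numeric` and `Categorical`, and the `SplitRule` template are hypothetical names, and the two split rules are placeholder routines chosen only to show how a compiler selects one instantiation per combination of variable characteristic type and variable data type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical tags standing in for the "variable characteristic type".
struct Numeric {};
struct Categorical {};

// Maps a canonical variable index to (variable group index, within-group index).
class VariableIndex {
public:
    // group_sizes[g] is the number of columns in variable group g.
    explicit VariableIndex(const std::vector<std::size_t>& group_sizes) {
        for (std::size_t g = 0; g < group_sizes.size(); ++g)
            for (std::size_t k = 0; k < group_sizes[g]; ++k)
                map_.emplace_back(g, k);
    }

    std::pair<std::size_t, std::size_t> locate(std::size_t canonical) const {
        assert(canonical < map_.size());
        return map_[canonical];
    }

    std::size_t size() const { return map_.size(); }

private:
    std::vector<std::pair<std::size_t, std::size_t>> map_;
};

// Primary template is left undefined, so only the combinations specialized
// below will compile.
template <typename Kind, typename T>
struct SplitRule;

// Numeric variables: assumed threshold test.
template <typename T>
struct SplitRule<Numeric, T> {
    static_assert(std::is_arithmetic<T>::value,
                  "numeric variables need an arithmetic data type");
    static bool goes_left(T value, T threshold) { return value <= threshold; }
    static std::string name() { return "threshold"; }
};

// Categorical variables: assumed equality test on an integer category code.
template <typename T>
struct SplitRule<Categorical, T> {
    static_assert(std::is_integral<T>::value,
                  "categorical codes need an integral data type");
    static bool goes_left(T value, T category) { return value == category; }
    static std::string name() { return "equality"; }
};
```

With two groups of sizes 2 and 3, canonical index 4 resolves to group 1, within-group index 2, and `SplitRule<Numeric, float>` and `SplitRule<Categorical, int>` resolve to different routines at compile time with no run-time type check.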
Address Berkeley CA US