Title of invention: APPARATUS AND METHOD FOR LOW-LATENCY INVOCATION OF ACCELERATORS
Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time.
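The invocation flow described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented hardware design: the names `Accelerator`, `Status`, `invoke`, the `"sum"` command, and the `budget` parameter (standing in for the point at which an interrupt arrives) are all hypothetical and do not appear in the record. The model shows the three outcomes the abstract names: a result is stored on success, a reason is stored on failure, and state is saved on interruption so execution can continue later.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    DONE = auto()         # command executed; result register holds the output
    FAILED = auto()       # command not executed; result register holds the reason
    INTERRUPTED = auto()  # accelerator saved its state and can resume later


@dataclass
class Accelerator:
    """Hypothetical accelerator that sums a list one element per step."""
    saved_state: dict = field(default_factory=dict)

    def run(self, command_reg, budget):
        # Read the command data from the command register.
        op, data = command_reg
        if op != "sum":
            return Status.FAILED, f"unsupported command: {op}"
        # Resume from saved state if a previous run was interrupted.
        index = self.saved_state.get("index", 0)
        total = self.saved_state.get("total", 0)
        while index < len(data):
            if budget == 0:
                # Interrupted: store state so execution can continue later.
                self.saved_state = {"index": index, "total": total}
                return Status.INTERRUPTED, None
            total += data[index]
            index += 1
            budget -= 1
        self.saved_state = {}
        return Status.DONE, total


def invoke(accel, op, data, budget=2):
    """Model of the invocation instruction: write the command register,
    then block until the accelerator completes or is interrupted.
    The returned tuple plays the role of the result register."""
    command_reg = (op, data)
    return accel.run(command_reg, budget)
```

In this model, calling `invoke` again with the same command after an interruption resumes from the saved index rather than starting over, which mirrors the "continue execution at a later time" language of the abstract.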
Publication number: US2017017492(A1)  Publication date: 2017.01.19
Application number: US201615282082  Filing date: 2016.09.30
Applicant: Ben-Kiki Oren; PARDO ILAN; Valentine Robert; Weissmann Eliezer; Markovich Dror; Yosef Yuval  Inventor: Ben-Kiki Oren; PARDO ILAN; Valentine Robert; Weissmann Eliezer; Markovich Dror; Yosef Yuval
Classification: G06F9/38; G06F12/0875; G06F9/30  Main classification: G06F9/38
Agency:  Agent:
Principal claim: 1. A system comprising: a plurality of processors; a first interconnect to communicatively couple two or more of the plurality of processors; a second interconnect to communicatively couple one or more of the plurality of processors to one or more other system components; and a system memory communicatively coupled to one or more of the processors; at least one processor comprising: a plurality of simultaneous multithreading (SMT) cores, each of the SMT cores to perform out-of-order instruction execution for a plurality of threads; at least one shared cache circuit to be shared among two or more of the SMT cores; at least one of the SMT cores comprising: an instruction fetch circuit to fetch instructions of one or more of the threads, an instruction decode circuit to decode the instructions, a register renaming circuit to rename registers of a register file, an instruction cache circuit to store instructions to be executed, a data cache circuit to store data; at least one level 2 (L2) cache circuit to store both instructions and data and communicatively coupled to the instruction cache circuit and the data cache circuit; a communication interconnect circuit to communicatively couple one or more of the SMT cores to an accelerator device, the communication interconnect circuit to provide the accelerator device access to resources of one or more of the processors including the at least one shared cache circuit; and a memory access circuit to identify an accelerator context save/restore region in a memory responsive to a context save/restore value, the context save/restore region to store an accelerator context state.
Address: Tel-Aviv IL