Title: System coherency in a distributed graphics processor hierarchy
Abstract: Methods and systems may provide for executing, by a physically distributed set of compute slices, a plurality of work items. Additionally, the coherency of one or more memory lines associated with the plurality of work items may be maintained, by a cache fabric, across a graphics processor, a system memory and one or more host processors. In one example, a plurality of crossbar nodes track the one or more memory lines, wherein the coherency of the one or more memory lines is maintained across a plurality of level one (L1) caches and a physically distributed cache structure. Each L1 cache may be dedicated to an execution block of a compute slice and each crossbar node may be dedicated to a compute slice.
Publication number: US9436972(B2) Publication date: 2016.09.06
Application number: US201414227525 Filing date: 2014.03.27
Applicant: Intel Corporation Inventors: Koker Altug; Navale Aditya
Classification: G09G5/36; G06T1/00; G06F15/00; G06T1/60; G06F12/08 Main classification: G09G5/36
Agency: Jordan IP Law, LLC Agent: Jordan IP Law, LLC
Main claim: 1. A system comprising: a display to present visual content; a system cache coupled to one or more host processors and a system memory; a graphics interface coupled to the system cache; and a graphics processor coupled to the graphics interface, the graphics processor including: a physically distributed set of compute slices to execute a plurality of work items associated with the visual content, wherein each compute slice includes a plurality of execution blocks each having a plurality of execution units, and a cache fabric to maintain a coherency of one or more memory lines associated with the plurality of work items across the graphics processor, the system memory and the one or more host processors, wherein the cache fabric includes, a plurality of level one (L1) caches, each L1 cache being dedicated to an execution block of a compute slice, a physically distributed shared cache structure, wherein the coherency of the one or more memory lines is to be maintained across the plurality of L1 caches and the shared cache structure across the physically distributed set of compute slices, wherein the shared cache structure includes a level two (L2) cache having a plurality of banks, and wherein the shared cache structure is to hash at least one of the one or more memory lines across the plurality of banks, and a plurality of crossbar nodes to track the one or more memory lines, each crossbar node being dedicated to a compute slice and each crossbar node of the plurality of crossbar nodes being connected to the L1 cache of each of the plurality of execution blocks of the compute slice to which the crossbar node is dedicated, the plurality of crossbar nodes further to distribute one or more snoop requests originating from the system cache to the shared cache structure and the plurality of L1 caches via the graphics interface, collect one or more snoop results from the shared cache structure and the plurality of L1 caches, and communicate the one or more snoop results to the system cache.
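Illustrative sketch (not part of the patent text): the claim describes memory lines hashed across L2 banks and crossbar nodes that fan out snoop requests and collect the results. The Python model below is one way to picture that flow under stated assumptions. The 64-byte line size, the four-bank count, the XOR-fold hash, and every class and function name here are hypothetical; the record does not specify any of them.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumed cache-line size; not stated in the record
NUM_BANKS = 4    # assumed L2 bank count; not stated in the record


def l2_bank(address: int, num_banks: int = NUM_BANKS) -> int:
    """Map an address to an L2 bank with an XOR-fold hash (illustrative only)."""
    line = address // LINE_BYTES
    folded = line ^ (line >> 7) ^ (line >> 13)
    return folded % num_banks


@dataclass
class Cache:
    """A toy cache that only records which line numbers it holds."""
    lines: set = field(default_factory=set)

    def snoop(self, line: int) -> bool:
        return line in self.lines


@dataclass
class CrossbarNode:
    """One node per compute slice, connected to the L1 of each execution block."""
    l1_caches: list

    def snoop(self, line: int) -> bool:
        # Report a hit if any L1 in this slice holds the line.
        return any(l1.snoop(line) for l1 in self.l1_caches)


def snoop_from_system_cache(address: int, nodes: list, l2_banks: list) -> bool:
    """Distribute a snoop to every slice and the hashed L2 bank, then merge results."""
    line = address // LINE_BYTES
    results = [node.snoop(line) for node in nodes]
    results.append(l2_banks[l2_bank(address, len(l2_banks))].snoop(line))
    # The merged result is what would be communicated back to the system cache.
    return any(results)
```

In this toy model a snoop reports a hit if any L1 cache or the hashed L2 bank holds the line. Real hardware would also carry coherency state per line, which the sketch leaves out.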
Address: Santa Clara CA US