Title: Managing coherent memory between an accelerated processing device and a central processing unit
Abstract: Existing multiprocessor computing systems often have insufficient memory coherency and, consequently, are unable to efficiently utilize separate memory systems. Specifically, a CPU cannot effectively write to a block of memory and then have a GPU access that memory unless there is explicit synchronization. In addition, because the GPU is forced to statically split memory locations between itself and the CPU, existing multiprocessor computing systems are unable to efficiently utilize the separate memory systems. Embodiments described herein overcome these deficiencies by receiving a notification within the GPU that the CPU has finished processing data that is stored in coherent memory, and invalidating data in the CPU caches that the GPU has finished processing from the coherent memory. Embodiments described herein also include dynamically partitioning a GPU memory into coherent memory and local memory through use of a probe filter.
Publication number: US9430391(B2) Publication date: 2016.08.30
Application number: US201213601126 Filing date: 2012.08.31
Applicant: Advanced Micro Devices, Inc.; ATI Technologies ULC Inventor: Asaro Anthony; Normoyle Kevin; Hummel Mark
Classification: G06F12/00; G06F13/00; G06F13/28; G06F12/08 Main classification: G06F12/00
Agency: Volpe and Koenig, P.C. Agent: Volpe and Koenig, P.C.
Main claim: 1. A method of managing a coherent memory between a first processor and a second processor comprising: monitoring a flag register to determine whether data stored in a system coherent memory is available for processing, wherein the flag register notifies the second processor that data is available for processing in the system coherent memory; monitoring and recording the addresses of cache lines used by a first processor using a probe filter; receiving an address for a data that is available to be processed using the second processor; comparing the address for the data to the addresses recorded with the probe filter to determine whether the data has been exported by the first processor; in response to determining that data was not exported by the first processor, partitioning a second processor memory into a local memory and a coherent memory, wherein a portion of the coherent memory containing the recorded cache lines is stored in the local memory; and in response to determining that data was exported by the first processor, sending a probe to the first processor to retrieve the data exported by the first processor.
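The control flow recited in claim 1 can be illustrated with a small software model. This is only a rough sketch under stated assumptions, not the patented hardware design: the class names `ProbeFilter` and `CoherentMemoryManager`, the 64-byte cache-line size, and the return strings are all hypothetical, and the partitioning step is reduced to marking a line as served from the second processor's local memory without a probe.

```python
from dataclasses import dataclass, field

CACHE_LINE = 64  # assumed cache-line size in bytes (not stated in the record)


@dataclass
class ProbeFilter:
    """Hypothetical model: records cache lines used (exported) by the first processor."""
    lines: set = field(default_factory=set)

    def record(self, addr: int) -> None:
        # Track at cache-line granularity, not byte granularity.
        self.lines.add(addr // CACHE_LINE)

    def is_exported(self, addr: int) -> bool:
        return addr // CACHE_LINE in self.lines


@dataclass
class CoherentMemoryManager:
    """Hypothetical model of the second processor's side of the claim."""
    probe_filter: ProbeFilter = field(default_factory=ProbeFilter)
    flag_register: bool = False          # set when data is available
    probes_sent: list = field(default_factory=list)
    local_lines: set = field(default_factory=set)

    def access(self, addr: int) -> str:
        # Step 1: the flag register signals whether data is available.
        if not self.flag_register:
            return "wait"
        # Step 2: compare the address against the probe filter's records.
        if self.probe_filter.is_exported(addr):
            # Exported by the first processor: send a probe to retrieve it.
            self.probes_sent.append(addr)
            return "probe"
        # Not exported: treat the line as local memory, no probe needed.
        self.local_lines.add(addr // CACHE_LINE)
        return "local"
```

In this model, an address that falls in a cache line recorded by the probe filter triggers a probe, while any other address is handled locally once the flag register is set.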
Address: Sunnyvale CA US