Title: Method and apparatus for visualizing component workloads in a unified shader GPU architecture
Abstract: A method of calculating performance parameters for a type of data being executed by a unified processing subunit. In one embodiment, a task (e.g., a draw call) is executed by a processing pipeline (e.g., a GPU). An ALU within a unified processing subunit (e.g., a unified shader processing unit) is queried to determine a type of data (e.g., vertex processing, pixel shading) being processed by the ALU. Performance parameters (e.g., bottleneck and utilization) for the type of data being processed by the ALU are calculated and displayed (e.g., as a stacked graph). Accordingly, software developers can visualize component workloads of a unified processing subunit architecture. As a result, utilization of the unified processing subunit processing a particular data type may be maximized while the bottleneck is reduced. Therefore, the efficiency of the unified processing subunit and the processing pipeline is improved.
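For illustration only, and not part of the patent record: a minimal Python sketch of the workflow the abstract describes, assuming hypothetical per-data-type counters sampled from a unified shader unit while a draw call executes. All identifiers and sample values below (DataType, utilization, bottleneck, the cycle counts) are illustrative assumptions, not an actual driver or profiler API.

    # Minimal sketch under assumed counter values; not the patented implementation.
    from enum import Enum
    import matplotlib.pyplot as plt

    class DataType(Enum):
        VERTEX = "vertex processing"
        PIXEL = "pixel shading"

    def utilization(busy_cycles, total_cycles):
        # Fraction of the task's duration the unit spent on this data type.
        return busy_cycles / total_cycles

    def bottleneck(busy, stall_up, stall_down, total_cycles):
        # Time busy, plus time the unit held back an upstream component,
        # minus time it was itself held back by a downstream component,
        # normalized by the total pipeline time for the task.
        return (busy + stall_up - stall_down) / total_cycles

    # Hypothetical counter samples for one draw call (in cycles).
    samples = {
        DataType.VERTEX: dict(busy=120, up=30, down=10),
        DataType.PIXEL:  dict(busy=480, up=200, down=40),
    }
    total = 1000

    util = {d: utilization(s["busy"], total) for d, s in samples.items()}
    bott = {d: bottleneck(s["busy"], s["up"], s["down"], total) for d, s in samples.items()}
    print(util, bott)

    # Stacked graph of per-data-type utilization, as suggested by the abstract.
    plt.bar(["utilization"], [util[DataType.VERTEX]], label=DataType.VERTEX.value)
    plt.bar(["utilization"], [util[DataType.PIXEL]],
            bottom=[util[DataType.VERTEX]], label=DataType.PIXEL.value)
    plt.legend()
    plt.show()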
Publication Number: US8963932 (B1)    Publication Date: 2015.02.24
Application Number: US200611641447    Filing Date: 2006.12.18
Applicant: Nvidia Corporation    Inventors: Kiel Jeffrey T.; Cornish Derek M.
Classification: G06T1/20    Main Classification: G06T1/20
Main Claim: 1. A method of calculating performance parameters for a type of data being executed by a unified processor subunit, said method comprising: executing an executable task on a processing pipeline comprising a plurality of processing subunits and further comprising said unified processor subunit; querying said unified processor subunit and, in response thereto, determining a data type being processed by said unified processor subunit; and calculating performance parameters for said unified processor subunit processing said data type, wherein said calculating performance parameters comprises calculating a bottleneck that is a measurement of adverse performance of said plurality of processing subunits caused by said unified processor subunit, and wherein said bottleneck is a function of performance parameters associated with said unified processor subunit and parameters associated with said processing pipeline, and wherein said bottleneck is a measure of time that said unified processor subunit is processing said data type, plus a measure of time that said unified processor subunit pauses an upstream component because said unified processor subunit is busy, minus a measure of time during which said unified processor subunit is paused because a downstream component is busy and does not accept further data, all divided by the time required by said processing pipeline to process said executable task, and wherein said calculating is based on a counter operable to increment based on an individual processing of said data type.
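Restated in equation form for readability (the symbols below are ours and are not recited in the claim), the bottleneck measure of the main claim is:

\[ \text{bottleneck} = \frac{T_{\text{busy}} + T_{\text{stall\_up}} - T_{\text{stall\_down}}}{T_{\text{task}}} \]

where \(T_{\text{busy}}\) is the time the unified processor subunit spends processing the data type, \(T_{\text{stall\_up}}\) is the time it pauses an upstream component because it is busy, \(T_{\text{stall\_down}}\) is the time it is itself paused because a downstream component does not accept further data, and \(T_{\text{task}}\) is the time the processing pipeline takes to process the executable task.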
Address: Santa Clara, CA, US