Title Multi-petascale highly efficient parallel supercomputer
Abstract A Multi-Petascale Highly Efficient Parallel Supercomputer of 100-petaOPS-scale computing, at decreased cost, power, and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources, and enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that optimally maximizes the throughput of packet communications between nodes and minimizes latency.
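The five-dimensional torus topology described in the abstract gives every node exactly ten point-to-point links, with links wrapping around at each edge of the machine. The following is a minimal illustrative sketch (not from the patent; function name and dimension sizes are invented for illustration) of how wraparound neighbors are computed in such a torus:

```python
# Hypothetical sketch: the 2*n wraparound neighbors of a node in an
# n-dimensional torus. Each node is a coordinate tuple; links wrap
# modulo the size of each dimension. The dimension sizes below are
# illustrative only, not the actual machine shape.

def torus_neighbors(coord, dims):
    """Return the 2*n neighbors of `coord` in an n-D torus of shape `dims`."""
    neighbors = []
    for axis in range(len(dims)):
        for step in (-1, +1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]  # wrap at the torus edges
            neighbors.append(tuple(n))
    return neighbors

dims = (4, 4, 4, 4, 2)           # illustrative 5-D torus shape
node = (0, 0, 0, 0, 0)
print(len(torus_neighbors(node, dims)))  # prints 10: two links per dimension
```

In five dimensions this yields ten links per node, which is why a 5-D torus shortens average hop counts relative to the 3-D torus of earlier machines while keeping per-node link counts modest.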
Publication number US9081501(B2) Publication date 2015.07.14
Application number US201113004007 Filing date 2011.01.10
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors Asaad Sameh;Bellofatto Ralph E.;Blocksome Michael A.;Blumrich Matthias A.;Boyle Peter;Brunheroto Jose R.;Chen Dong;Cher Chen-Yong;Chiu George L.;Christ Norman;Coteus Paul W.;Davis Kristan D.;Dozsa Gabor J.;Eichenberger Alexandre E.;Eisley Noel A.;Ellavsky Matthew R.;Evans Kahn C.;Fleischer Bruce M.;Fox Thomas W.;Gara Alan;Giampapa Mark E.;Gooding Thomas M.;Gschwind Michael K.;Gunnels John A.;Hall Shawn A.;Haring Rudolf A.;Heidelberger Philip;Inglett Todd A.;Knudson Brant L.;Kopcsay Gerard V.;Kumar Sameer;Mamidala Amith R.;Marcella James A.;Megerian Mark G.;Miller Douglas R.;Miller Samuel J.;Muff Adam J.;Mundy Michael B.;O'Brien John K.;O'Brien Kathryn M.;Ohmacht Martin;Parker Jeffrey J.;Poole Ruth J.;Ratterman Joseph D.;Salapura Valentina;Satterfield David L.;Senger Robert M.;Smith Brian;Steinmacher-Burow Burkhard;Stockdell William M.;Stunkel Craig B.;Sugavanam Krishnan;Sugawara Yutaka;Takken Todd E.;Trager Barry M.;Van Oosten James L.;Wait Charles D.;Walkup Robert E.;Watson Alfred T.;Wisniewski Robert W.;Wu Peng
Classification G06F15/173;G06F9/06;G06F15/76 Primary classification G06F15/173
Agent Scully, Scott, Murphy & Presser, P.C. Attorneys Scully, Scott, Murphy & Presser, P.C.; Daniel P. Morris, Esq.
Principal claim 1. A massively parallel computing structure comprising: a plurality of processing nodes interconnected by multiple independent networks, each processing node including a plurality of processing elements for performing computation or communication activity as required when performing parallel algorithm operations, a first of said multiple independent networks includes an n-dimensional torus network, n is an integer greater than 3, including communication links interconnecting said processing nodes for providing high-speed, low latency point-to-point and multicast packet communications among said processing nodes or independent partitioned subsets thereof; and, said n-dimensional torus network for enabling point-to-point, all-to-all, collective (broadcast, reduce) and global barrier and notification functions among said processing nodes or independent partitioned subsets thereof, wherein combinations of said multiple independent networks interconnecting said processing nodes are collaboratively or independently utilized according to bandwidth and latency requirements of an algorithm for optimizing algorithm processing performance, wherein each said processing element is multi-way hardware threaded supporting transactional memory execution and thread level speculation, wherein said plurality of processing elements are configured to run speculative threads in parallel, wherein each processing element is further configured to: communicate with a communications pathway, the pathway comprising a first level cache and a second level cache; switch between at least two modes of using the first and second level caches, both modes allowing the first level cache and/or a prefetch unit to be operated in a speculation blind manner, wherein the at least two modes comprise: a first mode where, responsive to a write from a speculative thread, at least one line corresponding to results is evicted from the first level cache and/or said prefetch unit and recorded in the second level cache; and a second mode where, responsive to a write from a speculative thread, the first level cache stores results, and wherein responsive to selection of the first mode, said processing element is configured to: determine whether a speculative thread seeks to write; upon a positive determination, write from the speculative thread through the first level cache to the second level cache; evict a line from the first level cache and/or a prefetch unit corresponding to the writing; and resolve speculation downstream from the first level cache, wherein, subsequent to evicting a line, said processing element is further configured to: determine if a speculative thread seeks to access an address corresponding to the line in the first level cache, and if so, retrieve an appropriate version of data from the second level cache.
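The first cache mode in the claim keeps the first-level cache "speculation blind": a speculative write evicts the matching L1 line and records the speculative version at the second level, so speculation is resolved downstream of L1 and a later access by the speculative thread fetches its version from L2. The following is a toy software model of that behavior (class and method names are invented; it illustrates the data flow only, not the hardware implementation):

```python
# Hypothetical illustration of the claim's first cache mode: a speculative
# write bypasses L1 (evicting the matching line) and records a per-thread
# version in L2, where speculation is resolved.

class TwoLevelCache:
    def __init__(self):
        self.l1 = {}            # address -> committed value (speculation blind)
        self.l2 = {}            # address -> {thread_id: speculative value}

    def speculative_write(self, addr, value, thread_id):
        self.l1.pop(addr, None)  # evict the line so L1 never holds speculative data
        self.l2.setdefault(addr, {})[thread_id] = value

    def speculative_read(self, addr, thread_id):
        # An L1 hit returns committed data; on a miss (e.g. an evicted line),
        # retrieve the thread's speculative version from L2.
        if addr in self.l1:
            return self.l1[addr]
        return self.l2.get(addr, {}).get(thread_id)

cache = TwoLevelCache()
cache.l1[0x40] = 7                    # committed data resident in L1
cache.speculative_write(0x40, 9, thread_id=1)
print(0x40 in cache.l1)               # prints False: the line was evicted
print(cache.speculative_read(0x40, thread_id=1))  # prints 9, served from L2
```

Keeping per-thread versions only at the second level is what lets the L1 and prefetch unit stay unaware of speculation, at the cost of forcing speculative accesses to miss in L1.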
Address Armonk, NY, US