Title: BRANCH LOOK-AHEAD INSTRUCTION DISASSEMBLING, ASSEMBLING, AND DELIVERING SYSTEM APPARATUS AND METHOD FOR MICROPROCESSOR SYSTEM
Abstract: A method and system of branch look-ahead (BLA) instruction disassembling, assembling, and delivering are designed for improving the speed of branch prediction and instruction fetch of microprocessor systems by reducing the number of clock cycles required to deliver branch instructions to a branch predictor located inside the microprocessors. The invention is also designed for reducing the run length of instructions found between branch instructions by disassembling the instructions in a basic block into a BLA instruction and a single or plurality of non-BLA instructions from the software/assembly program. The invention is also designed for dynamically reassembling the BLA and non-BLA instructions and delivering them to a single or plurality of microprocessors in a compatible sequence. In particular, the reassembled instructions are concurrently delivered to a single or plurality of microprocessors in a timely and precise manner while preserving compatibility with the software/assembly program.
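The disassembling step described in the abstract can be sketched in software. The following is a minimal illustrative model, not the patented hardware: it splits a linear instruction stream into basic blocks (one entry, one exit), then emits one BLA instruction (BI) per block that carries the branch to be predicted plus a pointer to the block's non-branch instructions (non-BIs). The opcode set, instruction tuples, and BI record fields are all assumptions for illustration.

```python
# Hypothetical flow-control opcodes; a real disassembler would use the
# target ISA's branch/jump/call encodings.
BRANCH_OPS = {"beq", "bne", "jmp", "call", "ret"}

def disassemble(program):
    """Split a list of (opcode, operands) into (bi_program, non_bi_program).

    Each BI records the branch needing prediction and where its associated
    non-BIs start in the non-BI program, mirroring the 'additional
    information to access the non-BIs associated to each BI'.
    """
    bi_program, non_bi_program = [], []
    block = []  # non-branch instructions of the current basic block
    for opcode, operands in program:
        if opcode in BRANCH_OPS:
            # A branch closes the basic block: emit its BI and flush
            # the accumulated non-BIs to the non-BI program.
            bi_program.append({
                "branch": (opcode, operands),
                "non_bi_start": len(non_bi_program),
                "non_bi_count": len(block),
            })
            non_bi_program.extend(block)
            block = []
        else:
            block.append((opcode, operands))
    if block:  # trailing block that falls through without a branch
        bi_program.append({
            "branch": None,
            "non_bi_start": len(non_bi_program),
            "non_bi_count": len(block),
        })
        non_bi_program.extend(block)
    return bi_program, non_bi_program
```

Because each BI surfaces the branch ahead of its block's non-BIs, a predictor can be fed from the (short) BI program while the (longer) non-BI program streams separately, which is the run-length reduction the abstract claims.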
Publication No.: US2016283243(A1)  Publication Date: 2016.09.29
Application No.: US201514735147  Filing Date: 2015.06.10
Applicant: Jung Yong-Kyu  Inventor: JUNG YONG-KYU
Classification: G06F9/38; G06F9/32; G06F9/30  Main Classification: G06F9/38
Agency:  Agent:
Principal Claim: 1. An apparatus for producing a branch look-ahead (BLA) instruction disassembling, assembling, and delivering system, comprising:
a BLA instruction/non-BLA instruction (BI/non-BI) disassembling system;
a single or plurality of dynamic BI/non-BI assembling and delivering systems; and
a single or plurality of backend microprocessors;
wherein the apparatus for producing the BI disassembling, assembling, and delivering system is operable to:
identify each BI, comprising an entire or a part of a basic block with or without a branch or flow-control instruction required to be predicted by a branch predictor, from a software and/or assembly program for generating a BI program and a non-BI program, wherein the basic block is an instruction segment with only one entry and only one exit in the program;
compose the BI program comprising BIs and/or some non-BIs, with additional information to access a single or plurality of the non-BIs associated with each BI if necessary;
deliver the BIs from the BI program to a single or plurality of the microprocessors according to the BI fetch order while delivering the non-BIs from the non-BI program to the microprocessors according to the non-BI fetch order obtained from the associated BI;
deliver a single or plurality of BIs to the branch predictor for predicting a single or plurality of locations of the next BIs in the BI program and start to deliver the next BIs to the microprocessors while continuously delivering the non-BIs associated with the previous or current BIs delivered to the microprocessors; and
produce a single or plurality of branch prediction results of the BIs before completely fetching the non-BIs of the associated BIs;
wherein the BI disassembling, assembling, and delivering system is further operable to:
disassemble native instructions (NIs) in a software and/or assembly program into a BI program and a non-BI program;
compose a BI comprising a single or plurality of other BIs and/or non-disassembled NIs;
compose a single or plurality of BIs representing a single or plurality of levels of loops in the software and/or assembly program;
compose a non-BI comprising a single or plurality of other non-BIs and/or non-disassembled NIs;
compose a single or plurality of non-BIs representing a single or plurality of levels of loops in the software and/or assembly program;
assign BIs and non-BIs to the sequentially and/or concurrently accessible BI and non-BI main memories;
access BIs and non-BIs from the sequentially and/or concurrently accessible BI and non-BI main memories to the sequentially, concurrently, and/or a single or plurality of times quickly accessible BI and non-BI caches, wherein the plurality-of-times quickly accessible caches are two or more times faster than the NI fetch speed of the microprocessors;
assemble the NIs from the non-BI program and/or BI program during the NI fetch operation via a single or plurality of BI/non-BI prefetch/fetch systems;
prefetch the BIs addressed by the BI prefetch/decode units to the BI caches;
prefetch the BIs addressed by the BLA systems to the BI caches whenever a single or plurality of branch target addresses is obtained from a single or plurality of BLA branch prediction units in the BLA systems or a single or plurality of interrupt processing units in the backend microprocessors;
terminate the BI prefetch after continuously prefetching BIs from both the predicted and non-predicted paths one or more times to the BI caches;
decode the prefetched BIs for prefetching the associated non-BIs and for prefetching variable-length NIs to the BI caches;
fetch the BIs addressed by the BI fetch/decode units to the BLA systems;
fetch the BIs addressed by the BLA systems to the BLA systems whenever a single or plurality of branch target addresses is obtained from a single or plurality of BLA branch prediction units in the BLA systems or a single or plurality of interrupt processing units in the backend microprocessors;
decode the fetched BIs for fetching the associated non-BIs and for fetching fixed- and/or variable-length NIs to the BLA systems;
forward a single or plurality of the fetched BIs to a single or plurality of BI decode units via the BI fetch units;
initiate the branch prediction operations of the received BIs a single or plurality of clock cycles ahead, compared with the branch prediction operations on NIs fetched and decoded in the non-BI fetch units and the non-BI decode units, by identifying any BIs required to be predicted for their branch operations and branch target locations with the branch prediction information forwarded to the BLA branch prediction units;
initiate next BI and non-BI prefetch and fetch operations according to the branch prediction results available a single or plurality of clock cycles ahead, for enhancing performance of the microprocessor by reducing taken-branch latencies;
filter BIs representing a single or plurality of loops by the BI decode units and hold further BIs fetched in the BI fetch units while reissuing the same BIs representing the same single or plurality of loops to the BLA branch prediction units;
eliminate recursive BI and non-BI prefetching and fetching operations from the BI/non-BI memory systems to the BLA systems via the BI/non-BI prefetch/fetch systems;
decode the BIs to redirect the associated non-BIs fetched to a single or plurality of non-BI queues (non-BIQs) in the non-BI fetch units if the fetched non-BIs and/or NIs in the non-BIQ are changed;
detect and process operations disrupting the BI, non-BI, and/or NI fetch, decode, and/or execution orders, such as interrupts handled by the interrupt processing units and branch misprediction correction operations handled by the backend processing engines and other parts of the microprocessors;
store current BI program counters (BPCs), non-BI program counters (non-BPCs), and/or NI program counters (NPCs) to the stacks in order to resume the disrupted operations of the BIs, non-BIs, and/or NIs;
update new NPC values to the BPCs in the BI fetch units and/or the non-BPCs and/or NPCs in the non-BI fetch units to prefetch and/or fetch the BIs, non-BIs, and/or NIs from the disrupted locations;
restore the NPC values of the BPCs, non-BPCs, and/or NPCs stored in the stacks to the BPCs and/or non-BPCs and/or NPCs;
reset a single or plurality of values of the non-BPCs and/or NPCs whenever the last NIs of the associated non-BIs are fetched;
increase a single or plurality of values of the non-BPCs and/or NPCs whenever non-BIs and/or NIs of the next non-BIs are fetched or whenever the first NIs of the associated non-BIs are fetched; and
repeat resetting and increasing the values of the non-BPCs and/or NPCs until the next non-BIs and/or the last NIs of the non-BIs are fetched.
Address: Erie, PA, US