-
Publication No.: US12229557B2
Publication date: 2025-02-18
Application No.: US18601640
Application date: 2024-03-11
Applicant: Apple Inc.
Inventor: Brian R. Mestan , Gideon N. Levinsky , Michael L. Karm
Abstract: In an embodiment, a processor comprises an atomic predictor circuit to predict whether or not an atomic operation will complete successfully. The prediction may be used when a subsequent load operation to the same memory location as the atomic operation is executed, to determine whether or not to forward store data from the atomic operation to the subsequent load operation. If the atomic operation is predicted to succeed, the store data may be forwarded. If it is predicted to fail, the store data may not be forwarded. In cases where an atomic operation has been failing (not successfully performing the store operation), the prediction may prevent the forwarding of the store data and thus may prevent a subsequent flush of the load.
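The forwarding decision described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not say how the predictor is organized, so the 2-bit saturating counters, the table indexed by the atomic instruction's address, and the names `AtomicPredictor` and `may_forward` are all hypothetical.

```python
class AtomicPredictor:
    """Hypothetical predictor: one 2-bit saturating counter per table entry."""

    def __init__(self, entries=16):
        self.entries = entries
        # Assumption: start in the "weakly succeeds" state (value 2 of 0..3).
        self.counters = [2] * entries

    def _index(self, pc):
        # Assumption: index the table by the atomic instruction's address.
        return pc % self.entries

    def predict_success(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, succeeded):
        # Train on the actual outcome of the atomic operation.
        i = self._index(pc)
        if succeeded:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def may_forward(predictor, atomic_pc, atomic_addr, load_addr):
    """Forward store data only to a load of the same location, and only
    when the atomic operation is predicted to complete successfully."""
    return atomic_addr == load_addr and predictor.predict_success(atomic_pc)
```

In this model, an atomic operation that keeps failing drives its counter below the threshold, so a later load to the same address no longer receives forwarded data that would have to be flushed.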
-
Publication No.: US12204430B2
Publication date: 2025-01-21
Application No.: US17033746
Application date: 2020-09-26
Applicant: Intel Corporation
Inventor: Ahmad Yasin
Abstract: Embodiments are disclosed for monitoring processor performance, including cost of events. In an embodiment, a processor includes a first counter, a second counter, a handler circuit, and an enable circuit. The first counter is to count occurrences of an event in the processor and to overflow upon the count of occurrences reaching a specified value. The second counter is to measure a performance cost of the event. The handler circuit is to generate an event sampling record. The record is to include at least one value reflecting the performance cost. The enable circuit is to enable the handler circuit to generate the record.
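A rough software analogue of the two-counter arrangement may help in reading this abstract. It is a sketch, not the claimed circuit: the class name `EventSampler`, the record layout, and the choice to accumulate cost over the whole sampling period are assumptions, since the abstract does not say whether cost is measured per event or per period.

```python
class EventSampler:
    """Hypothetical model of an event counter paired with a cost counter."""

    def __init__(self, threshold, enabled=True):
        self.threshold = threshold   # count at which the first counter overflows
        self.enabled = enabled       # stands in for the enable circuit
        self.count = 0               # first counter: event occurrences
        self.cost = 0                # second counter: accumulated cost (assumed: cycles)
        self.records = []            # sampling records written by the handler

    def on_event(self, cost_cycles):
        self.count += 1
        self.cost += cost_cycles
        if self.count == self.threshold:
            # Overflow: the handler emits a record only when enabled.
            if self.enabled:
                self.records.append({"events": self.count, "cost": self.cost})
            # Both counters restart for the next sampling period.
            self.count = 0
            self.cost = 0
```

With `threshold=3`, the third event triggers a record whose `cost` field is the sum of the three reported costs, and counting restarts with the fourth event.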
-
Publication No.: US20250021336A1
Publication date: 2025-01-16
Application No.: US18768088
Application date: 2024-07-10
Applicant: Akeana, Inc.
Inventor: Rabin Sugumar
Abstract: A processor core includes a local cache hierarchy, prefetch logic, and a prefetch table, where the processor core is coupled to an external memory system. A data stream is detected, where the data stream includes multiple load instructions, including a load instruction that causes a cache miss, resulting in prefetching. A prefetch table is initialized with information pertaining to load instructions, and includes a Positive or Negative value (PON), a stride, and a saturation count. Information in the prefetch table is updated as new load instructions are prefetched. An underlying stride of the data stream is discovered, based on the updating. Data is prefetched using an offset, where a polarity of the offset is based on the PON, enabling effective stride detection with dynamic directionality and out-of-order instructions.
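The prefetch table entry described here (PON, stride, saturation count) can be sketched as a conventional signed-stride detector. The update rule, the confidence threshold `confident_at`, and the name `StrideEntry` are assumptions for illustration. The abstract does not spell out how the table is trained, and this simplified model does not attempt the out-of-order handling it mentions.

```python
class StrideEntry:
    """Hypothetical prefetch-table entry for one load stream."""

    def __init__(self, first_addr, max_count=3, confident_at=2):
        self.last_addr = first_addr
        self.stride = 0          # stride magnitude in bytes
        self.pon = +1            # Positive-or-Negative value: +1 or -1
        self.count = 0           # saturation count (confidence)
        self.max_count = max_count
        self.confident_at = confident_at

    def observe(self, addr):
        """Update the entry with a new load address.
        Returns an address to prefetch, or None if not yet confident."""
        delta = addr - self.last_addr
        self.last_addr = addr
        if delta == 0:
            return None
        pon = 1 if delta > 0 else -1
        stride = abs(delta)
        if stride == self.stride and pon == self.pon:
            # Same stride and direction again: raise confidence, saturating.
            self.count = min(self.max_count, self.count + 1)
        else:
            # New stride or direction: retrain from scratch.
            self.stride, self.pon, self.count = stride, pon, 0
        if self.count >= self.confident_at:
            # The polarity of the prefetch offset follows the PON.
            return addr + self.pon * self.stride
        return None
```

For a descending stream such as 1000, 936, 872, 808, the entry learns a stride of 64 with a negative PON and, once confident, prefetches below the current address.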
-
Publication No.: US20250013468A1
Publication date: 2025-01-09
Application No.: US18898309
Application date: 2024-09-26
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Xianzhe LIU , Jianjiang ZENG , Yandong LV
Abstract: Embodiments of this application disclose an instruction translation method. The method includes: obtaining a return instruction of a function call instruction; obtaining a first address mapping result based on a second address indicated in the return instruction; storing the first address mapping result in a running stack space; and obtaining a first translation result of the return instruction, where the first translation result is a binary translation result of the return instruction, and the first translation result indicates to obtain, from a target location, an instruction indicated by the first address mapping result and execute the instruction. In this application, a running stack space of a source program is reused, thereby saving storage space. In addition, an address of a return instruction does not need to be checked each time the return instruction is translated, thereby reducing overheads during translation and increasing program running efficiency.
-
Publication No.: US20240419606A1
Publication date: 2024-12-19
Application No.: US18812008
Application date: 2024-08-22
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Timothy David Anderson , Mujibur Rahman , Dheera Balasubramanian Samudrala , Peter Richard Dent , Duc Quang Bui
IPC: G06F12/1045 , G06F7/24 , G06F7/487 , G06F7/499 , G06F7/53 , G06F7/57 , G06F9/30 , G06F9/32 , G06F9/345 , G06F9/38 , G06F9/48 , G06F11/00 , G06F11/10 , G06F12/0862 , G06F12/0875 , G06F12/0897 , G06F12/1009 , G06F15/78 , G06F17/16 , H03H17/06
Abstract: An example device includes a first register storing a first vector comprised of a set of vector elements; a second register having a set of lanes and configured to store a second vector; and a storage that stores a set of control elements. Each such control element corresponds to a respective one of the vector elements of the set of vector elements in the first register. In addition, each control element of the set of control elements has a first portion that specifies, for the corresponding vector element of the set of vector elements, a lane of the set of lanes of the second register, and a second portion that specifies whether the corresponding vector element of the set of vector elements is to be routed to the lane specified by the first portion. The example device further includes processing circuitry to, based on an instruction that specifies the first register and the second register, generate the second vector based on the set of control elements.
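The control-element scheme in this abstract reads like a scatter-style permute: each source element carries a destination lane and an enable flag. A minimal sketch follows, assuming each control element is modeled as a `(lane, enable)` tuple and that lanes receiving no element keep a fill value. The function name `scatter_permute` and the `fill` behavior are hypothetical, as the abstract does not state what unrouted lanes contain.

```python
def scatter_permute(src, controls, num_lanes, fill=0):
    """Build the second vector from the first under per-element control.

    src       -- elements of the first register
    controls  -- one (lane, enable) pair per source element:
                 lane   = first portion, the destination lane
                 enable = second portion, whether to route the element
    num_lanes -- number of lanes in the second register
    fill      -- assumed content of lanes that receive nothing
    """
    dst = [fill] * num_lanes
    for element, (lane, enable) in zip(src, controls):
        if enable:
            dst[lane] = element
    return dst
```

For example, `scatter_permute([10, 20, 30, 40], [(3, True), (0, True), (1, False), (2, True)], 4)` routes three of the four elements and leaves lane 1 at the fill value.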
-
Publication No.: US12164438B2
Publication date: 2024-12-10
Application No.: US18460772
Application date: 2023-09-05
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Duc Quang Bui , Joseph Raymond Michael Zbiciak
IPC: G06F9/30 , G06F7/24 , G06F7/487 , G06F7/499 , G06F7/53 , G06F7/57 , G06F9/32 , G06F9/345 , G06F9/38 , G06F9/48 , G06F11/00 , G06F11/10 , G06F12/0862 , G06F12/0875 , G06F12/0897 , G06F12/1009 , G06F12/1045 , G06F17/16 , H03H17/06 , G06F15/78
Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
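The loop-exit rule in this abstract can be modeled in a few lines. The sketch below assumes the predicate is a list of per-lane booleans marking which lanes hold real data, and uses a running sum as a stand-in for the loop body. The function name `predicated_sum` and the way the predicate is derived from the remaining data length are illustrative assumptions.

```python
def predicated_sum(data, vector_len):
    """Sum a list in vector-sized chunks, exiting when a chunk's
    predicate marks no lane as valid."""
    total = 0
    offset = 0
    while True:
        lanes = data[offset:offset + vector_len]
        # Predicate: lane i is valid only if it maps to a real element.
        predicate = [i < len(lanes) for i in range(vector_len)]
        if not any(predicate):
            break  # no valid data elements: leave the loop
        # Pad invalid lanes; the predicate masks them out of the result.
        vector = lanes + [0] * (vector_len - len(lanes))
        total += sum(v for v, p in zip(vector, predicate) if p)
        offset += vector_len
    return total
```

With ten elements and a vector length of 4, the loop runs three times (the last with only two valid lanes) and exits on the fourth fetch, whose predicate is all false.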
-
Publication No.: US20240403394A1
Publication date: 2024-12-05
Application No.: US18801580
Application date: 2024-08-12
Inventor: Kanad Ghose
Abstract: A secure processor, comprising a logic execution unit configured to process data based on instructions; a communication interface unit, configured to transfer the instructions and the data, and metadata tags accompanying respective instructions and data; a metadata processing unit, configured to enforce specific restrictions with respect to at least execution of instructions, access to resources, and manipulation of data, selectively dependent on the received metadata tags; and a control transfer processing unit, configured to validate a branch instruction execution and an entry point instruction of each control transfer, selectively dependent on the respective metadata tags.
-
Publication No.: US20240403055A1
Publication date: 2024-12-05
Application No.: US18800249
Application date: 2024-08-12
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Joseph Zbiciak , Timothy Anderson
IPC: G06F9/32 , G06F9/30 , G06F9/345 , G06F9/38 , G06F11/00 , G06F11/10 , G06F12/02 , G06F12/0875 , G06F12/0897 , G06F13/16 , G06F13/40
Abstract: A streaming engine employed in a digital data processor specifies fixed first and second read-only data streams. Corresponding stream address generators produce addresses of data elements of the two streams. Corresponding stream head registers store data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.
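The port-sharing policy described above can be approximated by a one-cycle arbitration function. This is a sketch under assumptions: the abstract does not give the arbiter's exact rules, so the `toggle` bit that flips each stream's preferred port, the pending-request counts, and the function name `arbitrate` are hypothetical.

```python
def arbitrate(pending, toggle):
    """Decide which stream uses each of the two memory ports this cycle.

    pending -- [n0, n1], outstanding requests for stream 0 and stream 1
    toggle  -- 0 or 1; assumed to flip every cycle so that the
               stream-to-port preference alternates
    Returns grants, where grants[port] is a stream number or None.
    """
    grants = [None, None]
    # Each stream with work first claims its currently preferred port.
    for stream in (0, 1):
        if pending[stream] > 0:
            grants[stream ^ toggle] = stream
    # A port left idle may be borrowed by the stream on the other port,
    # provided that stream has a second request to issue.
    for port in (0, 1):
        if grants[port] is None:
            borrower = grants[1 - port]
            if borrower is not None and pending[borrower] >= 2:
                grants[port] = borrower
    return grants
```

When both streams have requests, each gets one port and the assignment swaps as `toggle` flips. When only one stream is active with two or more requests, it is granted both ports in the same cycle.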
-
Publication No.: US12131157B2
Publication date: 2024-10-29
Application No.: US17984336
Application date: 2022-11-10
Applicant: AzurEngine Technologies Zhuhai Inc.
Inventor: Toshio Nagata , Yuan Li , Jianbin Zhu , Ryan Braidwood
CPC classification number: G06F9/30145 , G06F9/30036 , G06F9/30043 , G06F9/321
Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may include a sequencer configured to: decode instructions that include scalar instructions and vector instructions, execute decoded scalar instructions, and package decoded vector instructions as configurations. The processor may further include a plurality of columns of vector processing units coupled to the sequencer. The plurality of columns of vector processing units may include a plurality of processing elements (PEs) and each of the PEs may include a plurality of Arithmetic Logic Units (ALUs). The sequencer may be configured to send the configurations to the plurality of columns of vector processing units.
-
Publication No.: US20240330203A1
Publication date: 2024-10-03
Application No.: US18739768
Application date: 2024-06-11
Applicant: Texas Instruments Incorporated
Inventor: Mujibur Rahman , Timothy David Anderson
IPC: G06F12/1045 , G06F7/24 , G06F7/487 , G06F7/499 , G06F7/53 , G06F7/57 , G06F9/30 , G06F9/32 , G06F9/345 , G06F9/38 , G06F9/48 , G06F11/00 , G06F11/10 , G06F12/0862 , G06F12/0875 , G06F12/0897 , G06F12/1009 , G06F15/78 , G06F17/16 , H03H17/06
CPC classification number: G06F12/1045 , G06F7/24 , G06F7/487 , G06F7/4876 , G06F7/49915 , G06F7/53 , G06F7/57 , G06F9/3001 , G06F9/30014 , G06F9/30021 , G06F9/30032 , G06F9/30036 , G06F9/30065 , G06F9/30072 , G06F9/30098 , G06F9/30112 , G06F9/30145 , G06F9/30149 , G06F9/3016 , G06F9/32 , G06F9/345 , G06F9/3802 , G06F9/3818 , G06F9/383 , G06F9/3836 , G06F9/3851 , G06F9/3856 , G06F9/3867 , G06F9/3887 , G06F9/48 , G06F11/00 , G06F11/1048 , G06F12/0862 , G06F12/0875 , G06F12/0897 , G06F12/1009 , G06F17/16 , H03H17/0664 , G06F9/30018 , G06F9/325 , G06F9/381 , G06F9/3822 , G06F11/10 , G06F15/7807 , G06F15/781 , G06F2212/452 , G06F2212/60 , G06F2212/602 , G06F2212/68
Abstract: Devices and methods are provided for performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers. In an example, a device includes a processor that includes a multiply circuit. The multiply circuit is configured to multiply floating point numbers in response to a floating point multiply instruction, and is further configured to determine values of implied bits of mantissas of the floating point numbers, and multiply the mantissas in parallel with the determining operation.
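One way to see why the mantissa multiplication need not wait for the implied bits is to expand the product algebraically. The sketch below works on IEEE 754 single-precision values and is an assumption about how the two steps could be decoupled, not a description of the patented circuit. The helper names `fields` and `significand_product` are hypothetical, and infinities and NaNs are ignored.

```python
import struct


def fields(x):
    """Split a float32 value into (sign, exponent field, 23-bit fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def significand_product(a, b):
    """Product of the two 24-bit significands, computed so that the
    fraction multiply does not depend on the implied bits."""
    _, exp_a, frac_a = fields(a)
    _, exp_b, frac_b = fields(b)
    # Step 1: multiply the stored fractions; needs no implied-bit knowledge.
    partial = frac_a * frac_b
    # Step 2 (could overlap step 1 in hardware): the implied bit is 1 for
    # normal numbers and 0 when the exponent field is zero (zero/subnormal).
    imp_a = 1 if exp_a != 0 else 0
    imp_b = 1 if exp_b != 0 else 0
    # Step 3: add the correction terms from expanding
    # (imp_a * 2^23 + frac_a) * (imp_b * 2^23 + frac_b).
    cross = (imp_a * frac_b + imp_b * frac_a) << 23
    return partial + cross + ((imp_a & imp_b) << 46)
```

The result matches a direct multiplication of the full significands, for example `0xC00000 * 0xA00000` for the inputs 1.5 and 2.5.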
-