I/O MTT extension

Supervisor domains may be granted control over DMA-capable devices. When such direct device association is supported, the system might also incorporate multiple instances of IOMMU. Each IOMMU instance can be tied directly to a supervisor domain, allowing that domain to manage address translation and protection for DMA that originates from devices under its control.

To uphold isolation properties, the DMA from the devices and the IOMMU linked with a supervisor domain must adhere strictly to the access protections encoded in the MTT of the respective supervisor domain. Additionally, using the MTT, the RDSM enforces that the IOMMU memory-mapped programming regions are access-restricted to the supervisor domain the IOMMU is assigned to.

At any given time, a single supervisor domain is scheduled for execution on a RISC-V hart by the root domain security manager (RDSM). As part of this scheduling, the RDSM programs a pointer to the MTT into a CSR within the hart. Unlike the RISC-V harts, DMA-capable devices connected to a supervisor domain remain continuously active. Such devices might initiate DMA even if the associated domain is not currently active on any RISC-V hart. As a result, the MTTs of all supervisor domains must be constantly active for DMA protection. Furthermore, the I/O subsystem must be able to select the appropriate MTT for enforcement based on the identity of the device initiating the DMA.

Given this setup, the I/O subsystem is required to offer the following functions:

  • Supervisor Domain Classifier (SDCL): This classifier within the I/O subsystem interprets the attributes of a DMA request and determines the appropriate MTT for that request.

  • MTT Checker (MTTCHK): This function ensures that the access controls stipulated by Smmtt are applied to the memory regions accessed by the DMA. It uses the MTT identified by the SDCL.

Collectively, these two functionalities form a logical block in the I/O subsystem, referred to as the I/O MTT checker.

Placement of I/O MTT checker

The IO Bridge sits between the IO devices and the system interconnect and processes DMA transactions. These IO devices can initiate DMA transactions using IO Virtual Addresses (IOVA). An IOVA may be a Virtual Address (VA), a Guest Virtual Address (GVA), or a Guest Physical Address (GPA). The placement of the I/O MTT checker and its interfaces to the IO Bridge are shown in Figure 1.

Figure 1: I/O MTT checker placement

The IO Bridge invokes the SDCL function using the SDID request interface (SDR) and provides it the identifiers associated with the incoming transaction. The SDCL classifies the request using the identifiers and provides the SDID and the IOMMU ID on the SDID completion interface (SDC).

The IO Bridge uses the IOMMU ID to determine the IOMMU governing this request. The IO Bridge uses the device translation request (DTR) interface to the selected IOMMU to request address translation. The selected IOMMU provides the response to the address translation request on the device translation completion (DTC) interface. As part of the address translation process, the IOMMU may need to access its in-memory data structures over its data structure interface (DSI).

The IO Bridge invokes the MTTCHK function over the MTT check request (MCR) interface and provides it the SDID and physical address of the access. The MTTCHK uses the SDID to determine the MTT associated with the supervisor domain and checks if the physical address may be accessed by the device or IOMMU associated with that supervisor domain. The result of the check is provided on the MTT check completion (MTC) interface to the IO Bridge. As part of the MTT check, the MTTCHK may need to perform implicit accesses to the MTT using the MTT walk interface (MWI).
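The three steps described above (classification over SDR/SDC, translation over DTR/DTC, and the MTT check over MCR/MTC) can be sketched as a small software model. This is only an illustration under stated assumptions: the names `Classification`, `process_dma`, `classify`, `translate`, and `mtt_check` are hypothetical, and rejecting an unmatched transaction is a choice made for the sketch, since the text leaves that behavior UNSPECIFIED.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Classification:
    """Result returned by the SDCL on the SDC interface (hypothetical model)."""
    sdid: int
    iommu_id: int


def process_dma(
    src_id: int,
    iova: int,
    classify: Callable[[int], Optional[Classification]],
    translate: Callable[[int, int], Optional[int]],
    mtt_check: Callable[[int, int], bool],
) -> Optional[int]:
    """Model of the IO Bridge flow; returns the physical address or None."""
    # Step 1: SDR -> SDC. The SDCL maps the source identifier to SDID + IOMMU ID.
    result = classify(src_id)
    if result is None:
        # Assumption of this sketch: unmatched transactions are rejected.
        return None

    # Step 2: DTR -> DTC. The selected IOMMU translates the IOVA.
    phys_addr = translate(result.iommu_id, iova)
    if phys_addr is None:
        return None

    # Step 3: MCR -> MTC. The MTTCHK checks the physical address against the
    # MTT of the supervisor domain identified by the SDID.
    if not mtt_check(result.sdid, phys_addr):
        return None
    return phys_addr
```

The callables stand in for the hardware interfaces, so the same skeleton can be exercised with simple dictionaries or lambdas when reasoning about rule configurations.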

Each IOMMU offers a memory-mapped register programming interface that the associated supervisor domain uses to configure and control the IOMMU. The RISC-V hart uses the MTT to check whether the supervisor domain currently executing on that hart has the right to access the physical addresses of an IOMMU register programming interface.

The I/O MTT checker provides a memory-mapped register programming interface associated with the RDSM. The RDSM employs the MTT (or PMP) to prohibit access to these registers from any of the supervisor domains.

I/O MTT Checker Register Interface

Each I/O MTT checker (IOMTTCHK) register interface is memory-mapped starting at an 8-byte aligned physical address and includes the registers used to configure the SDCL and MTTCHK functions in the I/O MTT checker.

Note

Implementations may choose to implement a coarser alignment for the start address of the register interface. For example, some implementations may locate the register interface within a naturally aligned 4-KiB region (a page) of physical address space for each register interface. Coarser alignments may enable register decoding to be implemented without a hardware adder circuit.

The behavior for register accesses where the address is not aligned to the size of the access, or if the access spans multiple registers, or if the size of the access is not 4 bytes or 8 bytes, is UNSPECIFIED. An aligned 4-byte access to an IOMTTCHK register must be single-copy atomic. Whether an 8-byte access to an IOMTTCHK register is single-copy atomic is UNSPECIFIED, and such an access may appear, internally to the IOMTTCHK implementation, as if two separate 4-byte accesses were performed.

Note

The IOMTTCHK registers are defined in such a way that software can perform two individual 4-byte accesses, or hardware can perform two independent 4-byte transactions resulting from an 8-byte access, to the high and low halves of the register, as long as the register semantics, with regard to side effects, are respected between the two software accesses or the two hardware transactions, respectively.

The IOMTTCHK registers have little-endian byte order (even for systems where all harts are big-endian-only).

Note

Big-endian-configured harts that make use of an IOMTTCHK may implement the REV8 byte-reversal instruction defined by the Zbb extension. If REV8 is not implemented, then endianness conversion may be implemented using a sequence of instructions.
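The access rules above can be illustrated with a short sketch of how a 64-bit register value splits into two 4-byte halves and how it is laid out in little-endian byte order. The helper names (`split_halves`, `join_halves`, `to_register_bytes`, `rev8`) are hypothetical; `rev8` only models the byte reversal that the REV8 instruction performs for XLEN=64.

```python
import struct


def split_halves(value64: int) -> tuple:
    """Return (low32, high32); the low half sits at the lower register offset."""
    return value64 & 0xFFFFFFFF, (value64 >> 32) & 0xFFFFFFFF


def join_halves(low32: int, high32: int) -> int:
    """Recombine two 4-byte accesses into the 64-bit register value."""
    return ((high32 & 0xFFFFFFFF) << 32) | (low32 & 0xFFFFFFFF)


def to_register_bytes(value64: int) -> bytes:
    """Little-endian byte image of a 64-bit register, lowest address first."""
    return struct.pack("<Q", value64)


def rev8(value64: int) -> int:
    """Byte-reverse a 64-bit value, modelling what a big-endian hart would do."""
    return int.from_bytes(value64.to_bytes(8, "little"), "big")
```

A big-endian hart would apply the byte reversal to a value before storing it to, or after loading it from, a register.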

Table 1. I/O MTT Checker register layout
|===
| Offset | Name | Size | Description | Optional?

| 0  | capabilities | 8 | Capabilities | No
| 8  | control      | 8 | Control      | No
| 16 | operand-0    | 8 | Operand 0    | No
| 24 | operand-1    | 8 | Operand 1    | No
|===

The reset value is 0 for the following register fields.

  • control - BUSY and STATUS fields

The reset value is UNSPECIFIED for all other registers and/or fields.

Capabilities (capabilities)

The capabilities register is a read-only register that holds the I/O MTT checker capabilities.

Capabilities register fields
{reg: [
  {bits:  8, name: 'VER'},
  {bits:  1, name: 'MXL'},
  {bits: 39, name: 'WPRI'},
  {bits: 16, name: 'custom'}
], config:{lanes: 4, hspace:1024}}

The VER field holds the version of the specification implemented by the I/O MTT checker. The low nibble is used to hold the minor version of the specification and the upper nibble is used to hold the major version of the specification. For example, an implementation that supports version 1.0 of the specification reports 0x10.

The MXL field indicates the supported MTT address protection schemes. If 1, then the MTT modes for XLEN=64 are supported else the MTT modes for XLEN=32 are supported.
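A minimal sketch of decoding the capabilities register follows. It assumes the fields in the register diagram are listed starting from bit 0, so that VER occupies bits 7:0, MXL bit 8, and custom bits 63:48. The function name `decode_capabilities` is hypothetical.

```python
def decode_capabilities(value: int) -> dict:
    """Split a 64-bit capabilities value into its documented fields."""
    ver = value & 0xFF                      # VER: bits 7:0
    return {
        "major": (ver >> 4) & 0xF,          # upper nibble of VER
        "minor": ver & 0xF,                 # low nibble of VER
        "mxl": (value >> 8) & 0x1,          # MXL: 1 -> XLEN=64 MTT modes
        "custom": (value >> 48) & 0xFFFF,   # custom: bits 63:48
    }
```

For instance, a value whose low byte is 0x10 decodes to major version 1 and minor version 0, matching the example in the text.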

Control register (control)

The control register is used to control classification of DMA requests using the identifiers associated with the DMA requests to determine the associated supervisor domain ID (SDID) and the MTT pointer (MTTP).

Control register (control)
{reg: [
  {bits:  8, name: 'OP (WARL)'},
  {bits: 16, name: 'RULEID (WARL)'},
  {bits:  8, name: 'WPRI'},
  {bits:  7, name: 'STATUS (RO)'},
  {bits:  1, name: 'BUSY (RO)'},
  {bits: 24, name: 'WPRI'}
], config:{lanes: 8, hspace:1024}}

The OP field instructs the IOMTTCHK to perform one of the operations listed in I/O MTT checker operations (OP). The RULEID field is the identifier of the rule in the SDCL function to operate on. A RULEID value of 0 indicates that the operation applies to all rules and is supported only if explicitly specified by an operation.

Table 2. I/O MTT checker operations (OP)
|===
| Operation | Encoding | Description

| -         | 0       | Reserved for future standard use.
| SET_ENTRY | 1       | Configure the rule identified by RULEID with the operands specified in the operand-0 and operand-1 registers.
| GET_ENTRY | 2       | Read the configuration of the rule identified by RULEID. On successful completion of the operation, the operand-0 and operand-1 registers hold the current configuration of the rule. If the operation is not successful, then the contents of operand-0 and operand-1 are UNSPECIFIED.
| MTTINVAL  | 3       | Invalidate MTT entries from an MTT cache. The operation may be requested to invalidate all entries of an MTT cache or to invalidate entries corresponding to an address range specified in operand-1.
| IOFENCE   | 4       | Request that IOMTTCHK ensure that all previous read and write requests from devices that have already been processed by IOMTTCHK are committed to a global ordering point such that they can be observed by all RISC-V harts, IOMMUs, and devices in the system.
| -         | 5-127   | Reserved for future standard use.
| -         | 128-255 | Designated for custom use.
|===
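As a sketch of how software might form the value written to the control register, the helper below places OP in bits 7:0 and RULEID in bits 23:8, assuming the register diagram lists fields starting from bit 0. The constant names and `pack_control` are hypothetical; the encodings come from Table 2.

```python
# Operation encodings from Table 2.
OP_SET_ENTRY = 1
OP_GET_ENTRY = 2
OP_MTTINVAL = 3
OP_IOFENCE = 4


def pack_control(op: int, ruleid: int) -> int:
    """Build a control register write value; read-only and WPRI bits stay 0."""
    if not 0 <= op <= 0xFF:
        raise ValueError("OP is an 8-bit field")
    if not 0 <= ruleid <= 0xFFFF:
        raise ValueError("RULEID is a 16-bit field")
    return op | (ruleid << 8)
```

Since OP and RULEID are WARL fields, software that needs to know the value actually in effect would read the register back after the operation completes.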

When the control register is written, IOMTTCHK may need to perform several actions that may not complete synchronously with the write. A write to the control register sets the BUSY bit to 1, indicating that IOMTTCHK is performing the requested operation. The behavior of writing the control register when the BUSY bit is 1 is UNSPECIFIED: some implementations may ignore the second write and others may perform the operation determined by the second write. Software must verify that BUSY is 0 before writing the control register.

Note

An implementation that can always perform the requested operation synchronously with the write to control register may hardwire the BUSY field to 0.

When the BUSY bit reads 0 the operation is complete and the STATUS field provides a status value (control.STATUS field encodings) of the requested operation.

Table 3. control.STATUS field encodings
|===
| STATUS | Description

| 0       | Reserved
| 1       | Operation was successfully completed.
| 2       | Invalid operation (OP) requested.
| 3       | Operation requested for invalid RULEID.
| 4       | Illegal/invalid operand encodings used.
| 5-127   | Reserved for future standard use.
| 128-255 | Designated for custom use.
|===
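The request and completion protocol described above (check BUSY, write control, poll until BUSY reads 0, then inspect STATUS) can be sketched as follows. The accessors `read64` and `write64`, the `max_polls` bound, and the function names are assumptions made for illustration; the text does not define a timeout. Field positions assume the register diagram lists fields starting from bit 0, placing STATUS in bits 38:32 and BUSY in bit 39.

```python
CONTROL_OFFSET = 8          # from Table 1
STATUS_SUCCESS = 1          # from Table 3


def decode_control(value: int) -> dict:
    """Extract the documented fields of the control register."""
    return {
        "op": value & 0xFF,
        "ruleid": (value >> 8) & 0xFFFF,
        "status": (value >> 32) & 0x7F,
        "busy": (value >> 39) & 0x1,
    }


def run_operation(read64, write64, op: int, ruleid: int, max_polls: int = 1000) -> int:
    """Issue one operation and return the STATUS value once BUSY reads 0."""
    # Software must verify that BUSY is 0 before writing control.
    if decode_control(read64(CONTROL_OFFSET))["busy"]:
        raise RuntimeError("BUSY is set; refusing to write control")

    write64(CONTROL_OFFSET, (op & 0xFF) | ((ruleid & 0xFFFF) << 8))

    # Poll until the operation completes; the bound is a sketch-only safeguard.
    for _ in range(max_polls):
        ctl = decode_control(read64(CONTROL_OFFSET))
        if not ctl["busy"]:
            return ctl["status"]
    raise TimeoutError("operation did not complete within max_polls reads")
```

A caller would compare the returned value against `STATUS_SUCCESS` and treat any other value as a failure according to Table 3.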

Before requesting the SET_ENTRY operation using the control register, software should program the fields of the operand-0 and operand-1 registers. The SET_ENTRY operation utilizes the following fields from the operand-0 and operand-1 registers: SRC_IDT, SRC_IDM, TEE_FLT, SRC_ID, IOMMU_ID, SDID, MTT_MODE and PPN.

If multiple rules are programmed to match a transaction, the implementation may act based on any one of those matching rules. However, if a transaction does not match any of the rules, the IO Bridge is notified of this condition. The subsequent behavior of the IO Bridge for unmatched transactions remains UNSPECIFIED.

The GET_ENTRY operation ignores the contents of both the operand-0 and operand-1 registers. If the GET_ENTRY operation is unsuccessful, the contents of these registers are UNSPECIFIED. Upon a successful GET_ENTRY operation, the fields listed for SET_ENTRY reflect the configuration of the rule identified by control.RULEID. The state of any unlisted fields in the operand-0 and operand-1 registers is UNSPECIFIED.

The contents of operand-0 and operand-1 are disregarded by the IOFENCE operation.

The MTTINVAL operation ignores the contents of operand-0 register but utilizes the following fields from the operand-1 register: PPNV, PPN and S.

Operand 0 register (operand-0)

The operand-0 register holds the input operands or the output results of operations requested through control.OP.

Operand-0 register (operand-0)
{reg: [
  {bits:  4, name: 'SRC_IDT (WARL)'},
  {bits:  2, name: 'SRC_IDM (WARL)'},
  {bits:  2, name: 'TEE_FLT (WARL)'},
  {bits: 24, name: 'SRC_ID'},
  {bits:  8, name: 'IOMMU_ID (WARL)'},
  {bits:  8, name: 'SDID (WARL)'},
  {bits:  4, name: 'SRL'},
  {bits:  4, name: 'SML'},
  {bits:  4, name: 'SQRID'},
  {bits:  4, name: 'WPRI'}
], config:{lanes: 8, hspace:1024}}

The SRC_IDT field identifies the type of identifier from the DMA transaction used by this classification rule. The SRC_IDT encodings are listed in operand-0.SRC_IDT field encodings.

Table 4. operand-0.SRC_IDT field encodings
|===
| SRC_IDT | Description

| 0      | None. This rule does not match any incoming transaction. All other fields of the operand-0 and operand-1 registers are ignored if control.OP is SET_ENTRY. All other fields of the operand-0 and operand-1 registers are UNSPECIFIED if control.OP is GET_ENTRY.
| 1      | Filter by device ID. The device ID is specified in the SRC_ID field and may be up to 24 bits wide.
| 2      | Filter by PCIe IDE stream ID and PCIe segment ID. The IDE stream ID is specified in bits 7:0 of the SRC_ID field and the segment ID in bits 15:8 of the SRC_ID field. Bits 23:16 of the SRC_ID field are ignored.
| 3 - 7  | Reserved for future standard use.
| 8 - 15 | Designated for custom use.
|===

Note

In PCIe systems, an originating device can be pinpointed using a unique 16-bit identifier. This identifier is a composite of the PCI bus number (8 bits), device number (5 bits), and function number (3 bits), collectively referred to as the routing identifier or RID. In scenarios where an IOMMU manages multiple hierarchies, there’s also an optional segment number, which can be up to 8 bits. Each hierarchy in this context represents a distinct PCI Express I/O interconnect topology. Here, the Configuration Space addresses, which are delineated by the Bus, Device, and Function number tuple, remain distinct. Sometimes, the term Hierarchy is synonymous with Segment. Especially when in Flit Mode, the Segment number can be part of a Function’s ID.
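The identifier layouts described in Table 4 and in the note above can be sketched as simple bit-packing helpers. The function names are hypothetical. Placing the segment number in bits 23:16 of the 24-bit device ID is an assumption consistent with the seg:bus:dev:func ordering used elsewhere in this section.

```python
def pcie_rid(bus: int, dev: int, func: int) -> int:
    """16-bit routing ID: bus (8 bits), device (5 bits), function (3 bits)."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("bus/dev/func out of range")
    return (bus << 8) | (dev << 3) | func


def device_id(segment: int, bus: int, dev: int, func: int) -> int:
    """24-bit device ID for SRC_IDT=1: segment above the 16-bit RID."""
    if not 0 <= segment < 256:
        raise ValueError("segment out of range")
    return (segment << 16) | pcie_rid(bus, dev, func)


def ide_src_id(stream_id: int, segment: int) -> int:
    """SRC_ID for SRC_IDT=2: IDE stream ID in bits 7:0, segment in bits 15:8."""
    if not (0 <= stream_id < 256 and 0 <= segment < 256):
        raise ValueError("stream_id/segment out of range")
    return (segment << 8) | stream_id
```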

The SRC_IDM field configures the SRC_ID matching mode for transactions. The SRC_IDM encodings are listed in operand-0.SRC_IDM field encodings.

Table 5. operand-0.SRC_IDM field encodings
|===
| SRC_IDM | Description

| 0 | Reserved for future standard use.
| 1 | Unary. If Unary is selected, then this rule matches if all the bits of the source ID of the transaction match the value configured in the SRC_ID field.
| 2 | NAPOT. If NAPOT is selected, then the rule matches a naturally aligned power-of-two range of source IDs. In this mode, the lower bits of the SRC_ID, up to and including the first low-order zero bit, are masked; the unmasked bits are compared with the corresponding bits in the source ID of the transaction to match.
| 3 | TOR. If TOR (Top-Of-Range) is selected, the SRC_ID field forms the top of a range of source IDs. If rule r's SRC_IDM is set to TOR, the rule matches any source ID s if s is greater than or equal to the SRC_ID of rule r-1 and less than the SRC_ID of rule r. If r is 0, then zero is used as the lower bound. If the SRC_ID of rule r-1 is greater than or equal to that of rule r and TOR is selected for rule r, then rule r does not match any source ID.
|===
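The three matching modes in Table 5 can be modelled with a few small predicates. This is a sketch with hypothetical names. The case where all 24 bits of SRC_ID are 1 in NAPOT mode is not addressed by the text, so the model simply masks every bit there, which is an assumption.

```python
SRC_ID_BITS = 24
SRC_ID_MASK = (1 << SRC_ID_BITS) - 1


def match_unary(rule_src_id: int, s: int) -> bool:
    """Unary: every bit of the source ID must equal SRC_ID."""
    return s == rule_src_id


def napot_compare_mask(rule_src_id: int) -> int:
    """Mask of the bits that are compared in NAPOT mode."""
    x = 0
    while x < SRC_ID_BITS and (rule_src_id >> x) & 1:
        x += 1                      # x = position of first low-order zero bit
    masked = min(x + 1, SRC_ID_BITS)  # mask up to and including that bit
    return SRC_ID_MASK & ~((1 << masked) - 1)


def match_napot(rule_src_id: int, s: int) -> bool:
    """NAPOT: compare only the unmasked high-order bits."""
    mask = napot_compare_mask(rule_src_id)
    return (s & mask) == (rule_src_id & mask)


def match_tor(prev_src_id: int, rule_src_id: int, s: int) -> bool:
    """TOR: prev_src_id <= s < rule_src_id; an empty range matches nothing."""
    return prev_src_id <= s < rule_src_id
```

For rule 0 in TOR mode, a caller would pass 0 as `prev_src_id`, following the lower-bound rule in the table.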

Note

The following example illustrates the use of SRC_IDM when SRC_IDT selects filtering by device ID and a 24-bit PCIe device_id composed of the segment, bus, device, and function numbers is used. The first row uses Unary matching and the remaining rows use NAPOT matching. In the table below, y acts as a placeholder representing any 1-bit value.

Table 6. SRC_IDM with SRC_IDT set to DEVID based filtering
|===
| SRC_IDM | SRC_ID | Comment

| 1 (Unary) | yyyyyyyy yyyyyyyy yyyyyyyy | One specific seg:bus:dev:func
| 2 (NAPOT) | yyyyyyyy yyyyyyyy yyyyy011 | seg:bus:dev - any func
| 2 (NAPOT) | yyyyyyyy yyyyyyyy 01111111 | seg:bus - any dev:func
| 2 (NAPOT) | yyyyyyyy 01111111 11111111 | seg - any bus:dev:func
|===
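The bit patterns in Table 6 that end in a 0 followed by a run of 1s can be generated mechanically. The helper below is a hypothetical sketch: given a naturally aligned base ID and the number of low-order bits to ignore, it produces the SRC_ID value whose low bits are a 0 followed by 1s, as in those rows.

```python
def napot_src_id(base: int, ignored_bits: int) -> int:
    """Encode a NAPOT SRC_ID covering 2**ignored_bits IDs starting at base.

    ignored_bits=3 ignores the function number, 8 ignores dev:func,
    and 16 ignores bus:dev:func of a 24-bit seg:bus:dev:func identifier.
    """
    if not 1 <= ignored_bits <= 24:
        raise ValueError("ignored_bits must be in 1..24")
    if base & ((1 << ignored_bits) - 1):
        raise ValueError("base is not naturally aligned to the range")
    # Low bits become 0 followed by (ignored_bits - 1) ones.
    return base | ((1 << (ignored_bits - 1)) - 1)
```

The smallest range expressible this way covers two IDs (`ignored_bits=1`), because the first low-order zero bit is itself masked.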

The TEE_FLT field may be used to filter transactions associated with a Trusted Execution Environment (TEE). The encodings for the TEE_FLT field can be found in operand-0.TEE_FLT field encodings.

Table 7. operand-0.TEE_FLT field encodings
|===
| TEE_FLT | Description

| 0 | Reserved for future standard use.
| 1 | Rule matches TEE-associated transactions.
| 2 | Rule matches transactions that are not TEE associated.
| 3 | Rule matches both TEE-associated and non-TEE associated transactions.
|===

Note

PCIe IDE provides security for transactions from one Port to another. These transactions might be initiated by contexts within the device, such as an SR-IOV virtual function, which are associated with a Trusted Execution Environment (TEE). Within the IDE TLP header, there is a "T" bit that helps differentiate transactions related to a TEE. The TEE_FLT filter can be employed to associate these TEE-related transactions with a different supervisor domain than the transactions not related to a TEE. This distinction is made even if both types of transactions are received on the same PCIe IDE stream.
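The TEE_FLT encodings in Table 7 amount to a small predicate over whether a transaction is TEE-associated. The sketch below uses a hypothetical function name and treats the reserved encoding 0 as an error, which is a choice made for the sketch rather than behavior stated in the text.

```python
def tee_flt_matches(tee_flt: int, is_tee: bool) -> bool:
    """Return True if a rule with this TEE_FLT value accepts the transaction."""
    if tee_flt == 1:
        return is_tee            # TEE-associated transactions only
    if tee_flt == 2:
        return not is_tee        # non-TEE transactions only
    if tee_flt == 3:
        return True              # both kinds
    raise ValueError("TEE_FLT encoding is reserved")
```

Two rules that share a SRC_ID but use TEE_FLT values 1 and 2 could then carry different SDID values, which is the use described in the note above.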

The IOMMU_ID field identifies the instance of the IOMMU that should be used to provide address translation and protection for the transactions matching this rule.

The SDID field identifies the supervisor domain whose memory is accessed by this transaction.

The SRL and SML fields, along with the operand-1.SSM field, are used to determine the effective RCID and MCID provided by the IOMMU for device-originated requests. The determination of the effective RCID and MCID is as specified by [SMQOSID]. The SQRID field identifies the QRI for requests originating from the devices associated with the SD and accompanies the RCID and MCID in the requests made by the device to the QRI.
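A sketch of packing the operand-0 fields into a 64-bit value follows. It assumes the register diagram lists fields starting from bit 0, so SRC_IDT lands in bits 3:0 and SQRID in bits 59:56. The function name `pack_operand0` is hypothetical.

```python
def pack_operand0(src_idt: int, src_idm: int, tee_flt: int, src_id: int,
                  iommu_id: int, sdid: int,
                  srl: int = 0, sml: int = 0, sqrid: int = 0) -> int:
    """Pack operand-0 fields in diagram order, lowest bits first."""
    fields = [
        (src_idt, 4), (src_idm, 2), (tee_flt, 2), (src_id, 24),
        (iommu_id, 8), (sdid, 8), (srl, 4), (sml, 4), (sqrid, 4),
    ]
    value = 0
    shift = 0
    for field_value, width in fields:
        if not 0 <= field_value < (1 << width):
            raise ValueError("field value does not fit in its width")
        value |= field_value << shift
        shift += width
    return value  # the top 4 bits (WPRI) are left as 0
```

For example, a device-ID rule (SRC_IDT=1) in NAPOT mode (SRC_IDM=2) that accepts both TEE and non-TEE traffic (TEE_FLT=3) has a low byte of 0xE1.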

Operand 1 register (operand-1)

The operand-1 register holds the input operands or the output results of operations requested through control.OP.

Operand-1 register (operand-1)
{reg: [
  {bits:  4, name: 'MTT_MODE (WARL)'},
  {bits:  1, name: 'PPNV (WARL)'},
  {bits:  1, name: 'S (WARL)'},
  {bits:  1, name: 'SSM'},
  {bits:  3, name: 'WPRI'},
  {bits: 44, name: 'PPN'},
  {bits: 10, name: 'WPRI'}
], config:{lanes: 8, hspace:1024}}

The MTT_MODE field identifies the mode of the MTT. It is interpreted as outlined in [mtt-32] when capabilities.MXL is 0, and as detailed in [mtt-64] otherwise. The MTT_MODE field is programmed into the rule identified by RULEID via the SET_ENTRY operation and can be retrieved by the GET_ENTRY operation. Both the IOFENCE and MTTINVAL operations disregard the MTT_MODE field.

The PPN field programs the PPN of the root page of the MTT during the SET_ENTRY operation and is retrieved by the GET_ENTRY operation. The IOFENCE operation disregards this field.
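The operand-1 value for a SET_ENTRY operation can be sketched as below. The layout assumes the register diagram lists fields starting from bit 0 (MTT_MODE in bits 3:0, PPNV bit 4, S bit 5, SSM bit 6, PPN in bits 53:10), and `root_ppn` assumes 4 KiB pages so that the PPN is the physical address shifted right by 12. Both function names are hypothetical.

```python
def root_ppn(phys_addr: int) -> int:
    """PPN of a 4 KiB-aligned MTT root page (assumes 4 KiB pages)."""
    if phys_addr & 0xFFF:
        raise ValueError("root page address must be 4 KiB aligned")
    return phys_addr >> 12


def pack_operand1(mtt_mode: int = 0, ppnv: int = 0, s: int = 0,
                  ssm: int = 0, ppn: int = 0) -> int:
    """Pack operand-1 fields; WPRI bits are left as 0."""
    if not 0 <= mtt_mode < 16:
        raise ValueError("MTT_MODE is a 4-bit field")
    if ppnv not in (0, 1) or s not in (0, 1) or ssm not in (0, 1):
        raise ValueError("PPNV, S and SSM are 1-bit fields")
    if not 0 <= ppn < (1 << 44):
        raise ValueError("PPN is a 44-bit field")
    return mtt_mode | (ppnv << 4) | (s << 5) | (ssm << 6) | (ppn << 10)
```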

The MTTINVAL operation uses the PPNV field to determine whether the PPN field is valid. If the PPNV field is 0, the MTTINVAL operation affects all entries from the MTT associated with RULEID. If the PPNV field is 1, the operation acts on the PPN range specified by the PPN and S fields, and the S field sets the size of that range. With an S field value of 0, the range size is 4 KiB. With an S field value of 1, the range is a NAPOT range determined by the low-order bits of the PPN field, up to and including the first low-order 0 bit. If the position of the first low-order 0 bit is denoted x, the size of the range is (1 << (12 + x + 1)) bytes.
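The range computation for MTTINVAL can be worked through in a short sketch. The function name and the `(base, size)` return convention are hypothetical. A PPN with no 0 bit at all is not covered by the text, so the sketch rejects it.

```python
PPN_BITS = 44


def mttinval_range(ppnv: int, s: int, ppn: int):
    """Return None for 'all entries', else (base_address, size_in_bytes)."""
    if not ppnv:
        return None                       # PPN is not valid: all entries
    if not s:
        return (ppn << 12, 1 << 12)       # a single 4 KiB page

    # Find x, the position of the first low-order 0 bit of PPN.
    x = 0
    while x < PPN_BITS and (ppn >> x) & 1:
        x += 1
    if x == PPN_BITS:
        raise ValueError("PPN has no 0 bit; not covered by this sketch")

    size = 1 << (12 + x + 1)
    # Clear the masked low-order bits (positions 0..x) to get the base PPN.
    base_ppn = ppn & ~((1 << (x + 1)) - 1)
    return (base_ppn << 12, size)
```

With S set to 1 and a PPN whose lowest bit is 0, x is 0 and the range is 8 KiB; each additional trailing 1 bit doubles the size.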