Confidential Computing Notes
- Repo: ~/learning/gpucc (local)
- Hardware: AMD EPYC (SEV-SNP), Intel Xeon (TDX), NVIDIA H100/A100
What is Confidential Computing?
- Running workloads in hardware-enforced trusted execution environments (TEEs)
- The host/hypervisor cannot read or tamper with guest memory
- Attestation: cryptographic proof that the workload is running in a genuine TEE with expected code
AMD SEV-SNP
- SEV = Secure Encrypted Virtualization
- SNP = Secure Nested Paging (latest generation)
- Each VM gets its own encryption key managed by a dedicated security processor (PSP)
- Memory pages are encrypted transparently – the guest OS doesn’t need modification
- Attestation via the /dev/sev-guest ioctl:
  - Send 64 bytes of user data (challenge/nonce)
  - Get back a signed report from the AMD PSP (delivered in a 4000-byte response buffer)
  - Report contains measurement of the VM, policy, platform info
struct snp_report_req {
    uint8_t  user_data[64]; // your challenge/nonce
    uint32_t vmpl;          // VM privilege level (0 = most privileged)
    uint8_t  rsvd[28];      // reserved, must be zero
};
// The signed report comes back separately in struct snp_report_resp;
// issue the request with ioctl(fd, SNP_GET_REPORT, ...) on /dev/sev-guest
- Kernel params needed: mem_encrypt=on iommu=pt amd_iommu=on kvm_amd.sev=1 kvm_amd.sev_snp=1
- QEMU launch requires -object sev-snp-guest with cbitpos and policy set
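A hedged sketch of the QEMU invocation: option names follow recent QEMU releases; cbitpos is CPU-specific (commonly 51 on recent EPYC parts, read it from CPUID leaf 0x8000001F), the policy bits depend on what the guest owner allows, and paths/IDs are placeholders for your setup:

```shell
# Sketch only -- adjust cbitpos, policy, firmware path, and disk image
qemu-system-x86_64 \
  -enable-kvm -machine q35,confidential-guest-support=snp0 \
  -object sev-snp-guest,id=snp0,cbitpos=51,reduced-phys-bits=1 \
  -cpu EPYC-v4 -m 8G \
  -bios /usr/share/ovmf/OVMF.fd \
  -drive file=guest.qcow2,if=virtio
```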
Intel TDX
- TDX = Trust Domain Extensions
- Similar concept to SEV-SNP but Intel’s approach
- Uses a TDX Module running in a new CPU mode (SEAM)
- Each TD (Trust Domain) has isolated memory, CPU state
- Attestation via Intel’s SGX-style quoting infrastructure
- Colleague's take (they have a PhD): Intel hires lots of PhDs and implements everything, so it ends up overcomplicated; AMD's stack tends to be simpler and just works
GPU Confidential Computing
- NVIDIA H100 supports CC mode – GPU memory is encrypted
- GPU gets its own attestation report (firmware measurement, nonce, signature)
- Combined attestation: SEV-SNP report (VM integrity) + GPU report (GPU integrity)
- VFIO passthrough to give the confidential VM direct GPU access:
# unbind from host driver, bind to vfio-pci ($GPU_PCI_ID is the full
# PCI address, e.g. 0000:65:00.0)
echo "$GPU_PCI_ID" > "/sys/bus/pci/devices/$GPU_PCI_ID/driver/unbind"
echo "vfio-pci" > "/sys/bus/pci/devices/$GPU_PCI_ID/driver_override"
echo "$GPU_PCI_ID" > /sys/bus/pci/drivers/vfio-pci/bind
- The whole pipeline: host setup -> guest image -> launch VM with GPU passthrough -> run CUDA inside confidential VM
Attestation Flow
- Verifier sends a nonce/challenge
- Guest requests attestation from SEV-SNP (CPU-level) and GPU
- Both return signed reports containing:
- Platform identity
- Firmware/code measurements
- The nonce (proves freshness)
- Verifier checks signatures against known-good root of trust
- If valid, verifier trusts the computation results
Workshop/Lab Ideas
- TDX vs SEV-SNP comparison: setup, attestation, performance
- GPU CC hello world: multiply two primes inside encrypted GPU memory, verify via attestation
- Remote attestation demo: client sends challenge, CC VM responds with proof