Confidential Computing Notes
- Repo: ~/learning/gpucc (local)
- Hardware: AMD EPYC (SEV-SNP), Intel Xeon (TDX), NVIDIA H100/A100
What is Confidential Computing?
- Running workloads in hardware-enforced trusted execution environments (TEEs)
- The host/hypervisor cannot read or tamper with guest memory
- Attestation: cryptographic proof that the workload is running in a genuine TEE with expected code
AMD SEV-SNP
- SEV = Secure Encrypted Virtualization
- SNP = Secure Nested Paging (latest generation)
- Each VM gets its own encryption key managed by a dedicated security processor (PSP)
- Memory pages are encrypted transparently – the guest OS doesn’t need modification
- Attestation via the /dev/sev-guest ioctl:
  - Send 64 bytes of user data (challenge/nonce)
  - Get back a signed report from the AMD PSP (delivered in a 4000-byte response buffer)
  - Report contains measurement of the VM, policy, platform info
struct snp_report_req {
    uint8_t  user_data[64]; // your challenge/nonce
    uint32_t vmpl;          // VM privilege level (0 = most privileged)
    uint8_t  rsvd[28];      // reserved, must be zero
};
// The signed report comes back separately in struct snp_report_resp;
// issue the request with ioctl(fd, SNP_GET_REPORT, ...) on /dev/sev-guest
- Kernel params needed: mem_encrypt=on iommu=pt amd_iommu=on kvm_amd.sev=1 kvm_amd.sev_snp=1
- QEMU launch requires -object sev-snp-guest with cbitpos and policy set
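A hedged sketch of the QEMU invocation: option names follow recent QEMU releases; cbitpos is CPU-specific (commonly 51 on recent EPYC parts, read it from CPUID leaf 0x8000001F), the policy bits depend on what the guest owner allows, and paths/IDs are placeholders for your setup:

```shell
# Sketch only -- adjust cbitpos, policy, firmware path, and disk image
qemu-system-x86_64 \
  -enable-kvm -machine q35,confidential-guest-support=snp0 \
  -object sev-snp-guest,id=snp0,cbitpos=51,reduced-phys-bits=1 \
  -cpu EPYC-v4 -m 8G \
  -bios /usr/share/ovmf/OVMF.fd \
  -drive file=guest.qcow2,if=virtio
```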
Intel TDX
- TDX = Trust Domain Extensions
- Similar concept to SEV-SNP but Intel’s approach
- Uses a TDX Module running in a new CPU mode (SEAM)
- Each TD (Trust Domain) has isolated memory, CPU state
- Attestation via Intel’s SGX-style quoting infrastructure
- Colleague's take (they have a PhD): Intel hires lots of PhDs and implements everything, so it ends up overcomplicated; AMD's stack tends to be simpler and just works
GPU Confidential Computing
- NVIDIA H100 supports CC mode – GPU memory is encrypted
- GPU gets its own attestation report (firmware measurement, nonce, signature)
- Combined attestation: SEV-SNP report (VM integrity) + GPU report (GPU integrity)
- VFIO passthrough to give the confidential VM direct GPU access:
# unbind from host driver, bind to vfio-pci ($GPU_PCI_ID is the full
# PCI address, e.g. 0000:65:00.0)
echo "$GPU_PCI_ID" > "/sys/bus/pci/devices/$GPU_PCI_ID/driver/unbind"
echo "vfio-pci" > "/sys/bus/pci/devices/$GPU_PCI_ID/driver_override"
echo "$GPU_PCI_ID" > /sys/bus/pci/drivers/vfio-pci/bind
- The whole pipeline: host setup -> guest image -> launch VM with GPU passthrough -> run CUDA inside confidential VM
Attestation Flow
- Verifier sends a nonce/challenge
- Guest requests attestation from SEV-SNP (CPU-level) and GPU
- Both return signed reports containing:
- Platform identity
- Firmware/code measurements
- The nonce (proves freshness)
- Verifier checks signatures against known-good root of trust
- If valid, verifier trusts the computation results
Workshop/Lab Ideas
- TDX vs SEV-SNP comparison: setup, attestation, performance
- GPU CC hello world: multiply two primes inside encrypted GPU memory, verify via attestation
- Remote attestation demo: client sends challenge, CC VM responds with proof