We recently got a batch of old PCs in at work for a drive-wiping project, and several of them were Mac Pro 6,1s - the affectionately-dubbed “trash can” models from 2013. They have dual GPUs and I figured they would be well-suited to running small to medium local AI models. I got my hands on one and decided to try to get the basics running over this past long weekend.

I thought the hard part would be getting the model running and fiddling with the parameters - assuming I could install Linux in the first place, I figured the GPU wouldn’t cause me any grief. Little did I know…

Installing Debian

I went with my tried and true server OS first, Debian. It took over 12 hours to install. I honestly don’t know why this was the case, and I didn’t investigate, but it did raise an immediate red flag.

After it installed, I figured it would start running more smoothly, at least after I fiddled with some drivers and kernel modules. It didn’t. In fact, I couldn’t even get a display manager to load.

Given the issues I’d already encountered, I moved on to Proxmox.

Installing Proxmox

I figured Proxmox would have better driver support, if nothing else. I think I was right at least in a sense, because the installation process was dead simple and took about 20 minutes.

I hadn’t given much thought to post-installation steps, but when I did, I realized I’d need to set up GPU passthrough so that the guest VM could utilize the dual GPUs. I figured it couldn’t be that bad. I’d passed through USB ports before.

Again, little did I know…

Attempting GPU passthrough

References

Here are some links (I’m not sure if I could call them helpful given the result of this project, but I used these nonetheless):

Useful commands learned

I never really know a command or a flag until I’ve used it extensively for a project. This was the case with lspci and grep’s -C (context, as I think of it) flag. If nothing else, I’ve leveled up in that regard, as well as with various other hardware commands and with grepping dmesg output. (I’m trying to look on the bright side here.)

  • lspci -v: display verbose information about PCI devices. Useful when combined with grep.
  • grep -C 10: show 10 lines of context before and after the matched line when grepping output.
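To make the -C flag concrete, here is a small sketch. The show_context function name is my own invention for illustration, not a standard tool; it just wraps grep with case-insensitive matching and a default of 10 context lines.

```shell
# show_context PATTERN [LINES]
# Hypothetical helper: case-insensitive grep over stdin that prints LINES
# lines of context before and after each match (default 10).
show_context() {
    grep -i -C "${2:-10}" -- "$1"
}

# Harmless demonstration on synthetic input: with 1 line of context around
# the match on "three", this prints the lines "two", "three" and "four".
printf '%s\n' one two three four five | show_context THREE 1
```

On real hardware the same helper would be used as lspci -v | show_context tahiti.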

Following guides

I’m going to write down the steps that ultimately ended up passing through the GPU, despite the fact that I was unable to get the guest machine to load it. If I ever want to revisit this, I won’t have to go through this headache again.

Setting grub boot cmdline flags

  • Edit /etc/default/grub and add intel_iommu=on and iommu=pt to the GRUB_CMDLINE_LINUX_DEFAULT line (no, I still don’t know what IOMMU means)
  • Regenerate grub config with update-grub
  • Reboot, verify that IOMMU is enabled with dmesg | grep -e DMAR -e IOMMU
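If I wanted to script the GRUB edit instead of doing it by hand, it might look something like the sketch below. The add_iommu_flags name is hypothetical, and the sketch assumes the GRUB_CMDLINE_LINUX_DEFAULT value is double-quoted on a single line, which is how the stock file is usually laid out.

```shell
# add_iommu_flags FILE
# Hypothetical helper: append intel_iommu=on and iommu=pt to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line in FILE, unless they are already there.
# Assumes the value is double-quoted and sits on one line.
add_iommu_flags() {
    file=$1
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*intel_iommu=on' "$file"; then
        return 0    # already configured; leave the file untouched
    fi
    cp "$file" "$file.bak" || return 1    # keep a backup of the original
    sed 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on iommu=pt"/' \
        "$file.bak" > "$file"
}

# Intended usage, as root (not run here):
#   add_iommu_flags /etc/default/grub && update-grub
```

Running it twice is safe, since the second call sees the flag and returns early.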

Blacklisting GPU driver from host machine

  • Add the following lines to /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist amdgpu
blacklist nvidia
blacklist nouveau
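Since I went through this step more than once, a helper that only appends lines that are missing would have saved me from duplicate entries. This is a rough sketch; append_once and blacklist_gpu_drivers are made-up names, and the default path is simply the one used above.

```shell
# append_once FILE LINE
# Hypothetical helper: add LINE to FILE only if that exact line is absent.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# blacklist_gpu_drivers [FILE]
# Writes the four blacklist entries from this post. FILE defaults to
# /etc/modprobe.d/blacklist.conf; pass another path to try it out safely.
blacklist_gpu_drivers() {
    conf=${1:-/etc/modprobe.d/blacklist.conf}
    for mod in radeon amdgpu nvidia nouveau; do
        append_once "$conf" "blacklist $mod"
    done
}
```

The same append_once helper would also work for the /etc/modules and modprobe options files in the next steps.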

Adding vfio drivers to kernel modules

  • Add the following lines to /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Adding specific kernel module options

/etc/modprobe.d/kvm.conf

options kvm ignore_msrs=1

/etc/modprobe.d/iommu_unsafe_interrupts.conf

options vfio_iommu_type1 allow_unsafe_interrupts=1

Persisting changes to kernel modules

  • update-initramfs -u
  • reboot now
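After the reboot it is worth confirming that the blacklist and the vfio modules actually took effect. The sketch below filters lsmod output; both function names are hypothetical, and depending on the kernel some vfio pieces may not show up as separate modules, so treat an incomplete vfio list as a hint rather than proof of a problem.

```shell
# host_gpu_drivers_loaded
# Reads `lsmod` output on stdin and prints any of the blacklisted GPU drivers
# that are still loaded. Exit status is 0 only if at least one was found.
host_gpu_drivers_loaded() {
    awk 'NR > 1 { print $1 }' | grep -x -E 'radeon|amdgpu|nvidia|nouveau'
}

# vfio_modules_loaded
# Reads `lsmod` output on stdin and prints every loaded module whose name
# starts with "vfio".
vfio_modules_loaded() {
    awk 'NR > 1 { print $1 }' | grep -E '^vfio'
}

# Intended usage on the Proxmox host (not run here):
#   lsmod | host_gpu_drivers_loaded && echo "a host GPU driver is still loaded"
#   lsmod | vfio_modules_loaded
```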

Getting the GPU’s PCI-ID and vendor ID

Tahiti is the model of my GPU, and 02:00.0 was its PCI-ID (the bus address at the start of each lspci line). The vendor ID is really a vendor:device pair in the format XXXX:XXXX, and for me there were four: one for each of the dual GPUs and one for each of the associated audio devices.

lspci -v | grep -C10 -i tahiti
lspci -n -s 02:00
echo options vfio-pci ids=XXXX:XXXX,XXXX:XXXX disable_vga=1 >> /etc/modprobe.d/vfio.conf
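Copying the XXXX:XXXX pairs by hand is error-prone, so here is a sketch that builds the ids= value from lspci -n output. The vfio_ids name is made up, and it assumes the usual lspci -n layout where the third whitespace-separated field is the vendor:device pair.

```shell
# vfio_ids
# Reads `lspci -n` lines on stdin (slot, class, vendor:device, ...) and prints
# the unique vendor:device pairs as a single comma-separated string.
vfio_ids() {
    awk '{ print $3 }' | sort -u | paste -sd, -
}

# Intended usage on the host, covering both GPU slots (not run here):
#   ids=$( { lspci -n -s 02:00; lspci -n -s 06:00; } | vfio_ids )
#   echo "options vfio-pci ids=$ids disable_vga=1"
```

Printing the options line first, rather than appending it straight to /etc/modprobe.d/vfio.conf, leaves a chance to eyeball the result.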

Reboot, then run lspci -v and check the “Kernel driver in use” line to confirm that the card is using the vfio-pci driver instead of the amdgpu or radeon drivers.
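To avoid scrolling through the verbose output, the driver line can be pulled out directly. A minimal sketch, with driver_in_use being a hypothetical helper name:

```shell
# driver_in_use
# Reads `lspci -v` output on stdin and prints the value of every
# "Kernel driver in use:" line, one per device function.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Intended usage on the host (not run here):
#   lspci -v -s 02:00 | driver_in_use
```

If passthrough is set up as intended, each line printed should read vfio-pci. No output at all means no driver is bound to the device.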

Creating the VM

Follow the Reddit guide for this. I got so lost in different guides that I’m not going to try to piece everything back together.

Here’s my VM configuration after everything:

args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=scsi0;net0
cores: 4
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-100-disk-1,efitype=4m,ms-cert=2023k,pre-enrolled-keys=1,size=4M
hostpci0: 0000:02:00,pcie=1,romfile=HD7970.rom
hostpci1: 0000:06:00,pcie=1,romfile=HD7970.rom
machine: q35
memory: 49152
meta: creation-qemu=11.0.0,ctime=1781987584
name: ollama
net0: virtio=BC:24:11:0A:0A:C6,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-100-disk-0,iothread=1,size=500G
scsihw: virtio-scsi-single
smbios1: uuid=1a25fd58-f976-4f2c-b5c3-54dfb3a15489
sockets: 1
vmgenid: 6e5c9738-72a0-4988-ae07-97df66c62d09

Troubleshooting

I went through several iterations before I finally got a configuration where passthrough itself was working - the host machine was indeed passing along the GPU to the guest machine. Now I was running into Debian not loading the GPU.

Trying to get a ROM file

Several of the guides mention possibly needing a ROM file for the GPU. Based on my dmesg output on the guest machine, I needed a ROM. So I tried to get one.

I could not find one on the website, and I wasn’t even sure what GPU I had in the first place. The lspci output lists FOUR different models - HD 7970, HD 8970, Radeon R9 280X, and (as I discovered way later on) D700.

I tried several of these ROM files, to no avail. It didn’t help that I was stress testing my AI model throughout this, and it kept hallucinating a “Mac OEM” GPU ROM for this specific model. It was absolutely convinced this existed.

So I decided to try to dump the ROM. Neither amdvbflash nor the Proxmox-recommended method of cat’ing the rom file under /sys/bus/pci/devices worked - I was just getting a “Failed to read ROM” from the former, and a cat I/O error from the latter.

This took hours. I eventually realized I had made an incredibly dumb error - when I used wget to grab the ROM files from TechPowerUp, I was pulling an HTML page, and NOT a ROM file. So OF COURSE it was giving an “invalid ROM header” error on the guest machine. I had to grab one from my browser and then scp it to the Proxmox server. This is about when I realized that the FirePro D700 was listed as a subsystem (ANOTHER model name), and that’s the one I tried. I didn’t try any others, because at that point it was 2am and I was fed up.

I eventually came back and tried another ROM with the same results. It could still be the ROM file, but I’m not going to investigate further for now… I’m tired of it.
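The wget mishap above would have been caught in seconds by checking the first two bytes of the download: a PCI expansion ROM image starts with the signature bytes 0x55 0xAA, while an HTML page starts with text. Here is a small sketch of that check; looks_like_pci_rom is a made-up name, and passing this test only rules out the obvious mistake, it does not prove the ROM is the right one for the card.

```shell
# looks_like_pci_rom FILE
# Hypothetical helper: succeeds if FILE begins with the bytes 0x55 0xAA,
# the signature at the start of a PCI expansion ROM image.
looks_like_pci_rom() {
    sig=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Intended usage before pointing romfile= at a download (not run here):
#   looks_like_pci_rom HD7970.rom || echo "not a ROM image (HTML page?)"
```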

Seeing if it was an AMD vendor reset bug

See this post. The idea is to build and load a kernel module that resets the GPU after it has been passed to the guest, so that the guest can utilize it. The post contains more information.

The patch seemed to apply successfully, but it didn’t make a difference.

Where to from here

I’m going to install Ubuntu Server LTS and see where that takes me.

Note

I started the Ubuntu Server LTS install, got in the shower, got out, and it was done. The GPU hardware is detected. Dunno, at least I learned something.

EOF