I’ve wanted to replace SMB for a while now, but I’ve long fought with NFS issues in my home NAS setup. I kept running into permission issues and couldn’t figure out why. Of course, it would help if I researched and understood the NFS permission model, but I’ve learned time and again that the way I learn is by getting my hands dirty. I get it working, then later on down the line I figure out why, and things start to click. This has been especially true with Docker.

So my solution certainly isn’t the best or neatest, but it works, and I learned a couple things along the way.

Why?

The impetus for doing all this was finally understanding not just what hard links (ln without the -s) are - but why I might use them. If you already know, fine, but the key difference that made it click for me is that while a symbolic link is its own small file that stores the path to another file name, a hard link is a second file name (directory entry) pointing at the same inode, and therefore the same data blocks. It behaves like its own standalone file, but it doesn’t take twice as much space. Every regular file is really just a name pointing at an inode - we only start calling it a hard link once we’ve created a second name pointing at that inode. As long as at least one name still points to that inode, the filesystem treats the data as space that is taken on disk. So, I could download a file named file.txt, create a hard link in my home directory called link.txt, delete the original file.txt from my downloads, and still retain that file - and through the whole process, never store more than one copy of it.
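Here’s a minimal sketch of that file.txt / link.txt example, run in a throwaway temp directory so it can’t touch anything real. The demo_dir variable is just something I made up for illustration.

```shell
# Demonstrate that a hard link is a second name for the same inode.
# demo_dir is a hypothetical scratch directory created just for this demo.
demo_dir="$(mktemp -d)"

echo "hello" > "$demo_dir/file.txt"
ln "$demo_dir/file.txt" "$demo_dir/link.txt"   # no -s, so this is a hard link

# Both names show the same inode number and a link count of 2
ls -li "$demo_dir"

# -ef is true when both paths refer to the same inode on the same device
if [ "$demo_dir/file.txt" -ef "$demo_dir/link.txt" ]; then
    echo "same inode"
fi

rm "$demo_dir/file.txt"      # remove the original name
cat "$demo_dir/link.txt"     # the data is still reachable through link.txt
```

The first column of the ls -li output is the inode number, and the number after the permissions is how many names point at it.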

Ok, cool, but why would you use them? Let’s say you have a music library, and you want one folder structure organized as a flat directory of songs, another as a nested hierarchy of artists > albums > songs, and another as a listing of just albums, with directory names in the format “Artist - Album”. With hard links, you can do all of this without storing any song twice!
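As a rough sketch of how that could look, the loop below builds a flat view and an “Artist - Album” view from an artist/album/song tree. The directory names (by-artist, flat, by-album) and the sample track are made up for illustration, and everything has to live on the same filesystem for ln to work.

```shell
# Build two extra "views" of a music library using hard links.
# Hypothetical layout: $lib/by-artist/<Artist>/<Album>/<song>
lib="$(mktemp -d)"
mkdir -p "$lib/by-artist/Some Artist/Some Album"
echo "audio" > "$lib/by-artist/Some Artist/Some Album/01 - Track.flac"

for song in "$lib"/by-artist/*/*/*; do
    [ -f "$song" ] || continue
    album_dir="$(dirname "$song")"
    album="$(basename "$album_dir")"
    artist="$(basename "$(dirname "$album_dir")")"
    name="$(basename "$song")"

    # Flat view: every song in one directory.
    # ln refuses to overwrite, so two songs with the same file name would collide here.
    mkdir -p "$lib/flat"
    ln "$song" "$lib/flat/$name"

    # Album view: one directory per "Artist - Album"
    mkdir -p "$lib/by-album/$artist - $album"
    ln "$song" "$lib/by-album/$artist - $album/$name"
done

find "$lib" -type f | sort
```

All three paths for a song are the same file on disk, so deleting it from one view leaves it intact in the others.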

I tried it on SMB, and I immediately ran into an error. Invalid cross-device link something or other. I figured SMB didn’t support it, and I used this as my excuse to jump into the migration.

Had I paused for a second, I would have realized that I had tried to make a hard link from one SMB share to another, which wouldn’t have worked on NFS either - and in fact didn’t work when I tried it after the migration. So, I ended up doing a full-blown restructure of my NAS shares (again) on top of the migration to NFS.
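In hindsight, a quick check like the one below would have told me whether a hard link between two directories could work at all. It’s a sketch that assumes GNU stat on a Linux client, and same_fs is a helper name I made up. Two separately mounted shares report different device IDs, which is exactly the cross-device situation.

```shell
# same_fs A B: succeed if A and B live on the same filesystem (same device ID).
# Assumes GNU coreutils stat; same_fs is a hypothetical helper name.
same_fs() {
    [ "$(stat -c %d -- "$1")" = "$(stat -c %d -- "$2")" ]
}

# Example with a scratch directory and a subdirectory inside it
scratch="$(mktemp -d)"
mkdir "$scratch/sub"

if same_fs "$scratch" "$scratch/sub"; then
    echo "same filesystem: hard links can work"
else
    echo "different filesystems: ln will fail with a cross-device error"
fi
```

On the real system you would pass the two mount points (or the source file’s directory and the target directory) instead of the scratch paths.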

Anyway, here’s my after-the-fact justification.

Pros of switching to NFS:

  • NFS is part of the broader Linux ecosystem
  • Hard links will work (I thought)
  • I want to

Cons of switching to NFS:

  • Security (or lack thereof)
  • Troubleshooting permissions
  • My original reason was ill-founded
  • It can’t see child datasets as directory structures in TrueNAS - they are simply empty folders

Permission issues

Turns out, it was just a single setting that I needed to change (typical). The ACL mode for NFS shares is Restricted by default in TrueNAS, and you need to set it to Passthrough.

I also set the mapall user and mapall group to smbuser - a user I had previously created for my SMB shares, and that still owned those datasets. I had gotten this far in my previous troubleshooting, and I was able to create and delete files on the mounted side with non-root permissions, but I kept running into Docker errors. Turns out, the Docker errors came from its attempts to chmod files and directories, and setting the ACL mode to Passthrough fixes this.
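If you want to reproduce the problem without Docker in the mix, something like this sketch should do it: create a scratch file in the mounted directory and try to chmod it. check_chmod is a name I made up, and the mount point in the usage note is the one from my fstab.

```shell
# check_chmod DIR: create a scratch file in DIR and try to chmod it,
# which is the kind of operation Docker was failing on.
# check_chmod is a hypothetical helper, not part of any tool.
check_chmod() {
    probe="$(mktemp "$1/chmod-probe.XXXXXX")" || return 1
    if chmod 640 "$probe"; then
        echo "chmod OK in $1"
        status=0
    else
        echo "chmod FAILED in $1" >&2
        status=1
    fi
    rm -f "$probe"
    return "$status"
}
```

Usage would be something like check_chmod /nfs/media/music, run as the same non-root user the containers use.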

I still don’t want to manage groups on the server side. I want it to be a dumb storage device where anyone on the mounted side can do what is needed. Yes, defense-in-depth would imply that I need to do a full-scale rebuild, and have different Docker containers running as different users, with appropriate permissions, and in appropriate groups, but I ain’t got time for that.

Part of it is laziness, part of it is understanding my threat model (most of my exposure is through physical or local access, and I have bigger problems if that’s the case), part of it is the fact that I don’t have all my requirements built out - I do a lot of testing and fiddling, and trying to future-proof the security model would be a monumental task. Maybe a task for future me, but I’m not high enough level to do that right now.

Security

After working with SMB and iSCSI, I took it for granted that NFS would have some sort of authentication mechanism, even if it was poor or outdated. Turns out it doesn’t, at least not in any per-user sense: by default the server just trusts whatever user and group IDs the client sends. For real authentication you have to run an entirely separate Kerberos instance - so if I wanted authentication I should have stuck with SMB. Everything I read about Kerberos, both on the TrueNAS/FreeNAS forums and in the TrueNAS documentation itself, said that it shouldn’t even be attempted unless you’re a security veteran. Also, the documentation page mentions Active Directory, and I seized up involuntarily.

I can, though, bind the NFS service to just my management network, which is limited to devices connected via Tailscale or wired directly into the admin network, and that’s what I ended up doing. I also set the allowed networks on each NFS share to 10.0.99.0/24 and 10.20.99.0/24 - the management network and my dedicated storage network.

Migration

The migration itself would have been fairly straightforward, if not for one thing: I wanted to encrypt the underlying dataset. I could have just switched the share type to NFS, remounted, fiddled with permissions, edited the fstab, and gone on my merry way. Instead, I had to create a new, encrypted dataset, copy all the files over, then create the NFS share on that dataset. Also, I wanted to keep the original dataset name. There isn’t a good synonym for “media”, and as much as I can be lazy about security sometimes, I can get notably OCD in some areas. So, I had to create a new dataset with a different name, copy the files, rename the old dataset, then rename the new dataset.

Previously, I’ve copied files from the client machine where the shares are mounted. This has been remarkably dumb on my part - every file has to travel over the network twice, so copying from one SMB share to another takes orders of magnitude longer than a local copy. Luckily I realized the error of my ways this time (I had about half a terabyte to copy), and decided I would copy on the NAS itself - after all, the shares are just directories under /mnt.

The other consideration I ran into was that I needed to get rid of the child datasets - that, or create an NFS share for each one of them, and mount them each separately. I went with the former.

Here’s what I did:

  • Preparation: disable all services using shares, unmount all shares, turn off all SMB shares in TrueNAS web GUI, double check nothing is using the directories with lsof and fuser
  • Create a new dataset in the TrueNAS GUI, with encryption settings, and named media_new.
  • Copy the files in the old dataset media to the new dataset media_new, with the following command (some options for safety only):
    • rsync -aHAX /mnt/tank/media/ /mnt/tank/media_new
      • -a preserves file info such as timestamps and much else
      • -H preserves hard links (unnecessary, lol)
      • -A preserves ACLs (probably unnecessary, since the ACLs are set at the TrueNAS dataset level)
      • -X preserves extended attributes (again, I don’t think I used these, another unnecessary flag)
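  • Verify the files copied with no issues by running the same command again with -n for dry run and -i to itemize changes, i.e. rsync -aHAXni (should give no output; without -i or -v, a dry run prints nothing even when files differ)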
  • Rename the old dataset with zfs rename tank/media tank/media_old
  • Rename the new dataset with zfs rename tank/media_new tank/media
  • Create NFS share on the new dataset
  • Mount and create fstab entries
    • 10.0.99.37:/mnt/tank/media /nfs/media/music nfs rw,bg 0 0
  • Edit all docker-compose.yml files to point to new directory, start all Docker containers
  • Review any TrueNAS tasks - cloud sync, replication, snapshots - and make sure they are unaffected, fix them, or create replacements (in particular, cloud sync tasks were dropped completely on the initial rename)
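The copy, verify, and rename steps above can be sketched as one script to run on the NAS itself. This is a sketch rather than exactly what I typed: the run wrapper, the DRY_RUN variable, and the migrate_media function are my own inventions for illustration, and with DRY_RUN=1 (the default here) it only prints the commands. The verify step compares size and modification time; adding -c to rsync would compare checksums instead, at the cost of reading every file again.

```shell
# Sketch of the copy / verify / rename steps, using the dataset names above.
# DRY_RUN, run and migrate_media are hypothetical names for illustration.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_media() {
    # Copy, preserving hard links (-H), ACLs (-A) and extended attributes (-X).
    # The trailing slash on the source copies its contents, not the directory itself.
    run rsync -aHAX /mnt/tank/media/ /mnt/tank/media_new || return 1

    # Verify: rsync dry run with itemized changes; no output means nothing differs
    run rsync -aHAXni /mnt/tank/media/ /mnt/tank/media_new || return 1

    # Swap the names so the new dataset takes over the original name
    run zfs rename tank/media tank/media_old || return 1
    run zfs rename tank/media_new tank/media
}

migrate_media
```

Setting DRY_RUN=0 runs the commands for real, so it’s worth reading the printed commands first.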

One other thing I decided to pre-empt was Docker containers trying to create directories before the NFS shares were mounted. Somewhere along the line in my research I had seen this being a potential problem. Turns out I could just do this:

EDITOR=vim systemctl edit docker

File contents:

[Unit]
Requires=home-user-docker.mount nfs-media.mount
After=home-user-docker.mount nfs-media.mount

Also, the bg option in the /etc/fstab line means that if the first mount attempt fails, the mount keeps being retried in the background instead of failing and giving up - in case the NAS is not online when the initial mount happens at startup.
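The .mount unit names in that drop-in aren’t arbitrary: systemd derives them from the mount point path. A small sketch for working out the right name, assuming systemd-escape is available on the Docker host (mount_unit_for is a helper name I made up):

```shell
# mount_unit_for PATH: print the systemd mount unit name for a mount point.
# mount_unit_for is a hypothetical helper; systemd-escape ships with systemd.
mount_unit_for() {
    systemd-escape --path --suffix=mount "$1"
}

if command -v systemd-escape >/dev/null 2>&1; then
    mount_unit_for /nfs/media          # prints nfs-media.mount
    mount_unit_for /home/user/docker   # prints home-user-docker.mount
else
    echo "systemd-escape not found on this machine"
fi
```

The units listed in Requires= and After= need to match the mount points actually in /etc/fstab, and systemctl list-units --type=mount shows what systemd generated from it.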

Aftermath

Not much of note has changed - my services are now chugging along as normal, with minimal downtime all things considered. I had to do separate migrations from child datasets to subdirectories in other places in my dataset structure, and got the hang of it after a couple.
