Kernel panic when using fclonefileat from ES

Hi, I am developing an instant snapshot backup solution for macOS using Endpoint Security. We have stumbled upon a kernel panic when using the "fclonefileat" API.

We are seeing a kernel panic on customer machines when attempting to clone a file during an ES sync callback:

panic(cpu 0 caller 0xfffffe002c495508): "apfs_io_lock_exclusive : Recursive exclusive lock attempt" @fs_utils.c:435

I have symbolized the backtrace and confirmed that the panic is related to the clone operation:

apfs_io_lock_exclusive
apfs_clone_internal
apfs_vnop_clonefile

I made a minimal repro that boils down to the following operations:

  1. apfs_crash_stress - launches a thread that does rsrc writes
static void *rsrc_write_worker(void *arg)
{
  int id = (int)(long)arg;
  char buf[8192];
  long n = 0;
  
  fill_pattern(buf, sizeof(buf), 'W' + id);
  
  while (n < ITERATION_LIMIT) {
    int file_idx = n % NUM_SOURCE_FILES;
    int fd = open(g_src_rsrc[file_idx], O_WRONLY | O_CREAT, 0644);
    if (fd >= 0) {
      off_t off = ((n * 4096) % RSRC_DATA_SIZE);
      pwrite(fd, buf, sizeof(buf), off);
      if ((n & 0x7) == 0)
        fsync(fd);
      
      close(fd);
    } else {
      setxattr(g_src[file_idx], "com.apple.ResourceFork",
               buf, sizeof(buf), 0, 0);
    }
    
    n++;
  }
  printf("[rsrc_wr_%d] done (%ld ops)\n", id, n);
  return NULL;
}
  2. apfs_crash_es - simple ES client that clones the file (error checking omitted for brevity)
static std::string volfsPath(uint64_t devId, uint64_t vnodeId)
{
  return "/.vol/" + std::to_string(devId) + "/" + std::to_string(vnodeId);
}

static void cloneAndScheduleDelete(const std::string& sourcePath, dispatch_queue_t queue, uint64_t devId, uint64_t vnodeId)
{
  struct stat st;
  if (stat(sourcePath.c_str(), &st) != 0 || !S_ISREG(st.st_mode))
    return;

  int srcFd = open(sourcePath.c_str(), O_RDONLY);
  const char* cloneDir = "/Users/admin/Downloads/_clone";
  mkdir(cloneDir, 0755);

  const char* filename = strrchr(sourcePath.c_str(), '/');
  filename = filename ? filename + 1 : sourcePath.c_str();
  
  std::string cloneFilename = std::string(filename) + ".clone." + std::to_string(time(nullptr)) + "." + std::to_string(getpid());
  std::string clonePath = std::string(cloneDir) + "/" + cloneFilename;

  fclonefileat(srcFd, AT_FDCWD, clonePath.c_str(), 0);
  {
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 1 * NSEC_PER_SEC), queue, ^{
      if (unlink(clonePath.c_str()) == 0)
      {
        LOG("Deleted clone: %s", clonePath.c_str());
      }
      else
      {
        LOG("Failed to delete clone: %s", clonePath.c_str());
      }
    });
  }
  close(srcFd);
}

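// Returns the file the event acts on, or nullptr for events we don't handle.
static const es_file_t* file(const es_message_t* msg)
{
  switch (msg->event_type)
  {
    case ES_EVENT_TYPE_AUTH_OPEN:
      return msg->event.open.file;
    case ES_EVENT_TYPE_AUTH_EXEC:
      return msg->event.exec.target->executable;
    case ES_EVENT_TYPE_AUTH_RENAME:
      return msg->event.rename.source;
    case ES_EVENT_TYPE_AUTH_UNLINK:
      // main() subscribes to AUTH_UNLINK and wants to clone its target too.
      return msg->event.unlink.target;
    default:
      return nullptr;
  }
}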

int main(void)
{
  es_client_t* cli;
  auto ret = es_new_client(&cli, ^(es_client_t* client, const es_message_t * msgc)
  {
    if (msgc->process->is_es_client)
    {
      es_mute_process(client, &msgc->process->audit_token);
      return respond(client, msgc, true);
    }

    // The message is only valid inside this handler unless it is retained,
    // so retain it before handing it off to another queue.
    es_retain_message(msgc);
    dispatch_async(esQueue, ^{
      bool shouldClone = false;
      if (msgc->event_type == ES_EVENT_TYPE_AUTH_OPEN)
      {
        auto& ev = msgc->event.open;
        if (ev.fflag & (FWRITE | O_RDWR | O_WRONLY | O_TRUNC | O_APPEND))
        {
          shouldClone = true;
        }
      }
      else if (msgc->event_type == ES_EVENT_TYPE_AUTH_UNLINK || msgc->event_type == ES_EVENT_TYPE_AUTH_RENAME)
      {
        shouldClone = true;
      }

      if (shouldClone)
      {
        if (auto f = ::file(msgc))
          cloneAndScheduleDelete(f->path.data, cloneQueue, f->stat.st_dev, f->stat.st_ino);
      }

      respond(client, msgc, true);
      es_release_message(msgc);  // balances es_retain_message above
    });
  });
  LOG("es_new_client -> %d", ret);

  es_event_type_t events[] = {
    ES_EVENT_TYPE_AUTH_OPEN,
    ES_EVENT_TYPE_AUTH_EXEC,
    ES_EVENT_TYPE_AUTH_RENAME,
    ES_EVENT_TYPE_AUTH_UNLINK,
  };
  es_subscribe(cli, events, sizeof(events) / sizeof(*events));
}

Create 2 terminal sessions and run the following commands:

 % sudo ./apfs_crash_es
 % sudo ./apfs_crash_stress ~/Downloads/test/

The machine will panic very quickly due to an APFS deadlock. I expect that no userspace syscall should be able to cause a kernel panic. It looks like a bug in the APFS implementation and requires a fix on the XNU/kext side.

We were able to reproduce this issue on macOS 26.3.1 and 15.6.1, on both Intel and ARM machines.

Here is the panic string:

Source code without Xcode project:

Full Xcode project + full panic is available at https://www.icloud.com/iclouddrive/0f215KkZffPOTLpETPo-LdaXw#apfs%5Fcrash%5Fes

Answered by DTS Engineer in 881596022

> Hi, I am developing an instant snapshot backup solution for macOS using Endpoint Security. We have stumbled upon a Kernel Panic when using the "fclonefileat" API.

Yes, this is a known bug (r.161340058). More specifically, the "..namedfork/rsrc" construct is a longstanding part of the system, but it's not widely used and it allows the expression of operations that aren't entirely "coherent" (like cloning an object that isn't a file). Have you tested this on macOS 26.4? I believe the issue should now be fixed.

Having said that:

> I expect that no userspace syscall should be able to cause a kernel panic. It looks like a bug in the APFS implementation and requires a fix on the XNU/kext side.

...this is a good example of one of my long-standing warnings, namely that us fixing kernel bugs doesn't mean your code will work. The panic above is actually fixed in the VFS layer by having "fclonefileat" fail, just like clonefile does when presented with the same scenario.
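
Since the fix makes "fclonefileat" fail rather than panic, the client has to treat a failed clone as a normal outcome. A minimal sketch of that, assuming a hypothetical wrapper named tryClone (not part of any Apple API). The specific errno returned for named forks is not stated here, so the sketch only logs whatever errno it gets. The non-Apple stub exists purely so the sketch compiles on other platforms:

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>

#if defined(__APPLE__)
#include <sys/attr.h>
#include <sys/clonefile.h>
#else
// Assumption: stand-in so the sketch builds off macOS, where cloning
// is simply reported as unsupported.
static int fclonefileat(int, int, const char*, unsigned int)
{
  errno = ENOTSUP;
  return -1;
}
#endif

// Hypothetical helper: attempts the clone and reports failure instead of
// assuming success. Returns true only when the clone was created.
static bool tryClone(int srcFd, const char* dstPath)
{
  if (fclonefileat(srcFd, AT_FDCWD, dstPath, 0) == 0)
    return true;

  const int err = errno;  // save before any other call can overwrite it
  std::fprintf(stderr, "clone to %s failed: %s (errno %d)\n",
               dstPath, std::strerror(err), err);
  return false;
}
```

In the repro's cloneAndScheduleDelete, a wrapper like this would replace the unchecked fclonefileat call, and the delayed unlink would only be scheduled when it returns true.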

As a more general comment, I'd suggest adding a check for the "..namedfork/" construct in the ES client and handling these files as a special case. Typically, that means stripping the suffix off the path so that you're working with the actual file. I'd actually add two checks, one that is specific for "..namedfork/rsrc" and another for "..namedfork/<anything else>" to watch for anything "new". As far as I'm aware, we ONLY use "..namedfork/rsrc" and I'm not sure the more general syntax even works, but it's an easy edge case to watch for.
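
The two checks described above could be sketched roughly as follows. The names splitNamedFork, ForkKind and ForkSplit are hypothetical helpers invented for illustration, and the sketch assumes the path ends with "/..namedfork/<name>" as its final component:

```cpp
#include <string>
#include <string_view>

// How a path relates to the "..namedfork/" construct.
enum class ForkKind { None, ResourceFork, OtherFork };

struct ForkSplit {
  ForkKind kind;
  std::string basePath;  // path with the "/..namedfork/<name>" suffix stripped
  std::string forkName;  // "rsrc", another name, or empty when kind is None
};

// Hypothetical helper: detects a trailing "/..namedfork/<name>" and strips it
// so the caller can work with the actual file.
static ForkSplit splitNamedFork(std::string_view path)
{
  static constexpr std::string_view kMarker = "/..namedfork/";

  const size_t pos = path.rfind(kMarker);
  if (pos == std::string_view::npos || pos == 0)
    return {ForkKind::None, std::string(path), ""};

  const std::string_view name = path.substr(pos + kMarker.size());
  // Only treat the marker as a fork reference when it is the last component.
  if (name.find('/') != std::string_view::npos)
    return {ForkKind::None, std::string(path), ""};

  const ForkKind kind =
      (name == "rsrc") ? ForkKind::ResourceFork : ForkKind::OtherFork;
  return {kind, std::string(path.substr(0, pos)), std::string(name)};
}
```

An ES client would call this on f->path.data before cloning, clone basePath for ResourceFork, and log or skip OtherFork so that anything "new" gets noticed.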

Finally, I'll note that the bug above has raised a more general concern about "..namedfork" being handled in a consistent way. There is now some ongoing work (r.161084094) to standardize how these checks are handled in the VFS layer, and it's likely that will eventually lead to some small changes in API behavior. Notably, "truncate" currently fails on named forks and it's likely that will at some point start working (like ftruncate already does).

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
