Significant Performance Regression in Apple Clang 16 for Assembly File Processing

I've observed a significant performance regression in Apple Clang 16 (Xcode 16.0 and 16.2) compared to Clang 15 (Xcode 15.2) during Flutter AOT compilation. Further investigation shows that the clang -cc1as process has become extremely slow: assembling the same file now takes approximately 4x longer.

Environment

  • Machine: Apple M2 (8C8T)
  • Memory: 16GB
  • macOS Version: 14.7.2
  • Target: Flutter AOT compilation (snapshot_assembly.o)

Performance Comparison

Xcode Version   iOS SDK   Duration (min:sec)
15.2            17.2      1:08.90
15.2            18.2      1:03.98
16.2            17.2      4:11.07
16.2            18.2      4:08.43
16.0            18.2      4:29.32
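
As a rough check of the "approximately 4x" figure, the durations above can be converted to seconds and divided. This is only a sketch of the arithmetic; the helper names to_seconds and slowdown are made up for illustration and are not part of any tool.

```shell
# Convert a min:sec duration (for example 4:08.43) to seconds.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%.2f", $1 * 60 + $2 }'
}

# Print the ratio slow/fast for two min:sec durations, to two decimals.
slowdown() {
  awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", slow / fast }'
}

slowdown 4:11.07 1:08.90   # iOS SDK 17.2: Xcode 16.2 vs 15.2
slowdown 4:08.43 1:03.98   # iOS SDK 18.2: Xcode 16.2 vs 15.2
slowdown 4:29.32 1:03.98   # iOS SDK 18.2: Xcode 16.0 vs 15.2
```

With the measurements in the table this gives roughly 3.6x, 3.9x, and 4.2x, so the slowdown is in the 3.6x to 4.2x range depending on the Xcode and SDK combination.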

Reproduction Steps

The issue can be reproduced with the following command, which is generated by the Flutter aot_assembly_release process:

time ${xcode_app_path}/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc \
-arch arm64 \
-miphoneos-version-min=12.0 \
-v \
-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS18.2.sdk \
-c ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.S \
-o ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.o
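
To compare toolchains side by side, the same invocation can be wrapped in a small helper that is run once per Xcode installation. This is a minimal sketch, not the script used for the measurements above: the function names are hypothetical, the Xcode install locations in the usage note are assumptions, and it only reports whole seconds of wall-clock time.

```shell
# Path of the cc driver inside a given Xcode.app bundle.
toolchain_cc() {
  echo "$1/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc"
}

# Assemble one .S file with the toolchain from the given Xcode and print
# the elapsed wall-clock time in whole seconds.
# Arguments: <Xcode.app path> <iPhoneOS SDK path> <snapshot_assembly.S path>
time_assemble() {
  xcode_app=$1
  sdk=$2
  src=$3
  out=$(mktemp "${TMPDIR:-/tmp}/snapshot_assembly.XXXXXX") || return 1
  start=$(date +%s)
  "$(toolchain_cc "$xcode_app")" \
    -arch arm64 \
    -miphoneos-version-min=12.0 \
    -isysroot "$sdk" \
    -c "$src" \
    -o "$out"
  status=$?
  end=$(date +%s)
  rm -f "$out"
  echo "$xcode_app: $((end - start))s (exit status $status)"
}
```

For example, assuming Xcode 15.2 and 16.2 are installed as /Applications/Xcode-15.2.app and /Applications/Xcode-16.2.app (hypothetical locations), each can be timed with time_assemble "<Xcode.app path>" "<SDK path>" "${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.S".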

Additional Information

  • This issue specifically affects large assembly files generated by Flutter's AOT compilation
  • The performance regression appears to be consistent across different iOS SDK versions
  • The same assembly file compiles significantly faster with Xcode 15.2
  • The same performance regression is observed on an M4 Mac mini, suggesting this is not hardware-specific
  • Size of object:
size -m ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.o
Segment : 64577616
Section (__TEXT, __text): 26603344
Section (__DATA, __bss): 48 (zerofill)
Section (__TEXT, __const): 21292928
Section (__DWARF, __debug_abbrev): 61
Section (__DWARF, __debug_info): 8934534
Section (__DWARF, __debug_line): 4464443
Section (__LD, __compact_unwind): 3282208
total 64577566
total 64577616

Questions

  1. Is this a known issue with Apple Clang 16?
  2. Are there any workarounds or compiler flags we can use to improve the performance?
  3. Is this behavior expected or should it be considered a regression?

Any insights or suggestions would be greatly appreciated.

Please open a bug report with the details you shared here. Don't forget to include an example project demonstrating the performance changes that you describe. Once you open the bug report, please post the FB number here for my reference.

If you have any questions about filing a bug report, take a look at Bug Reporting: How and Why?

— Ed Ford,  DTS Engineer
