I've observed a significant performance regression in Apple Clang 16 (Xcode 16.0/16.2) compared to Clang 15 (Xcode 15.2) when running Flutter AOT compilation. Further investigation shows that the clang -cc1as process (the integrated assembler) has become extremely slow: compilation time has increased by approximately 4x.
Environment
- Machine: Apple M2 (8C8T)
- Memory: 16GB
- macOS Version: 14.7.2
- Target: Flutter AOT compilation (snapshot_assembly.o)
Performance Comparison
Xcode Version | iOS SDK | Duration (m:ss) |
15.2 | 17.2 | 1:08.90 |
15.2 | 18.2 | 1:03.98 |
16.2 | 17.2 | 4:11.07 |
16.2 | 18.2 | 4:08.43 |
16.0 | 18.2 | 4:29.32 |
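As a rough check on the "approximately 4x" figure, the durations in the table can be converted to seconds and divided. This is a minimal sketch, assuming the durations are in m:ss.ss form; the helper names to_seconds and slowdown are made up for illustration.

```shell
# Convert a duration like "4:08.43" (m:ss.ss) to seconds.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Print how many times slower the second duration is than the first.
slowdown() {
  awk -v base="$(to_seconds "$1")" -v new="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", new / base }'
}

slowdown 1:03.98 4:08.43   # Xcode 15.2 vs 16.2, iOS SDK 18.2 -> 3.88
slowdown 1:08.90 4:11.07   # Xcode 15.2 vs 16.2, iOS SDK 17.2 -> 3.64
```

Both SDKs show a slowdown of a bit under 4x between Xcode 15.2 and 16.2, which matches the estimate above.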
Reproduction Steps
The issue can be reproduced with the following command, which is generated by Flutter's aot_assembly_release build step:
time ${xcode_app_path}/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc \
  -arch arm64 \
  -miphoneos-version-min=12.0 \
  -v \
  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS18.2.sdk \
  -c ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.S \
  -o ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.o
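To compare toolchains side by side, the same command can be wrapped in a small helper and run once per Xcode install. This is only a sketch: the function names toolchain_cc and time_assemble are hypothetical, and the Xcode bundle paths in the usage note are assumptions about where each version is installed. The -v flag from the original command is left out to keep the timing output readable.

```shell
# Print the path of the cc driver inside a given Xcode.app bundle.
toolchain_cc() {
  printf '%s/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc\n' "$1"
}

# Assemble one file with the cc from the given Xcode bundle and time it.
# Arguments: <Xcode.app path> <SDK path> <input .S> <output .o>
time_assemble() {
  xcode_app=$1; sdk=$2; src=$3; out=$4
  cc_bin=$(toolchain_cc "$xcode_app")
  if [ ! -x "$cc_bin" ]; then
    # Skip toolchains that are not installed at the assumed location.
    echo "skip: $cc_bin not found" >&2
    return 1
  fi
  echo "== $xcode_app =="
  time "$cc_bin" -arch arm64 -miphoneos-version-min=12.0 \
    -isysroot "$sdk" -c "$src" -o "$out"
}
```

For example, with two installs at the assumed locations /Applications/Xcode-15.2.app and /Applications/Xcode-16.2.app, calling time_assemble once for each with the same SDK path, the same snapshot_assembly.S, and different output paths gives directly comparable timings.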
Additional Information
- This issue specifically affects large assembly files generated by Flutter's AOT compilation
- The performance regression appears to be consistent across different iOS SDK versions
- The same assembly file compiles significantly faster with Xcode 15.2
- Same performance regression observed on M4 Mac mini, suggesting this is not hardware-specific
- Size of object:
size -m ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.o
Segment : 64577616
    Section (__TEXT, __text): 26603344
    Section (__DATA, __bss): 48 (zerofill)
    Section (__TEXT, __const): 21292928
    Section (__DWARF, __debug_abbrev): 61
    Section (__DWARF, __debug_info): 8934534
    Section (__DWARF, __debug_line): 4464443
    Section (__LD, __compact_unwind): 3282208
    total 64577566
total 64577616
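To see how the object size splits between code, constants, and debug info, the size -m output can be grouped by segment name. This is a minimal sketch that assumes the "Section (SEGMENT, section): bytes" line format shown above; segment_totals is a hypothetical helper name.

```shell
# Read `size -m` output on stdin and print the total bytes per segment name.
segment_totals() {
  awk '$1 == "Section" {
         seg = $2
         gsub(/[(,]/, "", seg)   # "(__TEXT," -> "__TEXT"
         sum[seg] += $4          # $4 is the byte count
       }
       END { for (s in sum) printf "%s %d\n", s, sum[s] }' | sort
}
```

Usage would be along the lines of piping size -m for snapshot_assembly.o into segment_totals.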
Questions
- Is this a known issue with Apple Clang 16?
- Are there any workarounds or compiler flags we can use to improve the performance?
- Is this behavior expected or should it be considered a regression?
Any insights or suggestions would be greatly appreciated.