Posts

Post not yet marked as solved
0 Replies
448 Views
Hello,

I made some modifications to a fragment shader to blend 4 textures instead of 2, and this made the shader awfully slow (32 ms vs 8 ms).

The input textures are 4K x 4K, RGBA8Unorm. That makes 64 MB per texture (16.8 Mpx at 4 bytes each), or 256 MB for the 4 textures. At 60 fps this would require about 15 GB/s of bandwidth.

The test hardware is a MacBook Pro from 2015 with an Intel i5-5257U with Iris 6100 graphics. According to Intel ARK, the max memory bandwidth for this CPU is 25.6 GB/s. I assume (but I'm not so sure) that the memory bandwidth for the integrated GPU is also this 25.6 GB/s.

At this point I have the impression that my fragment shader (requiring 15 GB/s) should run at a solid 60 fps on the MBP, but at 32 ms per frame (even without taking the WindowServer into account) it's obviously not the case. Here is the Metal fragment shader code:

half4 blendColors(half4 c1, half4 c2)
{
    // From https://en.wikipedia.org/wiki/Alpha_compositing#Alpha_blending
    const half4 dst = c1;
    const half4 src = c2;
    const half outA = src.a + dst.a * (1 - src.a);
    const half3 outRGB = outA == 0 ? half3(0)
        : (src.rgb * src.a + dst.rgb * dst.a * (1 - src.a)) / outA;
    return half4(outRGB, outA);
}

fragment float4 fragmentFunc(RasterizerData in [[stage_in]],
                             constant int& inputCount [[buffer(kInputImageCountIndex)]],
                             array<texture2d<half>, 4> inputs [[texture(kInputImageIndex)]])
{
    constexpr sampler currentSampler(mag_filter::nearest, min_filter::linear, mip_filter::nearest);
    half4 blendedSample(1.0);
    for (int i = 0; i < inputCount; ++i)
    {
        auto layerSample = inputs[i].sample(currentSampler, in.textureCoordinate);
        blendedSample = blendColors(blendedSample, layerSample);
    }
    return float4(blendedSample);
}

Here are the pipeline statistics reported by the GPU Frame Debugger:
https://artoverflow.io/downloads/pipeline%20statistics.png

And the performance metrics:
https://artoverflow.io/downloads/performance%20metrics.png

One very suspicious metric in my opinion is the L3 Cache Miss Rate, which was much lower before I added multiple input textures. This makes sense because each fragment does one sample from a texture, then one sample from another, and so on, rather than many consecutive samples from the same single input texture. Note that the 4 input textures are mipmapped, but this capture was taken with the Metal view displayed on a 4K display and the textures sampled without any zoom, so mipmapping should have no effect here.

If I were to reduce this cache miss rate, I would blend from 2 textures only but with 3 passes: blend tex A & B, then the result & C, then that result & D. But this implies reading 3 x 2 x 64 MB and writing 3 x 64 MB per frame, which makes 23 GB/s read and 11 GB/s write at 60 fps. And that's assuming read-write textures are available (not the case on the Intel GPU I tested). So this would be worse…

Are there recommendations about how to display blended textures more efficiently?

I've been looking into MTLBlendFactor and MTLBlendOperation, which I suppose are the same operations as in OpenGL, but as I'm making a drawing app I want to support more blend modes than the ones natively supported. And according to https://gamedev.stackexchange.com/questions/17043/blend-modes-in-cocos2d-with-glblendfunc the built-in blend modes are not enough for that.
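For context, this is the kind of fixed-function setup I've been looking at. It's only a minimal sketch of classic "source over" compositing via MTLBlendFactor/MTLBlendOperation; it assumes premultiplied alpha and one render pass per layer, and the descriptor shown is a stand-in for the app's real pipeline setup:

import Metal

// Sketch only: fixed-function premultiplied "source over" blending,
// i.e. out = src + dst * (1 - src.a), configured on the pipeline so
// the fragment shader only ever samples the one layer being drawn.
let pipelineDescriptor = MTLRenderPipelineDescriptor()
let attachment = pipelineDescriptor.colorAttachments[0]!
attachment.pixelFormat = .rgba8Unorm
attachment.isBlendingEnabled = true
attachment.rgbBlendOperation = .add
attachment.alphaBlendOperation = .add
attachment.sourceRGBBlendFactor = .one          // source colors assumed premultiplied
attachment.sourceAlphaBlendFactor = .one
attachment.destinationRGBBlendFactor = .oneMinusSourceAlpha
attachment.destinationAlphaBlendFactor = .oneMinusSourceAlpha

Of course this only covers the blend modes the fixed-function unit can express, which is exactly the limitation mentioned above for a drawing app.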
Posted by Ceylo.
Post not yet marked as solved
2 Replies
1.6k Views
Hello,

I'm a bit new to signed macOS app distribution and I'm trying to use the Crashes panel of Xcode Organizer. Currently it is empty, saying that "AppName has not been uploaded to App Store Connect to receive crash logs." I made my app crash on another device where sharing crash data with app developers is enabled in System Preferences > Privacy.

Question is: does this mean that macOS apps distributed outside of the Mac App Store can't receive crash logs in Xcode Organizer? I'm talking about apps distributed with "Developer ID" rather than "Mac App Store" in Organizer's "Distribute App" panel. I've read https://help.apple.com/xcode/mac/current/#/deved2cca77d but it's not clear whether only App Store apps are concerned.
Posted by Ceylo.
Post not yet marked as solved
0 Replies
1.5k Views
Hi,

I'm writing performance tests using the facilities provided by XCTest with Xcode 10.2:

func testPerformanceExample() {
    self.measure {
        ...
    }
}

My issue is that the code I measure is supposed to be very fast (around 1 ms) and I want the test to verify that no regression is introduced. However, making the code much slower currently does not trigger a test failure, although Xcode clearly notices that it is slower, as shown in these screenshots:
https://artoverflow.io/downloads/worse.png
https://artoverflow.io/downloads/performance_result.png

The test logs contain this:

Test Case '-[DummyTests.DrawingDrawableTests testPerformanceExample]' started.
[...]/DrawingDrawableTests.swift:132: Test Case '-[DummyTests.DrawingDrawableTests testPerformanceExample]' measured [Time, seconds] average: 0.006, relative standard deviation: 14.419%, values: [0.008725, 0.005750, 0.005917, 0.005780, 0.005808, 0.005798, 0.005741, 0.005819, 0.005837, 0.005759], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "Local Baseline", baselineAverage: 0.002, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100
Test Case '-[DummyTests.DrawingDrawableTests testPerformanceExample]' passed (0.471 seconds).

I notice that the logs contain "maxRegression: 0.100", which looks to be an absolute threshold in seconds. And indeed, making my block take more than 0.1 s actually makes the performance test fail. But this makes XCTestCase.measure() pretty useless for anyone who wants to test really optimized code. In the current case this is for real-time rendering and I want the test to detect when the implementation is not able to reach 60 fps. Of course I could manually run the app and check, but I want to reduce manual testing time; that's where XCTest is supposed to help.

Is there any way to configure this maxRegression, or more generally to make XCTestCase.measure() usable for fast blocks?

At the moment the only workaround I have is to artificially increase the amount of work being done in the measure blocks, but this has 2 very annoying drawbacks:
- the test is much slower than needed
- the arbitrarily chosen amount of work only allows detecting regressions on my own hardware; if the test is run on faster hardware, the amount of measured work needs to be increased again, so this makes my test code not future-proof at all.
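The closest workaround I can think of (a sketch, not what I'd call a solution) is to bypass measure() entirely and assert an absolute time budget by hand. The iteration count and the 16.7 ms budget (one 60 fps frame) below are arbitrary example values:

import XCTest

final class FrameBudgetTests: XCTestCase {
    // Sketch: time the block manually and fail on an absolute budget,
    // instead of relying on measure()'s baseline thresholds.
    func testStaysWithinFrameBudget() {
        let iterations = 100
        let start = DispatchTime.now()
        for _ in 0..<iterations {
            // ... the rendering code under test ...
        }
        let elapsedNs = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds
        let averageMs = Double(elapsedNs) / Double(iterations) / 1_000_000
        XCTAssertLessThan(averageMs, 16.7, "average \(averageMs) ms misses the 60 fps budget")
    }
}

But an absolute budget has its own problem: it encodes my target hardware's performance rather than catching relative regressions, which is what measure() was supposed to give me.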
Posted by Ceylo.
Post marked as solved
1 Reply
2.2k Views
Hello,

I have a compute shader that draws into a read_write texture2d. Running this shader on macOS 10.14 gives the expected result, but on macOS 10.13 I notice that reading the texture gives black instead of the texture's color. I checked the macOS 10.14 release notes & the Metal Feature Set Tables and didn't notice anything that should affect read_write textures.

The Metal function is as follows:

kernel void drawDot(texture2d<half, access::read_write> texture [[texture(kComputeImageIndex)]],
                    constant int2& gidOrigin [[buffer(kComputeGidOriginIndex)]],
                    uint2 gid [[thread_position_in_grid]])
{
    const uint2 pixPos = gid + uint2(gidOrigin);
    texture.write(texture.read(pixPos), pixPos);
    // texture.write(half4(1.0, 0.0, 0.0, 1.0), pixPos);
}

If I use the commented-out write of a constant color instead of the read-then-write above it, the output image is white with a red square around the position given by gidOrigin. If I do the texture read instead, the output contains a black square, although the input texture is fully white (I checked in the Xcode GPU debugger). So something wrong is happening with the texture read() call.

Am I using read_write texture access in a bad way? Adding texture.fence() does not change anything, as I would expect: I'm never trying to read a pixel written by another thread of the grid. And in the Metal Shading Language Specification PDF I didn't see any restriction there could be on using read_write textures.

Thanks!
Lucas

===============================================================

Edit: I just came upon the release notes of macOS 10.12, where read_write textures were introduced as "Function Texture Read-Writes". Apple gives the following usage example:

kernel void my_kernel(texture2d<float, access::read_write> texA [[ texture(0) ]],
                      ushort2 gid [[ thread_position_in_grid ]])
{
    float4 color = texA.read(gid);
    color = processColor(color);
    texA.write(color, gid);
}

Which is the same kind of operation I am doing. So I suppose there is a bug in macOS 10.13…

===============================================================

Edit 2: just to make sure I didn't miss something obvious, I took the "Hello Compute" example provided by Apple and modified it just enough to use a read_write texture. I observed the same behavior: all good on 10.14 but black output on 10.13. Going to file a bug report although I don't have much hope…

Below is the full diff to the Hello Compute project:

diff --git a/Renderer/AAPLRenderer.m b/Renderer/AAPLRenderer.m
index 3c1d1c8..b0307af 100644
--- a/Renderer/AAPLRenderer.m
+++ b/Renderer/AAPLRenderer.m
@@ -118,7 +118,7 @@ Implementation of renderer class which performs Metal setup and per frame render
     textureDescriptor.pixelFormat = MTLPixelFormatBGRA8Unorm;
     textureDescriptor.width = image.width;
     textureDescriptor.height = image.height;
-    textureDescriptor.usage = MTLTextureUsageShaderRead;
+    textureDescriptor.usage = MTLTextureUsageShaderWrite | MTLTextureUsageShaderRead;
 
     // Create an input and output texture with similar descriptors.  We'll only
     //   fill in the inputTexture however.  And we'll set the output texture's descriptor
@@ -192,13 +192,17 @@ Implementation of renderer class which performs Metal setup and per frame render
     id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
     commandBuffer.label = @"MyCommand";
 
+    id<MTLBlitCommandEncoder> blitEnc = [commandBuffer blitCommandEncoder];
+    MTLOrigin orig = { 0, 0, 0 };
+    MTLSize size = { _inputTexture.width, _inputTexture.height, _inputTexture.depth };
+    [blitEnc copyFromTexture:_inputTexture sourceSlice:0 sourceLevel:0 sourceOrigin:orig sourceSize:size
+                   toTexture:_outputTexture destinationSlice:0 destinationLevel:0 destinationOrigin:orig];
+    [blitEnc endEncoding];
+
     id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
 
     [computeEncoder setComputePipelineState:_computePipelineState];
 
-    [computeEncoder setTexture:_inputTexture
-                       atIndex:AAPLTextureIndexInput];
-
     [computeEncoder setTexture:_outputTexture
                        atIndex:AAPLTextureIndexOutput];
diff --git a/Renderer/AAPLShaderTypes.h b/Renderer/AAPLShaderTypes.h
index 7a57f9d..897e6d9 100644
--- a/Renderer/AAPLShaderTypes.h
+++ b/Renderer/AAPLShaderTypes.h
@@ -22,7 +22,6 @@ typedef enum AAPLVertexInputIndex
 // Metal API texture set calls
 typedef enum AAPLTextureIndex
 {
-    AAPLTextureIndexInput  = 0,
     AAPLTextureIndexOutput = 1,
 } AAPLTextureIndex;
 
diff --git a/Renderer/AAPLShaders.metal b/Renderer/AAPLShaders.metal
index 47a5e29..8d0e2de 100644
--- a/Renderer/AAPLShaders.metal
+++ b/Renderer/AAPLShaders.metal
@@ -90,8 +90,7 @@ constant half3 kRec709Luma = half3(0.2126, 0.7152, 0.0722);
 
 // Grayscale compute kernel
 kernel void
-grayscaleKernel(texture2d<half, access::read>  inTexture  [[texture(AAPLTextureIndexInput)]],
-                texture2d<half, access::write> outTexture [[texture(AAPLTextureIndexOutput)]],
+grayscaleKernel(texture2d<half, access::read_write> outTexture [[texture(AAPLTextureIndexOutput)]],
                 uint2 gid [[thread_position_in_grid]])
 {
     // Check if the pixel is within the bounds of the output texture
@@ -101,7 +100,7 @@ grayscaleKernel(texture2d<half, access::read> inTexture [[texture(AAPLTextureIndexInput)]],
         return;
     }
 
-    half4 inColor = inTexture.read(gid);
+    half4 inColor = outTexture.read(gid);
     half gray = dot(inColor.rgb, kRec709Luma);
     outTexture.write(half4(gray, gray, gray, 1.0), gid);
 }
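One thing that might be worth checking on the affected machines (a sketch below; the tier-to-format mapping is my reading of the Metal Shading Language spec, where tier 1 only allows read_write on R32Float/R32Uint/R32Sint and RGBA8Unorm needs tier 2) is the device's read-write texture tier:

import Metal

// Sketch: query the function-texture-read-write tier at runtime.
// Differences between GPUs and OS versions here could explain why
// the same kernel behaves differently on 10.13 and 10.14.
if let device = MTLCreateSystemDefaultDevice() {
    switch device.readWriteTextureSupport {
    case .tier2:
        print("read_write supported for RGBA8Unorm and more")
    case .tier1:
        print("read_write limited to R32Float/R32Uint/R32Sint")
    case .tierNone:
        print("read_write textures unsupported")
    default:
        print("unknown tier")
    }
}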
Posted by Ceylo.
Post not yet marked as solved
6 Replies
780 Views
Hello,

In a macOS storyboard I can put an NSSplitViewController that will display links to two child NSViewControllers. Similarly, I want to design a custom NSViewController with child view controllers that I can use in the storyboard. Is it doable?

When I look at the Connections Inspector of the split view controller in the storyboard, it shows links to "Triggered Segues" bound to "split items". I wonder how to achieve the same with a custom view controller. I checked the header of NSSplitViewController and it just contains a

@property (copy) NSArray<__kindof NSSplitViewItem *> *splitViewItems;

So there doesn't look to be any specific attribute that makes it appear in "Triggered Segues".

If I try to manually create segues from my NSViewController to some other NSViewController, I only get the usual segues like "Show", "Custom", "Modal", "Popover", "Sheet". But I'm not actually interested in a segue in the sense of a "transition" between scenes. In the documentation, NSStoryboardSegue is actually described as "A transition or containment relationship between two scenes in a storyboard." I'm interested in this containment part, but I don't know how to achieve it.
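The only containment I've managed so far is programmatic, using the same child view controller API that NSSplitViewController builds on. A sketch below; "Main" and "ChildScene" are placeholder storyboard and scene identifiers:

import Cocoa

// Sketch: establish a containment relationship in code via addChild(_:),
// then install the child's view manually.
final class ContainerViewController: NSViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        let storyboard = NSStoryboard(name: "Main", bundle: nil)
        let child = storyboard.instantiateController(withIdentifier: "ChildScene") as! NSViewController
        addChild(child)                 // the containment relationship
        view.addSubview(child.view)     // put the child's view on screen
        child.view.frame = view.bounds
        child.view.autoresizingMask = [.width, .height]
    }
}

On the storyboard side, the closest thing I've found is dropping a "Container View" into the scene, which creates an embed segue to a child scene; that looks like the "containment" the NSStoryboardSegue documentation describes, though it doesn't expose a multi-item list the way NSSplitViewController's split items do.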
Posted by Ceylo.