I have just started to dive into exactly what a compute shader is and what kinds of things it can do. One of the examples given was converting an image to greyscale, which got me thinking. In my OpenGL ES pipeline there were a few passes that were essentially image processing: one texture going into a shader that just rendered a full-screen quad, for things like blurring an image and fading extremely dark colors.
It has occurred to me that these goals could potentially be better accomplished with a compute pipeline.
Can I generalize that a task is usually better suited to a compute pipeline if it does not technically involve rendering triangles, or are there cases, such as texture sampling, where a render pipeline still has an edge?
In my opinion, the biggest difference between render and compute pipelines is domain control.
I. Compute
With compute, you just have a one- (two-, three-) dimensional grid, and the compute kernel is invoked on every point of that grid. You also get to control how the domain is diced up into threadgroups, threads, and so on. This gives you access to extra features such as the "threadgroup" address space: fast local memory for communication between threads in the same threadgroup, inaccessible from graphics functions. That in turn can make certain kinds of algorithms run faster (such as a parallel prefix scan, or the "stencil"-type computations that are common in graphics filters and CFD simulations).
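A minimal sketch of what that looks like in Metal Shading Language (the kernel name, buffer layout, and fixed group size of 256 are my own assumptions, not anything prescribed): each threadgroup sums its slice of a buffer through threadgroup memory.

```
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: each threadgroup of 256 threads sums its slice of the
// input buffer, using the "threadgroup" address space for the partial sums.
kernel void partial_sums(device const float *input      [[buffer(0)]],
                         device float       *groupSums  [[buffer(1)]],
                         uint gid       [[thread_position_in_grid]],
                         uint lid       [[thread_position_in_threadgroup]],
                         uint groupId   [[threadgroup_position_in_grid]],
                         uint groupSize [[threads_per_threadgroup]])
{
    // Fast local memory, shared by the threads of this group only,
    // invisible to other groups and to graphics functions.
    threadgroup float partial[256];

    partial[lid] = input[gid];
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Tree reduction within the group.
    for (uint stride = groupSize / 2; stride > 0; stride /= 2) {
        if (lid < stride) {
            partial[lid] += partial[lid + stride];
        }
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    if (lid == 0) {
        groupSums[groupId] = partial[0];
    }
}
```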
On the other hand, dividing up the domain is an additional task, and not a trivial one if you care about performance, especially since the "perfect" split depends on the device. So if you want something to run as fast as possible, you really should prepare several versions of the compute kernel (it is usually more efficient to assign more than one "cell" of the problem to one thread of a Metal compute kernel than to do a 1-to-1 mapping), and perhaps some "autotune" code that tries out several combinations of kernels and threadgroup sizes to find what works best on a given device.
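Purely for illustration (the kernel name, the factor of 4, and the buffer layout are assumptions of mine), here is what a non-1-to-1 mapping can look like: each thread handles four consecutive cells, so the grid you dispatch from the CPU is a quarter of the problem size.

```
#include <metal_stdlib>
using namespace metal;

constant uint CELLS_PER_THREAD = 4;

// Hypothetical kernel where one thread processes several "cells" of the
// problem instead of exactly one grid point per cell.
kernel void scale_cells(device const float *input  [[buffer(0)]],
                        device float       *output [[buffer(1)]],
                        constant uint      &count  [[buffer(2)]],
                        uint gid [[thread_position_in_grid]])
{
    uint base = gid * CELLS_PER_THREAD;
    for (uint i = 0; i < CELLS_PER_THREAD; ++i) {
        uint idx = base + i;
        if (idx < count) {          // guard the tail of the buffer
            output[idx] = input[idx] * 0.5f;
        }
    }
}
```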
Note that samplers CAN be used in compute functions.
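For example (the kernel and texture bindings here are just assumptions to illustrate the point), a compute kernel can use a bilinear sampler much like a fragment shader would, say for a downsampling pass:

```
#include <metal_stdlib>
using namespace metal;

// Hypothetical downsampling kernel: samples the source with a linear filter
// and writes one texel of the smaller destination texture per thread.
kernel void downsample(texture2d<float, access::sample> src [[texture(0)]],
                       texture2d<float, access::write>  dst [[texture(1)]],
                       uint2 gid [[thread_position_in_grid]])
{
    if (gid.x >= dst.get_width() || gid.y >= dst.get_height()) {
        return;
    }
    constexpr sampler linearSampler(filter::linear, address::clamp_to_edge);
    // Sample the source at the normalized centre of the destination texel.
    float2 uv = (float2(gid) + 0.5) / float2(dst.get_width(), dst.get_height());
    dst.write(src.sample(linearSampler, uv), gid);
}
```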
II. Render
With render, you just describe geometric primitives (points, triangles, and so on) and don't have to worry about threadgroups, threads, and all that. This is better suited for drawing, because Metal and the driver will likely come up with a better work split than an ordinary programmer can. But the two-level structure of a typical render pipeline (I'll leave out tessellation shaders) also gives you some programmable, conditional control over what gets processed.
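For the kind of full-screen image processing from the question, those two stages typically look something like this (a sketch; the buffer-less full-screen-triangle trick and the greyscale weights are my own choices, not something the render pipeline forces on you):

```
#include <metal_stdlib>
using namespace metal;

struct QuadOut {
    float4 position [[position]];
    float2 uv;
};

// Full-screen triangle generated from the vertex index alone, no vertex buffer.
vertex QuadOut fullscreen_vertex(uint vid [[vertex_id]])
{
    QuadOut out;
    float2 uv = float2((vid << 1) & 2, vid & 2);
    out.uv = uv;
    out.position = float4(uv * 2.0 - 1.0, 0.0, 1.0);
    return out;
}

// The per-pixel work (here: the greyscale conversion from the question)
// lives in the fragment stage; the rasterizer decides which pixels get it.
fragment float4 greyscale_fragment(QuadOut in [[stage_in]],
                                   texture2d<float> src [[texture(0)]],
                                   sampler s            [[sampler(0)]])
{
    float4 c = src.sample(s, in.uv);
    float g = dot(c.rgb, float3(0.299, 0.587, 0.114));
    return float4(g, g, g, c.a);
}
```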
As a concrete example of that conditional control: I once had a compute kernel spanning a big texture (very close to the 16K texture-size / memory limits of the device). But the processing didn't really need to run over the whole texture; it only occurred in some areas (depending on the input data, which was another texture). And I couldn't optimise this compute kernel below roughly 1/10th of a second. So I diced the whole area up into a triangle mesh, every two triangles forming a quad of 128 by 128 texels. In the vertex shader a check was performed whether computation was needed in that area; if it wasn't, the Z coordinate was altered to push the triangles in question outside the <near, far> clipping range. This gave a huge speedup, because the computation was done only where it was needed, not over the whole domain.
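A rough sketch of that vertex-shader trick (the tile size, attribute names, and buffer layout are made up here for illustration, not the original code):

```
#include <metal_stdlib>
using namespace metal;

struct TileVertexIn {
    float2 position  [[attribute(0)]];  // vertex position in texel coordinates
    uint   tileIndex [[attribute(1)]];  // which 128x128 tile this vertex belongs to
};

struct TileVertexOut {
    float4 position [[position]];
};

// Per-tile rejection in the vertex shader: if a tile needs no work, its
// vertices are pushed outside the <near, far> clip range, so its triangles
// are clipped away and the expensive per-pixel work never runs there.
vertex TileVertexOut tile_vertex(TileVertexIn in                    [[stage_in]],
                                 constant float2 &textureSize      [[buffer(1)]],
                                 device const uchar *tileNeedsWork [[buffer(2)]])
{
    TileVertexOut out;

    float2 ndc = in.position / textureSize * 2.0 - 1.0;
    ndc.y = -ndc.y;   // texel y grows downwards, NDC y grows upwards

    bool active = tileNeedsWork[in.tileIndex] != 0;

    // z = 0 keeps active tiles inside the clip volume; z = 2 is beyond the
    // far plane, so inactive tiles are discarded before rasterization.
    out.position = float4(ndc, active ? 0.0 : 2.0, 1.0);
    return out;
}
```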
On the other hand, render setup is more tedious: you have to write at least two shaders, and setting up a render pipeline is more work than setting up a compute one. And using render for compute can sometimes be super awkward, with floating-point coordinates being converted back and forth to what are really discrete memory indices.
Hope that helps a bit.