I think the core idea of Metal on Apple systems will give serious gamers and developers a faster and easier way to play and develop their games. Now I'm just wondering: is there a way to make Metal any faster?
The Core Idea of Metal
Yes.
A central focus of Metal is to dramatically reduce the CPU cost of common operations. However, it is still quite possible for wasteful code to blow out the CPU cost, so that you end up running slowly and eating up a bunch of battery life. Even though the CPU and GPU can run concurrently, your GPU frames can't cycle any faster than the CPU work required to encode and enqueue them. So a large part of making Metal go fast is making sure that your CPU workload isn't consuming all your time. For me, the CPU cost is usually around 10% or less of the GPU time in well-behaved code. If you are running more than that, you may need to spend some time finding out why.
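One rough way to check that ratio is to time the encoding on the CPU and compare it with the GPU timestamps on the command buffer. This is only a sketch: `FrameTiming`, `measureFrame`, and the `encode` closure are made-up names standing in for your own code, and it assumes an OS version where `gpuStartTime` and `gpuEndTime` are available on MTLCommandBuffer.

```swift
import Foundation
import Metal

/// Rough timing for one command buffer: CPU seconds spent encoding
/// versus GPU seconds spent executing. (Hypothetical helper type.)
struct FrameTiming {
    var cpuEncodeSeconds: Double
    var gpuSeconds: Double

    /// CPU cost as a fraction of GPU time (0.1 means 10%).
    var cpuToGpuRatio: Double {
        gpuSeconds > 0 ? cpuEncodeSeconds / gpuSeconds : .infinity
    }
}

/// Times a single command buffer. `encode` stands in for your own encoding code.
func measureFrame(queue: MTLCommandQueue,
                  encode: (MTLCommandBuffer) -> Void) -> FrameTiming? {
    guard let commandBuffer = queue.makeCommandBuffer() else { return nil }

    let cpuStart = CFAbsoluteTimeGetCurrent()
    encode(commandBuffer)
    commandBuffer.commit()
    let cpuEnd = CFAbsoluteTimeGetCurrent()

    // Blocking is only acceptable here because we are measuring;
    // a real renderer would use addCompletedHandler instead.
    commandBuffer.waitUntilCompleted()

    return FrameTiming(cpuEncodeSeconds: cpuEnd - cpuStart,
                       gpuSeconds: commandBuffer.gpuEndTime - commandBuffer.gpuStartTime)
}
```

For anything more detailed than this, the Metal System Trace template in Instruments is the better tool.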
Most of the time, it comes down to object reuse, which is perhaps the key way to improve Metal performance. Much of Metal is designed around the assumption that you will allocate expensive resources (textures, buffers, shaders, queues) up front and then reuse them. As an application writer, this is fairly simple, since you should know most of your usage requirements in advance. Your application lives at the top level of scope, so most things are knowable. If you are writing middleware, you may have to be a bit more clever, but it is certainly doable. For example, you can set up a context object that tracks reusable state, which your user can then create and destroy (hopefully infrequently) as their need for Metal comes and goes.
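As a minimal sketch of such a context object, something like the following would do. `MetalContext`, `dequeueBuffer`, and `recycle` are names I made up for illustration, not Metal API, and the sketch is not thread safe.

```swift
import Metal

/// Hypothetical context that owns long-lived Metal objects so they
/// are created once and then reused.
final class MetalContext {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue

    // Free buffers waiting for reuse, keyed by their length in bytes
    private var freeBuffers: [Int: [MTLBuffer]] = [:]

    init?() {
        guard let device = MTLCreateSystemDefaultDevice(),
              let queue = device.makeCommandQueue() else { return nil }
        self.device = device
        self.commandQueue = queue
    }

    /// Returns a recycled buffer of exactly `length` bytes if one is free,
    /// otherwise allocates a new one.
    func dequeueBuffer(length: Int) -> MTLBuffer? {
        if let reused = freeBuffers[length]?.popLast() { return reused }
        return device.makeBuffer(length: length, options: .storageModeShared)
    }

    /// Hands a buffer back for reuse. The caller must be sure the GPU
    /// has finished with it before calling this.
    func recycle(_ buffer: MTLBuffer) {
        freeBuffers[buffer.length, default: []].append(buffer)
    }
}
```

The point is only that the device, the queue, and the buffers outlive any single frame. How you key and size the pool depends on your own workload.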
MTLResource objects can be especially expensive because they consume so much memory, which has to be allocated, zero filled, wired down, etc. Likewise, building shader code involves running a compiler. You should reuse pipeline state objects as much as possible and avoid needlessly recompiling libraries and MTLFunctions. There is an offline compiler available for Metal. Use it to prepare a default.metallib when you build your application, and load that library from your application when it runs.
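For the shader side, a small cache along these lines keeps you from rebuilding pipeline states. It is a sketch that assumes your kernels were compiled offline into the default.metallib in the main bundle. `PipelineCache` and `pipeline(named:)` are hypothetical names, and only compute pipelines are shown.

```swift
import Metal

/// Hypothetical cache: one compute pipeline state per kernel name,
/// built on first use and reused afterwards.
final class PipelineCache {
    private let device: MTLDevice
    private let library: MTLLibrary
    private var pipelines: [String: MTLComputePipelineState] = [:]

    init?(device: MTLDevice) {
        // Loads the precompiled default library from the main bundle;
        // this is nil if the app was built without one.
        guard let library = device.makeDefaultLibrary() else { return nil }
        self.device = device
        self.library = library
    }

    /// Returns the cached pipeline for `name`, building it once if needed.
    /// Returns nil if the library has no function with that name.
    func pipeline(named name: String) throws -> MTLComputePipelineState? {
        if let cached = pipelines[name] { return cached }
        guard let function = library.makeFunction(name: name) else { return nil }
        let state = try device.makeComputePipelineState(function: function)
        pipelines[name] = state
        return state
    }
}
```

Render pipelines can be cached the same way, keyed on whatever parts of the descriptor actually vary in your app.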
Another tip is the use of MTLCommandBuffers with unretained references. In standard operation, every time you attach a Metal resource to an MTLCommandBuffer using an MTLCommandEncoder, the framework retains the object to make sure it sticks around until after the work is done. However, if you have allocated most of your resources once up front (per above), and consequently you know they will last much longer than the MTLCommandBuffer, then it is wasteful for the MTLCommandBuffer to retain each object every time you attach it to a render or compute kernel launch, only to release it once the work is done. Creating your MTLCommandBuffer with unretained references moves the responsibility for making sure all resources last for the lifetime of the MTLCommandBuffer to you. This is a bit of extra bookkeeping, but in exchange, there will be a lot less retain/release overhead.
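In Swift, that comes down to asking the queue for the other kind of command buffer. Here is a small sketch using a blit fill as the workload. `clearBuffer` is a hypothetical helper, and it assumes the caller keeps `buffer` alive until the returned command buffer has completed.

```swift
import Metal

/// Zero-fills `buffer` using a command buffer that does NOT retain the
/// resources attached to it. The caller owns `buffer` and must keep it
/// alive until the returned command buffer has completed.
func clearBuffer(_ buffer: MTLBuffer, on queue: MTLCommandQueue) -> MTLCommandBuffer? {
    guard let commandBuffer = queue.makeCommandBufferWithUnretainedReferences(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return nil }

    blit.fill(buffer: buffer, range: 0..<buffer.length, value: 0)
    blit.endEncoding()
    commandBuffer.commit()
    return commandBuffer
}
```

If the buffer lives in a long-lived context or pool, that lifetime requirement is satisfied for free, which is why this pairs so well with the reuse advice above.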
Some lengthy processes like compiling shaders offer an asynchronous method to collect the completed work. Taking advantage of these will help you move expensive work off the main thread and make your app more responsive.
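As an illustration, here is a sketch that chains the asynchronous library and pipeline builders. `buildPipelineAsync` and `ShaderBuildError` are made-up names, and compiling from source at run time is only shown for the case where a precompiled library is not an option.

```swift
import Metal

/// Hypothetical error type for this sketch.
enum ShaderBuildError: Error {
    case missingFunction(String)
    case unknown
}

/// Compiles `source` and builds a compute pipeline without blocking the caller.
/// `completion` may be called on an arbitrary thread.
func buildPipelineAsync(device: MTLDevice,
                        source: String,
                        kernelName: String,
                        completion: @escaping (Result<MTLComputePipelineState, Error>) -> Void) {
    device.makeLibrary(source: source, options: nil) { library, error in
        guard let library = library else {
            completion(.failure(error ?? ShaderBuildError.unknown))
            return
        }
        guard let function = library.makeFunction(name: kernelName) else {
            completion(.failure(ShaderBuildError.missingFunction(kernelName)))
            return
        }
        // The pipeline build itself is also asynchronous
        device.makeComputePipelineState(function: function) { state, error in
            if let state = state {
                completion(.success(state))
            } else {
                completion(.failure(error ?? ShaderBuildError.unknown))
            }
        }
    }
}
```

Since the handler can arrive on any thread, hop back to your own queue before touching state that the render loop also uses.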
You can look to MetalKit for rapid creation of MTLTextures and for best practices for getting your work to the screen. Likewise, you can make use of the canned shaders in MetalPerformanceShaders.framework to accelerate common operations in your Metal pipeline.
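To give a feel for how the two fit together, here is a sketch that loads an image with MTKTextureLoader and blurs it with MPSImageGaussianBlur. `loadAndBlur` is a hypothetical helper. It blocks until the GPU is done purely to keep the example short, and it builds the loader and the blur kernel on every call, whereas in real code you would create both once and reuse them.

```swift
import Metal
import MetalKit
import MetalPerformanceShaders

/// Loads the image at `imageURL` into a texture and returns a blurred copy.
/// Returns nil if MPS does not support the device or an allocation fails.
func loadAndBlur(imageURL: URL,
                 device: MTLDevice,
                 queue: MTLCommandQueue,
                 sigma: Float) throws -> MTLTexture? {
    guard MPSSupportsMTLDevice(device) else { return nil }

    // MetalKit decodes the image file and uploads it for us.
    // A non-sRGB format is requested so the destination below is writable.
    let loader = MTKTextureLoader(device: device)
    let source = try loader.newTexture(URL: imageURL, options: [.SRGB: false])

    // Destination texture with the same format and size as the source
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: source.pixelFormat,
        width: source.width,
        height: source.height,
        mipmapped: false)
    descriptor.usage = [.shaderRead, .shaderWrite]

    guard let destination = device.makeTexture(descriptor: descriptor),
          let commandBuffer = queue.makeCommandBuffer() else { return nil }

    // Canned, tuned blur from MetalPerformanceShaders
    let blur = MPSImageGaussianBlur(device: device, sigma: sigma)
    blur.encode(commandBuffer: commandBuffer,
                sourceTexture: source,
                destinationTexture: destination)

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    return destination
}
```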