I don't know if this contributes to the gesture recognizer problem, but one issue in your code is that you are using the frames of views/windows when you probably want the bounds instead.
The frame (and center) of a view represents its size and placement in the coordinates of its superview, and the bounds represents its area in its own coordinate system.
When you use keyWindow.frame and keyWindow.center in your code, the result is in screen coordinates, which may not be the same as keyWindow's own coordinate system.
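To make the frame/bounds distinction concrete, here is a minimal sketch that models it with the plain geometry types (CGRect, CGPoint) from Foundation rather than UIKit, so it runs outside an app. The ViewGeometry type is hypothetical. It assumes no transform and a bounds origin of .zero, which is the default for views that aren't scroll views. With a window offset 20 points from the top of the screen, the frame-derived center and the midpoint of the bounds differ by exactly that offset:

```swift
import Foundation

// Hypothetical, simplified model of a view's geometry (not UIKit's UIView).
// Assumes no transform and a bounds origin of .zero.
struct ViewGeometry {
    /// Size and position in the superview's coordinates
    /// (for a window, the screen's coordinates).
    var frame: CGRect

    /// The view's own coordinate space: same size as the frame, origin at zero.
    var bounds: CGRect { CGRect(origin: .zero, size: frame.size) }

    /// Like UIView.center: the frame's midpoint, in the superview's coordinates.
    var center: CGPoint { CGPoint(x: frame.midX, y: frame.midY) }

    /// The view's midpoint expressed in its own coordinates.
    var boundsCenter: CGPoint { CGPoint(x: bounds.midX, y: bounds.midY) }
}

// A window whose frame starts 20 points down from the top of the screen.
let window = ViewGeometry(frame: CGRect(x: 0, y: 20, width: 320, height: 460))

print(window.center)        // screen coordinates
print(window.boundsCenter)  // the window's own coordinates
```

Subviews of the window live in the second coordinate system, which is why their frames should be derived from bounds rather than frame.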
You want the frames of your subviews to be set in terms of the bounds of keyWindow.
For example, blackBackgroundView should use the bounds of keyWindow for its frame.
let blackBackgroundView = UIView(frame: keyWindow.bounds)
You would probably also want to use keyWindow.bounds.width in your aspect ratio calculation and when setting the zooming view's frame, and you would need to convert keyWindow.center from screen coordinates into keyWindow's own coordinates.
zoomingImageView.center = keyWindow.convert(keyWindow.center, from: nil)
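The aspect-ratio and centering math can also be pulled into a small helper that only ever works in the container's own coordinate space. This is a minimal sketch: zoomedFrame(for:in:) is a hypothetical name, and it assumes both rects are already in the container's coordinates and that startFrame has a nonzero width.

```swift
import Foundation

/// Hypothetical helper: scales `startFrame` to the container's full width while
/// keeping its aspect ratio, then centers it inside `containerBounds`.
/// Assumes both rects share the container's coordinate system and that
/// startFrame.width > 0.
func zoomedFrame(for startFrame: CGRect, in containerBounds: CGRect) -> CGRect {
    // Height that preserves the original aspect ratio at the container's width.
    let height = startFrame.height / startFrame.width * containerBounds.width
    let size = CGSize(width: containerBounds.width, height: height)
    // Center using the container's bounds (its own coordinates), not its frame.
    let origin = CGPoint(x: containerBounds.midX - size.width / 2,
                         y: containerBounds.midY - size.height / 2)
    return CGRect(origin: origin, size: size)
}

let start = CGRect(x: 40, y: 100, width: 200, height: 100)
let windowBounds = CGRect(x: 0, y: 0, width: 320, height: 480)
let zoomed = zoomedFrame(for: start, in: windowBounds)
```

In the app this might be used as `zoomingImageView.frame = zoomedFrame(for: startFrame, in: keyWindow.bounds)`. Because startFrame comes from convert(_:to: nil), it is already in the window's coordinates, and the resulting frame is centered in the window.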
One other thing that would make your code cleaner (though it isn't causing any problems at the moment) would be to make startingFrame non-optional right away instead of dealing with the optional repeatedly later.
let startFrame = startingImageView.superview!.convert(startingImageView.frame, to: nil)
That way, if superview is nil, your app crashes right at that line instead of on the next one (where startFrame would be nil), and you don't have to keep treating startFrame as an optional for the rest of your code.
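The same idea in isolation: unwrap once where the value is produced, then work with a plain value from then on. This sketch avoids UIKit, so superviewOrigin is a hypothetical stand-in for startingImageView.superview, and adding its origin to the image frame is a simplified stand-in for convert(_:to: nil) that only holds with no transforms or nested superviews.

```swift
import Foundation

// Hypothetical stand-in for `startingImageView.superview`, which may be nil.
let superviewOrigin: CGPoint? = CGPoint(x: 10, y: 30)
let imageFrame = CGRect(x: 40, y: 100, width: 200, height: 100)

// Unwrap once, up front: if superviewOrigin is nil, the program traps here,
// at the line that actually caused the problem.
let origin = superviewOrigin!

// Simplified stand-in for convert(_:to: nil): shift by the superview's origin.
let startFrame = CGRect(
    origin: CGPoint(x: imageFrame.origin.x + origin.x,
                    y: imageFrame.origin.y + origin.y),
    size: imageFrame.size)

// From here on startFrame is a non-optional CGRect; no `?` or `!` needed.
let aspectRatio = startFrame.height / startFrame.width
```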