Is it best to crop other people out of training videos for a Create ML activity classifier?

My activity classifier is used in tennis sessions, where there are necessarily multiple people on the court. Depending on the angle and lens, there is also a decent chance that players on adjacent courts will be in the shot.

For my training data, would it be best to crop the videos so that adjacent courts and their players are out of frame?