Streaming is available in most browsers and in the Developer app.
Build Image and Video Style Transfer models in Create ML
Use Style Transfer in Create ML to bring stylized effects to your photos and videos. We'll show you how you can train a model in minutes, making it easy to add creative visual features to your app. Learn about the training process and the options you have for controlling the results. You'll also see its real-time performance in a demonstration that runs three of these models at once with ARKit. Before watching this session, we recommend being familiar with Create ML. For an overview, see "Introducing the Create ML App."
Resources
Related Videos
WWDC20
WWDC19
Download
Hello and welcome to WWDC.

Hey everyone, I'm David, a machine learning engineer from the Create ML team, and in this session we're going to talk about building Style Transfer models in Create ML. Style Transfer is a new machine learning task available this year in the Create ML app. It can be used to blend a style image and a content image together. Let's see what happens when we apply Style Transfer to this pair of images. Wow, that's pretty cool! The stylized result looks great. The model has transferred the colors, shapes, and textures from the style image to the content image. The color of the style image is very important. The white background and black paint strokes from this style are transferred by the model to create a black-and-white result. Paintings aren't the only source of style. The model can learn and transfer the tiles from this mosaic as well. Express your creativity by using different patterns from nature. For example, the ice cracks in this style can be transferred by the model to produce a pretty compelling result. These are just a few examples of what you can do with Style Transfer and Create ML. But wouldn't it be great if we could do even better? What if you could apply Style Transfer to more than just a static image? Amazing. Each frame is stylized fast enough to maintain a smooth stylization experience. You can use a Style Transfer model in real time for your video apps. Using the A13 Bionic chip, we can achieve processing speeds of up to 120 frames per second.

To train a Style Transfer model, we need to provide some training data: a style image and a directory of content images. The model learns the balance between content and style from your images. For optimal results, the content images used for training should be similar to what you expect to stylize at inference time. In this example, I have a directory of natural content images. You can use model parameters to optimize the behavior of a Style Transfer model. You can configure your model's behavior with the style strength and style density model parameters.

Let's explore style strength first. With the style strength parameter, you can tune the balance between style and content. With low style strength, only parts of the background adopt the style image qualities, and the people dancing have adopted only a small amount of blue color. Now with high style strength, the background is even more stylized and the people dancing completely adopt the style image's colors and ice cracks. Notice the difference when I put them next to each other.

Next, let's explore style density. With the style density parameter, coarse details can be learned by the model. Here, I've drawn a grid on top of the style image. Let's zoom in on a region. The model focuses on high-level details in this region, such as the bird, and learns these coarse qualities. Here's an example of coarse stylization.
Fine details can also be learned by the model by setting the style density parameter to a higher setting. With a fine grid, the zoomed-in region contains mostly color and brush strokes. This produces a different stylized result. Let's compare the coarse and fine results side by side to visualize the difference. You can use the style density parameter to explore a wide range of such stylizations.

Let's train a Style Transfer model together in the Create ML app. I'll open the Create ML app and create a new document. I'll select the new Style Transfer template. I'll name the project DanceStylizer and fill in a short description. The app has already selected the Settings tab and is ready for my training data and model parameters. I have this style image that you saw earlier in the session. I'll drag it into the training style image data well. I also have an image of people dancing; I'll use that for validation in order to visualize the model quality throughout the training process. Lastly, I need a directory of content images to learn the balance between content and style. I can either download a set of a few hundred natural content images directly from the app, which works for a wide variety of use cases, or I can use my own folder of content images. I'll drag in a folder of six hundred natural content images. I have the option to optimize for image or video use cases. I'll choose video, since I'm interested in real-time Style Transfer apps, and I'll train for four hundred iterations. I can use the style strength slider to control the balance between content and style, and I can also use the style density slider to explore coarse or fine stylization. The default parameters work pretty well in most cases.

Now that I'm finished configuring my training settings, I'll click the Train button in the toolbar. The app processes the style and content images and immediately starts training the model. Every five iterations, a new model checkpoint stylizes my validation image. I can use this to visualize the model's stylization interactively throughout the training process. At any point, I can take a model snapshot by clicking the Snapshot button in the toolbar. A snapshot is an ML model that I can use later in an app. My model snapshots are saved under Model Sources. Style loss and content loss graphs help me understand the balance between content and style.
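As an aside, everything configured in the app here also has a counterpart in the CreateML framework's Swift API, which the "Control Training in Create ML with Swift" session covers in depth. The following is only a rough sketch of that scripted route; the type and parameter names (MLStyleTransfer.DataSource, textelDensity, styleStrength, and so on) are recalled from that framework and may differ slightly from the shipping API, and the file paths are placeholders.

```swift
import CreateML
import Foundation

// Placeholders for the style image, the folder of ~600 content images,
// and the validation image of people dancing.
let styleImage = URL(fileURLWithPath: "/path/to/StyleImage.jpg")
let contentFolder = URL(fileURLWithPath: "/path/to/ContentImages")
let validationImage = URL(fileURLWithPath: "/path/to/Dancers.jpg")

// Training data: one style image plus a directory of content images.
let data = MLStyleTransfer.DataSource.images(styleImage: styleImage,
                                             contentDirectory: contentFolder,
                                             processingOption: nil)

// Model parameters mirroring the sliders in the app.
let parameters = MLStyleTransfer.ModelParameters(
    algorithm: .cnnLite,                 // the lighter network aimed at video use
    validation: .content(validationImage),
    maxIterations: 400,
    textelDensity: 128,                  // style density: lower is coarser, higher is finer
    styleStrength: 5)                    // balance between content and style

// Kicks off training and returns a job whose checkpoints and progress can be observed.
let job = try MLStyleTransfer.train(trainingData: data, parameters: parameters)
```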
The style loss decreases as my model learns to adopt the artistic qualities of my style image. The model has completed training for 400 iterations, and it looks like the style loss has converged. I can train my model for more iterations by clicking the Train More button in the toolbar. I'm happy with my model's stylization on the validation image, so let's navigate to the Preview tab and test out the model with some new data. I'll drag in a test image.
I can toggle back and forth between the stylized image and the original content so that I can clearly visualize the stylized effect, and I can compare the stylized result from different model snapshots. Since this model was optimized for video, I'll try dragging in a test video. Now this is really getting interesting. At this point I can choose to download the stylized result, share it with a colleague, or open it in the QuickTime Player app. In the Output tab, I can find out more information about my model. The size of the model is quite small, under one megabyte, which makes it really convenient to bundle with my apps. I can see the OS availability of my model to ensure compatibility, and by selecting Predictions I can find out even more about my model, such as input and output layer names. At this point I can get the model to use in my apps by clicking the Get, Open in Xcode, or Share button in the toolbar.
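To give a concrete idea of what using the exported model in an app can look like, here is a minimal sketch that stylizes a single image with Vision. It assumes the .mlmodel has been added to an Xcode project so that a DanceStylizer class is generated for it (the name follows the project name used in the demo); the actual input and output details are the ones shown in the Output tab.

```swift
import CoreML
import Vision
import UIKit

/// Stylizes a UIImage with the trained model and returns the result as a CGImage.
func stylize(_ image: UIImage, completion: @escaping (CGImage?) -> Void) {
    guard let cgImage = image.cgImage,
          let coreMLModel = try? DanceStylizer(configuration: MLModelConfiguration()).model,
          let visionModel = try? VNCoreMLModel(for: coreMLModel) else {
        completion(nil)
        return
    }

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // A style transfer model outputs an image, which Vision surfaces
        // as a VNPixelBufferObservation.
        guard let observation = request.results?.first as? VNPixelBufferObservation else {
            completion(nil)
            return
        }
        let ciImage = CIImage(cvPixelBuffer: observation.pixelBuffer)
        completion(CIContext().createCGImage(ciImage, from: ciImage.extent))
    }
    // Let Vision resize the input to whatever size the model expects.
    request.imageCropAndScaleOption = .scaleFill

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```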
And that's how easy it is to train a Style Transfer model. As you just saw in the demo, training a model takes only a few minutes. Let's recap some important concepts from the demo. We learned about training checkpoints and how to interact with the model training process in a new and exciting way. We compared model snapshots at different training iterations and discovered a new ability to extend training to improve the quality of a model. For more fine-grained control, be sure to check out the session "Control Training in Create ML with Swift." To show you some amazing experiences Style Transfer can bring to your apps, I'd like to introduce Geppy.

Hello, I'm Geppy Parziale. I'm going to show you how to combine Style Transfer with ARKit. I got several Style Transfer models from David, and I decided to create a new virtual world with them. Let me show you how.
ARKit captures the surrounding real-world environment, and each Style Transfer model allows me to stylize the scene. Cool. Let me show you how to do it. Here I'm using ARKit to capture the surrounding real-world environment and stylizing it with one of our models. ARKit generates AR frames. Each AR frame contains a CVPixelBuffer that I rescale to the input size expected by my Style Transfer model. Then the rescaled CVPixelBuffer is stylized using the Core ML model and rendered onscreen using Metal.
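A minimal sketch of that per-frame pipeline is shown below, with Vision standing in for the manual rescaling step and the Metal drawing elided. VideoStyleModel is a hypothetical stand-in for one of the trained models bundled with the app.

```swift
import ARKit
import CoreML
import Vision

final class StyleTransferRenderer: NSObject, ARSessionDelegate {
    // Wrap the (hypothetical) generated model class once, up front.
    private let visionModel: VNCoreMLModel = {
        let model = try! VideoStyleModel(configuration: MLModelConfiguration()).model
        return try! VNCoreMLModel(for: model)
    }()

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // frame.capturedImage is the camera's CVPixelBuffer for this AR frame.
        let request = VNCoreMLRequest(model: visionModel) { request, _ in
            guard let stylized = (request.results?.first as? VNPixelBufferObservation)?.pixelBuffer else {
                return
            }
            self.draw(stylized)   // hand the stylized buffer to the Metal renderer
        }
        // Vision rescales the captured image to the model's expected input size.
        request.imageCropAndScaleOption = .scaleFill

        let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                            orientation: .right,
                                            options: [:])
        try? handler.perform([request])
    }

    private func draw(_ pixelBuffer: CVPixelBuffer) {
        // Metal rendering of the stylized frame is elided in this sketch.
    }
}
```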
I can use ARKit to add a virtual object to the scene.
And here she is: Michelle. Oh, and she's a very good dancer. To add a 3D object to the scene, I can define an ARAnchor that specifies its location in the real world. The virtual object is then rendered seamlessly in the scene using Metal. But I can do more.
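For reference, placing an anchor like that could look roughly like the sketch below; loading the 3D character and rendering her with Metal are app-specific and elided.

```swift
import ARKit
import simd

/// Adds an anchor about 1.5 meters in front of the current camera pose.
func placeDancer(in session: ARSession, from frame: ARFrame) {
    var translation = matrix_identity_float4x4
    translation.columns.3.z = -1.5                         // 1.5 m along the camera's -Z axis

    let transform = simd_mul(frame.camera.transform, translation)
    let anchor = ARAnchor(name: "dancer", transform: transform)
    session.add(anchor: anchor)
    // The renderer then draws the virtual character at this anchor's transform.
}
```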
Here it is. And it seems she really likes her new look.
Here I stylized the virtual object's texture offline with a different style, but I want to do even more.
Using the power of the Apple Neural Engine and these optimized Style Transfer models, I can run multiple styles concurrently in real time.
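Running two of those models on the same frame, with one of them masked by ARKit's person segmentation as described next, could look roughly like this sketch. BackgroundStyle and PersonStyle are hypothetical stand-ins for two trained models, and the Metal blending pass is elided.

```swift
import ARKit
import Vision

// Ask ARKit to produce a person segmentation matte alongside each frame.
let configuration = ARWorldTrackingConfiguration()
configuration.frameSemantics = .personSegmentation
// session.run(configuration)

/// Runs both style models on one ARFrame; the results are later composited in Metal.
func stylize(frame: ARFrame,
             backgroundModel: VNCoreMLModel,   // wraps BackgroundStyle
             personModel: VNCoreMLModel) {     // wraps PersonStyle
    guard let personMask = frame.segmentationBuffer else { return }

    let backgroundRequest = VNCoreMLRequest(model: backgroundModel)
    let personRequest = VNCoreMLRequest(model: personModel)
    backgroundRequest.imageCropAndScaleOption = .scaleFill
    personRequest.imageCropAndScaleOption = .scaleFill

    // One handler can perform both requests against the same captured image.
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
    try? handler.perform([backgroundRequest, personRequest])

    guard
        let background = (backgroundRequest.results?.first as? VNPixelBufferObservation)?.pixelBuffer,
        let person = (personRequest.results?.first as? VNPixelBufferObservation)?.pixelBuffer
    else { return }

    // A Metal compute pass (elided) blends `person` over `background`
    // using `personMask` as the per-pixel mask.
    _ = (background, person, personMask)
}
```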
Combined with ARKit person segmentation, I'm executing two Style Transfer models: one for the background environment and one for the segmented person. The stylized results are blended together using Metal. Pretty cool, huh? These highly optimized Style Transfer models generated with Create ML, combined with ARKit and the hardware acceleration of the Apple Neural Engine and Metal, allow you to unleash all the power of iOS 14 for your apps. I can't wait to see all the cool experiences you will build with Style Transfer models trained with Create ML. And now I'll hand it back to David for a recap.

Thanks, Geppy. Your app looks really amazing. Let's summarize. With Style Transfer in Create ML, you can train models for both image and video use cases. The video Style Transfer model is extremely performant and can be easily combined with other Apple technologies such as ARKit. The model size is small, which makes it convenient to bundle with your apps. Reduced training time makes the training experience interactive and fun. You can train a model in a few minutes and quickly iterate with different styles and model parameters. We can't wait to see what you come up with using Style Transfer in Create ML. Thank you.