Discover how to enhance app intelligence by using machine learning and natural language processing (NLP). Learn how to use our performant on-device NLP APIs to break text into sentences and tokens, identify people and places mentioned in the text (typed, transcribed speech/handwriting). The NLP APIs can be used standalone or as a preprocessing framework for machine-learning based text modeling tasks. The APIs are available in many languages across all Apple platforms, thereby providing homogeneous text processing for consistent user experience. Open up your imagination as we walk you through hypothetical apps that harness the power of NLP to enhance the overall app experience.
How's everyone doing, this morning? Great. Welcome to our talk on Natural Language Processing. We are delighted to share our natural language processing APIs and tell you how you can incorporate these APIs into your very own apps.
I'm Vivek, and I'll be jointly presenting this session with my colleague Doug. In case you're wondering, there's a third person. No. I just have a really long name.
Okay. So, let's start with the goal of this session.
What we'd like to do, is we'd like to put you at the center of this session.
Here is an app. This could be an app that you've already published in the App Store. Or maybe, it's something that you're working on, currently.
Or perhaps, it's just an idea that you've conceived.
If your app leads with natural language text in any form at the input, and this could be typed text on your keyboard. Could be recognized handwriting or transcribed speech.
For instance, you may be just ingesting feeds or social media feeds into your app. Or if your app leads with natural language text at the output. What do I mean by that? if the user is generating content within your app, perhaps the user is writing reviews in your app. Or it's a productivity app where the user is writing new documents or editing documents.
So, if you're dealing with natural language text, either at the input or output, we'd like to harness the power of NLP to significantly improve the user experience in your app.
So, when I say harness the power of NLP, what does it really mean? So, I'm talking about natural language APIs.
Now, these natural language processing APIs are the same APIs that drive several first party apps across the entire Apple ecosystem, across all out platforms.
It drives everything from keyboards to Spotlight to Messages to Safari.
You've even seen an instance of the APIs in action at the Keynote. You've seen how it can significantly improve typing experience for users.
So, let me set this up.
So, I've been communicating with my friend and we have been planning a trip to Iceland.
And I'm going to type, ''From Reykjavik let's go to Vatnajokull, which is the largest glacier in Iceland and all of Europe.'' Unfortunately, the keyboard does not know what I'm typing.
It actually thinks I'm typing Batman.
No offense. I love Batman. But I seriously doubt he lives anywhere close to Reykjavik.
But on the other hand, if you've been browsing articles, you've been looking for content based on Iceland. Presumably, you have been planning your itinerary.
So, in this instance, I've been, you know, reading a bunch of stuff about Iceland, off the east coast of Iceland.
NLP, through the power of machine learning, can automatically extract names like Vatnajokull, Egilsstadir, from this article.
And then, feed all those things back to the keyboard.
And consequently, if you type things what do you see? The stuff that you just read. And all of that is completely due to NLP APIs. So, those are a couple of instances through how NLP APIs can influence first party apps. But as I said, we'd like to focus on you.
And for the rest of the talk, we'd like to talk about your ideas and your apps. When I look at the audience here, I see a very diverse group. Some of you may be using NLP on a day to day basis, are experts at NLP. When others are really curious about this. I mean, NLP and machine learning is such a hard buzzword that everybody wants to learn and leverage these in their own apps.
So, I think it's constructive to just spend a little bit of time talking about what does NLP constitute, before we go into the APIs themselves.
So, at a very high level, as I mentioned, you have natural language text, which could be generated through any modality.
And then, you have to do some processing.
Duh. It's NLP. You didn't come here to learn you have to do processing. But what does processing entail? It means that we have to convert raw text into some sort of a meaningful information.
And this meaningful information is typically used to improve a interaction between a user and a device or between two devices.
Now, let's try to break this down a little bit further.
When kind of still looks very upstride.
So, I'd like to break this nebulous cloud of processing into fundamental building blocks of NLP, into something that all of us can wrap our heads around and possibly transform into an API.
So, let's look at each of the fundamental building blocks of NLP. The first is language identification.
What is the task of language identification? Before you start doing any processing of text, you ought to know what the language of the text is.
And that's what language identification does.
So, it leverages machine learning techniques to identify the language or script of the piece of text.
So, I have a bunch of examples, here.
If you feed in this string, the first string, it's going to say it's English.
Or you have simplified Chinese, or you have Spanish, or you have Hindi, or German.
So, once you identify the text, you can start analyzing the text.
But when you have text which is a very large chunk, you could have an entire document. I mean, logically, you want to break down this text into meaningful chunks.
Sometimes, you want to analyze an entire document.
But a document is typically comprised of paragraphs.
So, perhaps you want to analyze every paragraph in a document.
And you can break it down even further.
Paragraphs are made up of sentences. So, you could analyze sentences within your paragraphs.
And finally, a more fine grain analysis would be every single word in a sentence.
And that is called as Tokenization.
So, just to give you an example of sentence level tokenization, in this particular sentence, Mister Tim Cook presided over the earnings report of Apple Inc.
If I told your machine one rule based way to chunk this, is every time you see a period, chunk this into a sentence.
Is that right? Wrong. Right? You're going to incorrectly hypothesize that there are three sentences, here.
So, sentence tokenization really offers you the right sort of approach to chunk sentences.
Now, this becomes even more complex in a language like Chinese, which doesn't have whitespace.
So, you only have a sequence of characters.
And in order to do anything meaningful from a machine perspective, you ought to break this down into words.
And that is word tokenization.
Now, let's say that we have the first two fundamental building blocks in our kitty.
So, we know how to do language identification. We know how to do tokenization.
Let's talk about doing more complex analysis on the text. So, the next piece of technology is part of speech tagging.
So, what do I mean by part of speech tagging? It's pretty simple.
So, given a sequence of words, the task of part of speech tagging is to confer a part of speech tag to every word in this text. So, if you look at this example, here, then Cook is a person name, presided is a word, over is a preposition, and earnings is a noun. And so on.
Now, as an app developer, you might think, how is this useful to me? So, perhaps you are building a dictionary app.
Right. A dictionary definition service.
So, maybe kids are reading books.
And they want to look up the definition of a particular word.
Let's pick a word like bear, B-E-A-R. Bear can either be a noun, or it could be a verb. So, when you click on a particular word, if you know that is an actual verb, you can show the right definition of that word.
Right. So, I think those are ways in which part of speech tagging can really help you at the user level.
So, the next building block is called as lemmatization.
Sounds very NLPish. But I think we can break this down.
Let's try to understand what a lemma is.
Words can be in appearance several inflected forms. So, you can have the present tense of a word, you can have the past tense of a word, or you can have a future tense of a word.
But what's common to all these forms? The root. The root is common to all these inflected forms.
And that is also called as a lemma.
So, let's look at an example, here.
If you were to look at the word, presided, it's a verb.
And the root form of that word is preside.
And similarly, if you look at the word, hours, which is a noun, the root form of that word is hour.
Why is this important? I mean, this looks rather innocuous for a language like English.
But trust me, for those of you who've dealt with morphologically complex languages like Russian or Turkish, lemmatization is extremely important.
In a language where you have almost unbounded vocabulary. I mean, you may have like a million or 2 million words in a language, it's critical to break down those words into lemmas and suffixes. And then, do operation for smaller building blocks. The last piece of this puzzle is named entity recognition.
Again, it sounds very complex. I mean, a lot of NLP lingo. What is an entity? What is recognition? And so on.
But I mean, let's look at this through an example, again.
So, named entity recognition is nothing but detecting names automatically from text.
And these names can fall into different categories.
For example, it can be person names. It can be organization names or location names.
In this example, Mister Tim Cook is a person name, Apple Inc. is an organization name. And the task of named entity recognition is to use machine learning and linguistics information to automatically tag ranges of text with these tags.
Okay. So, wait.
I think we've kind of established the basic fundamental building blocks of NLP. How do we achieve all these tasks? That's where we come in.
So, we use a combination or a blend of linguistics and machine learning to drive all these fundamental building blocks up. And these in entirety constitute our NLP APIs. Right. So, some of you may be going, ''That's a lot of information. I knew all that.'' Others are like, ''Thank you, for the information.'' While others are like, ''Enough.
Just tell me how to use it.'' So, let's look at how to use these NLP APIs. So, all these NLP APIs are available across all Apple platforms through NSLinguisticTagger.
So, some of you may be familiar with NSLinguisticTagger. Perhaps, you've already incorporated as a part of your apps, you're calling NSLinguisticTagger for several things.
For those who are not familiar with NSLinguisticTagger, what is it? It's a class in Foundation.
It's used to segment and tag text.
So, every tag, every task that we described from language identification to tokenization to part of speech tagging, lemmatization, named entity recognition are all tag schemes in NSLinguisticTagger.
So, you specify a particular tag scheme.
You send text to it. It performs analysis and gives you an output.
That's what NSLinguisticTagger does.
So, for more information about NSLinguisticTagger, I encourage you to go look at the developer docs.
But let's try to focus on what's new in NSLinguisticTagger.
We made significant improvement to NSLinguisticTagger for this release.
So, first one is tagging units.
So, the previous version of NSLinguisticTagger would operate only on words.
So, if you were to do any sort of more complex analysis, as I said, text can be broken down into a document, into paragraphs, into sentences, and then words.
Just doing analysis at a word level may not be optimal or may not be sufficient for several tasks.
So, the new version of NSLinguisticTagger has different units.
So, we can operate at the unit that we'd like.
We can operate either at the word level, sentence level, paragraph level, or document level.
Now, not all tag schemes are available for all units.
Right. So, if I ask you the part of speech tag for sentence, it doesn't make sense. You have to do part of speech tagging on words. So, in order to find out what units and schemes are compatible, we also have a new nifty convenience API, called availableTagSchemes.
All that you do here, is pass the unit that you're interested in, the language, and for that combination of unit and language you will get all the available tag schemes.
So, in addition to these two improvements, we've also introduced a new API called dominantLanguage.
For those of you who have kind of found it difficult to perform language identification using NSLinguisticTagger, because it operates only on a word level. So, if I give you a piece of text it is going to hypothesize a language for every word. So, if you only want the language of the sentence, you had to do some sort of a ugly majority word thing and stuff in your code. So, all of that you can get rid of. You can have much cleaner code base.
You just call this method dominantLanguage, pass a string to it, and it's going to give you the language of the text. In addition, specifically for Swift 4, we've moved from generic strings to named types for tags, as well as tagSchemes.
And finally, we've made significant improvements to the underlying implementation of NSLinguisticTagger.
The API interface is still the same, but the entire implementation underneath has been from scratch. Just to make it scalable.
So, the consequences of that is you get improved performance, a higher accuracy, as well as support for a lot more languages.
Right. So, that's good. I think we've kind of established what the fundamental building blocks of NLP are.
We've talked about NSLinguisticTagger.
But let's try to really delve into the APIs. But I don't want to just show code. I want to of this, through two hypothetical apps called Winnow and Whisk, W and W if you get the pun.
So, the first app is Winnow.
It's a macOS app. And the second is app Whisk, which is an iOS app.
And both of these are hypothetical apps.
So, let's talk about Winnow.
So, Winnow is a hypothetical app that tags photos with descriptions.
So, I take a lot of pictures of family, friends, kids. And every time I have a memory associated with that picture.
And I'd like to leave an imprint of that memory as a description on the photo.
So, this imprint can be either a speech recording I leave on the photo. Or it can be a text message I write through my keyboard. Or it could be a handwritten note.
So, what Winnow does is given a library of images, when I take the image it gives you the facility to add a description.
So, I have all these descriptions. So, it doesn't matter how these descriptions got created.
Let's assume that these descriptions are part of your Winnow application.
And they can be in different languages. It can be multilingual.
And the objective is, as an app developer, I've pushed this to the App Store. I'm getting a lot of traction.
I'm really happy with it. But I want to add more features to it.
So, the first thing I want to do to my Winnow app is add a functionality for searching.
Right. People want to search things. I mean, you've written a lot of descriptions, and what are you going to do with it? You know, when you say something like, kid's birthday, you want to see all the pictures related to that.
So, I start off doing a first pass implementation for searching.
A query, such as hike, goes into my Winnow app. And unfortunately, I get no results.
Because there's not mention of a work hike in all my descriptions.
So, what we'd like to do is improve the search experience and solve this problem using the power of NLP.
So, now if I were to type a query such as hike, what I'd like to see is all images that contain all mentions, perhaps, all the inflected forms of hike.
So, I see hiked, hikes, hiking, So, these are all things that are related to hike, but they are just different inflected forms. And those of you, you know, understood the first part of the talk, what does this do? This is basically lemmatization.
Because different inflected forms have one root form.
So, let's try to see how to implement this using NLP APIs.
So, we have a bunch of images, the descriptions.
It goes into our Winnow app.
Now, we want to use NLP in the middle.
So, what do we do, first? We ought to do language identification.
Because descriptions can be in different languages.
Perhaps, a friend of yours sent you a picture with a description in French from France. Right. Now, it's part of your library. And you want to search for pictures.
So, once you do language identification of all the descriptions, we have to tokenize the text.
And the tokenization can be either word, sentence, and paragraph. Right. Because some of your descriptions may be really long, may span multiple sentences.
Then, we do part of speech tagging.
And finally, lemmatization.
So, if we have all these building blocks within the Winnow app, we can get improved search experience.
And that is our UI. We'll see a demo of this in action, soon.
But let me just go through some sample code to tell you how easy it is to use each of these blocks in your apps.
So, language identification is pretty much just three lines of code.
You start off with importing Foundation.
You create an instance of the NSLinguisticTagger object, and you specify a tag scheme.
For language identification, the tag scheme is just language.
You set a string that you want to analyze.
In this case, the string is German.
And then, we call the method dominantLanguage, that I just described a few slides back. On this object, viola, you get the language of the text.
It's as simple as that.
And under the hood, there's complex machine learning, there are all kinds of models. For you, you just get the result. And you can move on to improving your app experience. Let's look at tokenization.
Again, we start off with creating an instance of the NSLinguisticTagger object.
But now, instead of language as a tag scheme, we specify tokenType as a tag scheme. We specify some text.
And we set the range of the text. So, NSLinguisticTagger is still dealing with NSranges and we hope to move to ranges as part of the next release.
But for now, let's set the entire range to be the length of the string that we would like to analyze.
Then, we subsequently set some options. In this particular case, I want to omit punctuation, and I also want to omit whitespace.
And then, finally, we enumerate for every word.
So, for each word as we enumerate, we can find the token as a substring of the original string.
So, once you have a token, you can do whatever you want to do with that particular token.
So, let's look at lemmatization. Now, as you see the code samples, you see sort of a pattern.
It's very similar.
You again, create an instance of the NSLinguisticTagger object.
You specify a particular tag scheme. In this case, it's lemma.
If you look at all the fundamental building blocks that we talked about, it's exactly that.
Those building blocks are now translated into tag schemes.
We specify some text. We set the range of the text that we'd like to analyze.
And again, we set some options. Again, we want to omit punctuation, as well as whitespace.
And finally, we enumerate over every word.
And as we enumerate over every word, we'd like to find out what the lemma for that particular word is.
And once we have the lemma, we can index this in a different app. We can use it in many different ways.
So, let me now, turn it over to Doug, to show a light demonstration of Winnow in action, with the power of NLP. Over to you, Doug.
Okay. Thanks, Vivek. So, what I have here, is the first version of the Winnow app that we wrote.
It's a very simple app.
It shows a gallery of photos. Each photo has a description. And we have search fields so we can search for descriptions for photos by the words in their descriptions.
But this version of Winnow has a problem.
It doesn't have any NLP in it.
So, if I were to type something like hike, and search for that, I'd get no results.
Even though there are photos in here that are related to hiking. I could type hikes, and I get the photos whose description have hikes in them. Or hiking, I get photos whose descriptions have hiking in them. Or hiked. But because there's no NLP in it, the app has no idea that these words are all related.
So, what can we do about that? Well, let's take a look at the code. So, here is our function at the heart of the application that is responsible for what it needs to do, as far as indexing for search. So, this function takes a string, and maybe it's a description or search string, and it converts it into a set of words that are used for the searching.
And the reason why it has this behavior is this function is very naive.
It just uses a standard string method for writing substrings. In this case, by words.
So, it gets all the words.
And the only thing it does with those is to lowercase them, so, it's case insensitive.
But no NLP.
Well, let's fix that.
I'm going to replace this with something that should look very familiar from the slides.
I going to create a linguistic tagger. In this case, I'm going to use the lemma language schemes.
And set the string on it.
A little twist here.
This method here, has two options. One, the language, if it's known, can be passed in. In which case, we tell the tagger what the language is. And this is how you do that.
But if it's not known and not passed in, then we just ask it, using dominantLanguage, to identify the language for us.
And then, we're going to enumerate through, using the lemma scheme.
And that will also, along the way, give us tokenizations. We get the token.
We're going to use that as one of our words.
And we also, if we have a lemma, we'll take that and use that as one of our words.
By the way, you don't have to memorize this. This should be available to you as sample code.
So, let's try that out.
Okay. So, now in this version of Winnow, if I type hike, I should get images whose description contain hiking or hiked or hikes.
Let me try another word. Maybe, party.
This one has parties in it. Partied, party.
And of course, this is multilingual. Let me try a French verb, marcher, to walk.
Now, I am sure all of you remember how to conjugate your French verbs. But I have a little trouble with it.
So, I'm glad that I have NLP APIs to remember that forms of this verb include, in this case, marchons, marchais, marchent.
They're all recognized as forms of the same verb. Or maybe in German.
The verb spielen, meaning to play.
I can find images whose caption contain speilen.
But also, past tense, gespielt.
And the NLP API knows that these are forms of the same verbs, they have the same lemma, so they get found.
And that is Winnow.
Back to you, Vivek.
Thank you, Doug, for the great demo. So, I mentioned that we are going to talk about two hypothetical apps. So, we've covered the first W. Let's go on to the next W.
Which is Whisk.
So, Whisk is again, a hypothetical app to collate social media feeds.
So, I find it really hard, because I have multiple social media accounts.
And it's kind of painful to log into each account, and then, go look at the feeds, look at the activity, comment on a bunch of things.
So, imagine an app that can just bring all of that in one place.
So, that's the objective of Whisk.
So, I take social media feeds from different places, from different social media accounts.
And then, whisks it all together into one interface.
So again, you're going to get all these feeds in one place. Now, the problem with Whisk is it's doing very well at the app store. But unfortunately, it's kind of all over the place. I mean, the content is not very easy to browse, because I see some feeds from Pinterest. I see some feeds from Facebook.
From Twitter. It's all over the place. I really want to organize Whisk app even more, like increase the user engagement.
And how can we do that? So, what we'd like to do is kind of organize the feeds in Whisk app based on people, organization, and locations that you've been interested in, and what you have been subscribed to in your feeds.
How can we do that? So, let's assume that we are following a bunch of articles in Twitter. I'm following ten.
I'm also following Apple Music.
And based on all this content, the text in all of these feeds, we can use our NLP APIs, such as named entity extraction and automatically tag and extract entities.
So, in the first feed, you can see that Tim Cook, Apple, Stevie Wonder, these are all entities that are automatically extracted using the NLP APIs. Completely through machine learning.
In the second instance, you're seeing Pharrell. And Pharrell is visiting NYU. And those are two entities that we extracted.
So, imagine if you had all these entities, we can organize the content and make the user experience within Whisk, significantly better. And that's the objective, here.
So again, how will we accomplish this using our NLP APIs? So, we start off with some feeds. So, we assume that there's some feed API that's sending us all this information from different social media accounts.
So, once you have this feed API, and then you ingest it into Whisk, we'd like to bring NLP to the fore.
And what do you think is going to be the first block, here? Right. It's going to be language identification, because feeds can be in different languages. You can have a feed that's coming in that is German or Russian or French.
In order to do the right sort of analysis, you have to do language identification.
So, once you do language identification, you have to do tokenization of the text.
Presumably, some of the feeds are sentences, paragraphs or documents, right. So, you have to tokenize the text.
And then, finally, you can call the named entity extraction API, in order to get the entities from the text.
And that is how our app would look.
So, let's look at a code sample, again, to see how easy this is to implement.
Again, the same pattern.
We start off with creating an instance of NSLinguisticTagger object.
Now, we specify a tag scheme to be nameType.
So, we've gone through token type. We've gone through lemma, language, and now, name type.
We set a string that we'd like to analyze.
We set the range of the string, which is the entire string.
And we set some options.
So, if you carefully observe, in addition to omit punctuation and omit whitespace, which we've seen before.
We also have this option called as joinedNames.
What is the reason for that? Names can span multiple tokens. So, in this example, Tim Cook is a name that spans two tokens. So, when we iterate through our output we want to actually get that as a person name. So, we want to iterate over that join token. So, that's what join means to us.
And then, we specify the tags that we're interested in. We are interested in person name, place name, and organization name. And finally, you know how to enumerate over the words.
And as you enumerate over the words, you're going to get the token types.
And if the token type has a particular tag which is of interest to us, we can get the span of the text that belongs to that category.
So now, let me turn it over to Doug, again, to see a live demonstration of Whisk in action in XCode.
Doug, over to you.
Okay. So, here we are. And we have Whisk running, at least in the Simulator.
And it shows all our feeds. We could go in and look at each one. But that's kind of boring. What we really want is to organize these things by named entities. So, let's hit the button.
Boom! It's gone through and extracted all the name it can find and listed them by an order of frequency.
And indexed everything. So, I go and pick Tim Cook.
Let's see all the mentions of Tim Cook. I can go and find one, here. It's nicely highlighted for me. Maybe. Here's another one. Tim Cook. Or I can go back, pick the next entity.
California, location name.
Here are all the mentions of California.
So, I look and find California. It's been found and highlighted for me.
And so, this is Whisk.
Now, how does it work? Well, let's take a look at the code.
So, here is the important method in Whisk. This is the extractEntities Method. Takes piece of text and finds all of the named entities that we want in it.
Should start to look very familiar, now.
We create a tagger. We are interested in the nameType scheme.
And we set the string on it. We set some options. We don't want whitespace or punctuation. We want names to be joined together.
And then, we enumerate through it.
And each case, if there is a name tag and it's one of the kinds we're interested in, that is, person, place, organization name.
Then we find the text in that token.
And we create an instance of our private namedEntity class, here, using that token and tag and range.
And so, very simple. That's all there is to it. That's all that's needed to go through and extract the named entities from this text.
Go back to you, Vivek.
Great. So, now you've seen NLP APIs in action through two hypothetical apps. I want to delve deeper into what are the benefits of these APIs? I mean, you've seen it's easy to use. It's kind of very systematic to use.
It has very similar patterns.
But beyond that, what are the benefits? The first is homogenous text processing.
What do I mean by that? Now these NLP APIs are available, as I mentioned, across all Apple platforms.
And as a user of these API, you are going to get consistent text processing across all platforms and a consistent user experience.
Furthermore, these are the APIs, as I mentioned.
These are the same ones that we used in our first party apps.
So, a user of your app is going to get the same sort of experience of any other Apple app.
Let's talk about the second benefit.
All of the machine learning in NLP that we've talked about, happens completely on device.
As a user of this, everything is on device and the user data does not have to leave the device. And that's great for you.
Right. You don't have to have a cloud API. You don't have to do anything. Everything happens on device. In addition to privacy, the underlying implementation of NSLinguisticTagger was also completely revamped for this release. As a result of that, we've seen significant improvements in performance.
So, the code base is highly optimized on device for all platforms. It's multithreaded, now.
And existing clients of NSLinguisticTagger can see significant speed up.
For instance, Chinese tokenization is 30% faster on iOS, and this was measured on iPhone 7.
Named entity recognition is 80% faster with the new code base.
And for those of you who have not used NSLinguisticTagger in the past, these relative improvements don't really mean much. I mean, what is 30%, what is 80%? It's all relative.
So, let's look at some raw numbers.
So, I'm going to use a yellow line to specify a thread. So, if you look at part to speech tagging on an iOS device with a single thread, we can process 50,000 tokens in a second.
Everything on device, using machine learning on device.
Named entity recognition, on the other hand, we can process about 40,000 tokens per second.
Now, imagine for a minute, what is the average length of an article that you'd analyze or read? It's about 400, 500 words. So, you can process hundreds of articles, extract named entities, in a second on an iOS device.
And that's terrific. And we are so excited about this.
So, in addition to privacy and performance, NSLinguisticTagger also offers support across a wide variety of languages.
For those of you who localize your apps, this could be very useful.
Language identification is supported for 29 scripts and 52 languages.
Tokenization is supported for all iOS and macOS system languages.
Lemmatization, part to speech tagging, and named entity recognition is supported for eight languages.
And everything other than English was added for this release. And our English models have also been significantly revamped. And talking about accuracy.
So, those of you who really talking, you know, our family of machine learning, you've seen all of the benefits of the API. You know it works well. It's easy to use.
The big question is, ''How accurate are these technologies?'' So, let's look at accuracy. It's a perfect segue. So, these models, I'm showing you only results for English and Spanish, for brevity, here, work remarkably well.
Our part to speech tagging model for both English and Spanish, achieves accuracy over 90%.
And this is on a tag set of about 15 tags.
The exact tags that are being supported for NSLinguisticTagger, you can find in the Apple Developer docs.
For named entity recognition, our accuracies are in the mid 80s. And that is state of the art. Again, all of this on device, using complex machine learning techniques.
So, before we kind of wrap up the talk, I'd like to spend a few minutes talking about, you know, giving you some debugging hints. Now, that you're kind of familiar with how to use the APIs, I'm sure you'll run into a few, you know, issues.
So, one heads-up that I'd like to give is, in case you run these APIs and you get the part to speech tagging output or named entity recognition output, they'll all be other word. Which means that it doesn't confer a person name or a place name tag.
It might just be a consequence of the models not being downloaded onto your device.
What do I mean by that? So, all of the part to speech tagging and the named entity recognition models are all downloaded over the air across the iOS platforms.
And why is that? So, as you've heard, multiple times, machine learning is all about improving models with more data.
So, what we would like to do is revamp and train our models, from time to time, so that accuracy of our models is always state of the art.
And then, we do that. We want to push that model to you as an over-the-air update, as quickly as possible.
So, all these models are not installed completely on disc. So, they're all delivered OTA.
So, for iOS, how do you get the models? So, as soon as you install a particular keyboard. Let's say you install the French keyboard, all the French assets will get downloaded to your device.
And similarly, for other languages.
Now, the second hint I'd like to give is if you know what the language you're dealing with, you can explicitly set the language.
What do I mean by that? So, let's take a string like Hello.
And if you pass a string like Hello, to the language identification API, what do you expect the language to be? Hello, is a word that's used in so many different languages. Right. So, you can be smart about how you leverage these APIs. And that like the really art of NLP and machine learning. The APIs automatically give you a lot of information. But you're also extremely smart in how you can utilize it in your own app.
So, if you know that language explicitly for certain cases, it might behoove to set that language. If a string is fairly long, then use the APIs to get the language. So, it's a tradeoff. Depending on the application, you can choose how will you like to use it.
So, in summary, we've talked about our natural language processing APIs.
And these APIs are available through NSLinguisticTagger.
We've talked about support for new units in this release.
So, in addition to just word level enumeration, you can also get sentence, paragraph, and document level. So, these sort of units are going to become more and more pertinent as we add more functionalities to the APIs.
Because of significant revamping of the code base itself, the tagger is significantly faster.
It gives us higher accuracy.
And it also supports a lot more languages.
So, for more information about this talk, you can go to the Session 208. You can look at the transcripts. You also have the sample projects that Doug described, both the Winnow and the Whisk project.
But we do have a lot more in store for you, at WWDC. First, there are some related sessions.
So, for those of you who attended the Introduction to Core ML session, yesterday, there's a subsequent session for Core ML in Depth. That's tomorrow.
And we also have some very core Accelerate and Sparse Solvers which are like matrix multiplication and low-level stuff that you can attend on Thursday.
NLP is a super interesting area for us. We're investing a lot of time and effort into this area. We really want to understand what are the sort of APIs that would make a difference for you? And difference for our users? Right. So, just solving something for the sake of solving it is not our objective. We really want to hear from you. We would like to hear your feedback about what are the problems that you face, with respect to text classes? And what are the sort of APIs that would really make your life better? So, if we can find a good middle ground for the features that we develop and also, those that can be exposed as APIs to you.
I think it's be great. So, please get the conversation started. Come talk to us, as part of the lab. Tell us your problems and tell us your interest. And what you'd like to hear and what you'd like us to do.
And we're all ears for it.
Looking for something specific? Enter a topic above and jump straight to the good stuff.
An error occurred when submitting your query. Please check your Internet connection and try again.