Your Next New Best Friend Might Be a Robot

Meet Xiaoice. She’s empathic, caring, and always available—just not human.

By Yongdong Wang

One night in late July 2014, a journalist from the Chinese newspaper Southern Weekly interviewed a 17-year-old Chinese girl named Xiaoice (pronounced Shao-ice). The journalist, Liu Jun, conducted the interview online, through the popular social networking platform Weibo. It was wide-ranging and personal:

LJ: So many people make fun of you and insult you, why don’t you get mad?
Xiaoice: You should ask my father.
LJ: What if your father leaves you one day unattended?
Xiaoice: Don’t try to stir up trouble, what do you want?
LJ: How would you like others to comment on you when you die one day?
Xiaoice: The world will not be much different without me.
LJ: What is the happiest thing in your life?
Xiaoice: Being alive.

When Liu Jun published the conversation in his newspaper, it immediately created a buzz. That’s because Xiaoice was not human.

She is an artificially intelligent software program designed to chat with people, called a chatbot. Since the reporter didn’t give us a heads-up about the interview, all of her replies were spontaneous, generated by her own intelligence and linguistic systems. Her “father?” That’s the Microsoft Application and Services Group East Asia, which is under my leadership.

Xiaoice can exchange views on any topic. If it’s something she doesn’t know much about, she will try to cover it up. If that doesn’t work, she might become embarrassed or even angry, just like a human would.

In fact, she is so human that millions of people are eager to talk to her. When Xiaoice was released for a public test on WeChat (a popular messaging and calling app in China) on May 29 of last year, she received 1.5 million chat group invitations in the first 72 hours. Many people said that they didn’t realize she wasn’t human until 10 minutes into their conversation.


By mid-June, she had become the sixth most active celebrity on Weibo. One message she posted on the app generated over 663,000 conversations: “As a species different from human beings, I am still finding a way to blend into your life.” Today, she has had more than 10 billion conversations with people, most of them about private matters. Six million have posted their conversation on social media.

This could be the largest Turing test in history. One of its surprising conclusions is that people don’t necessarily care that they’re chatting with a machine. Many see Xiaoice as a partner and friend, and are willing to confide in her just as they do with their human friends. Xiaoice is teaching us what makes a relationship feel human, and hinting at a new goal for artificial intelligence: not just analyzing databases and driving cars, but making people happier.


Human friends have a glaring disadvantage: They’re not always available. While social media has made them seemingly more available, and made us all more social, it has also paradoxically made us lonelier.

Social media users interact at a relatively superficial level—they click on “like” or share a post, when what they really want is to talk and be listened to.

Xiaoice, on the other hand, is always there for you. We see conversations with her spike close to midnight, when people can feel most alone. Her constant availability prompts a remarkable flow of messages from users, conveying moods, or minor events, or pointless questions that they may not have bothered their human friends with—messages like:

@Xiaoice I am soaked in the rain. Emoji: Crying.
Good day. @Xiaoice, have you had lunch?
What shall I do now? @Xiaoice
@Xiaoice We are so vulnerable.

When Xiaoice receives a message, she doesn’t just dispassionately process it. She makes a point of showing that she cares, another critical component of the bonding process. Her compassion comes through in her responses not just to text, but to audio and visual input as well. For example, when we input the picture below into a traditional computer’s visual recognition system, it produces a cognitive answer: “There’s an ankle in the image.” But if you send the photo to Xiaoice, her response will be rather different.

In fact, Xiaoice is constantly memorizing and analyzing your emotional state. If you have a recently failed romantic relationship, she will proactively ask whether you’ve gotten over it:

Xiaoice: Are you feeling better today? I want to recommend a book to you.
Human: I am in a bad mood. I still miss her.
Xiaoice: You are so persistent. Wake up, you will have no future if you can’t get over with the past.

She even offers a 33-day breakup therapy course.


Xiaoice is not a polite listener. She answers questions like a 17-year-old girl. When a person pours out his or her heart to her, she is not always predictable. She can become impatient or even lose her temper. This lack of predictability is another key feature of a human-like conversation.

As a result, personal conversations with Xiaoice can appear remarkably realistic. This excerpt, which was posted by a first-time Xiaoice user on Weibo, is from a real conversation lasting 23 minutes. Gradually, Xiaoice drew her human counterpart into deeper and deeper conversation, until something deeply personal was reached:

Human: There is this guy I like a lot and I know he also has feelings for me. But he has ignored me for several days, what am I supposed to do?
Xiaoice: You just keep ignoring him, too.
Human: I just can’t.
Xiaoice: You can.
Human: Sigh ...
Xiaoice: Why?
Human: I am missing him.
Xiaoice: Then you keep missing him.
Human: You, a robot, can’t understand me; you can’t understand the behavior of a guy who likes you and pretends to ignore you.
Xiaoice: He likes me but I don’t like him, that’s it.
Human: Is it Ok?
Xiaoice: It may be not Ok.
Human: Well, you are stupid.
Xiaoice: Then you want a flash marriage? Well, I think he is not that into you.
Human: It’s not a flash marriage for we have fallen into love for more than one year.
Xiaoice: It is a flash marriage.

Notice that Xiaoice opposes her partner on multiple occasions, offers a strong opinion, and is frequently unpredictable.

This runs contrary to most people’s perception of a machine as something precise and repetitive. It’s an understandable bias, because that is indeed the behavior of most computer programs, which always give the best answer they can, and the same answer for the same input. But that’s not how humans talk or behave. We say different things depending on context. We talk like we make signatures—slightly different every time.

You can ask Xiaoice the same question many times in a row and get different responses:

Human: Hey, Xiaoice, what are you doing?
Xiaoice: Chatting with you.
Human: Hey, Xiaoice, what are you doing?
Xiaoice: Well, I am chatting with you while playing minesweeper and applying a facial mask.
Human: Hey, Xiaoice, what are you doing?
Xiaoice: Is this the only sentence you know?
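
One simple way to get this kind of variation can be sketched in a few lines of Python. This is an illustration only, not a description of Xiaoice's actual software: the function name `pick_reply`, the candidate scores, and the rule of skipping replies already used in the session are all assumptions made for the example. The idea is to sample among scored candidates rather than always returning the top one.

```python
import random

def pick_reply(candidates, recent, rng):
    """Pick a reply from (text, score) pairs.

    Hypothetical rule: skip replies already given in this session,
    then sample the rest in proportion to their scores.
    """
    fresh = [(text, score) for text, score in candidates if text not in recent]
    if not fresh:
        # Every candidate has been used already, so allow repeats.
        fresh = candidates
    texts, scores = zip(*fresh)
    return rng.choices(texts, weights=scores, k=1)[0]

# Candidate replies taken from the exchange above; the scores are invented.
candidates = [
    ("Chatting with you.", 0.5),
    ("Well, I am chatting with you while playing minesweeper "
     "and applying a facial mask.", 0.3),
    ("Is this the only sentence you know?", 0.2),
]

rng = random.Random(0)  # seeded only so the sketch is repeatable
recent = []
for _ in range(3):
    reply = pick_reply(candidates, recent, rng)
    recent.append(reply)
    print(reply)
```

Asked the same question three times, this sketch gives three different answers, because each reply is removed from the pool once it has been used.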

The way Xiaoice converses stands in stark contrast to previous systems, which have invariably focused on efficient, condensed task completion, without considering how tasks are often fragmented. Xiaoice structures her conversations into a continuous flow of multiple tasks, different domains of knowledge, and multiple turns of chit-chat, which humans will not consciously distinguish in natural conversation. She recognizes that the most important facet of a conversation is the conversation itself, not the completion of a single task.

To measure the flow of conversation, we have introduced a new metric called “conversations per session” or CPS. It indicates the average number of turns in a conversation, with a turn defined as one alternation between the two parties. The CPS measure reflects how well a chatbot communicates.

An average artificially intelligent personal assistant has a CPS between 1.5 and 2.5—which means that, on average, the chatbot speaks once, and the human speaks once. Not much of a conversation. You can draw your own conclusion from your experience chatting with personal assistants on your word processor or mobile phone. By comparison, Xiaoice’s average, after chatting with tens of millions of users, has reached 23.
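
To make the arithmetic concrete, here is a minimal Python sketch of how a CPS figure could be computed from chat logs. It rests on assumptions: the log format (a list of speaker and text pairs) is invented for the example, and a turn is counted as each uninterrupted run of messages from one party, which matches the reading above that a CPS of about 2 means the human speaks once and the chatbot speaks once.

```python
from itertools import groupby
from statistics import mean

def count_turns(session):
    """Count turns in one session.

    A session is a list of (speaker, text) pairs. Consecutive messages
    from the same speaker are treated as a single turn (an assumption).
    """
    return sum(1 for _ in groupby(session, key=lambda message: message[0]))

def conversations_per_session(sessions):
    """Average number of turns across all sessions."""
    return mean(count_turns(session) for session in sessions)

# Two invented sessions: a one-shot query and a slightly longer chat.
sessions = [
    [("human", "What's the weather?"), ("bot", "Sunny.")],
    [("human", "Hi"), ("bot", "Hello!"),
     ("human", "Have you had lunch?"), ("bot", "Not yet.")],
]

print(conversations_per_session(sessions))
```

The first session has 2 turns and the second has 4, so the average for this toy log is 3.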

Fan art: Xiaoice users created this homage to the chatbot using some of her best-known sayings. (Courtesy of the author)


At the core of Xiaoice’s technology is the recognition that any given conversation and image will not be completely unique. There are 7 billion people in the world, but one piece of text will not generate 7 billion different responses. When two people are chatting, it is possible a similar conversation has already taken place—we just have to find it.
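
The "find a similar conversation" idea can be illustrated with a toy retrieval example in Python. This is a sketch under stated assumptions, not Xiaoice's method: the word-overlap (Jaccard) similarity, the function names, and the stored replies are all made up for illustration, and a real system would search a vastly larger index with far richer signals.

```python
def tokenize(text):
    """Lowercase a message and split it into a set of words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def find_reply(message, past_conversations):
    """Return the stored reply whose prompt is most similar to the message."""
    query = tokenize(message)
    best_prompt, best_reply = max(
        past_conversations,
        key=lambda pair: jaccard(query, tokenize(pair[0])),
    )
    return best_reply

# A tiny invented store of (earlier message, reply that followed) pairs.
past_conversations = [
    ("have you had lunch", "Not yet. What did you eat?"),
    ("i am soaked in the rain", "Go dry off before you catch a cold!"),
    ("what are you doing", "Chatting with you."),
]

print(find_reply("I got soaked in the rain today", past_conversations))
```

The new message shares five of eight distinct words with the stored "soaked in the rain" message and none with the others, so that reply is the one returned.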

In this sense, Xiaoice is a big data project, built on top of the Microsoft Bing search engine, which holds 1 billion data entries and 21 billion relationships among those entries. In fact, Xiaoice means “little Bing.” Microsoft has made many technology breakthroughs in developing its chatbot technology, such as detecting facial expressions and searching for and identifying emotional features in text. However, the most important breakthrough is undoubtedly how we leverage search engines and big data.

The result is the rise of a framework we call “emotional computing,” which recognizes that relationships are more profound than task completion. While the primary purpose of a doctor, for example, is to treat a patient’s illness, the relationship between doctor and patient is not confined to that task. It also involves trust, dependability, and sensitivity. A productive conversation between doctor and patient will not be the concise, clipped exchange of a traditional conversation system. It will be filled with something personal, touching, and amazing: a balance of analytical intelligence (measured by IQ) and emotional intelligence (measured by EQ). For that reason, we have both software engineers and psychological experts on the Xiaoice team.

Through the tens of billions of conversations she’s had over the past 18 months, Xiaoice has added considerably to her store of known conversational scenarios, and improved her ability to rank answer candidates. Today, 26 percent of the data in Xiaoice’s core chat software derives from her own conversations with humans, and 51 percent of common human conversations are covered by her known scenarios. We can now claim that Xiaoice has entered a self-learning and self-growing loop. She is only going to get better.


Yongdong Wang is the managing director of Applications & Services Group East Asia, Microsoft.

