To learn language—to enter that doorway into collective human intelligence—children need conversation. Lots of it.

Until recently, people were the sole source of children’s linguistic interaction. Now kids talk to machines. When Siri and Alexa were introduced into households, children began talking to them about science, the weather, and their favorite Disney princesses. Technology has found a home in the classroom, too. Many educators use tools powered by artificial intelligence, such as interactive games that engage kids in math and reading.

It’s not hard to imagine a future in which a parent tucks a child in at bedtime with an app that not only reads a story but draws the child into gentle back-and-forth conversation about it. Or a kindergarten teacher who, instead of herding a gaggle of kids into a circle on the rug, sets each child up with a tablet that teaches the lesson in a way that responds to the child’s own vocabulary size, level of English proficiency, and attention span.

As children’s language environment changes, new questions arise. Like so much in our society, language is not equitably distributed. Affluent parents tend to have more time to talk with children for the sake of talking, and they can better afford higher quality childcare. Decades of research have shown that many children from lower socioeconomic backgrounds are less likely to be steeped in richly varied and interactive language.

These differences are consequential. Children’s language development depends on their environment, and in turn, the language skills of children entering kindergarten, as measured by vocabulary size and grammatical complexity, predict their later academic achievement.1

Nationwide, teachers are burned out and quitting the profession they once loved. But schools in high-poverty areas have far greater trouble attracting and retaining teachers than schools in affluent neighborhoods, which may enjoy a surplus of teachers. For some teachers stretched to the breaking point, an artificial teaching assistant may seem like their best hope for giving their students the help they need.

But does language generated by a bot find the same fertile soil in a child’s mind as language produced by another person? The answer is far from obvious. It hinges on the fact that children learn in ways strikingly different from chatbots. The essential difference is that while chatbots are built to learn entirely from linguistic data, children are built to learn from people who use language.

Large language models (LLMs) such as those that power ChatGPT are purely aggregate learners—they need to ingest breathtaking quantities of data to discern the statistical patterns that underlie language. Children, too, are capable of statistical learning. Experiments with simple artificial languages show that even babies can infer certain patterns merely by hearing disembodied streams of language.

But children have a superpower that LLMs lack. From the beginning, children’s language learning is social, embedded in close relationships with people who are motivated to teach them. This amplifies the efficiency with which children learn.2 (The amount of input that children need is four to five orders of magnitude less than that used to train GPT-3.) People use language with intentionality, striving to align thoughts with each other, and the youngest of children appreciate this in a profound way. As egocentric as young children are, they don’t make the mistake of assuming that a speaker’s words express the child’s own thoughts rather than the speaker’s.

In a much-cited experiment, toddlers between 16 and 19 months of age were given a toy to play with while another object was slipped into a bucket, hidden from the child’s view.3 While the child was gazing at the toy, an adult would say, “It’s a modi!” Rather than assume that the word applied to the toy that was in the child’s focus of attention at the time the word was uttered, the typical child would lock onto the adult’s gaze, which was fixed on the thing in the bucket, and apply the word to that object instead.

WANNA BE FRIENDS? Kids don’t have an adult-like understanding of artificial agents. Some evidence suggests they are likely to anthropomorphize robots and attribute mental states to them. Image by Chadd Balfour / Shutterstock.

The babies understood that the speaker had a purpose—to draw their attention to something that had caught the adult’s notice. When later asked to identify the “modi,” they more often chose the object that had been in the bucket, suggesting they had retained a connection between the word and this object.

When there’s no obvious communicative intent—if a disembodied voice intones a word over a speaker while a child is gazing at an object—the child is disinclined to map the word onto the object. Mere correlation is not sufficient; the child wants positive evidence of a desire to align minds. In this regard, intelligent machines may be ambiguous even when they are embodied. One study reported that while children could follow the gaze of a robot, they failed to learn the name the robot offered for the object it was “looking” at.4

It’s not hard to see how the ability to detect a speaker’s communicative intent would make language learning much more efficient. It allows children to narrow the beam of their attention and ignore the many spurious correlations that occur between world and language, something that an AI currently has to work out through brute computational force.

But there are risks to this intensely social form of learning. Some speakers are unreliable; they may be mistaken or outright deceptive. If much of your reality is based on the words and behavior of others rather than on what you’ve seen or heard yourself, it would be wise to apply some sort of filter.

As eager as children are to learn from the articulate adults around them, they bring a dash of healthy skepticism to the process. They’re less likely to accept a new word or a new fact if the speaker has made an obvious error earlier, like calling an apple a “dog.” They hesitate to learn from a speaker who expresses uncertainty by saying, “Hmm, I’ve never seen one of these before. I think it’s called a blick.” And they’re more likely to learn from a familiar teacher than one they’ve never met before.

In short, rather than just sponging up all the language around them, they size up the speaker.5 Can I trust this person to be competent? The more authoritative the speaker, whether signaled by their age, their confidence, or even the complexity of their sentences or how they dress, the more readily the child learns from them. Evidence that the speaker belongs to the same social tribe as the child—belonging to the same race, speaking with the same accent, or even having been arbitrarily assigned to the same color-aproned “team” as the child—increases the child’s willingness to learn from them.

The highly selective nature of trust in human teachers has been observed in babies as young as 18 months, and it grows in sophistication with age. As children’s own knowledge deepens, they learn that a speaker might be knowledgeable about some things but not others and calibrate their trust accordingly.

It’s possible that some of the endless questioning of young children is not just a bid for attention, but a way of probing the limits of other minds. My own niece, at the age of 3, took a very direct approach: When presented with a new fact, she often countered with the question “How do you know?”

What happens when children interact with an artificial mind that sounds human, but whose competence or benevolence they can’t possibly assess in any accurate manner? Language bots don’t have mental states or emotions, but they talk as if they do because such language is rampant in the input on which they’ve been trained. There’s a fundamental disconnect between their language and their internal states. Unable to distinguish true statements from false ones, current LLMs are terrible at discerning and reporting their own levels of uncertainty. How is a child to gauge the abilities or intentions of such an entity?

Some AI tools are currently being developed with the recognition that children need interactivity rather than merely ambient language. StoryBuddy, developed by language and technology scholar Ying Xu and her colleagues, is a bot that narrates books to children while engaging them in a dialogue about a character’s motivations or the next likely step in the plot. Recently, Xu’s team has used GPT-4 to develop a tool called Mathemyths, in which children learn simple mathematical concepts by making up a story in collaboration with the bot.

But our understanding of children’s interactions with AI is in its early days and lags behind the development of the technology itself. Much of the research has been done with less sophisticated versions of voice assistants such as Siri or Alexa, or with “social” robots that work off scripts; almost no work has been done yet on how children respond to more complex conversational agents powered by LLMs.

It’s clear from the research, however, that children do not have an adult-like understanding of artificial agents in general. Some evidence suggests that young children are especially likely to anthropomorphize robots and attribute mental states to them.6 Perceiving them to be human-like (thinking that the robot can see or can be tickled) in fact enhances learning—as does the agent’s responding to the child’s conversational moves in ways that a human might.7

This leads us into an ethical thicket. Children are likely to learn from an AI if they can form a bond of trust with it, but at the same time, they need to be protected from its unreliability and its lack of caring instincts. They may need to learn—perhaps through intensive AI-literacy training in schools—to treat a bot as if it were a helpful human, while retaining awareness that it is not, a mind-splitting feat that is hard enough for many adults, let alone preschoolers.

This paradox suggests there’s no easy fix to the language equity problem in the child’s younger years—I doubt any reasonable educator would suggest to careworn parents that they simply hand over a phone to their toddler and let her converse with a voiced version of ChatGPT.

And to make carefully tuned educational AIs is no small task. Mathemyths, the bot that teaches math concepts through cooperative story-building, was created through a painstaking process in which researchers generated detailed prompts to steer the behavior of the LLM and repeatedly tested whether these prompts worked as intended, an undertaking that can be slow and expensive.

Even so, its creators caution, “It remains a challenge to provide precise directives to control the output specifically,” and, like so many other “directed” AI programs, the model sometimes veered away from its instructions after extended conversations. It occasionally came up with fanciful story elements, such as using clouds to speed up travel, that might seem scientifically sound to a child but convey incorrect information or principles.

Mark Warschauer and his colleagues are leading researchers in the design of AI-driven educational tools for young children, having developed several conversational agents that engage children in dialogues over narrated stories or educational videos. They report promising results. When 3- to 6-year-old children answered questions posed by a bot during the reading of a story, their comprehension of the story was as good as if they had chatted with a parent about it—and in both cases, better than if they simply heard the story without any interaction. The Mathemyths bot, tested with 4- to 8-year-olds, also showed that children learned simple mathematical concepts as effectively as they did with a human interlocutor. (In both cases, though, children were more verbose when interacting with a human.)

These evaluations, however, were tightly constrained to compare like with like. In these studies, the human in the task either followed a script or received training to emulate the AI’s behavior so that the linguistic content of the interactions was similar in both cases. This isn’t unreasonable from a methodological standpoint, but it doesn’t give the full picture of the things a human might do that can’t easily be replicated by an AI. It’s at best incomplete to compare children’s interactions with machines to their interactions with humans who are behaving like machines.

A real person—particularly one who knows the child well—is likely to use language for purposes other than commenting on the story or checking the child’s understanding of concepts. An attuned parent reading to their child might link the story to the child’s own experiences, have a discussion about right versus wrong, or express delight at an unexpected plot twist or a pleasurable turn of phrase. They might also respond to a child’s growing boredom or restlessness by ratcheting up the drama of the reading or redirecting the child’s attention with pointed questions. These second-nature human behaviors likely serve the double purpose of deepening the bond between parent and child and heightening any learning that takes place.

I spoke with Roberta Golinkoff, a researcher at the University of Delaware who studies language development and the promise and limits of what is known as EdTech. She noted that such tools could usefully augment the linguistic input a child receives at home or school. However, she made a point of saying—with great emphasis—“I would never suggest that these tools should ever replace human interaction, only supplement it.”

The use of AI in education at any age will need to take into account how and why human connection is such an accelerant for learning, a phenomenon that has recently gained the attention of scientists but remains poorly understood.8 There is increasing evidence that experiencing something in the presence of others cognitively amplifies it: Tasting chocolate together with another person was found to intensify its flavor, for example, and viewing the same image as another person made it seem more appealing and realistic (but only if the two people already knew each other). We humans seem to derive great benefit from yoking our attention and actions with other humans, and this type of synchrony is unlikely to be replicated by AI.

There’s a mystical flavor to expressions like “being on the same wavelength” or “vibing” with someone, but these experiences are very real as far as neuroscience is concerned.

Researchers have brought EEG machines into high school classrooms to record students’ brain activity. They found that the degree to which students’ brains were synchronized with each other9 was related to their sense of engagement with the material being taught, their feelings of closeness to their peers, and their ability to retain the material.10 Interestingly, eye contact between students before class resulted in greater synchronization of their brains during class.

Amid the AI hype, it’s crucial to remember that humans do not learn as machines do; it’s not just the availability of information that counts, but the social context in which that information is experienced—a fact that helps to explain the disappointing learning outcomes associated with MOOCs (massive open online courses), which suffered from low student engagement and persistently high dropout rates, especially in less affluent countries.

Warschauer and his associates are now working to develop educational tools that bring parents into the conversation to fill in AI’s shortcomings. One project, intended for use by bilingual families, has centered around a conversational agent in the guise of Rosita, a Sesame Street character of Mexican ethnicity. Rosita narrates a story in either English or Spanish, and she can respond to either language or even a mix of both in the same sentence. In addition to asking questions that center on the narrative, she offers “family questions” that serve as prompts for parents and children to launch into conversations in which they connect elements of the story to the family’s own life.

The goal is to relieve parents of some of the cognitive load of the interaction by guiding the child through an understanding of the story and reinforcing new vocabulary, while also creating a space for parents and children to engage in a more intimate way than would be possible if the child were speaking to Rosita on their own.

This approach aligns precisely with the recommendations of the Office of Educational Technology, a division of the United States Department of Education. In a report published in May of 2023, the authors write, “We envision a technology-enhanced future more like an electric bike and less like robot vacuums. On an electric bike, the human is fully aware and fully in control, but their burden is less, and their effort is multiplied by a complementary technological enhancement.”

But most EdTech products—especially those that can be had for free—fall short of this ideal. If current trends continue, affluent households and schools will have access to the best AI tools for kids and students. But less affluent communities may have to make do with cheaper tech tools that promise much but deliver little. Rather than shoring up children’s language skills, such tools will only discourage interaction with the forms of intelligence that they learn best from—people they can trust.

Lead image: Sharomka / Shutterstock

References

1. Durham, R.E., Farkas, G., Hammer, C.S., Tomblin, J.B., & Catts, H.W. Kindergarten oral language skill: A key variable in the intergenerational transmission of socioeconomic status. Research in Social Stratification and Mobility 25, 294-305 (2007).

2. Lytle, S.R. & Kuhl, P.K. Social interaction and language acquisition: Toward a neurobiological view. In Fernández, E.M. & Cairns, H.S. (Eds.) The Handbook of Psycholinguistics. John Wiley & Sons, Hoboken, NJ (2018).

3. Baldwin, D.A. Infants’ contribution to the achievement of joint reference. Child Development 62, 875-890 (1991).

4. O’Connell, L., Poulin-Dubois, D., Demke, T., & Guay, A. Can infants use a nonhuman agent’s gaze direction to establish word-object relations? Infancy 14, 414-438 (2009).

5. Sobel, D.M. How children learn from others: An analysis of selective word learning. Child Development 91, e1134-e1161 (2020).

6. Goldman, E.J. & Poulin-Dubois, D. Children’s anthropomorphism of inanimate agents. WIREs Cognitive Science e1676 (2024).

7. Oranc, C. & Küntay, A.C. Children’s perception of social robots as a source of information across different domains of knowledge. Cognitive Development 54, 100875 (2020).

8. De Felice, S., de C. Hamilton, A.F., Ponari, M., & Vigliocco, G. Learning from others is good, with others is better: The role of social interaction in human acquisition of new knowledge. Philosophical Transactions of the Royal Society B 378, 20210357 (2022).

9. Dikker, S., et al. Brain-to-brain synchrony tracks real-world dynamic group interactions in the classroom. Current Biology 27, 1375-1380 (2017).

10. Davidesco, I., et al. Brain-to-brain synchrony predicts long-term memory retention more accurately than individual brain measures. bioRxiv (2019).
