Four Takeaways on the Race to Amass Data for A.I.
Online data has long been a valuable commodity. For years, Meta and Google have used data to target their online advertising. Netflix and Spotify have used it to recommend more movies and music. Political candidates have turned to data to learn which groups of voters to train their sights on.
Over the last 18 months, it has become increasingly clear that digital data is also crucial in the development of artificial intelligence. Here’s what to know.
The more data, the better.
The success of A.I. depends on data, because A.I. models become more accurate and more humanlike as they are trained on more of it.
Just as a student learns by reading more books, essays and other material, large language models, the systems that are the basis of chatbots, become more accurate and more powerful when they are fed more data.
Some large language models, such as OpenAI’s GPT-3, released in 2020, were trained on hundreds of billions of “tokens,” which are essentially words or pieces of words. More recent large language models were trained on more than three trillion tokens.
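To make the idea of a token concrete, here is a minimal sketch in Python. It is a toy illustration only: the function name `toy_tokenize` and the four-character piece size are assumptions made for this example, and production systems such as GPT-3 use a learned scheme (byte pair encoding) rather than fixed-size cuts.

```python
import re

def toy_tokenize(text, max_piece=4):
    """Toy tokenizer (hypothetical, for illustration): split text into
    words and punctuation marks, then cut long words into pieces."""
    tokens = []
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    for word in re.findall(r"\w+|[^\w\s]", text):
        # Words of max_piece characters or fewer stay whole;
        # longer words are cut into consecutive pieces.
        for start in range(0, len(word), max_piece):
            tokens.append(word[start:start + max_piece])
    return tokens

print(toy_tokenize("Chatbots learn from data."))
# prints ['Chat', 'bots', 'lear', 'n', 'from', 'data', '.']
```

In this sketch, a four-word sentence becomes seven tokens, which is why token counts for a body of text generally run higher than word counts.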