Washington Post develops AI reader tool with Virginia Tech

One of the world's most famous news organizations is developing AI-powered tools to improve the reader experience.

The Washington Post is partnering with Virginia Tech's Sanghani Center for Artificial Intelligence and Data Analytics to develop the new technology: a generative AI tool that lets readers get answers to their questions drawn from the Post's previous coverage. The plan is to build it to understand the intent behind a user's question rather than relying on keywords alone, as other AI platforms do.

Although the project will be developed out of the Virginia Tech Innovation Campus, the campus's physical space in Alexandria is not expected to open until spring 2025. For now, students and faculty work at facilities in Arlington and Falls Church.

Sam Han said the partnership was born out of the Post's desire to be a leader in the new ways people find and consume information.

Han is the paper's head of data and AI. He has been at the Post for about seven years, the past three in his current role.

“People are getting used to asking questions [and getting] the answer directly rather than reading and understanding,” Han said. “That's the trend we're observing. And we want to be part of that change, or in some sense a revolution, and be at the forefront as a media technology company. We want to be technologically prepared to provide our readers with the best possible media experience.”

Engineers will take implicit assumptions and context into account. Han gave the example of someone asking who won the Super Bowl. Usually that person is asking about the most recent game, not one from several years ago.

For questions like this, among others, the team will use a technique called retrieval-augmented generation (RAG) to produce answers that are more likely to address what the reader actually asked. RAG allows a generative AI system to draw on new information beyond its initial training data. In this case, Han explained, that information is the paper's latest coverage.
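To make the idea concrete, here is a minimal sketch of the retrieve-then-generate pattern in Python. It is not the Post's implementation: the `Article` type, the `retrieve` and `build_prompt` helpers, and the three placeholder articles are all hypothetical, and plain word overlap stands in for the intent-aware search the project is aiming for. Ties are broken in favor of the newest article, mirroring the Super Bowl example, and the resulting prompt would then be passed to a language model of one's choice.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    published: date
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, archive: list[Article], k: int = 2) -> list[Article]:
    """Return up to k articles that share words with the question.

    Articles are ranked by word overlap; ties go to the newest article.
    Word overlap is a simplification standing in for semantic search.
    """
    question_words = tokenize(question)
    scored = []
    for article in archive:
        overlap = len(question_words & tokenize(article.title + " " + article.text))
        if overlap:
            scored.append((overlap, article.published, article))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [article for _, _, article in scored[:k]]


def build_prompt(question: str, passages: list[Article]) -> str:
    """Place the retrieved articles in front of the question for a language model."""
    context = "\n".join(
        f"[{a.published.isoformat()}] {a.title}: {a.text}" for a in passages
    )
    return (
        "Answer using only the articles below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Placeholder archive for illustration only; these are not real Post articles.
archive = [
    Article("Team A wins the Super Bowl", date(2023, 2, 12),
            "Team A beat Team B in the Super Bowl."),
    Article("Team C wins the Super Bowl", date(2024, 2, 11),
            "Team C beat Team D in the Super Bowl."),
    Article("City council passes budget", date(2024, 3, 1),
            "The council approved the budget."),
]

question = "Who won the Super Bowl?"
passages = retrieve(question, archive, k=1)
print(build_prompt(question, passages))
```

Because the model sees the retrieved article text at answer time, it can respond using coverage published after its training data was collected, which is the property the paragraph above describes.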

“The goal is to build technology assets for us in this new world.” Sam Han, The Washington Post

The Post will also employ multimodal large language model (LLM) technology. This means the AI tool can draw not only on text but also on information found in the paper's audio and video reporting.

The New York Times is suing OpenAI and Microsoft for copyright infringement, saying millions of its articles were used to build AI models. In August 2023, the paper blocked OpenAI from scraping its content to train models. The BBC, CNN and Reuters followed suit.

In May 2023, Fred Ryan, former CEO and publisher of The Post, announced in a press release that AI was a priority opportunity. At the same time, the Post established an AI task force and an AI hub, the latter of which will be led by Han.

There is currently no specific schedule for when readers will be able to see the feature, Han said. Working with two doctoral students and three Virginia Tech faculty members, he has begun a year-long research and development effort to build the tool's search capabilities.

According to Naren Ramakrishnan, director of the Sanghani Center, this partnership will provide students with an unparalleled educational experience as they will have the opportunity to work on demanding real-world projects.

It also helps the Post stay on top of the latest AI trends.

“The goal is to build technology assets for us in this new world, where large language model AI will play a key role in providing conversational information consumption,” Han said.

