Recent headlines involving Google and Meta have further heated the open source AI debate in Big Tech.

CNBC reported Tuesday evening that Google’s latest large language model (LLM), PaLM 2, uses nearly five times more text data for training than its predecessor. When Google announced the model last week, it said PaLM 2 was smaller than the original PaLM but used more efficient techniques. The CNBC article highlighted the company’s reluctance to disclose training data sizes and other details.

A Google spokesperson declined to comment on CNBC’s report, but Google engineers were outraged by the leak, to say the least, and were eager to share their thoughts. Google DeepMind senior staff software engineer Dmitry (Dima) Repikin voiced his anger in a now-deleted tweet.

Alex Polozov, a senior research scientist at Google, echoed the sentiment in a self-described rant, saying the leak sets a precedent for siloing research.

I am so angry that I am taking this rant publicly. What the heck are you trying to accomplish with the leak? Is it just an ego thrill that matters? Hundreds of Googlers work “hard” to keep the publishing and scientific collaboration alive. And it just sets the precedent for siloing everything. https://t.co/0o6iDj4PsJ

Alex Polozov (@Skiminok), May 17, 2023

Zurich-based Google AI researcher Lucas Beyer agreed, tweeting: “It’s not the number of tokens (I’m not even sure if they’re right) that upsets me, but the total loss of trust and respect. Such leaks lead to corp-speak and reduced openness over time, leading to a deterioration of the overall work/research environment. And for what? FFS.”

Coincidentally, and not in response to the Google leak, Yann LeCun, chief AI scientist at Meta, gave an interview to The New York Times, published this morning, that focused on Meta’s open source AI efforts.

The article describes Meta’s February release of its LLaMA large language model as giving away a treasure trove of AI free of charge, since the company released the model’s source code to academics, government researchers, and others who provided their email addresses to Meta and could then download the code once the company had vetted them.

LeCun said in the interview that open platforms would win, adding that the heightened secrecy at Google and OpenAI was a big mistake and a very bad take on what is happening.

VentureBeat journalist Sean Michael Kerner pointed out in a Twitter thread that Meta has in fact already open-sourced PyTorch, one of the most important AI/ML tools ever created. “The basic stuff should be open, and it is,” he wrote. “After all, what would OpenAI be without PyTorch?”

But even Meta and LeCun have their limits when it comes to openness. For example, Meta provided the LLaMA model weights to scholars and researchers on a case-by-case basis, including the Alpaca project at Stanford University, and the weights were then leaked on 4chan. It was this leak, not the Meta release, that actually gave developers around the world full access to a GPT-level LLM for the first time. The Meta release did not make the LLaMA model available for commercial use.

VentureBeat spoke to Meta last month about its nuanced views on the open versus closed source debate. Joelle Pineau, vice president of AI research at Meta, said in an interview that accountability and transparency in AI models are essential.

“More than ever before, we need people to see technology more transparently, and we need to value transparency,” she said. She explained that the appropriate level of openness could change depending on the type of hazard.

“My hope, and it is reflected in our data access strategy, is to find ways to ensure transparency for verifiability audits of these models,” she said.

On the other hand, she said, some levels of openness go too far. “That’s why the LLaMA model got a gated release,” she explained. “Many would have been happy to open it up completely. I don’t think that’s responsible behavior today.”

LeCun remains outspoken that AI risks are exaggerated

Still, LeCun remains an advocate for open source AI, arguing in the interview with The New York Times that the spread of misinformation on social media is more dangerous than the latest LLM technology.

“You can’t stop people from creating nonsense, dangerous information, etc.,” he said. “But we can stop its spread.”

And while Google and OpenAI may become more closed in their AI research, LeCun insisted that he and Meta remain committed to open source, saying that progress will be faster if it is open.

