TikTok owner ByteDance has added two new text-to-video large language models (LLMs) to its Doubao AI chatbot as it looks to rival OpenAI's Sora.
The Chinese-owned company, which launched the Doubao AI chatbot last year, has announced the Doubao-PixelDance and Doubao-Seaweed models, which are designed to create videos from text and image prompts. The models will be available to use in early October.
ByteDance introduced the new LLMs at a launch event in Shenzhen, China, on September 23, 2024.
Volcano Engine, the cloud computing services platform of TikTok’s parent company ByteDance, highlighted that the new LLMs are two video-generation AI tools.
Doubao-PixelDance is designed to handle complex, sequential motions and can create 10-second videos, whereas the Doubao-Seaweed model can generate clips of up to 30 seconds.
Tan Dai, president of ByteDance’s Volcano Engine, announced at the launch event that PixelDance and Seaweed will be available at the beginning of October.
ByteDance AI model accessible in October
Alluding to Douyin, the Chinese version of TikTok, and Jianying, the Chinese version of CapCut, ByteDance’s popular video editing app outside the mainland, Tan said that the addition of video-generation AI models to the Doubao LLM family “has benefited from the capabilities of understanding videos accumulated by Douyin and Jianying over the years.”
The Volcano Engine president demonstrated the new AI models at the event, showing video generation capabilities that simulated real-life scenes, such as a first-person view of driving a car, as well as fictional clips including a flying winged frog and a floating island.
Tan added that the new LLMs offer “stability”, in terms of subject and style, when a video cuts from one shot to another, which continues to be a huge challenge for other video-generation LLMs.
ByteDance’s Doubao features audio, language, image, and video creation tools, along with its latest addition, an AI chat assistant. It also comprises an interactive entertainment application and a few more programs.
In the past, Volcano Engine mainly used its video-generation AI tools internally, but it is now expanding their application.
Tan said that AI models are still not widely adopted in China because of their high cost. As enterprises adopt AI models at scale, he added, larger concurrent traffic is becoming a key factor in the development of the industry.
Volcano Engine made a risky yet bold move in May when it dramatically undercut the competition in the enterprise AI model market. Its main model was priced more than 99% below the industry average.
Tan noted, however, that Volcano Engine’s gross profit margin remained positive even after the price cut. He believes that the most important factor for business-to-business service providers is sustainability.
Since May, daily usage of Doubao models has surged roughly 11-fold, reaching an average of over 1.3 trillion tokens per day. This translates to an average of 50 million images and 850,000 hours of voice being processed daily.
As demand in China for text-to-video tools rises, the new LLMs aim to benefit from ByteDance's experience in video understanding and editing through Douyin and CapCut.
These AI models can manage complex motions, produce longer videos, and maintain consistency in style across different shots.
ByteDance plans to monetize the LLMs by offering them as a service as soon as early next month.
Rivaling OpenAI’s Sora
The development of the Doubao-PixelDance and Doubao-Seaweed models comes after OpenAI announced Sora in February this year.
Sora is a text-to-video AI model that aims to help people solve problems that require real-world interaction. It can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.
It uses a diffusion model to generate videos from text descriptions. The model is trained on massive datasets of videos and their corresponding descriptions. To generate a video, it starts from random noise and gradually transforms that noise into meaningful visuals.
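The noise-to-visuals idea behind diffusion models can be illustrated with a toy sketch. This is not Sora's implementation or any vendor's code: the `toy_denoiser` function is a hypothetical stand-in for a trained network, and it cheats by looking at a known target, whereas a real model would estimate the noise from the noisy sample and the text prompt alone. The four-number "frame" is likewise an assumption made for illustration.

```python
import random


def toy_denoiser(sample, target):
    """Hypothetical stand-in for a trained network: estimate the noise.

    A real diffusion model predicts this from the noisy sample and the
    text prompt. Here we simply compare against a known target.
    """
    return [s - t for s, t in zip(sample, target)]


def generate(target, steps=20, rate=0.3, seed=0):
    """Start from random noise and remove part of the estimated noise each step."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    errors = []
    for _ in range(steps):
        noise = toy_denoiser(sample, target)
        # Subtract a fraction of the estimated noise from every value.
        sample = [s - rate * n for s, n in zip(sample, noise)]
        errors.append(max(abs(s - t) for s, t in zip(sample, target)))
    return sample, errors


# A tiny "frame" of four pixel intensities standing in for a video.
target_frame = [0.1, 0.5, 0.9, 0.3]
frame, errors = generate(target_frame)
print(f"largest pixel error after first step: {errors[0]:.4f}")
print(f"largest pixel error after last step:  {errors[-1]:.4f}")
```

Each pass removes a fixed share of the remaining noise, so the sample moves steadily from random values toward a coherent result, which is the "gradual" behavior the description refers to.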
However, it’s also been reported that TikTok developer ByteDance was secretly using OpenAI’s tech to build a competitor.
The Verge reported in December 2023 that TikTok’s entrancing “For You” feed had made its parent company, ByteDance, an AI leader on the world stage. Because the company was far behind in the generative AI race, however, it was secretly using OpenAI’s technology to develop its own competing large language model, or LLM.
This violated OpenAI’s terms of service, but that did not entirely stop ByteDance. The company, in fact, sought access to OpenAI via Microsoft.
According to The Verge, internal ByteDance documents confirmed that the OpenAI API had been relied on to develop its foundational LLM, codenamed Project Seed, during nearly every phase of development, including for training and evaluating the model.
In July 2024, another report revealed that TikTok paid Microsoft nearly $20 million per month to access OpenAI’s models. This made up nearly a quarter of the revenue of Microsoft's cloud AI division.
Microsoft's cloud AI business was projected to generate $1 billion annually, according to The Information. However, the report noted that TikTok's reliance on these capabilities could be reduced if it develops its own LLM.
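The "nearly a quarter" figure can be sanity-checked with a short calculation. The sketch below assumes the reported monthly payment held steady for twelve months and that both figures cover the same period; since the payment is described as "nearly" $20 million, the result is an upper bound rather than an exact share.

```python
# Figures as reported in the article; both are approximate.
MONTHLY_PAYMENT_USD = 20_000_000             # "nearly $20 million per month"
DIVISION_ANNUAL_REVENUE_USD = 1_000_000_000  # "$1 billion annually"


def annual_share(monthly_payment, annual_revenue):
    """Return the annualized payment as a fraction of annual revenue."""
    return (monthly_payment * 12) / annual_revenue


share = annual_share(MONTHLY_PAYMENT_USD, DIVISION_ANNUAL_REVENUE_USD)
print(f"Annualized payment: ${MONTHLY_PAYMENT_USD * 12:,}")
print(f"Share of division revenue: {share:.0%}")
```

Under these assumptions the annualized payment comes to $240 million, or about 24% of the projected revenue, which is consistent with the "nearly a quarter" description.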