Data is king: Why content creators must rethink their role in the AI era - FT中文网

Data is king: Why content creators must rethink their role in the AI era

Content creators may feel the most profound shift and play a more important role as data becomes a strategic asset in the AI era

This article represents only the author's own views.

As the global AI race heats up, it's becoming clear that data doesn't lose its value once large models reach the reasoning stage. On the contrary, it becomes even more critical because such models need dynamic knowledge. The so-called “last mile” of high-quality datasets often determines a model’s ultimate performance.

That is likely why Facebook parent Meta Platforms (META.US) made a $14.3 billion strategic investment in Scale AI, a company focused on data labeling and cleaning for AI training.

Scale AI provides structured, high-quality datasets to OpenAI, Meta, Google and other tech giants by combining large-scale human labor with automated pipelines. Its data labeling process involves tagging images, text or audio with meaningful metadata, such as identifying pedestrians in a photo or labeling the main point of an article. Data cleaning eliminates errors, duplicates and irrelevant material to ensure consistency and accuracy.

Another example of the growing value of quality data is a recent licensing deal between The New York Times and Amazon (AMZN.US), which allows fact-checked editorial content to be used for training AI models. The Associated Press and OpenAI have signed a similar agreement.

Though these arrangements are described as content licensing, they reflect a deeper shift: content has become data, and data has become a service. These deals highlight how media organizations are reassessing the value of their content, while AI developers continue to pursue high-quality material with growing urgency.

In contrast, the Chinese-language AI ecosystem faces unique challenges, such as a shortage of publicly available data, a lack of large-scale professional annotation and difficulty digitizing classical and cultural texts at scale. Such obstacles underscore how hard it is to develop localized large AI models.

Chinese-language materials are relatively scarce

A white paper published by Alibaba Research Institute notes that English accounts for 59.8% of all crawlable web text, while Chinese represents just 1.3%. Wikipedia, a commonly used open resource, has over 7 million English articles, whereas its Chinese edition has only about 1.5 million, less than a quarter of the volume.

This imbalance creates a major disadvantage. Without sufficient publicly available Chinese material, Chinese-language large language models may fall far behind their English-language counterparts in natural language understanding and text generation, potentially leading to culturally mismatched outputs and a sense that these models have “consumed too much foreign ink.”

Chinese authorities have long recognized this gap and have taken steps to address it. Platforms such as People’s Daily and Xinhua are actively assembling curated, high-quality materials, consisting of vetted news, commentary and policy interpretation, designed to ensure alignment with official values and to support AI safety from a moral and ideological standpoint.

Initiatives like the "Cyber Research Large Language Model" further concentrate on integrating data from legal and policy documents, state media and other publications, reinforcing alignment with Chinese values.

In China, such value alignment has become a basic requirement for any domestic AI system. While China has yet to produce a company of Scale AI’s size, several local firms, including Aishu Technology, Testin, iFlytek (002230.SZ) and Haitai Ruisheng (688787.SH), are building up their capabilities in large-scale data annotation and cleaning. The Shanghai AI Lab is also developing a platform-based material processing system in partnership with policy and academic resources, laying the foundation for a “Chinese version of Scale AI.”

According to market research firm IDC, China’s AI training data market was worth an estimated $260 million in 2023 and is expected to grow to approximately $2.32 billion by 2032, a compound annual growth rate of 27.4%.

Ultimately, the performance of any AI model depends on the content it consumes. In the AI era, content creators, especially those in journalism, must recognize that they are no longer merely material providers. They are now an integral part of the data services supply chain.

When news stories, commentary, academic papers and cultural archives are structured, semantically labeled and integrated into AI training pipelines, their value shifts from real-time information to durable data assets. Content creators who proactively organize and annotate their materials, and who pursue licensing partnerships with AI developers, may unlock new revenue opportunities.

It’s time for content to be seen not just as narrative, but also as infrastructure.

Audio: https://audio.ftmailbox.cn/album/a_1750297349_2997.mp3

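Copyright notice: The copyright of this article belongs to FT中文网. No organization or individual may reproduce, copy or otherwise use all or part of this article without permission. Infringement will be pursued.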
