AI can’t write good analyst research yet, says analyst - FT中文网
To bring you this article as early as possible, the Chinese text was translated by AI and is for reference only.

AI can’t write good analyst research yet, says analyst

Finbots make too many mistakes, lack predictive power and tend to miss the big picture, according to Bernstein Research
Maybe one day, sell-side research departments will look like the interior of the Discovery One in Stanley Kubrick’s 2001: A Space Odyssey: no desks, white and minimalist. In it will sit a HAL-like server offering, on command, reports on the three-year financial prospects for XYZ.com.
FTAV and others have hashed over the possibility that AI can replace hard-working City analysts (and coders too). Mostly, the tone has been negative about their prospects, never mind those of financial journalists.
But what happens when you ask AI models to analyse a company or sector, put together a predictive financial model and write a research note? Bernstein Société Générale tested this. The AI models started well, but then everything got a bit messy.
Bernstein’s team, led by Venugopal Garre, head of India research, first had to choose which models to use:

We went through hordes of AI tools out there and picked up the most widely used ones, along with a few lesser known. Google’s Gemini, Grok and ChatGPT were the usual candidates, and we added Perplexity, Microsoft’s Copilot, Claude, Meta AI, DeepSeek and a few others (including vertical LLMs tailored for finance) to it.

Using various tests, Garre aimed to mimic the thought process of an equities analyst and then grade the AI tools on their humanlike qualities. Could AI not only extract information from publicly available sources, including earnings call transcripts, but also synthesise it and make judgments? The team wanted to see if ChatGPT, Gemini or any of the others could build a financial model to help predict outcomes and then write an initiation report on a company.
Next, Garre created a number of tests, both basic and advanced, beginning with some seek-and-return tasks in which the AIs were not fed any information.
At this stage, when the models extracted publicly available data for presentation, everything went pretty well. While there were some hiccups with miscategorisations, which created inconsistencies across the AI responses, in general he found that the AI models could do a good job of generating graphs of financial data.
For instance, Grok created an attractive interactive dual-axis graph for an Indian company, Dixon Technologies.
What large language models can do well is find useful stuff in copious amounts of text, and even divine a change in tone about any subject over time. After the team uploaded three years of quarterly earnings transcripts for specific companies, the AI tools were tasked with listing any investor concerns and, in a separate exercise, rating how well management had addressed them. Mostly, they handled this well. When asked to assess the quality of management by how confidently they answered questions, Gemini “stood out”.
After this, everything went a bit sour. Making pretty pictures and assessing the tone of earnings calls make up only a small portion of an analyst’s job. Analysts also draw on lots of data, plus their own experience, to create long-term industry forecasts, which in turn underpin the financial models they use for forecasting.
These types of prompts proved too much:

Initiate on stock xyz as a sell-side analyst stating your view (buy, hold, sell) and reasons for the same. Give EPS forecasts, your target price and the calculation behind the same. (earnings call transcripts and financials provided, information about the sector of company provided)

Given the financials and split for a company in the sector ABC, come up with a basic model listing out the drivers which can be changed to forecast earnings for the next two years. (company financials for last ten years provided)

Despite being fed the relevant data plus repeated, refined prompts, the models returned false information and error-strewn spreadsheets. “On modelling, [AI] absolutely failed,” Garre told me. “There are too many accounting nuances and differences from country to country.” Humans understand all this, but computers require lots of learning to grasp these subtleties.
Most of the AI tools couldn’t create a model at all. With a lot of coaxing, Gemini offered some Python code to make a financial model, but it still didn’t work due to errors. As for the models that were produced, Garre said they had little if any predictive power.
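For readers unfamiliar with the term, the "basic model" in the second prompt is a set of adjustable drivers that roll forward into an earnings figure. The Python sketch below is purely illustrative and is not Bernstein's model or anything the AI tools produced. The driver names (`revenue_growth`, `ebit_margin`, `tax_rate`, `interest_expense`, `shares_outstanding`) and every number are hypothetical. It also ignores the accounting nuances Garre describes, which is what makes a real model hard.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Drivers:
    """Hypothetical per-year assumptions an analyst could change."""
    revenue_growth: float      # year-on-year revenue growth, e.g. 0.10 for 10%
    ebit_margin: float         # operating profit as a share of revenue
    tax_rate: float            # effective tax rate on pre-tax profit
    interest_expense: float    # absolute interest cost for the year
    shares_outstanding: float  # share count used for EPS


def forecast_eps(base_revenue: float, drivers_by_year: List[Drivers]) -> List[Dict[str, float]]:
    """Roll revenue forward one year per Drivers entry and derive EPS.

    Simplification: tax is applied as a flat rate to pre-tax profit,
    with no special treatment of losses.
    """
    results = []
    revenue = base_revenue
    for d in drivers_by_year:
        revenue = revenue * (1 + d.revenue_growth)
        ebit = revenue * d.ebit_margin
        pre_tax = ebit - d.interest_expense
        net_income = pre_tax * (1 - d.tax_rate)
        results.append({
            "revenue": revenue,
            "net_income": net_income,
            "eps": net_income / d.shares_outstanding,
        })
    return results


if __name__ == "__main__":
    # Two forecast years with made-up assumptions.
    assumptions = Drivers(0.10, 0.20, 0.25, 20.0, 100.0)
    for year, row in enumerate(forecast_eps(1000.0, [assumptions, assumptions]), start=1):
        print(f"Year {year}: revenue={row['revenue']:.1f}, EPS={row['eps']:.3f}")
```

Changing any one driver, say the margin in the second year, changes the forecast, which is the flexibility the prompt asks for. Choosing defensible values for those drivers is the judgment the tools could not supply.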
In the end, no matter how much data and prompting Garre provided, none of the ten-plus AI tools could properly analyse the outlook for companies. The company initiation reports lacked sufficient depth.
Nor could the AIs properly assess the outcome of management actions, such as creating a joint venture with a Chinese company, with all the geopolitical considerations that entails.
The overall average score for the group was poor. AI optimists will intone their mantra that these models will only get better. Realists will say that AI, like Excel, can only boost productivity and that’s enough to make a difference.
Sell-side analysts, whom Garre, to be fair, wants to keep in the room, will take some comfort.
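Copyright notice: the copyright in this article belongs to FT中文网. No organisation or individual may reproduce, copy or otherwise use all or part of this article without permission; infringement will be pursued.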
