Fidelity’s Vast Trove of Data Coveted by Tech Firms in Age of AI

(Bloomberg) — As tech companies the world over race to create AI services akin to ChatGPT, the underlying raw material required — data — is suddenly in demand like never before.

Fidelity Investments is a case in point: tech startups and conglomerates alike are courting the wealth management giant to lay their hands on its vault of financial services data, Chief Information Officer Mihir Shah said in an interview. For companies seeking to build AI systems for the finance industry, Fidelity’s decades’ worth of online transaction records, customer call transcripts and face-to-face client interaction reports would be a treasure trove. It holds about 8 petabytes of data, equivalent to trillions of pages of printed text.

The US investment company, which oversees more than $11 trillion and has tens of millions of customers, hasn’t engaged with any of the suitors, said Shah, who is leading an effort to harvest value from Fidelity’s data. The firm has considered building its own AI model, although it hasn’t decided whether to go that route, he said. Any data it shared would be anonymized and scrubbed of personal information in keeping with the best security practices, he said.

Services such as ChatGPT are based on large language models, or AI systems that analyze vast quantities of writing from across the internet and other sources in order to determine how to generate human-sounding text. The technology has spurred excitement across industries as companies seek ways to reduce costs and better serve their customers — with banks from JPMorgan Chase & Co. to Morgan Stanley among those leading the way.

ChatGPT creator OpenAI, backed by Microsoft Corp., as well as Alphabet Inc. and Meta Platforms Inc., are among the tech leaders in the field. They mostly train their systems on the same public data to understand and generate text or code in human-like fashion.

But proprietary data such as that owned by Fidelity would enable an AI service to stand out from the competition, said Shah, who started at Fidelity 29 years ago and oversaw the building of its website, the first for a major financial services company. He’s now directing the creation of Fidelity’s companywide cloud-based warehouse for its data, part of an effort to put it to better use.

“The differentiation will be in combining first-party data with public data to have a vertical large language model for financial services,” Shah, who is based in Boston, said via video. “We’ve already seen vertical LLMs coming up in scientific research and health-care industries.”

A large language model’s value depends largely on the amount and quality of the data it’s trained on. Massive amounts of text, images, sound and other information are required to make the AI models learn patterns and relationships, so they can then generate content based on them.

Fidelity’s data is deemed so attractive that some suitors have proposed building an AI system for the company for free, in exchange for collaboration, Shah said. Much of Fidelity’s data is relatively current, saved in the past seven years as per the latest compliance requirements, he said. Fidelity has more than 42 million customers, and it manages retirement plans and other benefit programs for tens of thousands of businesses.

As Fidelity decides on how to deploy the data, it needs to take into account AI systems’ challenges such as reliability, bias, and how personally identifiable information is processed, Shah said. Meanwhile, the company is taking steps to tighten its security infrastructure and adding further restrictions on who can access the data, he said.

“We are exercising extreme caution with these new tools,” Shah said. “With generative AI, you can’t fully trust the results.”
