The UK’s Competition and Markets Authority (CMA) has published an initial report on its investigation into AI foundation models, addressing competitive concerns in the models’ development phase and their downstream deployment. The report remains optimistically equivocal concerning the likelihood of this market becoming a monopoly — a view which jars with recent evidence from this sector. Although the CMA does address the key concerns of AI monopolisation, such as access to hardware and the application of economies of scale, its conclusions exhibit an indecisiveness that is especially troubling in our moment of rapid technological change.
Upstream competitive concerns
The advent of AI technology promises to revolutionise every facet of our modern economy, from healthcare and financial services to education. Leading the way to this next digital age are reusable foundation models (FMs). These are trained on massive datasets of text and images and can then be adapted and applied to tasks across almost any industry or sector.
Citing the emergence of at least 160 FMs since 2018, the CMA argues that competition is currently healthy in this sector. Moreover, unlike search engine or social media markets, where network effects establish ‘winner-takes-most’ outcomes, FMs are not, it argues, necessarily natural monopolies. ChatGPT’s service does not improve simply by virtue of its popularity, and so its popularity does not in itself reduce the likelihood of effective competition. In this it diverges markedly from the dominant players of the current ‘web 2.0’ internet. Google Search’s dominance, for instance, is very much tied to its high usage, because its ranking algorithms improve with the data generated by each additional query. Likewise, users become locked into Facebook by the presence of their friends, past, present, and future. The CMA does recognise, however, that user-generated data could become an important characteristic of FMs in the future, perhaps giving rise to economies of scale and network effects, even if such data is largely unusable at present. For the moment, as the report notes, switching between FMs is relatively cheap and easy, even if certain of their properties, such as the ability to customise outputs to individual users or integration into wider web ecosystems, may lend themselves to ‘lock-in’ effects. These properties took centre stage at OpenAI’s developer conference on 6 November.
The company announced its plans to establish an online marketplace in which customers can buy access to customised GPT models. If OpenAI’s proposals succeed, and especially if other large FM developers follow suit, the FM market will succumb to the very network and ‘lock-in’ effects that characterise the innovations of web 2.0 companies.
In fact, the success of the web 2.0 companies may already have set the emergent AI market on the path to monopolisation. The CMA voices fears that the vast data resources and computational power required to train FMs, resources dominated by incumbent big tech firms, could distort competitive outcomes. The majority of FM providers rent computational capacity from a small number of cloud providers (Google, Amazon, and Microsoft). Indeed, the massive cost and complexity of doing otherwise, which would require significant investment in the data centres and semiconductor technology that the tech giants already possess, means that, in practice, the only alternative to renting from these firms is partnership, tying new entrants to the infrastructure and expertise of the existing tech firms. In 2019, for instance, Microsoft partnered with OpenAI, investing $1 billion towards the development of the company’s supercomputer. That these large technology firms are already vertically integrated across the supply chain poses a threat to the future competitiveness of the market.
By the same token, these large firms also have access to proprietary data. It is currently unclear how significant this will prove. Training FMs requires immense quantities of data. There is a public stock of high-quality data, but the CMA notes that these resources may soon be ‘fully exploited for gaining improved model performance’. The effects of this exhaustion are yet to be determined. Perhaps the proprietary data to which big tech companies, especially those operating search engines, have access will confer a sizeable competitive edge. In an internal memo, Google has already attributed its ‘nice head-start’ in this industry to its training data. While it remains to be seen whether startups can close the gap, these noises from within Google’s organisation prompt concerns about the future competitiveness of the FM market.
The CMA also notes that the outcome of the intellectual property disputes that FMs have prompted will have a vital effect on this issue. If data-scraping practices are ruled to violate copyright, high-quality training data would become scarcer, to the benefit of the companies with proprietary data at their fingertips. A recent white paper from the News Media Alliance (NMA) documents the significance of publisher-owned content in FM training datasets.1 The NMA claims, not unreasonably, that FM developers have infringed its members’ copyright to obtain high-quality data. Were courts and regulators to side with the NMA, this body of information would become less accessible to future FM developers, handing an undue competitive advantage to pre-existing FMs.
Professor Jason Furman, author of the highly influential Furman Report,2 also highlighted these issues when giving evidence to parliament on the new Digital Markets, Competition and Consumers Bill. He explained that the requirement for large amounts of data does not ‘necessarily lend [artificial intelligence] to a new upstart competitor but would instead entrench the power of the existing ones’. Furman’s analysis reinforces the contemporary evidence suggesting that the CMA’s diagnosis of the competitive risks of FM development is too lax.
Downstream competitive concerns
The broad uses of FMs in downstream markets are yet to be fully realised. Consumer and industry interactions are largely limited to general-purpose chatbots, but the potential is enormous. Boston Consulting Group, for instance, has predicted that the total market for generative AI will increase from $18 billion in 2023 to $121 billion in 2027. Tools tailored to specific industries and tasks could deliver enormous efficiency gains across the economy. However, the cost of developing in-house tools means that, with notable exceptions (Bloomberg has developed its own proprietary large language model, BloombergGPT, integrated into the Bloomberg Terminal), most downstream uses will rely on licences from FM developers.
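For scale, a back-of-the-envelope calculation (ours, not BCG’s) shows the compound annual growth rate that this projection implies:

$$\text{CAGR} = \left(\frac{121}{18}\right)^{1/4} - 1 \approx 0.61$$

That is, the forecast assumes the generative AI market grows at roughly 61 per cent per year between 2023 and 2027.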
The CMA also recognises that the relative generality or specificity of FMs will influence the future downstream competitiveness of the supply chain. If highly specific FMs fine-tuned to particular tasks (for instance, an FM for drafting legal contracts) can remain viable in future markets, then a wide range of organisations may be able to innovate in this market. But if larger, highly generalised FMs can outperform the specialised ones, then ‘the number of FMs available could consolidate, as a small number of models could meet the needs of most users’. Because of the vast resources required to build generalised FMs, only the largest and most entrenched market players could develop these products. The CMA raises this concern as a point of uncertainty; it is unclear how the downstream ecosystem will develop in the coming years. But the looming presence of network and lock-in effects suggests a clear path towards a monopolised, winner-takes-all industry. And so the CMA’s refusal to commit itself to this hypothesis appears naïve.
Remedies
As it stands, key FM developers also typically have some consumer interface (Google’s Bard, for instance), and it is likely that upstream players will be interested in further developing tailored consumer and industry tools as demand for them proliferates. The natural concern highlighted by the CMA would then emerge if Google, as an FM provider with substantial market power, refused or restricted access to its FM in order to weaken the competitors that depend on it. As in most similar scenarios, where a dominant platform has an incentive to foreclose customers with whom it also competes, the answer lies in interoperability measures.3
Major causes of concern in both the upstream (data inequality) and downstream (dual supplier-competitor relationships) markets for foundation models are arguably solvable by creating mechanisms for data sharing and by preventing bundling and exclusion. Indeed, the central conclusion of Professor Furman’s report was that pro-competition policies and legislation are absolutely necessary to support innovation in digital markets. Specifically, interoperability protocols are essential to counteract the natural advantages enjoyed by big tech companies and to foster healthy competition from new entrants. Establishing an internationally recognised metric for comparing the performance of foundation models would be one way of making the ecosystem easier for start-ups and consumers to navigate.
Transparency and interoperability are not novel ideas. When, in the 1970s and 1980s, US and European competition authorities forced IBM, then the dominant supplier of mainframe computers, to unbundle its products and open up its interfaces, the result was a wave of innovation in compatible peripherals and software, paving the way for future tech titans such as Apple and Microsoft.
Fortunately, as the CMA’s report attests, policymakers appear aware of the lessons of the past. The CMA’s guiding principles for AI, which stress the need for access, choice and flexibility, self-consciously ‘draw on lessons learned from the evolution of other technology markets and how they might apply to foundation models as they are developed’. Moreover, in Europe the Data Governance Act introduces wider data-transfer measures, and the Digital Markets Act requires that large search engines share ranking, query, click, and view data with competitors. These too might provide a legislative remedy to the distortion of AI markets.