Building Moats in Generative AI
We explore several of the key approaches to defensibility across the GenAI stack
In the rapidly evolving domain of Generative AI (“GenAI”), one of the biggest questions founders building in this space consistently hear is “what sort of moat and defensibility does your product have?” As we’ve spent more time investing and building in this space ourselves, we’ve developed several perspectives across the various parts of the GenAI stack, which we hope may serve founders well as they think about starting something new.
TL;DR -
As the GenAI stack continues to evolve, we see opportunities to build meaningful moats across all layers. The most successful companies, however, will be those that can leverage their unique strengths, whether in capital deployment, specialized knowledge, or rapid iteration cycles, to create defensible and enduring positions in this dynamic market. Given how fast things continue to develop and the growing ubiquity of AI-first apps, we think it remains critical to “build with your moats in mind” to ensure a durable competitive advantage over the long term.
To start, we can break down the different layers of the GenAI stack:
- Hardware and Infrastructure: This layer of the toolchain provides the computing capacity (e.g., 100,000 InfiniBand-linked H100s) for model training and inference. Companies like Microsoft, CoreWeave, AWS, and Google Cloud provide these powerful computing systems needed to run AI models.
- Foundational Model Builders: This is where we are seeing billions of dollars of paper value created by bundling models and inference. These companies create the core models (e.g., the GPT and Claude families) and typically offer APIs for using them. Examples include Google, OpenAI, and Mistral.
- Application Layer: We can break these down into B2B and Consumer use cases
- B2B AI Applications: We'll look at how businesses use AI models and what gives some companies an advantage in this area.
- Consumer AI: Finally, we'll discuss how AI is used in products for everyday people and what opportunities exist in this space.
Hardware & Infrastructure:
Moats are intuitive in this category, largely driven by capital/scale and first-mover advantages:
- Capital and Scale: Capital, and the cluster sizes that capital enables, is the larger of the two moats, because of the impact of computing power on model output quality (see Figure 1 below; a worked sketch follows the figure). In the early phases, we see the following dynamics:
- High levels of capex = larger cluster = better trained model = higher quality output.
- Capital access and innovation also breed better customer acquisition, as well as M&A opportunities, because negotiation leverage grows with scale.
- We note that while cluster size can create an early lead, we expect performance gains to plateau over time, though perhaps not as quickly as in the last wave of “AI.” Take, for example, the collaborative filtering/scoring models that still power fraud systems across credit card platforms: those ML models have long since plateaued, such that adding compute (or increasing training time) no longer materially changes performance.
- First-Mover Advantage: Being early to a market or problem space can help attract customers (due to scarcity) and enable a founder to rapidly incorporate customer feedback, building an early lead out of the gate.
- One example we can draw on is CoreWeave, which leveraged its cryptocurrency-mining GPU infrastructure and a strategic relationship with NVIDIA to accelerate into the “ChatGPT moment” of hyper AI awareness.
- CoreWeave has built a strong lead in datacenter ops software, datacenter leasing and know-how, power management, and node resilience (which reduces the cost of re-running a training job when a GPU crashes; see the checkpointing sketch after this list). Further, with power and data centers a scarce resource, locking them down first denies competitors access. Finally, as newer hardware raises datacenter requirements (like the shift from air to water cooling required by the next generation of NVIDIA cards), this moat becomes more pronounced, given the scarcity of “top shelf” facilities like next-gen water-cooled data centers.
- First movers also benefit from early customer feedback, reinforcing leadership in operational innovations like data center design, devops tooling, and core GPU cluster design. They accumulate more customer insights and integrations, and get more companies building on their APIs (becoming the Twilios and Stripes of the LLM domain).
- That early feedback ultimately provides an advantage in discovering and incorporating best practices for this new wave of platform operations and design.
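Node resilience of this kind typically comes down to checkpointing: persisting training state often enough that a GPU failure costs minutes of recomputation instead of a full re-run. A minimal sketch of the pattern, where the checkpoint path and training step are simplified stand-ins for framework-specific machinery:

```python
import os
import pickle

CHECKPOINT = "ckpt.pkl"  # illustrative path; real systems shard state across nodes

def save_state(step: int, state: dict) -> None:
    """Persist training state so a crash only loses work since the last save."""
    with open(CHECKPOINT, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)

def load_state() -> tuple[int, dict]:
    """Resume from the last checkpoint, or start fresh if none exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            ckpt = pickle.load(f)
        return ckpt["step"], ckpt["state"]
    return 0, {}

step, state = load_state()
for step in range(step, 10_000):
    state["loss"] = 1.0 / (step + 1)  # stand-in for a real training step
    if step % 500 == 0:  # cadence trades I/O overhead against lost work
        save_state(step, state)
```

The cadence is the operational knob: checkpoint too often and I/O dominates; too rarely and each crash on a large cluster wastes expensive GPU-hours.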
The importance of capital in this domain, combined with more aggressive execution by public companies, is one reason hyperscalers like Google and Microsoft have fared well, profiting from this wave of innovation. Startups that want to succeed here have not only been incredibly aggressive at fundraising, but also clever about other parts of the capital moat, such as: a) choosing hardware platforms like Dell, whose leasing and financing programs are best-in-class, over SMCI, whose programs are weaker; or b) like CoreWeave, issuing GPU asset-backed debt.
FIGURE 1: LLM model quality improves with more computing power
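To make the relationship in Figure 1 concrete, here is a minimal sketch using the common C ≈ 6ND approximation for transformer training FLOPs and a Chinchilla-style power law for loss. The loss constants roughly follow the published Chinchilla fit (Hoffmann et al., 2022), but treat the whole thing as illustrative rather than predictive:

```python
# Illustrative: why "larger cluster = better trained model = higher quality output."
# Uses the common C ~= 6 * N * D approximation for training compute and a
# Chinchilla-style loss power law; constants roughly follow the published fit.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs (C ~= 6ND)."""
    return 6.0 * n_params * n_tokens

def approx_loss(n_params: float, n_tokens: float) -> float:
    """L(N, D) = E + A / N^alpha + B / D^beta (Chinchilla functional form)."""
    E, A, alpha, B, beta = 1.69, 406.4, 0.34, 410.7, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

for n, d in [(1e9, 20e9), (70e9, 1.4e12), (175e9, 3.5e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> {training_flops(n, d):.1e} FLOPs, "
          f"loss ~ {approx_loss(n, d):.2f}")
```

Note how the loss terms shrink polynomially as parameters and tokens grow: more capital buys more compute, which buys measurably lower loss, but each marginal gain costs more than the last, which is the plateau dynamic described above.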
Foundational Model Level:
When thinking about the model layer, investment into the category and moat creation have some additional considerations:
- Intellectual Property and Talent: While companies like Mistral, OpenAI, and Anthropic have a significant advantage in algorithmic insights and fundamental science, this moat is narrowing.
- Open-source projects like Hugging Face's transformers library and Meta’s development of Llama have democratized access to state-of-the-art models.
- Talented individuals from academia and industry are constantly moving between organizations, spreading knowledge.
- Capital and Scale: Access to substantial financial resources remains crucial for several reasons:
- Training large language models requires immense computing power. For instance, training GPT-3 was estimated to cost around $4.6 million (see the back-of-the-envelope sketch after this list).
- Larger clusters enable more extensive parameter tuning and experimentation. OpenAI's GPT-3 has 175 billion parameters, while Google's PaLM has 540 billion.
- Economies of scale in hardware procurement. For example, Microsoft's partnership with OpenAI provides the latter with access to Azure's vast computing resources at competitive rates.
- Speed to Market: Being first to release advanced models creates significant advantages:
- Early releases capture market attention and build brand recognition. GPT-3's release in 2020 catapulted OpenAI to the forefront of AI discussions.
- First movers can establish API ecosystems and developer communities. OpenAI's GPT-3 API had over 300 applications built on it within months of its release.
- Rapid iteration based on real-world feedback. Google's BERT model quickly became a standard in natural language processing due to its early release and subsequent improvements.
- Dual-Use Infrastructure: Leading model builders are leveraging their hardware investments efficiently:
- The same clusters used for training are repurposed for inference during non-training periods. This approach maximizes the utility of expensive hardware investments.
- For example, OpenAI uses its vast GPU clusters not only to train models like GPT-3 and DALL-E, but also to power its API services for customers.
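For a back-of-the-envelope sense of where estimates like the ~$4.6 million GPT-3 figure come from, one can combine the 6ND compute approximation with an assumed GPU throughput, utilization, and rental rate. Everything below the FLOPs line is an assumption chosen for illustration (V100-class hardware, on which GPT-3 was reportedly trained):

```python
# Back-of-the-envelope training cost, in the spirit of the ~$4.6M GPT-3
# estimate cited above. Throughput, utilization, and $/GPU-hour are
# illustrative assumptions; real quotes vary widely.

N = 175e9           # GPT-3 parameters
D = 300e9           # training tokens (approximate figure from the GPT-3 paper)
flops = 6 * N * D   # ~3.15e23 FLOPs via the common 6ND approximation

peak_flops_per_gpu = 125e12  # V100 FP16 tensor-core peak, for illustration
utilization = 0.30           # assumed fraction of peak actually achieved
price_per_gpu_hour = 2.00    # assumed blended rental rate (USD)

gpu_hours = flops / (peak_flops_per_gpu * utilization) / 3600
print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * price_per_gpu_hour:,.0f}")
```

With these particular assumptions the math lands near the widely cited figure (roughly 2.3M GPU-hours, ~$4.7M), but the point is the structure of the calculation, not the constants: hardware generation, utilization, and negotiated pricing each move the answer by multiples, which is exactly why capital access and procurement leverage matter.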
In summary, the key moats for AI model builders are:
- Fundamental model architectures and the talent to innovate
- Access to capital and ability to operate at scale
- Speed to market and ability to capture mindshare
- Efficient utilization of hardware resources for both training and inference
Tooling (aka Picks and Shovels)
These companies create tools that help developers work with AI models more easily. Some examples of tooling being developed:
- AI Testing: Ensure AI models give consistent results (e.g., Distributional)
- Bias detection: Check for unfair biases in AI outputs
- Content filtering: Make sure AI-generated content is appropriate
One advantage of this part of the value chain is that the required expertise is both crucial and narrowly held, so competitors may struggle to catch up where domain knowledge sits within small circles of specialists. Compared to AI model builders, these tooling startups also don't need as much capital to succeed. Instead, they focus on becoming leaders in their specific areas, building expert teams, and growing to dominate their market segment. The consistency-testing sketch below gives a flavor of what the first category looks like in practice.
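A consistency check might run the same prompt repeatedly under fixed settings and fail if outputs drift. A minimal sketch, where `query_model` is a hypothetical stand-in for a real model client (not any specific vendor's API):

```python
# Minimal output-consistency test for an LLM-backed service.
# `query_model` is a hypothetical stand-in, not a real vendor API.

def query_model(prompt: str, temperature: float = 0.0) -> str:
    return f"echo: {prompt}"  # replace with a real model call

def test_deterministic_output(prompt: str, runs: int = 5) -> None:
    """At temperature 0, repeated calls should be (near-)identical."""
    outputs = {query_model(prompt, temperature=0.0) for _ in range(runs)}
    assert len(outputs) == 1, f"expected 1 distinct output, got {len(outputs)}"

test_deterministic_output("Summarize invoice #42 in one sentence.")
```

Real tools in this category go much further (semantic-similarity thresholds, drift detection across model versions), but the shape is the same: treat model behavior as a testable contract.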
B2B Applications
The application layer is where we will begin to see new companies and experiences built around the tooling/infrastructure we previously discussed, or see that tooling integrated into legacy applications. A couple of considerations include:
- Low Barriers to Entry: Starting an AI company is cheaper and easier than ever before. At SumUp, we saw small teams creating working AI tools in just a week.
- How will they be built? We think specialization will be key to successful adoption and to accuracy in replicating workflows. Initial industry adoption will come in verticals with substantial text or verbal communication:
- Law firms: Improving invoices
- Auditors: Analyzing data and documents
- Doctor's offices: AI-powered reception and booking
- Where will the competitive advantages (moats) be developed?
- Integrations (weakest advantage) - connecting with clients' existing systems
- Cross-client data (strong advantage) - using insights from multiple clients to improve the product. Example: Justt helps payments companies win disputes by learning from all clients
- Specialized workflows (strongest advantage) - creating complex AI-powered processes for specific industries; see the sketch after this list. Example: UpLink improved auditor workflows with AI
- Existing platforms - Companies like Palantir and Salesforce have an edge because they already have complex systems in place
- Why isn't data alone enough?
- AI models are trained on general data, not company-specific data, so a proprietary data pile by itself confers limited defensibility. And AI-native applications can readily access the same client data (emails, Slack, etc.) through integrations.
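To make the "specialized workflows" point concrete, here is a minimal sketch of an invoice-review pipeline of the kind an auditor- or legal-focused startup might build. Every step implementation is a hypothetical placeholder; the argument is that the moat lives in the multi-step, industry-specific pipeline around the model, not in any single model call:

```python
# Sketch of a "specialized workflow" moat: value accrues in the
# industry-specific pipeline, not the model call. Steps are placeholders.

from dataclasses import dataclass

@dataclass
class Invoice:
    client_id: str
    text: str

def extract_line_items(inv: Invoice) -> list[str]:
    # placeholder extraction; real systems use OCR plus an LLM parsing step
    return [line for line in inv.text.splitlines() if line.strip()]

def flag_anomalies(items: list[str]) -> list[str]:
    # placeholder rule; real systems mix domain rules with LLM judgments,
    # tuned on cross-client patterns (the Justt-style advantage above)
    return [item for item in items if "TBD" in item]

def review_invoice(inv: Invoice) -> dict:
    """The defensible part: the end-to-end, vertical-specific workflow."""
    items = extract_line_items(inv)
    return {"client": inv.client_id, "items": len(items),
            "flags": flag_anomalies(items)}

print(review_invoice(Invoice("firm-001", "retainer fee\nTBD expense")))
```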
To summarize, we think the winners in B2B applications will be those that create specialized workflows for specific industries and leverage insights from multiple clients, rather than relying solely on data or simple integrations.
Consumer Applications
While we’re excited about innovation in the consumer segment, we think that large incumbents currently have an outsized advantage. Products like Gmail, Dropbox, and Discord already hold user data, and their owners can easily add AI-powered search to these existing services.
Startups may struggle to compete in existing moat categories like search and content editing, which large tech players can add quickly. However, we’re excited by the potential for startups to build in consumer productivity: scheduling everyday tasks, trip planning, spam or fraud detection, and the like.
We think these productivity tools can succeed through personalization to individual users, while big companies may be slow to change existing designs and users may resist large-scale changes in familiar apps. Successful consumer AI companies will:
- Create unique workflow designs
- Build a large user base quickly
- Learn from user feedback to stay ahead. The key is to land power users and tailor the experience from there.
Conclusion: The most practical advice we can give to founders is to identify domains where you have expertise or unique access, and focus there. Bring a particular lens to the workflows you can build for those domains, especially workflows that span multiple modalities: calendar integrations, text to speech, and of course the obvious LLM step (a sketch follows below). And take comfort that in those domains, defensibility grows with every relevant extension you build onto your products for your customer base.
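As a closing illustration of that multi-modality point, a sketch of the kind of workflow we mean: calendar lookup, an LLM step, then text to speech. All three step functions are hypothetical stand-ins for real integrations:

```python
# Sketch of a multi-step, multi-modality workflow: calendar -> LLM -> TTS.
# All three steps are hypothetical stand-ins for real integrations.

def fetch_calendar_events(user_id: str) -> list[str]:
    return ["9:00 standup", "14:00 client call"]  # stand-in for a calendar API

def summarize_day(events: list[str]) -> str:
    # stand-in for the LLM step that turns raw events into a briefing
    return "You have " + ", then ".join(events) + "."

def speak(text: str) -> None:
    print(f"[tts] {text}")  # stand-in for a text-to-speech call

def morning_briefing(user_id: str) -> None:
    """Each step added to the chain deepens the workflow, and the moat."""
    speak(summarize_day(fetch_calendar_events(user_id)))

morning_briefing("user-123")
```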