Axial partners with great founders and inventors. We invest in early-stage life sciences companies, often when they are no more than an idea. We are fanatical about helping the rare inventor who is compelled to build their own enduring business. If you or someone you know has a great idea or company in life sciences, Axial would be excited to get to know you and possibly invest in your vision and company. We are excited to be in business with you - email us at info@axialvc.com.
The Business of AI in Life Sciences
Artificial intelligence (AI) has the potential to transform many parts of life sciences, from preclinical drug development and healthcare to synthetic biology and diagnostics. Basic research in AI has made major strides over the last 5 years, and combined with biologists who work with data as much as they work at the bench, the technology is changing how biology is studied and engineered.
As a result, AI-first companies in life sciences may not look like traditional companies at first. Whereas traditional life sciences companies usually launch with a core set of IP or a biological hypothesis, AI life sciences companies often start out looking like R&D shops or services businesses. The transition from services to a product focus involves several trade-offs but offers the potential for a more scalable business model. In particular, these companies face challenges with:
Generating large amounts of data, which often needs to be unbiased - this requires a lot more capital than a traditional company needs
Recruiting and training engineering talent
Implementing biologically relevant models
Generating large amounts of data that is often unbiased
First and foremost, an AI-first life sciences company needs high-quality data. This is especially important in drug development, where data is often not shared and sits within the silos of each company. Diagnostics and healthcare software companies have accessible retrospective studies, as long as a company has a network within the medical community. Synthetic biology is in a similar position to drug development.
These types of companies often have to make large upfront investments in custom wet-lab infrastructure. There are two main reasons for this requirement right now: (1) a lack of high-quality public datasets that are properly labeled and cleaned of artifacts and (2) existing datasets not being comprehensively designed to build accurate models. AI-first companies design custom experimental workflows, increasingly driven by robotic automation to scale wet-lab work. Automation can be a major differentiator, producing high-quality data for AI that cannot be generated by hand. This custom infrastructure can be pointed at iPSCs, DNA-encoded libraries, microscopy data, antibody screening, single-cell profiling, and more. The experimental data is fed into algorithms to hopefully generate new insights into everything from small molecule discovery to CHO-cell design. So many of these companies initially resemble an academic lab working to make new discoveries in biology. They need large amounts of technical investment to build the relevant datasets. This type of work is expensive and not accessible to most; as a result, new companies are often built on the premise that large amounts of data will somehow lead to better products or a higher number of them.
Recruiting and training engineering talent
The second challenge is getting software engineers and biologists to work well together. An important problem is tightly coupling computational and wet-lab work. This is driven by how well groups on both sides collaborate.
Moreover, incumbents, especially in drug development, are usually not willing to invest in machine learning talent because they don’t want to pay engineers more than their executives. There is a plausible scenario where, over the next 1-2 decades, the best software engineers in the world earn as much as star athletes or movie stars. The best AI-first life sciences companies do a good job of monopolizing ML talent; Insitro has done a great job here. These companies also invest resources in building sustainable tech infrastructure that can be maintained and used by biologists and data scientists.
Implementing biologically relevant models
Another important challenge is bringing the datasets and talent together to build better models of biology. These models could be used to classify certain cells, match small molecules to certain targets, or predict the behavior of a genetic circuit. If the biological data is accurately generated and an interdisciplinary team is built, these models have the potential to identify new drug targets, engineer metabolic pathways, and design better medicines. However, a major problem is that the scale of the data quickly grows to a size that a life sciences team cannot handle without the support of world-class software engineering.
Ultimately, AI life sciences companies initially look like services companies working on software design and data generation. To experienced people in life sciences, these types of companies appear more academic than commercial. However, the long-term thesis for most of these companies is to develop and commercialize their own products; time will tell which ones succeed. In short, building an AI-driven life sciences company is different in a few fundamental ways: data generation, talent, and models.
Generating large amounts of data that is often unbiased
The first part, data generation, is a substantial cost for AI-driven life sciences companies. At least in biotech, this cuts against recent trends to reduce initial setup costs. During the 2000s, more early-stage biotechnology companies started off with a virtual model, a response to high-profile failures of companies that raised a lot of money early only to pivot later. Over the last 2 decades, most of the costs of early-stage drug development have been pushed to contract research organizations (CROs), especially as sites that biopharma shut down were converted into CROs; Nimbus Therapeutics is the best case study for this trend. This virtualization allowed companies to avoid initial outlays for lab space and other infrastructure. Even places like LabCentral and MBC BioLabs allow early-stage companies to maintain a minimal lab footprint before validating their work.
Virtual biotechnology companies start with a biological hypothesis and work to prove it out before raising more capital. CROs are useful for commoditized experiments like ADMET, some synthesis, and some cell lines/mouse models. This trend toward virtualization may hold in other life sciences fields but is most pronounced in drug development. However, a life sciences company centered around AI often doesn’t start off with an initial hypothesis. The company probably needs custom experimental formats like a cell line with a series of specific reporters or certain libraries of proteins. As a result, CROs and outsourcing in general aren’t the best options. Until CROs hire in-house machine learning engineers and build out more infrastructure, virtualization probably isn’t an option for an AI-driven company. That said, the resources for an AI drug company to virtualize are emerging: bioinformatics SaaS products, more accessible lab automation tools, and more.
An AI-driven life sciences company needs large datasets to reduce the need for sophisticated models. With limited data, custom neural networks are needed: for an antibody library with few data points, a company needs very good models to accurately interpret the data and make correct predictions. For a database of every single point mutation of a given antibody or library, by contrast, standard recurrent neural networks or other vanilla models can be used.
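The trade-off above can be made concrete with a toy sketch. This is a synthetic illustration, not any company's actual pipeline: if a library covers every single point mutation, each (position, residue) pair is directly observed, and even a plain linear model on one-hot features recovers per-mutation effects - no custom architecture needed. All numbers and names here are invented.

```python
import numpy as np

# Hypothetical comprehensive mutational scan: one measurement per possible
# point mutation of a short toy sequence (length 10, alphabet of 4).
rng = np.random.default_rng(0)
L, A = 10, 4
X = np.eye(L * A)                  # one-hot row per possible point mutation
true_effects = rng.normal(size=L * A)
y = X @ true_effects + rng.normal(scale=0.01, size=L * A)  # noisy "assay"

# Ordinary least squares - about as vanilla as models get - recovers the
# per-mutation effects almost exactly because the data covers the space.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, true_effects, atol=0.1))  # → True
```

With sparse coverage of the same space, the least-squares problem becomes underdetermined and stronger priors (i.e. more sophisticated models) have to carry the load instead.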
For an AI-first life sciences company, the initial costs to get started can be daunting:
In-house infrastructure - depending on the modality and models used, a company probably has to set up its own instruments and workflows
Custom experiments - a company will have to invest resources to design and execute experiments that are often custom. For example, an imaging workflow might have to be deployed to gather cell morphology data along with gene expression profiles and protein localization. The difficult part for most companies is getting datasets that validate across not only models but scales.
Rich data sources - AI-driven life sciences companies often generate large and complex datasets, and the tools to analyze this data and build models can be outside the scope of current software products. Companies here have to do a good job of increasing the interpretability of high-performing models; in short, a high accuracy score from a model might not translate into a sound hypothesis.
Tooling to scale their AI models - with AWS, various AI packages, and a lot more, deploying models has become much more accessible. However, the costs to deploy an AI model can run into the $100Ks. AlphaFold, for example, might have cost on the order of millions of dollars to train. New AI-specific chips, like tensor processing units, should help here, but there is a growing divide between the computing resources needed for AI models and the power of the chips that train them.
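On the interpretability point above, one cheap sanity check is permutation importance: shuffle one input at a time and see how much accuracy drops. The sketch below is entirely synthetic - the three "features" and the threshold "model" are invented - but it shows why a high accuracy score alone doesn't say which inputs actually drive it.

```python
import numpy as np

# Synthetic data: 3 toy features (e.g. assay readouts); only feature 0
# actually determines the label.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
y = (X[:, 0] > 0).astype(int)

def accuracy(data):
    # Stand-in "model": threshold on feature 0 (the true signal).
    return np.mean((data[:, 0] > 0).astype(int) == y)

base = accuracy(X)
for j in range(3):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j's information
    print(f"feature {j}: accuracy drop {base - accuracy(Xp):.2f}")
```

Permuting feature 0 craters the accuracy while permuting the irrelevant features changes nothing, which is the kind of evidence that separates a sound hypothesis from a lucky score.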
The upfront cost to build an AI-driven life sciences company can run to $10Ms, maybe even $100Ms, often without a clear biological hypothesis. Maybe a company will in-license an initial set of assets to de-risk the platform investment. In addition, it’s not clear how much long-term operations (i.e. spend on compute resources, talent) will affect a company’s gross margin. The success of these companies depends on their ability to build models that eventually replace human annotation, with the keys to success being picking the right problem, curating the right dataset, and interpreting the results accurately.
For drug development, the idea is that AI could put a major dent in the cost curve for early work. These companies will still need to engage in traditional product development at some point. Once an initial hit is found, a company will move into pre-clinical and clinical studies. This is a cost that is much more difficult to change.
For AI-driven life sciences companies, avoiding false positives is also incredibly important so as not to invest too many resources in a dead end. Having generated more data and invested in custom resources, a company will need to create more products, achieve higher efficiencies, and garner non-dilutive dollars to defray the upfront investment in order to make the financials work for shareholders.
Recruiting and training engineering talent
An important element of building an AI-focused life sciences company is recruiting and training engineering talent. Recruiting AI talent is a lot harder than you would expect, and retaining it is even harder. In general, software engineers are being paid like professional athletes, especially those with specialized knowledge in artificial intelligence.
Talent may be the main bottleneck limiting the impact of AI on life sciences. There are 100Ks of AI engineers in the world, but the competition for them spans almost every industry. For a life sciences company, there are 3 main challenges for AI talent: (1) getting the expertise in the door, (2) helping engineers learn enough biology to be dangerous, and (3) retaining the trained engineering talent given the fierce competition from the Googles and Facebooks of the world along with every quant hedge fund. Given this, what are some strategies to build talented engineering teams?:
Get them early - a co-founder or early employee has to be a world-class AI engineer to improve the odds of attracting more. It’s pretty hard, but not impossible, for a founding team of biologists to recruit, let alone lead, great engineers. Importantly, finding ML talent who do not need to be the star expert at everything is essential for a life sciences company. Biologists, likewise, need humility at the intersection of AI and bio.
Get an advisor - the second-best option to attract talent early is to get an advisor who has expertise in AI. In short, there is a need for more Jeff Deans of biology.
Poach entire teams - engineers at large technology companies might be open to moving into life sciences to work on more “important” problems depending on their salary/equity packages along with the potential to continue staying with co-workers. Hexagon Bio did the best job here recruiting a lot of great engineers from Palantir.
Train biologists to become data scientists and SWEs - the last resort is to train biologists to become engineers. This might take 3-5 years, though, so be patient. Within AI-driven companies, biologists also need data and programming fluency to be productive.
Implementing biologically relevant models
The last challenge for an AI-focused life sciences company is implementing biologically relevant models. Data generation and talent are prerequisites for AI models but understanding when to deploy them is another important challenge. Biology is still the limiting factor and its complexity is what makes it beautiful but hard to engineer. Even with the best design and tools, biological validation cycles are still a major consideration for all companies. Given this bottleneck, having a deep understanding of deployment phases and other features before translation is incredibly important. So what are some of the main problems for implementing models in biology?:
Unstructured and noisy data - collecting data from sequencing, spatial transcriptomics, and even health records can create more complexity than a given model and team can handle. Moreover, just as important as any given data source are cross-validation studies.
Training data coverage - making sure the initial data set and training period is robust enough to deploy into the lab. Depending on the problem addressed, this process can take weeks and even months.
When to deploy? - figuring out when a model is accurate enough to trust its recommendations. For example, two key tools for classification models are the ROC curve, which plots the model’s performance at every classification threshold, and the AUC, which aggregates that performance across all thresholds into a single number. In diagnostics, determining when an AUC is high enough to distinguish patients with a disease from those without, and to deploy, is tricky without validation data.
Time to deploy? - decreasing the time it takes from developing a model to deploying one. It’s not obvious this gets easier over time given the increasing number of edge cases with a larger data set. However, this is where in-house data generation capabilities become an advantage to manage the cost of wet lab work and hopefully make deployment less expensive.
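The AUC gate described above fits in a few lines. The labels, scores, and the 0.9 deployment bar below are all invented for illustration; the AUC is computed in its pairwise form - the probability that a randomly chosen diseased sample scores higher than a randomly chosen healthy one (ties ignored for simplicity).

```python
import numpy as np

# Hypothetical held-out validation set: 1 = disease, 0 = healthy.
labels = np.array([0, 0, 0, 1, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.3, 0.9, 0.6, 0.7])

pos = scores[labels == 1]
neg = scores[labels == 0]
# Pairwise form of AUC: fraction of (diseased, healthy) pairs ranked correctly.
auc = np.mean(pos[:, None] > neg[None, :])
print(f"AUC = {auc:.2f}")  # → AUC = 0.81

# A simple pre-deployment gate: require the AUC to clear a chosen bar
# (0.9 here, purely illustrative) before trusting the classifier's calls.
print("deploy" if auc > 0.9 else "keep validating")  # → keep validating
```

In practice the bar depends on the clinical cost of false calls, which is exactly why this decision is tricky without real validation data.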
During the process of model design, early positive signals can give a false sense of confidence. Initial hits may not translate well at a certain stage of development. An early discovery may not account for a unique edge case. For example, an MoA that works in models may not translate into the clinic. Or, for something in biomanufacturing, a model that is accurate at one scale may not work at another due to differences in oxygen circulation among other external factors. The long tail is where the majority of the work to implement AI models is done, and most of that long tail happens in the wet lab. One trick is training models with “unphysical” biological experiments (Jacob, thank you for the concept) to help models learn the limits of the biological parameter space. This creates an entirely new way to do biology, where experiments aren’t necessarily run to provide meaningful results but to more accurately train models. Biology is a large search space and can easily multiply the number of edge cases, thereby increasing development costs. AI models that take advantage of parallelization to test models, manage data inputs, and work to eliminate steps in the product development process have a shot at reducing this search space. Even so, these edge cases may never disappear given the complexity of biology and will need to be validated at the bench or in the clinic.
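One minimal way to picture the "unphysical experiment" idea is to deliberately add parameter settings known to be outside the feasible biological window, labeled as failures, so a model learns where the search space ends. Everything below is invented for illustration - the (temperature, pH) window, the sampled points, and the nearest-neighbor stand-in for a real model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Feasible operating points for a toy process: temperature 30-39 C, pH 6.5-7.5.
feasible = rng.uniform([30.0, 6.5], [39.0, 7.5], size=(50, 2))
# "Unphysical" experiments sampled outside that window (too hot / too alkaline),
# labeled infeasible so the model sees the boundary of the parameter space.
too_hot = rng.uniform([45.0, 6.5], [80.0, 7.5], size=(25, 2))
too_alkaline = rng.uniform([30.0, 9.0], [39.0, 14.0], size=(25, 2))

X = np.vstack([feasible, too_hot, too_alkaline])
y = np.concatenate([np.ones(50), np.zeros(50)])  # 1 = feasible, 0 = not

def predict(q):
    # 1-nearest-neighbor stand-in for a trained model: label of closest point.
    return y[np.argmin(np.linalg.norm(X - q, axis=1))]

print(predict(np.array([37.0, 7.0])))   # → 1.0, well inside the window
print(predict(np.array([70.0, 12.0])))  # → 0.0, far outside it
```

Without the unphysical points, a model trained only on successes has no examples telling it where feasibility ends, so queries far outside the window would still be matched to feasible data.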
The rules of building an AI-first life sciences business are still being written by companies like Insitro, Recursion Pharmaceuticals, and more. Three important moats are talent, scale, and product development - can a company recruit/retain the best AI talent, validate their models more efficiently, and ask the most important questions? But AI and data themselves likely do not create moats in the long run. Both become commodities: AI models are being commoditized, especially with pre-trained versions and various open-source libraries, while data can be put in the public domain and proprietary datasets will be commoditized as their generation becomes cheaper (e.g. $10M spent on sequencing today may cost only $1M in a few years). Models are the more easily commoditized of the two in life sciences, where focusing on the right datasets carries most of the freight for a company’s success.
There are countless opportunities to apply AI to life sciences. A useful way to categorize these is by data and models (image below; thanks Lucas). The same segmentation can also be done by talent and team building strategies. We will do a follow up analysis diving into case studies for each of these categories:
Automate data generation - Insitro and Recursion focus on building out infrastructure to generate data that a group of lab techs can’t
Specialize on modality - Dyno and Serotiny focus on AAVs and CARs, respectively, and build focused data sets
Low data methods - ProteinQure builds models that can operate on smaller scale datasets, which has advantages to optimize for many more features versus large-scale data generation approaches
Unlabeled data - Deep Genomics and EQRx invest resources to structure unlabeled data to discover new targets among other things
New molecular representations - Unnatural Products and Genesis Therapeutics focus on developing models and data sets to more accurately predict features for small molecules like PK/PD and ADMET
Noisy data - Asimov and Hexagon Bio work with complex cellular data to design new cell lines and natural products, respectively
Partners generate data - AbCellera works with partners to generate a large amount of data that can be fed back into their own AI models. Partnerships can be a good way for a business to bootstrap the initial data generation requirement.
This new batch of companies can learn at least 5 key lessons in building AI-first life sciences companies from the last generation:
Focus on specific problems
Narrowing focus to a specific task in life sciences is pretty important given how complex the whole field is. Ideally, low-hanging-fruit tasks are pursued first to validate the models and build momentum on the product development side. Examples include BigHat and LabGenius working on antibodies (even AbCellera has built out its AI team); Unnatural Products and Anagenex for certain types of small molecules; ProteinQure for peptides; Serotiny for CARs; Dyno Therapeutics for AAVs; and Asimov for certain mammalian cells. All of these companies focus on specific problems, whether on the modality side or a particular data set.
This enables reducing complexity of models and data
By narrowing down focus, a company can reduce the complexity of its models and data generation capabilities. Dyno can get really good at capsid engineering rather than having to build models for everything ranging from transgene expression to engulfment. Serotiny can build product moats around chimeric antigen receptor constructs without having to necessarily create models for target engagement. This reduction in complexity has a direct impact on COGS.
Recognize the high variable costs of building a life sciences business centered around AI
An AI-focused life sciences company has not only the costs of wet-lab work but also the costs of model implementation and compute. This is what makes these business models unique: they make an upfront investment in software and AI on the premise that more products or partnerships will come. More often than not, platforms are overbuilt to prepare for new applications. Hopefully these variable costs decrease as more CRO infrastructure is put into place. However, AI will have an initial impact on gross margins, driven by model maintenance and deployment; it might be useful for companies to track these variable costs separately from wet-lab work.
Commoditize or be commoditized
We might get to the point where, for example, “drug discovered by AI” becomes as commonplace a phrase as “drug discovered by phage display.” New AI tools are still being built out. We are still pretty early in this wave of AI progress, with ImageNet coming out in 2012 and TensorFlow and AlphaGo in 2015. Beyond fundamental breakthroughs, new workflows for AI models are emerging along with new models and automation of training tasks. Given all of this activity, an AI-focused life sciences company can easily fall behind on technology. It’s important to be forward-thinking and constantly update your toolkit. Even the talent might get commoditized, similar to how genomics/bioinformatics went from a rare skill to a common one over the last 2 decades.
Use AI to build moats around products
The model itself is not a moat nor is the data. AI often allows problems in biology to be solved in a new way. Examples are old problems like drug repurposing or AAV and CAR design that needed scale to unlock new therapeutic variants. AI can help design unique products where a moat can be constructed.
AI companies in life sciences are more bio than software. The advantages of AI don’t come cheaply: companies face challenges in data generation, building interdisciplinary teams, and the variable costs of developing AI models. These upfront costs create barriers to new entrants and provide some defensibility. Hybrid business models merging life sciences and AI have the potential to invent faster, create new markets, and make product development more efficient. AI may also have non-obvious side effects; for example, in drug development, more data might make it easier to find more ways to argue for an approval. AI is also causing a culture shift, where companies pick the top 10 predictions from a model that may be very different from one another. Companies in life sciences could also use AI to generate a pipeline of uncorrelated assets with the goal of securitization. There are opportunities to bring AI to metabolic engineering, biomanufacturing, enzymes, and ASOs as well. AI has made a significant impact on tasks like PK/PD studies, formulations, and the initial stages of drug development, but can companies like Unlearn and Tilda bring these tools downstream to clinical trials? Beyond drug development, companies like Ginkgo and Zymergen have taken the lead on using AI in synthetic biology; Inflammatix and Endpoint Health in diagnostics; Levels and Whoop in consumer health products; EQRx in fast-follower drug development; Abridge and PatientPing in healthcare.
Artificial intelligence has the potential to create a template for new products across all of life sciences. At the very least, AI allows a business to generate new IP and execute more partnerships. For any AI-focused life sciences company, talent may be the most important moat; hypothesis generation and scale are valuable as well. The enduring businesses will focus on problems that are tractable for AI models, avoiding edge cases, and combine their platform with other business strategies. AI is a tool to build moats around biological products. By using that tool to iterate and improve, these companies have the potential to move faster than competitors and expand their product lines at an ever-faster rate.
Special thanks to Rishi Bedi, Lucas Siow, Brandon White, Jacob Oppenheim as well as others kind enough to review and provide feedback on this piece.