How a Gemma model helped discover a new potential cancer therapy pathway

Today, as part of our research collaboration with Yale University, we’re releasing Cell2Sentence-Scale 27B (C2S-Scale), a new 27-billion-parameter foundation model designed to understand the language of individual cells. Built on the Gemma family of open models, C2S-Scale represents a new frontier in single-cell analysis.
This announcement marks a milestone for AI in science. C2S-Scale generated a novel hypothesis about cancer cellular behavior, and we have since confirmed its prediction experimentally in living cells. This discovery reveals a promising new pathway for developing therapies to fight cancer.
This launch builds on our work from earlier this year, in which we showed that biological models follow clear scaling laws; as with natural language, larger models perform better on biology. That work raised a critical question: does a larger model just get better at existing tasks, or can it acquire entirely new capabilities? The true promise of scaling lies in creating new ideas and discovering the unknown.
How C2S-Scale 27B works
A major challenge in cancer immunotherapy is that many tumors are “cold,” meaning invisible to the body’s immune system. A key strategy to make them “hot” is to induce them to display immune-triggering signals through a process called antigen presentation.

We gave our new C2S-Scale 27B model a task: find a drug that acts as a conditional amplifier, boosting the immune signal only in a specific “immune-context-positive” environment where low levels of interferon (a key immune-signaling protein) are already present but are inadequate to induce antigen presentation on their own. This required a level of conditional reasoning that appeared to be an emergent capability of scale; our smaller models could not resolve this context-dependent effect.
To accomplish this, we designed a dual-context virtual screen to find this specific synergistic effect. The virtual screen involved two contexts:
- Immune-Context-Positive: We provided the model with real-world patient samples with intact tumor-immune interactions and low-level interferon signaling.
- Immune-Context-Neutral: We provided the model with isolated cell line data with no immune context.
We then simulated the effect of over 4,000 drugs across both contexts and asked the model to predict which drugs would boost antigen presentation only in the first context, biasing the screen toward the patient-relevant setting. Of the many drug candidates the model highlighted, a fraction (10-30%) were already known from prior literature, while the rest were surprising hits with no previously known link to the screen.
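The screening logic described above can be sketched as a filter over per-context predictions. This is a minimal illustration, not the C2S-Scale pipeline: it assumes each drug's model output has already been reduced to a single predicted change in an antigen-presentation score per context. The names `DrugPrediction`, `context_split` and `dual_context_hits`, the threshold values, and the example drugs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrugPrediction:
    """Hypothetical container for one drug's predicted effects.

    Each effect is an assumed change in an antigen-presentation score
    (e.g. an MHC-I signature) relative to untreated cells.
    """
    name: str
    effect_immune_positive: float  # patient samples with low-level interferon
    effect_immune_neutral: float   # isolated cell lines, no immune context


def context_split(pred: DrugPrediction) -> float:
    """How much more the drug helps in the immune-positive context."""
    return pred.effect_immune_positive - pred.effect_immune_neutral


def dual_context_hits(
    preds: list[DrugPrediction],
    min_positive_effect: float = 0.1,  # hypothetical threshold
    max_neutral_effect: float = 0.02,  # hypothetical threshold
) -> list[DrugPrediction]:
    """Keep drugs that boost antigen presentation only in the
    immune-positive context, ranked by context split (largest first)."""
    hits = [
        p for p in preds
        if p.effect_immune_positive >= min_positive_effect
        and abs(p.effect_immune_neutral) <= max_neutral_effect
    ]
    return sorted(hits, key=context_split, reverse=True)


if __name__ == "__main__":
    # Made-up predictions, purely for illustration.
    preds = [
        DrugPrediction("drug_a", 0.40, 0.01),  # conditional boost: kept
        DrugPrediction("drug_b", 0.35, 0.30),  # boosts both contexts: dropped
        DrugPrediction("drug_c", 0.15, 0.00),  # smaller conditional boost: kept
    ]
    for p in dual_context_hits(preds):
        print(p.name, round(context_split(p), 2))
```

Requiring a near-zero effect in the neutral context, rather than only ranking by the difference, is what discards drugs like `drug_b` that raise antigen presentation everywhere and so are not conditional amplifiers.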
From prediction to experimental validation
The model's predictions were clear. It identified a striking “context split” for silmitasertib (CX-4945), an inhibitor of the kinase CK2. The model predicted a strong increase in antigen presentation when silmitasertib was applied in the “immune-context-positive” setting, but little to no effect in the “immune-context-neutral” one. What made this prediction exciting was its novelty. Although CK2 has been implicated in many cellular functions, including modulation of the immune system, inhibiting CK2 with silmitasertib has not been reported in the literature to explicitly enhance MHC-I expression or antigen presentation. This indicates that the model was generating a new, testable hypothesis rather than repeating known facts.
A prediction, however, is only valuable if it holds up experimentally. The real test comes first in the lab and, eventually, in the clinic.
For the next phase of the project, we took this hypothesis to the lab bench and tested it in human neuroendocrine cell models, a cell type the model never saw during training. The experiments showed:
- Treating the cells with silmitasertib alone had no effect on antigen presentation (MHC-I).
- Treating the cells with a low dose of interferon alone had a modest effect.
- Treating the cells with both silmitasertib and low-dose interferon produced a marked, synergistic amplification of antigen presentation.
Remarkably, in our lab tests, the combination of silmitasertib and low-dose interferon produced a roughly 50% increase in antigen presentation, which could make tumors more visible to the immune system.
The model’s in silico prediction was confirmed in repeated in vitro experiments. C2S-Scale had identified a novel, interferon-conditional amplifier, revealing a new potential pathway to make “cold” tumors “hot” and potentially more responsive to immunotherapy. While this is an early step, it provides a powerful, experimentally validated lead for developing new combination therapies, which use multiple drugs in concert to achieve a more robust effect.
This result also provides a blueprint for a new kind of biological discovery. It demonstrates that by following the scaling laws and building larger models like C2S-Scale 27B, we can create predictive models of cellular behavior powerful enough to run high-throughput virtual screens, discover context-conditioned biology, and generate biologically grounded hypotheses.
Teams at Yale are now exploring the mechanism uncovered here and testing additional AI-generated predictions in other immune contexts. With further preclinical and clinical validation, such hypotheses may ultimately help accelerate the path to new therapies.
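One simple way to make “synergistic” concrete is response additivity: the combination is synergistic if its effect over baseline exceeds the sum of the single-agent effects. The sketch below is an illustration under that assumption only; the study may use a different synergy model (for example, Bliss independence), and the function names, the `margin` parameter, and the numbers in the usage lines are hypothetical, not measurements from these experiments.

```python
def excess_over_additivity(
    baseline: float, drug_alone: float, ifn_alone: float, combination: float
) -> float:
    """Return how far the combination exceeds the sum of single-agent effects.

    Inputs are antigen-presentation levels (e.g. an MHC-I signal); each
    effect is measured as a difference from the untreated baseline.
    """
    effect_drug = drug_alone - baseline
    effect_ifn = ifn_alone - baseline
    effect_combo = combination - baseline
    return effect_combo - (effect_drug + effect_ifn)


def is_synergistic(
    baseline: float,
    drug_alone: float,
    ifn_alone: float,
    combination: float,
    margin: float = 0.0,  # hypothetical tolerance for measurement noise
) -> bool:
    """True if the combination beats response additivity by more than margin."""
    excess = excess_over_additivity(baseline, drug_alone, ifn_alone, combination)
    return excess > margin


if __name__ == "__main__":
    # Hypothetical levels mirroring the qualitative pattern above:
    # drug alone has no effect, interferon alone a modest one.
    print(is_synergistic(100.0, 100.0, 110.0, 130.0))
    print(is_synergistic(100.0, 100.0, 110.0, 110.0))
```

Under this definition, a drug with no effect of its own can still be synergistic, which matches the pattern in the list above: silmitasertib alone did nothing, yet the combination outperformed interferon alone.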
Getting started with C2S-Scale 27B
The new C2S-Scale 27B model and its resources are available today for the research community. We invite you to explore these tools, build on our work and help us continue to translate the language of life.
- Read the full scientific preprint on bioRxiv.
- Explore the model and resources on Hugging Face.
- Access the code on GitHub.