Drug discovery has historically been inaccessible to non-experts, locked behind cost, expertise, and tooling that only large institutions can afford. We present GemmaCure, a platform that makes drug discovery accessible at scale: by turning molecule generation into a game, any student anywhere can participate, and thousands of students playing collectively can produce a large, diverse database of candidate molecules for future computational screening. This crowdsourcing model is analogous to FoldIt [17] and EteRNA [16] but applied to drug design. GemmaCure is built on three interlocking layers of novelty. (1) Gamification at scale: a 3D battle game, GemmaCure 3D, turns drug discovery into a player-driven exploration of chemical space. Under the Gamification Diversity Hypothesis, we posit that cognitively diverse student players collectively explore scaffolds no single algorithm would produce, and that at scale their sessions accumulate into a community molecule database. (2) Fine-tuned GPU-served LLM: we fine-tune Gemma 4-E2B [6] (5.2B parameters, 4-bit NF4 quantized, served via a GPU-backed inference API on an NVIDIA GB10 developer system) on 225,060 drug-target pairs with rationale-first generation, achieving 100% SMILES validity on the evaluation set and a roughly 5× drug-likeness improvement (QED = 0.591 vs. 0.116 for the base model) on five held-out targets. (3) State-of-the-art binding-affinity scoring: the game integrates the MAMMAL [4] multi-modal biomedical foundation model for live pKd scoring of every generated candidate, blended with an RDKit [12] composite; the system falls back gracefully to RDKit-only scoring when the API is unreachable. GemmaCure evolves Deep2Lead [1] by replacing every layer with a more capable modern counterpart. The publishable claim is a system and study-protocol contribution; no wet-lab, animal, clinical, toxicity, or therapeutic-efficacy validation is claimed.
Keywords — accessible drug discovery, gamification at scale, crowdsourced molecule generation, student-driven science, large language model, molecular generation, SMILES, fine-tuning, citizen science, educational games, Deep2Lead, Gemma 4, RDKit, MAMMAL
Drug discovery is one of the most consequential challenges in science. Published cost estimates vary by method: Wouters et al. reported a median capitalized investment of ~$985 million for 63 traceable products (2009–2018) [2], while DiMasi et al. estimated $2.558 billion using industry survey data [3]. Computational approaches — molecular docking, QSAR modeling, and generative deep learning — have substantially reduced early-stage costs. Yet the tools remain largely inaccessible to non-experts.
In 2021, Chawdhury introduced Deep2Lead [1], a web-based platform that combined a variational autoencoder (VAE) for molecular generation with the DeepPurpose [5] drug-target interaction framework, enabling lead optimization without programming knowledge. GemmaCure extends this vision across three interlocking layers. First, we embed the entire pipeline inside an educational 3D battle game, GemmaCure 3D, which recruits cognitively diverse human players to explore chemical space — a design insight originated by high school sophomore Tanisha Chawdhury. Second, we replace the classical VAE with a state-of-the-art fine-tuned Gemma 4-E2B LLM [6] — a 5.2B-parameter model in 4-bit NF4 quantized form, served via a GPU-backed inference API on an NVIDIA GB10 developer system — performing rationale-first generation that explains binding pocket geometry before committing to a SMILES string. Third, we integrate a state-of-the-art multi-modal biomedical foundation model, MAMMAL DTI [4], as a live pKd scoring service blended with RDKit property metrics [12].
The game concept originated with Tanisha Chawdhury, a sophomore at a high school in Elk Grove:
This framing embodies a scientific hypothesis we call the Gamification Diversity Hypothesis: that turning drug discovery into a game recruits a cognitively and demographically diverse player population whose collective exploration of chemical space may differ from what any single algorithm explores on its own. The central research question is not whether GemmaCure has discovered a drug — it has not. It is whether a gamified human-in-the-loop interface can measurably alter exploration behavior compared with automated generation under an equal budget.
Our contributions are:
Deep2Lead [1] was the first browser-accessible lead optimization tool combining a SMILES-trained VAE for molecular generation and DeepPurpose [5] for binding affinity prediction. Its primary contribution was accessibility: users with no programming background could input a protein target and receive AI-generated candidate molecules. The VAE produced valid SMILES strings ~60–70% of the time and was not conditioned on target-specific binding pocket information.
Large language models have demonstrated strong performance on SMILES-based molecular tasks. Models such as MolGPT [13], ChemGPT [14], and MolT5 [15] established that pre-trained transformers can generate valid, drug-like molecules. Fine-tuning on domain-specific data dramatically improves validity and target conditioning, with rationale-first formats enabling the model to reason about binding pocket geometry before committing to a structure.
Citizen science games have repeatedly demonstrated that non-expert players can make genuine scientific contributions. EteRNA [16] showed that game players outperformed automated algorithms in RNA secondary structure design. FoldIt [17] produced protein folding solutions that algorithms had not found. Galaxy Zoo [18] enlisted hundreds of thousands of participants in galaxy morphology classification. GemmaCure should be evaluated against this standard: logged player output must be compared with automated baselines before any discovery claim is made.
We fine-tuned the Gemma 4-E2B model (5.2B parameters, 4-bit NF4 quantized via bitsandbytes) using Unsloth with Rank-Stabilized LoRA (RS-LoRA, r=64, α=128) targeting all projection layers. Training and serving were performed on an NVIDIA GB10 GPU developer system; the same GPU host serves live inference requests from the game. Training used 225,060 drug-target pairs assembled from three public databases:
| Source | Records | Filter |
|---|---|---|
| BindingDB [19] | 150,000 | Kd/Ki/IC50 ≤100 nM |
| ChEMBL [20] | 65,000 | IC50 ≤100 nM, binding assay B |
| MOSES [21] | 10,060 | Lipinski Ro5, QED >0.5 |
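As a small worked example for the RS-LoRA setting described before the table (r=64, α=128): in the rank-stabilized formulation the adapter update is scaled by α/√r rather than the conventional α/r. The sketch below only reproduces that arithmetic. The function names are illustrative, and the source does not state how the training framework applies the factor internally.

```python
import math


def lora_scale(alpha: float, r: int) -> float:
    """Conventional LoRA scaling factor: alpha / r."""
    return alpha / r


def rslora_scale(alpha: float, r: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(r)


# Values taken from the fine-tuning description (r=64, alpha=128).
r, alpha = 64, 128
print(lora_scale(alpha, r))    # 2.0
print(rslora_scale(alpha, r))  # 16.0
```

At this rank the rank-stabilized factor is eight times larger than the conventional one, which is the reason a higher rank does not shrink the effective update under RS-LoRA.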
The most significant training innovation is the EvalAndStopCallback, a HuggingFace TrainerCallback that runs inline drug-quality evaluation every 100 steps using RDKit only (no MAMMAL call during training):
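The following is a minimal sketch of how such a callback could be structured, not the authors' implementation. The class name `EvalAndStopSketch`, the injected `evaluate` callable, the metric keys, and both thresholds are assumptions made for illustration; the source does not state the actual stopping criterion. The RDKit evaluation itself is left to the caller so that the sketch runs without extra dependencies.

```python
from types import SimpleNamespace
from typing import Callable, Dict, List, Tuple

try:  # use the real base class when transformers is installed
    from transformers import TrainerCallback
except ImportError:  # stand-in so the sketch still runs without it
    class TrainerCallback:  # type: ignore[no-redef]
        pass


class EvalAndStopSketch(TrainerCallback):
    """Every `eval_every` steps, run `evaluate` inline and stop once targets are met."""

    def __init__(
        self,
        evaluate: Callable[[], Dict[str, float]],  # hypothetical: returns validity and mean_qed
        eval_every: int = 100,                     # interval stated in the text
        min_validity: float = 0.95,                # hypothetical threshold
        min_qed: float = 0.5,                      # hypothetical threshold
    ) -> None:
        self.evaluate = evaluate
        self.eval_every = eval_every
        self.min_validity = min_validity
        self.min_qed = min_qed
        self.history: List[Tuple[int, Dict[str, float]]] = []

    def on_step_end(self, args, state, control, **kwargs):
        step = state.global_step
        if step == 0 or step % self.eval_every != 0:
            return control
        metrics = self.evaluate()  # inline call, no subprocess
        self.history.append((step, metrics))
        if (metrics["validity"] >= self.min_validity
                and metrics["mean_qed"] >= self.min_qed):
            control.should_training_stop = True  # ask the Trainer to stop
        return control


# Simulated run with canned metrics in place of a real RDKit evaluation.
canned = iter([
    {"validity": 0.50, "mean_qed": 0.20},
    {"validity": 1.00, "mean_qed": 0.60},
])
callback = EvalAndStopSketch(evaluate=lambda: next(canned))
control = SimpleNamespace(should_training_stop=False)
stopped_at = None
for step in range(1, 501):
    callback.on_step_end(None, SimpleNamespace(global_step=step), control)
    if control.should_training_stop:
        stopped_at = step
        break
print(stopped_at)
```

Passing the evaluation in as a callable keeps the stopping logic separate from the chemistry, so the same class can be exercised with canned metrics as above.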
Running inline rather than via subprocess reduces evaluation overhead from ~60 s to ~30 s per evaluation. The callback produced the following training trace:
GemmaCure 3D is a browser-accessible 3D game built with HTML/CSS/JavaScript. Seven active boss levels map to real disease targets; a curated library of twenty disease targets is available to the broader platform. The seven active bosses are: Influenza Neuraminidase, SARS-CoV-2 MPro, HIV-1 Protease, EGFR Kinase, BRAF V600E, SIRT1, and CDK2. The game loop is:
Resistance mutations activate at lower HP phases, mirroring real drug development challenges such as acquired resistance.
The live game computes a two-stage composite score. First, an RDKit-derived property score is computed for every attack:
If the MAMMAL API health check succeeds, a live pKd estimate is obtained and blended in:
If the API is unreachable, final_score = rdkit_score.
Each attack record stores the raw pkd value when available, along with mammal_used to distinguish the two scoring paths in analysis.
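The two scoring paths and the attack record can be sketched as follows. The 0.55 and 0.45 weights are the ones reported for live scoring in the game-features table. Everything else is an assumption for illustration: the `AttackRecord` layout beyond `pkd` and `mammal_used`, the function names, and above all the linear mapping of pKd onto [0, 1] between 4 and 10, which the source does not specify.

```python
from dataclasses import dataclass
from typing import Optional

MAMMAL_WEIGHT = 0.55  # weight reported for the MAMMAL pKd score
RDKIT_WEIGHT = 0.45   # weight reported for the RDKit composite


def pkd_to_unit(pkd: float, lo: float = 4.0, hi: float = 10.0) -> float:
    """Hypothetical normalization: clamp pKd linearly onto [0, 1]."""
    return min(1.0, max(0.0, (pkd - lo) / (hi - lo)))


@dataclass
class AttackRecord:
    smiles: str
    rdkit_score: float
    pkd: Optional[float]  # raw pKd, None when MAMMAL was unreachable
    mammal_used: bool     # distinguishes the two scoring paths
    final_score: float


def score_attack(smiles: str, rdkit_score: float,
                 pkd: Optional[float] = None) -> AttackRecord:
    """Blend MAMMAL and RDKit scores, or fall back to RDKit only."""
    if pkd is None:
        # Fallback path: final_score = rdkit_score
        return AttackRecord(smiles, rdkit_score, None, False, rdkit_score)
    blended = MAMMAL_WEIGHT * pkd_to_unit(pkd) + RDKIT_WEIGHT * rdkit_score
    return AttackRecord(smiles, rdkit_score, pkd, True, blended)


fallback = score_attack("CCO", rdkit_score=0.6)
blended = score_attack("CCO", rdkit_score=0.6, pkd=7.0)
print(fallback.mammal_used, fallback.final_score)
print(blended.mammal_used, round(blended.final_score, 3))
```

Because the two paths produce scores on different bases, keeping `mammal_used` on every record lets later analysis separate or stratify them instead of pooling them silently.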
We evaluate generated molecules across three tiers:
| Tier | Tool | Key Metrics |
|---|---|---|
| 1 — Chemistry | RDKit [12] | SMILES validity, QED [22], Lipinski Ro5 [23], SA score [24], novelty, diversity |
| 2 — Target Plausibility | MAMMAL DTI [4] | pKd estimate, % predicted binders, scaffold diversity |
| 3 — Rationale Quality | NLP pipeline | Binding-term coverage, residue accuracy, coherence |
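To illustrate one of the Tier 1 metrics, the sketch below applies Lipinski's Rule of Five thresholds (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 acceptors) to precomputed descriptors. In practice the descriptors would come from RDKit; here they are passed in directly so the example has no dependencies. The `Descriptors` container, the function names, and the default of zero allowed violations are assumptions, since the source does not state whether one violation is tolerated.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float  # Da
    logp: float
    h_donors: int
    h_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Rule-of-Five thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 0) -> bool:
    """Strict by default; pass max_violations=1 for the lenient variant."""
    return ro5_violations(d) <= max_violations


def ro5_pass_rate(batch: Sequence[Descriptors], max_violations: int = 0) -> float:
    """Fraction of molecules in a batch that satisfy the rule."""
    if not batch:
        return 0.0
    return sum(passes_ro5(d, max_violations) for d in batch) / len(batch)


# Illustrative descriptor values, not taken from the paper.
batch = [
    Descriptors(mol_weight=320.4, logp=2.1, h_donors=2, h_acceptors=5),
    Descriptors(mol_weight=612.7, logp=3.0, h_donors=3, h_acceptors=8),
    Descriptors(mol_weight=540.0, logp=6.2, h_donors=1, h_acceptors=4),
    Descriptors(mol_weight=450.0, logp=4.9, h_donors=5, h_acceptors=10),
]
print(ro5_pass_rate(batch))
print(ro5_pass_rate(batch, max_violations=1))
```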
Algorithmic molecular generators optimize an objective function, which makes them efficient but biased toward regions of chemical space that score well under the training objective. Human players are not constrained this way. Their molecule selection is influenced by game strategy, curiosity, aesthetic preferences, and prior knowledge. A biology student may prefer scaffolds from their coursework; a high school student with no prior knowledge may select combinations an algorithm would never consider. This cognitive diversity may map to a richer and more novel region of chemical space — a testable hypothesis.
The "at scale" dimension amplifies this effect. A single player session generates a handful of candidates. But when thousands of students across schools and demographic groups play GemmaCure, their collective output accumulates into a large, diverse community molecule database — scored, logged, and available for downstream computational screening. This crowdsourcing model mirrors FoldIt [17] and EteRNA [16] but targets a stage of drug discovery — lead generation — where breadth of chemical space coverage has direct scientific value. Accessibility lowers the barrier to participation; scale turns participation into a scientific resource.
H1 (Novelty): Molecules generated during gamified GemmaCure sessions have higher average Tanimoto dissimilarity to training set molecules than molecules generated by automated random search over the same targets.
H2 (Scaffold Diversity): GemmaCure game sessions produce a greater number of unique Bemis-Murcko scaffolds per 100 generated molecules than automated baseline generation.
H3 (Democratization): High school students with no prior biochemistry instruction can generate chemically valid drug candidates (SMILES validity ≥80%) after a 15-minute tutorial, and can correctly explain at least one property tradeoff.
H3 is directly motivated by Tanisha Chawdhury's participation in the design and testing of GemmaCure. A high school sophomore who had not studied organic chemistry was able to generate valid drug candidates for SARS-CoV-2 MPro, earn the "COVID Crusher" badge, and articulate the binding rationale to a peer. This is preliminary, single-participant evidence that the interface is accessible; H3 itself remains to be tested.
Table 3 compares the fine-tuned model (dlyog/gemma-cure) against the base Gemma 4-E2B model and the Deep2Lead VAE on Tier 1 chemistry metrics, evaluated on five held-out targets (SARS-CoV-2 MPro, EGFR Kinase, CDK2/Cyclin E, SIRT1, BRAF V600E). The 100% validity figure reflects the evaluation set; a reproducible release requires a frozen benchmark, raw generation logs, and canonicalization records.
| Metric | Deep2Lead VAE | Base Gemma 4 | GemmaCure |
|---|---|---|---|
| SMILES Validity (eval set) | ~65% | 33% | 100% |
| Avg QED | ~0.35 | 0.116 | 0.591 |
| Composite score | ~0.50 | 0.224 | 0.795 |
| Lipinski Ro5 | ~74% | 74.3% | 91.8% |
| Target conditioning | DeepPurpose | None | MAMMAL pKd + RDKit |
Representative molecules generated by dlyog/gemma-cure on held-out targets, with RDKit-verified properties:
All SMILES are valid per RDKit, satisfy Lipinski Ro5, and were generated with a chemically coherent rationale. These examples are illustrative; broader validity claims require a frozen held-out benchmark with reproduced descriptor tables.
| Feature | Value |
|---|---|
| Active disease boss levels | 7 (curated target library: 20) |
| Difficulty tiers | 4 (Junior → Senior → PI → Nobel) |
| Implemented achievement badge types | 9 |
| Missiles per solo run | 10 (fixed rack; visual ammo pips) |
| Concurrent assay slots | Up to 3 in-flight simultaneously |
| Solo scoring model | Best composite score across all 10 fires (no binary win/lose) |
| Scaffold evolution | Best confirmed SMILES seeds next-generation candidates |
| Avg generation latency (GB10 server) | ~2.4 s / molecule (LLM) + ~0.3 s (MAMMAL) |
| SMILES validity (live, dlyog/gemma-cure) | 100% on evaluation set |
| Live scoring | 0.55 × MAMMAL pKd score + 0.45 × RDKit composite |
One of the most significant claims of this work is that high school students — with no prior biochemistry training — can meaningfully participate in drug discovery research through gamification. Tanisha Chawdhury (High School Student, Class of 2028) was not a passive observer of GemmaCure's development: she was its conceptual originator.
Her core design insight — the pathogen-as-enemy, drug-as-ammunition metaphor — solved a communication problem that pure science interface design could not. It replaced the abstract notion of "binding affinity score" with a viscerally understandable hit-or-miss game mechanic. This metaphor is pedagogical; it makes binding, drug-likeness, and iterative optimization legible to students. It should not be interpreted as evidence that a generated molecule has therapeutic activity.
Tanisha also authored the patient backstories attached to each disease enemy, giving stakes to the game scenario. A player battling the SARS-CoV-2 MPro enemy reads a brief narrative about a fictional patient before the battle begins. Educational game research has shown that this kind of narrative investment improves knowledge retention [25].
GemmaCure represents a testbed for the hypothesis that high school students can generate scientifically valid drug candidates when equipped with: (1) an LLM that handles the chemistry, (2) a scoring oracle that provides immediate feedback, and (3) a game mechanic that transforms search into exploration. Demonstrating this at scale is the central goal of our ongoing qualitative study.
No wet-lab, animal, clinical, toxicity, binding, or therapeutic-efficacy validation is claimed. A valid SMILES string with a high QED score is not a medicine. A MAMMAL pKd prediction is a computational estimate that requires benchmarking, confidence intervals, and independent experimental validation before any activity claim can be made.
The largest scientific risk is overclaiming. Specific limitations include:
The mammal_used flag in each attack record distinguishes the two scoring paths (MAMMAL-blended and RDKit-only), and any player study must account for this in analysis.
GemmaCure should therefore be labeled as educational and hypothesis-generating. Any leaderboard output must be treated as a computational suggestion requiring independent cheminformatics review, ADMET screening, synthetic route assessment, and laboratory testing before scientific or medical significance is claimed.
We are currently conducting a qualitative pilot study with a convenience sample of friends and family members spanning a range of scientific backgrounds, from high school students to professional chemists. Participants play at least five GemmaCure 3D sessions and complete a short survey measuring (a) perceived understanding of drug discovery concepts, (b) engagement and motivation, and (c) self-reported hypotheses about which molecules "should work."
All generated SMILES strings, pKd scores (where MAMMAL was reachable), session durations, and player demographics are logged and will be analyzed for the Gamification Diversity Hypothesis (Section 4.2).
The planned controlled study will compare two conditions on the same five disease targets:
Primary outcome: mean Tanimoto dissimilarity to the training set (novelty). Secondary: unique Bemis-Murcko scaffold count per 100 molecules, mean QED, and mean MAMMAL pKd where available.
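The primary outcome and the scaffold-count secondary outcome can be sketched as below, under stated assumptions. Fingerprints are represented as sets of on-bit indices and scaffolds as plain strings; in a real analysis both would be produced by a cheminformatics toolkit such as RDKit. The source does not say whether dissimilarity is measured against the nearest training molecule or averaged over the whole training set, so the nearest-neighbor reading used here is an assumption, as are all function names.

```python
from typing import FrozenSet, Sequence

Fingerprint = FrozenSet[int]  # indices of bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |a ∩ b| / |a ∪ b| on bit sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / union


def novelty(fp: Fingerprint, training: Sequence[Fingerprint]) -> float:
    """Dissimilarity to the nearest training molecule (assumed definition)."""
    return 1.0 - max(tanimoto(fp, t) for t in training)


def mean_novelty(generated: Sequence[Fingerprint],
                 training: Sequence[Fingerprint]) -> float:
    """Average nearest-neighbor dissimilarity over a set of generated molecules."""
    return sum(novelty(fp, training) for fp in generated) / len(generated)


def scaffolds_per_100(scaffolds: Sequence[str]) -> float:
    """Unique scaffold count normalized to 100 generated molecules."""
    return 100.0 * len(set(scaffolds)) / len(scaffolds)


# Toy data: bit sets and scaffold strings are illustrative only.
train = [frozenset({2, 3, 4})]
gen = [frozenset({1, 2, 3}), frozenset({9})]
print(mean_novelty(gen, train))
print(scaffolds_per_100(["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1"]))
```

Normalizing the scaffold count per 100 molecules keeps the two study conditions comparable even if they end up producing different numbers of molecules.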
Once the controlled study establishes baseline novelty metrics, we will open a curated molecule feed to pharmaceutical chemists for qualitative expert review. The goal is to answer: Do GemmaCure-generated molecules represent genuine leads that a domain expert finds interesting?
A publishable next version should add: a frozen benchmark dataset with SHA256 checksum and held-out set definition; a released model artifact with evaluation scripts; a pre-registered player-versus-automation study; and MAMMAL pKd confidence intervals on held-out targets with known-drug positive controls.
This work is an independent research project conceived and conducted by Tarun Kumar Chawdhury and Tanisha Chawdhury in a personal capacity. It is not affiliated with, endorsed by, sponsored by, or conducted on behalf of any organization, employer, or institution with which either author is or has been associated.
Tarun Kumar Chawdhury holds a part-time instructional appointment at a university, is employed full-time in the health insurance sector, and is co-founder of DLYog Lab, an independent AI research startup. Tanisha Chawdhury is a high school student. This research was undertaken entirely outside of and independent from any employment or institutional obligations.
The authors declare no competing financial interests. No external funding, institutional resources, or grant support was received. All compute costs were borne personally by the authors. The fine-tuned model (dlyog/gemma-cure) is released under CC-BY 4.0.
| Contribution | Tarun K. Chawdhury | Tanisha Chawdhury |
|---|---|---|
| Model fine-tuning (Gemma 4, RS-LoRA, EvalAndStopCallback) | ✓ | |
| Dataset curation (225,060 drug-target pairs) | ✓ | |
| Flask backend, API architecture, MAMMAL integration | ✓ | |
| RDKit scoring pipeline + MAMMAL pKd blend | ✓ | |
| Game concept: pathogen-as-enemy metaphor | | ✓ |
| Game design, enemy mechanics, difficulty tiers | ✓ | |
| Patient backstories for disease enemies | | ✓ |
| XP system, badge system, UI/UX direction | ✓ | |
| Gamification Diversity Hypothesis formulation | ✓ | ✓ |
| Qualitative pilot study design | ✓ | ✓ |
| Electron desktop app (macOS .dmg) | ✓ | |
| Deployment scripts & CI/CD | ✓ | ✓ |
GemmaCure is best positioned as a gamified molecular-design education system and a testbed for studying whether human-guided gameplay changes chemical-space exploration. GemmaCure combines a fine-tuned Gemma 4 molecular assistant, live MAMMAL pKd scoring with graceful RDKit fallback, protein-structure visualization, ChEMBL similarity checks, and a 3D battle game into a falsifiable research protocol.
The central thesis is the Gamification Diversity Hypothesis: recruiting diverse human players to explore chemical space through gameplay may produce a set of novel molecules that neither automated search nor single-expert design would generate alone. This hypothesis is testable, and we are currently testing it.
GemmaCure has not yet demonstrated new therapeutics or experimentally active leads. A publishable next version should add a frozen benchmark dataset, a reproducible evaluation suite, and a pre-registered player-versus-automation study. Tanisha Chawdhury's vision — that a drug discovery game could make real science accessible to students like herself — is a falsifiable scientific claim about the value of cognitive diversity in molecular exploration. We believe GemmaCure is the right platform on which to answer it.