# Understanding Our Project & The Agentic Workflow

This document explains what we are building in this workshop and *how* we are building it together using an AI agent. 

---

## Part 1: The Project (What are we building?)

We are recreating **Figure 3** from the recent computational biology paper that introduces **SCimilarity**.

### The Biology
*   **Single-Cell Data:** Scientists can sequence the RNA of individual cells. Think of this as taking a "fingerprint" of what a cell is doing at a specific moment.
*   **The Problem:** Labeling these cells (e.g., "This is a Kidney Podocyte") is usually done manually by human experts. It takes a long time and different experts use different names.
*   **The SCimilarity Model:** An AI foundation model trained on millions of cells. You give it a cell's RNA fingerprint, and it automatically predicts what type of cell it is.
*   **Figure 3 (Concordance Heatmap):** We want to check how well the model works. We take a dataset of kidney cells that human experts have already labeled, hide those labels, ask SCimilarity to predict them, and then compare the two sets of labels. Each square of the heatmap shows how often a given expert label was paired with a given predicted label, so a strong diagonal line means the AI and the human experts agree!
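
The comparison behind the heatmap can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the workshop script: the function name `concordance_matrix` and the toy labels are invented here, and the rows are normalized so each one sums to 1 (one common choice, not necessarily the one used in the paper).

```python
from collections import Counter


def concordance_matrix(expert_labels, predicted_labels):
    """Row-normalized agreement between expert and predicted labels."""
    if len(expert_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # One shared, sorted category list for rows and columns, so the
    # matrix is square and the diagonal means "same label".
    categories = sorted(set(expert_labels) | set(predicted_labels))
    counts = Counter(zip(expert_labels, predicted_labels))
    matrix = []
    for expert in categories:
        row = [counts[(expert, predicted)] for predicted in categories]
        total = sum(row)
        # Fraction of cells with this expert label that got each prediction.
        matrix.append([c / total if total else 0.0 for c in row])
    return categories, matrix


# Toy labels, invented for illustration only.
expert = ["podocyte", "podocyte", "T cell", "T cell", "B cell"]
predicted = ["podocyte", "podocyte", "T cell", "B cell", "B cell"]

categories, matrix = concordance_matrix(expert, predicted)
for name, row in zip(categories, matrix):
    print(name, row)
```

In this toy example one of the two expert-labeled T cells is predicted as a B cell, so the "T cell" row splits its weight between the diagonal and an off-diagonal square.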

### The Tech Stack
1.  **Python Script:** A program that loads the raw data (`.h5ad` files), runs the SCimilarity AI, harmonizes the human and AI labels, and saves the results as simple `.json` files.
2.  **Web Application:** A simple HTML/JavaScript page that reads those `.json` files and draws interactive, colorful charts (UMAPs and Heatmaps) so humans can explore the data visually.
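
The hand-off between the two pieces is just a file. As a rough sketch of what that might look like, the snippet below writes a small heatmap to JSON and reads it back. The file name `heatmap.json`, the keys `labels` and `matrix`, and the helper `export_heatmap_json` are assumptions made for this example; the real layout is whatever the SPEC defines.

```python
import json
import tempfile
from pathlib import Path


def export_heatmap_json(categories, matrix, path):
    """Write a concordance matrix to a JSON file the web page could load."""
    payload = {
        "labels": list(categories),               # shared row/column order
        "matrix": [list(row) for row in matrix],  # one list per expert label
    }
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload


# Round-trip check in a temporary folder (hypothetical file name).
out_file = Path(tempfile.mkdtemp()) / "heatmap.json"
written = export_heatmap_json(
    ["B cell", "T cell"],
    [[1.0, 0.0], [0.5, 0.5]],
    out_file,
)
loaded = json.loads(out_file.read_text(encoding="utf-8"))
print(loaded["labels"])
```

Keeping the output as plain JSON means the web page needs no Python at all: it only has to fetch the file and draw it.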

---

## Part 2: The Agentic Workflow (How are we working?)

You are working with me, an **AI Agent**. Unlike a normal chatbot where you just ask questions and copy-paste code, I have "agency." I can read your files, write code, run terminal commands, and test my own work. 

Our collaboration uses a process called **Spec-Driven Development**:

### 1. The SPEC (Specification) is the Boss
We never start coding blindly. First, we fill out a Markdown document (like `02_QUEST_FIG3.md`). This document is our "contract." It defines:
*   What is the biology goal?
*   What should the output look like?
*   How will we test it to know it is correct?

*Whenever you give me feedback (like "the heatmap must be square"), I update the SPEC first. The SPEC is the ultimate source of truth.*

### 2. My Loop: Plan -> Act -> Validate
Once the SPEC is written, I take over the heavy lifting using this loop:
*   **Plan:** I figure out the steps (e.g., "I need to download the data, write a Python script, and run it").
*   **Act:** I write the code and execute it in the terminal.
*   **Validate:** I run tests. If my code crashes (like when we had an issue with `NaN` labels earlier), I read the error message, figure out what went wrong, fix the code, and run it again. I repeat this loop automatically until the tests pass.
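
To make the Validate step concrete, here is a deliberately small sketch of a "check, fix, check again" loop for missing labels. It is an assumption-laden toy, not a record of what actually happened with the `NaN` issue: the helper names, the `"unknown"` placeholder, and the attempt limit are all invented for illustration, and a real agent loop edits code based on error messages rather than applying one fixed repair.

```python
import math


def is_missing(label):
    """True for None, float NaN, or blank strings."""
    if label is None:
        return True
    if isinstance(label, float) and math.isnan(label):
        return True
    return isinstance(label, str) and label.strip() == ""


def validate_labels(labels):
    """Return a list of problems; an empty list means the check passed."""
    missing = [i for i, label in enumerate(labels) if is_missing(label)]
    if missing:
        return [f"{len(missing)} missing label(s) at positions {missing}"]
    return []


def fill_missing(labels, placeholder="unknown"):
    """One possible fix: replace missing labels with a placeholder."""
    return [placeholder if is_missing(label) else label for label in labels]


def act_and_validate(labels, max_attempts=3):
    """Validate, apply a fix on failure, and re-validate, up to a limit."""
    for attempt in range(1, max_attempts + 1):
        problems = validate_labels(labels)
        if not problems:
            return labels, attempt
        print(f"attempt {attempt} failed: {problems[0]}")
        labels = fill_missing(labels)
    raise RuntimeError(f"still failing after {max_attempts} attempts")


raw = ["podocyte", float("nan"), None, "T cell"]
cleaned, attempts = act_and_validate(raw)
print(cleaned, "passed on attempt", attempts)
```

The attempt limit matters: a loop that retries forever hides problems, while a bounded loop stops and asks for help.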

### 3. Your Role: The Scientific Director
As the human, your job is to steer the ship:
*   **Domain Expertise:** You make the scientific decisions. (e.g., "We should use a tiered strategy for harmonizing cell names" or "We must link to the Cell Ontology (CL) to make our data standard").
*   **Review:** You look at the final outputs (the web page, the tests) and tell me whether they meet your standards.
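
As an example of how a scientific decision turns into code, a "tiered strategy for harmonizing cell names" might look roughly like the sketch below. The tiers (exact match, then case and whitespace cleanup, then a synonym table), the function name `harmonize`, and the tiny lookup tables are assumptions for illustration only; the actual tiers are yours to decide. The two Cell Ontology IDs shown should be double-checked against the ontology before being relied on.

```python
# Tiny illustrative lookup tables; a real pipeline would load the full
# Cell Ontology rather than hard-code entries.
CANONICAL = {"T cell": "CL:0000084", "B cell": "CL:0000236"}
SYNONYMS = {"t lymphocyte": "T cell", "b lymphocyte": "B cell"}


def harmonize(label):
    """Return (standard_name, cl_id, tier) for a free-text cell label."""
    # Tier 1: the label is already a standard name.
    if label in CANONICAL:
        return label, CANONICAL[label], "exact"

    # Tier 2: ignore case and extra whitespace.
    key = " ".join(label.lower().split())
    for name, cl_id in CANONICAL.items():
        if name.lower() == key:
            return name, cl_id, "normalized"

    # Tier 3: look the cleaned label up in a synonym table.
    if key in SYNONYMS:
        name = SYNONYMS[key]
        return name, CANONICAL[name], "synonym"

    # Nothing matched: keep the label and flag it for human review.
    return label, None, "unmapped"


for raw in ["T cell", "  t  CELL ", "B lymphocyte", "mystery cell"]:
    print(repr(raw), "->", harmonize(raw))
```

Recording which tier produced each match makes review easier: you can skim the "synonym" and "unmapped" results instead of checking every cell.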

By combining your scientific direction with my ability to rapidly write and test code, we can build complex, standardized bioinformatics pipelines very quickly!