# 03_QUEST_NEW_DATA

Great, now we have a working recreation of figure3 using real data. Next, let's
find some new datasets to display.

## 1. Objective

The goal of this quest is to expand our visualization platform to support multiple datasets, starting with a human small intestine dataset (**GSE282570**) published in late 2024. This dataset profiles the cellular heterogeneity of small intestine tissue in celiac disease and contrasts well with our kidney data.

### New Dataset: GSE282570
- **Context:** Human duodenal biopsies from celiac disease patients and healthy controls.
- **Accession:** GSE282570
- **Rationale:** Recently published; contains 10 major cell subpopulations with well-defined marker genes.

## 2. Implementation Plan

### A. Web Application Enhancement
- **Multi-Dataset Support:** Modify `web/index.html` and `web/visualization.js` to handle different data subdirectories.
- **UI Element:** Add a "Select Dataset" dropdown to the controls bar.
- **Data Organization:** Move the current Kretzler data to `web/data/kretzler/` and the new data to `web/data/gse282570/`.
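
One way to drive the dropdown is a small manifest listing each dataset and its data subdirectory. The following is a minimal sketch, not part of the existing app: the `datasets.json` filename, the manifest fields, and the `build_manifest` / `write_manifest` helpers are all hypothetical, and it assumes the `web/data/<dataset>/` layout described above.

```python
import json
from pathlib import Path

# Hypothetical registry: ids match the subdirectory names proposed in the plan;
# the labels are illustrative only.
DATASETS = {
    "kretzler": "Kretzler (kidney)",
    "gse282570": "GSE282570 (small intestine)",
}


def build_manifest(web_root: Path) -> list[dict]:
    """Return one entry per dataset with the path the web app would fetch from."""
    manifest = []
    for dataset_id, label in DATASETS.items():
        subdir = web_root / "data" / dataset_id
        manifest.append({
            "id": dataset_id,
            "label": label,
            # Path relative to the web root, so it can be used directly in a fetch
            "path": f"data/{dataset_id}/",
            # Lets the UI disable datasets that have not been processed yet
            "available": subdir.is_dir(),
        })
    return manifest


def write_manifest(web_root: Path, filename: str = "datasets.json") -> Path:
    """Write the manifest under <web_root>/data/ and return its path."""
    data_dir = web_root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    out = data_dir / filename
    out.write_text(json.dumps(build_manifest(web_root), indent=2))
    return out
```

Calling `write_manifest(Path("web"))` would produce `web/data/datasets.json`. Keeping the list in one file means adding a third dataset later would not require touching the dropdown code.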

### B. Data Acquisition & Processing
- **Download Script:** Create `scripts/download_geo_gse282570.sh` to fetch the processed matrix from GEO FTP.
- **Processing Script:** Create `src/process_gse282570.py` to:
    - Load the 10x MTX files.
    - Perform basic QC and normalization.
    - **Annotation:** Use the canonical markers from the paper (e.g., *CD3D* for T cells, *MS4A1* for B cells) to assign "Author Annotations".
    - Align genes to the SCimilarity model.
    - Generate embeddings and predictions.
    - Export results to `web/data/gse282570/`.
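
The marker-based annotation step can be sketched as a simple scoring rule: each cell gets the label whose marker genes have the highest mean expression. This is only an illustration under stated assumptions. The `annotate_cells` helper, the "Unassigned" fallback, and the `min_score` threshold are hypothetical, and the panel contains only the two markers named above; the full panel would come from the paper. A real script would likely work on a sparse matrix rather than plain lists.

```python
# Marker panel: only CD3D and MS4A1 are taken from the plan above.
# Extend this with the remaining markers from the paper.
MARKERS = {
    "T cell": ["CD3D"],
    "B cell": ["MS4A1"],
}


def annotate_cells(expr_rows, gene_names, markers=MARKERS, min_score=0.0):
    """Label each cell by the marker set with the highest mean expression.

    expr_rows:  one list of normalized expression values per cell,
                ordered like gene_names.
    min_score:  a cell stays "Unassigned" unless its best score exceeds this.
    """
    gene_index = {gene: i for i, gene in enumerate(gene_names)}

    # Resolve marker genes to column indices, skipping genes absent from the matrix
    panel = {}
    for label, genes in markers.items():
        cols = [gene_index[g] for g in genes if g in gene_index]
        if cols:
            panel[label] = cols

    labels = []
    for row in expr_rows:
        best_label, best_score = "Unassigned", min_score
        for label, cols in panel.items():
            score = sum(row[c] for c in cols) / len(cols)
            if score > best_score:
                best_label, best_score = label, score
        labels.append(best_label)
    return labels
```

For example, with `gene_names = ["CD3D", "MS4A1", "ACTB"]`, a cell expressing only *CD3D* is labelled "T cell" and a cell expressing neither marker stays "Unassigned".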

### C. Skill Creation
- **Skill Name:** `geo-scrna-download`
- **Definition:** Formalize the steps for finding, evaluating, and downloading a GEO dataset into a reusable skill definition.
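
As a starting point for the download step of the skill, the supplementary-file directory for a series can be derived from its accession. GEO groups series into folders where the last three digits are replaced by `nnn`. The sketch below only builds the directory URL; the `geo_series_suppl_url` helper name is hypothetical, and the file names inside the directory are not assumed here and would need to be checked per dataset.

```python
import re

GEO_FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def geo_series_suppl_url(accession: str) -> str:
    """Return the supplementary-files directory URL for a GEO series accession."""
    match = re.fullmatch(r"GSE(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"Not a GEO series accession: {accession!r}")
    digits = match.group(1)
    # Folder stub: drop the last three digits and append "nnn"
    # (accessions with three or fewer digits fall under "GSEnnn")
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_FTP_ROOT}/{stub}/GSE{digits}/suppl/"


if __name__ == "__main__":
    print(geo_series_suppl_url("GSE282570"))
```

The skill definition could record this URL alongside the evaluation notes (tissue, cell count, available file formats) so that the download script does not hard-code it.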


## 3. Web Application

- Update the web application to dynamically fetch data based on the selected dataset from the dropdown. 
- Ensure the color mapping and cell type highlight logic remains consistent across different datasets.
- Add a "Dataset Statistics" panel that displays metadata specific to the chosen study (e.g., accession number, publication date, tissue).
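
One possible way to keep colors consistent across datasets is to derive each color from the cell type name itself, so the same label always maps to the same color regardless of which dataset is loaded. The sketch below is an assumption about how this could be done, not the app's current logic: `color_for_cell_type` is a hypothetical helper, and the lightness and saturation values are arbitrary choices.

```python
import colorsys
import hashlib


def color_for_cell_type(name: str) -> str:
    """Map a cell type name to a stable hex color (same name, same color)."""
    # Normalize so "T cell" and " t cell" share a color
    key = name.strip().lower().encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first two bytes of the hash as a hue in [0, 1)
    hue = int.from_bytes(digest[:2], "big") / 65536.0
    # Fixed lightness and saturation keep the palette visually uniform
    r, g, b = colorsys.hls_to_rgb(hue, 0.5, 0.65)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


def build_palette(cell_types) -> dict:
    """Return a {cell type: hex color} mapping for export next to the data."""
    return {name: color_for_cell_type(name) for name in sorted(set(cell_types))}
```

The processing script could export `build_palette(...)` as JSON into each dataset directory, which avoids reimplementing the hash in the browser. Two unrelated labels can still land on similar hues, so a hand-curated palette for the most common cell types may be a better fit if that becomes a problem.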

copyright: © 2026 Sonia Timberlake & Ryan Bellmore
license: Proprietary - Authorized Workshop Participants Only
distribution_allowed: false
