# 04_QUEST_OPEN_TARGETS

Congratulations on reaching the final, open-ended quest! In the previous steps, we reproduced the core analysis from the SCimilarity paper and extended it to new datasets. Now, we're transitioning from pure data analysis to **Competitive Intelligence and Translational Strategy**.

In this quest, you will investigate the clinical viability of the biological markers identified in your earlier analysis. You will use LLM tooling such as Model Context Protocol (MCP) servers, or external APIs such as Open Targets or ClinicalTrials.gov, to generate a comprehensive clinical landscape report.

In this document, replace anything in `{{}}` with your own content so that Gemini can work with you. Because this is the most open-ended quest, the LLM will rely heavily on the specifics of your SPEC to assist you.

---

## 1. Background and Goals

**Objective:** Extract actionable competitive intelligence for key targets identified via SCimilarity (e.g., Fibrosis-Associated Macrophage markers) by querying external databases and structuring the results into a visual report.

**Example Scenarios:**
- **Target Intel:** Identify competitive crowding and current clinical phases for `SPP1`, `MARCO`, and `CD163`.
- **Diligence/Whitespace:** Assess if the identified novel targets are already under development, and if trials are actually stratifying patients using these biomarkers.

**Deliverables:**
1. A Python script (`src/query_opentargets.py`) that uses the Open Targets GraphQL API to fetch known drugs, clinical phases, and associated indications for the chosen targets.
2. A Markdown-based **Target Landscape Report** summarized by Gemini, incorporated as a new section in our project documentation.

## 2. Tech Stack and Project Structure

We will use **direct API querying**: Python's `requests` library will send queries to the Open Targets GraphQL API. This is the most flexible approach for extracting detailed target-disease-drug associations.

## 3. Inputs & Target Selection

**Targets Chosen:**
- **SPP1:** Identified in the SCimilarity paper as a key marker for Fibrosis-Associated Macrophages.
- **MARCO:** Another high-priority Fibrosis-Associated Macrophage marker from the paper's latent space analysis.
- **CD3D:** A canonical T cell marker heavily expressed in our GSE282570 Celiac Disease dataset.
- **MS4A1:** A B cell marker (encoding CD20) identified in our Celiac Disease analysis.

**Research Questions:**
- What is the highest clinical phase reached for drugs targeting each of these genes?
- What are the top 5 indications (diseases) currently being treated or investigated via these targets?
- Is there any "whitespace"? For example, do targets such as SPP1 or MARCO have high biological relevance in kidney fibrosis but fewer clinical trials than MS4A1 (CD20)?
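
The first two research questions reduce to simple aggregations over drug records. The following is a minimal sketch in plain Python, assuming each record is a dict with the hypothetical keys `target`, `disease_name`, and `phase`. These key names are illustrative and are not part of the Open Targets API.

```python
from collections import Counter


def highest_phase(rows):
    """Return {target: highest clinical phase seen}, skipping records with no phase."""
    best = {}
    for row in rows:
        phase = row["phase"]
        if phase is None:
            continue
        target = row["target"]
        best[target] = max(best.get(target, phase), phase)
    return best


def top_indications(rows, target, n=5):
    """Return up to n disease names for one target, ranked by number of records."""
    counts = Counter(
        row["disease_name"]
        for row in rows
        if row["target"] == target and row["disease_name"]
    )
    return [name for name, _ in counts.most_common(n)]
```

A target that is missing from the `highest_phase` result has no phased drug records at all, which is one possible signal for the "whitespace" question. Ranking indications by record count is only one choice; ranking by association score would be an equally reasonable alternative.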

## 4. Outputs

The resulting report will be a structured Markdown file (`CLINICAL_LANDSCAPE.md`) including:
- **Summary Table:** Target, Known Drugs Count, Highest Phase, and Top Indication.
- **Trial Distribution:** A breakdown of indications by phase.
- **Competitive Analysis:** A short AI-generated summary comparing the clinical "crowding" of these targets.
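
As a rough illustration of the Summary Table, the sketch below renders one Markdown row per target. The input keys (`target`, `known_drugs`, `highest_phase`, `top_indication`) are hypothetical names chosen for this example, and missing values are shown as `n/a`.

```python
def render_summary_table(summaries):
    """Render a list of per-target summary dicts as a Markdown table string."""
    header = ["Target", "Known Drugs Count", "Highest Phase", "Top Indication"]
    keys = ["target", "known_drugs", "highest_phase", "top_indication"]

    def cell(value):
        # Show missing values explicitly and escape pipes so cells stay intact.
        text = "n/a" if value is None else str(value)
        return text.replace("|", "\\|")

    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for summary in summaries:
        lines.append("| " + " | ".join(cell(summary.get(k)) for k in keys) + " |")
    return "\n".join(lines)
```

The returned string can be written directly into `CLINICAL_LANDSCAPE.md`, with the Gemini-generated competitive analysis appended below it.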

## 5. Implementation Strategy

### Connecting to APIs / MCPs
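The script will send a GraphQL query to the Open Targets endpoint (`https://api.platform.opentargets.org/api/v4/graphql`). We will query the `target` object for `knownDrugs` and `associatedDiseases`. Note that `target` is looked up by Ensembl gene ID rather than by gene symbol, so each symbol (e.g., `SPP1`) must first be resolved to its Ensembl ID.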
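
A minimal sketch of the request logic might look like the following. The query text reflects the Open Targets schema as we understand it (`search`, `target(ensemblId: ...)`, `knownDrugs`, `associatedDiseases`); treat the field and argument names as assumptions and confirm them in the API's schema browser, since they can change between releases. The helper names (`build_search_payload`, `build_target_payload`, `post_graphql`) are our own.

```python
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Assumed query shape: resolves a gene symbol to candidate target IDs.
SEARCH_QUERY = """
query ResolveSymbol($symbol: String!) {
  search(queryString: $symbol, entityNames: ["target"], page: {index: 0, size: 1}) {
    hits { id name entity }
  }
}
"""

# Assumed query shape: known drugs and top associated diseases for one target.
TARGET_QUERY = """
query TargetLandscape($ensemblId: String!, $drugRows: Int!, $diseaseRows: Int!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
    knownDrugs(size: $drugRows) {
      uniqueDrugs
      rows {
        phase
        status
        drug { id name }
        disease { id name }
      }
    }
    associatedDiseases(page: {index: 0, size: $diseaseRows}) {
      rows {
        score
        disease { id name }
      }
    }
  }
}
"""


def build_search_payload(symbol):
    """JSON body for resolving a gene symbol (e.g. 'SPP1') to an Ensembl ID."""
    return {"query": SEARCH_QUERY, "variables": {"symbol": symbol}}


def build_target_payload(ensembl_id, drug_rows=200, disease_rows=5):
    """JSON body for fetching known drugs and top diseases for one target."""
    return {
        "query": TARGET_QUERY,
        "variables": {
            "ensemblId": ensembl_id,
            "drugRows": drug_rows,
            "diseaseRows": disease_rows,
        },
    }


def post_graphql(payload, timeout=30):
    """POST a payload to Open Targets and return the 'data' part of the response."""
    # Imported here so the payload builders can be used without the dependency.
    import requests

    response = requests.post(OT_URL, json=payload, timeout=timeout)
    response.raise_for_status()
    body = response.json()
    if body.get("errors"):
        raise RuntimeError(f"GraphQL errors: {body['errors']}")
    return body["data"]
```

In use, `post_graphql(build_search_payload("SPP1"))` would return search hits whose `id` can then be passed to `build_target_payload`. Check that the top hit really is the intended gene before trusting it.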

### Data Processing
We will parse the JSON response, flattening the nested drug and disease lists into a pandas DataFrame for aggregation (e.g., counting unique drugs per phase).
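
The flattening step can be sketched as below, assuming the `target` object has the shape `{"approvedSymbol": ..., "knownDrugs": {"rows": [{"phase", "status", "drug": {...}, "disease": {...}}]}}`. That shape is an assumption about the response; adjust the keys to match whatever your query actually returns. The output column names are our own.

```python
from collections import defaultdict


def flatten_known_drugs(target):
    """Turn one nested target object into flat records, one per drug-disease row."""
    records = []
    known = target.get("knownDrugs") or {}
    for row in known.get("rows") or []:
        drug = row.get("drug") or {}
        disease = row.get("disease") or {}
        records.append(
            {
                "target": target.get("approvedSymbol"),
                "drug_id": drug.get("id"),
                "drug_name": drug.get("name"),
                "disease_name": disease.get("name"),
                "phase": row.get("phase"),
                "status": row.get("status"),
            }
        )
    return records


def unique_drugs_per_phase(records):
    """Count distinct drug IDs for each (target, phase) pair."""
    seen = defaultdict(set)
    for record in records:
        seen[(record["target"], record["phase"])].add(record["drug_id"])
    return {key: len(drug_ids) for key, drug_ids in seen.items()}
```

The records are plain dicts, so `pd.DataFrame(records)` yields the DataFrame described above, and `df.groupby(["target", "phase"])["drug_id"].nunique()` should give the same counts as `unique_drugs_per_phase`. The same drug often appears once per indication, which is why the count uses distinct drug IDs rather than row totals.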

### Visualizations & Report Generation
We will use Python to generate a summary table and use Gemini to provide the translational interpretation of the data.

## 6. Open Exploration

Because you are at the end of the workshop, there are few restrictions! If you finish your target analysis early, use this space to outline other advanced AI workflows, custom agents, or biological questions you'd like to explore with Gemini.

{{Your open-ended ideas here}}
copyright: © 2026 Sonia Timberlake & Ryan Bellmore
license: Proprietary - Authorized Workshop Participants Only
distribution_allowed: false
