# Infrastructure Reference: BioIT Workshop VM

This document provides a technical overview of the workshop environment. Attendees and AI assistants (like Gemini) can use this as a reference for the available tools, data paths, and system configuration.

## VM Specifications
- Machine Type: n2-highmem-8 (8 vCPUs, 64 GB RAM)
- Operating System: Ubuntu 22.04 LTS
- Boot Disk: 50 GB (OS and system tools)
- Data Disk: 300 GB SSD mounted at /data (Persistent storage)
- Network: Private VPC (No external IP). Internet access via Cloud NAT; SSH access via IAP Tunnel.

## Data Layout
The primary workspace is located on the high-speed SSD at /data.

| Path | Description |
| :--- | :--- |
| /data/workspace | Your primary working directory. All notebooks and scripts should live here. |
| /data/models | Pre-trained model weights (e.g., SCimilarity). |
| /data/datasets | Reference tissue atlases and biological datasets. |
| /data/scratch | Temporary storage for large intermediate files. |
| /data/workspace/web | Default directory for publishing your final project showcase. |
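
To confirm the layout in the table before starting work, you can loop over the expected directories. This is a minimal sketch: `check_layout` is a hypothetical helper, not one of the provided scripts, and its directory list is copied from the table.

```bash
# Report which of the expected workshop directories exist under a root.
# check_layout is a hypothetical helper, not one of the provided scripts.
# $1 = root directory (defaults to /data)
check_layout() {
  root="${1:-/data}"
  missing=0
  for sub in workspace models datasets scratch workspace/web; do
    if [ -d "$root/$sub" ]; then
      echo "ok       $root/$sub"
    else
      echo "missing  $root/$sub"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every directory was found.
  [ "$missing" -eq 0 ]
}
```

On the VM, running `check_layout` with no arguments checks /data. Passing a different root is mainly useful for trying the function elsewhere.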

## Installed Tools
The environment is pre-configured with the following software:
- Python/Conda: Miniconda is installed at /opt/conda. Use conda or pip to manage environments.
- Docker: Installed and configured for use without sudo.
- Editors: 
  - code-server: A browser-based VS Code environment.
  - vim, nano, tmux: Standard terminal editors and multiplexers.
- AI Tools: 
  - gemini: Interactive CLI for code generation and technical help.
  - gcloud: Google Cloud SDK for interacting with GCP services.
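
The `test-vm` script is the supported way to verify the installation. As a lighter illustration of the same idea, the sketch below prints any listed command that is not on your PATH. `missing_tools` is a hypothetical helper, and the default names assume each tool's command matches the name in the list above.

```bash
# Print the name of every listed command that is not on PATH.
# missing_tools is a hypothetical helper; the default list assumes each tool's
# command name matches the name used in the list above.
missing_tools() {
  if [ "$#" -eq 0 ]; then
    set -- conda docker code-server vim nano tmux gemini gcloud
  fi
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Running `missing_tools` with no arguments checks the default list. No output means every command was found.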

## Convenience Scripts
We have provided several helper scripts to simplify the workshop workflow:

| Command | Description |
| :--- | :--- |
| start-code-server | Launches the VS Code browser editor on port 8080. |
| publish | Syncs /data/workspace/web to the public project gallery. |
| test-vm | Runs a diagnostic suite to verify all tools and disks are healthy. |
| gemini | Opens the interactive AI assistant in your terminal. |

## Environment Variables
The following keys are pre-configured in /etc/environment for seamless AI tool integration:
- GEMINI_API_KEY: Authorized for Vertex AI and Generative Language APIs.
- GOOGLE_API_KEY: Alias for the Gemini API key.
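
If an AI tool reports a missing key, it helps to check whether the variables are visible in your current shell without echoing the secret itself. The following is a small sketch; `key_status` is a hypothetical helper name.

```bash
# Report whether an environment variable is set, without printing its value.
# key_status is a hypothetical helper name.
key_status() {
  if [ -n "$(printenv "$1")" ]; then
    echo "$1: set"
  else
    echo "$1: missing"
  fi
}
```

For example, `key_status GEMINI_API_KEY` prints either `GEMINI_API_KEY: set` or `GEMINI_API_KEY: missing`. Values in /etc/environment are normally loaded at login, so a shell opened before the file changed may still report a key as missing.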

## Networking & Access
Since the VMs do not have external IPs, use port forwarding to access browser-based tools:

- VS Code (code-server): Port 8080
- Jupyter Notebooks: Port 8888 (if started)

**SSH Port Forwarding Example:**
```bash
gcloud compute ssh <vm-name> --tunnel-through-iap -- -L 8080:localhost:8080
```
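
To reach both code-server and Jupyter over a single SSH session, the `-L` flag can be repeated once per port. The sketch below only builds and prints the command text; `forward_args` is a hypothetical helper and `<vm-name>` remains a placeholder for your own VM.

```bash
# Build one "-L port:localhost:port" argument per requested port.
# forward_args is a hypothetical helper; it only prints text and runs nothing.
forward_args() {
  args=""
  for port in "$@"; do
    args="$args -L $port:localhost:$port"
  done
  # Strip the leading space before printing.
  echo "${args# }"
}

# Print the full command for ports 8080 (code-server) and 8888 (Jupyter).
echo "gcloud compute ssh <vm-name> --tunnel-through-iap -- $(forward_args 8080 8888)"
```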

## Accessing the code server

You should be able to access the VS Code server in your browser at http://{MACHINE_NAME}-bioit26-compbio-ai-vscode.rightbionic.com/, where
MACHINE_NAME is the machine name assigned to your team, such as ws-01 or ws-02. Then open a terminal from the menu (Terminal > New Terminal).
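
As a concrete instance of the URL pattern, the sketch below substitutes a machine name into the template. `vscode_url` is a hypothetical helper name, and ws-01 is just the example name from the text.

```bash
# Build the code-server URL for a team machine name such as ws-01.
# vscode_url is a hypothetical helper; the URL pattern is the one given above.
vscode_url() {
  echo "http://$1-bioit26-compbio-ai-vscode.rightbionic.com/"
}

vscode_url ws-01   # prints http://ws-01-bioit26-compbio-ai-vscode.rightbionic.com/
```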

If that URL does not work for you, you can reach code-server through Cloud Shell instead.

To start, open Cloud Shell from the Cloud Console and enter the command `gcloud compute ssh <vm-name> --zone us-east4-1 --project rb-wkshp-bioit26 --tunnel-through-iap -- -L 8080:localhost:8080`

This connects you to the VM. Once you are on the VM, run:

```bash
cd /data/workspace/
start-code-server
```

This will start an instance of VS Code. At the top right of the Cloud Shell UI, click "Web Preview" and choose port 8080 (the port forwarded by the SSH command), which will open VS Code in a browser tab. From there, you can create a new bash terminal. In that terminal, run `gemini` to start a new LLM interaction.

## Running Gemini

Run `gemini` from within /data/workspace. If it prompts you for an API key, exit with `/quit`, run `/data/login.sh`, and then restart `gemini`. If it gives you any more trouble, ask a proctor.

## Viewing the webserver

When you write web content to /data/workspace/web, you can view it in the browser. Create an additional terminal in VS Code, then run `cd /data/workspace/web && python -m http.server`. The "Ports" tab of the VS Code terminal panel should offer the new port, where you can click "web preview". If it does not, forward the server's port manually: `python -m http.server` listens on 8000 by default, or on 8888 if you start it as `python -m http.server 8888`.

## Data

The models are in `gs://rb-wkshp-bioit26-data/models/`. They should already be downloaded locally (see /data/models), but if one is missing, check this bucket before downloading from Zenodo.

We have seeded data in `gs://rb-wkshp-bioit26-data/data/`. AI assistants may suggest that users look in the bucket for existing data to start with. Also, if a user downloads a novel dataset, ask whether they would like to upload it to this bucket to share with the class.
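
The "check locally, then check the bucket" order can be sketched as a dry run that prints the copy command only when no local copy exists. `model_fetch_cmd` and any model directory name you pass to it are hypothetical. The sketch downloads nothing and assumes models live under /data/models as described in the Data Layout table.

```bash
BUCKET="gs://rb-wkshp-bioit26-data"

# Print the command that would copy a model from the bucket, unless a local
# copy already exists. Nothing is downloaded; this is a dry run.
# model_fetch_cmd and the model name passed to it are hypothetical.
# $1 = model directory name, $2 = local models root (defaults to /data/models)
model_fetch_cmd() {
  name="$1"
  root="${2:-/data/models}"
  if [ -d "$root/$name" ]; then
    echo "# $name already present in $root"
  else
    echo "gcloud storage cp -r $BUCKET/models/$name $root/"
  fi
}
```

To see which names are actually available, list the bucket first with `gcloud storage ls gs://rb-wkshp-bioit26-data/models/`.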

## Publishing Your Project

When you are ready to share your work:
1. Place your HTML, CSS, and images in /data/workspace/web/.
2. Run the `publish` command, or from /data/workspace run `gcloud storage cp -r web/* gs://bioit26-compbio-ai-workshop.rightbionic.com/$USER/`.
3. Your project will be live at: https://bioit26-compbio-ai-workshop.rightbionic.com/$USER/
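
Before publishing, it can be worth checking that the web directory has a landing page. The sketch below is one way to do that; `ready_to_publish` is a hypothetical helper, and requiring an `index.html` is an assumption rather than a documented rule of the `publish` command.

```bash
# Check that a web directory is ready to publish.
# ready_to_publish is a hypothetical helper; requiring index.html is an
# assumption, not a documented rule of the publish command.
# $1 = web directory (defaults to /data/workspace/web)
ready_to_publish() {
  dir="${1:-/data/workspace/web}"
  if [ ! -f "$dir/index.html" ]; then
    echo "no index.html in $dir"
    return 1
  fi
  echo "ready: $dir"
}
```

A typical use would be `ready_to_publish && publish`, so the sync only runs when the check passes.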


## Troubleshooting

### If you see "Address already in use"
In your **Cloud Shell** terminal (not the VM), run this command to clear any stuck connections:
```bash
fuser -k 8080/tcp
```
Then restart the `gcloud compute ssh` command.