Using Dirichlet Gaussian Processes to Analyze Gene Expression of Lung Cancer Metastasis Progression

A Lab Rotation Project in R+Python, September - December 2023


Introduction:
Cancer metastasis is responsible for 50-90% of cancer-related deaths, yet current therapies do not specifically target this process, which makes metastatic progression a critical subject of study. The genetic dynamics of metastasis are still poorly understood, with significant gaps in knowledge of how it progresses over time. Previous computational analyses have shown an ordered series of immunological changes corresponding to metastatic progression, emphasizing the importance of examining differential changes at the molecular and genetic levels to find candidates for targeted therapies.

Objectives:
● To implement and update the Dirichlet Process Gaussian Process (DPGP) software for analyzing gene expression trajectories over time.
● To extract and format RNA-seq data for use with the DPGP software.
● To analyze gene expression outputs and model performance across different cell types in the dataset.
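
The data-formatting objective can be illustrated with a minimal sketch. It assumes the RNA-seq data is available as long-format records of (gene, cell type, time point, expression) and that the clustering tool reads a tab-delimited matrix with one row per gene and one column per time point. Both are assumptions for illustration and should be checked against the DPGP documentation. The function names and toy values are hypothetical and are not taken from the project.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, pstdev


def build_time_matrix(records, cell_type):
    """Average expression per (gene, time point) for one cell type.

    `records` is an iterable of (gene, cell_type, time, expression) tuples;
    this long format is an assumption, not the project's actual layout.
    """
    values = defaultdict(list)
    for gene, ctype, time, value in records:
        if ctype == cell_type:
            values[(gene, time)].append(float(value))
    genes = sorted({g for g, _ in values})
    times = sorted({t for _, t in values})
    # Genes missing a time point are dropped so every row is complete.
    matrix = {}
    for g in genes:
        if all((g, t) in values for t in times):
            matrix[g] = [mean(values[(g, t)]) for t in times]
    return times, matrix


def standardize(row):
    """Z-score one gene's trajectory; flat trajectories map to zeros."""
    mu, sd = mean(row), pstdev(row)
    return [0.0 if sd == 0 else (x - mu) / sd for x in row]


def write_tsv(times, matrix, handle):
    """Write a header row of time points, then one tab-delimited row per gene."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + list(times))
    for gene, row in matrix.items():
        writer.writerow([gene] + [f"{x:.4f}" for x in standardize(row)])


# Toy records with made-up values, for illustration only.
records = [
    ("GeneA", "T cell", 0, 1.0), ("GeneA", "T cell", 0, 3.0),
    ("GeneA", "T cell", 1, 4.0), ("GeneA", "T cell", 2, 6.0),
    ("GeneB", "T cell", 0, 5.0), ("GeneB", "T cell", 1, 5.0),
    ("GeneB", "T cell", 2, 5.0), ("GeneA", "B cell", 0, 9.0),
]
times, matrix = build_time_matrix(records, "T cell")
buffer = io.StringIO()
write_tsv(times, matrix, buffer)
print(buffer.getvalue())
```

Writing one file per cell type keeps each clustering run independent, which matches the per-cell-type analysis described in the objectives.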

Outcomes:
● Successfully updated and implemented the DPGP software for analyzing gene expression trajectories.
● Extracted and formatted RNA-seq data, enabling its use with the DPGP software.
● Analyzed gene expression outputs, identifying steady trends of up- and down-regulation of genes per cell type over time.
● Demonstrated the need for improved efficiency of the DPGP model, as it struggled with large datasets, while also showing the model's significance in analyzing gene expression over time.
● Identified correlations between gene expression and the number of cells over time, with the more frequent cell types being those related to immune-response functions.
● Discovered key spikes in gene expression at specific time points, potentially corresponding to specific cancer progression cycles or pathways.
● Found that less frequent cell types showed extreme variability and uncertainty, suggesting the need for optimization and further study of their roles in cancer progression.
● Concluded that the overall optimal number of clusters is fewer than 10, indicating a need for further iterations of the DPGP model to condense clusters for more precise analysis.
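
As background for the cluster-count finding: a Dirichlet process prior does not fix the number of clusters in advance, and its concentration parameter controls how readily new clusters open. The sketch below samples from a Chinese restaurant process, a standard construction of that prior, to show the effect. It is an illustration only and not the DPGP implementation; the function names, the `alpha` values, and the number of items are all hypothetical choices.

```python
import random


def sample_crp(n_items, alpha, rng):
    """Sample cluster sizes from a Chinese restaurant process."""
    sizes = []
    for i in range(n_items):
        # Item i joins cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                break
        else:
            sizes.append(1)
    return sizes


def mean_cluster_count(n_items, alpha, n_draws=100, seed=0):
    """Average number of clusters over repeated prior draws."""
    rng = random.Random(seed)
    total = sum(len(sample_crp(n_items, alpha, rng)) for _ in range(n_draws))
    return total / n_draws


# Larger alpha values favor more clusters for the same number of genes.
for alpha in (0.1, 1.0, 10.0):
    print(alpha, mean_cluster_count(500, alpha))
```

Under these assumptions, a smaller concentration parameter is one lever for encouraging fewer, larger clusters, although the fitted number of clusters also depends on the data and the Gaussian process likelihood.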

Snippet of Presentation File.

