One of the projects for my master's degree was an analysis of data from the TCGA project. My teammates and I decided to use the thyroid data, aiming to find out why women suffer from thyroid carcinoma more often than men.
I don't want to repeat everything in the report we wrote; you can find all of our work here.
Distribution of the thyroid samples in the TCGA dataset, separated by sex.
The summary of the study is:
Papillary Thyroid Carcinoma (PTC) is the most common type of thyroid cancer (Agrawal et al. 2014). It is more prevalent in women than in men and is most commonly diagnosed between 25 and 65 years of age (The Cancer Genome Atlas Accessed: 2016-06-1). The aim of our project is to study differentially expressed genes between tumor and normal samples, taking into account whether there could be a gender effect on the tumorigenesis of PTC. Using data from The Cancer Genome Atlas (Accessed: 2016-06-1) we have performed a differential expression analysis. Comparisons were made according to the patients' gender and the disease state of the sample (tumor or normal) and revealed that Female-Normal samples have more up-regulated (over-expressed) genes than Female-Tumor samples, and also more than Male-Tumor and Male-Normal samples. Functional enrichment performed with GO annotations suggested that the samples might be contaminated with some parathyroid cells, and pointed to DE genes belonging to lipid and cholesterol-associated pathways. To our knowledge, a form of thyroid cancer that leads to hypothyroidism can become more aggressive because of cholesterol and lipid accumulation in the cells (Beloribi-Djefaflia et al. 2016; Healthline Accessed: 2016-06-19). Therefore, the fact that the difference in under-expressed genes is greater in the Tumor-Female vs. Normal-Female comparison than in the Tumor-Male vs. Normal-Male comparison might explain why thyroid cancer is more prevalent in women than in men. Nevertheless, further investigation is needed, so our results should be treated with caution.
In case you want to do something similar or check the study, the report includes all the code we used; you will only need the dataset, which you can download from the website. We started from preprocessed FPKM data in the form of an ExpressionSet provided by our professor, but you should be able to create it yourself.
Here I just want to explain how we used the sva package to find the surrogate variables from our two models:
Models used in the analysis of the data. F = Female, M = Male, T = Tumor, N = Normal, I = Intercept, G = Gender, T = Type; Barcode identifies the patients as paired.
In the alternative model we included the interaction between sex and sample status, while in the null model we did not: it contains only the sex and the type of sample. These models were chosen to bring out the interaction between sex and tumor status, in order to see how the tumor affects women differently from men.
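The two models could be set up roughly as follows. This is only a minimal sketch, not the exact code from the report: the column names `gender` and `type` are hypothetical, the phenotype table and expression matrix are simulated stand-ins for the real ExpressionSet, and the pairing of patients by barcode is left out. The call to `sva()` only runs if the sva package is installed.

```r
# Toy phenotype table standing in for pData(eset); column names are assumptions.
set.seed(1)
pheno <- data.frame(
  gender = factor(rep(c("F", "M"), each = 10)),
  type   = factor(rep(c("N", "T"), times = 10), levels = c("N", "T"))
)

# Alternative (full) model: main effects plus the gender:type interaction.
mod <- model.matrix(~ gender * type, data = pheno)

# Null model: the same main effects, without the interaction.
mod0 <- model.matrix(~ gender + type, data = pheno)

# Simulated expression matrix (genes x samples) standing in for exprs(eset).
expr <- matrix(rnorm(200 * nrow(pheno)), nrow = 200)

# Estimate surrogate variables only when sva is available.
if (requireNamespace("sva", quietly = TRUE)) {
  sv <- sva::sva(expr, mod, mod0)
  # If any surrogate variables were found, append them to the full design
  # so they can be adjusted for in the differential expression step.
  if (sv$n.sv > 0) {
    mod_sv <- cbind(mod, sv$sv)
  }
}
```

Because the null model is nested in the alternative one, the only term that differs between them is the interaction, which is the effect we wanted sva to protect while it estimates the surrogate variables.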
You can find more in the complete article/report here.