RISK cohort

For some time I have been working on Crohn's disease. One of the problems with the disease is that its underlying mechanism is not known. People have found associations with microorganisms, but the relationship between those microorganisms and the patient is still unknown. The risk factors for complications are also largely unknown.

This post follows the use of data from a patient cohort enrolled to identify the risk factors of complications and health-care costs in pediatric and adult-onset Crohn's disease. It shows how the data have been used and the problems that unclear descriptions cause when several articles use the same data.

Articles describing the RISK cohort

The first mention of the RISK cohort I could find is in this article [1], which describes the cohort as:

an observational research program that enrolled patients younger than age 17 diagnosed with inflammatory (nonpenetrating, nonstricturing) CD from 2008 through 2012 at 28 pediatric gastroenterology centers in North America.

In that article, published in 2014, the cohort had 552 patients. It doesn't say where to find the data. A later study from 2017 refers to it as a previous report [2]. It states that 1813 patients were enrolled, of whom 913 had been diagnosed with Crohn's disease, had complete information on disease location, had no complications at diagnosis, and attended the follow-up visits. (See this comment for some criticism of the mentioned article, and this other comment about the composition of the cohort [3,4].)

One article that uses these data describes the cohort in its first figure [5]. According to that figure, the RISK cohort has 322 samples. In the body of the article, the origin of the data is given as the following article [6], where the cohort is described as:

The RISK cohort. Ileal biopsy samples and associated clinical information were obtained from the RISK study, an ongoing, prospective observational IBD inception cohort sponsored by the Crohn’s and Colitis Foundation of America. 1,656 children and adolescents younger than 17 years, newly diagnosed with IBD and non-IBD Ctls, were enrolled at 28 North American pediatric gastroenterology centers between 2008 and 2012. All patients were required to undergo baseline colonoscopy and confirmation of characteristic chronic active colitis/ileitis by histology prior to diagnosis and treatment, with the recording of findings in standardized fashion. Only subjects with a confirmed persisting diagnosis of CD, UC, or Ctl during an average of 22 months follow-up to date were included in this analysis, which included a representative subgroup of age-matched CD (n = 243), Ctl (n = 43), and disease Ctl UC (n = 73) patients.

Note that [2] and [6] describe the data somewhat differently. It might seem that 1656 of those 1813 patients were younger than 17 years. But few seem to have had a persisting CD diagnosis for 2 years, because the CD patients are reduced to 243.

In another article [7], referenced in [6], we can read about the RISK cohort that:

A total of 447 children and adolescents (< 17 years) with newly diagnosed CD and a control population composed of 221 subjects with non-inflammatory conditions of the gastrointestinal tract were enrolled to the RISK study in 28 participating pediatric gastroenterology centers in North America between November 2008 and January 2012

This disagrees with the previous article about the total number of patients younger than 17 years with CD (226 instead of 243), maybe because they used a more restrictive subset of the cohort with respect to the average follow-up time.

A different study [8] references [2] and [6, 7] and describes the cohort as:

The RISK study is an observational prospective cohort study that aims to develop risk models for predicting complicated course in children with Crohn's disease. From 2008 to 2012, the RISK study recruited more than 1,800 treatment-naive patients with a suspected diagnosis of Crohn's disease at 28 pediatric gastroenterology centers in North America.

However, they use 245 samples with ileal CD, of which 35 lacked gut inflammation and were classified as non-IBD controls. The remaining 210 selected individuals showed persisting Crohn's disease and remained in complication-free B1 status for at least 90 days from the time of initial diagnosis. After 3 years of follow-up, 27 had a complicated disease course with progression to states B2 or B3.

I can't understand how the 1656 patients described in [6] end up as 322 samples in [5] instead of the 243 patients with Crohn's disease reported in [6]. Also, it is not clear how [6] has 243 patients when the origin of the data seems to be [7], where only 226 are described. From [8] we learn that 245 had ileal CD, which could mean that all the patients described in [6] and [7] are those with ileal samples. Furthermore, there is no cross-reference between [6, 7] and [2], which could mean that they are different cohorts of patients despite coming from the same (?) 28 centers in North America and being enrolled in the same period (November 2008 to January 2012). As this is unlikely, what is missing is a description of how each article processed the same cohort of patients.

These doubts led me to look for the actual data on which all these articles are based, totally or partially.

Availability of the RISK cohort

The first article describing the RISK study [1] doesn't say where to find the data, and neither does the more recent article [2].

Interestingly, Peters et al. [5], despite providing links to the other datasets they used, don't provide a link or a reference for where to find the RISK cohort. This perhaps indicates that it is not freely available or that there are other problems with providing the data to the scientific community.

Haberman et al. [6] link to a repository in the Gene Expression Omnibus, GSE57945. However, instead of the total of 359 samples selected (243 CD, 43 Ctl, and 73 UC), that dataset lists 322 samples. This matches the total number of samples described in [5], but it matches neither the number of samples selected in the original study of the cohort [2] nor their own total.

Gevers et al. [7] only provide references to the 16S projects, not to the RISK RNA-seq expression data used.

Marigorta et al. [8] provide a link to another dataset in the Gene Expression Omnibus, GSE93624, which has "210 treatment-naïve patients of pediatric Crohn's disease and 35 non-IBD controls from the RISK study."
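The gap between what [6] selected and what GSE57945 lists can be checked with a few lines of Python. This is only a minimal sketch: the subgroup sizes are copied from the quote of [6] above and the GEO total is the one reported in this post; the variable names are my own.

```python
# Subgroup sizes quoted from Haberman et al. [6].
selected = {"CD": 243, "Ctl": 43, "UC": 73}

# Number of samples listed in GSE57945, as reported in this post.
listed_in_geo = 322

# Total selected in [6] and how many of them are not in the GEO listing.
total_selected = sum(selected.values())
missing = total_selected - listed_in_geo

print(f"Selected in [6]: {total_selected}")
print(f"Listed in GSE57945: {listed_in_geo}")
print(f"Not accounted for: {missing}")
```

The sketch only quantifies the difference; it cannot tell which of the CD, Ctl, or UC samples were left out of the public dataset.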

From the original 913 patients with Crohn's disease, at most 532 samples (322 + 210) have been made public, and only if the patient IDs in those two datasets do not overlap. GSE57945 was uploaded in 2014 and last updated in 2017, and it provides more information than GSE93624. I couldn't find a way to check whether the same patient has samples in both datasets.
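If both datasets exposed a shared patient identifier, the overlap could be checked with a simple set comparison. The sketch below is hypothetical: the function name and the patient IDs are invented for illustration, and I have not found such an identifier in the GEO metadata.

```python
def compare_patient_ids(ids_a, ids_b):
    """Return the shared IDs and the number of unique IDs across both datasets."""
    a, b = set(ids_a), set(ids_b)
    return a & b, len(a | b)

# Hypothetical patient IDs, for illustration only (not real RISK identifiers).
gse57945_ids = ["P001", "P002", "P003", "P004"]
gse93624_ids = ["P003", "P004", "P005"]

shared, total_unique = compare_patient_ids(gse57945_ids, gse93624_ids)
print(f"Shared patients: {sorted(shared)}")
print(f"Unique patients: {total_unique}")

# Upper bound used in this post: no patient appears in both datasets.
upper_bound = 322 + 210
print(f"At most {upper_bound} public samples")
```

With real data, any shared ID would lower the number of distinct public patients below the upper bound.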

Conclusion

I don't know where I can find the whole RISK cohort. A fuller description of how each article processed the data would help to clarify what the RISK cohort is. It is clear that the GSE93624 and GSE57945 datasets both come from that cohort, but the lack of the whole dataset hinders replicability.

Update 24/01/2018:
Found a web page of the RISK cohort here, describing the state of the project up to 2013.

Update 30/05/2018:
Found another article [9] referring to the RISK cohort, where 254 samples are described as having RNA-seq data and only 158 of those have matching 16S data. By the way, this study links to where to find the 16S data in Qiita, study 1939.

Update 13/06/2019:
Found another article [10] referring to the RISK cohort, which links to a GSE (GSE117993) with RNA-seq data from 190 samples (55 non-IBD controls, 43 UC patients, and 92 CD patients with rectal inflammation).

Update 17/11/2021:

Found another article [11] describing it as having 441 ileal samples (184 control and 245 CD) and 218 stool samples. However, in the text they mention that RISK has over 700 patients and a mean of ∼30,000 reads per sample. It doesn't link to any GSE project or data repository.

Update 23/03/2022:

New article using the RISK cohort [12], where they "reduced this data to 773 biopsy samples that were either controls or CD patients and ≤ 18 years old".

It links to PRJEB13679, which has Gevers et al. [7] as its linked publication. According to SRA, the data were uploaded on 2019-09-05 (5 years after the first publication and 1 year after the publication where it is referenced). The project was registered on 26-Apr-2016, two years after the initial publications.

Update 06/07/2022:

In a new article [13], the RISK cohort is attributed to the project PRJNA237362, which dates from 2014-02-05. It only has microbiome data (the article says it is from stools), which were uploaded on 2015-11-04. I couldn't find information about the phenotype of the samples. The article mentions that it has 23 CD and 5 control samples and refers back to [7].

References

[1] Walters, Thomas D., et al. "Increased effectiveness of early therapy with anti-tumor necrosis factor-α vs an immunomodulator in children with Crohn's disease." Gastroenterology 146.2 (2014): 383-391.
[2] Kugathasan, Subra, et al. "Prediction of complicated disease course for children newly diagnosed with Crohn's disease: a multicentre inception cohort study." The Lancet 389.10080 (2017): 1710-1718.
[3] Arijs, Ingrid, and Isabelle Cleynen. "RISK stratification in paediatric Crohn's disease." The Lancet 389.10080 (2017): 1672-1674.
[4] Kugathasan, Subra, Lee A. Denson, and Jeffrey S. Hyams. "Exclusive and partial enteral nutrition for Crohn's disease–Authors' reply." The Lancet 390.10101 (2017): 1486-1487.
[5] Peters, Lauren A., et al. "A functional genomics predictive network model identifies regulators of inflammatory bowel disease." Nature genetics 49.10 (2017): 1437.
[6] Haberman, Yael, et al. "Pediatric Crohn disease patients exhibit specific ileal transcriptome and microbiome signature." J. Clin. Invest. 124 (2014): 3617-3633.
[7] Gevers, Dirk, et al. "The treatment-naive microbiome in new-onset Crohn’s disease." Cell host & microbe 15.3 (2014): 382-392.
[8] Marigorta, Urko M., et al. "Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn's disease." Nature genetics 49.10 (2017): 1517.
[9] Tang, Mei San, et al. "Integrated Analysis of Biopsies from Inflammatory Bowel Disease Patients Identifies SAA1 as a Link Between Mucosal Microbes with TH17 and TH22 Cells." Inflammatory bowel diseases 23.9 (2017): 1544-1554.
[10] Haberman, Yael, et al. "Ulcerative colitis mucosal transcriptomes reveal mitochondriopathy and personalized mechanisms underlying disease severity and treatment response." Nature communications 10.1 (2019): 38.
[11] Wang, Feng, et al. "Detecting microbial dysbiosis associated with pediatric Crohn disease despite the high variability of the gut microbiota." Cell reports 14.4 (2016): 945-955.
[12] Douglas, Gavin M., et al. "Multi-omics differentially classify disease state and treatment outcome in pediatric Crohn’s disease." Microbiome 6.1 (2018): 1-12.
[13] Reiter, Taylor E., et al. "Meta-analysis of metagenomes via machine learning and assembly graphs reveals strain switches in Crohn’s disease." bioRxiv (2022). doi: https://doi.org/10.1101/2022.06.30.498290
