RISK cohort

I have been working on Crohn's disease for some time. One of the problems with the disease is that we do not know what exactly happens. People have found associations with microorganisms, but the relationship between those microorganisms and the patient is still unknown. The risk factors for complications are also largely unknown.

This post follows the use of data from a patient cohort enrolled to identify the risk factors for complications and health-care costs in pediatric and adult-onset Crohn's disease. It shows how the data have been used and the problems that unclear descriptions cause when several articles rely on the same data.

Articles describing the RISK cohort

The first mention of the RISK cohort I could find is in this article [1], which describes the cohort as:
an observational research program that enrolled patients younger than age 17 diagnosed with inflammatory (nonpenetrating, nonstricturing) CD from 2008 through 2012 at 28 pediatric gastroenterology centers in North America.
In that article the cohort had 552 patients as of 2014. It doesn't say where to find the data. A later study from 2017 [2] refers to it as a previous report. That study states that 1813 patients were enrolled, of whom 913 were diagnosed with Crohn's disease, had complete information on disease location, had no complications at diagnosis, and attended the follow-up visits. (See this comment for some criticism of that article, and this other comment about the composition of the cohort [3,4].)

One article that uses these data describes them in its first figure [5]. As you can see in that figure, the RISK cohort, according to this article, has 322 samples. In the body of the article, the origin of the data is given as the following article [6], where the cohort is described as:
The RISK cohort. Ileal biopsy samples and associated clinical information were obtained from the RISK study, an ongoing, prospective observational IBD inception cohort sponsored by the Crohn’s and Colitis Foundation of America. 1,656 children and adolescents younger than 17 years, newly diagnosed with IBD and non-IBD Ctls, were enrolled at 28 North American pediatric gastroenterology centers between 2008 and 2012. All patients were required to undergo baseline colonoscopy and confirmation of characteristic chronic active colitis/ileitis by histology prior to diagnosis and treatment, with the recording of findings in standardized fashion. Only subjects with a confirmed persisting diagnosis of CD, UC, or Ctl during an average of 22 months follow-up to date were included in this analysis, which included a representative subgroup of age-matched CD (n = 243), Ctl (n = 43), and disease Ctl UC (n = 73) patients.
Note that [2] and [6] describe the data somewhat differently. It might seem that, of those 1813 patients, 1656 were younger than 17 years. But few seem to have had a persisting CD diagnosis over about 2 years of follow-up, because the CD patients are reduced to 243.

Another article [7], referenced in [6], says about the RISK cohort:
A total of 447 children and adolescents (< 17 years) with newly diagnosed CD and a control population composed of 221 subjects with non-inflammatory conditions of the gastrointestinal tract were enrolled to the RISK study in 28 participating pediatric gastroenterology centers in North America between November 2008 and January 2012
This disagrees with the previous article about the total number of patients younger than 17 years with CD (226 instead of 243), maybe because they used a more restrictive subset of the cohort in terms of average follow-up time.

A different study [8] references [2] and [6, 7] and describes the cohort as:
The RISK study is an observational prospective cohort study that aims to develop risk models for predicting complicated course in children with Crohn's disease. From 2008 to 2012, the RISK study recruited more than 1,800 treatment-naive patients with a suspected diagnosis of Crohn's disease at 28 pediatric gastroenterology centers in North America.
However, they use 245 samples from patients with ileal CD, of which 35 lacked gut inflammation and were classified as non-IBD controls. The remaining 210 selected individuals showed persisting Crohn's disease and remained in complication-free B1 status for at least 90 days from the time of initial diagnosis. After 3 years of follow-up, 27 had a complicated disease course, progressing to the B2 or B3 states.

I can't understand how the 1656 patients described in [6] end up as 322 samples in [5] instead of the 243 patients with Crohn's disease reported in [6]. Also, it is not clear how [6] has 243 patients while the origin of the data seems to be [7], where only 226 are described. And from [8] we learn that 245 had ileal CD, which could mean that all the patients described in [6] and [7] provided ileal samples. Furthermore, there is no reference linking [6, 7] and [2], which could mean that they are different cohorts of patients despite coming from the same (?) 28 centers in North America and being enrolled over the same period (November 2008 to January 2012). As this is unlikely, what is missing is a description of how each study processed the same cohort of patients.

These doubts led me to look for the actual data on which all these articles are based, totally or partially.
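The numbers quoted above are easier to compare side by side. The snippet below is only a bookkeeping sketch: it collects the counts as they appear in the quotes and descriptions in this post (the dictionary keys are informal labels made up for illustration) and prints them per reference.

```python
# Counts as quoted in this post; keys are informal labels, not official terms.
reported = {
    "[1]": {"patients": 552},
    "[2]": {"enrolled": 1813, "CD": 913},
    "[5]": {"samples": 322},
    "[6]": {"enrolled": 1656, "CD": 243, "Ctl": 43, "UC": 73},
    "[7]": {"CD": 447, "Ctl": 221},
    "[8]": {"samples": 245, "CD": 210, "Ctl": 35, "complicated": 27},
}

for ref, counts in reported.items():
    summary = ", ".join(f"{label}={n}" for label, n in counts.items())
    print(f"{ref}: {summary}")

# Collect the CD counts from every reference that reports one.
cd_counts = {ref: c["CD"] for ref, c in reported.items() if "CD" in c}
print("Distinct CD counts:", sorted(set(cd_counts.values())))
```

No two references that report a CD count agree, which is the core of the confusion described here.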

Availability of the RISK cohort

The first article describing the RISK study [1] doesn't say where to find the data, and neither does the more recent article [2].

Interestingly, Peters et al. [5] provide links to the other datasets they used, but not a link or a reference to where to find the RISK cohort. This perhaps indicates that it is not freely available or that there are other problems with providing the data to the scientific community.

Haberman et al. [6] link to a repository in the Gene Expression Omnibus, GSE57945. However, instead of the total of 359 samples selected, that dataset lists 322 samples. This matches the total number of samples described in [5], but matches neither the total number of samples selected in the original study of the cohort [2] nor their own total number of samples.

Gevers et al. [7] only provide references for the 16S projects, not for the RISK RNA-seq expression data used.

Marigorta et al. [8] provide a link to another dataset in the Gene Expression Omnibus, GSE93624, which has "210 treatment-naïve patients of pediatric Crohn's disease and 35 non-IBD controls from the RISK study."
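The figure of 359 comes from adding the three subgroups quoted from [6]. As a small check of that arithmetic (the variable names are mine, the numbers are the ones quoted in this post):

```python
# Subgroup sizes quoted from [6] and the sample count listed in GSE57945.
selected = {"CD": 243, "Ctl": 43, "UC": 73}
total_selected = sum(selected.values())
geo_listed = 322

# Samples selected in the article but not accounted for in the GEO listing.
unaccounted = total_selected - geo_listed
print(total_selected, geo_listed, unaccounted)  # prints: 359 322 37
```

So 37 of the selected samples do not appear in the GEO listing, and the article does not say which ones or why.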

Of the original 913 patients with Crohn's disease, at most 532 samples have been made public, and that only if the patient IDs in those two datasets do not overlap. GSE57945 was uploaded in 2014 and last updated in 2017, and provides more information than GSE93624. I couldn't find a way to check whether the same patient has samples in both datasets.
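To make the upper bound explicit: 532 is the 322 samples of GSE57945 plus the 210 CD samples of GSE93624, assuming no patient is shared. If a common patient identifier were available in both datasets (it is not, as far as I could find), the overlap check would be a simple set operation. The sketch below uses invented toy IDs and a hypothetical helper, `distinct_patients`, only to show the idea.

```python
def distinct_patients(ids_a, ids_b):
    """Return (number of distinct patients, number shared) for two ID collections."""
    a, b = set(ids_a), set(ids_b)
    return len(a | b), len(a & b)

# Upper bound used in the text: no patient shared between the two datasets.
n_gse57945 = 322
n_gse93624_cd = 210
upper_bound = n_gse57945 + n_gse93624_cd

# Toy IDs, invented for illustration only; the real datasets share no such key.
toy_a = ["P1", "P2", "P3"]
toy_b = ["P3", "P4"]
total, shared = distinct_patients(toy_a, toy_b)
print(upper_bound, total, shared)  # prints: 532 4 1
```

Every shared patient lowers the number of distinct public samples by one, so 532 is reached only if the two datasets are disjoint.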


I don't know where I can find the whole RISK cohort. A fuller description of how each dataset was processed would help clarify what the RISK cohort is. It is clear that the GSE93624 and GSE57945 datasets both come from that cohort, but the lack of the whole dataset hinders replicability.

Update 24/01/2018:
Found a web page of the RISK cohort here, describing the state of the project up to 2013.

Update 30/05/2018:
Found another article [9] referring to the RISK cohort, in which 254 samples are described as having RNA-seq data and only 158 of those have matching 16S data. BTW, this study links to where to find the 16S data: Qiita, study 1939.

Update 13/06/2019:
Found another article [10] referring to the RISK cohort, which links to a GEO series (GSE117993) with RNA-seq data from 190 samples (55 non-IBD controls, 43 UC patients, and 92 CD patients with rectal inflammation).


  1. Walters, Thomas D., et al. "Increased effectiveness of early therapy with anti-tumor necrosis factor-α vs an immunomodulator in children with Crohn's disease." Gastroenterology 146.2 (2014): 383-391.
  2. Kugathasan, Subra, et al. "Prediction of complicated disease course for children newly diagnosed with Crohn's disease: a multicentre inception cohort study." The Lancet 389.10080 (2017): 1710-1718.
  3. Arijs, Ingrid, and Isabelle Cleynen. "RISK stratification in paediatric Crohn's disease." The Lancet 389.10080 (2017): 1672-1674.
  4. Kugathasan, Subra, Lee A. Denson, and Jeffrey S. Hyams. "Exclusive and partial enteral nutrition for Crohn's disease–Authors' reply." The Lancet 390.10101 (2017): 1486-1487.
  5. Peters, Lauren A., et al. "A functional genomics predictive network model identifies regulators of inflammatory bowel disease." Nature genetics 49.10 (2017): 1437.
  6. Haberman, Yael, et al. "Pediatric Crohn disease patients exhibit specific ileal transcriptome and microbiome signature." The Journal of Clinical Investigation 124 (2014): 3617-3633.
  7. Gevers, Dirk, et al. "The treatment-naive microbiome in new-onset Crohn’s disease." Cell host & microbe 15.3 (2014): 382-392.
  8. Marigorta, Urko M., et al. "Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn's disease." Nature genetics 49.10 (2017): 1517. 
  9. Tang, Mei San, et al. "Integrated Analysis of Biopsies from Inflammatory Bowel Disease Patients Identifies SAA1 as a Link Between Mucosal Microbes with TH17 and TH22 Cells." Inflammatory bowel diseases 23.9 (2017): 1544-1554. 
  10. Haberman, Yael, et al. "Ulcerative colitis mucosal transcriptomes reveal mitochondriopathy and personalized mechanisms underlying disease severity and treatment response." Nature communications 10.1 (2019): 38.


