BACKGROUND AND AIMS: Several clinical factors have an impact on prognosis in stage II colorectal cancer (CRC), but as yet they are inadequate for risk assessment. The present study aimed to develop a gene expression classifier for improved risk stratification of patients with stage II CRC.

METHODS: 315 CRC samples were included in the study. Gene expression measurements from 207 CRC samples (stage I-IV) from two independent Norwegian clinical series were obtained using Affymetrix exon-level microarrays. Differentially expressed genes between stage I and stage IV samples from the test series were identified and used as input for L1 (lasso) penalised Cox proportional hazards analyses of patients with stage II CRC from the same series. A second validation was performed in 108 stage II CRC samples from other populations (USA and Australia).

RESULTS: An optimal 13-gene expression classifier (PIGR, CXCL13, MMP3, TUBA1B, SESN1, AZGP1, KLK6, EPHA7, SEMA3A, DSC3, CXCL10, ENPP3, BNIP3) for prediction of relapse among patients with stage II CRC was developed using a consecutive Norwegian test series from patients treated according to current standard protocols (n=44, p<0.001, HR=18.2), and its predictive value was successfully validated for patients with stage II CRC in a second Norwegian CRC series collected two decades previously (n=52, p=0.02, HR=3.6). Further validation of the classifier was obtained in a recent external dataset of patients with stage II CRC from other populations (n=108, p=0.001, HR=6.5). Multivariate Cox regression analyses, including all three sample series and various clinicopathological variables, confirmed the independent prognostic value of the classifier (p≤0.004).
The classifier was shown to be specific to stage II CRC and did not provide prognostic stratification of patients with stage III CRC.

CONCLUSION: This study presents the development and validation of a 13-gene expression classifier, ColoGuideEx, for prognosis prediction specific to patients with stage II CRC. Its robustness was shown across patient series, populations and different microarray versions.
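
ILLUSTRATIVE SKETCH: The L1 (lasso) penalised Cox proportional hazards step named in the methods can be illustrated with a minimal example. This is a sketch under stated assumptions, not the study's pipeline: the abstract does not describe the software, penalty selection or risk cut-off used. The data below are simulated, all function names (`cox_loss_and_grad`, `soft_threshold`, `lasso_cox`, `simulate`) are hypothetical, the optimiser is a plain proximal gradient loop chosen for brevity, ties are handled in the Breslow form, and the median split of the risk score is an assumption made only for illustration.

```python
import numpy as np

def cox_loss_and_grad(beta, X, time, event):
    """Negative Cox partial log-likelihood (Breslow form) scaled by 1/n, and its gradient."""
    eta = X @ beta
    eta = eta - eta.max()          # stabilise exp(); the partial likelihood is shift-invariant
    w = np.exp(eta)
    # at_risk[i, j] = 1 if subject j is still under observation at subject i's time
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    denom = at_risk @ w
    risk_mean = (at_risk @ (w[:, None] * X)) / denom[:, None]
    n = len(time)
    loss = -np.sum(event * (eta - np.log(denom))) / n
    grad = -np.sum(event[:, None] * (X - risk_mean), axis=0) / n
    return loss, grad

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrinks towards zero and sets small entries to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cox(X, time, event, lam, step=0.05, n_iter=1000):
    """Proximal gradient descent for: scaled negative partial log-likelihood + lam * ||beta||_1."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad = cox_loss_and_grad(beta, X, time, event)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def simulate(n=150, p=5, seed=0):
    """Simulated data (assumption): only the first two features affect the hazard."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[:2] = [1.0, -1.0]
    t_event = rng.exponential(1.0 / np.exp(X @ true_beta))
    t_censor = rng.exponential(2.0, size=n)
    time = np.minimum(t_event, t_censor)
    event = (t_event <= t_censor).astype(float)   # 1 = relapse observed, 0 = censored
    return X, time, event

X, time, event = simulate()
beta_hat = lasso_cox(X, time, event, lam=0.05)
selected = np.flatnonzero(beta_hat)               # features kept by the lasso penalty
risk_score = X @ beta_hat
high_risk = risk_score > np.median(risk_score)    # illustrative cut-off only
print("selected features:", selected)
print("patients flagged high risk:", int(high_risk.sum()))
```

The penalty weight `lam` controls how many coefficients are shrunk exactly to zero, which is what lets this kind of model reduce a long list of candidate genes to a short classifier; in practice it would be chosen by cross-validation rather than fixed by hand.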