BACKGROUND
Understanding the heterogeneity of a population at risk is an important step in the early detection of gastric cancer. This study aimed to cluster demographic, hematologic, and biochemical markers of gastric cancer in a heterogeneous sample of patients.

METHODS
Data from 695 adult patients (50.0% women) who were diagnosed with histologically confirmed gastric cancer or benign gastric disease, or who were identified as healthy individuals (December 2018 to August 2019; Hangzhou, China), were analyzed. Hierarchical clustering was performed using a factor analysis of mixed data. To assess the clustering scheme, a machine-learning classification model was developed using the Extreme Gradient Boosting algorithm and was subsequently used to rank the variables that differentiate patient phenotypes.

RESULTS
Three clusters were identified from patient characteristics. The classification model demonstrated high performance (multiclass area under the curve = 0.921) in recognizing the clusters. The 5 most important variables for differentiating the clusters were, in decreasing order of importance, sex (male/female), hemoglobin, albumin, creatinine, and high-density lipoprotein (all analysis of variance P < .001). The prevalence rates of gastric cancer in clusters I, II, and III were 95.8%, 53.8%, and 34%, respectively (χ²(2) = 164.050; P < .001). Cluster I (n = 167) predominantly had an inflammatory profile, cluster II (n = 240) had metabolic disturbances, and cluster III (n = 288) had a relatively favorable metabolic and inflammatory profile.

CONCLUSION
There were distinct clinical phenotypes in the population, each with a different prevalence of gastric cancer. A combination of routine clinical data outperformed carbohydrate or carcinoembryonic antigens in capturing the heterogeneity of the population regarding gastric pathologies.
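
The chi-square comparison of gastric cancer prevalence across the 3 clusters can be approximately checked from the figures reported above. The following is a minimal sketch, not the authors' analysis code. The per-cluster cancer counts (160, 129, 98) are not reported in the abstract; they are an assumption, back-calculated by rounding each reported prevalence multiplied by the cluster size. The function name `chi_square_independence` is illustrative only.

```python
def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for an r x c contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cluster and diagnosis.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Cluster sizes as reported (clusters I, II, III).
cluster_sizes = [167, 240, 288]

# ASSUMPTION: counts reconstructed from the reported prevalences
# (95.8%, 53.8%, 34%); the abstract does not state them directly.
cancer_counts = [160, 129, 98]

# Rows are clusters; columns are (gastric cancer, no gastric cancer).
table = [[c, n - c] for c, n in zip(cancer_counts, cluster_sizes)]

stat, dof = chi_square_independence(table)
print(f"chi2({dof}) = {stat:.3f}")
```

With these reconstructed counts, the statistic comes out close to the reported χ²(2) = 164.050; small differences would be expected because the true counts may differ from the rounded estimates.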