Preliminary Results
Binary Modeling
The combined model performed best, narrowly ahead of the proteomics-only model (validation ROC AUC 0.87 vs. 0.85).
Applied a pipeline comprising variance thresholding, feature selection, scaling, and modeling.
Used GroupKFold cross-validation with randomized hyperparameter optimization.
Selected 41 features (12 Ensembl gene IDs and 29 UniProt IDs) using SelectKBest with the ANOVA F-statistic.
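The steps above can be sketched with scikit-learn as follows. This is a minimal illustration and not the pipeline actually used: the data are synthetic, the classifier (`LogisticRegression`), the search space for `C`, and the grouping variable (hypothetical subject IDs) are all assumptions, since the summary does not name them. Only `k=41` and the ANOVA F-statistic scorer are taken from the text.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the combined feature matrix (assumption).
rng = np.random.default_rng(0)
n_samples, n_features = 120, 200
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 2, size=n_samples)
X[:, :5] += y[:, None]  # make the first five features weakly informative

# Hypothetical grouping: 30 subjects with 4 samples each, so that samples
# from one subject never appear in both the training and validation folds.
groups = np.repeat(np.arange(30), 4)

# Same step order as described: variance threshold, selection, scaling, model.
pipeline = Pipeline([
    ("variance", VarianceThreshold()),
    ("select", SelectKBest(score_func=f_classif, k=41)),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),  # classifier is an assumption
])

# Randomized hyperparameter search scored by ROC AUC under GroupKFold.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"model__C": loguniform(1e-3, 1e2)},
    n_iter=10,
    scoring="roc_auc",
    cv=GroupKFold(n_splits=5),
    random_state=0,
)
search.fit(X, y, groups=groups)

n_selected = int(search.best_estimator_.named_steps["select"].get_support().sum())
print(search.best_params_, n_selected)
```

Putting feature selection inside the pipeline means it is refit on each training fold, which keeps the validation folds from influencing which features are chosen.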
| Model | Validation ROC AUC | Validation F1 Score |
| --- | --- | --- |
| Transcriptomics | 0.60 | 0.50 |
| Proteomics | 0.85 | 0.77 |
| Combined | 0.87 | 0.78 |
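For reference, the two reported metrics can be computed as in the sketch below. The labels and predicted probabilities are toy values, not the study's data, and the 0.5 decision threshold used for F1 is an assumption.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Toy validation labels and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

# ROC AUC is threshold-free and uses the predicted probabilities directly.
auc = roc_auc_score(y_true, y_prob)

# F1 needs hard labels; the 0.5 cutoff here is an assumed threshold.
y_pred = (y_prob >= 0.5).astype(int)
f1 = f1_score(y_true, y_pred)

print(f"ROC AUC = {auc:.2f}, F1 = {f1:.2f}")
```

Because F1 depends on the chosen threshold while ROC AUC does not, the two columns can rank models differently.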