Introduction

We introduce Mut2Vec, a novel computational pipeline that generates distributed representations of mutations, and we experimentally validate the efficacy of the generated representations. We expect Mut2Vec to be useful in biomedical applications such as cancer analysis and complex drug sensitivity problems.

Pre-trained Mut2Vec

We provide Mut2Vec in two annotation versions: Ensembl Gene ID (ENSG) and HUGO Gene Nomenclature Committee gene symbol (HGNC).

Mut2Vec Release Data(.txt)
Date ENSG HGNC
06. 26. 2017 Download Download
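
As a rough illustration of how the released .txt vectors might be used, the sketch below parses a word2vec-style text layout and compares two genes by cosine similarity. The file layout (one token per line followed by whitespace-separated floats, with an optional "count dimension" header) is an assumption, not something stated on this page, and the function names, the file name in the comment, and the toy vectors are all hypothetical.

```python
import math


def load_vectors(lines):
    """Parse lines of the ASSUMED form 'TOKEN v1 v2 ...' into {token: [floats]}."""
    vectors = {}
    for index, line in enumerate(lines):
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # Skip an optional word2vec-style header such as "3 2" on the first line.
        if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data standing in for a downloaded file; these values are NOT real Mut2Vec output.
# With a real download one would pass an open file handle instead, e.g.
# open("mut2vec_hgnc.txt") where the file name is hypothetical.
sample_lines = ["3 2", "TP53 1.0 0.0", "KRAS 0.0 1.0", "BRAF 1.0 0.0"]
vectors = load_vectors(sample_lines)
similarity = cosine_similarity(vectors["TP53"], vectors["BRAF"])
```

If the actual files use a different delimiter or carry no header line, only load_vectors needs to change.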

Driver Candidates

Using IntOGen driver mutations, we measured the driver enrichment p-value of each cluster using the hypergeometric distribution and extracted the clusters with a p-value below 5e-2 (0.05).
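
The enrichment test described above can be sketched as a one-sided hypergeometric tail probability: the chance of seeing at least the observed number of known drivers in a cluster of a given size. This is a minimal sketch, not the authors' code. The function names and the toy clusters are hypothetical, and it assumes the background population is the union of all cluster members, which the page does not specify.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, drivers_total, cluster_size, drivers_in_cluster):
    """Return P(X >= drivers_in_cluster) when cluster_size genes are drawn
    without replacement from a population containing drivers_total drivers."""
    upper = min(drivers_total, cluster_size)
    tail = sum(
        comb(drivers_total, k) * comb(population - drivers_total, cluster_size - k)
        for k in range(drivers_in_cluster, upper + 1)
    )
    return tail / comb(population, cluster_size)


def enriched_clusters(clusters, known_drivers, alpha=0.05):
    """Return {cluster name: p-value} for clusters whose p-value is below alpha.

    ASSUMPTION: the background population is the union of all cluster members.
    """
    population = set().union(*clusters.values())
    drivers = known_drivers & population
    result = {}
    for name, members in clusters.items():
        p = hypergeom_enrichment_pvalue(
            len(population), len(drivers), len(members), len(members & drivers)
        )
        if p < alpha:
            result[name] = p
    return result


# Toy example with placeholder gene names (not real data): 20 genes, 5 known drivers.
clusters = {
    "c1": {"A", "B", "C", "D"},
    "c2": {"G%d" % i for i in range(16)},
}
known_drivers = {"A", "B", "C", "G0", "G1"}
candidates = enriched_clusters(clusters, known_drivers)
```

In this toy case only c1 passes the 0.05 threshold, because three of its four members are drivers. In practice a multiple-testing correction may also be appropriate when many clusters are tested; the page does not say whether one was applied.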

Driver Candidates
Date .tsv(Tab Separated)
06. 26. 2017 Download

Mut2Vec Training Pipeline

Visualization of Driver/Passenger Mutation vectors

Red dots are drivers and blue dots are passengers. The score in each figure is the Normalized Mutual Information (NMI).


Breast Cancer
Cutaneous Melanoma
Colorectal Adenocarcinoma
Lung Adenocarcinoma
Stomach Adenocarcinoma
Uterine Corpus Endometrial Carcinoma
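
For readers unfamiliar with the NMI score reported in the figures, the following sketch computes it between a cluster assignment and driver/passenger labels. The page does not state which normalization was used; this sketch assumes the arithmetic mean of the two entropies, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length label sequences."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (nij / n) * log(nij * n / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def normalized_mutual_information(a, b):
    """NMI normalized by the arithmetic mean of the entropies (an ASSUMED choice)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings are constant, hence trivially identical
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Toy labels: cluster ids versus driver ("d") / passenger ("p") annotations.
cluster_ids = [0, 0, 1, 1]
perfect = normalized_mutual_information(cluster_ids, ["d", "d", "p", "p"])
unrelated = normalized_mutual_information(cluster_ids, ["d", "p", "d", "p"])
```

A score near 1 means the clusters separate drivers from passengers almost perfectly, while a score near 0 means cluster membership carries no information about driver status.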

Release note

06/26/17 ver1.0 release