Shulin Cao

Ph.D. Candidate

I expect to graduate in late 2020 and am open to full-time opportunities for 2020, as well as summer opportunities in machine learning and applied science.

I am currently a Ph.D. candidate specializing in Computational Biology and Bioinformatics in the Department of Bioengineering at UC San Diego. I completed my undergraduate degree at Huazhong University of Science and Technology.

Research Experiences

CMRG at UC San Diego

La Jolla, CA
Ph.D. Candidate, advised by Prof. Andrew McCulloch
01/2017 - Now
  • Constructed a functional gene regulatory network using ODE systems modeling, implemented clustering methods on genomic data, and trained a graph neural network to detect potential gene interactions, achieving 72% accuracy in predicting gene expression.
  • Analyzed RNA sequencing and PCR data and completed bioinformatics pathway analysis using the network above. Also used link prediction methods to compare the model with existing networks.
  • [Paper under submission, first author] Fiber and Transverse Stretch Mediate Differential Transcriptional Responses in Mouse Neonatal Ventricular Myocytes
  • [Paper under submission, first author] Uncertainty Quantification in Cardiac and Cardiovascular Modelling and Simulation

Selected Project I: NLP Application in Predicting Psychological Health from Childhood Essays

La Jolla, CA
04/2018 - 06/2018
  • Cleaned and processed clinical notes data with a spelling corrector built from an implemented language model and error model, and extracted features using bag-of-words and word2vec embeddings.
  • Implemented three main approaches: KNN on word counts, logistic regression on TF-IDF features, and a single-layer neural network, AdaBoost, and random forest on word2vec features.
  • Evaluated model performance using metrics such as F1 score, precision, and ROC curves, and determined that a single-layer neural network with TF-IDF can be used to predict psychological disorders from early-stage resources.

Selected Project II: E-commerce Recommendation Systems Based on Link Analysis

La Jolla, CA
04/2017 - 06/2017
  • Implemented a machine learning algorithm that computes similarity-based node attributes and several other network features for link prediction, applying graph mining to the Amazon product co-purchasing network metadata dataset for product recommendation.
  • Compared the performance of product and customer similarity graphs and the accuracy of different algorithms, and explored in detail the attributes and network properties of the Amazon product co-purchasing network dataset.

Industry Experiences


Menlo Park, CA
Machine Learning Engineering Intern
06/2019 - 09/2019
  • Built a categorization and taxonomy of user interests using TF-IDF and K-Means to extract the interest hierarchy efficiently and accurately. Then built a platform for user-group connection and deployed the ML algorithms to recommend users to potential groups based on these interests at different granularities. This is now in experimental testing with 1 million targeted users.
  • Analyzed user-level and group-level features to train models, including model optimization, that identify group leaders among selected users to guide new group creation. The model achieved an 80% positive prediction rate and a true positive rate above 60% (increases of 5% and 10% over the old model, respectively).

Novartis Institutes for Biomedical Research (Biostatistics Case: A Drop Everything Situation)

Cambridge, MA
Visiting Student, advised by Dr. Brian Smith
  • Processed a diabetes clinical trial placebo dataset, engineering features from the 15 raw features, and implemented several methods, including logistic regression, multivariate regression, SVM, and KNN, to classify key factors influencing ALT values in response to the drug.
  • Concluded that the dosing effect and pharmacodynamic effects of the drug are the main factors affecting patients' physiological response and potential side effects, and made corresponding suggestions (tested in further experiments).


  • Quantification of Model and Data Uncertainty in a Systems Model of the Cardiac Myocyte Mechano-Signaling Network. (1st author, under review)
  • Quantification of Uncertainty in a New Network Model of Pulmonary Arterial Adventitial Fibroblast Pro-fibrotic Signaling. (2nd author, under review)
  • Fiber and Transverse Stretch Mediate Differential Transcriptional Responses in Mouse Neonatal Ventricular Myocytes. (1st author, under review)


Doctor of Philosophy in Computational Biology and Bioinformatics

2015 - 2020 (Expected)
  • Ph.D. Candidate

Bachelor of Science

2011 - 2015
  • National Endeavor Scholarship (13/343)

Professional Skills

Top Skills



Advanced, 4 years

CSE 258 – Web Mining and Recommender Systems
CSE 250A – Principles of Artificial Intelligence: Probabilistic Reasoning and Decision-Making
Phys 244 – Parallel Comput.Sci & Engineer
CSE 291 – Graph Mining & Network ANLYS



Advanced, 3 years

FMPH 221 – Biostatistical Methods
MED 263 – Bioinformatic Appl/Hum Disease
MATH 284 – Survival Analysis



Advanced, 2 years

Machine Learning and Statistical Modeling

Advanced, 3 years

CSE 291 – Advanced Data Analytics and ML Systems
Math 289 – Topics/Probability&Stats: Stat Learning & Data Science


Biomedical Health

Advanced, 3 years

Genetic Engineering
Transport Phenom/Living Systms



Advanced, 20 years

Fan of Pokémon
Fan of Nintendo

Other Skills

CSS HTML5 GitHub MATLAB OpenMP LaTeX
Java SQL Survival Analysis Mandarin

Get in Touch

I expect to graduate in late 2020 and am open to full-time positions for 2020 as well as summer opportunities.

I can help with the following:

  • Machine Learning and Data Analysis
  • Biostatistical Analysis for Clinical Data
  • Bioinformatics Algorithm Development
  • Image Processing for Disease Diagnosis

Drop me a line at
