Shulin Cao, Ph.D.

I am a Research Scientist at Meta. I obtained my Ph.D. from UCSD (UC San Diego, United States) and my B.S. from HUST (Huazhong University of Science and Technology, China), specializing in Computational Biology and Bioinformatics.

Museum * Tennis * Photography * Cats.

Research Experiences

CMRG at UC San Diego

La Jolla, CA
Ph.D. Candidate, advised by Prof. Andrew McCulloch
09/2016 - 11/2021
  • Constructed a functional gene regulatory network using ODE systems modeling, implemented clustering methods on genomic data, and trained a graph neural network to detect potential gene interactions, achieving 72% accuracy in predicting gene expression.
  • Analyzed RNA sequencing and PCR data and completed bioinformatics pathway analysis using the network above. Generated further predictions by comparing the model with existing networks using link prediction methods.

Selected Project I: NLP Application in Predicting Psychological Health from Childhood Essays

La Jolla, CA
04/2018 - 06/2018
  • Cleaned and processed clinical notes data with a spelling corrector built from an implemented language model and error model, and extracted features using bag-of-words and word2vec embeddings.
  • Implemented three main approaches: KNN on word counts, logistic regression on TF-IDF features, and a single-layer neural network, AdaBoost, and random forest on word2vec features.
  • Evaluated model performance by comparing metrics such as F1 score, precision, and ROC curves, and determined that a single-layer neural network with TF-IDF can be used to predict psychological diseases from early-stage resources.

Selected Project II: E-commerce Recommendation Systems Based on Link Analysis

La Jolla, CA
04/2017 - 06/2017
  • Implemented a machine learning algorithm that computes attribute-based node similarity and several other network features for link prediction, applying graph mining methods to the Amazon product co-purchasing network metadata dataset for product recommendation.
  • Compared the performance of product and customer similarity graphs and the accuracy of different algorithms, and explored in detail the attributes and network properties of the Amazon product co-purchasing network dataset.

Industry Experiences

Meta

Menlo Park, CA
Research Scientist
11/2021 - Present
  • Instagram Feed Ranking.
  • Private Sharing Ranking.
  • Payment Risk Ranking.

Facebook

Menlo Park, CA
Machine Learning Engineering Intern
06/2020 - 09/2020
  • Search Typeahead Suggestion and Disambiguation.

Facebook

Menlo Park, CA
Machine Learning Engineering Intern
06/2019 - 09/2019
  • Built a categorization and taxonomy of user interests using TF-IDF and K-Means to extract the interest hierarchy efficiently and accurately. Then built a platform for user-group connections and deployed the ML algorithms to recommend users to potential groups based on these interests at different granularities. This is now in experiment testing with 1 million targeted users.
  • Analyzed user-level and group-level features to train models that identify group leaders among selected users to guide new group creation (including model optimization). The model achieved an 80% positive prediction rate and a true positive rate above 60% (increases of 5% and 10% over the previous model, respectively).

Novartis Institutes for Biomedical Research (Biostatistics Case: A Drop Everything Situation)

Cambridge, MA
Visiting Student, advised by Dr. Brian Smith
08/2018
  • Processed a clinical trial placebo dataset for diabetes with feature engineering from the 15 raw features, and implemented several methods, including logistic regression, multivariate regression, SVM, and KNN, to classify the key factors influencing ALT values in response to the drug.
  • Concluded that the dosing effect and pharmacodynamic effects of the drug are the main features affecting patients' physiological response and potential side effects, and made corresponding suggestions (tested in further experiments).

Selected Publications

  • Quantification of Model and Data Uncertainty in a Systems Model of the Cardiac Myocyte Mechano-Signaling Network. (1st author, under review)
  • Quantification of Uncertainty in a New Network Model of Pulmonary Arterial Adventitial Fibroblast Pro-fibrotic Signaling. (2nd author, under review)
  • Fiber and Transverse Stretch Mediate Differential Transcriptional Responses in Mouse Neonatal Ventricular Myocytes. (1st author, under review)

Education

Doctor of Philosophy in Computational Biology and Bioinformatics

2015 - 2021
  • Ph.D. Candidate

Bachelor of Science

2011 - 2015
  • National Endeavor Scholarship (13/343)

Professional Skills

Top Skills

95%

Python

Advanced, 4 years

CSE 258 – Web Mining and Recommender Systems
CSE 250A – Principles of Artificial Intelligence: Probabilistic Reasoning and Decision-Making
PHYS 244 – Parallel Computing in Science and Engineering
CSE 291 – Graph Mining and Network Analysis

90%

R

Advanced, 3 years

FMPH 221 – Biostatistical Methods
MED 263 – Bioinformatics Applications to Human Disease
MATH 284 – Survival Analysis

85%

C/C++

Advanced, 2 years

90%

Machine Learning and Statistical Modeling

Advanced, 3 years

CSE 291 – Advanced Data Analytics and ML Systems
MATH 289 – Topics in Probability and Statistics: Stat Learning & Data Science

90%

Biomedical Health

Advanced, 3 years

Physiology
Genetic Engineering
Transport Phenomena in Living Systems
Genomics

100%

PokeMon

Advanced, 20 years

Fan of PokeMon
Fan of Nintendo

Other Skills

CSS HTML5 GitHub MATLAB OpenMP LaTeX
Java SQL Survival Analysis Mandarin

Get in Touch

TO UPDATE LIFE STATUS.

I can help with the following:

  • Machine Learning and Data Analysis
  • Biostatistical Analysis for Clinical Data
  • Bioinformatics Algorithm Development
  • Image Processing for Disease Diagnosis

Drop me a line at shc131@ucsd.edu.
