Ying Jin

Assistant Professor of Statistics and Data Science

Contact Information

Primary Email:
yjinstat@wharton.upenn.edu

Office Address:
441 Academic Research Building
265 South 37th Street
Philadelphia, PA 19104

Research Interests: Uncertainty quantification, Distribution-free inference, Causal inference, Selective inference, Generalizability.

Links: Personal Website

Research

Kexin Huang, Ying Jin, Ryan Li, Michael Li, Emmanuel J. Candes, Jure Leskovec (2025), Automated hypothesis validation with agentic sequential falsifications, International Conference on Machine Learning (ICML).
Yu Gui, Ying Jin, Yash Nair, Zhimei Ren, ACS: An interactive framework for conformal selection.
Yash Nair, Ying Jin, James Yang, Emmanuel J. Candes, Diversifying conformal selections.
Ying Jin and Zhimei Ren (2025), Confidence on the focal: Conformal prediction with selection-conditional coverage, Journal of the Royal Statistical Society Series B: Statistical Methodology.
Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang (2025), Policy learning “without” overlap: pessimism and generalized empirical Bernstein’s inequality, Annals of Statistics (accepted).
Ying Jin, Naoki Egami, Dominik Rothenhausler, Beyond reweighting: On the predictive role of covariate shift in effect generalization.
Tian Bai and Ying Jin, Optimized Conformal Selection: Powerful selective inference after conformity score optimization.
Ying Jin, Zhuoran Yang, Zhaoran Wang (2024), Is pessimism provably efficient for offline RL?, Mathematics of Operations Research.
Yu Gui, Ying Jin, Zhimei Ren (2024), Conformal alignment: Knowing when to trust foundation models with guarantees, Advances in Neural Information Processing Systems (NeurIPS).
Kexin Huang, Ying Jin, Emmanuel J. Candes, Jure Leskovec (2023), Uncertainty quantification over graph with conformalized graph neural networks, Advances in Neural Information Processing Systems (NeurIPS).

Teaching

Past Courses

AMCS9950 - Dissertation
Allows for a PhD student to be enrolled full-time to work exclusively on research, writing and preparing his/her doctoral thesis and defense. All required coursework (20 CUs) must be completed, and the student must have passed his/her thesis proposal/oral candidacy examination prior to being enrolled.
AMCS9999 - Ind Study & Research
Study under the direction of a faculty member.
STAT4710 - Modern Data Mining
With the advent of the internet age, data are being collected at unprecedented scale in almost all realms of life, including business, science, politics, and healthcare. Data mining—the automated extraction of actionable insights from data—has revolutionized each of these realms in the 21st century. The objective of the course is to teach students the core data mining skills of exploratory data analysis, selecting an appropriate statistical methodology, applying the methodology to the data, and interpreting the results. The course will cover a variety of data mining methods including linear and logistic regression, penalized regression (including lasso and ridge regression), tree-based methods (including random forests and boosting), and deep learning. Students will learn the conceptual basis of these methods as well as how to apply them to real data using the programming language R. This course may be taken concurrently with the prerequisite with instructor permission.
STAT5710 - Modern Data Mining
Statistics, or data science, has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and ridge regression. Contemporary methods such as KNN (k-nearest neighbors), random forests, support vector machines, principal component analysis (PCA), the bootstrap, and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods covered in the class. Prerequisite: two courses at the statistics 4000 or 5000 level or permission from instructor.
STAT9910 - Sem in Adv Appl of Stat
This seminar is for graduate students who wish to learn about current research frontiers. It covers advanced topics in probability, statistical theory and methods, applied statistics, data science and artificial intelligence. Specific topics vary from year to year and emphasize both theoretical foundations and applications.

Activity

In the News

Transforming Health Care Logistics With Low-Cost AI

By forecasting demand and correcting for missing data, researchers from Wharton and Penn Engineering developed a low-cost AI tool that helps get life-saving medicines to the communities in Sierra Leone that need them most.

Knowledge @ Wharton - 2026/07/28
