Nancy R. Zhang

  • Ge Li and Ning Zhao Professor
  • Professor of Statistics and Data Science
  • Vice Dean of Wharton Doctoral Programs

Contact Information

  • Office Address:

    431 Academic Research Building
    265 South 37th Street
    Philadelphia, PA 19104

Research Interests: genomics, change-point methods, empirical Bayes estimation, model and variable selection, scan statistics, statistical modeling

Links: CV, Lab Website

Overview

Dr. Zhang is the Ge Li and Ning Zhao Professor of Statistics in The Wharton School at the University of Pennsylvania. Her research focuses primarily on the development of statistical methods and computational algorithms for the analysis of data from high-throughput biological experiments. She has made contributions to copy number and structural variant detection, to the modeling and estimation of intra-tumor genetic heterogeneity, and to the modeling and analysis of single-cell and spatial genomic data. In statistics, she has made contributions to change-point analysis, variable selection, and model selection.

Dr. Zhang obtained her Ph.D. in Statistics in 2005 from Stanford University. After one year of postdoctoral training at the University of California, Berkeley, she returned to the Department of Statistics at Stanford University as an Assistant Professor in 2006. She received the Sloan Fellowship in 2011 and formally moved to the University of Pennsylvania with tenure in 2012. She was awarded the Medallion Lectureship by the Institute of Mathematical Statistics in 2021 and the P.R. Krishnaiah Memorial Lectureship in 2023. Her work has been funded by grants from the NSF, the NIH, and the Mark Foundation. At Penn, she is a member of the Abramson Cancer Center and the Graduate Group in Genomics and Computational Biology, and a Senior Fellow of the Institute of Biomedical Informatics. Dr. Zhang currently serves as the Vice Dean of Wharton Doctoral Programs.

Here are some of Dr. Zhang’s representative publications, categorized by topic (ǂalphabetical ordering, *corresponding author):

Computational methods for single cell data denoising, batch correction, and transfer learning

    1. Huang M, Wang J, Torre E, Dueck H, Shaffer S, Bonasio R, Murray J, Raj A, Li M, and Zhang NR* (2018) Gene expression recovery in single-cell RNA sequencing, Nature Methods 15, 539-542. PMID: 29941873 PMCID: PMC6030502. (R package: SAVER).
    2. Wang J, Huang M, Torre E, Dueck H, Shaffer S, Murray J, Raj A, Li M, and Zhang NR* (2018) Gene expression distribution deconvolution in single cell RNA sequencing, Proceedings of the National Academy of Sciences 115 (28) E6437-E6446. PMID: 2994020 PMCID: PMC6048536 (R package: DESCEND).
    3. Wang J, Agarwal D, Huang M, Hu G, Zhou Z, Conley VB, MacMullan H, Li M, Zhang NR* (2019) Data denoising with transfer learning in single-cell transcriptomics. Nature Methods. 16, 875. PMID: 31471617 PMCID: PMC7781045.
    4. Zhang Z, Mathew D, Lim T, Mason K, Martinez CM, Huang S, Wherry EJ, Susztak K, Minn AJ, Ma Z and Zhang NR* (2024) Signal recovery in single cell batch integration. Nature Biotechnology, Accepted in principle. (Python package: CellANOVA)

Computational methods for integrative multi-omic modeling of single cell data

    1. Zhou Z, Ye C, Wang J, Zhang NR* (2020) Surface protein imputation from single cell transcriptomes by deep neural networks, Nature Communications 11, Article number: 651
    2. Wu C-Y, Lau BT, Kim H, Sathe A, Grimes SM, Ji HP, Zhang NR* (2021) Integrative single-cell analysis of allele-specific copy number alterations and chromatin accessibility in cancer. Nature Biotechnology, 39, 1259
    3. Jiang Y, Harigaya Y, Zhang Z, Zhang H, Zang C, and Zhang NR* (2022) Nonparametric single-cell multiomic characterization of trio relationships between transcription factors, target genes, and cis-regulatory regions. Cell Systems 13, 737.
    4. Lin K and Zhang NR* (2023) Quantifying common and distinct information in single-cell multimodal data with Tilted Canonical Correlation Analysis. Proceedings of the National Academy of Sciences, 120 (32) e2303647120. (R package: tiltedCCA)

Computational methods for analysis of spatial genomic data and integration of spatial and single cell data

    1. Chen S, Zhu B, Huang S, Hickey JW, Lin KZ, Snyder M, Greenleaf WJ, Nolan GP, Zhang NR*, Ma Z* (2023) Integration of spatial and single-cell data across modalities with weak linkage. Nature Biotechnology, https://doi.org/10.1038/s41587-023-01935-0 (Python package: MaxFuse)
    2. Mason K, Sathe A, Hess P, Rong J, Wu C-Y, Furth E, Susztak K, Levinsohn J, Ji HP, Zhang NR* (2024) Niche-DE: niche differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions. Genome Biology 25, 14. (R package: NicheDE)

DNA copy number estimation, variant detection and inference

    1. Zhang NR, Senbabaoglu Y, Li J* (2010) Joint estimation of DNA copy number from multiple platforms, Bioinformatics 26, 153.
    2. Chen H, Xing H, Zhang NR* (2011) Estimation of parent specific DNA copy number in tumors using high-density genotyping arrays, PLoS Computational Biology 7, e1001060.
    3. Shen J, Zhang NR* (2012) Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing, Annals of Applied Statistics 6, 476.
    4. Chen H, Bell JM, Zavala NA, Ji HP, Zhang NR* (2015) Allele-specific copy number profiling by next-generation DNA sequencing, Nucleic Acids Research 43, e23.
    5. Jiang Y, Oldridge DA, Diskin SJ, Zhang NR* (2015) CODEX: a normalization and copy number variation detection method for whole exome sequencing, Nucleic Acids Research 43, e39.

Intra-tumor heterogeneity and cancer genomics

    1. Jiang Y, Qiu Y, Minn AJ, Zhang NR* (2016) Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing, Proceedings of the National Academy of Sciences 113, E5528.
    2. Muralidharan O, Natsoulis G, Bell J, Ji H, Zhang NR* (2012) Detecting mutations in mixed sample sequencing data using empirical Bayes, Annals of Applied Statistics 6, 1047.
    3. Xia LC, Bell JM, Wood-Bouwens C, Chen JJ, Zhang NR*, Ji HP* (2017) Single molecule-based discovery of complex genomic rearrangements, Nucleic Acids Research 46, e19.

Change-point detection and scan statistics

    1. Zhang NR, Siegmund DO (2007) A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data, Biometrics 63, 22.
    2. Chan HP, Zhang NRǂ (2007) Scan statistics with weighted observations, Journal of the American Statistical Association 102, 595.
    3. Zhang NR, Siegmund DO, Ji H, Li J (2010) Detecting simultaneous changepoints in multiple sequences, Biometrika 97, 631.
    4. Siegmund DO, Zhang NR, Yakir B (2011) False discovery rate for scanning statistics, Biometrika 98, 979.
    5. Chen H, Zhang NRǂ (2015) Graph-based change-point detection, The Annals of Statistics 43, 139.
    6. Zhang NR, Siegmund DO (2012) Model selection for high dimensional, multi-sequence change-point problems, Statistica Sinica 22, 1507.

General multiple testing control, high-dimensional inference

    1. Li F, Zhang NRǂ (2010) Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics, Journal of the American Statistical Association 105, 1202.
    2. Bickel PJ, Boley N, Brown JB, Huang H, Zhang NRǂ (2010) Subsampling methods for genomic inference, Annals of Applied Statistics 4, 1660.
    3. Sun Y, Zhang NR and Owen A* (2012) Multiple hypothesis testing, adjusted for latent variables, with an application to the agemap gene expression data, Annals of Applied Statistics 6, 1664.

For a complete overview of Dr. Zhang’s publications, funded grants, and teaching, mentoring, and service work, see her CV above.

 

 


Research

You can find the latest updates on my research on my lab website:

https://nzhanglab.github.io/

For a complete list of my publications and funded grants, the most trustworthy source is my CV (see link above). The searchable publication list below is only updated once per year.

Teaching

Past Courses

  • AMCS5999 - Independent Study

    Independent Study allows students to pursue academic interests not available in regularly offered courses. Students must consult with their academic advisor to formulate a project directly related to the student’s research interests. All independent study courses are subject to the approval of the AMCS Graduate Group Chair.

  • AMCS9999 - Ind Study & Research

    Study under the direction of a faculty member.

  • GCB6990 - Lab Rotation

    Lab rotation

  • GCB8990 - Pre-Dissertation Research

    Pre-dissertation lab research

  • GCB9950 - Dissertation

    Ph.D. students enroll in this course after passing their candidacy exam. They work on their dissertation full-time under the guidance of their dissertation supervisor and other members of their dissertation committee.

  • STAT4050 - Stat Computing with R

    The goal of this course is to introduce students to the R programming language and its related ecosystem. This course will provide a skill set that is in demand in both research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so this class will prepare students for these higher-level classes and electives.

  • STAT7050 - Stat Computing with R

    The goal of this course is to introduce students to the R programming language and its related ecosystem. This course will provide a skill set that is in demand in both research and business environments. In addition, R is a platform that is used and required in other advanced classes taught at Wharton, so this class will prepare students for these higher-level classes and electives.

  • STAT9910 - Sem in Adv Appl of Stat

    This seminar will be taken by doctoral candidates after the completion of most of their coursework. Topics vary from year to year and are chosen from advanced probability, statistical inference, robust methods, and decision theory, with principal emphasis on applications.

  • STAT9915 - Sem in Adv Appl of Stat

    This semester-long course explores the forefront of biomedical data science, focusing on the computational challenges in analyzing single-cell and spatial genomic data. Structured into six in-depth modules, the course blends lectures and journal club discussions to cover a range of topics in the field. Students will engage with current research in the area, develop critical data modeling skills, and learn to critique and enhance existing methodologies. Designed for students from both computational and biomedical backgrounds, the course provides a springboard into research topics in single-cell and spatial genomics, equipping students with the tools to frame scientific problems computationally and to rigorously evaluate computational methods.

  • STAT9950 - Dissertation

    Dissertation

  • STAT9990 - Independent Study

    Written permission of instructor and the department course coordinator required to enroll.

  • STAT9999 - Independent Study

    Written permission of instructor and the department course coordinator required to enroll.

Awards And Honors

  • IMS Medallion Lecture, 2021
  • Sloan Fellowship, 2011
  • New World Silver Medal for Best PhD Thesis in Mathematical Sciences, 2007


Activity

Latest Research

Somabha Mukherjee, Divyansh Agarwal, Nancy Zhang, Bhaswar B. Bhattacharya (2022), Distribution-free multisample test based on optimal matching with applications to single cell genomics, Journal of the American Statistical Association, 117 (538), pp. 627-638.