Academic Positions

  • 2012-Present

    Associate Professor

    George Mason University, Department of Statistics

  • 2006-2012

    Assistant Professor

    George Mason University, Department of Statistics

  • 2006

    Postdoctoral Fellow

    University of North Carolina at Chapel Hill, Department of Biostatistics

Education

  • Ph.D. December, 2005

    Biostatistics

    University of North Carolina at Chapel Hill

  • M.S. July, 2001

    Probability and Statistics

    University of Science and Technology of China

  • B.E. July, 1998

    Electronic Engineering and Information Science

    University of Science and Technology of China

Honors and Awards

  • 2009
    Travel Award (ENAR Junior Researcher Workshop)
  • 2008
    Travel Award (IMS Young Researcher Conference)
  • 2006
    Barry H. Margolin Dissertation Award (UNC-CH)
  • 2005
    Graduate School Transportation Grant (UNC-CH)
  • 2004
    Distinguished Student Paper Award (ENAR)

Density ratio model for multivariate outcomes

Scott Marchese, Guoqing Diao
Journal Paper Journal of Multivariate Analysis, Volume 154, February 2017, Pages 249-261

Abstract

The Density Ratio Model is a semi-parametric regression model which allows analysis of data from any exponential family without making a parametric distribution assumption. For univariate outcomes several authors have shown desirable properties of this model including robustness to mis-specification and efficiency of the estimators within a suitable class. In this paper we consider analysis of multivariate outcomes with this model, where each marginal distribution is from an exponential family. We show that the model successfully analyzes data from mixed outcome types (continuous, integer, binary), providing valid tests of the joint effects of covariates. Furthermore, for continuous outcomes we provide a bootstrap technique which correctly estimates the underlying marginal regression parameters and provides appropriate coverage probabilities without specifying the covariance structure. The methods are demonstrated via simulation studies and analysis of healthcare data.

Joint regression analysis of mixed-type outcome data via efficient scores

Scott Marchese, Guoqing Diao
Journal Paper Computational Statistics & Data Analysis, Volume 125, September 2018, Pages 156-170

Abstract

Joint analysis of multivariate outcomes composed of mixed data types (continuous, count, binary, survival, etc.) induces special complexity in model specification and analysis. When the scientific question of interest involves a joint effect of covariate(s) of interest on the set of outcome variables, specifying a full probability model may be infeasible, undesirably complex, or computationally intractable. A flexible method to estimate and conduct inference on such joint effects is presented which accounts for correlation among the outcomes without needing to explicitly specify their joint distribution. Simulation studies and an analysis of health care data illustrate the approach and its operating characteristics vis-à-vis other methods.

Joint factor and regression analyses of multivariate ordinal data - Application to psychiatric assessments

Guoqing Diao, Srikanth Gottipati, Peter Zhang
Manuscript October 2018, Pages 1-26

Semiparametric frailty models for zero-inflated event count data in the presence of informative dropout

Guoqing Diao, Donglin Zeng, Kuolung Hu, Joseph G Ibrahim
Manuscript July 2018, Pages 1-30

Robust big data analytics via divergences

Anand N Vidyashankar, Lei Li, Guoqing Diao, Ejaz Ahmed
Manuscript July 2018, Pages 1-30

Efficient methods for signal detection from correlated adverse events in clinical trials

Guoqing Diao, Guanghan F Liu, Donglin Zeng, William Wang, Xianming Tan, Joseph F Heyse, Joseph G Ibrahim
Manuscript July 2018, Pages 1-30

Semiparametric regression analysis for composite endpoints subject to componentwise censoring

Guoqing Diao, Donglin Zeng, Chunlei Ke, Haijun Ma, Qi Jiang, Joseph G. Ibrahim
Journal Paper Biometrika, Volume 105, Issue 2, June 2018, Pages 403-418

Abstract

Composite endpoints with censored data are commonly used as study outcomes in clinical trials. For example, progression-free survival is a widely used composite endpoint, with disease progression and death as the two components. Progression-free survival time is often defined as the time from randomization to the earlier occurrence of disease progression or death from any cause. The censoring times of the two components could be different for patients not experiencing the endpoint event. Conventional approaches, such as taking the minimum of the censoring times of the two components as the censoring time for progression-free survival time, may suffer from efficiency loss and could produce biased estimates of the treatment effect. We propose a new likelihood-based approach that decomposes the endpoints and models both the progression-free survival time and the time from disease progression to death. The censoring times for different components are distinguished. The approach makes full use of available information and provides a direct and improved estimate of the treatment effect on progression-free survival time. Simulations demonstrate that the proposed method outperforms several other approaches and is robust against various model misspecifications. An application to a prostate cancer clinical trial is provided.

A class of semiparametric cure models with current status data

Guoqing Diao, Ao Yuan
Journal Paper Lifetime Data Analysis, Feb 2018 (online), Pages 1-26

Abstract

Current status data occur in many biomedical studies where we only know whether the event of interest occurs before or after a particular time point. In practice, some subjects may never experience the event of interest, i.e., a certain fraction of the population is cured or is not susceptible to the event of interest. We consider a class of semiparametric transformation cure models for current status data with a survival fraction. This class includes both the proportional hazards and the proportional odds cure models as two special cases. We develop efficient likelihood-based estimation and inference procedures. We show that the maximum likelihood estimators for the regression coefficients are consistent, asymptotically normal, and asymptotically efficient. Simulation studies demonstrate that the proposed methods perform well in finite samples. For illustration, we provide an application of the models to a study on the calcification of the hydrogel intraocular lenses.

Biomarker threshold adaptive designs for survival endpoints

Guoqing Diao, Jun Dong, Donglin Zeng, Chunlei Ke, Alan Rong, Joseph G. Ibrahim
Journal Paper Journal of Biopharmaceutical Statistics, February 2018 (published online), Pages 1-17

Abstract

Due to the importance of precision medicine, it is essential to identify the right patients for the right treatment. Biomarkers, which have been commonly used in clinical research as well as in clinical practice, can facilitate selection of patients with a good response to the treatment. In this paper, we describe a biomarker threshold adaptive design with survival endpoints. In the first stage, we determine subgroups for one or more biomarkers such that patients in these subgroups benefit the most from the new treatment. The analysis in this stage can be based on historical or pilot studies. In the second stage, we sample subjects from the subgroups determined in the first stage and randomly allocate them to the treatment or control group. Extensive simulation studies are conducted to examine the performance of the proposed design. Application to a real data example is provided for implementation of the first-stage algorithms.

Quantification of muscle tissue properties by modeling the statistics of ultrasound image intensities using a mixture of Gamma distributions in children with and without cerebral palsy

Siddhartha Sikdar, Guoqing Diao, Diego Turo, Christopher J. Stanley, Abhinav Sharma, Amy Chambliss, Loretta Laughrey, April Aralar, Diane L. Damiano
Journal Paper Journal of Ultrasound in Medicine, Volume 37, Issue 9, September 2018, Pages 2157-2169

Abstract

Objectives

To investigate whether quantitative ultrasound (US) imaging, based on the envelope statistics of the backscattered US signal, can describe muscle properties in typically developing children and those with cerebral palsy (CP).

Methods

Radiofrequency US data were acquired from the rectus femoris muscle of children with CP (n = 22) and an age‐matched cohort without CP (n = 14) at rest and during maximal voluntary isometric contraction. A mixture of gamma distributions was used to model the histogram of the echo intensities within a region of interest in the muscle.

Results

Muscle in CP had a heterogeneous echo texture that was significantly different from that in healthy controls (P < .001), with larger deviations from Rayleigh scattering. A mixture of 2 gamma distributions showed an excellent fit to the US intensity, and the shape and rate parameters were significantly different between CP and control groups (P < .05). The rate parameters for both the single gamma distribution and mixture of gamma distributions were significantly higher for contracted muscles compared to resting muscles, but there was no significant interaction between these factors (CP and muscle contraction) for a mixed‐model analysis of variance.

Conclusions

Ultrasound tissue characterization indicates a more disorganized architecture and increased echogenicity in muscles in CP, consistent with previously documented increases in fibrous infiltration and connective tissue changes in this population. Our results indicate that quantitative US can be used to objectively differentiate muscle architecture and tissue properties.

Modeling event count data in the presence of informative dropout with application to bleeding and transfusion events in myelodysplastic syndrome

Guoqing Diao, Donglin Zeng, Kuolung Hu, Joseph G. Ibrahim
Journal Paper Statistics in Medicine, September 2017, Volume 36, Issue 22, Pages 3475-3494

Abstract

In many biomedical studies, it is often of interest to model event count data over the study period. For some patients, we may not be able to follow them for the entire study period owing to informative dropout. The dropout time can potentially provide valuable insight on the rate of the events. We propose a joint semiparametric model for event count data and informative dropout time that allows for correlation through a Gamma frailty. We develop efficient likelihood‐based estimation and inference procedures. The proposed nonparametric maximum likelihood estimators are shown to be consistent and asymptotically normal. Furthermore, the asymptotic covariances of the finite‐dimensional parameter estimates attain the semiparametric efficiency bound. Extensive simulation studies demonstrate that the proposed methods perform well in practice. We illustrate the proposed methods through an application to a clinical trial for bleeding and transfusion events in myelodysplastic syndrome.

Analysis of Secondary Phenotype Data under Case-Control Designs

Guoqing Diao, Donglin Zeng, Dan-Yu Lin
Book Chapter in Handbook of Statistical Methods for Case-Control Studies | CRC Press | June 27, 2018

Chapter 28. Analysis of Secondary Phenotype Data under Case-Control Designs

Although the primary objective of case-control studies is to assess the effects of genetic variants between cases and controls, secondary phenotypes are often collected in such studies without much extra cost. For example, in the Diabetes Genetics Initiative (DGI) study, there were 1,464 patients with type 2 diabetes and 1,467 controls from Finland and Sweden, while at the same time, a variety of secondary phenotype traits were available for these patients, including anthropometric measures, glucose tolerance and insulin secretion, lipids and apolipoproteins, and blood pressure. These secondary phenotypes are typically the exposures/risk-factors of interest for the main outcome. In the Wellcome Trust Case Control Consortium (WTCCC), a case-control study consisting of 1,924 U.K. type-2 diabetes patients and 2,938 U.K. population controls, body mass index (BMI) and adult height were also measured as secondary traits in the study. With the availability of secondary phenotype information, it is cost-effective to study the association between genetic variants and these additional traits without the need to conduct new studies. Indeed, the DGI study identified association of a particular single nucleotide polymorphism (SNP) in an intron of glucokinase regulatory protein with serum triglycerides in both case and control groups.

Current Teaching

  • 2015-Present

    STAT778: Algorithms and Simulation for Statistics in C

Teaching History

  • 2014

    STAT771: Spatial Data Analysis

  • 2011-2015

    STAT668: Survival Analysis

  • 2012

    STAT655: Analysis of Variance

  • 2007-2008

    STAT652: Statistical Inference

  • 2007

    STAT660: Biostatistical Methods

  • 2008-2009

    STAT554: Applied Statistics

  • 2011

    STAT554: Applied Statistics

  • 2013

    STAT505: Introduction to R

  • 2012-2013

    STAT362: Introduction to Computer Statistical Packages

  • 2013

    STAT665: Categorical Data Analysis

  • 2014

    STAT673/773: Statistical Methods for Longitudinal Data Analysis

  • 2007-2008

    STAT673/773: Statistical Methods for Longitudinal Data Analysis

At My Office

You can find me at my office located in Room 1709 of the Nguyen Engineering Building at George Mason University. Please email me to schedule an appointment before you stop by.

Coffee Break

I would be happy to talk with you about potential research collaborations at Panera Bread, my favorite coffee shop on campus.



