Modern Inference Based on Health-Related Markers: Biomarkers and Statistical Decision Making

Ebook · 722 pages · 6 hours
About this ebook

Modern Inference Based on Health-Related Markers: Biomarkers and Statistical Decision Making provides a compendium of biomarker-based methodologies for the respective health-related fields, together with marker-specific biostatistical techniques. These methodologies may be applied to various problems encountered in medical and epidemiological studies.

This book introduces correct and efficient testing mechanisms, including procedures based on bootstrap and permutation methods, with the aim of making these techniques accessible to practical researchers. On the biostatistical side, it describes how to correctly state testing problems, and it also includes novel results that have appeared in current statistical publications. The book also discusses modern applied statistical developments that consider data-driven techniques, including empirical likelihood methods and other simple and efficient methods for deriving statistical tools for use in health-related studies.

The title is a valuable source for biostatisticians, practitioners, theoretical and applied investigators, and members of the biomedical field who are interested in learning more about efficient evidence-based inference incorporating various forms of marker measurements.

  • Combines modern epidemiological and public health discoveries with cutting-edge biostatistical tools, including relevant software code, offering one full package to meet the demands of practical investigators
  • Includes emerging topics from real health fields to display recent advances and trends in biomarkers and associated decision-making areas
  • Written by researchers who are leaders in the epidemiological and biostatistical fields, presenting up-to-date investigations related to measuring health outcomes, emerging fields of biomarkers, the design and implementation of health studies, the practice and application of clinical trials, and different aspects of genetic markers
Language: English
Release date: March 18, 2024
ISBN: 9780128152485

    Modern Inference Based on Health-Related Markers

    Biomarkers and Statistical Decision Making

    Edited by

    Albert Vexler

    Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Jihnhee Yu

    Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Jiaojiao Zhou

    Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Table of Contents

    Cover image

    Title page

    Copyright

    Contributors

    Preface

    Chapter 1. An array of statistical concepts and tools for handling challenging data

    1. Preliminaries and basic components of relevant statistical instrument assortment

    2. Statistical approaches for problems of biomarker measurements

    3. A maximum likelihood approach to analyzing incomplete longitudinal data exemplified by mice tumor development

    4. Evaluating the effectiveness of interventions based on unbalanced data

    Appendix

    Chapter 2. A review of expected P-values and their applications in biomarkers studies

    1. Introduction

    2. The EPV in the context of an ROC curve analysis

    3. Multiple testing problems

    4. Examples

    5. Monte Carlo study

    6. Real data example

    7. Discussion

    Appendix

    Chapter 3. Latent class modeling approaches for studying the effects of chemical mixtures on disease risk

    1. Introduction

    2. Latent class model

    3. Latent class model for bivariate chemical patterns

    4. A latent function approach

    5. Discussion

    Chapter 4. Incomplete data in health studies

    1. Introduction

    2. Types of missing data

    3. Methods

    4. Statistical results

    5. Discussion

    Chapter 5. An introduction to biomarkers in translational research (2023)

    1. Introduction

    2. Biomarker discovery

    3. Biomarker validation

    4. Clinical research involving biomarker-derived targeted therapies

    5. Guidelines and conclusions

    Chapter 6. Collection and handling of biomarkers of inorganic arsenic exposure in statistical analyses

    1. Introduction

    2. Types of biomarkers of inorganic arsenic exposure

    3. Adjustments of urine dilution for urinary inorganic arsenic concentrations

    4. Data analysis

    5. Conclusion

    Chapter 7. Efficient sample pooling strategies for COVID-19 data gathering

    1. Introduction

    2. The Fisher information of pooled sampling

    3. Optimization

    4. Conclusions and discussion

    Chapter 8. Implications of childhood neighborhood quality for young adult parasympathetic reactivity

    1. Introduction

    2. The current study

    3. Method

    4. Results

    5. Discussion

    Chapter 9. Application of adaptive designs in clinical research

    1. Introduction

    2. Covariate adaptive randomization

    3. Response adaptive randomization

    4. Discussion

    Chapter 10. Comparison of multivariate pooling strategies based on skewed data in light of the receiver operating characteristic curve analysis

    1. Introduction

    2. Estimations of parameters based on pooled and unpooled data

    3. ROC curves

    4. Bivariate markers: Parameters' estimation based on pooled and unpooled data

    5. Best combination of biomarkers

    6. Monte Carlo simulations

    7. Real data analysis

    8. Discussion

    Appendix

    Chapter 11. ROC methods in biomarker development

    1. Introduction

    2. The bivariate-ROC model

    3. Other ROC models

    Chapter 12. Introduction of diffusion tensor imaging data: An overview for novice users

    1. Introduction

    2. Topics

    3. Concluding remarks

    Appendix

    Chapter 13. Genome-driven cancer site characterization: An overview of the hidden genome model

    1. Introduction

    2. The hidden genome model: An overview

    3. Data analysis

    4. Discussion

    Appendix

    Chapter 14. Thalamic volumetry via deep learning as an imaging biomarker in multiple sclerosis

    1. The thalamus as a potential biomarker

    2. Deep learning system and training data

    3. Validation of the proposed biomarker

    4. Implications and discussion

    Index

    Copyright

    Academic Press is an imprint of Elsevier

    125 London Wall, London EC2Y 5AS, United Kingdom

    525 B Street, Suite 1650, San Diego, CA 92101, United States

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    Copyright © 2024 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    ISBN: 978-0-12-815247-8

    For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Stacy Masucci

    Acquisitions Editor: Linda Buschman

    Editorial Project Manager: Barbara Makinster

    Production Project Manager: Sajana Devasi P K

    Cover Designer: Matthew Limbert

    Typeset by TNQ Technologies

    Contributors

    Paul S. Albert,     Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, United States

    Prince A. Allotey,     Department of Statistics, University of Connecticut, Storrs, CT, United States

    Allison A. Appleton,     Department of Epidemiology and Biostatistics, University at Albany, State University of New York, Albany, NY, United States

    Kristopher Attwood,     Department of Biostatistics & Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States

    Niels Bergsland,     Buffalo Neuroimaging Analysis Center, Department of Neurology, School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, United States

    Charles Bernick,     Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States

    Saptarshi Chakraborty,     Department of Biostatistics, University at Buffalo, State University of New York, Buffalo, NY, United States

    Zhen Chen,     Biostatistics & Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, MD, United States

    Gauri Desai,     Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, The State University of New York (SUNY) at Buffalo, Buffalo, NY, United States

    Carolee Dodge Francis,     Department of Civil Society and Community Studies, School of Human Ecology, University of Wisconsin–Madison, Madison, WI, United States

    Michael Dwyer

    Buffalo Neuroimaging Analysis Center, Department of Neurology, School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, United States

    Jacobs MS Center, Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, United States

    Beth J. Feingold,     Department of Environmental Health Sciences, University at Albany, State University of New York, Albany, NY, United States

    Xinyu Gao,     Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Ofer Harel,     Department of Statistics, University of Connecticut, Storrs, CT, United States

    Elizabeth A. Holdsworth,     Department of Anthropology, The Ohio State University, Columbus, OH, United States

    Xuan Hong,     Department of Thoracic Oncology, Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, China

    Sung Duk Kim,     Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, United States

    Katarzyna Kordas,     Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, The State University of New York (SUNY) at Buffalo, Buffalo, NY, United States

    Victoria Ledsham,     Department of Psychology, University at Albany, State University of New York, Albany, NY, United States

    Betty Lin,     Department of Psychology, University at Albany, State University of New York, Albany, NY, United States

    Jingxia Liu,     Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine, St. Louis, MO, United States

    Jeffrey C. Miecznikowski,     Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Austin Miller,     Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States

    Soyun Park,     Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Guogen Shan,     Department of Biostatistics, University of Florida, Gainesville, FL, United States

    Michael Sill,     Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States

    Istvan Szapudi,     Institute for Astronomy, University of Hawaii, Honolulu, HI, United States

    Marie Vahter,     Karolinska Institutet, Stockholm, Sweden

    Albert Vexler,     Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    David Vexler,     Williamsville East High School, East Amherst, NY, United States

    Jihnhee Yu,     Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Jiaojiao Zhou,     Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Robert Zivadinov

    Buffalo Neuroimaging Analysis Center, Department of Neurology, School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, United States

    Jacobs MS Center, Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, United States

    Preface

    Biomarkers are highly significant in the field of medicine as they serve as crucial biological or medical traits that can be measured objectively. They act as indicators, providing valuable insights into the normal functioning of biological processes or the presence of pathological conditions. Additionally, biomarkers can effectively demonstrate the response of an individual to various therapies or treatments. The integration of biomarkers into medical practice has been steadily increasing. This integration holds immense promise for transforming the way health care is approached. By harnessing the power of biomarkers, healthcare professionals can gain deeper insights into the underlying mechanisms of diseases and tailor treatments more precisely to individual patients. Within the realm of biomarkers, certain types have the remarkable capability to act as early surrogates, providing valuable insights into the eventual clinical outcomes of individuals. Moreover, biomarkers play a significant role in guiding therapeutic decision-making processes. They provide crucial information that enables healthcare providers to identify individuals who are likely to respond positively to specific therapies. This personalized approach to treatment is highly beneficial, as it helps to optimize the allocation of healthcare resources and ensures that patients receive the most appropriate interventions for their specific condition.

    Scientifically rigorous and relevant statistical instruments play a central role in studies that revolve around biomarkers. In this era of advancing research and health care, there has been an exponential increase in the demand for statistical mechanisms that are specifically tailored to address health-related issues. These statistical tools are essential for analyzing complex data sets, drawing meaningful conclusions, and making informed decisions in biomarker research. The field of biomarker research is constantly growing, with new methodologies and practical approaches being developed. Given this dynamic landscape, there is a pressing need for scholarly works that serve as comprehensive compilations of the current state of research and practice in biomarker-related fields. Such compilations help to consolidate knowledge, share insights, and provide a holistic view of the advancements in the field. These scholarly works address modern developments and recent emerging trends and issues in biomarker research. Additionally, they serve as valuable resources for researchers and practitioners seeking to stay up to date with the rapidly advancing field of biomarkers.

    Our objective is to present scholarly works that highlight the current utilization and potential of biomarkers in health applications and biostatistics, placing them at the forefront of biomarker research. We place particular emphasis on the significance of different types of biomarkers, addressing the diverse range of modern biomarkers in terms of their forms, structures, and their efficient utilization, along with the corresponding biostatistical implementations. These insights can effectively guide decision-making procedures in the context of health care. This book provides extensive, data-driven examples of biomarker measurements and their applications across various health-related domains. We delve into how advancements in different biomarkers can be applied in medical studies and drug development, targeting improved diagnostics, treatment strategies, and therapeutic interventions. It is important to note that while this book covers limited aspects of biomarker-based studies through several selected topics, its primary purpose is to serve as an introduction and incentive to explore this important field further. By conveying the necessities and significance of biomarker research, we aim to inspire readers to embark on a deeper exploration of this promising and rapidly advancing field.

    Despite the growing demand for biomarker-based discoveries, it has become apparent that existing scholarly works and programs rarely introduce diverse research methods pertaining to biomarkers and relevant biostatistical tools. Given the extensive nature of biomarker studies, the material presented in this book is necessarily limited; however, our endeavor is to gather pertinent materials by leveraging the expertise of field investigators who specialize in epidemiology and biostatistics. By drawing on their outputs, we aim to provide a broad resource that bridges the gap in the literature. Our hope is that this book will inspire readers and convey the importance of embracing innovative biomarker-based investigations for the advancement of health-related science.

    Chapter 1: An array of statistical concepts and tools for handling challenging data

    Albert Vexler and Jihnhee Yu,     Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, United States

    Abstract

    The aim of this chapter is to present introductory statistical concepts and illustrate diverse statistical techniques that showcase accurate and effective methods for handling challenging data. Specifically, this chapter focuses on addressing various situations, including complete or incomplete data affected by different types of measurement errors (ME) and dealing with imbalanced survey results derived from national data sources.

    In clinical experiments, instrumental inaccuracies, biological variation, or errors in questionnaire-based self-report data can produce significant ME issues. Ignoring ME problems can cause bias or inconsistency in statistical decision-making schemes. We will focus on two sorts of MEs: additive model errors and errors related to the limit of detection (LOD), caused by the instrumental incapability of detecting low levels of biomarker measurements. Diverse statistical approaches have been created for analyzing data affected by MEs, including methods based on parametric/nonparametric likelihood principles, Bayesian analysis, single and multiple imputation techniques, and repeated-measurement experimental designs. In this framework, we first present a hybrid pooled–unpooled design as one of the strategies for evaluating data subject to MEs. This hybrid design and the classical techniques are compared to show the advantages and disadvantages of the considered methods. We note that the pooling technique currently receives increased attention due to its cost efficiency in testing large populations in pandemic settings. Second, to exemplify methods based on parametric likelihood principles relevant to the LOD, we consider in this chapter longitudinal mammary tumor development studies, where outcomes are affected by the following issues: (a) increases in missing data toward the end of the study; and (b) the presence of censored data caused by the detection limits of instrumental sensitivity. We show a test that carries out K-group comparisons based on the maximum likelihood approach. We apply the decision-making procedure using data on breast cancer in mice.

    Finally, we turn our attention to survey data, which may not be designed to measure the effect of certain factors or interventions accurately. National-level publicly available survey data sets are feasible sources for evaluating the public health impact of interventions and policies; we discuss procedures for accurate estimation and inference using such data sets. With respect to tools for analyzing unbalanced survey outcomes from national data resources, we examine an extensive range of data-balancing techniques. We also discuss linearization methods implemented via influence functions, which allow researchers to evaluate the variability of the relative risk in complex survey settings. As an application, we present an estimation of seasonal influenza vaccine effectiveness, showing robust inferential performance across different relevant model assumptions.

    Keywords

    Biomarkers; Detection limit; Hybrid design; Measurement error

    1. Preliminaries and basic components of relevant statistical instrument assortment

    Health-related data-based experiments involve mathematically formalized tests, employing appropriate and efficient statistical mechanisms. Mathematical strategies to make decisions via formal rules play important roles in medical and epidemiological discovery, in policy formulation, and in clinical practice.

    The aim of the scientific methods in decision-making theory is to simultaneously maximize quantified gains and minimize losses in reaching a conclusion. For example, statements of clinical experiments can require maximizing factors (gains) such as accuracy of diagnosis of medical conditions, faster healing, and greater patient satisfaction, while minimizing factors (losses) such as effort, duration of screening for disease, side effects, and costs of the study.

    Constructing statistical tests requires various constraints and formalisms. An essential part of the test-construction process is that statistical hypotheses should be clearly formulated with respect to the objectives of health-related studies.

    1.1. Statistical hypotheses

    Statistical hypotheses and the corresponding clinical hypotheses are associated but stated in different forms and orders. In most health-related experiments, we are interested in tests regarding characteristics or distributions of one or more populations. For example, suppose that the clinical hypothesis is that a treatment alters the prognosis of a disease. In this case, the statistical hypothesis to be tested should be that the treatment has no effect, i.e., that the treatment has an effect of zero.

    The term Null Hypothesis, symbolized H₀, is commonly used to denote our primary statistical hypothesis. For example, when the clinical hypothesis is that a biomarker of oxidative stress has different circulating levels in patients with and without atherosclerosis, a null hypothesis, H₀, can be proposed corresponding to the assumption that levels of the biomarker in individuals with and without atherosclerosis are distributed equally. Note that the clinical hypothesis reflects that we want to demonstrate the discriminating power of the biomarker, whereas H₀ states that there is no significant association between the disease and the biomarker's levels. The aim is to formulate H₀ clearly and unambiguously, as well as to quantify and calculate expected errors in decision-making procedures. If H₀ were stated in a manner similar to the clinical hypothesis, we probably could not unambiguously determine which links between the disease and the biomarker's levels we should test.

    1.2. Types of statistical errors related to the decision-making procedures

    The null hypothesis is usually the main focus when formulating statistical tests. The statistical testing procedure results in a decision to reject or not reject the null hypothesis. In this context, to provide a formal test procedure, schemes for controlling the test characteristics associated with the probability of rejecting a correct hypothesis should be treated. To this end, we define the statistical power of a test as the probability that H₀ is correctly rejected when H₀ is false. In general, while developing and applying test procedures, the practical statistician faces the task of controlling the probability of the event that a test's outcome requests to reject H₀ when in fact H₀ is correct, a Type I Error. For example, assume that T is the test statistic based on the observed data, C is a threshold, and the decision rule is to reject H₀ for large values of T, i.e., when T > C; then, the threshold should be defined such that Pr(T > C | H₀) = α, where α is a prefixed significance level, i.e., the probability of committing a Type I error. Note that when we compare two statistical tests, we mean to compare the powers of the tests, given that the rate of Type I error is fixed.
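To make the threshold calibration concrete, the following minimal Monte Carlo sketch (a hypothetical illustration, not code from the book; the upper-tailed one-sample z-test, sample size, and simulation count are our own choices) checks that rejecting H₀ when T > C with C = z₀.₉₅ ≈ 1.645 yields a Type I error rate close to α = 0.05:

```python
import math
import random

def z_statistic(sample, mu0=0.0, sigma=1.0):
    """T = sqrt(n) * (sample mean - mu0) / sigma; standard normal under H0."""
    n = len(sample)
    return math.sqrt(n) * (sum(sample) / n - mu0) / sigma

def type_i_error_rate(n=30, n_sim=20000, seed=1):
    """Estimate Pr(T > C | H0) for the upper-tailed z-test with C = z_0.95."""
    rng = random.Random(seed)
    c = 1.6448536269514722  # threshold C chosen so that Pr(Z > C) = alpha = 0.05
    rejections = 0
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # data generated under H0
        if z_statistic(sample) > c:
            rejections += 1
    return rejections / n_sim
```

Running `type_i_error_rate()` returns an empirical rejection rate close to the nominal α = 0.05, illustrating that the threshold, not the data, fixes the Type I error rate.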

    The practitioner may be interested in considering another, related type of error in statistical testing procedures. If H₀ is false but fails to be rejected, the incorrect decision of not rejecting H₀ is called a Type II error. The Type II error rate can be defined as β = Pr(T ≤ C | H₀ is false), when we assume that T is the test statistic based on the observed data, C is a threshold, and the decision rule is to reject H₀ when T > C. Type II errors may occur when the effect size, biases in testing procedures, and random variability combine to lead to results insufficiently inconsistent with H₀ to reject it (Freiman et al., 1978). Essentially, it is the dichotomization of the study results into the categories "significant" or "not significant" that leads to Type I and Type II errors.
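For the same upper-tailed z-test, β and the power 1 − β have closed forms. The sketch below (our own illustrative code, with α fixed at 0.05 and the function names hypothetical) evaluates Power = 1 − Φ(C − √n·δ/σ) for a true mean shift δ, showing how power grows with effect size and sample size:

```python
import math

Z_095 = 1.6448536269514722  # upper 5% point of N(0, 1), so alpha = 0.05

def normal_cdf(x):
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_upper_z_test(delta, n, sigma=1.0):
    """Power = Pr(T > C | true mean shift delta) = 1 - Phi(C - sqrt(n) * delta / sigma)."""
    return 1.0 - normal_cdf(Z_095 - math.sqrt(n) * delta / sigma)

def type_ii_error(delta, n, sigma=1.0):
    """beta = Pr(T <= C | H0 false) = 1 - power."""
    return 1.0 - power_upper_z_test(delta, n, sigma)
```

At δ = 0 the "power" reduces to the Type I error rate α = 0.05, which makes explicit that power and Type I error are two faces of the same rejection rule.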

    1.3. P-values

    The frequentist decision-making procedure defines a test threshold and rejects or does not reject H₀ based on a comparison between the value of the test statistic and the threshold. An alternative approach to hypothesis testing is to obtain the P-value.

    As a continuous measure of the compatibility between a hypothesis and data, a P-value is defined as the probability of obtaining a test statistic (a corresponding quantity computed from the data, such as a t-statistic) at least as extreme as the one that was actually observed, assuming that H₀ is true (Goodman, 1999). P-values can be divided into two major types: one-sided (upper and lower) and two-sided. Assuming there are no biases in the data collection or the data analysis procedure, an upper one-sided P-value is the probability under the test hypothesis that the test statistic will be no less than the observed value. Similarly, a lower one-sided P-value is the probability under the test hypothesis that the test statistic will be no greater than the observed value. The two-sided P-value is defined as twice the smaller of the upper and lower P-values (Rothman et al., 2008; Goodman, 1999; Berger and Delampady, 1987).
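Assuming a test statistic that is standard normal under H₀, the three kinds of P-values can be computed directly from these definitions; the helper below is a hypothetical illustration (the function name is our own):

```python
import math

def normal_cdf(x):
    """Standard normal CDF Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_values(t_obs):
    """Upper, lower, and two-sided P-values for a statistic that is N(0,1) under H0."""
    p_upper = 1.0 - normal_cdf(t_obs)          # Pr(T >= t_obs | H0)
    p_lower = normal_cdf(t_obs)                # Pr(T <= t_obs | H0)
    p_two_sided = 2.0 * min(p_upper, p_lower)  # twice the smaller one-sided P-value
    return p_upper, p_lower, p_two_sided
```

For an observed statistic of 1.96, for instance, the two-sided P-value is approximately 0.05, the familiar boundary of "statistical significance" at the 5% level.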

    If the P-value is small, it can be interpreted that the sample produced a very rare result under H₀, i.e., the sample result is inconsistent with the null hypothesis statement. On the other hand, a large P-value indicates the consistency of the sample result with the null hypothesis. At the prespecified significance level α, the decision is to reject H₀ when the P-value is less than or equal to α; otherwise, the decision is to not reject H₀. Therefore, the P-value is the smallest level of significance at which H₀ would be rejected. In addition to providing a decision-making mechanism, the P-value also sheds some light on the strength of the evidence against H₀ (Gibbons and Chakraborti, 2011).

    Misinterpretations of P-values are common in clinical trials and epidemiology. In one of the most common misinterpretations, P-values are erroneously defined as the probabilities of test hypotheses. In many situations, the probability of the test hypothesis can be computed, but it will almost always be far from the two-sided P-value (Rothman et al., 2008). Note that the P-value can be viewed as a random variable, uniformly distributed between 0 and 1 if the null hypothesis is true. For example, suppose that the test statistic T has a cumulative distribution function (CDF) F₀ under H₀ and a CDF F₁ under a one-sided upper-tailed alternative H₁. Then, the P-value is the random variable P = 1 − F₀(T), which is uniformly distributed under H₀ (Sackrowitz and Samuel-Cahn, 1999).
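The uniformity of P = 1 − F₀(T) under H₀ is easy to verify by simulation. In this hypothetical sketch (our own code, with F₀ taken to be the standard normal CDF), we draw the test statistic under H₀ and compute the corresponding P-values:

```python
import math
import random

def normal_cdf(x):
    """F0: the standard normal CDF of the test statistic under H0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_null_p_values(n_sim=20000, seed=7):
    """Draw T ~ N(0, 1) under H0 and return the P-values P = 1 - F0(T)."""
    rng = random.Random(seed)
    return [1.0 - normal_cdf(rng.gauss(0.0, 1.0)) for _ in range(n_sim)]
```

Under H₀ the returned values behave like a Uniform(0, 1) sample: their mean is close to 0.5, and about 5% of them fall at or below 0.05, which is exactly why rejecting when P ≤ α controls the Type I error rate at α.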

    1.4. Parametric approach

    A clinical statistician may use a sort of technical statement related to the observed data while constructing the corresponding decision rules. The above-mentioned types of information used for test construction can induce the technical statements, which are often called assumptions regarding the distribution of data. The assumptions often define a fit of the data distribution to a functional form that is completely known, or known up to parameters, since complete knowledge of the distribution of data can provide all the information investigators need for efficient applications of statistical techniques. However, in many scenarios, the assumptions are presumed and very difficult to prove, or to test for being proper. The simple, but widely used, assumptions in biostatistics are that data derived via a clinical study follow one of the commonly used distribution functions: the Normal, LogNormal, t, χ², Gamma, F, Binomial, Uniform, Wishart, and Poisson. The data distribution function can be defined up to parameters (Lindsey, 1996). For example, the normal distribution N(μ, σ²) is the famous bell curve, where the parameters μ and σ² represent the mean and variance of the population from which the data were sampled. The values of the parameters μ and σ² may be assumed to be unknown. Mostly, in such cases, assumed functional forms of the data distributions are involved in making statistical decision rules via the use of statistics, which we name Parametric Statistics. If certain key assumptions are met, parametric methods can yield very simple, efficient, and powerful inferences.
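As a small parametric illustration (our own sketch, not code from the chapter), when the N(μ, σ²) assumption is adopted, the unknown parameters are typically estimated by maximum likelihood, and for the normal model the estimators have closed forms:

```python
def normal_mle(sample):
    """Closed-form maximum likelihood estimates (mu_hat, sigma2_hat) for N(mu, sigma^2)."""
    n = len(sample)
    mu_hat = sum(sample) / n
    # the ML variance estimator divides by n (not n - 1, as the unbiased estimator does)
    sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n
    return mu_hat, sigma2_hat
```

These estimates can then be plugged into the assumed functional form to build parametric decision rules; their efficiency, of course, depends on the normality assumption actually holding.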

    1.5. Nonparametric approach

    The statistical literature has widely addressed the issue that parametric methods are often very sensitive to moderate violations of parametric assumptions, and hence nonrobust (Freedman, 2009). The parametric assumptions can be tested in order to reduce the risk of applying a misleading parametric approach. Note that in order to test for parametric assumptions, a goodness of fit test, outlined in a later section of this chapter, can be applied. In this case, statisticians can try to verify the assumptions while making decisions with respect to the main objectives of the clinical study. This leads to very complicated topics dealt with in the multiple testing literature. For example, it turns out that a computation of the expected risk of making a wrong decision strongly depends on the errors that can be made by not rejecting the parametric assumptions. The complexity of this problem can increase when researchers examine various functional forms to fit the data distribution in order to apply parametric methods. A substantial body of theoretical and experimental literature has discussed the pitfalls of multiple testing, placing blame squarely on the shoulders of the many clinical investigators who examine their data before deciding how to analyze them, or neglect to report the statistical tests that may not have supported their objectives (Austin et al., 2006). In this context, one can present different examples, both hypothetical and actual, to get to the heart of issues that especially arise in the health-related sciences. Note, also, that in many situations, due to the wide variety and complex nature of problematic real data (e.g., incomplete data subject to instrumental limitations of studies), statistical parametric assumptions are hardly satisfied, and their relevant formal tests are complicated or not readily available (Vexler et al., 2015).
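    As a simple illustration of checking a parametric assumption, the following sketch computes a Kolmogorov-Smirnov-type distance between a sample and a fitted normal distribution. The data are hypothetical, and in practice the critical values must account for the estimated parameters (e.g., via the Lilliefors correction), so this is only a rough diagnostic.

```python
# Sketch: a Kolmogorov-Smirnov-type statistic for checking normality
# before applying parametric methods (illustration only; the null
# distribution of this statistic is affected by estimating mu and sigma).
import math

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_normal_stat(sample):
    # Max distance between the empirical CDF and the fitted normal CDF
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    zs = sorted((x - xbar) / s for x in sample)
    d = 0.0
    for i, z in enumerate(zs):
        f = phi(z)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

# A roughly symmetric sample vs. a heavily skewed one (hypothetical values).
sym = [4.9, 5.1, 5.0, 5.2, 4.8, 5.3, 4.7, 5.05, 4.95, 5.15]
skew = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 5.0, 7.5, 9.0, 12.0]
print(ks_normal_stat(sym) < ks_normal_stat(skew))  # skewed data fit worse
```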

    Unfortunately, even clinical investigators trained in statistical methods do not always verify the relevant parametric assumptions, or attend to the probabilistic errors of that verification, when they use well-known elementary parametric statistical methods, e.g., the t-tests.

    Thus, when the key assumptions are not met, the parametric approach may be extremely biased and inefficient compared to its robust nonparametric counterparts. Statistical inference under the nonparametric regime offers decision-making procedures that avoid or minimize the use of assumptions regarding functional forms of the data distributions.

    In general, the balance between parametric and nonparametric approaches can boil down to expected efficiency versus robustness to assumptions. One very important issue is preserving the efficiency of statistical techniques through the use of robust nonparametric likelihood methods, minimizing required assumptions about data distributions (Gibbons and Chakraborti, 2011; Wilcox, 2011).

    1.6. Receiver operating characteristic curve analysis

    The receiver operating characteristic (ROC) curves are useful visualization tools for illustrating the discriminant ability of biomarkers to distinguish between two populations: diseased and nondiseased. The ROC curve methodology was originally developed in radar signal detection theory, and has been extensively employed in psychological and, most importantly, medical research and epidemiology (Green and Swets, 1966).

    Assume, without loss of generality, that X_1, ..., X_n and Y_1, ..., Y_m are biomarker measurements from the diseased and nondiseased populations, respectively. The observations X_1, ..., X_n are independent and identically distributed (say i.i.d.), and independent of the i.i.d. measurements Y_1, ..., Y_m. Let F and G denote the cumulative distribution functions of X_1 and Y_1, respectively. The ROC curve R(t) can be defined as R(t) = 1 - F(G^{-1}(1 - t)), for t in [0, 1] (Pepe, 1997). It plots the sensitivity (true positive rate, Pr(X_1 > c)) against 1 minus the specificity (the specificity, or true negative rate, being Pr(Y_1 ≤ c)) for various values of the threshold c.
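    The empirical ROC curve can be computed directly from the two samples by sweeping the threshold c over the observed values. The following sketch uses hypothetical biomarker measurements purely for illustration.

```python
# Sketch: the empirical ROC curve. Each threshold c yields a point
# (1 - specificity, sensitivity) = (Pr(Y > c), Pr(X > c)), estimated
# by the corresponding sample proportions.
def empirical_roc_points(diseased, nondiseased):
    thresholds = sorted(set(diseased) | set(nondiseased))
    points = [(1.0, 1.0)]  # c below all values: everyone classified positive
    for c in thresholds:
        fpr = sum(y > c for y in nondiseased) / len(nondiseased)
        tpr = sum(x > c for x in diseased) / len(diseased)
        points.append((fpr, tpr))
    return points

# Hypothetical biomarker values: diseased subjects tend to score higher.
diseased = [2.1, 3.4, 2.8, 4.0, 3.1]
nondiseased = [1.0, 1.9, 2.5, 0.8, 1.5]
for fpr, tpr in empirical_roc_points(diseased, nondiseased):
    print(fpr, tpr)
```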

    As an example, we consider three biomarkers with their corresponding ROC curves presented in Fig. 1.1. The underlying distributions are normal in each case: for biomarker A (the diagonal line) the diseased and nondiseased distributions coincide; for biomarker B (the dashed line) they are moderately separated; and for biomarker C (the dotted line) they are well separated.

    It can be shown that the farther apart the two distributions F and G fall, the more the ROC curve bends toward the top left corner. A perfect biomarker would have an ROC curve coming close to the top left corner, and a biomarker without discriminability would result in a diagonal ROC curve. We also observe that there exists a trade-off between specificity and sensitivity.

    There exists extensive research on estimating the ROC curves from the parametric and nonparametric perspectives (Pepe, 1997; Hsieh and Turnbull, 1996; Wieand et al., 1989).

    Area Under the ROC Curve: A rough idea of the performance of the biomarkers can be obtained from the ROC curve. However, judgments based solely on the ROC curves are far from enough to precisely describe the diagnostic accuracy of biomarkers. The area under the ROC curve (AUC) is a common index of the diagnostic performance of a continuous-scale biomarker for a binary disease status. It measures the ability to discriminate between the control and the disease groups (Pepe and Thompson, 2000; McIntosh and Pepe, 2002). Bamber noted that the area under this curve is equal to Pr(X_1 > Y_1) (Bamber, 1975). Values of AUCs can range from 0.5, in the case of no difference between the distributions, to 1, where the two distributions are perfectly discriminated. For wide discussions regarding evaluations of AUC-type objectives, see Kotz et al. (Kotz and Pensky, 2003).
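    Bamber's identity suggests a simple nonparametric AUC estimate: the proportion of diseased-nondiseased pairs in which the diseased measurement is larger, i.e., the Mann-Whitney statistic. A minimal sketch with hypothetical data:

```python
# Sketch: the nonparametric AUC estimate based on the identity
# AUC = Pr(X > Y). Ties are counted as 1/2, corresponding to
# Pr(X > Y) + Pr(X = Y) / 2 for data with ties.
def auc_mann_whitney(diseased, nondiseased):
    total = 0.0
    for x in diseased:
        for y in nondiseased:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(diseased) * len(nondiseased))

# Hypothetical measurements: good but imperfect separation.
diseased = [2.1, 3.4, 2.8, 4.0, 3.1]
nondiseased = [1.0, 1.9, 2.5, 0.8, 1.5]
print(auc_mann_whitney(diseased, nondiseased))  # 0.96 for these values
```

    An AUC of 0.5 is obtained when the two samples are indistinguishable, and 1 when every diseased value exceeds every nondiseased value.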

    Figure 1.1  ROC curves related to the biomarkers. The solid diagonal line corresponds to the ROC curve of biomarker A, for which the diseased and nondiseased distributions coincide. The dashed line displays the ROC curve of biomarker B, whose distributions are moderately separated. The dotted line close to the upper left corner plots the ROC curve for biomarker C, whose distributions are well separated.

    1.7. Remarks

    A wealth of additional applied and theoretical materials related to statistical decision-making procedures may be found in a variety of scientific publications (Rothman et al., 2008; Berger and Delampady, 1987; Gibbons and Chakraborti, 2011; Lindsey, 1996; Freedman, 2009; Wilcox, 2011; Lehmann et al., 2005; Riffenburgh, 2012).

    2. Statistical approaches for problems of biomarker measurements

    2.1. Overview of issues related to biomarker measurements

    In clinical trials that involve measurements of biomarkers' values, the values supplied are typically estimates and hence subject to measurement error (ME), arising from, e.g., assay or instrument inaccuracies, biological variation, or errors in questionnaire-based self-report data. For instance, it is well-known that the systolic blood pressure (SBP) is measured with error mainly related to strong daily and seasonal variations. In this case, Carroll et al. (1984) suggested that approximately 1/3 of the observed variability is due to ME. In such a circumstance, it makes sense to hypothesize an unbiased additive error model, assuming we observe the true value plus an error in each measurement. In addition, measurements may be subject to limits of detection (LOD), where values of biospecimens below a detection threshold are undetectable, limiting the information one can utilize in the analysis. Ignoring the presence of ME and LOD effects in data can result in biased estimates and invalid inference. For example, in a study of polychlorinated biphenyl (PCB) congeners as potential indicators of endometriosis, a gynecological disease, Perkins et al. (2011) pointed out that the biomarker PCB 153 is unobservable below 0.2 ng/g serum due to the sensitivity of the measurement process. The authors showed that if the LOD problem in the data is disregarded, PCB 153 might be discarded as potentially lacking discriminatory ability for endometriosis.
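    The unbiased additive error model mentioned above can be illustrated with a small simulation: observed values W = X + U carry the variance of the true signal inflated by the ME variance, which is one way that ignoring ME distorts inference. All numerical values below are hypothetical.

```python
# Sketch: the unbiased additive error model W = X + U, where X is the
# true biomarker value and U is mean-zero ME independent of X. The
# observed variance is Var(X) + Var(U), so ignoring ME overstates the
# biological variability.
import random

random.seed(2)
n = 20000
true_vals = [random.gauss(10.0, 2.0) for _ in range(n)]  # true X, Var = 4
errors = [random.gauss(0.0, 1.0) for _ in range(n)]      # ME term U, Var = 1
observed = [x + u for x, u in zip(true_vals, errors)]    # W = X + U

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

print(round(var(true_vals), 1))  # near 4.0 = Var(X)
print(round(var(observed), 1))   # near 5.0 = Var(X) + Var(U)
```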

    Different approaches have been suggested to deal with additive ME issues, including a study design method based on repeated measurements as well as statistical techniques that utilize the maximum likelihood methodology and Bayesian methods. Under the additive error model assumptions, repeated measurements of biospecimens allow parameter identifiability, so that statistical inferences can be adjusted for ME effects (Fuller, 1987). Typically, few if any distributional assumptions are required in the analysis of repeated measurements. In practice, measurement processes based on bioassays can be costly, time-consuming, dangerous, or even unfeasible. To illustrate, the cost of the F2-isoprostane assay, an important biomarker measuring oxidative stress, was about $130 in a BioCycle study of assessing the reproducibility of F2-isoprostane (Malinovsky et al., 2012). This makes the reproducibility assessment an expensive proposition that cannot easily be repeated in practice. Faraggi et al. (2003) focused on the interleukin-6 biomarker of inflammation, which has been suggested to present a potential discriminatory ability for myocardial infarction. However, since the cost of a single assay was $74, examination of its usefulness has been hindered. In addition, the extra information provided by repeated measurements generally concerns the ME itself, which is typically a nuisance component of the analysis. When the number of replicates or individual biospecimens available is restricted, investigators may not have enough observations to achieve the desired power or efficiency in statistical inferences, e.g., Hasabelnaby et al. (1989). In several situations, the parametric or nonparametric maximum likelihood method can tackle additive ME problems in an optimal manner. The parametric likelihood method requires strong assumptions on the data distributions, which are difficult to validate. In addition, the likelihood method is difficult to apply when the variability attributable to ME must be distinguished from other sources of variability, and it is often computationally intensive.
The Bayesian method (Carroll et al., 2006; Schmid and Rosner, 1993) is an alternative and promising way to address additive ME. It allows one to incorporate prior information on ME with the data and to utilize other sources of information, e.g., from similar studies, to help estimate parameters that are poorly identified by the data alone. However, prior information is hard to obtain, and specification of the prior distribution is sometimes subjective. The possible nonrobustness of inference due to model misspecification is a vexing and difficult problem. Another problem, shared with the maximum likelihood method, is that computation of Bayes estimators is intensive.
