Schedule for: 17w5007 - Latest Advances in the Theory and Applications of Design and Analysis of Experiments

Beginning on Sunday, August 6 and ending on Friday, August 11, 2017

All times in Banff, Alberta time, MDT (UTC-6).

Sunday, August 6
16:00 - 17:30 Check-in begins at 16:00 on Sunday and is open 24 hours (Front Desk - Professional Development Centre)
17:30 - 19:30 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
20:00 - 22:00 Informal gathering (Corbett Hall Lounge (CH 2110))
Monday, August 7
07:15 - 08:45 Breakfast
Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
09:00 - 09:15 Introduction and Welcome by BIRS Station Manager (TCPL 201)
09:15 - 10:00 Will Welch: Computer Experiments I: Analysis of Computer Experiments: Moving Forward by Looking Back at History
Gaussian processes (GPs) are widely used for analysis of the input-output relationship(s) of a deterministic computer experiment. While there are many tweaks of the basic model, they turn out to be fairly unimportant for prediction accuracy (Chen, Loeppky, Sacks, and Welch, Statistical Science, 2016). In particular, complex input-output functions remain difficult to model with useful accuracy whatever method is used within the usual GP universe of models. The talk will report ongoing work with PhD student Alexi Rodriguez-Arelis to extend the class of functions that can be usefully modelled. Several ideas going back to Finney, Box, and other statisticians in physical experiments, as well as general principles from science, will be transferred to the realm of computer experiments. A series of examples will demonstrate that a moderate improvement in accuracy over a standard GP is possible, and sometimes the improvement is dramatic.
(TCPL 201)
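As background for this and the other computer-experiment talks, here is a minimal sketch of the standard GP emulator the abstract takes as its baseline. It is illustrative only: it assumes a squared-exponential correlation with a fixed lengthscale and a small nugget for numerical stability, rather than estimating correlation parameters as one would in practice.

    import numpy as np

    def gp_predict(X, y, Xnew, lengthscale=0.3, nugget=1e-6):
        """Posterior mean at Xnew given training inputs X (n x d) and outputs y."""
        def corr(A, B):
            # Squared-exponential (Gaussian) correlation between rows of A and B.
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
            return np.exp(-d2 / (2.0 * lengthscale ** 2))
        R = corr(X, X) + nugget * np.eye(len(X))   # nugget aids the Cholesky factorization
        L = np.linalg.cholesky(R)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return corr(Xnew, X) @ alpha               # zero prior mean assumed

    rng = np.random.default_rng(1)
    X = rng.random((20, 2))                        # toy deterministic simulator on [0, 1]^2
    y = np.sin(4 * X[:, 0]) * np.cos(3 * X[:, 1])
    print(gp_predict(X, y, rng.random((5, 2))))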
10:00 - 10:30 Coffee Break (TCPL Foyer)
10:30 - 11:15 Jeff Wu: Computer Experiments II: Spatial-temporal Kriging, Navier-Stokes, and Combustion Instability
Most “learning” in big data is driven by the data alone. Some people may believe this is sufficient because of the sheer data size. If the physical world is involved, this approach is often insufficient. In this talk I will give a recent study to illustrate how physics and data are used jointly to learn about the “truth” of the physical world. In the quest for advanced propulsion systems, a new design methodology is needed which combines engineering physics, computer simulations and statistical modeling. There are two key challenges: the simulation of high-fidelity spatial-temporal flows (using the Navier-Stokes equations) is computationally expensive, and the analysis and modeling of this data requires physical insights and statistical tools. First, a surrogate model is presented for efficient flow prediction in swirl injectors with varying geometries, devices commonly used in many engineering applications. The novelty lies in incorporating properties of the fluid flow as simplifying model assumptions, which allows for quick emulation in practical turnaround times, and also reveals interesting flow physics which can guide further investigations. Next, a flame transfer function framework is proposed for modeling unsteady heat release in a rocket injector. Such a model is useful not only for analyzing the stability of an injector design, but also identifies key physics which contribute to combustion instability.
(TCPL 201)
11:15 - 12:00 Devon Lin: Computer Experiments III: Recent developments in dynamic computer experiments
Dynamic computer experiments are those with time series outputs. Research on such experiments has been gaining momentum recently. In this talk, we consider two problems of dynamic computer experiments. We propose a computationally efficient modeling approach to build emulators for large-scale dynamic computer experiments. This approach sequentially finds a set of local design points based on a new criterion specifically designed for emulating dynamic computer simulators. Singular value decomposition based Gaussian process models are built with the sequentially chosen local data. To update the models efficiently, an empirical Bayesian approach is introduced. When a target observation is available, estimating the inputs of the computer simulator that produce a response matching the target as closely as possible is known as the inverse problem. We propose a new criterion-based estimation method to address the inverse problem of dynamic computer experiments. (Joint work with Ru Zhang and Pritam Ranjan.)
(TCPL 201)
11:30 - 13:00 Lunch (Vistas Dining Room)
13:15 - 14:00 Derek Bingham: Computer Experiments IV: Space-filling experimental designs using sequences of lattices
Experiments on deterministic, large-scale computer simulators for physical systems are commonplace in scientific applications. It is known that a good experimental design (choice of input parameters at which to run the simulation code) should have good space-filling properties (e.g. large packing or small covering radii of the design points) under the conventional Gaussian process for data analysis. In this work, a new method for space-filling experimental design is proposed that is based on a special type of non-Cartesian sampling lattice that contains rotated and scaled copies of itself. For a given experimental region, a lattice construction is developed to build a nested hierarchy of lattice designs. This allows (i) large designs to be constructed from small ones; (ii) optimal batches of experimental trials to be run in sequence; or (iii) designs to be constructed for low and high fidelity simulators. Furthermore, we show how these non-Cartesian lattice-based designs can be found for arbitrary run sizes. The approach is demonstrated on several examples, including the cosmology study of non-linear power spectrum simulation models that motivates this work. (Joint work with Steven Bergner.)
(TCPL 201)
14:00 - 14:20 Group Photo
Meet in foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo!
(TCPL Foyer)
14:30 - 15:15 Peter Chien: Computer Experiments V: Computer experiments with complex data
This talk consists of three topics on the design and analysis of computer experiments with complex data. The first topic deals with computer codes with gradients. The gradient-enhanced Gaussian process emulator is widely used to analyze all outputs from a computer model that provides gradient information. This emulator suffers from more numerical problems than other multivariate cases because of the dependence between the model output and each gradient output. We derive a statistical theory to understand why this problem happens and propose a solution using a data selection approach. The second topic discusses computer models with invariance properties, which appear in materials science, physics and biology. We propose a new statistical framework for building emulators that preserve invariance. The framework uses a weighted complete graph to represent the geometry and introduces a new class of functions, called relabeling symmetric functions, associated with the graph. The effectiveness of the method is illustrated by several examples from materials science. The third topic presents a new class of statistical designs inspired by the Samurai Sudoku puzzle. These designs have overlapping components and are useful for cross-validating data or models from multiple sources.
(TCPL 201)
15:15 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 16:15 Werner Müller (presenting jointly with Radoslav Harman): A design criterion for symmetric model discrimination
Besides optimization and parameter estimation, discrimination between rival models has always been a prime objective of experimental design. A good review of the early developments is given in Hill (1978). A big leap from these rather ad-hoc approaches was Atkinson & Fedorov (1975), who introduced $T$-optimality, derived from the likelihood ratio test under the assumption that one model is true and its parameters are fixed at nominal values. Maximization of the noncentrality parameter is equivalent to maximizing the power of the respective likelihood ratio test. When the models are nested, $T$-optimality can be shown to be equivalent to $D_s$-optimality for the parameters that embody the deviations from the smaller model (see, e.g., Fedorov and Khabarov 1986). For this setting the optimal design questions are essentially solved, and everything hinges on the asymmetric nature of the Neyman-Pearson lemma. However, the design problem itself is inherently symmetric, as it is usually the purpose of the experiment to determine which of two different models is true, and nestedness of the models is a less common situation. The purpose of the talk is to propose a new criterion, termed $\Delta$-optimality, to solve the discrimination design problem for non-nested non-linear regression models, without having to resort to asymmetry as in the references above. We will suppose that we do not have a prior probability distribution on the unknown parameters of the models, which rules out Bayesian approaches such as Felsenstein (1992). Nevertheless, we will assume a specific kind of prior knowledge about the unknown parameters, extending the approach of local optimality. We will demonstrate methodological and computational advantages of the proposed criterion and illustrate its use in a practical setting from pharmacology.
(TCPL 201)
17:30 - 19:00 Dinner
A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building.
(Vistas Dining Room)
19:00 - 21:00 Informal discussions (CH 2110)
Tuesday, August 8
07:00 - 08:30 Breakfast (Vistas Dining Room)
08:45 - 09:30 Timothy Waite: Robustness I: Minimax efficient random design with application to model-robust design for prediction
We consider response surface problems where it is explicitly acknowledged that a linear model approximation differs from the true mean response by the addition of a discrepancy function. The most realistic approaches to this problem develop optimal designs that are robust to discrepancy functions from an infinite-dimensional class of possible functions. Typically it is assumed that the class of possible discrepancies is defined by a bound on either (i) the maximum absolute value, or (ii) the squared integral, of all possible discrepancy functions. Under assumption (ii), the minimax prediction error criteria fail to select a finite design. This occurs because all finitely supported deterministic designs have the problem that the maximum, over all possible discrepancy functions, of the integrated mean squared error of prediction (IMSEP) is infinite. We demonstrate a new approach in which finite designs are drawn at random from a highly structured distribution, called a designer, of possible designs. If we also average over the random choice of design, then the maximum IMSEP is finite. We develop a class of designers for which the maximum IMSEP is analytically and computationally tractable. Algorithms for the selection of minimax efficient designers are considered, and the inherent bias-variance trade-off is illustrated. (Joint work with Dave Woods.)
(TCPL 201)
09:30 - 10:15 Xiaojian Xu: Robustness II: Robust Design for Generalized Linear Mixed Models with Different Types of Misspecifications
We discuss the effects on maximum likelihood estimation for a generalized linear mixed model (GLMM) of possible departures from its assumed form. The commonly occurring departures in a GLMM include imprecision in the assumed linear predictor, a misspecified random effects distribution, or possibly both. We construct D-optimal sequential designs which are robust to these types of departures. Since the computational work involved in GLMMs can be very intensive, an approximate approach is also proposed. Some comparisons are given through simulations.
(TCPL 201)
10:15 - 10:45 Coffee Break (TCPL Foyer)
10:45 - 11:30 Dave Woods: Robustness III: Closed-loop automatic experimentation for optimisation
Automated experimental systems, involving minimal human intervention, are becoming more popular and common, providing economical and fast data collection. We discuss some statistical issues around the design of experiments and data modelling for such systems. Our application is to “closed-loop” optimisation of chemical processes, where automation of reaction synthesis, chemical analysis and statistical design and modelling increases lab efficiency and allows 24/7 use of equipment. Our approach uses nonparametric regression modelling, specifically Gaussian process regression, to allow flexible and robust modelling of potentially complex relationships between reaction conditions and measured responses. A Bayesian approach is adopted to uncertainty quantification, facilitated through computationally efficient Sequential Monte Carlo algorithms for the approximation of the posterior predictive distribution. We propose a new criterion, Expected Gain in Utility (EGU), for optimisation of a noisy response via fully-sequential design of experiments, and we compare the performance of EGU to extensions of the Expected Improvement criterion, which is popular for optimisation of deterministic functions. We also show how the modelling and design can be adapted to identify, and then down-weight, potentially outlying observations to obtain a more robust analysis. (Joint work with Tim Waite.)
(TCPL 201)
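For reference, the Expected Improvement criterion against which the proposed EGU criterion is compared has a closed form under a GP posterior. A minimal sketch for minimization (mu and sigma denote the GP posterior mean and standard deviation at a candidate point; illustrative only, not the authors' EGU code):

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, f_min):
        """EI(x) = (f_min - mu) Phi(z) + sigma phi(z), with z = (f_min - mu)/sigma."""
        sigma = np.maximum(sigma, 1e-12)           # guard against zero posterior sd
        z = (f_min - mu) / sigma
        return (f_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    print(expected_improvement(mu=np.array([0.2, 0.5]),
                               sigma=np.array([0.3, 0.1]), f_min=0.4))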
11:30 - 12:15 Julie Zhou: Robustness IV: A- and D-optimal designs based on the second-order least squares estimator
When the error distribution in a regression model is asymmetric, the second-order least squares estimator (SLSE) is more efficient than the least squares estimator. Under the SLSE, optimal regression designs have recently been proposed and studied. In this talk, we will discuss the optimality criteria under the SLSE, properties of the optimal designs, and efficient numerical algorithms for finding approximate A- and D-optimal designs on a discrete design space.
(TCPL 201)
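As an illustration of design algorithms on a discrete design space, here is the classical multiplicative algorithm for a D-optimal approximate design under ordinary least squares. This is only the generic machinery; the SLSE-based criteria of the talk would replace the information matrix used below.

    import numpy as np

    def d_optimal_weights(F, iters=1000):
        """Multiplicative algorithm; F is N x p with rows f(x_i) over the candidate set."""
        N, p = F.shape
        w = np.full(N, 1.0 / N)                    # start from the uniform design
        for _ in range(iters):
            M = F.T @ (w[:, None] * F)             # information matrix M(w)
            d = np.einsum('ij,jk,ik->i', F, np.linalg.inv(M), F)  # variance function
            w *= d / p                             # update w_i <- w_i d(x_i)/p
        return w

    x = np.linspace(-1, 1, 41)                       # candidate grid on [-1, 1]
    F = np.column_stack([np.ones_like(x), x, x**2])  # quadratic regression basis
    w = d_optimal_weights(F)
    print(x[w > 0.01], np.round(w[w > 0.01], 3))     # mass concentrates near -1, 0, 1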
12:15 - 13:15 Lunch (Vistas Dining Room)
13:30 - 14:15 Lieven Vandenberghe: Semidefinite programming and experiment design
Semidefinite programming (SDP) has important applications in optimization problems that involve moment cones or, by duality, cones of nonnegative polynomials. Examples can be found in statistics, signal processing, control, and non-convex polynomial optimization. The talk will give an introduction to the connections between moment theory and SDP, and discuss SDP algorithms in this context. The focus will be on applications in experiment design.
(TCPL 201)
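A minimal example of the connection: the D-optimal design problem over a finite candidate set is a maxdet problem, SDP-representable and solvable with off-the-shelf conic software. The sketch below assumes the cvxpy modelling package and an illustrative quadratic regression basis.

    import cvxpy as cp
    import numpy as np

    x = np.linspace(-1, 1, 21)                       # candidate design points
    F = np.column_stack([np.ones_like(x), x, x**2])  # regression basis f(x)
    w = cp.Variable(len(x), nonneg=True)             # design weights
    M = sum(w[i] * np.outer(F[i], F[i]) for i in range(len(x)))  # information matrix
    prob = cp.Problem(cp.Maximize(cp.log_det(M)), [cp.sum(w) == 1])
    prob.solve()
    print(np.round(w.value, 3))                      # weights ~1/3 each at -1, 0, 1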
14:15 - 15:00 Guillaume Sagnol: Distributionally Robust Optimal Designs
The optimal design of experiments for nonlinear (or generalized linear) models can be formulated as the problem of finding a design $\xi$ maximizing a criterion $\Phi(\xi,\theta)$, where $\theta$ is the unknown quantity of interest that we want to determine. Several strategies have been proposed to deal with the dependency of the optimal design on the unknown parameter $\theta$. Whenever possible, a sequential approach can be applied. Otherwise, Bayesian and maximin approaches have been proposed. The maximin design maximizes the worst case of the criterion $\Phi(\xi,\theta)$ as $\theta$ varies over a set $\Theta$. In many cases, however, such a design performs well only in a very small subset of the region $\Theta$, so a maximin design might be far from the optimal design for the true value of the unknown parameter. On the other hand, it has been proposed to assume that a prior for $\theta$ is available, and to optimize the expected value of the criterion with respect to this prior. One objection to this approach is that when a sequential approach is not possible, we rarely have precise distributional information on the unknown parameter $\theta$. In the literature on optimization under uncertainty, the Bayesian and maximin approaches are known as "stochastic programming" and "robust optimization", respectively. A third way, lying somewhere between these two paradigms, has received a lot of attention recently. The distributionally robust approach can be seen as a robust counterpart of the Bayesian approach, in which we optimize against the worst case over all priors belonging to a family of probability distributions. In this talk, we will give equivalence theorems to characterize distributionally robust optimal (DRO) designs. We will show that DRO designs can be computed numerically by using semidefinite programming (SDP) or second-order cone programming (SOCP), and we will compare DRO designs to Bayesian and maximin-optimal designs in simple cases.
(TCPL 201)
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 16:15 Rosemary Bailey: A substitute for square lattice designs for 36 treatments
If there are $r-2$ mutually orthogonal Latin squares of order $n$ then there is a square lattice design for $n^2$ treatments in $r$ replicates of blocks of size $n$. This is optimal, and has all concurrences equal to $0$ or $1$. When $n=6$ there are no Graeco-Latin squares, and so there are no square lattice designs with replication bigger than three. As an accidental byproduct of another piece of work, Peter Cameron and I discovered a resolvable design for $36$ treatments in blocks of size six in up to eight replicates. No concurrence is greater than $2$, the design is partially balanced for an interesting association scheme with four associate classes, and it does well on the A-criterion. I will describe the design, and say something about its properties.
(TCPL 201)
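The classical construction the abstract starts from can be made concrete when $n$ is prime, where the squares $L_k(i,j) = (i + kj) \bmod n$, $k = 1, \dots, n-1$, are mutually orthogonal, giving up to $n+1$ replicates. A sketch (illustrative; the talk's design for $n=6$ is not of this form, since the construction fails there):

    import numpy as np

    def square_lattice(n, r):
        """r replicates (2 <= r <= n + 1), each n blocks of size n, for n^2 treatments."""
        T = np.arange(n * n).reshape(n, n)               # treatments in an n x n array
        reps = [[list(T[i, :]) for i in range(n)],       # replicate 1: rows
                [list(T[:, j]) for j in range(n)]]       # replicate 2: columns
        for k in range(1, r - 1):                        # one replicate per Latin square
            L = np.add.outer(np.arange(n), k * np.arange(n)) % n
            reps.append([list(T[L == s]) for s in range(n)])
        return reps[:r]

    for rep in square_lattice(5, 4):                     # n = 5 is prime, so this works
        print(rep)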
17:30 - 19:00 Dinner (Vistas Dining Room)
19:00 - 21:00 Informal discussions or (free!) chamber music concert
See https://www.banffcentre.ca/events/performance-peter-evans-tyshawn-sorey-sofia-jernberg-tiffany-ayalik-carla-kihlstedt/20170808/1930 for details of the concert
(CH 2110 or Bentley Chamber Music Studio)
Wednesday, August 9
07:00 - 08:30 Breakfast (Vistas Dining Room)
08:45 - 09:15 Rui Hu: New Researchers I: Robust design for the estimation of a threshold probability
We consider the construction of robust sampling designs for the estimation of threshold probabilities in spatial studies. A threshold probability is a probability that the value of a stochastic process at a particular location exceeds a given threshold. We propose designs which estimate a threshold probability efficiently, and also deal with two possible model uncertainties: misspecified regression responses and misspecified variance/covariance structures. The designs minimize a loss function based on the relative mean squared error of the predicted values (i.e., relative to the true values). To this end an asymptotic approximation of the loss function is derived. To address the uncertainty of the variance/covariance structures of this process, we average this loss over all such structures in a neighbourhood of the experimenter's nominal choice. We then maximize this averaged loss over a neighbourhood of the experimenter's fitted model. Finally the maximum is minimized, to obtain a minimax design.
(TCPL 201)
09:15 - 09:45 Kirsten Schorning: New Researchers II: Optimal designs for dose response curves with common parameters
A common problem in Phase II clinical trials is the comparison of dose response curves corresponding to different treatment groups. If the effect of the dose level is described by parametric regression models and the treatments differ in the administration frequency (but not in the sort of drug), a reasonable assumption is that the regression models for the different treatments share common parameters. In the talk we develop optimal design theory for the comparison of different regression models with common parameters. We derive upper bounds on the number of support points of admissible designs, and explicit expressions for D-optimal designs for frequently used dose response models with a common location parameter. If the location and scale parameters in the different models coincide, the problem becomes much harder, and we therefore determine minimally supported designs and sufficient conditions for their optimality in the class of all designs.
(TCPL 201)
09:45 - 10:15 Maryna Prus: New Researchers III: Optimal designs for individual prediction in multiple group random coefficient regression models
Random coefficient regression (RCR) models are popular in many fields of statistical application, especially in the biosciences and medical research. In these models, observational units (individuals) are assumed to come from the same population with an unknown population mean and to differ from each other by individual random parameters. Besides the estimation of the population mean parameter, the prediction of the individual responses is often of primary interest. In the particular case of multiple group RCR models, individuals in different groups receive different kinds of treatment. If group sizes are fixed and the unknown mean parameters may differ from group to group, statistical analysis can be performed in each group separately (see Prus, M., Optimal Designs for the Prediction in Hierarchical Random Coefficient Regression Models, Ph.D. thesis, Otto-von-Guericke University Magdeburg, 2015). This talk presents analytical results for optimal group sizes for the prediction of the individual parameters in multiple group RCR models with a common population mean for all individuals across all groups.
(TCPL 201)
10:15 - 10:45 Coffee Break (TCPL Foyer)
10:45 - 11:15 Yu Shi: New Researchers IV: Sparse grid hybridized PSO for finding Bayesian optimal designs
Finding Bayesian optimal designs for a nonlinear model is generally a difficult task, especially when there are several factors in the study. Such optimal designs are often analytically intractable and expensive to compute. A main problem is the unknown number of support points required for the optimal design. The often-used Monte Carlo approximations require very large samples from the parameter space to achieve reasonable accuracy and are thus unrealistic for moderate- to high-dimensional models. In this talk, we propose an effective and assumption-free approach for finding numerical Bayesian optimal designs, with a few real applications to longitudinal models in HIV studies. Our algorithm hybridizes particle swarm optimization and the sparse grids methodology (PSOSG), and we demonstrate its potential for finding Bayesian optimal designs for a variety of models using user-specified prior distributions. The optimality of all our generated designs is verified using an equivalence theorem and, if time permits, we also discuss applications of PSOSG to find other types of optimal designs.
(TCPL 201)
11:15 - 12:00 Seongho Kim: Statistical modeling and applications of particle swarm optimization
Particle swarm optimization (PSO) is a population-based global optimization method that stochastically evolves a group of candidate solutions; it was motivated by the behavior of flocks of birds and schools of fish in nature. PSO is used to solve a wide array of optimization problems because of its attractive advantages, such as its ease of implementation and its gradient-free stochastic algorithm. It has proved to be an efficient method for many global optimization problems and does not suffer from the difficulties encountered by some other evolutionary computation techniques. In this talk, we will review PSO and its variants and then discuss several applications to complicated statistical models. In particular, we will discuss computational issues of PSO in practice and show how to use it for pharmacokinetic/pharmacodynamic (PK/PD) modeling, including our recent applications to adaptive Phase II clinical trial designs.
(TCPL 201)
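For readers unfamiliar with PSO, a minimal global-best variant follows; the inertia and acceleration constants are common textbook defaults, not values from the talk.

    import numpy as np

    def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
        """Minimize f over [-5, 5]^dim with the standard global-best update rule."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(-5, 5, (n_particles, dim))     # particle positions
        v = np.zeros_like(x)                           # particle velocities
        pbest, pval = x.copy(), np.apply_along_axis(f, 1, x)
        g = pbest[pval.argmin()]                       # global best position
        for _ in range(iters):
            r1, r2 = rng.random((2, n_particles, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
            x = x + v
            fx = np.apply_along_axis(f, 1, x)
            better = fx < pval                         # update personal bests
            pbest[better], pval[better] = x[better], fx[better]
            g = pbest[pval.argmin()]
        return g, pval.min()

    print(pso(lambda z: np.sum(z ** 2), dim=3))        # minimize a sphere function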
12:00 - 13:00 Lunch (Vistas Dining Room)
13:00 - 17:30 Free Afternoon (Banff National Park)
17:30 - 18:45 Dinner (Vistas Dining Room)
19:00 - 21:00 Reflections on the future of design of experiments
This special session will feature presentations by Joachim Kunert, Dennis Lin, John Stufken and Boxin Tang, with Rainer Schwabe as moderator and discussant. These will be followed by a group discussion, which everyone is invited to join. Joachim plans to emphasize the continuing role of the concepts of blinding, randomization and permutation tests. Dennis plans to share some of his personal views on the evolution of DOE since R. A. Fisher, and then to talk about some of his recent work on Design for Order-of-Addition. John asks general questions about our relevance as a group, with suggestions for meeting current challenges. Boxin will take a look at some possible future developments in computer experiments, factorial designs and beyond, through the lenses of optimality, robustness and fusion.
(TCPL 201)
Thursday, August 10
07:00 - 08:30 Breakfast (Vistas Dining Room)
08:45 - 09:30 Mong-Na Lo Huang: Optimal group testing designs for estimating prevalence with uncertain testing errors
We construct optimal designs for group testing experiments where the goal is to estimate the prevalence of a trait using a test with uncertain sensitivity and specificity. Using optimal design theory for approximate designs, we show that the most efficient design for simultaneously estimating the prevalence, sensitivity, and specificity requires three different group sizes with equal frequencies. However, if estimating the prevalence as accurately as possible is the only focus, the optimal strategy is to have three group sizes with unequal frequencies. Based on a Chlamydia study in the United States, we compare the performances of competing designs and provide insights into how the unknown sensitivity and specificity of the test affect the performance of the prevalence estimator. We demonstrate that the proposed locally D- and Ds-optimal designs have high efficiencies even when the prespecified values of the parameters are moderately misspecified. Extensions to budget-constrained optimal group testing designs will also be discussed, where both subjects and tests incur costs, and assays have uncertain sensitivity and specificity that may be linked to the group sizes.
(TCPL 201)
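The core design trade-off can be illustrated with the per-test Fisher information for the prevalence pi when sensitivity Se and specificity Sp are treated as known (the talk's setting, with uncertain Se and Sp, is more involved). A pooled test of size s is positive with probability P = Se - (Se + Sp - 1)(1 - pi)^s, giving:

    import numpy as np

    def group_info(pi, s, Se=0.95, Sp=0.98):
        """Fisher information about pi from one pooled test of group size s."""
        q = (1 - pi) ** s
        P = Se - (Se + Sp - 1) * q                    # prob. the pooled test is positive
        dP = (Se + Sp - 1) * s * (1 - pi) ** (s - 1)  # derivative of P w.r.t. pi
        return dP ** 2 / (P * (1 - P))                # information per test

    for s in (1, 5, 10, 20):                          # larger pools can be far more informative
        print(s, round(group_info(0.05, s), 2))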
09:30 - 10:15 France Mentré: Using Hamiltonian Monte-Carlo to design longitudinal count studies accounting for parameter and model uncertainties
To design longitudinal studies with nonlinear mixed effects models (NLMEM), optimal design based on the expected Fisher information matrix (FIM) can be used. A new method evaluating the FIM based on Monte Carlo integration with Hamiltonian Monte Carlo sampling (MC/HMC) was developed and implemented in the R package MIXFIM, using Stan for HMC sampling. This approach requires a priori knowledge of models and parameters, leading to locally optimal designs. The objective of this work was to extend this MC/HMC-based method to evaluate the FIM in NLMEM, accounting for uncertainty in both parameters and models. We illustrate this approach by optimizing robust designs for repeated count data. When introducing uncertainty on the population parameters, we evaluated the robust FIM as the expectation of the FIM computed by MC/HMC over these parameters. Then, the compound D-optimality criterion was used to find a common CD-optimal design for several candidate models. A compound DE-criterion combining the determinants of the robust FIMs was also calculated to find the CDE-optimal design, which is robust with respect to both model and parameters. These methods were applied to a longitudinal Poisson count model whose event rate parameter $\lambda$ is a function of the dose level. We assumed a log-normal a priori distribution characterizing the uncertainty on the parameter values, and several candidate models describing the relationship between $\log(\lambda)$ and the dose level (linear, log-linear, Imax, full Imax, or quadratic functions). We performed combinatorial optimization of 2 doses among 10 doses between 0.1 and 1, corresponding to 45 possible elementary designs. In this study, accounting for uncertainty on the parameters or not did not have a striking impact on the allocation of optimal doses. However, misspecification of the model could lead to low D-efficiencies, and the CD- and CDE-optimal designs then provided a good compromise across the candidate models.
(TCPL 201)
10:15 - 10:45 Coffee Break (TCPL Foyer)
10:45 - 11:30 Anatoly Zhigljavsky: Optimal design in regression models with correlated observations: an overview
We consider the problem of constructing optimal experimental designs for linear regression models with correlated observations. In the first part of the talk, we assume that the ordinary least squares estimator is used. The approach of Bickel-Herzberg is reviewed and several generalizations of this approach are discussed. We then consider the construction of the BLUE and some of its discrete approximations. In this case, the problem of constructing asymptotically optimal experimental designs is much more involved. The celebrated Sacks-Ylvisaker approach is reviewed and several approximate methods for constructing optimal estimators and designs are discussed. (Joint work with Holger Dette and Andrey Pepelyshev.)
(TCPL 201)
11:30 - 12:15 Min Yang: On Data Reduction of Big Data
Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable to extraordinarily large data sets due to computational limitations. A critical step in Big Data analysis is data reduction. In this presentation, I will review some existing approaches to data reduction and introduce a new strategy called information-based optimal subdata selection (IBOSS). Under linear and nonlinear model setups, theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to other approaches in terms of parameter estimation and predictive performance. The tradeoff between accuracy and computational cost is also investigated. When models are mis-specified, the performance of different data reduction methods is compared through simulation studies. Some ongoing research work as well as some open questions will also be discussed.
(TCPL 201)
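A sketch of the IBOSS idea for linear regression, read directly from the abstract (not the authors' reference implementation): for each covariate in turn, keep the points with the most extreme values among those not yet selected.

    import numpy as np

    def iboss(X, k):
        """Select k rows of X (n x p) by extreme covariate values; k divisible by 2p."""
        n, p = X.shape
        r = k // (2 * p)
        available = np.ones(n, dtype=bool)
        chosen = []
        for j in range(p):
            idx = np.flatnonzero(available)
            order = idx[np.argsort(X[idx, j])]         # sort remaining rows by covariate j
            take = np.concatenate([order[:r], order[-r:]])  # r smallest and r largest
            chosen.append(take)
            available[take] = False
        return np.concatenate(chosen)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 5))
    sub = iboss(X, k=1000)
    print(sub.shape)                                   # (1000,) indices of the subsample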
12:15 - 13:15 Lunch (Vistas Dining Room)
13:30 - 14:15 Stefanie Biedermann: Optimal design when outcome values may be missing
The presence of missing response values complicates statistical analyses. However, incomplete data are particularly problematic when constructing optimal designs, as it is not known at the design stage which values will be missing. When data are missing at random (MAR) it is possible to incorporate this information into the optimality criterion that is used to find designs. However, when data are not missing at random (NMAR) such a framework can lead to inefficient designs. We first investigate an issue common to all missing data mechanisms: the covariance matrix of the estimators does not exist, so it is not clear how well the inverse of the information matrix will approximate the observed covariance matrix. To this end, we propose and study a new approximation to the observed covariance matrix for situations where the missing data mechanism is MAR. We then address the specific challenges that NMAR values present when finding optimal designs for linear regression models. We show that the optimality criteria will depend on model parameters that traditionally do not affect the design, such as regression coefficients and the residual variance. We also develop a framework that improves the efficiency of designs over those found assuming values are MAR.
(TCPL 201)
14:15 - 15:00 Jesus Lopez-Fidalgo: Optimal designs for longitudinal studies with fractional polynomial models
Fractional polynomials (FP) have been shown to be much more flexible than polynomials for fitting continuous outcomes in the biological and health sciences. Despite their increasing popularity, design issues for FP models have never been addressed. D- and I-optimal experimental designs will be computed for prediction using FP models. Their properties will be evaluated and a catalogue of design points useful for FP models will be provided. As applications, we consider linear mixed effects models for longitudinal studies. To provide greater flexibility in modeling the shape of the response, we use fractional polynomials rather than polynomials to approximate the mean response. An example using gene expression data will be considered, comparing with the designs used in practice. An additional interesting problem is finding designs for effective model discrimination for FP models. This will be explored from the KL-optimality point of view.
(TCPL 201)
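For concreteness, a first-degree fractional polynomial (FP1) fits E[y] = b0 + b1 x^p with the power p chosen from a conventional set, p = 0 being read as log x. A minimal fitting sketch (illustrative; the talk concerns design for such models, not fitting):

    import numpy as np

    POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]           # conventional FP power set

    def fp1_fit(x, y):
        """Select the FP1 power by residual sum of squares over the candidate set."""
        best = None
        for p in POWERS:
            z = np.log(x) if p == 0 else x ** p
            Z = np.column_stack([np.ones_like(z), z])
            beta, rss, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(rss[0]) if rss.size else np.sum((y - Z @ beta) ** 2)
            if best is None or rss < best[0]:
                best = (rss, p, beta)
        return best                                     # (rss, power, coefficients)

    x = np.linspace(0.1, 4, 50)
    y = 2 + 3 * np.sqrt(x) + np.random.default_rng(2).normal(0, 0.05, 50)
    print(fp1_fit(x, y)[1:])                            # expected to recover power 0.5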
15:00 - 15:30 Coffee Break (TCPL Foyer)
15:30 - 16:15 Steven Gilmour: Bayesian optimal designs for fitting fractional polynomial response surface models
Fractional polynomial models are potentially useful for response surface investigations. With the availability of routines for fitting nonlinear models in statistical packages, they are increasingly being used. However, as in all experiments, the design should be chosen such that the model parameters are estimated as efficiently as possible. The design choice for such models involves the usual difficulties of nonlinear model design. We find Bayesian optimal exact designs for several fractional polynomial models. The optimum designs are compared to various standard designs in response surface problems. Some unusual problems in the choices of prior and optimization method will be noted. (Joint work with Luzia Trinca.)
(TCPL 201)
16:15 - 17:00 Selin Damla Ahipasaoglu: A mathematical programming look at optimal experimental design problems
Skype presentation: Using a mathematical programming lens, we will look at a parametric family of optimal design problems in depth. Two popular problems, the approximate D- and A-optimal design problems, will be highlighted as two special members, and some of the results will be customised for them. First, we will present mathematical formulations, derive their duals and obtain optimality conditions. Second, we will survey different types of algorithms that are in use and discuss their convergence properties. Third, we will provide a branch-and-bound framework together with techniques for generating lower and upper bounds to obtain exact designs. Along the way, we will identify immediate research questions, as well as theoretical and computational challenges waiting to be tackled.
(TCPL 201)
17:30 - 19:00 Dinner (Vistas Dining Room)
19:00 - 21:00 Informal discussions or (free!) jazz concert (CH 2110 or The Club, Theatre Complex)
Friday, August 11
07:00 - 08:30 Breakfast (Vistas Dining Room)
08:45 - 09:30 Luc Pronzato: On the construction of minimax-distance (sub-)optimal designs
A good experimental design in a non-parametric framework, such as Gaussian process modelling in computer experiments, should have satisfactory space-filling properties. Minimax-distance designs minimize the maximum distance between a point of the region of interest and its closest design point, and thus have attractive properties in this context. However, their construction is difficult, even in moderate dimension, and one should in general be satisfied with a design that is not too strongly suboptimal. Several methods based on a discretization of the experimental region will be considered, such as the determination of Chebyshev-centroidal Voronoi tessellations obtained from fixed-point iterations of Lloyd's method, and the construction of any-time (nested) suboptimal solutions by greedy algorithms applied to submodular surrogates of the minimax-distance criterion. The construction of design measures that minimize a regularized version of the criterion will also be investigated.
(TCPL 201)
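One of the greedy constructions mentioned above can be sketched directly: starting from an arbitrary point, repeatedly add the candidate farthest from the current design. On a discretized region this yields a nested design whose covering radius is within a factor of two of the minimax-optimal value (the standard guarantee for greedy farthest-point selection).

    import numpy as np

    def greedy_minimax(candidates, n):
        """Farthest-point (coffee-house) design of size n on a finite candidate set."""
        design = [0]                                    # start from an arbitrary point
        d = np.linalg.norm(candidates - candidates[0], axis=1)
        for _ in range(n - 1):
            k = int(d.argmax())                         # candidate farthest from the design
            design.append(k)
            d = np.minimum(d, np.linalg.norm(candidates - candidates[k], axis=1))
        return candidates[design], d.max()              # design and its covering radius

    g = np.linspace(0, 1, 33)
    grid = np.array([(a, b) for a in g for b in g])     # discretized [0, 1]^2
    X, radius = greedy_minimax(grid, 10)
    print(np.round(X, 2), round(radius, 3))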
09:30 - 10:15 Henry Wynn: Optimal experimental design that minimizes the width of simultaneous confidence bands
We propose an optimal experimental design for a curvilinear regression model that minimizes the width of simultaneous confidence bands. Simultaneous confidence bands for nonlinear regression are constructed by evaluating the volume of a tube about a curve that is defined as a trajectory of a regression basis vector (Naiman, 1986). The proposed criterion is constructed based on the volume of such a tube, and the corresponding optimal design is referred to as the minimum-volume optimal design. For Fourier and weighted polynomial regressions, the problem is formalized as one of minimization over the cone of Hankel positive definite matrices, and the criterion to minimize is expressed as an elliptic integral. We show that the Möbius group keeps our problem invariant, and hence minimization can be conducted over cross-sections of orbits. We demonstrate that for the weighted polynomial regression and the Fourier regression with three bases, the minimum-volume optimal design forms an orbit of the Möbius group containing D-optimal designs as representative elements. (Joint work with Satoshi Kuriki.)
(TCPL 201)
10:15 - 10:45 Coffee Break (TCPL Foyer)
10:45 - 11:30 Checkout by Noon
5-day workshop participants are welcome to use BIRS facilities (BIRS Coffee Lounge, TCPL and Reading Room) until 3 pm on Friday, although participants are still required to check out of the guest rooms by 12 noon.
(Front Desk - Professional Development Centre)
11:30 - 13:30 Lunch (Vistas Dining Room)