NZSA Visiting Lecturers

The New Zealand Statistical Association coordinates and provides some financial support for a tour of New Zealand universities by a distinguished overseas statistician. Normally the funding covers domestic travel within New Zealand, while host institutions cover local costs.

Usually this person, known as the NZSA Visiting Lecturer, will spend two to three days at each of the six main university centres and give at least two lectures at each: one for a general audience, and one more closely tied to their own research interests.


Past NZSA Visiting Lecturers were:

Prof Ingram Olkin, 2010
Prof Ray Chambers, 2008
Prof C.R. Rao, 2005
Prof Richard Tweedie, 2001


Ingram Olkin, NZSA Visiting Lecturer 2010

We were delighted that Professor Ingram Olkin (Stanford University) was the NZSA Visiting Lecturer for 2010. His visit was associated with our conference and with the joint International Conference on Statistical Methodologies and Related Topics, celebrating the contribution of Chin-Diew Lai.

Dr Olkin was an icon in the world statistical community, having been active for over 60 years. He was a member of many professional societies, received many honours and awards, held many editorial positions, and delivered numerous invited addresses all over the world. Ingram co-authored 7 books, edited 10 books, and contributed 220 journal papers. His joint paper with Albert Marshall, “A multivariate exponential distribution”, was cited in over 600 articles – a testament to the calibre of his research.

Dr Olkin’s work aimed to ensure that educators select the proper statistical tools for measuring the outcomes of their programs and methods, and that their interpretation of the results is similarly rigorous. His research included the development of powerful new statistical methods, known collectively as meta-analysis, for combining results from independent studies of the same topic. Meta-analysis helps researchers reconsider long-standing educational problems with a fresh critical eye.
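
As a concrete illustration of the combining step at the heart of meta-analysis, the sketch below pools estimates from independent studies by inverse-variance weighting. This is a minimal fixed-effect example with invented numbers, not code from Olkin’s own work.

    import numpy as np

    # Hypothetical effect estimates and standard errors from four studies
    est = np.array([0.30, 0.45, 0.10, 0.25])
    se = np.array([0.12, 0.20, 0.15, 0.10])

    w = 1 / se**2                       # inverse-variance weights
    pooled = np.sum(w * est) / np.sum(w)
    pooled_se = np.sqrt(1 / np.sum(w))  # standard error of the pooled estimate
    print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f}")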

Dr Olkin was a Guggenheim, Fulbright, and Lady Davis Fellow, and held an honorary doctorate from De Montfort University. He received his BS in mathematics from the City College of New York, his MA from Columbia University, and his PhD from the University of North Carolina. Dr Olkin’s research interests included the analysis of social and behavioural models; multivariate statistical analysis; correlational and regression models in educational processes; and meta-analysis.


Ray Chambers, NZSA Visiting Lecturer 2008

Ray Chambers is Professor of Statistical Methodology at the University of Wollongong, with extensive research interests in the design and analysis of sample surveys, official statistics methodology, robust methods for statistical inference, and the analysis of data with group structure.

Statistics New Zealand hosted Ray Chambers’ visit to New Zealand, and the New Zealand Statistical Association, through its Visiting Lectureship, enabled Ray to visit other New Zealand centres.

A brief preamble to Ray Chambers’ visit was given in Newsletter 67.


Talk Abstracts

Robust Prediction of Small Area Means and Distributions

Small area estimation techniques typically rely on mixed models containing random area effects to characterise between-area variability. In contrast, the M-quantile approach to small area estimation avoids conventional Gaussian assumptions and the problems associated with specifying random effects, and uses M-quantile regression models to characterise small area effects. In this talk I will describe a general framework for robust small area prediction that is based on representing a small area estimator as a functional of a predictor of the within-area distribution of the target variable, and that is applicable under either a mixed model or an M-quantile approach. The usefulness of this framework will be demonstrated through both model-based and design-based simulation, with the latter based on two realistic survey data sets containing small area information. An application to predicting key percentiles of district-level distributions of per-capita household consumption expenditure in Albania in 2002 will be described.
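
Since several of these talks build on M-quantile regression, a minimal sketch may help: the function below fits an M-quantile model of order q by iteratively reweighted least squares with a Huber influence function. The implementation details (function names, tuning constant, convergence rule) are our own illustration, not the code behind the talk.

    import numpy as np

    def huber_psi(u, c=1.345):
        return np.clip(u, -c, c)

    def mquantile_fit(X, y, q=0.5, c=1.345, tol=1e-8, max_iter=200):
        """M-quantile regression of order q via iteratively
        reweighted least squares with a Huber influence function."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        for _ in range(max_iter):
            r = y - X @ beta
            s = np.median(np.abs(r)) / 0.6745       # robust (MAD) scale
            u = r / s
            # asymmetric influence: weight 2q above, 2(1 - q) below the fit
            psi_q = 2 * np.where(u > 0, q, 1 - q) * huber_psi(u, c)
            w = np.ones_like(u)
            nz = np.abs(u) > 1e-10
            w[nz] = psi_q[nz] / u[nz]
            # weighted least-squares step: solve X'WX beta = X'Wy
            beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
            if np.max(np.abs(beta_new - beta)) < tol:
                return beta_new
            beta = beta_new
        return beta

At q = 0.5 this reduces to ordinary Huber robust regression; varying q traces out the family of M-quantile fits used to characterise area effects.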

Small Area Estimation Via M-quantile Geographically Weighted Regression

Spatially correlated data arise in many situations. When these data are used for small area estimation, a popular approach is to characterise the small area effects using a Simultaneous Autoregressive (SAR) model. An alternative approach incorporates the spatial information via Geographically Weighted Regression (GWR). In this talk I will describe how the M-quantile approach to small area estimation can be extended to situations where GWR is preferable. An important spin-off from this approach is more efficient synthetic estimation for out-of-sample areas. The usefulness of this framework will be demonstrated through both model-based and design-based simulation. An application to predicting average Acid Neutralizing Capacity at the 8-digit Hydrologic Unit Code level in the northeastern states of the USA will also be described.
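
To make the GWR building block concrete, the sketch below computes local weighted least-squares coefficients at one target location, downweighting observations by a Gaussian kernel in geographic distance. The function name, kernel and bandwidth handling are illustrative assumptions, not the estimator used in the talk.

    import numpy as np

    def gwr_coefficients(X, y, coords, target, bandwidth):
        """Local weighted least-squares fit at one location (the core GWR step)."""
        d = np.linalg.norm(coords - target, axis=1)  # distances to target point
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # Gaussian kernel weights
        XtW = X.T * w                                # X' W with W = diag(w)
        return np.linalg.solve(XtW @ X, XtW @ y)

Repeating this at every location of interest yields a surface of locally varying regression coefficients rather than a single global fit.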

Small Area Estimation Under Transformation To Linearity

Small area estimation based on linear mixed models can be inefficient when the underlying relationships are non-linear. In this talk I will describe small area estimation techniques for variables that can be modelled linearly following a non-linear transformation. In particular, I will show how so-called model-based direct estimation can be used with data that are consistent with a linear mixed model on the logarithmic scale, provided the estimation weights are derived using model calibration. Simulation results will be presented which show that this transformation-based estimator is both efficient and robust with respect to the distribution of the random effects in the linear mixed model. An application to business survey data will also be discussed.
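
The delicate step under a log transformation is the return to the raw scale, since naively exponentiating a log-scale prediction underestimates the raw-scale mean. The sketch below applies one standard correction, Duan’s smearing estimator; it illustrates the general issue and is not the model-calibration method described in the talk.

    import numpy as np

    def log_linear_predict(X, y, X_new):
        """Fit y linearly on the log scale, back-transform with smearing."""
        beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
        resid = np.log(y) - X @ beta
        smear = np.mean(np.exp(resid))   # Duan's smearing factor (>= 1 typically)
        return np.exp(X_new @ beta) * smear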

Robust Mean Squared Error Estimation for Linear Predictors for Domains

A crucial aspect of small area estimation is estimation of the mean squared error of the resulting small area estimators. In this talk I will discuss robust mean squared error estimation for linear predictors of finite population domain means. The approach that will be taken represents an extension of the well-known ‘sandwich’ type variance estimator used in population-level sample survey inference, and appears to lead to a mean squared error estimator that is simpler to implement, and potentially more robust, than alternatives suggested in the small area literature. The usefulness of this approach will be demonstrated through both model-based and design-based simulation, with the latter based on two realistic survey data sets containing small area information.
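
For reference, the population-level ‘sandwich’ idea that this work extends has a very simple form for ordinary least squares, sketched below; the domain-level MSE estimator of the talk is a generalisation of this, not this code.

    import numpy as np

    def sandwich_vcov(X, resid):
        """Heteroskedasticity-robust covariance of OLS coefficients:
        (X'X)^-1 X' diag(e^2) X (X'X)^-1."""
        bread = np.linalg.inv(X.T @ X)
        meat = X.T @ (X * (resid ** 2)[:, None])
        return bread @ meat @ bread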

Measurement Error in Auxiliary Information

Auxiliary information is information about the target population of a sample survey over and above that contained in the actual data obtained from the sampled population units. The availability of this type of information represents a key distinction between sample survey inference and more mainstream inference scenarios. In particular, modern methods of sampling inference (both model-assisted as well as model-based) depend on the availability of auxiliary information to improve efficiency in survey estimation. However, such information is not always of high quality, and typically contains errors. In this talk I focus on some survey-based situations where auxiliary information is crucial, but where this information is not precise. Estimation methods that allow for this imprecision will be described. In doing so I will not only address the types of inference of concern to sampling statisticians (e.g. prediction of population quantities), but also inference for parameters of statistical models for surveyed populations.
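
A quick illustration of why imprecise auxiliary information matters: classical measurement error in a regressor attenuates its estimated coefficient by the reliability ratio var(x)/(var(x) + var(error)). The toy simulation below uses invented parameters and simply exhibits the problem; the talk’s methods are designed to correct it.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.normal(size=n)                     # true auxiliary variable
    y = 2.0 * x + rng.normal(size=n)           # outcome, true slope = 2
    x_obs = x + rng.normal(scale=1.0, size=n)  # auxiliary measured with error
    print(np.polyfit(x, y, 1)[0])       # ~2.0 using the true x
    print(np.polyfit(x_obs, y, 1)[0])   # ~1.0: attenuated by 1/(1 + 1)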

Maximum Likelihood With Auxiliary Information

In this talk I use a general framework for maximum likelihood estimation with complex survey data to develop methods for efficiently incorporating external population information into linear and logistic regression models fitted via sample survey data. In particular, saddlepoint and smearing methods will be used to derive highly accurate approximations to the score and information functions defined by the model parameters under random sampling and under case-control sampling when auxiliary data on population moments are available. Simulation-based results illustrating the resulting gains in efficiency will also be discussed.
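
The saddlepoint and smearing machinery is beyond a short sketch, but the efficiency gain available from known population moments can be seen in the simplest possible analogue: adjusting a sample mean using the known population mean of an auxiliary variable, as in the classical regression (difference) estimator below. This is offered only as an analogue of the idea, not as the method of the talk.

    import numpy as np

    def regression_estimator(y, x, pop_mean_x):
        """Adjust the sample mean of y using the known population mean of x."""
        b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        return y.mean() + b * (pop_mean_x - x.mean())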

Analysis of Probability-Linked Data

Over the last 25 years, advances in information technology have led to the creation of linked individual level databases containing vast amounts of information relevant to research in health, epidemiology, economics, demography, sociology and many other scientific areas. In many cases this linking is not perfect but can be modelled as the outcome of a stochastic process, with a non-zero probability that a unit record in the linked database is actually based on data drawn from distinct individuals. The impact of the resulting linkage errors on analysis of data extracted from such a source is only slowly being appreciated. In this talk I will describe a framework for statistical analysis of such probability-linked data. Applications to linear and logistic regression modelling of this type of data will be discussed.
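
The effect of linkage errors is easy to demonstrate by simulation. Below (with invented rates), a fraction of records receive a response that actually belongs to a different individual; under such exchangeable mislinks the naive regression slope shrinks by roughly that fraction. A toy illustration only, not the correction framework described in the talk.

    import numpy as np

    rng = np.random.default_rng(1)
    n, slope, err_rate = 20_000, 2.0, 0.10
    x = rng.normal(size=n)
    y = slope * x + rng.normal(size=n)
    mis = rng.random(n) < err_rate           # records linked to the wrong person
    y_linked = y.copy()
    y_linked[mis] = rng.permutation(y[mis])  # their y comes from someone else
    print(np.polyfit(x, y_linked, 1)[0])     # ~ (1 - err_rate) * slope = 1.8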

Estimation of the Finite Population Distribution Function

Although most survey outputs consist of estimates of means and totals, there are important situations where the primary focus is estimation of the finite population distribution function, defined as the proportion of population units with values less than or equal to the argument of this function. In this talk I will describe design-based, model-assisted and model-based methods for predicting a finite population distribution function, focussing on a situation where the underlying regression relationship is non-linear. An application to estimation of the distribution of hourly pay rates will be discussed in some detail.
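
In symbols (our notation, not necessarily the talk’s), the target and a generic model-based predictor of it take the forms below, where s denotes the sampled units and r the non-sampled remainder of a population of size N:

    F_N(t) = \frac{1}{N} \sum_{i=1}^{N} I(y_i \le t), \qquad
    \hat{F}(t) = \frac{1}{N} \left[ \sum_{i \in s} I(y_i \le t)
               + \sum_{k \in r} \widehat{\Pr}(y_k \le t) \right]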

Maximum Likelihood under Informative Sampling

Loosely speaking, a sampling method is informative if the outcome of the sampling process and the response of interest are correlated in some way. An example of informative sampling is size-biased sampling. In this talk I describe a general framework for likelihood-based inference with sample survey data, including data collected via informative sampling. Some simple examples will then be used to contrast maximum likelihood estimation within this framework with alternative likelihood-based approaches that have been suggested for data collected under informative sampling.
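
Size-biased sampling makes the issue concrete: units are selected with probability proportional to their value, so the naive sample mean is biased upward, while weighting each observation by the inverse of its value (a harmonic mean) undoes the selection. A toy simulation with invented parameters:

    import numpy as np

    rng = np.random.default_rng(2)
    pop = rng.gamma(2.0, 1.0, size=100_000)  # population values, mean ~ 2
    p = pop / pop.sum()                      # selection prob proportional to size
    s = rng.choice(pop, size=5_000, p=p)     # size-biased sample
    print(pop.mean())              # true mean, ~2.0
    print(s.mean())                # naive mean, ~3.0 (biased)
    print(len(s) / np.sum(1 / s))  # harmonic mean, ~2.0 (undoes size bias)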


C.R. Rao, NZSA Visiting Lecturer 2005

A brief preamble to C.R. Rao’s visit was given in Newsletter 60.

C.R. Rao was designated a Massey University Distinguished Visitor. He was the keynote speaker on the first day of the 2005 International Workshop on Matrices and Statistics.

C.R. Rao also presented at the WCAS Workshop on 22 March 2005.


Talk Abstracts

Cross Examination of Data

Abstract: Data obtained from historical records, designed experiments and sample surveys are not usually in a form where routine statistical methods can be employed and inferences drawn. There may be recording errors and missing observations. The data may be faked or contaminated with irrelevant observations. Usually the stochastic model generating the data, essential for data analysis, is not known. The actual procedure planned for the collection of the data might not have been strictly followed. Inferential analysis of data without examining these issues might lead to wrong conclusions.

The first task of a statistician is, as R.A. Fisher emphasized, to cross-examine the data (CED): to look for deficiencies in the data of the types mentioned above. Some questions can be answered by questioning those who collected the data, but statisticians must also have the appropriate tools to elicit answers from the data itself. This process is described by Tukey as exploratory data analysis (EDA), and by Mahalanobis as scrutiny of data (SOD). To some extent such preliminary analysis of data is an art, but much of it can be codified.

Statistics: The science, technology and art of creating new knowledge

Abstract: The practice of statistics today extends to the whole gamut of natural and social sciences, engineering and technology, management and economic affairs, as well as arts and literature. Statistics is being applied to virtually every field to make new discoveries and breakthroughs.

There are different concepts of knowledge: true knowledge as conceived by philosophers, mathematical knowledge deduced from given axioms, scientific knowledge as embodied in scientific theories, and empirical knowledge, with a specified amount of uncertainty, inferred from observed data. It is the last type of knowledge that enables us to make optimal decisions when action is necessary.

Some examples of questions that have been resolved by statistics will be given. Who wrote a poem discovered in a library without any record of authorship: Shakespeare or a contemporary poet? Did Shakespeare have ghost writers? Is the expression of a gene the same in a normal person and a cancer patient? Are goods produced by a machine according to specification? Is the second-born child more intelligent than the first?

Statistics: Reflections on the past and visions for the future

Abstract: Statistics is not a basic discipline like mathematics, physics, chemistry or biology, each of which has a subject matter of its own on which new knowledge is built. Statistics is instead a method of solving problems and creating new knowledge in other areas. It is used in fields as diverse as scientific research, legal practice, medical diagnosis, economic development, and optimal decision making at individual and institutional levels.

What is the future of statistics in the 21st century, which is dominated by information technology encompassing the whole of communications, interaction with intelligent systems, massive databases, and complex information processing networks? The current statistical methodology, based on probabilistic models applied to small data sets, appears inadequate for the new problems arising in emerging areas of science, technology and policy making. Ad hoc methods are being put forward under the title of Data Mining by computer scientists and engineers to meet the demands of customers. The talk will give a critical review of current statistical methods and of future developments based on large data sets, enormous computing power, and efficient optimization techniques.

Statistical proofs of matrix theorems

Abstract: Matrix algebra is extensively used in the study of linear models, multivariate analysis and optimization problems. It is interesting to note that the matrix results needed to prove statistical propositions can themselves be deduced using some statistical results which can be derived without using matrix algebra. The results are based on Fisher information and its properties which can be established without using matrix results.
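
One classic example of this style of argument (our illustration, not necessarily one from the talk): for X ~ N(θ, Σ) with Σ known and partitioned into two blocks, the Fisher information about θ is Σ⁻¹. If the second component of the mean is known, the information about θ₁ from the full vector is (Σ⁻¹)₁₁, while observing X₁ alone gives Σ₁₁⁻¹; since discarding data cannot increase Fisher information, this statistical fact is exactly the matrix (Schur complement) inequality

    \Sigma_{11}^{-1} \preceq \left( \Sigma^{-1} \right)_{11}
                   = \left( \Sigma_{11} - \Sigma_{12}\, \Sigma_{22}^{-1}\, \Sigma_{21} \right)^{-1}.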


Richard Tweedie, NZSA Visiting Lecturer 2001

Professor Richard Tweedie of the University of Minnesota was the first New Zealand Statistical Association Visiting Lecturer.

As NZSA Visiting Lecturer, Professor Tweedie visited and presented lectures at Victoria University of Wellington, the University of Otago, the University of Canterbury, the University of Auckland, and Massey University (Albany). He also presented a keynote address at the Symposium to Honour Professor David Vere-Jones.

Professor Tweedie was Head of the Division of Biostatistics in the School of Public Health at the University of Minnesota. His research interests were in the theory and application of Markov chains, and in biostatistics, especially meta-analysis and stochastic modelling. He published over 130 scientific papers and an acclaimed book, Markov Chains and Stochastic Stability, and had very extensive experience as a statistical consultant. He was the editor of Statistical Science.

The titles of Professor Tweedie’s lectures were:

Meta-analysis – Potentials, Problems and Pitfalls

Perfect Simulation for MCMC and Markov Chains