NZSA 2023 Conference

Abstracts

## Contents

A Bayesian generalized additive mixed model under monotone constraints (Lizbeth Naranjo Albarrán)

Adapting the Linear Probability Model to GLMMs (Peter Green)

Addressing falling response rates in social surveys (Matthew Hendtlass)

Analysis of well-being and ill-being on campus (Ivy Liu)

Assessing the Impact of COVID-19 on S&P 500 Industry Volatilities: A Novel Clustering Approach (Jorge Caiado)

Background rates of 13 adverse events of special interest before and during COVID-19 pandemic: a multinational Global Vaccine Data Network analysis in New Zealand and other countries (Han Lu)

Bayesian free-knot splines (Matt Edwards)

Bayesian small domain estimation of the New Zealand population from administrative data (Andrew Martin)

Building 'Disclosure Risk Calculator': A Case Study in R and TypeScript Integration (Tom Elliott)

Data requirements and challenges of observational urban data for causal analysis (Pooja Baburaj)

Deep Mixture Models for Understanding Latent Space Representations and Clustering Tasks (Mashfiqul Huq Chowdhury)

Diffusive Nested Sampling in Action (Brendon Brewer)

Emergency Department versus cardiology management of low risk chest pain patients (Eleanor Dunn)

Estimating contaminant loads from high-frequency data - how useful is the extra data? (Alasdair Noble)

Estimating genotype by environment interactions using genomic data: An application to smallholder dairy farms in India (Roy Costilla)

Estimating Human Mobility - Computing Commute Graph Across Aotearoa (Simon Urbanek)

Finding cycles in Covid-19 time series of daily counts (Miaotian Li)

Forecasting New Zealand electricity price using regime-switching and extreme value theory models (Nuttanan Wichitaksorn)

From LAD (least absolute deviation) to DL (dictionary learning) (Ciprian Doru Giurcaneanu)

Generative Models for Core-Collapse Supernovae (Tarin Eccleston)

How does the quality of a survey frame affect achieving survey response targets? (Fareeda Begum)

Improving admin address assignment using a machine learning model (Katie Simpson)

Inferring the kinematics of Globular Cluster Populations for NGC 1052-DF2 and NGC 1052-DF4 Galaxies (Cher Li)

Introduction to the International Visitor Survey (Scott Guo)

Measuring indigenous outcomes and inequity – is a different approach to age-standardisation needed? (Tori Diamond)

Modelling the spatial distribution of fertilizer spreading (David Baird)

Navigating two worlds: Innovations in healthcare monitoring and fisheries modelling (Nokuthaba Sibanda)

Pairwise Differences Covariance (PDC) Estimation in Principal Component Analysis when n < p (Nuwan Weeraratne)

Parallel Queues with Time Delay (Angeline Xiao)

Partial Ordered Stereotype Model: Development of a New Model (Laia Egea Cortés)

Particle-based Variational Bayes: Towards Scalable and Accurate Bayesian Computation (Minh-Ngoc Tran)

Prevalence estimation from sparse data when the outcome and covariates are sometimes missing (Patrick Graham)

Quantiles on global non-positive curvature spaces (Ha-Young Shin)

Respiratory Health of Pacific Youth: Nutrition Resilience and Risk in Childhood (Siwei Zhai)

Results from the 10-year traumatic brain injury study (Priya Parmar)

Revealing and characterising anomalous spatio-temporal patterns in Hikurangi Subduction Zone seismicity (Jessica Allen)

SDMX Standards and working with SDMX through data publishing tools (Sam Cleland)

Shrinkage estimators of the spatial relative risk function (Martin Hazelton)

TBA (Fabian Dunker)

Prediction of usual residence and its use for admin enumerations in the 2023 Census (Stephen Merry)

Collaboration with Iwi Māori on non-response mitigations for the 2023 Census (Florian Flueggen)

Statistical Imputation using CANCEIS for filling gaps in Census 2023 records (Andre Macleod Hungar)

Test of clustering for Neyman-Scott processes (Bethany Macdonald)

The Curious Case of CAR Correlations (Tilman Davies)

The geometry of diet: using projections to quantify the similarity between sets of dietary patterns (Beatrix Jones)

The Markov Chains Tool - an interactive tool (Heti Afimeimounga)

Trends in Statistical Methods in Medical Studies (2000 – 2023) (Deborah Kakis)

Understanding quality change in price indexes (Frances Krsinich)

Using Convolutional Autoencoders for Signal Detection of Extreme Mass Ratio Inspirals Detected by the LISA Mission (Amin Boumerdassi)

Using Linear Assignments in Spatial Sampling (Blair Robertson)

Variance estimation for network meta-analysis (Hans-Peter Piepho)

When are we going to die? A Bayesian latent variable approach to modelling Australian mortality data from January 2015 (John Holmes)

Abstracts

A Bayesian generalized additive mixed model under monotone constraints

Monday

03:30 PM

A2

Lizbeth Naranjo Albarrán

Universidad Nacional Autónoma de México - National Autonomous University of Mexico

In many practical situations involving longitudinal data, the response variable of interest is required to obey specific shape constraints or monotone patterns, for example when patients' health is expected to naturally deplete over time. This problem has been widely addressed under ordinary advanced modelling frameworks. In this talk we present a novel class of monotonic generalized additive models (MGAMs) with mixed components for longitudinal data to address this problem. By adopting a Bayesian approach, specific monotonic conditions on the response are enforced via smoothers as well as penalized splines to account for increasing or decreasing patterns. Some examples in the field of health science are discussed, noting the computational advantages and potential pitfalls.

Adapting the Linear Probability Model to GLMMs

Peter Green

All going to plan, I will present a new method for working with binomial GLMMs where some factor combinations require probability estimates of zero or one. If not, I will present on why this method does not work, together with possible alternatives.

Addressing falling response rates in social surveys

Matthew Hendtlass

In survey statistics, there is an international trend of increasing costs and falling response rates that reflects both societal changes and resistance to an increased volume of surveying. This is putting strain on National Statistics Offices (NSOs) around the world, including Stats NZ, and has spurred significant work on how to collect designed data more efficiently and how to make more use of administrative and alternative data. In this talk we will briefly describe the problem and some of the solutions being investigated by NSOs. Addressing these issues will likely require a fundamental shift in survey design and estimation methodology.

Analysis of well-being and ill-being on campus

Wednesday

11:10 AM

A3

Ivy Liu

Victoria University of Wellington

The purpose of our study is to highlight the value of recognising the simultaneous presence of both well-being and ill-being within individual students (the dual-continua model), as opposed to positioning their well-being and ill-being at opposite ends of a single continuum (the bi-polar model). We empirically derive six clusters of well-being and ill-being using latent profile analysis and show how stochastic membership of these clusters varies with the student's self-assessed physical and financial health. We estimate the degree to which ill-being fell and well-being rose within each cluster when the students' physical health improved and/or when they gained greater control over their financial commitments. The temporal results further underscore the value of the dual-continua model in describing the mental health of students across campus.

Assessing the Impact of COVID-19 on S&P 500 Industry Volatilities: A Novel Clustering Approach

Monday

03:50 PM

A1

Jorge Caiado

ISEG Lisbon School of Economics and Management, University of Lisbon

Our research investigates the impact of the COVID-19 pandemic on the conditional volatilities of S&P 500 industries by employing an innovative feature-based clustering technique within a fitted TGARCH model. Instead of relying on estimated model parameters for stock index distance calculations, we propose using autocorrelations of the estimated conditional volatilities as the basis for distance metrics. This study employs both hierarchical and non-hierarchical algorithms to categorize industries into clusters, revealing distinct shifts in cluster compositions from the pre-COVID-19 era to the pandemic period.

Background rates of 13 adverse events of special interest before and during COVID-19 pandemic: a multinational Global Vaccine Data Network analysis in New Zealand and other countries

Monday

04:30 PM

A1

Han Lu

University of Auckland

Background: The Global Vaccine Data Network (GVDN) is a multinational research network focused on vaccine safety and effectiveness using health data from diverse regions worldwide. This study aimed to estimate the background rates of thirteen adverse events of special interest (AESIs) for COVID-19 vaccines during the pre-pandemic (2015-2019) and pandemic (2020) periods across twelve GVDN sites, to facilitate observed-versus-expected analyses of the AESIs following vaccination. Additionally, New Zealand conducted supplementary analyses exploring the background rates in different ethnic groups and socioeconomically deprived areas. Methods: All GVDN sites followed the same protocol to obtain the incident counts for the AESIs, using national or regional healthcare databases with primary and/or secondary diagnoses. Background incidence rates (IRs) were calculated for each year and study period, stratified by sex and age and by patient type (emergency department, hospital inpatient and outpatient, and primary care patients), based on each site's data availability. Exact 95% confidence intervals were calculated.


Bayesian small domain estimation of the New Zealand population from administrative data

Tuesday

01:40 PM

A1

Andrew Martin

Stats NZ

New Zealand has run a census every five years since 1877 with a few exceptions. With a full enumeration census becoming increasingly expensive, response rates falling and natural disasters impacting data collection, Stats NZ has been researching alternative census models based primarily on linked administrative data, supported by surveys. One objective of this research is to produce high quality, frequent and timely small domain/area population estimates from administrative data. Administrative data will have undercoverage with respect to the target population being estimated, and linked administrative data is prone to linkage errors. We have developed a Bayesian capture-recapture method to estimate the population using two administrative lists (datasets), by correcting for undercoverage and linkage error. We demonstrate this method on simulated data generated under the assumed model and present a case study with real data in the New Zealand context.

Building 'Disclosure Risk Calculator': A Case Study in R and TypeScript Integration

Wednesday

11:30 AM

A2

Tom Elliott

iNZight Analytics

TypeScript is revolutionizing web development with its type safety features, but integrating it with dynamically-typed languages like R presents unique challenges. This talk introduces the 'Disclosure Risk Calculator', a web application built on ReactJS and powered by an Rserve back-end. I will explain how I extended the rserve-js interface to give front-end developers the full TypeScript experience, and discuss how this process might be automated in the future.

Data requirements and challenges of observational urban data for causal analysis

Tuesday

11:30 AM

A3

Pooja Baburaj

University of Canterbury

In causal analysis with observational data, meeting essential data requirements is paramount: comprehensive covariate information, precise treatment/exposure data, reliable outcomes, an adequate sample size, temporal ordering, data quality, a control group, and addressing data-level mismatches. Resolving data-level mismatches involves preprocessing, harmonization, and statistical techniques. Transparent reporting and collaboration with experts enhance validity. Understanding model assumptions and conducting sensitivity analysis are vital for robust causal inferences, ensuring more accurate and reliable urban causal analysis.

Deep Mixture Models for Understanding Latent Space Representations and Clustering Tasks

Tuesday

11:50 AM

A3

Mashfiqul Huq Chowdhury

Victoria University of Wellington

Clustering is a challenging task in machine learning research for analyzing high-dimensional datasets. This study introduces two deep mixture variational autoencoder models to uncover meaningful latent representations for clusters within unlabelled datasets. In our first method, we assume a Gaussian mixture model as the prior distribution for the latent space. The second model additionally considers mixtures of Gaussians for the approximate posterior distribution. We develop a new evidence lower bound (ELBO) to approximate the marginal log-likelihood using variational approximation techniques and formulate closed-form expressions of the posterior probability for clustering assignments. The proposed framework facilitates learning latent space representations and performs clustering tasks. We evaluate the effectiveness of our probabilistic models by assessing clustering performance on various benchmark datasets. The proposed method is also compared with existing state-of-the-art methods.

Diffusive Nested Sampling in Action

Brendon Brewer

For a decade now, I have been using Diffusive Nested Sampling, a high-performance MCMC algorithm, in a variety of applications of Bayesian model comparison. In this talk I will present two recent astrostatistics results. First, I will present an analysis of the 'quasar dipole', a cosmological puzzle that required a significance test. Second, I will present work on detecting the subtle signal of cosmological time dilation in the variability of quasars.

Emergency Department versus cardiology management of low risk chest pain patients

Monday

03:30 PM

A1

Eleanor Dunn

Department of Medicine, University of Otago, Christchurch

Chest pain is a frequent reason for hospital visits, but most cases do not result in a diagnosis of acute myocardial infarction (AMI). Clinicians must rule out Acute Coronary Syndrome (ACS), but also consider the possibility of underlying coronary artery disease. Determining the course of action for “low-risk” patients regarding major adverse cardiac events (MACE) presents challenges in Emergency Medicine. Previously, cardiology referrals guided follow-up, but in April 2022 a new pathway directed ED physicians. This study compares exercise stress test (EST) referral rates between these approaches. Data from both groups' clinical records over 12 months, with a 30-day follow-up for 202 EST patients, found no statistically significant differences in key parameters, including positive EST results. Within 30 days of discharge, MACE occurrence in the ED-led group was comparable to that in the cardiology-led group. In conclusion, this study suggests that ED-led management of low-risk chest pain cases is comparable to cardiology-led management in terms of referral rates for further investigation and MACE occurrence. This ED approach streamlines care, reducing hospital visits and potentially benefiting rural patients with limited access.

Estimating contaminant loads from high-frequency data - how useful is the extra data?

Wednesday

11:50 AM

A2

Alasdair Noble

AgResearch

High-frequency probes are available at realistic prices to monitor flow and a range of contaminants in rivers. Multi-year data has been captured from a small number of sites (9), and by subsampling we have investigated a range of time scales, from half-hourly to monthly. We consider some measures of fit for a commonly used load model, and the estimates of annual load.

Estimating genotype by environment interactions using genomic data: An application to smallholder dairy farms in India

Tuesday

02:20 PM

A2

Roy Costilla

AgResearch, Ruakura Research Centre

Genotype-by-environment interactions (GxE) are prevalent in most livestock industries. In smallholder dairy farms in India this potential GxE is particularly important because of the huge variety of environments in which animals are raised in these production systems. Here we aim to estimate GxE by state in India using simulations based on real genotype data from crossbred dairy cows from the BAIF research foundation. In particular, we simulate phenotypes under the null hypothesis of perfect genomic linkage between environments (genetic correlation = 1) and use multi-trait models to estimate the resulting genetic correlations. In addition, we use the fixation index (Fst) to measure the genomic similarity among animals in different states. Using data from around 5,000 animals across five Indian states, we show that the estimated genetic correlations among states in the simulations are high for all possible combinations of states, providing evidence of sufficient genomic linkage/connectedness across states. Moreover, Fst values are also very low, providing additional evidence of similar genetic backgrounds among animals in different states. These findings support the feasibility of estimating GxE by state in India using real milk yield phenotypic records.

Estimating Human Mobility - Computing Commute Graph Across Aotearoa

Tuesday

02:00 PM

A3

Simon Urbanek

University of Auckland

In this talk we will show a way to construct a spatial network representing commutes to/from work, illustrated on the Aotearoa New Zealand 2018 Census, which records the usual residence and work location of New Zealanders. Based on the spatial information we can estimate the commute flows between and through regions. The resulting spatial network can be combined with other available data to answer various questions, such as how dangerous commutes are for people living in certain areas. The main focus is to construct a spatial network that can be used independently of the generation process to link existing data, such as demographics or tagged locations, with the human mobility aspect captured by the transition network.
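The basic construction — aggregating usual-residence/workplace pairs into a weighted directed network — can be sketched as follows. The area names and counts below are invented for illustration; the actual work uses census microdata and spatial flow estimation.

```python
from collections import Counter

# toy usual-residence -> workplace records (area names are hypothetical)
records = [
    ("Ponsonby", "CBD"), ("Ponsonby", "CBD"), ("Newmarket", "CBD"),
    ("CBD", "Newmarket"), ("Ponsonby", "Newmarket"),
]

# weighted directed commute graph: edge (home -> work) with a commuter count
graph = Counter(records)

# example derived quantity: total out-commuting per home area
outflow = Counter()
for (home, work), n in graph.items():
    if home != work:
        outflow[home] += n
```

Once the graph exists, demographic or location-tagged data keyed by area can be joined onto nodes or edges independently of how the graph was generated.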

Finding cycles in Covid-19 time series of daily counts

Monday

04:10 PM

A1

Miaotian Li

University of Auckland

In this talk, we show the results of an experimental study focused on finding oscillatory components in Covid-19 daily time series data. To this end, we present an algorithm for the detection of a fundamental period T0 and its harmonics in the data. A key feature of the algorithm is an automatic procedure for the selection of the significant peaks of the smoothed periodogram. We investigate the effect on detection performance of three data transforms that have been used in the previous literature. We also investigate to what extent the use of filtering enables the detection of oscillatory components with periods that are integer divisors of T0. The spectral analysis is extended to weekly data, obtained by aggregating the daily measurements.
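The core of such a procedure — a periodogram with an automatic peak-selection rule — can be sketched on synthetic daily counts with a weekly fundamental period T0 = 7 and one harmonic (period 3.5). The 10x-median threshold below is an invented stand-in for the talk's significance procedure, not the actual rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 350
t = np.arange(n)
# toy daily counts: linear trend + weekly cycle (T0 = 7) + one harmonic + noise
x = (100 + 0.1 * t
     + 8 * np.sin(2 * np.pi * t / 7)
     + 4 * np.sin(2 * np.pi * 2 * t / 7)
     + rng.normal(0, 2, n))

# detrend before spectral analysis so the trend does not mask low frequencies
x = x - np.polyval(np.polyfit(t, x, 1), t)

f = np.fft.rfftfreq(n)[1:]                  # Fourier frequencies (drop f = 0)
p = (np.abs(np.fft.rfft(x)) ** 2 / n)[1:]   # raw periodogram

# crude peak rule: keep ordinates exceeding 10x the median periodogram value
periods = sorted({round(1 / fi, 1) for fi in f[p > 10 * np.median(p)]}, reverse=True)
```

With n chosen as a multiple of 7, both the fundamental period (7.0 days) and its harmonic (3.5 days) fall exactly on Fourier frequencies and survive the threshold.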

Forecasting New Zealand electricity price using regime-switching and extreme value theory models

Wednesday

11:50 AM

A3

Nuttanan Wichitaksorn

Auckland University of Technology

In this talk, I will demonstrate the forecasting capabilities of several models under the Markov regime-switching (MRS) and the extreme value theory (EVT) frameworks applied to daily electricity prices in the New Zealand electricity market. The MRS models in the study include up to five regimes, with time-varying transition probabilities and incorporation of external market variables. We apply Hamilton's filter with maximum likelihood estimation for parameter estimation. The EVT peaks-over-threshold (EVT-PoT) framework is also considered, and its relationship to the MRS class of models is discussed. We generate out-of-sample forecasts under various market scenarios. The MRS models are able to replicate real price densities under stable market conditions. The EVT-PoT model performs well despite its lack of complexity compared to the MRS framework. We attribute this to the use of the generalized Pareto distribution to model price extremes.
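The peaks-over-threshold step — fitting a generalized Pareto distribution (GPD) to exceedances above a high threshold and reading off an implied tail quantile — might look like the sketch below. The data, threshold choice and quantile level are illustrative only, not the study's settings.

```python
import numpy as np
from scipy.stats import genpareto, t as student_t

rng = np.random.default_rng(2)
# toy heavy-tailed "price changes" (Student-t with 4 df, so tail index ~ 4)
returns = student_t.rvs(df=4, size=5000, random_state=rng)

u = np.quantile(returns, 0.95)          # high threshold for peaks-over-threshold
exc = returns[returns > u] - u          # exceedances above the threshold

# fit a GPD to the exceedances, with the location parameter fixed at 0
shape, _, scale = genpareto.fit(exc, floc=0)

# tail quantile (here the 99.9% level) implied by the fitted GPD
p_exceed = len(exc) / len(returns)
q999 = u + scale / shape * ((0.001 / p_exceed) ** (-shape) - 1)
```

For Student-t data with 4 degrees of freedom the GPD shape parameter should come out near 1/4; a positive shape is what lets the model extrapolate beyond the largest observed spike.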

From LAD (least absolute deviation) to DL (dictionary learning)

Wednesday

11:10 AM

A2

Ciprian Doru Giurcaneanu

University of Auckland

We consider the optimization problem related to least absolute deviation estimation and discuss how it was extended in the previous literature from the l1-norm to the lp-norm. Next, we show how the same optimization problem can be employed to derive the orthogonal matching pursuit (OMP) algorithm in the lp-norm; OMP finds a sparse representation of a signal using a given dictionary. Furthermore, we use the same optimization problem for dictionary learning in the lp-norm, where both the dictionary and the representation are learned from the signals (data). The results presented in this talk are based on the following publication: X. Zheng, B. Dumitrescu, J. Liu, C.D. Giurcaneanu, Dictionary learning for signals in additive noise with generalized Gaussian distribution, Signal Processing (Elsevier), 195, 2022.
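In its classical l2 form, OMP greedily selects the dictionary atom most correlated with the current residual and then least-squares refits on the selected support; the lp variants discussed in the talk replace this inner criterion. Below is a minimal l2 sketch on synthetic data, not the authors' lp algorithm.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily pick k atoms of dictionary D
    (columns, assumed unit-norm) and least-squares refit on the support."""
    r, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ r))))   # most correlated atom
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coef                      # orthogonal residual
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((30, 60))
D /= np.linalg.norm(D, axis=0)            # normalize atoms to unit norm
x_true = np.zeros(60)
x_true[[5, 17, 42]] = [2.0, -1.5, 1.0]    # 3-sparse ground truth
y = D @ x_true                            # noiseless observation
x_hat = omp(D, y, k=3)
```

Because the refit makes the residual orthogonal to all selected atoms, no atom is picked twice; for a random Gaussian dictionary of this size, the 3-sparse support is recovered exactly.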

Generative Models for Core-Collapse Supernovae

Tuesday

11:50 AM

A1

Tarin Eccleston

University of Auckland

My research investigates generative models, particularly Deep Convolutional Generative Adversarial Networks (DCGANs), to expedite the generation of gravitational wave signals originating from core-collapse supernovae (CCSN). Current methods of generating such signals rely on complex physical and thermonuclear calculations, leading to simulations that often take months to complete. In contrast, this study demonstrates the efficiency of DCGANs in replicating these signals by learning the underlying distribution in the data.

Having an abundance of CCSN gravitational wave signals allows us to better test state-of-the-art parameter estimation techniques in preparation for real LIGO observations of CCSN. Furthermore, the methodologies elucidated in this research can be extended to generate signals originating from diverse gravitational wave-producing phenomena.

How does the quality of a survey frame affect achieving survey response targets?

Wednesday

11:50 AM

A1

Fareeda Begum

Stats NZ

One of the challenges of household survey design is selecting a sample to yield a desired responding sample size. With increasing levels of non-response, there is a need to understand the various factors responsible for this. One such factor is the contribution of dwellings that are found to be ineligible. Over the years, there have been changes in the source of data for dwelling selections for Stats NZ surveys, with increasing use of administrative lists. We have done a study to understand how changes in the source of dwellings may have contributed to any changes in the level of ineligible dwellings, and how they have impacted the achieved responding sample size. I will present a summary of the study.

Improving admin address assignment using a machine learning model

Tuesday

11:10 AM

A2

Katie Simpson

Stats NZ

This talk is directly related to Stephen Merry's talk on the prediction of usual residence.

In 2022, in preparation for an update to the admin household and family methods, it became clear that family and household data were limited by the quality of address selection. I created a machine learning model to better select usual residence addresses from admin data. I will speak about how we created the model, the opportunities presented by the new method, and its ongoing limitations.

Inferring the kinematics of Globular Cluster Populations for NGC 1052-DF2 and NGC 1052-DF4 Galaxies

Tuesday

11:30 AM

A1

Cher Li

University of Auckland

Globular clusters are among the most appealing star clusters in the universe. One can learn more about galaxies' properties and formation by studying globular clusters. Here, two small galaxies, NGC 1052-DF2 and NGC 1052-DF4, have been investigated using Bayesian inference. It is interesting to see whether the globular clusters in these two galaxies contain rotational components and how this may help to explain the mass of the galaxy and the dark matter problem, as it has been claimed that these galaxies lack dark matter. First, ten globular clusters in the NGC 1052-DF2 system were re-analysed. The results support the notion that the globular cluster population rotates. The resulting value of the amplitude, A, which is larger than the velocity dispersion, gives an estimate of the mass of the galaxy that is about twice the mass calculated in a previous investigation. The outcomes also indicate that there may indeed be a small amount of dark matter in NGC 1052-DF2. Second, seven globular clusters in the NGC 1052-DF4 system had their rotational features examined. The results demonstrated that the non-rotation model outperformed the rotation model. A small value of the amplitude, A, and a large value of the velocity dispersion indicate that the estimated mass of the galaxy stays the same as in the previous investigation. Moreover, the results suggest that there is very little chance that dark matter exists in the NGC 1052-DF4 system.

Introduction to the International Visitor Survey

Scott Guo

The International Visitor Survey (IVS) is led by the Ministry of Business, Innovation and Employment and provides accurate national information on the characteristics, behaviours and expenditures of New Zealand's international visitors on a quarterly basis. In this talk, we will provide a brief overview of the International Visitor Survey and Statistics New Zealand's support for the survey.

Measuring indigenous outcomes and inequity – is a different approach to age-standardisation needed?

Tuesday

02:00 PM

A2

Tori Diamond

University of Auckland

How do different approaches to age standardisation affect health measures for the Māori population and Māori/non-Māori inequity metrics? This updates and extends work by Robson et al. (2007) by investigating the impacts of using different reference populations in direct standardisation calculations to quantify three age-dependent outcomes for the Māori population. MoH and Stats NZ records were used for non-fatal injuries, cancer registrations and mortality from 2000 to 2019. Age-standardised rates per 100,000 were calculated with 95% confidence intervals. Previous work highlighted the differences between the 2001 Māori population age structure and the two global standards (Segi and WHO). Our work reinforced this and demonstrated the impact of the changing Māori demographic since 2001 on age-standardisation calculations. The reference populations produced different rates for the three age-related outcomes. Our results show the importance of reference population selection for quantifying indigenous outcomes. The changing Māori demographic structure supports using the most recent available Māori population as an external reference population, but this creates inconsistencies in time trends. These results reinforce the need for an indigenous standard population when calculating Māori health outcomes and Māori/non-Māori inequities.

Modelling the spatial distribution of fertilizer spreading

David Baird

With environmental concerns about fertilizer pollution of ground water, farmers are wanting to minimize the leaching of nutrients, whilst maximizing the economic effectiveness of the fertilizer. To assess and model the uniformity of fertilizer spreading, a large-scale experiment was performed. This measured the spatial distribution for a range of fertilizer types (pure and blended components) for different trucks and regions of New Zealand. The data were analysed with a generalized linear model and the results input into a simulation program to estimate the spatial distribution over a whole paddock. The physics of spreading fertilizer from a turning truck had to be estimated to allow for corners in the truck's track. The simulation can then be used to optimize the distance between tracks to give the optimal economy for the spreading operation. The model was then made available to farmers as an Excel spreadsheet, where they could enter their own paddock dimensions, track width and fertilizer type to visualize the spread of fertilizer across the paddock.

Navigating two worlds: Innovations in healthcare monitoring and fisheries modelling

Monday

02:00 PM

A1

Nokuthaba Sibanda

Victoria University of Wellington

The ongoing pursuit of improved patient care and safety in healthcare has resulted in several methodological developments. A key consideration in these developments is practical implementation of the solutions developed. Likewise, in fisheries modelling the increasing availability of new data sources, such as presence-only data, has resulted in the development of new modelling techniques. This talk will explore a number of projects covering recent developments and practical implementation of statistical methods in these two application areas.

Pairwise Differences Covariance (PDC) Estimation in Principal Component Analysis when n < p

Monday

03:50 PM

A3

Nuwan Weeraratne

University of Waikato

Principal Component Analysis (PCA) is a method of compressing a large dataset into a format that captures the essence of the original data. PCA is a matrix decomposition technique based on eigendecomposition: it quantifies variable relationships using covariance matrices, determines the data distribution, and assesses the significance of directions using eigenvalues. Variance-covariance estimation is therefore crucial for PCA. The usual maximum likelihood estimator of covariance is asymptotically unbiased, but it gives poor estimates of the principal components and badly conditioned estimates of the covariance matrix in high-dimensional settings when n < p. To address these issues, we propose the Pairwise Differences Covariance (PDC) estimation method. In empirical comparisons with several popular estimators, the PDC estimator performs well in estimating the covariance and the principal components in terms of MSE and the cosine similarity error of the PCs. Real data applications are presented.
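The pairwise-differences idea can be illustrated with the textbook identity that averaging outer products of pairwise row differences reproduces the usual unbiased sample covariance without ever using the sample mean. The published PDC estimator may add further weighting or regularization, so this is only a sketch of the underlying construction in an n < p setting.

```python
import numpy as np

def pdc_cov(X):
    """Covariance from pairwise differences of the rows of X (n x p).
    Summing (x_i - x_j)(x_i - x_j)^T over all pairs i < j and dividing by
    n(n-1) equals the unbiased sample covariance matrix."""
    n, p = X.shape
    S = np.zeros((p, p))
    for i in range(n):
        diffs = X[i] - X[i + 1:]              # all pairs (i, j) with j > i
        S += diffs.T @ diffs
    return S / (n * (n - 1))

rng = np.random.default_rng(0)
n, p = 20, 50                                  # high-dimensional case: n < p
X = rng.standard_normal((n, p))
S = pdc_cov(X)

# PCA via eigendecomposition of the estimated covariance matrix
vals, vecs = np.linalg.eigh(S)                 # eigenvalues in ascending order
pcs = vecs[:, ::-1][:, :3]                     # top 3 principal directions
```

Note that with n < p the estimated covariance has rank at most n - 1, which is exactly the ill-conditioning the abstract refers to; the mean-free pairwise form is the starting point the PDC method builds on.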

Parallel Queues with Time Delay

Angeline Xiao

This research explores the effects of information delay on parallel queues under different dispatching methods. We consider the problem of the optimal dispatching policy for a pair of parallel queues with Markovian service. We investigate the impact of the amount of information delay for queues with both synchronous and asynchronous updates, and the influence of different service rates. Our research utilizes simulation methods in R to assess the efficiency of these policies.
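The stale-information effect can be sketched in discrete time: a join-the-shortest-queue dispatcher that only sees queue lengths as they were `delay` steps ago. The study itself uses R and continuous-time Markovian service; here geometric service times and all rates are invented purely for illustration.

```python
import numpy as np
from collections import deque

def simulate(delay, T=20000, p_arrive=0.9, p_serve=0.5, seed=7):
    """Two parallel queues; the dispatcher joins the queue that *looked*
    shorter `delay` steps ago. Returns the time-average total queue length."""
    rng = np.random.default_rng(seed)
    q = [0, 0]
    history = deque([(0, 0)] * (delay + 1), maxlen=delay + 1)
    total = 0
    for _ in range(T):
        seen = history[0]                       # queue lengths `delay` steps ago
        if rng.random() < p_arrive:             # Bernoulli arrival
            if seen[0] < seen[1]:
                k = 0
            elif seen[1] < seen[0]:
                k = 1
            else:
                k = int(rng.integers(2))        # random tie-break
            q[k] += 1
        for k in range(2):                      # geometric (memoryless) service
            if q[k] > 0 and rng.random() < p_serve:
                q[k] -= 1
        history.append(tuple(q))
        total += sum(q)
    return total / T

fresh = simulate(delay=0)     # dispatcher sees current queue lengths
stale = simulate(delay=50)    # dispatcher acts on badly outdated lengths
```

Under heavy load, stale information makes arrivals herd onto the queue that used to be shorter, so the average backlog with a long delay exceeds that of the fresh-information policy.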

Partial Ordered Stereotype Model: Development of a New Model

Monday

04:10 PM

A3

Laia Egea Cortés

Victoria University of Wellington

Ordinal data is prevalent across various fields. Nevertheless, even today, researchers often employ methods designed for continuous or nominal data to analyse ordinal response variables. It is important to note that ordinal data differs in nature from both continuous and nominal data. Although each level is greater than the previous one, we cannot, a priori, quantify the size of the difference on a numerical scale, nor can we assume that the levels are equally spaced.

The Ordered Stereotype Model (OSM) includes score parameters which specify the potentially unequal distances between adjacent response categories. The score parameters show the discriminating power of the covariates, that is, how well the covariates of the model can distinguish between response categories. It can, however, be the case that two covariates have different discriminating powers. My talk presents the Partial Ordered Stereotype Model (POSM) developed in my thesis, which is an extension of the OSM that allows different sets of score parameters within the same model. In this way, this new model captures the particularities of each covariate in terms of their discriminating power.

To demonstrate the utility of this new model, we apply it to real-world salmon data. Our objective is to identify variables impacting salmon health and assess how these variables differentiate between health levels. Finally, a simulation study comparing the performance of the OSM and the POSM has been set up.

Particle-based Variational Bayes: Towards Scalable and Accurate Bayesian Computation

Monday

04:10 PM

A2

Minh-Ngoc Tran

The University of Sydney

Variational Bayes (VB) is widely recognised as a highly efficient and scalable technique for Bayesian inference. However, classical VB imposes restrictions on the space of variational distributions, typically limiting it to a specific set of parametric distributions or factorized distributions. This talk explores ways to relax these restrictions by traversing a set of particles to approximate the target distribution. The theoretical basis of the new particle VB method is established using Optimal Transport theory, which equips the space of probability measures with useful calculus tools. This paves the way for a new research avenue, enabling precise Bayesian inference even in intricate, high-dimensional scenarios.

Prevalence estimation from sparse data when the outcome and covariates are sometimes missing

Tuesday

12:10 PM

A2

Patrick Graham

Statistics New Zealand

Estimating the population prevalence of some characteristic of interest, by levels of one or more covariates, is a common task in epidemiological and social research. Though seemingly straightforward, such analyses often encounter problems of sparseness and missing data. Multiple imputation is a popular and flexible method for dealing with missing data, but its application to prevalence estimation is complicated by sparseness, because the normality assumptions which underpin the standard multiple imputation combining rules are often suspect in studies of low-prevalence conditions. In this paper we consider some alternative approaches to interval estimation from multiply imputed prevalence data and compare these with interval estimates obtained from Bayesian approaches to prevalence estimation in the presence of missing data that avoid imputation.

Quantiles on global non-positive curvature spaces

Tuesday

11:10 AM

A3

Ha-Young Shin

Seoul National University, Department of Statistics

This talk develops a notion of geometric quantiles on Hadamard spaces, also known as global non-positive curvature spaces. After giving some definitions and basic properties, we demonstrate several asymptotic properties of sample quantiles on Hadamard manifolds such as strong consistency and joint asymptotic normality. Some theory, including an explicit formula for the gradient of the quantile loss function, is developed specifically for hyperbolic space, followed by both simulation and real data experiments.

Respiratory Health of Pacific Youth: Nutrition Resilience and Risk in Childhood

Tuesday

01:20 PM

A2

Siwei Zhai

University of Auckland

In New Zealand, 7% of deaths are related to respiratory diseases, and Pacific people are at higher risk. This work investigated causal effects of early-life nutritional factors on early-adulthood lung function amongst Pacific Islands Families Study cohort members, who comprise the 1398 individuals born to Pacific Island families in Middlemore Hospital between March and December 2000. Of the cohort, 466 participated in the respiratory study. The primary outcome was forced expiratory volume in 1 second (FEV1) z-score at age 18 years. FEV1 and healthy lung function (HLF), defined as a z-score larger than -1.64, were secondary outcomes. Nutrition and other information were previously collected in 4 measurement waves at ages 4, 6, 9 and 14 years. Exploratory and multi-group confirmatory factor analyses identified 4 eating patterns represented by nutritional factor scores (NFS). Confounders were identified using a causal directed acyclic graph. Semi-parametric linear and relative risk regression models were fitted to estimate causal effects of NFS on respiratory outcomes, using estimated weights compensating for attrition-induced selection bias. The population attributable fraction of HLF for each NFS was estimated for each measurement wave. Results suggest a positive impact of consuming more fruit and vegetables during childhood on respiratory health later in life. There is a need to support healthier food environments for Pacific children and access to healthier food.

Results from the 10-year traumatic brain injury study

Monday

04:50 PM

A1

Priya Parmar

University of Auckland

TBA

Revealing and characterising anomalous spatio-temporal patterns in Hikurangi Subduction Zone seismicity

Tuesday

01:20 PM

A3

Jessica Allen

University of Otago

The Hikurangi subduction zone comprises an active seismic area overlapping the North Island of Aotearoa. Classifying events recorded in this region and understanding their recurrence is a crucial step in highlighting interactions between different forms of seismic activity and the underlying fault processes, to improve forecasting of the next megathrust earthquake. Slow slip events (SSEs) are a kind of slow earthquake, with potential as both a slow-motion analogue for the generating mechanisms and a possible trigger of megathrust earthquakes. However, SSEs often remain undetected, resulting in an incomplete catalogue that limits the insights they can provide. SSEs in the Hikurangi subduction zone have been associated with increased seismicity. Seismic swarms are clusters of events where several have similar magnitudes and the expected mainshock-aftershock decay is not displayed. We explore various models to isolate potential seismic swarms from a comprehensive catalogue of Hikurangi seismicity. Hidden Markov models with extra zeros in 2 and 3 dimensions are applied to spatio-temporally classify events, identifying meaningful subregions of activity. Epidemic Type Aftershock Sequence models have been well-established for capturing typical earthquake cycles and are utilised to highlight the surplus seismicity that corresponds to swarms. We then use renewal processes to examine the occurrence patterns of swarms to further investigate the relationship between SSEs and swarms.

SDMX Standards and working with SDMX through data publishing tools

Tuesday

11:50 AM

A2

Sam Cleland

Statistics NZ

The SDMX community is a global initiative to improve statistical data and metadata exchange. SDMX is an ISO standard designed to describe statistical data and metadata, normalise their exchange, and improve their efficient sharing across statistical and similar organisations. Stats NZ is working to improve its data publishing tools, some of which are based on the SDMX information model. The upgrade of the NZ Stat tool is introducing new functionality and opportunities to work with SDMX formatted objects. In this presentation I will introduce the SDMX model, new functionality in data publishing tools, and opportunities for working with SDMX formatted objects through the new data publishing tools.

Shrinkage estimators of the spatial relative risk function

Tuesday

02:40 PM

A3

Martin Hazelton

University of Otago

The spatial relative risk function describes differences in the geographical distribution of two types of points, such as locations of cases and controls in an epidemiological study. It is defined as the ratio of the two underlying densities. Estimation of spatial relative risk is typically done using kernel estimates of these densities, but this procedure is often challenging in practice because of the high degree of spatial inhomogeneity in the distributions. This makes it difficult to obtain estimates of the relative risk that are stable in areas of sparse data while retaining necessary detail elsewhere, and consequently difficult to distinguish true risk hotspots from stochastic bumps in the risk function. We study shrinkage estimators of the spatial relative risk function to address these problems. In particular, we propose a new lasso-type estimator that shrinks a standard kernel estimator of the log-relative risk function towards zero, eliminating stochastic bumps.
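The general idea can be sketched in one dimension: estimate the log ratio of two kernel density estimates, then soft-threshold it towards zero. This is an illustration only; the lasso-type estimator in the talk is defined for 2-D spatial data and chooses its shrinkage differently, and all data and the threshold value below are invented.

```python
import numpy as np

# Kernel estimate of the log relative risk on a 1-D grid, then a generic
# soft-threshold shrinking it towards zero (small bumps become exactly zero).
rng = np.random.default_rng(1)
cases = rng.normal(0.0, 1.0, 200)        # simulated "case" locations
controls = rng.normal(0.2, 1.2, 400)     # simulated "control" locations

def kde(data, pts):
    """Gaussian kernel density estimate, Silverman rule-of-thumb bandwidth."""
    h = 1.06 * data.std() * len(data) ** -0.2
    z = (pts[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

grid = np.linspace(-3, 3, 101)
log_rr = np.log(kde(cases, grid)) - np.log(kde(controls, grid))

lam = 0.1                                # illustrative shrinkage level
shrunk = np.sign(log_rr) * np.maximum(np.abs(log_rr) - lam, 0.0)
print(int((shrunk == 0.0).sum()))        # grid points shrunk exactly to zero
```

The soft-threshold is what makes this "lasso-type": everywhere the estimated log relative risk is within the threshold of zero, the shrunken estimate is exactly zero rather than a small stochastic bump.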


Prediction of usual residence and its use for admin enumerations in the 2023 Census

Tuesday

11:30 AM

A2

Stephen Merry

Stats NZ

The 2023 Census aims to count the people and dwellings residing in New Zealand on 7th March 2023. For each person in the census usual resident population, we aim to have information about where they usually reside. For census non-respondents, their usual residence address is key for their inclusion as admin enumerations. Historically, a rules-based method, choosing an individual's most recent address, was used. Here we discuss the implementation of an XGBoost method developed by Katie Simpson, its performance, and use for admin enumerations.

Collaboration with Iwi Māori on no response mitigations for the 2023 Census

Tuesday

02:00 PM

A1

Florian Flueggen and Pip Bennett

Stats NZ

The national census of people and dwellings aims to provide accurate information on a variety of topics, nationally and down to sub-populations. For this reason it is important for the census to have accurate values. While the best data quality is achieved by people responding fully and accurately, some do not respond to every question or at all, and others do not provide usable responses. To reduce the impact of these ’no response’-type situations on data quality, appropriate mitigations are required. In this presentation we describe the collaboration between Te Kāhui Raraunga and Stats NZ on developing these mitigations. We discuss why this collaboration was sought, how it started and developed, and share our key learnings. The intention is to support others thinking about embarking on similar journeys to improve the quality of their data and outputs.

Statistical Imputation using CANCEIS for filling gaps in Census 2023 records (title TBA)

Tuesday

01:20 PM

A1

Andre Macleod Hungar

Stats NZ

TBA

Test of clustering for Neyman-Scott processes

Tuesday

01:40 PM

A3

Bethany Macdonald

University of Otago

Spatial point patterns can arise from a vast array of application areas including epidemiology, ecology and geoscience. A fundamental research question is whether the points within these patterns are independent or clustered. Somewhat surprisingly, there exists no formal statistical test for such a hypothesis. This is largely due to the long recognised fact that the likelihood of the Neyman-Scott process is intractable. Recent developments by Baddeley et al. (2022) have remedied this issue by reparametrising the Neyman-Scott model by cluster strength and cluster scale, where the Poisson process occurs when the cluster strength is zero. Using these developments, we establish a formal test of clustering for the Neyman-Scott process.
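As a sketch of the kind of process under study, the code below simulates a Neyman-Scott (Thomas-type) cluster process on the unit square: Poisson parents, each with a Poisson number of Gaussian-displaced offspring. The parameter names and values are invented for illustration, and the talk's reparametrisation by cluster strength and cluster scale is not reproduced here.

```python
import numpy as np

# Neyman-Scott cluster process: parents are a homogeneous Poisson process,
# offspring are scattered around each parent; only offspring are observed.
rng = np.random.default_rng(2)
kappa, mu, sigma = 20, 5, 0.02    # parent intensity, mean offspring, spread

n_parents = rng.poisson(kappa)
parents = rng.uniform(0, 1, size=(n_parents, 2))

points = []
for p in parents:
    n_off = rng.poisson(mu)
    points.append(p + rng.normal(0, sigma, size=(n_off, 2)))
pts = np.concatenate(points) if points else np.empty((0, 2))
pts = pts[(pts >= 0).all(axis=1) & (pts <= 1).all(axis=1)]  # clip to window

print(len(pts))   # a clustered pattern; roughly kappa * mu points
```

Setting the mean number of offspring per cluster towards zero relative to the parent intensity recovers patterns indistinguishable from a plain Poisson process, which is the boundary hypothesis a test of clustering must handle.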

Conditional autoregressive (CAR) models are frequently used in applications where a spatially discrete index set forms the domain of interest. We typically see the covariance structure formulated in terms of a single parameter that scales the influence of the local ’neighbourhood’ of a given node or site and as such, is often interpreted as a representation of the strength of correlation. In this talk I don’t introduce anything new, but do highlight my lack of ability by trying to explain why a ’correlation-like’ interpretation of this parameter is difficult, if not downright wrong.

The geometry of diet: using projections to quantify the similarity between sets of dietary patterns

Tuesday

12:10 PM

A3

Beatrix Jones

University of Auckland

Food consumption is complex and high dimensional. Nutrition researchers measure (or attempt to measure) consumption using food frequency questionnaires, food recalls, or food records. This generates high dimensional data which is frequently summarised using principal components, principal components with rotation, or factor analysis. The resulting axes in the high dimensional space are called “dietary patterns.” We define a multivariate extension of Tucker’s congruence coefficient (MTCC) to quantify how similar two different sets of dietary patterns are, producing a similarity measure that ranges from zero to one. While the MTCC can be applied to compare different populations assessed with the same dietary instrument, to contextualise the MTCC we compute the similarity for several datasets from the dietary pattern validation literature, where the same food questionnaire is given to the same people a few weeks apart.
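The classical (univariate) Tucker congruence coefficient that the MTCC extends can be sketched for a single pair of patterns; the loading vectors below are invented toy values, and the multivariate extension itself is the talk's contribution and is not reproduced here.

```python
import numpy as np

# Tucker's congruence coefficient for one pair of loading vectors:
# the cosine of the angle between them, ranging from -1 to 1, with
# values near 1 indicating highly similar patterns.
def congruence(x, y):
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

# Toy loadings for two "dietary patterns" over five foods
a = np.array([0.8, 0.6, 0.1, 0.0, 0.2])
b = np.array([0.7, 0.5, 0.2, 0.1, 0.3])

print(round(congruence(a, b), 3))   # close to 1: highly similar patterns
print(congruence(a, a))             # identical patterns give 1 (up to rounding)
```

Extending this to whole sets of patterns is non-trivial because the patterns spanning a subspace are only identified up to rotation, which is the problem a multivariate congruence measure has to address.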

The Markov Chains Tool - an interactive tool

Wednesday

11:30 AM

A3

Heti Afimeimounga

University of Auckland

Researchers and educators have long been aware of the misconceptions prevalent in people’s probabilistic reasoning processes. Calls to reform the teaching of probability from a traditional and predominantly mathematical approach to include an emphasis on modelling using technology are now being heeded by many. The Markov chains tool is one of four interactive visualization tools that were developed as part of a research project funded by the Teaching & Learning Research Initiative. Initial feedback suggests that the tool may support students’ understanding of the equilibrium distribution and points to certain aspects of the tool that may be beneficial. In this talk, I will present our experiences of including an activity based on this tool in the Markov processes module of a first-year probability course.
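The equilibrium distribution the tool visualises can be computed directly: the stationary distribution pi of a transition matrix P satisfies pi P = pi, i.e. it is the normalised left eigenvector of P for eigenvalue 1. The two-state matrix below is a made-up example, not one from the tool.

```python
import numpy as np

# Stationary distribution of a Markov chain via the left eigenvector of the
# transition matrix P associated with eigenvalue 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

eigvals, eigvecs = np.linalg.eig(P.T)    # right eigenvectors of P.T = left of P
i = np.argmin(np.abs(eigvals - 1.0))     # eigenvalue closest to 1
pi = np.real(eigvecs[:, i])
pi = pi / pi.sum()                       # normalise to a probability vector

print(pi)                                # here [5/6, 1/6]
```

Running the chain for many steps from any starting distribution converges to the same vector, which is the behaviour an interactive visualisation can make concrete for students.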

Trends in Statistical Methods in Medical Studies (2000 – 2023)

Tuesday

01:40 PM

A2

Deborah Kakis

University of Auckland

Statistical analysis plays a significant role in medical and healthcare research and practice. A command of statistics enables health professionals to read, understand and synthesize published results to inform practice. Further, it allows those involved in research to implement appropriate study designs, acquire quality data, and analyze it to ensure reliable findings. While it is desirable for health professionals to acquire a certain level of statistical knowledge, knowing which skills they require is important. Statistics, like many other disciplines, is constantly evolving, and this is true of the methods used in healthcare research. In this talk I investigate statistical methods applied in medical and health studies, enabling an understanding of their trends over the last twenty years. Studies published in the New England Journal of Medicine between 2000 and 2023 were analyzed to identify the trend and prevalence of the statistics used. This presentation shares preliminary findings: while descriptive and inferential statistics are still widely used, more involved procedures such as regression, multivariate analyses and machine learning models have been making an appearance in recent years. These results contribute to the wider theme of a PhD study investigating the statistical literacy of health professionals in Papua New Guinea. Understanding the trend will help guide the development of tools to assess and improve the statistical literacy of health professionals in Papua New Guinea.

New methods for measuring price change from administrative and big data mean that we can more accurately understand quality change. We will explain new approaches to doing this and present some results for different product classes.

Using Convolutional Autoencoders for Signal Detection of Extreme Mass Ratio Inspirals Detected by the LISA Mission

Tuesday

11:10 AM

A1

Amin Boumerdassi

University of Auckland

Extreme Mass Ratio Inspirals (EMRIs) are gravitational wave (GW) events produced by the mergers of pairs of massive objects such as black holes and neutron stars whose mass ratio exceeds 10,000. These GWs cause the distance between points in space to oscillate, and these oscillations are measured through the varying time of travel for laser light. Traditionally, the detection of GW events was performed through matched filtering, in which a detected signal would be compared to millions of variations of a template model for a given type of GW event. In the case of EMRIs, this is computationally infeasible owing to the huge parameter space of EMRI waveform models, years-long waveform duration, and large file size. My work attempts to overcome these problems by training a convolutional autoencoder on rapidly-generated simulated EMRI signals. By framing this as an anomaly detection problem (anomaly = not an EMRI), the autoencoder attempts to reproduce EMRIs as accurately as possible by mapping the signal to a low dimensional representation and back to the original dimensionality. The successful autoencoder will accurately reconstruct EMRIs, poorly reconstruct anything else, and perform all this with little computational requirement.

Using Linear Assignments in Spatial Sampling

Tuesday

02:20 PM

A3

Blair Robertson

University of Canterbury

A spatial sampling design determines where sample locations are placed in a study area so that population parameters can be estimated with relatively high precision. Usually, the population mean or total is estimated, but other characteristics may also be of interest. An effective strategy for sampling natural resources is to spread sample locations evenly over the resource because nearby locations tend to have more similar response values than distant ones. Spatially balanced designs have good spatial spread and give precise results for commonly used estimators when surveying natural resources. In this talk, we present a linear assignment strategy that can be used to draw various well-spread samples over arbitrary auxiliary spaces.

Variance estimation for network meta-analysis

Wednesday

09:30 AM

A1

Hans-Peter Piepho

University of Hohenheim

Meta-analysis summarizes the results of a series of trials. When more than two treatments are included in the trials and when the set of treatments tested differs between trials, the combination of results across trials requires some care. Several methods have been proposed for this purpose, which feature under different labels, such as network meta-analysis or mixed treatment comparisons. Two types of linear mixed model can be used for meta-analysis. One, known as contrast-based, expresses the expected outcome of treatments as a contrast to a baseline treatment. The other, known as arm-based, uses a classical two-way linear predictor with main effects for treatment and trial. In this talk, I will compare both types of model, exploring under which conditions they give equivalent results and stressing the advantages of the arm-based approach. A key quantity in network meta-analysis is the variance for heterogeneity, corresponding to the treatment-by-trial interaction in the arm-based approach. Estimation of this variance can be done by a number of methods in a generalized linear model context. I will report on simulations done to evaluate alternative methods of estimation. First I will consider the basic model for network meta-analysis, which assumes that direct and indirect evidence on treatment comparisons are consistent. Consistency is a key assumption, and therefore several methods have been proposed to detect inconsistencies. Here, I will consider the recently proposed evidence-splitting model and propose a new estimator for the heterogeneity variance. Simulation evidence will also be presented, showing that the proposed estimator does well.

When are we going to die? A Bayesian latent variable approach to modelling Australian mortality data from January 2015

Monday

03:50 PM

A2

John Holmes

University of Canterbury

The Covid-19 pandemic has resulted in increased interest in excess mortality modelling. However, the statistical methods used tend to (1) be over-parametrised, (2) fit models that ignore variance heterogeneity, (3) not check whether postulated models are consistent over time, and (4) not look for the factors driving short-term variation in mortality rates.

Since 2020, the Australian Bureau of Statistics has been releasing mortality statistics monthly, including age-standardised death rates per week split by underlying cause. Modelling the resulting multivariate mortality time series using multivariate regression and factor analysis allows us to show

1. Mortality rates in Australia include a true seasonal component, but most seasonality is driven by winter epidemics of pneumonia-causing pathogens. However, over 80

2. Deviations from expected winter respiratory illness epidemic mortality in any week split into the different underlying cause categories in similar proportions as expected epidemic mortality.

3. Since January 2020, mortality trends are unchanged, except that no winter respiratory illness epidemic is visible in 2020 mortality data and Covid-19 circulation increases all-cause mortality rates. In the Omicron period, we find evidence that covid-associated mortality is under-reported.