Statistical modelling of heavy-tailed stock returns
Taehan Bae
University of Regina
This study aims at predictive modelling of heavy-tailed stock returns under a mixture model setting. A
robust regression method is used to fit the main body, while the peaks-over-threshold method is employed to model
the tails. For the estimation of the tail parts, Bayesian maximum a posteriori (MAP) estimation with conjugate priors
is used to smooth the maximum likelihood estimates (MLEs). This filter-tuning process provides stability and
efficiency in computation and prediction. Several constrained, non-convex optimization problems have been
converted to unconstrained, convex problems by quadratic approximation and changes of variables. The approach is
applied to a large, multi-period, unbalanced data set of daily returns of global stocks, containing nearly 100,000
records. Out-of-sample prediction results show that the smoothed estimates outperform the MLEs.
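As a rough illustration of the tail-modelling step, the sketch below (in R) fits a generalized Pareto distribution to exceedances over a high threshold by maximum likelihood and then by a penalized, MAP-style fit. The Gaussian penalty on the shape parameter merely stands in for the conjugate priors of the paper, and the threshold and prior settings are illustrative choices, not those used in the study.
    ## Illustrative R sketch: peaks-over-threshold tail fit by MLE and by a
    ## penalized (MAP-style) fit; the penalty is NOT the paper's conjugate prior.
    gpd_nll <- function(par, y) {
      xi    <- par[1]
      sigma <- exp(par[2])            # change of variable keeps sigma > 0 (unconstrained)
      if (abs(xi) < 1e-6)             # exponential limit as xi -> 0
        return(length(y) * log(sigma) + sum(y) / sigma)
      z <- 1 + xi * y / sigma
      if (any(z <= 0)) return(1e10)   # outside the GPD support
      length(y) * log(sigma) + (1 / xi + 1) * sum(log(z))
    }
    gpd_map_nll <- function(par, y, xi0 = 0.1, tau = 25)   # Gaussian penalty on xi
      gpd_nll(par, y) + 0.5 * tau * (par[1] - xi0)^2

    set.seed(1)
    returns <- rt(5000, df = 4)                # toy heavy-tailed "returns"
    u       <- quantile(returns, 0.95)         # illustrative threshold
    exc     <- returns[returns > u] - u        # exceedances over the threshold
    mle <- optim(c(0.1, log(sd(exc))), gpd_nll,     y = exc)$par
    map <- optim(c(0.1, log(sd(exc))), gpd_map_nll, y = exc)$par
    c(xi_mle = mle[1], xi_map = map[1])        # the MAP-style fit shrinks xi toward xi0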
I am indebted to Reg Kulperger for many of my problems
John Braun
University of British Columbia
In this talk, I will start with a brief history of my experience of
Reg as supervisor, colleague and collaborator. I will then survey
the breadth of problems that Reg and I worked on over the years, spanning
cluster point processes, branching processes, interacting particle systems,
bootstrapping, Fourier analysis, and density and intensity estimation.
Modelling rainfall over the catchment of the Blue Nile
A.H. El-Shaarawi
Cairo University, Department of Statistics, Cairo, Egypt
National Water Research Institute, Burlington, ON, Canada, L7R 4A6
The lives of millions of people in Egypt, Sudan and Ethiopia depend on the Blue Nile's water for agriculture, industry
and domestic use. The downstream nations, Egypt and Sudan, are concerned about the impact of constructing the
Grand Ethiopian Renaissance Dam (GERD) across the Blue Nile River in Ethiopia on their traditional water share.
The objective of this paper is to analyze historical rainfall data collected from more than 100
sampling stations within the River's watershed on the Ethiopian Plateau. The intention is to develop a model for
predicting the changes in the discharge to the Nile from the Blue Nile as a result of the changes in yearly
rainfall yield. The importance of this prediction is related to the Ethiopian management of filling the huge GERD
reservoir whose capacity equals more than a year's flow of the Blue Nile. The model takes into account the effects
of El Niño and La Niña events, where the former is associated with lower rainfall and the latter with
higher rainfall.
Decomposing Variability in Environmental Monitoring Data
Sylvia Esterby
University of British Columbia
Temporal records of environmental variables often display periodicity or oscillations. When modelling
such data sets for purposes of detecting temporal trends or attributing components of variability to
other factors, various methods are used to account for periodicity or oscillations. For river monitoring
data collected at a monthly frequency, methods which block on season or methods which explicitly model
the form of seasonal change as a time-varying function could be used. A consistent theme is the
decomposition of the variability. A number of cases will be reviewed. Examples of variables include
water quality and quantity, air pollutants, and tree rings. Methods include modelling oscillations with
Fourier series, wavelets and LOWESS and, in cases where monitoring locations are grouped as similar,
random forest models and functional data clustering.
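One simple version of the ideas above is a harmonic (Fourier) regression for a monthly series, which splits the variability into trend, seasonal and residual components. The sketch below (in R) uses a simulated series and two harmonics purely for illustration; it is not taken from any of the reviewed cases.
    ## Illustrative R sketch: harmonic (Fourier) regression for a monthly series.
    set.seed(4)
    month <- rep(1:12, times = 20)                     # 20 years of monthly data
    t_idx <- seq_along(month)
    y     <- 5 + 0.002 * t_idx +                       # slow trend
             2 * sin(2 * pi * month / 12) +            # seasonal cycle
             rnorm(length(t_idx), sd = 0.5)
    fit <- lm(y ~ t_idx +
                sin(2 * pi * month / 12) + cos(2 * pi * month / 12) +
                sin(4 * pi * month / 12) + cos(4 * pi * month / 12))
    anova(fit)   # sequential sums of squares: trend vs. seasonal (Fourier) terms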
TBA
Peter Guttorp (Thursday PM - Remote)
University of Washington
Bootstrapping the empirical distribution of a stationary process with change-point
Gail Ivanoff (Thursday PM)
University of Ottawa
When detecting a change-point in the marginal distribution of a stationary time series, bootstrap
techniques are required to determine critical values for the tests when the pre-change distribution is unknown.
In this talk, we propose a sequential moving block bootstrap and demonstrate the asymptotic behaviour of the
bootstrapped empirical process under both converging and non-converging alternatives. We avoid any assumptions of
mixing, association or near epoch dependence. These results are applied to a linear process with heavy-tailed
innovations.
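The sketch below (in R) gives a minimal, non-sequential version of the idea: a moving block bootstrap resamples the series in blocks to preserve short-range dependence, and a simple CUSUM-type statistic built from the empirical distribution function is calibrated against the bootstrap replicates. The block length and the statistic are illustrative; the sequential scheme proposed in the talk is more refined.
    ## Illustrative R sketch: moving block bootstrap calibration of a CUSUM-type
    ## change-point statistic based on the empirical distribution function.
    mbb_resample <- function(x, block_len) {
      n      <- length(x)
      starts <- sample(seq_len(n - block_len + 1),
                       size = ceiling(n / block_len), replace = TRUE)
      out    <- unlist(lapply(starts, function(s) x[s:(s + block_len - 1)]))
      out[seq_len(n)]                            # trim to the original length
    }
    cusum_ecdf_stat <- function(x, grid = quantile(x, 1:19 / 20)) {
      n    <- length(x)
      stat <- 0
      for (k in seq(10, n - 10)) {               # candidate change points
        d    <- max(abs(ecdf(x[1:k])(grid) - ecdf(x[(k + 1):n])(grid)))
        stat <- max(stat, sqrt(k * (n - k) / n) * d)
      }
      stat
    }
    set.seed(2)
    x    <- arima.sim(list(ar = 0.5), n = 300)   # toy stationary series, no change
    obs  <- cusum_ecdf_stat(x)
    boot <- replicate(200, cusum_ecdf_stat(mbb_resample(x, block_len = 15)))
    mean(boot >= obs)                            # bootstrap p-value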
Coupling and weak convergence
Richard Lockhart
Simon Fraser University
I will chat about a strategy Reg, Peter Guttorp, and I used for proving
weak limit theory for one process by coupling it to another easier one.
I expect to define the terms, talk about Reg, and leave you guessing
about what we actually did.
On the relationship between data sharpening and Firth's adjusted score function
James Stafford
University of Toronto
In a series of seminal papers [Choi and Hall (1999); Choi et al. (2000);
Doosti and Hall (2016)] data sharpening is shown to reduce the bias of non-
parametric estimators for regression and density estimation. In this paper we
give a common framework for these techniques and show that they can be derived
directly from an adjusted score function of the type given by Firth (1993). When
interest lies in derivative estimation for non-parametric regression we
show this approach leads to a new data sharpening technique.
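For readers unfamiliar with Firth's adjustment, the sketch below (in R) shows it in its best-known parametric setting, logistic regression, where the adjusted score corresponds to penalizing the log-likelihood by half the log-determinant of the Fisher information (the Jeffreys prior). The nonparametric data-sharpening analogue discussed in the abstract is not reproduced here.
    ## Illustrative R sketch: Firth's (1993) adjusted score for logistic regression,
    ## implemented as a Jeffreys-prior penalty on the log-likelihood.
    firth_logistic <- function(X, y) {
      penalised_nll <- function(beta) {
        p   <- plogis(drop(X %*% beta))
        W   <- p * (1 - p)
        ll  <- sum(y * log(p) + (1 - y) * log(1 - p))
        pen <- 0.5 * as.numeric(determinant(crossprod(X * W, X),
                                            logarithm = TRUE)$modulus)
        -(ll + pen)                   # the penalty removes the O(1/n) bias of the MLE
      }
      optim(rep(0, ncol(X)), penalised_nll)$par
    }
    set.seed(5)
    x <- rnorm(40)
    X <- cbind(1, x)
    y <- rbinom(40, 1, plogis(-1 + 2 * x))
    rbind(mle   = coef(glm(y ~ x, family = binomial)),
          firth = firth_logistic(X, y))   # Firth estimates are pulled toward zero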
Calculating wait time 1 from billing data
David Stanford
University of Western Ontario
The Segment 1 wait time (“Wait Time 1”), defined as the time between the
date of referral and the date of the initial specialist consultation, is
an important component of total patient wait time. The Saskatchewan
Ministry of Health wanted to explore the viability of a method to
measure Wait Time 1 based upon billing data. This study considers three
different approaches to identify Wait Time 1 from administrative billing
data. The prime advantage in using such billing data is that it is
readily available. The main disadvantage is that the actual referral
date is not generally indicated in the general physician (GP)’s billing
data and the patient may have seen the GP multiple times prior to the
specialist visit, so that statistical methods are needed to ascertain
which was the referring event. To address the difficulty of identifying
the referral date, the Ministry introduced the “55B” GP billing code in
April 2012. The new billing code is identical to the “5B” code used
by GPs for partial assessments, but additionally indicates, among multiple visits,
which GP visit resulted in a referral to a specialist. This study also
examines whether the current sample size of 55Bs is sufficient to
accurately measure Wait Time 1.
Treatment regimes and social networks
Mary Thompson
University of Waterloo
Medical treatments are usually administered to the individual for whom the outcome is defined. In
contrast, preventive health or behavioural interventions are often administered taking network structure into
account. The evaluation of these latter types of treatment regimes aims to optimize the expected outcomes
for the network, subject to cost constraints. We consider examples of evaluation frameworks inspired by some of
the literature on causal inference.
Hidden Markov models, blood counts, psychosis, probability
tables and the Levenberg-Marquardt algorithm.
Rolf Turner
University of Auckland
It is conjectured that there is a relationship between
counts of blood cells (monocytes) and severity of psychosis in
an individual. Data are available in the form of observations
made on a fairly large number (1258) of individuals. The psychosis
observations consist of (subjective) assessments made by physicians,
of severity on a 0 to 4 scale (0 = no symptoms, ..., 4 = severe
psychosis). The cell counts are of course intrinsically discrete,
but with a large range, so they have been further discretised on
a 1 to 5 scale in a somewhat ad hoc fashion.
There is no appropriate parametric distribution for such discrete
data, so the emission probabilities in a hidden Markov model to
be fitted to the data must be specified via tables (one table for
each state of the underlying Markov chain). These "non-parametric"
models are conceptually simple but entail the estimation of awkwardly
large numbers of parameters (!!!). I have approached the analysis in
two ways: (1) fitting bivariate models (and testing for dependence
of the two components); (2) fitting models to the monocyte counts
and then using the fitted values from these models as predictors
in models fitted to the psychosis ratings. Both approaches
required substantial development of both methodology and software.
In an effort to improve the performance of the model fitting
procedure I adapted, to the current context, an implementation
of (a version of) the Levenberg-Marquardt algorithm that I had
previously developed to handle settings in which the emissions are
Poisson distributed. This provides a new option in addition to the
standard EM algorithm procedure and "brute force" methods whereby
the likelihood is maximised via either nlm() or optim().
Major difficulties include the sensitivity of all of the fitting
methods to the starting values used, and the vexing problem of
choosing the appropriate number of states in the underlying hidden
Markov chain. In this talk I will discuss these difficulties,
describe some of the challenges in working out the details for
the new implementation of the Levenberg-Marquardt algorithm, and
present some of the results of the data analysis.
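The sketch below (in R) illustrates the "brute force" route mentioned above: a two-state hidden Markov model with a tabular (categorical) emission distribution, whose log-likelihood is computed by the scaled forward algorithm and maximised directly with optim(). The toy data, the number of states and the softmax parameterisation are illustrative choices, not those of the actual analysis.
    ## Illustrative R sketch: HMM with tabular emissions on categories 1..5,
    ## fitted by direct maximisation of the forward-algorithm log-likelihood.
    softmax <- function(v) { e <- exp(v - max(v)); e / sum(e) }
    hmm_negloglik <- function(par, y, K = 2, M = 5) {
      A <- t(apply(matrix(par[1:(K * K)], K, K), 1, softmax))     # transition matrix
      B <- t(apply(matrix(par[-(1:(K * K))], K, M), 1, softmax))  # emission table
      alpha <- rep(1 / K, K) * B[, y[1]]                          # uniform initial law
      ll    <- log(sum(alpha)); alpha <- alpha / sum(alpha)       # scaled forward pass
      for (tt in 2:length(y)) {
        alpha <- as.vector(alpha %*% A) * B[, y[tt]]
        ll    <- ll + log(sum(alpha)); alpha <- alpha / sum(alpha)
      }
      -ll
    }
    set.seed(3)
    y   <- sample(1:5, 400, replace = TRUE,
                  prob = c(0.35, 0.30, 0.20, 0.10, 0.05))   # toy discretised counts
    ## like the EM and Levenberg-Marquardt routes, this fit is sensitive to the
    ## starting values, here drawn at random
    fit <- optim(rnorm(2 * 2 + 2 * 5, sd = 0.1), hmm_negloglik, y = y,
                 method = "BFGS", control = list(maxit = 500))
    round(t(apply(matrix(fit$par[-(1:4)], 2, 5), 1, softmax)), 3)  # fitted emission table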
Text Ranking based on time series and entropy
Hao Yu
University of Western Ontario
Text ranking seeks to order a collection of documents according to specified rules or criteria,
for example from the highest-quality document to the lowest-quality one. Different
ranking rules may lead to different ranking orders. In this talk, we will
show how a text can be digitized into a time series based on a selection
of keywords. Then we will use an improved dynamic time warping (DTW) method as well as entropy to
rank texts. Their performance will be demonstrated through a specific set
of Amazon question/answer data.
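As a rough illustration of the digitisation-plus-DTW idea, the sketch below (in R) turns each document into a per-sentence count series for a single keyword and ranks documents by their DTW distance to a reference document. The improved DTW variant and the entropy criterion of the talk are not reproduced, and the toy documents are made up.
    ## Illustrative R sketch: digitise documents into keyword-count series and
    ## rank them by DTW distance to a reference document.
    keyword_series <- function(text, keyword) {
      sentences <- unlist(strsplit(tolower(text), "[.!?]+"))
      vapply(sentences,
             function(s) lengths(regmatches(s, gregexpr(keyword, s))),
             numeric(1))                         # keyword count per sentence
    }
    dtw_dist <- function(a, b) {                 # classical dynamic-programming DTW
      n <- length(a); m <- length(b)
      D <- matrix(Inf, n + 1, m + 1); D[1, 1] <- 0
      for (i in 1:n) for (j in 1:m)
        D[i + 1, j + 1] <- abs(a[i] - b[j]) + min(D[i, j + 1], D[i + 1, j], D[i, j])
      D[n + 1, m + 1]
    }
    docs <- c(ref     = "The battery lasts long. Battery life is great. Charging is fast.",
              answer1 = "Battery is fine. The screen is sharp. I like the battery life.",
              answer2 = "Shipping was slow. The box was damaged.")
    series <- lapply(docs, keyword_series, keyword = "battery")
    sort(vapply(series[-1], dtw_dist, numeric(1), b = series[["ref"]]))  # smaller = closer to ref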