Statistical modelling of heavy-tailed stock returns
Taehan Bae
University of Regina
This study aims at predictive modelling of heavy-tailed stock returns under a mixture model setting. A
robust regression method is used to fit the main body, while the peaks-over-threshold method is employed to model
the tails. For the estimation of the tail parts, Bayesian maximum a posteriori (MAP) estimation with conjugate priors
is used to smooth the maximum likelihood estimates (MLEs). This filter-tuning process provides stability and
efficiency in computation and prediction. Several constrained, non-convex optimization problems have been
converted to unconstrained, convex problems by quadratic approximation and changes of variables. The approach is
applied to a large, multi-period, unbalanced data set of daily returns of global stocks, containing nearly 100,000
records. Out-of-sample prediction results show that the smoothed estimates outperform the MLEs.
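As a rough illustration of the tail-modelling step, the sketch below (in R) fits a generalized Pareto distribution to exceedances over a high threshold by maximum likelihood and then by a penalized, MAP-style fit. The Gaussian penalty on the shape parameter merely stands in for the conjugate priors of the paper, and the threshold and prior settings are illustrative choices, not those used in the study.
    ## Illustrative R sketch: peaks-over-threshold tail fit by MLE and by a
    ## penalized (MAP-style) fit; the penalty is NOT the paper's conjugate prior.
    gpd_nll <- function(par, y) {
      xi    <- par[1]
      sigma <- exp(par[2])            # change of variable keeps sigma > 0 (unconstrained)
      if (abs(xi) < 1e-6)             # exponential limit as xi -> 0
        return(length(y) * log(sigma) + sum(y) / sigma)
      z <- 1 + xi * y / sigma
      if (any(z <= 0)) return(1e10)   # outside the GPD support
      length(y) * log(sigma) + (1 / xi + 1) * sum(log(z))
    }
    gpd_map_nll <- function(par, y, xi0 = 0.1, tau = 25)   # Gaussian penalty on xi
      gpd_nll(par, y) + 0.5 * tau * (par[1] - xi0)^2

    set.seed(1)
    returns <- rt(5000, df = 4)                # toy heavy-tailed "returns"
    u       <- quantile(returns, 0.95)         # illustrative threshold
    exc     <- returns[returns > u] - u        # exceedances over the threshold
    mle <- optim(c(0.1, log(sd(exc))), gpd_nll,     y = exc)$par
    map <- optim(c(0.1, log(sd(exc))), gpd_map_nll, y = exc)$par
    c(xi_mle = mle[1], xi_map = map[1])        # the MAP-style fit shrinks xi toward xi0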
I am indebted to Reg Kulperger for many of my problems
John Braun
University of British Columbia
In this talk, I will start with a brief history of my experience of
Reg as supervisor, colleague and collaborator. I will then survey
the breadth of problems that Reg and I worked on over the years, spanning
cluster point processes, branching processes, interacting particle systems,
bootstrapping, Fourier analysis, and density and intensity estimation.
Modelling rainfall over the catchment of the Blue Nile
A.H. El-Shaarawi
Cairo University, Department of Statistics, Cairo, Egypt
National Water Research Institute, Burlington, ON, Canada, L7R 4A6
The lives of millions of people in Egypt, Sudan and Ethiopia depend on the Blue Nile's water for agriculture, industry
and domestic use. The downstream nations, Egypt and Sudan, are concerned about the impact of constructing the
Grand Ethiopian Renaissance Dam (GERD) across the Blue Nile River in Ethiopia on their traditional water share.
The objective of this paper is to analyze historical rainfall data collected from more than 100
sampling stations within the River's watershed on the Ethiopian Plateau. The intention is to develop a model for
predicting the changes in the discharge to the Nile from the Blue Nile as a result of the changes in yearly
rainfall yield. The importance of this prediction is related to the Ethiopian management of filling the huge GERD
reservoir whose capacity equals more than a year's flow of the Blue Nile. The model takes into account the effects
of El Niño and La Niña events, where the former is associated with lower rainfall and the latter with
higher rainfall.
Decomposing Variability in Environmental Monitoring Data
Sylvia Esterby
University of British Columbia
Temporal records of environmental variables often display periodicity or oscillations. When modelling
such data sets for purposes of detecting temporal trends or attributing components of variability to
other factors, various methods are used to account for periodicity or oscillations. For river monitoring
data collected at a monthly frequency, methods which block on season or methods which explicitly model
the form of seasonal change as a time-varying function could be used. A consistent theme is the
decomposition of the variability. A number of cases will be reviewed. Examples of variables include
water quality and quantity, air pollutants, and tree rings. Methods include modelling oscillations with
Fourier series, wavelets and LOWESS and, in cases where monitoring locations are grouped as similar,
random forest models and functional data clustering.
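One simple version of the ideas above is a harmonic (Fourier) regression for a monthly series, which splits the variability into trend, seasonal and residual components. The sketch below (in R) uses a simulated series and two harmonics purely for illustration; it is not taken from any of the reviewed cases.
    ## Illustrative R sketch: harmonic (Fourier) regression for a monthly series.
    set.seed(4)
    month <- rep(1:12, times = 20)                     # 20 years of monthly data
    t_idx <- seq_along(month)
    y     <- 5 + 0.002 * t_idx +                       # slow trend
             2 * sin(2 * pi * month / 12) +            # seasonal cycle
             rnorm(length(t_idx), sd = 0.5)
    fit <- lm(y ~ t_idx +
                sin(2 * pi * month / 12) + cos(2 * pi * month / 12) +
                sin(4 * pi * month / 12) + cos(4 * pi * month / 12))
    anova(fit)   # sequential sums of squares: trend vs. seasonal (Fourier) terms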
TBA
Peter Guttorp (Thursday PM - Remote)
University of Washington
Bootstrapping the empirical distribution of a stationary process with change-point
Gail Ivanoff (Thursday PM)
University of Ottawa
When detecting a change-point in the marginal distribution of a stationary time series, bootstrap
techniques are required to determine critical values for the tests when the pre-change distribution is unknown.
In this talk, we propose a sequential moving block bootstrap and demonstrate the asymptotic behaviour of the
bootstrapped empirical process under both converging and non-converging alternatives. We avoid any assumptions of
mixing, association or near epoch dependence. These results are applied to a linear process with heavy-tailed
innovations.
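The sketch below (in R) gives a minimal, non-sequential version of the idea: a moving block bootstrap resamples the series in blocks to preserve short-range dependence, and a simple CUSUM-type statistic built from the empirical distribution function is calibrated against the bootstrap replicates. The block length and the statistic are illustrative; the sequential scheme proposed in the talk is more refined.
    ## Illustrative R sketch: moving block bootstrap calibration of a CUSUM-type
    ## change-point statistic based on the empirical distribution function.
    mbb_resample <- function(x, block_len) {
      n      <- length(x)
      starts <- sample(seq_len(n - block_len + 1),
                       size = ceiling(n / block_len), replace = TRUE)
      out    <- unlist(lapply(starts, function(s) x[s:(s + block_len - 1)]))
      out[seq_len(n)]                            # trim to the original length
    }
    cusum_ecdf_stat <- function(x, grid = quantile(x, 1:19 / 20)) {
      n    <- length(x)
      stat <- 0
      for (k in seq(10, n - 10)) {               # candidate change points
        d    <- max(abs(ecdf(x[1:k])(grid) - ecdf(x[(k + 1):n])(grid)))
        stat <- max(stat, sqrt(k * (n - k) / n) * d)
      }
      stat
    }
    set.seed(2)
    x    <- arima.sim(list(ar = 0.5), n = 300)   # toy stationary series, no change
    obs  <- cusum_ecdf_stat(x)
    boot <- replicate(200, cusum_ecdf_stat(mbb_resample(x, block_len = 15)))
    mean(boot >= obs)                            # bootstrap p-value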
Coupling and weak convergence
Richard Lockhart
Simon Fraser University
I will chat about a strategy Reg, Peter Guttorp, and I used for proving
weak limit theory for one process by coupling it to another easier one.
I expect to define the terms, talk about Reg, and leave you guessing
about what we actually did.
On the relationship between data sharpening and Firth's adjusted score function
James Stafford
University of Toronto
In a series of seminal papers [Choi and Hall (1999); Choi et al. (2000);
Doosti and Hall (2016)] data sharpening is shown to reduce the bias of non-
parametric estimators for regression and density estimation. In this paper we
give a common framework for these techniques and show that they can be derived
directly from an adjusted score function of the type given by Firth (1993). When
interest lies in derivative estimation for non-parametric regression we
show this approach leads to a new data sharpening technique.
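For readers unfamiliar with Firth's adjustment, the sketch below (in R) shows it in its best-known parametric setting, logistic regression, where the adjusted score corresponds to penalizing the log-likelihood by half the log-determinant of the Fisher information (the Jeffreys prior). The nonparametric data-sharpening analogue discussed in the abstract is not reproduced here.
    ## Illustrative R sketch: Firth's (1993) adjusted score for logistic regression,
    ## implemented as a Jeffreys-prior penalty on the log-likelihood.
    firth_logistic <- function(X, y) {
      penalised_nll <- function(beta) {
        p   <- plogis(drop(X %*% beta))
        W   <- p * (1 - p)
        ll  <- sum(y * log(p) + (1 - y) * log(1 - p))
        pen <- 0.5 * as.numeric(determinant(crossprod(X * W, X),
                                            logarithm = TRUE)$modulus)
        -(ll + pen)                   # the penalty removes the O(1/n) bias of the MLE
      }
      optim(rep(0, ncol(X)), penalised_nll)$par
    }
    set.seed(5)
    x <- rnorm(40)
    X <- cbind(1, x)
    y <- rbinom(40, 1, plogis(-1 + 2 * x))
    rbind(mle   = coef(glm(y ~ x, family = binomial)),
          firth = firth_logistic(X, y))   # Firth estimates are pulled toward zero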
Calculating wait time 1 from billing data
David Stanford
University of Western Ontario
The Segment 1 wait time (“Wait Time 1”), defined as the time between the
date of referral and the date of the initial specialist consultation, is
an important component of total patient wait time. The Saskatchewan
Ministry of Health wanted to explore the viability of a method to
measure Wait Time 1 based upon billing data. This study considers three
different approaches to identify Wait Time 1 from administrative billing
data. The prime advantage in using such billing data is that it is
readily available. The main disadvantage is that the actual referral
date is not generally indicated in the general physician (GP)’s billing
data and the patient may have seen the GP multiple times prior to the
specialist visit, so that statistical methods are needed to ascertain
which was the referring event. To address the difficulty of identifying
the referral date, the Ministry introduced the “55B” GP billing code in
April 2012. The new billing code is identical to the “5B” code used
by GPs for partial assessments, but additionally indicates, among multiple visits,
which GP visit resulted in a referral to a specialist. This study also
examines whether the current sample size of 55Bs is sufficient to
accurately measure Wait Time 1.
Treatment regimes and social networks
Mary Thompson
University of Waterloo
Medical treatments are usually administered to the individual for whom the outcome is defined. In
contrast, preventive health or behavioural interventions are often administered taking network structure into
account. The evaluation of these latter types of treatment regimes aims to optimize the expected outcomes
for the network, subject to cost constraints. We consider examples of evaluation frameworks inspired by some of
the literature on causal inference.
Hidden Markov models, blood counts, psychosis, probability
tables and the Levenberg-Marquardt algorithm.
Rolf Turner
University of Auckland
It is conjectured that there is a relationship between
counts of blood cells (monocytes) and severity of psychosis in
an individual. Data are available in the form of observations
made on a fairly large number (1258) of individuals. The psychosis
observations consist of (subjective) assessments made by physicians,
of severity on a 0 to 4 scale (0 = no symptoms, ..., 4 = severe
psychosis). The cell counts are of course intrinsically discrete,
but with a large range, so they have been further discretised on
a 1 to 5 scale in a somewhat ad hoc fashion.
There is no appropriate parametric distribution for such discrete
data, so the emission probabilities in a hidden Markov model to
be fitted to the data must be specified via tables (one table for
each state of the underlying Markov chain). These "non-parametric"
models are conceptually simple but entail the estimation of awkwardly
large numbers of parameters (!!!). I have approached the analysis in
two ways: (1) fitting bivariate models (and testing for dependence
of the two components); (2) fitting models to the monocyte counts
and then using the fitted values from these models as predictors
in models fitted to the psychosis ratings. Both approaches
required substantial development of both methodology and software.
In an effort to improve the performance of the model fitting
procedure I adapted, to the current context, an implementation
of (a version of) the Levenberg-Marquardt algorithm that I had
previously developed to handle settings in which the emissions are
Poisson distributed. This provides a new option in addition to the
standard EM algorithm procedure and "brute force" methods whereby
the likelihood is maximised via either nlm() or optim().
Major difficulties include the sensitivity of all of the fitting
methods to the starting values used, and the vexing problem of
choosing the appropriate number of states in the underlying hidden
Markov chain. In this talk I will discuss these difficulties,
describe some of the challenges in working out the details for
the new implementation of the Levenberg-Marquardt algorithm, and
present some of the results of the data analysis.
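The sketch below (in R) illustrates the "brute force" route mentioned above: a two-state hidden Markov model with a tabular (categorical) emission distribution, whose log-likelihood is computed by the scaled forward algorithm and maximised directly with optim(). The toy data, the number of states and the softmax parameterisation are illustrative choices, not those of the actual analysis.
    ## Illustrative R sketch: HMM with tabular emissions on categories 1..5,
    ## fitted by direct maximisation of the forward-algorithm log-likelihood.
    softmax <- function(v) { e <- exp(v - max(v)); e / sum(e) }
    hmm_negloglik <- function(par, y, K = 2, M = 5) {
      A <- t(apply(matrix(par[1:(K * K)], K, K), 1, softmax))     # transition matrix
      B <- t(apply(matrix(par[-(1:(K * K))], K, M), 1, softmax))  # emission table
      alpha <- rep(1 / K, K) * B[, y[1]]                          # uniform initial law
      ll    <- log(sum(alpha)); alpha <- alpha / sum(alpha)       # scaled forward pass
      for (tt in 2:length(y)) {
        alpha <- as.vector(alpha %*% A) * B[, y[tt]]
        ll    <- ll + log(sum(alpha)); alpha <- alpha / sum(alpha)
      }
      -ll
    }
    set.seed(3)
    y   <- sample(1:5, 400, replace = TRUE,
                  prob = c(0.35, 0.30, 0.20, 0.10, 0.05))   # toy discretised counts
    ## like the EM and Levenberg-Marquardt routes, this fit is sensitive to the
    ## starting values, here drawn at random
    fit <- optim(rnorm(2 * 2 + 2 * 5, sd = 0.1), hmm_negloglik, y = y,
                 method = "BFGS", control = list(maxit = 500))
    round(t(apply(matrix(fit$par[-(1:4)], 2, 5), 1, softmax)), 3)  # fitted emission table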
Text Ranking based on time series and entropy
Hao Yu
University of Western Ontario
Text ranking seeks to order a collection of documents according to specified rules or criteria,
for example from the highest-quality document to the lowest-quality one. Different
ranking rules may lead to different ranking orders. In this talk, we will
show how a text can be digitized into a time series based on a selection
of keywords. Then we will use an improved dynamic time warping (DTW) method as well as entropy to
rank texts. Their performance will be demonstrated through a specific set
of Amazon question/answer data.
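As a rough illustration of the digitisation-plus-DTW idea, the sketch below (in R) turns each document into a per-sentence count series for a single keyword and ranks documents by their DTW distance to a reference document. The improved DTW variant and the entropy criterion of the talk are not reproduced, and the toy documents are made up.
    ## Illustrative R sketch: digitise documents into keyword-count series and
    ## rank them by DTW distance to a reference document.
    keyword_series <- function(text, keyword) {
      sentences <- unlist(strsplit(tolower(text), "[.!?]+"))
      vapply(sentences,
             function(s) lengths(regmatches(s, gregexpr(keyword, s))),
             numeric(1))                         # keyword count per sentence
    }
    dtw_dist <- function(a, b) {                 # classical dynamic-programming DTW
      n <- length(a); m <- length(b)
      D <- matrix(Inf, n + 1, m + 1); D[1, 1] <- 0
      for (i in 1:n) for (j in 1:m)
        D[i + 1, j + 1] <- abs(a[i] - b[j]) + min(D[i, j + 1], D[i + 1, j], D[i, j])
      D[n + 1, m + 1]
    }
    docs <- c(ref     = "The battery lasts long. Battery life is great. Charging is fast.",
              answer1 = "Battery is fine. The screen is sharp. I like the battery life.",
              answer2 = "Shipping was slow. The box was damaged.")
    series <- lapply(docs, keyword_series, keyword = "battery")
    sort(vapply(series[-1], dtw_dist, numeric(1), b = series[["ref"]]))  # smaller = closer to ref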