## Semester 2

**9 Aug: Dr Katherine Uylangco, The University of Newcastle**
*Exogenous factors and the incidence of criminality*

(From a joint article with Dr Paul Docherty) An individual's propensity to commit crime can be considered a trade-off between the expected benefits and costs of criminal activity (Becker, 1968). We examine this trade-off through an empirical analysis of whether variables representing the business cycle, law enforcement and demographics explain variance in the incidence of property crimes and crimes against the person. The impact of weather on crime rates is also examined, as prior studies have suggested a positive relationship between temperature and violent crime (Field, 1992). A comprehensive model that includes all of these exogenous variables is used to explain variation in changes in property crimes and crimes against the person in an Australian context. We also extend the existing literature by developing a forecast model that performs well out-of-sample. The success of this model has important policy implications, as it allows law enforcement agencies to predict and prepare for future crime rates.
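The kind of out-of-sample forecast evaluation the abstract describes can be sketched as follows. The predictors, coefficients and expanding-window scheme below are illustrative assumptions, not the authors' data or model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative monthly data: three exogenous predictors and a simulated
# crime-rate change series (NOT the authors' data).
n = 120
X = np.column_stack([
    np.ones(n),          # intercept
    rng.normal(size=n),  # business-cycle proxy (e.g. unemployment change)
    rng.normal(size=n),  # law-enforcement proxy (e.g. arrest rate)
    rng.normal(size=n),  # weather proxy (e.g. temperature anomaly)
])
beta_true = np.array([0.1, 0.5, -0.3, 0.4])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Expanding-window out-of-sample evaluation: fit OLS on observations
# before t, then forecast observation t.
start = 60
preds, actuals = [], []
for t in range(start, n):
    beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
    preds.append(X[t] @ beta)
    actuals.append(y[t])

preds, actuals = np.array(preds), np.array(actuals)
rmse = np.sqrt(np.mean((preds - actuals) ** 2))
print(f"out-of-sample RMSE: {rmse:.3f}")
```

A model that "performs well out-of-sample" is one whose forecast RMSE stays close to the irreducible noise level of the series.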

**11 Oct: Dr Patrick McElduff, Hunter Medical Research Institute**
*Causal diagrams mediate variable selection in regression models*

Researchers are often confronted with a situation in which they have a well-defined outcome and a large number of potential predictor variables. It is tempting to throw all the predictor variables into a regression model, or to use stepwise procedures to select the most parsimonious model. As well as having low coverage probabilities, this approach can produce spurious results. In this talk I discuss the use of causal diagrams (or Directed Acyclic Graphs) as a tool to help researchers fit more appropriate regression models. The talk will cover issues such as: adjusting for confounding; mediation analysis; proper adjustment in matched case-control studies; the potential for hidden confounding in surveys and cohort studies; and instrumental variables analysis.
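The confounding-adjustment point can be illustrated with a minimal simulation. The toy DAG here (Z causes both X and Y, X causes Y) and all parameter values are assumptions for illustration, not an example from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Toy DAG: Z -> X, Z -> Y, X -> Y.  Z confounds the X -> Y effect.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
Y = 0.5 * X + 0.7 * Z + rng.normal(size=n)

def ols(columns, y):
    """Least-squares coefficients for a design matrix with an intercept."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Unadjusted model omits Z, so the X coefficient is biased away from 0.5.
b_unadj = ols([X], Y)[1]
# Adjusting for the confounder Z, as the DAG prescribes, recovers ~0.5.
b_adj = ols([X, Z], Y)[1]

print(f"unadjusted: {b_unadj:.2f}, adjusted: {b_adj:.2f}")
```

The DAG tells us which variables must be adjusted for (confounders like Z) and, in mediation or collider settings, which must not, which is exactly what blind stepwise selection cannot distinguish.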

**25 Oct: Prof John Rayner, The University of Newcastle**
*Smooth Tests of Fit for Gaussian Mixtures*

(Thomas Seusse, John Rayner & Olivier Thas) Smooth tests were developed to test for a finite mixture distribution using two smooth models, each with its own strengths and weaknesses. The tests are demonstrated by testing for a mixture of two normal distributions. Some sizes and powers are given, as is an example.
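As a companion sketch, here is a standard EM fit of the two-component normal mixture that forms the alternative hypothesis in such a test. The smooth test statistics themselves are not reproduced here, and the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated sample from a two-component normal mixture (weights 0.3 / 0.7).
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 700)])

# EM algorithm for a two-component normal mixture.
pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior responsibility of component 0 for each point.
    d0 = np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
    d1 = np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
    r = pi * d0 / (pi * d0 + (1 - pi) * d1)
    # M-step: update the weight, means and standard deviations.
    pi = r.mean()
    mu = np.array([np.sum(r * x) / r.sum(),
                   np.sum((1 - r) * x) / (1 - r).sum()])
    sigma = np.array([np.sqrt(np.sum(r * (x - mu[0]) ** 2) / r.sum()),
                      np.sqrt(np.sum((1 - r) * (x - mu[1]) ** 2) / (1 - r).sum())])

print(f"weights: {pi:.2f}/{1 - pi:.2f}, means: {mu.round(2)}")
```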

**25 Oct: Prof John Rayner, The University of Newcastle**
*Extended analysis of partially ordered multi-factor designs*

(John Rayner, John Best & Olivier Thas) For multifactor experimental designs in which the levels of at least one factor are ordered, we demonstrate the use of components that provide a deep nonparametric scrutiny of the data. The components assess generalized correlations, and the resulting tests include and extend the Page and umbrella tests.

**29 Oct: Shanjeeda Shafi, The University of Newcastle**
*Mathematical Methods for Molecular Drug Discovery and Biomarker Identification: Towards New Paradigms of Drug Efficacy and Safety*

High-throughput docking of small-molecule ligands (candidate drugs) into high-resolution protein structures is now standard in computational approaches to drug discovery. A candidate drug is usually a small molecule (~50 atoms) that acts by modifying the metabolic activity of a protein. By current estimates, it costs more than $1.3 billion and takes 12-15 years to bring a new drug to market. Predicting druggability and prioritising disease-modifying targets for the drug development process is of high practical relevance in pharmaceutical research. Druggability of a molecule is characterised in part by its satisfying Lipinski's Rule-of-Five, which identifies several key physicochemical properties, such as mass and hydrophobicity, that should be considered for compounds with oral delivery (Lipinski and Hopkins, 2004), and by other characteristics that determine whether a chemical compound can be orally active in humans. There is still much debate in the industry as to what constitutes a 'good' hit, that is, one that will remain Rule-of-Five compliant after optimization. This is very difficult to assess with hits derived from high-throughput screening, as a significant fraction of the molecules may be interacting sub-optimally with the receptor. There is still no consensus on what constitutes an acceptable lead in terms of potency, molecular mass and hydrophobicity. Another challenge is to identify regions of chemical space that contain biologically active compounds for given biological targets (e.g. proteases, kinases, calpains). Lipinski and Hopkins (2004) suggested that within the continuum of chemical space there should be discrete regions occupied by compounds with specific affinities towards particular biological targets. The question of which variables (or coordinate systems) would facilitate such segregation, however, was not delineated in their seminal paper and remains open.
The main aim of this research is to create superior indices for estimating the true binding affinity of ligands and for distinguishing high-affinity binders from non-essential binders of proteins. New cut points will be considered to create a violation scoring function for each predictor, using different approaches such as Bayesian mixtures (BayesMix), a non-Bayesian hybrid method based on model-based clustering with discriminant analysis, and cut-point methods. Recently, an alternative score for violations based on Lipinski's four variables, but using different cut points (Hudson et al., 2012), categorized molecules as druggable if they had four or fewer violations; our results to date suggest an improved cut point of five. We shall develop new mixture-based models (nonlinear and linear, Bayesian and non-Bayesian) to identify poor and good drug candidates, and these will form the basis of constructs for visualizing binders in chemospace. Visualization methods such as PCA and correspondence analysis show promise; hybrids of SOMs with HMMs, and mixtures with Dirichlet priors, are yet to be investigated. This research has the potential to significantly reduce the false classification of drugs and thereby improve drug design, where an appropriate predictor set needs to be identified for new drug innovations.
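The Rule-of-Five scoring that underlies these cut-point methods can be sketched with a minimal violation counter. The thresholds below are Lipinski's published criteria, but the example molecules' property values and the druggability cut point are illustrative; the alternative scoring functions developed in this research are not reproduced here:

```python
def ro5_violations(mol):
    """Count Lipinski Rule-of-Five violations for a dict of properties."""
    rules = [
        mol["mw"] > 500,   # molecular weight over 500 Da
        mol["logp"] > 5,   # octanol-water partition coefficient over 5
        mol["hbd"] > 5,    # more than 5 hydrogen-bond donors
        mol["hba"] > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(rules)

def druggable(mol, max_violations=1):
    """Classify as druggable if violations do not exceed the cut point.

    Lipinski's original heuristic allows at most one violation; the research
    above investigates alternative cut points for such violation scores.
    """
    return ro5_violations(mol) <= max_violations

# Illustrative property values (not measured data).
aspirin_like = {"mw": 180.2, "logp": 1.2, "hbd": 1, "hba": 4}
large_peptide = {"mw": 1203.0, "logp": 6.3, "hbd": 8, "hba": 15}
print(druggable(aspirin_like), druggable(large_peptide))
```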

**22 Nov: James Totterdell, The University of Newcastle**
*Bayesian Hidden Markov Model for Homogeneous Segmentation of Heterogeneous DNA Sequences*

The advent of rapid DNA sequencing techniques has led to an exponential increase in the quantity of nucleotide sequences in online databases such as GenBank. In the face of such large data sets, statistical methods provide an efficient means to screen for and identify structure within DNA sequences. Perhaps the most fundamental property of a DNA sequence is its base composition, which is known not to be uniform. Models with a homogeneous probabilistic structure do not adequately describe variation in base composition; fluctuations are better explained by alternating homogeneous domains, called segments. Segmentation models aim to partition compositionally heterogeneous domains into homogeneous segments, which may be reflective of biological function. A number of segmentation models have been proposed, such as moving-window, maximum-likelihood and recursive segmentation approaches. Given the latent nature of the segments, a natural approach uses hidden Markov models (HMMs). An HMM consists of an observed stochastic process whose distribution is influenced by an underlying unobserved Markov chain. This thesis investigated the use of HMMs in DNA segmentation, with parameter estimation performed using Bayesian methods. The model was derived, algorithms for estimation were presented, and functions implementing these algorithms in R were assessed through a simulation study. The functions were then used to estimate the model on real DNA sequences, and the results were compared with past efforts in this area.
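A minimal sketch of HMM-based segmentation: Viterbi decoding of a two-state (AT-rich vs GC-rich) model on a toy sequence. The transition and emission probabilities here are assumed for illustration rather than estimated, Bayesian or otherwise:

```python
import numpy as np

def viterbi(obs, log_init, log_trans, log_emit):
    """Most probable hidden-state path for an HMM (log-space Viterbi)."""
    n, k = len(obs), log_init.shape[0]
    delta = np.zeros((n, k))           # best log-score ending in each state
    psi = np.zeros((n, k), dtype=int)  # backpointers
    delta[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + log_trans  # [i, j]: from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]
    path = np.zeros(n, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path

# Toy sequence: an AT-rich segment, a GC-rich segment, then AT-rich again.
seq = "ATATTAATAT" * 5 + "GCGGCCGCGC" * 5 + "ATTATATAAT" * 5
obs = np.array(["ACGT".index(b) for b in seq])

init = np.log(np.array([0.5, 0.5]))
trans = np.log(np.array([[0.99, 0.01],              # segments persist:
                         [0.01, 0.99]]))            # switching is rare
emit = np.log(np.array([[0.4, 0.1, 0.1, 0.4],       # state 0: AT-rich
                        [0.1, 0.4, 0.4, 0.1]]))     # state 1: GC-rich
path = viterbi(obs, init, trans, emit)
print("".join(map(str, path)))
```

The decoded path recovers the three homogeneous segments; in the thesis the model parameters are themselves estimated by Bayesian methods rather than fixed in advance.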