The goal of this study is to overcome the methodological limitations identified in prior studies aimed at predicting the type of auditor opinion, and to draw definite conclusions about the relative predictive performance of different methods for this task. The predictive performance of twelve candidate models from the realms of statistics and machine learning is assessed separately for two common real-life scenarios: a) when prior information on the client (i.e. the types of audit opinion it received in the past) is available and can be used in prediction, and b) when such information is not available (e.g. for new companies). The results show that, in the first scenario, several methods from both realms achieve comparable predictive performance of around 0.89, as measured by the area under the receiver operating characteristic curve (AUC). In the second scenario, however, machine learning algorithms, particularly tree-based ones such as random forest, perform significantly better, achieving an AUC of up to 0.79. Finally, we develop two hybrid models that aim to combine the strengths of the statistical approach (i.e. interpretability of results) and the machine learning approach (i.e. handling a large number of predictors and improved accuracy), and we assess their predictive performance. The complete procedure is demonstrated in a reproducible manner on the largest empirical data set used in this stream of research to date, comprising 13,561 pairs of annual financial statements and the corresponding audit reports. The procedures described in this study allow audit and finance professionals around the globe to develop and test predictive models that support audit planning and risk assessment.
Keywords: ensembles; guided regularized random forest; random forest; generalized linear mixed models; financial reports; auditor opinion