Principal Component Analysis in Financial Data Science

Published in: Principal Component Analysis

Publisher: IntechOpen

DOI: http://dx.doi.org/10.5772/intechopen.102928

Pages: 1-25

Link: https://www.intechopen.com/online-first/80983

Abstract:
Numerous methods exist for examining patterns in structured and unstructured financial data. Their applications include fraud detection, risk management, credit allocation, assessment of default risk, customer analytics, trading prediction, and many others, and together they form a broad field of research known as financial data science. One problem in this field that remains significantly under-researched, yet very important, is distinguishing between the three major types of business activity (merchandising, manufacturing, and service) on the basis of the structured data available in financial reports. A reliable method for fuzzy classification of business entities into these three categories would assist numerous other methods in achieving their tasks. It can be argued that, owing to the inherent idiosyncrasies of the three types of business activity, methods for assessing default risk, allocating credit, and detecting fraud would all perform better if reliable information were available on the percentage of each entity's business allocated to each of the three activities. To this end, in this paper we propose a clustering procedure that relies on Principal Component Analysis (PCA) for dimensionality reduction and feature selection. The procedure is presented using a large empirical data set comprising complete financial reports of various business entities operating in the Republic of Serbia for the 2019 reporting period.
Keywords: data science, principal component analysis, random forest algorithm, financial data, financial reporting