Credit default prediction from user-generated text based on deep learning
Abstract: Digital technologies produce vast amounts of unstructured data that traditional banks and fintech companies can store and access. The existing literature indicates that parts of this unstructured data can be valuable for decisions on the acceptance and pricing of credit contracts. Both practitioners and academics are interested in understanding the value of this information and how to exploit it for credit risk predictions. We employ deep learning techniques to extract credit-relevant information from user-generated text on the peer-to-peer platform Lending Club. Our results confirm that even short pieces of user-generated text can improve credit default predictions significantly and generate substantial additional profit for lenders. We benchmark four deep neural network architectures against more traditional approaches (machine learning and rule-based text characteristics) for retrieving credit-relevant information from text. Average embedding neural networks, convolutional neural networks, and recurrent neural networks achieve similar prediction quality, and all three outperform convolutional recurrent neural networks. Deep learning models achieve better results than traditional approaches in almost all cases; among the traditional approaches, spelling mistakes are particularly informative.
Short Bio: Johannes Kriebel is an assistant professor at the Finance Center Münster. His work focuses on topics related to credit risk and machine learning as well as digital transformation in financial service providers.