Design and Prototypical Implementation of a Large-Language-Model-Based Solution for Automating Invoice Processing in ECM Systems

  1. Initial Situation

In many organizations, incoming invoices are processed within Document Management Systems (DMS) or Enterprise Content Management (ECM) solutions. Traditionally, these systems rely on position recognition and template-based extraction to capture relevant invoice data. Although widely used, these approaches lack flexibility when invoice layouts vary and require extensive maintenance. This problem is especially pronounced in large organizations that receive large quantities of invoices from many different partners every day.

Recent advances in Artificial Intelligence, particularly through the development of Large Language Models (LLMs), have introduced new possibilities for natural language understanding and data extraction from semi-structured documents. Unlike rule-based systems, LLMs can interpret varying invoice formats more flexibly and may increase automation rates while reducing manual intervention.

The company hosting this thesis project intends to explore these opportunities by developing pilot software and, depending on its progress, testing it in a customer-facing [MM1] [LO2] project. The goal is to conceptualize, implement, and evaluate an LLM-based approach to invoice processing within ECM systems, and to compare it to existing solutions based on position recognition.

 

 

  2. Research Goal/Research Questions

As stated above, the general goal of this thesis is to design and implement a prototype of an LLM-based invoice processing solution in the context of an ECM system, evaluate its initial performance, and assess its potential benefits and challenges compared to traditional position recognition methods.

Possible research questions:

  1. How can an LLM-based approach be integrated into the invoice processing workflow of an ECM system?
  2. What are the main architectural components required for such a solution?
  3. How does the flexibility of LLM-based extraction compare to traditional position recognition approaches when dealing with varying invoice layouts, and what initial recognition accuracy does the prototype achieve?
  4. How does the performance of LLM-based extraction compare to traditional position recognition approaches in terms of accuracy and flexibility?[LO3] (if possible) 
  5. What technical challenges arise when implementing an LLM-based invoice extraction prototype?
  6. What are the economic implications of replacing or complementing position recognition with LLM-based solutions, especially considering the configuration costs of each system?
    1. How does the initial configuration effort (time and cost) required by the LLM-based approach compare to traditional position recognition when preparing to process a representative quantity of existing invoice layouts?
    2. How does the ongoing configuration effort required to integrate a new invoice layout (from a new supplier or partner) differ between the LLM-based solution and the traditional position recognition approach?[LO4] 
  7. Maybe: To what extent can process quality (e.g., automation rate, error reduction, user perception) be improved by using LLMs?
  8. Which risks and limitations need to be considered when deploying LLMs in enterprise environments (e.g., data privacy, scalability, reliability)?
  9. How can the findings of this pilot project inform future research or practical developments in intelligent document processing?
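
One possible way to operationalize sub-questions 6.1 and 6.2 is a simple linear effort model: a one-time initial configuration effort plus a constant effort per newly integrated invoice layout. The sketch below is an illustrative assumption only and not a method fixed by this proposal; the function names, the unit (hours), and the linearity assumption are hypothetical and would need to be validated against the effort actually measured in the project.

```python
import math


def total_effort(initial_hours, hours_per_new_layout, new_layouts):
    """Cumulative configuration effort under an assumed linear model."""
    return initial_hours + hours_per_new_layout * new_layouts


def break_even_layouts(initial_a, per_layout_a, initial_b, per_layout_b):
    """Smallest number of new layouts at which approach B costs no more than approach A.

    A could stand for position recognition and B for the LLM-based approach.
    Returns None if B never catches up under the linear model.
    """
    if initial_b <= initial_a:
        # B is already no more expensive before any new layout is added.
        return 0
    saving_per_layout = per_layout_a - per_layout_b
    if saving_per_layout <= 0:
        # B starts more expensive and does not save effort per layout.
        return None
    return math.ceil((initial_b - initial_a) / saving_per_layout)
```

The inputs would come from the effort recorded during the initial setup (6.1) and during the integration of each additional layout (6.2).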

 

 

  3. Planned Method + Planned Structure

Planned method:

  • Literature review on ECM, invoice processing, position recognition, and LLM theory and applications in document processing.
  • Conceptual design and modelling of an LLM-based invoice extraction architecture.
    • (documentation/description of design/architecture using Subject-Oriented Modelling Means)
  • Prototypical implementation of the solution in collaboration with the company’s ECM system.
  • Maybe:
    • Empirical evaluation through test cases with real or anonymized invoice data.
    • Comparative analysis of test cases against the current position recognition approach.
    • Comparative analysis of configuration expenses against the current position recognition approach.
  • Discussion of process quality, technical feasibility, and economic implications.
  • Conclusion with recommendations and outlook.
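
For the empirical evaluation mentioned above, recognition accuracy could be measured per extracted field against manually verified reference values. The following is a minimal sketch under stated assumptions: the field names are hypothetical placeholders, values are compared after simple normalization (trimming and lowercasing), and the automation rate is assumed to mean the share of invoices for which every field was extracted correctly. The actual fields and metric definitions would be fixed during the conceptual design.

```python
# Hypothetical field names; the real set of invoice fields is not fixed in this proposal.
FIELDS = ("invoice_number", "invoice_date", "total_amount")


def normalize(value):
    """Trim and lowercase a value so trivial formatting differences do not count as errors."""
    return None if value is None else str(value).strip().lower()


def field_accuracy(predictions, ground_truth, fields=FIELDS):
    """Return the share of correctly extracted values for each field.

    predictions and ground_truth are lists of dicts, one dict per invoice, in the same order.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("at least one invoice is required")
    correct = {f: 0 for f in fields}
    for pred, truth in zip(predictions, ground_truth):
        for f in fields:
            if normalize(pred.get(f)) == normalize(truth.get(f)):
                correct[f] += 1
    return {f: correct[f] / len(ground_truth) for f in fields}


def automation_rate(predictions, ground_truth, fields=FIELDS):
    """Share of invoices where all fields match, i.e. no manual correction would be needed."""
    if not ground_truth:
        raise ValueError("at least one invoice is required")
    fully_correct = sum(
        all(normalize(p.get(f)) == normalize(t.get(f)) for f in fields)
        for p, t in zip(predictions, ground_truth)
    )
    return fully_correct / len(ground_truth)
```

The same functions could be applied to the output of the existing position recognition system, provided comparable reference data is available for it.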

Planned structure (preliminary):

  1. Introduction
  2. Theoretical background (ECM/DMS, invoice processing, position recognition, LLMs)
  3. Conceptual design of the pilot solution
  4. Prototypical implementation
  5. Evaluation of the prototype: Absolute accuracy and flexibility
  6. Comparative analysis: Cost of configuration and economic implications
  7. Conclusion and outlook

 [MM1]Customer-facing: in the sense of facing the customer that buys the invoice handling software, or customer-facing in the sense of the customer receiving invoices in general?

 [LO2]Customer-facing in the sense of facing the customer that buys the invoice handling software.

 [LO3]Point 3 was created as an alternative to point 4, as the 'legacy' systems do not hold the necessary data for such a comparison of accuracy. Therefore, only one point should be picked.

 [LO4]Does the point come across that this question aims to compare the configuration cost of both approaches, in the hope of decreasing the configuration workload with an LLM-based approach? Do sub-questions a) and b) provide more clarity in this regard?