Exploring the Potential of LLM Technology to Create Subject-Oriented Process Models from Natural Language Texts

Motivation

Working with, or as part of, modern socio-technical information systems increasingly requires an understanding of the processes they execute or support. One way to analyze, discuss, and design these processes is to create and use process models. Process models are usually created as part of business process management efforts or requirements elicitation activities for information systems design. Their creation often involves the stakeholders or process natives who are responsible for the work and possess intrinsic knowledge about the process. One part of the challenge of process modeling is communication and interaction with these stakeholders. Another is the correct incorporation of the acquired knowledge into process models. Traditionally, creating process models requires specialized knowledge of the semantic and syntactic rules of \textbf{business process modeling languages} as well as technical knowledge of \textbf{general graphical modeling software} like \textbf{MS Visio} or \textbf{specialized modeling environments} like Signavio or Camunda. Even when these are available, model creation is manual, time-consuming, and error-prone, and it requires modeling experts to mitigate these risks. However, experts are not always available, and even when they are, initial model creation in particular, that is, translating human-given information into formal structures, is time-consuming and tedious. Automation, or at least support, in this area could help professionals work faster and help beginners create semantically and syntactically correct models.

In this regard, Large Language Models (LLMs), as part of the current research field of Natural Language Processing (NLP), are an interesting technology with the potential to let human users describe their processes, needs, and requirements in \textbf{simple natural language text}. The hypothesis to be investigated here is that a system can be created that subsequently generates \textbf{structured formal process models} automatically from such text, thus reducing manual modeling effort. This rests on the assumption that systematic diagrams of complex process systems will still be necessary in the future for the aforementioned efforts.

 

Theoretical Background

Subject-Oriented Paradigm

Subject-orientation is an interesting, though not yet widely adopted, modeling paradigm for complex process systems. It is conceptually based on the fundamental elements of natural languages. At its core, it requires the strictly separate consideration of subjects (active entities), (data) objects, and verbs/activities. With the Parallel Activity Specification Schema (PASS), there is at least one specialized formal process modeling language for subject-oriented process descriptions. These simple concepts encourage a shift towards a decentralized perspective on processes, and because they mirror the structure of natural language, they could potentially simplify and support the generation of PASS models from natural language texts using LLM technology.
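To make the separation of subjects, objects, and activities more concrete, the following minimal Python sketch represents a subject-oriented process as plain data. It is an illustration only and not the PASS metamodel: the class names `Subject`, `Message`, and `ProcessModel`, as well as the customer/sales example, are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """A (data) object exchanged between two subjects (hypothetical type)."""
    sender: str
    receiver: str
    name: str


@dataclass
class Subject:
    """An active entity with its own list of activities (verbs)."""
    name: str
    activities: list[str] = field(default_factory=list)


@dataclass
class ProcessModel:
    """A process as a set of subjects plus the messages they exchange."""
    subjects: dict[str, Subject] = field(default_factory=dict)
    messages: list[Message] = field(default_factory=list)

    def add_subject(self, name: str, activities: list[str]) -> None:
        self.subjects[name] = Subject(name, list(activities))

    def add_message(self, sender: str, receiver: str, name: str) -> None:
        # Only subjects that are already part of the model may communicate.
        for party in (sender, receiver):
            if party not in self.subjects:
                raise ValueError(f"unknown subject: {party}")
        self.messages.append(Message(sender, receiver, name))


# Illustrative example: "The customer sends an order to sales."
model = ProcessModel()
model.add_subject("Customer", ["fill in order", "send order"])
model.add_subject("Sales", ["receive order", "check order"])
model.add_message("Customer", "Sales", "order")
```

Each subject keeps its own activities, and the only link between subjects is the explicit message. This reflects the decentralized perspective described above, in which no single global control flow spans all participants.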


Large Language Models

LLM is the abbreviation for "Large Language Model": an Artificial Intelligence (AI) model trained on massive amounts of text data to understand and, in particular, generate human-like natural language text. These models use techniques from machine learning, specifically deep learning with the transformer architecture, to perform tasks such as answering questions, summarizing text, writing code, and even generating structured data.

Research Objectives and Questions

The general goal is "Analyzing possibilities for interactive subject-oriented model creation using LLMs". The following are potential questions to be answered in this Master’s thesis:

 

  • Can LLMs be used to generate subject-oriented models?
  • How can existing LLMs be made to perform the specific task intended here? What are possible strategies to achieve that?
  • What are the requirements for such a system?
  • How can good prompts be engineered for the LLM for the intended task?
  • What will be the output format of the tool that automates subject-oriented model creation?
  • What will be the output format of an LLM that has been prompt-engineered for such a task?
  • Which implementation language/framework should be used, and what are the advantages and drawbacks of the different solutions?
  • Which LLM is to be used and why?
  • How do open-source and closed-source (proprietary) LLMs compare? Which option is better?
  • What will be the hosting options for the tool? Specifically, how do self-hosted and cloud-hosted (free hosting) solutions compare?
  • What could be a feasible architecture for a corresponding system?
  • Is it feasible to prompt-engineer an LLM to do the intended task? How good can the corresponding results be?
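To illustrate what the questions on prompt engineering and output format refer to, the following Python sketch shows one conceivable shape of such a pipeline: a prompt template that asks for JSON, and a parser that checks the reply. It does not anticipate the answers to the questions above. The prompt wording, the JSON keys, and the function names `build_prompt` and `parse_model_output` are assumptions made for this example, and no LLM is called; the reply is a hand-written stand-in.

```python
import json

# Hypothetical prompt template; the wording and the JSON keys are assumptions.
PROMPT_TEMPLATE = (
    "Extract the subjects and the messages they exchange from the process "
    "description below. Answer with JSON only, using the keys "
    '"subjects" (list of names) and "messages" '
    '(list of objects with "sender", "receiver", "name").\n\n'
    "Process description:\n{description}"
)


def build_prompt(description: str) -> str:
    """Embed a natural language process description in the template."""
    return PROMPT_TEMPLATE.format(description=description.strip())


def parse_model_output(raw: str) -> dict:
    """Parse a JSON reply and check that messages only use known subjects."""
    data = json.loads(raw)  # raises a ValueError subclass on invalid JSON
    subjects = data["subjects"]
    messages = data["messages"]
    known = set(subjects)
    for message in messages:
        for role in ("sender", "receiver"):
            if message[role] not in known:
                raise ValueError(f"unknown {role}: {message[role]}")
    return {"subjects": subjects, "messages": messages}


prompt = build_prompt("The customer sends an order to sales.")

# Hand-written stand-in for a reply; not produced by an actual model.
reply = (
    '{"subjects": ["Customer", "Sales"], '
    '"messages": [{"sender": "Customer", "receiver": "Sales", "name": "order"}]}'
)
parsed = parse_model_output(reply)
```

Validating the reply before using it matters because the text an LLM generates is not guaranteed to be well-formed or internally consistent. Which output format and which checks are actually suitable is part of what the thesis is meant to investigate.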