5 June 2024
Dr. Nikola Milanovic
Chief Technology Officer (OPTIMAL SYSTEMS)
Modern artificial intelligence (and machine learning in particular) thrives on data: larger and better training sets allow larger models, with more parameters, to provide better services. Traditional document management systems (DMS), in turn, store decades’ worth of business data, very often enriched with structured metadata. This is a dream come true for every data scientist on the planet! However, these two worlds have largely ignored each other until now.
How does AI become “intelligent”? Where does the “intelligence” in “artificial intelligence” come from? Modern AI learns in much the same way children do: through supervised repetition, trial and error. Regardless of the model class, size, or type, the learning process is largely the same. In a very simplified form, it runs like this: the untrained model is first presented with a training set. Different forms of training sets exist; for brevity, let us assume supervised learning here, where each input in the training set is paired with a known output (result). There are also unsupervised learning methods, where the model learns “directly” from unlabeled data. For example, if the model needs to learn to recognize apples and pears, it will be shown several apples and told that they are apples, and then shown several pears and told that they are pears. In a typical DMS scenario, apples and pears would be document types, such as contracts or invoices. After that, the model is presented with a validation set, which is used to track training progress. This set contains the same classes of input, that is, apples and pears, but none of its examples were shown to the model during training; the model sees them for the first time. It is expected that the model will now recognize any apple and any pear. Training is performed over several iterations in which the model’s parameters are adjusted (the training process itself is governed by settings called hyperparameters). The trick is: the more expressive (representative) the training set, the better the model becomes at distinguishing apples from pears.
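The training-and-validation loop just described can be sketched in a few lines of plain Python. This is a deliberately toy “model” (word-frequency profiles stand in for learned parameters), and all document types and words are invented for illustration:

```python
from collections import Counter

def train(training_set):
    """Count word frequencies per class, a stand-in for fitting real model parameters."""
    profiles = {}
    for words, label in training_set:
        profiles.setdefault(label, Counter()).update(words)
    return profiles

def predict(profiles, words):
    """Assign the class whose word profile overlaps most with the input."""
    return max(profiles, key=lambda label: sum(profiles[label][w] for w in words))

# Training set: inputs with known outputs ("apples and pears" = document types).
training_set = [
    (["total", "amount", "due", "vat"], "invoice"),
    (["payment", "due", "vat", "net"], "invoice"),
    (["party", "agreement", "term"], "contract"),
    (["agreement", "clause", "term"], "contract"),
]
# Validation set: the same classes, but examples the model has never seen.
validation_set = [
    (["vat", "amount", "net"], "invoice"),
    (["clause", "party", "agreement"], "contract"),
]

model = train(training_set)
accuracy = sum(predict(model, w) == y for w, y in validation_set) / len(validation_set)
print(accuracy)  # 1.0 on this tiny validation set
```

The key point survives even in this toy form: accuracy is measured only on inputs withheld from training, which is what lets you track whether the model generalizes.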
A popular alternative here is to take an existing general-purpose model and tune it to the problem at hand, either by retraining it, or by extending it slightly and training the extension so that it can solve a new problem (transfer learning).
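A minimal sketch of the transfer-learning idea, with all names invented: a frozen “pretrained” feature extractor stands in for the general-purpose model, and only a small new head is trained on the target task.

```python
# Pretend these features were learned on a large general corpus (frozen):
def pretrained_features(text):
    return [len(text), text.count(" "), sum(c.isdigit() for c in text)]

def train_head(examples):
    """Train a trivial nearest-mean 'head' on top of the frozen features."""
    groups = {}
    for text, label in examples:
        groups.setdefault(label, []).append(pretrained_features(text))
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()}

def classify(head, text):
    feats = pretrained_features(text)
    # Pick the class whose mean feature vector is closest (squared distance).
    return min(head, key=lambda label: sum((a - b) ** 2
                                           for a, b in zip(feats, head[label])))

# Only these few task-specific examples are needed to train the new head:
head = train_head([
    ("Invoice no. 4711, total 99 EUR", "invoice"),
    ("Invoice 123: amount due 50", "invoice"),
    ("Agreement between the parties", "contract"),
    ("This contract governs the terms", "contract"),
])
print(classify(head, "Invoice 999, total 12 EUR"))  # invoice
```

The design choice mirrors real transfer learning: the expensive part (the feature extractor) is reused untouched, so only a small amount of new labeled data is needed.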
So, to train a model that accurately identifies apples and pears, regardless of whether you use supervised, unsupervised, transfer learning, or some other method, you need high-quality training and validation sets that cover not only standard examples but also edge cases. In many cases, the training set also needs to be “annotated”, i.e., the model must be told formally when it is being trained on apples and when on pears.
The holy grail of today’s data science is how and where to obtain the minimum required data sets for training and validation that will give the model adequate accuracy.
A document management system (DMS) is, simply put, a centralized storage facility for all electronic documents in a company. Documents are stored in logical structures (folders, registers), thus being electronically “filed”. They are also typed (assigned a class) and carry metadata (attributes). Instead of searching for and retrieving documents across many different applications, users can query the DMS quickly and run searches based on document content (full text), document type, and metadata, or all of those combined. On top of this search-and-retrieve functionality, a DMS will typically implement workflows, which enable the enactment of business processes based on stored documents. Typical examples include invoice processing, HR management, and contract management, among many others.
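As an illustration of such a combined query (this is not the API of any real DMS, and all field names and documents are invented), consider an in-memory document store searched by type, full text, and metadata at once:

```python
documents = [
    {"type": "invoice", "metadata": {"vendor": "ACME", "year": 2023},
     "text": "Invoice for consulting services, total 1200 EUR"},
    {"type": "contract", "metadata": {"party": "ACME", "year": 2021},
     "text": "Framework agreement on consulting services"},
    {"type": "invoice", "metadata": {"vendor": "Globex", "year": 2023},
     "text": "Invoice for hardware delivery"},
]

def query(docs, doc_type=None, full_text=None, **metadata):
    """Return documents matching all given criteria (AND-combined)."""
    hits = []
    for doc in docs:
        if doc_type and doc["type"] != doc_type:
            continue
        if full_text and full_text.lower() not in doc["text"].lower():
            continue
        if any(doc["metadata"].get(k) != v for k, v in metadata.items()):
            continue
        hits.append(doc)
    return hits

result = query(documents, doc_type="invoice", full_text="consulting", vendor="ACME")
print(len(result))  # 1
```

A real DMS evaluates such queries against indexes rather than scanning every document, but the user-visible contract is the same: type, full text, and metadata filters can be freely combined.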
A DMS is, at the same time, an almost endless source of highly accurate structured data (filing structures, document types, and attributes or metadata) as well as unstructured data (full-text information or document content). What is more, a typical DMS will implement document preview functions, making documents viewable independently of the client application. Put this all together: a DMS provides decades’ worth of business data, with exact metadata, content, and even graphic layout. This represents a perfect source of training and validation sets for AI! You can use metadata and content to train NLP (language) models and generated document previews to train image-processing models, and you can even combine both. Think further: filing locations and completed workflows can even be used to train AI to automate parts of, or entire, business processes. But more about that in a moment.
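Deriving a labeled training set from DMS records can then be as simple as pairing each document’s content with its stored type; the record fields below are illustrative, since a real DMS export will look different:

```python
import random

dms_records = [
    {"doc_type": "invoice", "content": "Invoice 17, total 300 EUR"},
    {"doc_type": "invoice", "content": "Invoice 18, total 80 EUR"},
    {"doc_type": "contract", "content": "Service agreement, 24 months"},
    {"doc_type": "contract", "content": "Lease contract for office space"},
]

# Each record already carries its label (the document type): the
# "annotation" work was done years ago through normal DMS usage.
labeled = [(r["content"], r["doc_type"]) for r in dms_records]

random.seed(0)
random.shuffle(labeled)
split = int(0.75 * len(labeled))
training_set, validation_set = labeled[:split], labeled[split:]
print(len(training_set), len(validation_set))  # 3 1
```

This is exactly why a DMS is such a good data source: the labels do not have to be produced by a separate annotation project.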
But it actually works both ways: you can (re)use AI to speed up your DMS! Once you have trained models on your data, you can integrate them with a DMS to automate otherwise manual processes. For example, when importing a document into a DMS, you need to tell the system the document type and enter the document’s metadata. You can use AI to automatically classify and index all incoming documents, thus speeding up this process. Furthermore, you can train models to automatically find filing locations, so that no manual interaction is needed when creating new documents in a DMS: you can simply drag and drop a document, and AI will recognize its type, add metadata to it, and even file it in the proper location in the system. Other types of processes can also easily be automated. Think about a scenario where, based on incoming documents (letters, requests, complaints, tickets), a workflow is started and a group of users needs to pick the tasks they can (or should) work on. AI can eliminate this slow and inefficient process by learning from past data and assigning workflows to individual users automatically.
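The workflow-assignment idea can be sketched as learning a simple routing table from past assignments; the users and document kinds here are invented examples, and a real system would use a proper model rather than raw counts:

```python
from collections import Counter

# Past workflow data: which user handled which kind of incoming document.
past_workflows = [
    ("complaint", "alice"), ("complaint", "alice"),
    ("ticket", "bob"), ("ticket", "carol"), ("ticket", "bob"),
]

def train_router(history):
    """Count, per document kind, how often each user handled it."""
    counts = {}
    for doc_kind, user in history:
        counts.setdefault(doc_kind, Counter())[user] += 1
    return counts

def assign(router, doc_kind):
    """Route a new workflow to the user who handled this kind most often."""
    return router[doc_kind].most_common(1)[0][0]

router = train_router(past_workflows)
print(assign(router, "complaint"), assign(router, "ticket"))  # alice bob
```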
More advanced scenarios can be implemented, too. For example, a model can be trained to summarize several documents from the result list. This enormously speeds up the process of deciding if the result set is relevant or needs to be reduced or expanded. There are also models which can check documents based on domain knowledge, such as checking the consistency of a contract. Finally, pretrained large language models can be integrated into the user interface of a DMS and enable prompting instead of querying the system traditionally via search forms or full text.
A DMS is a great tool for safely storing, structuring, and finding information within the millions of documents that a company creates each year, and users are happy to use it for information retrieval and collaboration. However, no one really enjoys the process of document filing. It is tedious, time-consuming, and distracts users from their actual work; no one is hired to enter documents into a DMS. To solve this inherent issue of DMS usage, various rule-based approaches have been used until now, with mixed success.
But instead of manual or rule-based document filing, we can offer an integration between a DMS and AI. Assume a DMS is in place and several pretrained models are integrated as described above. New documents of unknown type and purpose are imported into the DMS. This can be done by a human operator, on the client side, or in a batch import, on the server side. Several AI models (“agents”) can listen for the incoming documents and perform asynchronous processing. The first agent recognizes whether a document needs to be optically recognized and, if so, sends the document to the OCR service. In the next step, the classifying agent determines the type of the document. Following this, the indexing agent assigns metadata to each document. The location agent searches for possible appropriate locations and archives the document automatically. Finally, depending on the document type, the forwarding agent determines whether a workflow needs to be started for the document and who the proper recipient of the task is. Thus, the very cumbersome and error-prone manual work of importing, classifying, indexing, and filing a document and starting a business process is completely automated by several independent AI models.
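The agent chain just described can be sketched as a sequence of functions applied to a shared document record; all agent names, fields, and outputs are illustrative stand-ins for real OCR, classification, and workflow services:

```python
def ocr_agent(doc):
    if doc.get("text") is None:            # scanned image without a text layer
        doc["text"] = "OCR output for " + doc["file"]
    return doc

def classify_agent(doc):
    # Stand-in for a trained classifier: decide the document type.
    doc["type"] = "invoice" if "invoice" in doc["text"].lower() else "letter"
    return doc

def index_agent(doc):
    # Stand-in for metadata extraction.
    doc["metadata"] = {"length": len(doc["text"])}
    return doc

def filing_agent(doc):
    # Stand-in for finding the filing location in the DMS.
    doc["location"] = f"/archive/{doc['type']}s"
    return doc

def forwarding_agent(doc):
    # Stand-in for starting the right workflow for this document type.
    doc["workflow"] = "invoice-approval" if doc["type"] == "invoice" else None
    return doc

pipeline = [ocr_agent, classify_agent, index_agent, filing_agent, forwarding_agent]

doc = {"file": "invoice_scan.tif", "text": None}
for agent in pipeline:
    doc = agent(doc)
print(doc["type"], doc["location"], doc["workflow"])
```

In a real integration, each agent would run asynchronously on an event bus rather than in a synchronous loop, but the division of labor is the same.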
This is a very fair and relevant question. Transformer-based large language models (LLMs), with ChatGPT as their most prominent example, can provide some amazing answers and insights when prompted correctly. The first reason to consider training and serving your own model is pricing, in particular for massive document processing, such as in input management.
Just as important is the fact that ChatGPT and other generally available LLMs will not give you a probability regarding the correctness of their answers. For example, if you ask ChatGPT to extract header data from an invoice, it will answer with extracted metadata but will not offer you any confidence level. Without a metric to estimate the model’s confidence in its answer, you have to believe the output unconditionally. The output is often correct, but LLMs are also known to “hallucinate” answers, which is a major issue for any process automation: starting an automated process based on incorrect data is very expensive and may negate the overall advantages of AI usage.
Lastly, since the output of ChatGPT and similar models is optimized for interaction with humans, they have difficulties providing a stable, machine-readable format of inferred/extracted data (such as a JSON or XML structure with a strictly defined schema). This makes later machine processing and use of the extracted data difficult.
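A small sketch of why a strict schema matters: a validator that accepts only output matching the expected structure, using only the standard library (the invoice schema itself is an invented example):

```python
import json

# Expected structure of the extracted data (illustrative schema).
REQUIRED = {"vendor": str, "invoice_number": str, "total": float}

def parse_extraction(raw):
    """Return the parsed record, or None if the output is not machine-readable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not all(isinstance(data.get(k), t) for k, t in REQUIRED.items()):
        return None
    return data

good = '{"vendor": "ACME", "invoice_number": "4711", "total": 99.5}'
bad = "Sure! The vendor seems to be ACME and the total is 99.50 EUR."
print(parse_extraction(good) is not None, parse_extraction(bad) is None)  # True True
```

A model tuned for human dialogue tends to produce answers like `bad` above, which downstream systems cannot process without brittle parsing.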
If you train your own model, however, it can report the probability that the offered result is correct, for example that a recipient company was extracted correctly from an invoice with a probability of 0.95. Now you can build thresholds into your DMS-AI integration: if the model performs above the threshold, you can accept the AI output and automate the process further. If the model performs below the threshold, however, you can insert human interaction into the loop to avoid starting an automated process based on incorrect data, which is very costly.
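The threshold logic can be expressed in a few lines; the threshold value and queue names below are arbitrary examples:

```python
THRESHOLD = 0.90  # illustrative cut-off; tune per process and cost of errors

def route(extraction, confidence):
    """Automate above the threshold, insert a human below it."""
    if confidence >= THRESHOLD:
        return ("auto", extraction)          # start the automated process
    return ("human-review", extraction)      # queue for manual verification

print(route({"vendor": "ACME"}, 0.95)[0])  # auto
print(route({"vendor": "ACM?"}, 0.62)[0])  # human-review
```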
One more benefit is obviously the possibility of training the model on your specific data, to which ChatGPT has no access. Finally, a hybrid approach may be applicable in some situations. For example, you can train a model on your data. If it performs (slightly) below the threshold, you can query ChatGPT (or a similar pretrained model) with the same input. If the outputs match, you can assume with high confidence that both models predicted correctly. Basically, you implement a kind of hybrid voting.
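A sketch of that hybrid voting, with both “models” replaced by stand-in functions returning canned answers:

```python
THRESHOLD = 0.90

def own_model(doc):
    # Stand-in for your locally trained model: answer plus confidence.
    return {"vendor": "ACME"}, 0.85          # slightly below the threshold

def pretrained_model(doc):
    # Stand-in for ChatGPT or a similar model: answer, no confidence score.
    return {"vendor": "ACME"}

def hybrid_extract(doc):
    answer, confidence = own_model(doc)
    if confidence >= THRESHOLD:
        return answer                         # confident enough on its own
    if answer == pretrained_model(doc):       # both models agree: accept
        return answer
    return None                               # disagreement: human review

print(hybrid_extract("invoice.pdf"))  # {'vendor': 'ACME'}
```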
DMS companies such as OPTIMAL SYSTEMS have started to provide sets of pretrained, optimized AI models that can be used for classification and metadata extraction. These models address the issues we observed with general-purpose models: they report a confidence level for every answer, and they return their results in a stable, machine-readable format. Both are major advantages if you plan to controllably automate DMS processes with AI.
Thus, if you have already implemented a DMS solution, you should consider enriching it with AI. Because you already own the perfect training set, you can train various models on your own data and automate your processes.
If you are considering introducing a DMS, then you should do it with AI assistance, as it has never been easier to implement a DMS. We can offer pretrained models that provide the perfect basis for getting started with DMS usage and can support you when importing or migrating documents. That way, you can easily “discover” what kinds of documents you actually have. Afterwards, you can further train the models with your own data to improve precision or automate processes.
Whichever way you go, you will soon find out for yourself that DMS and AI are a match made in heaven.
Do you have any further questions?