November 22, 2021
Using the Right Tools: File Storage, Object Storage, and Intelligent Content Management
By Mark Sowul
File and object storage is one of those deceptively simple tasks in a software project. It seems straightforward at first (“just store it in the database” or “just use S3”). But then real-world requirements start piling up: permissions, client data segregation, searching, auditing, retention, and more. How much will it cost to scale your solution to production levels of users and documents? How does the system handle tens of thousands or even millions of documents?
Object storage APIs help, but they also bring other challenges. Some APIs are inflexible and limit your ability to integrate your own logic. Other APIs require you to store your data in the cloud, which you may be unwilling (or unable) to do.
yuuvis® Momentum is a document repository that offers extended features for building content and information management solutions, handling all of the file storage concerns of a document management system and more. This modern, cloud-native technology stack supports public cloud, private cloud (including on-premises deployment), and hybrid cloud, enabling you to use the same stack everywhere, flexibly deploy your solution on your own terms, and cost-effectively scale to the most intense workloads.
Let’s look at what makes object storage useful and why choosing yuuvis® Momentum adds value over a simple object storage service such as Amazon S3.
First, some background: what is object storage? As a simple example, consider a photo saved on your hard disk. Instead of storing a file as a piece of data at a specific location, such as C:\Users\John\Desktop\IMG_0712.JPG, object storage associates each file with a unique ID for retrieval. The system is then free to store that data however and wherever it sees fit.
For example, it might distribute files across multiple disks, allowing the system to grow (more hardware can be added almost seamlessly since the file is accessed by ID, rather than location). Data might be stored in multiple places to enable more fault tolerance (and quicker access across the globe). Data that isn’t accessed for a long time might be transparently moved to cheaper, slower storage.
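To make the ID-based idea concrete, here is a minimal Python sketch of a toy object store. It is an illustration only: the class name, the "least-full disk" placement rule, and the in-memory dictionaries standing in for disks are all assumptions made for this example, not how any particular product works.

```python
import uuid


class TinyObjectStore:
    """Toy object store: callers see only IDs, never storage locations."""

    def __init__(self, disk_count: int = 3) -> None:
        # Each dict stands in for one physical disk or storage node.
        self._disks = [{} for _ in range(disk_count)]
        # Maps object ID -> index of the disk that currently holds it.
        self._placement = {}

    def put(self, data: bytes) -> str:
        object_id = str(uuid.uuid4())
        # Placement is an internal detail; here we pick the least-full disk.
        disk_index = min(range(len(self._disks)), key=lambda i: len(self._disks[i]))
        self._disks[disk_index][object_id] = data
        self._placement[object_id] = disk_index
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._disks[self._placement[object_id]][object_id]

    def add_disk(self) -> None:
        # Growing the system does not change any existing object ID.
        self._disks.append({})

    def migrate(self, object_id: str, target_disk: int) -> None:
        # Move data (for example to cheaper storage) without changing its ID.
        source = self._placement[object_id]
        self._disks[target_disk][object_id] = self._disks[source].pop(object_id)
        self._placement[object_id] = target_disk


store = TinyObjectStore()
photo_id = store.put(b"...jpeg bytes...")
store.add_disk()                        # new hardware joins the pool
store.migrate(photo_id, target_disk=3)  # data moves, the ID stays the same
print(store.get(photo_id))
```

The caller only ever holds `photo_id`, so adding a disk or moving the data is invisible from the outside.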
Object storage techniques are what enable cloud providers to offer services that are so cost-effective and highly scalable. Additionally, that photo described above consists of data and metadata. The data is the actual image itself, but there is other information: when it was taken, the camera model, photo tags, and other similar information about the picture. That is metadata.
Object storage stores and treats files as objects, where the metadata is a first-class component of the object, along with the actual data. Importantly, object storage lets you access the metadata without needing to download the entire file. It can also preserve each different data revision (for example, if the photo is cropped or enhanced).
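A small Python sketch can show what "metadata as a first-class component" might look like. The `StoredObject` and `Revision` names, the `head`/`get` split, and the sample camera fields are hypothetical choices for this illustration, not the API of S3 or yuuvis® Momentum.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    data: bytes
    metadata: dict


@dataclass
class StoredObject:
    revisions: list = field(default_factory=list)

    def put(self, data: bytes, metadata: dict) -> int:
        """Append a new revision and return its 1-based version number."""
        self.revisions.append(Revision(data, dict(metadata)))
        return len(self.revisions)

    def _pick(self, version=None) -> Revision:
        # None means "latest"; otherwise versions are 1-based.
        return self.revisions[-1] if version is None else self.revisions[version - 1]

    def head(self, version=None) -> dict:
        """Return metadata only, without reading the payload."""
        return dict(self._pick(version).metadata)

    def get(self, version=None) -> bytes:
        """Return the payload of the requested revision."""
        return self._pick(version).data


photo = StoredObject()
photo.put(b"original pixels", {"camera": "X100", "taken": "2021-11-01"})
photo.put(b"cropped pixels", {"camera": "X100", "taken": "2021-11-01", "edited": True})

print(photo.head())          # metadata of the latest revision
print(photo.get(version=1))  # the original image is still available
```

Reading metadata through `head` never touches the image bytes, and the earlier revision remains retrievable after the edit.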
Object storage really shines when handling a large amount of data that mostly doesn’t change. This category may include more use cases than you think. Imagine a document that needs to be approved and published. Once the document has been uploaded into the system, approving and publishing the document wouldn’t change the contents of the document itself. Whether it is approved and published would likely only change the metadata.
In fact, it may even be a requirement that the document not change. When you start adding requirements such as versioning and retention, this behavior of object storage becomes valuable. Even if the document is revised, you can preserve its different versions over time. If the retention policy requires the document data itself to remain unchanged, you can still capture and update its status, use, or other business-relevant information in its metadata.
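The approval scenario can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `Document` class, the `status` and `changed_by` fields, and the read-only property are invented for the example and do not represent a real retention mechanism.

```python
class Document:
    """Toy document: content is fixed at upload, metadata stays editable."""

    def __init__(self, content: bytes) -> None:
        self._content = content
        self.metadata = {"status": "uploaded"}

    @property
    def content(self) -> bytes:
        # No setter is defined, so assigning to .content raises AttributeError.
        return self._content

    def set_status(self, status: str, changed_by: str) -> None:
        # Business state lives in metadata only; the payload is not touched.
        self.metadata["status"] = status
        self.metadata["changed_by"] = changed_by


doc = Document(b"Q3 report, final text")
doc.set_status("approved", changed_by="alice")
doc.set_status("published", changed_by="bob")
print(doc.metadata["status"], doc.content)
```

Approval and publication show up as metadata changes, while the uploaded bytes are the same before and after.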
You might recognize Amazon’s S3 product as a common example of object storage. Why not just use that? S3 is a useful cloud storage offering with a few of the features described earlier. It is very cost-effective and easily scales to very large amounts of data. Among other benefits, it allows files to be replicated across multiple availability zones for speed and reliability, and it has built-in storage tiering (archiving old objects to cheaper storage).
Nevertheless, S3 is still, at its core, just a place to store data. If you want to query and search those files effectively, or apply logic to files, you need to implement that separately. In contrast, yuuvis® Momentum has a sophisticated metadata and type system and tackles far more than just storing data. Imagine you have an invoice, for example. The invoice itself is just a document, but it might require a sign-off from the finance team, and you would want to keep track of whether or not it has been paid. That information could be stored seamlessly alongside the invoice itself (as metadata).
Eliminating the need to build this out yourself, yuuvis® Momentum provides the ingredients to integrate such a system. It could treat the invoice as a first-class object type, along with the tracking information. It even offers workflow integration with Flowable to implement business process management.
Imagine you have not just one invoice, of course, but thousands. The advanced querying facilities in yuuvis® Momentum, for example, enable easy searches for all invoices that haven’t been paid, or all invoices approved by a particular employee. An integrated, full-text extraction and query service also enables you to search through the full content of each document.
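As a rough picture of what metadata-driven search means, the Python sketch below filters a handful of invoice records in memory. The field names (`paid`, `approved_by`, `amount`) and the `find` helper are hypothetical; a real system would run such a query inside its own query engine rather than in application code.

```python
invoices = [
    {"id": "inv-001", "paid": False, "approved_by": "alice", "amount": 1200.0},
    {"id": "inv-002", "paid": True, "approved_by": "bob", "amount": 310.5},
    {"id": "inv-003", "paid": False, "approved_by": "bob", "amount": 89.9},
    {"id": "inv-004", "paid": True, "approved_by": "alice", "amount": 4500.0},
]


def find(objects, **criteria):
    """Return objects whose metadata matches every given field exactly."""
    return [
        obj for obj in objects
        if all(obj.get(key) == value for key, value in criteria.items())
    ]


unpaid = find(invoices, paid=False)
approved_by_alice = find(invoices, approved_by="alice")
unpaid_from_bob = find(invoices, paid=False, approved_by="bob")

print([inv["id"] for inv in unpaid])
print([inv["id"] for inv in approved_by_alice])
print([inv["id"] for inv in unpaid_from_bob])
```

Because payment and approval state are stored as metadata, none of these lookups needs to open the invoice documents themselves.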
An additional feature of yuuvis® Momentum is that all the above can be handled in multi-tenant fashion: in other words, you can ensure all data for each separate customer (or department) is kept separate, while still deploying the same object definitions and logic (schema). At the same time, multiple applications (schemas) can run on the same instance (a single instance can scale up to 10^9 objects). Schemas can also be tenant-specific.
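The combination of one shared schema and separated tenant data can be sketched as follows. Everything here is an assumption for illustration: the `MultiTenantStore` class, the two-field invoice schema, and the tenant names are made up and are not the product's data model.

```python
# One schema definition, deployed for every tenant (hypothetical fields).
INVOICE_SCHEMA = {"paid": bool, "approved_by": str}


class MultiTenantStore:
    def __init__(self, schema: dict) -> None:
        self._schema = schema
        self._by_tenant = {}  # tenant name -> {object_id: metadata}

    def put(self, tenant: str, object_id: str, metadata: dict) -> None:
        # The same validation rules apply regardless of tenant.
        for name, expected_type in self._schema.items():
            if not isinstance(metadata.get(name), expected_type):
                raise ValueError(f"field {name!r} must be {expected_type.__name__}")
        self._by_tenant.setdefault(tenant, {})[object_id] = dict(metadata)

    def list_ids(self, tenant: str) -> list:
        # A tenant can only ever enumerate its own objects.
        return sorted(self._by_tenant.get(tenant, {}))


store = MultiTenantStore(INVOICE_SCHEMA)
store.put("acme", "inv-1", {"paid": False, "approved_by": "alice"})
store.put("globex", "inv-7", {"paid": True, "approved_by": "bob"})

print(store.list_ids("acme"))
print(store.list_ids("globex"))
```

Both tenants are validated against the same definition, yet neither can see the other's objects.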
Developing on a platform that already offers necessary features and integrations saves valuable time and effort. In addition to the examples above, yuuvis® Momentum handles authentication and authorization via OAuth2, another important advantage. Developers already using S3 will benefit from their familiarity with the S3 protocol.
OAuth2 integration allows users’ application sign-in information to flow through to yuuvis® Momentum as well, and OAuth2 support provides broad compatibility with common single sign-on (SSO) providers like Active Directory. From there, multi-tenanted role-based access control can lock down read, write, and delete access through the yuuvis® Momentum APIs.
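A tenant-aware, role-based check might look roughly like the Python sketch below. The role names, the permission table, and the shape of the `user` dictionary are hypothetical; in a real deployment the tenant and roles would come from a validated OAuth2 token rather than a hand-built dictionary.

```python
# Hypothetical role-to-permission table.
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


def is_allowed(user: dict, action: str, object_tenant: str) -> bool:
    """Allow an action only within the user's tenant and granted roles."""
    if user["tenant"] != object_tenant:
        return False  # tenant boundary is checked before any role
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user["roles"])


alice = {"name": "alice", "tenant": "acme", "roles": ["editor"]}

print(is_allowed(alice, "write", "acme"))    # editor may write in own tenant
print(is_allowed(alice, "delete", "acme"))   # editor lacks delete
print(is_allowed(alice, "read", "globex"))   # other tenant is off limits
```

Checking the tenant first means a powerful role in one tenant never grants access to another tenant's data.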
That said, any time developers leverage existing third-party libraries that they didn’t build themselves, they run the risk of the library not supporting what they want to do. Fortunately, yuuvis® Momentum enables injecting server-side business logic using system hooks (webhooks and messaging).
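To give a feel for server-side hooks, here is a hedged Python sketch of a handler that a webhook endpoint could call before an object is stored. The payload shape (`metadata`, `type`, `amount`, `paid`) and the function name `on_before_store` are assumptions invented for this example; the actual hook contract is defined by the product documentation.

```python
import json


def on_before_store(request_body: str) -> str:
    """Validate and enrich an incoming object, returning the updated JSON."""
    payload = json.loads(request_body)
    metadata = payload["metadata"]

    if metadata.get("type") == "invoice":
        # Custom business rule: invoices must carry an amount.
        if "amount" not in metadata:
            raise ValueError("invoice rejected: missing amount")
        # Custom enrichment: new invoices start out unpaid.
        metadata.setdefault("paid", False)

    return json.dumps(payload)


incoming = json.dumps({"metadata": {"type": "invoice", "amount": 99.5}})
result = json.loads(on_before_store(incoming))
print(result["metadata"])
```

The handler is a plain function here so the rule is easy to test; wiring it to an HTTP endpoint or a message queue is a separate deployment concern.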
Finally, web developers will also appreciate the Angular reference client and user interface (UI) components. This provides ready access to authentication, search, and upload functionality, for example. The reference client showcases various yuuvis® Momentum functionality and access patterns and can help you deliver your solution more quickly.
Aside from developer productivity, yuuvis® Momentum provides business benefits. If you’re not ready to commit all your processing to the cloud, yuuvis® Momentum can run on your own hardware instead of in the cloud — or some combination of both local datacenter and cloud storage.
You can seamlessly move yuuvis® Momentum from on-premises to the cloud (Amazon, Azure, or Google, for example) should you decide to transition later, or as your solution requires increased scalability. Hosting on the cloud can offer significant savings on infrastructure costs, even taking into account the increased costs of traffic.
Archiving and retention features send old data elsewhere for archival. The tool supports S3-compatible object stores as well as the file system. Object metadata controls an object’s archive path. If necessary, document locking ensures retention until a certain point in time, meeting legal and regulatory requirements without wasting storage space. An audit trail also logs user activity on each document, and there is a built-in archive integrity check.
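Retention locks, audit trails, and integrity checks can be pictured with a short Python sketch. The `ArchivedDocument` class, the SHA-256 checksum, and the tuple-based audit log are assumptions chosen for illustration and say nothing about how yuuvis® Momentum implements these features internally.

```python
import hashlib
from datetime import datetime, timezone


class ArchivedDocument:
    def __init__(self, content: bytes, retain_until: datetime) -> None:
        self.content = content
        self.retain_until = retain_until
        # Checksum recorded at archive time, used later to detect corruption.
        self.checksum = hashlib.sha256(content).hexdigest()
        self.audit_log = []  # (timestamp, user, action) entries

    def delete(self, user: str, now: datetime) -> None:
        if now < self.retain_until:
            self.audit_log.append((now.isoformat(), user, "delete denied"))
            raise PermissionError("document is under retention")
        self.content = None
        self.audit_log.append((now.isoformat(), user, "deleted"))

    def verify_integrity(self) -> bool:
        if self.content is None:
            return False
        return hashlib.sha256(self.content).hexdigest() == self.checksum


lock_end = datetime(2030, 1, 1, tzinfo=timezone.utc)
doc = ArchivedDocument(b"signed contract", retain_until=lock_end)

try:
    doc.delete("carol", now=datetime(2025, 6, 1, tzinfo=timezone.utc))
except PermissionError as error:
    print("blocked:", error)

print(doc.verify_integrity())
print(doc.audit_log)
```

The clock is passed in explicitly so the retention rule can be exercised with fixed dates; the denied attempt still leaves a trace in the audit log.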
From a technical perspective, yuuvis® Momentum scales to handle a vast set of documents. Meanwhile, automated content analysis and full-text extraction (with a rich search function across metadata and object text) empower users to efficiently handle these large data sets.
Additionally, the server integrations and events we described earlier allow yuuvis® Momentum data stores to interact with your other applications seamlessly. By providing this depth and insight into your data stores, as well as layering your business process across them, you can unlock your data’s full potential.
Object storage provides a flexible, rich, and scalable platform for your large data sets, especially for archival or data warehousing style operations. By layering commonly used functions on top, and enabling data to be spread across private and public clouds, yuuvis® Momentum levels up data storage beyond a dusty archive.
This is not to say that object storage or yuuvis® Momentum is the right choice for every project. Because of its many powerful features, yuuvis® Momentum is complex, with many moving parts and a significant learning curve. The power of its flexible deployment options also comes with the obligation to deploy and maintain it, which requires experience with Docker, Kubernetes, Elasticsearch, and other technologies.
That said, many organizations already use these technologies, and to many developers, the excuse to use these cutting-edge cloud-native tools is further incentive to adopt yuuvis® Momentum.
If your data sets are outgrowing your current storage solution, you want to manage your documents more cost-effectively, or you want to better leverage your current cloud-based object store, check out yuuvis® Momentum and its API to access powerful document-handling tools.