AI in Archiving & Digital Preservation FAQs
We've put together these FAQs so you can confidently discuss AI in digital preservation with your stakeholders.
AI in Archiving & Digital Preservation - Practitioner Workshop Series
You control which subset of your archive AI should apply to using the Enrichment policy. This gives you granular control down to folder location, specific security tag, or both.
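As a rough illustration of how folder and security-tag scoping can combine, the sketch below models a scope rule in Python. The `Asset` and `ScopeRule` classes and the `in_scope` function are hypothetical names invented for this example. They are not Preservica's API, and the real Enrichment policy is configured in the product rather than in code.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class Asset:
    # Hypothetical stand-in for an archived asset.
    folder: str        # e.g. "/collections/photos/1950s"
    security_tag: str  # e.g. "open" or "restricted"


@dataclass(frozen=True)
class ScopeRule:
    # Hypothetical rule: a field left as None places no restriction.
    folder: Optional[str] = None
    security_tag: Optional[str] = None


def in_scope(asset: Asset, rule: ScopeRule) -> bool:
    """Return True if the asset satisfies every condition the rule sets."""
    if rule.folder is not None:
        asset_path = PurePosixPath(asset.folder)
        rule_path = PurePosixPath(rule.folder)
        # The asset must sit in the rule's folder or in one of its subfolders.
        if asset_path != rule_path and rule_path not in asset_path.parents:
            return False
    if rule.security_tag is not None and asset.security_tag != rule.security_tag:
        return False
    return True


# A rule that sets both a folder and a security tag requires both to match.
rule = ScopeRule(folder="/collections/photos", security_tag="open")
print(in_scope(Asset("/collections/photos/1950s", "open"), rule))        # True
print(in_scope(Asset("/collections/photos/1950s", "restricted"), rule))  # False
```

Comparing whole path segments rather than string prefixes keeps a folder such as `/collections/photos-extra` from matching a rule written for `/collections/photos`.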
Preservica supports integration with AI in three ways:
(1) AI tools embedded within Preservica's code and infrastructure, such as our OCR and PII detection tools. This lets more organizations benefit, as they don't need to worry about their data leaving the system. We keep these tools up to date with their latest versions, but they are not generative AI tools, which is what people commonly have in mind when they talk about AI models. They are trained on content but don't tend to have a constantly open feedback loop.
(2) Cloud-service-based AI tools, such as the Image Analysis feature used in the demo during Session 1. In this case your data does leave the system; however, it is not stored by the cloud service or used in training, as there is no feedback loop. Images are analysed using AI that was trained on other datasets, and we don't currently support customization that would allow training on your own images.
(3) Your own AI. Our extensible platform allows you to integrate any AI model, whether a cloud service or your own AI, using webhooks.
These features are currently in beta. As we get feedback through our beta program, we will launch them to general availability and expand access to more editions. Further updates are coming in the new year.
AI Workshop 2 explores how to engage stakeholders on AI policies, data security and emerging AI standards & regulations.
Preservica already provides automation within this workflow without AI. On an Enterprise private cloud version of our platform you can fully automate your preservation workflow: once you configure your Preservation policy, Preservica keeps your content up to date with this policy forever. For example, any time you change your folder structure or security tags, Preservica checks the content to ensure it is still in line with the policy. You can also schedule other processes, such as integrity checks and even ingests.
Environmental sustainability is a core priority for us. Preservica is the only Digital Preservation provider with an independently ratified Sustainability Charter, and the only one that measures and reports its greenhouse gas emissions annually through Small World Consulting.
It doesn't at this moment but we are working on adding this to the interface in a future update.
The Image Analysis feature doesn't affect your storage, as only metadata is added.
There are limits for queries per edition. Additional features carry an extra cost.
Our OCR tool is fully embedded within Preservica, so your data never leaves your system. We prioritize embedding AI tools natively within our architecture so that customers can find and use them with minimal effort.
Preservica provides a fully customizable system for metadata. You can easily create a metadata schema from scratch, including any custom fields, then populate it on your content individually or in bulk.
No, your catalog metadata will be excluded. We appreciate that our catalog users want their catalog to remain the source of truth for their metadata, so we always ensure that no changes can be made to catalog metadata from Preservica.
Yes, the outcome of transcription or captioning will be a transcript or caption file. This will be added as a content object to your existing video or audio asset, so that when you view the asset the transcript and subtitles render immediately as well. You will also be able to download the transcript or caption file from the asset page, on its own or together with the asset, to use in another system.
Yes - all the tools we use are maintained and updated. Not all AI tools require a constantly open feedback loop. They can be trained on a dataset only once, or periodically, or their models can be replaced with newer or better-trained ones. As we look at more generative AI features (like transcription or summarisation), a custom feedback loop becomes more important and we are exploring ways of making it possible.
We have a more detailed audit trail available at the level of each asset in Preservica. It is available via the UI (currently only the Classic interface shows the full detail) and via APIs for fetching or exporting at scale. You won't need to ask for this information.
As explained in AI Workshop 1, we do have an active AI roadmap and this is one of the areas we are exploring currently.
Preservica already provides a very granular system of permissions, based on security tags that you assign to your content. For the current AI features you've seen throughout this series, you can specify through your Enrichment policy which content each AI feature can access, at the level of a folder or security tag, so that you can protect any sensitive content. These access controls are already in place. For platform-wide AI features such as semantic search, where another user is taking action with AI on your content, we are currently exploring additional ways of sandboxing the AI to ensure it only accesses content that the specific user has permission to access. You are always in control of which AI features are enabled on your system, such as through an admin-level switch for AI, so that those who don't want any AI in their system can be assured that none is enabled. Individual AI features can be managed via the Enrichment policy by Manager-level users only.
We control the extent to which AI is applied, to ensure the integrity and authenticity of your records. Most of our AI tools are fully embedded within our infrastructure and containerised within their own service, so we are in full control of how they are implemented. Furthermore, the tools are given only the minimal information they need to produce the desired output, without affecting the contents of the asset. For example, in the case of PII detection, we provide a copy of the full text contents of the asset, which is then checked for PII. Any PII that is found is added as metadata to the asset under a schema labelled 'PII Detection', so that the archivist can review it and take appropriate action.
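The pattern described here, where a tool sees only the text and its findings land in metadata rather than in the asset itself, can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions: the `Asset` class, `detect_pii` and `enrich_with_pii` are hypothetical names, and a single email regex stands in for a real PII detector. Only the 'PII Detection' schema label comes from the text above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Toy stand-in for a PII detector: it matches email-like strings only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class Asset:
    # Hypothetical asset: text content plus metadata grouped by schema name.
    content: str
    metadata: Dict[str, List[str]] = field(default_factory=dict)


def detect_pii(text: str) -> List[str]:
    """Return the email-like strings found in the text it is given."""
    return EMAIL_PATTERN.findall(text)


def enrich_with_pii(asset: Asset) -> None:
    """Check the asset's text and record any findings as metadata only."""
    # The detector receives just the text, never the asset object, and
    # Python strings are immutable, so the content cannot be altered here.
    findings = detect_pii(asset.content)
    if findings:
        asset.metadata["PII Detection"] = findings


asset = Asset(content="Contact jane.doe@example.org for access")
enrich_with_pii(asset)
print(asset.metadata)  # {'PII Detection': ['jane.doe@example.org']}
```

Keeping findings under their own schema name means they can be reviewed, or removed, without touching the preserved content.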
Confidence levels are provided for each value generated by AI so users understand how confident the AI is in the conclusion made. Confidence levels are also available when searching, where relevant, allowing you to easily identify content where PII was detected with higher confidence (e.g. 85%+).
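To make the idea of filtering by confidence concrete, here is a minimal Python sketch. The `Detection` record and the `high_confidence` function are hypothetical and do not reflect Preservica's search syntax or data model. The 0.85 default simply mirrors the 85% example above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Detection:
    # Hypothetical record of one AI-generated value.
    asset_id: str
    value: str
    confidence: float  # fraction between 0.0 and 1.0


def high_confidence(detections: Iterable[Detection],
                    threshold: float = 0.85) -> List[Detection]:
    """Keep detections at or above the threshold, most confident first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    kept = [d for d in detections if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


results = [
    Detection("asset-1", "email address", 0.97),
    Detection("asset-2", "phone number", 0.60),
    Detection("asset-3", "postal address", 0.85),
]
for d in high_confidence(results):
    print(d.asset_id, d.confidence)
# asset-1 0.97
# asset-3 0.85
```

A lower threshold surfaces more candidates for manual review, while a higher one narrows the list to the detections the AI is most sure about.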
The archivist is fully in control of the AI's processes on your content through our Enrichment policy and can disable individual AI features at any time. The ability to review or override the AI can vary by feature, depending on the archivist's needs. Broadly, archivists can review AI actions when viewing specific assets, through Search, and through audit trails (Event history and Process Monitor), and they can remove AI-generated metadata.
Most of our AI features so far are embedded within Preservica's architecture, so your data never leaves the system. Whether a feature is embedded or uses a cloud service, as is the case with the Image Analysis feature, it is always security hardened. Security is baked into everything we do: we have a dedicated InfoSec team and a clear process for their oversight of new features, to ensure they are built to the highest security standards in line with our certifications, which are available at https://preservica.com/trust-center. If you do enable any AI features, you can fully control which content each one can be applied to, so you can protect any sensitive content.
AI features in digital preservation help archivists and record managers by automating time-consuming tasks such as metadata creation, classification, and quality checks, allowing them to focus on higher-value archival and research work. They reduce backlogs and improve consistency, which directly increases processing capacity without requiring additional staffing. The ROI comes from faster turnaround times, improved discoverability of records, and long-term efficiency gains in managing and preserving digital collections.
Preservica's goal is to add AI features in a way that makes them easy and intuitive to use, focusing on the value they provide to archivists and record managers. We provide on-demand video training on our online platform, Community Hub, and 1-1 training is also available for specific product editions through our Training team.
Yes - Preservica prides itself on having an extensible platform, and we already have many customers who are integrating their ArchivesSpace catalogs or using BitCurator along with Preservica.
Got more questions? Resource Hub | Contact us