The hyperlinked files dilemma doesn’t start with collection. There’s a preservation nightmare of hyperlinked files that few are talking about.
eDiscovery has traditionally revolved around custodians. Back when most organizations stored data on-premises, you identified which custodians were responsive to the case. Then, you collected data stores for those custodians, including email, personal network shares, and local drive(s) (including portable drives). That’s why the EDRM model has Preservation and Collection at the same level of the model – because you typically preserved by collecting.
Storing data in cloud-based solutions was supposed to change all that. The data is centralized, so now we can index in place and preserve in place! No need to continue sending custodian-based corpora downstream for expensive processing, hosting, searching and review! We finally have an approach to combat rising data volumes in discovery!
Alas, what the cloud giveth, the cloud may take away.
Storing data in the cloud has enabled organizations to avoid storing redundant data by hyperlinking to files instead of embedding them in emails and messages. Great for information governance, lousy for eDiscovery. Hyperlinked files have become the bane of our existence – regardless of whether we call them “modern attachments” or not.
We’ve seen cases where the handling of hyperlinked files has been disputed, including here, here, here, here, here, here and here. The collection of hyperlinked files is seen as more than burdensome – it’s seen as impossible in many cases. And courts are agreeing – at least for now.
But we’re making strides on the collection front. Microsoft has added a cloud collection capability that includes the ability to collect the contemporaneous version (i.e., the version that existed at the time the email was sent) of the files. I believe that Google will eventually follow, keeping those versions and providing a similar capability. We will eventually be able to collect the correct version of the file in the major platforms.
Assuming they’re there, that is.
Now, we get into the preservation nightmare of hyperlinked files. Take Microsoft 365. You can add custodians to a case in eDiscovery Premium, then choose the data locations for that custodian (e.g., Mail, OneDrive, SharePoint, Teams, etc.), then you configure your Hold settings to select custodians. It’s all documented here. Easy, right?
But there’s a problem. The data locations being preserved are only those for the selected custodians. So, if a custodian links to a file in one of their data locations and sends it to others, that file is put on litigation hold along with the email message linking to it. Great. But, setting the data locations for the custodian doesn’t do anything to put on hold the files sent to that custodian if they are hyperlinked from the sending custodian’s data location. The only way those files get put on hold is to put the data locations for every other custodian on hold too. Otherwise, those files could be lost through retention and destruction policies.
Needless to say (but I’ll say it anyway), most custodians receive files from considerably more senders than the handful of custodians selected for a case. It’s impractical for just about every organization to put every potential custodian on hold.
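To make the gap concrete, here is a minimal sketch in Python. It is a toy model, not Microsoft 365’s API: the `LinkedMessage` record, the `unpreserved_links` function and the sample names are all hypothetical, and the only assumption is the one described above, that a hold covers only the data locations of the custodians selected for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedMessage:
    """Hypothetical record of one message that hyperlinks to a file."""
    sender: str      # who sent the message
    recipient: str   # who received it
    file_owner: str  # whose data location (e.g., OneDrive) stores the linked file
    file_name: str


def unpreserved_links(messages, custodians_on_hold):
    """Return messages that involve a custodian on hold but link to a file
    stored in a data location that is NOT on hold."""
    on_hold = set(custodians_on_hold)
    return [
        m for m in messages
        if (m.sender in on_hold or m.recipient in on_hold)
        and m.file_owner not in on_hold
    ]


# Illustrative data only: alice is the sole custodian on hold.
messages = [
    LinkedMessage("alice", "bob", "alice", "budget.xlsx"),      # alice's own file: held
    LinkedMessage("carol", "alice", "carol", "forecast.docx"),  # carol's file: not held
    LinkedMessage("dave", "alice", "dave", "contract.pdf"),     # dave's file: not held
]

gaps = unpreserved_links(messages, custodians_on_hold=["alice"])
for m in gaps:
    print(f"{m.file_name}: stored in {m.file_owner}'s location, not on hold")
```

In this toy example, the file alice sent is covered, but the two files sent to her are not. The gap only closes when carol and dave are added to the hold as well, which is exactly the approach that doesn’t scale.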
So, what is the only way to fully preserve every file that the custodian receives? Collect the custodian corpus and include the files linked by those who send them to the custodian (assuming they’re still there). Collection is its own well-documented “can of worms”, but it’s theoretically possible. That means you’re back to collecting entire custodian corpora to ensure a complete hold. That’s the preservation nightmare of hyperlinked files.
That may be somewhat of an oversimplification of the issue, but it’s a real challenge. Solving the collection issue for mod, er, hyperlinked files doesn’t solve all the issues – they go further back than that.
Perhaps someday, these cloud-based solutions will reach out and extend the hold for a custodian to the files sent to them if they’re in the domain of the organization. In the meantime, I don’t see an end to the preservation nightmare of hyperlinked files. Expect us to discuss this issue on the EDRM webinar sponsored by Nextpoint on August 27th.
So, what do you think? How does your organization handle preservation in cloud-based solutions like Microsoft 365 where hyperlinked files are referenced? Please share any comments you might have or if you’d like to know more about a particular topic.
Image created using PlaygroundAI, using the term “robot experiencing a nightmare of various files floating around it”.
Hat tip to John Collins for his helpful discussion of the issue.
Disclaimer: The views represented herein are exclusively the views of the authors and speakers themselves, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.






