As illustrated by yesterday’s case law post, dupe identification is a big challenge. So, I’m excited about the EDRM Cross Platform Email Duplicate Identification Specification released last week!
Last Thursday, EDRM announced that the new EDRM Cross Platform Email Duplicate Identification Specification has been released for public comment (available here). The specification provides a framework for identifying duplicate emails across multiple email platforms in a defensible and cost-effective manner. Currently, the only way to identify duplicate emails across platforms is to reprocess the data in a single vendor platform, often at significant time and cost.
The solution is a simple but effective approach: take the hash value of an email’s Message ID metadata field. EDRM calls that value the EDRM Message Identification Hash (“MIH”). This new approach will not replace current email deduplication methods, but it will enable cross platform email duplicate identification.
The EDRM Duplicate Identification Project team states that it is “expected cross platform duplicate identification using the MIH will be applied to email data sets that have already been deduplicated using a vendor’s standard deduplication process” and “envisioned that the EDRM MIH will be an additional field that will be generated as part of the processing functionality in each vendor’s platform and used by recipients to further identify duplicates for collections with established EDRM MIH values.”
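For the technically curious, here is a minimal sketch in Python of the general idea: hash each email’s Message ID and compare the hashes across two productions. To be clear, this is my illustration and not the Specification itself. The choice of MD5, the trimming of whitespace and angle brackets, and the function names `message_id_hash` and `cross_platform_matches` are all assumptions on my part, so check the Specification for the actual normalization and hashing rules before relying on anything like this.

```python
import hashlib
from email import message_from_string
from typing import Optional


def message_id_hash(raw_email: str) -> Optional[str]:
    """Return a hex digest of the email's Message-ID header, or None if absent."""
    msg = message_from_string(raw_email)
    message_id = msg.get("Message-ID")  # header lookup is case-insensitive
    if message_id is None:
        return None
    # Assumption: trim whitespace and surrounding angle brackets before hashing.
    normalized = str(message_id).strip().strip("<>")
    # Assumption: MD5 as the hash function; confirm against the Specification.
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def cross_platform_matches(export_a: dict, export_b: dict) -> list:
    """Pair document IDs from two exports whose Message-ID hashes match."""
    hashes_a = {}
    for doc_id, raw in export_a.items():
        digest = message_id_hash(raw)
        if digest is not None:
            hashes_a.setdefault(digest, []).append(doc_id)

    matches = []
    for doc_id_b, raw in export_b.items():
        digest = message_id_hash(raw)
        # Emails without a Message-ID (digest is None) never match anything.
        for doc_id_a in hashes_a.get(digest, []):
            matches.append((doc_id_a, doc_id_b))
    return matches


# Hypothetical sample data: the same message as exported by two platforms.
vendor_a = {"A-001": "Message-ID: <123@example.com>\nSubject: Hi\n\nBody"}
vendor_b = {
    "B-917": "Subject: Hi\nMessage-ID: <123@example.com>\n\nBody",
    "B-918": "Message-ID: <456@example.com>\nSubject: Other\n\nBody",
}
print(cross_platform_matches(vendor_a, vendor_b))  # [('A-001', 'B-917')]
```

Notice that the comparison only needs the hash field from each side, which fits the team’s description above of the MIH as an additional field generated during processing. Neither party has to reprocess the other’s data.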
The 18-page Toolkit includes:
- EDRM Message Identification Hash (MIH) Specification (v1.0) is a succinct technical specification with advisory notes, written for software developers at vendors who are implementing the MIH in their platforms. It defines a process to identify duplicate emails across the disparate formats and forms of data employed in electronic discovery and disclosure.
- EDRM Email Duplicate Identification Guidelines (v1.0) is a non-technical reference for those who need to understand why and how to use the MIH. It outlines the objectives, methodology, potential use cases, advantages, and usage considerations of the Specification. These Guidelines are intended for anyone who wants to use the MIH for cross platform duplicate identification, including parties and counsel, vendors and service providers, and regulators and courts.
There’s also a four-page white paper written by Craig Ball. In his excellent blog here, Craig also discusses the problem the new specification is designed to address, what the solution entails, what’s exciting about it, how you can learn more and so forth.
At the risk of sounding like famous pitchman Billy Mays, I’m not done yet! There’s also a terrific infographic (which succinctly identifies the purpose, benefits and six use cases of the EDRM MIH Specification), as well as data and utilities to support its use. All for the low price of just $19.99! Just kidding, it’s free. 😉
What impresses me is that it addresses a significant issue (cross platform deduplication) without requiring vendors to change the way they deduplicate email messages internally. They only need to supplement what they already do with this approach.
You can download the new EDRM Cross Platform Email Duplicate Identification Specification, the infographic and Craig’s white paper here. Do so ASAP, as the public comment period runs until March 15 (you can provide any comments you have in that same form). As Brad would say: “learn it, know it, live it”!
So, what do you think? Are you wrestling with dupe identification, especially across platforms? If so, check out the new EDRM MIH Specification! Please share any comments you might have or if you’d like to know more about a particular topic.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.