Document Schema
MailAtlas stores documents with these core fields:
- id
- source_kind
- message_id
- thread_id
- subject
- sender_name
- sender_email
- author
- received_at
- published_at
- body_text
- body_html_path
- raw_path
- content_hash
- metadata
- created_at
Assets are stored separately with:
- id
- document_id
- ordinal
- kind
- mime_type
- file_path
- cid
- sha256
kind is inline for embedded assets such as HTML images, and attachment for regular email
attachments extracted from the message.
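The two field lists above can be pictured as a pair of records. The following is a minimal sketch, not MailAtlas's actual model: the class names DocumentRecord and AssetRecord and the helper inline_assets are hypothetical, and every type annotation is an assumption, since the page only names the fields. The sketch also assumes that ordinal gives the position of an asset within its document.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AssetRecord:
    # Field names follow the asset list above; all types are assumptions.
    id: str
    document_id: str
    ordinal: int
    kind: str  # "inline" or "attachment"
    mime_type: str
    file_path: str
    cid: Optional[str] = None
    sha256: Optional[str] = None


@dataclass
class DocumentRecord:
    # Field names follow the document list above; all types are assumptions.
    id: str
    source_kind: str
    message_id: Optional[str] = None
    thread_id: Optional[str] = None
    subject: Optional[str] = None
    sender_name: Optional[str] = None
    sender_email: Optional[str] = None
    author: Optional[str] = None
    received_at: Optional[str] = None
    published_at: Optional[str] = None
    body_text: Optional[str] = None
    body_html_path: Optional[str] = None
    raw_path: Optional[str] = None
    content_hash: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)
    created_at: Optional[str] = None


def inline_assets(assets: list[AssetRecord]) -> list[AssetRecord]:
    """Return only the inline (embedded) assets, ordered by ordinal."""
    embedded = [asset for asset in assets if asset.kind == "inline"]
    return sorted(embedded, key=lambda asset: asset.ordinal)
```

Keeping assets in their own record, linked through document_id, mirrors the statement that assets are stored separately from documents.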
Metadata
metadata carries parser notes and provenance:
- provenance.is_forwarded
- provenance.forwarded_chain
- cleaning.removed_forwarded_headers
- cleaning.dropped_line_count
- cleaning.stopped_at_footer
- parser_config.*
- source.kind
- source.host
- source.folder
- source.uid
- source.uidvalidity
This lets downstream code inspect what MailAtlas changed instead of treating cleaning as a black box.
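As a rough illustration of that kind of inspection, the sketch below turns the cleaning and provenance keys into short notes. It assumes that metadata is available as a nested dictionary, so that the dotted name cleaning.dropped_line_count maps to metadata["cleaning"]["dropped_line_count"]; the page does not state the storage shape. The function name summarize_cleaning is hypothetical.

```python
from typing import Any


def summarize_cleaning(metadata: dict[str, Any]) -> list[str]:
    """Describe what cleaning changed, based on the metadata keys.

    Assumes a nested dict layout; missing keys are treated as "nothing to report".
    """
    provenance = metadata.get("provenance", {})
    cleaning = metadata.get("cleaning", {})
    notes: list[str] = []

    if provenance.get("is_forwarded"):
        notes.append("message was forwarded")
    if cleaning.get("removed_forwarded_headers"):
        notes.append("forwarded headers removed")

    dropped = cleaning.get("dropped_line_count") or 0
    if dropped:
        notes.append(f"{dropped} line(s) dropped")
    if cleaning.get("stopped_at_footer"):
        notes.append("body truncated at footer")

    return notes
```

A caller could log these notes next to body_text, or compare against the file at raw_path when a document looks over-trimmed.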
For IMAP-synced documents, source_kind is imap and metadata.source.* records the mailbox
folder and UID that produced the stored document.
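To trace a stored document back to its mailbox location, downstream code might read those keys as in the sketch below. It assumes a document is available as a dictionary with a nested metadata dictionary, and that source.uid can be converted to an integer; neither detail is stated on this page. The function name imap_origin is hypothetical.

```python
from typing import Any, Optional


def imap_origin(document: dict[str, Any]) -> Optional[tuple[str, int]]:
    """Return (folder, uid) for an IMAP-synced document, otherwise None."""
    if document.get("source_kind") != "imap":
        return None

    # Assumed layout: metadata["source"] holds the source.* keys.
    source = document.get("metadata", {}).get("source", {})
    folder = source.get("folder")
    uid = source.get("uid")
    if folder is None or uid is None:
        return None

    return folder, int(uid)
```

In IMAP, a UID identifies a message only within one folder and only while that folder's UIDVALIDITY value is unchanged, so code that uses the result as a long-lived key would likely want to include source.uidvalidity as well.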