Example: Ingest an mbox Archive

Use this example when you already have a mailbox archive on disk.

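An mbox file is a single local file that stores many email messages one after another. Each message begins with a separator line that starts with "From " (the word From followed by a space). Export tools and local mail programs often produce mbox files.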
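As a rough illustration of the format, the sketch below uses Python's standard-library mailbox module (not MailAtlas) to write two messages into one mbox file and read them back. The file name demo.mbox, the addresses, and the helper names build_demo_mbox, count_separator_lines, and list_messages are made up for this example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_demo_mbox(path: str) -> None:
    """Append two small, made-up messages to a single mbox file."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        for i, subject in enumerate(["Quarterly report", "Re: Quarterly report"], start=1):
            msg = EmailMessage()
            msg["From"] = "alice@example.com"
            msg["To"] = "bob@example.com"
            msg["Subject"] = subject
            msg["Message-ID"] = f"<msg{i}@example.com>"
            msg.set_content(f"Body of message {i}.")
            box.add(msg)  # mailbox writes a "From " separator line before each message
    finally:
        box.close()


def count_separator_lines(path: str) -> int:
    """Count lines starting with b"From " (one per message in this demo)."""
    with open(path, "rb") as fh:
        return sum(1 for line in fh if line.startswith(b"From "))


def list_messages(path: str) -> list[tuple[str, str]]:
    """Read the archive back and return (Message-ID, Subject) for each message."""
    box = mailbox.mbox(path, create=False)
    try:
        return [(msg["Message-ID"], msg["Subject"]) for msg in box]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.mbox")
        build_demo_mbox(path)
        print(count_separator_lines(path))  # 2
        for message_id, subject in list_messages(path):
            print(message_id, subject)
```

Header lines such as "From: alice@example.com" are not counted because they start with "From:" rather than "From ".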

Ingesting an mbox file is different from IMAP receive: mbox ingest reads messages from a file that is already on disk. Use IMAP Receive when MailAtlas should connect to a live mailbox and fetch selected folders.

```bash
python -m pip install mailatlas
export MAILATLAS_HOME="$PWD/.mailatlas"
```

If you need sample fixtures:

```bash
git clone https://github.com/mailatlas/sample-data.git
```

Ingest the archive (here, the demo fixture):

```bash
mailatlas ingest sample-data/fixtures/mbox/atlas-demo.mbox
```

For each message in the archive, MailAtlas parses the message, preserves its source metadata, extracts assets, and deduplicates it against records already stored. It then writes the results to the workspace: files on the local filesystem plus a SQLite database.
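The sketch below approximates the parse, metadata, and asset-extraction steps with Python's standard library. It is not MailAtlas's implementation: the function parse_archive, the record fields, and the assets_dir/msg-0001/<filename> layout are hypothetical, and deduplication and the SQLite write are left out.

```python
import mailbox
import os
from email import policy
from email.parser import BytesParser


def _header(msg, name):
    """Return a header as plain str, or None if the message lacks it."""
    value = msg.get(name)
    return str(value) if value is not None else None


def parse_archive(mbox_path: str, assets_dir: str):
    """Yield one record per message: source metadata plus extracted attachment paths.

    Hypothetical layout: attachments go to assets_dir/msg-0001/<filename>, etc.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, key in enumerate(box.iterkeys(), start=1):
            raw = box.get_bytes(key)  # message bytes without the "From " separator line
            msg = BytesParser(policy=policy.default).parsebytes(raw)

            saved = []
            for n, part in enumerate(msg.iter_attachments(), start=1):
                payload = part.get_payload(decode=True) or b""
                # basename() keeps a hostile filename such as "../x" inside assets_dir
                name = os.path.basename(part.get_filename() or "") or f"part-{n}"
                target_dir = os.path.join(assets_dir, f"msg-{index:04d}")
                os.makedirs(target_dir, exist_ok=True)
                target = os.path.join(target_dir, name)
                with open(target, "wb") as fh:
                    fh.write(payload)
                saved.append(target)

            yield {
                "message_id": _header(msg, "Message-ID"),
                "subject": _header(msg, "Subject"),
                "from": _header(msg, "From"),
                "date": _header(msg, "Date"),
                "assets": saved,
            }
    finally:
        box.close()
```

Parsing with policy.default gives access to iter_attachments(), which skips the main text body and yields only attachment parts.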

Expected output shape:

```json
{
  "status": "ok",
  "ingested_count": 5,
  "duplicate_count": 0,
  "document_refs": [
    {
      "id": "<document-id>",
      "subject": "<subject>",
      "source_kind": "mbox",
      "created_at": "<timestamp>"
    }
  ]
}
```
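If you script around the CLI, a small helper can check this result. This sketch assumes that mailatlas ingest prints exactly this JSON object to standard output; summarize_ingest and run_ingest are hypothetical helper names.

```python
import json
import subprocess


def summarize_ingest(output_text: str) -> dict:
    """Validate an ingest result with the shape shown above and pull out key fields."""
    result = json.loads(output_text)
    if result.get("status") != "ok":
        raise RuntimeError(f"ingest did not succeed: {result!r}")
    return {
        "ingested": result["ingested_count"],
        "duplicates": result["duplicate_count"],
        "ids": [ref["id"] for ref in result.get("document_refs", [])],
    }


def run_ingest(mbox_path: str) -> dict:
    """Run the CLI and summarize its output.

    Assumption: `mailatlas ingest` writes the JSON result to stdout.
    """
    proc = subprocess.run(
        ["mailatlas", "ingest", mbox_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a nonzero exit code
    )
    return summarize_ingest(proc.stdout)
```

The returned ids can then be passed to mailatlas get.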
Then list the stored documents and inspect one of them:

```bash
mailatlas list
mailatlas get <document-id>
mailatlas get <document-id> --format json --out ./mbox-message.json
```

If the same message appears more than once, MailAtlas deduplicates by message_id when present and falls back to a normalized content hash otherwise.

A nonzero duplicate_count is expected when an archive overlaps with messages already stored in the workspace.
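The rule above can be modeled as a dedup key plus a SQLite table that remembers keys already stored. This is a hypothetical sketch, not MailAtlas's code: the exact normalization MailAtlas applies is not specified here, so the one below (subject plus preferred body text, whitespace runs collapsed) and the documents table are assumptions.

```python
import hashlib
import sqlite3
from email import policy
from email.parser import BytesParser


def dedup_key(raw: bytes) -> str:
    """Message-ID when present, otherwise a hash of normalized content.

    Assumed normalization: subject + plain (or HTML) body, whitespace collapsed.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    message_id = msg.get("Message-ID")
    if message_id and str(message_id).strip():
        return "mid:" + str(message_id).strip()
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    canonical = " ".join(f"{msg.get('Subject', '')}\n{text}".split())
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest(conn: sqlite3.Connection, raw_messages) -> dict:
    """Insert new messages and count the ones whose key is already stored."""
    conn.execute("CREATE TABLE IF NOT EXISTS documents (dedup_key TEXT PRIMARY KEY, raw BLOB)")
    counts = {"ingested_count": 0, "duplicate_count": 0}
    for raw in raw_messages:
        cur = conn.execute(
            "INSERT OR IGNORE INTO documents (dedup_key, raw) VALUES (?, ?)",
            (dedup_key(raw), raw),
        )
        # rowcount is 1 for a new row and 0 when the key already existed
        if cur.rowcount == 1:
            counts["ingested_count"] += 1
        else:
            counts["duplicate_count"] += 1
    conn.commit()
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    with_id = b"Message-ID: <a@example.com>\nSubject: Hi\n\nHello\n"
    no_id = b"Subject: No id\n\nSame   text\n"
    print(ingest(conn, [with_id, no_id, with_id]))
    # {'ingested_count': 2, 'duplicate_count': 1}  (repeat inside one archive)
    print(ingest(conn, [with_id, b"Subject: No id\n\nSame text\n"]))
    # {'ingested_count': 0, 'duplicate_count': 2}  (overlap with the stored workspace)
```

The second batch shows both paths: the first message matches on Message-ID, and the second matches on the content hash even though its whitespace differs.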

Use mbox ingest when you have a mailbox export, want repeatable local parsing, want to build a retrieval corpus from an archive, and do not need MailAtlas to connect to a live mailbox.

Use IMAP Receive instead when the messages still live in a mailbox and should be fetched over IMAP.