Quickstart: Ingest and Export Email Files

This guide walks through the file-based MailAtlas workflow: create a local workspace, ingest .eml files, list stored documents, inspect one document, and export it.

Use this page when your input already exists as files on disk. If you want MailAtlas to connect to a live mailbox, use IMAP Receive instead.

By the end, you will have:

  • A local MailAtlas workspace.
  • One or more stored email documents.
  • Raw message files, cleaned text, HTML snapshots, and extracted assets when present.
  • A JSON export.
  • A Markdown bundle suitable for downstream AI or retrieval workflows.
  • Optional HTML or PDF exports.

You need:

  • Python 3.12 (recommended), or another Python version that the package metadata lists as supported.
  • A working MailAtlas install.
  • A local .eml file, or the MailAtlas sample data repository.
  • Chrome or Chromium only if you plan to export PDF.

Run this first if you have not verified the install:

Terminal window
mailatlas doctor
Next, set MAILATLAS_HOME so the workspace lives in the current directory:

Terminal window
export MAILATLAS_HOME="$PWD/.mailatlas"

Check that the variable is set:

Terminal window
echo "$MAILATLAS_HOME"

If you already have your own .eml files, skip this step.

Terminal window
git clone https://github.com/mailatlas/sample-data.git

The public sample-data repository includes the fixtures used below.

Ingest one message:

Terminal window
mailatlas ingest sample-data/fixtures/eml/atlas-market-map.eml

Or ingest several messages:

Terminal window
mailatlas ingest \
  sample-data/fixtures/eml/atlas-market-map.eml \
  sample-data/fixtures/eml/atlas-founder-forward.eml \
  sample-data/fixtures/eml/atlas-inline-chart.eml
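If a directory holds many .eml files, listing each path by hand gets tedious. The sketch below builds the same `mailatlas ingest` argument list from a directory. It assumes only what the command above shows, namely that `mailatlas ingest` accepts several file paths; the helper name `build_ingest_command` is hypothetical and not part of MailAtlas.

```python
from pathlib import Path


def build_ingest_command(eml_dir: str) -> list[str]:
    """Return an argv list for `mailatlas ingest` covering every .eml file in eml_dir.

    Hypothetical helper: it only assembles the documented command and does not
    call any MailAtlas Python API.
    """
    eml_paths = sorted(Path(eml_dir).glob("*.eml"))
    if not eml_paths:
        raise FileNotFoundError(f"no .eml files found in {eml_dir}")
    return ["mailatlas", "ingest", *[str(path) for path in eml_paths]]
```

To run the result, pass the list to `subprocess.run(command, check=True)` from a shell session where MAILATLAS_HOME is already exported.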

Expected output shape:

{
  "status": "ok",
  "ingested_count": 3,
  "duplicate_count": 0,
  "document_refs": [
    {
      "id": "<document-id>",
      "subject": "Regional freight signals tighten in the Midwest",
      "source_kind": "eml",
      "created_at": "<timestamp>"
    }
  ]
}

Copy one returned id. You will use it as <document-id>.
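If you would rather collect the IDs in a script than copy them by hand, something like the following may help. It is a minimal sketch that assumes the ingest command prints JSON in the shape shown above; the function name `document_ids` and the sample values (`doc-123`, the timestamp) are placeholders, not real MailAtlas output.

```python
import json


def document_ids(ingest_output: str) -> list[str]:
    """Collect the `id` of every entry under `document_refs`."""
    payload = json.loads(ingest_output)
    if payload.get("status") != "ok":
        raise ValueError(f"ingest did not report ok: {payload.get('status')!r}")
    return [ref["id"] for ref in payload.get("document_refs", [])]


# Sample payload in the documented shape; id and timestamp are made-up placeholders.
sample = """
{
  "status": "ok",
  "ingested_count": 1,
  "duplicate_count": 0,
  "document_refs": [
    {
      "id": "doc-123",
      "subject": "Regional freight signals tighten in the Midwest",
      "source_kind": "eml",
      "created_at": "2024-01-01T00:00:00Z"
    }
  ]
}
"""

print(document_ids(sample))  # prints ['doc-123']
```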

List the documents stored in the workspace:

Terminal window
mailatlas list

Use this command whenever you need to find document IDs in the current workspace.

Inspect one document:

Terminal window
mailatlas get <document-id>

A stored document includes fields such as:

{
  "id": "<document-id>",
  "source_kind": "eml",
  "subject": "Port dwell times normalize after weather disruptions",
  "sender_email": "<sender-email>",
  "body_text": "<cleaned text>",
  "body_html_path": "html/<document-id>.html",
  "raw_path": "raw/<document-id>.eml",
  "metadata": {
    "cleaning": {
      "dropped_line_count": 0
    },
    "provenance": {
      "is_forwarded": false
    }
  },
  "assets": [
    {
      "kind": "inline",
      "file_path": "assets/<document-id>/001-route-heatmap.svg"
    }
  ]
}

When MailAtlas extracts a regular file attachment, the same assets array uses "kind": "attachment" and stores the file under assets/<document-id>/....
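A downstream script often needs to treat inline images and attachments differently. The sketch below groups asset paths by their `kind` value. It assumes a document dict with the `assets` array shown above; `assets_by_kind` is a hypothetical helper, and the sample ID and the attachment file name are invented for illustration.

```python
from collections import defaultdict


def assets_by_kind(document: dict) -> dict[str, list[str]]:
    """Group asset file paths by their `kind` ("inline" or "attachment")."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for asset in document.get("assets", []):
        grouped[asset["kind"]].append(asset["file_path"])
    return dict(grouped)


# Sample document fragment; "doc-123" and the PDF name are placeholders.
sample_document = {
    "id": "doc-123",
    "assets": [
        {"kind": "inline", "file_path": "assets/doc-123/001-route-heatmap.svg"},
        {"kind": "attachment", "file_path": "assets/doc-123/002-report.pdf"},
    ],
}

print(assets_by_kind(sample_document))
```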

Export the document as JSON:

Terminal window
mailatlas get <document-id> \
  --format json \
  --out ./message.json

Use JSON when another program needs normalized fields, metadata, and asset references.

Export the document as a Markdown bundle:

Terminal window
mailatlas get <document-id> \
  --format markdown \
  --out ./message-markdown

This writes a directory bundle that contains:

  • document.md
  • assets/ with copied inline images and attachments referenced from the Markdown

Use Markdown when an AI workflow, search index, notebook, or review process needs readable text with local asset references.
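To feed a bundle into a retrieval or indexing step, you first need its text and the asset files that travel with it. The following sketch reads a bundle directory laid out as described above (document.md plus an optional assets/ directory). The function name `load_markdown_bundle` is hypothetical, and the sketch makes no assumption about how the Markdown itself is structured.

```python
from pathlib import Path


def load_markdown_bundle(bundle_dir: str) -> tuple[str, list[Path]]:
    """Return the Markdown text and a sorted list of asset files in the bundle."""
    bundle = Path(bundle_dir)
    text = (bundle / "document.md").read_text(encoding="utf-8")
    assets_dir = bundle / "assets"
    # A bundle for a message without images or attachments may have no assets/.
    if assets_dir.is_dir():
        assets = sorted(path for path in assets_dir.rglob("*") if path.is_file())
    else:
        assets = []
    return text, assets
```

For the export above, calling `load_markdown_bundle("./message-markdown")` would return the text to index and the local files that the Markdown references.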

Export the document as HTML:

Terminal window
mailatlas get <document-id> \
  --format html \
  --out ./message.html

Use HTML when layout or visual structure matters.

PDF export requires Chrome or Chromium:

Terminal window
mailatlas get <document-id> \
  --format pdf \
  --out ./message.pdf

If MailAtlas cannot find the browser:

Terminal window
export MAILATLAS_PDF_BROWSER="/path/to/chrome-or-chromium"

If you omit --out for PDF, MailAtlas writes the PDF to .mailatlas/exports/<document-id>.pdf.

List the files MailAtlas has written to the workspace:

Terminal window
find "$MAILATLAS_HOME" -maxdepth 3 -type f | sort

During ingest, MailAtlas writes:

  • Raw email bytes to raw/.
  • HTML snapshots to html/ when the message has HTML.
  • Extracted inline images and attachments to assets/.
  • Metadata and indexes to store.db.

Exports go where you tell MailAtlas to write them with --out.

If ingest reports that a file is missing, confirm the fixture path exists:

Terminal window
ls sample-data/fixtures/eml

If you are using your own message, pass the path to that .eml file instead.

MailAtlas deduplicates by message_id when present and otherwise falls back to a normalized content hash. Seeing duplicates reported (for example, a nonzero duplicate_count) is expected if you ingest the same message more than once.
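To make the "message ID first, content hash second" idea concrete, here is an illustrative sketch using only the Python standard library. It is not MailAtlas's implementation: the normalization (collapse whitespace and lowercase the plain-text body) and the SHA-256 hash are assumptions chosen for the example, and `dedup_key` is a hypothetical name.

```python
import hashlib
from email import message_from_bytes
from email.message import Message


def _plain_text(message: Message) -> str:
    """Concatenate the decoded text/plain parts of a message."""
    chunks = []
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset label in the message
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def dedup_key(raw: bytes) -> str:
    """Prefer the Message-ID header; otherwise hash the normalized body text."""
    message = message_from_bytes(raw)
    message_id = str(message.get("Message-ID") or "").strip()
    if message_id:
        return f"message_id:{message_id}"
    # Assumed normalization: collapse whitespace runs and lowercase.
    normalized = " ".join(_plain_text(message).split()).lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"content_hash:{digest}"
```

Two messages that yield the same key would count as duplicates under this scheme, which is why re-ingesting a file is reported as a duplicate.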

If PDF export fails because no browser is found, install Chrome or Chromium, then set MAILATLAS_PDF_BROWSER if needed.

To start over, delete the workspace:

Terminal window
rm -rf "$MAILATLAS_HOME"

Only delete a workspace when you are sure it does not contain real mail or outbound audit records you need to keep.