Quickstart: Ingest and Export Email Files
This guide walks through the file-based MailAtlas workflow: create a local workspace, ingest .eml files, list stored documents, inspect one document, and export it.
Use this page when your input already exists as files on disk. If you want MailAtlas to connect to a live mailbox, use IMAP Receive instead.
By the end, you will have:
- A local MailAtlas workspace.
- One or more stored email documents.
- Raw message files, cleaned text, HTML snapshots, and extracted assets when present.
- A JSON export.
- A Markdown bundle suitable for downstream AI or retrieval workflows.
- Optional HTML or PDF exports.
Before you start
You need:
- Python 3.12 recommended, or another supported Python version from the package metadata.
- A working MailAtlas install.
- A local .eml file, or the MailAtlas sample data repository.
- Chrome or Chromium, only if you plan to export PDF.
Run this first if you have not verified the install:
    mailatlas doctor
1. Create a workspace root
    export MAILATLAS_HOME="$PWD/.mailatlas"
Check that the variable is set:
    echo "$MAILATLAS_HOME"
2. Get sample email fixtures
If you already have your own .eml files, skip this step.
    git clone https://github.com/mailatlas/sample-data.git
The public sample-data repository includes the fixtures used below.
3. Ingest .eml files
Ingest one message:
    mailatlas ingest sample-data/fixtures/eml/atlas-market-map.eml
Or ingest several messages:
    mailatlas ingest \
      sample-data/fixtures/eml/atlas-market-map.eml \
      sample-data/fixtures/eml/atlas-founder-forward.eml \
      sample-data/fixtures/eml/atlas-inline-chart.eml
Expected output shape:
    {
      "status": "ok",
      "ingested_count": 3,
      "duplicate_count": 0,
      "document_refs": [
        {
          "id": "<document-id>",
          "subject": "Regional freight signals tighten in the Midwest",
          "source_kind": "eml",
          "created_at": "<timestamp>"
        }
      ]
    }
Copy one returned id. You will use it as <document-id>.
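If a script drives the ingest step, it can read the document IDs out of the output instead of copying them by hand. The sketch below assumes the ingest output has been captured as a JSON string with the shape shown above. The sample values (`doc-001`, the timestamp) and the helper name `document_ids` are placeholders for illustration, not MailAtlas output or API.

```python
import json

# Sample text shaped like the ingest output above; the id and timestamp
# are made-up placeholders.
SAMPLE_OUTPUT = """
{
  "status": "ok",
  "ingested_count": 1,
  "duplicate_count": 0,
  "document_refs": [
    {
      "id": "doc-001",
      "subject": "Regional freight signals tighten in the Midwest",
      "source_kind": "eml",
      "created_at": "2024-01-01T00:00:00Z"
    }
  ]
}
"""


def document_ids(ingest_output: str) -> list[str]:
    """Return the id of every entry in document_refs, in order."""
    result = json.loads(ingest_output)
    if result.get("status") != "ok":
        raise ValueError(f"ingest did not succeed: {result.get('status')!r}")
    return [ref["id"] for ref in result.get("document_refs", [])]


ids = document_ids(SAMPLE_OUTPUT)
print(ids)
```

Each returned ID can then be passed to `mailatlas get` in the later steps.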
4. List stored documents
    mailatlas list
Use this command whenever you need to find document IDs in the current workspace.
5. Inspect one stored document
    mailatlas get <document-id>
A stored document includes fields such as:
    {
      "id": "<document-id>",
      "source_kind": "eml",
      "subject": "Port dwell times normalize after weather disruptions",
      "body_text": "<cleaned text>",
      "body_html_path": "html/<document-id>.html",
      "raw_path": "raw/<document-id>.eml",
      "metadata": {
        "cleaning": { "dropped_line_count": 0 },
        "provenance": { "is_forwarded": false }
      },
      "assets": [
        {
          "kind": "inline",
          "file_path": "assets/<document-id>/001-route-heatmap.svg"
        }
      ]
    }
When MailAtlas extracts a regular file attachment, the same assets array uses "kind": "attachment" and stores the file under assets/<document-id>/....
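Because inline images and regular attachments share one `assets` array, downstream code usually has to split them by `kind`. A minimal sketch, assuming the document has already been loaded into a Python dict with the field names shown above. The document ID, the `002-report.pdf` file name, and the helper `assets_by_kind` are hypothetical.

```python
from collections import defaultdict

# Hypothetical document using the field names from the example above.
document = {
    "id": "doc-001",
    "assets": [
        {"kind": "inline", "file_path": "assets/doc-001/001-route-heatmap.svg"},
        {"kind": "attachment", "file_path": "assets/doc-001/002-report.pdf"},
    ],
}


def assets_by_kind(doc: dict) -> dict[str, list[str]]:
    """Group asset file paths by their kind ("inline" or "attachment")."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for asset in doc.get("assets", []):
        grouped[asset["kind"]].append(asset["file_path"])
    return dict(grouped)


grouped_assets = assets_by_kind(document)
print(grouped_assets)
```

A document without an `assets` field yields an empty dict, so callers do not need a separate check.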
6. Export JSON
    mailatlas get <document-id> \
      --format json \
      --out ./message.json
Use JSON when another program needs normalized fields, metadata, and asset references.
7. Export a Markdown bundle
    mailatlas get <document-id> \
      --format markdown \
      --out ./message-markdown
This writes a directory bundle that contains:
- document.md
- assets/ with copied inline images and attachments referenced from the Markdown
Use Markdown when an AI workflow, search index, notebook, or review process needs readable text with local asset references.
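Before handing a bundle to an indexer, it can be worth confirming that every asset referenced from `document.md` was actually copied. The sketch below assumes the Markdown refers to assets through ordinary relative links or images whose targets start with `assets/`. That link format is an assumption, not something this page specifies, and `missing_assets` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of Markdown links and images, e.g. ![chart](assets/a.svg).
LINK_TARGET = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)\)")


def missing_assets(bundle_dir: Path) -> list[str]:
    """Return assets/ targets referenced in document.md that are not on disk."""
    text = (bundle_dir / "document.md").read_text(encoding="utf-8")
    targets = [t for t in LINK_TARGET.findall(text) if t.startswith("assets/")]
    return [t for t in targets if not (bundle_dir / t).is_file()]


# Demo on a throwaway bundle: one asset exists, one is referenced but absent.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "assets").mkdir()
    (bundle / "assets" / "chart.svg").write_text("<svg/>", encoding="utf-8")
    (bundle / "document.md").write_text(
        "![chart](assets/chart.svg)\n[report](assets/report.pdf)\n",
        encoding="utf-8",
    )
    demo_missing = missing_assets(bundle)

print(demo_missing)
```

For a real export, call `missing_assets(Path("./message-markdown"))`. An empty list means every referenced asset is present.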
8. Export HTML
    mailatlas get <document-id> \
      --format html \
      --out ./message.html
Use HTML when layout or visual structure matters.
9. Export PDF
PDF export requires Chrome or Chromium:
    mailatlas get <document-id> \
      --format pdf \
      --out ./message.pdf
If MailAtlas cannot find the browser:
    export MAILATLAS_PDF_BROWSER="/path/to/chrome-or-chromium"
If you omit --out for PDF, MailAtlas writes the PDF to .mailatlas/exports/<document-id>.pdf.
10. Review the workspace
    find "$MAILATLAS_HOME" -maxdepth 3 -type f | sort
During ingest, MailAtlas writes:
- Raw email bytes to raw/.
- HTML snapshots to html/ when the message has HTML.
- Extracted inline images and attachments to assets/.
- Metadata and indexes to store.db.
Exports go where you tell MailAtlas to write them with --out.
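For a quick sanity check of the layout described above, a script can count files per top-level entry of the workspace. This sketch builds a throwaway directory that mimics the documented layout (`raw/`, `html/`, `assets/`, `store.db`) with placeholder file names. `summarize_workspace` is a hypothetical helper, and unlike the `find` command it applies no depth limit.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarize_workspace(root: Path) -> dict[str, int]:
    """Count files under root, keyed by top-level entry (raw, html, assets, store.db)."""
    counts: Counter[str] = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.relative_to(root).parts[0]] += 1
    return dict(counts)


# Demo on a temporary directory shaped like the documented workspace.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in (
        "raw/doc-001.eml",
        "html/doc-001.html",
        "assets/doc-001/001-route-heatmap.svg",
        "store.db",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(b"")
    demo_summary = summarize_workspace(root)

print(demo_summary)
```

Against a real workspace, pass `Path(os.environ["MAILATLAS_HOME"])` (after `import os`) as the root.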
Troubleshooting
No such file or directory
Confirm the fixture path exists:
    ls sample-data/fixtures/eml
If you are using your own message, pass the path to that .eml file instead.
duplicate_count is greater than zero
MailAtlas deduplicates by message_id when present and falls back to a normalized content hash. A nonzero duplicate_count is therefore expected if you ingest the same message more than once.
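The deduplication rule can be pictured as a two-step key: use the Message-ID header if the message has one, otherwise hash a normalized form of the content. The sketch below illustrates that idea with the standard library. The normalization shown (collapsing whitespace in the decoded body parts) and the SHA-256 choice are assumptions, since this page does not say how MailAtlas normalizes or hashes content.

```python
import hashlib
from email import message_from_bytes


def dedup_key(raw: bytes) -> str:
    """Return a Message-ID based key, or a content-hash key as a fallback."""
    msg = message_from_bytes(raw)
    message_id = str(msg.get("Message-ID") or "").strip()
    if message_id:
        return f"message_id:{message_id}"
    # Assumed normalization: join decoded leaf parts and collapse whitespace.
    chunks = [
        part.get_payload(decode=True) or b""
        for part in msg.walk()
        if not part.is_multipart()
    ]
    normalized = b" ".join(b" ".join(chunks).split())
    return "sha256:" + hashlib.sha256(normalized).hexdigest()


with_id = b"Message-ID: <a@example.com>\r\nSubject: x\r\n\r\nhello\r\n"
no_id_a = b"Subject: x\r\n\r\nhello   world\r\n"
no_id_b = b"Subject: x\r\n\r\nhello world\n"

print(dedup_key(with_id))
print(dedup_key(no_id_a) == dedup_key(no_id_b))
```

Under this scheme, two copies of a message that differ only in whitespace map to the same key, which is the kind of case a normalized hash is meant to catch.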
PDF export fails
Install Chrome or Chromium, then set MAILATLAS_PDF_BROWSER if needed.
Reset the quickstart workspace
    rm -rf "$MAILATLAS_HOME"
Only delete a workspace when you are sure it does not contain real mail or outbound audit records you need to keep.
Next step
- Use IMAP Receive to fetch selected folders from a live mailbox.
- Use Document Schema to understand stored fields.
- Use Workspace Model to understand local files and SQLite metadata.
- Use CLI Overview for the full command surface.