
EMLX File Format

Overview

The EMLX file format is Apple Mail's proprietary format for storing individual email messages on disk. Each .emlx file contains a single email message and consists of three components: a byte count on the first line, the full RFC 2822 email message, and an optional trailing XML plist containing Apple-specific metadata such as conversation threading, flags, and timestamps.

EMLX files are stored within the Mail version directory, organized by account UUID and mailbox name. They are numbered sequentially (e.g., 1.emlx, 2.emlx) within each mailbox's Messages directory.

File Locations

EMLX files are located within the Mail data directory hierarchy:

~/Library/Mail/V10/{Account-UUID}/{Mailbox}.mbox/{UUID}/Data/Messages/
  1.emlx
  2.emlx
  3.emlx
  ...

Example full path:

~/Library/Mail/V10/ABC123-DEF456/INBOX.mbox/789GHI/Data/Messages/42.emlx
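
As a rough illustration of walking this hierarchy, the sketch below collects every .emlx file under a given root. The function name find_emlx_files is hypothetical, and the root is passed in by the caller because the version directory (V10 in the example above) is an assumption that may not match a given system.

```python
from pathlib import Path


def find_emlx_files(mail_root: Path) -> list[Path]:
    """Return every .emlx file below mail_root, sorted by path.

    mail_root is whatever directory the caller chooses, for example
    Path.home() / "Library" / "Mail" / "V10" (an assumed location).
    """
    # rglob descends through account, mailbox, and Messages directories.
    return sorted(p for p in mail_root.rglob("*.emlx") if p.is_file())
```

Passing the root explicitly keeps the sketch usable against a copied or mounted evidence directory rather than only the live user profile.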

Database Schema / File Format

Three-Part Structure

An EMLX file has three sequential parts:

<byte-count>\n
<RFC 2822 message content of exactly byte-count bytes>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" ...>
<plist version="1.0">
<dict>
    <key>conversation-id</key>
    <integer>7421</integer>
    <key>date-last-viewed</key>
    <integer>0</integer>
    <key>date-received</key>
    <integer>1641261501</integer>
    <key>flags</key>
    <integer>8623750145</integer>
</dict>
</plist>

Part 1: Byte Count

The first line contains a decimal integer representing the exact byte length of the RFC 2822 message content that follows. This allows the parser to know exactly where the message ends and the trailing plist begins.

Part 2: RFC 2822 Message

The message content is a standard RFC 2822 email, which may be:

  • Plain text: Simple single-part message with headers and text body.
  • MIME multipart: Complex message with multiple parts (text, HTML, attachments) delimited by MIME boundaries.

Standard email headers are present:

Header                      Description
From                        Sender address
To                          Primary recipients
Cc                          Carbon copy recipients
Subject                     Message subject
Date                        Send date (RFC 2822 format)
Message-ID                  Unique message identifier
In-Reply-To                 Parent message ID (for threading)
References                  Thread message IDs
Content-Type                MIME type (e.g., multipart/mixed; boundary="...")
Content-Transfer-Encoding   Encoding (base64, quoted-printable, 7bit)

Part 3: Trailing Plist

After exactly byte-count bytes of message content, Apple appends an XML plist dictionary with metadata:

Key                Type      Description
conversation-id    Integer   Mail.app conversation thread ID
date-received      Integer   Unix timestamp of when message was received
date-last-viewed   Integer   Unix timestamp of when message was last opened (0 = never viewed)
flags              Integer   Bitmask encoding message state
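
One way to read these keys is Python's standard plistlib module. The sketch below is illustrative only: the EmlxMetadata container and read_metadata function are hypothetical names, and every key is treated as optional because a given file may not carry all of them.

```python
import plistlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmlxMetadata:  # hypothetical container, not part of any tool's API
    conversation_id: Optional[int]
    date_received: Optional[int]
    date_last_viewed: Optional[int]
    flags: Optional[int]


def read_metadata(plist_bytes: bytes) -> EmlxMetadata:
    """Parse the trailing XML plist and pick out the documented keys."""
    raw = plistlib.loads(plist_bytes)
    # dict.get returns None for keys that are absent from this file.
    return EmlxMetadata(
        conversation_id=raw.get("conversation-id"),
        date_received=raw.get("date-received"),
        date_last_viewed=raw.get("date-last-viewed"),
        flags=raw.get("flags"),
    )
```

Any additional keys in the plist are ignored by this sketch; a real parser would probably want to preserve them.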

Flags Bitmask

The flags value in the trailing plist uses the same bitmask as the Envelope Index:

Bit   Value   Flag
0     1       Read
1     2       Deleted
2     4       Answered
3     8       Encrypted
4     16      Flagged
5     32      Recent
6     64      Draft
7     128     Initial download
8     256     Forwarded
9     512     Redirected

Note that the flags integer can be very large (e.g., 8623750145) because higher bits encode additional internal state beyond the documented flags.
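
Decoding the documented bits is a matter of masking. The sketch below uses hypothetical names (FLAG_BITS, decode_flags, undocumented_bits) and covers only bits 0 through 9 from the table above; anything higher is reported as an opaque remainder rather than interpreted.

```python
# Bit positions taken from the flags table above.
FLAG_BITS = {
    0: "Read",
    1: "Deleted",
    2: "Answered",
    3: "Encrypted",
    4: "Flagged",
    5: "Recent",
    6: "Draft",
    7: "Initial download",
    8: "Forwarded",
    9: "Redirected",
}


def decode_flags(flags: int) -> list[str]:
    """Return the names of the documented flags that are set."""
    return [name for bit, name in FLAG_BITS.items() if flags & (1 << bit)]


def undocumented_bits(flags: int) -> int:
    """Return everything above bit 9, shifted down, without interpreting it."""
    return flags >> 10
```

For the example value 8623750145, the low ten bits equal 1, so decode_flags reports only Read, while undocumented_bits returns a non-zero remainder for the internal state.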

Key Fields for Analysis

Parsing Algorithm

  1. Read the first line and parse it as an integer (byte_count).
  2. Read exactly byte_count bytes as the RFC 2822 message content.
  3. Parse the RFC 2822 content to extract headers, body text, HTML body, and attachment metadata.
  4. If data remains after the message content, locate the <?xml declaration and parse the trailing plist.
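
The four steps above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the parser used by any particular tool: parse_emlx is a hypothetical name, the whole file is assumed to fit in memory, and a truncated file is reported with a ValueError.

```python
import plistlib
from email import message_from_bytes, policy


def parse_emlx(data: bytes):
    """Split raw .emlx bytes into (message, metadata)."""
    # Step 1: the first line is the decimal byte count.
    newline = data.index(b"\n")
    byte_count = int(data[:newline].decode("ascii").strip())

    # Step 2: take exactly byte_count bytes after the first line.
    start = newline + 1
    end = start + byte_count
    if end > len(data):
        raise ValueError("file is shorter than its declared byte count")

    # Step 3: parse headers and body with the standard email package.
    message = message_from_bytes(data[start:end], policy=policy.default)

    # Step 4: the plist is optional; look for the XML declaration.
    trailer = data[end:]
    xml_at = trailer.find(b"<?xml")
    metadata = plistlib.loads(trailer[xml_at:]) if xml_at != -1 else {}
    return message, metadata
```

A caller might use it as message, metadata = parse_emlx(path.read_bytes()) and then read message["Subject"] or metadata.get("flags"). Working on bytes rather than text matters here because the count on the first line is a byte length, not a character length.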

MIME Multipart Handling

For messages with Content-Type: multipart/..., the parser must:

  1. Extract the MIME boundary from the Content-Type header parameters.
  2. Split the body on boundary markers.
  3. For each part:
    • text/plain parts provide the plain text body.
    • text/html parts provide the HTML body.
    • Parts with Content-Disposition: attachment or non-text content types are attachment references.
  4. Handle nested multipart structures (e.g., multipart/alternative inside multipart/mixed).
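
Rather than splitting on boundaries by hand, a parser can lean on Python's email package, which performs the boundary handling and nesting internally. The sketch below swaps in that library approach for the manual steps listed above; extract_bodies is a hypothetical name, and only the first text/plain and text/html parts are kept.

```python
from email import message_from_bytes, policy


def extract_bodies(raw_message: bytes):
    """Return (plain_text, html_text, attachment_parts) for one message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    plain, html, attachments = None, None, []

    # walk() visits every part depth-first, including nested multiparts.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        is_attachment = part.get_content_disposition() == "attachment"
        if is_attachment or part.get_content_maintype() != "text":
            attachments.append(part)
        elif part.get_content_type() == "text/plain" and plain is None:
            # get_content() undoes the transfer encoding and the charset.
            plain = part.get_content()
        elif part.get_content_type() == "text/html" and html is None:
            html = part.get_content()
    return plain, html, attachments
```

Note that get_content() can raise LookupError when a part declares a charset Python does not know, so production code would likely wrap those calls.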

Attachment Metadata from MIME Parts

Attachments are identified by their Content-Disposition and Content-Type headers within MIME parts:

Source                                                Field
Content-Disposition: attachment; filename="doc.pdf"   Filename from disposition
Content-Type: application/pdf; name="doc.pdf"         Filename from type parameters
Part body length (after decoding)                     Attachment size
Content-Type media type                               MIME type of the attachment

The actual attachment content is embedded in the MIME part (typically base64-encoded). macfor extracts metadata only and does not collect attachment binary content.
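
A metadata-only pass of this kind might look like the following sketch. It is not macfor's implementation; attachment_metadata is a hypothetical name, and the attachment test (disposition of attachment, or any non-text leaf part) is an assumption based on the rules described in this section.

```python
from email import message_from_bytes, policy


def attachment_metadata(raw_message: bytes) -> list[dict]:
    """List filename, MIME type, and decoded size for each attachment part."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    results = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        is_attachment = part.get_content_disposition() == "attachment"
        if not is_attachment and part.get_content_maintype() == "text":
            continue  # inline text body, not an attachment
        # decode=True reverses base64 or quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        results.append({
            # get_filename() checks Content-Disposition first, then falls
            # back to the name parameter of Content-Type.
            "filename": part.get_filename(),
            "mime_type": part.get_content_type(),
            "size": len(payload),
        })
    return results
```

The decoded payload is measured and then discarded, so only the metadata leaves the function.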

Timestamps

The trailing plist uses Unix timestamps (seconds since 1970-01-01 00:00:00 UTC):

Field              Description
date-received      When the message was received locally
date-last-viewed   When the user last opened the message; 0 means never viewed

The Date header uses the RFC 2822 date format (e.g., Thu, 15 Jan 2026 10:30:00 -0800). Its timezone offset reflects the sender's timezone, which may differ from the local system timezone.
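
To compare the two kinds of timestamp, both can be normalised to UTC. The sketch below uses only the standard library; the function names are hypothetical, and treating 0 as "no value" applies the date-last-viewed convention described above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def plist_time_to_utc(value: int) -> Optional[datetime]:
    """Convert a plist Unix timestamp to an aware UTC datetime (0 -> None)."""
    if value == 0:
        return None  # sentinel: never viewed
    return datetime.fromtimestamp(value, tz=timezone.utc)


def date_header_to_utc(header_value: str) -> datetime:
    """Parse an RFC 2822 Date header and shift it to UTC."""
    return parsedate_to_datetime(header_value).astimezone(timezone.utc)


# date-received from the sample plist earlier in this document:
print(plist_time_to_utc(1641261501).isoformat())
# -> 2022-01-04T01:58:21+00:00

print(date_header_to_utc("Thu, 15 Jan 2026 10:30:00 -0800").isoformat())
# -> 2026-01-15T18:30:00+00:00
```

Keeping every value timezone-aware avoids accidentally comparing a sender-local time against a UTC one.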

Analysis Notes

  • Byte count accuracy: The byte count on the first line must be exact. If the file is truncated (message shorter than the byte count), the plist metadata cannot be reliably located.
  • Missing plist: Some EMLX files may not have trailing plist metadata. The parser should handle this gracefully and still extract the RFC 2822 message.
  • Character encoding: Message bodies may use various character encodings (UTF-8, ISO-8859-1, etc.) as specified in the Content-Type header. Proper decoding requires checking the charset parameter.
  • Content-Transfer-Encoding: Message body parts may be encoded as base64, quoted-printable, or 7bit. Parts encoded as base64 or quoted-printable must be decoded before the text can be read; 7bit content needs no decoding.
  • Flags consistency: The flags in the trailing plist should match the flags in the Envelope Index for the same message. Discrepancies may indicate database corruption or manual file manipulation.
  • Large files: EMLX files with large attachments can be many megabytes. The byte count ensures efficient seeking past the message content to the metadata plist.
  • Date-last-viewed forensic value: A date-last-viewed value of 0 means the user never opened this specific message. A non-zero value provides evidence that the user viewed the message at a specific time.

Version Differences

The EMLX format has been stable across macOS versions. The primary change is the addition of new plist metadata keys in newer versions:

macOS Version   Changes
All versions    Core format (byte count + RFC 2822 + plist) unchanged
10.15+          Additional plist keys may appear but are backwards compatible

Tool Support

Tool              Capability
macfor            Full EMLX parsing: byte count, headers, body extraction, MIME multipart, attachment metadata, trailing plist
emlx2mbox         Converts EMLX to mbox format
Thunderbird       Can import EMLX files
Any text editor   Can view raw EMLX content (though binary attachments will be unreadable)
