EMLX File Format
Overview
The EMLX file format is Apple Mail's proprietary format for storing individual email messages on disk. Each .emlx file contains a single email message and consists of three components: a byte count on the first line, the full RFC 2822 email message, and an optional trailing XML plist containing Apple-specific metadata such as conversation threading, flags, and timestamps.
EMLX files are stored within the Mail version directory, organized by account UUID and mailbox name. They are numbered sequentially (e.g., 1.emlx, 2.emlx) within each mailbox's Messages directory.
File Locations
EMLX files are located within the Mail data directory hierarchy:
~/Library/Mail/V10/{Account-UUID}/{Mailbox}.mbox/{UUID}/Data/Messages/
1.emlx
2.emlx
3.emlx
...
Example full path:
~/Library/Mail/V10/ABC123-DEF456/INBOX.mbox/789GHI/Data/Messages/42.emlx
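As a rough illustration of walking this hierarchy, the sketch below collects every .emlx file under a given root. The helper name find_emlx_files is hypothetical, and the recursive search is an assumption chosen so that the V10/{Account-UUID}/... layout does not have to be hard-coded.

```python
from pathlib import Path


def find_emlx_files(mail_root: Path) -> list[Path]:
    """Return every .emlx file below mail_root, sorted by path.

    Hypothetical helper: a recursive glob is used instead of a fixed
    directory layout, so nested mailboxes are also covered.
    """
    # Note: the sort is lexicographic, so 10.emlx sorts before 2.emlx.
    return sorted(mail_root.rglob("*.emlx"))
```

A typical call would be find_emlx_files(Path.home() / "Library" / "Mail"), assuming the process has permission to read that directory.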
Database Schema / File Format
Three-Part Structure
An EMLX file has three sequential parts:
<byte-count>\n
<RFC 2822 message content of exactly byte-count bytes>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" ...>
<plist version="1.0">
<dict>
<key>conversation-id</key>
<integer>7421</integer>
<key>date-last-viewed</key>
<integer>0</integer>
<key>date-received</key>
<integer>1641261501</integer>
<key>flags</key>
<integer>8623750145</integer>
</dict>
</plist>
Part 1: Byte Count
The first line contains a decimal integer representing the exact byte length of the RFC 2822 message content that follows. This allows the parser to know exactly where the message ends and the trailing plist begins.
Part 2: RFC 2822 Message
The message content is a standard RFC 2822 email, which may be:
- Plain text: Simple single-part message with headers and text body.
- MIME multipart: Complex message with multiple parts (text, HTML, attachments) delimited by MIME boundaries.
Standard email headers are present:
| Header | Description |
|---|---|
| From | Sender address |
| To | Primary recipients |
| Cc | Carbon copy recipients |
| Subject | Message subject |
| Date | Send date (RFC 2822 format) |
| Message-ID | Unique message identifier |
| In-Reply-To | Parent message ID (for threading) |
| References | Thread message IDs |
| Content-Type | MIME type (e.g., multipart/mixed; boundary="...") |
| Content-Transfer-Encoding | Encoding (base64, quoted-printable, 7bit) |
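One possible way to pull these headers out of the message bytes is Python's standard email package. The function name summarize_headers is hypothetical, and the sketch assumes the byte-count line and trailing plist have already been removed from the input.

```python
from email import message_from_bytes, policy

# Header names taken from the table above.
HEADER_NAMES = [
    "From", "To", "Cc", "Subject", "Date", "Message-ID",
    "In-Reply-To", "References", "Content-Type", "Content-Transfer-Encoding",
]


def summarize_headers(raw_message: bytes) -> dict:
    """Return the listed headers that are present, as decoded strings."""
    # policy.default decodes RFC 2047 encoded words in header values.
    msg = message_from_bytes(raw_message, policy=policy.default)
    return {name: str(msg[name]) for name in HEADER_NAMES if msg[name] is not None}
```

Headers that are absent from the message are simply left out of the result, so a message without Cc produces no Cc key.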
Part 3: Trailing Plist
After exactly byte-count bytes of message content, Apple appends an XML plist dictionary with metadata:
| Key | Type | Description |
|---|---|---|
| conversation-id | Integer | Mail.app conversation thread ID |
| date-received | Integer | Unix timestamp of when message was received |
| date-last-viewed | Integer | Unix timestamp of when message was last opened (0 = never viewed) |
| flags | Integer | Bitmask encoding message state |
Flags Bitmask
The flags value in the trailing plist uses the same bitmask as the Envelope Index:
| Bit | Value | Flag |
|---|---|---|
| 0 | 1 | Read |
| 1 | 2 | Deleted |
| 2 | 4 | Answered |
| 3 | 8 | Encrypted |
| 4 | 16 | Flagged |
| 5 | 32 | Recent |
| 6 | 64 | Draft |
| 7 | 128 | Initial download |
| 8 | 256 | Forwarded |
| 9 | 512 | Redirected |
Note that the flags integer can be very large (e.g., 8623750145) because higher bits encode additional internal state beyond the documented flags.
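The table can be applied mechanically by testing each documented bit. The following is a minimal sketch; FLAG_NAMES and decode_flags are illustrative names, and bits above bit 9 are deliberately ignored because their meaning is not documented here.

```python
# Bit positions and names copied from the table above.
FLAG_NAMES = {
    0: "Read",
    1: "Deleted",
    2: "Answered",
    3: "Encrypted",
    4: "Flagged",
    5: "Recent",
    6: "Draft",
    7: "Initial download",
    8: "Forwarded",
    9: "Redirected",
}


def decode_flags(flags: int) -> list[str]:
    """Return the names of the documented flags set in a flags integer."""
    return [name for bit, name in FLAG_NAMES.items() if flags & (1 << bit)]


# The example value from the plist has only bit 0 set among bits 0-9.
print(decode_flags(8623750145))  # ['Read']
```

The remaining high bits can be inspected separately with flags >> 10 if the internal state is of interest.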
Key Fields for Analysis
Parsing Algorithm
- Read the first line and parse it as an integer (byte_count).
- Read exactly byte_count bytes as the RFC 2822 message content.
- Parse the RFC 2822 content to extract headers, body text, HTML body, and attachment metadata.
- If data remains after the message content, locate the <?xml declaration and parse the trailing plist.
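The steps above can be sketched with Python's standard library as follows. This is not macfor's implementation; parse_emlx is a hypothetical function name, and the choice to raise ValueError on a truncated file is an assumption made for the example.

```python
import plistlib
from email import message_from_bytes, policy


def parse_emlx(data: bytes):
    """Split raw .emlx bytes into (message, metadata)."""
    # Step 1: the first line holds the decimal byte count.
    first_line, sep, rest = data.partition(b"\n")
    if not sep:
        raise ValueError("missing byte-count line")
    byte_count = int(first_line.decode("ascii").strip())

    # Step 2: exactly byte_count bytes form the RFC 2822 message.
    if len(rest) < byte_count:
        raise ValueError("file truncated: message shorter than byte count")
    message = message_from_bytes(rest[:byte_count], policy=policy.default)

    # Step 3: anything left over should be the trailing XML plist.
    trailer = rest[byte_count:]
    start = trailer.find(b"<?xml")
    metadata = plistlib.loads(trailer[start:]) if start != -1 else {}
    return message, metadata
```

A file without a trailing plist yields an empty metadata dictionary rather than an error, so the message itself is still available.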
MIME Multipart Handling
For messages with Content-Type: multipart/..., the parser must:
- Extract the MIME boundary from the Content-Type header parameters.
- Split the body on boundary markers.
- For each part:
  - text/plain parts provide the plain text body.
  - text/html parts provide the HTML body.
  - Parts with Content-Disposition: attachment or non-text content types are attachment references.
- Handle nested multipart structures (e.g., multipart/alternative inside multipart/mixed).
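In Python, the boundary splitting and nesting described above do not have to be written by hand, because the standard email package performs them while parsing. The sketch below swaps manual splitting for that library's walk() traversal; extract_bodies is a hypothetical name, and taking only the first text/plain and text/html parts is a simplifying assumption.

```python
from email import message_from_bytes, policy


def extract_bodies(raw_message: bytes):
    """Return (plain_text, html, attachment_parts) for an RFC 2822 message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    plain, html, attachments = None, None, []
    # walk() visits nested multipart containers depth-first.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        is_attachment = part.get_content_disposition() == "attachment"
        if is_attachment or part.get_content_maintype() != "text":
            attachments.append(part)
        elif part.get_content_type() == "text/plain" and plain is None:
            # get_content() undoes the transfer encoding and charset.
            plain = part.get_content()
        elif part.get_content_type() == "text/html" and html is None:
            html = part.get_content()
    return plain, html, attachments
```

Messages that declare an unknown or malformed charset may need extra error handling around get_content(), which this sketch omits.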
Attachment Metadata from MIME Parts
Attachments are identified by their Content-Disposition and Content-Type headers within MIME parts:
| Source | Field |
|---|---|
| Content-Disposition: attachment; filename="doc.pdf" | Filename from disposition |
| Content-Type: application/pdf; name="doc.pdf" | Filename from type parameters |
| Part body length (after decoding) | Attachment size |
| Content-Type media type | MIME type of the attachment |
The actual attachment content is embedded in the MIME part (typically base64-encoded). macfor extracts metadata only and does not collect attachment binary content.
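A metadata-only pass of this kind might look like the sketch below, which records filename, MIME type, and decoded size without keeping the content. It is an illustration rather than macfor's code, and attachment_metadata is a hypothetical name.

```python
from email import message_from_bytes, policy


def attachment_metadata(raw_message: bytes) -> list[dict]:
    """Return filename, MIME type, and decoded size for each attachment part."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    results = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        is_attachment = part.get_content_disposition() == "attachment"
        if not is_attachment and part.get_content_maintype() == "text":
            continue  # inline text body, not an attachment
        # decode=True reverses base64 or quoted-printable encoding.
        payload = part.get_payload(decode=True) or b""
        results.append({
            # get_filename() checks the Content-Disposition filename first,
            # then falls back to the Content-Type name parameter.
            "filename": part.get_filename(),
            "mime_type": part.get_content_type(),
            "size": len(payload),
        })
    return results
```

The decoded payload is held only long enough to measure its length, which matches the metadata-only approach described above.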
Timestamps
The trailing plist uses Unix timestamps (seconds since 1970-01-01 00:00:00 UTC):
| Field | Description |
|---|---|
| date-received | When the message was received locally |
| date-last-viewed | When the user last opened the message; 0 means never viewed |
The Date header in the RFC 2822 message uses RFC 2822 date format (e.g., Thu, 15 Jan 2026 10:30:00 -0800). The timezone in the Date header reflects the sender's timezone, which may differ from the local system timezone.
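To compare the two kinds of timestamp, both can be converted to timezone-aware datetime values. The sketch below assumes the plist values are Unix seconds as described above; plist_time is a hypothetical helper that maps 0 to None, and the Date value is an example header string.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def plist_time(value: int):
    """Convert a plist Unix timestamp to an aware UTC datetime (0 -> None)."""
    return datetime.fromtimestamp(value, tz=timezone.utc) if value else None


# date-received from the example plist.
received = plist_time(1641261501)
print(received.isoformat())  # 2022-01-04T01:58:21+00:00

# A Date header keeps the sender's UTC offset; normalize it for comparison.
sent = parsedate_to_datetime("Thu, 15 Jan 2026 10:30:00 -0800")
print(sent.astimezone(timezone.utc).isoformat())  # 2026-01-15T18:30:00+00:00
```

Normalizing both values to UTC avoids mixing the sender's offset with the local system timezone when building a timeline.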
Analysis Notes
- Byte count accuracy: The byte count on the first line must be exact. If the file is truncated (message shorter than the byte count), the plist metadata cannot be reliably located.
- Missing plist: Some EMLX files may not have trailing plist metadata. The parser should handle this gracefully and still extract the RFC 2822 message.
- Character encoding: Message bodies may use various character encodings (UTF-8, ISO-8859-1, etc.) as specified in the Content-Type header. Proper decoding requires checking the charset parameter.
- Content-Transfer-Encoding: Message body parts may be encoded as base64, quoted-printable, or 7bit. Decoding is required before reading the actual text content.
- Flags consistency: The flags in the trailing plist should match the flags in the Envelope Index for the same message. Discrepancies may indicate database corruption or manual file manipulation.
- Large files: EMLX files with large attachments can be many megabytes. The byte count ensures efficient seeking past the message content to the metadata plist.
- Date-last-viewed forensic value: A date-last-viewed value of 0 means the user never opened this specific message. A non-zero value provides evidence that the user viewed the message at a specific time.
Version Differences
The EMLX format has been stable across macOS versions. The primary change is the addition of new plist metadata keys in newer versions:
| macOS Version | Changes |
|---|---|
| All versions | Core format (byte count + RFC 2822 + plist) unchanged |
| 10.15+ | Additional plist keys may appear but are backwards compatible |
Tool Support
| Tool | Capability |
|---|---|
| macfor | Full EMLX parsing: byte count, headers, body extraction, MIME multipart, attachment metadata, trailing plist |
| emlx2mbox | Converts EMLX to mbox format |
| Thunderbird | Can import EMLX files |
| Any text editor | Can view raw EMLX content (though binary attachments will be unreadable) |