This is an old revision of the document!
Table of Contents
Presence and count of specific mail headers
This query extracts, for each of the header fields from a fixed list, the number of occurrences of all distinct values of the field.
SELECT split_part(substr(LINES, 1+POSITION(E'\n'||FIELD IN LINES), 200), E'\n', 1) AS ct, COUNT(*) FROM header, (VALUES ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) AS h(FIELD) WHERE POSITION(E'\n'||FIELD IN LINES)>0 GROUP BY 1
Sample output:
ct | count ---------------------------+------- Importance: high | 32 Importance: High | 130 Importance: low | 1 Importance: Medium | 3 Importance: normal | 44 Importance: Normal | 802 Precedence: bulk | 19987 Precedence: bulk | 1 Precedence: fm-user | 3 Precedence: junk | 3 Precedence: list | 4026 Priority: non-urgent | 11 Priority: Non-Urgent | 1 Priority: normal | 464 Priority: urgent | 603 Priority: Urgent | 12 X-MSMail-Priority: High | 121 X-MSMail-Priority: Low | 1 X-MSMail-Priority: Lowest | 11 X-MSMail-Priority: Medium | 3 X-MSMail-Priority: Middle | 8 X-MSMail-Priority: Normal | 923 X-Priority: 0 | 1 X-Priority: 1 | 102 X-Priority: 1 (High) | 2 X-Priority: 1 (Highest) | 120 X-Priority: 2 | 6 X-Priority: 2 (High) | 1 X-Priority: 3 | 2892 X-Priority: 3 (Normal) | 635 X-Priority: 5 | 14 X-Priority: Normal | 5
This alternative implementation uses regular expressions and differs in that it doesn't limit header values to 200 characters or any other fixed length.
SELECT FIELD||':'||arr[1], COUNT(*) FROM (SELECT FIELD, regexp_matches(LINES, '(?:^|\n)' || FIELD || ':\s*([^\n]*)\n', 'g') AS arr FROM header, (VALUES ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) AS h(FIELD) WHERE strpos(LINES,FIELD)>0) l GROUP BY 1 ORDER BY 1
In this version, the strpos(lines,FIELD)>0
condition is not essential: it's introduced only as a first-pass filter to eliminate the headers that don't contain anywhere any of the searched fields.
Duplicate messages
This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate.
SELECT h1.mail_id FROM header h1, header h2 WHERE h1.lines=h2.lines AND h1.mail_id > h2.mail_id
Messages with specific attachment types
To retrieve all messages containing pdf files or any image file:
SELECT DISTINCT mail_id FROM attachments WHERE content_type='application/pdf' OR content_type LIKE 'image/%';
Messages sent or received today
select mail_id from mail where msg_date>=date_trunc('day',now());