sql_analysis
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
sql_analysis [2012/01/28 02:55] – daniel | sql_analysis [2019/01/24 17:55] (current) – [Presence and count of specific mail headers] daniel | ||
---|---|---|---|
Line 8: | Line 8: | ||
FROM header, (values (' | FROM header, (values (' | ||
WHERE position(E' | WHERE position(E' | ||
- | group by split_part(substr(lines, | + | group by 1 |
</ | </ | ||
Line 48: | Line 48: | ||
| | ||
</ | </ | ||
+ | |||
+ | This alternative implementation uses regular expressions and differs in that it doesn' | ||
+ | <code sql> | ||
+ | select FIELD||':' | ||
+ | from (select FIELD, regexp_matches(lines, | ||
+ | from header, | ||
+ | | ||
+ | | ||
+ | where strpos(lines, | ||
+ | GROUP BY 1 ORDER BY 1 | ||
+ | </ | ||
+ | |||
+ | In this version, the '' | ||
====== Duplicate messages ====== | ====== Duplicate messages ====== | ||
This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate. | This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate. | ||
Line 54: | Line 67: | ||
</ | </ | ||
- | ====== Messages with specific attachment types ====== | + | A stricter version, comparing the md5 hashes of bodies (text and html parts) in addition to the headers: |
- | To retrieve all messages containing pdf files or any image file: | + | |
<code sql> | <code sql> | ||
- | select | + | select |
+ | where b1.mail_id < b2.mail_id | ||
+ | and h1.mail_id = b1.mail_id | ||
+ | and h2.mail_id = b2.mail_id | ||
+ | and md5(h1.lines) = md5(h2.lines) | ||
+ | and md5(b1.bodytext) is not distinct from md5(b2.bodytext) | ||
+ | and md5(b1.bodyhtml) is not distinct from md5(b2.bodyhtml); | ||
</ | </ | ||
- | ====== | + | The '' |
- | < | + | to the simple equality operator, for which '' |
- | select | + | |
+ | ====== Hierarchical view of tags | ||
+ | |||
+ | Output a list of tags sorted by hierarchy level and names, | ||
+ | with an indentation to represent the hierarchy. | ||
+ | |||
+ | < | ||
+ | WITH RECURSIVE tagr(a, | ||
+ | | ||
+ | array[row_number() over (order by name)] as a, | ||
+ | tag_id, | ||
+ | name, | ||
+ | 1 as level | ||
+ | from tags where parent_id is null | ||
+ | UNION ALL | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | FROM tags JOIN tagr ON tagr._tag_id=tags.parent_id | ||
+ | ) | ||
+ | select repeat(' | ||
</ | </ | ||
sql_analysis.1327719358.txt.gz · Last modified: 2012/01/28 02:55 by daniel