Heuristics Reference

Detection rules used during scans


CVE Confidence

CVE exact The heuristic matches a pattern that very likely indicates exploitation of a specific vulnerability.

CVE likely The heuristic matches on the known vulnerable primitive or delivery pattern, but does not confirm a full exploit chain.

CVE related The heuristic identifies presence of the vulnerable component or attack surface, but is unable to confirm an exploit.

Shellcode 30

CreateRemoteThread API reference critical SC_STR_CREATEREMOTETHREAD
String 'CreateRemoteThread' found in file bytes.
CreateRemoteThread starts execution in another process. In documents or embedded payloads this is a strong process-injection indicator.
Metasploit bind_tcp critical SC_MSF_BIND
Byte signature matching Metasploit Framework bind_tcp shellcode.
This byte sequence matches Metasploit's bind TCP shell, which opens a listening port on the victim's machine for the attacker to connect to.
Metasploit reverse_tcp critical SC_MSF_REVERSE
Byte signature matching Metasploit Framework reverse_tcp shellcode.
This exact byte sequence is the preamble of the reverse TCP shell payload from Metasploit, one of the most widely used exploitation frameworks. Its presence is consistent with a weaponised file.
URLDownloadToFile API reference critical SC_STR_URLDOWNLOAD
String 'URLDownloadToFile' found in file bytes.
URLDownloadToFile downloads a file from the internet to disk. This is one of the most common APIs used by shellcode to fetch and save second-stage malware. Its presence is a high-signal indicator of malicious intent.
WriteProcessMemory API reference critical SC_STR_WRITEPROCESSMEMORY
String 'WriteProcessMemory' found in file bytes.
WriteProcessMemory writes bytes into another process. Malware uses it with VirtualAllocEx and CreateRemoteThread for process injection.
XOR-encoded Windows strings critical SC_XOR_ENCODED
Windows DLL or API names found XOR-encoded with a single-byte key.
Shellcode frequently XOR-encodes strings like 'kernel32.dll' or 'LoadLibraryA' to evade signature-based detection. At runtime, the shellcode decodes them with the same key before calling the APIs. Finding known library or API names encoded under a single-byte XOR key is a high-signal indicator of obfuscated shellcode. The analyzer brute-forces all 255 possible single-byte keys against common Windows DLL and API names to detect this technique.
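The brute-force step described above can be sketched as follows. This is a minimal illustration, not the analyzer's implementation: the needle list and the function name `find_xor_encoded` are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Illustrative needle list (assumption); the analyzer's real list is not shown here.
NEEDLES = (b"kernel32.dll", b"LoadLibraryA", b"GetProcAddress")


def find_xor_encoded(data: bytes,
                     needles: Iterable[bytes] = NEEDLES) -> List[Tuple[int, bytes, int]]:
    """Return (key, needle, offset) for each needle found under a single-byte XOR key."""
    hits = []
    needles = tuple(needles)
    for key in range(1, 256):  # key 0 is the identity, which leaves 255 keys
        for needle in needles:
            encoded = bytes(b ^ key for b in needle)
            offset = data.find(encoded)
            if offset != -1:
                hits.append((key, needle, offset))
    return hits
```

Encoding each needle under every key and searching for it costs 255 substring searches per needle, which is usually cheaper than decoding the whole file 255 times.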
CreateProcess API reference high SC_STR_CREATEPROCESS
String 'CreateProcess' found in file bytes.
CreateProcess starts a new process. Its presence in raw document bytes suggests embedded code intended to execute programs on the system.
Egg-hunter shellcode high SC_EGG_HUNTER
Egg-hunter pattern that searches process memory for a marker ('egg').
An egg-hunter is a small piece of shellcode used when an exploit has limited buffer space: it scans process memory for a larger payload marked with a specific tag (the 'egg'). This is a well-known exploitation technique.
GetProcAddress API reference high SC_STR_GETPROCADDRESS
String 'GetProcAddress' found in file bytes.
GetProcAddress resolves an exported function inside a loaded DLL. Together with LoadLibrary it is the foundational API resolution pair used by nearly all in-document shellcode. Ordinary documents rarely contain this string.
Heap-spray pattern high SC_HEAP_SPRAY
Repeated byte pattern typical of heap-spray payloads.
Heap spraying fills large areas of memory with repeated data (often NOP-sled + shellcode) so that a corrupted pointer is likely to land in attacker-controlled memory. Seeing long repeated byte patterns in a document is a strong exploit indicator.
LoadLibrary API reference high SC_STR_LOADLIBRARY
String 'LoadLibrary' (or LoadLibraryA/W/Ex) found in file bytes.
LoadLibrary maps a DLL into the current process and returns its base address; combined with GetProcAddress it is the standard primitive for shellcode to resolve Win32 API functions without using the import table. Documents do not normally embed this string in their data.
NOP sled high SC_NOP_SLED
Long run of 0x90 (NOP) bytes detected in the file.
A NOP sled is a sequence of no-operation instructions used by attackers to pad shellcode so that execution 'slides' into the payload regardless of the exact jump address. Long NOP runs are unusual in normal documents.
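A run-length scan is one simple way to find such a sled. The sketch below is illustrative only; the 32-byte threshold in `has_nop_sled` is an assumed value, since the length the rule treats as "long" is not stated here.

```python
from typing import Tuple


def longest_run(data: bytes, value: int = 0x90) -> Tuple[int, int]:
    """Return (offset, length) of the longest run of `value`, or (-1, 0) if absent."""
    best_off, best_len = -1, 0
    i, n = 0, len(data)
    while i < n:
        if data[i] != value:
            i += 1
            continue
        j = i
        while j < n and data[j] == value:
            j += 1
        if j - i > best_len:
            best_off, best_len = i, j - i
        i = j
    return best_off, best_len


def has_nop_sled(data: bytes, min_len: int = 32) -> bool:
    # min_len is a hypothetical threshold chosen for illustration.
    return longest_run(data)[1] >= min_len
```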
PEB API-hash resolver high SC_API_HASH_RESOLVER
PEB access combined with nearby ROR13-style API hashing.
Windows shellcode often walks the PEB to find loaded DLLs, then hashes export names to resolve APIs without storing cleartext imports. ROR13 hash loops are a common resolver primitive; seeing them near PEB access is a strong position-independent shellcode indicator.
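For reference, the basic ROR13 hash that such resolvers compute can be sketched as below. This is the plain rotate-then-add form; real shellcode families use variants (for example hashing the terminating NUL or mixing in the module name), so the exact values a given sample compares against are not assumed here.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: bytes) -> int:
    """Plain ROR13 hash: rotate the accumulator right by 13, then add the next byte."""
    h = 0
    for byte in name:
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h
```

A resolver walks a DLL's export names, hashes each one this way, and compares the result with a 32-bit constant embedded in the shellcode, so no cleartext API name needs to be stored.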
PEB access (x64) high SC_PEB_ACCESS_X64
Access to the Process Environment Block via GS:[0x60].
The 64-bit equivalent of PEB access. Shellcode reads GS:[0x60] to find the PEB on 64-bit Windows, then walks DLL lists to resolve API functions.
PEB access (x86) high SC_PEB_ACCESS
Access to the Process Environment Block via FS:[0x30].
Windows shellcode accesses the PEB to find loaded DLLs and resolve API addresses without using imports. Reading FS:[0x30] on x86 is the standard way to reach the PEB — a hallmark of position-independent shellcode.
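The two PEB reads can be spotted with byte patterns for their most common absolute-address encodings. The sketch below is an assumption about how such a matcher might look, not the analyzer's actual signatures; register-relative forms (for example FS:[EAX+0x30] after zeroing EAX) are deliberately not covered.

```python
import re

# mov eax, fs:[0x30]  ->  64 A1 30 00 00 00
# mov r32, fs:[0x30]  ->  64 8B <modrm: mod=00, rm=101> 30 00 00 00
PEB_READ_X86 = re.compile(
    rb"\x64\xA1\x30\x00\x00\x00"
    rb"|\x64\x8B[\x05\x0D\x15\x1D\x25\x2D\x35\x3D]\x30\x00\x00\x00"
)
# mov rax, gs:[0x60]  ->  65 48 8B 04 25 60 00 00 00
PEB_READ_X64 = re.compile(rb"\x65\x48\x8B\x04\x25\x60\x00\x00\x00")


def find_peb_reads(data: bytes):
    """Return sorted (offset, arch) pairs for absolute FS:[0x30] / GS:[0x60] reads."""
    hits = [(m.start(), "x86") for m in PEB_READ_X86.finditer(data)]
    hits += [(m.start(), "x64") for m in PEB_READ_X64.finditer(data)]
    return sorted(hits)
```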
PowerShell reference high SC_STR_POWERSHELL
String 'powershell' found in file bytes.
PowerShell is a powerful scripting environment frequently abused by attackers to download payloads, execute scripts in memory, and evade detection.
ShellExecute API reference high SC_STR_SHELLEXEC
String 'ShellExecute' found in file bytes.
ShellExecute can open files, URLs, or run programs. In shellcode it is often used to launch payloads or open second-stage URLs.
WinExec API reference high SC_STR_WINEXEC
String 'WinExec' found in file bytes.
WinExec is a Windows API that runs a command. Shellcode resolves and calls WinExec to launch malicious commands. Normal documents should not contain this raw API name in their binary data.
Windows Script Host reference high SC_STR_WSCRIPT
String 'wscript' or 'cscript' found in file bytes.
Windows Script Host (wscript/cscript) executes VBScript and JScript. In malicious documents it is often used to run downloader or installer scripts.
XOR decoder loop high SC_XOR_DECODER
XOR-based decoder stub that decrypts shellcode at runtime.
Shellcode is often XOR-encoded to evade signature detection. A decoder stub at the start decrypts the real payload in memory. Finding a decoder stub strongly suggests the file carries encrypted shellcode.
bitsadmin reference high SC_STR_BITSADMIN
String 'bitsadmin' found in file bytes.
bitsadmin manages Background Intelligent Transfer Service jobs. Attackers abuse it to download files stealthily in the background.
certutil reference high SC_STR_CERTUTIL
String 'certutil' found in file bytes.
certutil is a legitimate Windows tool that attackers misuse to download files (-urlcache) or decode Base64 payloads (-decode). It is a commonly abused LOLBin.
cmd.exe reference high SC_STR_CMD
String 'cmd.exe' followed by an execution switch (/c, /k, or /r) — i.e. an actual invocation, not just a bare reference.
The rule matches 'cmd.exe' immediately followed by /c, /k, or /r, which is the shape of a real command invocation by shellcode or a macro launching a payload. Plain documentation mentions of 'cmd.exe' (e.g. in user manuals or embedded paths) do not fire this rule.
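A pattern of this shape could be written as a regular expression. The sketch below is an approximation under stated assumptions: it tolerates an optional closing quote and whitespace between 'cmd.exe' and the switch, and requires a word boundary after the switch letter; the rule's real tolerance for separators is not documented here.

```python
import re

# Assumed pattern: cmd.exe, optional quote, optional whitespace, then /c, /k or /r.
CMD_INVOCATION = re.compile(rb'cmd\.exe"?\s*/[ckr]\b', re.IGNORECASE)


def is_cmd_invocation(data: bytes) -> bool:
    """True when cmd.exe is followed by an execution switch."""
    return CMD_INVOCATION.search(data) is not None
```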
mshta.exe reference high SC_STR_MSHTA
String 'mshta' found in file bytes.
mshta.exe runs HTML Applications (.hta files) which can contain VBScript or JScript. It is a well-known 'living-off-the-land' binary (LOLBin) abused to execute malicious scripts.
x86 GetPC stub (CALL $+5) high SC_GETPC_CALL
x86 CALL $+5 instruction sequence that obtains the current instruction pointer.
Shellcode needs to know its own memory address to locate encoded payloads. CALL $+5 followed by POP is a common way to get the program counter (PC). This pattern is unusual in normal documents.
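In machine code, CALL $+5 is the opcode E8 with a zero 32-bit displacement, and the single-byte POP r32 opcodes are 58 through 5F. A matcher for that pair might look like the sketch below; the function name is illustrative.

```python
import re

# E8 00 00 00 00 = call $+5 (pushes the address of the next instruction)
# 58..5F         = pop eax .. pop edi (retrieves that address)
GETPC_CALL_POP = re.compile(rb"\xE8\x00\x00\x00\x00[\x58-\x5F]")


def find_getpc_call(data: bytes):
    """Return offsets of CALL $+5 immediately followed by POP r32."""
    return [m.start() for m in GETPC_CALL_POP.finditer(data)]
```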
x86 GetPC stub (FSTENV) high SC_GETPC_FSTENV
x86 FSTENV-based instruction sequence to obtain the instruction pointer.
An alternative GetPC technique that uses floating-point environment save (FSTENV) to leak the instruction pointer. This is a strong shellcode indicator because this instruction pattern is uncommon in ordinary document data.
NOP-equivalent sled medium SC_NOP_EQUIV_SLED
Long run of NOP-equivalent instructions (e.g. INC, DEC, POPA).
Some shellcode replaces 0x90 NOPs with other single-byte instructions that have no meaningful side-effect (like INC ECX) to evade simple NOP-sled detection.
VirtualAlloc API reference medium SC_STR_VIRTUALALLOC
String 'VirtualAlloc' found in file bytes.
VirtualAlloc reserves or commits memory pages and can mark them executable. Shellcode uses it to create a writable and executable memory region where decoded payloads can run.
VirtualProtect API reference medium SC_STR_VIRTUALPROTECT
String 'VirtualProtect' found in file bytes.
VirtualProtect changes memory page permissions. Shellcode uses it to make memory regions executable (bypassing DEP). Not expected in documents.
x86 push-string-call medium SC_PUSH_STRING
Two or more consecutive PUSH imm32 instructions whose decoded bytes spell a Windows API or shell-keyword string.
Shellcode frequently constructs strings (like 'cmd.exe' or 'WinExec') on the stack by pushing 4-byte immediates with the 0x68 opcode. The rule matches a run of ≥2 PUSH imm32 instructions and only fires when the decoded bytes contain a known execution, network, or Windows API keyword — so generic numeric pushes do not trigger it.
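The decoding step can be sketched as follows. Because the stack grows downwards, the last chunk of the string is pushed first, so the chunks are joined in reverse push order. The keyword list and function name are assumptions for illustration.

```python
# Illustrative keyword list (assumption); compared case-insensitively.
KEYWORDS = (b"cmd", b"winexec", b"powershell", b"urlmon")


def find_push_strings(data: bytes, min_pushes: int = 2):
    """Return (offset, decoded) for runs of PUSH imm32 that spell a keyword."""
    hits = []
    i = 0
    while i + 5 <= len(data):
        j = i
        chunks = []
        # Collect a run of consecutive 5-byte PUSH imm32 instructions (opcode 0x68).
        while j + 5 <= len(data) and data[j] == 0x68:
            chunks.append(data[j + 1:j + 5])
            j += 5
        if len(chunks) >= min_pushes:
            text = b"".join(reversed(chunks))  # last push holds the start of the string
            if any(k in text.lower() for k in KEYWORDS):
                hits.append((i, text))
            i = j
        else:
            i += 1
    return hits
```

For example, `push 'exe\0'` followed by `push 'cmd.'` leaves 'cmd.exe' at the top of the stack.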

PDF 162

/Launch /P parameter is a javascript: URL critical PDF_LAUNCH_JS_PROTOCOL
PDF /Launch action passes a `javascript:` URL as the /P parameter.
When mshta receives `javascript:...` as its argument, it can execute the script inside its scripting host. This converts the PDF /Launch action into an inline-script execution primitive. Combined with PDF_LAUNCH_MSHTA, this is T1218.005 + T1059.007.
/Launch action target critical PDF_LAUNCH_COMMAND
PDF /Launch action specifies an executable target (and optionally parameters).
The /Launch action can run an external program when activated, or on open if paired with an open trigger. This rule captures the launched command for display and elevates to CRITICAL when the target references a known-dangerous executable (cmd, PowerShell, etc.).
/Launch action targets mshta.exe (LOLBIN) critical PDF_LAUNCH_MSHTA
PDF /Launch action whose /F parameter explicitly names mshta.
Mshta is the Microsoft HTML Application host and a documented LOLBIN. Modern PDF launcher campaigns prefer it because mshta accepts a `javascript:` URL as its command-line argument (supplied through the /Launch action's /P parameter) and executes the inline JScript without requiring a dropper file on disk. The PDF /Launch carrier plus an mshta target is the unambiguous shape of MITRE ATT&CK T1218.005.
Adobe Reader U3D auto-activated 3D annotation — CVE-2009-3459 critical CVE_2009_3459_U3D_AUTOACTIVATE
PDF embeds a U3D stream behind a /3D annotation set to auto-activate on page view.
CVE-2009-3459 is a heap buffer overflow in Adobe Reader / Acrobat's U3D (Universal 3D, ECMA-363) CLODProgressiveMeshDeclaration parser, patched in APSB09-15 (Reader 9.2 / 8.1.7 / 7.1.4). The exploitable document shape is a /Subtype /3D annotation whose /3DA activation dictionary binds /A /PV with /AIS /I — that combination makes the U3D parser run on page view with no click required. Real-world samples pair this with a 0x0c0c0c0c heap-spray JavaScript that lays a urlmon-based download shellcode at the corrupted allocation. Legitimate 3D PDFs almost never use the auto-activate + JS combination.
Annotation subject hex-decoded eval stager critical PDF_ANNOT_SUBJECT_HEX_EVAL_STAGER
PDF JavaScript decodes dash-delimited hex from annotation subjects and evals the result.
Old PDF exploit kits often hide second-stage JavaScript in annotation /Subject fields. The rule requires the full staging shape: OpenAction JavaScript that enumerates annotations, converts dash-delimited hex bytes with String.fromCharCode(), and evals the recovered stage. This avoids flagging ordinary annotations or benign hexadecimal text.
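The hex-recovery step, which the in-document JavaScript performs with String.fromCharCode(), can be reproduced statically. A minimal sketch, assuming the subject text has already been extracted from the annotation:

```python
def decode_dash_hex(subject: str) -> str:
    """Turn dash-delimited hex bytes (e.g. '61-6c') into the text they encode."""
    return "".join(chr(int(tok, 16)) for tok in subject.split("-") if tok)
```

Decoding `"61-6c-65-72-74"` this way yields `alert`; a scanner would then inspect the recovered stage rather than the raw annotation text.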
Annotation subject percent-decoding eval stager critical PDF_ANNOT_SUBJECT_MARKER_EVAL_STAGER
OpenAction JavaScript reads an annotation /Subject payload, rewrites marker bytes into percent escapes, unescapes the result, and dispatches it through eval.
This rule is a high-confidence exploit-kit transport pattern, not a CVE attribution by itself. It requires an /OpenAction launcher, an annotation /Subj payload, syncAnnotScan/getAnnots annotation enumeration, marker-to-% rewriting, unescape(), and direct or indirect eval dispatch. Plain getAnnots({nPage:0}) is not enough; CVE-2009-1492 is only assigned by the separate CVE rule when getAnnots() carries crafted integer-overflow or long string arguments.
Base64-encoded Windows executable payload in PDF critical PDF_BASE64_PE_PAYLOAD
PDF text contains a long base64 blob that decodes to a verified MZ/PE executable payload.
Malicious PDFs may hide a Windows executable as base64 in comments, after %%EOF, or in plain object text rather than as a declared attachment or stream. Decoding to a verified PE header is a strong payload-smuggling indicator.
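One way to verify such a blob is to decode every long base64 run and check for an MZ header whose e_lfanew field points at a 'PE\0\0' signature. The sketch below is illustrative; the 200-character minimum and the function names are assumptions.

```python
import base64
import re
import struct

# Assumed minimum blob length; the rule's real threshold is not documented here.
B64_BLOB = re.compile(rb"[A-Za-z0-9+/]{200,}={0,2}")


def looks_like_pe(data: bytes) -> bool:
    """Check for 'MZ' plus a 'PE\\0\\0' signature at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def find_base64_pe(pdf_bytes: bytes):
    """Return (offset, decoded_size) for base64 runs that decode to a PE image."""
    hits = []
    for m in B64_BLOB.finditer(pdf_bytes):
        core = m.group(0).rstrip(b"=")
        core += b"=" * (-len(core) % 4)  # normalise padding before decoding
        try:
            decoded = base64.b64decode(core)
        except ValueError:  # binascii.Error is a ValueError subclass
            continue
        if looks_like_pe(decoded):
            hits.append((m.start(), len(decoded)))
    return hits
```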
Embedded Windows executable payload in PDF stream critical PDF_EMBEDDED_PE_PAYLOAD
PDF stream bytes contain an embedded MZ/PE executable payload.
Exploit chains sometimes hide droppers inside ordinary PDF stream bytes rather than as declared /EmbeddedFile attachments. A verified PE header inside a PDF stream is strong staged-payload evidence.
Embedded export-and-launch chain — CVE-2010-1240 likely critical CVE_2010_1240_EMBEDDED_EXPORT_LAUNCH
PDF combines /Launch, EmbeddedFiles/EF, and exportDataObject with nLaunch:0.
This rule covers CVE-2010-1240-style documents where the attached payload is not a clean PE executable but the Adobe Reader drop-and-launch mechanism is explicit. The rule requires /Launch plus an embedded-file name tree and exportDataObject(... nLaunch:0), which is not a benign attachment workflow.
Fake 'free download' SEO-poisoning PDF critical PDF_SEO_FAKE_DOWNLOAD
ML-flagged PDF that also carries a download/call-to-action lure and an off-domain downloadN.php?file=document gateway link.
The mass-generated 'free PDF download' / fake-document family ranks in search results for a lure query, then funnels the victim through an off-domain server-side download gateway (e.g. /download3.php?q=<name>.pdf) to malware, scareware, or ad-fraud redirects. The pages pad themselves with benign decoy links to dilute classifier scores, so the ML hit alone lands only in the suspicious band. This rule fires only on the conjunction of the ML hit, a visual download lure, and the gateway link — a combination benign PDFs essentially never carry — and promotes the verdict to malicious.
Hidden ZIP with executable payloads in PDF stream critical PDF_HIDDEN_ZIP_EXECUTABLE_PAYLOAD
PDF stream contains a hidden ZIP archive with executable entries.
PDFs can legitimately carry attachments, but normal attachments are declared through /EmbeddedFile, /EmbeddedFiles, or /EF metadata so the viewer and user can treat them as attachments. This rule looks for the different pattern of raw ZIP local-file headers hidden inside ordinary PDF stream bytes, then only fires when ZIP entry names end in executable payload extensions such as .dll, .exe, .scr, .ps1, .hta, or .lnk. Legitimate reasons for DLLs inside a PDF are very rare; a software manual or PDF portfolio should use explicit attachment metadata, not a concealed stream archive.
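The local-file-header scan can be sketched as below. In the ZIP format a local file header starts with 'PK\x03\x04', stores the name length and extra-field length as little-endian 16-bit values at offsets 26 and 28, and places the entry name at offset 30. The function names are hypothetical and the stream is assumed to be already decompressed.

```python
import struct

EXEC_EXTS = (".dll", ".exe", ".scr", ".ps1", ".hta", ".lnk")
LOCAL_HEADER_SIG = b"PK\x03\x04"


def zip_entry_names(stream: bytes):
    """Collect entry names from raw ZIP local file headers found in stream bytes."""
    names = []
    pos = stream.find(LOCAL_HEADER_SIG)
    while pos != -1:
        if pos + 30 <= len(stream):
            name_len, _extra_len = struct.unpack_from("<HH", stream, pos + 26)
            name = stream[pos + 30:pos + 30 + name_len]
            if name_len and len(name) == name_len:
                names.append(name.decode("utf-8", "replace"))
        pos = stream.find(LOCAL_HEADER_SIG, pos + 4)
    return names


def executable_entries(stream: bytes):
    """Return entry names that end in an executable payload extension."""
    return [n for n in zip_entry_names(stream) if n.lower().endswith(EXEC_EXTS)]
```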
JBIG2Decode generic heap-spray exploit — CVE-2009-0658 likely critical CVE_2009_0658_GENERIC_SPRAY
PDF combines JBIG2Decode image streams with JavaScript heap-spray or decoder scaffolding.
The exact CVE-2009-0658 rule requires stronger Reader-version or decoded-shellcode fingerprints. This likely rule requires JBIG2Decode plus exploit-preparation JavaScript such as unescape heap-spray builders, large arrays, fromCharCode decoders, or eval dispatch, which is the static shape of the older Adobe Reader JBIG2 exploit family.
Launch VBS dropper command chain — CVE-2010-1240 likely critical CVE likely CVE_2010_1240_LAUNCH_VBS_DROPPER
PDF /Launch invokes cmd.exe to build a VBS ADODB.Stream/XMLHTTP/FileSystemObject dropper.
CVE-2010-1240 covers Adobe Reader/Acrobat Launch File dialog abuse. This variant does not rely on PDF EmbeddedFiles; instead the Launch command constructs VBS that either reopens the PDF itself and extracts an appended byte range, or downloads a payload with XMLHTTP, saves it via ADODB.Stream/FileSystemObject, and runs it. The rule requires cmd.exe from /Launch plus VBS dropper APIs to avoid tagging ordinary Launch actions.
Launch action critical PDF_LAUNCH
PDF contains a /Launch action to start an external application.
A Launch action can start an external application when the action is activated, or on open if it is wired to an open trigger. This is a high-risk PDF feature and is useful evidence when reviewing a document.
Launch/export embedded executable chain — CVE-2010-1240 likely critical CVE_2010_1240_EMBEDDED_PE_EXPORT
PDF combines /Launch, EmbeddedFiles/EF, exportDataObject, and embedded executable bytes.
This conservative variant of the CVE-2010-1240 detector covers samples where the Launch dictionary does not expose the strict cmd.exe /Win shape. Requiring all four surfaces keeps benign attachments out while attributing the same drop-and-launch abuse chain.
Pidief-style multi-CVE JavaScript dispatcher critical PDF_PIDIEF_MULTI_CVE_DISPATCH
Single PDF JavaScript body branches on viewerVersion and invokes multiple Reader CVE sinks.
The 2009-2010 Pidief.J template carries three Reader exploits in one PDF: CVE-2007-5659 (Collab.collectEmailInfo), CVE-2008-2992 (util.printf with a field-width %f format string), and CVE-2009-0927 (Collab.getIcon). A small dispatcher reads app.viewerVersion and fires the matching sink. The rule requires both a viewerVersion switch and two or more distinct CVE sinks in the same JavaScript body, so it doesn't fire on benign code that mentions one of them.
PowerShell download cradle in PDF critical PDF_PS_DOWNLOAD_CRADLE
PDF action body contains a PowerShell download-and-execute cradle.
Patterns matched include `Invoke-Expression(Invoke-RestMethod ...)`, `IEX(IRM ...)`, `(New-Object Net.WebClient).DownloadString`, `[Net.WebClient]`, `[Net.ServicePointManager]::SecurityProtocol`, and `powershell -ep Bypass -enc <base64>`. These strings are rare in benign PDFs; their presence is strong evidence of the payload-staging phase of an attack chain (MITRE T1059.001 + T1105).
RichMedia Flash exploit — CVE-2011-0611 likely critical CVE likely CVE_2011_0611_FLASH_RICHMEDIA
PDF combines RichMedia Flash activation, an AS3 ByteArray/loadBytes SWF, and shellcode staging.
CVE-2011-0611 affects Adobe Flash Player and Adobe Reader's Authplay Flash handling. This rule requires a RichMedia Flash annotation, an embedded SWF with AS3 ByteArray/loadBytes loader logic, and either PDF-side or SWF-internal shellcode/heap-spray staging. Those gates separate exploit-delivery PDFs from ordinary RichMedia content.
Shell.Application.ShellExecute COM pivot critical PDF_SHELL_APPLICATION_PIVOT
PDF (or its embedded JavaScript stub) instantiates Shell.Application and calls ShellExecute.
Some PDF readers prompt the user before honouring a /Launch action. Attackers sidestep that prompt by having a JScript stub (typically loaded via mshta as a `javascript:` URL) instantiate the Shell.Application COM object and call ShellExecute to spawn the next-stage process — the reader's /Launch warning may not fire because the spawn happens inside the mshta host, not inside the PDF reader.
U3D parser exploit with JavaScript heap spray — CVE-2011-2462 likely critical CVE likely CVE_2011_2462_U3D_HEAPSPRAY
PDF combines U3D/3D annotation content with JavaScript heap-spray shellcode.
Public CVE-2011-2462 exploit chains use a crafted U3D stream and JavaScript heap spray to control memory during Adobe Reader's U3D parser memory corruption. The rule requires both U3D/3D content and a heap-spray JavaScript shape, avoiding attribution for ordinary 3D PDFs.
U3D/RichMedia activation — CVE-2011-2462 likely critical CVE_2011_2462_RICHMEDIA_U3D
PDF combines U3D stream markers with RichMedia and JavaScript/XFA activation surfaces.
This rule covers U3D exploit documents where the U3D marker is present in stream data but not exposed through the canonical /Subtype /U3D dictionary. Requiring RichMedia plus active JavaScript/XFA surfaces keeps the rule focused on weaponized CVE-2011-2462-style delivery documents rather than benign 3D assets.
VBScript decimal byte array PE payload in PDF critical PDF_VBS_DECIMAL_ARRAY_PE_PAYLOAD
PDF comment text contains a decimal byte array that decodes to a verified MZ/PE executable payload.
Some malicious PDFs hide a Windows executable in commented VB/VBScript-style source lines such as Array(c(077),c(090),...). The detector only fires when that concealed decimal array decodes to a valid MZ/PE header, which keeps the rule focused on staged payloads rather than ordinary numeric arrays.
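The decode-then-verify step might look like the sketch below. It assumes the array elements follow the c(NNN) shape quoted above; the function names are illustrative.

```python
import re
import struct

# Matches elements such as c(077); the wrapper name 'c' follows the example above.
DECIMAL_CALL = re.compile(rb"c\((\d{1,3})\)")


def decode_decimal_array(text: bytes) -> bytes:
    """Rebuild raw bytes from c(NNN) elements; return b'' if any value exceeds 255."""
    values = [int(m.group(1)) for m in DECIMAL_CALL.finditer(text)]
    if any(v > 255 for v in values):
        return b""
    return bytes(values)


def looks_like_pe(data: bytes) -> bool:
    """Check for 'MZ' plus a 'PE\\0\\0' signature at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def vbs_array_hides_pe(text: bytes) -> bool:
    return looks_like_pe(decode_decimal_array(text))
```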
exportDataObject + nLaunch — embedded-file dropper critical PDF_JS_EXPORT_LAUNCH_DROPPER
PDF JavaScript calls exportDataObject() with nLaunch set, extracting and launching the document's embedded file on open.
exportDataObject({cName:..., nLaunch:2}) writes the PDF's embedded file to a temp folder and opens it in its default handler — a launch-on-open dropper. The embedded file is the real payload (commonly a VelvetSweatshop-encrypted Office document wrapping an Equation Editor exploit, or a script/executable). No benign PDF workflow auto-launches an extracted attachment, so this is a high-confidence malicious-delivery indicator.
/OpenAction targets an object not reachable from /Root high PDF_OPENACTION_HIDDEN_OBJECT
PDF defines an /OpenAction whose target object cannot be reached by walking indirect references from the document /Root catalog tree.
When a PDF is opened, the /OpenAction fires regardless of whether the target object is reachable from the catalog. Many static analysers and indexers enumerate the document via the /Root tree and never see hidden objects — yet the action still runs. This shape is associated with evasive samples that hide JavaScript or launch actions outside the normal catalog descent.
Adobe Reader APSB08-13 patch-range version gate (CVE-2007-5659) high PDF_JS_ADOBE_APSB08_13_PATCH_GATE
PDF JavaScript gates the payload on the Reader 7.0.x / 8.0–8.1.1 window.
A version gate of (>= 8 && < 8.11) OR (< 7.1) is the exact Reader release window patched by Adobe APSB08-13 for CVE-2007-5659 (Collab.collectEmailInfo buffer overflow). Pidief-family PDFs use this gate to fire the collectEmailInfo trigger only on vulnerable Readers and stay quiet on patched ones.
Adobe Reader APSB09-15 patch-range version gate (CVE-2009-3459) high PDF_JS_ADOBE_APSB09_15_PATCH_GATE
PDF JavaScript gates the payload on the exact Adobe APSB09-15 patch boundary.
A single JS body that simultaneously checks Reader version against 9.2, 8.17 (=8.1.7), and 7.14 (=7.1.4) is fingerprinting the APSB09-15 patch range, which covered CVE-2009-2990 and CVE-2009-3459 (Adobe Reader U3D parser bugs). No benign script tests all three of those Reader version points together; this is exploit-kit dispatcher logic.
Annotation subject callee-key hex JavaScript stager high PDF_ANNOT_SUBJECT_CALLEE_HEX_STAGER
PDF JavaScript decodes an annotation /Subject payload with marker replacement and a callee.toString-derived key.
Agent-359xx/361xx style PDFs use syncAnnotScan()/getAnnots() only as a staging primitive: JavaScript reads an indirect annotation /Subject stream, rewrites marker bytes such as F/A/E or z to percent signs, or splits short delimiter-prefixed hex bytes such as mz/xyz. The recovered second-stage decoder then derives a small key from arguments.callee.toString() or an embedded numeric table to decode the final exploit JavaScript. The rule is emitted only after static decoding recovers exploit-like JavaScript; the exact CVE is then assigned by scanning the recovered stage for the real vulnerable API.
Base-N pair JavaScript stager high PDF_BASE_N_PAIR_JS_STAGER
PDF JavaScript rebuilds an exploit stage from base-N character pairs.
Some PDF exploit kits store the real payload as a long string of two-character tokens, decode each pair with parseInt(radix), turn the bytes into JavaScript with String.fromCharCode, and eval the result. The rule is bounded to long pair tables in JavaScript streams and only fires when the recovered stage contains concrete exploit markers.
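The pair decoding mirrors what parseInt(pair, radix) followed by String.fromCharCode does in the document's JavaScript. A minimal sketch, with the radix supplied by the caller because it varies per sample:

```python
def decode_pairs(table: str, radix: int) -> str:
    """Decode a string of two-character base-N tokens into the text it encodes."""
    pairs = [table[i:i + 2] for i in range(0, len(table) - 1, 2)]
    return "".join(chr(int(pair, radix)) for pair in pairs)
```

With radix 16, `"616c657274"` decodes to `alert`; with radix 36, the pair `"2p"` (2 × 36 + 25 = 97) decodes to `a`.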
CFF CharString excessive subroutine calls high PDF_CFF_CHARSTRING_SUBR_STORM
CFF CharStrings contain an unusually high number of subroutine calls.
Dense callsubr/callgsubr usage stresses call-stack, bias, and bounds logic in CFF interpreters.
CFF CharString operand stack underflow high PDF_CFF_CHARSTRING_STACK_UNDERFLOW
Type 2 CharString bytecode invokes an operator without enough operands.
Underflow forces different font interpreters to reject, pad, or continue from corrupted state, which is the kind of parser divergence used by font-engine exploits.
CFF INDEX has an invalid offSize high PDF_CFF_OFFSIZE_INVALID
CFF INDEX or header declares an offSize outside the spec-allowed 1..4 range.
Implementations that trust the invalid value as the offset-array stride read or write data misaligned by some number of bytes, a shape associated with multiple Acrobat font-engine CVEs.
CFF INDEX offset array is not monotonically non-decreasing high PDF_CFF_INDEX_NOT_MONOTONIC
CFF INDEX's offset array contains entries that decrease, so successive elements appear in unexpected order.
Renderers that compute element sizes by subtraction (offset[i+1] - offset[i]) obtain negative or implausibly large sizes when the offsets decrease, a known bug class in CFF parsers.
CFF INDEX offsets extend past stream end high PDF_CFF_INDEX_OFFSET_OVERFLOW
CFF INDEX offset array or data section is declared to extend beyond the available font bytes.
Renderers that follow the declared offsets read attacker-influenced bytes from adjacent memory. This is the structural shape behind several Acrobat CFF-parser CVEs.
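The three INDEX checks above (offSize range, offset ordering, and offsets past the end) can be sketched together. A CFF INDEX starts with a 2-byte big-endian count, then a 1-byte offSize, then count+1 offsets of offSize bytes each; offsets are 1-based relative to the byte before the data area. The function below is an illustration that returns the matching rule IDs, not the analyzer's parser.

```python
def check_cff_index(buf: bytes, pos: int = 0):
    """Return rule IDs triggered by the CFF INDEX that starts at `pos`."""
    if pos + 2 > len(buf):
        return ["PDF_CFF_INDEX_OFFSET_OVERFLOW"]
    count = int.from_bytes(buf[pos:pos + 2], "big")
    if count == 0:
        return []  # an empty INDEX is just the 2-byte count
    if pos + 3 > len(buf):
        return ["PDF_CFF_INDEX_OFFSET_OVERFLOW"]
    off_size = buf[pos + 2]
    if not 1 <= off_size <= 4:
        return ["PDF_CFF_OFFSIZE_INVALID"]
    table_start = pos + 3
    table_end = table_start + (count + 1) * off_size
    if table_end > len(buf):
        return ["PDF_CFF_INDEX_OFFSET_OVERFLOW"]
    offsets = [
        int.from_bytes(buf[table_start + i * off_size:table_start + (i + 1) * off_size], "big")
        for i in range(count + 1)
    ]
    problems = []
    if any(b < a for a, b in zip(offsets, offsets[1:])):
        problems.append("PDF_CFF_INDEX_NOT_MONOTONIC")
    data_base = table_end - 1  # offset 1 refers to the first data byte
    if data_base + max(offsets) > len(buf):
        problems.append("PDF_CFF_INDEX_OFFSET_OVERFLOW")
    return problems
```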
CFF Private DICT offset points outside font high PDF_CFF_PRIVATE_DICT_OUT_OF_RANGE
CFF Top DICT points the Private DICT outside the embedded font stream.
The Private DICT controls subroutine and hint metadata. Out-of-range Private offsets create parser divergence and are a useful font-engine exploit primitive.
Character-table JavaScript eval stager high PDF_JS_CHAR_TABLE_EVAL_STAGER
PDF JavaScript rebuilds an exploit stage through character-table indexes and eval.
Older PDF exploit kits hide the real Adobe Reader exploit APIs by keeping a small alphabet string and appending hundreds of single-character substr/charAt lookups into an array before join()+eval. This rule is emitted only after the bounded static decoder reconstructs an exploit-like stage, making it a low-cost fallback when exact CVE signatures are unavailable or were hidden from the first scan pass.
Compressed object stream hides active PDF content high PDF_OBJSTM_ACTIVE_CONTENT
A PDF /ObjStm stream contains active-content keys such as /JavaScript or /OpenAction.
Object streams are valid PDF, but hiding executable objects inside compressed object streams is a common way to bypass simple static scanners that do not expand /ObjStm content.
Embedded JS stream high PDF_JS
PDF references a /JS stream with inline JavaScript code.
An inline JavaScript stream can contain obfuscated exploit code that triggers when the PDF is opened. It is a red flag unless the PDF is a known interactive form.
Embedded script payload in PDF stream high PDF_EMBEDDED_SCRIPT_PAYLOAD
PDF stream bytes contain Windows or HTML script execution markers.
ActiveXObject/CreateObject, WScript.Shell, PowerShell, ADODB.Stream, and HTML <script> markers inside ordinary PDF streams indicate a hidden second-stage script payload rather than normal PDF JavaScript.
Encrypted PDF carrying executable triggers high PDF_ENCRYPTED_WITH_JS
PDF declares /Encrypt and also contains /JavaScript, /JS, /OpenAction, /AA, or /Launch — payload is hidden from static analysis.
Document encryption hides the JavaScript body and stream contents from static scanners, so an encrypted PDF that also carries executable triggers is unusual and worth review. Real-world droppers may use an empty user password so that the reader decrypts and runs the payload without prompting.
Escaped URI image lure high PDF_ESCAPED_URI_IMAGE_LURE
PDF image lure hides its clickable HTTP(S) URI with PDF octal string escapes.
PDF literal strings may legally encode characters as octal escapes, but phishing carriers often encode URL punctuation this way so simple URL extractors miss the destination. Combined with an image-heavy, low-text document, this is a strong screenshot-lure signal.
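Resolving the octal escapes before URL extraction defeats this trick. The sketch below handles only the \ddd form (one to three octal digits) and ignores the other PDF string escapes such as \n or \\, which a full parser would also need to process.

```python
import re

OCTAL_ESCAPE = re.compile(rb"\\([0-7]{1,3})")


def decode_octal_escapes(raw: bytes) -> bytes:
    """Replace \\ddd octal escapes in a PDF literal string with the bytes they encode."""
    return OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8) & 0xFF]), raw)
```

For example, `http\072\057\057example.com` decodes to `http://example.com`, because octal 072 is ':' and octal 057 is '/'.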
Hidden HTML iframe in PDF high PDF_HIDDEN_HTML_IFRAME
PDF bytes contain a zero-size external HTML iframe.
A hidden iframe pointing to an external URL is a browser exploit-kit or redirect/dropper pattern. It is not normal PDF structure, so this rule is high-signal while remaining cheap to evaluate.
ICC tag offset+size lies outside the profile high PDF_ICC_TAG_OUT_OF_RANGE
ICC tag entry points at byte ranges outside the embedded profile (or inside the tag-table region).
Colour-management stacks that follow the offset blindly read attacker-influenced bytes from adjacent memory. This shape has driven multiple ICC-parser CVEs and remains a regular finding in font/colour fuzzing campaigns.
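A bounds check over the tag table can be sketched as follows. An ICC profile has a 128-byte header, a 4-byte big-endian tag count at offset 128, and then 12-byte entries of signature, offset, and size. The function name is hypothetical, and the profile length is taken from the buffer rather than the header's size field to keep the example short.

```python
import struct


def icc_out_of_range_tags(profile: bytes):
    """Return signatures of tags whose data lies outside the profile or inside the tag table."""
    if len(profile) < 132:
        return []
    (tag_count,) = struct.unpack_from(">I", profile, 128)
    table_end = 132 + 12 * tag_count
    bad = []
    for i in range(tag_count):
        entry = 132 + 12 * i
        if entry + 12 > len(profile):
            break  # the tag table itself is truncated
        sig, offset, size = struct.unpack_from(">4sII", profile, entry)
        if offset < table_end or offset + size > len(profile):
            bad.append(sig.decode("latin-1"))
    return bad
```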
Image lure with local builder path and remote links high PDF_IMAGE_LURE_LOCAL_FILE_AND_REMOTE_URI
Image-only PDF contains both remote HTTP(S) links and a local file:/// builder path.
A scanned or photo PDF may legitimately be image-only, and a normal document may contain external links. The suspicious combination here is narrower: a click-action image lure exposes a local file:/// path such as a user desktop, appdata, temp, or generator work directory while also linking to remote web infrastructure. That points to generated clickbait/phishing carriers rather than a normal document workflow.
Image/button lure to file-hosting download high PDF_FILE_HOSTING_DOWNLOAD_LURE
PDF screenshot/button lure links to a public file-hosting download endpoint.
This rule combines multiple signals: the PDF is image-only or nearly image-only, contains a clickable PDF action/button, and the target URL points to a public file-hosting download endpoint such as Pixeldrain, Gofile, Filemail, file.io, transfer.sh, Catbox, MediaFire, or Workupload. That combination is much stronger than a generic URI or image-only PDF because it matches malware-delivery lures where the visible page is just a fake document/download prompt and the actual payload is retrieved from external hosting.
JBIG2 segment refers forward to a later segment high PDF_JBIG2_FORWARD_REFERENCE
JBIG2 segment refers to one or more later segments by number.
Spec-conformant JBIG2 streams only refer backwards. Forward references are the structural shape that drove the FORCEDENTRY family of JBIG2 0-days (CVE-2021-30860 and relatives) — they confuse refcount tracking when the renderer dereferences a segment that has not been parsed yet.
JBIG2 segment refers to an undefined segment high PDF_JBIG2_REFERRED_OUT_OF_RANGE
JBIG2 segment refers to a segment number that has not been declared earlier in the stream.
This is the JBIG2 equivalent of a dangling pointer. Renderers that treat the missing segment as null follow a different code path than renderers that abort or read uninitialised state — a known parser-confusion primitive.
JBIG2 segment refers to itself high PDF_JBIG2_SELF_REFERENCE
JBIG2 segment lists its own segment number in its referred-to list.
The JBIG2 spec does not define what happens when a segment refers to itself, and renderers that follow refcount paths through self-references are a textbook exploit shape (recursive resolution, double-free, use-after-free).
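Once segment headers have been parsed, the three reference checks above reduce to set membership tests. The sketch below works on already-parsed (segment number, referred-to list) pairs in stream order, which is an assumed input shape. It also assumes one way of separating the two overlapping cases: a reference to a segment declared later counts as forward, and a reference to a number never declared counts as undefined.

```python
def classify_references(segments):
    """segments: list of (number, referred_to_numbers) in stream order.

    Returns (segment, referenced, rule_id) tuples for suspicious references.
    """
    all_numbers = {number for number, _ in segments}
    seen = set()
    findings = []
    for number, referred in segments:
        for ref in referred:
            if ref == number:
                findings.append((number, ref, "PDF_JBIG2_SELF_REFERENCE"))
            elif ref in seen:
                continue  # ordinary backward reference
            elif ref in all_numbers:
                findings.append((number, ref, "PDF_JBIG2_FORWARD_REFERENCE"))
            else:
                findings.append((number, ref, "PDF_JBIG2_REFERRED_OUT_OF_RANGE"))
        seen.add(number)
    return findings
```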
JBIG2 unknown-length form on non-generic-region segment high PDF_JBIG2_DATA_LENGTH_OVERFLOW
JBIG2 segment uses the 0xFFFFFFFF 'unknown length' form on a segment type other than generic region.
The spec restricts the unknown-length form to generic-region segments (types 36, 38, 39). Using it on other segment types lets the parser continue past the intended segment boundary and read attacker-controlled bytes from later segments.
JPEG2000 COD declares too many decomposition levels high PDF_JPX_COD_DECOMP_LEVELS_HIGH
COD marker declares more than the spec-maximum 32 wavelet decomposition levels.
Buffers and lookup tables sized from the decomposition-level count overflow on real implementations once the count exceeds 32.
JPEG2000 PCLR palette declares too many entries high PDF_JPX_PCLR_OVERSIZE
PCLR (palette) sub-box declares more than the spec-maximum 1024 entries.
Palette allocators sized from this 16-bit field overflow when the declared entry count exceeds 1024 — a known-vulnerable shape across multiple JPEG2000 implementations.
JPEG2000 SIZ marker has anomalous parameters high PDF_JPX_SIZ_ANOMALY
JPEG2000 SIZ marker declares image dimensions, image offsets, or component counts outside plausible ranges.
SIZ overflows have triggered several past JPEG2000 parser bugs because downstream allocators multiply width × height × components × bit-depth. Zero dimensions, image offsets at or past dimensions, and absurd component counts are all known-bad shapes.
JPEG2000 box declares an impossibly small size high PDF_JPX_BOX_TOO_SMALL
JP2 box header declares a total length less than 8 bytes (the minimum for the size+type header alone).
Spec-conformant boxes are at least 8 bytes; smaller declarations cause the box walker to either loop or read box headers from the middle of the previous box's body, a known evasion shape.
JPEG2000 box extends past stream end high PDF_JPX_BOX_TRUNCATED
JP2 box declares a length that runs past the available stream bytes.
Different readers handle the broken box differently — some clamp to stream end and continue, some abort, some resync to the next plausible box header — leading to divergent interpretation of image geometry.
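The too-small and truncated box checks can be sketched as a simple top-level box walker. This is an illustrative sketch, not the analyzer's code: the function name and status labels are hypothetical. It assumes the standard JP2 box layout (4-byte big-endian length, 4-byte type, a length of 1 meaning a 64-bit extended length follows, and a length of 0 meaning the box runs to the end of the data).

```python
import struct
from typing import List, Tuple


def walk_jp2_boxes(data: bytes) -> List[Tuple[int, str, str]]:
    """Return (offset, box_type, status) for each top-level box header seen."""
    results = []
    pos = 0
    while pos + 8 <= len(data):
        lbox, tbox = struct.unpack_from(">I4s", data, pos)
        box_type = tbox.decode("latin-1")
        header = 8
        if lbox == 1:
            # Extended form: the real length is a 64-bit field after the type.
            if pos + 16 > len(data):
                results.append((pos, box_type, "truncated"))
                break
            (length,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif lbox == 0:
            # Box extends to the end of the available data.
            length = len(data) - pos
        else:
            length = lbox
        if length < header:
            results.append((pos, box_type, "too_small"))
            break
        if pos + length > len(data):
            results.append((pos, box_type, "truncated"))
            break
        results.append((pos, box_type, "ok"))
        pos += length
    return results
```

The walker stops at the first anomaly rather than resyncing, which avoids the looping and mid-body header reads described above.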
JPEG2000 codestream missing SOC start marker high PDF_JPX_NO_SOC
jp2c codestream box does not begin with the required Start Of Codestream (FF 4F) marker.
Renderers that tolerate the missing marker and those that abort see different content. Some implementations search forward for the next FF xx marker, which lets attacker-supplied bytes between the box header and the SOC be treated as codestream content.
JPEG2000 jp2h missing required ihdr sub-box high PDF_JPX_JP2H_MISSING_IHDR
jp2h header box does not begin with the mandatory ihdr (image header) sub-box.
Strict readers reject the file; lenient readers parse downstream sub-boxes anyway and infer image dimensions from defaults or from later codestream markers — leading to differing interpretations of image geometry between scanner and viewer.
JPEG2000 top-level boxes overlap high PDF_JPX_BOX_OVERLAP
Two top-level JP2 boxes claim overlapping byte ranges.
The spec requires non-overlapping box concatenation. Overlap lets a reader interpret the same bytes as two different boxes — a deliberate way to hide content from one parser while presenting it to another.
JavaScript action high PDF_JAVASCRIPT
PDF contains a /JavaScript action.
JavaScript embedded in a PDF can interact with the viewer, exploit vulnerabilities, or download external content. Most legitimate PDFs do not need JavaScript, and embedded JavaScript is the most common PDF exploit vector.
JavaScript heap-spray launcher high PDF_JS_HEAPSPRAY
PDF JS schedules a callback with a multi-kilobyte string (heap-spray primitive).
app.setTimeOut / app.setInterval with a multi-kilobyte string argument is the common PDF heap-spray primitive: it fills the renderer's heap with attacker-controlled bytes so a corrupted pointer lands in the spray.
Large character-table JavaScript eval stager high PDF_JS_LARGE_CHAR_TABLE_EVAL_STAGER
PDF JavaScript uses a large numeric index table and indirect eval to rebuild a hidden stage.
Older PDF exploit kits sometimes keep the first recovered stage encrypted or otherwise encoded, so a static decoder cannot always validate a final CVE API. This rule catches the high-confidence launcher shape itself: a large ar[] numeric table, a short cc character table, anti-analysis exception scaffolding such as loadXML({}), and an indirect eval sink.
Malformed PDF with no object graph high PDF_MALFORMED_NO_OBJECT_GRAPH
File has a PDF header but no indirect objects, xref table/stream, or startxref pointer.
Normal PDFs need an object graph and cross-reference structure. A large PDF-header blob with no objects is not renderable content; it is more consistent with parser fuzzing, evasion, corruption, or an exploit test case than a benign document.
Obfuscated JavaScript getURL redirector high PDF_JS_OBFUSCATED_GETURL_REDIRECTOR
PDF document JavaScript opens an obfuscated redirector URL with getURL().
The rule is constrained to document-level JavaScript that calls getURL() with a percent-escaped HTTP(S) URL and a redirector-style endpoint such as /in.cgi or /go.php. This catches redirect-carrier PDFs where the outbound URL is hidden in JavaScript instead of a normal /URI action, while avoiding broad matches on benign visible getURL links. This is malicious routing behavior, not a PDF parser CVE fingerprint.
Obfuscated multi-stage PDF JavaScript dropper high PDF_JS_OBFUSCATED_DROPPER
Composite signal of pre-2011 Adobe Reader exploit-kit dropper shape.
Fires when the PDF JavaScript shows three or more independent signals of exploit-kit-style multi-stage obfuscation: annotation-subject payload staging (reading pr[N].subject after getAnnots), String.fromCharCode hex decoder loops, long -hh-hh-hh hex-dashed payloads, incremental construction of a method name starting from 'ev' (to hide an eval call), and three or more app.plugIns.length anti-analysis gates. The actual CVE is hidden in the final decoded layer and is not visible via static analysis, but the template is strongly consistent with exploit-kit style payload staging.
Object defined twice with divergent /Filter chains high PDF_DUPLICATE_OBJ_DIVERGENT
Same indirect object (N G) is defined more than once in the file, and the definitions declare different /Filter chains.
Readers that take the first definition decode different bytes than readers that take the last definition (the PDF spec is ambiguous on which wins; Acrobat takes last). Divergent filter chains across the duplicates are a deliberate parser-divergence pattern: benign content is shown to scanners while malicious content is shown to the actual reader.
OpenAction trigger high PDF_OPENACTION
PDF has an /OpenAction that performs an action when the file is opened.
OpenAction specifies an action or destination to perform when the document is opened. It can execute JavaScript when paired with a JavaScript action, but OpenAction is not always code by itself.
OpenType EBSC max-range record with bitmap tables high PDF_OPENTYPE_EBSC_WITH_SBIT
Malformed EBSC max-range record appears alongside EBLC/EBDT bitmap tables.
EBSC is used for embedded bitmap scaling metadata. A max-range EBSC record paired with bitmap glyph tables is an Adobe libCoolType-specific parser differential related to CVE-2023-26369.
OpenType EBSC table declares max offset and length high PDF_OPENTYPE_EBSC_MAX_RANGE
sfnt EBSC table record declares offset=0xffffffff and length=0xffffffff.
Project Zero noted the CVE-2023-26369 proof-of-concept font used this malformed EBSC table record to prevent the font from loading in many font parsing libraries while Adobe libCoolType still processed the bitmap tables.
OpenType cmap subtable offset out of range high PDF_OPENTYPE_CMAP_OFFSET_OUT_OF_RANGE
A cmap encoding record points outside the cmap table.
Out-of-range cmap offsets are a common structural parser bug shape.
OpenType embedded-bitmap component placement exceeds bitmap buffer high PDF_OPENTYPE_SBIT_COMPONENT_OOB
EBLC/EBDT compound bitmap glyph metadata positions a component beyond the computed bitmap buffer.
CVE-2023-26369 exploited missing bounds checks in Adobe libCoolType's sfac_GetSbitBitmap when merging embedded bitmap glyph components. The scanner computes the bitmap buffer size from glyph metrics and flags component offsets whose merge index exceeds that buffer.
OpenType glyph offset outside glyf table high PDF_OPENTYPE_GLYF_OFFSET_OUT_OF_RANGE
A loca entry points beyond the glyf table.
Out-of-range glyph offsets are a direct font parser memory-safety primitive.
OpenType head table is truncated high PDF_OPENTYPE_HEAD_TRUNCATED
The head table is too short to carry indexToLocFormat.
Without a valid indexToLocFormat, loca offsets can be interpreted with the wrong width.
OpenType invalid loca format high PDF_OPENTYPE_LOCA_FORMAT_INVALID
head.indexToLocFormat is outside the valid 0/1 range.
Invalid loca format values create parser divergence in glyph offset decoding.
OpenType loca offsets decrease high PDF_OPENTYPE_LOCA_NOT_MONOTONIC
loca glyph offsets are not monotonically increasing.
Decreasing glyph offsets imply overlapping or negative-size glyf records.
OpenType loca table too short high PDF_OPENTYPE_LOCA_TRUNCATED
loca cannot hold numGlyphs+1 offsets.
A too-short loca table can make glyph lookup read past the embedded table.
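The loca-related checks above (invalid format, too-short table, decreasing offsets, offsets beyond glyf) can be sketched together. This is a minimal illustration that assumes the caller has already extracted the raw loca bytes, head.indexToLocFormat, maxp.numGlyphs, and the glyf table length. The function name and finding labels are hypothetical.

```python
import struct
from typing import List


def check_loca(loca: bytes, index_to_loc_format: int,
               num_glyphs: int, glyf_length: int) -> List[str]:
    """Return anomaly labels for a loca table, given values from head/maxp/glyf."""
    if index_to_loc_format not in (0, 1):
        return ["loca_format_invalid"]
    short = index_to_loc_format == 0
    entry_size = 2 if short else 4
    # loca must hold numGlyphs + 1 offsets.
    if len(loca) < (num_glyphs + 1) * entry_size:
        return ["loca_truncated"]
    fmt = ">%d%s" % (num_glyphs + 1, "H" if short else "I")
    offsets = list(struct.unpack_from(fmt, loca, 0))
    if short:
        # The short format stores each offset divided by two.
        offsets = [o * 2 for o in offsets]
    findings = []
    # Equal neighbours are legal (empty glyph); only a decrease is flagged.
    if any(b < a for a, b in zip(offsets, offsets[1:])):
        findings.append("loca_not_monotonic")
    if any(o > glyf_length for o in offsets):
        findings.append("glyf_offset_out_of_range")
    return findings
```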
OpenType maxp table is truncated high PDF_OPENTYPE_MAXP_TRUNCATED
The maxp table is too short to declare numGlyphs.
Glyph table validation depends on maxp.numGlyphs. A truncated maxp table can make readers disagree over glyph bounds.
OpenType table record points outside the font high PDF_OPENTYPE_TABLE_OUT_OF_RANGE
sfnt table-record entry's offset+length lies beyond the embedded font bytes.
Renderers that follow the offset blindly read attacker-influenced bytes from adjacent memory. The OpenType table directory is the first structure parsed in any sfnt font, so this affects every font-engine code path.
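The table-record bounds check can be sketched as follows. It assumes the standard sfnt layout (a 12-byte offset table with numTables at byte 4, followed by 16-byte records of tag, checksum, offset, and length). The function name is hypothetical, and this is not presented as the analyzer's own implementation.

```python
import struct
from typing import List


def sfnt_tables_out_of_range(font: bytes) -> List[str]:
    """Return tags of table records whose offset+length exceeds the font bytes."""
    if len(font) < 12:
        return []
    (num_tables,) = struct.unpack_from(">H", font, 4)
    bad = []
    for i in range(num_tables):
        rec = 12 + 16 * i
        if rec + 16 > len(font):
            break  # the directory itself is truncated; stop walking
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", font, rec)
        # Python integers do not wrap, so 0xffffffff + 0xffffffff is caught too.
        if offset + length > len(font):
            bad.append(tag.decode("latin-1"))
    return bad
```

Because the sum is computed without 32-bit wraparound, a record such as offset=0xffffffff and length=0xffffffff is reported rather than silently wrapping to a small value.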
PDF JavaScript ActiveX downloader high PDF_JS_ACTIVEX_DOWNLOADER
Decoded PDF JavaScript downloads, writes, and executes a Windows payload through ActiveX.
The detector requires co-occurring ActiveXObject, XMLHTTP/WinHTTP, ADODB.Stream or ResponseBody file-write behavior, and WScript/rundll32-style execution in a recovered PDF JavaScript stage. This is a precise commodity downloader signature, not a specific Acrobat parser CVE.
PDF JavaScript WScript downloader high PDF_JS_WSCRIPT_DOWNLOADER
Decoded PDF JavaScript reconstructs a Windows Script Host downloader.
The detector requires WScript.CreateObject/WScript.Shell together with XMLHTTP or WinHTTP download behavior, ADODB.Stream/SaveToFile style file writing, and Run/cmd execution markers in a recovered PDF JavaScript stage. This is a precise commodity downloader signature, not a specific Acrobat parser CVE.
PDF JavaScript object lifetime reuse pattern high PDF_JS_LIFETIME_REUSE_PATTERN
PDF JavaScript acquires, releases, delays, and then reuses a viewer-managed object.
Several Adobe Reader exploit chains abuse stale JavaScript wrappers for viewer-managed objects such as dataObjects, form fields, annotations, or media players. This rule looks for the behavioral sequence instead of one exact CVE: acquire an object, delete/remove/null it, use a timer or GC/heap pressure step, then access the same object family again.
PDF JavaScript shellcode contains an embedded download URL high PDF_JS_SHELLCODE_DOWNLOAD_URL
A URL was recovered from a %uXXXX shellcode run inside decoded PDF JavaScript.
Reader exploit shellcode stores its second-stage fetch URL as a run of little-endian %uXXXX Unicode escapes, then downloads and executes the payload with a urlmon/URLDownloadToFile-style call. Recovering an http(s) URL from that byte stream is a concrete download/C2 indicator on its own, independent of which version-gated Acrobat CVE the surrounding script triggers. This is commodity downloader behaviour, so no specific CVE is asserted.
PDF URI command path high PDF_DANGEROUS_URI_COMMAND
PDF /URI action references a command interpreter or script host path.
Normal PDF URI actions point to web, mail, or document links. A URI using path traversal and command interpreter names such as cmd.exe is a legacy dropper/execution lure pattern and should not appear in benign documents.
PDF metadata JavaScript eval stager high PDF_METADATA_EVAL_STAGER
PDF JavaScript decodes document metadata fields and evals the recovered stage.
Some PDF exploit kits hide JavaScript or shellcode in metadata fields such as Title, Subject, Producer, or Keywords, then use parseInt/String.fromCharCode-style decoding and eval. The rule requires metadata field access, a decoder, and an eval sink, plus either multiple metadata fields or a large encoded base-N payload.
PDF metadata arithmetic JavaScript stager high PDF_INFO_ARITHMETIC_JS_STAGER
PDF Info metadata rebuilds an exploit stage through arithmetic char-code tokens.
Some PDF exploit kits hide the real JavaScript in Info metadata fields as comma-separated arithmetic tokens such as t9.5*w or n*7.375. A small launcher reads metadata like this.producer or this.title, rebuilds the JavaScript with String.fromCharCode, and evals it. The rule fires only after a bounded decoder recovers exploit-like JavaScript from that metadata/launcher shape.
PDF parsers disagree on structural counts high PDF_PARSER_DIVERGENCE
Two independent PDF parsers produced significantly different counts of streams or pages on the same bytes.
Exploitation samples routinely rely on parser confusion: one reader processes the file one way, while a vulnerable target reads it another. When the in-process byte-level scanner and a second parser (pdfminer.six) disagree on basic structural counts, the file is almost always either corrupt or deliberately crafted to be ambiguous, and both cases are higher-risk than a clean parse.
PRC stream missing 'PRC' magic high PDF_PRC_HEADER_INVALID
PRC stream does not begin with the ASCII bytes 'PRC' at offset 0.
Adobe's Product Representation Compact format begins with the 'PRC' magic. A missing or wrong magic on a closed-format parser surface that is rarely exercised is an interesting unknown-exploit signal — fewer eyes have looked at PRC parsing than at any of the open 3D formats.
Page-word XOR JavaScript eval stager high PDF_PAGE_WORD_XOR_EVAL_STAGER
PDF JavaScript rebuilds and evals a hidden stage from rendered page words.
Older PDF exploit kits hide byte values in visible page text, then use getPageNumWords()/getPageNthWord() to enumerate words, take byte-like fragments, XOR-decode them, and eval the recovered JavaScript. The rule requires the page-word APIs, char-code decoding, an eval sink, and XOR logic, keeping it narrower than a generic JavaScript/OpenAction match.
Prototype-pollution JavaScript pattern high CVE related PDF_JS_PROTOTYPE_POLLUTION
PDF JavaScript mutates prototypes and references privileged PDF APIs.
Prototype pollution is a modern JavaScript exploitation technique. This rule matches __proto__, constructor.prototype, or Object.prototype mutation alongside privileged APIs such as trustedFunction, launchURL, submitForm, getField, or readFileIntoStream. It deliberately tracks the technique without assigning an unverified CVE number.
QR-code business verification phishing lure high PDF_QR_PHISHING_LURE
PDF combines a QR-like image with scan/verification/business-process lure text.
QR-code phishing can hide the target URL inside image pixels so static URL rules never see a destination. This rule does not need to decode the QR payload: it requires a QR-like square image plus visible text instructing the recipient to scan or use the QR code for verification, HR, payroll, policy, email, signature, or similar business-process activity. The co-occurrence keeps normal QR codes in brochures or invoices from being flagged on image shape alone.
RichMedia (Flash) high PDF_RICHMEDIA
PDF contains /RichMedia (Adobe Flash content).
Flash has a long history of critical vulnerabilities. Embedded Flash in PDFs was a major exploit vector. Flash is now end-of-life, making any Flash content in a PDF highly suspicious.
Type 1 CharString callOtherSubr stack-pivot sequence high PDF_TYPE1_CALLOTHERSUBR_STACK_PIVOT
Decrypted Type 1 CharString contains repeated get/callOtherSubr bytecode sequences.
Adobe Type 1 CharStrings are stack-machine bytecode. Repeated get/callOtherSubr sequences are unusual in normal glyph programs and match the primitive used by CVE-2021-21086-style CoolType operand-stack manipulation.
Type 1 CharString operand stack grows beyond spec high PDF_TYPE1_CHARSTRING_STACK_OVERFLOW
Decrypted Type 1 CharString bytecode pushes more operands than expected.
Type 1 CharStrings are stack bytecode. Operand-stack overflow is a font-engine exploit primitive.
Type 1 CharString operand stack underflow high PDF_TYPE1_CHARSTRING_STACK_UNDERFLOW
Decrypted Type 1 CharString bytecode consumes operands that are unavailable.
Stack underflow in Type 1 bytecode can lead to interpreter state corruption or parser divergence.
U3D block declares an implausibly large size high PDF_U3D_HUGE_BLOCK_SIZE
U3D block declares a data or metadata section size beyond any plausible legitimate value.
Allocators sized from these 32-bit fields are a recurring exploit primitive in 3D engines. Real-world U3D blocks rarely exceed a few megabytes; tens of megabytes is consistent with adversarial construction.
U3D block extends past stream end high PDF_U3D_BLOCK_TRUNCATED
U3D block declares a total size that runs past the available stream bytes.
Renderers that resync to the next plausible block header read attacker-controlled bytes from the gap between the truncated block and the next header. Real-world U3D files produced by 3D tooling are byte-aligned and never declare a size larger than the file.
U3D stream missing or wrong File Header block high PDF_U3D_HEADER_MISSING
U3D stream does not begin with the mandatory File Header block (type 0x00443355).
ECMA-363 requires the first U3D block to be the File Header. A different first block is a parser-divergence shape on a low-traffic attack surface that few static analysers cover deeply. Truncated streams that fail to contain even the 30-byte minimum header are reported under this same rule.
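The first-block check reduces to comparing the leading little-endian 32-bit block type against the File Header value quoted above. A minimal sketch follows; the function name is hypothetical, and the separate minimum-length check mentioned above is left out.

```python
import struct

# 'U3D\0' read as a little-endian 32-bit integer.
U3D_FILE_HEADER_TYPE = 0x00443355


def u3d_starts_with_file_header(stream: bytes) -> bool:
    """True if the first block type in the stream is the U3D File Header."""
    if len(stream) < 4:
        return False
    (block_type,) = struct.unpack_from("<I", stream, 0)
    return block_type == U3D_FILE_HEADER_TYPE
```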
XFA form contains executable script high PDF_XFA_SCRIPT
PDF embeds an XFA dataset with a <script> or <xfa:script> block.
XFA scripting has been the exploit primitive for several Adobe Reader RCEs (CVE-2010-0188 family, CVE-2018-4901). Plain XFA without scripts is far less risky.
XFA numeric JavaScript stager high PDF_XFA_NUMERIC_JS_STAGER
PDF XFA script rebuilds hidden JavaScript from numeric field data or a character table.
Some XFA exploit kits store the real JavaScript as numeric values in form fields or as indexes into a short character table, then rebuild and eval it during an initialize event. The rule is bounded to XFA script packets and only fires after static decoding recovers exploit-like JavaScript or shellcode markers.
XFA numeric character-table eval stager high PDF_XFA_NUMERIC_EVAL_STAGER
XFA initialize script maps numeric rawValue data through a character table and evals it.
This is a bounded fallback for XFA exploit-kit launchers whose final recovered stage remains encoded. It requires an XFA script, rawValue numeric staging, a short cc character table, a long reconstruction loop, and an eval-like sink.
app.launchURL with file/cmd/UNC target high PDF_FOXIT_LAUNCHURL
PDF JavaScript launches a URL with a file://, cmd:, or UNC scheme.
Foxit and Adobe handle these schemes inconsistently — they have been used for code execution and NTLM credential theft (the latter via UNC paths).
eval() call high PDF_EVAL
JavaScript eval() function found in PDF.
eval() executes a string as code. In malicious PDFs it is often used for dynamically constructed or decoded exploit code, making static analysis harder.
getAnnots heap-spray JavaScript stager high CVE related PDF_JS_GETANNOTS_HEAPSPRAY_STAGER
PDF JavaScript pairs getAnnots with heap-spray shellcode and an embedded payload.
The document calls getAnnots() in a context containing classic Adobe Reader heap-spray markers and an embedded payload. This is CVE-2009-1492-related evidence, but is not exact CVE attribution unless the getAnnots argument has the distinctive overflow/long-string trigger shape.
unescape() call high PDF_UNESCAPE
JavaScript unescape() function found in PDF.
unescape() decodes percent-encoded strings. In PDF exploits, it is commonly used to convert encoded shellcode back to raw bytes before triggering a vulnerability.
xref table points away from the real object high PDF_XREF_OFFSET_MISMATCH
PDF cross-reference table claims object N is at byte offset O, but the bytes at O do not begin with the expected 'N G obj' header.
Readers that trust the xref will resolve the indirect reference to one set of bytes; readers that scan the file linearly will resolve it to another. This parser-divergence shape is a recurring evasion technique in targeted PDF exploits.
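The mismatch test can be sketched as a lookup at each claimed offset. The sketch assumes the xref entries have already been parsed into a mapping of (object number, generation) to byte offset. The function name and input shape are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def find_xref_mismatches(
    pdf: bytes, xref: Dict[Tuple[int, int], int]
) -> List[Tuple[int, int, int]]:
    """Return (num, gen, offset) for entries whose offset lacks an 'N G obj' header."""
    mismatches = []
    for (num, gen), offset in sorted(xref.items()):
        # Build the expected header for this exact object, e.g. b"12 0 obj".
        header = re.compile(rb"%d\s+%d\s+obj" % (num, gen))
        in_range = 0 <= offset < len(pdf)
        if not in_range or not header.match(pdf, offset):
            mismatches.append((num, gen, offset))
    return mismatches
```

The pattern is anchored at the claimed offset, so an object that exists elsewhere in the file still counts as a mismatch.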
ASCII85Decode filter (with exploit indicators) medium PDF_FILTER_85
PDF uses ASCII85Decode stream filter alongside active scripting content.
ASCII85 is a relatively uncommon encoding. Like ASCIIHexDecode, it has legitimate uses, so we only flag it when it co-occurs with active scripting content (/JavaScript, /JS, /XFA, or /RichMedia).
ASCIIHexDecode filter (with exploit indicators) medium PDF_FILTER_HEX
PDF uses ASCIIHexDecode stream filter alongside active scripting content.
ASCIIHexDecode is legitimately used by some scanned-document and PostScript-derived PDFs, so on its own it's noise. This rule only fires when the filter co-occurs with active scripting content (/JavaScript, /JS, /XFA, or /RichMedia) — the shape associated with payload obfuscation.
Additional Actions dictionary medium PDF_AA
PDF defines /AA (Additional Actions) triggers.
Additional Actions can fire on events like page open, print, or close. They are often used to trigger script or external actions during viewing.
CFF CharString is unusually large medium PDF_CFF_CHARSTRING_HUGE
A single CFF Type 2 glyph program is far larger than expected.
Normal glyph programs are small. Very large CharStrings often indicate bytecode-as-payload, malformed subroutine graphs, or exploit grooming.
CFF INDEX declares an implausibly large entry count medium PDF_CFF_INDEX_COUNT_HUGE
CFF INDEX (Name / Top DICT / String / Subrs / CharStrings) declares thousands of entries.
Real fonts have at most hundreds of entries in any single INDEX. Allocators sized from this 16-bit field have been a recurring exploit primitive in font rasterisers.
CFF INDEX first offset != 1 medium PDF_CFF_INDEX_FIRST_OFFSET_WRONG
CFF INDEX's first offset entry is not 1 (the spec-mandated value).
Some implementations validate; others trust the value and read from the wrong byte. Real-world fonts produced by standard tooling always have first offset = 1.
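Reading the first offset of a CFF INDEX takes only a few bytes of parsing. The sketch below assumes the CFF (version 1) INDEX layout: a 2-byte big-endian count, a 1-byte offSize, then count+1 offsets of offSize bytes each. The function name is hypothetical.

```python
from typing import Optional


def cff_index_first_offset(data: bytes, pos: int = 0) -> Optional[int]:
    """Return the first offset of the INDEX at pos, or None if empty or truncated."""
    if pos + 2 > len(data):
        return None
    count = int.from_bytes(data[pos:pos + 2], "big")
    if count == 0:
        return None  # an empty INDEX consists of the count field only
    if pos + 3 > len(data):
        return None
    off_size = data[pos + 2]
    if not 1 <= off_size <= 4 or pos + 3 + off_size > len(data):
        return None
    return int.from_bytes(data[pos + 3:pos + 3 + off_size], "big")
```

A caller would flag the INDEX when the returned value is neither None nor 1.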
CFF font header is truncated or malformed medium PDF_CFF_HEADER_TRUNCATED
CFF font header is structurally invalid: header size out of range, or header runs past the stream length.
Different font rasterisers handle truncated CFF headers differently — some abort, some pad with zeros and continue — leading to divergent glyph rasterisation between scanner and viewer.
Cracked-software link-farm lure medium PDF_CRACKED_SOFTWARE_LURE
PDF links advertise cracked/pirated software (crack, keygen, serial key, warez).
These PDFs are SEO-spam carriers: they pack many clickable links whose targets use software-piracy vocabulary so the document ranks for '<app> crack download' searches, then route users to fake crack pages serving potentially unwanted programs, adware, or droppers. The rule fires on several distinct links carrying piracy-specific tokens in the URL itself, so ordinary documents that merely mention the word are not affected. The PDF carries no exploit of its own; the risk lies in the linked destinations, so the verdict is capped at suspicious.
Escaped URL shortener medium PDF_ESCAPED_SHORTENER_URI
PDF hides a clickable URL-shortener destination with PDF string escapes.
PDF literal strings can legally encode punctuation with octal escapes, but pairing that obfuscation with a URL shortener is a stronger phishing signal than a normal visible link. Attackers use this to defeat simple URL extractors while hiding the final landing page behind a redirector.
High stream count medium PDF_MANY_STREAMS
PDF contains 500+ stream objects.
An abnormally high stream count may indicate heap spraying (filling memory with repeated data) or heavy obfuscation of the PDF structure. The threshold is set at 500 to avoid flagging legitimate technical or textbook PDFs.
ICC profile contains a duplicate tag signature medium PDF_ICC_DUPLICATE_TAG_SIG
Same ICC tag signature appears more than once in the tag table.
Per spec each signature must be unique. Implementations that take the first vs. the last copy produce different colour transforms — a shape used to hide an attacker-chosen mAB./mBA. pipeline behind a benign-looking earlier entry.
ICC profile declares an implausibly large tag count medium PDF_ICC_TAG_COUNT_HUGE
ICC profile declares more than ~256 tag entries; real-world profiles have at most a few dozen.
Implausibly large tag counts are consistent with hand-crafted profiles designed to exhaust allocator state or trigger integer overflows in the tag-table indexing arithmetic.
ICC profile size disagrees with embedded length medium PDF_ICC_SIZE_MISMATCH
ICC profile header field 'profile size' does not match the actual length of the embedded profile bytes (or the tag table extends past the bytes available).
Colour-management stacks that trust the header field and those that trust the container length read different bytes. The CVE-2018-4990 family of ICC-parser bugs lived in exactly this disagreement.
ICC tag has size=0 with non-zero offset medium PDF_ICC_TAG_ZERO_SIZE_NONZERO_OFFSET
ICC tag declares zero data length but a non-zero offset.
Implementations that prefetch the offset before checking size read out-of-bounds bytes. Real-world profiles either use zero/zero or non-zero/non-zero — the mismatched form is consistent with adversarial construction.
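The four ICC rules above can be sketched in one pass over the header and tag table. The sketch assumes the standard ICC layout (profile size as a big-endian 32-bit value at byte 0, tag count at byte 128, then 12-byte entries of signature, offset, and size) and takes the 256-tag threshold from the description above. The function name and labels are hypothetical.

```python
import struct
from typing import List


def check_icc_profile(profile: bytes) -> List[str]:
    """Return anomaly labels for an embedded ICC profile."""
    if len(profile) < 132:
        return ["too_short"]
    findings = []
    (declared_size,) = struct.unpack_from(">I", profile, 0)
    if declared_size != len(profile):
        findings.append("size_mismatch")
    (tag_count,) = struct.unpack_from(">I", profile, 128)
    if tag_count > 256:
        findings.append("tag_count_huge")
    if 132 + 12 * tag_count > len(profile):
        # The tag table runs past the bytes available; do not walk it.
        if "size_mismatch" not in findings:
            findings.append("size_mismatch")
        return findings
    seen = set()
    for i in range(tag_count):
        sig, offset, size = struct.unpack_from(">4sII", profile, 132 + 12 * i)
        if sig in seen and "duplicate_tag_sig" not in findings:
            findings.append("duplicate_tag_sig")
        seen.add(sig)
        if size == 0 and offset != 0 and "tag_zero_size_nonzero_offset" not in findings:
            findings.append("tag_zero_size_nonzero_offset")
    return findings
```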
JBIG2 segment-header walk aborted medium PDF_JBIG2_HEADER_TRUNCATED
JBIG2 stream segment-header walk aborted before reaching the end of the stream.
A JBIG2 segment header could not be parsed cleanly: a length field claimed more bytes than the stream contains, a referred-to-segment count was truncated, or a reserved code was used. Renderers that fail open on broken headers parse different segment data than renderers that abort.
JBIG2 stream contains an implausibly large number of segments medium PDF_JBIG2_HUGE_SEGMENT_COUNT
JBIG2 stream declares thousands of segments where real-world scanned-document JBIG2 typically contains tens to a few hundred.
Hand-crafted JBIG2 streams designed to stress the segment-graph allocator or refcounter are a recurring exploit primitive. On its own this is a weak signal but it contributes when paired with other JBIG2 anomaly rules.
JBIG2Decode filter medium PDF_JBIG2
PDF uses JBIG2Decode image compression.
JBIG2 is a complex image codec. Vulnerabilities in JBIG2 decoders have been exploited in high-profile zero-click attacks (e.g. NSO Group's FORCEDENTRY, CVE-2021-30860).
OpenType / sfnt directory declares too many tables medium PDF_OPENTYPE_NUMTABLES_HUGE
sfnt offset table declares more than ~64 tables, well beyond any realistic font.
Allocators sized from the numTables field are an exploit primitive. Real OpenType / TrueType fonts contain on the order of 10–25 tables.
OpenType cmap declares too many subtables medium PDF_OPENTYPE_CMAP_SUBTABLES_HUGE
cmap declares an implausibly large number of subtables.
Very large subtable counts stress parser loops and allocations.
OpenType cmap table is truncated medium PDF_OPENTYPE_CMAP_TRUNCATED
cmap header or encoding records extend past the table.
Malformed character-map tables can make different text/glyph paths disagree.
OpenType directory contains a duplicate table tag medium PDF_OPENTYPE_DUPLICATE_TABLE
sfnt directory contains the same 4-byte table tag more than once.
Renderers that take the first record vs. the last record produce different glyph rasterisation. Duplicated table tags are not produced by any standard font tooling.
OpenType maxp declares implausibly many glyphs medium PDF_OPENTYPE_MAXP_GLYPHS_HUGE
maxp.numGlyphs is far beyond typical embedded PDF fonts.
Huge glyph counts stress loca/glyf allocation and iteration paths.
OpenType name string offset out of range medium PDF_OPENTYPE_NAME_STRING_OUT_OF_RANGE
A name record points outside name table string storage.
Out-of-range name strings are a low-level font table consistency violation.
OpenType name table declares too many records medium PDF_OPENTYPE_NAME_RECORDS_HUGE
The name table record count is implausibly large.
Huge name record counts stress table-walking logic.
OpenType name table is truncated medium PDF_OPENTYPE_NAME_TRUNCATED
name records or string storage point outside the name table.
Malformed name tables are useful parser-divergence evidence when paired with other font anomalies.
PDF paints image(s) but contains no text operators medium PDF_IMAGE_ONLY_LURE
PDF has at least one image XObject and zero text-emitting operators in raw or decompressed content streams.
Phishing PDFs are often built by exporting a screenshot to PDF: a single page with one or more image XObjects and no text. The carrier evades text-based scanners (no keywords to match) and delivers its call-to-action purely through rendered pixels — a phone number to call, a QR code to scan, or a link the user is told to type. Distinct from PDF_IMAGE_LURE, which requires a small file and an in-PDF click-action; this rule has neither constraint and looks inside compressed content streams to avoid false positives on real text-bearing PDFs.
Raw-IP clickable URI medium PDF_URI_IP_LITERAL
PDF clickable URI points to a literal IPv4 address.
Legitimate PDFs normally link to named domains. Clickable HTTP(S) links to raw IP addresses are common in disposable phishing and malware-delivery infrastructure, especially when paired with link annotations or screenshot lures.
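A minimal sketch of the raw-IP test, using only the Python standard library, is shown below. The function name is hypothetical. It covers dotted-quad IPv4 hosts in HTTP(S) URLs only, matching the rule description above, and makes no claim about how the analyzer extracts URIs from the PDF.

```python
import ipaddress
from urllib.parse import urlsplit


def uri_targets_ipv4_literal(uri: str) -> bool:
    """True if uri is an http(s) URL whose host is a literal IPv4 address."""
    try:
        parts = urlsplit(uri)
        if parts.scheme.lower() not in ("http", "https") or not parts.hostname:
            return False
        # Raises a ValueError subclass for anything that is not dotted-quad IPv4.
        ipaddress.IPv4Address(parts.hostname)
    except ValueError:
        return False
    return True
```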
Remote GoTo action medium PDF_GOTO_REMOTE
PDF references a remote or embedded document via GoToR/GoToE.
GoToR/GoToE can open another PDF or trigger loading of a remote resource, potentially bypassing security controls by chaining documents.
Stream /Length disagrees with actual byte count medium PDF_LENGTH_MISMATCH
PDF stream object declares a /Length that does not match the actual bytes between 'stream' and 'endstream'.
Different PDF readers resolve stream length either from the declared /Length value or from the 'endstream' framing markers. When the two disagree, the same file renders as different content in different readers — a known evasion shape used to hide payload from static scanners that trust one source while the actual reader trusts the other.
Stream advertises a filter that cannot decode the body medium PDF_FILTER_CHAIN_UNDECODABLE
PDF stream declares /Filter /FlateDecode but the raw stream bytes are rejected by zlib in both wrapped and raw modes.
A renderer that aborts on the broken stream and one that fails open see different document content. Targeted samples sometimes deliberately break the filter chain so that lighter-weight scanners skip the stream while heavier renderers still extract a payload from the partial data.
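The "both wrapped and raw modes" test can be sketched with Python's zlib module by trying a zlib-wrapped decode first and a headerless raw-deflate decode second. The function name is hypothetical, and the sketch only reports whether either mode accepts the bytes without error.

```python
import zlib


def flate_body_decodes(raw: bytes) -> bool:
    """True if raw decodes as zlib-wrapped or as headerless raw deflate."""
    # Positive wbits expects a zlib header; negative wbits means raw deflate.
    for wbits in (zlib.MAX_WBITS, -zlib.MAX_WBITS):
        try:
            zlib.decompressobj(wbits).decompress(raw)
            return True
        except zlib.error:
            continue
    return False
```

A streaming decompressor is used here because it tolerates a stream that is merely cut short, so only bodies rejected outright in both modes are flagged.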
String.fromCharCode medium PDF_FROMCHARCODE
String.fromCharCode found in PDF JavaScript.
fromCharCode constructs strings from numeric character codes. Exploit authors use it to build payloads character by character to evade string-based detection.
SubmitForm action medium PDF_SUBMITFORM
PDF has a /SubmitForm action that can POST data to an external URL.
SubmitForm actions send form field data to a URL when triggered — potentially exfiltrating credentials, file paths, or system information to an attacker-controlled server. Caveat: legitimate PDF forms (government applications, grant submissions, enterprise surveys) do use /SubmitForm to post data to known servers. Check the target URL context before escalating this finding.
U3D stream contains an implausibly large number of blocks medium PDF_U3D_HUGE_BLOCK_COUNT
U3D stream contains thousands of blocks where real-world files typically contain at most a few hundred.
Hand-crafted U3D streams designed to stress the modifier-chain allocator or refcounter are a recurring exploit primitive on the U3D parser surface. This is a weak signal on its own; it contributes when paired with other 3D-content anomaly rules.
URL shortener link medium PDF_URL_SHORTENER_URI
PDF clickable URI points to a URL shortener.
Clickable URL-shortener links hide the final landing page from static review and are common in phishing redirect PDFs. This is stronger than a generic external URI because the visible destination is an intermediate redirect service rather than the actual site.
AcroForm button with action trigger low PDF_ACROFORM_BUTTON
PDF contains a /Btn form field paired with a SubmitForm/URI/Launch/JS trigger.
Large interactive form buttons are the common 'fake download button' in phishing PDFs. Attackers overlay a /Btn field on a screenshot of a legitimate document to create the illusion of a clickable download link. /Btn appears in essentially every fillable PDF form, so the rule only fires when paired with a remote-action trigger — the actual phishing-button shape.
Embedded file low PDF_EMBEDDED
PDF embeds a file attachment.
Embedded files can carry executables, scripts, or other malware. While some legitimate PDFs include attachments, this warrants inspection.
Image-only document (screenshot lure) low PDF_IMAGE_LURE
PDF contains many images but very few text blocks — possible screenshot lure.
A common phishing technique renders a screenshot of a legitimate document (e.g. a locked Word file, a DocuSign request) as a full-page image with no real text content, then overlays a form button or URI action on the image. Caveat: this heuristic has a high false-positive rate. Scanned documents (contracts, invoices, IDs), image-heavy brochures, and photo PDFs all trigger it legitimately. Use this finding only as supporting context alongside higher-severity indicators.
Indirect reference to undefined object low PDF_DANGLING_INDIRECT
PDF body contains an indirect reference (N G R) to an object number that is never defined in the file.
Lenient readers silently skip dangling references; strict readers may treat the slot as null and follow a different code path. On its own this is often just a corruption artefact — older scanner output and damaged PDF/A files commonly trip it — but it contributes weakly when paired with stronger anomaly signals.
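A byte-level approximation of this check is sketched below: collect every "N G obj" definition and every "N G R" reference, then report referenced object numbers that are never defined. The names are hypothetical, and the sketch ignores objects stored inside compressed object streams, so it would over-report on PDFs that use them.

```python
import re

_DEF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")
_REF = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+R\b")

def dangling_references(pdf: bytes) -> set[int]:
    """Return object numbers that are referenced (N G R) but never defined (N G obj)."""
    defined = {int(m.group(1)) for m in _DEF.finditer(pdf)}
    referenced = {int(m.group(1)) for m in _REF.finditer(pdf)}
    return referenced - defined
```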
Optional Content Group with action trigger low PDF_OPTIONAL_CONTENT
PDF uses Optional Content Groups (OCG) and contains an action trigger.
Optional Content Groups allow parts of a PDF to be shown or hidden. Attackers abuse this to show lure content on first open then hide it (defeating sandbox screenshots) while the action trigger still fires. OCGs alone are standard in CAD/technical, multilingual, and layered PDFs, so this rule only fires when an action trigger is also present.
XFA form low PDF_XFA
PDF uses XML Forms Architecture (XFA).
XFA forms can contain JavaScript and complex logic. Vulnerabilities in XFA parsers have been exploited in the past.
syncAnnotScan annotation-staging primitive low PDF_FOXIT_SYNCANNOTSCAN
PDF JavaScript calls syncAnnotScan() — an exploit-kit staging primitive used to force annotation enumeration before reading payload bytes from /Subject fields.
syncAnnotScan() is a legitimate no-argument Acrobat / Foxit JavaScript API that ensures all annotation objects are populated before getAnnots() is called. It is not a vulnerable sink and has no associated CVE. However, exploit-kit JavaScript routinely calls it as a staging step in the pattern 'z.syncAnnotScan(); var p = y.getAnnots({nPage:0}); var s = p[0].subject; ... eval(s)' — where the encoded payload was hidden in annotation /Subject fields. A bare call rarely appears in legitimate PDFs, so it is a low-severity exploit-kit-shape indicator on its own; combined with getAnnots() + subject reads + eval, the related rule PDF_JS_OBFUSCATED_DROPPER fires the high-severity composite finding.
Body-only duplicate object in PDF info PDF_DUPLICATE_OBJ_BODY_INCREMENTAL
Same indirect object (N G) is defined more than once with different body bytes.
Body-only duplicate objects are common in benign incremental updates and PDF editor save chains. The analyzer records the structure for explainability, but it is not treated as an unknown-exploit signal unless a duplicate body carries active content or divergent filters.
CFF CharString operand stack grows beyond spec info PDF_CFF_CHARSTRING_STACK_OVERFLOW
Type 2 CharString bytecode pushes more operands than the interpreter stack should hold.
CFF/Type 2 CharStrings are stack bytecode. Operand-stack overflow is a recurring font parser bug class and a strong structural exploit signal.
Encrypted document info PDF_ENCRYPTED
PDF declares /Encrypt — strings and stream contents are encrypted.
PDF document encryption applies the standard security handler's cipher (RC4 or AES) to all strings and stream contents. Dictionary keys (/JavaScript, /Filter, etc.) remain visible, but the string and stream data they point to does not. Many legitimate documents are encrypted (signed contracts, billing statements, rights-managed material); on its own this is informational, but it limits what the static scanner can see.
External URI info PDF_URI
PDF contains an external URL action.
The PDF links to an external website. While common in legitimate PDFs, malicious PDFs use URLs to redirect to phishing sites or malware downloads.
Object defined twice with different bodies info PDF_DUPLICATE_OBJ_DIVERGENT_BODY
Same indirect object (N G) is defined more than once with different body bytes.
Duplicate object bodies create first-wins versus last-wins parser divergence even when the /Filter chains look identical.
PDF differential parser failed info PDF_DIFFERENTIAL_PARSE_FAILED
The cross-check parser (pdfminer.six) raised an error on this file.
The analyzer cross-checks its in-process byte-level PDF scan against an independent pdfminer.six pass to surface parser-divergence exploit shapes. A pdfminer error here is itself a signal — malformed PDFs (corrupted xref, divergent duplicate objects, broken object streams) often deliberately defeat one parser while remaining renderable in another, which is the basis of several real-world PDF exploitation primitives. The static byte-level heuristics still ran on the file and their findings above are valid; only the differential cross-check signal is missing.

Office 133

Dangerous XLM formula APIs critical OOXML_XLM_DANGEROUS_FN
Excel 4.0 macro sheet uses formula APIs that call directly into Win32.
=CALL / =EXEC / =REGISTER / =FORMULA / =FOPEN — these are the primitives used by XLM-based droppers to download payloads, write files, and start processes without invoking VBA.
Embedded Adobe Flash (SWF) in Office document critical OFFICE_EMBEDDED_SWF
Office document contains an embedded SWF (Flash) object.
Vulnerabilities such as CVE-2018-4878 and CVE-2018-15982 involved Flash objects embedded in Office files. Adobe Flash has been end-of-life since December 2020.
Embedded Office document static findings critical EMBEDDED_OFFICE_CHILD_STATIC_TRIAGE
A carved embedded OLE Office document matched exploit or payload heuristics.
Some exploit samples are wrapped by a PE, binder, or other outer container while preserving a complete CFB/OLE Office document at a later offset. The engine carves that secondary Office body, runs the normal Office static rules on it, and promotes concrete CVE or high-risk child findings onto the parent sample.
Embedded Office object carries macros critical OFFICE_EMBEDDED_MACRO_OBJECT
An embedded OLE/OOXML object is itself an Office file that contains a VBA macro project or an Excel 4.0 (XLM) macro sheet.
Hiding a macro-bearing workbook or document inside another document — often under an obfuscated, non-standard part name — is a macro-smuggling technique that defeats scanners which only inspect the outer document's macro storage. No benign authoring workflow stages a hidden macro project this way, so an embedded Office file with its own macros is a strong delivery-vehicle indicator.
Embedded PE executable critical OLE_EMBEDDED_EXE
MZ/PE header found inside the document.
A Windows executable (PE file) is embedded inside the document. This is high-risk — the document is carrying an executable payload.
Encrypted Office package with CFB FAT corruption critical OLE_ENCRYPTED_AND_MALFORMED
Encrypted-package shape co-occurs with FAT-chain corruption — the canonical combined evasion form.
An OLE container that is both password-encrypted at the MS-OFFCRYPTO layer and structurally malformed at the FAT-chain level is the canonical evasion shape used to deliver exploit-carrier Office documents past email and gateway scanners. Each signal alone has some benign-occurrence rate; the combination has effectively no legitimate explanation — Excel opens the file because both its FAT walker and its encrypted-package loader are lenient, and that asymmetric tolerance against strict static scanners is the point.
Equation Editor FONT overflow — CVE-2017-11882 critical CVE_2017_11882
MTEF FONT record contains an overlong typeface field.
CVE-2017-11882 is the Equation Editor stack buffer overflow triggered by copying an overlong FONT typeface field into a fixed-size stack buffer in EQNEDT32.EXE. This rule requires the malformed MTEF record primitive, not just the Equation Editor CLSID.
Equation Editor OLE object critical OLE_EQUATION_EDITOR
Equation Editor OLE CLSID found in the document.
The Microsoft Equation Editor component had critical vulnerabilities (CVE-2017-11882, CVE-2018-0802) that allowed arbitrary code execution. Microsoft later removed the component. Finding its CLSID is a high-signal indicator, but the CLSID alone is related evidence rather than a full exploit match.
Equation Editor command stager — CVE-2017-11882 likely critical CVE_2017_11882_EQUATION_NATIVE_CMD
Equation Native stream has invalid MTEF structure and embedded command-launch bytes.
This rule requires Equation Editor CLSID context, an invalid Equation Native/MTEF header, and process-launch command bytes inside the native stream. That combination is a weaponized Equation Editor exploitation pattern consistent with CVE-2017-11882 while avoiding attribution from CLSID presence alone or from benign embedded equations.
Equation Editor object carries Ole10Native downloader shellcode critical OLE_EQUATION_OLE10NATIVE_DOWNLOADER
Equation Editor OLE object contains Ole10Native shellcode with download and process APIs.
An embedded OLE object declares the legacy Equation Editor CLSID and its Ole10Native stream contains URLDownloadToFile plus process-launch API strings and a remote URL. This is high-confidence exploit payload evidence. It is not assigned to a specific Equation Editor CVE unless the malformed Equation Native/MTEF primitive also matches.
Excel 5 Laroux/Larou-CV macro virus critical OLE_XLS5_LAROUX_MACRO_VIRUS
Legacy Excel workbook contains Laroux/Larou-CV auto-open replication markers.
Laroux-family Excel 5/95 macro viruses infect workbooks by auto-running legacy VBA and copying macro modules into other workbooks. This rule requires a family marker such as laroux, Larou-CV.xls, or big_dork together with auto_open and workbook/module replication strings, so ordinary legacy Excel files and textual references are not enough to trigger it.
Field QUOTE with ASCII-integer payload critical OOXML_FIELD_QUOTE_ASCII_PAYLOAD
A Word field QUOTE expression contains a decimal-ASCII byte sequence. The decoded payload is emitted at field-update time and is typically used to assemble shell-command text that does not appear literally in the document bytes.
Word's QUOTE field accepts a list of decimal byte values and emits them as text when the field is evaluated. Threat actors use this to defeat content-based filters that look for literal 'cmd'/'powershell' strings in document bytes: the dangerous string only exists after Word evaluates the field. When a SET/REF field chain references the QUOTE output from a DDE field, the resulting command runs on document open (MITRE ATT&CK T1559.002). Severity escalates to CRITICAL when the decoded payload references a known-dangerous executable (cmd, powershell, mshta, etc.) and is MEDIUM otherwise, because the construct has no legitimate use case even when no dangerous target is immediately visible.
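The decode-then-grade step can be sketched like this. The function names and the short executable list are assumptions for illustration (the list is only the subset named in the rule text), and the sketch handles a bare field instruction rather than a full SET/REF chain.

```python
import re

# Subset of the executables named in the rule text; illustrative only.
DANGEROUS = ("cmd", "powershell", "mshta")

def decode_quote_field(instr: str):
    """Decode 'QUOTE 99 109 100 ...' to text; None if not a pure decimal list."""
    m = re.fullmatch(r"\s*QUOTE((?:\s+\d{1,3})+)\s*", instr, re.IGNORECASE)
    if not m:
        return None
    values = [int(v) for v in m.group(1).split()]
    if any(v > 255 for v in values):
        return None  # not a byte sequence
    return bytes(values).decode("latin-1")

def quote_severity(decoded: str) -> str:
    """Grade the decoded payload the way the rule text describes."""
    low = decoded.lower()
    return "critical" if any(name in low for name in DANGEROUS) else "medium"
```

For instance, the instruction `QUOTE 99 109 100 46 101 120 101` decodes to `cmd.exe`, which this sketch grades as critical.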
LOLBin reference in VBA critical OLE_VBA_LOLBIN
VBA macro references a Living-off-the-Land binary (certutil, bitsadmin, mshta).
LOLBins are legitimate Windows tools that attackers misuse for malicious purposes — downloading files, executing scripts, or decoding payloads — because they are trusted by security software.
Legacy Excel formula macro virus marker critical OLE_XLS_FORMULA_MACRO_VIRUS
Workbook contains self-identifying legacy Excel formula macro virus strings.
Older Excel malware sometimes used worksheet formulas and hidden workbook content rather than VBA projects or modern XLM macro-sheet structures. This rule is intentionally narrow: it requires explicit formula-macro-virus markers such as XF.Classic/Poppy text in the Workbook stream, so documents that merely contain ordinary formulas are not flagged.
Legacy XLM macro-virus family marker critical OLE_XLM_LEGACY_MACRO_VIRUS
Workbook contains an XLM Auto_Open chain plus legacy macro-virus family strings.
Legacy Excel macro viruses commonly infected workbooks by adding Excel 4.0 macro sheets and Auto_Open/Auto_Close defined names. This rule requires that auto-execution structure plus specific family strings such as XL4Poppy, Normal_MacroVirus, or HPDung, keeping it narrower than a generic XLM-macro finding.
MSHTML-style external object relationship critical CVE related OFFICE_MSHTML_EXTERNAL_OBJECT
OOXML external relationship targets HTML/CAB/MHTML/HTA-style content.
This is an Office MSHTML attack-surface indicator related to CVE-2021-40444-style delivery, but it does not match the stricter external OLEObject gadget pattern used for the CVE_2021_40444 exact rule.
Malicious DDE command critical OOXML_DDE_MALICIOUS
A DDE field instruction launches a dangerous system executable (cmd.exe, PowerShell, mshta, etc.).
This document uses DDE to silently execute a system command. The DDE field references a known-dangerous executable such as cmd.exe, powershell.exe, mshta.exe, or a UNC path. When the user opens the document and agrees to 'update links' (or, with DDEAUTO, when Office attempts the update automatically on open, subject to version and policy), the command runs immediately. This is a well-known attack technique (MITRE ATT&CK T1559.002) that bypasses macro security entirely: no macros need to be enabled.
Microsoft PowerPoint malformed record — CVE-2006-0022 critical CVE likely CVE_2006_0022
PowerPoint OLE Pictures stream is malformed and carries a PE-like payload.
CVE-2006-0022 is a Microsoft PowerPoint remote-code-execution vulnerability triggered by a malformed PowerPoint record. Weaponized samples from this era commonly hide executable payload bytes in the Pictures stream while damaging the stream's compound-file chain so tolerant PowerPoint parsing reaches bytes that ordinary OLE stream readers do not expose. This rule requires a PowerPoint OLE container, a large malformed Pictures stream, image-record material, and an embedded PE-like payload to limit false positives.
Microsoft PowerPoint malformed record — CVE-2006-3877 critical CVE likely CVE_2006_3877
PowerPoint OLE numbered Table stream is malformed and carries a PE-like payload.
CVE-2006-3877 is a Microsoft PowerPoint malformed-record memory corruption vulnerability. This rule requires a PowerPoint OLE container, a large numbered *Table stream whose CFB chain cannot be read normally, and an embedded PE-like payload with process injection or Office-resiliency cleanup strings. Those gates avoid tagging ordinary legacy decks that merely contain table streams.
Microsoft PowerPoint mso.dll malformed shape — CVE-2006-3590 critical CVE likely CVE_2006_3590
PowerPoint Pictures stream contains malformed shape-container material and shellcode.
CVE-2006-3590 is the MS06-048 Microsoft PowerPoint mso.dll remote-code-execution vulnerability triggered by a malformed shape container in a PPT file. In observed PPDropper-style decks, the Pictures stream begins with malformed Escher/shape material and carries PEB/API-resolver shellcode or a PE-like payload. The rule requires PowerPoint stream context plus specific shellcode/payload evidence to avoid ordinary picture stream false positives.
Microsoft Word malformed object pointer — CVE-2006-2492 critical CVE likely CVE_2006_2492
Word OLE object pointers are malformed and unreferenced sectors contain decoded shellcode.
CVE-2006-2492 is the MS06-027 Microsoft Word malformed object pointer vulnerability. This rule requires malformed CFB directory/object-pointer evidence, an impossible WordDocument declared size versus readable stream length, and a rotate-decoded Win32 shellcode payload that manipulates Word Resiliency/StartupItems registry keys.
Microsoft Word malformed string — CVE-2007-3899 critical CVE likely CVE_2007_3899
Word FIB points to a malformed DOP/string-table region with exploit payload evidence.
CVE-2007-3899 is the MS07-060 Microsoft Word malformed-string memory-corruption vulnerability. The rule validates Word OLE context, a Word 97-family FIB, an abnormal INT_MAX run in the DOP/string-table area, inflated text counters, and payload or Mdropper.Z campaign markers before assigning the CVE.
Microsoft Word malformed table SPRM — CVE-2006-6456 critical CVE_2006_6456
WordDocument contains a malformed table border-color SPRM cluster.
CVE-2006-6456 is a Microsoft Word 2000/2002/2003 and Word Viewer remote-code-execution vulnerability caused by malformed Word data structures. Exploit documents corrupt a table-formatting SPRM cluster, for example replacing a normal sprmTBrc*Cv record with an invalid 0xFF high-byte SPRM immediately after valid table border/color SPRMs.
Microsoft Word record parsing — CVE-2008-2244 critical CVE likely CVE_2008_2244
Word OLE document has malformed-record exploit structure with payload in OLE slack.
CVE-2008-2244 is the MS08-042 Microsoft Word record parsing remote-code-execution vulnerability. Targeted exploit documents from 2008 commonly keep normal-looking WordDocument/table streams and place shellcode or a PE payload in a large unallocated OLE slack region reached after malformed Word record parsing. The rule requires Word stream context, large OLE slack, and concrete payload bytes.
OOXML autoload OLE object target is missing critical OOXML_MISSING_AUTOLOAD_OLEOBJECT
Spreadsheet declares an auto-loaded OLE object, but the referenced embedded OLE part is absent.
Excel is instructed to activate an embedded OLE object through an `oleObject` relationship and worksheet declaration, but the target part is missing from the ZIP. When this co-occurs with autoLoad, VML shape context, or a random-looking ProgID, it is a high-confidence payload-stripped or malformed OLE activation carrier. This is not a specific CVE attribution unless the embedded OLE payload is present and contains a recoverable vulnerable-parser primitive.
Obfuscated VBA Shell command with URL critical OLE_VBA_OBFUSCATED_SHELL_URL
VBA macro builds a Shell command through decoder/string functions and includes a URL.
This rule requires a Shell invocation, decoder or string-building functions such as RC4String, Chr, StrReverse, Replace, Split, or Base64-style decoding, and a URL in the same macro source. That compound pattern is typical of downloader macros: the macro hides the command line, then launches it to retrieve or execute a payload.
Obfuscated XLM Auto_Open execution chain critical OLE_XLM_OBFUSCATED_AUTOEXEC_CHAIN
XLM macro sheet auto-executes an obfuscated formula/RUN chain.
Excel 4.0 macro malware commonly hides its command text in formula arithmetic, reconstructs strings with FORMULA(CHAR(...)), stores intermediate values through SET.VALUE / GET.CELL / GOTO, and transfers execution with RUN(). Seeing that chain behind Auto_Open is a high-confidence malicious pattern even when no VBA project exists.
Ole10Native package payload is a download-and-execute script critical OFFICE_PACKAGE_SCRIPT_DROPPER
OLE Package payload contains a script that hosts a shell, fetches a remote resource, and executes it.
An OLE Object Packager payload whose embedded script combines a shell host (PowerShell/WScript/mshta), a network-fetch verb (Invoke-WebRequest/Irm/certutil/http(s) URL), and an execute verb (Start-Process/ShellExecute/-outfile) is a download-and-run dropper. This is a direct user-execution delivery technique (MITRE ATT&CK T1204.002) and is detected on payload content, so it still fires when the package header fields are blanked or shuffled to evade extension-based checks.
PowerShell reference in VBA critical OLE_VBA_PS
VBA macro references PowerShell.
PowerShell is one of the most common tools used in the second stage of macro-based attacks — downloading and executing payloads in memory.
Shell() call in VBA critical OLE_VBA_SHELL
VBA macro calls Shell() function.
The VBA Shell() function executes an external program. Macro malware often uses it to launch payloads such as cmd.exe or PowerShell scripts.
URL reconstructed from XLM cell array critical OOXML_XLM_CELL_ARRAY_URL
Payload URL was reconstructed from numeric cell values across the worksheet, not present as a literal string.
XLM downloaders evade literal-bytes URL extraction by storing each character of the URL — or of an embedded HTA that contains the URL — as the numeric value of an individual cell. The macrosheet's formulas read the cells via CHAR()/&-concat and build the URL only at execution time, so the string is never contiguous in the workbook bytes. URLs surfaced here were recovered by walking the BIFF12 record stream of every worksheet and macrosheet part.
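Once the numeric cell values have been extracted, the reconstruction itself is simple. The sketch below assumes the values are already available as a list in the order the macro reads them (an assumption; the record walk that produces them is not shown), and the function names are hypothetical.

```python
import re

def text_from_cell_values(values) -> str:
    """Map numeric cell values to characters, keeping only printable ASCII codes."""
    chars = []
    for v in values:
        code = int(v)
        if code == v and 32 <= code < 127:  # whole number in the printable range
            chars.append(chr(code))
    return "".join(chars)

def urls_from_cells(values) -> list[str]:
    """Find http(s) URLs in the text rebuilt from the cell values."""
    return re.findall(r"https?://[^\s\"'<>]+", text_from_cell_values(values))
```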
URLDownloadToFile in VBA critical OLE_VBA_DOWNLOAD
VBA macro references URLDownloadToFile API.
This API downloads a file from the internet to disk. It is one of the most common functions in macro droppers that fetch second-stage malware from a remote server.
VBA ActiveX event launches decoded Excel4 macro critical OLE_VBA_ACTIVEX_XLM_STAGER
VBA ActiveX/UserForm event decodes worksheet-cell strings and executes them through ExecuteExcel4Macro.
The macro bridges ActiveX/UserForm event activation into Excel 4.0 macro formula execution. The command text is reconstructed from worksheet cells with Mid/Asc/Chr shifting before being passed to ExecuteExcel4Macro, which is a high-confidence macro stager pattern rather than a specific Office parser CVE.
VBA writes script and launches it through Excel DDE cmd critical OLE_VBA_DDE_CMD_SCRIPT_DROPPER
VBA writes a script-like file and launches it via Excel DDEInitiate with cmd.
The macro creates a .vbe/.vbs/.js/.hta/.bat/.cmd/.ps1 payload on disk and uses Excel DDEInitiate("cmd", ...) to execute it. This is a high-confidence macro execution chain and should be treated as malware even when no Office parser CVE is present.
WScript.Shell usage critical OLE_VBA_WSCRIPT
VBA macro uses WScript.Shell object.
WScript.Shell provides Run and Exec methods that launch commands. Malware creates this COM object to execute system commands or scripts.
XLM Auto_Open environment-evasion close gate critical OLE_XLM_ENVIRONMENT_EVASION_CLOSE
XLM Auto_Open macro runs host-environment checks before showing a fake error and closing.
Malicious Excel 4.0 macros often abort when the workbook appears to be opened in a sandbox, non-Windows host, or reduced UI environment. This rule requires an XLM macro sheet with Auto_Open plus multiple GET.WORKSPACE/GET.WINDOW checks and an ALERT()/CLOSE(FALSE) decoy such as a fake corrupted workbook message. That combination is intentionally narrow and is not normal spreadsheet automation.
XLM payload reassembled from CHAR()/split formulas critical OOXML_XLM_REASSEMBLED_PAYLOAD
WinAPI names, LOLBin commands, or a payload URL were reassembled from per-character CHAR()/string-fragment concatenation inside the macrosheet formulas.
The most evasive Excel 4.0 downloaders never store their payload as a contiguous literal: each WinAPI name, shell command, drop path, or URL is built at runtime by concatenating CHAR(n) calls and one- or two-character string fragments inside the formula token stream (rgce). Literal-bytes and numeric cell-array scanners both miss this. The analyzer parses each formula's rgce, reconstructs the string it builds, and reports it when it resolves to a download/execute kill chain (e.g. URLDownloadToFile, regsvr32, mshta, wmic, a URL). This construct does not occur in benign workbooks.
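The reassembly idea can be illustrated on formula text. This is a simplification: the analyzer is described as parsing the binary rgce token stream, whereas the hypothetical helper below works on an already-decompiled formula string and only understands CHAR(n) calls and quoted fragments.

```python
import re

# Either CHAR(<decimal>) or a double-quoted literal ("" is an escaped quote).
_TOKEN = re.compile(r'CHAR\(\s*(\d+)\s*\)|"((?:[^"]|"")*)"', re.IGNORECASE)

def reassemble(formula: str) -> str:
    """Rebuild the string produced by CHAR(n) calls and quoted fragments joined with '&'."""
    parts = []
    for num, lit in _TOKEN.findall(formula):
        parts.append(chr(int(num)) if num else lit.replace('""', '"'))
    return "".join(parts)
```

For example, `reassemble('=CHAR(109)&"sh"&CHAR(116)&"a"')` returns `"mshta"`, a string that never appears contiguously in the formula.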
Access database masquerading as Office document high ACCESS_MASQUERADE_DROPPER
Jet/Access database uses a document extension and contains macro/dropper strings.
The file is a Microsoft Jet/Access database while using a Word/Excel/PowerPoint-style extension, and contains strings associated with VBA execution or payload dropping. This is a masquerade/dropper vector, not a parser CVE.
ActiveX control high OOXML_ACTIVEX
Document contains ActiveX controls.
ActiveX controls are compiled components that can execute native code. They have a long history of exploitation and are a significant risk in Office documents.
AutoOpen macro high OLE_VBA_AUTOOPEN
Macro with AutoOpen trigger found.
AutoOpen runs automatically when a Word document is opened. Malware uses this to execute malicious code without further user interaction once macros are enabled.
Auto_Close macro high OLE_VBA_AUTOCLOSE
Macro with Auto_Close trigger found.
Auto_Close runs automatically when an Office document closes. Malware uses close-time execution to delay activity until after the user has interacted with the lure or to evade simple open-only sandboxes.
Auto_Open macro high OLE_VBA_AUTO
Macro with Auto_Open trigger found.
Auto_Open is the legacy Excel auto-execute macro. Like AutoOpen in Word, it runs code automatically when the file is opened.
BIFF CONTINUE follows a structurally-incompatible record high OLE_BIFF_CONTINUE_ORPHAN
CONTINUE (0x003C) appears after a BOF, EOF, or at the start of the stream.
Readers that append CONTINUE bodies to the previous record's buffer regardless of compatibility have driven multiple Excel CVEs.
BIFF record body exceeds the 8224-byte spec maximum high OLE_BIFF_RECORD_HUGE
Single non-CONTINUE record body > 8224 bytes (BIFF8 spec maximum).
The legitimate way to ship a payload that big is a CONTINUE chain. An oversized single record is the shape behind several Excel size-parsing bugs.
BIFF record runs past Workbook stream end high OLE_BIFF_RECORD_TRUNCATED
Record's declared body size extends past the stream's last byte.
Excel's record reader can reject the file or, in older versions, copy as many bytes as remain, leaving uninitialised memory in the record buffer. This is a known shape behind several Excel parser CVEs.
BIFF stream ends with unclosed BOF substream high OLE_BIFF_BOF_SUBSTREAM_UNCLOSED
A BOF substream reaches the end of the Workbook stream without a matching EOF.
Without a closing EOF, older or buggy readers may continue parsing with stale substream or parser state.
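The four BIFF structure rules above all fall out of one linear walk over the record headers (2-byte type, 2-byte body size, little-endian). The sketch below is a minimal illustration under that assumption; the function name is hypothetical and the real analyzer's logic may differ.

```python
import struct

BOF, EOF, CONTINUE = 0x0809, 0x000A, 0x003C
MAX_BODY = 8224  # BIFF8 per-record body limit

def biff_anomalies(stream: bytes):
    """Walk BIFF record headers and return (rule_id, offset) findings."""
    findings = []
    pos, prev, depth = 0, None, 0
    while pos + 4 <= len(stream):
        rtype, size = struct.unpack_from("<HH", stream, pos)
        if rtype == CONTINUE and prev in (None, BOF, EOF):
            findings.append(("OLE_BIFF_CONTINUE_ORPHAN", pos))
        if rtype != CONTINUE and size > MAX_BODY:
            findings.append(("OLE_BIFF_RECORD_HUGE", pos))
        if pos + 4 + size > len(stream):
            findings.append(("OLE_BIFF_RECORD_TRUNCATED", pos))
            break  # cannot advance past a truncated record
        if rtype == BOF:
            depth += 1
        elif rtype == EOF and depth:
            depth -= 1
        prev = rtype
        pos += 4 + size
    else:
        # Reached the end of the stream without truncation.
        if depth:
            findings.append(("OLE_BIFF_BOF_SUBSTREAM_UNCLOSED", len(stream)))
    return findings
```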
BIFF workbook contains a defined-name record flood high OLE_BIFF_NAME_RECORD_FLOOD
Workbook contains thousands of BIFF NAME records.
Defined names are formula-bearing BIFF parser inputs. Very large contiguous runs of NAME records are unusual in benign documents and can trigger Excel/BIFF parser stress or corruption bugs in Office and analysis tooling.
CallByName call high OLE_VBA_CALLBYNAME
VBA macro uses CallByName for dynamic method invocation.
CallByName invokes methods dynamically by string name, allowing malware to obfuscate which functions it calls and evade static analysis.
CreateObject call high OLE_VBA_CREATEOBJ
VBA macro calls CreateObject.
CreateObject instantiates COM objects (like WScript.Shell, XMLHTTP, ADODB.Stream) that provide system access. Malware uses these to download files, run commands, or interact with the file system.
DDEAUTO field (auto-execute) high OOXML_DDE_AUTO
A DDEAUTO field instruction was found — it attempts automatic execution or update when the document is opened.
Unlike regular DDE which asks the user to 'update links', DDEAUTO attempts to execute its command automatically when the document opens. Prompts or blocking can still occur depending on Office version, policy, and Protected View state. This technique is catalogued as MITRE ATT&CK T1559.002.
Document_Open macro high OLE_VBA_DOCOPEN
Macro with Document_Open event handler found.
Document_Open is the modern equivalent of AutoOpen: it fires when the document opens and carries the same risk.
EMF rclBounds has negative width or height high OFFICE_EMF_BOUNDS_NEGATIVE
EMF header's rclBounds rectangle has right < left or bottom < top.
Multiple EMF parser bugs (CVE-2017-0108 / CVE-2017-8553 family) have lived behind unchecked dimension arithmetic on rclBounds. A negative width or height drives into integer-overflow paths.
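A minimal sketch of the bounds check, assuming the blob starts with an EMR_HEADER record whose first fields are the record type, the record size, and the rclBounds rectangle (left, top, right, bottom) as little-endian 32-bit values. The function name is hypothetical.

```python
import struct

EMR_HEADER = 1

def emf_bounds_negative(blob: bytes) -> bool:
    """True if the EMF header's rclBounds has right < left or bottom < top."""
    if len(blob) < 24:
        return False  # too short to hold type, size and rclBounds
    rtype, _size, left, top, right, bottom = struct.unpack_from("<IIiiii", blob, 0)
    if rtype != EMR_HEADER:
        return False
    return right < left or bottom < top
```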
EMF record extends past blob end or has invalid size high OFFICE_EMF_RECORD_TRUNCATED
EMR record's size field runs past available bytes, is < 8, or is not 4-byte aligned.
EMF records are required to be 4-byte aligned with size >= 8 bytes. Readers that advance by `nSize` without checking misalign their record pointer or read attacker-controlled bytes from the gap.
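The three conditions in this rule can be checked in a single pass over the record headers. The sketch below assumes each record begins with a 4-byte type followed by a 4-byte size that counts the whole record; the function name is illustrative, not the analyzer's.

```python
import struct

def first_bad_emf_record(blob: bytes):
    """Return (offset, nSize) of the first structurally invalid record, or None."""
    pos = 0
    while pos + 8 <= len(blob):
        _rtype, nsize = struct.unpack_from("<II", blob, pos)
        # Size must cover the 8-byte header, be 4-byte aligned, and stay in bounds.
        if nsize < 8 or nsize % 4 != 0 or pos + nsize > len(blob):
            return pos, nsize
        pos += nsize
    return None
```

Stopping at the first bad record is a deliberate choice: once a size field is untrustworthy, later offsets are no longer meaningful.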
Encrypted Office package with non-block-aligned cipher high OFFICE_ENCRYPTED_PACKAGE_MALFORMED
EncryptedPackage cipher body is not a multiple of 16 bytes, violating the AES block-alignment requirement in [MS-OFFCRYPTO] §2.3.4.4.
The AES-CBC/ECB cipher used by Office Standard Encryption requires the cipher body (after the 8-byte declared-size header) to be a multiple of 16 bytes. Excel itself tolerates the misalignment by truncating to the last full block; most strict decryption tools (including antivirus/EDR scanners that introspect inner content) reject the file outright. This asymmetric tolerance is a deliberate evasion shape: the document opens normally in Office but defeats static inspection that depends on a successful decrypt-and-rescan.
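The alignment test itself is a single modulus, sketched below on the raw EncryptedPackage stream bytes. The constant and function names are hypothetical; the 8-byte header and 16-byte block size are taken from the description above.

```python
AES_BLOCK = 16
SIZE_HEADER = 8  # declared plaintext size precedes the cipher body

def cipher_body_misaligned(encrypted_package: bytes) -> bool:
    """True if the bytes after the 8-byte size header are not a multiple of 16."""
    if len(encrypted_package) < SIZE_HEADER:
        raise ValueError("stream shorter than the 8-byte size header")
    return (len(encrypted_package) - SIZE_HEADER) % AES_BLOCK != 0
```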
Excel 4.0 (XLM) macro / Auto_Open high OLE_XLM_AUTOOPEN
OLE workbook contains an Excel 4.0 macro sheet, optionally with Auto_Open/Close.
Excel 4.0 (XLM) macros were a major Office malware vector during 2020-2022 and evaded many VBA-focused controls before Microsoft tightened XLM defaults. An Auto_Open / Auto_Close defined name combined with a macro-sheet sub-stream is the common XLM auto-execution shape used by families such as Emotet and QakBot.
Excel 4.0 (XLM) macro sheet high OOXML_XLM_MACROSHEET
Spreadsheet contains an xl/macrosheets/sheet*.xml part.
The presence of this part is a definitive structural indicator of Excel 4.0 macros. XLM is rarely seen in modern legitimate workbooks and was a major Office malware vector during 2020-2022.
External OLE object relationship high OOXML_EXTERNAL_OLE_OBJECT
OOXML oleObject relationship targets an external HTTP(S) URL.
An external oleObject relationship is stronger than a normal hyperlink: Office resolves it through object/OLE update paths and may fetch remote content when the document is opened or updated. This is the relationship shape used by multiple Office remote-object exploitation and delivery chains, so it should not be reduced to generic external-link evidence.
External relationship high OOXML_EXTERNAL_REL
Document references an external target (URL) in its .rels file.
External relationships can be used for remote template injection — the document loads a macro-enabled template from a remote server when opened, bypassing email attachment filters that block macros.
GetObject call high OLE_VBA_GETOBJ
VBA macro calls GetObject.
GetObject can reference running COM objects or create instances from monikers. It is sometimes used as an alternative to CreateObject to evade detection.
Legacy Flash object embedded in Office document high CVE related OFFICE_LEGACY_SWF_OBJECT
Office document embeds a ShockwaveFlash object with an old SWF version.
The document contains a ShockwaveFlash ActiveX object and an embedded SWF with a legacy version. This is Flash-in-Office exploit-family evidence, but exact Flash CVE attribution requires SWF tag-level validation.
MTEF FONT typeface field exceeds 32 bytes high OLE_MTEF_FONT_NAME_OVERLONG
FONT record's NUL-terminated typeface name is longer than the 32-byte spec maximum.
The CVE-2017-11882 primitive overflows this exact field — readers prior to the patched build copied the entire NUL-terminated string into a 32-byte stack buffer. This rule catches the structural shape regardless of the specific exploit byte pattern.
MTEF MATRIX record has implausible dimensions high OLE_MTEF_MATRIX_ROWCOUNT
MATRIX record declares rows or columns > 64.
CVE-2018-0798 abuses the MATRIX rows/cols fields to drive an OOB write. Real equations contain at most a few dozen rows/cols; values above 64 are the structural shape of the exploit family.
OLE DIFAT chain length or pointer is invalid high OLE_HEADER_DIFAT_ANOMALY
DIFAT extension chain loops, points beyond file end, or its declared length disagrees with the first-sector field.
A crafted DIFAT pointing past EOF is the classic shape behind the Word `cb*` family of CVEs. Real writers always emit a linear, in-range DIFAT.
OLE ObjectPool in file named RTF high OLE_OBJECTPOOL_CONTAINER_DISGUISED_RTF
OLE compound document is named with an .rtf extension and contains ObjectPool storage.
ObjectPool is where Word/OLE compound documents keep embedded-object storages. A file whose basename ends in .rtf but is actually an OLE compound container with ObjectPool is an extension/content mismatch that suggests a disguised Word/OLE container and embedded-object attack surface.
OLE appended executable-looking payload high OLE_APPENDED_PAYLOAD
Large high-entropy bytes beyond declared streams contain shellcode or loader markers.
The OLE file has a sizable high-entropy region after the declared major streams, and that appended region contains PE, shellcode, or loader API indicators. This is a payload-carrier heuristic rather than a CVE-specific attribution, and is gated on both entropy and concrete executable markers to limit false positives.
OLE directory tree contains a cycle high OLE_DIR_CYCLE
CFB directory red/black-tree walk visits the same DirID twice.
A spec-compliant compound file's directory tree is acyclic. Cycles are an encoder-impossible shape that hand-edited containers use to force parsers into infinite recursion or to hide named streams from one walk while still resolving them from another.
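One way to picture the "same DirID twice" test is a walk with a visited set. This is a minimal sketch that assumes the directory has already been parsed into a mapping of DirID to its (left sibling, right sibling, child) pointers; the mapping shape and function name are illustrative.

```python
NOSTREAM = 0xFFFFFFFF  # CFB marker for "no sibling / no child"


def directory_has_cycle(entries: dict[int, tuple[int, int, int]], root: int = 0) -> bool:
    """entries maps DirID -> (left sibling, right sibling, child)."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        dir_id = stack.pop()
        if dir_id == NOSTREAM:
            continue
        if dir_id in seen:
            return True  # the walk reached the same DirID a second time
        seen.add(dir_id)
        if dir_id not in entries:
            continue  # dangling pointer: a different anomaly, ignored here
        stack.extend(entries[dir_id])
    return False
```

In a well-formed tree every entry is reachable by exactly one path, so a repeated visit is enough to flag the container without computing the full cycle.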
OLE document has large unaccounted-for region high OLE_SLACK_ANOMALY
OLE file bytes greatly exceed the sum of declared stream sizes.
Well-formed Office binary documents pack data into named streams with little slack. When the file is dramatically larger than its declared streams (>40% slack and >16 KB of unaccounted bytes), the extra bytes live in unallocated sectors. Pre-macro-era Word/Excel exploits (e.g. CVE-2010-3333, CVE-2014-1761, CVE-2015-2424) commonly hide XOR-encoded shellcode in this region, reached via a parser pointer-corruption bug in the document structure. The rule is a structural anomaly, not a CVE-specific match.
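The two thresholds combine as a simple conjunction. The sketch below assumes the slack percentage is measured against total file size, which the text does not state explicitly; the function and constant names are hypothetical.

```python
SLACK_RATIO = 0.40        # more than 40% of the file is unaccounted for
SLACK_BYTES = 16 * 1024   # and more than 16 KB in absolute terms


def has_slack_anomaly(file_size: int, stream_sizes: list[int]) -> bool:
    """Flag files whose bytes greatly exceed the sum of declared streams."""
    slack = file_size - sum(stream_sizes)
    if file_size <= 0 or slack <= 0:
        return False
    # Assumption: the ratio is slack divided by total file size.
    return slack / file_size > SLACK_RATIO and slack > SLACK_BYTES
```

Requiring both conditions keeps small files with proportionally large but tiny absolute slack from firing.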
OLE metadata lists many Excel 4.0 macro sheets high OLE_XLM_DOCPROPS_MACROSHEET_INVENTORY
OLE workbook metadata lists many MacroN sheet titles with an Excel 4.0 macro-sheet marker.
Encrypted BIFF workbooks can hide XLM formula bodies from static extractors, but the clear DocumentSummaryInformation stream may still expose the workbook sheet inventory. Many MacroN sheet titles plus a BIFF Excel 4.0 macro-sheet marker is a strong XLM malware/evasion signal, especially when FILEPASS encryption is also present.
OLE raw shellcode-like payload high OLE_RAW_SHELLCODE_PAYLOAD
Malformed OLE bytes contain PEB/API-resolver shellcode evidence.
The file-level OLE bytes contain a PEB/API-resolver marker, loader-walk instruction context, and a nearby payload marker such as a NOP sled, MZ bytes, or hash-rotate loop. This is useful for malformed OLE exploit carriers where stream parsing fails, but it is intentionally not a CVE-specific attribution.
OLE sector chain loops or runs past end of FAT high OLE_FAT_CHAIN_LOOP
A stream's sector chain revisits a sector or follows a pointer outside the FAT.
Standard CFB writers never produce loops; readers that follow chains without cycle detection enter an infinite loop or read attacker-controlled bytes after the loop wraps. Encoder-impossible shape behind several pre-2017 Word/Excel container CVEs.
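A chain walk with cycle detection can be sketched as follows, assuming the FAT has been decoded into a list where fat[i] is the sector that follows sector i. The function name is illustrative.

```python
ENDOFCHAIN = 0xFFFFFFFE  # CFB end-of-chain marker


def chain_is_broken(fat: list[int], start: int) -> bool:
    """Return True if the chain loops or follows a pointer outside the FAT."""
    seen: set[int] = set()
    sector = start
    while sector != ENDOFCHAIN:
        if sector < 0 or sector >= len(fat):
            return True  # pointer runs past the end of the FAT
        if sector in seen:
            return True  # sector revisited: the chain loops
        seen.add(sector)
        sector = fat[sector]
    return False
```

Because every sector is recorded before it is followed, the walk terminates after at most len(fat) steps even on hostile input.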
OLE streams share a sector high OLE_FAT_CROSSLINKED
Two different streams' sector chains include the same sector.
Reader divergence shape — depending on which stream is read first, the same bytes are interpreted as different content. Encoder-impossible.
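Cross-linking can be found by inverting the stream-to-sector mapping. This sketch assumes each stream's sector chain has already been walked into a list; the input shape and function name are hypothetical.

```python
def crosslinked_sectors(chains: dict[str, list[int]]) -> dict[int, list[str]]:
    """Map each shared sector number to the streams whose chains include it."""
    owners: dict[int, list[str]] = {}
    for name, sectors in chains.items():
        for sector in set(sectors):  # ignore repeats inside one chain
            owners.setdefault(sector, []).append(name)
    return {s: names for s, names in owners.items() if len(names) > 1}
```

For example, chains of [3, 4, 5] and [5, 6] report sector 5 as shared by both streams.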
OOXML XML part contains a DOCTYPE declaration high OOXML_XML_DOCTYPE_PRESENT
Any XML part inside an OOXML package contains <!DOCTYPE.
Office never emits DOCTYPE in its XML parts. The presence of one is a structural indicator that the package was authored by hand or by tooling outside the Office writer family — frequently as the staging step for an XXE attempt.
OOXML XML part declares an external entity high OOXML_XML_EXTERNAL_ENTITY
An <!ENTITY ... SYSTEM ...> or PUBLIC declaration was found in an XML part.
Pure XML-external-entity (XXE) shape. Office's own parser configuration historically ignored these, but third-party consumers of the same XML (cloud indexing, preview generators) may resolve them and disclose data.
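Both the DOCTYPE and external-entity shapes can be spotted on raw bytes without parsing the XML. The patterns below are a simplified sketch, not the analyzer's actual signatures, and the function name is hypothetical.

```python
import re

DOCTYPE_RE = re.compile(rb"<!DOCTYPE")
# An entity declaration that names a SYSTEM or PUBLIC identifier before its closing '>'.
EXTERNAL_ENTITY_RE = re.compile(rb"<!ENTITY\s+[^>]*?\b(?:SYSTEM|PUBLIC)\b")


def xml_part_flags(xml_bytes: bytes) -> dict[str, bool]:
    """Report DOCTYPE and external-entity declarations in one XML part."""
    return {
        "doctype": bool(DOCTYPE_RE.search(xml_bytes)),
        "external_entity": bool(EXTERNAL_ENTITY_RE.search(xml_bytes)),
    }
```

Scanning bytes rather than parsing avoids handing hostile XML to a parser that might itself resolve the entity.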
OOXML internal relationship escapes the package root high OOXML_REL_TARGET_OUTSIDE_PACKAGE
Internal-mode `<Relationship>` Target uses `..` segments that resolve above the package root.
ZIP path-traversal shape. Some Office configurations honour the literal path before normalisation, giving a drop-anywhere primitive.
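Whether a Target climbs above the package root can be decided by normalising it against the directory of the part that owns the relationship. A minimal sketch, assuming source_dir is that directory relative to the package root (for example "word"); the function name is hypothetical.

```python
import posixpath


def target_escapes_package(source_dir: str, target: str) -> bool:
    """Return True if an internal Target resolves above the package root."""
    target = target.replace("\\", "/")
    if target.startswith("/"):
        joined = target.lstrip("/")          # absolute: relative to package root
    else:
        joined = posixpath.join(source_dir, target)
    resolved = posixpath.normpath(joined)
    # After normalisation, any leftover leading ".." means the path left the root.
    return resolved == ".." or resolved.startswith("../")
```

A target such as "../customXml/item1.xml" from "word" stays inside the package, while "../../evil.dll" does not.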
OOXML oleObject relationship points at a non-OLE target high OOXML_REL_TYPE_TARGET_MISMATCH
Relationship typed as `oleObject` resolves to an HTML/CAB/MHT/scriptlet/HTA target.
The CVE-2021-40444 family abuses exactly this shape: the Type drives Office to load the target through the OLE/MHTML dispatch path, but the target bytes are interpreted as a different format. Generalised from named CVEs so future 0-days in the same family surface.
OOXML relationship graph contains a cycle high OOXML_REL_CYCLE
OPC relationship graph contains a cycle; the graph is expected to be a DAG, so a cycle is encoder-impossible.
Containers using cycles try to force resolution loops or to make a part reachable only from inside its own subtree.
Office EPRINT stream contains EMF object high CVE related OLE_EPRINT_EMF_OBJECT
ObjectPool EPRINT stream contains EMF data.
An OLE ObjectPool EPRINT stream with EMF data is rare in normal documents and is consistent with Office EMF/EPRINT exploit-family delivery. The rule is related-family evidence only; it does not prove the malformed EMF record required for exact CVE attribution.
Ole10Native UI name and on-disk name disagree on type high OFFICE_PACKAGE_DOUBLE_EXT
OLE Package displayName is benign-looking while fullPath/defFile ends in an executable extension.
The user double-clicks what looks like a document, and an executable runs instead. UI-spoofing shape used by package-as-dropper campaigns.
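The name-versus-path comparison can be sketched as an extension check on both fields. The extension set below is illustrative only and is not the analyzer's list; the function name is hypothetical.

```python
import ntpath

# Illustrative subset of directly runnable extensions (an assumption).
EXECUTABLE_EXTS = {".exe", ".scr", ".js", ".vbs", ".hta", ".lnk", ".bat", ".cmd", ".jar"}


def package_name_spoofed(display_name: str, full_path: str) -> bool:
    """True when the on-disk name is runnable but the displayed name is not."""
    shown_ext = ntpath.splitext(display_name)[1].lower()
    actual_ext = ntpath.splitext(ntpath.basename(full_path))[1].lower()
    return actual_ext in EXECUTABLE_EXTS and shown_ext not in EXECUTABLE_EXTS
```

ntpath is used so that Windows-style backslash paths are split correctly regardless of the platform running the scan.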
Ole10Native package carries executable/script file type high OFFICE_PACKAGE_RISKY_FILE
OLE Package displayName, fullPath, or defFile has an executable/script-capable extension.
Office Package objects are commonly used to embed arbitrary files. When the packaged file is directly runnable, such as EXE, JAR, HTA, script, shortcut, installer, or similar content, the document is carrying high-risk delivery payload even if the UI name is not spoofed.
Ole10Native package path contains traversal or UNC root high OFFICE_PACKAGE_PATH_TRAVERSAL
OLE Package filename contains `..\` or `\\host\` traversal sequences.
Some Office versions write the dropped file to the path embedded in the Package, giving the attacker a drop-anywhere primitive.
Remote template injection high OOXML_REMOTE_TEMPLATE
Document loads its template from a remote URL (attachedTemplate / template / frame).
Word can fetch and apply a remote template when the document is opened; macros in that template may execute depending on Office policy, trust state, and Protected View. This is a common remote-template-injection vector used by Hancitor, Emotet, and many phishing campaigns.
VBA p-code auto-exec with execution tokens high OLE_VBA_PCODE_AUTOEXEC_EXEC
Compiled VBA/cache stream contains auto-run and shell/download execution tokens.
Some malicious Office documents keep executable VBA in compiled p-code or cache streams while source extraction fails or returns empty output. Seeing an auto-execution entry point such as Document_Open together with Shell, CreateObject, PowerShell, URLDownloadToFile, or related execution tokens is a strong macro-malware indicator even when decoded source is unavailable.
VBA project has compiled P-code but empty/missing module source high OLE_VBA_PCODE_NO_SOURCE
_VBA_PROJECT stream is substantive but every module-like sibling source stream is empty or absent.
The canonical 'VBA stomping' shape: the renderer executes the compiled P-code while scanners that read source see nothing. Used by post-2018 Office malware campaigns to evade source-based AV.
Word 6/95 legacy binary with executable payload high CVE related WORD6_LEGACY_BINARY_PAYLOAD
Legacy Word binary format carries executable payload markers.
The file starts with a Word 6/95 legacy binary magic and carries embedded executable payload markers. This is MS09-024/CVE-2009-1136-family attack surface evidence, but not an exact CVE attribution without validating the malformed converter record.
Word field-chain (SET/REF) co-located with DDE high OOXML_FIELD_SET_REF_CHAINING
Two or more closed SET/REF variable pairs appear in the same document part as a DDE field — the documented field-chain obfuscation form used to assemble DDE commands from fragments.
Word's SET <name> field defines a variable and REF <name> dereferences it. When ≥2 closed SET/REF pairs co-occur with a DDE field, the typical purpose is to assemble the DDE command string at field-update time so the literal cmd/powershell tokens never appear in the raw document bytes. This is the documented field-chain obfuscation technique (SensePost 2017, MITRE ATT&CK T1559.002) and has no documented benign use.
Workbook_Open macro high OLE_VBA_WBOPEN
Macro with Workbook_Open event handler found.
Workbook_Open runs automatically when an Excel workbook opens. Malicious spreadsheets use this to launch their payload.
XLM Auto_Open with dangerous formula APIs high OLE_XLM_DANGEROUS_FN
XLM auto-exec macro uses formula APIs that can run code or write files.
Excel 4.0 macro formulas such as RUN, CALL, EXEC, REGISTER, FOPEN, FWRITE, FORMULA, and HALT can execute programs, write payloads, or control macro flow without VBA. Paired with Auto_Open, this is a strong malware indicator.
XLM macro uses URL shortener high OLE_XLM_URL_SHORTENER
Excel 4.0 macro sheet contains a URL-shortener target.
URL shorteners are legitimate services, but they are high-signal inside Excel 4.0 macro formulas because XLM malware commonly uses shortened links to obscure the payload host and rotate infrastructure after delivery.
altChunk RTF/HTML injection wrapper, payload part missing high OOXML_ALTCHUNK_INJECTION_STUB
A <w:altChunk> wires an aFChunk relationship to an RTF/HTML part that is absent from the package.
The altChunk RTF/HTML-injection wrapper inlines and executes an embedded RTF/HTML part when Word opens the document. When the wiring and content-type are present but the target part is missing, the sample is a payload-stripped builder stub (or had its weaponized chunk removed). The injection wiring to an RTF/HTML type is itself the indicator, independent of whether the payload is currently attached.
cmd.exe reference in VBA high OLE_VBA_CMD
VBA macro references cmd.exe.
Invoking cmd.exe from a macro allows running arbitrary Windows commands, which is a core technique in macro-based malware.
BIFF BOF declares an unknown substream type medium OLE_BIFF_SUBSTREAM_TYPE_INVALID
BOF record's `dt` field is outside the documented set (WB-globals/sheet/chart/macro/VB-module).
Older Excel falls through to a default handler that may parse the substream with the wrong record-table.
BIFF NAME record declares an overlong name medium OLE_BIFF_NAME_RECORD_OVERLONG
NAME record's character-count (cch) field exceeds the BIFF8 limit of 255.
Older Excel may copy `cch * sizeof(WCHAR)` bytes into a fixed-size buffer.
BIFF record graph is unusually large medium OLE_BIFF_RECORD_COUNT_EXCESSIVE
Workbook contains an unusually large number of BIFF records.
This is not a CVE-specific signature, but malformed or stress-test BIFF graphs have been used to trigger Excel parser bugs and third-party scanner failures. The scanner bounds its walk and reports excessive record graphs explicitly.
BIFF stream has unbalanced BOF/EOF substreams medium OLE_BIFF_BOF_EOF_UNBALANCED
BOF (0x0809) and EOF (0x000A) are not balanced over a substream.
Substreams must be properly nested; readers that pop their substream stack regardless of balance reach attacker-controlled state.
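The balance check amounts to walking BIFF records (2-byte type, 2-byte length, then payload, all little-endian) and tracking a depth counter. This is a simplified sketch; the record() helper exists only to build sample input, and both names are hypothetical.

```python
import struct

BOF = 0x0809
EOF = 0x000A


def record(rec_type: int, payload: bytes = b"") -> bytes:
    """Build one BIFF record for sample input (illustration helper)."""
    return struct.pack("<HH", rec_type, len(payload)) + payload


def bof_eof_balanced(stream: bytes) -> bool:
    """Return True if every BOF is closed by an EOF and no EOF is orphaned."""
    depth = 0
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, rec_len = struct.unpack_from("<HH", stream, pos)
        pos += 4 + rec_len  # skip header and payload
        if rec_type == BOF:
            depth += 1
        elif rec_type == EOF:
            depth -= 1
            if depth < 0:
                return False  # EOF without a matching BOF
    return depth == 0
```

A stream of BOF, BOF, EOF ends with depth 1 and is therefore reported as unbalanced.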
CFB header with no readable streams medium OLE_PARSE_EMPTY_STREAMS
File has a valid OLE2/CFB header but olefile exposes zero directory streams.
A non-empty compound document whose directory cannot be read is anomalous. It occurs with truncated/corrupt files and, notably, with content shifted off byte boundaries (e.g. a whole-file nibble shift) to defeat olefile and byte-aligned signatures while the host Office application still recovers the embedded object — a known CVE-2017-11882 evasion.
EMF blob header is malformed or missing signature medium OFFICE_EMF_HEADER_INVALID
First record isn't EMR_HEADER (type 1) with the documented dSignature value.
An EMF blob's first record must be EMR_HEADER with declared size >= 88 and the dSignature field ' EMF' (0x464D4520) at offset 40. Readers that fail open on header anomalies have driven multiple EMF parser CVEs.
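The header test reduces to three fixed-offset reads. The sketch below uses the signature value 0x464D4520 defined for ENHMETA_SIGNATURE; the function name is hypothetical and the analyzer may check additional fields.

```python
import struct

EMR_HEADER = 1
ENHMETA_SIGNATURE = 0x464D4520  # bytes b" EMF" when stored little-endian
MIN_HEADER_SIZE = 88


def emf_header_valid(blob: bytes) -> bool:
    """Check record type, declared size, and dSignature of the first record."""
    if len(blob) < MIN_HEADER_SIZE:
        return False
    rec_type, rec_size = struct.unpack_from("<II", blob, 0)
    (signature,) = struct.unpack_from("<I", blob, 40)
    return (
        rec_type == EMR_HEADER
        and rec_size >= MIN_HEADER_SIZE
        and signature == ENHMETA_SIGNATURE
    )
```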
EMF declares an implausibly large number of records medium OFFICE_EMF_HUGE_RECORD_COUNT
EMF header's nRecords field exceeds 100,000.
Real-world embedded EMF blobs contain at most a few thousand records. Hundreds of thousands is a hand-crafted shape used to stress the renderer's record dispatch table.
EMF record type outside spec range medium OFFICE_EMF_RECORD_TYPE_INVALID
EMR record's type field is outside 1..123 standard or 0x4000+ vendor extension.
Renderers' dispatch tables typically fall through to a generic handler that may mis-parse the body when an unknown record type is reached.
Embedded OLE object medium OOXML_OLE_OBJECT
Document contains an embedded OLE object.
OLE objects embedded in OOXML documents can contain executables, scripts, or exploit payloads. They warrant inspection.
MTEF stream version byte outside valid set medium OLE_MTEF_HEADER_ANOMALY
MTEF version byte at the start of the Equation Native stream is not 2..6.
Encoders never produce version values outside the documented set (2 = Equation Editor 2.x, 3 = Equation Editor 3.0, 4 = MathType 3.x, 5 = MathType 4.x, 6 = MathType 5.x). Other values force readers into default handlers that may parse the body with the wrong record table.
Multiple OLE Package CLSIDs nested in one container medium OFFICE_PACKAGE_NESTED_PACKAGE
Inner payload of an OLE Package contains the OLE Package CLSID itself, or multiple Package CLSIDs in one container.
Russian-doll obfuscation has been used to bypass scanners that only inspect the outermost layer.
OLE stream allocation kind disagrees with size medium OLE_MINISTREAM_OUT_OF_RANGE
Stream below the MiniStream cutoff is allocated in the regular FAT, or vice versa.
The CFB spec requires the MiniFAT path for sub-cutoff streams; Office writers always honour it. A mismatch is the kind of structural quirk that distinguishes hand-edited containers from real-world output.
OLE stream size disagrees with its sector chain medium OLE_STREAM_LEN_MISMATCH
Direntry size field claims more bytes than the FAT-walked sector chain can carry.
Readers that allocate from the size field and copy from the chain read out-of-bounds bytes. A simple structural shape that has appeared in several Office parser memory-corruption bugs.
OOXML XML part contains a non-standard processing instruction medium OOXML_XML_PI_NONSTD
Processing instruction with a target outside the Office allowlist (xml, mso-*).
Some readers dispatch on PI targets; an unexpected target can force a different parsing mode than the renderer.
OOXML XML part has an oversize CDATA section medium OOXML_XML_CDATA_OVERSIZE
A single CDATA section exceeds 1 MB.
Real Office output uses CDATA only for short embedded fragments; large CDATA sections are a smuggling shape — they let an attacker hide a binary payload in plain XML where most XML-aware scanners ignore the contents.
OOXML XML part has excessive element nesting medium OOXML_XML_DEPTH_EXCESSIVE
Element nesting depth exceeds 256 levels.
Real Office output nests at most a few dozen levels deep. Pathological depth is the structural shape behind several billion-laughs / stack-exhaustion DoS bugs.
OOXML [Content_Types].xml has conflicting type declarations medium OOXML_CONTENT_TYPES_DUPLICATE
Two different Content-Types declared for the same extension or PartName.
Reader divergence — first vs. last definition determines how every part of that extension is parsed.
OOXML internal relationship target is missing medium OOXML_REL_DANGLING
Internal `<Relationship>` Target resolves to a ZIP entry that does not exist in the package.
Some Office paths fail open and fetch a remote alternative; the missing-target shape has been observed in droppers that race the renderer to deliver a payload to that path.
OOXML package contains parts unreachable from the root rels medium OOXML_REL_GRAPH_UNREACHABLE
>5% of parts are not reachable by walking from the root .rels through internal relationships.
Hidden-content shape: scanners that only follow the rel graph never see them, but Office may still dispatch to them via name or content-type.
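The reachability measure can be sketched as a graph walk from the root relationship targets. The inputs here (a set of part names, an adjacency map of internal relationships, and the root targets) are assumed to be pre-extracted; all names are hypothetical.

```python
def unreachable_fraction(
    parts: set[str], edges: dict[str, list[str]], roots: list[str]
) -> float:
    """Fraction of package parts never reached from the root relationships."""
    if not parts:
        return 0.0
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        part = stack.pop()
        if part in seen or part not in parts:
            continue  # already visited, or target missing from the package
        seen.add(part)
        stack.extend(edges.get(part, []))
    return (len(parts) - len(seen)) / len(parts)
```

A result above 0.05 corresponds to the 5% threshold in the rule description.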
OOXML relationship Id collides within one .rels medium OOXML_REL_DUPLICATE_ID
Two `<Relationship>` entries inside the same `.rels` part share an Id.
OPC requires Ids unique within a part. Readers diverge over which definition wins, letting an attacker bind one rel for static analysers and a different one for the live renderer.
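Duplicate Ids inside one .rels part can be counted directly. This sketch uses the standard-library ElementTree parser for brevity; a production scanner would likely prefer a hardened parser for untrusted XML. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def duplicate_rel_ids(rels_xml: bytes) -> list[str]:
    """Return the relationship Ids that appear more than once in a .rels part."""
    root = ET.fromstring(rels_xml)
    counts = Counter(rel.get("Id") for rel in root.iter(REL_NS + "Relationship"))
    return sorted(rid for rid, n in counts.items() if rid is not None and n > 1)
```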
Office document is password-encrypted medium OFFICE_ENCRYPTED_PACKAGE
OLE container holds an MS-OFFCRYPTO encrypted package (EncryptedPackage + EncryptionInfo streams).
Password-protected Office documents are stored as an OLE compound document containing an EncryptedPackage stream (AES-encrypted inner OOXML) and an EncryptionInfo stream (key-derivation metadata). Encryption defeats most content-based filtering at email gateways; threat actors use it to deliver macro/exploit-carrier spreadsheets and documents that would otherwise be detected by string-based scanners. Legitimate password-protected business documents do occur, but the bulk of phishing-delivered encrypted Office files are malicious — treat as a context-amplifier rather than a verdict on its own.
Ole10Native inner payload size exceeds remaining bytes medium OFFICE_PACKAGE_SIZE_MISMATCH
Inner `payloadSize` field declares more bytes than remain in the Ole10Native stream.
Readers that allocate from this field and copy without checking expose an out-of-bounds-read primitive.
Ole10Native outer length disagrees with stream size medium OFFICE_PACKAGE_HEADER_ANOMALY
The leading 4-byte length field of an `\x01Ole10Native` stream does not equal the stream byte count.
Parser-divergence shape: readers that trust the field read different bytes than those that walk to end-of-stream.
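A minimal version of the length comparison is sketched below. It assumes the leading field is meant to count the bytes that follow it; whether the analyzer compares against the full stream length or the length minus the 4-byte field is not stated here, so treat that choice as an assumption. The function name is hypothetical.

```python
import struct


def ole10native_length_mismatch(stream: bytes) -> bool:
    """True when the leading 4-byte length disagrees with the bytes present."""
    if len(stream) < 4:
        return True  # no room for the length field at all
    (declared,) = struct.unpack_from("<I", stream, 0)
    # Assumption: the field counts only the bytes after the field itself.
    return declared != len(stream) - 4
```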
Remote image (web beacon / tracking pixel) medium OOXML_IMAGE_BEACON
Document contains an external image relationship targeting an http(s):// URL.
An image relationship with an external http:// or https:// target is fetched by Office when external content is allowed. This can reveal the victim's IP address and timestamp to the attacker's server (tracking beacon). In some Windows Integrated Authentication configurations, the request may also expose NTLM authentication material, but plain HTTP image fetches are not a guaranteed NTLM-leak path. Caveat: documents exported from web-based editors or CMS platforms may include externally-hosted images; verify whether the target URL is a known-legitimate host before escalating.
Standalone OOXML relationship XML medium OOXML_STANDALONE_RELS
File is raw OOXML .rels relationship XML rather than a valid OOXML ZIP package.
OOXML relationship files define external content that Office may load from a package. A standalone .rels file with an Office extension is malformed, but when it declares a remote template relationship it is still a strong indicator of remote-template-injection tooling or a stripped payload.
VBA __SRP_ cache stream exceeds 8 MB medium OLE_VBA_PERFORMANCE_CACHE_OVERSIZE
A VBA performance-cache (__SRP_*) stream exceeds 8 MB.
Real-world cached compiled modules are tiny; oversize caches are an allocator-stress shape and have been observed in some Office DoS campaigns.
VBA macros present medium OLE_VBA_MACROS
Document contains VBA macro code.
VBA macros can automate tasks but are also the most common delivery mechanism for Office-based malware. Macros can download and execute arbitrary code when enabled by the user.
VBA project in OOXML medium OOXML_VBA
Document contains vbaProject.bin — VBA macros are present.
Same risk as OLE macros. The document can run VBA code when macros are enabled. Check the accompanying VBA keyword findings for details.
Call-to-action shape / download button low OOXML_DOWNLOAD_SHAPE
Document drawing contains a call-to-action phrase in a shape or text box.
Shapes with phrases like 'Click Here to Enable Content', 'Download Now', or 'Open Document' are the Office equivalent of the PDF fake-button overlay, tricking users into enabling macros or following a malicious link. Caveat: these phrases appear legitimately in user manuals, training materials, onboarding documents, and any instructional content that guides users through a process. This finding is low-signal; elevate concern only when combined with macros, external relationships, or hidden sheets.
DDE field low OOXML_DDE
A DDE field instruction was found in the document XML. The command does not reference a known-dangerous executable.
DDE (Dynamic Data Exchange) fields link a document to an external data source or application. Benign uses include pulling live data from Excel spreadsheets or databases. However, DDE can be abused to execute arbitrary commands. This particular field does not appear to launch a dangerous program, but you should review the detail to confirm.
Environ() call low OLE_VBA_ENVIRON
VBA macro uses Environ() to access environment variables.
Environ() is used widely in legitimate macros for locale paths and user temp directories. It does appear in droppers (to find %TEMP% / %APPDATA% for staging payloads) but on its own is too noisy to be more than LOW.
Hidden worksheet low OOXML_HIDDEN_SHEET
Excel workbook contains hidden or veryHidden worksheets.
Hidden and 'veryHidden' Excel sheets are commonly used to conceal macro scaffolding, staging data, or intermediate payload construction from the user. Caveat: hidden sheets are routine in legitimate professional Excel workbooks — financial models hide calculation sheets and lookup tables, enterprise templates hide configuration sheets, and many vendor-supplied spreadsheets protect their formulas this way. This finding is low-signal on its own; treat as significant only when combined with VBA macros or external relationships.
Malformed OOXML package with recoverable local headers low OOXML_MALFORMED_ZIP_LOCAL_HEADERS
OOXML ZIP central directory is invalid, but local headers expose Office parts.
Office and tolerant ZIP readers may recover document parts from local file headers even when the central directory is malformed or missing. This is a parser-divergence shape; it is low-signal by itself, but important when the recoverable local parts include VBA projects, ActiveX controls, or XLM macro sheets.
OLE dirents share an unrecognised CLSID low OLE_DIRENT_CLSID_DUPLICATE
Multiple direntries share a non-null, unrecognised CLSID (>= 4 occurrences).
Office host CLSIDs (Excel, Word, PowerPoint, Equation Editor, MathType, Visio) legitimately repeat. Heavy duplication of a less-common CLSID is the shape used by containers trying to hide extra invocation points of the same parser surface from scanners that only inspect the first match.
Ole10Native tempPath leaks an AppData or Temp path low OFFICE_PACKAGE_TEMP_PATH_LEAK
Package's tempPath references an `AppData\` or `Temp\` folder of the author's machine.
Weak signal on its own — legitimate drag-and-drop attachments produce this — but a useful axis contributor when paired with other anomalies.
Unsupported Office format for VBA extraction info OFFICE_FORMAT_UNSUPPORTED
olevba could not extract VBA macros from the document; VBA source extraction was skipped.
olevba (and its olefile dependency) ran into a parse failure on this specific file — common causes include legacy formats (Excel 4/5 BIFF), encrypted streams, hand-crafted/malformed OLE compound storage, or anti-analysis structures that trip the parser. The detail field names the exception class that fired. Format-agnostic byte-level scans still ran, so the verdict is real and re-scanning the same bytes will yield the same outcome — unlike SCAN_INCOMPLETE, this finding does not flag the result as needing retry.

RTF 16

Equation Editor CLSID critical RTF_EQUATION_EDITOR
Equation Editor OLE CLSID (0002CE02) found in RTF hex data.
This CLSID instantiates the vulnerable Equation Editor component. CVE-2017-11882, CVE-2018-0802, and CVE-2018-0798 are among the most exploited Office vulnerabilities in history, and RTF is the most common delivery format.
Equation Editor object class critical RTF_OBJCLASS_EQUATION
Object class name references Equation Editor.
The explicit mention of the Equation Editor class confirms the document is attempting to instantiate the vulnerable component.
PE header in hex data critical RTF_MZ_HEX
MZ header (hex '4D5A') found in RTF hex data.
The presence of a Windows PE executable header (MZ) inside hex-encoded RTF data means an executable is embedded in the document. This is consistent with an embedded dropper payload.
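A simplified search for the hex-encoded MZ marker is sketched below. It assumes hex runs start on a byte boundary and tolerates whitespace inside the run; the 16-digit minimum run length and the function name are illustrative choices, not the analyzer's.

```python
import re

# A run of at least 16 hex digits, allowing whitespace between digits.
# The lookbehind stops a run from starting inside a control word such as \objdata.
HEX_RUN = re.compile(rb"(?<![0-9A-Za-z])(?:[0-9A-Fa-f]\s*){16,}")


def rtf_hex_has_mz(rtf: bytes) -> bool:
    """True if any hex run contains '4D5A' at a byte-aligned position."""
    for match in HEX_RUN.finditer(rtf):
        digits = re.sub(rb"\s+", b"", match.group()).upper()
        for i in range(0, len(digits) - 3, 2):  # step by whole bytes
            if digits[i:i + 4] == b"4D5A":
                return True
    return False
```

Checking only even offsets avoids matching "4D5A" that straddles two unrelated bytes, such as "...34 D5 A0...".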
INCLUDETEXT/INCLUDEPICTURE remote URL high RTF_INCLUDE_REMOTE
RTF document uses INCLUDETEXT or INCLUDEPICTURE with an http:// or https:// URL.
RTF \fldinst blocks with INCLUDETEXT or INCLUDEPICTURE and a remote (http:// or https://) target can cause Word to fetch the remote resource when the document is opened, depending on Office version and external-content settings. This is a remote template injection vector: the attacker controls what content is fetched, can steal NTLM credentials via a UNC redirect, or deliver a second-stage payload. Caveat: legitimate RTF documents very rarely include remote http:// field references; this construction has almost no benign use in consumer-produced documents, making the false-positive rate low.
Large hex data blocks high RTF_EXCESSIVE_HEX
RTF contains large blocks of hex-encoded data.
Legitimate RTF files rarely contain very large hex blocks. Excessive hex data usually hides an embedded payload (executable, shellcode, or exploit object) encoded in hexadecimal.
OLE Package CLSID high RTF_PACKAGE_OLE
OLE Package CLSID pattern found alongside object data.
The Package CLSID combined with embedded OLE data suggests the document wraps an arbitrary file (potentially an executable) inside an OLE Package object.
Obfuscated control words high RTF_OBFUSCATION
Many RTF control words appear fragmented or obfuscated.
RTF parsers are tolerant of whitespace in control words. Malware authors insert spaces to break up keywords (like 'o b j d a t a') so that simple string scanners miss them.
PHP IRC bot source embedded in RTF high RTF_PHP_IRC_BOT_SOURCE
RTF text contains PHP IRC bot source code.
This rule looks for a compound source-code pattern: PHP markers, socket connection calls, IRC protocol commands such as JOIN or PRIVMSG, and bot-control strings. The RTF is not necessarily an Office exploit, but it is carrying operational malware source code and should not be treated as a clean document.
Package object class high RTF_OBJCLASS_PACKAGE
OLE Package object found in RTF.
OLE Package objects can wrap arbitrary files (including executables) inside a document. The packaged file can be extracted and run when the user double-clicks the object.
Remote template injection (\*\template to remote URL) high CVE related RTF_REMOTE_TEMPLATE
The RTF's \*\template destination is a remote URL/UNC path that Word fetches and loads on open.
RTF template injection (MITRE T1221): the document attaches a remote template via {\*\template <url>}. On open, Word retrieves and loads it, which can deliver a macro/exploit template, a scriptlet/HTA (.html/.hta target), or leak NTLM credentials over a UNC path. Benign RTFs attach only a local template, so a remote target is the injection itself. Obfuscated targets (\uN/\'xx escapes), raw-IP or dynamic-DNS hosts, and active/script extensions escalate it to critical.
\objupdate forces OLE activation high RTF_OBJUPDATE
RTF contains \objupdate — forces automatic OLE object activation.
The \objupdate control word forces Word to immediately instantiate the embedded OLE object when the document is opened, without requiring the user to double-click or interact with the object. This is a near-universal indicator of Equation Editor exploit documents — it ensures the vulnerable EQNEDT32.EXE process is spawned automatically. Legitimate use of \objupdate is extremely rare.
Embedded OLE object medium RTF_OBJEMB
RTF contains \objemb — an embedded OLE object marker.
The \objemb control word marks an embedded OLE object. Combined with \objdata, it indicates an object is fully embedded in the RTF.
OLE object data medium RTF_OBJDATA
RTF contains \objdata sections with embedded OLE objects.
RTF documents can embed OLE objects via \objdata sections. These objects can carry executables, scripts, or trigger exploits in OLE-handling code.
OlePres presentation stream in RTF OLE object medium RTF_OLEPRES_STREAM
RTF embedded OLE object contains an OlePres presentation stream marker.
OlePres is an OLE presentation stream name. It is relevant to the CVE-2025-21298 attack surface, but the stream name alone is common in embedded OLE objects and does not prove malformed OlePres internals.

HWP 14

Embedded PE executable critical HWP_EMBEDDED_PE
PE executable found inside HWP document.
A Windows executable hidden inside an HWP document is a clear indicator of malware. The document is carrying an executable payload.
PostScript exec command critical HWP_PS_EXEC
PostScript 'exec' operator found in embedded PostScript.
The PostScript 'exec' operator executes a string as PostScript code. In malicious PostScript it can run dynamically decoded payloads.
PostScript runtime hex-to-code execution critical HWP_PS_CVX_EXEC
PostScript hex string converted to executable code and executed at runtime.
The pattern '<HEX...> cvx exec' decodes a PostScript token from a hex literal and executes it. APT-grade HWP exploits use this to stage payloads in pieces — every fragment is reconstructed at parse time so static scanners that only look for plain 'exec' tokens never see the dangerous operator string in the file. A handful of these is a strong indicator the embedded EPS is a weaponised exploit, not a benign vector graphic.
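The hex-literal-then-cvx-exec shape lends itself to a byte-level pattern. The regular expression below is a simplified sketch rather than the analyzer's actual signature, and the function name is hypothetical.

```python
import re

# A hex string literal <...> followed by the cvx and exec operators.
CVX_EXEC_RE = re.compile(rb"<\s*(?:[0-9A-Fa-f]\s*)+>\s*cvx\s+exec\b")


def count_hex_cvx_exec(postscript: bytes) -> int:
    """Count occurrences of '<HEX...> cvx exec' in PostScript content."""
    return len(CVX_EXEC_RE.findall(postscript))
```

Dictionary delimiters such as "<<" do not match, because the pattern requires at least one hex digit immediately inside the angle brackets.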
PostScript system call critical HWP_PS_SYSTEM
PostScript 'system' operator found.
Some PostScript interpreters expose a 'system' operator or equivalent extension that can run operating-system commands. Its presence in embedded PostScript is high-risk and unusual in ordinary documents.
Shell command reference critical HWP_SHELL_CMD
Reference to a shell command (cmd.exe, powershell, etc.) in HWP.
Direct references to system shells inside a document can indicate an attempt to execute commands on the recipient's system.
Embedded PostScript / EPS high HWP_POSTSCRIPT
HWP document contains embedded PostScript or EPS content.
Embedded PostScript/EPS is a common exploit surface in targeted HWP campaigns. PostScript is a full programming language; file and command execution depends on the interpreter and sandbox configuration.
Hex-encoded data in PostScript high HWP_PS_HEXCODE
Many hex escape sequences found in PostScript content.
A high number of hex escape sequences (\xNN) in PostScript suggests the presence of encoded shellcode or binary payloads.
JavaScript in HWP high HWP_JAVASCRIPT
JavaScript references found in HWP document.
JavaScript in an HWP document is unusual and potentially dangerous. It may indicate an attempt to exploit the document viewer.
PostScript file operation high HWP_PS_FILE
PostScript file operation (file/run/deletefile) found.
File operations in PostScript allow reading, writing, or deleting files on the system — capabilities that exploits use to drop payloads.
External URL in HWP medium HWP_URL
External URL(s) found in HWP document content.
URLs in HWP content may be used to download second-stage payloads or connect to command-and-control servers.
PostScript decode filter medium HWP_PS_FILTER
PostScript decode filter (SubFileDecode, ASCIIHexDecode, etc.) found.
Decode filters in PostScript can be used to hide encoded payloads that are decoded at runtime.
Scripts storage medium HWP_SCRIPTS_STREAM
OLE-based HWP contains a Scripts storage section.
A Scripts section in an HWP OLE container may contain executable code that runs when the document is opened.
BinData stream low HWP_BINDATA
OLE-based HWP contains a BinData storage section.
BinData stores binary objects (images, OLE objects). While normal, it can also contain malicious embedded objects.
Compressed sections info HWP_COMPRESSED
Zlib-compressed sections were found and decompressed for analysis.
HWP 5.0+ files store sections compressed with zlib. Decompressing them allows scanning for embedded threats. This is informational.
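A sketch of the decompression step, under the assumption that section streams may hold either raw DEFLATE data (no zlib header) or a zlib-wrapped stream. The function name, the ordering of the two attempts, and the output cap are illustrative choices.

```python
import zlib
from typing import Optional

MAX_OUTPUT = 64 * 1024 * 1024  # assumed cap to limit decompression bombs

def inflate_section(data: bytes) -> Optional[bytes]:
    """Try raw DEFLATE first, then a zlib-wrapped stream; None if neither decodes."""
    for wbits in (-15, zlib.MAX_WBITS):
        try:
            decompressor = zlib.decompressobj(wbits)
            return decompressor.decompress(data, MAX_OUTPUT)
        except zlib.error:
            continue
    return None
```

A None result can itself be reported, since a stream flagged as compressed that does not inflate is unusual.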

CVE 65

Adobe Acrobat malformed TrueType bitmap font — CVE-2023-26369 critical CVE exact CVE_2023_26369
Embedded TrueType font has malformed EBLC/EBDT bitmap-glyph placement plus the EBSC max-range table trap.
CVE-2023-26369 is an out-of-bounds write in Adobe Acrobat Reader's libCoolType sfac_GetSbitBitmap path. Project Zero documented a font whose EBLC/EBDT compound bitmap glyph metadata places a component beyond the bitmap buffer computed from the glyph metrics, and whose EBSC table record declares offset and length as 0xffffffff to avoid loading in many non-Adobe font parsers. This rule parses the embedded sfnt directory and bitmap tables and fires only when both structural conditions are present.
Adobe Acrobat/Reader privileged API chain — CVE-2026-34621 critical CVE exact CVE_2026_34621
PDF JavaScript uses Acrobat internal share/login APIs, swConn prototype manipulation, and privileged RSS/file-read APIs.
CVE-2026-34621 is an Adobe Acrobat/Reader JavaScript exploit chain reported in actively exploited malicious PDFs. Public analyses describe abuse of internal APIs such as ANFancyAlertImpl, ANShareFile, and SilentDocCenterLogin, prototype/getter manipulation around swConn, and privileged APIs such as RSS.addFeed/removeFeed or util.readFileIntoStream to fingerprint the victim and retrieve staged JavaScript. The scanner matches that combined marker set, including when the JavaScript is hidden inside a long base64 AcroForm value.
Adobe Flash authplay SWF exploit in PDF — CVE-2010-1297 critical CVE likely CVE_2010_1297_FLASH_RICHMEDIA
PDF combines RichMedia Flash activation, a crafted SWF with authplay-era markers, and PDF-side shellcode heap-spray staging.
CVE-2010-1297 is an Adobe Flash/authplay.dll memory-corruption vulnerability exploited in the wild through malicious PDFs containing crafted SWF content plus encoded JavaScript heap-spray stages. This rule requires RichMedia Flash activation, an embedded SWF with ActionScript prototype/AVM-era markers or AES-PHP/authplay variant markers seen in the 2010 exploit family, and PDF-side shellcode staging. Ordinary RichMedia Flash documents and SWFs without the heap-spray stage are not attributed to the CVE.
Adobe Reader CoolType SING font exploit — CVE-2010-2883 critical CVE likely CVE_2010_2883
PDF embeds a TrueType/OpenType SING font table together with JavaScript heap-spray shellcode.
CVE-2010-2883 is the Adobe Reader/Acrobat CoolType SING table stack overflow exploited in weaponised PDFs. The rule requires an actual SING table in a decoded sfnt font stream plus heap-spray JavaScript, which keeps it narrower than a plain string match for the word SING.
Adobe Reader JPEG2000 JPX command payload exploit — CVE-2018-4990 critical CVE likely CVE_2018_4990_JPX_EMBEDDED_CMD
PDF embeds a malformed JPX/JPEG2000 image whose JP2 header area contains a command-execution/download payload.
CVE-2018-4990 is an Adobe Acrobat/Reader JPEG2000 parser memory-corruption vulnerability. This rule requires a /JPXDecode stream, malformed JP2 box structure, and a command/download payload embedded inside the JPEG2000 stream body. Plain JPEG2000 images, and malformed JPX images without an execution/download payload, remain covered only by related JPX anomaly rules.
Adobe Reader Launch action command execution — CVE-2010-1240 critical CVE likely CVE_2010_1240
PDF uses /Launch with shell parameters and an embedded/exported payload chain.
CVE-2010-1240 is the Adobe Reader/Acrobat Launch File warning dialog abuse used by malicious PDFs to drop or rename an embedded payload and then start it through a /Launch /Win action. The detector requires a Launch action with command-shell parameters plus embedded-file/export evidence such as exportDataObject/nLaunch:0 or an EmbeddedFiles/EF payload chain.
Adobe Reader LibTIFF XFA image exploit — CVE-2010-0188 critical CVE likely CVE_2010_0188
PDF contains XFA JavaScript that heap-sprays shellcode, builds a TIFF image payload, and assigns it to an XFA image rawValue.
CVE-2010-0188 was widely exploited through XFA JavaScript that generated a malformed TIFF image and assigned it to an image field, causing Adobe Reader/Acrobat to parse the crafted TIFF through the vulnerable LibTIFF path. The rule decodes the common long-hex XFA wrapper and requires the TIFF payload marker, rawValue trigger, and heap-spray/version-selection logic, including split-string character-table wrappers seen in older kits. It is a high-confidence exploit-template match rather than a full TIFF structural validator.
Adobe Reader ToolButton UAF — CVE-2014-0496 critical CVE exact CVE_2014_0496
PDF JavaScript combines app.addToolButton(), app.removeToolButton(), heap-spray arrays, and unescape('%u...') shellcode markers.
CVE-2014-0496 is an Adobe Reader/Acrobat use-after-free vulnerability in affected 10.x and 11.x releases. Public exploit examples use the ToolButton JavaScript API pattern: add a toolbar button, trigger code through cEnable, remove the button, and heap-spray shellcode/ROP data with unescape('%u...') strings and large arrays. That combination is not consistent with a benign interactive form.
Adobe Reader XFA oneOfChild exploit - CVE-2013-0640 critical CVE likely CVE_2013_0640
PDF contains the XFA choiceList/oneOfChild trigger shape associated with CVE-2013-0640.
CVE-2013-0640 is an Adobe Reader/Acrobat XFA memory-corruption vulnerability exploited in the wild in 2013. The rule requires the specific JavaScript/XFA sequence described in public technical analysis: resolve a choiceList, mutate a draw object's keep.previous property to contentArea, and reattach the choiceList through the UI node's oneOfChild property via a timer. Requiring all of these elements avoids flagging ordinary XFA forms that merely contain choice lists or resolveNode() form logic.
Adobe Reader mailto URI command execution — CVE-2007-5020 critical CVE likely CVE_2007_5020_MAILTO_MSHTA
PDF contains a crafted mailto URI that reaches mshta via path traversal and executes inline script.
CVE-2007-5020 is the Adobe Reader/Acrobat 8.1 mailto URI command-execution issue described by Adobe APSA07-04. The rule requires a mailto URI with traversal or the historical percent-slash shape, an mshta target, and inline JavaScript/WScript.Shell execution markers. Normal mailto links and generic suspicious command paths remain covered only by the lower-confidence URI rule.
C6 Messenger DownloaderActiveX — CVE-2008-2551 critical CVE exact CVE_2008_2551
HTML or PDF-embedded HTML configures C6 Messenger DownloaderActiveX to download and run a file.
CVE-2008-2551 affects the Icona/C6 Messenger DownloaderActiveX control. The rule requires the vulnerable control identity (DownloaderActiveX or CLSID c1b7e532-3ecb-4e9e-bb3a-2951ffe67c61), propDownloadUrl, and propPostDownloadAction=run. This avoids flagging generic ActiveX content.
CAB/HTML external object — CVE-2021-40444 critical CVE exact CVE_2021_40444
OOXML external OLEObject relationship targets HTML/CAB/MSHTML-style content.
The scanner parses .rels entries with TargetMode="External" and assigns this CVE only when the relationship has the stricter OLEObject gadget shape associated with CVE-2021-40444. Broader MSHTML/CAB/MHTML external targets are reported separately as OFFICE_MSHTML_EXTERNAL_OBJECT.
Collab.collectEmailInfo — CVE-2007-5659 critical CVE exact CVE_2007_5659
PDF JavaScript calls Collab.collectEmailInfo() with a long or heap-sprayed message argument.
CVE-2007-5659 is a buffer overflow in Adobe Reader triggered by a long argument to the Collab.collectEmailInfo() JavaScript API. This was one of the earliest widely exploited PDF JavaScript vulnerabilities. The rule requires either Collab.collectEmailInfo() with a quoted string at least 128 bytes long, or the decoded exploit-kit shape where JavaScript builds a %u0c0c heap spray and passes that sprayed string through the msg field of Collab.collectEmailInfo(). It also catches older variants that decode shellcode with unescape(), assemble a large version-dependent buffer, and pass that variable through msg, including payloads staged in an annotation /Subject entry and recovered before CVE matching. Plain short calls are not flagged. CISA KEV.
Collab.getIcon — CVE-2009-0927 critical CVE exact CVE_2009_0927
PDF JavaScript calls Collab.getIcon() with a long string argument.
CVE-2009-0927 (CVSS 9.3) is a stack-based buffer overflow in Adobe Reader triggered by the Collab.getIcon() JavaScript API with a crafted string argument. The overflow allows arbitrary code execution. The rule requires Collab.getIcon() with a quoted string at least 64 bytes long; plain short calls are not flagged. CISA KEV.
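The length check can be sketched as follows, assuming the PDF JavaScript has already been extracted and decoded to text. The regular expression and function name are hypothetical, and the matcher deliberately ignores escaped quotes inside the literal.

```python
import re

# Hypothetical matcher: Collab.getIcon( followed by a single- or double-quoted literal.
GETICON_RE = re.compile(r"""Collab\s*\.\s*getIcon\s*\(\s*(["'])(.*?)\1""", re.DOTALL)

def has_long_geticon_argument(js: str, min_len: int = 64) -> bool:
    """True when a Collab.getIcon() call carries a quoted literal of min_len+ bytes."""
    return any(
        len(m.group(2).encode("utf-8")) >= min_len
        for m in GETICON_RE.finditer(js)
    )
```

The same shape with a different API name and min_len=128 would cover the Collab.collectEmailInfo() quoted-string case.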
Composite Moniker — CVE-2017-8570 critical CVE likely CVE_2017_8570
OLE data contains the Composite Moniker CLSID with nearby scriptlet payload evidence.
CVE-2017-8570 abuses Composite Moniker handling to load scriptlet content. The CVE rule now requires the Composite Moniker CLSID plus nearby SCT/scriptlet/scrobj-style payload evidence. Bare Composite Moniker evidence is reported as RTF_COMPOSITE_MONIKER_RELATED instead.
Doc.printSeps — CVE-2010-4091 critical CVE exact CVE_2010_4091
PDF JavaScript invokes Doc.printSeps() with exploit-shaped arguments.
CVE-2010-4091 is a memory corruption vulnerability triggered by a crafted argument to Doc.printSeps(). The rule matches long quoted strings, large hex constants such as 0xffff, or 10+ digit numeric arguments.
EPS image filter — CVE-2017-0261/0262 critical CVE related CVE_2017_0261
Document references EPSIMP32 or contains PostScript/EPS markers.
CVE-2017-0261 and CVE-2017-0262 exploit the Windows EPS (Encapsulated PostScript) image filter (EPSIMP32.FLT) used by Microsoft Office. A crafted EPS image inside an Office document triggers a use-after-free or out-of-bounds read that allows arbitrary code execution. These vulnerabilities were exploited in highly targeted APT campaigns in 2017. The rule matches EPSIMP32 or %!PS-Adobe markers; it does not validate the malformed EPS needed to identify a specific EPS CVE.
Embedded Flash authplay SWF exploit — CVE-2010-1297 likely critical CVE likely CVE_2010_1297_FLASH_EMBEDDED
PDF embeds a crafted authplay-era SWF and pairs it with PDF-side shellcode heap-spray staging.
Some CVE-2010-1297 exploit PDFs carry the malicious SWF in a plain compressed stream rather than exposing a canonical /RichMedia dictionary. This rule requires the same high-signal SWF markers used by the RichMedia variant plus a PDF-side encoded heap-spray stage, avoiding attribution for ordinary embedded Flash content.
Equation Editor Matrix overflow — CVE-2018-0798 critical CVE exact CVE_2018_0798
MTEF Matrix record exploit signature found in Equation Editor OLE data.
CVE-2018-0798 is a stack buffer overflow in the Matrix record (type 0x05) parser of EQNEDT32.EXE (Microsoft Equation Editor). Public writeups describe it as affecting Equation Editor broadly, including builds patched for CVE-2017-11882 and CVE-2018-0802. The public exploit shape uses 0x60 bytes followed by 0x61 padding and a return address (0x0BFB) to hijack execution. It has been widely used by APT groups including Conimes, KeyBoy, Emissary Panda, and Rancor.
Equation Editor Ole10Native payload — CVE-2017-11882 likely critical CVE likely CVE_2017_11882_EQUATION_OLE10NATIVE
RTF activates a Microsoft Equation 3.0 OLE storage carrying a high-entropy Ole10Native payload.
Normal Equation Editor OLE objects store Equation Native/MTEF data. This rule requires decoded RTF objdata containing a CFB whose Root Entry CLSID is Microsoft Equation 3.0, RTF activation controls such as \objemb and \objupdate, and a high-entropy Ole10Native payload stream. That combination is a weaponized Equation Editor RCE delivery shape consistent with CVE-2017-11882/CVE-2018-0802 while avoiding attribution from the Equation CLSID alone.
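The "high-entropy" condition can be illustrated with a Shannon entropy calculation over the stream bytes. The 7.0 bits-per-byte threshold and both function names are assumptions; the analyzer's real cut-off is not stated here.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty or constant input, 8.0 at most."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

ENTROPY_THRESHOLD = 7.0  # assumed cut-off, chosen for illustration only

def looks_high_entropy(stream: bytes) -> bool:
    """True when the stream looks compressed or encrypted rather than structured."""
    return shannon_entropy(stream) >= ENTROPY_THRESHOLD
```

Encrypted or compressed payloads tend to sit near 8 bits per byte, while genuine Equation Native/MTEF records are short and structured.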
Excel FEATHEADER record overflow — CVE-2009-3129 critical CVE exact CVE_2009_3129
Excel BIFF FEATHEADER record (0x0867) with anomalous size or cbHdrData length.
The FEATHEADER (Feature Header) parser in Microsoft Excel 2007/2003 SP1 uses an attacker-controlled cbHdrData length in a memcpy. A FEATHEADER record with an oversized record size or cbHdrData value is the documented exploit primitive; legitimate FEATHEADER records are tens of bytes. CVE-2009-3129 was actively exploited in targeted attacks to achieve code execution from crafted .xls files.
Follina/MSDT URI — CVE-2022-30190 critical CVE likely CVE_2022_30190
Document contains an ms-msdt: URI consistent with Follina payload delivery.
CVE-2022-30190 (Follina) is a critical Microsoft Office zero-day that allows arbitrary code execution without macros. When a specially crafted document containing an ms-msdt: URI is opened, Word fetches a remote HTML file that triggers the Microsoft Support Diagnostic Tool (MSDT) to execute a PowerShell payload. The scanner matches the URI string in Office content or relationships; that is strong evidence, but it does not prove the surrounding HTML/Office load path is live. Actively exploited in the wild since May 2022.
Ghostscript SAFER bypass in HWP/EPS — CVE-2017-8291 critical CVE exact CVE_2017_8291
Embedded PostScript/EPS uses Ghostscript CVE-2017-8291 exploitation primitives.
CVE-2017-8291 is a Ghostscript -dSAFER bypass/type-confusion issue exploited through crafted EPS/PostScript content, including HWP documents that embed EPS in BinData streams. The rule covers both major static exploit shapes: the public .rsdparams plus /OutputFile(%pipe%) command execution path, and the .eqproc type-confusion path commonly hidden in '<HEX> cvx exec' staged HWP/EPS payloads.
Moniker Link — CVE-2024-21413 critical CVE likely CVE_2024_21413
Document contains a file:///\\ moniker-link target with an exclamation mark.
CVE-2024-21413 (Moniker Link) is a Microsoft Outlook/Office vulnerability where a specially crafted hyperlink using the file:///\\host\share\path!something format can bypass Protected View and trigger NTLM authentication in affected Office/Outlook handling paths. The rule matches the file:///\\host\share...! target shape; it does not prove which host application or preview path will process the link.
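A rough sketch of the target-shape match, assuming the hyperlink text has already been pulled out of the document. The pattern and function name are hypothetical and intentionally loose.

```python
import re

# Hypothetical pattern: file:/// + two backslashes + host + backslash + path + "!" + suffix.
MONIKER_LINK_RE = re.compile(r"file:///\\\\[^\\/\s!]+\\[^\s!]+!\S+", re.IGNORECASE)

def has_moniker_link(text: str) -> bool:
    """True when text contains a file:///\\\\host\\share...!suffix style target."""
    return MONIKER_LINK_RE.search(text) is not None
```

Requiring the exclamation mark is what separates this shape from an ordinary UNC file link.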
OLE OlePres zero-click RCE — CVE-2025-21298 critical CVE related CVE_2025_21298
RTF embedded OLE object contains malformed OlePres evidence.
CVE-2025-21298 (CVSS 9.8) is a zero-click vulnerability in Windows ole32.dll where the UtOlePresStmToContentsStm function mishandles memory while processing malformed OlePres streams. Public advisories classify it as a use-after-free. Outlook's Preview Pane can trigger the vulnerable parsing path. A bare OlePres stream name is now reported as a generic RTF_OLEPRES_STREAM marker rather than this CVE, because normal embedded OLE objects can contain presentation streams.
OLE/COM security bypass — CVE-2026-21509 critical CVE related CVE_2026_21509
Document contains Shell.Explorer.1 CLSID evidence plus OLE activation context.
CVE-2026-21509 is a Microsoft Office security feature bypass (CVSS 7.8) that exploits reliance on untrusted inputs in OLE/COM security decisions. Attackers craft documents with manipulated metadata so the parser incorrectly marks dangerous objects as 'Safe for Initialization,' allowing code execution without security warnings. Actively exploited in the wild. The scanner matches the Shell.Explorer.1 CLSID or ProgID and, for text forms, requires embedding context such as Ole10Native, RTF objdata, or an embedded PE marker; bare text is not enough.
Outlook NTLM credential theft — CVE-2023-23397 critical CVE related CVE_2023_23397
Outlook .msg contains UNC reminder evidence: exact for ReminderFileParameter, related for raw UNC fallback.
CVE-2023-23397 is a critical Microsoft Outlook privilege escalation / credential theft vulnerability with CVSS 9.8. An attacker sends a specially crafted meeting request, task, or appointment where the PidLidReminderFileParameter property contains a UNC path pointing to an attacker-controlled server. When Outlook processes the item (even before the user opens it), Windows automatically authenticates to the remote server using NTLM, leaking the victim's Net-NTLMv2 hash. This hash can be cracked offline or used in relay attacks for lateral movement across the network. The sandboxed MSG parser matches the ReminderFileParameter property stream and marks this exact when it contains a UNC path; fallback raw-byte UNC evidence remains related.
Outlook composite moniker in img tag — CVE-2024-38021 critical CVE related CVE_2024_38021
Document XML/HTML contains an <img> tag with file://...!... moniker URL.
CVE-2024-38021 is a zero-click vulnerability in Microsoft Outlook where composite monikers in <img> tag URLs are processed without setting the BlockMkParseDisplayNameOnCurrentThread flag. This allows code execution through image tag URLs using the file://...!... moniker syntax. The rule matches <img> tags in Office document XML/HTML parts whose source uses that moniker shape; outside Outlook, treat this as related moniker-abuse evidence rather than proof of zero-click exploitation.
PDF.js FontMatrix type confusion — CVE-2024-4367 critical CVE exact CVE_2024_4367
PDF font dictionary contains non-numeric FontMatrix values.
CVE-2024-4367 is a high-severity vulnerability in Mozilla PDF.js (used in Firefox and Thunderbird) where a missing type check in FontFaceObject.getPathGenerator allows arbitrary JavaScript execution. The exploit replaces numeric values in the /FontMatrix array with JavaScript code that gets compiled during glyph rendering. Affects Firefox < 126 and Thunderbird < 115.11.
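The check can be sketched as a token scan over each /FontMatrix array. This assumes the font dictionary is available uncompressed, and the patterns and function name are illustrative; string literals containing ']' and indirect references are not handled.

```python
import re

FONT_MATRIX_RE = re.compile(rb"/FontMatrix\s*\[(.*?)\]", re.DOTALL)
NUMBER_RE = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")  # PDF integer or real

def has_non_numeric_font_matrix(pdf_bytes: bytes) -> bool:
    """True when any /FontMatrix array holds a token that is not a plain number."""
    for match in FONT_MATRIX_RE.finditer(pdf_bytes):
        tokens = match.group(1).split()
        if any(not NUMBER_RE.fullmatch(token) for token in tokens):
            return True
    return False
```

A benign matrix such as [0.001 0 0 0.001 0 0] passes, while an array containing a string literal is flagged.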
SOAP Moniker — CVE-2017-8759 critical CVE likely CVE_2017_8759
OLE data contains the SOAP Moniker CLSID.
The scanner matches the SOAP Moniker CLSID bytes {ECABB0C7-7F19-11D2-978E-0000F8757E2A}. This is likely CVE evidence because the CLSID is the vulnerable moniker primitive, but the static rule does not validate a crafted WSDL body.
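For illustration, a CLSID byte match can be written with the standard uuid module, since serialized GUIDs store their first three fields little-endian. The function name is hypothetical; only the CLSID value comes from the rule text.

```python
import uuid

SOAP_MONIKER_CLSID = uuid.UUID("ECABB0C7-7F19-11D2-978E-0000F8757E2A")

def contains_clsid(data: bytes, clsid: uuid.UUID = SOAP_MONIKER_CLSID) -> bool:
    """Search for the CLSID in its serialized GUID byte layout."""
    return clsid.bytes_le in data
```

Searching for the hyphenated text form alone would miss the binary encoding used inside OLE data.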
Sandworm OLE Package — CVE-2014-4114 critical CVE likely CVE_2014_4114
OLE Package CLSID found alongside executable file references.
CVE-2014-4114 is a Windows OLE remote code execution vulnerability exploited by the Russian Sandworm threat group in targeted attacks against NATO and Ukraine in 2014. The attack uses OLE Package objects embedded in PowerPoint (PPSX) files to silently drop and execute .inf and .exe files when the document is opened. The rule requires the OLE Package CLSID plus executable file references (.inf, .exe, .dll, .bat, .cmd, .scr, .vbs, .ps1, or .hta), so it is a strong package-dropper indicator but not a full reconstruction of the original Sandworm exploit chain.
Storm-0978/RomCom HTML RCE — CVE-2023-36884 critical CVE likely CVE_2023_36884
OOXML .rels file contains an auto-load relationship Target pointing to a remote .rtf URL.
CVE-2023-36884 (Storm-0978/RomCom) was exploited through specially crafted Office documents that led to remote RTF/MSHTML processing and payload delivery. The scanner now requires a non-hyperlink relationship that Office can auto-load (template/subdocument/frame/OLE/altChunk-style relationship) whose Target is an http:// or https:// URL ending in .rtf. Plain clickable hyperlinks to RTF files are not enough for this CVE rule.
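A simplified sketch of the relationship check. The set of auto-load relationship types is an assumption standing in for the "template/subdocument/frame/OLE/altChunk-style" list, and the function name is hypothetical; a production parser would also harden XML parsing against entity expansion.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
# Assumed auto-load relationship type suffixes (illustrative, not exhaustive).
AUTOLOAD_SUFFIXES = ("/attachedtemplate", "/subdocument", "/frame", "/oleobject", "/afchunk")

def remote_rtf_autoload_targets(rels_xml: bytes) -> list[str]:
    """Return external http(s) targets ending in .rtf on auto-load relationships."""
    hits = []
    for rel in ET.fromstring(rels_xml).iter(RELS_NS + "Relationship"):
        if rel.get("TargetMode") != "External":
            continue
        if not rel.get("Type", "").lower().endswith(AUTOLOAD_SUFFIXES):
            continue  # plain hyperlinks and other types are ignored
        target = rel.get("Target", "")
        url = urlparse(target)
        if url.scheme in ("http", "https") and url.path.lower().endswith(".rtf"):
            hits.append(target)
    return hits
```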
Type 1 callOtherSubr operand-stack manipulation — CVE-2021-21086 critical CVE exact CVE_2021_21086
Decrypted Type 1 CharString matches the public callOtherSubr stack-pointer manipulation shape.
CVE-2021-21086 is an out-of-bounds write in Adobe Acrobat/Reader's CoolType Type 1 CharString interpreter. The Project Zero/Faraday analysis describes abusing predefined callOtherSubr handling, especially subroutine 18, to move the operand-stack pointer outside the operand stack and write toward the saved return address. The scanner decrypts Type 1 eexec and CharString data and flags repeated not/get/callOtherSubr bytecode sequences matching the public exploit generator.
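The decryption step uses the stream cipher published in Adobe's Type 1 font format specification (key 55665 for eexec, 4330 for CharStrings). The sketch below assumes binary rather than hex-encoded input and the default of four leading random bytes; the function name is illustrative, and the bytecode matching itself is not shown.

```python
def type1_decrypt(data: bytes, key: int, skip: int = 4) -> bytes:
    """Decrypt Type 1 eexec (key 55665) or CharString (key 4330) data."""
    c1, c2 = 52845, 22719
    r = key
    plain = bytearray()
    for cipher in data:
        plain.append(cipher ^ (r >> 8))
        r = ((cipher + r) * c1 + c2) & 0xFFFF  # 16-bit rolling key
    return bytes(plain[skip:])  # drop the leading random bytes
```

CharStrings sit inside the decrypted eexec section, so a scanner applies the routine twice: once with the eexec key, then once per CharString with the CharString key.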
URL Moniker weaponized URL — CVE-2017-0199 critical CVE exact CVE_2017_0199_WEAPONIZED_URL
URL Moniker OLE link points to an HTA/script/template-style remote loader.
This is the tighter CVE-2017-0199 static shape: URL Moniker / OLE2Link evidence plus a remote URL ending in an executable Office-loader extension such as .hta, .sct, .wsf, .xsl, .mht, or a macro-capable template. Generic remote URL Moniker evidence remains CVE_2017_0199 likely because the returned server content cannot be proven statically.
URL Moniker — CVE-2017-0199 critical CVE likely CVE_2017_0199
URL Moniker OLE link points to a remote loader.
CVE-2017-0199 abuses OLE2Link / URL Moniker handling so Office fetches remote content and processes it based on the response type. The scanner matches embedded URL Moniker structures or URL Moniker CLSID bytes near weaponised remote-loader targets; it does not prove the server-side content type or payload returned at scan time.
UTF-16BE Base URL — CVE-2021-39863 critical CVE exact CVE_2021_39863
PDF catalog uses a UTF-16BE /URI /Base value and JavaScript resolves a relative URL.
CVE-2021-39863 is an Adobe Acrobat/Reader heap-based buffer overflow in document base-URL concatenation. Exodus Intelligence documented that earlier research had associated this primitive with CVE-2021-21017, but Adobe assigned CVE-2021-39863 for the still-vulnerable bug. The rule requires the malformed UTF-16BE Base URL primitive plus submitForm(), app.launchURL(), or app.media.createPlayer() JavaScript, avoiding broad matches on ordinary PDF JavaScript or normal /URI actions.
\fonttbl heap overflow — CVE-2023-21716 critical CVE exact CVE_2023_21716
RTF font table with excessive entries — Word heap buffer overflow.
CVE-2023-21716 is a critical heap buffer overflow in Microsoft Word's RTF parser triggered by an RTF file with an abnormally large number of font entries in the \fonttbl group. The rule requires a \fonttbl and counts at least 32768 \fN font entries, matching the public exploit-scale trigger rather than merely large but valid font tables.
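The counting step might look like the sketch below, which locates the \fonttbl group by brace depth and counts \fN control words inside it. Function names are hypothetical, and escaped braces are ignored for brevity.

```python
import re

FONT_ENTRY_RE = re.compile(rb"\\f\d+")
FONT_LIMIT = 32768  # threshold stated in the rule description

def fonttbl_entry_count(rtf: bytes) -> int:
    """Count \\fN control words inside the first \\fonttbl group."""
    start = rtf.find(rb"\fonttbl")
    if start < 0:
        return 0
    depth, end = 1, len(rtf)  # the group's opening brace precedes \fonttbl
    for i in range(start, len(rtf)):
        if rtf[i] == 0x7B:  # "{"
            depth += 1
        elif rtf[i] == 0x7D:  # "}"
            depth -= 1
            if depth == 0:
                end = i
                break
    return len(FONT_ENTRY_RE.findall(rtf[start:end]))

def has_exploit_scale_fonttbl(rtf: bytes) -> bool:
    return fonttbl_entry_count(rtf) >= FONT_LIMIT
```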
\pFragments RTF stack overflow — CVE-2010-3333 critical CVE exact CVE_2010_3333
RTF contains an oversized pFragments value.
CVE-2010-3333 is a stack-based buffer overflow in Microsoft Word 2002, 2003, and 2007 triggered by a crafted pFragments value in an RTF document. The scanner emits this CVE rule when the control-word numeric argument is at least 256 or when the canonical shape-property form {\sn pFragments}{\sv ...} carries an oversized value; bare pFragments without exploit-sized data is reported separately as related RTF evidence.
customUI external link — CVE-2021-42292 critical CVE related CVE_2021_42292
customUI ribbon part contains an external relationship target.
The scanner looks in customUI XML/.rels parts for TargetMode="External" relationships. This is related evidence because the customUI external-load surface is present, but public summaries provide limited technical detail and the rule does not prove the full exploit chain.
dataObjects ESObject stale-cache trigger — CVE-2020-9715 critical CVE exact CVE_2020_9715
PDF embeds a file and JavaScript triggers the dataObjects ESObject use-after-free pattern.
CVE-2020-9715 is an Adobe Acrobat/Reader ESObject use-after-free. The PixiePoint/ZDI trigger creates a Data ESObject by accessing this.dataObjects[0].toString(), clears this.dataObjects[0], then uses app.setTimeOut() to run garbage collection before re-accessing the stale cached Data ESObject. The rule requires the embedded-file surface plus the dataObjects toString/null/setTimeOut lifecycle so ordinary attachment-bearing PDFs are not flagged as the CVE.
media.newPlayer — CVE-2009-4324 critical CVE exact CVE_2009_4324
PDF JavaScript calls the media.newPlayer API.
CVE-2009-4324 (CVSS 9.3) is a use-after-free vulnerability in Adobe Reader's multimedia plugin triggered by the media.newPlayer() JavaScript API. It was actively exploited as a zero-day in December 2009 and became one of the most widely exploited PDF vulnerabilities. The newPlayer API is rarely used in legitimate PDF documents, so its presence is a useful indicator of an exploit attempt. CISA KEV.
util.printf — CVE-2008-2992 critical CVE exact CVE_2008_2992
PDF JavaScript invokes util.printf() with an oversized format/string argument.
CVE-2008-2992 is a widely-exploited stack buffer overflow in Adobe Reader's util.printf JavaScript implementation. The rule matches util.printf() only when the argument contains a very long format specifier (for example %0000x with 4+ digits) or a quoted string of at least 256 bytes.
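The format-specifier half of the rule can be sketched as follows; the quoted-string length half is omitted. The patterns and function name are assumptions, and the argument capture naively stops at the first closing parenthesis.

```python
import re

# Hypothetical patterns: argument text up to the first ")" and a "%" followed
# by four or more digits, such as "%45000f".
PRINTF_ARGS_RE = re.compile(r"util\s*\.\s*printf\s*\(([^)]*)\)")
LONG_SPEC_RE = re.compile(r"%\d{4,}")

def printf_has_long_format_spec(js: str) -> bool:
    """True when a util.printf() call contains an oversized width in its format."""
    return any(LONG_SPEC_RE.search(m.group(1)) for m in PRINTF_ARGS_RE.finditer(js))
```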
ADODB.RecordSet — CVE-2015-0097 high CVE likely CVE_2015_0097
OLE data contains the ADODB.RecordSet CLSID.
The scanner matches the ADODB.RecordSet CLSID bytes. This is likely CVE evidence for CVE-2015-0097-era sandbox-escape documents, but the static rule does not prove the surrounding exploit logic.
Anomalous Equation Editor native stream — CVE-2018-0798 likely high CVE likely CVE_2018_0798_EQUATION_NATIVE_ANOMALY
Embedded Equation Editor OLE data contains malformed, payload-like native stream bytes.
CVE-2018-0798 is an Equation Editor memory-corruption vulnerability in the MTEF Matrix-record parser. Some weaponized Office samples carry malformed Equation native data that is high-entropy or otherwise payload-like but does not preserve the exact public 0x60/0x61 matrix signature. This rule requires an embedded Equation Editor CLSID and an anomalous native/Ole10Native stream, so it is treated as likely CVE-2018-0798-family evidence rather than an exact match.
CoolType/SING font exploit indicator high CVE related PDF_COOLTYPE_SING
PDF font data contains SING/CoolType markers inside font content.
Adobe Reader CoolType font parsing has been exploited by document CVEs such as CVE-2010-2883. The rule matches SING markers in font-related PDF data; it is related evidence for the CoolType attack surface, not proof of a specific malformed font CVE.
Exchange P2 FROM header spoofing — CVE-2024-49040 high CVE likely CVE_2024_49040
Raw email From header contains multiple parsed/angle-bracket addresses.
CVE-2024-49040 exploits improper P2 FROM header parsing in Microsoft Exchange Server. By including multiple angle-bracket addresses in the From header (non-RFC-compliant syntax), attackers can make the displayed sender address differ from the actual routing address. The rule inspects the raw From line before parser normalization and fires when it contains multiple address markers or parses as multiple mailboxes; it cannot prove the recipient Exchange server was vulnerable.
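A minimal sketch of the "multiple address markers" half of the check, run on the raw header text before any parser normalization. The pattern and function name are hypothetical, and folded (multi-line) From headers are not handled.

```python
import re

# Hypothetical marker: an angle-bracketed token containing "@".
ANGLE_ADDR_RE = re.compile(r"<[^<>\s]+@[^<>\s]+>")

def from_header_has_multiple_addresses(raw_headers: str) -> bool:
    """True when the first From: line carries more than one <addr> marker."""
    for line in raw_headers.splitlines():
        if line.lower().startswith("from:"):
            return len(ANGLE_ADDR_RE.findall(line)) > 1
    return False
```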
GoToE/GoToR UNC action — CVE-2018-4993 high CVE exact CVE_2018_4993_GOTOE_UNC
PDF automatic/open action uses GoToE or GoToR with a UNC /F target.
This is the tighter Adobe Reader NTLM credential-leak shape: an /AA or /OpenAction-triggered GoToE/GoToR action whose /F file target is a UNC path. That matches the public CVE-2018-4993 proof-of-concept pattern more closely than the broader UNC-in-action rule.
JBIG2 + active content high CVE related PDF_JBIG2_ACTIVE_CONTENT
PDF uses JBIG2Decode/JBIG2 data alongside active content.
JBIG2 plus active content is a high-value parser-exploit indicator for families including CVE-2021-30860 and CVE-2009-0658. The rule matches /JBIG2Decode or JBIG2 signatures plus active content such as JavaScript, XFA, or RichMedia; it does not uniquely identify either CVE.
MSCOMCTL.ListView — CVE-2012-0158 high CVE likely CVE_2012_0158
OLE data contains the MSCOMCTL.ListView CLSID.
The scanner matches the ListView ActiveX CLSID bytes. This identifies the vulnerable control used by CVE-2012-0158 campaigns, but does not parse the crafted control property data needed to prove the overflow.
MSCOMCTL.Toolbar — CVE-2012-0158 / CVE-2012-1856 high CVE likely CVE_2012_1856
OLE data contains the MSCOMCTL.Toolbar CLSID.
The scanner matches the Toolbar ActiveX CLSID bytes. This identifies the vulnerable control used by CVE-2012-0158/1856 campaigns, but does not parse the crafted control property data needed to prove the overflow.
MSScriptControl — CVE-2015-0097 high CVE likely CVE_2015_0097_SC
OLE data contains the MSScriptControl.ScriptControl CLSID.
The scanner matches the MSScriptControl.ScriptControl CLSID bytes. Treat this as related local-zone or scripting-surface evidence rather than a specific proof of CVE-2015-0097, because public Microsoft guidance ties ADODB.RecordSet more directly to that CVE's workaround.
Malformed JPEG2000/JP2 box structure high CVE related PDF_JP2_BOX_ANOMALY
Embedded JP2/JPEG2000 data has invalid, oversized, or truncated box sizes.
Malformed JP2 boxes provide stronger evidence than a bare /JPXDecode filter for JPEG2000 parser attack surface, but still do not prove a specific CVE such as CVE-2018-4990. The rule matches JP2 box lengths statically and flags impossible or truncated structures.
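A static walk over top-level JP2 boxes could look like the sketch below. It follows the standard box header layout (a 4-byte big-endian length and 4-byte type, where length 1 means an 8-byte extended length follows and length 0 means the box runs to the end of the data). The function name and message wording are illustrative.

```python
import struct

def jp2_box_anomalies(data: bytes) -> list[str]:
    """Walk top-level JP2 boxes and report impossible or truncated lengths."""
    problems, offset = [], 0
    while offset < len(data):
        remaining = len(data) - offset
        if remaining < 8:
            problems.append(f"truncated box header at offset {offset}")
            break
        length, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if length == 1:  # extended 64-bit length follows the type field
            if remaining < 16:
                problems.append(f"truncated extended header at offset {offset}")
                break
            (length,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif length == 0:  # box extends to the end of the data
            length = remaining
        if length < header:
            problems.append(f"box {box_type!r} at {offset}: impossible length {length}")
            break
        if length > remaining:
            problems.append(f"box {box_type!r} at {offset}: length {length} overruns data")
            break
        offset += length
    return problems
```

An empty list means every top-level box fits inside the stream; it says nothing about the validity of box contents.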
Suspicious Equation Editor Matrix record — CVE-2018-0798 likely high CVE likely CVE_2018_0798_MTEF_ANOMALY
Equation Editor MTEF Matrix record has an anomalous exploit-like shape.
The scanner found an abnormal MTEF Matrix record inside Equation Editor OLE data, but not the tighter public CVE-2018-0798 byte pattern. Treat this as likely Equation Editor exploit evidence rather than an exact CVE-2018-0798 signature.
Suspicious JBIG2 segment structure high CVE related PDF_JBIG2_SEGMENT_ANOMALY
Embedded JBIG2 data contains anomalous segment headers or sizes.
Malformed JBIG2 segment structure is a parser-exploit indicator. Use this as related evidence for JBIG2 decoder CVE families. The rule looks for JBIG2 signatures plus suspicious segment-header shapes, not a validated FORCEDENTRY-style logical circuit.
UNC path in PDF — CVE-2018-4993/CVE-2019-7089 high CVE likely CVE_2018_4993
PDF action target contains a UNC path and the file has action triggers.
CVE-2018-4993 (Adobe Acrobat/Reader) and CVE-2019-7089 allow an attacker to steal NTLM authentication credentials by embedding a UNC path (\\server\share) in a PDF action (JavaScript, GoToR, URI, etc.). When the victim opens the PDF, a vulnerable viewer may resolve the UNC path and Windows can send the user's Net-NTLMv2 hash to the attacker's server. The hash can be cracked offline or used in relay attacks to authenticate as the victim. The rule requires the UNC path to be inside a PDF action target parameter (/F, /URI, /D, or /Target) plus an action keyword such as /JavaScript, /GoToR, /URI, /Launch, /OpenAction, or /AA.
Word OLE security bypass — CVE-2026-21514 high CVE likely CVE_2026_21514
Document contains CVE-2026-21514-style Word/OLE bypass indicators.
CVE-2026-21514 is a Microsoft Word security feature bypass (CVSS 7.8) that exploits how Word validates OLE stream metadata. Malicious documents use crafted metadata to bypass security checks and execute without macro or Protected View warnings. The Ole10Native stream can embed payloads; combined with the bypass, it enables code execution. The rule matches Ole10Native plus strong embedding indicators, or the observed RTF-embedded Word package shape where webSettings.xml.rels contains a frame relationship to a local Windows diagnostics XML target. It also matches the altChunk/RTF shape where a hidden \svb hex package contains DrsE2oDoc, graphicFrameDoc, and downRevStg drawing compatibility parts.
\listoverridecount corruption — CVE-2014-1761 high CVE exact CVE_2014_1761
RTF \listoverridecount with abnormally large value.
CVE-2014-1761 is a memory corruption vulnerability in Microsoft Word triggered by an excessively large \listoverridecount value in an RTF document. The vulnerability allows arbitrary code execution and was used in targeted attacks. The rule fires only when \listoverridecount has a numeric value at or above 2048, not on ordinary list metadata.
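A minimal sketch of the threshold check described above. The 2048 cut-off comes from the rule text; the regular expression and function name are illustrative assumptions, not the analyzer's code.

```python
import re

LISTOVERRIDECOUNT_MIN = 2048  # threshold stated in the rule description

# Assumed pattern: the control word followed directly by its decimal value.
_LOC_RE = re.compile(rb"\\listoverridecount(\d{1,18})")

def has_large_listoverridecount(data: bytes) -> bool:
    """Return True if any \\listoverridecount value is at or above the threshold."""
    return any(
        int(m.group(1)) >= LISTOVERRIDECOUNT_MIN for m in _LOC_RE.finditer(data)
    )
```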
getAnnots — CVE-2009-1492 high CVE exact CVE_2009_1492
PDF JavaScript calls getAnnots() with an exploit-shaped argument.
CVE-2009-1492 affects Adobe Reader's annotation handling. The rule fires only when the call has an integer-overflow numeric argument (a hex value with many F digits or a decimal above the int32 range) or a long stuffed-string argument; plain calls like getAnnots({nPage:0}), which exploit kits use as harmless staging boilerplate, are not flagged.
spell.customDictionaryOpen — CVE-2009-1493 high CVE exact CVE_2009_1493
PDF JavaScript invokes spell.customDictionaryOpen() with a long string argument.
CVE-2009-1493 is a stack buffer overflow in Adobe Reader's spell-check API. The rule requires spell.customDictionaryOpen() with a quoted string at least 128 bytes long; ordinary short dictionary names are not flagged.
PRC/3D content in PDF medium CVE related PDF_PRC_3D
PDF contains PRC 3D content markers.
3D parsers in PDF viewers are a recurring exploit surface. PRC content is rare in normal business documents and should be reviewed as related parser-exploit evidence. The rule matches /Subtype /PRC or /PRCStream markers; it does not validate malformed PRC records.

Email 51

ClamAV detected malware at linked site critical EMAIL_URL_CLAMAV
A page linked in the email matched a ClamAV malware signature.
The analyzer downloaded the linked page with a browser-like user-agent and scanned it with ClamAV. The page matched a known malware or phishing-kit signature, so treat the link as high-risk.
Dangerous attachment type critical EMAIL_DANGEROUS_ATTACH
An attachment has a file extension that can execute code.
The email contains an attachment with a file type that can execute code (for example .exe, .js, .vbs, .scr, .bat, or .hta). Executable attachments are high-risk in email, including when the sender appears familiar.
Double file extension critical EMAIL_DOUBLE_EXT
An attachment uses a double extension to disguise its type.
The attachment has two file extensions, such as 'invoice.pdf.exe'. If the mail client or operating system hides the final extension, the executable can appear to be a document.
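One way to express the double-extension check is sketched below. Both extension lists are illustrative assumptions; the analyzer's real lists are not given in this reference.

```python
from pathlib import PurePosixPath

# Illustrative lists only, not the analyzer's configuration.
EXECUTABLE_EXTS = {".exe", ".js", ".vbs", ".scr", ".bat", ".hta"}
DOCUMENT_EXTS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".jpg", ".png", ".txt"}

def has_double_extension(filename: str) -> bool:
    """True when a document-like extension is followed by an executable one."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE_EXTS
        and suffixes[-2] in DOCUMENT_EXTS
    )
```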
HTML smuggling in email critical EMAIL_HTML_SMUGGLING
Email HTML contains JavaScript patterns for dynamic payload construction.
HTML smuggling is a technique where email HTML contains JavaScript that dynamically constructs and triggers the download of executable payloads using Blob/createObjectURL or base64 decoding. This can bypass email gateway scanners because the payload is assembled in the victim's browser instead of being attached as a normal file.
Hyperlink uses javascript:/vbscript: scheme critical EMAIL_HREF_SCRIPT
An <a> tag's href executes script when clicked.
javascript: and vbscript: links execute script in the rendering context. Many webmail providers strip them, but some clients and previewers may not.
Link text doesn't match destination critical EMAIL_URL_MISMATCH
A hyperlink displays one URL but actually leads to a different domain.
This is a high-risk phishing technique: the email shows a link that appears to go to a trusted site (e.g. 'https://yourbank.com'), but clicking it leads to a different website controlled by the attacker. Forwarded or rewritten mail can occasionally produce mismatched link text.
Linked page contains a login form critical EMAIL_URL_LOGIN_PAGE
The email links to a page with a password input field.
The analyzer fetched the linked page and found a password input field. That is a high-risk phishing shape when reached from unsolicited mail, especially if the page uses brand names or external form handlers.
Assessed phishing intent high EMAIL_INTENT
The email matches one or more phishing-intent patterns.
The analyzer examines the full text of the email — subject line, body, HTML structure, attachments, and linked pages — and matches it against known attack patterns (credential harvesting, financial fraud, malware delivery, data exfiltration, account takeover, business email compromise, and extortion). The assessed intent tells you *what the attacker is trying to achieve*, helping you understand the specific risk and take appropriate action.
Brand name in subdomain of unrelated domain high EMAIL_BRAND_IN_SUBDOMAIN
Sender domain has a known brand label as a subdomain but the registrable domain isn't legitimate.
The sender domain contains a brand string as a subdomain, such as 'microsoft.com.example.tld'. This can make the address look familiar even though the registrable domain is different.
Business email compromise (BEC) language high EMAIL_BEC_PATTERN
Body matches multiple BEC / wire-fraud / gift-card / executive-impersonation patterns.
The rule matches multiple business email compromise patterns, such as 'Are you available?', gift-card requests, wire-transfer instructions, or last-minute vendor bank-detail changes. It fires only when at least two distinct patterns are present.
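The "at least two distinct patterns" condition can be sketched as follows. The pattern names and regular expressions are hypothetical stand-ins; only the two-pattern minimum comes from the rule description.

```python
import re

# Hypothetical pattern set for illustration.
BEC_PATTERNS = {
    "availability": re.compile(r"\bare you available\b", re.I),
    "gift_card": re.compile(r"\bgift\s*cards?\b", re.I),
    "wire_transfer": re.compile(r"\bwire\s+transfer\b", re.I),
    "bank_change": re.compile(r"\b(new|updated|changed)\s+bank(ing)?\s+details\b", re.I),
}

def bec_matches(body: str) -> set[str]:
    """Names of the distinct BEC patterns found in the body."""
    return {name for name, rx in BEC_PATTERNS.items() if rx.search(body)}

def is_bec(body: str, minimum: int = 2) -> bool:
    """Fire only when at least `minimum` distinct patterns are present."""
    return len(bec_matches(body)) >= minimum
```

Counting distinct pattern names, rather than raw matches, keeps a single repeated phrase from triggering the rule on its own.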
DKIM signing domain differs from sender high EMAIL_DKIM_D_MISMATCH
DKIM-Signature d= tag does not match the From: domain.
Any domain can DKIM-sign a message with its own key, regardless of the From: address, so DKIM=pass is not the same as 'the message comes from who it claims'. When the d= tag points at a different domain than From:, DKIM alignment fails; DMARC may still pass if SPF aligns with the From domain. This mismatch is useful evidence when reviewing sender authenticity.
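A simplified sketch of the d= comparison, using Python's standard email parser. It uses exact domain equality, which is an assumption made for brevity: DMARC's relaxed alignment compares organizational domains instead. The function name is hypothetical.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Assumed tag pattern: "d=" at the start of the value or after a semicolon.
_D_TAG_RE = re.compile(r"(?:^|;)\s*d\s*=\s*([^;\s]+)")

def dkim_d_mismatch(raw: str) -> bool:
    """True when no DKIM-Signature d= tag equals the From: domain."""
    msg = message_from_string(raw)
    from_domain = parseaddr(msg.get("From", ""))[1].rpartition("@")[2].lower()
    signing_domains = []
    for sig in msg.get_all("DKIM-Signature", []):
        match = _D_TAG_RE.search(sig)
        if match:
            signing_domains.append(match.group(1).lower())
    if not from_domain or not signing_domains:
        return False  # nothing to compare
    return from_domain not in signing_domains
```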
Email authentication failure high EMAIL_AUTH_FAIL
SPF, DKIM, or DMARC authentication failed.
Email authentication protocols verify that the sender is who they claim to be. SPF checks if the sending server is authorised by the domain owner. DKIM verifies the email hasn't been tampered with using a cryptographic signature. DMARC ties SPF and DKIM together with a policy. When any of these checks fails, the email may be spoofed: sent by someone pretending to be from a domain they don't control.
Freemail impersonating an organisation high EMAIL_FREEMAIL_ORG
Sender claims to be an organisation but uses a free email provider.
The email claims to be from an official organisation (bank, government, tech company) but is sent from a free email service like Gmail, Yahoo, or Outlook.com. Many legitimate organisations use their own corporate domain (e.g. @yourbank.com, not @gmail.com), so a freemail sender is useful impersonation evidence.
HTML file attached high EMAIL_HTML_ATTACHMENT
Attachment is an .htm / .html / .shtml / .xhtml file (or text/html MIME).
HTML attachments open as local browser pages, which can avoid link rewriting and URL reputation checks. They are common in credential phishing, but some business workflows also send HTML reports or exports.
HTML form in email body high EMAIL_HTML_FORM
The email contains an HTML form element.
An HTML form embedded in an email can capture typed information such as passwords or card numbers. Many mail clients block or alter forms, but their presence is still high-risk in unsolicited mail.
Hyperlink uses data: URI high EMAIL_HREF_DATA_URI
An <a> tag's href is a data: URI.
data: URIs are decoded inline by the browser, leaving no remote URL for reputation checks. They can carry phishing pages or small downloads directly inside the link.
IP address in URL high EMAIL_URL_IP
A URL in the email uses an IP address instead of a domain name.
The rule matches links that use an IP address instead of a domain name. Temporary phishing infrastructure often does this, though internal systems and appliances can also use IP-based URLs.
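A minimal sketch of the check using only the standard library. It recognises dotted IPv4 and bracketed IPv6 hosts; integer or hex-encoded IPv4 forms are not handled here, and the function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

def url_uses_ip(url: str) -> bool:
    """True when the URL host is an IPv4 or IPv6 literal."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True
```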
Image-only attachment with phishing-shape language (quishing) high EMAIL_QR_LURE_ATTACHMENT
Email's only attachment(s) are images and the body uses MFA / verification / scan language.
QR-code phishing can hide the target URL inside an image so mail gateways do not see a clickable link. This rule looks for image-only attachments paired with MFA, verification, or scan-language lures.
Internationalised domain name (IDN) in URL high EMAIL_URL_HOMOGRAPH
URL contains a Punycode domain that may be a visual lookalike.
The rule matches Punycode domains, which can represent internationalised domain names. Some phishing sites use lookalike characters such as Cyrillic or Greek letters; legitimate internationalised domains can also use Punycode.
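To review what a Punycode host actually renders as, the labels can be decoded with Python's built-in punycode codec. This is a sketch for analyst triage; the helper names are hypothetical and it does not attempt lookalike scoring.

```python
from urllib.parse import urlsplit

def punycode_labels(url: str) -> list[str]:
    """Return the host labels that carry the 'xn--' Punycode prefix."""
    host = urlsplit(url).hostname or ""
    return [label for label in host.split(".") if label.startswith("xn--")]

def decode_label(label: str) -> str:
    """Decode a single 'xn--' label to its Unicode form."""
    return label[4:].encode("ascii").decode("punycode")
```

Decoding the label lets a reviewer compare the rendered name against the brand it might imitate.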
Linked page contains obfuscated JavaScript high EMAIL_URL_OBFUSCATED_JS
The linked page uses JavaScript obfuscation techniques.
The page uses eval(), unescape(), fromCharCode, or similar obfuscation. Heavy JavaScript obfuscation is common in phishing kits and exploit pages; some legitimate sites also minify code, so the exact technique matters.
Linked page impersonates known brand(s) high EMAIL_URL_BRAND_SPOOF
The linked page references well-known brand names.
The downloaded page contains brand names of major services (banks, cloud providers, delivery companies, etc.). Phishing kits replicate the logos, colours, and layout of trusted brands to make the fake page convincing. The presence of brand names on a non-official domain is a useful indicator of impersonation.
Linked page submits data to external server high EMAIL_URL_DATA_EXFIL
The page has both input forms and JavaScript that sends data externally.
The linked page has form fields and JavaScript paths that submit data to another server. In phishing pages this is used to collect credentials or session data; legitimate sites can also use external form processors.
MFA/OTP language with a clickable link high EMAIL_MFA_PHISHING
Email mentions MFA, 2FA, OTP, or verification codes alongside an action link.
The rule matches MFA, OTP, authenticator, or sign-in alert language together with an action link. That combination is common in adversary-in-the-middle phishing, but account-security mail can use similar terms.
Multiple From: headers high EMAIL_MULTIPLE_FROM_HEADERS
Email contains more than one From: header.
RFC 5322 requires exactly one From: header. Multiple From: headers are a header-injection technique: different MTAs and clients can disagree on which one to display vs which one to authenticate, allowing the visible sender to be different from the one the spam filter and DMARC check.
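Counting From: headers is straightforward with the standard library parser, which preserves duplicate headers. The function name below is hypothetical.

```python
from email import message_from_string

def from_header_count(raw: str) -> int:
    """Number of From: headers in the raw message."""
    return len(message_from_string(raw).get_all("From", []))
```

A count above one corresponds to the condition this rule describes.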
Password-protected archive suspected high EMAIL_PASSWORD_ARCHIVE
Email mentions a password and includes an archive attachment.
The email includes an archive and gives a password in the body. Encrypted archives can prevent mail gateways from inspecting the contents before the recipient extracts them.
Reply-To domain differs from sender high EMAIL_REPLYTO_DIFF
The Reply-To address points to a different domain than the sender.
When you reply to this email, your response will go to a different domain than the apparent sender. Phishing campaigns use this so replies containing sensitive information go somewhere other than the apparent sender.
Reply-To redirects to a freemail account high EMAIL_REPLY_TO_FREEMAIL
Sender uses a corporate domain but Reply-To is on a freemail provider.
Classic business email compromise pattern: spoof a colleague or vendor's address, then quietly redirect any reply to an attacker-controlled freemail inbox (Gmail, Outlook.com). The victim believes they are responding to the real person.
Right-to-left override character (Unicode evasion) high EMAIL_RTL_OVERRIDE
Headers, body, or filename contain U+202E or related bidi-override characters.
RTL-override characters reverse the displayed reading order of subsequent text. They can make filenames like 'invoice‮fdp.exe' display as a different extension in some clients and can also interfere with keyword matching.
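A small sketch for locating bidi control characters in a header, body, or filename. U+202E is named by the rule; the rest of the set is an assumption about what "related bidi-override characters" covers.

```python
# Assumed set: embedding/override controls plus the isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each bidi control character found."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in BIDI_CONTROLS]
```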
Sender display name spoofing high EMAIL_FROM_MISMATCH
The display name contains an email address that differs from the actual sender.
Phishers set the display name to a trusted email address (e.g. 'support@yourbank.com') while the real From address is completely different. Most email clients show the display name prominently and hide the actual address, so recipients believe the email is from a trusted source. Always check the actual email address, not just the name shown.
Sender domain looks like a known brand high EMAIL_LOOKALIKE_DOMAIN
Sender registrable domain is one or two character edits from a brand domain.
The analyzer compares the sender's domain to a small list of well-known brand domains using bounded edit distance. Close lookalikes such as 'paypa1.com' or 'micros0ft.com' are useful impersonation evidence.
URL contains userinfo (user@host display spoof) high EMAIL_URL_CREDENTIALS
A URL in the body uses the user:pass@host syntax.
Browsers treat everything before the last '@' in a URL authority as userinfo and route to the host after it. A URL such as 'https://login.bank.com@example.tld/' can therefore display a familiar name while pointing elsewhere.
no-reply sender with Reply-To override high EMAIL_NOREPLY_WITH_REPLYTO
From: is a no-reply / do-not-reply address but the message has Reply-To.
Legitimate transactional 'no-reply' mailers rarely set Reply-To, since the whole point is that replies go nowhere. Phishers spoof a no-reply From and set Reply-To to capture replies (or push the conversation to a different inbox) while the victim still trusts the visible sender.
Body is essentially a single URL medium EMAIL_SHORT_BODY_URL_ONLY
Email body, with URLs removed, is fewer than 80 characters.
The body contains little text once URLs are removed. Minimal 'click here' messages are common in throwaway phishing, though automated notifications can also be brief.
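A sketch of the length check. The 80-character limit is from the rule summary; the URL pattern, the whitespace stripping, and the requirement that at least one URL be present are assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+", re.I)  # assumed URL pattern

def is_url_only_body(body: str, limit: int = 80) -> bool:
    """True when the body has a URL and fewer than `limit` other characters."""
    if not URL_RE.search(body):
        return False
    remainder = URL_RE.sub("", body).strip()
    return len(remainder) < limit
```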
Calendar invite attachment (.ics / text/calendar) medium EMAIL_CALENDAR_INVITE
Email contains a calendar-invite attachment.
Mail clients (especially Outlook) auto-render calendar invites in the preview pane. Links and attachments embedded in the description / location fields become clickable before the recipient ever opens the file. The vector is increasingly used for OAuth-consent phishing and BEC.
Email parse error medium EMAIL_PARSE_ERROR
Failed to parse the email file.
The email could not be parsed properly. It may be corrupt, truncated, or deliberately malformed to evade analysis. Malformed emails can sometimes exploit vulnerabilities in email clients.
HTML-entity-encoded hyperlink medium EMAIL_HREF_OBFUSCATED
An <a> tag's href contains long runs of HTML character entities.
Encoding the URL as decimal or hex character entities ('&#104;&#116;…') preserves the link's meaning in the browser but defeats simple gateway scanners that don't decode entities. It can be combined with shorteners or Punycode to compound the obfuscation.
Hidden text in HTML body medium EMAIL_HIDDEN_TEXT
CSS techniques used to hide text from the reader.
The email contains invisible text (font-size: 0, display: none, white text on white background, etc.). Hidden text can be used to confuse spam filters by including 'legitimate' invisible words, or to hide malicious content from casual inspection while it remains functional in the HTML code.
Link to phishing-heavy TLD medium EMAIL_URL_SUSPICIOUS_TLD
Email body links to a domain on a phishing-heavy TLD.
This applies the phishing-heavy TLD list to links in the body. It is most useful when combined with brand language, login prompts, or other phishing signals.
Phishing language detected medium EMAIL_PHISHING_KEYWORDS
The email body contains typical phishing language patterns.
The rule matches common phishing phrases such as account verification, suspension threats, sensitive-information requests, generic greetings, and prize language. These phrases are useful context, not a verdict by themselves.
Re:/Fwd: subject without thread headers medium EMAIL_FAKE_REPLY
Subject begins with Re:/Fwd: but no In-Reply-To or References header is present.
Real replies and forwards often carry In-Reply-To or References headers. A Re:/Fwd: subject without thread headers can indicate a cold message made to look like an existing conversation.
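The header check can be sketched with the standard library parser. The set of subject prefixes (Re:, Fw:, Fwd:) is an assumption, as is the function name.

```python
import re
from email import message_from_string

_REPLY_PREFIX = re.compile(r"^\s*(re|fwd?)\s*:", re.I)  # assumed prefixes

def is_fake_reply(raw: str) -> bool:
    """True for a Re:/Fwd: subject with neither In-Reply-To nor References."""
    msg = message_from_string(raw)
    if not _REPLY_PREFIX.match(msg.get("Subject", "")):
        return False
    return msg.get("In-Reply-To") is None and msg.get("References") is None
```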
Return-Path domain mismatch medium EMAIL_RETURN_PATH_MISMATCH
The Return-Path domain differs from the sender domain.
The Return-Path (envelope sender) is the address that receives bounce messages. When it differs from the From address, it may indicate the email was sent through a third-party service or is spoofed. Legitimate organisations typically have matching domains.
Sender domain on phishing-heavy TLD medium EMAIL_SUSPICIOUS_TLD
Sender uses a top-level domain over-represented in phishing.
Some TLDs ('.zip', '.mov', '.top', '.click', '.country', '.work', etc.) are over-represented in phishing and abuse datasets. This is a sender reputation signal, not proof by itself.
URL shortener link detected medium EMAIL_URL_SHORTENER
Email contains a shortened URL that hides the true destination.
URL shorteners (bit.ly, tinyurl.com, etc.) are legitimate services, but phishers abuse them to hide destinations. This finding means the destination is not visible from the email text alone; expand or inspect the URL before trusting it.
Urgency / pressure in subject line medium EMAIL_URGENCY_SUBJECT
The subject line contains urgency or pressure language.
The rule matches urgency phrases such as 'Act now', 'Account suspended', or 'Verify immediately'. Urgency appears in both phishing and real account notices, so this is supporting context rather than a standalone verdict.
Zero-width / invisible Unicode characters medium EMAIL_ZERO_WIDTH_CHARS
Subject, display name, or body contains zero-width Unicode characters.
Zero-width spaces and joiners (U+200B, U+200C, U+200D, U+FEFF) are invisible to readers but can break simple keyword matching. They can also be used as per-recipient markers in text.
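A short sketch that counts the four code points listed above and returns the text with them removed, so keyword matching can run on the cleaned string. The function name is hypothetical.

```python
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}  # code points named in the rule

def strip_zero_width(text: str) -> tuple[str, int]:
    """Return (text without zero-width characters, number removed)."""
    cleaned = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return cleaned, len(text) - len(cleaned)
```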
Missing Message-ID header low EMAIL_NO_MESSAGEID
The email has no Message-ID header.
Most normal mail systems add a unique Message-ID. A missing Message-ID can mean the message was generated by a script, tool, or unusual mail path.
Tracking pixel detected low EMAIL_TRACKING_PIXEL
Tiny external image likely used for email open tracking.
A tracking pixel is a tiny (usually 1x1) invisible image loaded from an external server. When your email client loads this image, the sender learns that you opened the email, along with your IP address and sometimes your approximate location. Phishers use tracking pixels to confirm which email addresses are active and being read, making you a higher-value target.
Unusually many attachments low EMAIL_MANY_ATTACHMENTS
The email has an unusually large number of attachments.
While not inherently malicious, a large number of attachments is unusual and may indicate an attempt to overwhelm or confuse the recipient.
Limited .msg parsing info EMAIL_MSG_LIMITED
The .msg format requires additional libraries for full analysis.
Outlook .msg files use a proprietary format. Full parsing requires the 'extract-msg' Python library. Without it, only basic text-level scanning is performed. For complete email analysis, save the email in .eml format.
Linked page scan summary info EMAIL_URL_SCAN_SUMMARY
Summary of pages downloaded and scanned from links in the email.
The analyzer used curl with a realistic Chrome browser user-agent to retrieve the web pages linked in the email. Each page was scanned with ClamAV for known malware/phishing signatures, and analysed for phishing indicators such as login forms with password fields, brand impersonation, obfuscated JavaScript, and data exfiltration code. This summary shows what was scanned and whether anything suspicious was found.

Archive 5

Corrupt or invalid ZIP archive medium ARCHIVE_CORRUPT
The file has a ZIP signature but could not be opened.
The archive appears to be a ZIP file based on its header bytes, but the internal structure is invalid. It may be corrupt, truncated, or deliberately malformed to exploit vulnerabilities in ZIP parsers.
Encrypted archive — could not decrypt medium ARCHIVE_ENCRYPTED
The archive is password-protected with an unknown password.
The ZIP archive is encrypted and none of the common analysis passwords worked. The files inside could not be extracted or scanned. If you know the password, extract the files manually and upload them individually.
Total decompression limit reached medium ARCHIVE_SIZE_LIMIT
The total decompressed size exceeded the safety limit.
To protect against zip bombs (archives that decompress to enormous sizes), the analyzer caps total decompressed output. Some files in the archive may not have been scanned.
Archive entry limit reached info ARCHIVE_LIMIT
Only a limited number of files were scanned from the archive.
To prevent resource exhaustion, the analyzer limits the number of files it extracts and scans from a single archive.
Oversized archive entry skipped info ARCHIVE_LARGE_ENTRY
An entry in the archive exceeds the per-file size limit.
Individual files inside the archive are capped at 50 MB. Files exceeding this limit are skipped to prevent memory exhaustion.
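The three archive limits above can be sketched as a pre-extraction pass over the ZIP directory. Only the 50 MB per-file cap is stated in this reference; the total-size and entry-count defaults below are placeholders, and the function name is hypothetical. The sketch trusts the sizes declared in the archive, which a crafted file can misreport, so a real extractor would also need to count bytes while decompressing.

```python
import io
import zipfile

PER_FILE_LIMIT = 50 * 1024 * 1024   # stated in the reference
TOTAL_LIMIT = 200 * 1024 * 1024     # placeholder value
MAX_ENTRIES = 100                   # placeholder value

def plan_extraction(zip_bytes, per_file=PER_FILE_LIMIT,
                    total_limit=TOTAL_LIMIT, max_entries=MAX_ENTRIES):
    """Return (names to scan, set of limit finding IDs) from declared sizes."""
    selected, findings, total = [], set(), 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if len(selected) >= max_entries:
                findings.add("ARCHIVE_LIMIT")
                break
            if info.file_size > per_file:
                findings.add("ARCHIVE_LARGE_ENTRY")
                continue  # skip this entry, keep scanning the rest
            if total + info.file_size > total_limit:
                findings.add("ARCHIVE_SIZE_LIMIT")
                break
            total += info.file_size
            selected.append(info.filename)
    return selected, findings
```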

PSD 1

Embedded PE in PSD critical PSD_EMBEDDED_EXE
PE executable found inside Photoshop file.
A Windows executable inside a PSD file indicates the image is being used as a container for malware.

ClamAV 2

ClamAV malware detection critical CLAMAV_DETECTION
ClamAV antivirus identified this file as malware.
ClamAV is an open-source antivirus engine with a database of known malware signatures. A positive detection means this file matches a known malicious pattern catalogued in the ClamAV virus database.
ClamAV scan did not complete info CLAMAV_SCAN_INCOMPLETE
ClamAV invocation failed (timeout, daemon unreachable, or DB missing).
The ClamAV signature pass on this run did not finish — typically because clamd was reloading its signature database (after freshclam) or briefly unreachable. The static heuristics still ran, but the AV signature signal is absent for this scan, so the score may be lower than it would be with a working clamd. Results carrying this marker are not cached, so a later lookup re-runs the analysis.

General 16

PDF/ZIP bundle contains child CVE exploit critical POLYGLOT_PDF_APPENDED_ZIP_CVE_BUNDLE
A ZIP appended after PDF EOF contains a member that matches a CVE rule.
The analyzer scanned the visible PDF and then scanned ZIP bytes appended after %%EOF. A member of that appended archive independently matched a CVE-specific exploit rule, so this finding ties the child CVE back to the parent PDF as a bundled multi-exploit delivery package.
Multiple structural anomalies in a single file high SPEC_DIVERGENCE_HIGH
Three or more independent structural-anomaly heuristics fired on this file.
Format parsers tolerate a wide range of small spec violations in benign files. When multiple independent structural-anomaly rules fire together on the same file, it usually means the file was crafted to confuse a specific parser into mis-handling its content — the canonical shape of exploitation against a memory-safety bug. The aggregate signal is more reliable than any single anomaly.
Non-PDF with .pdf extension high PDF_EXTENSION_MISMATCH
File was submitted as .pdf but does not contain a PDF header.
A file using a PDF extension without PDF structure is a masquerade or evasion pattern. It should not be considered a clean PDF simply because of the filename.
PDF with appended ZIP archive high POLYGLOT_PDF_ZIP_APPENDED
ZIP local-file header appears after the last %%EOF in a PDF.
Polyglot files carry valid bytes for two formats simultaneously. A PDF followed by a ZIP local-file header at the tail will open in a PDF reader as the document and in an archive parser as the ZIP. This is a known exploitation primitive used to smuggle payloads past one-format scanners.
Suspicious extracted artifact high EXTRACTED_FILE_STATIC_TRIAGE
A file carved from inside the sample matched static suspicious-content checks.
The analyzer performs a lightweight triage pass on carved artifacts in addition to ClamAV. It looks for signals such as script obfuscation, long encoded blobs, PowerShell encoded commands, VBA auto-exec with execution terms, and high-entropy packed content. These are not signature matches by themselves, but they are useful evidence when an embedded payload is trying to hide behavior from simple static inspection.
Tag-set historically associated with malicious files high CORPUS_HISTORICALLY_MALICIOUS
This combination of heuristics has fired multiple times before, mostly on malicious files.
The analyzer keeps a record of every prior scan's heuristic combination. When the same combination has been seen at least three times in the last 90 days and at least 80% of those were classified malicious, the corpus prior is a strong indicator that the current file should be treated the same way.
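The thresholds in this description (at least three occurrences in 90 days, at least 80% malicious) can be sketched as below. The PriorScan record and function name are hypothetical; how the analyzer stores its history is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PriorScan:
    """Hypothetical record of one earlier scan with the same heuristic set."""
    scanned_at: datetime
    malicious: bool

def historically_malicious(priors, now, min_count=3, window_days=90, min_ratio=0.8):
    """Apply the count, window, and ratio thresholds from the rule text."""
    cutoff = now - timedelta(days=window_days)
    recent = [p for p in priors if p.scanned_at >= cutoff]
    if len(recent) < min_count:
        return False
    return sum(p.malicious for p in recent) / len(recent) >= min_ratio
```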
Text document carries an embedded PDF body high POLYGLOT_TEXT_PDF
%PDF- magic and %%EOF trailer found inside a text document.
A document classified as HTML, RTF, or script that also contains a fully formed %PDF body (header and trailer) is a polyglot: browsers render the wrapper, PDF readers open the embedded document. This is used to deliver PDF exploits while evading file-type-based filtering.
ZIP/OOXML container with non-ZIP prefix bytes high POLYGLOT_ZIP_PREFIXED
Non-empty bytes precede the first ZIP local-file header.
Many ZIP-based parsers (including OOXML readers) locate the central directory through the end-of-central-directory record at the end of the file, so they successfully open archives that have arbitrary prefix bytes. Format-aware parsers that identify the file from its leading bytes see only the prefix. Mismatched parser behaviour on the same bytes is a polyglot delivery pattern.
Rare heuristic combination, previously flagged medium CORPUS_RARE_COMBINATION
An unusual combination of heuristics that has been flagged on at least one prior file.
Files that fire heuristic combinations rarely seen in the corpus are worth a closer look — especially when at least one prior occurrence was suspicious or malicious. This is a weaker signal than the historically-malicious combination, but it surfaces low-volume patterns that signature-style rules would miss.
Rare structural feature combination, previously flagged medium CORPUS_RARE_STRUCTURAL_FEATURE_SET
Normalized parser/payload feature set is rare and has appeared on a flagged file before.
This corpus signal groups exact rule IDs into broader structural features such as font-parser anomalies, object-graph divergence, embedded payloads, and shellcode evidence. It can surface recurring exploit shapes even when the precise signature names differ.
Analysis timed out (partial result) info ANALYSIS_TIMEOUT_PARTIAL
Analysis exceeded the wall-clock timeout; phases that had completed before the timeout are preserved.
Some scanners (regex-heavy parsers, large structured documents) can hit the per-file wall-clock budget. Unlike SCAN_INCOMPLETE, this finding does not flag the result as needing retry: re-scanning the same bytes will hit the same timeout, so the partial result is cached as-is. Operators investigating individual cases can force a fresh scan via the rescan API.
Embedded URL info EMBEDDED_URL
One or more URLs were extracted from the document bytes but were not attributed to any other specific heuristic.
URL-themed rules attribute URLs they actually evaluated (e.g. PDF link annotations, macro download calls). URLs that appear in the file but were not tied to a specific finding are surfaced here so an analyst can still see and triage them. The per-finding detail names the most likely channel (e.g. 'Macro calls URL', 'PDF link annotation references URL') inferred from the other rules that fired on the same file.
Macro capabilities present but unconfirmed info MACRO_CAPABILITY_UNCORROBORATED
An Office document's VBA exposes execution capabilities (Shell / WScript / CreateObject / auto-exec) but nothing corroborates malicious intent, so the verdict was capped at suspicious rather than malicious.
Capability-presence rules fire on a single keyword and carry full weight, so one keyword can otherwise reach the malicious threshold and false-positive legitimate macro-heavy business documents. This note is added when only such capability rules fired and there is no obfuscation, memory-exec primitive, download+exec chain, encoded payload, LOLBin, DDE, AV hit, or suspicious URL to corroborate malice. Low-confidence structural or social-engineering rules (an 'enable macros' instruction, external hyperlinks, a hidden sheet) do not count as corroboration; any genuine malice signal leaves the verdict untouched.
PDF appended ZIP child scan incomplete info POLYGLOT_PDF_APPENDED_ZIP_SCAN_INCOMPLETE
A ZIP appended after PDF EOF could not be fully scanned.
ZIP bytes were found after the PDF EOF marker, but the appended archive member scan failed or could not run. Parent PDF findings remain valid, but child payload attribution may be incomplete.
Scan did not complete info SCAN_INCOMPLETE
A scanner failed, timed out, or hit an output/resource cap before analysis completed.
This is an operational failure, not evidence that the file is safe. At least one required parser or worker did not finish, so the result is marked Error and should be retried or reviewed manually.
Unrecognised file format info UNKNOWN_FORMAT
File format was not recognised by the analyzer.
The file does not match any supported document format. Only generic shellcode scanning was performed. The file may still be harmful.

HTML 7

HTA/VBScript DOM-text execution critical HTML_HTA_VBSCRIPT_DOM_EXECUTE
HTA/VBScript document executes code assembled from DOM text.
Malicious HTA attachments often split the real script across HTML text nodes, then use VBScript Execute to run the reconstructed body on load. This hides the payload from simple script-block scanners while preserving automatic execution.
HTML ActiveX/COM object high HTML_ACTIVEX_OBJECT
HTML script instantiates ActiveX or COM objects.
CreateObject and ActiveXObject let script reach Windows automation interfaces such as WScript.Shell, XMLHTTP, and ADODB.Stream. That is rare in benign documents and common in script malware.
HTML Windows scripting object high HTML_WINDOWS_SCRIPTING_OBJECT
HTML references COM objects commonly used for execution or payload download.
Objects such as WScript.Shell, Shell.Application, MSXML2.XMLHTTP, and ADODB.Stream provide command execution and staged-download capability from script.
HTML contains VBScript high HTML_VBSCRIPT
Standalone HTML contains a VBScript script block.
VBScript in local HTML documents is a legacy Windows execution surface. Malicious attachments commonly use it with COM objects to download, drop, or execute payloads.
HTML scripted COM execution high HTML_SCRIPTED_COM_EXECUTION
HTML script dynamically creates objects and invokes execution/open methods.
Dynamic object creation followed by execution-like calls is a staged script malware pattern, especially when hidden inside a file with a document extension.
HTML base64 payload medium HTML_LONG_BASE64_SCRIPT_PAYLOAD
HTML script contains a long base64-like blob.
Long encoded blobs in script are commonly used for HTML smuggling or for staging a second payload while avoiding simple content filters.
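As a rough illustration of the base64 payload rule, the sketch below pulls script bodies out of HTML with a regular expression and reports long base64-like runs. The 512-character threshold and all names are assumptions; the reference does not state how long a blob must be to count as long.

```python
import re

# Assumed threshold: the reference does not say how long "long" is.
MIN_BLOB_LEN = 512

SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL)
BASE64_RE = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % MIN_BLOB_LEN)


def long_base64_blobs(html: str):
    """Return base64-like runs of at least MIN_BLOB_LEN characters found in script blocks."""
    blobs = []
    for body in SCRIPT_RE.findall(html):
        blobs.extend(BASE64_RE.findall(body))
    return blobs
```

Restricting the search to script bodies keeps long encoded text elsewhere in the page (for example inline images) from matching in this sketch.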
HTML obfuscated string builder medium HTML_OBFUSCATED_STRING_BUILDER
HTML script repeatedly builds strings from small fragments.
Heavy string-fragment construction hides object names, commands, URLs, and payloads from static scanners and is common in malicious script.

Machine Learning 1

Nyx PDF Classifier flagged this file high ML_NYX_PDF_MALICIOUS
Gradient-boosted classifier scored the PDF above the suspicious threshold.
The Nyx classifier is a LightGBM model trained on byte-level structural features (keyword counts, filter histograms, entropy, object/stream balance) of hundreds of thousands of malicious and benign PDFs. Severity is graded by score: medium >= 0.25, high >= 0.5, critical >= 0.9. The model complements the rule-based heuristics: it can catch families in which no single indicator trips an explicit rule but whose overall shape matches the malicious training distribution.
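The severity grading stated above maps directly to a threshold ladder. The function name nyx_severity is hypothetical, and returning None below 0.25 is an assumption, since the text does not say what happens to lower scores.

```python
def nyx_severity(score: float):
    """Map a classifier score to a severity label using the documented cutoffs."""
    if score >= 0.9:
        return "critical"
    if score >= 0.5:
        return "high"
    if score >= 0.25:
        return "medium"
    return None  # assumption: scores below 0.25 produce no finding
```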

Office Macros 7

Dangerous API name reassembled from split string literals critical OLE_VBA_SPLIT_KEYWORD_OBFUSCATION
VBA concatenates short string literals that reassemble a dangerous API/ProgID/LOLBin name (e.g. Scripting.FileSystemObject, WScript.Shell, powershell) appearing in no single literal.
Splitting an API or ProgID name across string concatenation is done only to evade keyword scanning; the rule keys on the reconstructed token rather than on concatenation density, so benign macro-heavy documents (which concatenate even more) are not flagged.
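The idea of keying on the reconstructed token can be sketched as follows. This is a simplified illustration, not the analyzer's implementation: it only joins adjacent string literals connected by & or +, ignores variables, Chr() calls, and line continuations, and uses a token list taken from the examples in the rule text.

```python
import re

# Illustrative token list taken from the examples in the rule text.
DANGEROUS_TOKENS = ("scripting.filesystemobject", "wscript.shell", "powershell")

# A VBA string literal ("" is an escaped quote) and a chain of two or more of them.
LITERAL = r'"(?:[^"\r\n]|"")*"'
LITERAL_RE = re.compile(LITERAL)
CHAIN_RE = re.compile(r"%s(?:\s*[&+]\s*%s)+" % (LITERAL, LITERAL))


def split_keywords(vba_source: str):
    """Return dangerous tokens that appear only after literals are concatenated."""
    hits = []
    for chain in CHAIN_RE.finditer(vba_source):
        parts = [
            m.group(0)[1:-1].replace('""', '"')
            for m in LITERAL_RE.finditer(chain.group(0))
        ]
        joined = "".join(parts).lower()
        for token in DANGEROUS_TOKENS:
            # Flag only when no single literal already contains the token.
            if token in joined and not any(token in p.lower() for p in parts):
                hits.append(token)
    return hits
```

Because the check is on the joined result, a benign chain such as "Total: " & "10" & " items" produces no hit regardless of how many fragments it has.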
Obfuscated auto-exec VBA loader critical OLE_VBA_OBFUSCATED_AUTOEXEC_LOADER
Auto-exec VBA reconstructs strings with a heavy custom decoder and feeds them to a COM-instantiation or execution sink.
A numeric char-array decoder, repeated hex-string decode, dynamic CreateObject(decoder(...)), or many chained Replace() junk-token reconstructions — combined with an auto-exec entry point and a CreateObject/Shell/exec sink — is the obfuscated-loader shape used to keep indicators out of the macro source.
VBA downloads and writes a file to disk critical OLE_VBA_HTTP_DROP_EXEC
VBA reads an HTTP response body and writes it to disk (ADODB.Stream SaveToFile) — a download-drop dropper even when the COM ProgIDs are built dynamically.
Macro downloaders often hide the MSXML/ADODB ProgID in a variable so the URLDownloadToFile / ProgID keyword rules never fire. The .ResponseBody + .SaveToFile chain is a high-confidence indicator independent of those names.
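A minimal sketch of a name-independent check for this chain: it looks only for the .ResponseBody and .SaveToFile member accesses and ignores whatever ProgID strings are used. It tests co-occurrence rather than real data flow, and the function name is hypothetical.

```python
import re

RESPONSE_BODY_RE = re.compile(r"\.\s*ResponseBody\b", re.IGNORECASE)
SAVE_TO_FILE_RE = re.compile(r"\.\s*SaveToFile\b", re.IGNORECASE)


def has_http_drop_chain(vba_source: str) -> bool:
    """True when the source both reads .ResponseBody and calls .SaveToFile."""
    return bool(
        RESPONSE_BODY_RE.search(vba_source) and SAVE_TO_FILE_RE.search(vba_source)
    )
```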
VBA executes content staged in worksheet cells critical OLE_VBA_CELL_GETOBJECT_EXEC
VBA passes a worksheet cell/comment reference to GetObject and drives an Exec/Open/Run sink.
Malware hides the COM moniker and command in cell data (Range().Value / .NoteText) so the macro source carries no literal indicators; the GetObject(cell)+exec shape catches it regardless.
VBA injects an Excel-4 macro CALL to a download/exec API critical OLE_VBA_XLM_CALL_INJECTION
VBA writes Excel-4 (XLM) =CALL() formulas targeting urlmon URLDownloadToFile / Shell32 ShellExecute and runs them.
This VBA-to-XLM bridge downloads and executes a payload while keeping the API names out of normal VBA keyword scanning (the names are split / CHAR-built into cell formulas).
VBA project carries a recognised code-signing signature info VBA_SIGNED_TRUSTED
The VBA project is Authenticode-signed and the signer/issuer chain matches a recognised code-signing publisher or CA.
Informational positive signal. The signature is NOT yet verified to cover the current project bytes (MS-OVBA content hash), so it does not reduce the verdict — a future, content-bound check would let a trusted signature soften a capability-only verdict.
VBA project is signed but not by a recognised publisher info VBA_SIGNED_UNTRUSTED
The VBA project carries a digital signature, but the signer does not chain to a recognised publisher/CA (self-signed, unknown issuer, or unparseable).
A signature alone is not evidence of benignity — malware is routinely self-signed or signed with stolen certificates. Surfaced for analyst context only.

Polyglot 1

Executable/encoded overlay appended after image end critical IMAGE_TRAILING_OVERLAY_EXECUTABLE
A valid image is followed by a large appended overlay that is a second container (PE/ZIP/...) or a loader-delimited base64-encoded payload.
Image-stego-loaders ship a real lure image (often a 4K screenshot) with an encoded or raw payload appended after the image's logical end (JPEG EOI / PNG IEND / GIF trailer). The image renders normally while a loader (AutoIt/.NET, AsyncRAT/njRAT/DCRat family) extracts and runs the hidden PE. Benign images carry at most a few KB of structured slack, never a large executable/archive or a delimiter-marked base64 blob, so this overlay shape has no benign use.
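To make the notion of an image's logical end concrete, the sketch below walks PNG chunks to the end of IEND and then inspects what follows. It covers only PNG and only the raw-container case (PE or ZIP magic); JPEG, GIF, and the delimiter-marked base64 case are left out. The 4096-byte cutoff is an assumption based on the "few KB" wording, and all names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumed cutoff; the text only says benign slack is "at most a few KB".
MIN_OVERLAY_BYTES = 4096
CONTAINER_MAGICS = (b"MZ", b"PK\x03\x04")  # PE and ZIP


def png_logical_end(data: bytes):
    """Walk PNG chunks and return the offset just past IEND, or None if malformed."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        pos += 12 + length  # length field + type + data + CRC
        if chunk_type == b"IEND":
            return pos if pos <= len(data) else None
    return None


def suspicious_png_overlay(data: bytes) -> bool:
    """True when a large PE/ZIP container follows the PNG's logical end."""
    end = png_logical_end(data)
    if end is None:
        return False
    overlay = data[end:].lstrip(b"\x00\r\n ")
    return len(overlay) >= MIN_OVERLAY_BYTES and overlay.startswith(CONTAINER_MAGICS)
```

Walking the chunk structure, rather than searching for the IEND bytes, avoids being misled by payload data that happens to contain the same byte sequence.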

Script 2

Obfuscated WSH script critical SCRIPT_WSH_OBFUSCATED
Windows Script Host content includes execution or obfuscation indicators.
WSH script combined with repeated string construction, encoded blobs, or shell/COM execution terms is a strong indicator of script malware.
Windows Script Host masquerade high SCRIPT_WSH_MASQUERADE
File contains Windows Script Host code while masquerading as a document.
WScript/CScript content can execute directly on Windows. When it is submitted with a document-like name or extension, it is a common attachment masquerade pattern.

Social Engineering 24

Fake CAPTCHA with command-running instructions critical SE_FAKE_CAPTCHA_CLICKFIX
Document combines fake CAPTCHA language with instructions to paste or run a command.
The rule requires both a CAPTCHA or human-verification frame and a command-running step such as Win+R, Ctrl+V, PowerShell, cmd, mshta, or similar. This is a high-confidence ClickFix pattern rather than a generic CAPTCHA phrase.
Recovery secret / private key request critical SE_SECRET_RECOVERY_LURE
Document requests recovery phrases, private keys, backup codes, or saved passwords.
The rule matches requests for seed phrases, private keys, backup codes, or saved passwords. These are recovery secrets; asking for them inside a document is a high-risk signal.
Advance-fee lottery/parcel scam high SE_ADVANCE_FEE_SCAM_LURE
Document contains lottery/beneficiary, large-value funds, and claim/contact/payment instructions.
The rule requires multiple independent fraud-letter cues: beneficiary or lottery/prize language, large-value draft/funds wording, and claim, contact, payment-bureau, or courier instructions. This is the classic advance-fee scam shape and is stronger than generic prize wording alone.
Browser extension / update installation lure high SE_BROWSER_INSTALL_LURE
Document tells the user to install a browser extension, plugin, viewer, or update.
The rule matches instructions to install a browser extension, plugin, viewer, or browser update to read the document. This is high-risk in an unsolicited file because fake updates and extensions are common malware delivery paths.
ClickFix social engineering attack high SE_CLICKFIX
Document instructs the user to press Win+R and paste or run a command.
ClickFix lures ask the user to open the Run dialog or a shell and paste a command, usually PowerShell or cmd. The rule looks for those instructions in document text. It is high-risk because the command is supplied by the lure, not because an exploit is present.
Clipboard command execution lure high SE_CLIPBOARD_COMMAND_LURE
Document tells the user to copy or paste clipboard content into a command context.
The rule matches clipboard, copy, paste, or Ctrl+V instructions near Run, PowerShell, cmd, terminal, mshta, regsvr32, or similar execution contexts. That combination is uncommon in ordinary documents and is typical of ClickFix-style social engineering.
Fake CAPTCHA / human verification prompt high SE_FAKE_CAPTCHA
Document displays a fake CAPTCHA or robot-verification prompt.
The rule matches CAPTCHA or human-verification phrases ('verify you are not a robot', 'complete the verification', etc.) in document text. In malicious lures this is typically the framing for a ClickFix-style step that asks the user to paste a command, but the rule fires on the verification language alone.
Fake browser/security check with command step high SE_FAKE_BROWSER_SECURITY_CHECK
Document combines browser/security-check language with instructions to run a command.
The rule looks for browser check, connection verification, or security verification language together with a command-running step. This catches fake browser checks that use the same ClickFix workflow without saying CAPTCHA.
Invoice remittance address uses free webmail high SE_INVOICE_FREE_WEBMAIL_REMITTANCE
Invoice/payment document routes remittance contact through a consumer webmail domain.
Legitimate vendor invoices can include payment instructions, but a bank-transfer or remittance workflow that points to generic consumer webmail (for example gmail.com, outlook.com, mail.com, or post.com) is a strong business-email-compromise indicator, especially when the document impersonates a named organisation.
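A rough sketch of the free-webmail check: it requires some invoice or remittance wording and then compares the domains of any e-mail addresses against the consumer domains named in the rule text. The context word list, the regular expressions, and the function name are assumptions for illustration.

```python
import re

# Consumer webmail domains given as examples in the rule text.
FREE_WEBMAIL = {"gmail.com", "outlook.com", "mail.com", "post.com"}

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
# Assumed context words; the real rule's vocabulary is not documented here.
REMIT_RE = re.compile(
    r"\b(?:invoice|remittance|remit|bank transfer|wire transfer|payment)\b",
    re.IGNORECASE,
)


def free_webmail_remittance(text: str):
    """Return free-webmail domains used in a document with invoice/payment wording."""
    if not REMIT_RE.search(text):
        return []
    domains = {d.lower() for d in EMAIL_RE.findall(text)}
    return sorted(domains & FREE_WEBMAIL)
```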
LOLBin token sequence in document text high SE_LOLBIN_RUN_COMMAND
Extracted document text contains a Windows execution tool name within 220 characters of a dangerous flag, command verb, or URL.
The rule matches the name of a script/execution tool (PowerShell, cmd, mshta, rundll32, regsvr32, wscript, cscript, certutil, bitsadmin, curl, wget) within 220 characters of a dangerous flag (-enc, downloadstring, iex, /i:, javascript:, vbscript:) or a URL. This catches two distinct shapes: (1) a visible 'run this' instruction in HTML/PDF/RTF lure bodies, where the matched span really is the command a victim is asked to run; and (2) macro-laden Office files where the macro's own string-pool entries (CreateObject names, action verbs, payload URLs) end up adjacent in the extracted text. The detail field shows the head and tail of the matched span so an analyst can tell which case applies.
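The proximity test described above can be sketched with two regular expressions and a character window. The tool names, the flag list, and the 220-character distance come from the rule text; the URL pattern, the choice to search in both directions, and the function name are assumptions.

```python
import re

WINDOW = 220  # character distance stated in the rule text

TOOL_RE = re.compile(
    r"\b(?:powershell|cmd|mshta|rundll32|regsvr32|wscript|cscript"
    r"|certutil|bitsadmin|curl|wget)\b",
    re.IGNORECASE,
)
# Dangerous flags from the rule text, plus a simple URL pattern (an assumption).
TRIGGER_RE = re.compile(
    r"-enc|downloadstring|\biex\b|/i:|javascript:|vbscript:|https?://",
    re.IGNORECASE,
)


def lolbin_spans(text: str):
    """Return (tool, trigger) pairs where a trigger starts within WINDOW chars of a tool."""
    hits = []
    for tool in TOOL_RE.finditer(text):
        lo = max(0, tool.start() - WINDOW)
        hi = tool.end() + WINDOW
        trigger = TRIGGER_RE.search(text, lo, hi)
        if trigger:
            hits.append((tool.group(0).lower(), trigger.group(0).lower()))
    return hits
```

A bare mention of a tool name with no nearby flag or URL produces no hit, which is what separates this rule from a plain keyword match.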
MFA / one-time-code harvesting lure high SE_MFA_LURE
Document asks for an MFA, OTP, authenticator, or one-time passcode action.
The rule matches requests for MFA, OTP, authenticator, one-time code, or push approval actions. Documents that ask for these actions should be reviewed carefully, especially when combined with credential or account language.
Password-protected archive handoff high SE_PASSWORD_ARCHIVE_LURE
Document gives password instructions for an archive or attachment.
The rule matches password instructions near archive or attachment language. Encrypted archives are often used to keep gateway scanners from inspecting the payload before the recipient extracts it.
Payment redirection / bank-detail change lure high SE_PAYMENT_REDIRECT_LURE
Document describes new or changed bank, wire, ACH, IBAN, SWIFT, or routing instructions.
The rule matches text about changed bank, wire, ACH, IBAN, SWIFT, or routing details. This is a high-value business email compromise pattern, but it still needs business-context review.
Remote-support code/control handoff high SE_REMOTE_SUPPORT_CODE_LURE
Document asks the user to share a support code or allow remote control.
The rule matches remote-support tools such as AnyDesk, TeamViewer, Quick Assist, ScreenConnect, or Splashtop near requests for a session code, support code, connection ID, or permission to control the machine.
Remote-support tool lure high SE_REMOTE_SUPPORT_LURE
Document instructs the user to install or open remote-support software.
The rule matches instructions to install or open remote-support tools such as AnyDesk, TeamViewer, Quick Assist, ScreenConnect, or Splashtop. This is high-risk in an unsolicited document because it can lead to interactive control of the machine.
Security software disable instruction high SE_SECURITY_BYPASS
Document instructs the user to disable antivirus or security software.
The rule matches instructions to disable antivirus, security tools, or protections before opening content. That request is unusual for ordinary documents and should be treated as high-risk.
Callback phishing phone lure medium SE_CALLBACK_LURE
Document asks the user to call a phone number in a finance or security context.
The rule matches phone-call instructions in finance, renewal, refund, fraud, or account-security contexts. Callback phishing commonly starts this way, but some legitimate notices also use phone numbers.
Cloud document impersonation lure medium SE_CLOUD_DOC_LURE
Document impersonates a cloud file-sharing or collaboration service.
The rule matches cloud-file service names such as SharePoint, OneDrive, Google Drive, Dropbox, Box, Teams, or Microsoft 365 in a verify-to-view or secure-document context. Real sharing workflows can look similar, so use this as supporting evidence.
Document signing service impersonation lure medium SE_DOCUSIGN_LURE
Document impersonates DocuSign, Adobe Sign, or a similar service to lure users.
The rule matches visible references to e-signature services such as DocuSign or Adobe Sign in a signing-request context. This is common in phishing, but real signature workflows can use the same names, so the service reference is supporting evidence rather than proof.
Macro/content-enable lure medium SE_ENABLE_LURE
Document instructs the user to enable macros or editing.
Macro malware often uses fake preview text or security-warning language to push the user toward 'Enable Content' or 'Enable Macros'. This finding means the document contains that kind of instruction; it does not prove the macros are malicious by itself.
QR-code redirect lure medium SE_QR_LURE
Document instructs the user to scan a QR code, likely as an out-of-band phishing vector.
The rule matches instructions to scan a QR code to view, verify, or access a document. QR codes are common in legitimate material, but these access-oriented phrases are useful supporting evidence for QR phishing.
Invoice / payment language low SE_INVOICE_LURE
Document contains invoice or payment language paired with an action instruction.
The rule matches invoice or payment language paired with an action such as open, download, review, or click. Genuine invoices use the same vocabulary, so this finding is mainly useful when paired with macro, link, or attachment indicators.
Urgency / deadline lure low SE_URGENCY_LURE
Document contains urgency or deadline language to pressure the user into acting.
The rule matches deadline and account-pressure phrases such as 'final notice' or 'action required within 24 hours'. This language is common in both phishing and legitimate billing, legal, and account notices, so it is low-signal on its own.
Visual download / call-to-action button lure low SE_DOWNLOAD_BUTTON
Document contains a call-to-action phrase ('Click here to download', etc.).
The rule matches call-to-action text such as 'Download Now' or 'Open Document'. That wording is common in manuals and setup guides, so this is low-signal unless other findings point to a malicious workflow.