Ilias1988/Magic-Bytes-List

🔮 The Magic Bytes Book

A comprehensive reference guide to file magic bytes (file signatures)

Identify file types by their binary signatures, not just extensions


📖 Introduction

Magic bytes (also known as file signatures or magic numbers) are characteristic byte sequences, usually located at the beginning of a file, that identify its format. Unlike file extensions, which can be easily changed or faked, magic bytes are embedded in the file's binary structure and are a more reliable basis for file type identification. They are not always unique: several formats share a signature, and a few place it at a fixed offset rather than at byte 0.

Why Are Magic Bytes Important?

| Use Case | Description |
| --- | --- |
| 🔒 Security | Detect disguised malware (e.g., an .exe renamed to .jpg) |
| 🔍 Digital Forensics | Recover and identify files without extensions |
| 🛡️ File Validation | Verify uploaded files match their claimed type |
| 🔧 Development | Build robust file handling in applications |
| 📁 Data Recovery | Carve files from damaged storage media |
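
As a minimal sketch of the security and validation use cases, the snippet below compares a file's claimed extension with its leading bytes. The `EXPECTED_PREFIXES` mapping and the function name are hypothetical and deliberately tiny; a real validator would cover far more formats.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, deliberately small mapping: extension -> accepted leading bytes.
EXPECTED_PREFIXES = {
    ".jpg": (b"\xFF\xD8\xFF",),
    ".jpeg": (b"\xFF\xD8\xFF",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".pdf": (b"%PDF-",),
}


def extension_matches_header(filename: str, header: bytes) -> Optional[bool]:
    """Return True/False for known extensions, None when the extension is unknown."""
    prefixes = EXPECTED_PREFIXES.get(Path(filename).suffix.lower())
    if prefixes is None:
        return None
    # bytes.startswith accepts a tuple of candidate prefixes.
    return header.startswith(prefixes)


# An "image" whose first bytes are the MZ executable signature is flagged.
print(extension_matches_header("holiday.jpg", b"MZ\x90\x00"))  # False
```

Returning `None` for unknown extensions keeps "cannot tell" separate from "mismatch", which tends to matter when the result feeds an upload filter.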

🖼️ Images

| Format | Extension(s) | Hex Signature | ASCII/String | Notes |
| --- | --- | --- | --- | --- |
| JPEG | .jpg, .jpeg | FF D8 FF E0 | ÿØÿà | JFIF format |
| JPEG | .jpg, .jpeg | FF D8 FF E1 | ÿØÿá | EXIF format |
| JPEG | .jpg, .jpeg | FF D8 FF DB | ÿØÿÛ | Raw JPEG |
| PNG | .png | 89 50 4E 47 0D 0A 1A 0A | ‰PNG.... | Portable Network Graphics |
| GIF87a | .gif | 47 49 46 38 37 61 | GIF87a | Original GIF |
| GIF89a | .gif | 47 49 46 38 39 61 | GIF89a | GIF with animation support |
| BMP | .bmp, .dib | 42 4D | BM | Windows Bitmap |
| WebP | .webp | 52 49 46 46 ?? ?? ?? ?? 57 45 42 50 | RIFF....WEBP | Google WebP format |
| TIFF (LE) | .tif, .tiff | 49 49 2A 00 | II*. | Little-endian TIFF |
| TIFF (BE) | .tif, .tiff | 4D 4D 00 2A | MM.* | Big-endian TIFF |
| ICO | .ico | 00 00 01 00 | .... | Windows Icon |
| PSD | .psd | 38 42 50 53 | 8BPS | Adobe Photoshop |
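
Signatures such as the WebP entry use `??` for bytes that vary from file to file (here, the RIFF chunk size). One way to match such patterns is sketched below; `parse_pattern` and `matches` are illustrative names, not part of any library.

```python
def parse_pattern(hex_pattern: str) -> list:
    """Turn '52 49 ?? 46' into [0x52, 0x49, None, 0x46]; None means 'any byte'."""
    return [None if tok == "??" else int(tok, 16) for tok in hex_pattern.split()]


def matches(data: bytes, hex_pattern: str, offset: int = 0) -> bool:
    """Check whether data matches the wildcard pattern starting at offset."""
    pattern = parse_pattern(hex_pattern)
    window = data[offset:offset + len(pattern)]
    if len(window) < len(pattern):
        return False  # not enough bytes to decide
    return all(p is None or p == b for p, b in zip(pattern, window))


WEBP = "52 49 46 46 ?? ?? ?? ?? 57 45 42 50"
print(matches(b"RIFF\x24\x00\x00\x00WEBPVP8 ", WEBP))  # True
print(matches(b"RIFF\x24\x00\x00\x00WAVEfmt ", WEBP))  # False
```

The optional `offset` argument also covers formats whose signature does not start at byte 0.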

🎬 Video & Audio

| Format | Extension(s) | Hex Signature | ASCII/String | Notes |
| --- | --- | --- | --- | --- |
| MP4 | .mp4, .m4v, .m4a | 00 00 00 ?? 66 74 79 70 | ....ftyp | MPEG-4 container |
| MP4 (isom) | .mp4 | 00 00 00 ?? 66 74 79 70 69 73 6F 6D | ....ftypisom | ISO Base Media |
| MP4 (M4A) | .m4a | 00 00 00 ?? 66 74 79 70 4D 34 41 20 | ....ftypM4A | Apple Audio |
| MP3 | .mp3 | FF FB | ÿû | MPEG-1 Audio Layer III frame header (no CRC) |
| MP3 | .mp3 | FF F3 | ÿó | MPEG-2 Audio Layer III frame header (no CRC) |
| MP3 | .mp3 | FF F2 | ÿò | MPEG-2 Audio Layer III frame header (with CRC) |
| MP3 (ID3v2) | .mp3 | 49 44 33 | ID3 | MP3 with ID3v2 tag |
| WAV | .wav | 52 49 46 46 ?? ?? ?? ?? 57 41 56 45 | RIFF....WAVE | Waveform Audio |
| AVI | .avi | 52 49 46 46 ?? ?? ?? ?? 41 56 49 20 | RIFF....AVI | Audio Video Interleave |
| MKV | .mkv, .webm | 1A 45 DF A3 | .Eߣ | Matroska container (EBML header) |
| FLV | .flv | 46 4C 56 01 | FLV. | Flash Video |
| FLAC | .flac | 66 4C 61 43 | fLaC | Free Lossless Audio Codec |
| OGG | .ogg, .oga, .ogv | 4F 67 67 53 | OggS | Ogg container |
| WMV/WMA | .wmv, .wma, .asf | 30 26 B2 75 8E 66 CF 11 | 0&²u.fÏ. | Windows Media |
| MIDI | .mid, .midi | 4D 54 68 64 | MThd | Musical Instrument Digital |
| MOV | .mov, .qt | 00 00 00 ?? 66 74 79 70 71 74 20 20 | ....ftypqt | QuickTime Movie |
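
The MP4, M4A, and MOV rows differ only in the four bytes that follow `ftyp`, known as the major brand. A small sketch for pulling that brand out of a header is shown below; the `BRAND_LABELS` mapping is a hypothetical helper that covers only the brands listed in the table.

```python
from typing import Optional

# Hypothetical label map limited to the brands in the table above.
BRAND_LABELS = {
    "isom": "MP4 (ISO Base Media)",
    "M4A ": "M4A (Apple Audio)",
    "qt  ": "QuickTime Movie",
}


def ftyp_major_brand(header: bytes) -> Optional[str]:
    """Return the 4-character major brand if the header starts with an 'ftyp' box."""
    if len(header) < 12 or header[4:8] != b"ftyp":
        return None
    return header[8:12].decode("ascii", errors="replace")


brand = ftyp_major_brand(b"\x00\x00\x00\x20ftypisom\x00\x00\x02\x00")
print(brand, "->", BRAND_LABELS.get(brand, "other ftyp-based file"))
```

Brands not present in the mapping fall through to a generic label rather than being guessed.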

📄 Documents

| Format | Extension(s) | Hex Signature | ASCII/String | Notes |
| --- | --- | --- | --- | --- |
| PDF | .pdf | 25 50 44 46 2D | %PDF- | Portable Document Format |
| DOCX | .docx | 50 4B 03 04 | PK.. | Word (Office Open XML) |
| XLSX | .xlsx | 50 4B 03 04 | PK.. | Excel (Office Open XML) |
| PPTX | .pptx | 50 4B 03 04 | PK.. | PowerPoint (Office Open XML) |
| DOC | .doc | D0 CF 11 E0 A1 B1 1A E1 | ÐÏ.à¡±.á | Word (OLE Compound) |
| XLS | .xls | D0 CF 11 E0 A1 B1 1A E1 | ÐÏ.à¡±.á | Excel (OLE Compound) |
| PPT | .ppt | D0 CF 11 E0 A1 B1 1A E1 | ÐÏ.à¡±.á | PowerPoint (OLE Compound) |
| RTF | .rtf | 7B 5C 72 74 66 31 | {\rtf1 | Rich Text Format |
| ODT | .odt | 50 4B 03 04 | PK.. | OpenDocument Text |
| ODS | .ods | 50 4B 03 04 | PK.. | OpenDocument Spreadsheet |
| EPUB | .epub | 50 4B 03 04 | PK.. | Electronic Publication |
| XML | .xml | 3C 3F 78 6D 6C 20 | <?xml | Extensible Markup Language |
| HTML | .html, .htm | 3C 21 44 4F 43 54 59 50 45 | <!DOCTYPE | HTML Document |

⚠️ Note: DOCX, XLSX, PPTX, ODT, ODS, and EPUB all share the same magic bytes (50 4B 03 04) because they are ZIP-based archives. To differentiate them, you need to examine the archive contents (e.g., [Content_Types].xml for Office formats).
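
A rough sketch of that content inspection using Python's standard `zipfile` module follows. It assumes the conventional main part names for Office Open XML (`word/document.xml`, `xl/workbook.xml`, `ppt/presentation.xml`) and the `mimetype` entry used by OpenDocument and EPUB; the function name and label strings are made up for illustration.

```python
import io
import zipfile

# Assumed conventional main parts for Office Open XML packages.
OOXML_PARTS = {
    "word/document.xml": "DOCX (Word)",
    "xl/workbook.xml": "XLSX (Excel)",
    "ppt/presentation.xml": "PPTX (PowerPoint)",
}

# OpenDocument and EPUB store their media type in a 'mimetype' entry.
MIMETYPE_LABELS = {
    "application/vnd.oasis.opendocument.text": "ODT (OpenDocument Text)",
    "application/vnd.oasis.opendocument.spreadsheet": "ODS (OpenDocument Spreadsheet)",
    "application/epub+zip": "EPUB",
}


def classify_zip_container(source) -> str:
    """Classify a ZIP-based file; source is a path or a seekable binary stream."""
    try:
        with zipfile.ZipFile(source) as zf:
            names = set(zf.namelist())
            if "mimetype" in names:
                mimetype = zf.read("mimetype").decode("ascii", errors="replace").strip()
                return MIMETYPE_LABELS.get(mimetype, f"ZIP container ({mimetype})")
            if "[Content_Types].xml" in names:
                for part, label in OOXML_PARTS.items():
                    if part in names:
                        return label
                return "Office Open XML (unrecognised main part)"
    except zipfile.BadZipFile:
        return "Not a valid ZIP archive"
    return "Plain ZIP archive"


print(classify_zip_container(io.BytesIO(b"not a zip")))  # Not a valid ZIP archive
```

Because this opens the archive, it should only run after the cheap `50 4B 03 04` check has passed, and untrusted input may warrant size limits that this sketch omits.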


📦 Archives

| Format | Extension(s) | Hex Signature | ASCII/String | Notes |
| --- | --- | --- | --- | --- |
| ZIP | .zip | 50 4B 03 04 | PK.. | Standard ZIP archive |
| ZIP (empty) | .zip | 50 4B 05 06 | PK.. | Empty ZIP archive |
| ZIP (spanned) | .zip | 50 4B 07 08 | PK.. | Spanned ZIP archive |
| RAR v1.5+ | .rar | 52 61 72 21 1A 07 00 | Rar!... | RAR archive v1.5-4.x |
| RAR v5.0+ | .rar | 52 61 72 21 1A 07 01 00 | Rar!.... | RAR archive v5.0+ |
| 7-Zip | .7z | 37 7A BC AF 27 1C | 7z¼¯'. | 7-Zip archive |
| GZIP | .gz, .tar.gz | 1F 8B 08 | ... | GNU Zip |
| TAR | .tar | 75 73 74 61 72 (offset 257) | ustar | Tape Archive |
| BZIP2 | .bz2 | 42 5A 68 | BZh | BZIP2 compressed |
| XZ | .xz | FD 37 7A 58 5A 00 | ý7zXZ. | XZ compressed |
| ZSTD | .zst | 28 B5 2F FD | (µ/ý | Zstandard compressed |
| LZ4 | .lz4 | 04 22 4D 18 | ."M. | LZ4 compressed |
| CAB | .cab | 4D 53 43 46 | MSCF | Microsoft Cabinet |
| ISO | .iso | 43 44 30 30 31 (offset 32769) | CD001 | ISO 9660 image |
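
TAR and ISO differ from the other entries in that their signature sits at a fixed offset rather than at byte 0, so reading only the first few bytes will miss them. A minimal sketch of an offset-aware check is given below; `OFFSET_SIGNATURES` and `identify_by_offset` are illustrative names.

```python
import io
from typing import BinaryIO, Optional

# (offset, signature, label) triples taken from the table above.
OFFSET_SIGNATURES = [
    (257, b"ustar", "TAR archive"),
    (32769, b"CD001", "ISO 9660 image"),
]


def identify_by_offset(stream: BinaryIO) -> Optional[str]:
    """Seek to each known offset in a seekable binary stream and compare bytes."""
    for offset, signature, label in OFFSET_SIGNATURES:
        stream.seek(offset)
        if stream.read(len(signature)) == signature:
            return label
    return None


# Synthetic stand-in for a tar header block: 257 padding bytes, then 'ustar'.
fake_tar = io.BytesIO(b"\x00" * 257 + b"ustar")
print(identify_by_offset(fake_tar))  # TAR archive
```

For files on disk, the same function can be called with the object returned by `open(path, "rb")`.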

⚙️ Executables

| Format | Extension(s) | Hex Signature | ASCII/String | Notes |
| --- | --- | --- | --- | --- |
| EXE/DLL (MZ) | .exe, .dll, .sys | 4D 5A | MZ | DOS/Windows Executable |
| ELF | (none), .so, .o | 7F 45 4C 46 | .ELF | Linux/Unix Executable |
| Mach-O (32-bit) | (none), .dylib | FE ED FA CE | þíúÎ | macOS Executable (32-bit), big-endian byte order; little-endian files begin CE FA ED FE |
| Mach-O (64-bit) | (none), .dylib | FE ED FA CF | þíúÏ | macOS Executable (64-bit), big-endian byte order; little-endian files begin CF FA ED FE |
| Mach-O (Universal) | (none), .dylib | CA FE BA BE | Êþº¾ | macOS Universal Binary |
| Java Class | .class | CA FE BA BE | Êþº¾ | Java bytecode |
| DEX | .dex | 64 65 78 0A 30 33 35 00 | dex.035. | Android Dalvik Executable |
| WebAssembly | .wasm | 00 61 73 6D | .asm | WebAssembly binary |
| COM | .com | (none) | (none) | No standard signature |
| Python Bytecode | .pyc | Varies by version | (none) | Version-dependent magic |
| Shell Script | .sh | 23 21 | #! | Shebang (e.g., #!/bin/bash) |
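
Because the `.pyc` magic changes between Python versions, a fixed table entry is not practical. The sketch below uses the standard library's `importlib.util.MAGIC_NUMBER` to check a header against the running interpreter, plus a weak version-independent heuristic; both helper names are hypothetical, and the heuristic assumes that the magic is a two-byte version number followed by CR LF.

```python
import importlib.util


def pyc_matches_running_interpreter(header: bytes) -> bool:
    """True if the first 4 bytes equal this interpreter's bytecode magic number."""
    return header[:4] == importlib.util.MAGIC_NUMBER


def looks_like_pyc(header: bytes) -> bool:
    """Weak heuristic (assumption): bytes 2-3 of a .pyc magic are CR LF."""
    return len(header) >= 4 and header[2:4] == b"\r\n"


# Print the magic of whichever interpreter runs this snippet.
print(importlib.util.MAGIC_NUMBER.hex(" "))
```

The heuristic alone will produce false positives on arbitrary data, so it is best combined with the `.pyc` extension or other context.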

💡 Tip: Java Class files and Mach-O Universal binaries share the same magic bytes (CA FE BA BE). Context (file extension, platform) is needed to differentiate them.
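
When no extension or platform context is available, the four bytes after `CA FE BA BE` offer a common heuristic. In a Java class file they hold the minor and major version (the major version is 45 or higher), while in a universal binary they hold a big-endian count of architectures, which is normally a small number. The sketch below applies that idea; the threshold and the function name are assumptions, not an authoritative rule.

```python
import struct


def classify_cafebabe(header: bytes) -> str:
    """Heuristically tell a Java class file from a Mach-O universal binary."""
    if len(header) < 8 or header[:4] != b"\xCA\xFE\xBA\xBE":
        return "Not a CA FE BA BE file"
    # Java layout: u2 minor_version, u2 major_version (major >= 45).
    _minor, major = struct.unpack(">HH", header[4:8])
    if major >= 45:
        return f"Java class (major version {major})"
    # Universal binary layout: big-endian u4 architecture count.
    (nfat_arch,) = struct.unpack(">I", header[4:8])
    return f"Mach-O universal binary ({nfat_arch} architectures)"


print(classify_cafebabe(bytes.fromhex("CAFEBABE00000034")))
# Java class (major version 52)
print(classify_cafebabe(bytes.fromhex("CAFEBABE00000002")))
# Mach-O universal binary (2 architectures)
```

A universal binary with 45 or more architecture entries would be misclassified, which is why this remains a heuristic.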


🐍 Python Usage Example

Here's a practical Python script to detect file types using magic bytes:

#!/usr/bin/env python3
"""
Magic Bytes File Type Detector
Identify file types by their binary signatures.
"""

# Dictionary of file signatures (magic bytes)
MAGIC_SIGNATURES = {
    # Images
    b'\xFF\xD8\xFF': 'JPEG Image',
    b'\x89PNG\r\n\x1a\n': 'PNG Image',
    b'GIF87a': 'GIF Image (87a)',
    b'GIF89a': 'GIF Image (89a)',
    b'BM': 'BMP Image',
    b'RIFF': 'RIFF Container (WAV/AVI/WebP)',
    
    # Documents
    b'%PDF-': 'PDF Document',
    b'PK\x03\x04': 'ZIP Archive / Office Document (DOCX/XLSX/PPTX)',
    b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1': 'Microsoft Office (DOC/XLS/PPT)',
    b'{\\rtf1': 'RTF Document',
    
    # Archives
    b'Rar!\x1a\x07\x00': 'RAR Archive (v1.5-4.x)',
    b'Rar!\x1a\x07\x01\x00': 'RAR Archive (v5.0+)',
    b'7z\xBC\xAF\x27\x1C': '7-Zip Archive',
    b'\x1F\x8B\x08': 'GZIP Archive',
    b'BZh': 'BZIP2 Archive',
    
    # Executables
    b'MZ': 'Windows Executable (EXE/DLL)',
    b'\x7FELF': 'ELF Executable (Linux)',
    b'\xFE\xED\xFA\xCE': 'Mach-O Executable (32-bit)',
    b'\xFE\xED\xFA\xCF': 'Mach-O Executable (64-bit)',
    b'\xCA\xFE\xBA\xBE': 'Java Class / Mach-O Universal',
    
    # Audio/Video
    b'ID3': 'MP3 Audio (ID3 tag)',
    b'\xFF\xFB': 'MP3 Audio',
    b'\xFF\xF3': 'MP3 Audio',
    b'\xFF\xF2': 'MP3 Audio',
    b'fLaC': 'FLAC Audio',
    b'OggS': 'OGG Container',
    b'\x1A\x45\xDF\xA3': 'Matroska Video (MKV/WebM)',
}


def identify_file(filepath: str, read_bytes: int = 32) -> str:
    """
    Identify a file's type by reading its magic bytes.
    
    Args:
        filepath: Path to the file to identify
        read_bytes: Number of bytes to read from the start (default: 32)
    
    Returns:
        String describing the detected file type
    """
    try:
        with open(filepath, 'rb') as f:
            header = f.read(read_bytes)
        
        if not header:
            return "Empty file"
        
        # Check against known signatures
        for signature, file_type in MAGIC_SIGNATURES.items():
            if header.startswith(signature):
                return file_type
        
        # Special case: MP4 and MOV files ('ftyp' box type at offset 4)
        if header[4:8] == b'ftyp':
            return "MP4/MOV Video"
        
        # Special case: Check for text/script files
        if header.startswith(b'#!'):
            return "Shell Script"
        if header.startswith(b'<?xml'):
            return "XML Document"
        if header[:9].upper() == b'<!DOCTYPE' or header[:5].lower() == b'<html':
            return "HTML Document"
        
        return "Unknown file type"
        
    except FileNotFoundError:
        return f"Error: File '{filepath}' not found"
    except PermissionError:
        return f"Error: Permission denied for '{filepath}'"
    except Exception as e:
        return f"Error: {str(e)}"


def print_hex_dump(filepath: str, num_bytes: int = 16) -> None:
    """
    Print a hex dump of the first N bytes of a file.
    
    Args:
        filepath: Path to the file
        num_bytes: Number of bytes to display (default: 16)
    """
    try:
        with open(filepath, 'rb') as f:
            data = f.read(num_bytes)
        
        hex_str = ' '.join(f'{b:02X}' for b in data)
        ascii_str = ''.join(chr(b) if 32 <= b < 127 else '.' for b in data)
        
        print(f"\n📁 File: {filepath}")
        print(f"🔢 Hex:   {hex_str}")
        print(f"📝 ASCII: {ascii_str}")
        print(f"🎯 Type:  {identify_file(filepath)}")
        
    except Exception as e:
        print(f"Error: {str(e)}")


# Example usage
if __name__ == "__main__":
    import sys
    
    if len(sys.argv) > 1:
        # Check files provided as command-line arguments
        for filepath in sys.argv[1:]:
            print_hex_dump(filepath)
            print("-" * 50)
    else:
        # Demo with a test message
        print("🔮 Magic Bytes File Type Detector")
        print("=" * 40)
        print("\nUsage: python magic_bytes.py <file1> [file2] ...")
        print("\nExample:")
        print("  python magic_bytes.py image.jpg document.pdf archive.zip")
        print("\nOr import and use in your code:")
        print('  from magic_bytes import identify_file')
        print('  file_type = identify_file("myfile.bin")')

Quick One-Liner

# Quick check for file signature
with open('file.bin', 'rb') as f: print(' '.join(f'{b:02X}' for b in f.read(16)))

Output Example

📁 File: example.jpg
🔢 Hex:   FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01
📝 ASCII: ......JFIF......
🎯 Type:  JPEG Image

🤝 Contributing

Contributions are welcome! If you'd like to add more file signatures or improve the documentation:

  1. Fork this repository
  2. Create a feature branch (git checkout -b add-new-signatures)
  3. Add your changes
  4. Submit a Pull Request

Please ensure any new signatures include:

  • Format name
  • Common file extensions
  • Verified hex signature
  • ASCII representation (if applicable)
  • Any relevant notes

⭐ Star this repo if you found it useful! ⭐

Made with ❤️ for the security & development community
