What is Shannon's Entropy?

Shannon's entropy measures how disordered, or unpredictable, the bits of a piece of data are. This is useful because modern encryption standards have a property called **diffusion**: the algorithm scrambles the data it is encrypting so thoroughly that statistical, analysis-based attacks become impractical. The side effect is that the output looks almost perfectly random, which makes it obvious that the data is encrypted.

*NOTE: This also goes for compression algorithms, because compression strips out redundancy and leaves output that looks nearly random.*

How does it work then?

The function that defines it looks like this:

H(x) = -∑ p(x_i) * log_2(p(x_i))

Here `H(x)` is the overall entropy of the given input, and `p(x_i)` is the fraction of the total input `x` made up of the unique character `x_i`. `log_2()` is used because information is measured in bits, so the result is in bits per character (at most 8 when every character is one byte).
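As a quick sanity check of the formula, here is a minimal sketch of it as a standalone C++ function. The name `shannonEntropy` is an assumption made for this illustration, and the sketch treats every `char` of a `std::string` as one symbol.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical helper: Shannon entropy of a byte string, in bits per byte.
double shannonEntropy(const std::string& data)
{
    if (data.empty())
    {
        return 0.0; // No symbols, so there is nothing to measure
    }

    // Count how often each of the 256 possible byte values occurs
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : data)
    {
        ++counts[c];
    }

    double entropy = 0.0;
    for (std::size_t count : counts)
    {
        if (count == 0)
        {
            continue; // Absent symbols contribute nothing (log2(0) is undefined)
        }
        double p = static_cast<double>(count) / data.size();
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

With this sketch, `shannonEntropy("aaaa")` is 0 (one symbol, no uncertainty), `shannonEntropy("aabb")` is 1 bit, and `shannonEntropy("abcd")` is 2 bits. Data that uses all 256 byte values equally often reaches the maximum of 8 bits per byte.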

#include <iostream>      // For input-output operations
#include <fstream>       // For file handling
#include <cmath>         // For log2
#include <string>        // For string manipulations

int main(int argc, char* argv[])
{
    // Check if exactly one argument (file path) is provided
    if(argc != 2)
    {
        std::cout << "[-] Usage: " << argv[0] << " {filepath}\n"; // Print usage message
        return 1; // Exit with error code
    }

    // Array to store frequency distribution of bytes and their values
    int distribution[256][2] = {0}; 
    double shannons = 0.00; // Variable to store Shannon entropy

    // Initialize the first column of the distribution array with byte values (0 to 255)
    for(int i = 0; i < 256; ++i)
    {
        distribution[i][0] = i;
    }

    // Open the specified file in binary mode
    std::ifstream file(argv[1], std::ios::binary);
    
    // Check if the file could not be opened
    if(!file.is_open())
    {
        std::cout << "[-] error opening file!\n"; // Print error message
        return 1; // Exit with error code
    }

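    // Move the file pointer to the end to determine the file size
    file.seekg(0, std::ios::end);
    long length = file.tellg(); // Get the total file size
    file.seekg(0, std::ios::beg); // Reset the file pointer to the beginning

    // An empty file has no bytes to measure and would cause a division by zero below
    if(length <= 0)
    {
        std::cout << "[-] file is empty!\n"; // Print error message
        return 1; // Exit with error code
    }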

    char byte; // Variable to store each byte read from the file

    // Count the frequency of each byte in the file
    while(file.get(byte))
    {
        distribution[(int)(unsigned char)byte][1] += 1;
    }
    file.close(); // Close the file after reading

    // Print the total file size
    std::cout << "Length: " << length << "\n";

    // Calculate and display the entropy contribution of each byte value
    for(int (&x)[2] : distribution) // Loop through the distribution array
    {
        // Byte values that never occur contribute nothing to the entropy;
        // handle them first, because log2(0) is undefined and would print NaN
        if(x[1] == 0)
        {
            std::cout << x[0] << ": 0 --> 0\n";
            continue;
        }

        // Probability of this byte value and its entropy contribution
        double p = x[1] / (double)length;
        double contribution = -p * log2(p);

        // Print the byte value, its frequency, and the entropy contribution
        std::cout << x[0] << ": " << x[1] << " --> " << contribution << "\n";

        // Accumulate the entropy for the byte
        shannons += contribution;
    }

    // Print the total Shannon entropy of the file
    std::cout << "Entropy: " << shannons << "\n";

    // Express the entropy as a percentage of the maximum and print it as a rough
    // packing heuristic (compressed or encrypted data scores high here too)
    std::cout << "Likeliness of packing is: " << (shannons / 8.00) * 100 << "%\n"; 
    // 8.00 is used because 8 bits per byte is the maximum entropy value

    // Return 0 to indicate successful execution
    return 0;
}

The program first counts how often each byte value occurs in the file. Then, for every byte value, it divides that count by the total file size to get a probability, multiplies that probability by its own log base 2, and subtracts the result from a variable called `shannons`.


Check out more amazing work by LazyLearner on GitHub:
LazyLearner’s GitHub

But does it even work??

This is the output of the `hhh.exe` payload that I previously analyzed.

This is the output of the `hhh.exe` payload under zip compression

Let's Play a Game!!

It's called: can you spot which file is encrypted with AES-256-CTR?

This is the output of `file1` shown above.

This is the output of `file2` shown above.

If you guessed `file2`, you guessed correctly!
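The reason an encrypted file stands out is that good ciphertext is meant to look like uniformly random bytes, and uniformly random bytes sit right at the 8-bit ceiling. The sketch below illustrates this under a loud assumption: it does not run AES at all, it uses `std::mt19937` as a stand-in source of random-looking bytes. The names `shannonEntropy` and `randomBytes` are hypothetical helpers for this illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: Shannon entropy of a byte string, in bits per byte.
double shannonEntropy(const std::string& data)
{
    if (data.empty())
    {
        return 0.0;
    }
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : data)
    {
        ++counts[c];
    }
    double entropy = 0.0;
    for (std::size_t count : counts)
    {
        if (count == 0)
        {
            continue; // Absent byte values contribute nothing
        }
        double p = static_cast<double>(count) / data.size();
        entropy -= p * std::log2(p);
    }
    return entropy;
}

// Hypothetical stand-in for ciphertext: `size` pseudo-random bytes.
// std::mt19937 is NOT a cipher and NOT cryptographically secure.
std::string randomBytes(std::size_t size, unsigned seed)
{
    std::mt19937 generator(seed);
    std::uniform_int_distribution<int> byteValue(0, 255);

    std::string bytes(size, '\0');
    for (char& b : bytes)
    {
        b = static_cast<char>(static_cast<unsigned char>(byteValue(generator)));
    }
    return bytes;
}
```

Calling `shannonEntropy(randomBytes(1 << 20, 42))` on about 1 MiB of such bytes gives a value just under 8 bits per byte, while a buffer holding one repeated byte, such as `std::string(1 << 20, 'A')`, scores 0.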

In conclusion, Shannon's entropy can be used to detect whether data is encrypted or compressed. However, it is not foolproof: a basic XOR cipher, Caesar cipher, or Vigenère cipher barely raises the entropy, if at all.

This is the output when the char 'c' was added to every byte in `hhh.exe` (literally no difference).
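This result is expected: adding a fixed value to every byte (modulo 256) only relabels the byte values. The frequency counts stay the same, they just move to different slots, so the entropy is unchanged. Here is a minimal sketch of that idea; `shannonEntropy` and `addConstant` are hypothetical helper names used only for this illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical helper: Shannon entropy of a byte string, in bits per byte.
double shannonEntropy(const std::string& data)
{
    if (data.empty())
    {
        return 0.0;
    }
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : data)
    {
        ++counts[c];
    }
    double entropy = 0.0;
    for (std::size_t count : counts)
    {
        if (count == 0)
        {
            continue; // Absent byte values contribute nothing
        }
        double p = static_cast<double>(count) / data.size();
        entropy -= p * std::log2(p);
    }
    return entropy;
}

// Hypothetical helper: add `key` to every byte, wrapping around at 256.
std::string addConstant(std::string data, unsigned char key)
{
    for (char& b : data)
    {
        unsigned char shifted =
            static_cast<unsigned char>(static_cast<unsigned char>(b) + key);
        b = static_cast<char>(shifted);
    }
    return data;
}
```

For any input `s`, `shannonEntropy(addConstant(s, 'c'))` matches `shannonEntropy(s)` up to floating-point rounding, even though the bytes themselves are all different.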

This is the output when `hhh.exe` was encrypted using a Vigenère cipher with a 5-byte key.

(This gave a total increase of ~8%, which is not a lot of diffusion compared to the 60% from AES.)
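The small gain makes sense: with a repeating key, each output byte depends only on the input byte and its position modulo the key length, so a key of length k can add at most log_2(k) bits per byte (about 2.32 bits for k = 5). The sketch below shows the extreme case, assuming an input that is a single repeated byte and the additive, byte-wise Vigenère variant; `shannonEntropy` and `vigenere` are hypothetical helper names.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical helper: Shannon entropy of a byte string, in bits per byte.
double shannonEntropy(const std::string& data)
{
    if (data.empty())
    {
        return 0.0;
    }
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : data)
    {
        ++counts[c];
    }
    double entropy = 0.0;
    for (std::size_t count : counts)
    {
        if (count == 0)
        {
            continue; // Absent byte values contribute nothing
        }
        double p = static_cast<double>(count) / data.size();
        entropy -= p * std::log2(p);
    }
    return entropy;
}

// Hypothetical helper: additive Vigenère over bytes with a repeating key.
std::string vigenere(std::string data, const std::string& key)
{
    if (key.empty())
    {
        return data; // Nothing to add, and avoids a modulo by zero
    }
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        unsigned char sum = static_cast<unsigned char>(
            static_cast<unsigned char>(data[i]) +
            static_cast<unsigned char>(key[i % key.size()]));
        data[i] = static_cast<char>(sum);
    }
    return data;
}
```

Encrypting 1000 zero bytes with a 5-byte key of distinct characters, for example `vigenere(std::string(1000, '\0'), "hel1o")`, takes the entropy from 0 to log_2(5) bits per byte and no further, which is still far from the 8 bits that random-looking ciphertext reaches.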

The code for the Vigenère cipher encryption is given below (very sloppily made, but it works):

#include <iostream>      // For input-output operations
#include <fstream>       // For file handling
#include <sstream>       // For reading entire file content into a stringstream
#include <string>        // For string manipulations

int main(int argc, char* argv[])
{
    // Simple "encryption key" (array of characters used to modify the file content)
    char code[] = {'h', 'e', 'l', '1', 'o'};
    
    // Check if the correct number of arguments is provided
    if(argc != 2)
    {
        std::cout << "[-] Usage: " << argv[0] << " {filepath}\n"; // Print usage message
        return 1; // Exit with error code
    }
    
    // Open the specified file in binary mode for reading
    std::ifstream file(argv[1], std::ios::binary);
    if(!file.is_open())
    {
        std::cout << "[-] error opening file!\n"; // Print error message
        return 1; // Exit with error code
    }
    
    // Read the entire file content into a string using a stringstream
    std::stringstream buffer;
    buffer << file.rdbuf(); // Read the file into the buffer
    std::string fileContent = buffer.str(); // Store the file content as a string
    file.close(); // Close the input file
    
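    // Encrypt the file content using the "code" array
    for(std::size_t i = 0; i < fileContent.length(); ++i)
    {
        // Modify each byte by adding the corresponding byte from the "code" array (modulo 256)
        // Use the modulo operator to cycle through the "code" array for long files
        unsigned char sum = (unsigned char)((unsigned char)fileContent[i] + (unsigned char)code[i % sizeof(code)]);
        fileContent[i] = (char)sum;
    }
    
    // Open a new file named "let" in binary mode for writing
    std::ofstream outFile("let", std::ios::binary);
    if(!outFile.is_open())
    {
        std::cout << "[-] error opening output file!\n"; // Print error message
        return 1; // Exit with error code
    }
    outFile << fileContent; // Write the encrypted content to the file
    outFile.close(); // Close the output file

    // Return 0 to indicate successful execution
    return 0;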
}

This blog was written by LazyLearner.
Special thanks for the detailed code examples, clear explanations, and engaging content!


OCSALY