Efficient clustering of MASSCAN results

MASSCAN is an incredibly fast TCP port scanner that, with the right equipment, can scan the entire Internet in under five minutes.

MASSCAN does a wonderful job of scanning the entire Internet in random order, but it doesn't cluster its results by port, a feature I need.

TL;DR: I wrote MASSCAN-Cluster to classify the results by port efficiently.

Motivation

When scanning multiple ports, MASSCAN dumps all of its results into a single file, which it can optionally rotate by time or size. To process the results efficiently and extract banners, I need to classify each result by port.

The naive approach would be to read each result file after the scan, classify its records, and run the appropriate banner-extraction or DPI algorithm on each class.
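For comparison, the naive batch approach can be sketched like this. It is a hypothetical illustration, not code from MASSCAN-Cluster: it assumes MASSCAN's list output format (`-oL`), where each record looks like `open tcp 80 93.184.216.34 1562626235`, and the helper names and file pattern are made up.

```python
import glob
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_port(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Bucket MASSCAN list-format (-oL) records by port number."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Assumed record shape: "open tcp 443 10.0.0.1 1562626235";
        # the "#masscan" header and "# end" trailer are skipped.
        if len(fields) >= 4 and fields[0] == "open":
            buckets[int(fields[2])].append(line.rstrip("\n"))
    return dict(buckets)


def group_files_by_port(pattern: str) -> Dict[int, List[str]]:
    """Naive approach: after the scan, re-read every result file from disk."""
    merged: Dict[int, List[str]] = defaultdict(list)
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for port, records in group_by_port(f).items():
                merged[port].extend(records)
    return dict(merged)
```

For example, `group_files_by_port("results-*.txt")` returns a dict mapping each port to its records. Every record is first written to disk by MASSCAN and then read back into memory before any banner extraction can start, which is the extra I/O the on-the-fly approach avoids.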

My approach is to classify the results on the fly, avoiding unnecessary I/O and wasted CPU cycles.

How does it work?

  1. Configures MASSCAN to write all of its output to a single output file.
  2. Creates a named pipe to serve as MASSCAN's output file.
  3. Runs MASSCAN alongside a script that reads from the named pipe, classifies the results by port, and rotates them.
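The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MASSCAN-Cluster's actual code: it assumes MASSCAN's list output format (`-oL`), and the FIFO path, output directory, rotation interval, target range, ports and rate are hypothetical placeholders. `PortRotator` is a made-up helper that appends each `open` record to a per-port file and starts a new generation of files every `interval` seconds.

```python
import os
import subprocess
import time
from typing import Dict, TextIO

# Hypothetical settings; adjust for a real deployment.
FIFO_PATH = "/tmp/masscan.fifo"
OUT_DIR = "clustered"
ROTATE_SECONDS = 300.0


class PortRotator:
    """Appends each 'open' record to a per-port file, rotating every `interval` seconds."""

    def __init__(self, out_dir: str, interval: float) -> None:
        self.out_dir = out_dir
        self.interval = interval
        self.files: Dict[int, TextIO] = {}
        self.generation = 0
        self.started = time.monotonic()
        os.makedirs(out_dir, exist_ok=True)

    def _rotate_if_due(self) -> None:
        # Rotation is checked lazily, i.e. only when a new record arrives.
        if time.monotonic() - self.started >= self.interval:
            self.close()
            self.generation += 1
            self.started = time.monotonic()

    def write(self, line: str) -> None:
        fields = line.split()
        # Assumed -oL record shape: "open tcp 443 10.0.0.1 1562626235".
        # The "#masscan" header and "# end" trailer are skipped.
        if len(fields) < 4 or fields[0] != "open":
            return
        self._rotate_if_due()
        port = int(fields[2])
        out = self.files.get(port)
        if out is None:
            name = f"port-{port}.{self.generation}.txt"
            out = open(os.path.join(self.out_dir, name), "a", encoding="utf-8")
            self.files[port] = out
        out.write(line.rstrip("\n") + "\n")

    def close(self) -> None:
        for out in self.files.values():
            out.close()
        self.files.clear()


def main() -> None:
    # Step 2: a named pipe stands in for MASSCAN's regular output file.
    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)
    # Steps 1 and 3: run MASSCAN with all list-format output going to the pipe.
    scanner = subprocess.Popen(
        ["masscan", "10.0.0.0/8", "-p80,443", "--rate", "10000", "-oL", FIFO_PATH]
    )
    rotator = PortRotator(OUT_DIR, ROTATE_SECONDS)
    try:
        # open() blocks until MASSCAN opens the pipe; iteration ends at EOF,
        # i.e. when MASSCAN closes its output.
        with open(FIFO_PATH, encoding="utf-8") as pipe:
            for line in pipe:
                rotator.write(line)
    finally:
        rotator.close()
        scanner.wait()
```

Calling `main()` (typically as root, since MASSCAN sends raw packets) creates the pipe, starts the scan and clusters results as they arrive, so nothing is re-read from disk. Because opening a FIFO for reading blocks until a writer connects, this sketch hangs if MASSCAN exits before opening its output; a more robust version would add a timeout.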