This document explains the concepts behind the port scanner: what a port is, how TCP connections are established, what banner grabbing reveals, and how threading changes the performance profile of network I/O. It is written for anyone who wants to understand what the code is actually doing.
Only scan hosts you own or have explicit written permission to scan. Unauthorized port scanning may be illegal in your jurisdiction.
The OSI (Open Systems Interconnection) model is a seven-layer framework that describes how data travels from one machine to another. Each layer has a specific job, and they stack on top of each other. Port scanning operates at layers 3 and 4 — it is concerned with which host and which service on that host.
| Layer | Name | Role |
|---|---|---|
| Layer 7 | Application | HTTP, SSH, FTP — what apps speak |
| Layer 6 | Presentation | Encoding, encryption (TLS) |
| Layer 5 | Session | Managing connection state |
| Layer 4 | Transport ✦ | TCP and UDP — ports live here |
| Layer 3 | Network ✦ | IP addresses, routing between hosts |
| Layer 2 | Data Link | MAC addresses, switches, frames |
| Layer 1 | Physical | Cables, radio, light pulses |
The scanner uses Python's socket library, a thin wrapper around the operating
system's sockets API, which is the interface between an application and the
transport layer (layer 4). It asks the OS to attempt a TCP connection to a given IP
address and port number. The OS handles everything below — routing, ARP,
physical transmission. The scanner only cares about whether the connection
succeeded or not.
TCP (Transmission Control Protocol, defined in RFC 793) is a connection-oriented protocol. Before any data flows, both sides must agree to talk. This agreement is called the three-way handshake.
The TCP three-way handshake. A port is "open" if the target replies with SYN-ACK.
When the scanner calls sock.connect_ex((host, port)), the OS sends
a SYN packet. If the target port has a service listening, it replies with SYN-ACK.
The OS completes the handshake by sending ACK, and connect_ex
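returns 0, meaning open. Unlike connect, connect_ex reports failure by
returning an error number instead of raising an exception. The entire handshake
is invisible to the Python code; the operating system's TCP stack performs it.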
A port can respond in three distinct ways, and each tells us something different about the target.
| State | What happens on the wire | What it means |
|---|---|---|
| OPEN | Target replies SYN-ACK. connect_ex returns 0. | A service is actively listening on this port. |
| CLOSED | Target replies RST (reset). The connection is refused immediately. | Nothing is listening, but the host is reachable. |
| FILTERED | No reply. The socket times out after the configured number of seconds. | Usually a firewall silently dropping packets (an unreachable host can look the same); we cannot tell if anything is listening. |
The difference between closed and filtered is significant. A closed port actively refuses — the host is there, it just has nothing on that port. A filtered port gives no signal at all. This silence is the firewall speaking: it drops the packet without sending any reply, which is why filtered ports take much longer to scan (we have to wait for the full timeout).
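The three states map directly onto what connect_ex returns. Below is a minimal sketch that assumes a POSIX system such as Linux or macOS, where a refused connection comes back as errno.ECONNREFUSED. The names probe_port and classify are hypothetical and not taken from the scanner's code. In CPython, connect_ex reports a timeout as a nonzero error code rather than raising. For that reason, the sketch treats anything that is neither 0 nor ECONNREFUSED as filtered.

```python
import errno
import socket


def classify(code: int) -> str:
    """Map a connect_ex() return code to one of the three port states."""
    if code == 0:
        return "open"      # SYN-ACK received and the handshake completed
    if code == errno.ECONNREFUSED:
        return "closed"    # RST received: host reachable, nothing listening
    # Timeout (no reply at all) or an ICMP error such as "host unreachable":
    # either way we cannot tell whether anything is listening.
    return "filtered"


def probe_port(host: str, port: int, timeout: float = 1.0) -> str:
    """Run a TCP connect scan against one port and classify the result."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)  # upper bound on how long a filtered port can stall us
        code = sock.connect_ex((host, port))  # error number instead of an exception
    return classify(code)
```

A call such as probe_port("192.168.1.1", 22) returns "open", "closed" or "filtered". A filtered port blocks for the full timeout before the call returns, which is the cost described above.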
There are two main approaches to port scanning. This scanner uses a TCP connect scan. Here is how they compare.
Left: TCP connect scan completes the handshake. Right: SYN scan tears down early with RST, avoiding the log entry.
A TCP connect scan completes the full three-way handshake, so the target service sees a real connection and may log it. This scanner uses connect scans because they require no special privileges: any user can open a socket.
A SYN scan (also called a half-open scan) sends a SYN,
receives the SYN-ACK, and then immediately sends a RST to tear down the
connection before it completes. Because the handshake never finishes, many
applications and older firewalls do not log it. The tradeoff: crafting the raw
packets requires a raw socket, and opening one requires root/administrator
privileges, so Python's socket library cannot do it without elevation.
Once we know a port is open, we want to know what is listening. Many services announce themselves immediately after a connection is established — this announcement is called a banner.
SSH announces itself immediately. HTTP requires a probe request before responding.
Services like SSH, FTP, and SMTP send a greeting as soon as a connection is
established. SSH will immediately say SSH-2.0-OpenSSH_8.9 — which
tells us the protocol version and server software in one line.
HTTP is a request-response protocol: it waits for a request before responding.
The scanner sends a minimal HEAD / HTTP/1.0 probe to elicit a
response. The reply — HTTP/1.1 200 OK along with server headers —
reveals the web server and sometimes its version.
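Both cases fit one routine: connect, send a probe only if the protocol needs one, and read the first chunk of the reply. This is a minimal sketch. The names grab_banner and HTTP_HEAD_PROBE are hypothetical. The 1024-byte read matches the limit the scanner uses for banners.

```python
import socket

# Minimal HTTP/1.0 request; the blank line (CRLF CRLF) ends the request headers.
HTTP_HEAD_PROBE = b"HEAD / HTTP/1.0\r\n\r\n"


def grab_banner(host: str, port: int, probe: bytes = b"",
                timeout: float = 2.0, max_bytes: int = 1024) -> str:
    """Connect, optionally send a probe, and return the service's first reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if probe:
            sock.sendall(probe)  # HTTP needs a request; SSH/FTP/SMTP speak first
        try:
            data = sock.recv(max_bytes)  # first chunk only
        except socket.timeout:
            return ""  # the service stayed silent within the timeout
    # Banners are usually ASCII; replace undecodable bytes instead of failing.
    return data.decode("utf-8", errors="replace").strip()
```

For SSH, call grab_banner(host, 22) with no probe. For a web server, call grab_banner(host, 80, probe=HTTP_HEAD_PROBE).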
Service identification works in two stages. First, we search the banner for known
signature strings. If the banner contains SSH, the service is SSH.
If no signature matches, we fall back to the IANA port number registry — port
3306 is almost certainly MySQL, port 5432 is almost certainly PostgreSQL. This
is not foolproof (any service can listen on any port) but it is accurate for
standard deployments.
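This two-stage lookup can be written as a pure function. The sketch below uses a deliberately tiny, hypothetical signature list. The port table holds a handful of real IANA assignments, but neither table is the scanner's actual data.

```python
# Hypothetical signature table: (substring to look for, service name).
# Checked in order, so put the most specific signatures first.
BANNER_SIGNATURES = [
    ("SSH-", "SSH"),     # e.g. "SSH-2.0-OpenSSH_8.9"
    ("HTTP/", "HTTP"),   # e.g. "HTTP/1.1 200 OK"
    ("SMTP", "SMTP"),    # e.g. "220 mail.example.com ESMTP Postfix"
    ("FTP", "FTP"),      # e.g. "220 ProFTPD Server ready"
]

# Fallback: a few well-known IANA port assignments.
WELL_KNOWN_PORTS = {
    21: "FTP", 22: "SSH", 25: "SMTP", 80: "HTTP",
    443: "HTTPS", 3306: "MySQL", 5432: "PostgreSQL",
}


def identify_service(banner: str, port: int) -> str:
    """Stage 1: match the banner against signatures. Stage 2: fall back to the port."""
    text = banner.upper()
    for needle, service in BANNER_SIGNATURES:
        if needle in text:
            return service
    return WELL_KNOWN_PORTS.get(port, "unknown")
```

In this sketch, a banner always wins over the port number. As a result, SSH running on a non-standard port such as 2222 is still reported as SSH.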
Scanning 1,024 ports sequentially, one at a time, can take many minutes. Most of that time is spent waiting: for the packet to travel, for the target to reply, or for a timeout to expire. Every filtered port costs a full timeout. With a 1-second timeout, 1,024 filtered ports alone would take about 17 minutes. The CPU is idle for nearly all of this.
Threads let us wait for many ports simultaneously. The bottleneck is network latency, not CPU.
Threading works here because port scanning is I/O-bound, not
CPU-bound. In CPython, the Global Interpreter Lock (GIL) prevents threads from
executing Python bytecode in parallel, but the interpreter releases the GIL during
blocking I/O calls like connect_ex. This means 100 threads can each be waiting for
a network response at the same time, even in Python.
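The effect is easy to measure without touching the network. This sketch substitutes time.sleep for connect_ex. The substitution is an assumption for illustration, chosen because sleep also releases the GIL while it blocks. The helper names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_probe(port: int) -> int:
    """Stand-in for a blocking network call such as connect_ex().

    time.sleep() releases the GIL while it waits, as blocking socket calls do.
    """
    time.sleep(0.1)
    return port


def run_sequential(ports):
    return [fake_probe(p) for p in ports]


def run_threaded(ports, workers=20):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_probe, ports))  # results keep input order


def elapsed(fn, *args) -> float:
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    ports = range(20)
    print(f"sequential: {elapsed(run_sequential, ports):.2f}s")
    print(f"threaded:   {elapsed(run_threaded, ports):.2f}s")
```

The sequential run takes at least 20 × 0.1 s = 2 s. The threaded run should finish in roughly the time of a single sleep, because all twenty waits overlap.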
The scanner controls concurrency with a threading.Semaphore.
This ensures no more than N threads are active simultaneously,
preventing the OS from running out of file descriptors (each open socket
consumes one) and preventing the target from seeing an unrealistic connection
storm.
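Here is one way to apply the semaphore, sketched under assumptions. The main loop acquires a slot before starting each thread, and the worker releases that slot when it finishes. At most max_threads threads, and therefore sockets, exist at once. scan_ports and its parameters are hypothetical, not the scanner's actual interface.

```python
import socket
import threading


def scan_ports(host: str, ports, max_threads: int = 100, timeout: float = 1.0) -> list:
    """Return the sorted list of open ports, with at most max_threads probes in flight."""
    slots = threading.Semaphore(max_threads)
    open_ports = []
    lock = threading.Lock()

    def worker(port: int) -> None:
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                if sock.connect_ex((host, port)) == 0:
                    with lock:  # several workers may finish at the same moment
                        open_ports.append(port)
        finally:
            slots.release()  # always free the slot, even if the probe raised

    threads = []
    for port in ports:
        slots.acquire()  # blocks while max_threads probes are already running
        t = threading.Thread(target=worker, args=(port,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return sorted(open_ports)
```

Acquiring in the launching loop, rather than inside the worker, bounds the number of live threads as well as open sockets. A 65,535-port scan therefore never creates tens of thousands of idle threads.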
Intrusion Detection Systems (IDS) look for port scan signatures: a single source IP attempting many connections in a short window. A rate limit — a small delay between launching each thread — stretches the scan over a longer time window, reducing the per-second connection rate and making the scan look less anomalous.
python -m scanner --target 192.168.1.1 --ports 1-65535 \
    --threads 50 --rate-limit 0.05
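A 0.05-second delay between launches caps the scan at 1 / 0.05 = 20 connection attempts per second instead of thousands. At that rate, a full 1-65535 sweep takes at least 65,535 / 20 ≈ 3,277 seconds, or about 55 minutes. It is not invisibility: a determined analyst reviewing logs will still see the pattern, but it helps avoid triggering automated threshold alerts.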
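The delay itself is a single sleep between thread launches. This is a minimal sketch, and launch_rate_limited is a hypothetical helper. It only spaces out the starts of probes; limiting how many probes are in flight at once is the semaphore's job.

```python
import threading
import time


def launch_rate_limited(task, items, rate_limit: float) -> None:
    """Start one thread per item, pausing rate_limit seconds between launches.

    Launches can never exceed 1 / rate_limit per second (0.05 s -> 20 per second).
    """
    threads = []
    for i, item in enumerate(items):
        if i:  # no need to wait before the very first launch
            time.sleep(rate_limit)
        t = threading.Thread(target=task, args=(item,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```

For example, launch_rate_limited(probe, range(1, 22), 0.05) starts 21 probes with 20 gaps between them, so launching alone takes at least one second.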
The scanner is structured as a Python package with four modules, each with a single responsibility.
Dependency flow. The CLI orchestrates the three modules; only core.py touches the network directly.
This separation means each module can be imported and tested independently.
core.py has no knowledge of output formats.
reporter.py has no knowledge of sockets. banner.py
only receives result dicts — it does not care how the scan was run.
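The payoff is that a module like reporter.py can be exercised with plain data. Below is a sketch of what a socket-free formatter might look like. format_report and the result-dict keys (port, state, service) are hypothetical, not the scanner's actual schema.

```python
def format_report(results) -> str:
    """Render scan result dicts as a fixed-width text table.

    Pure function: it only reads dicts and returns a string, so it can be
    unit-tested without sockets, threads, or a live target.
    """
    lines = [f"{'PORT':<7}{'STATE':<10}SERVICE"]
    for r in sorted(results, key=lambda r: r["port"]):
        lines.append(f"{r['port']:<7}{r['state']:<10}{r.get('service', 'unknown')}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"port": 80, "state": "open", "service": "HTTP"},
        {"port": 22, "state": "open", "service": "SSH"},
    ]
    print(format_report(sample))
```

A unit test for this function needs nothing but a list literal, which is the kind of independence the module split is meant to buy.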
This scanner performs TCP connect scans only. UDP scanning requires a different approach: UDP is connectionless, so there is no handshake to complete. A closed UDP port typically replies with an ICMP "port unreachable" message. An open port, however, often stays silent unless it receives a valid request, and a filtered port produces no reply either. Silence therefore cannot distinguish open from filtered, which makes UDP results inherently ambiguous.
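This scanner does not implement UDP scanning, but a short sketch shows where the ambiguity comes from. It assumes Linux behavior: on a connected UDP socket, an ICMP port-unreachable reply surfaces as ConnectionRefusedError on the next recv. Other platforms may report it differently or not at all. udp_probe is a hypothetical name.

```python
import socket


def udp_probe(host: str, port: int, payload: bytes = b"", timeout: float = 1.0) -> str:
    """Best-effort UDP port check (assumes Linux-style ICMP error reporting)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # "Connecting" a UDP socket sends nothing, but lets the kernel
        # deliver ICMP errors for this destination back to us.
        sock.connect((host, port))
        sock.send(payload)
        try:
            sock.recv(1024)
            return "open"             # the service answered
        except ConnectionRefusedError:
            return "closed"           # ICMP port unreachable came back
        except socket.timeout:
            return "open|filtered"    # silence: could be either
```

The third result is the crux. A service that ignores our payload and a firewall that drops it look identical from the scanner's side.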
IPv6 is not explicitly supported. The socket code would need AF_INET6
and the CLI would need validation for IPv6 address formats.
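Both changes are small if targets are literal IP addresses; hostnames would go through socket.getaddrinfo instead. The sketch below relies on that assumption. The standard ipaddress module validates both formats and tells us which address family to pass to socket.socket. address_family and tcp_connect are hypothetical helpers.

```python
import ipaddress
import socket


def address_family(target: str) -> socket.AddressFamily:
    """Return AF_INET or AF_INET6 for a literal IP; raise ValueError otherwise."""
    ip = ipaddress.ip_address(target)  # accepts "192.168.1.1" and "2001:db8::1"
    return socket.AF_INET6 if ip.version == 6 else socket.AF_INET


def tcp_connect(target: str, port: int, timeout: float = 1.0) -> int:
    """connect_ex() over IPv4 or IPv6, chosen from the address itself."""
    with socket.socket(address_family(target), socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # A (host, port) pair is accepted for AF_INET6 too; flowinfo and
        # scope_id default to 0.
        return sock.connect_ex((target, port))
```

Because ip_address raises ValueError on malformed input, the same call doubles as the CLI-side validation.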
The banner grabbing is shallow — it reads the first 1024 bytes and pattern-matches
against a small signature list. Full service version detection (like Nmap's
-sV) uses a much larger probe database and can identify thousands
of service versions.
Possible future extensions include UDP scan support, OS fingerprinting via TCP/IP stack analysis, HTML report output,
CIDR range scanning (e.g. 192.168.1.0/24), async I/O with
asyncio instead of threads, and a progress bar using
tqdm.
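To give a rough idea of the asyncio item, here is a sketch of the same connect scan on an event loop. It runs one coroutine per port, with asyncio.Semaphore playing the role threading.Semaphore plays today. scan_async and probe are hypothetical names, and none of this is part of the current scanner.

```python
import asyncio


async def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Full TCP connect via asyncio, classified as open / closed / filtered."""
    try:
        _, writer = await asyncio.wait_for(asyncio.open_connection(host, port), timeout)
    except ConnectionRefusedError:
        return "closed"  # RST received
    except (asyncio.TimeoutError, OSError):
        return "filtered"  # no reply in time, or another network error
    writer.close()
    await writer.wait_closed()
    return "open"


async def scan_async(host: str, ports, concurrency: int = 100) -> dict:
    """Probe all ports concurrently, with at most `concurrency` connections in flight."""
    limit = asyncio.Semaphore(concurrency)

    async def bounded(port: int):
        async with limit:
            return port, await probe(host, port)

    return dict(await asyncio.gather(*(bounded(p) for p in ports)))
```

Usage would look like asyncio.run(scan_async("192.168.1.1", range(1, 1025))). A single thread then waits on all the sockets, so no OS thread is needed per in-flight probe.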
RFC 793: Transmission Control Protocol. The original 1981 TCP specification, since obsoleted by RFC 9293 (2022). Section 3.4 describes the connection establishment procedure.
Nmap Reference Guide. Nmap is the gold standard in port scanning. Its documentation explains scan types, timing, and service detection in depth.
IANA Service Name and Port Number Registry. The authoritative list of port number assignments.
Python socket module documentation. The standard-library module wrapping the BSD sockets interface.