Network programming in Linux

Transmission Control Protocol (TCP)


The Transmission Control Protocol (TCP) is one of the core protocols of the Internet protocol suite. It is a connection-oriented protocol providing reliable, ordered, error-checked delivery of data. Applications that do not require the reliability of a TCP connection may instead use the connectionless User Datagram Protocol (UDP), which emphasizes low-overhead operation and reduced latency rather than error checking and delivery assurance.

IP works by exchanging packets. Due to network congestion, traffic load balancing, or other unpredictable network behavior, IP packets can be lost, duplicated, or delivered out of order. TCP detects these problems, requests retransmission of lost data, rearranges out-of-order data, and even helps minimize network congestion to reduce occurrences of failures.

TCP is a reliable stream delivery service that guarantees all bytes received will be identical to the bytes sent and in the correct order. Since packet transfer across many networks is not reliable, a technique known as positive acknowledgment with retransmission is used to guarantee reliability. This technique requires the receiver to respond with an acknowledgment message as it receives packets. The sender keeps a record of each packet it sends. The sender also maintains a timer from when the packet was sent, and retransmits that packet if the timer expires before it has been acknowledged by the receiver. The timer is needed because, when a packet is lost or corrupted, the receiver has no way of notifying the sender of the failure.
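The bookkeeping described above can be sketched in a few lines. This is only a minimal model, not how a real TCP stack is written: the class name RetransmitQueue, its methods, the abstract tick counter and the fixed timeout are all assumptions made for illustration (real implementations adapt the timeout to the measured round-trip time).

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Minimal sender-side bookkeeping for positive acknowledgment with
// retransmission. Time is an abstract tick counter supplied by the caller.
class RetransmitQueue {
  public:
    explicit RetransmitQueue(std::uint64_t timeout) : timeout_(timeout) {}

    // Records that packet 'id' was sent (or resent) at time 'now'.
    void sent(std::uint32_t id, std::uint64_t now) { sentAt_[id] = now; }

    // The receiver acknowledged packet 'id': stop tracking it.
    void acknowledged(std::uint32_t id) { sentAt_.erase(id); }

    // Returns the packets whose timer has expired at time 'now'.
    // 'now' is assumed never to be earlier than a recorded send time.
    std::vector<std::uint32_t> expired(std::uint64_t now) const {
        std::vector<std::uint32_t> out;
        for (const auto& entry : sentAt_)
            if (now - entry.second >= timeout_) out.push_back(entry.first);
        return out;
    }

  private:
    std::uint64_t timeout_;                           // fixed timeout (assumption)
    std::map<std::uint32_t, std::uint64_t> sentAt_;   // id -> last send time
};
```

A caller would invoke sent() for each transmission, acknowledged() for each ACK, and periodically resend whatever expired() returns.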

While IP handles actual delivery of packets, TCP guarantees delivery of its datagrams, called segments. TCP segments are encapsulated (i.e. transported as payload) within IP packets.

TCP ensures reliable and error-free exchange of data through connections between hosts. Key features are:

  • Ordered data transfer: sequencing of segments (using the Sequence number field) ensures data segments are rearranged upon reception in the same order they were sent.

  • Retransmission of lost segments: any segment not acknowledged by the receiver is retransmitted by the sender.

  • Throughput optimization: multiple consecutive segments are acknowledged with a single ACK in order to maximize data transfer rates.

  • Error detection: any erroneous segment is discarded and retransmitted.

  • Flow control: data transfer rate is limited to guarantee reliable delivery. The receiver continuously informs the sender of how much data it can handle at any moment.

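  • Congestion control: the sender limits its transmission rate according to the congestion it perceives in the network (for instance through lost or delayed acknowledgments), to avoid saturating the links between the two hosts.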

The TCP segment format

TCP accepts data from the TCP/IP application layer, divides it into chunks, and adds a TCP header to each chunk, creating a sequence of TCP segments. Each TCP segment is then encapsulated into an IP packet and transmitted over the Internet.

The TCP header contains 8 mandatory fields (some of which are divided into subfields) and an optional extension field:

TCP segment

The various TCP header fields are:

  • Source port: Identifies the sending port number.

  • Destination port: Identifies the receiving port number.

  • Sequence number: This field has two roles:

    1. If the SYN flag is set (i.e. is 1), the field holds the initial sequence number. The sequence number of the first data byte and the acknowledgment number in the corresponding ACK are this sequence number plus 1.
    2. If the SYN flag is clear (i.e. is 0), the field holds the accumulated sequence number of the first data byte of this segment for the current session.
  • Acknowledgment number: If the ACK flag is set then the value in this field is the next sequence number that the receiver is expecting. It acknowledges reception of all prior bytes (if any). The first ACK sent by each end acknowledges the other end's initial sequence number itself, but no data.

  • Offset and flags: These two bytes are divided into multiple components. In the following figure, each block represents a single bit:

    Offset + flags field

    • Data offset (4 bits): Specifies the size of the TCP header in 32-bit words. The minimum size header is 5 words and the maximum is 15 words, thus giving the minimum size of 20 bytes and maximum of 60 bytes, allowing for up to 40 bytes of options in the header.

    • Reserved (3 bits): Bits reserved for future use; should all be cleared (i.e. set to 0).

    • Flags (also called Control bits): The TCP header holds 9 control bits:

      1. NS: The Nonce Sum flag is used in Explicit Congestion Notification to protect against accidental or malicious concealment of marked segments from the TCP sender. The two following flags are also part of Explicit Congestion Notification implementation.

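      2. CWR: The Congestion Window Reduced flag is set by the sending host to indicate that it received a segment with the ECE flag set and has reduced its sending rate in response.

      3. ECE: The ECN-Echo flag is used by the receiver to echo a congestion indication back to the sender. During connection setup (SYN flag set), it instead signals that the host is ECN-capable.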

      4. URG: The Urgent flag indicates that the Urgent pointer field is significant.

      5. ACK: The Acknowledgment flag indicates that the Acknowledgment field is significant. All segments after the initial SYN segment sent by the client should have this flag set.

      6. PSH (Push function): Asks to push the buffered data to the receiving application.

      7. RST: The Reset flag is used to force connection shutdown.

      8. SYN: The Synchronize flag is used to initiate a new TCP connection. Only the first segment sent from each end should have the SYN flag set. Some other flags change meaning based on this flag, some are only valid when it is set, and others when it is clear.

      9. FIN: This flag is used by the sender to indicate it has no more data to send, thus ending the connection. The receiver also uses this flag, along with the ACK flag, to accept connection shutdown.

  • Window size: This field gives the size of the receiver's window, which specifies the number of bytes, beyond the sequence number in the acknowledgment field, that the receiver is currently willing to receive. This field is used for flow control within connections.

  • Checksum: The 16-bit checksum field is used for error-checking of header and data.

  • Urgent pointer: If the URG flag is set, this field is an offset from the sequence number indicating the last urgent data byte.

  • Options: This variable sized field, of length 0 to 40 bytes divisible by 4, may hold up to 10 successive options added to the TCP header for various purposes such as time stamping, window scaling, checksum control, etc.
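To make the bit layout of the Offset and flags field concrete, here is a minimal sketch that decodes header bytes 12 and 13 from a raw buffer, following the layout described above (4 bits of data offset, 3 reserved bits, 9 control bits). The OffsetAndFlags struct and both function names are illustrative assumptions and are not part of the classes presented later in this section.

```cpp
#include <cstdint>

// Hypothetical holder for the fields packed in header bytes 12 and 13.
struct OffsetAndFlags {
    unsigned dataOffset;   // header size in 32-bit words (5 to 15)
    unsigned reserved;     // 3 reserved bits, expected to be 0
    unsigned flags;        // 9 control bits: NS is bit 8, FIN is bit 0
};

// Decodes the two "Offset and flags" bytes of a TCP header.
// 'hdr' must point to at least 14 readable bytes.
inline OffsetAndFlags decodeOffsetAndFlags(const std::uint8_t* hdr) {
    OffsetAndFlags f;
    f.dataOffset = hdr[12] >> 4;                         // upper 4 bits of byte 12
    f.reserved   = (hdr[12] >> 1) & 0x07u;               // next 3 bits
    f.flags      = ((hdr[12] & 0x01u) << 8) | hdr[13];   // NS, then CWR down to FIN
    return f;
}

// Header length in bytes, derived from the data offset field.
inline unsigned headerBytes(const OffsetAndFlags& f) {
    return f.dataOffset * 4;
}
```

For example, bytes 0xA0 0x02 decode to a data offset of 10 words (a 40-byte header) with only the SYN bit set.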

Establishing a new TCP connection

To establish a connection, TCP uses a three-way handshake. Before a client host attempts to connect with a server host, the server must first bind to and listen at a port to open it up for connections: this is called a passive open. Once the passive open is established by the server, a client may initiate an active open. The three-way (or 3-step) handshake then proceeds as follows:

  1. The active open is performed by the client sending a SYN to the server. The client sets the segment's sequence number to a random value X.

  2. In response, the server replies with a SYN+ACK. The acknowledgment number is set to one more than the received sequence number i.e. X+1, and the sequence number that the server chooses for its segment is another random number, Y.

  3. Finally, the client sends an ACK back to the server. The sequence number is set to the received acknowledgement value i.e. X+1, and the acknowledgement number is set to one more than the received sequence number i.e. Y+1.

Connection establishment

At this point, both the client and server have received an acknowledgment of the connection. Steps 1 and 2 establish and acknowledge the connection parameter (sequence number) for one direction. Steps 2 and 3 establish and acknowledge the connection parameter (sequence number) for the other direction. Upon successful completion of the 3-way handshake, full-duplex communication is established.
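The numbering used in the three steps can be checked with a small sketch. The HandshakeSegment struct and the three helper functions are hypothetical names introduced only for this illustration; they compute the sequence and acknowledgment numbers of each handshake segment from the initial values X and Y, with the modulo 2^32 wraparound that 32-bit sequence numbers imply.

```cpp
#include <cstdint>

// Sequence and acknowledgment numbers carried by one handshake segment.
struct HandshakeSegment {
    std::uint32_t seq;
    std::uint32_t ack;   // meaningful only when 'hasAck' is true
    bool syn;
    bool hasAck;
};

// Step 1: client SYN carrying its initial sequence number x.
inline HandshakeSegment clientSyn(std::uint32_t x) {
    return HandshakeSegment{x, 0, true, false};
}

// Step 2: server SYN+ACK carrying its own initial sequence number y.
inline HandshakeSegment serverSynAck(const HandshakeSegment& syn, std::uint32_t y) {
    // The cast keeps the sum within 32 bits (wraps modulo 2^32).
    return HandshakeSegment{y, static_cast<std::uint32_t>(syn.seq + 1u), true, true};
}

// Step 3: client ACK completing the handshake.
inline HandshakeSegment clientAck(const HandshakeSegment& synAck) {
    return HandshakeSegment{synAck.ack, static_cast<std::uint32_t>(synAck.seq + 1u), false, true};
}
```

Feeding X = 2425921669 and Y = 3123152416 through the three functions yields an acknowledgment number of 2425921670 in step 2, and sequence number 2425921670 with acknowledgment number 3123152417 in step 3.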

Closing an active TCP connection

The connection termination phase uses a four-way handshake, with each side of the connection closing it independently. When a host wants to end a TCP connection, it transmits a FIN, which the other end acknowledges with an ACK. Therefore, a typical tear-down requires a pair of FIN and ACK segments from each TCP endpoint. After both FIN/ACK exchanges are concluded, the side which sent the first FIN before receiving one waits for a timeout before finally closing the connection, during which time the local port is unavailable for new connections; this prevents confusion due to delayed segments being delivered during subsequent connections.

It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN and host B replies with a FIN+ACK (actually combining two steps into one); host A replies with an ACK. This is perhaps the most common method.

Finally, a connection may be abruptly closed by one host sending a RST segment to the other. The initiating host does not expect an ACK in return, nor does the receiving host need to go through a FIN/ACK exchange to close its end of the connection. RST segments are sometimes used by firewalls to end suspicious TCP connections, or by hosts to refuse connection requests.

TCP connection termination

Efficient acknowledgment strategy

The acknowledgement mechanism is at the heart of TCP. Simply put, when data arrives at the recipient, the protocol requires that it send back an acknowledgement of this data. The protocol specifies that the bytes of data are sequentially numbered (starting with a randomly selected initial number), so that the recipient can acknowledge data by specifying the sequence number of the next byte it expects, which also acknowledges all previous bytes.

This acknowledgement strategy (one ACK acknowledging multiple received data segments) maximizes data transfer throughput by reducing time and bandwidth dedicated to acknowledgments. The following figure clearly illustrates the benefits of acknowledging multiple segments at once. The left diagram shows each received segment being acknowledged individually, which results in poor overall throughput compared to the right diagram where a single ACK acknowledges multiple received segments:

Acknowledgment strategy

When a segment gets lost, the receiver acknowledges the last consecutive segments it has received, forcing the sender to restart sending segments from the lost one:

Lost segments
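The cumulative acknowledgment behaviour can be modelled with a short sketch. It assumes a receiver that buffers out-of-order segments and acknowledges the next byte it expects; the CumulativeAck class is a hypothetical name, and sequence number wraparound is deliberately ignored (64-bit counters are used) to keep the example short.

```cpp
#include <cstdint>
#include <map>

// Minimal receiver-side model: tracks the next expected byte and buffers
// out-of-order segments. Wraparound is ignored for brevity.
class CumulativeAck {
  public:
    explicit CumulativeAck(std::uint64_t firstExpected) : next_(firstExpected) {}

    // Records a received segment and returns the acknowledgment number to
    // send back, i.e. the next byte the receiver expects.
    std::uint64_t receive(std::uint64_t seq, std::uint64_t len) {
        std::uint64_t& stored = pending_[seq];
        if (len > stored) stored = len;

        // Consume every buffered segment that is now contiguous.
        auto it = pending_.begin();
        while (it != pending_.end() && it->first <= next_) {
            std::uint64_t end = it->first + it->second;
            if (end > next_) next_ = end;
            it = pending_.erase(it);
        }
        return next_;
    }

  private:
    std::uint64_t next_;                               // next expected byte
    std::map<std::uint64_t, std::uint64_t> pending_;   // segment start -> length
};
```

While a gap exists, every call returns the same acknowledgment number; once the missing segment arrives, a single acknowledgment covers all the buffered data.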

The sliding window mechanism

The window mechanism is a flow control tool. Whenever appropriate, the recipient of data returns to the sender a number (in the Window size field), indicating the remaining buffer size the receiver currently has available for additional data. This number of bytes, called the window, is the maximum number of bytes which the sender is permitted to transmit until the receiver returns some additional window information.

The clearest way to illustrate the sliding window mechanism is by classifying sender's segment data in four categories from the receiver's point of view:

  1. Bytes received and acknowledged: the segment data received and which the receiver has acknowledged.

  2. Bytes received but not yet acknowledged: the receiver has received this segment data but has not yet sent an acknowledgment for it, as depicted in the previous figure (e.g. bytes in segments X+7, X+8 and X+9).

  3. Bytes not received but for which the receiver is ready: the receiver has not received this segment data yet, but it has buffer space available to store these data bytes until acknowledgment. It essentially tells the sender how much data it may still send without overwhelming the receiver.

  4. Bytes not received and for which the receiver is not ready: the sender cannot send this segment data because the receiver has no space to buffer it.

The following figure illustrates these four categories of data bytes in the data stream:

Sliding window

The four data categories are positioned according to a TCP sliding window which may hold 15 bytes (in real implementations the window size is much larger; it's kept small here to simplify the example). While the receiver processes received bytes, it conceptually makes its window slide over the data stream as it acknowledges the received bytes. Since the receiver's window size is passed along to the sender in TCP's Window size field, the sender is also aware of the receiver's window size, and keeps track of how much more data the receiver may accept without overflowing its buffer.
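Seen from the sender, the window arithmetic reduces to one subtraction. The following sketch (the function name usableWindow and its parameters are assumptions made for illustration) computes how many bytes the sender may still transmit from the latest acknowledgment number, the advertised window, and the sequence number of the next byte it would send.

```cpp
#include <cstdint>

// Bytes the sender may still transmit, given the latest acknowledgment number
// and advertised window, and the sequence number of the next byte to send.
// Arithmetic is modulo 2^32, as TCP sequence numbers are.
inline std::uint32_t usableWindow(std::uint32_t ackNb,
                                  std::uint32_t window,
                                  std::uint32_t nextSeq) {
    std::uint32_t inFlight = nextSeq - ackNb;   // sent but not yet acknowledged
    return inFlight >= window ? 0 : window - inFlight;
}
```

With a 15-byte window and 6 bytes already in flight, the sender may transmit 9 more bytes before it has to wait for a new acknowledgment.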

Since the window size can be used to manage the rate at which data flows between devices at the ends of the connection, it is the method by which TCP implements flow control, one of the “classical” jobs of the transport layer. Flow control is vitally important to TCP, as it is the method by which devices communicate their status to each other. By reducing or increasing window size, the sender and receiver each ensure that the other device sends data just as fast as the receiver can process it.

Sometimes, the receiver will have no buffer space available, and will return a window value of zero. Under these circumstances, the protocol requires the sender to send a small segment to the receiver now and then, to see if more data may be accepted. If the window remains closed at zero for some substantial period and the sender obtains no response from the receiver, the protocol requires the sender to conclude that the receiver has failed, and close the connection abruptly with a RST segment.

The TCPSegment class

The TCPSegment class maps raw captured bytes (stored in a Datagram instance) into corresponding TCP header fields; it is therefore derived from the DatagramFragment class.

tcpsegment.h
#ifndef TCPSEGMENT_H
#define TCPSEGMENT_H

#include <iostream>

#include "datagramfragment.h"   // DatagramFragment

using namespace std;

/* TCPSegment: class mapping the inherited data block as a TCP segment.
 *
 * Attributes
 *   p_data (inherited) : array of bytes
 *   p_len (inherited)  : size of p_data
 *
 * Notes
 *   1. the data block referenced by p_data may not be owned by the instance
 *      but instead owned by a Datagram instance which shares its data with
 *      instances of classes derived from DatagramFragment, including this
 *      class.
 */
class TCPSegment : public DatagramFragment {
  public:
    TCPSegment(bool = false);                             // default constructor
    TCPSegment(bool, unsigned char *, unsigned int);      // parameterized constructor
    
    unsigned int header_length() const;                   // length of TCP segment header in bytes

    unsigned int source_port() const;                     // source port
    unsigned int destination_port() const;                // destination port

    unsigned int sequence_nb() const;                     // access to sequence number field
    unsigned int ack_nb() const;                          // access to acknowledgment number field
        
    unsigned int offset() const;                          // access to data offset field
    unsigned int reserved() const;                        // access to reserve field
        
    bool flag_ns() const;                                 // access to NS flag
    bool flag_cwr() const;                                // access to CWR flag
    bool flag_ece() const;                                // access to ECE flag
    bool flag_urg() const;                                // access to URG flag
    bool flag_ack() const;                                // access to ACK flag
    bool flag_psh() const;                                // access to PSH flag
    bool flag_rst() const;                                // access to RST flag
    bool flag_syn() const;                                // access to SYN flag
    bool flag_fin() const;                                // access to FIN flag
        
    unsigned int window_size() const;                     // access to window size field
    unsigned int checksum() const;                        // access to checksum field
    unsigned int pointer_urg() const;                     // access to urgent pointer field

    // Operator overloads
    friend ostream & operator<<(ostream &, const TCPSegment &);
    
  protected:
    
    // Returns a string textually identifying standard ports
    const char * port_name(unsigned int) const;
};

#endif

And here are the member definitions for TCPSegment:

tcpsegment.cpp
#ifndef TCPSEGMENT_CPP
#define TCPSEGMENT_CPP

#include <cstdio>               // C-style formatted output used by operator<<

#include "tcpsegment.h"

// Default constructor
TCPSegment::TCPSegment(bool owned) : DatagramFragment(owned) {
}

// Parameterized constructor
TCPSegment::TCPSegment(bool owned, unsigned char * s, unsigned int l) : DatagramFragment(owned, s, l) {
}

// Returns the TCP segment header length in bytes
unsigned int TCPSegment::header_length() const {
  return offset() * 4;        // take into account possible options
}

// Returns source port
unsigned int TCPSegment::source_port() const {
  return char2word(p_data);
}

// Returns destination port
unsigned int TCPSegment::destination_port() const {
  return char2word(p_data+2);
}

// Returns the sequence number field
unsigned int TCPSegment::sequence_nb() const {
  return char4word(p_data+4);
}

// Returns the acknowledgment number field
unsigned int TCPSegment::ack_nb() const {
  return char4word(p_data+8);
}

// Returns the data offset field (header length in 4-bytes words)    
unsigned int TCPSegment::offset() const {
  return p_data[12] >> 4;
}

// Returns the reserved field (the 3 bits following the data offset)
unsigned int TCPSegment::reserved() const {
  return (p_data[12] & 0x0E) >> 1;
}

// Returns the NS flag value
bool TCPSegment::flag_ns() const {
  return p_data[12] & 0x01;
}

// Returns the CWR flag value
bool TCPSegment::flag_cwr() const {
  return p_data[13] & 0x80;
}

// Returns the ECE flag value
bool TCPSegment::flag_ece() const {
  return p_data[13] & 0x40;
}

// Returns the URG flag value
bool TCPSegment::flag_urg() const {
  return p_data[13] & 0x20;
}

// Returns the ACK flag value
bool TCPSegment::flag_ack() const {
  return p_data[13] & 0x10;
}

// Returns the PSH flag value
bool TCPSegment::flag_psh() const {
  return p_data[13] & 0x08;
}

// Returns the RST flag value
bool TCPSegment::flag_rst() const {
  return p_data[13] & 0x04;
}

// Returns the SYN flag value
bool TCPSegment::flag_syn() const {
  return p_data[13] & 0x02;
}

// Returns the FIN flag value
bool TCPSegment::flag_fin() const {
  return p_data[13] & 0x01;
}

// Returns the window size field    
unsigned int TCPSegment::window_size() const {
  return char2word(p_data+14);
}

// Returns the checksum field    
unsigned int TCPSegment::checksum() const {
  return char2word(p_data+16);
}

// Returns the urgent pointer field    
unsigned int TCPSegment::pointer_urg() const {
  return char2word(p_data+18);
}

// Returns a string textually identifying most popular standard ports
const char * TCPSegment::port_name(unsigned int num) const {
  switch (num) {
    case  20: 
    case  21: return "FTP";
    case  22: return "SSH";
    case  23: return "telnet";
    case  25: return "SMTP";
    case  53: return "DNS";
    case  67: 
    case  68: return "DHCP";
    case  69: return "TFTP";
    case  80: return "HTTP";
    case 110: return "POP3";
    case 137: 
    case 139: return "NetBIOS";
    case 389: return "LDAP";
    case 546:
    case 547: return "DHCP";
  }
    
  // Distinguish assigned ports from ephemerals
  if (num < 1024)
    return "unknown";
  else  
    return "ephemeral";
}

// Output operator displaying the TCP segment header fields in human readable
// form
ostream & operator<<(ostream & ostr, const TCPSegment & tcp) {
  if (tcp.p_data) {
    char outstr[16];

    ostr << "source port = " << tcp.source_port();
    ostr << " [" << tcp.port_name(tcp.source_port()) << "]" << endl;
        
    ostr << "destination port = " << tcp.destination_port();
    ostr << " [" << tcp.port_name(tcp.destination_port()) << "]" << endl;

    ostr << "sequence number = " << tcp.sequence_nb() << endl;
    ostr << "ack number = " << tcp.ack_nb() << endl;
        
    ostr << "offset = " << tcp.offset() << endl;
    ostr << "reserved = " << tcp.reserved() << endl;

    ostr << "NS  flag = " << tcp.flag_ns()  << endl;
    ostr << "CWR flag = " << tcp.flag_cwr() << endl;
    ostr << "ECE flag = " << tcp.flag_ece() << endl;
    ostr << "URG flag = " << tcp.flag_urg() << endl;
    ostr << "ACK flag = " << tcp.flag_ack() << endl;
    ostr << "PSH flag = " << tcp.flag_psh() << endl;
    ostr << "RST flag = " << tcp.flag_rst() << endl;
    ostr << "SYN flag = " << tcp.flag_syn() << endl;
    ostr << "FIN flag = " << tcp.flag_fin() << endl;
            
    ostr << "window size = " << tcp.window_size() << endl;
    ostr << "urgent pointer = " << tcp.pointer_urg() << endl;
        
    snprintf(outstr, sizeof(outstr), "0x%.4x", tcp.checksum());
    ostr << "checksum = " << outstr << endl;
  }
    
  ostr << flush;
    
  return ostr;
}

#endif

The implementation of TCPSegment is straightforward and easy to understand. Note however that not all assigned TCP ports are textually identified by method port_name() (you can easily add more if needed), and options header fields are not extracted from the TCP segment header; we refer you to the IPPacket class for an example of how to add such functionality to TCPSegment.
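As a starting point for options extraction, here is a minimal sketch of how the options area could be walked. It is not taken from the IPPacket class: the TcpOption struct and parseTcpOptions function are hypothetical, and the sketch relies only on the standard option encoding (kind 0 ends the list, kind 1 is a one-byte no-operation, and every other option carries a length octet that counts the kind and length octets themselves).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded option: its kind and the bytes following the length octet.
struct TcpOption {
    std::uint8_t kind;
    std::vector<std::uint8_t> value;
};

// Walks the options area of a TCP header. 'opts' points just past the 20
// fixed header bytes and 'len' is the header length minus 20. Parsing stops
// at End of Option List (kind 0) or at the first malformed option.
inline std::vector<TcpOption> parseTcpOptions(const std::uint8_t* opts, std::size_t len) {
    std::vector<TcpOption> out;
    std::size_t i = 0;
    while (i < len) {
        std::uint8_t kind = opts[i];
        if (kind == 0) break;                    // End of Option List
        if (kind == 1) { ++i; continue; }        // No-Operation (padding)
        if (i + 1 >= len) break;                 // length octet missing
        std::size_t optLen = opts[i + 1];        // counts kind and length octets
        if (optLen < 2 || i + optLen > len) break;
        out.push_back(TcpOption{kind,
            std::vector<std::uint8_t>(opts + i + 2, opts + i + optLen)});
        i += optLen;
    }
    return out;
}
```

In TCPSegment, such a function could be fed with p_data + 20 and header_length() - 20, after checking that the captured length covers the whole header.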

Since TCP is transported by IP, we add a new method to the IPPacket class for mapping its payload into a TCPSegment instance:

ippacket.cpp (partial)
// Returns TCP segment transported in payload
TCPSegment IPPacket::tcp() {
    if (protocol() != ipp_tcp)
        throw EBadTransportException("IP packet not transporting TCP traffic");

    return TCPSegment(false, data(), length() - header_length());
}

Finally, we update the callback function to map the captured bytes in the IPPacket's payload to a TCPSegment instance for display:

sniff13.cpp (partial)
// Callback given to pcap_loop() for processing captured datagrams
void process_packet(u_char *user, const struct pcap_pkthdr * h, const u_char * packet) {
  static set<IPAddress> arpRequests;
  IPPacket   ip;
  ARPPacket  arp;
  ICMPPacket icmp;
  TCPSegment tcp;
  
  COUT << "Grabbed " << h->caplen << " bytes (" << static_cast<int>(100.0 * h->caplen / h->len) 
       << "%) of datagram received on " << ctime((const time_t*)&h->ts.tv_sec);
       
  Datagram pkt(packet, h->caplen);        // initialized Datagram instance
  if (show_raw) COUT << "---------------- Raw data -----------------" << pkt << endl;

  EthernetFrame ether = pkt.ethernet();   // get EthernetFrame instance from transported data
  COUT << "---------- Ethernet frame header ----------" << endl << ether;
  
  // Display payload content according to EtherType
  switch (ether.ether_type()) {
    case EthernetFrame::et_IPv4 :         // get IPPacket instance from transported data
      ip = ether.ip4();
      COUT << "-------- IP packet header --------" << endl << ip;

      // If it transports an ICMP packet, display its attributes
      if (ip.protocol() == IPPacket::ipp_icmp) {
        icmp = ip.icmp();
        COUT << "------ ICMP packet header ------" << endl << icmp;
        
        // Apply ping flood detection if required
        if (security_tool == PINGFLOOD && pingFloods.process_ping(ip.destination_ip(), icmp))
          cout << "**** ALERT - Potential Ping flood detected ****" << endl
               << "     numerous echo requests with large payload targeting" << endl
               << "     host " << ip.destination_ip() << endl << endl;
      }

      // If it transports a TCP segment, display its attributes
      else if (ip.protocol() == IPPacket::ipp_tcp) {
        tcp = ip.tcp();
        COUT << "------ TCP segment header ------" << endl << tcp;
      }
          
      break;

Here is the resulting display when capturing the first three TCP segments exchanged when a client browser connects to a Web server. Note the 3-way handshake exchange (SYN, SYN+ACK, ACK) for opening the TCP connection, as well as the sequence and acknowledgment numbering described previously.

%root> ./sniff13 -f tcp
device = eth0
network ip = 172.16.179.0
network mask = 255.255.255.0
BPF filter = tcp
Grabbed 74 bytes (100%) of datagram received on Sat Sep 28 09:58:50 2013
---------- Ethernet frame header ----------
destination MAC address = 00.50.56.ec.28.a3
source MAC address = 00.0c.29.19.22.3a
ether type = IPv4 [0x0800]
-------- IP packet header --------
version = IPv4
header length = 20 (IHL = 5)
type of service = 0:
total length = 60
fragment ID = 0x62eb
  don't fragment = 2
  more fragments = 0
  fragment position = 0
protocol = TCP [0x06]
time to live = 64
checksum = 0xf9c7
destination IP address = 142.154.239.212
source IP address = 172.16.179.137
------ TCP segment header ------
source port = 51992 [ephemeral]
destination port = 80 [HTTP]
sequence number = 2425921669
ack number = 0
offset = 10
reserved = 0
NS  flag = 0
CWR flag = 0
ECE flag = 0
URG flag = 0
ACK flag = 0
PSH flag = 0
RST flag = 0
SYN flag = 1
FIN flag = 0
window size = 14600
urgent pointer = 0
checksum = 0xde37

Grabbed 60 bytes (100%) of datagram received on Sat Sep 28 09:58:50 2013
---------- Ethernet frame header ----------
destination MAC address = 00.0c.29.19.22.3a
source MAC address = 00.50.56.ec.28.a3
ether type = IPv4 [0x0800]
-------- IP packet header --------
version = IPv4
header length = 20 (IHL = 5)
type of service = 0:
total length = 44
fragment ID = 0xc3da
  don't fragment = 0
  more fragments = 0
  fragment position = 0
protocol = TCP [0x06]
time to live = 128
checksum = 0x98e8
destination IP address = 172.16.179.137
source IP address = 142.154.239.212
------ TCP segment header ------
source port = 80 [HTTP]
destination port = 51992 [ephemeral]
sequence number = 3123152416
ack number = 2425921670
offset = 6
reserved = 0
NS  flag = 0
CWR flag = 0
ECE flag = 0
URG flag = 0
ACK flag = 1
PSH flag = 0
RST flag = 0
SYN flag = 1
FIN flag = 0
window size = 64240
urgent pointer = 0
checksum = 0x824c

Grabbed 54 bytes (100%) of datagram received on Sat Sep 28 09:58:50 2013
---------- Ethernet frame header ----------
destination MAC address = 00.50.56.ec.28.a3
source MAC address = 00.0c.29.19.22.3a
ether type = IPv4 [0x0800]
-------- IP packet header --------
version = IPv4
header length = 20 (IHL = 5)
type of service = 0:
total length = 40
fragment ID = 0x62ec
  don't fragment = 2
  more fragments = 0
  fragment position = 0
protocol = TCP [0x06]
time to live = 64
checksum = 0xf9da
destination IP address = 142.154.239.212
source IP address = 172.16.179.137
------ TCP segment header ------
source port = 51992 [ephemeral]
destination port = 80 [HTTP]
sequence number = 2425921670
ack number = 3123152417
offset = 5
reserved = 0
NS  flag = 0
CWR flag = 0
ECE flag = 0
URG flag = 0
ACK flag = 1
PSH flag = 0
RST flag = 0
SYN flag = 0
FIN flag = 0
window size = 14600
urgent pointer = 0
checksum = 0xde23

^C
*** Capture process interrupted by user...
*** 3 datagrams captured
%root>

Application - Tracking TCP sessions

To demonstrate the TCP session opening and termination processes, we implement a TCP session tracker which tracks the total number of data bytes exchanged within captured TCP sessions. We use a state machine to track session states during the lifetime of a TCP session. The following state machine includes the various states a TCP session goes through from initial connection request to session closure between two hosts, named A and B:

TCP state machine

State 0 represents the absence of a TCP session. Once host A sends a SYN request to host B, the state machine transits to state 1, waiting for host B to respond with SYN+ACK, and so on. Here are some observations on this state machine:

  • State 3 represents an established connection, after the 3-way connection handshake is completed.

  • State 10 represents the session once it has been closed, either by a 3-way or 4-way handshake. States 4 and 7 accept both termination handshake scenarios.

  • If host A initiates session termination, the state machine transits through states 4, 5 and 6. On the other hand, if host B initiates termination, the state machine transits through states 7, 8 and 9.

  • For simplicity, we haven't included abrupt session termination with RST, but it may easily be added with a direct transition from state 3 to state 10.
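The connection-opening part of this state machine can be sketched as a pure transition function. The sketch below is an illustration only, not the tracker's actual code: the SegmentEvent struct and nextState function are hypothetical, only states 0 to 3 are covered, and the optional RST shortcut from state 3 to state 10 mentioned in the last observation is included. The termination states (4 to 9) are left out.

```cpp
// Flags of one observed segment and which host sent it (hypothetical type).
struct SegmentEvent {
    bool fromA;   // true if sent by host A (the session initiator)
    bool syn;
    bool ack;
    bool rst;
};

// Returns the next state for the connection-opening part of the state
// machine (states 0 to 3). Any other input leaves the state unchanged.
inline unsigned nextState(unsigned state, const SegmentEvent& e) {
    if (state == 3 && e.rst) return 10;   // optional abrupt termination
    switch (state) {
        case 0: if (e.fromA && e.syn && !e.ack) return 1; break;    // A sends SYN
        case 1: if (!e.fromA && e.syn && e.ack) return 2; break;    // B sends SYN+ACK
        case 2: if (e.fromA && !e.syn && e.ack) return 3; break;    // A sends ACK
        default: break;
    }
    return state;
}
```

Keeping the transition logic free of side effects makes each transition easy to test in isolation, independently of packet capture.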

The proposed implementation of TCP session tracking, presented below, uses an STL map to track multiple TCP sessions simultaneously. Each session is tracked using an instance of the following TCPSession class. Each session is uniquely identified by a set of two keys made of the IP address and the TCP port of each host.
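To illustrate the idea of IP:port keys, here is a minimal sketch of how such keys might be built and combined. Both helpers (endpointKey and sessionKey) are hypothetical and are not the getKeys() method of the class shown next; in particular, ordering the two endpoint keys so that both directions of a session map to the same entry is an assumption of this sketch.

```cpp
#include <sstream>
#include <string>

// Builds an "IP:port" key for one endpoint (hypothetical helper).
inline std::string endpointKey(const std::string& ip, unsigned port) {
    std::ostringstream os;
    os << ip << ':' << port;
    return os.str();
}

// Builds a single map key for a session. The two endpoint keys are ordered
// so that segments travelling in either direction yield the same key.
inline std::string sessionKey(const std::string& a, const std::string& b) {
    return a < b ? a + '|' + b : b + '|' + a;
}
```

Such a string can be used directly as the key type of an STL map holding one session object per entry.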

Here is the TCPSession class. It is described in detail afterwards:

tcpsession.h
#ifndef TCPSESSION_H
#define TCPSESSION_H

#include <string>

#include "ippacket.h"           // IPPacket
#include "tcpsegment.h"         // TCPSegment

/* TCPSession: class tracking datagrams exchanged within a single TCP session between
 *             two hosts
 *
 * Attributes
 *   sourceId      : key of host having initiated the session (IP:Port)
 *   destinationId : key of host having received the connection request (IP:Port)
 *   state         : current state of the state machine
 *   bytes         : count of data bytes having been exchanged during the session
 */
class TCPSession {
  public:
    TCPSession();                                                // default constructor
    TCPSession(IPPacket &);                                      // parameterized constructor
        
    unsigned int trackState(IPPacket &, bool);                   // TCP session manager
        
    unsigned int getState() const;                               // access to state attribute
    unsigned int getBytes() const;                               // access to bytes attribute
    
    bool terminated() const;                                     // tells if session terminated

    static bool getKeys(IPPacket &, string &, string &);         // keys identifying session
        
  private:
    string   sourceId;                                           // key identifying source host
    string   destinationId;                                      // key identifying destination host
        
    unsigned int state;                                          // current session state
    unsigned int bytes;                                          // count of data bytes exchanged in session
};

#endif

And here are the member definitions for TCPSession:

tcpsession.cpp (lines numbered from 001)
#ifndef TCPSESSION_CPP
#define TCPSESSION_CPP

#include <sstream>          // ostringstream
#include <iostream>         // cout, endl
#include "tcpsession.h"

// Default constructor
TCPSession::TCPSession() {
  sourceId = destinationId = "";
    
  state = 0;   // initial state
  bytes = 0;   // no data exchanged yet
}

// Parameterized constructor
TCPSession::TCPSession(IPPacket & ip) {
  // Get TCP session keys
  getKeys(ip, sourceId, destinationId);
    
  state = 0;   // initial state
  bytes = 0;   // no data exchanged yet
}

// Machine state processing: transit from one state to next according to given
// IP packet (which must have a TCP segment as payload)
unsigned int TCPSession::trackState(IPPacket & ip, bool debug) {
  // Get TCP session keys
  string src, dst;
  getKeys(ip, src, dst);
    
  // Make sure the datagram is part of the session
  bool considerDatagram = (sourceId == src && destinationId == dst) || 
                          (sourceId == dst && destinationId == src);
  if (!considerDatagram) return state;
    
  // Determine segment direction according to the host that initiated
  // the connection
  bool forward  = (sourceId == src);
  bool backward = !forward;
    
  // Extract transported TCP segment
  TCPSegment tcp = ip.tcp();
    
  // Apply state machine according to segment
  switch (state) {
    case 0:                                // session closed
      // The segment must be SYN from source
      if (tcp.flag_syn() && !tcp.flag_ack() && forward) {
        bytes = 0;                         // reset data bytes count
        state = 1;                         // waiting for SYN+ACK from destination
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " >>>>> SYN >>>>> " << destinationId << " (open request)" << endl;
      }
           
      break;
            
    case 1:                                // source transmitted a SYN asking to connect
      // The segment must be SYN+ACK from destination
      if (tcp.flag_syn() && tcp.flag_ack() && backward) {
        state = 2;                         // waiting for ACK from source to complete connection
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " <<< SYN+ACK <<< " << destinationId << " (half opened)" << endl;
      }
                
      break;
            
    case 2:                                // destination sent SYN+ACK accepting connection
      // The segment must be ACK from source
      if (!tcp.flag_syn() && tcp.flag_ack() && forward) {
        state = 3;                         // waiting for FIN from source or destination
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " >>>>> ACK >>>>> " << destinationId << " (opened)" << endl;
      }
                
      break;
            
    case 3:                                // connection established
      // The segment must be FIN in either direction
      if (tcp.flag_fin()) {
        if (forward) {                     // source initiates termination
          state = 4;                       // waiting an ACK or FIN+ACK from destination
                
          if (debug)                       // display debug info on transition
            cout << sourceId << " >>>>> FIN >>>>> " << destinationId << " (close request)" << endl;
        }
        else {                             // destination initiates termination
          state = 7;                       // waiting an ACK or FIN+ACK from source
                
          if (debug)                       // display debug info on transition
            cout << sourceId << " <<<<< FIN <<<<< " << destinationId << " (close request)" << endl;
        }
      }
                
      break;
            
    case 4:                                // destination having received FIN from source
      // The segment should be ACK or FIN+ACK from destination
      if (!tcp.flag_fin() && tcp.flag_ack() && backward) {
        state = 5;                         // waiting for FIN from destination
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " <<<<< ACK <<<<< " << destinationId << " (half closed)" << endl;
      }
      else if (tcp.flag_fin() && tcp.flag_ack() && backward) {
        state = 6;                         // waiting for ACK from source
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " <<< FIN+ACK <<< " << destinationId << " (half closed)" << endl;
      }
                
      break;
            
    case 5:                                // source having received ACK in response to its FIN 
      // The segment must be FIN from destination
      if (tcp.flag_fin() && backward) {
        state = 6;                         // waiting for ACK from source to complete termination

        if (debug)                         // display debug info on transition
          cout << sourceId << " <<<<< FIN <<<<< " << destinationId << " (reverse close request)" << endl;
      }
                
      break;
            
    case 6:                                // source having received FIN or FIN+ACK from destination
      // The segment must be ACK from source
      if (!tcp.flag_fin() && tcp.flag_ack() && forward) {
        state = 10;                        // session closed
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " >>>>> ACK >>>>> " << destinationId << " (closed)" << endl;
      }
                
      break;
            
    case 7:                                // source having received FIN from destination
      // The segment should be ACK or FIN+ACK from source
      if (!tcp.flag_fin() && tcp.flag_ack() && forward) {
        state = 8;                         // waiting for FIN from source
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " >>>>> ACK >>>>> " << destinationId << " (half closed)" << endl;
      }
      else if (tcp.flag_fin() && tcp.flag_ack() && forward) {
        state = 9;                         // waiting for ACK from destination
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " >>> FIN+ACK >>> " << destinationId << " (half closed)" << endl;
      }
                
      break;
            
    case 8:                                // destination having received ACK in response to its FIN
      // The segment must be FIN from source
      if (tcp.flag_fin() && forward) {
        state = 9;                         // waiting for ACK from destination to complete termination

        if (debug)                         // display debug info on transition
          cout << sourceId << " >>>>> FIN >>>>> " << destinationId << " (reverse close request)" << endl;
      }
                
      break;
            
    case 9:                                // destination having received FIN or FIN+ACK from source
      // The segment must be ACK from destination
      if (!tcp.flag_fin() && tcp.flag_ack() && backward) {
        state = 10;                        // session closed
                
        if (debug)                         // display debug info on transition
          cout << sourceId << " <<<<< ACK <<<<< " << destinationId << " (closed)" << endl;
      }
                
      break;
  }

  // If connection established, update data bytes counter
  if (state == 3)
    bytes += tcp.length() - tcp.header_length();
        
  return state;       // return current state
}

// Returns value of attribute state
unsigned int TCPSession::getState() const {
  return state;
}

// Returns value of attribute bytes
unsigned int TCPSession::getBytes() const {
  return bytes;
}

// Returns true if state machine has reached final state (i.e. session is
// terminated)
bool TCPSession::terminated() const {
  return state == 10;
}

// Returns keys identifying both hosts in the TCP connection; returns false
// (leaving the keys untouched) if the IP packet does not transport TCP
bool TCPSession::getKeys(IPPacket& ip, string & src, string & dst) {
  // Make sure the IP packet transports TCP, and get the segment
  if (ip.protocol() != IPPacket::ipp_tcp) return false;
  TCPSegment tcp = ip.tcp();
  
  // Build keys from IP addresses and associated TCP ports
  ostringstream out;
              
  out << ip.source_ip() << ':' << tcp.source_port(); 
  src = out.str();
              
  out.str("");     // clear the stream
  
  out << ip.destination_ip() << ':' << tcp.destination_port();
  dst = out.str();

  return true;     // both keys were built
}

#endif

Here are the major elements of TCPSession's implementation:

  • A TCP session is uniquely identified by two keys, one per host, each a string formatted as IP:Port: the IP address and TCP port of the host that initiated the session, and those of the host that received the connection request. The static method getKeys() builds those keys from a captured packet. The parameterized constructor at line #017 stores both keys in attributes so that it can later be determined whether a given TCP segment belongs to the session.

  • The core method is trackState(), which updates the state machine according to the given TCP segment. The method first makes sure the given datagram is a TCP segment belonging to the session (lines #033 to #035), then determines the direction of the datagram at lines #039 and #040. The switch statement applies state changes as depicted in the state machine presented earlier: each state is represented by a case label, and within a state the transition to another state is taken only if the segment flags and the datagram direction are appropriate. Note also that trackState() has a debug parameter which, when set to true, displays state transition information, making it possible to follow the evolution of TCP sessions in detail.

  • The other methods in TCPSession are straightforward and their code well commented, so further description is unnecessary.
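
The key format and the membership and direction tests described above can be tried in isolation. The following is a minimal sketch, not part of the tutorial's source: makeKey(), Direction and classify() are hypothetical names, and plain strings and integers stand in for the IPPacket and TCPSegment accessors.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats one endpoint as "IP:Port", the same way
// getKeys() does with an ostringstream
std::string makeKey(const std::string & ip, unsigned int port) {
  std::ostringstream out;
  out << ip << ':' << port;
  return out.str();
}

// Direction of a segment relative to the host that initiated the session
enum Direction { FORWARD, BACKWARD, UNRELATED };

// Mirrors the tests made at the beginning of trackState(): sourceId and
// destinationId are the session keys, src and dst the keys of the segment
Direction classify(const std::string & sourceId, const std::string & destinationId,
                   const std::string & src, const std::string & dst) {
  if (sourceId == src && destinationId == dst) return FORWARD;    // initiator to peer
  if (sourceId == dst && destinationId == src) return BACKWARD;   // peer to initiator
  return UNRELATED;                                               // another session
}
```

For instance, with a session opened by 172.16.179.137:54556 towards 72.21.92.20:80, a segment whose keys are swapped relative to the session keys is classified as BACKWARD, and a segment involving any other endpoint as UNRELATED.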

Now let's see how the main program uses TCPSession instances to track connections:

sniff14.cpp (partial: lines 018 to 042, then resuming at line 084 after the ellipsis)
#include <iostream>

#include <cstring>             // memset
#include <cstdlib>             // exit
#include <unistd.h>            // getopt()
#include <signal.h>            // Ctrl+C handling
#include <arpa/inet.h>         // struct in_addr
#include <string>              // string

#include <set>                 // STL set
#include <map>                 // STL dictionary

#include <pcap.h>              // libpcap

#include "datagram.h"          // Datagram
#include "ethernetframe.h"     // EthernetFrame
#include "ippacket.h"          // IPPacket
#include "arppacket.h"         // ARPPacket
#include "icmppacket.h"        // ICMPPacket
#include "tcpsegment.h"        // TCPSegment

#include "pingflood.h"         // PingFloodDetection
#include "tcpsession.h"        // TCPSession

using namespace std;

...

#define ARPSPOOF  1
#define PINGFLOOD 2
#define TCPTRACK  3

// Macro replacing cout to apply conditional display in callback
#define COUT if (!quiet_mode) cout

// Callback given to pcap_loop() for processing captured datagrams
void process_packet(u_char *user, const struct pcap_pkthdr * h, const u_char * packet) {
  static set<IPAddress>          arpRequests;
  static PingFloodDetection      pingFloods;  
  static map<string, TCPSession> tcpSessions;

  IPPacket   ip;
  ARPPacket  arp;
  ICMPPacket icmp;
  TCPSegment tcp;
    
  COUT << "Grabbed " << h->caplen << " bytes (" << static_cast<int>(100.0 * h->caplen / h->len) 
       << "%) of datagram received on " << ctime((const time_t*)&h->ts.tv_sec);
       
  Datagram pkt(packet, h->caplen);        // initialized Datagram instance
  if (show_raw) COUT << "---------------- Raw data -----------------" << pkt << endl;

  EthernetFrame ether = pkt.ethernet();   // get EthernetFrame instance from transported data
  COUT << "---------- Ethernet frame header ----------" << endl << ether;
  
  // Display payload content according to EtherType
  switch (ether.ether_type()) {
    case EthernetFrame::et_IPv4 :         // get IPPacket instance from transported data
      ip = ether.ip4();
      COUT << "-------- IP packet header --------" << endl << ip;

      // If it transports an ICMP packet, display its attributes
      if (ip.protocol() == IPPacket::ipp_icmp) {
        icmp = ip.icmp();
        COUT << "------ ICMP packet header ------" << endl << icmp;
        
        // Apply ping flood detection if required
        if (security_tool == PINGFLOOD && pingFloods.process_ping(ip.destination_ip(), icmp))
          cout << "**** ALERT - Potential Ping flood detected ****" << endl
               << "     numerous echo requests with large payload targeting" << endl
               << "     host " << ip.destination_ip() << endl << endl;
      }

      // If it transports a TCP segment, display its attributes
      else if (ip.protocol() == IPPacket::ipp_tcp) {
        tcp = ip.tcp();
        COUT << "------ TCP segment header ------" << endl << tcp;

        // Apply TCP session tracking if required
        if (security_tool == TCPTRACK) {
          // Compute keys to uniquely identify the session
          string src, dst;
          TCPSession::getKeys(ip, src, dst);
          
          // Is the segment part of a tracked session or not? We need to search 
          // for two keys since segments within a TCP session travel in both
          // directions
          map<string, TCPSession>::iterator it = tcpSessions.find(src + dst);
          if (it == tcpSessions.end()) it = tcpSessions.find(dst + src);
          
          // If it's a new session then start tracking it
          if (it == tcpSessions.end() && (tcp.flag_syn() && !tcp.flag_ack())) {
            tcpSessions[src + dst] = TCPSession(ip);
            it = tcpSessions.find(src + dst);
          }
          
          if (it != tcpSessions.end()) {     // the segment belongs to a tracked session
            unsigned int previous = it->second.getState();   // read first: operand order of != is unspecified
            if (previous != it->second.trackState(ip, false))
              // If the session has just been closed, display total number of bytes 
              // exchanged since we started tracking it, and destroy it
              if (it->second.terminated()) {
                cout << "Total data exchanged between " << src << " and " << dst << " = " 
                     << it->second.getBytes() << " bytes" << endl;
                     
                // Destroy TCP session tracker
                tcpSessions.erase(it);
              }
          }
        }          
      }
      
      break;

TCP session processing occurs when a TCP segment is captured (block of code starting at line #135):

  • A new security application option is added to command line arguments, TCPTRACK, for activating TCP session tracking.

  • An STL dictionary is used to track multiple TCP sessions simultaneously. The key consists of the source host's IP address and TCP port followed by the destination's IP address and TCP port, obtained at line #138.

  • Lines #143 and #144 check if the captured segment is part of a session already being tracked. Since traffic flows in both directions within a TCP session, both variations of session keys must be searched (i.e. src+dst if the segment travels from source host to destination host, or dst+src in the other direction). If the captured segment is not part of a session already being tracked and is a SYN segment (i.e. connection request), the block of code at line #147 creates a new TCPSession instance to track the new connection.

  • Line #154 updates the state machine with the captured datagram. If it provokes a state change and the state machine reaches its end state (conditional statement at line #157), the session's termination handshake is completed so the total amount of data exchanged is displayed (line #158) and the TCPSession instance is destroyed (line #162).
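
As a possible variation on the double lookup described above (this is an alternative design, not what sniff14.cpp does), the two endpoint keys can be ordered before being concatenated so that both directions map to the same dictionary entry. In this sketch sessionKey() and record() are hypothetical helpers, and the '|' separator is an arbitrary choice that keeps the two endpoints apart in the key.

```cpp
#include <map>
#include <string>

// Hypothetical helper: builds the same key whatever the direction of travel
// by always placing the lexicographically smaller endpoint first
std::string sessionKey(const std::string & a, const std::string & b) {
  return (a < b) ? a + '|' + b : b + '|' + a;
}

// Hypothetical helper: adds payload bytes to the per-session counter and
// returns the running total, using a single lookup in the map
unsigned int record(std::map<std::string, unsigned int> & sessions,
                    const std::string & src, const std::string & dst,
                    unsigned int payloadBytes) {
  unsigned int & total = sessions[sessionKey(src, dst)];   // created at 0 if absent
  total += payloadBytes;
  return total;
}
```

The trade-off is that the key no longer tells which host initiated the session, so that information must be kept in the tracked object itself, as TCPSession does with its sourceId and destinationId attributes.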

Here is an example of TCP session tracking. It shows TCP sessions related to Web browsing at address 199.7.48.72:

%root> ./sniff14 -s tcptrack -q
device = eth0
network ip = 172.16.179.0
network mask = 255.255.255.0
Total data exchanged between 199.7.48.72:80 and 172.16.179.137:34820 = 1932 bytes
Total data exchanged between 172.16.179.137:54556 and 72.21.92.20:80 = 429 bytes
Total data exchanged between 172.16.179.137:54557 and 72.21.92.20:80 = 429 bytes
Total data exchanged between 172.16.179.137:54558 and 72.21.92.20:80 = 429 bytes
Total data exchanged between 172.16.179.137:40689 and 72.21.92.82:80 = 8539 bytes
Total data exchanged between 172.16.179.137:51183 and 74.125.225.99:80 = 3747 bytes
Total data exchanged between 172.16.179.137:52613 and 74.125.131.104:80 = 3629 bytes
Total data exchanged between 172.16.179.137:59844 and 91.189.90.41:80 = 738 bytes
Total data exchanged between 172.16.179.137:43058 and 74.125.131.104:443 = 75464 bytes
^C
*** Capture process interrupted by user...
*** 311 datagrams captured
%root>

This application was implemented to illustrate the opening and closing of TCP sessions. However, it could easily be modified to provide more useful features such as detection of SYN flooding, an attack which consists of sending many SYN segments to a server without responding to its SYN+ACK segments, in the hope of saturating the server's resources dedicated to handling TCP connection requests. The required modifications are left to the reader as an exercise.
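
As a starting point for that exercise, one possible approach is to count, per server, the connection requests that have not yet completed their handshake. The sketch below only illustrates that bookkeeping: the SynFloodDetector class, its method names and the threshold are all hypothetical, and a realistic detector would also need to age out old entries with a timer, which is not shown.

```cpp
#include <map>
#include <string>

// Hypothetical detector counting half-open connections per server (IP:Port)
class SynFloodDetector {
  public:
    explicit SynFloodDetector(unsigned int limit) : threshold(limit) {}

    // To be called when a SYN (without ACK) targeting server is captured;
    // returns true if the count of half-open connections exceeds the threshold
    bool synSeen(const std::string & server) {
      return ++pending[server] > threshold;
    }

    // To be called when the client's final ACK completes the handshake
    void handshakeCompleted(const std::string & server) {
      std::map<std::string, unsigned int>::iterator it = pending.find(server);
      if (it != pending.end() && it->second > 0) --it->second;
    }

    // Number of connection requests still awaiting completion for server
    unsigned int halfOpen(const std::string & server) const {
      std::map<std::string, unsigned int>::const_iterator it = pending.find(server);
      return (it == pending.end()) ? 0 : it->second;
    }

  private:
    unsigned int threshold;                         // alert above this count
    std::map<std::string, unsigned int> pending;    // half-open count per server
};
```

In terms of the state machine, synSeen() would correspond to the transition out of state 0 and handshakeCompleted() to the transition into state 3.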


Copyright © 2014 Marco Lavoie