1. Introduction
1.1. About this document
This document is meant to provide all the required documentation for people wishing to understand the Packet Forge library and extend its functionality.
This document was written (and is maintained) by Jean-Francois Brousseau. Updates are made on a regular basis following development. The original version of this document is kept in SDF format and the HTML version is autogenerated.
1.2. Document history
| Revision | Author | Changes |
| 1 | Jean-Francois Brousseau | Original draft |
1.3. Conventions used
This manual uses several typographical conventions in order to clarify context.
| Style | Description |
| italic | Used |
| Fixed | Used to display source code |
Whenever a reference to a program or library interface is made, the program or library name will often be followed by a number in parentheses, such as the following example:
program(1)
The number between parentheses specifies the section in which the manual page for that reference is available. Assuming that you have the appropriate manual pages installed, you can see the documentation for that particular reference by typing
$ man 1 program
which will display the manual page for program in section one. If you are not familiar with man(1), refer to its manual page (in other words, RTFM!).
2. Overview
2.1. What is Packet Forge?
Packet Forge is a library that provides several APIs to facilitate the development of portable network-oriented tools.
Packet Forge can be used in a variety of ways:
- rapid protocol development
- traffic analysis tools
- firewall and IDS regression tests
2.2. Features
2.2.1. Packet capture
Packet Forge can perform packet capture on several types of network interfaces through the host's datalink interface. Currently, the following datalink APIs are supported:
- BSD's Berkeley Packet Filter (bpf) pseudo-device
- Linux packet(7) interface
- STREAMS DataLink Provider Interface (DLPI)
Packet Forge can also read from and write to capture dump files in the following formats:
- pcap format (used by tcpdump, snort, etc.)
- Packet Forge dump format
2.2.2. Packet injection
Packets can be crafted and injected at any layer of the stack. A simple but flexible buffer API makes it easy to modify the structure of packets.
2.2.3. Protocol stack
The built-in dynamic protocol stack provides a generic interface for encapsulating and decapsulating protocols.
2.2.4. Network Interface management
Network interfaces can be discovered and manipulated through a platform-independent API. Packet Forge also supports the creation of virtual interfaces that exist only within the context of the process.
2.2.5. Fully extendable core
The modular approach used in the library makes it easy to extend the provided functionality by hooking in support for new protocols and domains, interface types, or any other facility related to network programming.
2.2.6. Threaded library
The entire library was written with a focus on appropriate locking for multi-threaded applications. All objects created by Packet Forge can be accessed from multiple threads concurrently, and fine-grained locking keeps the resulting performance loss to a minimum. Thread support can be enabled at compile time (see the README file for details).
2.3. Supported Platforms
Because it carefully follows widely accepted standards such as POSIX and the X/Open Portability Guide, Packet Forge requires little or no modification to run on platforms that conform to those standards. Additionally, the library works on both 32-bit and 64-bit architectures.
Packet Forge is currently known to work on the following operating systems:
- OpenBSD
- NetBSD
- FreeBSD
- Linux 2.4 and up
The following platforms should be supported but
2.4. Development Process
The current development process is rather informal due to the limited size of the team (currently only me ;).
The library's minor number is incremented at every release.
3. Development environment
4. Interface conventions
4.1. Opaque structures
Most of the structures used internally by the Packet Forge library do not export their definition in header files. These are known as opaque structures, because external code cannot reference the structure's fields. This is done explicitly so callers do not attempt to use or modify the fields themselves, which could result in subtle bugs (for example, threaded applications must ensure proper locking before accessing any of a structure's internals). Instead, callers should use the proper accessor functions that are part of the API.
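As an illustration of the pattern (not actual Packet Forge code), the sketch below shows what callers and the library each see. The type example_iface and its accessor functions are hypothetical; the point is that the header only declares an incomplete type, so the fields, and the mutex that protects them, can only be reached through the accessors.

```c
#include <pthread.h>
#include <stdlib.h>

/* --- Public header part: an incomplete type plus accessors only. --- */
struct example_iface;                          /* hypothetical opaque type */
struct example_iface *example_iface_new(unsigned int mtu);
unsigned int example_iface_get_mtu(struct example_iface *ifp);
int example_iface_set_mtu(struct example_iface *ifp, unsigned int mtu);
void example_iface_free(struct example_iface *ifp);

/* --- Private implementation part: only the library sees the fields. --- */
struct example_iface {
	pthread_mutex_t lock;                  /* protects the fields below */
	unsigned int    mtu;
};

struct example_iface *
example_iface_new(unsigned int mtu)
{
	struct example_iface *ifp = malloc(sizeof(*ifp));

	if (ifp == NULL)
		return NULL;
	if (pthread_mutex_init(&ifp->lock, NULL) != 0) {
		free(ifp);
		return NULL;
	}
	ifp->mtu = mtu;
	return ifp;
}

unsigned int
example_iface_get_mtu(struct example_iface *ifp)
{
	unsigned int mtu;

	/* The accessor takes care of locking, so callers cannot forget it. */
	pthread_mutex_lock(&ifp->lock);
	mtu = ifp->mtu;
	pthread_mutex_unlock(&ifp->lock);
	return mtu;
}

int
example_iface_set_mtu(struct example_iface *ifp, unsigned int mtu)
{
	if (mtu == 0)
		return -1;                     /* validation lives in one place */
	pthread_mutex_lock(&ifp->lock);
	ifp->mtu = mtu;
	pthread_mutex_unlock(&ifp->lock);
	return 0;
}

void
example_iface_free(struct example_iface *ifp)
{
	if (ifp == NULL)
		return;
	pthread_mutex_destroy(&ifp->lock);
	free(ifp);
}
```

Because callers only ever hold a pointer, the library remains free to add, remove or reorder fields without breaking code compiled against an older header.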
4.2. Return values
In order to keep the code consistent, it is important to follow the same general principles with regard to return values. This section explains the most common conventions. Some functions deviate from these for practical reasons; however, all new code is strongly encouraged to follow these conventions for simplicity.
Functions which return integer values generally return -1 to indicate an error condition and 0 on success, although any positive value should also be considered successful (this allows functions to return additional information, such as counts, line numbers, etc.).
Functions which return pointers to structures are encouraged to make those structures opaque, so that callers must go through the proper accessor functions. A return value of NULL means that an error occurred.
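A minimal sketch of both conventions follows. The functions example_count_byte() and example_dup_label() are hypothetical and exist only to illustrate the return-value rules: the integer function returns -1 on error and a non-negative count otherwise, and the pointer function returns NULL on error.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Integer convention: -1 on error, otherwise a non-negative value that
 * carries extra information (here, the number of matching bytes).
 */
int
example_count_byte(const unsigned char *buf, size_t len, unsigned char val)
{
	int count = 0;
	size_t i;

	/* Reject invalid input, and lengths whose count could overflow int. */
	if (buf == NULL || len > INT_MAX)
		return -1;
	for (i = 0; i < len; i++)
		if (buf[i] == val)
			count++;
	return count;
}

/* Pointer convention: NULL on error, otherwise a valid object. */
char *
example_dup_label(const char *label)
{
	char *copy;
	size_t len;

	if (label == NULL)
		return NULL;
	len = strlen(label) + 1;               /* include the terminating NUL */
	copy = malloc(len);
	if (copy == NULL)
		return NULL;
	memcpy(copy, label, len);
	return copy;
}
```

Note that a caller of the integer function should test for -1 (or a negative value) rather than for "not 0", since a positive return value is a success that carries data.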
4.3. Error reporting
One of the key factors involved in producing good software is the quality of its error reporting. Few things are more frustrating than having to debug a program for hours because it does not give enough details about failures. At the other extreme, software that produces too much information makes it easy to get lost in the sea of data and miss the messages that point to the actual failure. Good error reporting is a careful balancing of those two sides, and often makes the difference between programs that suck and programs that rock!
Error reporting in Packet Forge works much like the errno facility provided by the standard C library. A global variable (or, in multi-threaded applications, a per-thread variable) holds the code of the last error that occurred. Naturally, if a function fails to set the error code when it should, error reporting breaks down: the caller may even see the wrong error code, such as a benign error left over from before the failure.
The normal convention with regard to setting the error code is simple. A function must set the error code if it failed in some way, or if a call to a function that is not part of the Packet Forge library failed (in which case the calling function will likely have to map the callee's error reporting onto Packet Forge error codes). For example, if a function within Packet Forge calls the read(2) system call, it has to check whether the return value is -1 and, if so, set the error code to a value that matches the error reported in errno. Here is a short example of what this might look like:
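#include <errno.h>
#include <unistd.h>

ssize_t ret;

ret = read(fd, buf, sizeof(buf));
if (ret == -1) {
	/* map the system error reported in errno to a Packet Forge code */
	switch (errno) {
	case EINVAL:
		pf_seterr(PF_ERR_INVARG);
		break;
	case EIO:
		pf_seterr(PF_ERR_INERR);
		break;
	default:
		/* more generic error code */
		pf_seterr(PF_ERR_DATAERR);
		break;
	}
}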
When another function that is part of the library fails, that function should take care of setting the error code, so the caller does not have to. The caller may still replace the error code with another one, but only if the new code gives a better description of the error.
Note: Any function which triggers an error that was not generated by another function of the library should set the appropriate error code.
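The sketch below models the mechanism described above with a per-thread variable. The names example_seterr(), example_geterr(), the EXAMPLE_ERR_* codes and the two library-style functions are hypothetical stand-ins (the library's actual setter is pf_seterr()); the sketch assumes a C11 compiler for _Thread_local (older GCC/Clang spell it __thread). It also shows the propagation rule: a caller whose callee already set the code simply returns the failure.

```c
/* Hypothetical error codes standing in for the PF_ERR_* values. */
enum example_err {
	EXAMPLE_ERR_NONE = 0,
	EXAMPLE_ERR_INVARG,
	EXAMPLE_ERR_NOMEM
};

/* One copy per thread, so threads never see each other's errors. */
static _Thread_local enum example_err example_errcode = EXAMPLE_ERR_NONE;

void
example_seterr(enum example_err code)
{
	example_errcode = code;
}

enum example_err
example_geterr(void)
{
	return example_errcode;
}

/* A library function that detects a failure itself and records why. */
int
example_set_ttl(int ttl)
{
	if (ttl < 1 || ttl > 255) {
		example_seterr(EXAMPLE_ERR_INVARG);
		return -1;
	}
	return 0;
}

/* A library caller: the callee already set the code, so just propagate. */
int
example_configure(int ttl)
{
	if (example_set_ttl(ttl) == -1)
		return -1;     /* error code left as set by example_set_ttl() */
	return 0;
}
```

As with errno, a successful call in this sketch does not clear the stored code, so the code is only meaningful right after a function has reported failure.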
5. Standard interfaces
To simplify portability and code dependencies, the Packet Forge library provides many ancillary interfaces that are commonly used in network development.
5.1. Memory management
Packet Forge provides an API that wraps around the standard malloc(3) interface. Although the two interfaces are currently interchangeable, the use of this interface is strongly encouraged. It provides extra debugging functionality and memory management statistics that cannot be tracked easily otherwise.
The following functions are described here:
void* pf_malloc(size_t size)
void* pf_realloc(void *ptr, size_t size)
void pf_free(void *ptr)
pf_malloc() allocates a chunk of memory of size bytes and returns a pointer to the start of the memory area. If insufficient memory is available to satisfy the request, pf_malloc() returns NULL and sets the error code to PF_ERR_NOMEM.
pf_realloc() resizes the memory region referenced by ptr to size bytes and returns a pointer to the start of the reallocated region. The contents of the original buffer are preserved up to the smaller of the original and new sizes. If insufficient memory is available to satisfy the request, pf_realloc() returns NULL and sets the error code to PF_ERR_NOMEM. It is important to note that in case of error, the original pointer remains valid. For this reason, it is an error to store the return value in the same pointer variable that was passed as ptr: on failure, the only reference to the original region would be overwritten with NULL and the memory would leak. Store the result in a temporary pointer instead, and only overwrite the original once the call has succeeded.
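Here is a sketch of the temporary-pointer pattern for growing a buffer. To keep it self-contained, example_realloc() is a stand-in that wraps realloc(3); in library code the call would be pf_realloc(). The helper grow_buffer() is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for pf_realloc() so the sketch compiles on its own. */
static void *
example_realloc(void *ptr, size_t size)
{
	return realloc(ptr, size);
}

/*
 * Grow *bufp from *sizep to newsize bytes. On failure, -1 is returned and
 * *bufp is left untouched: it is still valid and still owned by the caller.
 */
int
grow_buffer(unsigned char **bufp, size_t *sizep, size_t newsize)
{
	unsigned char *tmp;

	/* realloc() with size 0 has implementation-defined behaviour. */
	if (newsize == 0)
		return -1;

	/* Store the result in a temporary, never directly in *bufp. */
	tmp = example_realloc(*bufp, newsize);
	if (tmp == NULL)
		return -1;

	/* Zero the newly added tail so it holds no stale data. */
	if (newsize > *sizep)
		memset(tmp + *sizep, 0, newsize - *sizep);
	*bufp = tmp;
	*sizep = newsize;
	return 0;
}
```

On failure the caller still has the original buffer and can either keep working with it or release it with pf_free().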
pf_free() frees the memory referenced by ptr, which was previously allocated by a call to either pf_malloc() or pf_realloc().
5.2. Random number generation
Random numbers have many uses in network programming, ranging from dynamic port number generation to cryptographic uses.
In the current context, the term random has to be taken with a grain of salt. We are actually talking here about pseudo-random numbers (it is kinda hard to generate real random numbers from a machine that is completely deterministic in nature). Some platforms support hardware random number generators, and in most cases these devices are used transparently to feed into the standard randomness pools, such as /dev/random, /dev/urandom and others.
#include <pforge/pforge.h>
int pf_rand_get(void *dst, size_t size)
u_int8_t pf_rand_get8(void)
u_int16_t pf_rand_get16(void)
u_int32_t pf_rand_get32(void)
int pf_rand_get(void *dst, size_t size) stores up to size bytes of random data in the buffer pointed to by dst. The request may be only partially fulfilled if high randomness is requested while the entropy pool is low. The function returns the number of bytes actually stored in dst on success, or -1 on failure.
u_int8_t pf_rand_get8(void) returns a random byte.
u_int16_t pf_rand_get16(void) returns a random 16-bit word.
u_int32_t pf_rand_get32(void) returns a random 32-bit word.
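As an example of the dynamic port number use case, the sketch below picks a random port in the IANA dynamic/private range (49152 to 65535). The function example_rand_get() is a stand-in that follows the documented pf_rand_get() contract (bytes stored, or -1) by reading /dev/urandom, which it assumes exists; pick_ephemeral_port() is hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in with the same contract as pf_rand_get(). */
static int
example_rand_get(void *dst, size_t size)
{
	FILE *fp;
	size_t n;

	if (size > INT_MAX)
		size = INT_MAX;        /* "up to size bytes" must fit in an int */
	fp = fopen("/dev/urandom", "rb");
	if (fp == NULL)
		return -1;
	n = fread(dst, 1, size, fp);
	fclose(fp);
	return (int)n;
}

/* IANA dynamic/private port range: 49152-65535 (16384 ports). */
#define PORT_DYN_FIRST	49152u
#define PORT_DYN_COUNT	16384u

int
pick_ephemeral_port(uint16_t *portp)
{
	uint16_t r;

	/* Treat a short read like a failure: we need all 16 bits. */
	if (example_rand_get(&r, sizeof(r)) != (int)sizeof(r))
		return -1;
	/* 65536 is a multiple of 16384, so the modulo introduces no bias. */
	*portp = (uint16_t)(PORT_DYN_FIRST + (r % PORT_DYN_COUNT));
	return 0;
}
```

The size of the range matters here: reducing a 16-bit value modulo a range that does not divide 65536 evenly would make some ports slightly more likely than others.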
5.3. Message logging
Packet Forge provides a message logging system similar to the syslog(3) interface. Messages logged through the system must be given a level which determines the urgency of the message. This allows users of the library to filter messages based on the level they want to see.
#include <pforge/log.h>
int pf_log(u_int level, const char *fmt, ...)
int pf_vlog(u_int level, const char *fmt, va_list vap)
pf_log() logs a message with the importance given by level to the log system. The message is built from the format string fmt and a variable number of arguments (the ellipsis), which are substituted for the conversion specifications found in fmt. The supported formats are the same as for the printf(3) family of functions.
pf_vlog() is the equivalent of pf_log(), except that it accepts an argument of type va_list instead of an ellipsis. This is useful for implementing other functions which have a variable number of arguments that are passed to the logging system. This call is very rarely used.
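The sketch below shows the typical use of the va_list variant: a module-specific variadic helper that fixes the level and forwards its arguments. Since the real level constants live in pforge/log.h and are not listed here, EXAMPLE_LOG_WARN is a hypothetical value; example_vlog() is a stand-in for pf_vlog() that formats into a buffer instead of a log target, and pf_skel_warn() is a hypothetical module helper.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical level value; the real constants come from pforge/log.h. */
#define EXAMPLE_LOG_WARN	4

/* Stand-in for pf_vlog(): records the formatted message and its level. */
static char example_lastmsg[256];
static unsigned int example_lastlevel;

int
example_vlog(unsigned int level, const char *fmt, va_list vap)
{
	/* vsnprintf() truncates safely if the message is too long. */
	if (vsnprintf(example_lastmsg, sizeof(example_lastmsg), fmt, vap) < 0)
		return -1;
	example_lastlevel = level;
	return 0;
}

/*
 * A module's own variadic helper: it cannot pass "..." along directly,
 * so it packs its arguments into a va_list for the va_list variant.
 */
int
pf_skel_warn(const char *fmt, ...)
{
	va_list vap;
	int ret;

	va_start(vap, fmt);
	ret = example_vlog(EXAMPLE_LOG_WARN, fmt, vap);
	va_end(vap);
	return ret;
}
```

C offers no portable way to forward an ellipsis to another variadic function, which is exactly why a va_list entry point such as pf_vlog() is needed for wrappers like this one.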
5.4. Timer management
Please document me!
6. Modules
The module subsystem is one of the most important parts of the library. It provides the mechanisms to manage modules (sometimes referred to as plugins) which hook extra functionality to the library, such as support for new protocols or interface types.
6.1. What is a module?
A Packet Forge module is simply a DSO (Dynamically Shared Object) file with certain definitions and functions that allow it to hook its functionality into the library. Modules can perform a variety of tasks: they can implement protocols or protocol families, datalink drivers, or interface types. Some modules also perform completely unrelated tasks (such as the p0f module, whose sole purpose is to attempt to guess the operating system of the host that sent each IP packet).
6.2. Building a module
Figuring out the appropriate compiler and linker settings to generate module object files can be somewhat of a pain. Fortunately, the source tree provides a set of make(1) include files which greatly simplify the task of compiling modules.
6.2.1. A basic Makefile
6.3. The parts of a module
6.3.1. Information
A module can export several pieces of information that are not necessarily used by the library itself, but help in module management. This information can be specified using the macros provided by the pforge/module.h C header, which are described in more detail below.
6.3.2. Initialization
Before the module can be used, it must perform initialization.
6.3.3. Cleanup
Cleanup is the last step in the lifetime of a module before it gets removed from the library.
6.3.4. Reference counting
Every module that gets loaded has a reference count associated with it, which indicates whether the module is still in use by other parts of the library or program. A module is automatically unloaded when its reference count drops to 0. This technique can only work if
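A minimal sketch of mutex-protected reference counting is shown below. The structure example_mod and its functions are hypothetical (the library's real module record is not described here), unloading is reduced to clearing a flag, and the sketch assumes every user that takes a reference releases it exactly once.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical module record. */
struct example_mod {
	pthread_mutex_t lock;          /* protects refcnt and loaded */
	unsigned int    refcnt;
	bool            loaded;
};

/* Stand-in for unloading; a real loader would run cleanup and unmap. */
static void
example_mod_unload(struct example_mod *mod)
{
	mod->loaded = false;
}

void
example_mod_ref(struct example_mod *mod)
{
	pthread_mutex_lock(&mod->lock);
	mod->refcnt++;
	pthread_mutex_unlock(&mod->lock);
}

int
example_mod_unref(struct example_mod *mod)
{
	int ret = 0;

	pthread_mutex_lock(&mod->lock);
	if (mod->refcnt == 0)
		ret = -1;                      /* unbalanced release: caller bug */
	else if (--mod->refcnt == 0)
		example_mod_unload(mod);       /* last user gone: unload now */
	pthread_mutex_unlock(&mod->lock);
	return ret;
}
```

Unloading while the lock is still held closes the window in which another thread could take a new reference between the count reaching 0 and the module being torn down.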
6.4. Pitfalls
The following rules must be observed in order to avoid bugs with modules:
6.4.1. Logging
Message logging MUST go through the Packet Forge log system. Standard I/O functions such as printf() and derivatives cannot be used, because the module has no knowledge of the context in which it is operating (i.e. descriptors normally bound to standard input and output could be something else entirely). Moreover, logging messages through the log system allows the user to filter messages and specify the appropriate targets for logging.
6.4.2. Reentrancy and thread safety
Every function in a module must be completely reentrant and SHOULD be thread-safe.
Thread safety can be achieved through the use of POSIX mutexes. Although locking granularity is up to the implementer, care must be taken to avoid excessive contention. The simplest form of locking, known as a big lock, uses a single mutex that must be held for any operation on the module. Although this ensures that no data can be modified simultaneously by two threads, it also prevents two separate functions (or even two instances of the same function) from running concurrently. This technique effectively serializes all access and performs very poorly in multi-threaded applications.
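As a contrast to the big lock, the sketch below splits a hypothetical module's state into two independent parts, each protected by its own mutex, so updating statistics never waits on session bookkeeping and vice versa. All names (pf_skel_stats, pf_skel_account() and so on) are illustrative only.

```c
#include <pthread.h>

/* Part 1: traffic statistics, with their own lock. */
static struct {
	pthread_mutex_t lock;
	unsigned long   rx_packets;
	unsigned long   rx_bytes;
} pf_skel_stats = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

/* Part 2: session bookkeeping, with a separate lock. */
static struct {
	pthread_mutex_t lock;
	unsigned int    nsessions;
} pf_skel_sessions = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Touches only the statistics, so it never waits on session updates. */
void
pf_skel_account(unsigned long len)
{
	pthread_mutex_lock(&pf_skel_stats.lock);
	pf_skel_stats.rx_packets++;
	pf_skel_stats.rx_bytes += len;
	pthread_mutex_unlock(&pf_skel_stats.lock);
}

/* Touches only the session table, so it never waits on accounting. */
unsigned int
pf_skel_session_open(void)
{
	unsigned int n;

	pthread_mutex_lock(&pf_skel_sessions.lock);
	n = ++pf_skel_sessions.nsessions;
	pthread_mutex_unlock(&pf_skel_sessions.lock);
	return n;
}
```

If a function ever needs both locks, every code path should acquire them in the same order; otherwise two threads can each hold one lock while waiting for the other and deadlock.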
6.4.3. Namespace pollution
A module is not a self-contained program: when used, it is integrated into the address space of the program using Packet Forge by the runtime linker (see ld.so(1) for more details on runtime linking). Any symbols exported by the module, be they variables or functions, are added to the program's global symbol table as a result of this operation. This can produce what is known as a namespace clash, because the global namespace cannot have more than one symbol mapped to a particular name; the behaviour of exporting symbols this way is known as namespace pollution.
Avoiding these problems is a matter of using variable and function names that have little risk of clashing with symbols from other modules. In this respect, the longer the name, the better. The simplest approach is to use a common prefix for all the symbols of a specific module. For example, the IP protocol module uses the prefix pf_ip_ for all symbols. Another good practice is to avoid exporting symbols unnecessarily by declaring global variables and functions static. These symbols are kept private to the module, and are often discarded after linking has been performed, unless they are kept for debugging purposes.
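The sketch below applies both rules to a hypothetical module using the prefix pf_skel_: internal state and helpers are static, and the one exported entry point carries the prefix. The function pf_skel_input() and its IPv4 check are illustrative, not part of the real skeleton module.

```c
#include <stddef.h>
#include <stdint.h>

/* Private state: static keeps it out of the global symbol table. */
static uint32_t npackets;

/* Private helper: a short name is safe because it is not exported. */
static int
is_ipv4(const uint8_t *pkt, size_t len)
{
	/* The IP version lives in the high nibble of the first byte. */
	return len > 0 && (pkt[0] >> 4) == 4;
}

/* Exported entry point: carries the module prefix to avoid clashes. */
int
pf_skel_input(const uint8_t *pkt, size_t len)
{
	if (pkt == NULL || !is_ipv4(pkt, len))
		return -1;
	npackets++;
	return 0;
}
```

Running nm(1) on the compiled object should list only pf_skel_input as a global symbol; the static ones appear as local symbols, if at all.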
6.4.4. Cleanup
Although modules are not required to have a cleanup routine, they are strongly encouraged to implement one.
6.5. A module example
The Packet Forge source tree contains the source for a skeleton module (that is, the barebones code to generate a module that does nothing), which should be used as the base template for new modules. The source code can be found in the mod/skel directory of the distribution. This section explains in more detail what each part of the code does.
7. Glossary
Computer-related terminology can sometimes become quite confusing when one or more terms are used interchangeably. This glossary gives basic definitions of some of these terms.
Datagram A packet which contains all necessary information to be routed appropriately without prior connection setup.
Decapsulation The process of removing the header data (and sometimes trailer data) that a lower layer added around the datagram of the upper layer. Decapsulation is the opposite of encapsulation.
Encapsulation A technique used to layer protocols by prepending header data (and sometimes appending trailer data) to the datagram of the upper layer. Encapsulation is the opposite of decapsulation.
Packet Wow! You're in big trouble if you don't know this one! A sequence of data that consists of protocol header information, possibly followed by payload and/or trailer information.