Remoting Your Toaster Using Kernel-Mode TCP/IP
March 15, 2003
Thomas F. Divine

If someone asked you to write a user-mode application that would run in a client/server mode across a network connection, you might think immediately of an architecture that used the Winsock sockets-based user-mode network programming API. You'd have a server application doing a listen on a particular port, and you'd have a separate client application that would connect to that port on the server computer and then issue socket read and write calls.

But what if your problem is to implement both your network client and your network server in kernel-mode? Microsoft hasn't provided a kernel-mode implementation of Berkeley sockets, so your job isn't quite the same as it is in user mode. That's where this article comes in. In it, I'll provide a general architecture for a remote WDM device driver.

To avoid bogging down in the specifics of some real device, I decided to take the hoary old TOASTER sample from the DDK and show how to create a Toaster Server and a Toaster Client that work together using a TCP/IP connection. Along the way, I'll provide a brief introduction to TCP/IP programming in the kernel and to the Transport Driver Interface (TDI), the kernel-mode interface through which TCP/IP and other transports are reached.

Since this is an architecture paper, I'm not going to walk through a complete driver implementation. But, since most people are put off by the apparent complexity of TDI, I'll give you an introduction to how to use it to do the same sorts of things you do in user mode with socket programming.

What you need to know

Before jumping into TDI development, you need to have a firm understanding of Windows device driver design, development and debugging techniques. Give special attention to these topics:

Handling of driver-allocated I/O request packets (IRPs).
Queuing of IRPs and asynchronous I/O.
Use of the IoCallDriver facility used by one driver to call another driver.
Memory management techniques, including handling of chained memory descriptor lists (MDLs).
Influence of interrupt request level (IRQL) on driver operations.
Synchronization techniques.
Structure alignment/packing and data marshalling techniques.

Finally, you should have an understanding of how TCP/IP works BEFORE you begin TDI development. In particular, you should know that TCP is a stream-based protocol, not a message-based protocol. Understanding this distinction is important in designing the interface between a TCP client and its server.

Introduction to the Transport Driver Interface (TDI)

TDI Is Really a Simple Interface

Despite the fact that the TDI Design Guide is 42 pages in length and the TDI Reference Guide is 180 pages in length, the basic design of the TDI interface is simple. [Editor's Note: Easy for Tom to say!]

TDI is simply an IOCTL-based interface between two drivers: a TDI Provider that implements the mechanics of a network protocol such as TCP and a TDI Client that uses the services of the provider.

Here is an abbreviated outline of the TDI interface:

The TDI provider creates a named device object and implements a set of Dispatch routines that are defined by TDI.
The TDI client interacts with the provider by opening the provider’s device object using ZwCreateFile.
The TDI client then submits I/O requests (IRPs) via IoCallDriver to initiate network operations.
The TDI provider indicates completion of operations and occurrence of network events by completing IRPs, which triggers calls to completion routines in the client.

The inter-driver IOCTL interface is almost entirely asynchronous, which is ideal for a network interface in user mode or kernel mode.

Why TDI Seems Complex…

TDI seems complex because it is a flexible, generalized and extensible interface designed to provide a single common interface for a wide variety of different network protocols. For example:

TDI has built-in definitions that support at least 24 network protocols, including: IP, IPv6, AppleTalk, NETBIOS, IPX and DECNet.
- We are only interested in the IP protocol…
TDI supports both message mode and stream mode data transfer operations.
- TCP is stream mode only…
TDI is extensible using the TDI_ACTION facility.
- Not used with TCP…
TDI supports Plug-and-Play event notifications.
- Of limited interest for TDI clients…

The simplicity of TDI will only become apparent when you focus on the specific task of writing a TDI client for a single specific protocol.

Transport Driver Interface (TDI) Documentation

Documentation for TDI is found in the Windows Driver Development Kit (DDK) Help file under the top-level heading Network Devices and Protocols.

When reviewing the TDI documentation, remember to focus on the features that apply to the specific protocol that you intend to use. That focus will help you avoid being confused by the great generality of the interfaces.

In addition, realize that some of the information (especially in the Design Guide) relates to the task of writing a TDI provider. Although this information is useful, it would be exceptionally rare for anyone to actually be interested in writing a TDI provider these days.

The TDI documentation topics that you should look at first include:

Design Guide\TDI Drivers
- TDI Kernel-Mode Client Interactions
- TDI Operations
- TDI File Objects
Reference Guide\TDI Drivers
- TDI Structures
  - TRANSPORT_ADDRESS
  - TA_ADDRESS
  - TDI_ADDRESS_IP
  - TA_ADDRESS_IP

Introducing the DDK "Toaster" WDM Driver Sample

This article takes the well-known Windows DDK “Toaster” WDM driver (Thanks, Eliyas!) and develops an architecture for a TCP-networked implementation consisting of a local Toaster Client and a remote Toaster Server.

The Baseline Toaster Driver

Although the Windows DDK Toaster sample actually consists of several related sample drivers, I'm looking here just at the toaster.sys function driver.

The essential functionality of toaster.sys to be remoted is handling of:

ToasterDispatchIoctl
ToasterReadWrite

Simplifying Modifications to the Baseline Toaster Driver

We'll need to make a few modifications to the Toaster driver before beginning to develop the network architecture. Some of these modifications are necessary prerequisites for eventual network operation. Others are made (at least conceptually) to make the sample network design practical and easy to understand.

Exclusive Access: If the device is open, the driver fails any subsequent IRP_MJ_CREATE requests for the device until the device has been closed.

The benefit to the network implementation is that each client and server will have at most one TCP connection open at any given time.

Serialized I/O: The driver processes one and only one IRP at a time in FIFO order.

The benefit to the network implementation is that there will be at most one network transaction in progress at any given time.

Direct I/O: The I/O manager probes the pages of the user buffer, makes them resident, locks the physical pages in memory, and describes them with an MDL.

The benefit to the network implementation is that there is no requirement for the driver to call MmProbeAndLockPages on user data.

Asynchronous I/O: The device API is adapted to support asynchronous I/O.

Certainly remote I/O operations must be performed asynchronously. Applications that call the driver must also be modified for asynchronous operation.

Separate Read/Write Dispatch Routines: The original sample employs a single ToasterReadWrite dispatch routine. Read and Write need to be handled separately on the network.
Windows 2000 and Higher Only: Discussing the VxD TDI interface provided on Windows 9X/ME is beyond the scope of this article. [Editor's Note: The 9x/Me implementation of TDI uses the same concepts. Instead of an IRP-based interface, however, it relies on a set of direct function calls. Refer to the topic "TCP/IP Vxd Interface" in MSDN. TOASTER is not designed to run on 98/Me anyway.]

Toaster Server Design Goals

Here are the design goals that are used to direct the remote toaster architecture:

Preserve Existing Toaster Functionality – Test applications, such as test.exe, should run on the local Toaster client with minimal modification.
Support Non-Trivial I/O - As written, the sample driver performs only trivial work in the Ioctl, Read and Write dispatch routines. The architecture for the remote toaster will support more meaningful operations.
Multiple Toaster Devices – The architecture will preserve support for having multiple toaster devices.

Toaster Server IP Addressing and Discovery

The first step in developing the networked Toaster architecture is to decide how IP addresses and ports are assigned to Toaster Servers and how Toaster Clients discover Toaster Servers. We’ll keep this as simple as possible...

Server IP Port Numbers

I adopted a trivial approach for IP port assignments for this design. I'll assign each Toaster Server device an IP port number based on the device serial number:

Toaster Serial Number	Toaster IP Port
1	5001
2	5002
…	…
N	5000 + N

Toaster clients and servers can determine their device serial number, and hence the IP port associated with their device, by calling IoGetDeviceProperty for DevicePropertyUINumber.
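
The serial-number-to-port rule in the table can be sketched as a small helper. This is a user-mode illustration only: the names TOASTER_PORT_BASE and toaster_port_for_serial are hypothetical, and the range check is an added assumption (TCP ports are 16 bits wide). In a driver the serial number would come from IoGetDeviceProperty as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: base of the port range used in this design. */
#define TOASTER_PORT_BASE 5000u

/*
 * Map Toaster serial number N to its TCP port (5000 + N).
 * Returns 0 if the serial number is 0 (the table starts at 1) or if
 * the sum would not fit in a 16-bit port number.
 */
static uint16_t toaster_port_for_serial(uint32_t serial)
{
    if (serial == 0 || serial > UINT16_MAX - TOASTER_PORT_BASE) {
        return 0;
    }
    return (uint16_t)(TOASTER_PORT_BASE + serial);
}
```

Because client and server both derive the port from the same serial number, no port needs to be configured separately on either side.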

We also make the assumption that a Toaster Client with a particular serial number can only connect to a Toaster Server with the same serial number.

Each Toaster device is its own independent network server and has no awareness of other Toaster devices.

If you were designing a server "farm" to provide a pool of toasters for use by anybody in the enterprise, you'd want a different architecture. From a network perspective the Toaster Service would listen on a single IP port, such as port 5000. Toaster Clients would attempt to connect on this single service listening port. When the server received a connection request it would attempt to find an available Toaster device. If a Toaster device was available the server would set up a TCP connection between the client and the assigned Toaster device; otherwise, the connection would be refused.

Server IP Address Discovery

The user on a host running the Toaster Client must have prior knowledge of the remote Toaster Server host name or its IP address. In particular, there's no kernel equivalent to the user-mode function gethostbyname that would do a DNS lookup. The user must employ a configuration application or device coinstaller Property page (beyond the scope of this article) to save the remote Toaster Server’s IP address in a device-specific registry location.

The Toaster Client driver can fetch the IP address of the remote Toaster Server by reading from the device-specific registry location from its AddDevice routine.

Toaster Remote I/O (RIO) Protocol

The next step in the architecture of the Toaster Server is to define the data definitions and rules that we impose on the TCP stream between the client and the server.

Together these data definitions and rules constitute our Remote I/O (RIO) protocol.

Top-Down Design of the Remote I/O (RIO) Protocol

Here's my first pass at a design for a remote I/O (RIO) protocol:

Send stuff to server.
Get stuff back.

Even from so terse a statement of the protocol, we can see that the RIO has these characteristics:

Asymmetric Protocol - The client always initiates network operations.
Transaction-Oriented Protocol - Network operations always consist of a request initiated by the client followed by a matching response from the server.
Reliable Protocol - We really can't tolerate loss of information. If information is lost the Toaster will almost certainly misbehave (burned toast?) and an IRP on the client end will pend forever.

The last characteristic determines whether we should use a connection-oriented protocol such as TCP, or whether a connectionless protocol like UDP would serve as well.

Recall from your user-mode experience with socket programming that a connectionless protocol requires you to handle error correction and to cope with missing replies. Furthermore, an implication of a connectionless protocol is that a server is essentially stateless with respect to a series of requests from one or more clients. These factors make a UDP-type protocol inappropriate here: we need there to be a concept that a client reserves the use of a toaster by opening a handle of some kind, performs a series of operations that put the toaster into a handle-specific set of states, and then releases the toaster by closing a handle. To put it another way, a UDP-based toaster might allow Fred to override Barney's request for light-brown toast, in defiance of common sense.

Thus, we will want a connection-oriented protocol (TCP) for our remote toaster.

The "stuff" that is sent in a RIO request is derived from an IRP passed to the client device Ioctl, Read or Write dispatch routine, and the server's RIO response contains the information necessary for the client to complete the IRP to the user.

We can define the contents of a RIO request and RIO response in terms of the IRP being processed by the client:

RIO Request Information
- Request MajorFunction Code - Distinguishes between Ioctl, Read, and Write requests.
- Request IoControlCode - Supplemental information for handling Ioctl operations.
- Request Data - Zero or more bytes of user data, depending on the operation
RIO Response Information
- Response Status - Needed to complete the IRP on the client.
- Response Data - Zero or more bytes of data to be returned to the user.

The RIO request information and RIO response information will be transferred as TCP data on the connection between the Toaster Client and Toaster Server, as illustrated below:

RIO Request TCP Data: RIO Request Header, followed by Request Data (0 or more bytes)

RIO Response TCP Data: RIO Response Header, followed by Response Data (0 or more bytes)

Note: Understand that this is not exactly how the data will appear on the network. Data on the network may be fragmented into multiple packets by TCP as it is sent across the network. The receiving TCP implementation will reassemble the TCP data for you; however, the received data may be presented to your receive handler incrementally.

The RIO headers are simple data structures containing the request and response information that must be sent for each transaction:

RIO Request Header
- MajorFunction Code
- IoControlCode
- Request Data Length
RIO Response Header
- Response Status
- Response Data Length
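
One way to pin down the two headers is as fixed-width fields that are marshalled byte by byte, so that structure packing and host byte order cannot differ between client and server. The following is a minimal user-mode sketch under stated assumptions: the article does not give field widths, so 32 bits per field and big-endian wire order are assumptions, and every name here (rio_request_header, rio_pack_request, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed wire sizes: three 32-bit fields and two 32-bit fields. */
#define RIO_REQUEST_HEADER_SIZE  12u
#define RIO_RESPONSE_HEADER_SIZE  8u

typedef struct {
    uint32_t major_function;      /* Ioctl, Read, or Write              */
    uint32_t io_control_code;     /* meaningful only for Ioctl requests */
    uint32_t request_data_length; /* bytes of user data that follow     */
} rio_request_header;

typedef struct {
    uint32_t status;               /* completion status, carried as 32 bits */
    uint32_t response_data_length; /* bytes of user data that follow        */
} rio_response_header;

/* Store a 32-bit value in big-endian (network) byte order. */
static void put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Load a 32-bit value stored in big-endian (network) byte order. */
static uint32_t get_be32(const uint8_t *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Marshal a request header into RIO_REQUEST_HEADER_SIZE bytes. */
static void rio_pack_request(const rio_request_header *h, uint8_t *out)
{
    assert(h != NULL && out != NULL);
    put_be32(out + 0, h->major_function);
    put_be32(out + 4, h->io_control_code);
    put_be32(out + 8, h->request_data_length);
}

/* Unmarshal a request header from RIO_REQUEST_HEADER_SIZE bytes. */
static void rio_unpack_request(const uint8_t *in, rio_request_header *h)
{
    assert(h != NULL && in != NULL);
    h->major_function      = get_be32(in + 0);
    h->io_control_code     = get_be32(in + 4);
    h->request_data_length = get_be32(in + 8);
}

/* Marshal a response header into RIO_RESPONSE_HEADER_SIZE bytes. */
static void rio_pack_response(const rio_response_header *h, uint8_t *out)
{
    assert(h != NULL && out != NULL);
    put_be32(out + 0, h->status);
    put_be32(out + 4, h->response_data_length);
}

/* Unmarshal a response header from RIO_RESPONSE_HEADER_SIZE bytes. */
static void rio_unpack_response(const uint8_t *in, rio_response_header *h)
{
    assert(h != NULL && in != NULL);
    h->status               = get_be32(in + 0);
    h->response_data_length = get_be32(in + 4);
}
```

A fixed header size matters for the receive side: the receiver can always ask for exactly that many bytes first and learn from the length field how much data follows.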

A more complete RIO protocol could include additional header fields, such as:

Transaction Identifier (TID) - A client-generated number that identifies the transaction. Echoed by the server in the RIO response. In the simple RIO protocol this could be used for sanity checking. Presence of a TID also opens the possibility for a future design that would support multiple concurrent transactions.
Command Code - Initially this field would simply distinguish between a RIO request and a RIO response. The presence of a CC field would open the possibility for a future design that would support server-initiated notification indications to the client.
Fields of Convenience - Additional fields provided to simplify the details of the implementation.

Remote Toaster Implementation Notes

Here are a few ideas and suggestions for a simple implementation of networked Toaster Client and Toaster Server drivers.

Allocating and Deallocating Resources

Where to Save Allocated Resources

Because of the simplifying modifications made to the network Toaster there will be only one TCP connection for each Toaster device. This means that resources for network operation can simply be saved by adding fields to the Toaster device object's DeviceExtension (fdoData). For example, memory for the RIO request and RIO response headers can simply be structures embedded in the DeviceExtension. Pointers associated with allocated items such as I/O request packets (IRPs) can also be saved in the DeviceExtension.

When to Allocate and Deallocate Resources

The driver must be at IRQL == PASSIVE_LEVEL to allocate some of the resources needed to support the TCP connection. For example, the need to call ZwCreateFile to create the TDI transport address and connection endpoint imposes this restriction.

To simplify this sample it is appropriate to allocate all needed resources in the AddDevice routine of the drivers and to release them at the point where the device is removed.

Understanding TRANSPORT_ADDRESS and Related Data Structures

Since TDI is a generalized interface specification it offers a myriad of different structures to represent a variety of network address types. We need to understand just a few of them:

TDI Generalized Network Address Structures

TA_ADDRESS - A generalized structure (blob) that can represent any single network address of any TDI address type.

TRANSPORT_ADDRESS - A structure that can contain a list of one or more TA_ADDRESS structures of a mixture of TDI address types.

TDI IP-Specific Network Address Structures

TDI_ADDRESS_IP - Kernel equivalent of the Winsock sockaddr_in IP addressing structure. It describes a single IP address, including the four-byte IPv4 Internet Protocol address and port number. Its TDI address type is TDI_ADDRESS_TYPE_IP.

TA_ADDRESS_IP - A structure that can contain one or more TDI_ADDRESS_IP structures.

When you distill these definitions in the context of TCP/IP they boil down to this:

You will use the TA_ADDRESS_IP structure to manage IP addresses in your driver.

You will need to typecast between TA_ADDRESS_IP and TRANSPORT_ADDRESS when working with some TDI functions.
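
To make the nesting concrete, here is a user-mode sketch of filling in a single IP address and port. The sketch_ types are stand-ins modeled on the TDI definitions, not the real ones (a driver would include tdi.h); the byte packing and the helper name sketch_fill_ip_address are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for TDI_ADDRESS_TYPE_IP. */
#define SKETCH_TDI_ADDRESS_TYPE_IP 2u

#pragma pack(push, 1)
/* Stand-in modeled on TDI_ADDRESS_IP: one IPv4 address and port. */
typedef struct {
    uint16_t sin_port;    /* TCP port, network byte order     */
    uint32_t in_addr;     /* IPv4 address, network byte order */
    uint8_t  sin_zero[8]; /* reserved, set to zero            */
} sketch_tdi_address_ip;

/* Stand-in modeled on TA_ADDRESS_IP: a count plus typed address entries. */
typedef struct {
    int32_t TAAddressCount;             /* number of entries that follow */
    struct {
        uint16_t AddressLength;         /* size of the Address payload   */
        uint16_t AddressType;           /* TDI address type of payload   */
        sketch_tdi_address_ip Address[1];
    } Address[1];
} sketch_ta_address_ip;
#pragma pack(pop)

/* Fill in one IPv4 address (four bytes, a.b.c.d order) and a port. */
static void sketch_fill_ip_address(sketch_ta_address_ip *ta,
                                   const uint8_t ipv4[4], uint16_t port)
{
    uint8_t *ip;

    assert(ta != NULL && ipv4 != NULL);
    memset(ta, 0, sizeof(*ta));
    ta->TAAddressCount = 1;
    ta->Address[0].AddressLength = (uint16_t)sizeof(sketch_tdi_address_ip);
    ta->Address[0].AddressType = SKETCH_TDI_ADDRESS_TYPE_IP;

    /* Write port and address byte by byte so the result is in network
       byte order regardless of the host's endianness. */
    ip = (uint8_t *)&ta->Address[0].Address[0];
    ip[offsetof(sketch_tdi_address_ip, sin_port) + 0] = (uint8_t)(port >> 8);
    ip[offsetof(sketch_tdi_address_ip, sin_port) + 1] = (uint8_t)port;
    memcpy(ip + offsetof(sketch_tdi_address_ip, in_addr), ipv4, 4);
}
```

The outer count-plus-entries shape is what makes the cast to TRANSPORT_ADDRESS possible: the IP-specific structure is the generalized one with the entry payload spelled out.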

TDI Device Objects

TDI providers advertise their services by creating named device objects for each protocol that they support. The Microsoft Tcpip provider supports these protocols of most common interest:

TCP Protocol        - \Device\Tcp

UDP Protocol        - \Device\Udp

Raw IP Protocol    - \Device\RawIp

The Toaster TDI client interfaces with the TCP protocol via the \Device\Tcp device object.

TDI File Objects

The TDI design for connection-oriented protocols like TCP uses two different types of file objects to manage each connection:

Transport Address File Object - Used to manage operations on a local IP address and port specified by the TDI client.

Connection Endpoint File Object - Used to uniquely identify and manage one endpoint-to-endpoint connection once it is established.

In highly over-simplified terms:

You must create a TDI transport address file object for each unique IP address and port that you need on the local host.

You must create a TDI connection endpoint file object for each connection that you have open at any point in time.

When you initially create a TDI connection endpoint, it is an orphan structure of no use at all. The connection endpoint must be "associated" with a specific transport address. The process of "associating" a connection endpoint says that "this connection endpoint is to be used with this local transport address."

Of course, servers (well, ones that are more complex than our Toaster Server...) may very well want to use one local IP address and port (one TDI transport address...) to handle multiple connections (multiple TDI connection endpoints). In this case multiple connection endpoints would be associated with the one transport address that the server is listening on.

However, the simple Toaster Client and Toaster Server devices require only one transport address and one connection endpoint for their operation, because I made the simplifying assumption that a given client would work with exactly one server identified by a specific port number.

TCP Transport Address File Objects

The process of opening a TCP transport address is described in the DDK documentation Opening a Transport Address.

This is fairly straightforward except for use of the extended attributes (EA) buffer to pass in the IP address specification.

Extended attributes are widely used in file system drivers (a world of its own) but are seldom used in "ordinary" NT/WDM drivers. The extended attributes mechanism provides a way to pass driver-specific supplemental information in the ZwCreateFile call. In the case of opening a TCP transport address, the supplemental information provided in the EA is the local host IP address specification.

The EA buffer is a structure of the type FILE_FULL_EA_INFORMATION. Building the EA buffer for a transport address should be straightforward once you realize that:

The EaName is simply the string "TransportAddress" (defined as TdiTransportAddress in TDI.H)

The EaValue is the TRANSPORT_ADDRESS representation of your desired local host IP address and port.

For the Toaster Client we would specify a local address of 0.0.0.0:0 (any local host IP address and any available port).

For the Toaster Server we would specify a local IP address of 0.0.0.0 and a port selected as described earlier.
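
The layout of such an EA buffer can be sketched in user mode as follows. The structure sketch_ea_info is a stand-in modeled on FILE_FULL_EA_INFORMATION (a driver would use the real definition), and sketch_ea_size and sketch_build_ea are hypothetical helper names. The point being illustrated is the ordering: fixed fields, then the name, then a NUL terminator, then the value bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in modeled on FILE_FULL_EA_INFORMATION. */
typedef struct {
    uint32_t NextEntryOffset; /* 0 when this is the only entry        */
    uint8_t  Flags;
    uint8_t  EaNameLength;    /* length of the name, excluding NUL    */
    uint16_t EaValueLength;   /* length of the value bytes            */
    char     EaName[1];       /* name, NUL, then the value bytes      */
} sketch_ea_info;

/* Bytes needed for one entry: fixed fields + name + NUL + value. */
static size_t sketch_ea_size(size_t name_len, size_t value_len)
{
    return offsetof(sketch_ea_info, EaName) + name_len + 1 + value_len;
}

/*
 * Build a single EA entry in a caller-supplied, suitably aligned buffer.
 * Returns the number of bytes used, or 0 if the buffer is too small or
 * a length does not fit its field.
 */
static size_t sketch_build_ea(void *buf, size_t buf_len, const char *name,
                              const void *value, size_t value_len)
{
    size_t name_len, need;
    sketch_ea_info *ea = (sketch_ea_info *)buf;
    char *tail;

    assert(buf != NULL && name != NULL && value != NULL);
    name_len = strlen(name);
    need = sketch_ea_size(name_len, value_len);
    if (name_len > UINT8_MAX || value_len > UINT16_MAX || need > buf_len) {
        return 0;
    }

    memset(buf, 0, need);
    ea->EaNameLength = (uint8_t)name_len;
    ea->EaValueLength = (uint16_t)value_len;

    /* The name (with its NUL) starts at EaName; the value follows it. */
    tail = (char *)buf + offsetof(sketch_ea_info, EaName);
    memcpy(tail, name, name_len + 1);
    memcpy(tail + name_len + 1, value, value_len);
    return need;
}
```

In a driver, the buffer and the returned length would be what is handed to ZwCreateFile as its EA buffer and EA length, with the value bytes being the address structure for 0.0.0.0 and the chosen port.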

TCP Connection Endpoint File Objects

The process of opening a TCP connection endpoint is described in the DDK documentation Opening a Connection Endpoint.

The stumbling point for some developers here is deciding what the "context" value should be.

Context is a value that YOU invent. Whatever value you provide as context when creating a connection endpoint will be passed back to you in certain callbacks (e.g., connection-related events) to help YOU process the connection.

Since the Toaster devices have at most one connection per device we can use a pointer to the DeviceExtension (fdoData) as the context.

Building the EA buffer for a connection endpoint should be straightforward once you realize that:

The EaName is simply the string "ConnectionContext" (defined as TdiConnectionContext in TDI.H)

The EaValue is a pointer to the DeviceExtension (for Toaster).
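
A detail that is easy to get wrong is that the EA value holds the pointer itself, not a copy of what it points to. The following user-mode sketch shows the round trip; sketch_device_extension is a hypothetical stand-in for the Toaster DeviceExtension, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the Toaster DeviceExtension (fdoData). */
typedef struct {
    uint32_t serial_number;
} sketch_device_extension;

/*
 * Store the context pointer in the EA value area.
 * Returns the number of value bytes written (the size of a pointer),
 * or 0 if the value area is too small.
 */
static size_t sketch_store_context(uint8_t *ea_value, size_t ea_value_len,
                                   sketch_device_extension *ext)
{
    void *context = ext;

    assert(ea_value != NULL);
    if (ea_value_len < sizeof(context)) {
        return 0;
    }
    memcpy(ea_value, &context, sizeof(context));
    return sizeof(context);
}

/* Read the context pointer back out of the EA value area. */
static void *sketch_load_context(const uint8_t *ea_value)
{
    void *context;

    assert(ea_value != NULL);
    memcpy(&context, ea_value, sizeof(context));
    return context;
}

/* What a connection-related callback would do with the context value. */
static sketch_device_extension *sketch_context_to_extension(void *context)
{
    return (sketch_device_extension *)context;
}
```

Since the value is just a pointer, the EaValueLength for the connection context is the size of a pointer on the platform the driver is built for.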

Using Chained MDLs to Send RIO Requests and Responses

The process of sending connection-oriented data on a TCP connection endpoint is described in the DDK documentation Sending and Receiving Connection-Oriented Data.

Let's take a brief look at the process of sending user data from the Toaster Client Write routine to the server.

From our discussion of the RIO Protocol we know that we need to send a RIO request header followed by one or more bytes of user data. As written, the toast.exe application writes only one character at a time. For the sake of illustration, assume that it sends a sufficiently large amount of data to warrant using a design that eliminates any unnecessary buffer copy.

The virtual memory for the RIO header can be embedded in the Toaster DeviceExtension. However, calls to send data require use of memory descriptor lists (MDLs). During the allocation phase we must allocate and initialize an MDL that describes the RIO header VM.

Of course, the caller's IRP already contains a probed-and-locked MDL representing the user's data at the IRP MdlAddress field.

The strategy to be used for sending is to build an "MDL chain". The first MDL in the chain describes the RIO Header and its Next field points to the MDL that describes the user data. The Next field of the last MDL in the chain must be set to NULL.

After building the MDL chain, call TdiBuildSend to set up the IRP to be used for making the TDI_SEND request. Make sure that the SendLen parameter passed to TdiBuildSend exactly matches the combined length of the buffers described by the two chained MDLs. Use MmGetMdlByteCount to fetch the lengths when calculating SendLen.

Finally, use IoCallDriver to pass the TDI_SEND request to the TCP provider. The two chained MDLs will be transported to the remote Toaster Server.
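
The chaining and the send-length calculation can be modeled in user mode like this. The sketch_mdl type is a simplified stand-in, not the real MDL (which is opaque and is accessed with routines such as MmGetMdlByteCount), and the function names are hypothetical. The copy routine exists only to show the byte order the receiver will see: header first, then user data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an MDL: one buffer plus a link to the next. */
typedef struct sketch_mdl {
    struct sketch_mdl *Next;   /* next descriptor, or NULL at the end */
    const uint8_t     *Buffer; /* memory this descriptor covers       */
    size_t             ByteCount;
} sketch_mdl;

/* Sum the byte counts of every descriptor in the chain. */
static size_t sketch_chain_length(const sketch_mdl *head)
{
    size_t total = 0;

    for (; head != NULL; head = head->Next) {
        total += head->ByteCount;
    }
    return total;
}

/*
 * Link the header descriptor in front of the user-data descriptor
 * (which may be NULL for a request that carries no data) and return
 * the total length to use as the send length.
 */
static size_t sketch_chain_header_and_data(sketch_mdl *header,
                                           sketch_mdl *data)
{
    assert(header != NULL);
    header->Next = data;
    return sketch_chain_length(header);
}

/*
 * Copy the chain into a flat buffer in chain order.
 * Returns the bytes copied, or 0 if the buffer is too small.
 */
static size_t sketch_chain_copy(const sketch_mdl *head, uint8_t *out,
                                size_t out_len)
{
    size_t used = 0;

    if (sketch_chain_length(head) > out_len) {
        return 0;
    }
    for (; head != NULL; head = head->Next) {
        memcpy(out + used, head->Buffer, head->ByteCount);
        used += head->ByteCount;
    }
    return used;
}
```

The design benefit is that the user data is never copied by the client driver: only the small header lives in driver memory, and the chain lets the transport pick up both pieces in order.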

Using a "Callback-Based State Machine" for Receiving RIO Requests and Responses

The process of receiving connection-oriented data on a TCP connection endpoint is described in the DDK documentation Sending and Receiving Connection-Oriented Data.

TDI provides two methods to receive connection-oriented data:

Event Based - TDI provider "pushes" received data to the client's event receive handlers.

Request Based - TDI clients make receive requests to the provider asking to receive specific amounts of data.

Combinations of these methods can also be used.

For Toaster device TCP receivers we can simplify the design by just using the request-based receive method. To use this method the client calls TdiBuildReceive to set up the IRP to be used for making the TDI_RECEIVE request. IoCallDriver is then used to pass the TDI_RECEIVE request to the TDI provider.

The parameters to be passed to TdiBuildReceive are fairly straightforward. However, it is easy to overlook the fact that you can specify a different completion routine each time you build a new request. If you notice this opportunity, then you can exploit it to make a "callback-based state machine".

For the Toaster device you can use two different receive callback functions, as outlined below using the Toaster Server as an example:

ReceiveHeaderCallback

Build a TDI_RECEIVE request for exactly the size of the RIO request header, with ReceiveHeaderCallback as the completion function. When ReceiveHeaderCallback is called the receiver's local request header buffer will have been filled with the RIO request information.

The ReceiveHeaderCallback function examines the RIO request header RequestDataLength field to determine if it is necessary to read user data on the TCP stream.

If RequestDataLength == 0 all information necessary to process the request has been received. Call the appropriate routine to perform final processing on the RIO request.

If RequestDataLength > 0 it is necessary to read the user data on the TCP stream. Make another receive request using the ReceiveDataCallback, as described below.

ReceiveDataCallback

Build a TDI_RECEIVE request for exactly RequestDataLength bytes, with ReceiveDataCallback as the completion function.

When ReceiveDataCallback is called the receiver's local user data buffer will have been filled with user data. Call the appropriate routine to perform final processing on the RIO request.

After final processing of each RIO request, restart the receive sequence by making another receive request with ReceiveHeaderCallback as the completion function.
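
The header/data alternation can be modeled in user mode as a small state machine in which the "state" is simply whichever completion routine the pending receive carries. This is a sketch under stated assumptions, not driver code: post_receive stands in for TdiBuildReceive plus IoCallDriver, receiver_feed simulates the transport delivering the TCP stream in arbitrary pieces, the header is reduced to a hypothetical 4-byte big-endian data length, and IRPs, IRQL and error completions are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HEADER_SIZE 4u  /* hypothetical header: data length only */
#define SKETCH_MAX_DATA    64u

typedef struct sketch_receiver sketch_receiver;
typedef void (*sketch_completion)(sketch_receiver *rx);

struct sketch_receiver {
    sketch_completion on_complete; /* completion routine of pending receive */
    uint8_t *target;               /* buffer the pending receive fills      */
    size_t   wanted;               /* bytes the pending receive asked for   */
    size_t   have;                 /* bytes delivered so far                */
    uint8_t  header[SKETCH_HEADER_SIZE];
    uint8_t  data[SKETCH_MAX_DATA];
    size_t   data_length;          /* data length of the latest request     */
    unsigned requests_done;        /* fully received RIO requests           */
    int      failed;               /* set if a header announces too much    */
};

static void receive_header_callback(sketch_receiver *rx);
static void receive_data_callback(sketch_receiver *rx);

/* Stand-in for building and submitting a receive for exactly `wanted` bytes. */
static void post_receive(sketch_receiver *rx, uint8_t *target, size_t wanted,
                         sketch_completion on_complete)
{
    rx->target = target;
    rx->wanted = wanted;
    rx->have = 0;
    rx->on_complete = on_complete;
}

/* Final processing of one request, then restart with a header receive. */
static void finish_request(sketch_receiver *rx)
{
    rx->requests_done++;
    post_receive(rx, rx->header, SKETCH_HEADER_SIZE, receive_header_callback);
}

/* Header is complete: decide whether user data must be read next. */
static void receive_header_callback(sketch_receiver *rx)
{
    uint32_t len = ((uint32_t)rx->header[0] << 24) |
                   ((uint32_t)rx->header[1] << 16) |
                   ((uint32_t)rx->header[2] << 8)  |
                    (uint32_t)rx->header[3];

    rx->data_length = len;
    if (len == 0) {
        finish_request(rx);
    } else if (len > SKETCH_MAX_DATA) {
        rx->failed = 1; /* a real driver would tear down the connection */
    } else {
        post_receive(rx, rx->data, len, receive_data_callback);
    }
}

/* User data is complete: the request can be processed. */
static void receive_data_callback(sketch_receiver *rx)
{
    finish_request(rx);
}

/* Initialize the receiver and post the first header receive. */
static void receiver_start(sketch_receiver *rx)
{
    assert(rx != NULL);
    memset(rx, 0, sizeof(*rx));
    post_receive(rx, rx->header, SKETCH_HEADER_SIZE, receive_header_callback);
}

/* Simulate the transport delivering stream bytes in arbitrary chunks. */
static void receiver_feed(sketch_receiver *rx, const uint8_t *bytes,
                          size_t count)
{
    while (count > 0 && !rx->failed) {
        size_t room = rx->wanted - rx->have;
        size_t n = count < room ? count : room;

        memcpy(rx->target + rx->have, bytes, n);
        rx->have += n;
        bytes += n;
        count -= n;
        if (rx->have == rx->wanted) {
            rx->on_complete(rx);
        }
    }
}
```

Note that the receiver never needs a separate "state" variable or a switch statement; choosing the completion routine when posting each receive carries the state forward, which is the opportunity the text points out.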

In Retrospect...

Looking back from this point I might consider implementing the Toaster Server as an upper device filter instead of actually modifying the Toaster Device itself. If I used this approach (and implemented it very systematically), then the resulting kernel-mode networked server could be used with a wider variety of IRP-based WDM devices. For example, the RIO protocol and RIO upper filter could be adapted fairly easily to support a remote WDM modem or similar device.

About the author:

Thomas F. Divine is founder of PCAUSA, a company which has been serving the Windows device driver community since 1992. PCAUSA licenses network device driver samples that illustrate specialized kernel mode programming technologies such as NDIS Intermediate drivers, TDI Clients and a variety of network data filtering techniques.