IEEE 1394 Address Range Allocation

May 1, 2003
Bill McKenzie

Few people would deny that plug-and-play technologies have greatly improved the PC experience. Users today can add or remove peripherals, even disk drives, from their systems at will. Plug-and-play bus technologies such as the Universal Serial Bus (USB) have made plug-and-play a reality on the PC. USB is arguably the most successful plug-and-play technology to date, evidenced in part by the fact that most PCs now ship with USB ports as standard equipment. However, another plug-and-play technology is making a stir in the PC industry. IEEE 1394™ (or just 1394), which gets its name from the IEEE Standard 1394-1995, is a plug-and-play capable, high-speed serial bus similar to USB. 1394, however, demonstrates some potentially big advantages over USB, which include higher data rates¹, greater port availability, peer-to-peer communication, and the ability for users to easily create virtual devices that can simulate hardware.

The superior qualities of 1394 have historically come at higher hardware, design, and development costs. Recently, however, a significant drop in hardware licensing fees has spurred a subsequent proliferation of cheap 1394 hardware. So, hardware cost is less of an issue today and becomes less all the time. Unfortunately, the development costs for producing 1394-related Windows device drivers have not dropped. One reason for this is that significantly less information (at least correct information) related to 1394 is available to the developer than is available for some of the more established technologies like USB.

So, it seems high time to shed some light on a few of the more difficult facets of developing 1394 device drivers for Windows platforms. In that vein, this article represents the first of a multi-part series on 1394 Windows device driver development. I have chosen to start the series with an article on address range allocation. Posters to driver-related newsgroups and list servers seem to stumble over this topic frequently, and the related documentation in the DDK is lacking at best. This discussion is not intended to be a complete tutorial, but rather a quick and hopefully helpful tour of some of the more difficult development challenges related to address range allocation. I assume a fair amount of Windows driver experience. Also, some familiarity with the subject matter would not hurt. Before we dive in, we will take a brief moment to step back and look at an overview of the 1394 bus operation.

First, a quick word of thanks. I have worked with Windows 1394 devices off and on for about 3 years now, all the while attempting to get a grasp of this technology. I have run across numerous issues with the 1394 implementation on Windows, and numerous issues with my understanding of this implementation. I can say without hesitation that 1394 would have buried me long ago, (and I might still be stuck trying to get my 1394 digital camera to show me something useful), if it were not for the help of the DDK PSS and 1394 development teams at Microsoft. Particularly, I owe a special thanks to Don Miller who has put up with a LOT of my questions, and with my being a general nuisance. Also, I need to thank Kashif Hasan, who took time to enlighten me on a number of tips and tricks related to the 1394 bus drivers on Windows. Both of these gentlemen have appeared on public forums to assist with 1394 questions as well. Several others at Microsoft have helped along the way as well, and I apologize to anyone I failed to mention. The time I took from these people wasn't accounted for. I hope my relaying some of this information to the Windows driver development community will help pay back the debt I owe to these fine folks. All useful information you might find in this and my future 1394 related articles probably originated with these 1394 gurus. At the same time, any errors remain solely my responsibility. So, here we go.

Overview:

Data transfers on the 1394 bus are carried out between addressable entities on the bus called nodes. The 1394 bus follows the Control and Status Register (CSR) architecture for 64-bit fixed addressing specified in the ANSI/IEEE Standard 1212, 1994 Edition. Each node on the bus is given its own 256 terabyte address space, which contains the node’s initial register space. Each node must provide a configuration ROM which describes the node and which is located in its initial register space. Nodes are typically synonymous with devices on the bus. A single physical device may contain multiple nodes, but in practice few do. Each node provides its own unique 64-bit identification number and is enumerated separately on the bus. Do not confuse nodes with ports, as nodes contain ports. For example, the 1394 bus views a three port host controller as one node on the 1394 bus and all the ports on that host controller share a single common node address space. For more information on 1394 address spaces, configuration ROMs, and unique identifiers see the aforementioned ANSI/IEEE standard 1212, 1994 Edition, and IEEE Standard 1394-1995 specifications.
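To make the addressing arithmetic concrete, the following user-mode sketch (not DDK code) splits a 64-bit 1394 address into a 10-bit bus ID, a 6-bit node number, and a 48-bit offset, which is where the 256 terabyte per-node figure comes from. The type and function names are invented for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three parts of a 64-bit 1394 address. */
typedef struct {
    uint16_t bus_id;    /* upper 10 bits: which bus */
    uint8_t  node_num;  /* next 6 bits: which node on that bus */
    uint64_t offset;    /* low 48 bits: offset inside the node's space */
} demo_1394_address;

/* Split a full 64-bit address into bus, node, and offset. */
static demo_1394_address demo_split_address(uint64_t addr)
{
    demo_1394_address a;
    a.bus_id   = (uint16_t)((addr >> 54) & 0x3FF);
    a.node_num = (uint8_t)((addr >> 48) & 0x3F);
    a.offset   = addr & 0xFFFFFFFFFFFFULL;
    return a;
}

/* Size of one node's address space: 2^48 bytes, i.e. 256 terabytes. */
static uint64_t demo_node_space_bytes(void)
{
    return 1ULL << 48;
}
```

Everything this article calls an "address offset" is the low 48-bit part; the upper 16 bits select the node.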

The two main types of data transfers on the 1394 bus are isochronous transfers and asynchronous transfers.

Isochronous Transfers:

Isochronous transfers guarantee timely delivery of data, but do not in any way guarantee the integrity of that data through the transfer. For situations that require a constant data rate, such as video or audio data transfers, use an isochronous transfer. Isochronous transfers target channels. Channel numbers are 6-bit values, thus there are a maximum of 2^6 or 64 channels per bus. An isochronous transfer uses a single node acting as talker, or data deliverer, on a channel and as many nodes on the bus as desired acting as listeners, or data receivers, on that channel at any one time. The roles are not fixed: a different node acting as talker may initiate subsequent transfers on the same channel.

Asynchronous Transfers:

Asynchronous transfers, unlike isochronous transfers, do not guarantee when data will be transferred. Asynchronous transfers do guarantee that data will arrive as sent. We use asynchronous transfers when data integrity is a higher priority than speed. An example might be an image data transfer to a printer, where speed is less critical than getting the image pixels correct. Asynchronous transfers are initiated from a single node, designated the ‘requestor’, to or from the address space of another node, designated the ‘responder’. Asynchronous requests are packet based. The requestor node generates a request packet that the 1394 bus sends to the responder node. The responder node is responsible for handling the request packet and creating a response packet that is sent back to the requestor node to complete a single transfer. The asynchronous transfer model is shown in Figure 1.

There are three types of 1394 asynchronous transfers:

· Read

· Write

· Lock

Read and Write transfers allow reading data from, or writing data to the address space of a remote node, just as the transfer names suggest. The Lock transfer is unique in that it allows a node to access the address range of a remote node in an atomic fashion. Lock transfers can only access 4 or 8 bytes of address space in a single transfer depending on the Lock transfer type.

As with Windows drivers for USB devices, drivers for IEEE 1394 devices don’t communicate with 1394 devices directly, but rather they communicate with a bus driver. The 1394 bus driver is responsible for handling communication on the 1394 bus. A special structure called an IEEE 1394 Request Block, or IRB is used to communicate with the 1394 bus driver. An IRB is built up by the device driver and sent to the 1394 bus driver using an IRP of IRP_MJ_INTERNAL_DEVICE_CONTROL type with an IOCTL value of IOCTL_1394_CLASS. Each IRB is sufficient to describe one 1394 operation which is specified by the FunctionNumber field of the IRB structure. 1394 peripherals are connected to a PC via 1394 cable connection to a 1394 host controller. The Windows PnP manager will enumerate any 1394 device nodes that eventually connect to a host controller on the system no matter where those device nodes are located on the 1394 bus, even device nodes connected behind other nodes.

As shown in Figure 2, when a driver communicates with a 1394 peripheral, via asynchronous transfer, the communication actually takes place between the peripheral node and the 1394 host controller node to which the peripheral is connected. So, if a device driver issues a request for an asynchronous Read, Write, or Lock transfer, the 1394 bus driver translates this request into an asynchronous transfer between the 1394 host controller node (the requestor) and the 1394 peripheral node (the responder).

Some peripherals or virtual devices may expect the host controller node to respond to address space accesses as if it were a specific peripheral device. That is, the host controller node needs to “look like” a particular peripheral device to the remote node. For instance, a 1394 scanner may be capable of sending scan data directly to some particular 1394 printer. In this case, the scanner manufacturer might design the scanner to always expect that it will only communicate with nodes that have the same register layout as that printer. If this scanner is connected to a host controller on a Windows platform, the Windows driver controlling that scanner needs some way of setting up the address space of the host controller to look and act like a printer from the scanner device’s perspective. As we mentioned before, on Windows platforms, the 1394 bus driver stack controls the local host controller node’s address space. Given that there is no way the host controller node or the bus driver could anticipate the register layout required by our scanner driver or any number of other drivers controlling remotely connected devices, the developers of the 1394 bus driver for Windows were savvy enough to expose the REQUEST_ALLOCATE_ADDRESS_RANGE request. The REQUEST_ALLOCATE_ADDRESS_RANGE request allows drivers to set up specific address ranges in the host controller’s address space, in any fashion desired.

REQUEST_ALLOCATE_ADDRESS_RANGE:

A driver sends the REQUEST_ALLOCATE_ADDRESS_RANGE request to the 1394 bus in a manner similar to other 1394 transactions, by constructing an appropriate IRB and sending this IRB down to the 1394 bus driver via an IRP. The IRB parameters of interest for the REQUEST_ALLOCATE_ADDRESS_RANGE request are listed below:

typedef struct _IRB {
  ULONG FunctionNumber;
  union {
    struct {
      PMDL            Mdl;
      ULONG           fulFlags;
      ULONG           nLength;
      ULONG           MaxSegmentSize;
      ULONG           fulAccessType;
      ULONG           fulNotificationOptions;
      PVOID           Callback;
      PVOID           Context;
      ADDRESS_OFFSET  Required1394Offset;
      PSLIST_HEADER   FifoSListHead;
      PKSPIN_LOCK     FifoSpinLock;
      ULONG           AddressesReturned;
      PADDRESS_RANGE  p1394AddressRange;
      HANDLE          hAddressRange;
      PVOID           DeviceExtension;
    } AllocateAddressRange;
  } u;
} IRB;

We cover most of these IRB fields in the following sections.

Access Type:

The driver can specify the type of asynchronous transfers allowed for the allocated address range by setting the u.AllocateAddressRange.fulAccessType field in the IRB. For instance, our scanner driver might need a set of registers that the scanner can write data to but which are never read. The driver in this case will just specify ACCESS_FLAGS_TYPE_WRITE, and the 1394 bus driver will handle any other transaction types to the given address range. The following table, taken from the DDK help, shows the valid values for the u.AllocateAddressRange.fulAccessType field:

Access	Description
ACCESS_FLAGS_TYPE_READ	Allocated addresses can be read.
ACCESS_FLAGS_TYPE_WRITE	Allocated addresses can be written to.
ACCESS_FLAGS_TYPE_LOCK	Allocated addresses can be the target of a lock operation.
ACCESS_FLAGS_TYPE_BROADCAST	Allocated addresses can receive asynchronous I/O requests from any node on the bus. (By default, only the device driver's device can send requests to the allocated addresses.)

These values are ORed together to allow Read, Write, or Lock transfers or any combination thereof for a given address range. I talk more about the broadcast flag later.
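As a rough illustration of how ORed access flags gate transfer types, here is a small user-mode sketch. The DEMO_ constants are stand-ins with illustrative values; the real ACCESS_FLAGS_TYPE_* definitions come from the DDK's 1394.h, and the check the bus driver actually performs is not documented here.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; real values come from the DDK's 1394.h. */
enum {
    DEMO_ACCESS_READ      = 0x1,
    DEMO_ACCESS_WRITE     = 0x2,
    DEMO_ACCESS_LOCK      = 0x4,
    DEMO_ACCESS_BROADCAST = 0x8
};

/* The three asynchronous transfer types described earlier. */
typedef enum { DEMO_XFER_READ, DEMO_XFER_WRITE, DEMO_XFER_LOCK } demo_xfer;

/* Returns nonzero if a range allocated with 'access' permits 'kind'. */
static int demo_transfer_allowed(uint32_t access, demo_xfer kind)
{
    switch (kind) {
    case DEMO_XFER_READ:  return (access & DEMO_ACCESS_READ)  != 0;
    case DEMO_XFER_WRITE: return (access & DEMO_ACCESS_WRITE) != 0;
    case DEMO_XFER_LOCK:  return (access & DEMO_ACCESS_LOCK)  != 0;
    }
    return 0;
}
```

A range allocated with `DEMO_ACCESS_READ | DEMO_ACCESS_WRITE` would accept Read and Write transfers but not Lock transfers in this model.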

Access Notification:

If a driver needs notification of, and/or access to the data from asynchronous transfers occurring in the allocated range, the driver can elect to receive notification from the 1394 bus driver in the form of a callback. The u.AllocateAddressRange.fulNotificationOptions field of the IRB specifies the notification type(s) the address range will use. The valid values for the u.AllocateAddressRange.fulNotificationOptions field are:

· NOTIFY_FLAGS_NEVER

· NOTIFY_FLAGS_AFTER_READ

· NOTIFY_FLAGS_AFTER_WRITE

· NOTIFY_FLAGS_AFTER_LOCK

If a driver specifies any value other than NOTIFY_FLAGS_NEVER, a notification callback routine must be specified in the u.AllocateAddressRange.Callback field of the IRB. The notification callback is invoked whenever an appropriate node targets the allocated address range with a transfer of the specified type. Generally, the bus driver will call the notification callback after the transfer has taken place. One exception to this occurs when a driver has elected to provide a response packet for the request. We will talk more about driver provided response packets later.

Address Offsets:

As stated before, some peripheral devices, like our hypothetical scanner, may expect that the nodes they communicate with have a particular register layout. The 1394 bus driver, therefore, allows drivers to specify starting address offsets when allocating address ranges. A driver specifies the address offset for an address range in the u.AllocateAddressRange.Required1394Offset field of the IRB. If a driver does not specify a required offset, the bus driver picks a starting offset for the allocated range. In addition, the bus driver may return multiple address ranges to a driver which does not specify a required offset. In that case, the value returned in u.AllocateAddressRange.AddressesReturned will be greater than one and the array pointed to by u.AllocateAddressRange.p1394AddressRange will contain u.AllocateAddressRange.AddressesReturned entries. Specifying a required offset guarantees allocation of only one contiguous address range for the request. A driver specifying a required offset must use a quad-aligned (32-bit aligned) address.
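The alignment rule is easy to check before sending the request. The sketch below uses a stand-in structure modeled on the DDK's ADDRESS_OFFSET (a 16-bit high part and a 32-bit low part); the helper names are hypothetical and the offsets in any usage are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in modeled on ADDRESS_OFFSET: 16-bit high part, 32-bit low part. */
typedef struct {
    uint16_t Off_High;
    uint32_t Off_Low;
} demo_address_offset;

/* A quad-aligned offset has its two lowest bits clear. */
static int demo_is_quad_aligned(demo_address_offset off)
{
    return (off.Off_Low & 0x3u) == 0;
}

/* Combine the two parts into a single 48-bit offset value. */
static uint64_t demo_offset_to_u48(demo_address_offset off)
{
    return ((uint64_t)off.Off_High << 32) | off.Off_Low;
}
```

A driver could run a check like `demo_is_quad_aligned` on its intended Required1394Offset in a debug build, rather than discovering a misaligned offset through a failed allocation.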

Node Accessibility:

As many as 62 nodes could connect to a particular host controller node. Drivers for any or all of these potential nodes can allocate address ranges in the host controller node's address space. Two or more of these nodes might need the same, or overlapping, addresses in the host controller's address range. Thus, there is ample opportunity for address range collision among 1394 drivers. To handle this scenario, the bus driver multiplexes the host controller address space, allowing a device driver to essentially overlay an address range with its own storage. Typically, only the device node controlled by a driver can access the address range allocated by that driver. Figure 3 shows an example of two drivers, driver A and driver B, which both request to allocate the same address range in the host node’s address space. When the device that driver A controls attempts to access an address in the allocated range, only driver A is notified. We say "typically" above because it is possible for the driver to request notification when any remote node accesses a range of addresses in the host node’s address space. Requesting notification of all accesses to an allocated memory range is known as broadcast notification. Broadcast notification occurs when the address range is allocated with the ACCESS_FLAGS_TYPE_BROADCAST flag mentioned earlier. A driver might want to set the broadcast flag if it is communicating with multiple nodes. A virtual serial port driver, for example, might need to allocate a range with broadcast notification so that it can communicate with other virtual serial ports anywhere on the bus.

IMPORTANT NOTE: When using the ACCESS_FLAGS_TYPE_BROADCAST flag, understand that the Windows 1394 bus driver will only hook up the notification routines for the first driver requesting broadcast notification for an allocated address range. This is especially important for virtual devices created on XP and later (created using the IOCTL_IEEE1394_API_REQUEST IOCTL), as those devices must have this flag set for their allocated address ranges to work properly. Any driver which allocates an address range with broadcast notification will not work as expected if that address range has already been allocated by another driver using broadcast notification.
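The multiplexing behavior described in this section can be pictured with a toy routing table. This is only a user-mode model of the observable behavior (own-device ranges, plus "first broadcast allocation wins"); the lookup order and every name below are assumptions, not the bus driver's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One allocated range, as seen by this toy model. */
typedef struct {
    int      driver_id;    /* which driver allocated the range */
    int      device_node;  /* node number of that driver's device */
    uint64_t start;        /* first offset in the range */
    uint32_t length;       /* range length in bytes */
    int      broadcast;    /* nonzero if allocated with the broadcast flag */
} demo_allocation;

static int demo_covers(const demo_allocation *a, uint64_t offset)
{
    return offset >= a->start && offset - a->start < a->length;
}

/* Returns the driver_id to notify, or -1 if nobody claims the access. */
static int demo_route_access(const demo_allocation *table, size_t count,
                             int source_node, uint64_t offset)
{
    size_t i;
    /* First pass: a range owned by the driver of the sending device. */
    for (i = 0; i < count; i++) {
        if (demo_covers(&table[i], offset) &&
            table[i].device_node == source_node)
            return table[i].driver_id;
    }
    /* Second pass: only the earliest broadcast allocation is hooked up. */
    for (i = 0; i < count; i++) {
        if (demo_covers(&table[i], offset) && table[i].broadcast)
            return table[i].driver_id;
    }
    return -1;
}
```

In this model two drivers can allocate the same offsets without conflict because each access is routed by its source node, while a second broadcast allocation over the same offsets never receives anything.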

Backing Store Methods:

As previously stated, drivers typically allocate address ranges in order to be notified of, and possibly participate in, asynchronous transfers from remote devices. Upon allocation, the device driver is responsible for providing data storage, or backing store, for the allocated address ranges. A device driver can specify backing store in one of three ways:

· It provides an MDL for backing store.

· It provides a singly linked list of ADDRESS_FIFO elements, (a list of MDLs essentially), for backing store.

· It provides no backing store for the address range and instead elects to handle the request and response packets directly.

These methods are mutually exclusive.

MDL Backing Store

If the driver provides an MDL for backing store, then the 1394 bus driver, depending on the asynchronous transfer type, uses the buffer described by the MDL as the location to read or write the transfer data for this node. As already stated, the driver allocating the address range can specify the asynchronous transfer types the range will allow. Using a single MDL is probably the easiest method for handling backing store for allocated address ranges. The driver specifies the MDL backing store method by filling in the u.AllocateAddressRange.Mdl field of the IRB to point to the MDL describing the storage buffer, and setting the u.AllocateAddressRange.FifoSListHead, and u.AllocateAddressRange.FifoSpinLock fields to NULL. A driver might use an MDL for backing store for an address range that is "read-only", and for which the data does not change. For example, our scanner might expect the printer to have a serial ID register. An ID register would be a good candidate for an address range with an MDL as backing store.

In certain cases, a driver should not use an MDL for backing store. There is no way from the device driver to lock access to the MDL buffer. If the address range allows asynchronous transfers to read data from the MDL buffer, and the driver needs the ability to update the contents of this MDL buffer at runtime, no safe way to restrict access to the buffer to allow the update exists. The developer could use some type of handshaking mechanism between the remote node device and driver to circumvent the problem. Be aware, however, that no inherent mechanism exists to synchronize access to the backing store between the driver allocating an address range, and the entities that access that address range.

A more serious problem can occur if the address range allows transfers that modify the contents of the MDL buffer. Again, no way exists to lock access to the MDL buffer, so data loss may occur. For example, say an allocated address range allows asynchronous Write transfers. The driver allocating the address range will likely want to get at the data for each Write transfer received by this range. If, however, a new Write transfer occurs before the driver has had a chance to access all of the data from a previous Write transfer, the new transfer can overwrite the old data. Again, the driver can use handshaking or some other mechanism to circumvent the problem, but no inherent mechanism in Windows 1394 support synchronizes access to the backing store for an address range. These limitations can make using an MDL for backing store much less desirable than it might otherwise seem.

FIFO List Backing Store

A driver can use a singly linked list of ADDRESS_FIFO structures, called a FIFO list in the DDK documentation², for backing store fairly easily as well. The FIFO list backing store method is selected by setting the u.AllocateAddressRange.Mdl field in the IRB to NULL, setting the u.AllocateAddressRange.FifoSListHead to an initialized singly linked list, and setting u.AllocateAddressRange.FifoSpinLock to an initialized spinlock. The FIFO list is a singly linked list of ADDRESS_FIFO structures. The ADDRESS_FIFO structure is shown below:

typedef struct _ADDRESS_FIFO {
  SINGLE_LIST_ENTRY  FifoList;    // Singly linked list
  PMDL               FifoMdl;     // Mdl for this FIFO element
} ADDRESS_FIFO, *PADDRESS_FIFO;

We use the FifoList field in this structure to allow placement of the ADDRESS_FIFO element in a linked list. The FifoMdl field points to an MDL that will handle data for the request. Each appropriate transfer to the allocated address range from an appropriate node causes the 1394 bus driver to pop an entry from the FIFO list, and use the MDL buffer associated with that entry to read or write the transfer data. With a list of buffers, the driver no longer has to worry about getting the data before another write transfer comes in. But, the driver does need to ensure that the FIFO list stays populated with enough entries to handle the flow of incoming requests. In general, "write-only" address ranges are the most useful case for a FIFO list backing store.
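The one-buffer-per-transfer behavior can be modeled in user mode as follows. This is a sketch only: the demo_ types stand in for ADDRESS_FIFO and its MDL, a plain pointer replaces the interlocked list, and the "dropped when empty" result is this model's way of showing why the list must stay populated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_SIZE 8

/* Stand-in for an ADDRESS_FIFO element plus the buffer its MDL describes. */
typedef struct demo_fifo_entry {
    struct demo_fifo_entry *next;
    unsigned char data[DEMO_BUF_SIZE];
    size_t used;
} demo_fifo_entry;

typedef struct {
    demo_fifo_entry *head;
} demo_fifo_list;

/* Push and pop both work at the head, as with a singly linked SList. */
static void demo_fifo_push(demo_fifo_list *list, demo_fifo_entry *e)
{
    e->used = 0;
    e->next = list->head;
    list->head = e;
}

static demo_fifo_entry *demo_fifo_pop(demo_fifo_list *list)
{
    demo_fifo_entry *e = list->head;
    if (e != NULL) {
        list->head = e->next;
        e->next = NULL;
    }
    return e;
}

/* Models one incoming write: pop a buffer and fill it with the payload.
   Returns NULL when the payload is too big or no buffer is left. */
static demo_fifo_entry *demo_incoming_write(demo_fifo_list *list,
                                            const void *payload, size_t len)
{
    demo_fifo_entry *e;
    if (len > DEMO_BUF_SIZE)
        return NULL;
    e = demo_fifo_pop(list);
    if (e == NULL)
        return NULL;
    memcpy(e->data, payload, len);
    e->used = len;
    return e;
}
```

Each write lands in its own buffer, so a second write cannot clobber the first, but a third write against a two-entry list finds nothing to pop.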

Allocated address ranges that allow read operations probably should not use a FIFO list for backing store. Using a FIFO list for readable addresses is impractical as the bus driver pops a new MDL from the list for each transfer. So in this case, every MDL in the FIFO list would potentially have to contain the same read data, and thus the driver would have to copy the read data to any MDL that it wanted to add to the list. Additionally, if this read data needed to change at runtime, every MDL in the list would have to be modified. Accessing the entire list would be grossly inefficient and there is no way to synchronize access to the FIFO list³.

Another potential shortcoming of FIFO lists involves the developer's inability to pull entries out of the list at will. It is not at all uncommon for a driver to receive MDLs from Read, Write, or IOCTL requests. If MDLs associated with these types of requests are used in a FIFO list as backing store for an allocated address range, the time to completion of the requests is indeterminate. So, these requests must be cancelable. A cancelable request would require that its associated MDL be pulled from the FIFO list upon cancel. There is no DDK-supplied method that allows pulling requests out of the middle of a singly linked list. And, as already mentioned, there is no synchronization method that allows safe access to the FIFO list. So, any specific entry that needs to be pulled from the FIFO list necessitates clearing all entries from the list using ExInterlockedFlushSList(). Then, all but the desired entry have to be pushed back onto the list somehow. In addition to being highly inefficient, this method is potentially unsafe. Incoming transfers may get dropped while the list is empty, and if MDL order is important, it may not be preserved while pushing ADDRESS_FIFO elements back onto the list due to the lack of synchronization.
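The flush-and-re-push workaround, and its ordering hazard, can be seen in a small user-mode model. The demo_ list below is a plain, non-interlocked stand-in for the SList (so it shows the ordering problem but not the race with incoming transfers), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_node {
    struct demo_node *next;
    int id;
} demo_node;

typedef struct {
    demo_node *head;
} demo_slist;

static void demo_push(demo_slist *l, demo_node *n)
{
    n->next = l->head;
    l->head = n;
}

/* Detach the whole chain at once, leaving the list empty. */
static demo_node *demo_flush(demo_slist *l)
{
    demo_node *chain = l->head;
    l->head = NULL;
    return chain;
}

/* Remove 'target' by flushing and pushing every other entry back.
   Returns 1 if the target was found. */
static int demo_remove_by_flush(demo_slist *l, demo_node *target)
{
    demo_node *chain = demo_flush(l);  /* list is empty from here... */
    int found = 0;
    while (chain != NULL) {
        demo_node *next = chain->next;
        if (chain == target)
            found = 1;
        else
            demo_push(l, chain);       /* ...until entries are pushed back */
        chain = next;
    }
    return found;
}
```

Note that the naive re-push walks the detached chain from the old head, so the surviving entries come back in reverse order; preserving order would take an extra reversal pass, all while the list sits empty.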

No Backing Store

The use of no backing store, which remains by far the most difficult and least documented address range backing store implementation method, is also the most versatile. To use this backing store option, the driver sets the u.AllocateAddressRange.Mdl, u.AllocateAddressRange.FifoSListHead, and u.AllocateAddressRange.FifoSpinLock IRB members to NULL, and specifies a notification routine in the Callback member of the IRB. If neither an MDL nor a FIFO list is specified in the IRB, the notification options specified in the u.AllocateAddressRange.fulNotificationOptions field of the IRB are ignored and a notification routine is required. This notification routine is then called on receipt of every appropriate transfer to the allocated address range. With this method, the 1394 bus driver calls the device driver’s notification routine with the transfer’s request packet specified in the NOTIFICATION_INFO parameter at the time of receipt of that request packet. The device driver is then responsible for creating a response packet which is handed back to the bus driver and sent back to the requestor node to complete the transfer. By having full and timely access to the request and response packets, the device driver can fully synchronize access to any data associated with the address range. Also, the driver can determine useful information, such as what node the request came from, which is not possible with the other backing store methods. This backing store method is useful for any type of read/write/lock capable registers needed. Unfortunately, the pitfalls in the no backing store approach are numerous and deep. Also, the DDK documentation on this topic is scarce at best. The next few sections attempt to point out some of the issues involved with this method, and to demonstrate how to use this method in a driver.

Allocating An Address Range:

So, let's finally look at some code for allocating an address range and at the same time walk through what needs to be done in our code to handle our own response packets. The following code snippet shows the steps needed to allocate an address range using no backing store.

PIRP               pIrp;
IRB                irb;
ULONG              allocationSize = 10 * sizeof(ULONG);
KEVENT             event;
IO_STATUS_BLOCK    ioStatus;
PIO_STACK_LOCATION pNextIrpStack;
NTSTATUS           status;

// request our address range with no backing store

// the event must be initialized before it is handed to the I/O manager
KeInitializeEvent(&event, NotificationEvent, FALSE);

// setup the IRP for the request
pIrp = IoBuildDeviceIoControlRequest(
                        IOCTL_1394_CLASS,
                        pDeviceExtension->pNextDeviceObject,
                        NULL,
                        0,
                        NULL,
                        0,
                        TRUE,
                        &event,
                        &ioStatus);
if(NULL == pIrp)
{
    return STATUS_INSUFFICIENT_RESOURCES;
}

// start from a zeroed IRB so fields we do not set are well defined
RtlZeroMemory(&irb, sizeof(IRB));
irb.FunctionNumber = REQUEST_ALLOCATE_ADDRESS_RANGE;

// no MDL
irb.u.AllocateAddressRange.Mdl = NULL;

// no FIFO list
irb.u.AllocateAddressRange.FifoSListHead = NULL;
irb.u.AllocateAddressRange.FifoSpinLock = NULL;

// our notification callback
irb.u.AllocateAddressRange.Callback = RangeNotificationRoutine;

// allow reads and writes to this range
irb.u.AllocateAddressRange.fulAccessType =
                                ACCESS_FLAGS_TYPE_READ |
                                    ACCESS_FLAGS_TYPE_WRITE;

// this really doesn't matter, ignored for no backing store case
irb.u.AllocateAddressRange.fulNotificationOptions =
                                NOTIFY_FLAGS_AFTER_READ |
                                    NOTIFY_FLAGS_AFTER_WRITE;

// range will be 10 quadlets long (40 bytes)
irb.u.AllocateAddressRange.nLength = allocationSize;

// our context, we use the driver's device extension here
irb.u.AllocateAddressRange.Context = pDeviceExtension;

// our required offset, just picked a random/reasonable value.
// Must be quad aligned
irb.u.AllocateAddressRange.Required1394Offset.Off_High =
                                INITIAL_REGISTER_SPACE_HI;
irb.u.AllocateAddressRange.Required1394Offset.Off_Low =
                                INITIAL_REGISTER_SPACE_LO | 0xA000000;

// no backing store guarantees only one address range
irb.u.AllocateAddressRange.p1394AddressRange = &pDeviceExtension->Range;

// no max segment size
irb.u.AllocateAddressRange.MaxSegmentSize = 0;

// not big endian
irb.u.AllocateAddressRange.fulFlags = 0;

// send the IRB down and wait on return if necessary
pNextIrpStack = IoGetNextIrpStackLocation(pIrp);
pNextIrpStack->Parameters.Others.Argument1 = &irb;

status = IoCallDriver(pDeviceExtension->pNextDeviceObject, pIrp);
if(status == STATUS_PENDING)
{
    KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
    status = ioStatus.Status;
}
ASSERT(NT_SUCCESS(status));

Notice that the MDL and FIFO list fields of the IRB have been set to NULL, indicating that this address range will use no backing store and thus the driver is electing to provide response packets for all asynchronous transfers to this address range. Looking at the u.AllocateAddressRange.fulAccessType setup, one can see that this allocated range, if successful, will allow Read and Write asynchronous transfers. But, notice that the ACCESS_FLAGS_TYPE_BROADCAST flag is not set. Without this flag, the only node able to communicate with our driver through this address range is the node associated with the driver's device. It might be important to reiterate here that drivers for virtual devices on XP and later platforms require the ACCESS_FLAGS_TYPE_BROADCAST flag be set when allocating address ranges. These virtual 1394 devices are not associated with any real or particular remote peripherals, so the 1394 bus driver will just consume any requests coming into the address range unless broadcast is specified. The starting offset for this address range was chosen somewhat arbitrarily. Notice that the starting offset of the address range is quad (32-bit) aligned. Likewise, the length of the address range, specified in the IRB's u.AllocateAddressRange.nLength field, was chosen arbitrarily. The DDK documentation should adequately cover any fields not specifically covered in this discussion.
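Because a driver with no backing store copies request data into its own buffers, it is worth validating each incoming address against the allocated range before touching memory. The sketch below is a user-mode illustration under the assumption that the high 16 bits of the offset already match, so only the low 32 bits are compared; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Maps an incoming low-offset to a byte position inside the range.
   Returns 1 and stores the position in *pos when the whole transfer
   [addr_low, addr_low + xfer_len) fits in the range, otherwise 0. */
static int demo_range_position(uint32_t range_start_low, uint32_t range_len,
                               uint32_t addr_low, uint32_t xfer_len,
                               uint32_t *pos)
{
    uint32_t delta;

    if (addr_low < range_start_low)
        return 0;                      /* below the range */
    delta = addr_low - range_start_low;
    if (delta >= range_len)
        return 0;                      /* starts past the end */
    if (xfer_len > range_len - delta)
        return 0;                      /* would run off the end */
    *pos = delta;
    return 1;
}
```

Subtracting the range start and bounds-checking the result does not depend on the particular bit pattern of the chosen offset, which keeps the check valid if the arbitrary starting offset or length is later changed.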

Notification Callbacks And Response Packet Handling:

Next we look at the notification routine for our address range allocated above. The code gets interesting here, so don't fall asleep yet.

VOID RangeNotificationRoutine(PNOTIFICATION_INFO pInfo)
{
    PMY_OHCI_ASYNC_PACKET pRequestPacket;   // the async packet
    KIRQL                 oldIrql;          // our current IRQL
    PRESPONSE_CONTEXT     pResponseContext; // our response context pointer
    ULONG                 responseSize = 0; // our response packet size
    ULONG                 offset;           // offset into the address range

    // NOTE: pDeviceExtension is not declared in this listing. It is
    // assumed to be the driver's device extension, obtained elsewhere.

    // get the request packet and find out where it came from
    pRequestPacket = (PMY_OHCI_ASYNC_PACKET)pInfo->RequestPacket;

    // check for a response MDL before we do anything else.
    if(NULL == pInfo->ResponseMdl)
    {
        // nothing to do, just return and the packet will timeout
        return;
    }

    if((pRequestPacket->OHCI_tCode == TCODE_WRITE_REQUEST_BLOCK) ||
       (pRequestPacket->OHCI_tCode == TCODE_READ_REQUEST_BLOCK))
    {
        // get the response context, with room for the block data
        pResponseContext =
            (PRESPONSE_CONTEXT)ExAllocatePoolWithTag(
                                    NonPagedPool,
                                    sizeof(RESPONSE_CONTEXT) +
                                        pRequestPacket->u3.Block.OHCI_Data_Length,
                                    'PMAS');
    }
    else
    {
        // get the response context, with room for one quadlet: the
        // quadlet read case below copies its data just past the packet
        pResponseContext =
            (PRESPONSE_CONTEXT)ExAllocatePoolWithTag(
                                    NonPagedPool,
                                    sizeof(RESPONSE_CONTEXT) + sizeof(ULONG),
                                    'PMAS');
    }

    if(NULL == pResponseContext)
    {
        // nothing to do, just return and the packet will timeout
        return;
    }

    // allocate a response work item to handle cleanup of the response packet
    pResponseContext->pResponseWorkItem =
                        IoAllocateWorkItem(pDeviceExtension->pDeviceObject);

    if(NULL == pResponseContext->pResponseWorkItem)
    {
        // free our allocation
        ExFreePool(pResponseContext);

        // nothing to do, just return and the packet will timeout
        return;
    }

    // make a temp pointer for clarity
    PMY_OHCI_ASYNC_PACKET pResponsePacket =
                    &pResponseContext->ResponsePacket;

    // set our response packet buffer into the info struct
    *pInfo->ResponsePacket = (PVOID)pResponsePacket;

    // copy the request packet over to the response
    // packet buffer and modify
    RtlCopyMemory(pResponsePacket,
                  pRequestPacket,
                  sizeof(MY_OHCI_ASYNC_PACKET));

    // setup the response packet

    // set an unsuccessful response code in the packet so
    // that we fail if we don't finish
    pResponsePacket->u2.Response.OHCI_Rcode = RCODE_TIMED_OUT;

    // The source ID from the request packet
    // is actually used as the destination address
    // for the response. The
    // actual source ID for the response packet is
    // filled in by the 1394 bus driver

    // calculate the speed for the response packet
    pResponsePacket->u.Tx.OHCI_spd =
                    GetRequestPacketSpeed(pRequestPacket);

    // get the offset into the address range
    offset = pRequestPacket->OHCI_Offset_Low &
                ~(INITIAL_REGISTER_SPACE_LO | 0xA000000);

    // check for request type.
    switch(pRequestPacket->OHCI_tCode)
    {
        case TCODE_WRITE_REQUEST_QUADLET:

            // copy the quadlet data to our receive buffer.
            // Offset into the buffer according to the offset
            // specified in the request packet
            RtlCopyMemory((PCHAR)pDeviceExtension->pReceiveBuffer + offset,
                          (PCHAR)&pRequestPacket->u3.OHCI_Quadlet_Data,
                          sizeof(ULONG));

            // indicate this is a write response packet
            pResponsePacket->OHCI_tCode = TCODE_WRITE_RESPONSE;

            // set the response buffer size
            responseSize = sizeof(MY_OHCI_ASYNC_PACKET);
            break;

        case TCODE_WRITE_REQUEST_BLOCK:

            // copy the block data to our receive buffer.
            // Offset into the buffer according to the offset
            // specified in the request packet
            RtlCopyMemory(
                (PCHAR)pDeviceExtension->pReceiveBuffer + offset,
                (PCHAR)pRequestPacket + sizeof(MY_OHCI_ASYNC_PACKET),
                pRequestPacket->u3.Block.OHCI_Data_Length);

            // indicate this is a write response packet
            pResponsePacket->OHCI_tCode = TCODE_WRITE_RESPONSE;

            // set the response buffer size
            responseSize = sizeof(MY_OHCI_ASYNC_PACKET)
                                    + pRequestPacket->u3.Block.OHCI_Data_Length;
            break;

        case TCODE_READ_REQUEST_QUADLET:

            // copy the quadlet data from our send buffer to the
            // response packet. Offset into the send buffer according
            // to the offset specified in the request packet
            RtlCopyMemory((PCHAR)pResponsePacket + sizeof(MY_OHCI_ASYNC_PACKET),
                          (PCHAR)pDeviceExtension->pSendBuffer + offset,
                          sizeof(ULONG));

            // indicate this is a read quadlet response packet
            pResponsePacket->OHCI_tCode = TCODE_READ_RESPONSE_QUADLET;

            // set the response buffer size
            responseSize = sizeof(MY_OHCI_ASYNC_PACKET)
                                  + sizeof(ULONG);
            break;

        case TCODE_READ_REQUEST_BLOCK:

            // copy the block data from our send buffer to the
            // response packet. Offset into the send buffer according
            // to the offset specified in the request packet
            RtlCopyMemory((PCHAR)pResponsePacket + sizeof(MY_OHCI_ASYNC_PACKET),
                          (PCHAR)pDeviceExtension->pSendBuffer + offset,
                          pRequestPacket->u3.Block.OHCI_Data_Length);

            // indicate this is a read block response packet
            pResponsePacket->OHCI_tCode = TCODE_READ_RESPONSE_BLOCK;

            // set the response buffer size
            responseSize = sizeof(MY_OHCI_ASYNC_PACKET)
                                    + pRequestPacket->u3.Block.OHCI_Data_Length;
            break;

        default:

            // free our work item
            IoFreeWorkItem(pResponseContext->pResponseWorkItem);

            // free our response context
            ExFreePool(pResponseContext);

            // we only handle reads and writes for this range, so
            // we should never get here. If we do, just return
            // and the transfer will timeout
            return;
    }

    // initialize the info struct's response MDL
    // with our response buffer
    MmInitializeMdl(pInfo->ResponseMdl,
                    pResponsePacket,
                    responseSize);

    // map the pages
    MmBuildMdlForNonPagedPool(pInfo->ResponseMdl);

    // set the response length
    *pInfo->ResponseLength = responseSize;

    // initialize the response event
    KeInitializeEvent(&pResponseContext->ResponseEvent,
                      NotificationEvent,
                      FALSE);

    // set our response event as the response event in the info struct
    *pInfo->ResponseEvent = &pResponseContext->ResponseEvent;

    // queue a work item to handle response packet cleanup
    IoQueueWorkItem(pResponseContext->pResponseWorkItem,
                    ResponseCleanup,
                    DelayedWorkQueue,
                    pResponseContext);

    // set a successful response code
    pResponsePacket->u2.Response.OHCI_Rcode = RCODE_RESPONSE_COMPLETE;
}

Setting the response packet speed:

The request packet for the asynchronous transfer is located in the RequestPacket field of the NOTIFICATION_INFO parameter passed into our callback. The routine above uses much of the information in the request packet to set up our response packet. In fact, as a starting point the code copies the request packet over to the response packet storage 'as is'.

Notice the oddly named structure MY_OHCI_ASYNC_PACKET being used for the request and response packets in place of the DDK-supplied ASYNC_PACKET structure. This new structure, whose contents are listed below, is actually a copy of a non-released structure used in the 1394 bus driver called OHCI_ASYNC_PACKET. The packet structure was given a unique name here (although not a good one) so that if the OHCI_ASYNC_PACKET structure is ever released in a future DDK, the sample code here won't clash in the build by attempting to redefine an existing structure. The good folks on the 1394 team at Microsoft were kind enough to let me make this structure public here for the first time.

The OHCI_ASYNC_PACKET structure is necessary because it gives us access to some otherwise undocumented fields in the packet. There is a subtle bug in the 1394 bus driver that affects drivers providing their own response packets: the bus driver does not set the speed of the response packet correctly. So, a device driver must determine the required speed and fill out the response packet accordingly. Unfortunately, the DDK version of the packet structure specifies the u.Tx.OHCI_spd field as a reserved field. So, MY_OHCI_ASYNC_PACKET actually overlays the ASYNC_PACKET structure, defining a few extra fields.

You will notice in the code above that the function GetRequestPacketSpeed() is called to calculate the response packet speed from the given request packet. It might seem that the speed in the request packet could just be copied over to the response packet. Unfortunately, the request packet speed is obliterated before the request packet makes it to the notification routine. So how is the response packet speed calculated? Well, the 1394 bus driver itself normally goes through some pretty ugly, and undocumented, gyrations with the request packet to get the speed value. An easier method for us might be to determine the link speed between the nodes of interest and use that. Asynchronous transfers on the 1394 bus generally execute at the highest possible speed for a given link. The 1394 bus provides a map of all links on the bus and the maximum speeds of those links in a map called, appropriately enough, a speed map. The speed map can be obtained using the DDK-supplied REQUEST_GET_LOCAL_HOST_INFO request with the GET_HOST_CSR_CONTENTS flag. Search in the DDK help for SPEED_MAP_LOCATION for more information.

IMPORTANT NOTE: While I suggest here that using the lowest link speed between any two nodes is sufficient, this actually could violate the 1394 specifications. It is not required that two nodes communicate at the highest possible speed for their link. For example, using a speed of 400 Mb/s for the response packet for two directly connected 400 Mb/s nodes may not be correct if those nodes were actually communicating at 200 Mb/s. The response packet should always be sent at the speed at which the request packet was sent. Unfortunately, there is currently no way to determine the request packet speed in a 1394 Windows device driver. Most examples just copy the request packet to the response packet without modifying the speed. This can result in response packet speeds that exceed the link capabilities. So, while the solution suggested here is not ideal, it is at least as good as copying the request packet to the response packet, and actually quite a bit better, as the correct speed will be hit most of the time. Copying the request packet speed to the response packet yields random results and is more often than not incorrect. This hasn't been widely reported or even noticed because most devices are 400 Mb/s capable, and so a slower response packet would not be noticed. I only caught this because I was using a 1394 analyzer when working on an unrelated issue.

Getting the speed for the response packet is left as an exercise for the reader. Be aware that the speed map can change with each 1394 bus reset, and thus the device driver needs to, at a minimum, update the speed map information for each 1394 bus reset if the above recommended solution is used.
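As a starting point for that exercise, here is a minimal user-mode sketch of the lookup itself, not the driver code. It assumes the speed map stores one speed-code byte per node pair, indexed as 64 * m + n (my reading of the IEEE 1394-1995 SPEED_MAP layout), and that the driver keeps a cached copy tagged with the bus generation. The names speed_map_cache and response_speed are hypothetical and are not part of the DDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1394 speed codes: 0 = 100 Mb/s, 1 = 200 Mb/s, 2 = 400 Mb/s. */
enum { SPEED_S100 = 0, SPEED_S200 = 1, SPEED_S400 = 2 };

/* Assumed layout: one byte per node pair, indexed as 64 * m + n. */
#define SPEED_MAP_NODES   63u           /* node numbers 0..62 */
#define SPEED_MAP_ENTRIES (64u * 63u)   /* 4032 speed-code bytes */

/* Hypothetical cached copy of the speed map, refreshed after each bus reset. */
typedef struct {
    uint32_t generation;                /* bus generation this copy belongs to */
    uint8_t  speed_code[SPEED_MAP_ENTRIES];
} speed_map_cache;

/*
 * Returns the speed code to place in the response packet. Falls back to
 * the base rate (S100) when the cached map is stale or a node number is
 * out of range, since guessing too high is worse than guessing too low.
 */
unsigned response_speed(const speed_map_cache *cache,
                        uint32_t current_generation,
                        unsigned local_node,
                        unsigned remote_node)
{
    assert(cache != NULL);

    if (cache->generation != current_generation) {
        return SPEED_S100;              /* map predates the last bus reset */
    }
    if (local_node >= SPEED_MAP_NODES || remote_node >= SPEED_MAP_NODES) {
        return SPEED_S100;              /* not a valid node pair */
    }
    /* keep only the bits that fit the 3-bit OHCI_spd field */
    return cache->speed_code[64u * local_node + remote_node] & 0x7u;
}
```

In a real driver the cache would be filled from the host CSR contents after every bus reset, and the node numbers would come from the local node address and the source ID of the request packet. The stale-generation fallback is a design choice of this sketch, not something the bus driver requires.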

typedef struct _MY_OHCI_ASYNC_PACKET {
    USHORT OHCI_Reserved3:4;
    USHORT OHCI_tCode:4;
    USHORT OHCI_rt:2;
    USHORT OHCI_tLabel:6;
    union {
        struct {
            NODE_ADDRESS OHCI_Destination_ID; // 1st quadlet
        } Rx;
        struct {
            USHORT OHCI_spd:3;                // 1st quadlet
            USHORT OHCI_Reserved2:4;
            USHORT OHCI_srcBusId:1;
            USHORT OHCI_Reserved:8;
        } Tx;
    } u;
    union {
        USHORT OHCI_Offset_High;
        struct {
            USHORT OHCI_Reserved2:8;
            USHORT OHCI_Reserved1:4;
            USHORT OHCI_Rcode:4;
        } Response;
    } u2;
    union {
        struct {
            NODE_ADDRESS OHCI_Destination_ID; // 2nd quadlet
        } Tx;
        struct {
            NODE_ADDRESS OHCI_Source_ID;      // 2nd quadlet
        } Rx;
    } u1;
    ULONG OHCI_Offset_Low;                    // 3rd quadlet
    union {
        struct {
            USHORT OHCI_Extended_tCode;
            USHORT OHCI_Data_Length;          // 4th quadlet
        } Block;
        ULONG OHCI_Quadlet_Data;              // 4th quadlet
    } u3;
} MY_OHCI_ASYNC_PACKET, *PMY_OHCI_ASYNC_PACKET;

Handling the response packet storage:

After getting the request packet, and checking that a few things are in order, our notification routine allocates a response context structure. The format of the response context structure is shown below.

typedef struct _RESPONSE_CONTEXT {
    PIO_WORKITEM         pResponseWorkItem;
    KEVENT               ResponseEvent;
    MY_OHCI_ASYNC_PACKET ResponsePacket;
} RESPONSE_CONTEXT, *PRESPONSE_CONTEXT;

It should be noted that the response context structure and the notification routine used here are just one example of how a driver might provide a response packet. The purpose of the RESPONSE_CONTEXT structure is really to provide two key things to the driver: the response packet storage and the response event. The allocation of non-paged response packet storage is left to the device driver when handling its own response packets. The driver is responsible for cleanup of this response packet storage as well. The cleanup is what necessitates the other members of our context structure. The driver provides a response event that is passed to the 1394 bus driver along with the response packet in the NOTIFICATION_INFO structure. The 1394 bus driver signals the response event when it is done with the response packet, letting the driver know that it can release the response packet storage. In our example, we allocate a work item along with the response packet and response event, which we use to wait on the response event and to clean up our allocation. Again, our example is definitely not, nor is it intended to be, the most efficient solution for handling response packets.

The allocation of the response context structure in the code above may look a bit strange. We check the transfer code, or "tcode", of the request packet to see what type of request we have received, and based on the transfer type we allocate different sizes for the structure. The reason for the different sizes is the way the 1394 bus packages asynchronous transfer data with the asynchronous transfer packets. So-called quadlet transfers, or transfers of four bytes (32 bits), store the transfer data in the u3.OHCI_Quadlet_Data field of the packet structure. The u3.OHCI_Quadlet_Data field is part of a union with a block structure. For non-quadlet transfers, that is, transfers of more or less than four bytes, the transfer data length is stored in the OHCI_Data_Length field of the block structure. The transfer data for the non-quadlet, or block transfer, case is placed in the packet storage immediately following the packet structure. So, an asynchronous packet may be greater in size than the size of the MY_OHCI_ASYNC_PACKET structure. Thus, it is critical that the response packet be the last item stored in the RESPONSE_CONTEXT structure above. For block transfer types, our response context allocation code pads the response context storage with the data size of the requested transfer obtained from the request packet. Padding the storage ensures that we have enough storage space for the response packet and its associated transfer data.
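The sizing rule can be sketched in portable C as follows. This is an illustration with simplified stand-in types, not the driver's actual structures: async_packet, response_context, and response_context_size are hypothetical names, and the tcode values are the standard IEEE 1394 transaction codes. The sketch also reserves four bytes for quadlet reads, because the notification routine writes the quadlet read data just past the packet structure rather than into the packet's quadlet field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IEEE 1394 transaction codes for the requests this range handles. */
enum {
    TCODE_WRITE_REQUEST_QUADLET = 0x0,
    TCODE_WRITE_REQUEST_BLOCK   = 0x1,
    TCODE_READ_REQUEST_QUADLET  = 0x4,
    TCODE_READ_REQUEST_BLOCK    = 0x5
};

/* Simplified stand-ins for MY_OHCI_ASYNC_PACKET and RESPONSE_CONTEXT. */
typedef struct {
    uint8_t header[16];        /* four quadlets of packet header */
} async_packet;

typedef struct {
    void        *work_item;    /* stand-in for the cleanup work item */
    int          event;        /* stand-in for the response event */
    async_packet packet;       /* must stay last: payload follows it */
} response_context;

/* Bytes to allocate so the payload fits directly behind the packet. */
size_t response_context_size(unsigned tcode, uint16_t data_length)
{
    switch (tcode) {
    case TCODE_WRITE_REQUEST_BLOCK:
    case TCODE_READ_REQUEST_BLOCK:
        /* block transfers carry data_length bytes after the header */
        return sizeof(response_context) + data_length;
    case TCODE_READ_REQUEST_QUADLET:
        /* the quadlet read data is also placed after the header */
        return sizeof(response_context) + sizeof(uint32_t);
    default:
        /* quadlet writes need no trailing payload in the response */
        return sizeof(response_context);
    }
}
```

Keeping the packet as the last member is what makes the single allocation work: everything past the end of the structure belongs to the payload.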

Handling the data:

The next thing to notice in our notification routine involves the handling of the data buffers themselves. As already noted, our address range can receive multiple transfer types. Write transfers require us to copy data out of the request packet. Read transfers require us to copy data into the response packet. In addition, as we mentioned a moment ago, where we store the data differs for quadlet and block type transfers. So, for TCODE_WRITE_REQUEST_QUADLET type transfers we copy the transfer data out of the u3.OHCI_Quadlet_Data field of the request packet. For TCODE_WRITE_REQUEST_BLOCK type transfers the write data is located in the request packet storage just following the request packet itself. For read requests we must handle the process a bit differently. For TCODE_READ_REQUEST_BLOCK type transfers we copy our response data to the response packet storage buffer immediately following the response packet and everything works as expected. For TCODE_READ_REQUEST_QUADLET type transfers, however, copying our response data to the u3.OHCI_Quadlet_Data field of the response packet does not seem to work. Instead, as shown in the code, we treat the quadlet data for reads just like we treat block data. Then, everything seems to work fine. Keep in mind that if the driver does not handle its own response packets, the bus driver takes care of data handling automagically using the driver-provided backing store MDL(s). In that case, the bus driver fills in the ulOffset member of the NOTIFICATION_INFO structure to let the driver know where data in the MDL has been accessed.
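To make the read path concrete, here is a small user-mode sketch of placing read data directly behind a 16-byte packet header, which is how the routine treats both quadlet and block reads. The function build_read_response and its parameters are hypothetical. Unlike the notification routine above, the sketch also checks the requested offset and length against the send buffer, which seems a sensible addition for any real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 16u   /* four quadlets, like sizeof(MY_OHCI_ASYNC_PACKET) */

/*
 * Copies `length` bytes starting at `offset` in the send buffer to the
 * storage just after the packet header in `response`. Returns the total
 * response size (header plus data), or 0 if the request falls outside
 * the send buffer or the response storage is too small.
 */
size_t build_read_response(uint8_t *response, size_t response_capacity,
                           const uint8_t *send_buffer, size_t send_size,
                           size_t offset, size_t length)
{
    assert(response != NULL && send_buffer != NULL);

    /* reject requests that run past the end of the send buffer */
    if (offset > send_size || length > send_size - offset) {
        return 0;
    }
    /* reject requests that do not fit behind the header */
    if (response_capacity < HEADER_SIZE ||
        length > response_capacity - HEADER_SIZE) {
        return 0;
    }

    /* quadlet and block reads alike: data goes right after the header */
    memcpy(response + HEADER_SIZE, send_buffer + offset, length);
    return HEADER_SIZE + length;
}
```

For a quadlet read, `length` is simply four, and the returned size is what would be reported as the response length.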

IMPORTANT NOTE: Unfortunately, the developer must be aware of one more issue related to using the no backing store option with allocated address ranges. Read requests for data sizes larger than 128 bytes on 32-bit platforms, and 256 bytes on 64-bit platforms, are mishandled by the 1394 bus driver (up through Windows Server 2003) when the response packet is created by the driver. The data returned in the response packet is corrupted. There isn't a good workaround for this issue that I know of, aside from limiting read request sizes to 128 bytes, which probably means limiting allocated address ranges to 128 bytes in length. This issue has been reported to Microsoft, but as of yet no official resolution has been forwarded.
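If the requesting side is under your control, one way to live with the limit is to split a large read into requests of at most 128 bytes each. The arithmetic is sketched below; read_request_count and read_request_length are hypothetical helper names, and the sketch only computes the sizes without issuing any bus requests.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAFE_READ 128u   /* largest read size that avoids the problem */

/* Number of read requests needed to cover `total` bytes. */
size_t read_request_count(size_t total)
{
    return total / MAX_SAFE_READ + (total % MAX_SAFE_READ != 0 ? 1u : 0u);
}

/* Length of the index-th request (0-based), or 0 if index is out of range. */
size_t read_request_length(size_t total, size_t index)
{
    size_t start;

    if (index >= read_request_count(total)) {
        return 0;
    }
    start = index * MAX_SAFE_READ;
    /* every request is full sized except possibly the last one */
    return (total - start < MAX_SAFE_READ) ? total - start : MAX_SAFE_READ;
}
```

The offset of each request within the address range is simply the index multiplied by MAX_SAFE_READ.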

Summary:

An extremely powerful mechanism has been given to the 1394 Windows device driver developer in the ability to allocate ranges in a host controller node’s address space. The driver can make a host controller node's address space look like almost any hardware desired to remotely connected peripherals. Virtual 1394 devices can even exploit the address range mechanism to simulate hardware. These unique abilities, along with some of the other features of the 1394 bus mentioned before, give 1394 a strong leg up on some other serial bus technologies on Windows platforms. Hopefully, this article has helped guide the 1394 developer through some (certainly not all) of the dark and murky waters surrounding 1394, particularly the area of address range allocation. I look forward to providing more useful 1394 tips and tricks in upcoming articles. Please send me your feedback.

About the author:

Bill McKenzie has been developing system-level software for over six years, including over four years of experience developing device drivers for Windows platforms. His primary background and current efforts are in the development of software products targeted at Windows device driver developers. [Ed. note: Bill is one of the industry's leading experts on 1394 drivers, but he's too modest to say so himself. He's worked for several of the driver tools companies you've heard of.]

¹IEEE Standard 1394-1995 and IEEE Standard 1394a-2000, known collectively as 1394a, specify data rates up to 400 Mb/s, which is much higher than the 12 Mb/s maximum data rate specified in the USB 1.1 specification. While the USB 2.0 specification does detail data rates up to 480 Mb/s, slightly exceeding 1394a rates, IEEE Standard 1394b details potential data rates reaching up to 3200 Mb/s. However, 1394b is not yet official and thus not currently supported by Windows platforms, while USB 2.0 is supported. In practice, 1394 devices can typically outperform their USB 2.0 counterparts today, as USB 2.0 hardware and software have not yet matured.

²Although the singly linked list of ADDRESS_FIFO elements used with REQUEST_ALLOCATE_ADDRESS_RANGE is called a FIFO list everywhere in the DDK documentation, access to the list is controlled only by ExInterlockedPushEntrySList() and ExInterlockedPopEntrySList(), which actually makes the list entry usage order last in first out. So, the list is actually a LIFO list.
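The ordering is easy to see with a plain singly linked list that only pushes and pops at the head. The sketch below is ordinary user-mode C with hypothetical names (node, list_push, list_pop). It does not use the interlocked DDK routines, but the head-only access pattern is the same.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int          id;
} node;

/* Push onto the head of the list. */
void list_push(node **head, node *n)
{
    n->next = *head;
    *head = n;
}

/* Pop from the head of the list; returns NULL when the list is empty. */
node *list_pop(node **head)
{
    node *n = *head;
    if (n != NULL) {
        *head = n->next;
    }
    return n;
}
```

Pushing entries 1, 2, and 3 and then popping returns them as 3, 2, 1, so the most recently pushed element is always the first one used.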

³The astute reader may think here that the FIFO list spinlock would be a good candidate to use to synchronize access to the FIFO list. While this spinlock may seem reasonable, and it would be on 2000 and earlier platforms, some of the DDK-provided singly linked list helper functions like ExInterlockedPushEntrySList() have been updated on XP and later platforms to not use the associated list spinlock for efficiency reasons. These updates are bypassed in a driver build if _WIN2K_COMPAT_SLIST_USAGE is defined. A quick run of dumpbin on the XP (and later) version of 1394bus.sys indicates that the bus driver was not built using this compatibility define, which leaves us with no suitable synchronization.