
Co-location, fast networks, and high-speed NICs - optimizing your electronic trading stack

At Blackcore, we build ultra-fast, reliable servers that provide our clients in the electronic trading industry with an unrivaled competitive edge. We achieve this by overclocking server processors and RAM, and by using innovative liquid cooling. However, the server is only one part of a trading stack, so what comes next? Once you have got your hands on the fastest server available, what else can be optimized in your electronic trading stack to reduce latency?

Blackcore Technologies

- 8 min read

Co-location

Most trading applications require their electronic trading servers and the exchange execution gateway to be physically close together to ensure the lowest-latency path from the algorithm to the matching engine. This is typically referred to as co-location. Many exchanges provide direct co-location in the form of rack space or dedicated cages in their data centers. In addition, managed service providers offer a suite of additional services around physical rack space, typically including networking and ancillary services. Third-party data center providers can also offer shared data center space with proximity to one or several electronic trading exchanges.

When choosing optimal co-location facilities, trading firms typically consider:

Speed:

Most exchanges now provide largely normalized services to trading participants by equalizing cable lengths to rack space and cages. If co-locating with an exchange directly, a trading participant is likely to be on an even playing field with all other participants for the portion of latency governed by physical distance. If choosing to co-locate in a shared, non-exchange-controlled third-party facility, then physical proximity to the exchange handoff becomes an important factor to consider.

Cost:

Many managed service providers lease racks and cages from major exchanges so that they can spread the cost across many different clients, with connectivity bundled into the physical space. This model has grown in popularity due to the high costs of direct exchange co-location and networking, though while exchange costs are typically set at a premium, they are often regulated so that all trading participants pay the same rates. If renting rack space from a third-party data center provider, additional costs may include networking and comms circuits back to the exchange. Generally, the cost model for co-location space is based on the overall power consumption of a cage, rack, or indeed a partial rack.

Operations:

As many trading firms do not have a physical presence and operational teams near every trading venue, the ability to operate the trading platform with as little hands-on activity as possible is key. At Blackcore, we've developed a suite of tools designed to make remote management as seamless as possible. With our automation toolkit, clients can easily manage Blackcore servers from anywhere, allowing them to deploy hundreds of units just as effortlessly as one. However, in some cases, clients may still require someone on the ground at the data center for troubleshooting and maintenance tasks.

All three co-location models provide different levels of physical access and services, so it's worth carefully considering what is needed for a particular site and the services on offer before deciding where to rack the Blackcore server. Choosing where to place your server means weighing cost, operational capability, and how physical proximity to an exchange will impact your latency, and whether that latency matters to your trading application.

 

Networking

Once you have chosen where to host your Blackcore server, you need to choose how to connect it to an exchange and optimize those paths.

Connectivity:

If you're not co-located at the exchange where you wish to trade—whether due to budget constraints or because your strategy relies on data from multiple exchanges in different locations—wide area networking becomes a key consideration. Many fiber and service providers offer networking at various speeds, bandwidths, and latencies between most major trading data centers. In addition, for those who wish to truly optimize the paths between different points, several service providers offer microwave or millimeter-wave links between core trading sites, which bring a speed advantage between locations, though often at the expense of bandwidth and reliability. Once again, this becomes a careful balance between cost, reliability, and speed.
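As a rough illustration of why these faster links matter, here is a minimal back-of-envelope sketch of one-way propagation delay over fiber versus microwave. The route distances are hypothetical placeholders, and the refractive index of ~1.47 is a typical single-mode fiber value, not a property of any particular route:

```python
# Back-of-envelope comparison of one-way propagation delay over long-haul
# fiber versus a microwave link. The route distances below are illustrative
# placeholders, not measurements of any real path.

C_KM_PER_MS = 299.792            # speed of light in vacuum, km per millisecond
FIBER_INDEX = 1.47               # typical single-mode fiber refractive index

def fiber_delay_ms(route_km: float) -> float:
    """One-way delay over fiber, where light travels at roughly c / 1.47."""
    return route_km * FIBER_INDEX / C_KM_PER_MS

def microwave_delay_ms(path_km: float) -> float:
    """One-way delay over microwave, where propagation through air is close to c."""
    return path_km / C_KM_PER_MS

line_of_sight_km = 1000.0        # hypothetical straight-line distance between two sites
fiber_route_km = 1200.0          # hypothetical fiber route; cables rarely run in a straight line

print(f"fiber:     {fiber_delay_ms(fiber_route_km):.2f} ms one way")
print(f"microwave: {microwave_delay_ms(line_of_sight_km):.2f} ms one way")
```

The microwave path wins twice over: the signal travels faster through air than through glass, and the line-of-sight path is shorter than a real-world fiber route.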

Local networking:

If you are physically co-located with an exchange, then local networking becomes a key consideration. Assuming you have more than a single server for trading, some form of local networking will be required, unless, of course, you have an unlimited budget to purchase an exchange handoff for each server! Firms have also been known to leverage custom-length cables within their racks to optimize physical network connectivity, or to use technologies such as hollow-core fiber to gain a further edge. For typical fiber cabling, the rule of thumb is that each meter of fiber adds roughly 5 nanoseconds of latency (hollow-core fiber brings this closer to the vacuum figure of about 3.3 nanoseconds per meter).
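That rule of thumb follows directly from the speed of light in glass; here is a minimal sketch, assuming a typical single-mode refractive index of about 1.47:

```python
# The ~5 ns/m rule of thumb comes from the speed of light in glass:
# light in standard single-mode fiber travels at roughly c / 1.47.

SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_INDEX = 1.47  # typical single-mode fiber refractive index

ns_per_meter = FIBER_INDEX / SPEED_OF_LIGHT_M_PER_S * 1e9
print(f"{ns_per_meter:.2f} ns per meter of fiber")  # ~4.90 ns

for length_m in (1, 3, 10, 30):  # illustrative cable lengths within a rack or cage
    print(f"{length_m:>3} m  ->  {length_m * ns_per_meter:6.1f} ns")
```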

The core purpose of a network switch in a trading stack between a server and an exchange is to be able to share that connectivity between multiple trading servers without introducing a large amount of latency. Typically, there are two types of switches to do this:

Layer 2/3 switches

These become a “Swiss army knife” for most firms and can be leveraged for “north <-> south” connectivity between the exchange and trading servers, but also for “east <-> west” traffic from server to server, management traffic, and other ancillary network services such as PTP for timing, timestamping, or packet aggregation, depending on the model being used. These switches typically introduce a few hundred nanoseconds of latency, depending on the model and feature set required.

Layer 1 switches

While referred to as Layer 1 switches, the term is technically a misnomer: no switching happens on Layer 1 devices, only signal replication, usually in a 1:1 or 1:N configuration. Where Layer 1 becomes interesting in the path between an exchange and a trading server is when it is paired with an on-board FPGA programmed to perform the function of sharing the exchange connectivity (which is N:1), typically referred to as muxing. Layer 1 + FPGA-based ‘switches’ typically introduce latency in the double-digit nanoseconds, depending on the model and configuration; they achieve this, compared to Layer 2/3 switches, by optimizing for the very specific task of sharing an exchange connection and removing all other functionality. By its nature, using L1 between Blackcore servers and exchanges tends to be a slightly more complex configuration, with advanced networking knowledge required.

For local networking, the decision comes down to a trade-off between complexity, cost, and functionality.
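To make the latency side of that trade-off concrete, here is a small sketch of an illustrative one-way budget from the exchange handoff to a server. The cable length and switch figures are assumptions drawn from the order-of-magnitude ranges above, not measurements of any specific device:

```python
# Illustrative one-way latency budget from the exchange handoff to a server,
# comparing the two switch types above. All figures are assumed order-of-magnitude
# values from the discussion, not measurements of any specific device.

FIBER_NS_PER_M = 4.9  # standard single-mode fiber, per the rule of thumb above

def path_latency_ns(cable_m: float, switch_ns: float) -> float:
    """Propagation delay over the cabling plus the switch's forwarding latency."""
    return cable_m * FIBER_NS_PER_M + switch_ns

l23_ns = path_latency_ns(cable_m=20, switch_ns=400)  # assumed L2/3 switch: a few hundred ns
l1_ns = path_latency_ns(cable_m=20, switch_ns=30)    # assumed L1 + FPGA mux: double-digit ns

print(f"via Layer 2/3 switch:  {l23_ns:.0f} ns one way")
print(f"via Layer 1 + FPGA:    {l1_ns:.0f} ns one way")
print(f"difference:            {l23_ns - l1_ns:.0f} ns")
```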

 

Server NICs & FPGAs

Now that we’ve optimized the placement of the Blackcore server and determined the best communication method with the exchange, we're at the point of connecting the fiber, or DAC (Direct Attached Copper). The question is: what's the best way to link it to a Blackcore server? There are three options when it comes to connecting a server to a network, each with its pros and cons; we’ll now review each in no particular order.

Network Interface Cards (NICs)

The goal of a NIC is to communicate between a network switch and the processor on a server, which in electronic trading networks will typically be over Ethernet.

A NIC will typically be built around an ASIC (application-specific integrated circuit) and perform the task of handling incoming packets and presenting them to the kernel for delivery to a software application in user space for processing. Its tasks typically include signal conversion, error detection/correction, and buffering data so that nothing is lost between the kernel handling one network packet and the next packet arriving. Over time, and primarily driven by trading applications, the ASICs on some of these NICs were optimized for high performance by reducing buffering, optimizing kernel interrupts, or indeed bypassing the kernel entirely to move the network stack into user space, shortening the path between the physical handoff at the NIC and the software application stack. Some of these more advanced functions require programming against specific APIs or libraries in the software application, and so are not always a “drop-in” solution.
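As a small taste of this kind of tuning, the sketch below uses Linux's SO_BUSY_POLL socket option, which asks the kernel to busy-poll the device queue rather than wait for an interrupt. This is a mild, widely available knob rather than full kernel bypass, which requires vendor-specific libraries such as Onload or frameworks such as DPDK. The port number is hypothetical, and Python's socket module does not expose the option by name, so the raw Linux option number is used:

```python
# Enable Linux busy-polling on a UDP socket: the kernel polls the device queue
# on reads instead of sleeping until an interrupt. A mild step toward the
# latency techniques above, not full kernel bypass. May require CAP_NET_ADMIN.

import socket

SO_BUSY_POLL = 46  # Linux-specific option number; not named in Python's socket module

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, SO_BUSY_POLL, 50)  # busy-poll up to 50 microseconds per read
sock.bind(("0.0.0.0", 12345))  # hypothetical market-data feed port

while True:
    data, addr = sock.recvfrom(2048)  # under load, returns sooner than interrupt-driven receive
    # hand the packet to the strategy here
```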

NICs come in different configurations when it comes to connectivity handoffs, port speeds, and performance, with latencies ranging from hundreds of nanoseconds to microseconds.

For Blackcore servers, there are a range of NICs that are compatible with each of our models, and our team can help choose the one that is right for your application.

Field Programmable Gate Arrays (FPGAs)

A Field Programmable Gate Array is like an ASIC, but rather than being application-specific, it may be reprogrammed “in the field” to perform certain functions. We’ve discussed FPGAs in detail in our previous article, which you can read here. The advantage an FPGA can bring within a trading stack is hosting logic that resides entirely on the FPGA, so the core trading logic never has to pass through the PCIe bus to the kernel and user space. Software-based applications can also work alongside an FPGA by loading a pre-coded strategy to which the application only updates key parameters as market conditions change, allowing a hybrid approach. FPGAs are a common choice for high-frequency trading due to their predictable, low-latency characteristics.

Programming FPGAs requires a different kind of expertise than software development, and there are limits to the types of logic that are optimal for FPGA deployment. However, this has become easier in recent years as various development kits and frameworks have been made available to simplify common functions (for example, TCP, UDP, or PCIe stacks). Most applications that leverage Blackcore servers with an FPGA will do so in this kind of hybrid scenario, which takes advantage of the strengths of both models: high-speed hardware reactions combined with high-speed software analysis.
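A sketch of what that hybrid split can look like from the software side is shown below. Many vendor toolchains expose FPGA registers to software as a memory-mapped region (for instance via Linux's UIO framework); the device path, register offsets, and parameter names here are hypothetical stand-ins for what a vendor's documentation would define:

```python
# Hybrid model sketch: the trading logic lives on the FPGA, and software only
# updates its parameters via memory-mapped registers. The device path and
# register offsets are hypothetical; a real design takes them from the
# vendor's documentation.

import mmap
import os
import struct

UIO_DEVICE = "/dev/uio0"     # hypothetical UIO device exposing the FPGA's register block
MAP_SIZE = 4096
PRICE_THRESHOLD_REG = 0x10   # hypothetical register offset: strategy price threshold
MAX_ORDER_QTY_REG = 0x14     # hypothetical register offset: maximum order size

fd = os.open(UIO_DEVICE, os.O_RDWR | os.O_SYNC)
regs = mmap.mmap(fd, MAP_SIZE)

def write_reg(offset: int, value: int) -> None:
    """Update a 32-bit FPGA register; the hardware reacts to market data on its own."""
    regs[offset:offset + 4] = struct.pack("<I", value)

# Software-side analysis decides the parameters; the FPGA handles the fast path.
write_reg(PRICE_THRESHOLD_REG, 101_250)  # e.g. a price in fixed-point ticks
write_reg(MAX_ORDER_QTY_REG, 500)
```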

When leveraging an FPGA there are optimized commercial solutions that can perform the network stack functions in double-digit nanoseconds within the FPGA, but this does come at the expense of additional complexity compared to typical software development.

FPGA-based Network Interface Cards (a.k.a. SmartNICs)

The line between NIC and FPGA has become somewhat blurred from a functionality perspective. A SmartNIC is typically vendor-provided firmware on an FPGA, which provides an optimized path from the network through to the application stack. This can take the form of UDP or TCP offload and a PCIe interface that presents network packets directly to the application, or the more traditional path via the kernel. In some ways, this is a hybrid of the previous two options and, depending on configuration, will have a latency somewhere between the two.

Historically, SmartNICs have interfaced with the CPU over the PCIe bus, but a new generation of SmartNIC technology leveraging CXL is starting to emerge, which seems likely to reduce latency compared to the path over PCIe. CXL support is available in most modern motherboards and CPUs, including the components used in the Blackcore SPR range.

While NICs can range from as low as $20 through to FPGA cards at $20,000+, cost certainly becomes a factor in how you choose to interface your Blackcore server with the network. In addition, development expertise, latency profile, and complexity become key decision factors.

 

To conclude...

Choosing each component of an electronic trading stack is not trivial; there are many decision points and factors that go into it. With an unlimited budget, it would be easy to choose the lowest-latency option for each item in the stack, but the development and optimization of those components can take years from scratch. The lowest-latency offerings for the components discussed above are usually higher on the price scale than the alternatives, and they typically achieve that lower latency because convenience features have been removed in favor of performance. The removal of such features comes with a level of complexity, which in turn requires more advanced knowledge of lower-level components, such as network stacks or operating system kernels, and a deep understanding of how things work, rather than the simplicity that normally comes with the more generic, and typically cheaper, wide-ranging components.

In this article, we have covered the core components of a latency-optimized trading stack at a general level. However, we have barely touched on the vast variety of other technology choices, such as redundancy, timestamping & time synchronization, monitoring & management, compliance & data capture, and risk management, to name a few. Each of these technical decisions can determine the success or failure of a trading strategy before the strategy itself is even considered.

Luckily, Blackcore servers deliver an exceptional edge through server performance, quality, and reliability. You can always count on us to be innovating behind the scenes to ensure enterprise-grade configuration, monitoring, and management. Combined with our experience and 100% focus on the electronic trading industry, this ensures that at least one decision for your trading stack is straightforward, with a server that integrates seamlessly with the rest of your components.

 
