An introduction to telemetry

Telemetry is a means by which data is collected from remote or inaccessible devices or, to be more accurate, a means by which those devices transmit data to receivers. Many devices in the Edge network operate in a similar fashion: they sit in all manner of inaccessible locations and operate on an egress-only basis. This article introduces how we collect measurements from these devices.

🔗Behind the firewall

The Edge Network comprises three device types: Stargates, Gateways, and Hosts. Hosts operate on such a variety of networks, with such varied performance and restrictions, that we have to be able to receive measurements from them as and when they are online and able to report, regardless of limitations such as firewalls or patchy connectivity.

All Hosts run a background process called the Telemetry Client, which sidesteps these restrictions by sending data to the network’s telemetry receivers over HTTPS. Both HTTP on port 80 and HTTPS on port 443 are generally open in firewall egress rules, which is also the basis on which the network runs its gRPC connections between Hosts and Gateways.
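To make the egress-only idea concrete, here is a minimal sketch of how a client might package a report as an outbound HTTPS POST. It is an illustration only, not the Telemetry Client's actual code: the receiver URL, the JSON layout, and the function names are all assumptions.

```python
import json
import urllib.request

# Hypothetical receiver URL; the article does not name the real endpoint.
RECEIVER_URL = "https://telemetry.example.com/v1/metrics"


def build_report_request(device_id: str, metrics: dict) -> urllib.request.Request:
    """Build an HTTPS POST carrying one JSON-encoded measurement payload."""
    # The payload layout is an assumption made for this sketch.
    body = json.dumps({"device": device_id, "metrics": metrics}).encode("utf-8")
    return urllib.request.Request(
        RECEIVER_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_report(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request (outbound on port 443) and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Because the Host always initiates the connection, nothing needs to be opened inbound on its firewall; the receiver never has to reach the device.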

🔗Collecting the metrics

The Telemetry Client collects basic system measurements such as load averages, memory usage, disk space utilisation and network activity. It also collects some other less interesting statistics such as system fork count, active process count, and established TCP connection counts.
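As a rough illustration of this kind of collection, the sketch below gathers load averages and disk utilisation using only the Python standard library on a Unix-like system. The field names are hypothetical, and memory, network, and process statistics are left out because they need platform-specific sources; the real client may gather its data quite differently.

```python
import os
import shutil
import time


def collect_metrics(path: str = "/") -> dict:
    """Collect a small snapshot of system measurements (Unix-like systems only)."""
    load1, load5, load15 = os.getloadavg()  # 1, 5 and 15 minute load averages
    disk = shutil.disk_usage(path)          # total, used and free space in bytes
    # Field names below are illustrative, not the Telemetry Client's schema.
    return {
        "timestamp": int(time.time()),
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_used_ratio": disk.used / disk.total,
    }
```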

This data is then packaged up and sent securely to the network’s telemetry receiver servers, where it is verified and stored alongside the metrics of all other devices. The receivers sit outside of the main network infrastructure, which is critical to remaining accessible during possible outages.

If a device is running but not connected to the Internet, no data is recorded for that device. If a device has a patchy connection, only the metrics it successfully sends are stored. Gaps in a device’s data therefore reflect its real connectivity, which allows us to monitor both the network and individual devices accurately, with a continuous history of network conditions.
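One way to get this behaviour is a best-effort loop that tries each sample once and drops it on failure. The sketch below assumes the client neither retries nor buffers failed sends, which the article does not confirm; `send` is a hypothetical callable standing in for the HTTPS request.

```python
from typing import Callable, Iterable, List


def report_best_effort(samples: Iterable[dict], send: Callable[[dict], None]) -> List[dict]:
    """Try to send each sample once and return the ones that were delivered.

    Assumption for this sketch: a failed send is dropped rather than retried
    or buffered, so the receiver only ever stores what actually arrived.
    """
    delivered = []
    for sample in samples:
        try:
            send(sample)
        except OSError:
            # OSError covers connection failures and timeouts; skip this sample.
            continue
        delivered.append(sample)
    return delivered
```

With a connection that fails on every other sample, only the samples that got through are kept:

```python
def flaky_send(sample: dict) -> None:
    if sample["n"] % 2:
        raise ConnectionError("link down")


kept = report_best_effort([{"n": i} for i in range(4)], flaky_send)
# kept == [{"n": 0}, {"n": 2}]
```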

🔗Processing the data

Once the data is received by the network’s telemetry receiver servers, it is verified to ensure that it is from the correct device and has not been modified along the way. To make this possible, each payload is signed with a per-device session secret, and the receiver checks that signature.
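A common way to sign a payload with a shared secret is an HMAC. The sketch below uses HMAC-SHA256 from the Python standard library; the choice of algorithm, the envelope layout, and the function names are assumptions, as the article does not say which scheme the telemetry service uses.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, session_secret: bytes) -> dict:
    """Serialise the payload and attach an HMAC-SHA256 signature (assumed scheme)."""
    # Sorted keys and fixed separators give a stable byte string to sign.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(session_secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_payload(envelope: dict, session_secret: bytes) -> bool:
    """Recompute the signature on the receiver and compare in constant time."""
    expected = hmac.new(
        session_secret, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```

A payload altered in transit, or one signed with another device's secret, fails the check, which covers both of the properties described above.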

The data is then stored in Prometheus, from which it is fed into a number of other services. One of these is Grafana, which we use for our internal interfaces and monitoring. At any one point we are able to see the total load and average load of the network, traffic and bandwidth statistics, and much more besides.
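The total and average load figures are simple aggregations over the latest value reported by each device. In Prometheus this would normally be expressed with its `sum` and `avg` aggregation operators; the plain Python sketch below shows the same arithmetic, with a hypothetical function name and input shape.

```python
from typing import Dict, Tuple


def network_load(latest_load_by_device: Dict[str, float]) -> Tuple[float, float]:
    """Return (total, average) load across all devices that have reported."""
    total = sum(latest_load_by_device.values())
    count = len(latest_load_by_device)
    average = total / count if count else 0.0  # avoid dividing by zero
    return total, average
```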

Image

The screencap above shows a few metrics for a small portion of the testnet over a 24-hour period, with data sampled every 5 seconds.

🔗What’s next?

With so much data, the possibilities for using and visualising it are nearly endless. As you may have seen, we recently launched the first iteration of the Edge Explorer, which shows you some cool statistics such as the number of online Stargates, Gateways, and Hosts, the size of the edge cache and edge storage, and a list of devices.

Using the data we collect from the telemetry service, we’re going to be able to display the cumulative network load, memory utilisation, and storage utilisation in real time.

In addition to this, we’ve already seen some great benefits in detecting and debugging issues on the testnet. The overview that the telemetry service provides us really is invaluable.

Thanks for reading! And stay tuned, as I’ll be writing much more about this stuff.
