Streaming Telemetry and Monitoring

This article details how I implemented streaming telemetry and monitoring on my network using SNMP, Syslog, gRPC, NETCONF, and SQLite. Below, I explain the tools, setup, and scripts used for real-time data collection, logging, and storage.

1. SNMP Configuration

SNMP was configured on the Arista cEOS devices so that CPU utilization metrics can be polled periodically and link traps are sent to the trap receiver.

snmp-server local-interface Ethernet2.100
snmp-server community private rw
snmp-server community public ro
snmp-server host 192.168.100.2 version 2c public
snmp-server enable traps snmp link-down
snmp-server enable traps snmp link-up

I used a monitor_cpu.sh script to poll the OID 1.3.6.1.2.1.25.3.3.1.2 (hrProcessorLoad from HOST-RESOURCES-MIB) for CPU utilization and store the results in the cpu_utilization table of a SQLite database (logs.db).
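The contents of monitor_cpu.sh are not shown here, so the following is only a minimal Python sketch of the same poll-and-store idea. It assumes net-snmp's snmpwalk is installed and called with -Oqv (values only); the column names and the choice to average the per-processor values are assumptions, not the actual schema.

```python
import sqlite3
import subprocess
import time

CPU_OID = "1.3.6.1.2.1.25.3.3.1.2"  # hrProcessorLoad, one row per processor


def poll_cpu(host, community="public"):
    """Run snmpwalk against one device and return its raw text output."""
    result = subprocess.run(
        ["snmpwalk", "-v2c", "-c", community, "-Oqv", host, CPU_OID],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return result.stdout


def parse_cpu_loads(snmpwalk_output):
    """Parse `snmpwalk -Oqv` output: one integer percentage per processor."""
    return [int(tok) for tok in snmpwalk_output.split() if tok.isdigit()]


def store_cpu(conn, host, loads, now=None):
    """Store the average load across processors (hypothetical schema)."""
    if not loads:
        raise ValueError("no CPU values returned")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cpu_utilization ("
        "timestamp REAL, device TEXT, cpu_percent REAL)"
    )
    avg = sum(loads) / len(loads)
    ts = time.time() if now is None else now
    conn.execute("INSERT INTO cpu_utilization VALUES (?, ?, ?)", (ts, host, avg))
    conn.commit()
    return avg

# Example use against a live device (not executed here):
#   conn = sqlite3.connect("logs.db")
#   store_cpu(conn, "192.168.100.5", parse_cpu_loads(poll_cpu("192.168.100.5")))
```

Keeping the parsing separate from the snmpwalk call makes the storage logic easy to check without a device on the network.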

2. SNMP Traps

I configured SNMP traps to report link state changes. Traps were captured using the capture_snmp_traps.sh script, which listens on UDP port 162 and stores each trap in the snmp_traps table of the SQLite database.

The monitor_cpu service was enabled and started through systemd:

systemctl enable monitor_cpu
systemctl start monitor_cpu
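The trap capture described in this section can be sketched in Python as a plain UDP listener. This is not the capture_snmp_traps.sh script itself: it stores each datagram as raw hex rather than decoding the SNMP PDU, and the function names and table columns are hypothetical.

```python
import socket
import sqlite3
import time


def open_trap_socket(bind_addr="0.0.0.0", port=162):
    """Open a UDP socket for incoming traps (port 162 usually needs root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def store_trap(conn, source_ip, payload, now=None):
    """Insert one trap as raw hex; decoding the PDU is left out of this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snmp_traps ("
        "timestamp REAL, source_ip TEXT, raw_hex TEXT)"
    )
    ts = time.time() if now is None else now
    conn.execute(
        "INSERT INTO snmp_traps VALUES (?, ?, ?)", (ts, source_ip, payload.hex())
    )
    conn.commit()


def capture_traps(sock, conn, max_traps=None):
    """Receive datagrams and store them; loops forever when max_traps is None."""
    count = 0
    while max_traps is None or count < max_traps:
        payload, (source_ip, _port) = sock.recvfrom(65535)
        store_trap(conn, source_ip, payload)
        count += 1
    return count

# Example use (not executed here):
#   capture_traps(open_trap_socket(), sqlite3.connect("logs.db"))
```

A production setup would more likely hand decoding to an existing trap daemon; the sketch only shows the listen-and-insert flow.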

3. Syslog Configuration

Syslog was enabled on devices to send critical logs to the NMAS server. I configured rsyslog on the server to sort logs into individual files based on the source IP.

# Sample rsyslog configuration
# "& stop" follows each rule so a matched message is not processed by later rules
if $fromhost-ip == '192.168.100.5' then /var/log/netman/192.168.100.5.log
& stop
if $fromhost-ip == '192.168.100.6' then /var/log/netman/192.168.100.6.log
& stop
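With more than a couple of devices, writing one rule per source IP by hand gets repetitive. As a small sketch, the rules could be generated from a list of device addresses; the function name and its parameters are illustrative and not part of the original setup.

```python
def rsyslog_rules(device_ips, log_dir="/var/log/netman"):
    """Build one per-device file rule, each followed by '& stop'."""
    lines = []
    for ip in device_ips:
        lines.append(f"if $fromhost-ip == '{ip}' then {log_dir}/{ip}.log")
        # Stop processing so the message is not also written by later rules.
        lines.append("& stop")
    return "\n".join(lines) + "\n"


rules = rsyslog_rules(["192.168.100.5", "192.168.100.6"])
print(rules, end="")
```

The output can be saved as a drop-in file in the rsyslog configuration directory and rsyslog restarted to pick it up.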

4. gRPC and NETCONF Telemetry

I enabled gNMI (over gRPC) and NETCONF (over SSH) on the devices for real-time telemetry streaming, which allowed interface statistics to be collected every second.

management api netconf
   transport ssh default

management api gnmi
   transport grpc default
      port 57400

I wrote an interface_stats.py script that polls the following details for each device:

1. Interface name
2. MTU
3. Speed
4. In packets
5. Out packets
6. Timestamp

Data is stored in the SQLite database under the interface_stats table.
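The interface_stats.py script is not reproduced here. As a rough sketch of the parse-and-store step, the code below assumes the device returns OpenConfig-style interface XML (for instance fetched with a NETCONF client such as ncclient) and extracts the fields listed above. The sample reply, function names, and table columns are assumptions made for illustration.

```python
import sqlite3
import time
import xml.etree.ElementTree as ET

# Hypothetical reply fragment in OpenConfig interfaces layout.
SAMPLE_REPLY = """
<interfaces xmlns="http://openconfig.net/yang/interfaces">
  <interface>
    <name>Ethernet1</name>
    <state>
      <mtu>1500</mtu>
      <counters><in-pkts>100</in-pkts><out-pkts>80</out-pkts></counters>
    </state>
    <ethernet xmlns="http://openconfig.net/yang/interfaces/ethernet">
      <state><port-speed>SPEED_1GB</port-speed></state>
    </ethernet>
  </interface>
</interfaces>
"""


def _to_int(text):
    """Convert element text to int, keeping None for missing leaves."""
    return int(text) if text is not None else None


def parse_interfaces(xml_text):
    """Extract name, MTU, speed, and packet counters per interface."""
    root = ET.fromstring(xml_text.strip())
    records = []
    # "{*}" matches the tag in any namespace (Python 3.8+).
    for iface in root.findall(".//{*}interface"):
        name = iface.findtext("{*}name")
        if name is None:
            continue
        records.append({
            "name": name,
            "mtu": _to_int(iface.findtext("{*}state/{*}mtu")),
            "speed": iface.findtext("{*}ethernet/{*}state/{*}port-speed"),
            "in_packets": _to_int(iface.findtext("{*}state/{*}counters/{*}in-pkts")),
            "out_packets": _to_int(iface.findtext("{*}state/{*}counters/{*}out-pkts")),
        })
    return records


def store_interface_stats(conn, device, records, now=None):
    """Insert one row per interface, all sharing the same timestamp."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interface_stats (timestamp REAL, device TEXT, "
        "interface TEXT, mtu INTEGER, speed TEXT, in_packets INTEGER, out_packets INTEGER)"
    )
    ts = time.time() if now is None else now
    conn.executemany(
        "INSERT INTO interface_stats VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(ts, device, r["name"], r["mtu"], r["speed"], r["in_packets"], r["out_packets"])
         for r in records],
    )
    conn.commit()
```

Calling parse_interfaces on each reply and passing the result to store_interface_stats once per second would approximate the collection interval described above.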

5. SQLite Integration

The SQLite database (logs.db) acts as the central store for all telemetry data:

- cpu_utilization: CPU usage metrics collected via SNMP.
- snmp_traps: Link change notifications captured as SNMP traps.
- interface_stats: Interface statistics collected via gRPC/NETCONF.
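Because packet counters are cumulative, the stored samples become more useful once they are turned into rates. The sketch below shows one way to query interface_stats and compute packets per second between consecutive samples. The column names are assumed for illustration, since the actual schema is not given in this article.

```python
import sqlite3

# Assumed layout of the interface_stats table (hypothetical column names).
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS interface_stats (timestamp REAL, device TEXT, "
    "interface TEXT, mtu INTEGER, speed TEXT, in_packets INTEGER, out_packets INTEGER)"
)


def packet_rates(conn, device, interface):
    """Return (timestamp, in_pps, out_pps) for each pair of consecutive samples."""
    rows = conn.execute(
        "SELECT timestamp, in_packets, out_packets FROM interface_stats "
        "WHERE device = ? AND interface = ? ORDER BY timestamp",
        (device, interface),
    ).fetchall()
    rates = []
    for (t0, in0, out0), (t1, in1, out1) in zip(rows, rows[1:]):
        dt = t1 - t0
        # Skip duplicate timestamps and counter resets (negative deltas).
        if dt <= 0 or in1 < in0 or out1 < out0:
            continue
        rates.append((t1, (in1 - in0) / dt, (out1 - out0) / dt))
    return rates
```

The same pattern applies to cpu_utilization, for example averaging the stored values per device over a time window.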

Key Takeaways

1. Integrated SNMP, Syslog, and gRPC/NETCONF for comprehensive monitoring.
2. Automated data collection and storage with shell and Python scripts.
3. Utilized SQLite for organizing and querying telemetry data efficiently.

Next Steps

In the next article, I will explore integrating Prometheus and Grafana to visualize this telemetry data and generate meaningful insights. Stay tuned!
