13. Streaming data from Arduino

import time

import numpy as np
import pandas as pd

import serial
import serial.tools.list_ports

import bokeh.plotting
import bokeh.io
bokeh.io.output_notebook()

As we will do whenever we connect to Arduino using Python, we begin this notebook with utility functions. (Maybe I should have put these in a package, but I leave them like this since you may want to modify them to match whatever sketch is loaded onto your Arduino Uno.)

def find_arduino(port=None):
    """Get the name of the port that is connected to Arduino."""
    if port is None:
        ports = serial.tools.list_ports.comports()
        for p in ports:
            if p.manufacturer is not None and "Arduino" in p.manufacturer:
                port = p.device
    return port

def handshake_arduino(
    arduino, sleep_time=1, print_handshake_message=False, handshake_code=0
):
    """Make sure connection is established by sending
    and receiving bytes."""
    # Close and reopen
    arduino.close()
    arduino.open()

    # Chill out while everything gets set
    time.sleep(sleep_time)

    # Set a long timeout to complete handshake
    timeout = arduino.timeout
    arduino.timeout = 2

    # Read and discard everything that may be in the input buffer
    _ = arduino.read_all()

    # Send request to Arduino
    arduino.write(bytes([handshake_code]))

    # Read in what Arduino sent
    handshake_message = arduino.read_until()

    # Send and receive request again
    arduino.write(bytes([handshake_code]))
    handshake_message = arduino.read_until()

    # Print the handshake message, if desired
    if print_handshake_message:
        print("Handshake message: " + handshake_message.decode())

    # Reset the timeout
    arduino.timeout = timeout

Problems with on-demand data

In many applications, we want to push a button and then have Arduino respond by sending us some data. This is the case when you build your spectrophotometer. You will put in the cuvette, press a button, and then get the measurement. The timing of the measurement, certainly down to the millisecond, is not important. In that case, ask-and-receive, like we did in the last lesson, works fine.

In other applications, though, we want a steady stream of data, and we want it at well-defined time intervals. In this case, the variation in time between samples that we saw in the ask-and-receive style of the last lesson can be problematic. If our time interval between samples is long, say hundreds of milliseconds or more, then it is not really a problem, but even with 20 ms between samples, we already saw that we can be pretty far off.
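Part of this variation comes from the Python side: `time.sleep()` pauses for roughly the requested time at minimum, and the operating system decides when the interpreter actually resumes. The snippet below is a small illustration that needs no Arduino. The function name `measure_sleep_intervals` is made up for this example, and the numbers it prints will differ from machine to machine.

```python
import time

import numpy as np


def measure_sleep_intervals(n=50, delay=0.02):
    """Return the measured duration, in ms, of `n` calls to time.sleep(delay)."""
    intervals_ms = np.empty(n)
    for i in range(n):
        start = time.perf_counter()
        time.sleep(delay)
        intervals_ms[i] = (time.perf_counter() - start) * 1000
    return intervals_ms


intervals_ms = measure_sleep_intervals()
print("requested: 20.00 ms")
print(f"min: {intervals_ms.min():.2f} ms, max: {intervals_ms.max():.2f} ms")
```

Any spread between the minimum and maximum adds directly to the time between requests, before serial communication is even considered.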

A better option is to have Arduino do all the timing and then automatically send data to your computer over serial communication. That is, data streams from the board and is constantly collected by the Python interpreter. In this lesson, we will learn how to collect streaming data from Arduino. We will use the same setup as the previous lesson, shown below.

Arduino data transfer schematic

Follow-along exercise 10: Streaming data

Our sketch is more involved this time, since we are going to have richer communications with Arduino. We want to turn streaming on and off. A convenient way to do this, while still preserving access to the on-demand way of receiving data we set up in the previous lessons, is to have two data acquisition (DAQ) modes: on-request and stream. Streaming mode has another parameter, the delay between acquisitions, so we need to allow the user to input the delay (in milliseconds).

const int voltagePin = A0;

const int HANDSHAKE = 0;
const int VOLTAGE_REQUEST = 1;
const int ON_REQUEST = 2;
const int STREAM = 3;
const int READ_DAQ_DELAY = 4;

// Initially, only send data upon request
int daqMode = ON_REQUEST;

// Default time between data acquisition is 100 ms
int daqDelay = 100;

// String to store input of DAQ delay
String daqDelayStr;

// Keep track of last data acquisition for delays
unsigned long timeOfLastDAQ = 0;

unsigned long printVoltage() {
  // Read value from analog pin
  int value = analogRead(voltagePin);

  // Get the time point
  unsigned long timeMilliseconds = millis();

  // Write the result
  if (Serial.availableForWrite()) {
    String outstr = String(String(timeMilliseconds, DEC) + "," + String(value, DEC));
    Serial.println(outstr);
  }

  // Return time of acquisition
  return timeMilliseconds;
}

void setup() {
  // Initialize serial communication (baud rate must match the Python side)
  Serial.begin(115200);
}

void loop() {
  // If we're streaming
  if (daqMode == STREAM) {
    if (millis() - timeOfLastDAQ >= daqDelay) {
      timeOfLastDAQ = printVoltage();
    }
  }

  // Check if data has been sent to Arduino and respond accordingly
  if (Serial.available() > 0) {
    // Read in request
    int inByte = Serial.read();

    // If data is requested, fetch it and write it, or handshake
    switch(inByte) {
      case VOLTAGE_REQUEST:
        timeOfLastDAQ = printVoltage();
        break;
      case ON_REQUEST:
        daqMode = ON_REQUEST;
        break;
      case STREAM:
        daqMode = STREAM;
        break;
      case READ_DAQ_DELAY:
        // Read in delay, knowing it is appended with an x
        daqDelayStr = Serial.readStringUntil('x');

        // Convert to int and store
        daqDelay = daqDelayStr.toInt();

        break;
      case HANDSHAKE:
        if (Serial.availableForWrite()) {
          Serial.println("Message received.");
        }
        break;
    }
  }
}

Some comments on this sketch:

  • Much of the setup is the same as the last lesson, including the printVoltage() function. We also have global variables daqMode, which specifies whether we are in streaming mode or on-demand mode, and daqDelay, which specifies the time between data acquisitions.

  • In the loop() function, we send data along USB if we are in streaming mode and have waited daqDelay or longer.

  • We then also check to see if any data has been sent to Arduino from the computer. If so, we again enter into a switch-case, as before.

  • If we send a signal from the computer that we want Arduino to read in a DAQ delay, we use Serial.readStringUntil() to read in the string specifying the DAQ delay. We specify that the string ends with 'x'. This is useful because it tells Arduino exactly where the delay specification ends in the data coming over USB. Therefore, on the Python side, we need to append an 'x' to the string giving the DAQ delay before converting it to a bytes array to send to Arduino.

  • We use the toInt() method to convert the string we read in for the DAQ delay into an integer.
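To make the framing of the delay message concrete, here is a sketch of the Python side. The helper `encode_daq_delay()` is a name invented for this illustration and is not part of the lesson's code; the value of `READ_DAQ_DELAY` is copied from the sketch above.

```python
# Command code copied from the Arduino sketch
READ_DAQ_DELAY = 4


def encode_daq_delay(delay):
    """Build the message asking Arduino to set the DAQ delay to `delay` ms.

    The first byte is the command code; the digits of the delay follow
    as ASCII characters, terminated by 'x'.
    """
    return bytes([READ_DAQ_DELAY]) + (str(int(delay)) + "x").encode()


message = encode_daq_delay(250)
print(message)       # command byte, then the characters 2, 5, 0, x
print(len(message))  # one command byte plus four characters
```

With an open pySerial connection `arduino`, the message would be sent with `arduino.write(encode_daq_delay(250))`.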

Now that we have our sketch, we can set up our global variables so we have them on the Python side as well.
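A minimal version of these definitions, assuming the Python names simply mirror the constants at the top of the sketch:

```python
# Command codes; these must match the values in the Arduino sketch
HANDSHAKE = 0
VOLTAGE_REQUEST = 1
ON_REQUEST = 2
STREAM = 3
READ_DAQ_DELAY = 4
```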


Setting up Python to receive data

We now need to write some code on the Python side to enable streaming. As usual, the first step is to connect to Arduino with a serial connection. We will open the connection and leave it open, which makes streaming convenient and is also important when we stream in data asynchronously.

port = find_arduino()
arduino = serial.Serial(port, baudrate=115200)
handshake_arduino(arduino, handshake_code=HANDSHAKE, print_handshake_message=True)
Handshake message: Message received.

Since Arduino will again be sending comma-delimited data, we can use the data parser we used last time to convert the string we get from Arduino to a time in milliseconds and a voltage.

def parse_raw(raw):
    """Parse bytes output from Arduino."""
    raw = raw.decode()
    if raw[-1] != "\n":
        raise ValueError(
            "Input must end with newline, otherwise message is incomplete."
        )

    t, V = raw.rstrip().split(",")

    return int(t), int(V) * 5 / 1023
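
As a quick check of the parser without any hardware, we can feed it a hand-written byte string. The input below is made up for illustration; it mimics what `Serial.println()` sends, namely the text followed by a carriage return and a newline. The function is repeated so that the snippet runs on its own.

```python
def parse_raw(raw):
    """Parse bytes output from Arduino."""
    raw = raw.decode()
    if raw[-1] != "\n":
        raise ValueError(
            "Input must end with newline, otherwise message is incomplete."
        )

    t, V = raw.rstrip().split(",")

    return int(t), int(V) * 5 / 1023


# Made-up message: 1032 ms after startup, 10-bit ADC reading of 1023
t, V = parse_raw(b"1032,1023\r\n")
print(t, V)

# An incomplete message (no trailing newline) is rejected
try:
    parse_raw(b"1032,10")
except ValueError as err:
    print("Rejected:", err)
```

The factor of 5/1023 maps the 10-bit reading (0 to 1023) onto a 0 to 5 V range, which assumes the Arduino Uno is using its default 5 V analog reference.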

Now, we will write a function to turn on a data stream, collect data from it, and then return the result as a Pandas data frame. The steps toward doing so are as follows:

  1. Send a signal to Arduino that gives the delay in data acquisition. The first byte of this signal must be READ_DAQ_DELAY, signifying that the following bytes, up to the character 'x', give the delay. The subsequent bytes give the digits of the DAQ delay, followed by 'x'.

  2. Initialize empty Numpy arrays to receive the data.

  3. Tell Arduino to switch to streaming mode.

  4. Keep reading in data until we acquire the desired number of data points.

  5. Tell Arduino to switch to sending data on request.

  6. Return a data frame containing the results.

We will store the result in a Pandas data frame for convenient use later. (If you are unfamiliar with Pandas, you can check out this introduction.)

def daq_stream(arduino, n_data=100, delay=20):
    """Obtain `n_data` data points from an Arduino stream
    with a delay of `delay` milliseconds between each."""
    # Specify delay
    arduino.write(bytes([READ_DAQ_DELAY]) + (str(delay) + "x").encode())

    # Initialize output
    time_ms = np.empty(n_data)
    voltage = np.empty(n_data)

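    # Turn on the stream
    arduino.write(bytes([STREAM]))

    # Receive data
    i = 0
    while i < n_data:
        raw = arduino.read_until()

        # Skip incomplete or garbled messages and keep reading
        try:
            t, V = parse_raw(raw)
            time_ms[i] = t
            voltage[i] = V
            i += 1
        except (ValueError, IndexError):
            pass

    # Turn off the stream
    arduino.write(bytes([ON_REQUEST]))
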
    return pd.DataFrame({'time (ms)': time_ms, 'voltage (V)': voltage})
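
To try out the logic of `daq_stream()` without a board attached, one option is a stand-in object that mimics the two pySerial methods the function uses, `write()` and `read_until()`. The `FakeArduino` class below is entirely hypothetical and only imitates the sketch's behavior with synthetic, perfectly timed readings. `parse_raw()` and `daq_stream()` are repeated in condensed form, with the `try`/`except` around the parsing step being an assumption, so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Command codes copied from the Arduino sketch
ON_REQUEST, STREAM, READ_DAQ_DELAY = 2, 3, 4


class FakeArduino:
    """Hypothetical stand-in for a serial.Serial connected to the sketch."""

    def __init__(self):
        self.streaming = False
        self.delay = 100
        self.t = 0

    def write(self, data):
        # Interpret the first byte as a command code, as the sketch does
        code = data[0]
        if code == READ_DAQ_DELAY:
            self.delay = int(data[1:].decode().rstrip("x"))
        elif code == STREAM:
            self.streaming = True
        elif code == ON_REQUEST:
            self.streaming = False

    def read_until(self):
        # Emit a synthetic reading with a fixed ADC value of 512
        self.t += self.delay
        return f"{self.t},512\r\n".encode()


def parse_raw(raw):
    raw = raw.decode()
    if raw[-1] != "\n":
        raise ValueError("Input must end with newline.")
    t, V = raw.rstrip().split(",")
    return int(t), int(V) * 5 / 1023


def daq_stream(arduino, n_data=100, delay=20):
    arduino.write(bytes([READ_DAQ_DELAY]) + (str(delay) + "x").encode())
    time_ms = np.empty(n_data)
    voltage = np.empty(n_data)
    arduino.write(bytes([STREAM]))
    i = 0
    while i < n_data:
        raw = arduino.read_until()
        try:
            t, V = parse_raw(raw)
            time_ms[i] = t
            voltage[i] = V
            i += 1
        except (ValueError, IndexError):
            pass
    arduino.write(bytes([ON_REQUEST]))
    return pd.DataFrame({"time (ms)": time_ms, "voltage (V)": voltage})


fake = FakeArduino()
df_fake = daq_stream(fake, n_data=5, delay=20)
print(df_fake)
```

Because the fake never produces a malformed line, this only exercises the happy path; it is a check of the bookkeeping, not of real serial behavior.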

Let’s put this function to use and collect some data! We will collect 1000 data points with 20 millisecond intervals. (This will then take 20 seconds to run.)

df = daq_stream(arduino, n_data=1000, delay=20)

As we did in the last lesson, we can plot the results.

df['time (sec)'] = df['time (ms)'] / 1000

p = bokeh.plotting.figure(
    x_axis_label='time (s)',
    y_axis_label='voltage (V)',
    x_range=[df['time (sec)'].min(), df['time (sec)'].max()],
)
p.line(source=df, x='time (sec)', y='voltage (V)')

bokeh.io.show(p)


You are not required to submit this exercise.

Comparison of timing

We saw that when we acquired data on request using a call to time.sleep() to wait for the request, we got ill-timed data. Let’s generate those data again, and compare to the timing we got by streaming data.

time_ms = []
voltage = []

def request_single_voltage(arduino):
    """Ask Arduino for a single data point"""
    # Ask Arduino for data
    arduino.write(bytes([VOLTAGE_REQUEST]))

    # Read in the data
    raw = arduino.read_until()

    # Parse and return
    return parse_raw(raw)

for i in range(1000):
    # Request and append
    t, V = request_single_voltage(arduino)
    time_ms.append(t)
    voltage.append(V)

    # Wait 20 ms
    time.sleep(0.02)

Now, we can compute the differences between successive time points and plot how many samples we got for each inter-sample time.

dt_stream = np.diff(df['time (ms)'])
dt_on_demand = np.diff(time_ms)

dt_stream, counts_stream = np.unique(dt_stream, return_counts=True)
dt_on_demand, counts_on_demand = np.unique(dt_on_demand, return_counts=True)

p = bokeh.plotting.figure(
    x_axis_label='Δt (ms)',
    y_axis_label='number of samples',
)
p.circle(dt_stream, counts_stream, legend_label='stream')
p.circle(dt_on_demand, counts_on_demand, legend_label='on demand', color='orange')

bokeh.io.show(p)


Clearly, streaming gives much more consistent timing between samples!
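
One way to put numbers on this comparison is to summarize the spread of the inter-sample intervals. The sketch below uses made-up timestamps in place of `df['time (ms)']` and `time_ms`, so the printed values illustrate the calculation only and say nothing about real hardware; `summarize_intervals` is a name invented here.

```python
import numpy as np


def summarize_intervals(t_ms):
    """Return mean, standard deviation, and range of inter-sample intervals."""
    dt = np.diff(t_ms)
    return {"mean": dt.mean(), "std": dt.std(), "min": dt.min(), "max": dt.max()}


# Synthetic timestamps (ms): evenly spaced vs. irregularly spaced
t_regular = np.array([0, 20, 40, 60, 80, 100])
t_irregular = np.array([0, 23, 47, 70, 95, 118])

print("regular:  ", summarize_intervals(t_regular))
print("irregular:", summarize_intervals(t_irregular))

# The counting used for the plot: unique intervals and their frequencies
dt_values, counts = np.unique(np.diff(t_irregular), return_counts=True)
print(dt_values, counts)
```

A standard deviation near zero means the samples arrived at nearly uniform intervals; a mean above the requested delay means each sample carries extra overhead.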


Computing environment

%load_ext watermark
%watermark -v -p numpy,pandas,serial,bokeh,jupyterlab
CPython 3.8.5
IPython 7.18.1

numpy 1.19.1
pandas 1.1.1
serial 3.4
bokeh 2.2.1
jupyterlab 2.2.6