flyway: crowdedness forecasting with LSTM

flyway was a machine learning project exploring the LSTM RNN as a tool for forecasting crowdedness in the Engineering Design Studio (EDS) at NYUAD. A project by @niniack & @woswos.

“Every once in a while, you may be faced with the dreaded decision of where to study. We use past network traffic data to predict how crowded a space might be, whether this means high traffic, for those who enjoy company, or low traffic, for those who want quiet. Either way, you’ll know before anyone knows.”

Check out the notebook (Github)

Check out the final submission (PDF)


Data Collection

To begin with, we figured it would be worth looking at network traffic data from personal devices and extracting some sort of information from it to use as a crowdedness metric. We also figured that without any data correlating network traffic to a headcount, it would have been silly to predict the number of people directly, as we would have no “target data,” so to speak.

So, we set up a computer in the EDS with its WiFi chip in monitor mode, collecting the network traffic in its vicinity. To do this, we used the dumpcap tool, which provides the -I (--monitor-mode) flag for exactly this purpose.
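As a sketch, the capture invocation looked something like the following; the interface name and output filename are placeholders, not the project's actual values. Only the -I/--monitor-mode flag is taken from the description above.

```shell
# Put the WiFi interface into monitor mode and capture nearby traffic.
# "wlan0" and the output filename are placeholders -- adjust for your setup.
dumpcap -i wlan0 -I -w eds_capture.pcapng
```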

Having collected over a million packets, we exported the raw data as a CSV file to work with in a Python notebook.

*Raw data collected in the EDS*

Data Visualization

Looking at the dataset, we decided to plot the number of frames sent per hour. This gives a rough picture of how busy the space was at certain times of day (hour of day) and on certain days (day of week); more network traffic likely implies more people.
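The per-hour aggregation can be sketched with pandas. The column names and toy rows below are assumptions for illustration, not the actual export schema from dumpcap.

```python
import pandas as pd

# Toy stand-in for the exported capture; the real CSV has different columns.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-03-01 09:05", "2020-03-01 09:40", "2020-03-01 10:15",
    ]),
    "source": ["aa:bb", "aa:bb", "cc:dd"],
})

# Frames sent per hour of day
frames_per_hour = df.groupby(df["time"].dt.hour).size()
print(frames_per_hour)
```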

Cleaning the data proved to be an important step. As the EDS is home to a number of non-human devices (RPis from student projects, desktops, etc.), we wanted to remove their traffic from the dataset. To do this, we obtained a list of these devices and filtered them out. We also removed the top 10 “loudest” devices, as these most likely included routers and other static devices not part of the EDS population (e.g. nearby cameras). The two graphs below show the drastic difference that cleaning makes.
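A minimal sketch of the two-step cleaning, assuming a single column of source MAC addresses. The deny-list and traffic counts are made up, and we drop only the top 1 loudest device here for brevity (the project dropped the top 10).

```python
import pandas as pd

# Toy traffic: "aa" is a known non-human device, "bb" is a loud router.
df = pd.DataFrame({"source": ["aa"] * 50 + ["bb"] * 5 + ["cc"] * 3})

# Step 1: drop known non-human devices (RPis, desktops, ...)
known_devices = {"aa"}
df = df[~df["source"].isin(known_devices)]

# Step 2: drop the N "loudest" remaining devices (N=10 in the project)
loudest = df["source"].value_counts().head(1).index
df = df[~df["source"].isin(loudest)]
```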

*Frame count plotted by hour, prior to cleaning*

*Frame count plotted by hour, after cleaning*

It is also worth noting a relatively large discrepancy in the frame count trend for the first hour of the first day and the last hour of the last day, as data collection did not run for the full hour in either case. This could have seriously skewed our model evaluation, so the affected values were removed.

We also plotted the frame count and number of unique MAC addresses per hour, normalized, to better understand whether they are a good representation of crowdedness.

*Frame count and unique MAC count per hour, normalized*

While there are a few anomalies, the graph above shows that the two metrics are a pretty good estimator of how busy the area is. An example of an anomaly is a large number of unique MAC addresses paired with relatively few packets being sent; this could represent a group of people walking into the space but ultimately deciding not to stay. To hedge against this, we use a weighted average that favors the lower value, always treating the larger metric as the anomaly:

def weighted_average(val1, val2):
    """Calculate a weighted average, giving more weight to the smaller value."""
    weight = 0.8

    if val1 > val2:
        weighted_avg = weight * val2 + (1 - weight) * val1
    else:
        weighted_avg = weight * val1 + (1 - weight) * val2

    return weighted_avg

Applying this technique produces the following graph:

*Frame count, unique MAC count, and crowdedness per hour, normalized*

Data Engineering and Considerations

So far, the model would only have information about past frame data. We can supplement it with more features, such as the hour of day and day of week.

We may also improve the accuracy of our model by feeding it scaled data and one-hot encoding all categorical values.
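As a sketch (the toy values below are made up; the real pipeline used our hourly dataframes), min-max scaling plus one-hot encoding in pandas might look like:

```python
import pandas as pd

df = pd.DataFrame({
    "frame_count": [120.0, 540.0, 300.0],
    "day_of_week": ["Mon", "Tue", "Mon"],   # categorical feature
})

# Min-max scale the numeric column to [0, 1]
fc = df["frame_count"]
df["frame_count"] = (fc - fc.min()) / (fc.max() - fc.min())

# One-hot encode the categorical column
df = pd.get_dummies(df, columns=["day_of_week"])
```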

We also carried out this process for the number of unique MAC addresses per hour, resulting in the two metrics we planned to use for busyness.

Data Preparation

So far, we have three datasets to work with: the frame count per hour, the unique MAC count per hour, and the derived crowdedness measure.

As we fed the frame count and unique MAC count into the model as inputs, and the plan was to use LSTM cells, we reshaped our data following this tutorial. Next, we split each of these datasets into training and testing sets (70-30 split).

import pandas as pd

def split_dataframe(data, n_in=1, n_out=1, dropnan=True):
    """Reframe a time series dataframe as a supervised learning problem."""
    n_vars = 1 if type(data) is list else data.shape[1]
    df = data
    cols, names = list(), list()

    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
      cols.append(df.shift(i))
      names += [('%s(t-%d)' % (df.columns[j], i)) for j in range(n_vars)]

    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
      cols.append(df.shift(-i))
      if i == 0:
        names += [('%s(t)' % (df.columns[j])) for j in range(n_vars)]
      else:
        names += [('%s(t+%d)' % (df.columns[j], i)) for j in range(n_vars)]

    # put it all together
    agg = pd.concat(cols, axis=1)
    agg.columns = names

    # drop rows with NaN values
    if dropnan:
      agg.dropna(inplace=True)

    return agg

Function from machinelearningmastery
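The 70-30 chronological split itself is straightforward; here is a sketch on a toy array (the key point being that time series data is split in order, without shuffling):

```python
import numpy as np

values = np.arange(100)            # stand-in for the reframed dataset rows
split = int(len(values) * 0.7)     # 70-30 split point
train, test = values[:split], values[split:]   # keep time order; no shuffling
```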

Model Design

Having reshaped and split our dataframes, we attempt to build a reasonable model. Our goal is to feed in the frame and unique MAC counts from t-n to t and produce a crowdedness measure at t+1.
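Concretely, an LSTM layer expects input shaped [samples, timesteps, features]. A toy sketch of building the t-n to t windows and their t+1 targets (the series and window length here are made up for illustration):

```python
import numpy as np

seq = np.arange(10, dtype=float)   # toy hourly series
n_steps = 3                        # the t-n ... t window length

# Sliding windows of n_steps inputs, each paired with the t+1 target
X = np.array([seq[i:i + n_steps] for i in range(len(seq) - n_steps)])
y = seq[n_steps:]
X = X.reshape(X.shape[0], n_steps, 1)   # [samples, timesteps, features]
```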

Model 1

We make a naive attempt, feeding the data into stacked LSTM cells, concatenating the outputs, and feeding the merged data into a dense network.

from tensorflow.keras.layers import Input, LSTM, Dense, concatenate
from tensorflow.keras.models import Model

##  define input
visible1 = Input(shape=(n_steps_fc, n_features_fc))
##  first interpretation model
hidden1 = LSTM(250, activation='relu', return_sequences=True, dropout=0.12)(visible1)
hidden2 = LSTM(250, activation='relu', return_sequences=True)(hidden1)
hidden3 = LSTM(250, activation='relu')(hidden2)
frame_count = Dense(1)(hidden3)
##  define output
frame_count_model = Model(visible1, frame_count)

##  define input
visible2 = Input(shape=(n_steps_umac, n_features_umac))
##  second interpretation model
hidden4 = LSTM(200, activation='relu', return_sequences=True, dropout=0.2)(visible2)
hidden5 = LSTM(200, activation='relu', return_sequences=True)(hidden4)
hidden6 = LSTM(200, activation='relu')(hidden5)
umac_count = Dense(1)(hidden6)
umac_count_model = Model(visible2, umac_count)

merge = concatenate([frame_count, umac_count])
hidden7 = Dense(512)(merge)
hidden8 = Dense(256)(hidden7)
hidden9 = Dense(256)(hidden8)
busyness = Dense(1)(hidden9)

model = Model(inputs=[visible1, visible2], outputs=busyness)
model.compile(optimizer='adam', loss='mse')


However, the issue here is that the model is a bit of a black box. Tuning parameters is a little overwhelming, considering there are 11 layers to play around with. We could solve this issue with a modularized approach!

*Blue is the original data; orange is the forecast from model 1. Not that great!*

Model 2

Splitting the previous model into smaller components results in this overview:

*Higher-level overview of model 2*

Taking a deeper look, we see how it was constructed:

##  encapsulating model
frames = Input(shape=(n_steps_fc, n_features_fc))
umacs = Input(shape=(n_steps_umac, n_features_umac))

out_frame = fc_model(frames)
out_umac = umac_model(umacs)

fc_model.trainable = False
umac_model.trainable = False

out_busyness = busyness_model([out_frame, out_umac])
busyness_model.trainable = False

overview_model = Model(inputs=[frames, umacs], outputs=out_busyness)
overview_model.compile(optimizer='adam', loss='mse')

Each of the models within was pre-constructed and trained separately. The fc_model was given the frame count data from t-3 to t to predict the frame count at t+1. Similarly, the umac_model was given the unique MAC counts from t-3 to t to produce a count at t+1. Finally, the busyness_model was provided with x frame and unique MAC counts, along with x crowdedness measures, to model the relationship we wrote in the weighted_average function (see above).

Here’s what the model looks like in detail:

*Detailed overview of model 2*

Final Results

The second model performed well and allowed for much more control over tuning the parameters, as the train and test mean squared error (MSE) could be obtained for each of the three models within.

*Left to right: train/test MSE for fc_model, umac_model, busyness_model*

The overall model had a train MSE of 2% and a test MSE of 1.3%.

Finally, the graph:


We even created a UI to better represent our project results! Check it out at

*Screenshots of the UI*