PiAware and Python

Mon 09 April 2018 by Patrick Pierson

Hardware

ADS-B is a public broadcast message platform that aircraft use to send their location out for others to pick up. In some cases the "others" are a Raspberry Pi running PiAware.

I used the following components:

A Raspberry Pi that I bought for $35 on Amazon. PiAware Pi

The "Dongle" or FlightAware Pro Stick and filter were $35 as a set. PiAware Dongle and Filter

A 1090 MHz antenna and cable to mount it above my house. Antenna

I followed the instructions here to get it set up. Shortly afterward I had this view of the airspace over my home state.

PiAware SkyView

Why?

At this point I suspect you are asking why? Why spend the money to pick up weird radio signals in the air?

I wanted a dataset that was different and that I had control over. PiAware collects data from the airwaves and sends it to FlightAware. However, it also exposes a simple JSON endpoint, so I could collect the same data and store it for my own use.

Collection

To collect the data I needed to hit the HTTP endpoint on the PiAware. I could test pulling the data with curl by requesting http://piaware_ip/dump1090-fa/data/aircraft.json

At first I figured pulling the data every five seconds would be good, but that produced too much data, so as you will see later I pull it every 30 seconds. This is a good interval because an aircraft only drops off the screen after it has not been seen for one minute, which means each aircraft seen should have at least two data points associated with it.
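The arithmetic behind that interval can be sketched as follows. This is only an illustration, assuming the one-minute retention described above and ignoring the time each request and upload takes; the function name guaranteed_samples is made up for this example.

```python
def guaranteed_samples(retention_seconds, poll_interval_seconds):
    """Minimum number of polls that land inside one retention window."""
    # Polls spaced I seconds apart put at least floor(R / I) poll
    # instants inside any window that is R seconds long.
    return retention_seconds // poll_interval_seconds


for interval in (5, 30):
    print('every %ss: at least %s samples per aircraft'
          % (interval, guaranteed_samples(60, interval)))
```

Going from 5 to 30 seconds cuts the number of stored files by a factor of six while still leaving two samples inside each one-minute window.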

I wanted to store the data somewhere I knew would always be available, so I decided to send it to Amazon S3. Below is a simple Python script I wrote that uses a config file and AWS credentials to pull data from the PiAware JSON endpoint and store it in S3 in the following key format:

s3://bucket/receiver/yyyy/mm/dd/hh/now.json
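As a rough sketch of how such a key can be built, the helper below derives both the date folders and the file name from the payload's own now timestamp. The name build_key and the choice of UTC are assumptions for this example; the script that follows uses datetime.now(), which is the local time of the machine running it.

```python
from datetime import datetime, timezone


def build_key(receiver, now):
    """Build '<receiver>/YYYY/MM/DD/HH/<now>.json' from a payload's 'now' value."""
    # Assumption: use UTC and the payload's own timestamp, so the folder
    # and the file name are always derived from the same instant.
    stamp = datetime.fromtimestamp(now, tz=timezone.utc)
    return '%s/%s/%d.json' % (receiver, stamp.strftime('%Y/%m/%d/%H'), int(now))


print(build_key('frederick', 1522917103.7))
```

Keying on a fixed timezone keeps the hourly folders consistent if the collector is ever moved or the clock changes for daylight saving.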

Python script:

import boto3
import requests
import json
import time
from datetime import datetime
import configparser


aws_access_key = 'AWS_ACCESS_KEY_ID'
aws_secret_access_key = 'AWS_SECRET_ACCESS_KEY'
bucket = 'bucket_loc'

s3 = boto3.resource('s3',
                    aws_access_key_id=aws_access_key,
                    aws_secret_access_key=aws_secret_access_key
                    )


class DumpAircraftData:
    def __init__(self):
        config = configparser.ConfigParser()
        config.read('config.ini')
        self.receiver = config.get('default', 'receiver')
        self.piaware_ip = config.get('default', 'piaware_ip')
        self.wait_time = int(config.get('default', 'wait_time'))

    def get_aircraft(self):
        url = 'http://%s/dump1090-fa/data/aircraft.json' % self.piaware_ip
        res = requests.get(url)
        return res.json()

    def save_json_s3(self, data, bucket):
        s3.Object(bucket, '%s/%s/%s.json' % (self.receiver,
                                             datetime.now().strftime('%Y/%m/%d/%H'),
                                             str(int(data['now'])))).put(Body=json.dumps(data))

    def loop_it(self):
        while True:
            # Call the methods on self rather than on the module-level
            # dump_data variable, so the class works under any instance name
            data = self.get_aircraft()
            self.save_json_s3(data, bucket)
            print('Uploaded %s.json to Bucket s3://%s/%s/%s' % (str(int(data['now'])), bucket, self.receiver, datetime.now().strftime('%Y/%m/%d/%H')))
            time.sleep(self.wait_time)


if __name__ == "__main__":
    dump_data = DumpAircraftData()
    dump_data.loop_it()

config.ini

[default]
receiver = frederick
wait_time = 30
piaware_ip = piaware_ip_in_my_house
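One thing the loop above does not handle is a failed request or upload, which would stop the collector. A minimal sketch of a more forgiving loop is shown here, assuming it is acceptable to skip a sample and try again on the next interval. The names poll_forever, fetch, store and max_polls are hypothetical; fetch and store stand in for get_aircraft and save_json_s3.

```python
import time


def poll_forever(fetch, store, wait_time, max_polls=None):
    """Call fetch() then store(data) every wait_time seconds.

    fetch and store are hypothetical callables standing in for
    get_aircraft and save_json_s3. max_polls exists only so the
    loop can be stopped in a test; None means run until interrupted.
    """
    polls = 0
    failures = 0
    while max_polls is None or polls < max_polls:
        try:
            store(fetch())
        except Exception as exc:  # deliberately broad for this sketch
            failures += 1
            print('Poll failed, will retry next interval: %s' % exc)
        polls += 1
        time.sleep(wait_time)
    return failures
```

With the class above this could be called as poll_forever(dump_data.get_aircraft, lambda d: dump_data.save_json_s3(d, bucket), dump_data.wait_time).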

After a few days of collection there is a good bit of data in the bucket:

$ aws s3 ls --recursive --human --summarize s3://bucket_loc
Lots of entries
2018-04-05 04:31:44    1.5 KiB frederick/2018/04/05/04/1522917103.json
2018-04-05 04:32:14    1.5 KiB frederick/2018/04/05/04/1522917133.json
2018-04-05 04:32:45    1.5 KiB frederick/2018/04/05/04/1522917163.json
2018-04-05 04:33:15    1.5 KiB frederick/2018/04/05/04/1522917193.json

Total Objects: 23741
   Total Size: 556.0 MiB

Processing

To get a better look at the data I first passed it through jq. This made it more readable and let me start focusing on the parts of the data I wanted to expose.

$ cat frederick/2018/04/03/01/1522733017.json | jq '' | head -n 15
{
  "messages": 88697699,
  "now": 1522733017.7,
  "aircraft": [
    {
      "hex": "aa8389",
      "seen": 2.9,
      "speed": 570,
      "vert_rate": 0,
      "messages": 107,
      "track": 101,
      "mlat": [],
      "tisb": [],
      "rssi": -11.9,
      "altitude": 29000

In some cases a record has just the altitude and a few other fields. In the cases I cared about, the record gave a much fuller picture, including the flight name and position.

$ cat frederick/2018/04/03/01/1522733047.json | jq '' | head -n 20
{
 "messages": 88699722,
 "now": 1522733047.8,
 "aircraft": [
   {
     "lat": 41.459335,
     "flight": "FDX1021 ",
     "vert_rate": 0,
     "messages": 164,
     "category": "A4",
     "lon": -77.592753,
     "tisb": [],
     "altitude": 29000,
     "hex": "aa8389",
     "mlat": [],
     "seen": 1.3,
     "speed": 570,
     "squawk": "6557",
     "track": 101,
     "nucp": 7,

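Telling the two kinds of record apart comes down to checking which keys are present. The sketch below uses two trimmed records based on the samples above; the helper name with_position is made up for this example.

```python
def with_position(aircraft_list):
    """Keep only records that carry both 'lat' and 'lon'."""
    return [a for a in aircraft_list if 'lat' in a and 'lon' in a]


# Two trimmed records based on the jq output above.
sample = [
    {'hex': 'aa8389', 'altitude': 29000, 'speed': 570},
    {'hex': 'aa8389', 'altitude': 29000, 'speed': 570,
     'lat': 41.459335, 'lon': -77.592753, 'flight': 'FDX1021 '},
]

for record in with_position(sample):
    # The flight value in the sample has a trailing space, so strip it.
    print(record['hex'], record['flight'].strip(), record['lat'], record['lon'])
```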
The first things I wanted to see were:

  1. Number of aircraft seen.
  2. Average altitude seen.
  3. Highest speed seen.
  4. Total number of messages received.
  5. Average number of messages received per aircraft.

To get this information I first pulled all of the data back down from S3.

$ aws s3 sync s3://bucket_loc ./data

I then wrote the following script to parse through it all.

import glob2
import json

all_files = glob2.glob('data/**/*.json')

flight_reports = []
altitude_reports = []
speed_reports = []
messages = 0

for path in all_files:
    with open(path, 'r') as f:
        # Each file holds one JSON document, so parse the whole file
        data = json.load(f)
        aircrafts = data.get('aircraft', [])
        for aircraft in aircrafts:
            # Flights, identified by the aircraft's hex address
            flight = aircraft.get('hex')
            flight_reports.append(flight)
            # Altitudes (skip values that are not plain integers)
            altitude = aircraft.get('altitude', 0)
            if type(altitude) is int:
                altitude_reports.append(altitude)
            # Speed
            speed = aircraft.get('speed', 0)
            speed_reports.append(speed)
            # Per-aircraft message counter as reported in this snapshot
            messages += aircraft.get('messages', 0)

# Unique hex addresses give the number of distinct aircraft
print('Number of aircraft seen: %s' % len(set(flight_reports)))
# Note: set() means this averages the distinct altitude values,
# not every altitude report
average_altitude = int(sum(set(altitude_reports)) / len(set(altitude_reports)))
print('Average altitude seen: %s' % average_altitude)
print('Highest speed seen: %s' % max(speed_reports))
print('Total number of messages received: %s' % messages)
average_messages = int(messages / len(set(flight_reports)))
print('Average number of messages received: %s' % average_messages)

When I ran it the results were as follows:

$ pipenv run python show_aircraft.py
Number of aircraft seen: 10241
Average altitude seen: 38358
Highest speed seen: 3380
Total number of messages received: 10249115411
Average number of messages received: 1000792
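One caveat about the message totals: in the two jq samples above, aircraft aa8389 reports a messages value of 107 and then 164, which suggests the per-aircraft field is a running total rather than a per-snapshot count. If so, adding it up across every file counts the same messages many times. A minimal sketch of an alternative keeps only the largest value seen for each hex. The helper name is made up, and the approach assumes the counter only grows while an aircraft stays in view, so it would undercount aircraft that leave and come back.

```python
def total_messages(snapshots):
    """Sum the largest 'messages' value seen for each aircraft hex."""
    highest = {}
    for data in snapshots:
        for aircraft in data.get('aircraft', []):
            hex_id = aircraft.get('hex')
            count = aircraft.get('messages', 0)
            highest[hex_id] = max(highest.get(hex_id, 0), count)
    return sum(highest.values())


# The two counter values for aa8389 from the samples above.
snapshots = [
    {'aircraft': [{'hex': 'aa8389', 'messages': 107}]},
    {'aircraft': [{'hex': 'aa8389', 'messages': 164}]},
]
print(total_messages(snapshots))
```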

I then wanted to get the distance from each aircraft's position to me. To do this I used a Python library called geopy, which takes my GPS location and the aircraft's GPS location and returns the distance between them in miles. Just to protect myself a little I have replaced my more specific my_loc coordinates, but the code is the same for anyone. Just update my_loc with your own GPS location and it will calculate the distance to each aircraft position.

import geopy.distance
import glob2
import json

all_files = glob2.glob('data/**/*.json')

my_loc = (39.0, -77.0)

_5_miles = 0
_5_to_30_miles = 0
_30_to_100_miles = 0
_100_to_200_miles = 0
_200_or_more_miles = 0

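for path in all_files:
    with open(path, 'r') as f:
        # Each file holds one JSON document, so parse the whole file
        data = json.load(f)
        aircrafts = data.get('aircraft', [])
        for aircraft in aircrafts:
            # Only records that include a position can be measured
            if aircraft.get('lat') and aircraft.get('lon'):
                plane_loc = (aircraft.get('lat'), aircraft.get('lon'))
                # geodesic replaces vincenty, which was removed in geopy 2.0
                distance = geopy.distance.geodesic(my_loc, plane_loc).miles
                # Each branch only needs an upper bound because the earlier
                # branches already caught the shorter distances, so a value
                # that sits exactly on a boundary is still counted
                if distance < 5:
                    _5_miles += 1
                elif distance < 30:
                    _5_to_30_miles += 1
                elif distance < 100:
                    _30_to_100_miles += 1
                elif distance < 200:
                    _100_to_200_miles += 1
                else:
                    _200_or_more_miles += 1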

print('Positions found are messages with positions in them')
print('---------------------------------------------------')
print('Positions found within 5 miles: %s' % _5_miles)
print('Positions found between 5 and 30 miles: %s' % _5_to_30_miles)
print('Positions found between 30 and 100 miles: %s' % _30_to_100_miles)
print('Positions found between 100 and 200 miles: %s' % _100_to_200_miles)
print('Positions found past 200 miles: %s' % _200_or_more_miles)

When I ran it the results were as follows:

$ pipenv run python distance.py
Positions found are messages with positions in them
---------------------------------------------------
Positions found within 5 miles: 7689
Positions found between 5 and 30 miles: 130920
Positions found between 30 and 100 miles: 857193
Positions found between 100 and 200 miles: 518808
Positions found past 200 miles: 1428
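The bucketing itself does not depend on geopy. As a dependency-free sketch, the version below swaps in the haversine great-circle formula, which is less precise than geopy's ellipsoidal calculation, and picks the bucket with bisect. The names EDGES, LABELS, haversine_miles and bucket are made up for this example.

```python
import math
from bisect import bisect_right

EDGES = [5, 30, 100, 200]  # bucket boundaries in miles, as in the script above
LABELS = ['<5', '5-30', '30-100', '100-200', '200+']


def haversine_miles(a, b):
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # 3958.8 is roughly the mean Earth radius in miles
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def bucket(distance):
    """Return the label of the bucket a distance falls into."""
    # bisect_right counts how many edges are <= distance, which is
    # exactly the index of the matching label.
    return LABELS[bisect_right(EDGES, distance)]


my_loc = (39.0, -77.0)
plane_loc = (41.459335, -77.592753)  # position from the jq sample above
print(bucket(haversine_miles(my_loc, plane_loc)))
```

Keeping the boundaries in one list means every distance lands in exactly one bucket, and adding a new range is a one-line change.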

I found this project to be pretty easy to set up, and the Python behind it is fairly basic as well.
Do not hesitate to contact me at me@patrickpierson.us if you have any comments.