I’ve been fiddling about with Node.js a lot in my free time recently, but I’ve been missing something and for a while I wasn’t sure what.
Then I started browsing the Data Is Beautiful subreddit and realised what it was I was missing: throwing a load of data around in a pillow case and seeing what comes tumbling out when I’m done!
So I had an evening free and decided I wanted to get going on some Python again. I used Python quite extensively in my second year of university, but since then I haven’t had much need for it.
I used it so much before because it is a great tool for quickly and joyfully creating interesting visuals, so it was the perfect tool to achieve my data-in-a-pillowcase plans.
I was browsing Kaggle when I discovered that the DeepSat (SAT-6) aerial dataset was readily available. The dataset includes 340,000 28×28 RGB(+IR) images. Perfect! But what can I do with it?
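For orientation, here is a rough sketch of how one row of the CSV might be unpacked. It assumes each row is a single flattened 28×28 image with the four channels interleaved per pixel (R, G, B, IR), which would give 28 × 28 × 4 = 3,136 values per row. The helper name `split_channels` is my own invention for illustration, not something that ships with the dataset.

```python
def split_channels(row, channels=4):
    """Split a flat, channel-interleaved row into one list per channel.

    Assumes the layout R, G, B, IR, R, G, B, IR, ... (an assumption about the CSV).
    """
    values = [int(v) for v in row]
    # Every `channels`-th value, starting at offset c, belongs to channel c.
    return [values[c::channels] for c in range(channels)]


# Tiny made-up example: two pixels rather than the full 784.
row = ["10", "20", "30", "40", "50", "60", "70", "80"]
red, green, blue, infrared = split_channels(row)
print(red, green, blue, infrared)  # [10, 50] [20, 60] [30, 70] [40, 80]
```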
I played around with outputting colours in different ways with small amounts of the dataset (400 images) and eventually this popped up in my output window:
Not much, right? But it’s interesting! It looks like a beach to me, so I stuck with it.
I added more data to my set (10,000 images), and it started to look even cooler.
The method in use goes a little like so:
- Loop through each 28×28 image and find an average RGB value
- Sort the resultant RGB averages by HSV. I played around with a few methods from this page and found HSV sorting to be the most interesting.
- Plot the RGB values on a canvas in order of HSV value.
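The first two steps can be sketched on their own with nothing but the standard library. This is a minimal illustration rather than the exact script: it assumes each row is a flat list of R, G, B, IR values, the function names `average_rgb` and `sort_by_hsv` are hypothetical, and the three sample rows are made up.

```python
import colorsys


def average_rgb(row):
    """Mean R, G, B of a flat R, G, B, IR, ... row (the IR channel is ignored)."""
    values = [int(v) for v in row]
    n = len(values) // 4  # number of pixels in the image
    return tuple(sum(values[c::4]) // n for c in range(3))


def sort_by_hsv(colours):
    """Order RGB tuples by hue, then saturation, then value."""
    return sorted(
        colours,
        key=lambda rgb: colorsys.rgb_to_hsv(*(c / 255 for c in rgb)),
    )


# Three made-up two-pixel "images".
rows = [
    ["0", "0", "200", "9", "0", "0", "100", "9"],  # bluish
    ["200", "0", "0", "9", "100", "0", "0", "9"],  # reddish
    ["0", "200", "0", "9", "0", "100", "0", "9"],  # greenish
]
averages = [average_rgb(r) for r in rows]
print(sort_by_hsv(averages))  # [(150, 0, 0), (0, 150, 0), (0, 0, 150)]
```

Because `colorsys.rgb_to_hsv` returns a `(hue, saturation, value)` tuple, using it as the sort key orders colours around the colour wheel first, which is what produces the banded look.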
So I went for the full 340,000 images. The script takes my laptop about 15 minutes to run (in reality it took 45+ minutes, because I had an error at the end of my script which I managed not to fix... twice).
I was surprised by the amount of blue, which I guess indicates that there are a lot of lakes or that the dataset includes a lot of ocean. The amount of white is interesting too, possibly revealing the season the dataset was created in or the sheer size of the Alaskan peninsula.
Well, that was fun! Completely pointless but an interesting outcome. Exactly what I wanted.
It’s not particularly great, but the code I wrote for this is here for your reference:
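import colorsys
import csv
from PIL import Image
# Assumes each CSV row is one 28x28 image, flattened as R, G, B, IR, R, G, B, IR, ...
filereader = csv.reader(open("deepsat-sat6/X_train_sat6.csv"))
im = Image.new('RGB', ((570*2)+1, (570*2)+1), 'black')
impixels = im.load()
pixels = []
for row in filereader:
    r = g = b = 0
    for j, a in enumerate(row):
        if j % 4 == 0:
            r += int(a)
        elif j % 4 == 1:
            g += int(a)
        elif j % 4 == 2:
            b += int(a)
        elif j % 4 == 3:
            pass  # infrared channel, ignored
    n = len(row) // 4  # number of pixels in this image
    pixels.append((r // n, g // n, b // n))
pixels.sort(key=lambda rgb: colorsys.rgb_to_hsv(*rgb))
x = y = 0
for i in pixels:
    # Draw each average colour as a 2x2 block, filling the canvas row by row.
    print(x, y)
    impixels[x, y] = (i[0], i[1], i[2])
    impixels[x+1, y] = (i[0], i[1], i[2])
    impixels[x, y+1] = (i[0], i[1], i[2])
    impixels[x+1, y+1] = (i[0], i[1], i[2])
    x += 2
    if x >= 570*2:
        x, y = 0, y + 2
im.save("output.png")  # output filename is an assumption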
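The canvas arithmetic is easier to see in isolation. Here is a minimal sketch, assuming each colour is drawn as a 2×2 block and the blocks fill the canvas row by row, 570 to a row. `block_origin` is a hypothetical helper for illustration, not part of the script above.

```python
def block_origin(index, blocks_per_row=570, block_size=2):
    """Top-left pixel (x, y) of the block for the index-th sorted colour."""
    row, col = divmod(index, blocks_per_row)
    return col * block_size, row * block_size


print(block_origin(0))    # (0, 0)
print(block_origin(569))  # (1138, 0), the last block of the first row
print(block_origin(570))  # (0, 2), wraps to the start of the second row
print(570 * 570)          # 324900 blocks fit on the canvas
```

The last block starts at (1138, 1138) and covers pixels up to (1139, 1139), so everything fits comfortably inside the 1141×1141 canvas.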