Usually I can find something funny or interesting to say about the projects I’m working on, but this one is pretty cut and dry. I read some papers, wrote a whole lot of code, did some debugging, listened to my computer sound like it was going to have a heart attack during training, and made a thing that points out boats (mostly). Pictures down below.
Nerd Alert – The rest of this post will be technical details that many will not be interested in.
For a machine learning side project I figured I would check out an image processing Kaggle competition, and the only one available at the time was the Airbus Ship Detection Challenge: segmenting boats in satellite images.
From day one, I constantly fought one thing to make a successful model: Sparsity.
The dataset is large (ish), with 100,000 training images and 80,000 test images. Nice, we should be able to make a good model with that, but there are some caveats:
- Each image is 768 x 768 which is pretty large for personal computing purposes, and you run into some problems when shrinking them.
- There is actually a boat in maybe 1 in 8 images.
- That boat takes up a small fraction of the image (~0-2%), so if you were to sample pixels at random, fewer than 1 in 500 would actually belong to a boat.
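To put rough numbers on that sparsity, here is a back-of-the-envelope sketch. The 1-in-8 image rate comes from the list above; the three coverage values are assumptions picked from inside the ~0-2% range, and `boat_pixel_fraction` is just a helper name for this illustration.

```python
def boat_pixel_fraction(image_rate: float, coverage: float) -> float:
    """Expected fraction of all pixels that belong to a boat.

    image_rate: fraction of images that contain a boat at all.
    coverage:   fraction of the image a boat covers, when one is present.
    """
    return image_rate * coverage


image_rate = 1 / 8  # roughly one image in eight contains a boat
for coverage in (0.005, 0.01, 0.02):  # assumed values inside the ~0-2% range
    fraction = boat_pixel_fraction(image_rate, coverage)
    print(f"coverage {coverage:.1%}: about 1 pixel in {1 / fraction:,.0f}")
```

Even at the top of the range (2% coverage) only about 1 pixel in 400 is a boat, and at 1% coverage it drops to about 1 in 800, so a model that predicts "no boat" everywhere already looks more than 99% accurate per pixel.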
When first implementing a model for this, I tried running it directly on the full images, but found it way too slow and inaccurate. With a sad 16 gigabytes of RAM on my computer, I could not run a reasonable segmentation model with a batch size greater than 10, and since positive examples are so hard to come by in this dataset, a batch size of 10 just won’t do.
So, I divided my problem in half. The first half, dubbed the “coarse model”, would deal with finding likely spots for the ships, while the second, the “fine model”, would deal with fine-grained segmentation of those ships.
Like I said earlier, the size of each image becomes a big problem for any single computer, so we quickly run into a trade-off when thinking about coarse models: variety within a batch vs. field of view.
Let me explain.
We can only process so many pixels in a given batch. If we process entire satellite images, we can only fit ten of them in a batch, which limits the variety of photos the model sees at each step. I have generally found that stochastic or small-batch learning can get bumped in poor directions because all the images in a given batch happened to be blue or something. When your batches are small enough, you take so many large, inconsistent steps that it can essentially cripple training. I mean, come on, look at the variety of some of these images.
Alternatively, we can cut each image into chunks. If we then shuffle our data, we get more variety in a given batch and avoid this problem. However, the looming problem is what you lose in field of view. If you cut small enough chunks, you can’t really know if the grey patch you’re looking at is a boat or a dock.
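As a minimal sketch of the chunking idea (not the code from the project), the snippet below tiles a 768 x 768 image into square chunks and shuffles (image, chunk) pairs so that one batch can draw from many different images. The 256-pixel chunk size and the function names are assumptions made for illustration.

```python
import random


def chunk_boxes(image_size: int = 768, chunk_size: int = 256):
    """Return (top, left, bottom, right) boxes that tile a square image."""
    if image_size % chunk_size != 0:
        raise ValueError("chunk_size must divide image_size evenly")
    steps = range(0, image_size, chunk_size)
    return [(top, left, top + chunk_size, left + chunk_size)
            for top in steps for left in steps]


def shuffled_chunk_index(num_images: int, seed: int = 0):
    """Pair every image id with every chunk box, then shuffle the pairs."""
    boxes = chunk_boxes()
    pairs = [(image_id, box) for image_id in range(num_images) for box in boxes]
    random.Random(seed).shuffle(pairs)  # seeded so the order is reproducible
    return pairs


# Each consecutive slice of the shuffled list is one batch of chunks.
index = shuffled_chunk_index(num_images=1000)
first_batch = index[:90]
```

With these assumed sizes, ten full images hold the same number of pixels as 90 chunks of 256 x 256, so the same pixel budget can cover up to 90 different images per batch instead of 10. The cost is the one described above: each chunk sees only a ninth of the scene.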
Regardless, I chose to do both.
My idea for the coarse model was for it to predict boat presence for each (NxN) district of the image (kind of like YOLO). So an output for a section of an image might look like the figure below, with blocks representing the presence of a boat in that space. I ended up doing this with (8×8) blocks, although I could see it working with other sizes.
Honestly, I think this was the weak part of my model. Overall there were a ton of false positives and, depending on the iteration (there were many), a sizable number of false negatives as well. Where there was a boat in 1 in 8 images, the model predicted a boat somewhere in 1 in 4. Still, it greatly reduced the space to look through by pointing out likely areas.
For brevity, I’m going to cut my coarse model explanation here, but if you have any questions on anything, email me or check out the GitHub.
For the fine-grained segmentation, I implemented a U-Net for the fine model, with an architecture similar to the one below.
This method worked surprisingly well, generating very logical fine-grained segmentations like some of the images shown above. Although it was trained on negative as well as positive examples of boats, it will still potentially produce trash when fed trash by the coarse model.
Again, there are so many details that I can’t decide which would be important to include here, so email me with any questions or check out the GitHub.
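To make the coarse target concrete, here is a small sketch of how a per-block label grid could be built from the boat pixels of one image. It assumes "(8×8) blocks" means squares of 8 by 8 pixels and that boat pixels are available as (row, col) coordinates; both are assumptions for illustration, and `coarse_labels` is a hypothetical name, not a function from the project.

```python
def coarse_labels(boat_pixels, image_size: int = 768, block: int = 8):
    """Build a grid with 1 where a block contains any boat pixel, else 0.

    boat_pixels: iterable of (row, col) pixel coordinates that belong to a boat.
    Returns a list of lists with image_size // block cells per side.
    """
    cells = image_size // block
    grid = [[0] * cells for _ in range(cells)]
    for row, col in boat_pixels:
        # Integer division maps a pixel coordinate to its block index.
        grid[row // block][col // block] = 1
    return grid


# A tiny 32 x 32 example: two boat pixels that land in two different blocks.
grid = coarse_labels([(0, 0), (9, 17)], image_size=32, block=8)
for line in grid:
    print(line)
```

Under these assumptions a full 768 x 768 image becomes a 96 x 96 grid of yes/no labels, which is a much smaller and less sparse target than the full pixel mask.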
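One way the coarse output could be turned into "likely areas" for a second model is sketched below. This is a hypothetical handoff, not necessarily how the project does it: each flagged block becomes a fixed-size crop centred on that block and clamped to the image. The 64-pixel crop size, the 8-pixel block size, and the name `candidate_crops` are all assumptions.

```python
def candidate_crops(grid, block: int = 8, crop: int = 64, image_size: int = 768):
    """Return unique (top, left, bottom, right) crops centred on flagged blocks."""
    crops = set()
    for r, row in enumerate(grid):
        for c, flagged in enumerate(row):
            if not flagged:
                continue
            center_row = r * block + block // 2
            center_col = c * block + block // 2
            # Clamp so the crop never runs off the edge of the image.
            top = min(max(center_row - crop // 2, 0), image_size - crop)
            left = min(max(center_col - crop // 2, 0), image_size - crop)
            crops.add((top, left, top + crop, left + crop))
    return sorted(crops)


# Example: a 96 x 96 coarse grid with two flagged blocks in opposite corners.
grid = [[0] * 96 for _ in range(96)]
grid[0][0] = 1
grid[95][95] = 1
print(candidate_crops(grid))
```

With a scheme like this, every false positive from the coarse stage costs one extra crop for the second stage, while a false negative means the boat is never looked at again, which is why false negatives are the more expensive mistake here.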