April 20, 2020

Computer Vision API: Setting up the environment

The requirements for creating a Computer Vision API are straightforward. You need a Google account to access your Google Drive and use Google Colab.

1. Create your Image Dataset

We are going to classify flowers but, instead of taking existing datasets available from the Internet, we are going to build our own labeled image dataset using Google Images.

First, open a new notebook in Google Colab and connect it to your Drive with the following code:

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"

You will be asked to click a link, then copy and paste the code it gives you into the box that appears in the output of the cell.

Now you can access the data in your Drive from your notebook.

Download the images and store them in your Drive. To do this, go to Google Images and search for the objects you want to classify. In our case, we want to classify sunflowers, yellow daisies and yellow tulips. Let’s start with the sunflowers. Search for sunflowers on Google Images and scroll to the bottom until you see "Show more results". Then open your browser’s developer tools and go to the Web Console.

There, paste the following code to download a CSV of all the image URLs:

urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

This triggers the download of the file. Save it and upload it using the widget on the left of the Notebook: Files > Upload. Do the same for the other flowers.
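Before handing the CSV files to the downloader, it can help to sanity-check them. The sketch below is an optional helper and not part of the original workflow; the function name clean_url_lines is made up for illustration. It assumes each line of the file holds one URL, as produced by the console snippet above.

```python
def clean_url_lines(lines):
    """Return unique http(s) URLs in their original order."""
    seen = set()
    cleaned = []
    for line in lines:
        url = line.strip()
        # Skip blank lines and anything that is not a web URL (e.g. data: URIs).
        if not url.startswith(("http://", "https://")):
            continue
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


# Illustrative input only; in practice, pass the lines read from your CSV file.
sample = [
    "https://example.com/a.jpg\n",
    "\n",
    "https://example.com/a.jpg\n",
    "data:image/png;base64,AAAA\n",
]
print(clean_url_lines(sample))
```

Dropping duplicates and non-web entries early means fewer failed downloads and fewer repeated images in the dataset.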

We are now going to use the fast.ai library. Create a cell and paste:

!curl -s https://course.fast.ai/setup/colab | bash

This will install the library and configure your Colab notebook to run smoothly with it. Then, activate the GPU by going to Runtime > Change Runtime Type > GPU.

2. Download data

Now, import the library and download the images:

from fastai.vision import *
from fastai.metrics import error_rate
# Root folder on Drive that holds the URL CSV files and the image folders
path = Path('/content/gdrive/My Drive/DeepLearning/Datasets/')
folders_files = [('sunflowers', 'sunflowers.csv'), ('yellow_daisy', 'yellow_daisy.csv'), ('yellow_tulips', 'yellow_tulips.csv')]
for (folder, file) in folders_files:
  dest = path/folder
  dest.mkdir(parents=True, exist_ok=True)
  # Download up to 200 images listed in the CSV, then delete the ones that cannot be opened
  download_images(path/file, dest, max_pics=200)
  verify_images(dest, delete=True, max_size=500)

Here we create a folder for each class, download up to 200 images into it and verify that they are not corrupted.
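For a quick look at what "not corrupted" can mean, here is a minimal standard-library sketch that checks only the first bytes of each file against well-known JPEG, PNG and GIF signatures. It is not how fast.ai verifies images, which is a fuller check; the helper names sniff_image_format and find_suspect_files are hypothetical.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the formats the tutorial accepts.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def sniff_image_format(path):
    """Return 'jpeg', 'png' or 'gif' if the header matches, else None."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def find_suspect_files(folder):
    """List files in `folder` whose header matches none of the known formats."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and sniff_image_format(p) is None
    )
```

Calling find_suspect_files on one of the class folders would list, for example, HTML error pages that were saved with an image extension.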

Next, we create an ImageDataBunch from the downloaded images. This object will represent our data with their labels. To optimise the process, we also:

  • Split the data to keep 20% for validation
  • Resize the images to squares of 224 pixels
  • Apply image augmentation with get_transforms
  • Normalize the data
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
        ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
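To illustrate what holding out 20% for validation means, here is a small sketch of a random split in plain Python. It only mirrors the idea behind valid_pct=0.2; the function name, the fixed seed and the file names are assumptions for the example, and the library's own splitting logic may differ.

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of `items` and hold out `valid_pct` of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical file names standing in for the downloaded images.
train, valid = split_train_valid([f"img_{i}.jpg" for i in range(600)])
print(len(train), len(valid))
```

The validation images are never used to update the model, so the error rate measured on them is an estimate of how the model behaves on unseen pictures.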

3. View data

Let’s visualize our data:

data.show_batch(rows=3, figsize=(7,6))

Everything seems correct. However, to ensure optimal performance, you should check the images manually and remove the inconsistent ones from your dataset. The Google Drive UI is quite convenient for validating and filtering images manually.

4. Train model

Now we are going to train our model. The fast.ai library is concise and comes with good defaults.

learn = cnn_learner(data, models.resnet34, metrics=error_rate)

Downloading: “https://download.pytorch.org/models/resnet34-333f7ec4.pth” to /root/.cache/torch/checkpoints/resnet34-333f7ec4.pth 100%|██████████| 87306240/87306240 [00:00<00:00, 109355856.48it/s]


With this line, we download a pre-trained ResNet34 model, pass it our data and set the error rate as the metric.

Then, we fit the model with the one cycle policy, as this approach usually performs well.
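The one cycle policy raises the learning rate from a low value up to a maximum and then anneals it down to a very small value. The sketch below shows a simplified version of such a schedule in plain Python. The default values (pct_start, div_factor, final_div) and the cosine shape are assumptions chosen for illustration and may not match what the library uses internally.

```python
import math


def cosine_anneal(start, end, pct):
    """Move from `start` to `end` along a half cosine as `pct` goes 0 -> 1."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)


def one_cycle_lr(step, total_steps, max_lr=1e-3,
                 pct_start=0.3, div_factor=25.0, final_div=1e4):
    """Learning rate at `step` (0-based) of a simplified one-cycle schedule."""
    warmup_steps = int(total_steps * pct_start)
    low_lr = max_lr / div_factor
    if step < warmup_steps:
        # Warm-up phase: climb from low_lr to max_lr.
        return cosine_anneal(low_lr, max_lr, step / warmup_steps)
    # Annealing phase: descend from max_lr to max_lr / final_div.
    pct = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
    return cosine_anneal(max_lr, max_lr / final_div, pct)


for step in (0, 15, 30, 65, 99):
    print(step, one_cycle_lr(step, total_steps=100))
```

The short warm-up lets training start gently, and the long descent lets the model settle once the large learning rates have done most of the work.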

We get an error rate of 6%, which is not bad, but we could do better. So far we were only training the top layers of the pre-trained model. Let’s unfreeze all layers so that their parameters can be updated during training.


The fast.ai library provides lr_find, which launches a learning rate (LR) range test to help you select a good learning rate. Plotting the curve:


A rule of thumb is to spot the strongest downward slope. Therefore we pick the beginning of the range at 1e-5 and choose to stop at 1e-4 because we were already training at 1e-3 by default. That way, the first layers will be trained with a learning rate of 3e-5 and the last ones at 3e-4.

learn.fit_one_cycle(10, max_lr=slice(3e-5,3e-4))

We now get a 3.5% error rate, which is better.
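Passing a slice as max_lr gives the first layers the lower rate and the last layers the higher one. As a rough sketch of how rates in between could be assigned, the helper below spreads them geometrically across layer groups. Both the geometric spacing and the number of groups (three) are assumptions for illustration, and spread_learning_rates is a hypothetical name.

```python
def spread_learning_rates(start, stop, n_groups):
    """Return `n_groups` rates from `start` to `stop`, evenly spaced by ratio."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]


# Assuming three layer groups (an illustrative number, not read from the model).
print(spread_learning_rates(3e-5, 3e-4, 3))
```

Early layers of a pre-trained network capture generic features such as edges, so they need smaller updates than the last layers, which are specific to our flowers.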

Let’s analyze the classification errors:

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
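A confusion matrix simply counts how often each true class was predicted as each class. The plain Python sketch below shows the idea on a handful of made-up labels; it is independent of fast.ai, and the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def error_rate_from_labels(actual, predicted):
    """Fraction of samples whose prediction differs from the true label."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Made-up labels, for illustration only.
actual = ["sunflowers", "sunflowers", "yellow_daisy", "yellow_tulips"]
predicted = ["sunflowers", "yellow_daisy", "yellow_daisy", "yellow_tulips"]
counts = confusion_counts(actual, predicted)
print(counts[("sunflowers", "yellow_daisy")])
print(error_rate_from_labels(actual, predicted))
```

The off-diagonal pairs, where the two labels differ, are the ones worth inspecting.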

We see that most of the errors come from confusion between yellow daisies and sunflowers, which is understandable given their visual similarity. Plotting the predictions with the highest losses can help us further understand the causes of the errors.

interp.plot_top_losses(9, figsize=(15,11))

Some normal images have been misclassified, but others don’t look like flowers at all. This shows that we have not cleaned the dataset as thoroughly as we should have. Therefore, to improve performance, we should clean the image dataset and rerun the process to get a consistent score.

After cleaning and retraining our model, it would be a shame to keep it in the notebook. Let’s put it in production!
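To make the idea of "top losses" concrete, here is a minimal sketch that ranks samples by their loss. It assumes the loss is the cross-entropy of the predicted class probabilities; the sample names and probabilities used with it would be your own, and the function names are hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[target_index])


def top_losses(samples, k):
    """Return the `k` (loss, name) pairs with the highest loss.

    `samples` is a list of (name, class_probabilities, true_class_index).
    """
    scored = [(cross_entropy(p, t), name) for name, p, t in samples]
    return sorted(scored, reverse=True)[:k]
```

An image ends up at the top of the list when the model is confident and wrong, which is why mislabeled or off-topic pictures tend to surface there.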

5. Export the API model to production

Save your model:


Download it and save it at the root of your local API folder. We are now going to build a basic API that lets the user upload an image and get the prediction.

Create a Python 3 virtual environment with Flask and the fast.ai library installed.

Create api_endpoint.py and paste the following code:

import os
from flask import Flask, flash, request, redirect, url_for, send_from_directory, jsonify
from werkzeug.utils import secure_filename
from fastai.vision import *

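UPLOAD_FOLDER = os.path.join(os.getcwd(), 'files')
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg', 'gif'}

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
# flash() stores messages in the session, which requires a secret key;
# 'change-me' is a placeholder, replace it with your own random value.
app.secret_key = 'change-me'
# Make sure the upload folder exists before any file is saved
os.makedirs(UPLOAD_FOLDER, exist_ok=True)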

def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.route('/', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        # check if the post request has the file part
        if 'file' not in request.files:
            flash('No file part')
            return redirect(request.url)
        file = request.files['file']
        # if user does not select file, browser also
        # submit an empty part without filename
        if file.filename == '':
            flash('No selected file')
            return redirect(request.url)
        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
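            # Save the upload first, then read the image back from disk,
            # so the file stream is not consumed before it is saved.
            saved_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            file.save(saved_path)
            img = open_image(saved_path)
            pred, _, losses = learner.predict(img)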
            return jsonify(str(pred))
    return '''
    <!doctype html>
    <title>Upload new File</title>
    <h1>Upload new File</h1>
    <form method=post enctype=multipart/form-data>
      <input type=file name=file>
      <input type=submit value=Upload>
    </form>
    '''

@app.route('/uploads/<filename>')
def uploaded_file(filename):
    # return send_from_directory(app.config['UPLOAD_FOLDER'],
    #                            filename)
    return 'File updated!'

if __name__ == '__main__':
    defaults.device = torch.device('cpu')
    learner = load_learner('.')
    app.run(host="", port=80, debug=True)
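The extension filter used by the endpoint can be tried on its own, without Flask or the model. The snippet below repeats allowed_file from the code above; the file names are made-up examples.

```python
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg', 'gif'}


def allowed_file(filename):
    """True if the name has an extension listed in ALLOWED_EXTENSIONS."""
    return '.' in filename and \
        filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


# Only the part after the last dot is checked, case-insensitively.
for name in ['tulip.JPG', 'notes.txt', 'archive.tar.gz', 'noextension']:
    print(name, allowed_file(name))
```

Note that this check looks at the file name only, not at the file contents, so it is a first filter rather than a guarantee that the upload is a valid image.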

Launch your API with python api_endpoint.py and access it in your browser on port 80, the port set in app.run.

There you can upload your image and get your classification. For example with this image:

The API returns the following JSON:


6. Next steps

To go deeper into computer vision and API building, I highly recommend the excellent courses at https://www.fast.ai/.

You can check the code in the DAIN Studios GitHub repository.

Written by Thomas Nguyen, a Data Engineer at DAIN Studios based in Berlin, Germany.



Title: Computer Vision: Create an API in 60 minutes
DAIN Studios, Data & AI Strategy Consultancy
Updated on September 11, 2022