Deep Learning QuickStart¶
Installing Deep Lake¶
Deep Lake can be installed from PyPI using pip:
!pip3 install deeplake
Opening Your First Deep Lake Dataset¶
Let's load the Visdrone dataset, a rich dataset with many object detections per image. Datasets hosted by Activeloop are identified by the host organization id followed by the dataset name: <org_id>/<dataset_name>.
import deeplake
import getpass
import os
from deeplake import types
os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass()
dataset_path = 'al://activeloop/visdrone-det-train-v4'
ds = deeplake.open(dataset_path)
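The `<org_id>/<dataset_name>` convention described above can be illustrated with a small helper that splits a hosted path into its two parts. This is only a sketch: `parse_hosted_path` and `HostedPath` are hypothetical names introduced here, not part of the Deep Lake API, and the check assumes hosted paths always start with the `al://` prefix used in this example.

```python
from typing import NamedTuple


class HostedPath(NamedTuple):
    """The two parts of a hosted dataset path (hypothetical helper type)."""
    org_id: str
    dataset_name: str


def parse_hosted_path(path: str) -> HostedPath:
    """Split 'al://<org_id>/<dataset_name>' into org id and dataset name."""
    prefix = "al://"  # assumption: hosted paths use this scheme, as in the example above
    if not path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}, got {path!r}")
    org_id, sep, dataset_name = path[len(prefix):].partition("/")
    # Require exactly one separator and two non-empty parts
    if not sep or not org_id or not dataset_name or "/" in dataset_name:
        raise ValueError(f"expected al://<org_id>/<dataset_name>, got {path!r}")
    return HostedPath(org_id, dataset_name)


parts = parse_hosted_path("al://activeloop/visdrone-det-train-v4")
print(parts.org_id, parts.dataset_name)
```

A check like this can catch a mistyped path before any network request is made.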
The dataset has three columns (images, labels, and bounding boxes):
ds.summary()
Dataset(columns=(images,labels,boxes), length=6471)
+------+--------------------------------------------+
|column|                    type                    |
+------+--------------------------------------------+
|images| array(dtype=uint8, shape=[None,None,None]) |
+------+--------------------------------------------+
|labels|     array(dtype=uint32, shape=[None])      |
+------+--------------------------------------------+
|boxes |array(dtype=float32, shape=[None,None,None])|
+------+--------------------------------------------+
Reading Data¶
Deep Lake does not download any data in advance. Data is fetched lazily from long-term storage based on row numbers in the dataset:
image = ds["images"][0] # Fetch the first image and return a numpy array
labels = ds["labels"][0] # Fetch the labels in the first image
boxes = ds["boxes"][0] # Fetch the bounding boxes for the first image
labels_list = ds["labels"][0:100] # Fetch the labels for the first 100 rows and store them as a list of numpy arrays
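To make the idea of lazy, row-based fetching concrete, here is a toy model of a column that only reads a row when it is indexed. It is not how Deep Lake is implemented: `LazyColumn` and `fake_storage_read` are hypothetical stand-ins, and plain Python lists replace numpy arrays so the sketch has no dependencies.

```python
from typing import Callable, List


class LazyColumn:
    """Toy column that reads a row from storage only when that row is requested."""

    def __init__(self, length: int, fetch_row: Callable[[int], List[int]]):
        self._length = length
        self._fetch_row = fetch_row
        self.fetch_count = 0  # number of rows actually read so far

    def __len__(self) -> int:
        return self._length

    def _load(self, row: int) -> List[int]:
        self.fetch_count += 1
        return self._fetch_row(row)

    def __getitem__(self, index):
        # A slice reads only the rows it covers and returns them as a list
        if isinstance(index, slice):
            return [self._load(i) for i in range(*index.indices(self._length))]
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("row index out of range")
        return self._load(index)


def fake_storage_read(row: int) -> List[int]:
    """Pretend remote read: returns a short list of label ids for the row."""
    return [row % 10] * (row % 3 + 1)


labels = LazyColumn(length=6471, fetch_row=fake_storage_read)
print(labels.fetch_count)   # nothing has been read yet
first = labels[0]           # reads one row
batch = labels[0:100]       # reads 100 more rows
print(labels.fetch_count, len(batch))
```

The point of the sketch is that creating the column object costs nothing; the work is proportional to the rows you index.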
Visualizing Datasets¶
The dataset above can be visualized in the Deep Lake App.
Creating Your Own Datasets¶
Let's create our first dataset by following the example below. First, download and extract a small classification dataset called the animals dataset.
# Download the dataset. Use the /raw/ URL: the /blob/ URL returns GitHub's HTML page, which tar cannot read.
!curl -L -o animals.tar https://github.com/activeloopai/examples/raw/main/colabs/starting_data/animals.tar
# Extract to the './animals' folder
!tar -xvf ./animals.tar
animals
- cats
  - image_1.jpg
  - image_2.jpg
- dogs
  - image_3.jpg
  - image_4.jpg
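A download can silently produce an HTML page instead of an archive, in which case extraction fails. The sketch below uses Python's standard `tarfile` module to check that a file really is a tar archive before extracting it. `extract_if_tar` is a hypothetical helper, and the demo builds its own tiny archive in a temporary folder rather than touching the real animals download.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_if_tar(archive_path, dest_dir) -> bool:
    """Extract archive_path into dest_dir only if it is a readable tar archive."""
    if not tarfile.is_tarfile(archive_path):
        return False
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    # Use the safer 'data' extraction filter where this Python version supports it
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest_dir, **kwargs)
    return True


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # A web page saved under a .tar name, as happens with a wrong download URL
    html_page = tmp_path / "not_really.tar"
    html_page.write_text("<!DOCTYPE html><html><body>a web page</body></html>")

    # A genuine tar archive containing one placeholder file
    real_tar = tmp_path / "animals.tar"
    payload = b"placeholder bytes"
    with tarfile.open(real_tar, "w") as archive:
        info = tarfile.TarInfo(name="animals/cats/image_1.jpg")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    html_ok = extract_if_tar(html_page, tmp_path / "out_html")
    tar_ok = extract_if_tar(real_tar, tmp_path / "out_tar")
    extracted = (tmp_path / "out_tar" / "animals" / "cats" / "image_1.jpg").exists()

print(html_ok, tar_ok, extracted)
```

If the check returns `False` for a freshly downloaded file, the download URL is the first thing to inspect.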
Now that you have the data, you can create a Deep Lake dataset and initialize its columns. Running the following code will create a Deep Lake dataset inside the ./animals_dl folder.
import deeplake
import numpy as np
import os
from deeplake import types # Needed for the column definitions below
ds = deeplake.create('./animals_dl') # Create the dataset
Next, let's inspect the folder structure for the source dataset './animals' to find the class names and the files that need to be uploaded to the Deep Lake dataset.
# Find the class names and the list of files that need to be uploaded
dataset_folder = './animals'
# Find the class subfolders, skipping stray files such as .DS_Store that macOS adds.
# Sorting keeps the label numbering stable across runs.
class_names = sorted(item for item in os.listdir(dataset_folder) if os.path.isdir(os.path.join(dataset_folder, item)))
files_list = []
for dirpath, dirnames, filenames in os.walk(dataset_folder):
    for filename in filenames:
        if not filename.startswith('.'): # Skip hidden files such as .DS_Store
            files_list.append(os.path.join(dirpath, filename))
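As a quick check of the folder-scanning logic, the sketch below rebuilds the animals layout in a temporary folder, adds stray `.DS_Store` files, and confirms that only class folders and image files are picked up. `scan_class_folders` is a hypothetical helper written for this illustration; the file contents are empty placeholders, not real images.

```python
import os
import tempfile


def scan_class_folders(root):
    """Return (sorted class names, non-hidden file paths inside the class folders)."""
    class_names = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name)) and not name.startswith(".")
    )
    files = []
    for class_name in class_names:
        class_dir = os.path.join(root, class_name)
        for dirpath, _dirnames, filenames in os.walk(class_dir):
            for filename in sorted(filenames):
                if not filename.startswith("."):  # skip .DS_Store and other hidden files
                    files.append(os.path.join(dirpath, filename))
    return class_names, files


with tempfile.TemporaryDirectory() as root:
    layout = {
        "cats": ["image_1.jpg", "image_2.jpg"],
        "dogs": ["image_3.jpg", "image_4.jpg"],
    }
    for class_name, image_names in layout.items():
        os.makedirs(os.path.join(root, class_name))
        for image_name in image_names:
            with open(os.path.join(root, class_name, image_name), "wb") as f:
                f.write(b"")  # empty placeholder instead of real image data

    # Stray macOS metadata files that must be ignored
    open(os.path.join(root, ".DS_Store"), "wb").close()
    open(os.path.join(root, "cats", ".DS_Store"), "wb").close()

    class_names, files_list = scan_class_folders(root)
    relative_files = [os.path.relpath(path, root) for path in files_list]

print(class_names)
print(relative_files)
```

Walking only the class folders, rather than the whole root, guarantees that every collected file has a parent folder that appears in `class_names`.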
Next, let's create the dataset columns and upload data.
ds.add_column('images', dtype=types.Image(sample_compression="jpg"))
ds.add_column('labels', dtype=types.Array(dtype=types.UInt32(), dimensions=1))
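# Iterate through the files and append to the Deep Lake dataset
for file in files_list:
    label_text = os.path.basename(os.path.dirname(file))
    label_num = class_names.index(label_text)
    # Read the raw image bytes and close the file afterwards
    with open(file, "rb") as f:
        image_bytes = f.read()
    # Append one row to the columns
    ds.append({'images': [image_bytes], 'labels': [label_num]})
ds.commit() # Commit so the appended rows are persisted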
ds.summary()
Dataset(columns=(images,labels), length=0)
+------+------------------------------------------+
|column|                   type                   |
+------+------------------------------------------+
|images|array(dtype=uint8, shape=[None,None,None])|
+------+------------------------------------------+
|labels|    array(dtype=uint32, shape=[None])     |
+------+------------------------------------------+
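The upload loop earlier in this section derives each label from the parent folder name and passes column-wise lists to `ds.append`. The sketch below isolates that data-preparation step so it can be checked without a Deep Lake dataset. `build_batch` is a hypothetical helper, the files are placeholders rather than real JPEGs, and whether `ds.append` accepts a multi-row batch in exactly this shape is an assumption based on the single-row call shown above.

```python
import os
import tempfile


def build_batch(files, class_names):
    """Collect raw file bytes and numeric labels into column-wise lists."""
    batch = {"images": [], "labels": []}
    for path in files:
        # The label is the index of the parent folder name in class_names
        label_text = os.path.basename(os.path.dirname(path))
        batch["labels"].append(class_names.index(label_text))
        with open(path, "rb") as f:
            batch["images"].append(f.read())
    return batch


with tempfile.TemporaryDirectory() as root:
    demo_class_names = ["cats", "dogs"]
    demo_files = []
    for class_name, image_name in [("cats", "image_1.jpg"),
                                   ("cats", "image_2.jpg"),
                                   ("dogs", "image_3.jpg")]:
        os.makedirs(os.path.join(root, class_name), exist_ok=True)
        path = os.path.join(root, class_name, image_name)
        with open(path, "wb") as f:
            f.write(image_name.encode())  # placeholder bytes, not a real JPEG
        demo_files.append(path)

    batch = build_batch(demo_files, demo_class_names)

# Map numeric labels back to class names to confirm the encoding round-trips
decoded = [demo_class_names[label] for label in batch["labels"]]
print(batch["labels"], decoded)
```

Checking `len(batch["labels"])` before uploading is a cheap way to notice an empty file list, which would otherwise leave the dataset with a length of 0.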