Imo-ishoralarni aniqlash - so'nggi yillarda katta e'tiborga ega bo'lgan kompyuter ko'rishning ajoyib sohasi. U oʻyin, robototexnika, inson va kompyuter oʻzaro aloqasi va xavfsizlik tizimlari kabi sohalarda koʻplab ilovalarga ega. Ushbu loyihada biz yuzlab videolarni o'z ichiga olgan ma'lumotlar to'plamidan beshta ishorani to'g'ri bashorat qila oladigan modelni yaratamiz.

Bu erda biz foydalanadigan ikkita turdagi model arxitekturasi mavjud:

  1. 3D konvolyutsion tarmoq(Conv3D): 3D konvolyutsiyalar 2D konvolyutsiyalarining tabiiy kengaytmasidir. Xuddi 2D konv.da boʻlgani kabi, siz filtrni ikki yoʻnalishda (x va y), 3D konvda esa filtrni uchta yoʻnalishda (x, y va z) harakatlantirasiz. Bunday holda, 3D konv.ga kirish videodir (bu 30 ta RGB tasvirlar ketma-ketligi). Har bir tasvirning shakli 100x100x3 deb faraz qilsak, masalan, video 100x100x3x30 shaklidagi 4-D tenzoriga aylanadi, uni (100x100x30)x3 sifatida yozish mumkin, bu erda 3 - kanallar soni. Demak, 2 o'lchamli konvolyutsiyalardan o'xshashlikni keltirib chiqaramiz, bunda 2 o'lchovli yadro/filtr (kvadrat filtr) (fxf)xc sifatida ifodalanadi, bu erda f - filtr o'lchami va c - kanallar soni, 3 o'lchovli yadro/filtr ("kub" filtri) (fxfxf)xc sifatida ifodalanadi (bu erda c = 3, chunki kirish tasvirlari uchta kanalga ega). Ushbu kub filtr endi (100x100x30) tensorning uchta kanalining har birida "3D-konvolve" qiladi.
  2. Convolutions + RNN:Conv2D tarmog'i har bir tasvir uchun xususiyat vektorini ajratib oladi va bu xususiyat vektorlarining ketma-ketligi RNN-ga asoslangan tarmoqqa uzatiladi. RNN chiqishi oddiy softmax (bu kabi tasniflash muammosi uchun).

MA'LUMOTLAR to'plami

Trening ma'lumotlari beshta sinfdan biriga bo'lingan bir necha yuz videolardan iborat:

  1. O'ngga surish: qo'lni to'g'ri yo'nalishda siljitish.

2. Chapga surish: qo‘lni chap tomonga siljitish.

3. Bosh barmog‘ini ko‘tarish: bosh barmog‘ini yuqoriga ko‘rsatish.

4. Bosh barmog‘ini pastga qaratish: bosh barmog‘ini pastga qaratish.

5. To'xtash: kaftni ochiq holda qo'lni ko'rsatish.

Har bir video (odatda 2-3 soniya uzunlikdagi) 30 kvadrat (tasvir) ketma-ketligiga bo'linadi. Ma'lumotlar ikkita papka uchun ikkita CSV fayli bo'lgan "poezd" va "val" papkasini o'z ichiga oladi. Ushbu papkalar o'z navbatida pastki papkalarga bo'linadi, bu erda har bir pastki jild ma'lum bir ishoraning videosini ifodalaydi. Har bir pastki papkada, ya'ni videoda 30 ta ramka (yoki tasvir) mavjud. E'tibor bering, ma'lum bir video pastki papkasidagi barcha rasmlar bir xil o'lchamlarga ega, ammo turli videolar turli o'lchamlarga ega bo'lishi mumkin. Xususan, videolar ikki xil o'lchamga ega - 360x360 yoki 120x160 (videolarni yozib olish uchun ishlatiladigan veb-kameraga qarab).

CSV faylining har bir qatori bitta videoni ifodalaydi va uchta asosiy ma'lumotni o'z ichiga oladi - videoning 30 ta tasvirini o'z ichiga olgan pastki papka nomi, imo-ishora nomi va videoning raqamli yorlig'i (0–4 oralig'ida).

Bizning vazifamiz "val" jildida ham yaxshi ishlaydigan "poyezd" papkasida modelni o'rgatishdir (odatda ML loyihalarida bo'lgani kabi). Model yaratish jarayonini boshlash uchun avvalo xotirangizdagi ma'lumotlarni olishingiz kerak. Saqlashdagi ma'lumotlarni olish uchun quyidagi havolaga o'ting:

https://drive.google.com/uc?id=1ehyrYBQ5rbQQe6yL4XbLWe3FMvuVUGiL

Oldindan ishlov berish

kerakli kutubxonalarni import qiling:

import numpy as np
import math
import os
from imageio import imread
from skimage.transform import resize
from skimage.io import imread, imshow
import matplotlib.pyplot as plt
import datetime
import os
import warnings
import cv2
from tensorflow import keras
import tensorflow as tf
warnings.filterwarnings("ignore")

Ushbu blokda siz o'qitish va tekshirish uchun papka nomlarini o'qiysiz. Siz bu yerda batch_size ni ham o'rnatdingiz. E'tibor bering, siz GPU-ni to'liq quvvat bilan ishlatishingiz mumkin bo'lgan tarzda partiya hajmini o'rnatasiz. Mashina xatoga yo'l qo'ymaguncha, siz partiya hajmini oshirasiz.

train_doc = np.random.permutation(open('/content/Project_data/train.csv').readlines())
val_doc = np.random.permutation(open('/content/Project_data/val.csv').readlines())
batch_size = 26  

Generator

Bu kodning eng muhim qismlaridan biridir. Jeneratorning umumiy tuzilishi berilgan. Jeneratörda siz tasvirlarni oldindan qayta ishlaysiz, chunki sizda 2 xil o'lchamdagi tasvirlar mavjud, shuningdek, video ramkalar to'plami yaratiladi. Yuqori aniqlikka erishish uchun img_idx, y,z va normalizatsiya bilan tajriba o'tkazishingiz kerak.

def generator(source_path, folder_list, batch_size):
    print( 'Source path = ', source_path, '; batch size =', batch_size)
    img_idx = [0,1,2,4,6,8,10,12,14,16,18,20,22,24,26,27,28,29]
    while True:
        t = np.random.permutation(folder_list)
        num_batches = int(len(t)/batch_size)
        for batch in range(num_batches):
            batch_data = np.zeros((batch_size,18,120,120,3))
            batch_labels = np.zeros((batch_size,5))
            for folder in range(batch_size):
                imgs = os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].split(';')[0])
                for idx,item in enumerate(img_idx):
                    image = imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
                    if image.shape[1] == 160:
                        image = resize(image[:,20:140,:],(120,120)).astype(np.float32)
                    else:
                        image = resize(image,(120,120)).astype(np.float32)
                    
                    batch_data[folder,idx,:,:,0] = image[:,:,0] - 104
                    batch_data[folder,idx,:,:,1] = image[:,:,1] - 117
                    batch_data[folder,idx,:,:,2] = image[:,:,2] - 123
                    
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
            yield batch_data, batch_labels

        if (len(t)%batch_size) != 0:
            batch_data = np.zeros((len(t)%batch_size,18,120,120,3))
            batch_labels = np.zeros((len(t)%batch_size,5))
            for folder in range(len(t)%batch_size):
                imgs = os.listdir(source_path+'/'+ t[folder + (num_batches*batch_size)].split(';')[0])
                for idx,item in enumerate(img_idx):
                    image = imread(source_path+'/'+ t[folder + (num_batches*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
                    if image.shape[1] == 160:
                        image = resize(image[:,20:140,:],(120,120)).astype(np.float32)
                    else:
                        image = resize(image,(120,120)).astype(np.float32)

                    batch_data[folder,idx,:,:,0] = image[:,:,0] - 104
                    batch_data[folder,idx,:,:,1] = image[:,:,1] - 117
                    batch_data[folder,idx,:,:,2] = image[:,:,2] - 123

                batch_labels[folder, int(t[folder + (num_batches*batch_size)].strip().split(';')[2])] = 1
            yield batch_data, batch_labels

Bu erda video generatorda yuqorida (tasvirlar soni, balandlik, kenglik, kanallar soni) sifatida ko'rsatilganiga e'tibor bering. Model arxitekturasini yaratishda buni hisobga oling.

curr_dt_time = datetime.datetime.now()
train_path = '/content/Project_data/train'
val_path = '/content/Project_data/val'
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_epochs = 30
print ('# epochs =', num_epochs)

Model yaratish

  1. Conv3D:

from keras.models import Sequential
from keras.layers import Dense, GRU, Dropout, Flatten, BatchNormalization, Activation
from keras.layers.convolutional import Conv3D, MaxPooling3D
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers

model = Sequential()
model.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=(18,120,120,3)))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,1), strides=(2,2,1)))

model.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(512, activation='elu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))

Endi siz modelni yozganingizdan so'ng, keyingi qadam modelga compile bo'ladi. Modelning summary raqamini chop etganingizda, o'rgatish kerak bo'lgan parametrlarning umumiy sonini ko'rasiz.

sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.7, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model.summary())

Keling, .fit_generator da ishlatiladigan train_generator va val_generator ni yarataylik.

train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', period=1)

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1, mode='min', epsilon=0.0001, cooldown=0, min_lr=0.00001)
callbacks_list = [checkpoint, LR]

steps_per_epoch va validation_steps fit usuli bilan keyingi() qo'ng'iroqlar sonini aniqlash uchun ishlatiladi.

if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1

Keling, modelni moslashtiramiz. Bu modelni o'qitishni boshlaydi va nazorat punktlari yordamida siz har bir davr oxirida modelni saqlashingiz mumkin bo'ladi.

model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

30 davrdan keyin: Poyezdning aniqligi-0,9502, Tasdiqlashning aniqligi-0,8200.

2. CNN(VGG16) + RNN(Ikki tomonlama LSTM):

#model
from keras.models import Sequential, Model
from keras.layers import Dense, GRU, Dropout, Flatten, TimeDistributed, Bidirectional, LSTM
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers
from keras.applications.vgg16 import VGG16
    
base_model = VGG16(include_top=False, weights='imagenet', input_shape=(120,120,3),pooling='avg')
x = base_model.output
x = Flatten()(x)
#x.add(Dropout(0.5))
features = Dense(64, activation='relu')(x)
conv_model = Model(inputs=base_model.input, outputs=features)
    
for layer in base_model.layers:
    layer.trainable = False
        
model_sec = Sequential()
model_sec.add(TimeDistributed(conv_model, input_shape=(18,120,120,3)))
model_sec.add(Bidirectional(LSTM(64, return_sequences=True)))
model_sec.add(GRU(32))
model_sec.add(Dropout(0.5))
model_sec.add(Dense(128, activation='relu'))
model_sec.add(Dense(5, activation='softmax'))
sgd = optimizers.Adam(lr=0.001, decay=1e-6)
model_sec.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model_sec.summary())

train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

model_name = 'model_init_conv_lstm' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', period=1)

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1, mode='min', epsilon=0.0001, cooldown=0, min_lr=0.00001)
callbacks_list = [checkpoint, LR]
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1
model_sec.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

30 davrdan keyin: Poyezdning aniqligi-0,9985, Tasdiqlashning aniqligi-0,8213.

3. CNN(Resnet50) + RNN(Ikki tomonlama GRU):

#model
from keras.models import Sequential, Model
from keras.layers import Dense, GRU, Dropout, Flatten, TimeDistributed, Bidirectional, LSTM
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers
from keras.applications.resnet import ResNet50
    
base_model = ResNet50(include_top=False, weights='imagenet', input_shape=(120,120,3),pooling='max')
x = base_model.output
x = Flatten()(x)
features = Dense(128, activation='relu')(x)
conv_model = Model(inputs=base_model.input, outputs=features)
    
for layer in base_model.layers:
    layer.trainable = False
        
model_three = Sequential()
model_three.add(TimeDistributed(conv_model, input_shape=(18,120,120,3)))
model_three.add(Bidirectional(GRU(32, return_sequences=True)))
model_three.add(GRU(16))
model_three.add(Dropout(0.5))
model_three.add(Dense(64, activation='relu'))
model_three.add(Dense(5, activation='softmax'))
sgd = optimizers.Adam(lr=0.001, decay=1e-6)
model_three.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model_three.summary())
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

model_name = 'model_init_conv_gru' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', period=1)

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1, mode='min', epsilon=0.0001, cooldown=0, min_lr=0.00001)
callbacks_list = [checkpoint, LR]
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1
model_three.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

30 davrdan keyin: Poyezdning aniqligi-0,9759, Tasdiqlashning aniqligi-0,8300.

Shunday qilib, biz uchun oxirgi model sifatida model_uch: CNN(Resnet50) + RNN(Ikki tomonlama GRU)ni tanlaymiz. Keling, ushbu modeldan foydalanib, bitta video ketma-ketlikda bashorat qilaylik.

Prognoz

Model_three yordamida video papkani bosh barmog‘i bilan yuklash va bashorat qilish:

#preprocessing single video sequence to make prediction upon
img_idx = [0,1,2,4,6,8,10,12,14,16,18,20,22,24,26,27,28,29]
video=[]
imgs = os.listdir('/content/Project_data/train/WIN_20180907_15_38_35_Pro_Thumbs Down_new'.split(';')[0])
for idx,item in enumerate(img_idx):
  image = imread('/content/Project_data/train/WIN_20180907_15_38_35_Pro_Thumbs Down_new'.strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
  if image.shape[1] == 160:
    image = resize(image[:,20:140,:],(120,120)).astype(np.float32)
  else:
    image = resize(image,(120,120)).astype(np.float32)
  image[:,:,0] -= 104
  image[:,:,1] -= 117
  image[:,:,2] -= 123
  video.append(image)                 
video = np.expand_dims(np.array(video), axis=0)
model_three.predict(video)

Chiqish: massiv([[1.6398988e-04, 1.0931127e-04, 2.5824511e-03, 8.9063227e-01, 1.0651199e-01]], dtype=float32)

bosh barmog‘ini pastga tushirish imo-ishorasini bashorat qiluvchi model (toifa:4).

Xulosa

Ushbu blogda muhokama qilingan imo-ishoralarni aniqlash loyihasi amaliy ilovalar uchun katta imkoniyatlarni ko'rsatdi. Chuqur o'rganish usullarini qo'llash orqali real vaqt rejimida turli xil qo'l ishoralarini aniq taniy oladigan va tasniflay oladigan tizimni ishlab chiqish mumkin.

Loyiha qo'l ishoralari videolari ma'lumotlar to'plamida o'qitiladigan chuqur o'rganish modelidan foydalangan. Keyin model turli arxitekturalar, o'qitish usullarini uzatishdan foydalangan holda yaxshi sozlandi, bu uning aniqligini oshirdi va mashg'ulot vaqtini qisqartirdi.

Ushbu texnologiyaning eng istiqbolli ilovalaridan biri inson va kompyuterning oʻzaro taʼsiri sohasida boʻlib, u yerda klaviatura va sichqonlar kabi anʼanaviy kiritish usullari oʻrniga qoʻl ishoralari yordamida qurilmalar va interfeyslarni boshqarishda foydalanish mumkin. Bu imkoniyati cheklangan yoki harakatchanligi cheklangan foydalanuvchilar uchun kompyuterlar va boshqa qurilmalarni yanada qulayroq va intuitiv qilishi mumkin.