Region-based Convolutional Neural Networks
Region-based Convolutional Neural Networks (R-CNN) are a family of machine learning models for computer vision, and specifically for object detection.
History
The original goal of R-CNN was to take an input image and produce a set of bounding boxes as output, where each bounding box contains an object together with the category (for example, car or pedestrian) of that object. More recently, R-CNN has been extended to perform other computer vision tasks. The following are some of the versions of R-CNN that have been developed.
- November 2013: R-CNN. Given an input image, R-CNN begins by applying a mechanism called Selective Search to extract regions of interest (ROI), where each ROI is a rectangle that may represent the boundary of an object in the image. Depending on the scenario, there may be as many as two thousand ROIs. Each ROI is then fed through a neural network to produce output features. For each ROI's output features, a collection of support-vector machine classifiers is used to determine what type of object (if any) is contained within the ROI.[1]
- April 2015: Fast R-CNN. While the original R-CNN independently computed the neural network features on each of as many as two thousand regions of interest, Fast R-CNN runs the neural network once on the whole image. At the end of the network is a novel method called ROIPooling, which slices out each ROI from the network's output tensor, reshapes it, and classifies it. As in the original R-CNN, the Fast R-CNN uses Selective Search to generate its region proposals.[2]
- June 2015: Faster R-CNN. While Fast R-CNN used Selective Search to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself, by means of a region proposal network.[2]
- March 2017: Mask R-CNN. While previous versions of R-CNN focused on object detection, Mask R-CNN adds instance segmentation. Mask R-CNN also replaced ROIPooling with a new method called ROIAlign, which can represent fractions of a pixel.[3][4]
- June 2019: Mesh R-CNN adds the ability to generate a 3D mesh from a 2D image.[5]
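The difference between ROIPooling and ROIAlign described above can be illustrated with a minimal sketch. It is not taken from any R-CNN implementation: the function names `roi_max_pool`, `bilinear_sample` and `roi_align` are hypothetical, the feature map is a single-channel list of lists, feature values are assumed to sit at integer coordinates, and the ROIAlign variant takes only one sample per output bin, whereas published implementations typically aggregate several sample points per bin.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool an integer-aligned ROI into an out_h x out_w grid.

    roi is (x0, y0, x1, y1) in feature-map cells; x1 and y1 are exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin edges are rounded to whole cells (floor for start, ceil for end).
        r0 = y0 + (i * roi_h) // out_h
        r1 = y0 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * roi_w) // out_w
            c1 = x0 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def bilinear_sample(feature_map, y, x):
    """Interpolate the feature map at a fractional (y, x) position."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp so that the four neighbours always lie inside the map.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y_lo, x_lo = int(y), int(x)
    y_hi, x_hi = min(y_lo + 1, h - 1), min(x_lo + 1, w - 1)
    dy, dx = y - y_lo, x - x_lo
    top = feature_map[y_lo][x_lo] * (1 - dx) + feature_map[y_lo][x_hi] * dx
    bottom = feature_map[y_hi][x_lo] * (1 - dx) + feature_map[y_hi][x_hi] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, out_h, out_w):
    """Sample one bilinear value at the centre of each output bin.

    roi is (x0, y0, x1, y1) and may contain fractional coordinates.
    Simplification: a single sample per bin instead of several.
    """
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / out_h
    bin_w = (x1 - x0) / out_w
    return [[bilinear_sample(feature_map,
                             y0 + (i + 0.5) * bin_h,
                             x0 + (j + 0.5) * bin_w)
             for j in range(out_w)]
            for i in range(out_h)]


if __name__ == "__main__":
    # Toy 4 x 4 feature map whose value at (row, col) is 4 * row + col.
    fm = [[4 * r + c for c in range(4)] for r in range(4)]
    print(roi_max_pool(fm, (0, 0, 4, 4), 2, 2))       # [[5, 7], [13, 15]]
    print(roi_align(fm, (0.5, 0.5, 2.5, 2.5), 2, 2))  # [[5.0, 6.0], [9.0, 10.0]]
```

In this sketch the pooling variant has to snap every bin edge to a whole cell, so an ROI with fractional corners could not be passed to it without rounding first. The interpolating variant accepts the fractional corners directly, which is the sense in which ROIAlign "can represent fractions of a pixel".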
Applications
Region-based convolutional neural networks have been used for tracking objects from a drone-mounted camera,[6] locating text in an image,[7] and enabling object detection in Google Lens.[8] Mask R-CNN serves as one of seven tasks in the MLPerf Training Benchmark, which is a competition to speed up the training of neural networks.[9]
References
- Gandhi, Rohith (July 9, 2018). "R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms". Towards Data Science. Retrieved March 12, 2020.
- Bhatia, Richa (September 10, 2018). "What is region of interest pooling?". Analytics India. Retrieved March 12, 2020.
- Farooq, Umer (February 15, 2018). "From R-CNN to Mask R-CNN". Medium. Retrieved March 12, 2020.
- Weng, Lilian (December 31, 2017). "Object Detection for Dummies Part 3: R-CNN Family". Lil'Log. Retrieved March 12, 2020.
- Wiggers, Kyle (October 29, 2019). "Facebook highlights AI that converts 2D objects into 3D shapes". VentureBeat. Retrieved March 12, 2020.
- Nene, Vidi (August 2, 2019). "Deep Learning-Based Real-Time Multiple-Object Detection and Tracking via Drone". Drone Below. Retrieved March 28, 2020.
- Ray, Tiernan (September 11, 2018). "Facebook pumps up character recognition to mine memes". ZDNET. Retrieved March 28, 2020.
- Sagar, Ram (September 9, 2019). "These machine learning methods make google lens a success". Analytics India. Retrieved March 28, 2020.
- Mattson, Peter (2019). "MLPerf Training Benchmark". arXiv:1910.01500v3 [cs.LG].