
1. Contribution

2. Limitations of YOLO

1. Contribution of YOLO

1.1 Why are previous networks slow?

This is due to the nature of the architectures of DPM (Deformable Parts Models) and R-CNN. In DPM, a sliding-window approach is applied to every evenly divided region to search for objects over an…

1. Improvement over Fast R-CNN by introducing a Region Proposal Network

1. Dilated Convolution

Dilated convolution, also often called atrous convolution, was introduced at ICLR 2016. Its key idea is that, when performing convolution, the kernel looks not at directly adjacent pixels but at pixels a certain distance further away.

This distance is called the ‘dilation rate r,’ and dilated convolution can be mathematically represented as follows.

Equation 1. Dilated Convolution with dilation rate r
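The equation image itself is not reproduced here; a standard way to write 1-D dilated convolution, consistent with the description in the text (input x, kernel w of size K, dilation rate r), is:

```latex
y[i] \;=\; \sum_{k=0}^{K-1} x[i + r \cdot k] \, w[k]
```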

Compared to the equation of standard convolution, only the term for the dilation rate r is added, and, as one might notice, if r = 1, Eq. 1 reduces to a standard convolution.
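This reduction is easy to verify numerically. Below is a minimal numpy sketch (not from the original article) of 1-D dilated convolution with valid padding; with r = 1 it matches an ordinary convolution, while r = 2 spreads the kernel taps two pixels apart, enlarging the receptive field without adding parameters.

```python
import numpy as np

def dilated_conv1d(x, w, r=1):
    """1-D dilated convolution (valid padding): y[i] = sum_k x[i + r*k] * w[k]."""
    n, k = len(x), len(w)
    span = r * (k - 1) + 1  # receptive field of the dilated kernel
    return np.array([sum(x[i + r * j] * w[j] for j in range(k))
                     for i in range(n - span + 1)])

x = np.arange(10.0)
w = np.array([1.0, 2.0, 1.0])

y_standard = dilated_conv1d(x, w, r=1)  # equals standard (valid) convolution
y_dilated = dilated_conv1d(x, w, r=2)   # same kernel, wider receptive field
```

Note that the output of the dilated version is shorter, since the effective kernel covers r·(K−1)+1 input positions.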

Then what is the role of the dilation rate? This question can be…

1. Motivation

When we study neural network architectures based on an encoder and a decoder, it is commonly observed that the network performs downsampling in the encoder and upsampling in the decoder, as illustrated in Fig. 1.

Figure 1. Fully-Convolutional Network for Segmentation with encoder-decoder structure, from [7]

Common methods for downsampling are max pooling and strided convolution. But what are the available approaches for upsampling?

This article mainly addresses two upsampling methods based on pooling and convolution, respectively.

2. Unpooling

The first type of upsampling is unpooling, which builds on the idea of pooling. The max-pooling operation keeps only the largest response from each sub-divided region of the feature map.
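To make this concrete, here is a minimal numpy sketch (my own illustration, not code from the article) of 2×2 max pooling that records where each maximum came from, together with the corresponding max unpooling, which scatters the pooled values back to those positions and fills the rest with zeros:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling that also records the flat index of each window's max.
    Assumes even height and width."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            win = x[i:i + 2, j:j + 2]
            k = win.argmax()  # position of the max within the 2x2 window
            pooled[i // 2, j // 2] = win.flat[k]
            idx[i // 2, j // 2] = (i + k // 2) * w + (j + k % 2)
    return pooled, idx

def max_unpool_2x2(pooled, idx, shape):
    """Scatter pooled values back to their recorded positions; zeros elsewhere."""
    out = np.zeros(shape)
    out.flat[idx.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 9., 1., 0.],
              [2., 3., 4., 5.]])
p, idx = max_pool_2x2(x)          # 2x2 map of the per-window maxima
up = max_unpool_2x2(p, idx, x.shape)  # sparse 4x4 map: maxima restored, rest zero
```

This is the same mechanism frameworks expose directly, e.g. PyTorch's `MaxPool2d(return_indices=True)` paired with `MaxUnpool2d`.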

1. Recap

Previously, we have seen how local features are extracted from an image using scale-invariant local feature descriptors such as Harris-Laplace and SIFT. After extracting local features independently from two images, the local features can be paired by searching for the counterpart with the highest similarity.

Figure 1. Pairs of interest points independently extracted from two images of the same object, from [1]

More specifically, the basic matching algorithm is as follows:

(1) Detect interest points in two images using local feature descriptors.

(2) For each interest point, extract region (patch) and compute a local feature representation.

(3) Compare each feature from image 1 to every feature in image 2 and select the one with the highest similarity.
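Step (3) above can be sketched in a few lines of numpy. The snippet below (a toy illustration, not the article's code) matches each descriptor from image 1 to its nearest neighbor in image 2, using Euclidean distance as the (dis)similarity measure; the 2-D toy descriptors stand in for what would be 128-D vectors with SIFT.

```python
import numpy as np

def match_features(desc1, desc2):
    """For each descriptor in image 1, return the index of the most
    similar descriptor in image 2 (smallest Euclidean distance)."""
    # Pairwise distance matrix: d[i, j] = ||desc1[i] - desc2[j]||
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    return d.argmin(axis=1)

# Toy descriptors; real SIFT descriptors are 128-dimensional.
desc1 = np.array([[0.0, 1.0], [5.0, 5.0]])
desc2 = np.array([[5.0, 4.0], [0.0, 0.9]])
matches = match_features(desc1, desc2)
```

In practice this exhaustive comparison is usually refined with a ratio test or mutual-nearest-neighbor check to reject ambiguous matches.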


1. SIFT (Scale Invariant Feature Transform)

Another scale-invariant algorithm I want to address is SIFT. With SIFT, the locations of local feature points (interest points) are extracted from an image, and in addition, a corresponding vector representation of each interest point is generated, which can later be used for feature matching between images. As SIFT involves quite a few steps, it might seem complicated and frustrating, but many of them come from what we have already studied (e.g. feature descriptors like HoG, scale selection for local feature points via LoG).

We will go through the flow of SIFT one by one, but before getting into…

This article is the second part of the topic: scale-invariant local feature extraction. I assume you already know local feature detectors (Harris, Hessian), feature descriptors (HoG), and automatic scale selection (with LoG), so please visit here if you have not read the first part yet or are not familiar with them before continuing with this article.

1. Recap

Harris and Hessian are sophisticatedly designed local feature detectors that are used to find corners in an image. Although they show significant performance, one huge limitation lies in them: the Harris and Hessian corner detectors are not invariant to scale.

Figure 1. Why Harris and Hessian corner detectors are not scale-invariant, from [2]

We also…

1. Motivation: Why scale-invariant?

Let’s recall what we studied about the local feature extractors Harris and Hessian in the last article. Harris and Hessian are strong corner detectors, and they are rotation-invariant. However, they are not scale-invariant, which is a crucial drawback in terms of feature detection. Why scale invariance is important is intuitively visualized in the following slide from [2].

Figure 1. Why the scale invariance is important, from [2]

Let’s say we have two identical images but at different scales (one is an upsampled version of the other), as shown above. …


Novice Developer, Research Assistant in Fraunhofer IAIS, Master student of RWTH
