Featured image: Shutterstock | Tesmanian
In the field of machine vision for autonomous vehicles, object detection is a computationally intensive task. Typically, an image is sent to a detector at a given resolution, and the detector works at a consistent pixel scale: most detectors need an object to span a minimum number of pixels before they can detect it.
For example, many detectors require at least forty pixels in the image in order to detect objects. The computational cost of a detector scales directly with the number of pixels fed into it. If twice as many pixels are fed into the detector as input, the detector will typically take twice as long to produce an output.
Because computational resources within autonomous vehicles are limited, object detectors nearly always process downsampled images as input. Downsampling a high resolution image lowers the computational requirement by creating an access image that is a miniaturized duplicate of the optical resolution master image, which is typically output by an automotive camera.
While computational requirements are lowered, downsampling reduces the range, or distance, of detections because the detector has fewer pixels to act on. When an image is downsampled by a factor of two in both width and height, for example, the detector may process it four times as fast, but objects such as cars will be smaller in the downsampled image and will need to be twice as close to the camera to appear at the same pixel size, depending on the camera and its field of view (hereinafter "FOV").
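The trade-off between speed and range can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the patent: it assumes a simple pinhole camera model, detector cost proportional to pixel count, and the forty-pixel minimum read as the width an object must span. The function names, the focal length, the sensor size, and the car width are all hypothetical.

```python
def max_detection_range_m(focal_px, object_width_m, min_pixels):
    """Farthest distance at which an object still spans `min_pixels`.

    Pinhole model (an assumption): pixel_width = focal_px * object_width_m / distance.
    """
    return focal_px * object_width_m / min_pixels


def downsample_tradeoff(width, height, factor, focal_px, object_width_m, min_pixels=40):
    """Compare full-resolution and downsampled input for one camera."""
    full_pixels = width * height
    small_pixels = (width // factor) * (height // factor)
    # Detector cost is assumed proportional to pixel count, as the article states.
    speedup = full_pixels / small_pixels
    full_range = max_detection_range_m(focal_px, object_width_m, min_pixels)
    # Downsampling by `factor` divides the effective focal length in pixels.
    small_range = max_detection_range_m(focal_px / factor, object_width_m, min_pixels)
    return speedup, full_range, small_range


# Hypothetical numbers: 1280x960 sensor, 1000 px focal length, 1.8 m wide car.
speedup, full_range, small_range = downsample_tradeoff(1280, 960, 2, 1000.0, 1.8)
print(speedup, full_range, small_range)
```

With these made-up numbers, halving each dimension gives a fourfold speedup while the distance at which the car is still detectable drops from roughly 45 m to roughly 22.5 m, which is the "twice as close" effect described above.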
As a result, accurate detectors are slower than is typically desirable due to the high computational requirements, while faster detectors using downsampled images are not as accurate as typically desired.
Tesla filed a patent application, 'Enhanced object detection for autonomous vehicles based on field view', on December 4, 2019, and it was published on June 4, 2020. It relates generally to the machine vision field, and more specifically to enhanced object detection from a vehicle. The invention covers techniques for improving the accuracy of object detection in specific fields of view.
FIG. 1 is a schematic representation of the object detection system.
Source: Tesla patent
Image sensors (such as cameras) can be located around a vehicle. Certain image sensors, such as forward-facing ones, may thus obtain images of the real-world location towards which the vehicle is heading. A portion of these images will tend to depict pedestrians, vehicles, obstacles, and so on that are important in applications such as autonomous vehicle navigation. For example, the portion along the road on which the vehicle is driving will tend to depict other vehicles.
FIG. 3 is an illustration of an example of cropped objects in bounding boxes, according to an embodiment of the object detection method.
Source: Tesla patent
This patent discusses a method that includes receiving an image from one of the image sensors located around a vehicle. A field of view for the image is then determined, and this field of view is associated with the vanishing line.
Upon determination, the particular field of view may be cropped from an input image. A remaining portion of the input image may then be downsampled. The relatively high resolution cropped portion of the input image and the lower resolution downsampled portion of the input image may then be analyzed by an object detector (e.g., a convolutional neural network).
In this way, the object detector may expend greater computational resources analyzing the higher resolution field of view at the vanishing line, which is more likely to contain important features. Additionally, with the greater detail in the cropped portion, the system may more reliably detect objects, avoid false positives, and so on.
According to the patent, methods for detecting an object may include:
- receiving one or more pieces of data relating to a high resolution image;
- determining a field of view (FOV) based on the pieces of data;
- cropping the FOV to generate a high resolution crop of the image;
- downsampling the rest of the image to the size of the cropped region to generate a low resolution image;
- sending a batch of the high resolution crop and the low resolution image to a detector; and
- processing the images via the detector to generate an output of detected objects.
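The steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the image is a toy grid of numbers, the FOV is assumed to be a fixed-size window centered horizontally on a given vanishing-line row, the full frame (rather than only the remainder) is downsampled with nearest-neighbor sampling, and the detector itself is left out. All function names are hypothetical.

```python
def crop_fov(image, center_row, center_col, crop_h, crop_w):
    """Cut a crop_h x crop_w window centered on the given pixel, clamped to the image."""
    rows, cols = len(image), len(image[0])
    top = min(max(center_row - crop_h // 2, 0), rows - crop_h)
    left = min(max(center_col - crop_w // 2, 0), cols - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def downsample_to(image, out_h, out_w):
    """Nearest-neighbor resize of the whole image to out_h x out_w."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r * rows // out_h][c * cols // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def build_batch(image, vanishing_row, crop_h, crop_w):
    """Return [high_res_crop, low_res_image], both crop_h x crop_w."""
    center_col = len(image[0]) // 2  # assumption: FOV is centered horizontally
    high_res = crop_fov(image, vanishing_row, center_col, crop_h, crop_w)
    low_res = downsample_to(image, crop_h, crop_w)
    return [high_res, low_res]


# Toy 8x8 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(8)] for r in range(8)]
batch = build_batch(image, vanishing_row=4, crop_h=4, crop_w=4)
for tile in batch:
    print(tile)
```

Because both tiles have the same shape, they can be stacked into one batch and passed to a detector in a single call. The crop keeps every original pixel around the vanishing line, while the downsampled tile keeps coarse coverage of the whole scene.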