Publications – Journals and Conferences

Contrast computation methods for interferometric measurement of sensor modulation transfer function

Tharun Battula; Todor Georgiev; Jennifer Gille; Sergio Goma

Abstract

Accurate measurement of image-sensor frequency response over a wide range of spatial frequencies is very important for analyzing pixel array characteristics, such as modulation transfer function (MTF), crosstalk, and active pixel shape. Such analysis is especially significant in computational photography for the purposes of deconvolution, multi-image superresolution, and improved light-field capture. We use a lensless interferometric setup that produces high-quality fringes for measuring MTF over a wide range of frequencies (here, 37 to 434 line pairs per mm). We discuss the theoretical framework, involving Michelson and Fourier contrast measurement of the MTF, addressing phase alignment problems using a moiré pattern. We solidify the definition of Fourier contrast mathematically and compare it to Michelson contrast. Our interferometric measurement method shows high detail in the MTF, especially at high frequencies (above Nyquist frequency). We are able to estimate active pixel size and pixel pitch from measurements. We compare both simulation and experimental MTF results to a lens-free slanted-edge implementation using commercial software.
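
As a toy illustration of the two contrast definitions compared in the abstract (this is our sketch, not the authors' code; the synthetic fringe and the integer `cycles` parameter are assumptions), the snippet below computes Michelson contrast from the fringe extrema and a Fourier-style contrast as the ratio of the first-harmonic amplitude to the DC level:

```python
import numpy as np

def michelson_contrast(fringe):
    """Michelson contrast: (Imax - Imin) / (Imax + Imin)."""
    i_max, i_min = fringe.max(), fringe.min()
    return (i_max - i_min) / (i_max + i_min)

def fourier_contrast(fringe, cycles):
    """Fourier-style contrast: first-harmonic amplitude over the DC level.

    `cycles` is the (assumed known) number of fringe periods in the window.
    """
    spectrum = np.fft.rfft(fringe)
    dc = np.abs(spectrum[0]) / len(fringe)
    first_harmonic = 2.0 * np.abs(spectrum[cycles]) / len(fringe)
    return first_harmonic / dc

# Synthetic sinusoidal fringe with 80% modulation, 8 periods over 512 samples.
x = np.arange(512)
fringe = 1.0 + 0.8 * np.cos(2 * np.pi * 8 * x / 512)
print(michelson_contrast(fringe), fourier_contrast(fringe, cycles=8))  # both ~0.8
```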

6 February 2018

J. of Electronic Imaging, 27(1), 013015 (2018)

A 3D Stacked Programmable Image Processing Engine in a 40nm Logic Process with a Detector Array in a 45nm CMOS Image Sensor Technology

Biay‐Cheng Hseih1, Sami Khawam1, Nousias Ioannis1, Mark Muir1, Khoi Le1, Keith Honea1, Sergio Goma1 , RJ Lin2, Chin‐Hao Chang2, Charles Liu2, Shang‐Fu Yeh2, Hong‐Yi Tu2, Kuo‐Yu Chou2, Calvin Chao2. 1Qualcomm Technologies Inc., USA; 2TSMC, Taiwan, ROC

Abstract

Current mobile camera systems have a significant image signal processing (ISP) programmability limitation, since the ISP algorithms are mainly hard-coded in the application processor. We present the prototype development results for a Re-Configurable Instruction Cell Array (RICA), a real-time, low-power, reprogrammable ISP engine stacked with an 8MP detector array in 45nm BSI CMOS imager and 40nm logic technologies. We believe this RICA stacked image sensor technology presents an efficient programmability solution to support adjacent IoT markets and next-generation computational camera technologies.

30 May 2017

IISW (2017)

Hardware-friendly universal demosaick using non-iterative MAP reconstruction

Hasib Siddiqui; Kalin Atanassov; Sergio Goma

Abstract

Non-Bayer color filter array (CFA) sensors have recently drawn attention due to their superior compression of spectral energy, ability to deliver improved signal-to-noise ratio, or ability to provide high dynamic range (HDR) imaging. Demosaicking methods that perform color interpolation of Bayer CFA data have been widely investigated. However, a bottleneck to the adoption of emerging non-Bayer CFA sensors is the unavailability of efficient color-interpolation algorithms that can demosaick the new patterns. Designing a new demosaick algorithm for every proposed CFA pattern is a challenge. In this paper, we propose a hardware-friendly universal demosaick algorithm based on maximum a posteriori (MAP) estimation that can be configured to demosaick raw images captured using a variety of CFA sensors. The forward process of mosaicking is modeled as a linear operation. We then use quadratic data-fitting and image prior terms in a MAP framework and pre-compute the inverse matrix for recovering the full RGB image from CFA observations for a given pattern. The pre-computed inverse is later used in real-time application to demosaick the given CFA pattern. The inverse matrix is observed to have a Toeplitz-like structure, allowing for hardware-efficient implementation of the algorithm. We use a set of 24 Kodak color images to evaluate the quality of our demosaick algorithm on three different CFA patterns. The PSNR values of the reconstructed full-channel RGB images from CFA samples are reported in the paper.
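
A toy, hedged illustration of the precomputed linear MAP reconstruction described above (the small image size, the first-difference prior, and the weight `lam` are our choices, not the paper's): the mosaicking operator A is built for an RGGB pattern, the regularized inverse is formed once, and then applied to any CFA observation of that pattern.

```python
import numpy as np

def bayer_mosaic_matrix(h, w):
    """Sampling operator A (h*w x 3*h*w) for an RGGB Bayer pattern."""
    n = h * w
    A = np.zeros((n, 3 * n))
    for r in range(h):
        for c in range(w):
            p = r * w + c
            if r % 2 == 0 and c % 2 == 0:
                ch = 0          # red
            elif r % 2 == 1 and c % 2 == 1:
                ch = 2          # blue
            else:
                ch = 1          # green
            A[p, ch * n + p] = 1.0
    return A

def difference_prior(h, w):
    """Horizontal and vertical first-difference operator on each color plane."""
    n = h * w
    rows = []
    for r in range(h):
        for c in range(w):
            p = r * w + c
            if c + 1 < w:
                d = np.zeros(n); d[p], d[p + 1] = -1.0, 1.0; rows.append(d)
            if r + 1 < h:
                d = np.zeros(n); d[p], d[p + w] = -1.0, 1.0; rows.append(d)
    D = np.array(rows)
    return np.kron(np.eye(3), D)   # apply the same prior to R, G, B

h = w = 8
lam = 0.1                          # prior weight (illustrative value)
A = bayer_mosaic_matrix(h, w)
L = difference_prior(h, w)
# Precomputed once per CFA pattern; reused for every frame at run time.
inv = np.linalg.inv(A.T @ A + lam * L.T @ L) @ A.T

cfa = np.random.rand(h * w)        # stand-in for a raw CFA observation
rgb = (inv @ cfa).reshape(3, h, w)
```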

28 September 2016

2016 IEEE International Conference on Image Processing (ICIP)

Next gen perception and cognition: augmenting perception and enhancing cognition through mobile technologies

Sergio R Goma

Abstract

Mobile technologies are now ubiquitous, and the complexity of problems is continuously increasing. In the context of the advancement of engineering, we explore in this paper possible reasons that could cause a saturation in technology evolution, namely the ability to solve problems based on previous results and the ability to express solutions more efficiently. We conclude that 'thinking outside of the brain' (solving engineering problems that, due to their complexity, are expressed in a virtual medium) would benefit from mobile technology augmentation. This could be the necessary evolutionary step that provides the efficiency required to solve new complex problems (addressing the 'running out of time' issue) and removes the barrier to communicating results (addressing the human 'perception/expression imbalance' issue). Some consequences are discussed, as in this context artificial intelligence becomes an automation aid rather than a necessary next evolutionary step. The paper concludes that research in modeling as a problem-solving aid and in data visualization as a perception aid, augmented with mobile technologies, could be the path to an evolutionary step in advancing engineering.

17 March 2015

Proceedings Volume 9394, Human Vision and Electronic Imaging XX; 93940I (2015)

Invited

Depth enhanced and content aware video stabilization

A. Lindner; K. Atanassov; S. Goma

Abstract

We propose a system that uses depth information for video stabilization. The system uses 2D-homographies as frame pair transforms that are estimated with keypoints at the depth of interest. This makes the estimation more robust as the points lie on a plane. The depth of interest can be determined automatically from the depth histogram, inferred from user input such as tap-to-focus, or selected by the user; i.e., tap-to-stabilize. The proposed system can stabilize videos on the fly in a single pass and is especially suited for mobile phones with multiple cameras that can compute depth maps automatically during image acquisition.
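
A minimal sketch of the frame-pair transform step, assuming an OpenCV pipeline (ORB features, brute-force matching, and RANSAC are our substitutions; the paper does not specify them): keypoints are kept only if their depth lies near the depth of interest, and a 2D homography is fit to the surviving matches.

```python
import cv2
import numpy as np

def stabilize_pair(prev_gray, cur_gray, cur_depth, depth_of_interest, tol=0.1):
    """Estimate a 2D homography from keypoints near the depth of interest."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    src, dst = [], []
    for m in matches:
        x, y = kp2[m.trainIdx].pt
        # Keep only matches whose depth is close to the plane of interest.
        if abs(cur_depth[int(y), int(x)] - depth_of_interest) < tol * depth_of_interest:
            src.append(kp2[m.trainIdx].pt)
            dst.append(kp1[m.queryIdx].pt)

    H, _ = cv2.findHomography(np.float32(src), np.float32(dst), cv2.RANSAC, 3.0)
    return H  # warp cur_gray with H to align it to prev_gray
```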

11 March 2015

Proceedings Volume 9411, Mobile Devices and Multimedia: Enabling Technologies, Algorithms, and Applications 2015; 941106 (2015)

MTF evaluation of white pixel sensors

Albrecht Lindner; Kalin Atanassov; Jiafu Luo; Sergio Goma

Abstract

We present a methodology to compare image sensors with traditional Bayer RGB layouts to sensors with alternative layouts containing white pixels. We focused on the sensors' resolving powers, which we measured in the form of a modulation transfer function for variations in both luma and chroma channels. We present the design of the test chart, the acquisition of images, the image analysis, and an interpretation of results. We demonstrate the approach with the example of two sensors that only differ in their color filter arrays. We confirmed that the sensor with white pixels and the corresponding demosaicing result in a higher resolving power in the luma channel, but a lower resolving power in the chroma channels, when compared to the traditional Bayer sensor.

8 February 2015

Proceedings Volume 9396, Image Quality and System Performance XII; 939608 (2015)

Video adaptation for consumer devices: opportunities and challenges offered by new standards

James Nightingale; Qi Wang; Christos Grecos; Sergio R. Goma

Abstract

Video and multimedia streaming services continue to grow in popularity and are rapidly becoming the largest consumers of network capacity in both fixed and mobile networks. In this article we discuss the latest advances in video compression technology and demonstrate their potential to improve service quality for consumers while reducing bandwidth consumption. Our study focuses on the adaptation of scalable, highly compressed video streams to meet the resource constraints of a wide range of portable consumer devices in mobile environments. Exploring SHVC, the scalable extension to the recently standardized High Efficiency Video Coding scheme, we show the bandwidth savings that can be achieved over current encoding schemes and highlight the challenges that lie ahead in realizing a deployable and user-centric system.

11 December 2014

IEEE Communications Magazine, Volume 52, Issue 12, December 2014

The impact of network impairment on quality of experience (QoE) in H.265/HEVC video streaming

James Nightingale; Qi Wang; Christos Grecos; Sergio R. Goma

Abstract

Users of modern portable consumer devices (smartphones, tablets etc.) expect ubiquitous delivery of high quality services, which fully utilise the capabilities of their devices. Video streaming is one of the most widely used yet challenging services for operators to deliver with assured service levels. This challenge is more apparent in wireless networks where bandwidth constraints and packet loss are common. The lower bandwidth requirements of High Efficiency Video Coding (HEVC) provide the potential to enable service providers to deliver high quality video streams in low-bandwidth networks; however, packet loss may result in greater damage in perceived quality given the higher compression ratio. This work considers the delivery of HEVC encoded video streams in impaired network environments and quantifies the effects of network impairment on HEVC video streaming from the perspective of the end user. HEVC encoded streams were transmitted over a test network with both wired and wireless segments that had imperfect communication channels subject to packet loss. Two different error concealment methods were employed to mitigate packet loss and overcome reference decoder robustness issues. The perceptual quality of received video was subjectively assessed by a panel of viewers. Existing subjective studies of HEVC quality have not considered the implications of network impairments. Analysis of results has quantified the effects of packet loss in HEVC on perceptual quality and provided valuable insight into the relative importance of the main factors observed to influence user perception in HEVC streaming. The outputs from this study show the relative importance and relationship between those factors that affect human perception of quality in impaired HEVC encoded video streams. The subjective analysis is supported by comparison with commonly used objective quality measurement techniques. Outputs from this work may be used in the development of quality of experience (QoE) oriented streaming applications for HEVC in loss prone networks.

14 July 2014

IEEE Transactions on Consumer Electronics, Volume 60, Issue 2, May 2014

Deriving video content type from HEVC bitstream semantics

James Nightingale; Qi Wang; Christos Grecos; Sergio R. Goma

Abstract

As network service providers seek to improve customer satisfaction and retention levels, they are increasingly moving from traditional quality of service (QoS) driven delivery models to customer-centred quality of experience (QoE) delivery models. QoS models only consider metrics derived from the network; QoE models, however, also consider metrics derived from within the video sequence itself. Various spatial and temporal characteristics of a video sequence have been proposed, both individually and in combination, to derive methods of classifying video content either on a continuous scale or as a set of discrete classes. QoE models can be divided into three broad categories: full reference, reduced reference and no-reference models. Due to the need to have the original video available at the client for comparison, full reference metrics are of limited practical value in adaptive real-time video applications. Reduced reference metrics often require metadata to be transmitted with the bitstream, while no-reference metrics typically operate in the decompressed domain at the client side and require significant processing to extract spatial and temporal features. This paper proposes a heuristic, no-reference approach to video content classification which is specific to HEVC encoded bitstreams. The HEVC encoder already makes use of spatial characteristics to determine the partitioning of coding units and of temporal characteristics to determine the splitting of prediction units. We derive a function which approximates the spatio-temporal characteristics of the video sequence by using the weighted averages of the depth at which the coding unit quadtree is split and of the prediction mode decision made by the encoder, to estimate spatial and temporal characteristics respectively. Since the video content type of a sequence is determined using high-level information parsed from the video stream, spatio-temporal characteristics are identified without the need for full decoding and can be used in a timely manner to aid decision making in QoE-oriented adaptive real-time streaming.
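
A paraphrase in code of the feature described above (the per-CU record layout, the area weighting, and the use of the intra fraction as the temporal proxy are our assumptions): weighted averages are taken over coding-unit data parsed from the bitstream, without full decoding.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CodingUnit:
    depth: int        # quadtree split depth parsed from the bitstream (0..3)
    intra: bool       # True if the CU's prediction units are intra coded
    size: int         # CU width/height in pixels

def spatio_temporal_features(cus: List[CodingUnit]):
    """Area-weighted average split depth (spatial) and intra fraction (temporal proxy)."""
    total_area = sum(cu.size ** 2 for cu in cus)
    spatial = sum(cu.depth * cu.size ** 2 for cu in cus) / total_area
    temporal = sum(cu.size ** 2 for cu in cus if cu.intra) / total_area
    return spatial, temporal

# Toy example: a few parsed CUs from one frame.
frame = [CodingUnit(0, False, 64), CodingUnit(2, True, 16), CodingUnit(3, False, 8)]
print(spatio_temporal_features(frame))
```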

15 May 2014

Proceedings Volume 9139, Real-Time Image and Video Processing 2014; 913902 (2014)

Structured light 3D depth map enhancement and gesture recognition using image content adaptive filtering

Vikas Ramachandra; James Nash; Kalin Atanassov; Sergio Goma

Abstract

A structured-light system for depth estimation is a type of 3D active sensor that consists of a structured-light projector that projects an illumination pattern on the scene (e.g., a mask with vertical stripes) and a camera which captures the illuminated scene. Based on the received patterns, depths of different regions in the scene can be inferred. In this paper, we use side information in the form of image structure to enhance the depth map. This side information is obtained from the received light pattern image reflected by the scene itself. The processing steps run in real time. This post-processing stage, in the form of depth map enhancement, can be used for better hand gesture recognition, as is illustrated in this paper.
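
One way to picture the content-adaptive enhancement is as edge-aware filtering of the depth map guided by the captured pattern image; the sketch below is a generic joint-bilateral filter, not the authors' specific filter, and the parameters are illustrative.

```python
import numpy as np

def joint_bilateral_depth_filter(depth, guide, radius=4, sigma_s=2.0, sigma_r=0.1):
    """Smooth `depth` while respecting edges in the captured `guide` image.

    A plain (slow) joint-bilateral sketch; guide and depth are float arrays in [0, 1].
    """
    h, w = depth.shape
    out = np.zeros_like(depth)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    pad_d = np.pad(depth, radius, mode='edge')
    pad_g = np.pad(guide, radius, mode='edge')
    for y in range(h):
        for x in range(w):
            patch_d = pad_d[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            patch_g = pad_g[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            range_w = np.exp(-((patch_g - guide[y, x]) ** 2) / (2 * sigma_r ** 2))
            weights = spatial * range_w
            out[y, x] = (weights * patch_d).sum() / weights.sum()
    return out
```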

7 March 2014

Proceedings Volume 9020, Computational Imaging XII; 902005 (2014)

Evaluation of in-network adaptation of scalable high efficiency video coding (SHVC) in mobile environments

James Nightingale; Qi Wang; Christos Grecos; Sergio Goma

Abstract

High Efficiency Video Coding (HEVC), the latest video compression standard (also known as H.265), can deliver video streams of comparable quality to the current H.264 Advanced Video Coding (H.264/AVC) standard with a 50% reduction in bandwidth. Research into SHVC, the scalable extension to the HEVC standard, is still in its infancy. One important area for investigation is whether, given the greater compression ratio of HEVC (and SHVC), the loss of packets containing video content will have a greater impact on the quality of delivered video than is the case with H.264/AVC or its scalable extension H.264/SVC. In this work we empirically evaluate the layer-based, in-network adaptation of video streams encoded using SHVC in situations where dynamically changing bandwidths and datagram loss ratios require the real-time adaptation of video streams. Through extensive experimentation, we establish a comprehensive set of benchmarks for SHVC-based high-definition video streaming in loss-prone network environments such as those commonly found in mobile networks. Among other results, we highlight that packet losses of only 1% can lead to a substantial reduction in PSNR of over 3 dB and to error propagation in over 130 pictures following the one in which the loss occurred. This work is one of the earliest studies in this cutting-edge area to report benchmark evaluation results for the effects of datagram loss on SHVC picture quality and to offer empirical and analytical insights into SHVC adaptation to lossy, mobile networking conditions.

18 February 2014

Proceedings Volume 9030, Mobile Devices and Multimedia: Enabling Technologies, Algorithms, and Applications 2014; 90300B (2014)

Subjective evaluation of the effects of packet loss on HEVC encoded video streams

James Nightingale; Qi Wang; Christos Grecos; Sergio Goma

Abstract

The emerging High Efficiency Video Coding standard (HEVC) will bring the benefit of delivering the same statistical quality at about half of the bandwidths required in the current H.264/AVC standard. Such significantly higher compression efficiency of HEVC will, however, potentially lead to higher sensitivity to packet loss and thus have a great impact on the users of portable consumer devices such as smartphones and tablets when delivering HEVC encoded video over loss-prone networks, thereby adversely affecting the user’s quality of experience (QoE). Existing subjective evaluations of the perceptual quality of HEVC have focused on its performance in loss-free environments. In this work, we empirically transmit HEVC streams over a hybrid wired/wireless network at typical (UK) mobile broadband speeds under a range of packet loss conditions, using typical smartphone and tablet resolutions. Our subjective evaluation experiments quantify the effect, on perceptual quality, of packet loss in HEVC streams and establish a packet loss rate threshold of 3% beyond which users find poor perceptual quality has a detrimental effect on their QoE. Furthermore, we employ two error concealment schemes to mitigate the impact of packet loss/corruption and investigate their effectiveness on users’ QoE.

11 September 2013

2013 IEEE Third International Conference on Consumer Electronics - Berlin (ICCE-Berlin)

Self-calibration of depth sensing systems based on structured-light 3D

Vikas Ramachandra; James Nash; Kalin Atanassov; Sergio Goma

Abstract

A structured-light system for depth estimation is a type of 3D active sensor that consists of a structured-light projector that projects a light pattern on the scene (e.g., a mask with vertical stripes) and a camera which captures the illuminated scene. Based on the received patterns, depths of different regions in the scene can be inferred. For this setup to work optimally, the camera and projector must be aligned such that the projection image plane and the image capture plane are parallel, i.e., free of any relative rotations (yaw, pitch and roll). In reality, due to mechanical placement inaccuracy, the projector-camera pair will not be aligned. In this paper we present a calibration process which measures the misalignment. We also estimate a scale factor to account for differences in the focal lengths of the projector and the camera. The three angles of rotation can be found by introducing a plane in the field of view of the camera and illuminating it with the projected light patterns. An image of this plane is captured and processed to obtain the relative pitch, yaw and roll angles, as well as the scale, through an iterative process. This algorithm leverages the effects of the misalignment/rotation angles on the depth map of the plane image.

12 March 2013

Proceedings Volume 8650, Three-Dimensional Image Processing (3DIP) and Applications 2013; 86500V (2013)

draft

Introducing the cut-out star target to evaluate the resolution performance of 3D structured-light systems

Tom Osborne; Vikas Ramachandra; Kalin Atanassov; Sergio Goma

Abstract

Structured light depth map systems are a type of 3D system where a structured light pattern is projected into the object space and an adjacent receiving camera captures the image of the scene. By using the distance between the camera and the projector together with the structured pattern, one can estimate the depth of objects in the scene from the camera. It is important to be able to compare two systems to see how one performs relative to the other. Accuracy, resolution, and speed are three aspects of a structured light system that are often used for performance evaluation. It would be ideal if we could use the accuracy and resolution measurements to answer questions such as how close two cubes can be together and still be resolved as two objects, or how close a person must be to the structured light system for it to determine how many fingers the person is holding up. It turns out, from our experiments, that a system's ability to resolve the shape of an object depends on a number of factors, such as the shape of the object, its orientation, and how close it is to other adjacent objects. This makes the task of comparing the resolution of two systems difficult. Our goal is to choose a target, or a set of targets, from which we make measurements that enable us to quantify, on average, the comparative resolution performance of one system against another without having to make multiple measurements on scenes with a large set of object shapes, orientations, and proximities to each other. In this document we go over a number of targets we evaluated and focus on the "Cut-out Star Target" that we selected as the best choice. Using this target we show our evaluation results for two systems. The metrics we used for the evaluation were developed during this work. These metrics will not directly answer the question of how close two objects can be to each other and still be resolved, but they will indicate which system will perform better over a large set of objects, orientations, and proximities to other objects.

12 March 2013

Proceedings Volume 8650, Three-Dimensional Image Processing (3DIP) and Applications 2013; 86500P (2013)

draft

Lytro camera technology: theory, algorithms, performance analysis

Todor Georgiev; Zhan Yu; Andrew Lumsdaine; Sergio Goma

Abstract

The Lytro camera is the first implementation of a plenoptic camera for the consumer market. We consider it a successful example of the miniaturization aided by the increase in computational power characterizing mobile computational photography. The plenoptic camera approach to radiance capture uses a microlens array as an imaging system focused on the focal plane of the main camera lens. This paper analyzes the performance of the Lytro camera from a system-level perspective, considering the Lytro camera as a black box, and uses our interpretation of the Lytro image data saved by the camera. We present our findings based on our interpretation of the Lytro camera file structure, image calibration, and image rendering; in this context, artifacts and final image resolution are discussed.

7 March 2013

Proceedings Volume 8667, Multimedia Content and Mobile Devices; 86671J (2013)

draft

Temporal image stacking for noise reduction and dynamic range improvement

Kalin Atanassov; James Nash; Sergio Goma; Vikas Ramachandra; Hasib Siddiqui

Abstract

The dynamic range of an imager is determined by the ratio of the pixel well capacity to the noise floor. As the scene dynamic range becomes larger than the imager dynamic range, the choices are to saturate some parts of the scene or “bury” others in noise. In this paper we propose an algorithm that produces high dynamic range images by “stacking” sequentially captured frames which reduces the noise and creates additional bits. The frame stacking is done by frame alignment subject to a projective transform and temporal anisotropic diffusion. The noise sources contributing to the noise floor are the sensor heat noise, the quantization noise, and the sensor fixed pattern noise. We demonstrate that by stacking images the quantization and heat noise are reduced and the decrease is limited only by the fixed pattern noise. As the noise is reduced, the resulting cleaner image enables the use of adaptive tone mapping algorithms which render HDR images in an 8-bit container without significant noise increase.
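
A hedged outline of the stacking idea, with ECC-based homography alignment and a plain temporal mean standing in for the paper's alignment and anisotropic-diffusion steps:

```python
import cv2
import numpy as np

def stack_frames(frames):
    """Align frames to the first one with a projective transform, then average.

    `frames` is a list of float32 grayscale images scaled to [0, 1].
    Averaging N frames reduces uncorrelated heat/quantization noise by ~sqrt(N),
    leaving fixed-pattern noise as the floor.
    """
    ref = frames[0]
    acc = ref.astype(np.float64).copy()
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)
    for frame in frames[1:]:
        warp = np.eye(3, dtype=np.float32)
        _, warp = cv2.findTransformECC(ref, frame, warp, cv2.MOTION_HOMOGRAPHY,
                                       criteria)
        aligned = cv2.warpPerspective(frame, warp, (ref.shape[1], ref.shape[0]),
                                      flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        acc += aligned
    return acc / len(frames)   # higher effective bit depth than a single frame
```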

7 March 2013

Proceedings Volume 8667, Multimedia Content and Mobile Devices; 86671P (2013)

draft

Touch HDR: photograph enhancement by user controlled wide dynamic range adaptation

Steve Verrall; Hasib Siddiqui; Kalin Atanassov; Sergio Goma; Vikas Ramachandra

Abstract

High Dynamic Range (HDR) technology enables photographers to capture a greater range of tonal detail. HDR is typically used to bring out detail in a dark foreground object set against a bright background. HDR technologies include multi-frame HDR and single-frame HDR. Multi-frame HDR requires the combination of a sequence of images taken at different exposures. Single-frame HDR requires histogram equalization post-processing of a single image, a technique referred to as local tone mapping (LTM). Images generated using HDR technology can look less natural than their non-HDR counterparts. Sometimes it is desired to enhance only small regions of an original image. For example, it may be desired to enhance the tonal detail of one subject's face while preserving the original background. The Touch HDR technique described in this paper achieves these goals by enabling selective blending of HDR and non-HDR versions of the same image to create a hybrid image. The HDR version of the image can be generated by either multi-frame or single-frame HDR. Selective blending can be performed as a post-processing step, for example, as a feature of a photo editor application, at any time after the image has been captured. HDR and non-HDR blending is controlled by a weighting surface, which is configured by the user through a sequence of touches on a touchscreen.
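
The blend itself reduces to a per-pixel weighted sum; the sketch below assumes (our assumption, not the paper's recipe) a weighting surface built from Gaussian bumps centered at the user's touch points.

```python
import numpy as np

def touch_weight_surface(shape, touches, sigma=60.0):
    """Weight surface in [0, 1] from Gaussian bumps at touch locations (x, y)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    surface = np.zeros(shape)
    for tx, ty in touches:
        surface = np.maximum(surface,
                             np.exp(-((xs - tx) ** 2 + (ys - ty) ** 2) / (2 * sigma ** 2)))
    return surface

def blend(hdr, non_hdr, weight):
    """Hybrid image: HDR rendering where the user touched, original elsewhere."""
    return weight[..., None] * hdr + (1.0 - weight[..., None]) * non_hdr

# Example: emphasize tonal detail around one face region at (320, 200).
h, w = 480, 640
hdr = np.random.rand(h, w, 3); non_hdr = np.random.rand(h, w, 3)
out = blend(hdr, non_hdr, touch_weight_surface((h, w), [(320, 200)]))
```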

7 March 2013

Proceedings Volume 8667, Multimedia Content and Mobile Devices; 86671O (2013)

draft

Special Section Guest Editorial: Mobile Computational Photography

Todor G. Georgiev; Andrew Lumsdaine; Sergio R. Goma

21 February 2013

J. of Electronic Imaging, 22(1), 010901 (2013)

Digital ruler: real-time object tracking and dimension measurement using stereo cameras

James Nash; Kalin Atanassov; Sergio Goma; Vikas Ramachandra; Hasib Siddiqui

Abstract

Stereo metrology involves obtaining spatial estimates of an object's length or perimeter using the disparity between boundary points. True 3D scene information is required to extract length measurements of an object's projection onto the 2D image plane. In stereo vision, the disparity measurement is highly sensitive to object distance, baseline distance, calibration errors, and relative movement of the left and right demarcation points between successive frames. Therefore a tracking filter is necessary to reduce position error and improve the accuracy of the length measurement to a useful level. A Cartesian-coordinate extended Kalman filter (EKF) is designed based on the canonical equations of stereo vision. This filter represents a simple reference design that has not seen much exposure in the literature. A second filter, formulated in a modified sensor-disparity (DS) coordinate system, is also presented and shown to exhibit lower errors during a simulated experiment.
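
For reference, the canonical stereo-vision relations underlying the measurement model: depth from disparity, back-projection to 3D, and the metric length between two boundary points. The sketch omits the EKF itself; the focal length, baseline, and principal point are assumed inputs.

```python
import numpy as np

def backproject(u, v, disparity, f, baseline, cx, cy):
    """Canonical stereo equations: pixel (u, v) with disparity d -> 3D point."""
    z = f * baseline / disparity          # depth is inversely proportional to disparity
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.array([x, y, z])

def object_length(p_left, q_left, p_disp, q_disp, f, baseline, cx, cy):
    """Metric length between two boundary points seen in the left image."""
    p = backproject(*p_left, p_disp, f, baseline, cx, cy)
    q = backproject(*q_left, q_disp, f, baseline, cx, cy)
    return np.linalg.norm(p - q)

# Toy numbers: f = 1000 px, baseline = 0.1 m, principal point at (640, 360).
print(object_length((500, 300), (700, 320), 25.0, 20.0, 1000.0, 0.1, 640.0, 360.0))
```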

19 February 2013

Proceedings Volume 8656, Real-Time Image and Video Processing 2013; 865606 (2013)

draft

Unassisted 3D camera calibration

Kalin Atanassov; Vikas Ramachandra; James Nash; Sergio R. Goma

Abstract

With the rapid growth of 3D technology, 3D image capture has become a critical part of the 3D feature set on mobile phones. 3D image quality is affected by the scene geometry as well as by on-device processing. An automatic 3D system usually assumes known camera poses, accomplished by factory calibration using a special chart. In real-life settings, pose parameters estimated by factory calibration can be negatively impacted by movements of the lens barrel due to shaking, focusing, or a camera drop. If any of these factors displaces the optical axes of either or both cameras, vertical disparity might exceed the maximum tolerable margin and the 3D user may experience eye strain or headaches. To make 3D capture more practical, one needs to consider unassisted calibration (on arbitrary scenes). In this paper, we propose an algorithm that relies on the detection and matching of keypoints between left and right images. Frames containing erroneous matches, along with frames with insufficiently rich keypoint constellations, are detected and discarded. Roll, pitch, yaw, and scale differences between left and right frames are then estimated. The algorithm performance is evaluated in terms of the remaining vertical disparity as compared to the maximum tolerable vertical disparity.

23 February 2012

Proceedings Volume 8288, Stereoscopic Displays and Applications XXIII; 828808 (2012)

3D discomfort from vertical and torsional disparities in natural images

Christopher W. Tyler; Lora T. Likova; Kalin Atanassov; Vikas Ramachandra; Sergio Goma

Abstract

The two major aspects of camera misalignment that cause visual discomfort when viewing images on a 3D display are vertical and torsional disparities. While vertical disparities are uniform throughout the image, torsional rotations introduce a range of disparities that depend on the location in the image. The goal of this study was to determine the discomfort ranges for the kinds of natural image that people are likely to take with 3D cameras rather than the artificial line and dot stimuli typically used for laboratory studies. We therefore assessed visual discomfort on a five-point scale from ‘none’ to ‘severe’ for artificial misalignment disparities applied to a set of full-resolution images of indoor scenes. For viewing times of 2 s, discomfort ratings for vertical disparity in both 2D and 3D images rose rapidly toward the discomfort level of 4 (‘severe’) by about 60 arcmin of vertical disparity. Discomfort ratings for torsional disparity in the same image rose only gradually, reaching only the discomfort level of 3 (‘strong’) by about 50 deg of torsional disparity. These data were modeled with a second-order hyperbolic compression function incorporating a term for the basic discomfort of the 3D display in the absence of any misalignments through a Minkowski norm. These fits showed that, at a criterion discomfort level of 2 (‘moderate’), acceptable levels of vertical disparity were about 15 arcmin. The corresponding values for the torsional disparity were about 30 deg of relative orientation.

17 February 2012

Proceedings Volume 8291, Human Vision and Electronic Imaging XVII; 82910Q (2012)

Plenoptic Principal Planes

Todor Georgiev; Andrew Lumsdaine; Sergio Goma

Abstract

We show that the plenoptic camera is optically equivalent to an array of cameras. We compute the parameters that establish that equivalence and show where the plenoptic camera is more useful than the camera array.

14 July 2011

Imaging and Applied Optics, OSA Technical Digest (CD) (Optical Society of America, 2011), paper JTuD3.

Target signature agnostic tracking with an ad-hoc network of omni-directional sensors

Kalin Atanassov; William Hodgkiss; Sergio Goma

Abstract

Ad-hoc networks of simple, omni-directional sensors present an attractive solution to low-cost, easily deployable, fault tolerant target tracking systems. In this paper, we present a tracking algorithm that relies on a real time observation of the target power, received by multiple sensors. We remove target position dependency on the emitted target power by taking ratios of the power observed by different sensors, and apply the natural logarithm to effectively transform to another coordinate system. Further, we derive noise statistics in the transformed space and demonstrate that the observation in the new coordinates is linear in the presence of additive Gaussian noise. We also show how a typical dynamic model in Cartesian coordinates can be adapted to the new coordinate system. As a consequence, the problem of tracking target position with omni-directional sensors can be adapted to the conventional Kalman filter framework. We validate the proposed methodology through simulations under different noise, target movement, and sensor density conditions.
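
A small sketch of the observation construction described above (the path-loss exponent and the log-normal noise model are illustrative assumptions): ratios against a reference sensor cancel the unknown emitted power, and the logarithm makes the noise additive.

```python
import numpy as np

def log_power_ratio_observations(powers, ref=0):
    """Form log power-ratio observations that cancel the unknown emitted power.

    `powers` holds the instantaneous power received at each sensor; taking ratios
    against a reference sensor removes the source-power dependency, and the log
    turns multiplicative (log-normal) noise into additive noise.
    """
    powers = np.asarray(powers, dtype=np.float64)
    others = np.delete(powers, ref)
    return np.log(others / powers[ref])

# Path-loss toy model: P_i proportional to d_i ** (-alpha) with multiplicative noise.
alpha, d = 2.0, np.array([3.0, 5.0, 8.0])
p = d ** (-alpha) * np.exp(0.05 * np.random.randn(3))
print(log_power_ratio_observations(p))   # ~ alpha * (ln d_ref - ln d_i) + noise
```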

5 May 2011

Proceedings Volume 8050, Signal Processing, Sensor Fusion, and Target Recognition XX; 805017 (2011)

Multithreaded real-time 3D image processing software architecture and implementation

Vikas Ramachandra; Kalin Atanassov; Milivoje Aleksic; Sergio R. Goma

Abstract

Recently, 3D displays and videos have generated a lot of interest in the consumer electronics industry. To make 3D capture and playback popular and practical, a user-friendly playback interface is desirable. Towards this end, we built a real-time software 3D video player. The 3D video player displays user-captured 3D videos, provides various 3D-specific image processing functions, and ensures a pleasant viewing experience. Moreover, the player enables user interactivity by providing digital zoom and pan functionalities. This real-time 3D player was implemented on the GPU using CUDA and OpenGL. The player provides user-interactive 3D video playback. Stereo images are first read by the player from a fast drive and rectified. Further processing of the images determines the optimal convergence point in the 3D scene to reduce eye strain. The rationale for this convergence point selection takes into account scene depth and display geometry. The first step in this processing chain is identifying keypoints by detecting vertical edges within the left image. Regions surrounding reliable keypoints are then located on the right image through the use of block matching. The difference in positions between the corresponding regions in the left and right images is then used to calculate disparity. The extrema of the disparity histogram give the scene disparity range. The left and right images are shifted based upon the calculated range in order to place the desired region of the 3D scene at convergence. All of the above computations are performed on one CPU thread, which calls CUDA functions. Image upsampling and shifting are performed in response to user zoom and pan. The player also includes a CPU display thread, which uses OpenGL rendering (quad buffers); this thread also gathers user input for digital zoom and pan and sends it to the processing thread.
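
A simplified NumPy/OpenCV rendition of the convergence step described above (block matching via StereoBM, percentile-based extrema, and the split shift are our simplifications of the CUDA implementation):

```python
import cv2
import numpy as np

def convergence_shift(left_gray, right_gray, num_disp=64, block=15):
    """Estimate the disparity range and shift an 8-bit grayscale pair toward convergence."""
    stereo = cv2.StereoBM_create(numDisparities=num_disp, blockSize=block)
    disp = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0
    valid = disp[disp > 0]
    # Robust extrema of the disparity histogram give the scene disparity range.
    lo, hi = np.percentile(valid, 2), np.percentile(valid, 98)
    shift = int(round((lo + hi) / 2.0))     # bring the mid-range to zero disparity
    left_shifted = np.roll(left_gray, -shift // 2, axis=1)
    right_shifted = np.roll(right_gray, shift - shift // 2, axis=1)
    return left_shifted, right_shifted, (lo, hi)
```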

2 February 2011

Proceedings Volume 7871, Real-Time Image and Video Processing 2011; 78710A (2011)

Invited

Content-based depth estimation in focused plenoptic camera

Kalin Atanassov; Sergio Goma; Vikas Ramachandra; Todor Georgiev

Abstract

Depth estimation in the focused plenoptic camera is a critical step for most applications of this technology and poses interesting challenges, as this estimation is content based. We present a content-adaptive, iterative algorithm that exploits the redundancy found in images captured with the focused plenoptic camera. Our algorithm determines for each point its depth along with a measure of reliability, allowing subsequent enhancement of the spatial resolution of the depth map. We remark that the spatial resolution of the recovered depth corresponds to discrete values of depth in the captured scene, which we refer to as slices. Moreover, each slice has a different depth and will allow extraction of a different spatial resolution of depth, depending on the scene content present in that slice along with occluding areas. Interestingly, as the focused plenoptic camera is not theoretically limited in spatial resolution, we show that the recovered spatial resolution is depth related, and as such, rendering of a focused plenoptic image is content dependent.

24 January 2011

Proceedings Volume 7864, Three-Dimensional Imaging, Interaction, and Measurement; 78640G (2011)

Introducing the depth transfer curve for 3D capture system characterization

Sergio R. Goma; Kalin Atanassov; Vikas Ramachandra

Abstract

3D technology has recently made a transition from movie theaters to consumer electronic devices such as 3D cameras and camcorders. In addition to what 2D imaging conveys, 3D content also contains information regarding the scene depth. Scene depth is simulated through the strongest brain depth cue, namely retinal disparity. This can be achieved by capturing images with horizontally separated cameras. Objects at different depths will be projected with different horizontal displacements onto the left and right camera images. These images, when fed separately to either eye, lead to retinal disparity. Since the perception of depth is the single most important 3D imaging capability, an evaluation procedure is needed to quantify depth capture characteristics. Evaluating depth capture characteristics subjectively is a very difficult task, since the intended and/or unintended side effects of 3D image fusion (depth interpretation) by the brain are not immediately perceived by the observer, nor do such effects lend themselves easily to objective quantification. Objective evaluation of 3D camera depth characteristics is an important tool that can be used for 'black box' characterization of 3D cameras. In this paper we propose a methodology to evaluate the depth capture capabilities of 3D cameras.

24 January 2011

Proceedings Volume 7864, Three-Dimensional Imaging, Interaction, and Measurement; 78640E (2011)

3D image processing architecture for camera phones

Kalin Atanassov; Vikas Ramachandra; Sergio R. Goma; Milivoje Aleksic

Abstract

Putting high-quality and easy-to-use 3D technology into the hands of regular consumers has become a recent challenge as interest in 3D technology has grown. Making 3D technology appealing to the average user requires that it be made fully automatic and foolproof. Designing a fully automatic 3D capture and display system requires: 1) identifying critical 3D technology issues like camera positioning, disparity control rationale, and screen geometry dependency, and 2) designing a methodology to automatically control them. Implementing 3D capture functionality on phone cameras necessitates designing algorithms to fit within the processing capabilities of the device. Various constraints like sensor position tolerances, sensor 3A tolerances, post-processing, 3D video resolution, and frame rate should be carefully considered for their influence on the 3D experience. Issues with migrating functions such as zoom and pan from the 2D usage model (both during capture and display) to 3D need to be resolved to ensure the highest level of user experience. It is also very important that the 3D usage scenario (including interactions between the user and the capture/display device) is carefully considered. Finally, both the processing power of the device and the practicality of the scheme need to be taken into account while designing the calibration and processing methodology.

24 January 2011

Proceedings Volume 7864, Three-Dimensional Imaging, Interaction, and Measurement; 786414 (2011)

RAW camera DPCM compression performance analysis

Katherine Bouman; Vikas Ramachandra; Kalin Atanassov; Mickey Aleksic; Sergio R. Goma

Abstract

The MIPI standard has adopted DPCM compression for RAW data images streamed from mobile cameras. This DPCM is line based and uses either a simple 1- or 2-pixel predictor. In this paper, we analyze the DPCM compression performance as MTF degradation. To test this scheme's performance, we generated Siemens star images and binarized them to 2-level images. These two intensity values were chosen such that their intensity difference corresponds to those pixel differences which result in the largest relative errors in the DPCM compressor (e.g., a pixel transition from 0 to 4095 corresponds to an error of 6 between the DPCM compressed value and the original pixel value). The DPCM scheme introduces different amounts of error based on the pixel difference. We passed these modified Siemens star chart images to this compressor and compared the compressed images with the original images using IT3 MTF response plots for slanted edges. Further, we discuss the PSF influence on DPCM error and its propagation through the image processing pipeline.
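
To illustrate how a line-based DPCM's reconstruction error grows with the pixel-to-pixel transition, here is a generic 1-pixel-predictor sketch with a coarse residual quantizer; it does not reproduce the actual MIPI code tables, so the error magnitudes differ from the standard's.

```python
import numpy as np

def dpcm_line(line, shift=4):
    """Generic line-based DPCM with a 1-pixel predictor.

    The residual is coarsely quantized (here by dropping `shift` bits), so the
    reconstruction error grows with the size of the pixel-to-pixel transition.
    """
    recon = np.empty_like(line)
    prev = int(line[0])
    recon[0] = prev
    for i in range(1, len(line)):
        residual = int(line[i]) - prev
        q = (residual >> shift) << shift        # coarse quantization of the residual
        prev = int(np.clip(prev + q, 0, 4095))  # decoder-side reconstruction
        recon[i] = prev
    return recon

# A 0 -> 4095 step: a large-error transition for this toy quantizer (MIPI tables differ).
line = np.array([0, 0, 4095, 4095, 0], dtype=np.int32)
print(dpcm_line(line))
```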

24 January 2011

Proceedings Volume 7867, Image Quality and System Performance VIII; 78670N (2011)

Camera Technology at the dawn of digital renascence era

Sergio Goma; Mickey Aleksic; Todor Georgiev

Abstract

Camera technology has evolved tremendously in the last 10 years, with the proliferation of camera phones fueling unprecedented advancements in CMOS image sensors. As an emerging field, some of the problems are justified while others are by-products of the chosen silicon technology and are not fundamental to the advancement of imaging technology. This paper reviews, block by block, some image processing components found in a cell-phone camera today, discussing for each its justification in terms of technology choice versus image processing function, with emphasis on the signal degradation potential. Further, we present computational photography challenges that amplify the requirements for data with high SNR.

10 November 2010

2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers

Novel YUV 8bpp subsampling pattern

Sergio Goma; Mickey Aleksic

Abstract

We propose a novel 8bpp subsampled YUV pattern based on a checkerboard subsampling of the luminance component that explicitly preserves edges. The proposed pattern uses 1 bit to encode the edge direction at the missing luminance pixel; this bit is stored in the chroma sample, as the chroma sample is DPCM encoded from 8 to 7 bits per sample. The complexity analysis of both encoder and decoder is concluded with a proposed hardware implementation. The image quality performance of the proposed pattern is estimated using MTF measurements quantifying the loss in high frequencies, and a comparison is presented across YUV subsampling methods.

10 November 2010

2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers

Evaluating the quality of EDOF in camera phones

Kalin Atanassov; Sergio Goma

Abstract

Extended depth of focus (EDOF) technologies are well known in the literature, and in recent years they have made their way into camera phones. While the fundamental approach might have significant advantages over conventional focus technologies, in practice the results are often accompanied by undesired artifacts that are hard to quantify. In order to conduct an objective comparison with conventional focus technology, new methods need to be devised that are able to quantify not only the quality of focus but also the artifacts introduced by the use of EDOF methods. In this paper we propose a test image and a methodology to quantify focus quality and its dependence on distance. Our test image is created from a test image element that contains different shapes to measure frequency response.

18 January 2010

Proceedings Volume 7529, Image Quality and System Performance VII; 75290K (2010)

Evaluation methodology for Bayer demosaic algorithms in camera phones

Sergio Goma; Kalin Atanassov

Abstract

The current approach used for demosaic algorithm evaluation is mostly empirical and does not offer a meaningful quantitative metric; this disconnects the theoretical results from the results seen in practice. In camera phones, the difference is even bigger due to the low signal-to-noise ratios and the overlapping of the color filters. This implies that a demosaic algorithm has to be designed to allow for graceful degradation in the presence of noise. Also, the demosaic algorithm has to be tolerant of high color correlations. In this paper we propose a special class of images and a methodology that can be used to produce a metric indicative of real-case demosaic algorithm performance. The test image that we propose is formed by using a dual chirp signal that is a function of the distance from the center.
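
A possible rendition of such a test image (our simplified, single-chirp stand-in for the paper's dual-chirp element; the frequency sweep parameters are illustrative): the local spatial frequency increases with distance from the center, so a single chart probes the demosaicker across the whole frequency range.

```python
import numpy as np

def radial_chirp_chart(size=512, f0=0.002, f1=0.25):
    """Test image whose local frequency increases with distance from the center.

    `f0` and `f1` are the start/end spatial frequencies in cycles per pixel
    (illustrative values); the instantaneous frequency sweeps linearly with radius.
    """
    ys, xs = np.mgrid[0:size, 0:size] - size / 2.0
    r = np.hypot(xs, ys)
    r_max = r.max()
    # Phase of a linear chirp: integral of the swept frequency along the radius.
    phase = 2 * np.pi * (f0 * r + 0.5 * (f1 - f0) * r ** 2 / r_max)
    return 0.5 + 0.5 * np.cos(phase)

chart = radial_chirp_chart()
```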

18 January 2010

Proceedings Volume 7537, Digital Photography VI; 753708 (2010)

High Dynamic Range Image Capture with Plenoptic 2.0 Camera

Todor Georgiev; Andrew Lumsdaine; Sergio Goma

Abstract

We demonstrate high dynamic range (HDR) imaging with the Plenoptic 2.0 camera. Multiple-exposure capture is achieved with a single shot using microimages created by a microlens array that has an interleaved set of different apertures.

14 October 2009

Frontiers in Optics 2009/Laser Science XXV/Fall 2009 OSA Optics & Photonics Technical Digest, OSA Technical Digest (CD) (Optical Society of America, 2009), paper SWA7P. 

Real-time development system for image processing engines

Sergio Goma; Radu Gheorghe; Milivoje Aleksic

Abstract

Certain feedback-loop-based algorithms contained in an image processing engine, such as auto white balance, auto exposure, or auto focus, are best designed and evaluated within a real-time framework due to the need to closely study the dynamics involved. Furthermore, the development process requires the usual flexibility associated with any software module implementation, such as the ability to dump debugging information or to place breakpoints in the code. In addition, the end deployment platform is not usually available during the design process, while tuning of the above-mentioned algorithms must encompass the particularities of each individual target sensor. We explore in this paper a real-time hardware/software solution that addresses all the requirements mentioned above and runs on a non-real-time operating system (Windows). Moreover, we exemplify and quantify the hard deadlines required by such a feedback control loop algorithm and illustrate how they are supported in our implementation.

4 February 2009

Proceedings Volume 7244, Real-Time Image and Video Processing 2009; 724409 (2009)

Applying image quality in cell phone cameras: lens distortion

Donald Baxter; Sergio R. Goma; Milivoje Aleksic

Abstract

This paper describes the framework used in one of the pilot studies run under the I3A CPIQ initiative to quantify overall image quality in cell-phone cameras. The framework is based on a multivariate formalism which tries to predict overall image quality from individual image quality attributes, and was validated in a CPIQ pilot program. The pilot study focuses on image quality distortions introduced in the optical path of a cell-phone camera, which may or may not be corrected in the image processing path. The assumption is that the captured image is JPEG compressed and the cell-phone camera is set to 'auto' mode. As the framework requires the individual attributes to be relatively perceptually orthogonal, the attributes used in the pilot study are lens geometric distortion (LGD) and lateral chromatic aberration (LCA). The goal of this paper is to present the framework of this pilot project, starting with the definition of the individual attributes up to their quantification in JNDs of quality, a requirement of the multivariate formalism; therefore, both objective and subjective evaluations were used. A major distinction of the objective part from the 'DSC imaging world' is that the LCA/LGD distortions found in cell-phone cameras rarely exhibit radial behavior; therefore, a radial mapping/modeling cannot be used in this case.

19 January 2009

Proceedings Volume 7242, Image Quality and System Performance VI; 724213 (2009)

An image-noise filter with emphasis on low-frequency chrominance noise

Radu V. Gheorghe; Sergiu R. Goma; Milivoje Aleksic

Abstract

Chrominance noise appears as low-frequency colored blotches throughout an image, especially in darker flat areas. The effect is more pronounced at lower light levels, where the characteristic features are observed as irregularly shaped clusters of colored pixels that vary anywhere from 15 to 25 pixels across. This paper proposes a novel, simple, and intuitive method of reducing chrominance noise in processed images while minimizing color bleeding artifacts. The approach is based on a hybrid multi-scale spatial dual-tree adaptive wavelet filter in hue-saturation-value color space. Results are provided in terms of comparisons on real images between the proposed method and another state-of-the-art method.

19 January 2009

Proceedings Volume 7250, Digital Photography V; 72500B (2009)

Improving the SNR during color image processing while preserving the appearance of clipped pixels

Sergio Goma; Milivoje Aleksic

Abstract

An image processing path typically involves color correction or white balance, resulting in higher-than-unity color gains. A gain higher than unity increases the noise in the respective channel and therefore degrades the SNR performance of the input signal. If the input signal does not have enough SNR to accommodate the extra gain, the resulting color image has increased color noise. This is the usual case for color processing in cell-phone cameras, which have sensors with limited SNR and high color crosstalk. This phenomenon degrades images more as illuminants differ from D65. In addition, the incomplete information for clipped pixels often results in unsightly artifacts during color processing. To correct this dual problem, we investigate the use of under-unity color gains which, by increasing the exposure of the sensor, improve the resulting SNR of the color-corrected image. The proposed method preserves the appearance of clipped pixels and the overall luminance of the image while applying the appropriate color gains.

12 February 2008

Proceedings Volume 6811, Real-Time Image Processing 2008; 681102 (2008) 

An approach to improve cell-phone cameras’ dynamic range using a non-linear lens correction

Sergio Goma; Milivoje Aleksic

Abstract

Most cell-phone cameras today use CMOS sensors with higher and higher pixel counts, which in turn results in smaller pixel sizes. To achieve good performance in current technologies, pixel structures are fairly complicated. Increasing complexity in pixel structure, coupled with optical constraints specific to cell-phone cameras, results in a non-uniform light response over the pixel array. A cell-phone camera sensor module typically exhibits a light fall-off of about 40% at the edge relative to the center. This high fall-off usually has a non-radial spatial distribution, making lens fall-off correction complicated. The standard method of reducing light fall-off is linear (i.e., multiplicative gain), resulting in close to a ~2x peripheral gain and a corrected image with lower dynamic range. To address this issue, a novel idea is explored where the fall-off is used to increase the dynamic range of the captured image. As a typical lens fall-off needs a gain of up to 2x at the edge versus the center, the fall-off can be thought of as a 2D neutral density filter which allows up to 2x more light to be sensed towards the periphery of the sensor. The proposed solution uses a 2D scaled-down gain map to correct the fall-off. For each pixel, using the gain map, an inflection point is calculated and used to derive the associated pixel transfer characteristic, which is linear up to the inflection point and becomes logarithmic beyond it.
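
One consistent (but assumed, not the paper's exact) form of such a per-pixel transfer characteristic: linear up to an inflection point derived from the gain map, then a logarithmic roll-off that reaches full scale without clipping.

```python
import numpy as np

def falloff_transfer(raw, gain, full_scale=4095.0):
    """Per-pixel linear-then-logarithmic transfer driven by the fall-off gain map.

    `gain` is the conventional multiplicative correction for this pixel (1.0 at the
    center, ~2.0 at the edge). The curve below is one consistent choice, not the
    paper's formula: linear up to an inflection point at half range, then a
    logarithmic roll-off that reaches full scale exactly at the maximum raw code.
    """
    x0 = full_scale / (2.0 * gain)                    # inflection point for this pixel
    linear = gain * np.minimum(raw, x0)               # equals full_scale / 2 at x0
    excess = np.maximum(raw - x0, 0.0) / (full_scale - x0)
    log_part = (full_scale / 2.0) * np.log1p(excess) / np.log(2.0)
    return linear + log_part
```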

3 March 2008

Proceedings Volume 6817, Digital Photography IV; 68170F (2008)

Bad pixel location algorithm for cell phone cameras

Sergio Goma; Milivoje Aleksic

Abstract

As CMOS imaging technology advances, sensor-to-sensor differences increase, creating a growing need for individual, per-sensor calibration. Traditionally, the cell-phone market has a low tolerance for complex per-unit calibration. This paper proposes an algorithm that eliminates the need for a complex test environment and does not require manufacturing-based calibration on a per-phone basis. The algorithm locates 'bad pixels': pixels whose light response characteristics fall outside the range of values specified by the manufacturer. It uses several images captured from a sensor without using a mechanical shutter or predefined scenes. The implementation that follows uses two blocks: a dynamic detection block (local-area based) and a static correction block (location-table based). The dynamic block fills the location table of the static block using clustering techniques. The result of the algorithm is a list of coordinates giving the locations of the found 'bad pixels'. An example is given of how this method can be applied to several different cell-phone CMOS sensors.

20 February 2007

Proceedings Volume 6502, Digital Photography III; 65020H (2007)

Novel bilateral filter approach: Image noise reduction with sharpening 

Milivoje Aleksic; Maxim Smirnov; Sergio Goma

Abstract

The classical bilateral filter smoothes images and preserves edges using a nonlinear combination of surrounding pixels. Our modified bilateral filter advances this approach by sharpening edges as well. This method uses geometrical and photometric distance to select pixels for combined low and high pass filtering. It also uses a simple window filter to reduce computational complexity.
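
A compact way to approximate the combined effect (our sketch using OpenCV's stock bilateral filter, not the modified single-pass filter of the paper): the bilateral output serves as the low-pass component and a boosted residual supplies the sharpening.

```python
import cv2
import numpy as np

def bilateral_sharpen(img, d=9, sigma_color=30, sigma_space=7, boost=0.6):
    """Edge-preserving sharpening: bilateral low-pass plus a boosted residual.

    `img` is an 8-bit grayscale or BGR image; the boost factor is illustrative.
    """
    low = cv2.bilateralFilter(img, d, sigma_color, sigma_space)   # smooths, preserves edges
    residual = img.astype(np.float32) - low.astype(np.float32)    # high-pass component
    out = img.astype(np.float32) + boost * residual               # sharpen around edges
    return np.clip(out, 0, 255).astype(np.uint8)
```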

10 February 2006

Proc. SPIE 6069, Digital Photography II, 60690F (10 February 2006)

Computational inexpensive two step auto white balance method

Sergio Goma; Milivoje Aleksic

Abstract

The chromaticity of an acquired image reconstructed from a Bayer pattern image sensor is heavily dependent on the scene illuminant and needs color correction to match human visual perception. This paper presents a method to 'white balance' an image that is computationally inexpensive for hardware implementation, has reasonable accuracy without the need to store the full image, and is aligned with the current technical development of the field. The proposed method introduces the use of a 2D chromaticity diagram of the image to extract information about the resultant scene reflectance. It assumes that the presence of low-saturated colors in the scene will increase the probability of retrieving accurate scene color information.
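
A minimal sketch of the two-step idea, assuming a normalized (r, g, b) chromaticity diagram and a distance-to-neutral threshold (both our assumptions): low-saturation pixels vote for the illuminant estimate, and channel gains are derived from their mean.

```python
import numpy as np

def two_step_awb_gains(rgb, sat_threshold=0.08):
    """Estimate white-balance gains from low-saturation pixels in chromaticity space.

    `rgb` is an H x W x 3 linear image; the threshold and the gray-world fallback
    are illustrative choices, not the paper's exact parameters.
    """
    flat = rgb.reshape(-1, 3).astype(np.float64)
    total = flat.sum(axis=1, keepdims=True) + 1e-9
    chroma = flat / total                            # 2D chromaticity (components sum to 1)
    # Step 1: keep pixels whose chromaticity lies close to the neutral point (1/3, 1/3, 1/3).
    distance = np.linalg.norm(chroma - 1.0 / 3.0, axis=1)
    near_gray = flat[distance < sat_threshold]
    if len(near_gray) < 100:                         # fallback: plain gray-world average
        near_gray = flat
    # Step 2: gains that map the average of the selected pixels to neutral.
    mean = near_gray.mean(axis=0)
    return mean[1] / mean                            # normalized to the green channel

# Usage: balanced = np.clip(rgb * two_step_awb_gains(rgb), 0, 1)
```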

10 February 2006

Proc. SPIE 6069, Digital Photography II, 60690D (10 February 2006)

Contact Author

info@blueflagiris.com