Deep Neural Networks for Superresolution of Document and Natural Images
a). Document Image Quality Enhancement Using Deep Neural Network: In this project, we are working on improving the quality of low-resolution, degraded, binary document images for better readability and OCR accuracy. As seen in Fig. (a), our technique is able to fill in most of the missing pixels, and the character-level OCR accuracy increases substantially.
b). Computationally Efficient Approaches to Natural Image Super-resolution: In this project, we are developing computationally efficient deep neural network models to obtain generalized super-resolution of natural images. Our models are lightweight in terms of the number of parameters used. Figure (b) shows that the perceptual quality of the images generated by our methods is better than that of the state-of-the-art technique, SRGAN.
c). Indian patent under examination: Method and System for enhancing binary document image quality for improving readability and OCR performance.
Network Consistent Data Association
Existing data association techniques mostly focus on matching pairs of data-point sets and then repeating this process along space-time to achieve long-term correspondences. However, in many problems such as person re-identification, a set of data-points may be observed at multiple spatio-temporal locations and/or by multiple agents in a network, and simply combining the local pairwise association results between sets of data-points often leads to inconsistencies over the global space-time horizon. In this research project, we proposed a novel Network Consistent Data Association (NCDA) framework, formulated as an optimization problem, that not only maintains consistency in association results across the network but also improves the pairwise data association accuracy. NCDA can be solved as a binary integer program, leading to a globally optimal solution, and can handle the challenging scenario where the number of data-points varies across different sets of instances in the network. We also presented an online implementation of the NCDA method that can dynamically associate new observations with already observed data-points in an iterative fashion while maintaining network consistency.
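To make the notion of network consistency concrete, here is a minimal sketch of one common way to express it for three sets of data-points: if point i in set 1 is matched to point k in set 2, and k is matched to point j in set 3, then the direct association between sets 1 and 3 must also match i to j. The function name `is_loop_consistent` and the example matrices are hypothetical and are not taken from the NCDA formulation itself, which solves for all associations jointly as a binary integer program.

```python
import numpy as np

def is_loop_consistent(x12, x23, x13):
    """Check that chaining 1->2 and 2->3 never contradicts the direct 1->3 match."""
    # implied[i, j] = 1 if point i (set 1) and point j (set 3) are linked via set 2.
    implied = np.minimum(x12 @ x23, 1)
    return bool(np.all(x13 >= implied))

# x_ab[i, j] = 1 means data-point i in set a is matched to data-point j in set b.
x12 = np.array([[1, 0, 0],
                [0, 0, 1],
                [0, 1, 0]])
x23 = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 1]])

x13_consistent = x12 @ x23              # agrees with the chained matches
x13_inconsistent = np.eye(3, dtype=int) # independent pairwise result that disagrees

print(is_loop_consistent(x12, x23, x13_consistent))
print(is_loop_consistent(x12, x23, x13_inconsistent))
```

Independently computed pairwise results can fail this check, which is the kind of inconsistency the framework is designed to rule out.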
ChExVis: a tool for molecular channel extraction and visualization
A channel is a pathway through empty space within a molecule. Understanding channels that lead to active sites or traverse the molecule is important in the study of molecular functions such as ion, ligand, and small-molecule transport. Efficient methods for extracting, storing, and analysing protein channels are required to support such studies. We develop an integrated framework called ChExVis that supports computation of the channels, interactive exploration of their structure, and detailed visual analysis of their properties.
AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation
Advances in deep learning rely heavily on the availability of huge amounts of clean annotated data, whose preparation involves substantial manual effort. This becomes a major challenge for tasks that require understanding an entire scene, in contrast to a simple object classification task. Researchers are therefore focusing on efficient techniques for using already available synthetic data from various 3D game engines to train artificial vision systems. The key question is: how can we adapt such systems to natural scenes with minimal supervision?
In a recent study, Prof. Venkatesh Babu and his team from the Video Analytics Lab, Department of Computational and Data Sciences, proposed a novel adversarial learning approach to efficiently adapt synthetically trained models to real scene images. The proposed adaptation method can be used to disentangle the domain discrepancy, improving performance on real scenes from the deployed environment in an unsupervised manner. Currently, the focus is on the task of depth estimation from monocular RGB images, but the approach can be extended to other scene understanding problems as well. The method uses an efficient content consistency regularization along with an adversarial learning objective function to train the base Convolutional Neural Network (CNN) architecture. Moreover, the proposed regularization helps to efficiently maintain the spatial dependency of deep features with respect to the given input during the adaptation process. This work will be presented at the CVPR 2018 conference.
Jogendra Nath Kundu, Phani Krishna, Anuj P. and R. Venkatesh Babu, "AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation", in CVPR 2018
Articulogram for characterizing speaker specific articulation
Speech articulation for producing a given speech sound varies across speakers due to differences in their vocal tract morphologies, even though the speech motor actions are executed in terms of relatively invariant gestures. While the invariant articulatory gestures are driven by the linguistic content of the spoken utterance, the component of speech articulation that varies across speakers reflects speaker-specific and other paralinguistic information. To characterize this speaker-specific component, a new representation, called the articulogram, is computed using the air-tissue boundaries in the upper airway vocal tract and the Maeda grid on real-time magnetic resonance imaging video frames. The articulograms of multiple speakers speaking the same sentence are decomposed into variant and invariant components. The variant component is found to discriminate speakers better than the full speech articulation, which includes the invariant part.
Surveillance Face Recognition and Cross-Modal Retrieval
In the IACV lab, we are looking at different problems in computer vision. Nowadays, with increasing security concerns, surveillance cameras are installed everywhere, from shopping malls and airports to personal homes. One of the main objectives of surveillance is to recognize the faces captured by these cameras. This is very challenging, since the images usually have quite poor resolution, in addition to uncontrolled illumination, pose and expression. We are developing novel algorithms for matching the low-quality facial images captured using surveillance cameras. This work is also being extended to general objects. We are also working on traffic surveillance, especially for Indian scenarios, including vehicle detection, classification and license plate recognition. Another major area we are looking at is Cross-Modal Matching. With the increase in the number of sources of data, cross-modal matching is becoming an increasingly important area of research. It has several applications, such as matching text with images, matching near infra-red images with visible images (e.g., for matching face images captured at night or in low-light conditions to standard visible-light images in the database), and matching sketches with pictures for forensic applications. We are developing novel algorithms for this problem, which is extremely challenging due to the significant differences between data from different modalities.
Fast Algorithms for Kernel Based Filtering
Smoothing is a fundamental task in low-level image processing that is used to suppress irrelevant details while preserving salient image structures. A popular approach is to perform non-linear aggregation of neighboring pixels using a kernel. This includes the classical bilateral and non-local means filters. The difficulty is that the brute-force implementation of these filters requires intense computation. As a result, it is rather difficult to deploy them in real-time applications. The focus of the group is on developing scalable approximation algorithms that can dramatically cut down the computational load without visibly degrading the filtering quality. We are also working on fast high-dimensional extensions of kernel-based filtering, where the dimension of the kernel feature is large (color images, hyperspectral images, patch-based representations, etc.).
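For reference, the brute-force computation that such fast algorithms aim to approximate can be sketched as follows for the classical bilateral filter on a grayscale image. This is only a baseline illustration, not one of the group's fast algorithms; the function name and the default values of `radius`, `sigma_s` and `sigma_r` are hypothetical choices.

```python
import numpy as np

def bilateral_filter_bruteforce(image, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a 2-D grayscale image with values in [0, 1]."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    padded = np.pad(image, radius, mode="reflect")
    numerator = np.zeros_like(image)
    denominator = np.zeros_like(image)
    # Visit every offset in the (2*radius + 1)^2 window: this loop is the costly part.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy:radius + dy + height,
                             radius + dx:radius + dx + width]
            # Spatial kernel: depends only on the distance between pixel positions.
            spatial_w = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            # Range kernel: depends on the intensity difference, which preserves edges.
            range_w = np.exp(-((shifted - image) ** 2) / (2.0 * sigma_r ** 2))
            weight = spatial_w * range_w
            numerator += weight * shifted
            denominator += weight
    return numerator / denominator

# A sharp step edge: the range kernel keeps the two sides from mixing.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
smoothed = bilateral_filter_bruteforce(step)
```

The cost per pixel grows with the square of the window radius, which is what makes the direct implementation hard to use in real time.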
Regularization in Fluorescence Microscopy and Photoacoustic Imaging
Structures imaged by fluorescence microscopy and photoacoustic tomography (PAT) have a statistical distribution of high-intensity and high-curvature points that is distinct from that of structures imaged under other modalities such as MRI and CT. We constructed a regularization functional that combines the image intensity and derivatives in a unique way to exploit these specific characteristics, and applied it to image deblurring and reconstruction for PAT and fluorescence microscopy. In the presence of a large amount of noise, we obtained significantly improved restoration/reconstruction compared to well-known state-of-the-art regularization techniques.
Virtual Reality, Image/Video Quality Assessment and Enhancement and Streaming Video
Virtual Reality: We work on several aspects of quantifying user experience in virtual reality (VR) systems. Topics include visual quality assessment of wide field of view (or 360 degree) images and videos (with respect to image stitching, resolution, compression, etc.), user discomfort assessment and mitigation, and saliency and fixation prediction algorithms.
Image/Video Quality Assessment and Enhancement: The primary focus in this problem is to develop blind visual quality assessment measures that work across a wide gamut of distortions and then apply them to improve the quality of the visual content. This is achieved by carefully modeling the statistics of natural scenes and distorted pictures and/or using recent tools such as deep learning and generative adversarial networks (GANs). Quality may be assessed in accordance with human visual perception or from the point of view of success of computer vision algorithms. The problem has applications in computational photography and automatic camera tuning.
Streaming Video: Rate adaptation is an important aspect of popular video streaming services such as Netflix, Amazon Prime, YouTube, etc. We develop quantitative models of user experience that capture the tradeoff between visual quality and rebuffering events in such systems. The goal is to further improve rate adaptation strategies based on such sophisticated models using tools such as reinforcement learning.
Online Reconstruction Algorithms for Compressed Imaging
One of the focus areas of the Spectrum Lab is in solving inverse problems in imaging and image processing. In a typical compressed imaging application, one acquires projections of a scene on to random test/sensing vectors, which are often binary. In an optical imaging setup, a binary sensing array is essentially an amplitude mask that allows light to go through in some regions and blocks light in the other regions. In a magnetic resonance imaging setup, the sensing vectors are two-dimensional Fourier vectors. The goal in these new imaging modalities is to reconstruct the images from the compressed/encoded measurements. The compression in such a setup happens right at the level of acquisition. The goal is to develop computationally efficient online decoding techniques that sequentially update the reconstruction as the measurements arrive. The reconstruction technique also incorporates priors about the underlying image such as sparsity in an appropriate basis. Such a sequential reconstruction scheme has the advantage that one can stop acquiring measurements as and when the reconstructed image meets a certain objective or subjective quality requirement. We have developed low-complexity online reconstruction techniques that are on par with batch-mode reconstruction techniques in terms of the quality of reconstruction. The images show the reconstructions as the measurements are revealed sequentially. We are currently developing Deep Neural Network (DNN) based image reconstruction schemes that can be fine-tuned to maximize the reconstruction quality for specific imaging settings.
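The idea of updating the reconstruction each time a new measurement arrives can be illustrated with a very simple stand-in: a Kaczmarz-style projection step that makes the current estimate agree with the latest measurement. This is not the lab's algorithm, and it omits the sparsity prior mentioned above; the scene, the number of measurements, and all names in the sketch are hypothetical.

```python
import numpy as np

def kaczmarz_update(x, a, y):
    """Project the current estimate x onto the hyperplane {z : a . z = y}."""
    norm_sq = float(a @ a)
    if norm_sq == 0.0:   # an all-zero mask carries no information
        return x
    return x + ((y - a @ x) / norm_sq) * a

rng = np.random.default_rng(0)
n = 16
scene = np.zeros(n)
scene[[3, 9]] = [1.0, 0.5]   # hypothetical scene, flattened to a vector

estimate = np.zeros(n)
residuals, errors = [], []
for _ in range(40):
    mask = rng.integers(0, 2, size=n).astype(float)  # binary amplitude mask
    measurement = mask @ scene                         # one new projection of the scene
    estimate = kaczmarz_update(estimate, mask, measurement)
    residuals.append(abs(mask @ estimate - measurement))
    errors.append(np.linalg.norm(estimate - scene))
    # Acquisition could stop here once a quality criterion is met.
```

With noiseless measurements, each projection cannot move the estimate further from the true scene, so the acquisition loop can be terminated as soon as the reconstruction is judged good enough.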
Deep Learning Techniques for Solving Inverse Problems
We address the problem of sparse spike deconvolution from noisy measurements within a Bayesian paradigm that incorporates sparsity-promoting priors about the ground truth. The optimization problem arising out of this formulation warrants an iterative solution. A typical iterative algorithm includes an affine transformation and a nonlinear thresholding step. Effectively, a cascade/sequence of affine and nonlinear transformations gives rise to the reconstruction. This is also the structure of a typical deep neural network (DNN), and this observation establishes the link between inverse problems and deep neural networks. The architecture of the DNN is such that the weights and biases in each layer are fixed and determined by the blur matrix/sensing matrix and the noisy measurements, while the sparsity-promoting prior determines the activation function in each layer. In scenarios where the priors are not available exactly, but adequate training data is available, the formulation can be adapted to learn the priors by parameterizing the activation function using a linear expansion of threshold (LET) functions. As a proof of concept, we demonstrated successful spike deconvolution on a synthetic dataset and showed significant advantages over standard reconstruction approaches such as the fast iterative shrinkage-thresholding algorithm (FISTA). We also showed an application of the proposed method to image reconstruction in super-resolution localization microscopy. Effectively, this is a deconvolution problem, and we refer to the resulting DNNs as Bayesian Deep Deconvolutional Networks.
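The correspondence between an iterative solver and a network with fixed weights can be sketched with plain ISTA, where each iteration is one "layer" consisting of an affine map followed by soft thresholding. This is a minimal illustration under the assumption of an l1 (Laplacian) sparsity prior, which yields the soft-threshold activation; it is not the LET-parameterized network described above, and the blur matrix, spike locations, and parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Activation function induced by an l1 sparsity prior."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unrolled_ista(A, y, lam, num_layers):
    """Run ISTA as a fixed-weight network: x <- soft_threshold(W x + b)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    W = np.eye(A.shape[1]) - step * (A.T @ A)   # layer weights, fixed by the blur matrix
    b = step * (A.T @ y)                        # layer bias, fixed by the measurements
    x = np.zeros(A.shape[1])
    for _ in range(num_layers):                 # one loop iteration = one layer
        x = soft_threshold(W @ x + b, step * lam)
    return x

def objective(A, y, lam, x):
    """Data fidelity plus l1 penalty that ISTA minimizes."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

# Hypothetical example: three spikes blurred by a Gaussian kernel, plus noise.
n = 40
idx = np.arange(n)
A = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * 1.5 ** 2))
spikes = np.zeros(n)
spikes[[8, 20, 31]] = [1.0, -0.7, 0.5]
rng = np.random.default_rng(0)
y = A @ spikes + 0.01 * rng.standard_normal(n)

x_hat = unrolled_ista(A, y, lam=0.05, num_layers=200)
```

Replacing the fixed soft threshold with a trainable activation is the step at which training data can stand in for an unknown prior.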
Once a DNN based solution to the sparse coding problem is available, one can use it to also perform dictionary learning.
- Deep Learning Meets Sparse Coding: https://www.youtube.com/watch?v=r3q05c1PIRg
- NIPS 2017 Bayesian Deep Learning Workshop paper on DNNs for Sparse Coding and Dictionary Learning: http://bayesiandeeplearning.org/2017/papers/31.pdf
- NIPS 2017 Bayesian Deep Learning Workshop paper on Bayesian Deep Deconvolutional Networks: http://bayesiandeeplearning.org/2017/papers/46.pdf
- Focus on Microscopy 2018 abstract: http://www.focusonmicroscopy.org/2018/PDF/1213_Seelamantula.pdf
Artificial Intelligence for Healthcare Applications
A new focus area of the Spectrum Lab is to develop artificial intelligence systems for healthcare applications. The goal is to develop decision support systems that aid an expert doctor so that the best use is made of the specialist’s time and expertise. We developed one such application in the context of wireless capsule endoscopy (WCE), which is a revolutionary approach to performing endoscopy of the entire gastrointestinal tract. In WCE, a patient swallows a miniature capsule-sized optical endoscope and carries a wireless receiver in a pocket. The capsule endoscope transmits several thousand images as it traverses the gastrointestinal tract. Sifting through all of these images would take an expert considerable time. To alleviate the scanning burden, we have developed an artificial intelligence based decision support system. Effectively, we have a convolutional neural network (CNN) that sifts through the images and classifies them as normal or as belonging to one of eight prominent disease types, further highlighting where in the image the abnormality has been detected. This step greatly reduces the burden on the expert and makes efficient use of the expert’s time. Ongoing work in this direction aims to improve the CNN architecture and develop hierarchical classification schemes to maximize the classification accuracy.
This project was funded by the Robert Bosch Centre for Cyberphysical Systems (IISc).
- A convolutional neural network approach for abnormality detection in wireless capsule endoscopy: https://ieeexplore.ieee.org/document/7950698/
Population Differences in Brain Morphology
Brain templates provide a standard anatomical platform for population-based morphometric assessments, used in the diagnosis of neurological disorders. Typically, standard brain templates for such assessments are created using Caucasian brains, which may not be ideal for analyzing brains from other ethnicities. This study developed the first Indian brain template in collaboration with NIMHANS, Bangalore, which is currently being used in assessing dementia, schizophrenia, and bipolar disorders.
Related publication: Naren Rao, Haris Jeelani, Rashmin Achalia, Garima Achalia, Arpitha Jacob, Rose Dawn Bharath, Shivarama Varambally, Ganesan Venkatasubramanian, and Phaneendra K. Yalavarthy, "Population differences in brain morphology: Need for population specific brain template", Psychiatry Research: Neuroimaging 265, 1-8 (2017).