Below is a complete list of my publications. I will try to maintain an updated list here but in case I miss anything, you can visit my profiles on academic social media websites (Google Scholar / ResearchGate).
Click on the title to get a `BibTeX` entry for citation.
We present the first competitive drawing agent, Pixelor, that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is the winner. Our AI agent can autonomously sketch a given visual concept and achieve a recognizable rendition as quickly as, or faster than, a human competitor. The key to victory for the agent is to learn the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the optimal stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting, and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors’ strategies, we conducted a further human study in which participants were given unlimited thinking time and trained in early recognizability through feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. We will release the code and the dataset, optimized for the task of early recognition, upon acceptance.
The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints. However, this leads to low-resolution image generation and a failure to model long sketches. In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best-fit Bézier curve. This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.
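As a rough illustration of why a stroke stored as Bézier control points is scalable, the sketch below evaluates a curve with De Casteljau's algorithm at any chosen number of samples. It is not the BézierSketch encoder or generator; the function names `de_casteljau` and `sample_stroke` and the example control points are assumptions made purely for illustration.

```python
def de_casteljau(ctrl, t):
    """Evaluate a Bezier curve with control points `ctrl` at parameter t in [0, 1]."""
    pts = [tuple(p) for p in ctrl]
    # Repeatedly interpolate between neighbouring points until one point remains.
    while len(pts) > 1:
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_stroke(ctrl, n):
    """Render one stroke as n points (n >= 2); n is free, so resolution is arbitrary."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [de_casteljau(ctrl, i / (n - 1)) for i in range(n)]


# Hypothetical cubic stroke: four control points in 2D.
stroke = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
coarse = sample_stroke(stroke, 5)     # low-resolution rendering
fine = sample_stroke(stroke, 500)     # same stroke, much denser rendering
```

Because the curve is defined by its control points, scaling the control points scales the rendered stroke exactly, which is the sense in which a vector representation avoids the fixed resolution of waypoint sequences.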
In this paper, we propose a word spotting based information retrieval approach for medical prescriptions/reports written by doctors. Doctors' handwriting is sometimes almost illegible, which makes their medication reports difficult to understand. This often confuses patients about the actual medicine/disease names written by doctors, and as a consequence they suffer. A medical prescription is generally partitioned into two parts: a printed letterhead part containing the doctor’s name, designation, organization name, etc., and a handwritten part where the doctor writes the patient’s name, reports his/her findings and suggests medicine names. The proposed work has several significant applications. For example, it can be used (i) to develop expert diagnostic systems, (ii) to extract information from patient history that can be obtained by this proposed method, (iii) to detect wrong medication, and (iv) to perform different statistical analyses of the medicines prescribed by doctors. To extract the information from such document images, we first extract the domain-specific knowledge of doctors by identifying department names from the printed text that appears in the letterhead part. From the letterhead text, the specialty/expertise of the doctor is understood, and this helps us to search only relevant prescription documents for word spotting in the handwritten portion. Word spotting in the letterhead part as well as in the handwritten part has been performed using a Hidden Markov Model. An efficient MLP (Multilayer Perceptron) based Tandem feature is proposed to improve the performance. From the experiment with 500 prescriptions, we have obtained encouraging results. Information from the printed letterhead part significantly improved the word spotting performance in the handwritten part.
Feature Selection (FS) is an important pre-processing step in machine learning that reduces the number of features/variables used to describe each member of a dataset. Such reduction occurs by eliminating some of the non-discriminating and redundant features and selecting a subset of the existing features with higher discriminating power among various classes in the data. In this paper, we formulate feature selection as a bi-objective optimization problem over real-valued weights corresponding to each feature. A subset of the weighted features is thus selected as the best subset for subsequent classification of the data. Two information theoretic measures, known as ‘relevancy’ and ‘redundancy’, are chosen for designing the objective functions for a very competitive Multi-Objective Optimization (MOO) algorithm called ‘Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D)’. We experimentally determine the best possible constraints on the weights to be optimized. We evaluate the proposed bi-objective feature selection and weighting framework on a set of 15 standard datasets by using the popular k-Nearest Neighbor (k-NN) classifier. As is evident from the experimental results, our method is competitive with some of the state-of-the-art FS methods of current interest. We further demonstrate the effectiveness of our framework by changing the choices of the optimization scheme and the classifier to Non-dominated Sorting Genetic Algorithm (NSGA)-II and Support Vector Machines (SVMs) respectively.
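To make the two objectives concrete, here is a minimal sketch that scores a candidate feature subset by relevancy (to be maximized) and redundancy (to be minimized). The abstract does not give the exact formulas, so this assumes discrete features and the common mutual-information definitions: mean mutual information between each selected feature and the class label, and mean pairwise mutual information among selected features. The names `mutual_information` and `objectives` are hypothetical, and the weight optimization with MOEA/D is not shown.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum p(a,b) * log2(p(a,b) / (p(a) p(b))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def objectives(features, labels, selected):
    """Return (relevancy, redundancy) for the non-empty list of feature indices `selected`."""
    relevancy = sum(
        mutual_information(features[j], labels) for j in selected
    ) / len(selected)
    pairs = list(combinations(selected, 2))
    redundancy = (
        sum(mutual_information(features[i], features[j]) for i, j in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return relevancy, redundancy


# Toy data (assumed): feature 0 predicts the label, feature 1 is noise,
# feature 2 duplicates feature 0.
labels = [0, 0, 1, 1]
features = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
rel_dup, red_dup = objectives(features, labels, [0, 2])
rel_mix, red_mix = objectives(features, labels, [0, 1])
```

On this toy data, the subset containing the duplicated feature scores high on both objectives, while the subset with the noise feature has lower relevancy and no redundancy, which is the kind of trade-off a bi-objective search has to resolve.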
This paper presents a novel approach towards Indic handwritten word recognition using zone-wise information. Because of their complex nature due to compound characters, modifiers, overlapping and touching, etc., character segmentation and recognition is a tedious job in Indic scripts (e.g. Devanagari, Bangla, Gurumukhi, and other similar scripts). To avoid character segmentation in such scripts, HMM-based sequence modeling has been used earlier in a holistic way. This paper proposes an efficient word recognition framework that segments handwritten word images horizontally into three zones (upper, middle and lower) and recognizes the corresponding zones. The main aim of this zone segmentation approach is to reduce the number of distinct component classes compared to the total number of classes in Indic scripts. As a result, use of this zone segmentation approach enhances the recognition performance of the system. The components in the middle zone, where characters are mostly touching, are recognized using HMM. After the recognition of the middle zone, HMM-based Viterbi forced alignment is applied to mark the left and right boundaries of the characters. Next, the residue components, if any, in the upper and lower zones within their respective boundaries are combined to achieve the final word-level recognition. A water reservoir feature has been integrated in this framework to correct zone segmentation and character alignment defects during segmentation. A novel sliding window-based feature, called Pyramid Histogram of Oriented Gradient (PHOG), is proposed for middle zone recognition. The PHOG feature has been compared with other existing features and found to be robust in Indic script recognition. An exhaustive experiment is performed on two Indic scripts, namely Bangla and Devanagari, for the performance evaluation. From the experiment, it has been noted that the proposed zone-wise recognition improves accuracy with respect to the traditional way of Indic word recognition.
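The idea of cutting a word image into upper, middle and lower zones can be illustrated with a simple horizontal projection profile. This is only a sketch under stated assumptions, not the segmentation method of the paper (which also uses water reservoir features): the headline is taken to be the row with the most ink, and the baseline is taken to be the last row whose ink count stays above a fraction of that peak. The function name `split_zones` and the parameter `baseline_ratio` are hypothetical.

```python
def split_zones(image, baseline_ratio=0.3):
    """Split a binary word image (list of rows, 1 = ink) into three zones.

    Assumed heuristic: the headline is the row with maximum ink, and the
    baseline is the last row at or below it with at least
    baseline_ratio * peak ink pixels.
    """
    profile = [sum(row) for row in image]          # ink count per row
    headline = max(range(len(profile)), key=profile.__getitem__)
    threshold = baseline_ratio * profile[headline]
    baseline = max(
        r for r in range(headline, len(profile)) if profile[r] >= threshold
    )
    upper = image[:headline]                       # modifiers above the headline
    middle = image[headline:baseline + 1]          # main body of the characters
    lower = image[baseline + 1:]                   # modifiers below the baseline
    return upper, middle, lower


# Toy 5x6 word image (assumed): one modifier row above and one below.
word = [
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
upper, middle, lower = split_zones(word)
```

Each zone can then be passed to its own recognizer, which is how zone-wise processing keeps the number of component classes per recognizer small.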
Recognition of Bangla handwritten text is difficult due to its complex nature of having modifiers and headline features. This paper presents a comparative study of different features, namely LGH (Local Gradient of Histogram), PHOG (Pyramid Histogram of Oriented Gradient), GABOR, G-PHOG (Combined GABOR and PHOG) and the profile feature by Marti-Bunke, when applied to middle zone recognition of Bangla words using a Hidden Markov Model (HMM) based framework. For this purpose, a zone segmentation method is applied to extract the busy (middle) zones of handwritten words, and features are extracted from the middle zone. The system has been tested on a sufficiently large and variation-rich dataset consisting of 11,253 training and 3,856 testing samples. From the experiment, it has been noted that the PHOG feature outperforms the other features in middle zone recognition. Since the PHOG feature outperforms the others, we use this feature for full word recognition. For this purpose, the upper and lower zone components are first recognized by PHOG features and an SVM classifier. Finally, the zone-wise results are combined using the context information of the corresponding components in each zone to obtain the word-level recognition.
In this paper, we present a date spotting based information retrieval system for natural scene images and video frames where text appears with complex backgrounds. Text retrieval in such scene/video frames is difficult because of blur, low resolution, background noise, etc. In our proposed framework, a line-based date spotting approach using a Hidden Markov Model is used to detect the date information in text. Given a text line image, we apply an efficient Bayesian classifier based binarization approach to extract the text components. Next, the Pyramid Histogram of Oriented Gradient (PHOG) feature is computed from the binarized image for the date-spotting framework. For our experiment, three different date models have been constructed to search for similar date information in scene/video text. When tested on a custom dataset of 1104 text lines, our date spotting approach provided encouraging results.
In this paper we present a line-based word spotting system based on the Hidden Markov Model for offline Indic scripts such as Bangla (Bengali) and Devanagari. We propose a novel approach of combining foreground and background information of text line images for keyword spotting by character filler models. The candidate keywords are searched for in a line without segmenting characters or words. A significant improvement in performance is noted when using both foreground and background information compared with using either alone. The Pyramid Histogram of Oriented Gradient (PHOG) feature has been used in our word spotting framework, and it outperforms other existing features for word spotting. The framework of combining foreground and background information has also been evaluated on the IAM dataset (English script) to show the robustness of the proposed approach.