Projects and Areas of Interest


  • Natural Language Processing, Machine Learning, Data Mining, Artificial Intelligence - with an emphasis on Cognitive Linguistics and Cognitive Science



  • Sushobhan Nayak and Amitabha Mukerjee, Grounded Language Acquisition: A Minimal Commitment Approach [Oral, Long], In Proceedings of the 24th International Conference on Computational Linguistics (COLING '12), pages 2059-2076, Mumbai, India, Dec 2012 [Paper url]
  • Sushobhan Nayak and Amitabha Mukerjee, Concretizing the image schema: How semantics guides the bootstrapping of syntax, In Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL '12), San Diego, USA, Nov 2012 [Paper url]
  • Sushobhan Nayak and Amitabha Mukerjee, A Grounded Cognitive Model for Metaphor Acquisition [Oral], To appear in Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI '12), Toronto, Canada, July 2012
  • Sushobhan Nayak and Amitabha Mukerjee, Learning Containment Metaphors, To appear in Proceedings of the Thirty-Fourth Annual Conference of the Cognitive Science Society (CogSci '12), Sapporo, Japan, August 2012


  • Sushobhan Nayak, Varunesh Mishra and Amitabha Mukerjee, Towards a Cognitive Model for Human Wayfinding Behavior in Regionalized Environments, in Proceedings of the AAAI Fall Symposium on Advances in Cognitive Systems (AAAI-ACS), pages 249-256, Arlington USA, November 2011 [pdf][Paper url]
    • Preliminary work on this topic, "Which Strategy for Way-finding? - A Computational Evaluation", was accepted as a poster at the Conference on Spatial Information Theory (COSIT'11) [pdf]
  • Sushobhan Nayak, Towards a Grounded Model for Ontological Metaphors, in Proceedings of the 2nd Student Research Workshop of International Conference on Recent Advances in Natural Language Processing (RANLP), pages 115-120, Hissar Bulgaria, September 2011 [pdf][Paper url]
  • Amitabha Mukerjee, Kruti Neema and Sushobhan Nayak, Discovering Coreference Using Image-Grounded Verb Argument Structure, in Proceedings of the 8th International Conference on Recent Advances in Natural Language Processing (RANLP), pages 610-615, Hissar Bulgaria, September 2011 [pdf][Paper url]
  • Sushobhan Nayak and Ankit Bhutani, Music Genre Classification Using GA-Induced Minimal Feature-Set, in Proceedings of the 3rd National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Hubli India, December 2011 [pdf][Paper url]



  • Mining User Interests from Twitter Bios

    Summer Research Internship at Technicolor Research, Palo Alto USA. (June-Sept, 2013)

    Mined user interests and professions from a million Twitter user bios and tweets using knowledge bases (WordNet) and unsupervised topic modeling (LDA, Topical n-grams, Turbotopics). Reframed the problem as supervised multi-label, multi-class learning using a Twitter user-directory service (Twellow). Implemented state-of-the-art algorithms that leverage the hierarchical structure of the labels, and tagged users into 2900+ classes at >60% recall. The work is to be part of the product Audiencescape, which analyzes audience demographics for Hollywood movies.
  • Research Assistant

    At Computation and Cognition Lab, Stanford USA. (Sept 2012-Present)

    Working on a principled probabilistic approach to grounded language comprehension and production and its use to understand statements, follow imperative commands, and answer questions. Creating a Bayesian framework by representing word meanings as stochastic log-linear predicates on possible worlds and sentence meanings as product-of-experts of those predicates. Leveraging parse-tree generated compositional structure of sentences to mitigate intractability issues.
  • Si-nanowire Enhanced Solar Cells

    Summer Internship Project with Dr Pere Roca i Cabarrocas, Physics of Thin Films Lab (PICM), Ecole Polytechnique, Paris, France. (May-July 2009) [Short Report]

    We explored the effects of the development of vertical PIN-SiNW (Si-nanowires) on amorphous silicon (a-Si) based thin-layered solar cells. We looked for signs of probable degradation and explored the possibility of exploiting SiNW to our advantage by creating more efficient amorphous solar cells enhanced by perfectly crystalline vertical PIN nanowires. We experimented with various substrates and different conditions of the plasma reactor to assess the viability of a novel approach to growing SiNWs (please consult the report for details). I was selected through the Summer Undergraduate Research Grant for Excellence (SURGE) 2009, a prestigious program of IIT Kanpur.
  • Custom Oscilloscope for Network Measurement

    Summer Internship with Dr Ken Birman and Dr Daniel Freedman, Cornell University, NY USA. (May-July, 2010) [Report]

    Traditional tomographic tools reside at the network endpoints and therefore introduce considerable distortion. Accuracy improves if measurement is performed in situ, combining a precisely calibrated external hardware timebase from an oscilloscope with a software post-processing stack. To investigate packet loss and the perturbation of the time distribution of packets, a number of such hardware devices need to be installed at strategic places throughout the network. Towards this end, I worked on a custom oscilloscope for network activity detection on the Cornell campus. The work involved designing PLLs and ADCs (which work at > 800 MHz), sampling a five-level noisy signal, and encoding and decoding the network data stream to investigate packet behavior.


Projects (AI/ML/NLP Related)

  • Sentiment Analysis - A Psychological and Computational Perspective

    CS224u Course Project: Natural Language Understanding, Stanford (Winter 2013) [Report]

    Abstract: We explore the effect of psychological factors based on locality, culture and language on movie reviews and ratings, and the reverse process of gathering information on the former from reviews. Specifically, we consider the effect of 'Ideal Affect' and the importance of valence and arousal scores. We first do a subjective statistical analysis to discover correlation between these scores and the reviews in our database. The reviews are used as a proxy for the sentiment of the masses (like tweets), and we try to explain psychological findings through corpus analysis. We further investigate how these scores affect reviews, ratings, locality and language and formulate a prediction model from insights gained. Our minimal augmented model increases the baseline accuracy by 8% and 20% respectively for 8-class locality and 2-class sentiment classification, with a 30% decrease in mean-squared-error for star-rating prediction, supporting our statistical findings.
  • Sentiment Analysis of Movie Reviews: A Study of Features and Classifiers

    CS221 Course Project: Artificial Intelligence, Stanford (Fall 2012) [Report]

    Abstract: We love movies, and in this project we experiment on a sentiment analysis task on movie reviews. Our objective is two-fold: 1) the binary sentiment classification of a large dataset of movie reviews from IMDB, 2) predicting the critic-assigned rating of the movie from the review. We extract bag-of-words, tf-idf and LDA-based language features from the documents to gauge the saliency of different words and sentence structures for the task. We then experiment with different learning algorithms, such as Naive Bayes and several flavors of SVM with different kernels, to classify our documents. This lets us compare the importance and use of different textual features as well as the capability of the standard learning algorithms in such a task. We present a detailed analysis of the effects of the many features and classifiers we have considered and support it with a battery of experiments on a massive dataset.
  • Grounded Language Acquisition: Semantics to Metaphors

    Masters Thesis under Dr. Amitabha Mukerjee, IIT Kanpur (2012) [Paper]

    Abstract: We take up the challenge of learning a grounded model of language when our agent has a body of machine learning algorithms and no prior knowledge of either the physical domain or language, in the sense of "least commitment". Based on a 2D video and co-occurring raw text, we demonstrate how this cognitively inspired model segments the world to obtain a meaning space, and combines words into hierarchical patterns for a linguistic pattern space. By associating these two spaces under temporal co-occurrence constraints, we demonstrate the acquisition of term-meaning pairs for names, actions and relations. We next map physical arguments for actions and relations to syntactical constructions resembling a cognitive grammar framework. Thus the system is able to bootstrap a rudimentary lexicon and syntax. While experiments are primarily in English, we present partial results for Hindi obtained without any change in the methods, to indicate its potential application to other languages.
  • Genetic Algorithm Induced Minimal Feature Set Selection

    Part of EE Summer Camp Mentoring, IIT Kanpur (Summer 2011) [Paper]

    Proposed a genetic algorithm-based feature-selection method for music genre classification that not only increases the efficiency of standard classifiers, but also reduces the feature space to a bare minimum. While previous works focused on finding near-optimal features devoid of noise, we used a modified fitness function capable of finding a feature subset that is both near-optimal and near-minimal. In addition to enhanced performance, our model could also reduce the computational load for ill-formed sets and had the flexibility to incorporate trade-offs between efficiency and computational load. We finally demonstrated that the modified GA can bring about an 80% reduction in the feature space dimension at similar classification rates. A paper on this work was presented at NCVPRIPG 2011, a prestigious national conference on pattern recognition.
  • Wayfinding in a Regionalized Environment

    Under the Guidance of Dr. Amitabha Mukerjee, IIT Kanpur (November 2010) [AAAI-ACS Paper][COSIT Paper][Extended Report][Presentation]

    Proposed a computational method that approximates way-finding in regionalized environments as a stochastic memoryless process. Attempted to construct an operationalizable model which incorporates three established way-finding heuristics, viz. fine-to-coarse method, cluster method and least-decision-load strategy, on a subject-to-subject basis, based on a cognitive graph generated using an ordered tree algorithm. The work was accepted as a poster at the Conference on Spatial Information Theory (COSIT'11). I further developed it into a full-fledged cognitive model, complete with subjective memory graphs generated using an ordered-tree algorithm, that led to a paper in AAAI-ACS 2011.
  • Content-Based Video Indexing and Retrieval

    Group project under the Guidance of Dr. Sumana Gupta, IIT Kanpur (April 2011) [Report][Code]

    Compared the probabilistic semantic video indexing technique with our own algorithm, which employed an SVM classifier. The model learned eight concepts, viz. coast, forest, mountain, open country, tall building, street, highway and inside city, from an image database. It associated these high-level semantic concepts with low-level features extracted from the training set. Frames were extracted from the video to be classified, and the required features were computed for each frame. Then, using an SVM classifier trained on the image training set, the frames from the video were categorized into one of the eight aforementioned concepts. Based on a heuristically set threshold, the video was then classified into one or more of the classes. Our method was less accurate, but it had a lower time complexity, leading to faster retrieval.
  • Scene Classification in Images

    Group project under the Guidance of Dr. Simant Dube, IIT Kanpur (November 2009) [Report]

    Classified real-world scenes into eight semantic groups without going through the stages of segmentation and processing of individual objects or regions. Retrieved diagnostic information stored in the power spectrum of each category of images through Gabor filters and, through supervised learning using SVM and LDA, extracted separate characteristic feature vectors for each class. Followed a sequential hierarchy in which images were first sorted according to their naturalness and then, by traversing a tree, ultimately reached the node that represented the class of the image. Investigated the ripple effect of errors.
  • Implementation of Best Wavelet Packet Bases in a Rate-Distortion Sense

    Group project under the Guidance of Dr. Narein Naik, IIT Kanpur (April 2010) [Presentation] [Short Report]

    Following the work of Ramchandran and Vetterli, we implemented the algorithm for best basis selection from the wavelet packet decomposition of any sound file, with the lossy compression criterion defined in the rate-distortion sense.
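The best-basis selection in the last entry above can be sketched as a bottom-up pruning of the wavelet packet tree: for a fixed Lagrange multiplier, a node is split only if the combined cost J = D + lam * R of its children is lower than its own. The sketch below is a minimal illustration and not the project's code. The Haar filter, the uniform quantizer, the entropy-based rate estimate, and every name and parameter (`haar_split`, `rd_cost`, `best_basis`, `step`, `lam`, `depth`) are assumptions made for the example.

```python
import math
from collections import Counter


def haar_split(x):
    """One Haar analysis step: return the (low-pass, high-pass) half-bands."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def rd_cost(coeffs, step, lam):
    """Lagrangian cost J = D + lam * R of one node under uniform quantization.

    D is the squared quantization error; R is estimated as the first-order
    entropy of the quantizer indices, in bits (an assumed rate model).
    """
    q = [round(c / step) for c in coeffs]
    dist = sum((c - k * step) ** 2 for c, k in zip(coeffs, q))
    n = len(q)
    rate = -sum(m * math.log2(m / n) for m in Counter(q).values())
    return dist + lam * rate


def best_basis(x, depth, step, lam):
    """Return (cost, tree); tree is "leaf" or a (low_subtree, high_subtree) pair."""
    own = rd_cost(x, step, lam)
    if depth == 0 or len(x) < 2 or len(x) % 2:
        return own, "leaf"
    low, high = haar_split(x)
    c_low, t_low = best_basis(low, depth - 1, step, lam)
    c_high, t_high = best_basis(high, depth - 1, step, lam)
    # Keep the split only if the children are jointly cheaper than the parent.
    if c_low + c_high < own:
        return c_low + c_high, (t_low, t_high)
    return own, "leaf"
```

For instance, `best_basis([1.0, -1.0] * 4, 3, 0.5, 1.0)` splits the root, because the alternating signal is cheaper to code in the high-pass band, while a constant signal such as `[1.0] * 8` is kept as a single leaf. In the full algorithm the multiplier would additionally be searched to meet a target bit budget, which is omitted here.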


Teaching Experience

  • Teaching Assistant

    For AI: Principles and Techniques, Stanford USA. (June-Sept, 2013)

  • Teaching Assistant - Digital Electronics and Microprocessors

    Under Dr. SSK Iyer, IIT Kanpur (Aug-Nov 2011) [Website]

    Assisted course instructor in overhauling the course by introducing SPICE and Verilog simulation of taught circuits. Delivered lectures on SPICE and Verilog to a batch of 100 students. Introduced digital design learning through SDLX processor implementation. Was adjudged the Best Teaching Assistant of the department for the semester.
  • Instructor and Mentor - EE Summer Camp 2011

    (May-June, 2011) [Website]

    Reconceptualized EE Summer Camp (a team of 24 sophomore students) to include new streams of Machine Learning and Optimization, along with the old fields of Digital Design and Communications. Taught Digital Design (with emphasis on SDLX processor) and Verilog in the same. Taught concepts of Machine Learning and mentored 11 students in 8 projects. One of them led to a paper in NCVPRIPG 2011.


Projects (Systems)

  • Transactional memory Implementation in FPGA

    Group project under the Guidance of Dr. S. Qureshi, IIT Kanpur (April 2011) [Report][Presentation][Code]

    Implemented an SDLX processor on an FPGA, capable of handling both transactional and non-transactional instruction sets following a Hardware Transactional Memory (HTM) model, namely EazyHTM. Followed an eager conflict detection and lazy conflict resolution technique, thereby increasing the throughput. Maintained two processors that might conflict with each other, with directory-based cache maintenance. Handled challenging issues of conflict detection, conflict tracking, and transaction commit and abort.
  • AES Implementation in FPGA

    Design part of FPGA Design Contest (Feb 2011) [Report][Code]

    Implemented a 3-stage pipelined Advanced Encryption Standard (AES) encryption and decryption system on a Xilinx Virtex-5 FPGA. Modified the original straightforward algorithm into a pipelined implementation for greater throughput with minimal FPGA area usage. Introduced a novel sequential and dynamic in-place key generation scheme to minimize space complexity. Achieved a maximum clock speed of 340MHz and a throughput of 4 GBPS using only 530 registers and 450 XOR gates. The model won the third prize in Techkriti FPGA Design Challenge '11.
  • SDLX Processor with Cache Coherence

    Group project under the Guidance of Dr. Rajat Moona, IIT Kanpur (April 2010)

    Implemented a two-stage pipelined SDLX processor, with data and instruction cache. Extended it to incorporate full cache-coherency (MESI protocol).
  • Robust two degree-of-freedom vehicle steering controller design

    Under the guidance of Dr. Ramprasad Potluri, IIT Kanpur (April, 2009)

    Analysed Bilin Guvenc et al.'s paper on the issue and submitted an alternate solution. Took a single-track model for car steering from the paper and designed a controller for it using loop-shaping techniques in the MATLAB Control System Toolbox. Produced comparable results from this crude approach.
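The MESI protocol named in the SDLX Processor with Cache Coherence entry above can be illustrated with a small software model of the states one cache line takes across several caches. This is only a sketch of the textbook protocol, not the hardware design from the project; the function name `access`, the list-of-states representation and the `"read"`/`"write"` operation strings are assumptions made for the example.

```python
def access(states, cpu, op):
    """Apply one processor access to a single cache line.

    states: MESI state ("M", "E", "S" or "I") of the line in each cache.
    cpu:    index of the cache whose processor issues the access.
    op:     "read" or "write".
    Returns the new list of states; the input list is not modified.
    """
    others = [i for i in range(len(states)) if i != cpu]
    new = list(states)
    if op == "read":
        if states[cpu] == "I":  # read miss: a bus read is issued
            shared = any(states[i] != "I" for i in others)
            for i in others:
                if states[i] in ("M", "E"):
                    new[i] = "S"  # a Modified copy is also written back
            new[cpu] = "S" if shared else "E"
        # A read hit in M, E or S leaves every state unchanged.
    elif op == "write":
        for i in others:
            new[i] = "I"  # all other copies are invalidated
        new[cpu] = "M"
    else:
        raise ValueError("op must be 'read' or 'write'")
    return new
```

Starting from `["I", "I"]`, a read by cache 0 gives `["E", "I"]`, a following read by cache 1 gives `["S", "S"]`, and a write by cache 0 then gives `["M", "I"]`.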


Term Papers

  • Spectrum Sensing in Cognitive Radio

    Investigative project EE670 under Dr. A Jagannatham, IIT Kanpur (April 2011) [Report]

    Investigated several spectrum sensing techniques, especially the three signal processing techniques, viz. matched-filter detection, energy detection and cyclostationarity detection. Explored cooperative sensing, the interaction of master and slave cognitive nodes, and blind detection. It was the only term paper to get full credit in the course.
  • Space-Time Codes for High Data-Rate Wireless Communication

    Investigative project EE624 under Dr. Adrish Banerjee, IIT Kanpur (November 2010) [Report][Presentation]

    Investigated the design of channel codes for improved data rate and reliability in wireless communication. Explored the use of MIMO systems to improve reliability without any bandwidth penalty. Reviewed performance criteria for designing space-time codes and focused on two different methods of code construction, viz. block codes and trellis codes.
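As an illustration of the block-code construction mentioned in the last entry, the sketch below implements the Alamouti scheme, a standard space-time block code for two transmit antennas and one receive antenna. It is offered as an example of the technique, not as material from the term paper; the function names and the flat-fading channel model with known gains `h1` and `h2` are assumptions.

```python
def alamouti_encode(s1, s2):
    """Map two symbols to two time slots of (antenna 1, antenna 2) signals."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def channel(tx, h1, h2, noise=(0j, 0j)):
    """Flat-fading channel: each slot is received as h1*a + h2*b + noise."""
    return [h1 * a + h2 * b + n for (a, b), n in zip(tx, noise)]


def alamouti_decode(r, h1, h2):
    """Linear combining at the receiver, assuming h1 and h2 are known."""
    r1, r2 = r
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat
```

Without noise, `alamouti_decode(channel(alamouti_encode(s1, s2), h1, h2), h1, h2)` recovers `s1` and `s2` up to floating-point error. Each symbol is effectively received with gain |h1|^2 + |h2|^2, which is where the diversity benefit comes from, and two symbols are still sent in two time slots, so no bandwidth is sacrificed.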


Other Projects (from a past life)

  • Implementation & Analysis of Data Structures

    Under Guidance of Prof. Manindra Agrawal, IIT Kanpur (April 2009)

    Implemented Binary Search Trees, Red-Black Trees, Hash Tables, B-Trees and Lists in Java, and compared their performance across various practical applications.
  • SD Card Reader

    During the summer of 2008, I worked on a microcontroller-based SD card reader. The circuit used an Atmel ATmega16 MCU to read data from a memory card and communicate with a computer through the serial port.
  • Obstacle Avoiding Robot

    During the summer of 2008, I also worked on a robot that used simple IR sensors and an MCU to detect obstacles ahead and choose its path accordingly.
  • Line Follower

    I was part of the team that built this fully autonomous robot for Takneek'08, the intra-college technical competition. It could navigate using a line-following technique, locate a ball on a platform and place it in another predefined box.
  • Hexapod robot

    Under Guidance of Prof. NVR Reddy, IIT Kanpur

    Built a six-legged mechanical robot using metal sheets, gears, screws and basic machining processes of lathe and milling.
  • Dump truck model

    The project dealt with making a prototype of a dump truck using metal sheet and rods by techniques of casting, welding, brazing, bending, sheet metal forming etc.


Non-Technical Papers

  • Sociological Impact of Economic Crises

    Group project under the Guidance of Dr. Amman Madan, IIT Kanpur (November 2009) [Report]

    The project focused on the social implications of economic crises. Media coverage of the 2009 recession concentrated almost entirely on its economic effects, so there was little awareness of the social nature of the crisis. We first tried to explain the crises through major sociological perspectives. We then studied some of the micro and macro effects of the economic downturn, focusing especially on the effects of social inequality and unemployment on the middle class (the class group we belong to and can relate to most easily).
  • Post-war Industrialisation in Divided Germany: The Interzonal Trade (1945-1962)

    Project under the Guidance of Dr. Amman Madan, IIT Kanpur (May 2010) [Report]

    This study focused on the return of divided Germany to the world market. It dealt largely with the dynamics of trade between the two parts of a country that had just emerged from a dictatorial regime and a devastating war, which it had lost, at a time when the governments of the two parts adhered to opposing ideologies, each bent on proving the other wrong in a cold war that was about to grip the world.
  • Communal Violence in Kandhamal, Orissa: The Multipolar Truth

    Project under the Guidance of Dr. Amman Madan, IIT Kanpur (December 2009) [Report]

    My study focused primarily on the class and caste dynamics and the political maneuvering that led to the then-recent surge in communal violence and tension between Hindus and Christians in the now infamous Kandhamal, a district of Orissa. The report did not dwell on the incidents themselves, nor was it able to find a valid solution to the problem. However, it dealt rigorously with the reasons that led to the situation, with special emphasis on looking beyond purely religious conflict. In fact, it found that religion had little relation to the pogrom that ensued; it acted only as a catalyst, whereas caste dynamics and identity politics were the real culprits.