Sagar's Homepage

Research Summary

I work on technologies related to social robotics. Social robots need to perceive their environment through vision, haptics and laser scanners, communicate with humans through gesture and speech, and navigate their surroundings. As such, my work is a fusion of mobile robotics, computer vision, speech processing, language processing and multi-agent systems.

Software development is another major part of my work. I have co-developed a chatbot product deployed on banks' websites and Messenger platforms.

Interests

Multi-Agent Systems
Machine Learning
Mobile Robotics
Computer Vision
Natural Language Processing

Research Projects

Multi-Agent Pickup and Delivery (MAPD)

Collision-free, time-optimal path planning and plan execution for multiple robots assigned different pickup and delivery tasks in an online setting.

In an automated robotic restaurant, multiple robots are assigned food pickup and delivery tasks. Whenever a new food delivery task is added, a centralized server computes a non-colliding path for an idle robot, taking into account the paths currently being executed by the other robots.

Conflict-Based Search (CBS), the state-of-the-art solution to the Multi-Agent Path Finding (MAPF) problem, was implemented along with Post-MAPF, an implementation of MAPF with kinematic constraints. Each time a new order appears in the system, the new agent is added to the MAPF solver with constraints derived from the paths currently under execution (see the Token Passing algorithm).
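The idea of planning a new agent around paths that are already being executed can be sketched as a breadth-first search over (cell, time) states. This is only a minimal illustration under simplifying assumptions (a 4-connected grid, unit-time moves, agents parking at the last cell of their path), not the CBS or Post-MAPF implementation used on the robots. The names `plan_with_reservations` and `cell_at` are hypothetical.

```python
from collections import deque

MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # wait, then the 4 neighbours


def cell_at(path, t):
    """Cell occupied at time t; an agent is assumed to park at its last cell."""
    return path[min(t, len(path) - 1)]


def plan_with_reservations(grid, start, goal, reserved_paths, max_time=50):
    """Shortest-time path for one new agent that avoids the reserved paths.

    grid: list of strings, '#' marks an obstacle.
    reserved_paths: lists of (row, col) cells, one cell per timestep.
    Returns a list of cells (one per timestep) or None if no path is found.
    """
    rows, cols = len(grid), len(grid[0])
    # The new agent may only stop once every reserved path has finished moving.
    last_move = max((len(p) - 1 for p in reserved_paths), default=0)

    def blocked(cur, nxt, t):
        for path in reserved_paths:
            if cell_at(path, t + 1) == nxt:  # vertex conflict
                return True
            if cell_at(path, t) == nxt and cell_at(path, t + 1) == cur:
                return True  # edge conflict: the two agents would swap cells
        return False

    parent = {(start, 0): None}
    queue = deque([(start, 0)])
    while queue:
        cell, t = queue.popleft()
        if cell == goal and t >= last_move:
            path, state = [], (cell, t)
            while state is not None:  # walk back to the start state
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        if t >= max_time:
            continue
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == "#" or blocked(cell, nxt, t):
                continue
            if (nxt, t + 1) not in parent:
                parent[(nxt, t + 1)] = (cell, t)
                queue.append((nxt, t + 1))
    return None
```

For example, on an open 3x3 grid with one robot already crossing the middle row, a new robot travelling from the top to the bottom of the middle column waits one timestep before crossing.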

The difference between MAPD and MAPF is that in MAPD, new tasks (navigating from a non-task endpoint to a pickup location, then to a delivery location, and on to a non-task endpoint) are generated continually and must each be assigned to a free agent. In MAPF, all tasks are given to the solver at once, and the output is a non-colliding path for each agent.

Because CBS assumes discrete time and a grid world, a more relaxed derivative, Continuous-Time CBS (CCBS), is being adapted for the MAPD task. CCBS uses Safe Interval Path Planning (SIPP) as its low-level search algorithm. CCBS is essentially a MAPF solver; by implementing token passing with CCBS, we are trying to solve the MAPD task for our waiter robots.
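SIPP searches over safe intervals, that is, the maximal time spans during which a location is free of moving obstacles. The sketch below shows one way such intervals might be derived from a list of occupied time spans for a single location. The function name `safe_intervals` and the input format are assumptions made for illustration.

```python
import math


def safe_intervals(occupied, horizon=math.inf):
    """Return the maximal collision-free time intervals of one location.

    occupied: (start, end) intervals during which another agent covers the
    location; they may be unsorted and may overlap.
    """
    safe = []
    free_from = 0.0
    for start, end in sorted(occupied):
        if start > free_from:
            safe.append((free_from, start))  # gap before this occupied span
        free_from = max(free_from, end)
    if free_from < horizon:
        safe.append((free_from, horizon))  # free from the last span onwards
    return safe
```

A low-level planner would then treat each (location, safe interval) pair as one search state instead of enumerating every timestep.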

Autonomous Navigation of Mobile robot

Mapping, localization, path planning and control of a mobile robot in an indoor environment.

[Video]

To move from any start location to a goal location, a robot generally follows three steps: localization, path planning and control generation. For navigation, the robot generally requires a map of the environment on which to localize and plan paths. Mapping can be done either in advance or online, simultaneously with localization.

Navigation is a key element for social robots. In our waiter robots, we use occupancy grid mapping, particle-filter-based localization, an A*-based global planner and the Timed Elastic Band local planner. The robot is equipped with a 2D laser scanner, an Inertial Measurement Unit (compass and gyroscope) and encoders. Data from the compass and gyroscope are fused using a Kalman filter to estimate the orientation.

Odometry information and laser scan data are fed into the navigation pipeline. Adaptive Monte Carlo Localization uses the scan and the map to score the particles using different sensor models. Using the location output from the localization step, the A* global planner plans a distance-optimal path from the current location to the goal. To execute the resulting global plan while respecting the kinematic constraints of the robot, the local planner further modifies the trajectory and generates control signals to track it. The global path is replanned continuously to account for deviations from the initial global plan caused by imperfect execution.
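The compass and gyroscope fusion mentioned above can be illustrated with a one-dimensional Kalman filter on the heading angle: the gyroscope rate drives the prediction and the compass reading corrects it. This is a minimal sketch, not the filter running on the robots. The class name and the noise values are assumptions chosen for illustration.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


class HeadingKalmanFilter:
    def __init__(self, heading=0.0, variance=1.0,
                 gyro_noise=0.01, compass_noise=0.1):
        self.heading = heading              # estimated orientation (rad)
        self.variance = variance            # uncertainty of the estimate
        self.gyro_noise = gyro_noise        # assumed process noise per second
        self.compass_noise = compass_noise  # assumed measurement variance

    def predict(self, gyro_rate, dt):
        """Integrate the gyroscope rate; uncertainty grows with time."""
        self.heading = wrap_angle(self.heading + gyro_rate * dt)
        self.variance += self.gyro_noise * dt

    def update(self, compass_heading):
        """Correct the estimate with an absolute compass reading."""
        innovation = wrap_angle(compass_heading - self.heading)
        gain = self.variance / (self.variance + self.compass_noise)
        self.heading = wrap_angle(self.heading + gain * innovation)
        self.variance *= 1.0 - gain
```

Calling `predict` at the gyroscope rate and `update` whenever a compass reading arrives keeps the estimate smooth between compass samples while preventing gyroscope drift from accumulating.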

Localization, mapping and sensor fusion approaches used in our robots are among the most popular probabilistic robotics algorithms. More details can be found in the book.

Business Assistant Robot Pari

Robot that can chat, recognize faces and navigate

[Video]

Query: A Chatbot for your business

Software that automates responses to customer queries, deployed in two major banks in Nepal.

[Link]

I co-developed a major software product named Query. It was designed mainly as a query-answering system for financial institutions. It answers banking-related queries, shows routes to the nearest ATMs and branches, registers complaints from users, and is connected to the bank’s Core Banking Services (CBS). Through the CBS connection, the chatbot can fetch account-related information such as balance and mini-statement. The chatbot can be deployed on Facebook Messenger as well as on a bank’s website. So far, it has been deployed in two major banks in the country: Global IME Bank and Everest Bank Limited.

Query uses RASA, an open-source natural language understanding engine and chatbot stack that is on par with commercial natural language understanding engines such as Wit.ai and IBM Watson [see this paper]. An intent classifier and an entity extractor process the incoming query, and the result is used to select an appropriate response from the pool of available responses.
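The flow from intent classification to response selection can be illustrated with a toy bag-of-words classifier. This sketch does not use RASA's API and is far simpler than a trained NLU model. The intents, example sentences, responses and threshold below are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training examples and response pool.
TRAINING_EXAMPLES = {
    "check_balance": ["what is my account balance", "show my balance"],
    "find_atm": ["where is the nearest atm", "atm near me"],
}
RESPONSES = {
    "check_balance": "Please log in so I can fetch your balance.",
    "find_atm": "Share your location and I will list nearby ATMs.",
    "fallback": "Sorry, I did not understand that.",
}


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify_intent(query, threshold=0.3):
    """Return the intent of the most similar example, or 'fallback'."""
    query_vec = bag_of_words(query)
    best_intent, best_score = "fallback", 0.0
    for intent, examples in TRAINING_EXAMPLES.items():
        for example in examples:
            score = cosine(query_vec, bag_of_words(example))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else "fallback"


def respond(query):
    return RESPONSES[classify_intent(query)]
```

A production system would replace the similarity lookup with a trained classifier and add entity extraction (for example, an account type or a location) before choosing the response.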

In addition, a database of bank-related information was maintained, containing details about ATMs, branches, major banking products, interest rates, etc. The chatbot could fetch the appropriate information from the database based on the decoded query or on input from the user interface built to interact with the customer. A powerful admin panel was built to update the database and to view analytics, user information and query logs. A demo version can be found at https://globalimebank.com/chatbot.html

Large Vocabulary Continuous Speech Recognition for Nepali Language

DNN-HMM and GMM-HMM models for a 50k-vocabulary continuous speech recognition task in the Nepali language

[Paper]

Speech recognition is an important part of conversational AI. A plethora of publications compare the performance of various methods, such as speaker adaptation, discriminative training, DNN models and end-to-end models, for popular languages like English and Mandarin. Nepali is a low-resource language that bears significant phonetic and grammatical differences from these popular languages. We (a team of two) therefore conducted a comprehensive study of speech recognition in Nepali, which involved building a complete phonetic dictionary of 50k words, language models, and several GMM- and DNN-based acoustic models using methodologies popular in the speech recognition literature.

As Nepali is written largely phonetically, in many cases a one-to-one correspondence can be found between written symbols and phonemes. Therefore, a lexicon (phonetic dictionary) was built using a rule-based phonetic transcription algorithm. In addition, several rule checks were applied to determine schwa deletion, which can happen in the middle or at the end of a word. The phonetic dictionary was manually checked for correctness. Secondly, N-gram language models were built from web-scraped data, books and newspaper articles comprising more than a million tokens. More than 70 hours of speech data were collected at Paaila Technology and merged with more than 100 hours of open-source speech data. Lastly, GMM- and DNN-based HMM models were trained on the resulting corpus.
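Rule-based transcription with schwa deletion can be illustrated on a handful of Devanagari symbols. The sketch below is a toy, not the algorithm used for the 50k-word lexicon: the symbol table covers only a few letters, the phoneme labels are hypothetical, and the word-final schwa is always deleted, whereas real rules are context dependent and also handle word-medial deletion.

```python
# Tiny, hypothetical symbol tables (escape codes with the glyph in comments).
CONSONANTS = {
    "\u0915": "k",  # क
    "\u0928": "n",  # न
    "\u092e": "m",  # म
    "\u0932": "l",  # ल
}
VOWEL_SIGNS = {
    "\u093e": "aa",  # ा
    "\u093f": "i",   # ि
}
VIRAMA = "\u094d"  # ् suppresses the inherent vowel
SCHWA = "a"        # inherent vowel carried by a bare consonant


def transcribe(word, delete_final_schwa=True):
    """Map a Devanagari word to a list of phoneme labels."""
    chars = list(word)
    phonemes = []
    for i, ch in enumerate(chars):
        if ch in CONSONANTS:
            phonemes.append(CONSONANTS[ch])
            nxt = chars[i + 1] if i + 1 < len(chars) else None
            # A consonant keeps its inherent schwa unless a vowel sign
            # or a virama follows it.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                phonemes.append(SCHWA)
        elif ch in VOWEL_SIGNS:
            phonemes.append(VOWEL_SIGNS[ch])
        elif ch == VIRAMA:
            continue
        else:
            raise ValueError(f"symbol not in the toy table: {ch!r}")
    # Simplified rule: drop the schwa of a word-final bare consonant.
    if (delete_final_schwa and len(phonemes) > 1
            and chars[-1] in CONSONANTS and phonemes[-1] == SCHWA):
        phonemes.pop()
    return phonemes
```

With this table, the word काम is transcribed as `k aa m` once the final schwa is dropped, and as `k aa m a` when the rule is disabled.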

Several methodologies exist to optimize the classification task and to adapt to a speaker in order to improve the performance of the models. Popular techniques in the literature include transforming the feature space with Linear Discriminant Analysis for better inter-class separability, sequence-discriminative training of the classifier, and speaker adaptation, which adapts the models or features to better fit a certain speaker class. To train DNN models, a baseline GMM model is first used to provide class labels for each frame of the training data. Three different DNN models were trained, which included RBM pre-training, training with p-norm nonlinearity, Minimum Phone Error discriminative training, and LSTM and TDNN models.
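As a small illustration of the p-norm nonlinearity mentioned above, each output unit takes the p-norm of a group of input activations, y = (sum of |x_i|^p)^(1/p). The sketch below uses plain Python lists. The function name and the group size are assumptions, and it says nothing about the configuration used in the actual experiments.

```python
def pnorm(activations, group_size, p=2.0):
    """Reduce consecutive groups of activations to their p-norm."""
    if len(activations) % group_size != 0:
        raise ValueError("number of activations must be a multiple of group_size")
    outputs = []
    for i in range(0, len(activations), group_size):
        group = activations[i:i + group_size]
        outputs.append(sum(abs(x) ** p for x in group) ** (1.0 / p))
    return outputs
```

The layer therefore reduces dimensionality by a factor of `group_size`, and with p = 2 each output is the Euclidean length of its group.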

ABU Robocon 2015, 2016

Badminton-playing robots, and eco and hybrid robots, for a theme-based robot competition between countries of the Asia-Pacific region.

[2016 Video]
[2015 Video]

I was a member of Team Nepal in an international robot contest, ABU Robocon (https://en.wikipedia.org/wiki/ABU_Robocon), for two consecutive years: 2015 in Indonesia and 2016 in Thailand. We won four awards across the two contests:
- Best Engineering Award
- Best Idea Award
- Mabuchi Motor Award
- Panasonic Award

ABU Robocon is a theme-based robot competition in which the participating teams build robots that perform specific tasks assigned by the organizer. The task in 2015 involved badminton-playing robots; we built two robots that could play badminton against two opponent robots. In 2016, we built a hybrid robot and an eco-robot. The eco-robot had to be propelled by the hybrid robot along a narrow path without contact, and the hybrid robot had to install a propeller by climbing a pole.

The robots had line-tracking sensors based on phototransistors. PID controllers were used to keep the robots on track while moving forward. Odometry information and a map of the game field were used to localize the robots and determine which actions to execute.
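A PID controller for line following can be sketched as follows. The gains, the time step and the toy robot model (the lateral offset simply shrinks in proportion to the steering correction) are assumptions made for illustration, not the values or dynamics of the competition robots.

```python
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        """Return the control output for the current error."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_line_following(offset=1.0, steps=100, dt=0.05):
    """Toy model: the steering correction directly reduces the lateral offset."""
    controller = PID(kp=2.0, ki=0.0, kd=0.1)
    for _ in range(steps):
        correction = controller.step(offset, dt)
        offset -= correction * dt
    return offset
```

In this toy model the offset from the line decays towards zero. On a real robot the error would come from the line sensor array and the output would set the difference between the wheel speeds.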

360 Video Stitching

Panoramic stitching of video feeds from twelve different cameras.

Video feeds from 12 cameras arranged in a dodecahedron shape had to be stitched in real time to produce a complete 360-degree panorama. The idea was to estimate the homography of each camera in advance and use it to stitch every frame in real time.
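Because the homographies are estimated once, per-frame stitching reduces to applying a fixed 3x3 matrix per camera. The sketch below shows how one pixel coordinate is mapped into the panorama frame with a homography in homogeneous coordinates. The function name and the example matrices are hypothetical, and a real pipeline would warp whole images rather than single points.

```python
def apply_homography(H, point):
    """Map an (x, y) pixel through a 3x3 homography given as nested lists."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (xh / w, yh / w)  # divide by w to return to pixel coordinates


# Hypothetical precomputed homography for one camera: a pure shift of 100 px.
SHIFT_RIGHT = [[1.0, 0.0, 100.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]]
```

For instance, `apply_homography(SHIFT_RIGHT, (10, 20))` places the pixel 100 columns to the right in the panorama.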

For this purpose, we started with the panoramic stitching pipeline from OpenCV, which is an adaptation of David Lowe’s famous paper on image stitching. First, lens correction using a barrel distortion model is performed to undistort the original images, which in our case came from fish-eye lenses. The resulting images are used to find features. We tested several feature detectors, such as SIFT, SURF and AKAZE, and ended up combining features from multiple detectors because the overlap between images was very small, so a single detector found very few features in the overlapping region. Since the position and orientation of the cameras were fixed, a pair of masks was prepared for each overlapping image pair to keep features in the non-overlapping region from being matched, which improves homography estimation.

Sagar Shrestha

Paaila Technology Pvt. Ltd | Thapathali Campus

Tribhuvan University
