Work experience

2020 - Present: Amazon Lab126
Software Development Engineer II

  • Working on Amazon Astro

2019 - 2020: Amazon Lab126
Software Development Engineer I

  • Worked on Amazon Astro

2014 - 2019: Zenva
Technical Writer

  • Wrote in-depth articles on Computer Vision, Natural Language Processing, and Machine Learning for readers new to the field

Summer 2018: Amazon Lab126
Software Development Intern

  • Worked on Amazon Astro

2015 - 2018: Department of Computer Science and Engineering, The Ohio State University
Undergrad Researcher

  • Conducted Honors Thesis research on structured information extraction from images

Summer 2017: Amazon Lab126
Software Development Intern

  • Created a framework for visualizing data from various sensors and measurements
  • Designed and developed the data structures and the creation and update logic for an occupancy map built from sensor data (a minimal sketch of the technique appears below)
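
For illustration only, here is a minimal sketch of the occupancy-map idea, assuming a 2D grid fused with log-odds sensor updates, the standard technique for this kind of structure. All names and constants below are assumptions for the sketch, not the project's actual code.

    import numpy as np

    # Illustrative 2D occupancy grid using log-odds updates to fuse
    # noisy range-sensor evidence. Hypothetical sketch, not Lab126 code.
    L_HIT, L_MISS = 0.85, -0.4    # log-odds evidence for occupied / free
    L_MIN, L_MAX = -4.0, 4.0      # clamp so cells never saturate

    class OccupancyGrid:
        def __init__(self, width, height, resolution):
            self.resolution = resolution              # meters per cell
            self.logodds = np.zeros((height, width))  # 0 = unknown (p = 0.5)

        def world_to_cell(self, x, y):
            return int(y / self.resolution), int(x / self.resolution)

        def update(self, cell, occupied):
            # Fuse one sensor observation into a single cell.
            r, c = cell
            delta = L_HIT if occupied else L_MISS
            self.logodds[r, c] = np.clip(self.logodds[r, c] + delta, L_MIN, L_MAX)

        def probability(self, cell):
            # Convert accumulated log-odds back to an occupancy probability.
            r, c = cell
            return 1.0 / (1.0 + np.exp(-self.logodds[r, c]))

    grid = OccupancyGrid(width=100, height=100, resolution=0.05)
    cell = grid.world_to_cell(1.0, 2.0)   # a sensor hit at (1.0 m, 2.0 m)
    grid.update(cell, occupied=True)
    print(f"P(occupied) = {grid.probability(cell):.2f}")

The log-odds form keeps the update a simple addition per observation, and the clamp prevents any cell from becoming so certain that it can never be revised.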

Summer 2016: Hyland Software Inc.
Software Development Intern

  • Designed and developed a cross-platform C++ PDF bookmark parser library
  • Integrated the library and PDF bookmark support into the next release of the OnBase NextGen iOS app

Summer 2015: Air Force Research Lab
Student Researcher

  • Team lead for the Intrepid project, a C++ system for reconstructing 3D environments from point clouds
  • Presented the Intrepid project at the National Aerospace and Electronics Conference (NAECON 2015)
  • Delivered a keynote on 3D Robotic Vision at an internal conference

Skills

  • C/C++
  • Linux
  • Robotics, Perception
  • Quantum Computing
  • Computer Vision
  • Machine Learning & Deep Learning
  • Python

Education

  • B.S. with Honors Research Distinction in Computer Science and Engineering, The Ohio State University, May 2018

Awards

  • Dean’s List
  • Honors Undergrad Thesis Scholarship
  • Honors Program with Scholarship
  • Computer Science and Engineering Department’s Undergraduate Research Award
  • 3rd Place, Undergraduate 3-Minute Thesis Competition
  • 2nd Place, 2014 OHI/O Hackathon; Finalist, 2015 OHI/O Hackathon

Publications

INSERT From Reality: A Schema-driven Approach to Image Capture of Structured Information
Published at The Ohio State University, 2018

Recommended citation: Deshpande, Mohit (2018). INSERT From Reality: A Schema-driven Approach to Image Capture of Structured Information. Honors Undergraduate Thesis, The Ohio State University.

http://hdl.handle.net/1811/84556

There is a tremendous amount of structured information locked away in document images, e.g., receipts, invoices, medical testing documents, and banking statements. However, the document images that retain this structured information are often ad hoc and vary between businesses, organizations, or time periods. Although optical character recognition (OCR) allows us to digitize document images into sequences of words, there still does not exist a means to identify schema attributes in the words of these ad hoc images and extract them into a database. In this thesis, we push beyond optical character recognition: while current information extraction techniques use only OCR output from structured images, we infer the visual structure and combine it with the textual information on the document image to create a highly structured INSERT statement, ready to be executed against a database. We call this approach IFR.

We use OCR to obtain the textual contents of the image. Our natural language processing stage annotates this text with relevant information, such as data type, and prunes irrelevant words to improve performance in subsequent steps. In parallel with the textual analysis, we visually segment the input document image, with no a priori information, to create a visual context window around each textual token. We then merge the two analyses to augment the textual information with context from the visual context windows. Using analyst-defined heuristic functions, we score each of these context-enabled entities to probabilistically construct the final INSERT statement.

We evaluated IFR on three real-world datasets and achieved F1 scores of over 83% in INSERT generation, spending approximately 2 seconds per image on average. Comparing IFR to natural language processing approaches, such as regular expressions and conditional random fields, we found IFR to perform better at detecting the correct schema attributes. To compare IFR to a human baseline, we conducted a user study to establish the human baseline of INSERT quality on our datasets and found IFR to produce INSERT statements that were comparable to or exceeded that baseline.
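
To make the final stage of the pipeline concrete, here is a hypothetical Python sketch of the heuristic scoring and INSERT-generation step described above. All names (Token, score_attribute, generate_insert) and the toy heuristic are illustrative assumptions; the thesis's actual implementation may differ.

    from dataclasses import dataclass, field

    # Hypothetical sketch of IFR's last stage: score context-enabled tokens
    # against schema attributes using analyst-defined heuristics, then emit
    # an INSERT statement. Names here are assumptions, not the thesis code.
    @dataclass
    class Token:
        text: str
        bbox: tuple               # (x, y, w, h) on the document image
        data_type: str = ""       # annotated by the NLP stage, e.g. "date"
        context: list = field(default_factory=list)  # visual context window

    def score_attribute(token, attribute, heuristics):
        # Sum analyst-defined heuristic scores for one (token, attribute) pair.
        return sum(h(token, attribute) for h in heuristics)

    def generate_insert(tokens, schema, heuristics):
        # Pick the best-scoring token for each schema attribute, then emit SQL.
        values = {}
        for attribute in schema:
            best = max(tokens, key=lambda t: score_attribute(t, attribute, heuristics))
            values[attribute] = best.text
        cols = ", ".join(values)
        vals = ", ".join("'{}'".format(v.replace("'", "''")) for v in values.values())
        return f"INSERT INTO documents ({cols}) VALUES ({vals});"

    # Toy usage: two OCR'd tokens and one naive data-type heuristic.
    tokens = [Token("2018-05-01", (10, 5, 80, 12), data_type="date"),
              Token("$42.00", (10, 40, 50, 12), data_type="total")]
    heuristics = [lambda t, attr: 1.0 if t.data_type == attr else 0.0]
    print(generate_insert(tokens, schema=["date", "total"], heuristics=heuristics))
    # -> INSERT INTO documents (date, total) VALUES ('2018-05-01', '$42.00');

In the thesis, the heuristic functions would also draw on each token's visual context window and position, which is what lets the combined textual-plus-visual scoring outperform purely textual approaches such as regular expressions and conditional random fields.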