Undergraduate Research
I graduated this May with a B.S. in Computer Science and Engineering, and I’d like to share my experience as an undergraduate researcher as well as some tips and tricks that I’ve accumulated over the past 2.5 years. Hopefully, this guide will be useful to someone, somewhere, who is thinking about getting involved with undergrad research but doesn’t quite know how to get started. I’ll discuss how to get started, what to expect, and some useful pieces of advice that may help you during your undergrad research career.
DISCLAIMER: I was a computer science and engineering major at a fairly large university (Ohio State University), so many of these points may seem very engineering-specific.
Getting Started in Undergrad Research
The beginning of my story is a bit unconventional compared to what I’ve heard from other undergrad researchers. During my first year at Ohio State, I participated in the OHI/O Hackathon with a team of two other undergrad freshmen in my Intro to Engineering class. We ended up winning 2nd place with our todo-manager Android app that used geofences for location reminders. (This was in 2014, before location reminders were integrated in stock apps like Google Keep and Apple Reminders.)
Afterward, the faculty advisor for the Hackathon, Dr. Arnab Nandi, approached my team and asked if we were interested in joining his research group. Before we made up our minds, he invited us to a weekly meeting. I attended the meeting along with another member of my hackathon team. The work the PhD/Masters students were doing certainly sounded interesting to me, even though I didn’t really understand most of it. I joined the lab the first semester of my second year.
This also brings up a good point about when to start getting involved with undergrad research. I’ve learned that starting in your second or third year works out best. By that time, you have enough requisite knowledge that a professor feels comfortable allowing you, an undergrad, to join their lab. Even when I joined, I didn’t have all of the requisite knowledge, but it was now my responsibility to learn whatever I needed. Furthermore, I was more accustomed to college in general by my second year. The first year is fairly time-consuming with introductory classes and whatnot.
My experience getting started was a bit unusual since professors usually don’t have the time to actively recruit undergrads to their lab. Rightfully so, the task is placed on the student to look into which professors they would like to work with based on the research that particular professor is conducting.
The first, and most important step, is finding something you’re interested in! Poke around the Internet and find a field that’s interesting to you. Maybe that’s computer vision or databases or natural language processing or cybersecurity or operating systems or something else.
After you’ve found a topic of interest, vigorously learn about that field and get caught up in the latest advancements. This means reading research papers. If you’re serious about undergrad research, you’ll certainly have to learn how to read papers, so you might as well start now! Reading papers before approaching professors serves two purposes: you can determine your interest in a field and professors are more likely to work with you.
You can assess your interest in a field by reading cutting-edge work going on in that field. You’ll certainly spend time looking up background information, but that only helps your assessment. Research is on an entirely higher level than general knowledge of a topic or what you’re taught in class. Starting out, you may not have strong knowledge in your field of choice. But that’s okay! If you’re really interested in a topic, you’ll learn it and be up to speed in no time!
Reading papers also makes it more likely that the professor you approach will agree to work with you. (This assumes they have the time and resources!) Knowing what’s going on at the bleeding-edge of your field shows that you’re aware of the challenges in that field and that you’re highly motivated. Highly-motivated, genuinely-interested undergrads are what professors are looking for!
After you’ve decided on a topic area, look at the professors at your university who are working in the same field. The CS/CSE department page will usually have a list of research areas or topics and the professors associated with them. These are still fairly broad, so poke around each professor’s list of publications to see their exact interests. Read a few of their papers and try to come up with some genuine questions or points of discussion.
Set up a time to meet and discuss research opportunities. Remember that professors are looking for highly-motivated students. If you’re applying to a computer vision lab, you don’t have to be an expert in computer vision. That’s part of the learning experience. But you do have to convey the message that you’re willing to learn and get up to speed quickly. (Remember when I discussed reading papers before meeting with the professor?) If you’re successful in driving your point across, that professor will become your research advisor!
What to Expect Working as an Undergrad Researcher
Once you have a research advisor, most of the time, you’ll be paired with a grad student who’s already working on a project. You’ll help the grad student code prototypes, run experiments, or even write small sections of the paper that the grad student is responsible for writing. Even though some may consider this “grunt work,” it’s also an opportunity to learn how to do all of these things, e.g., designing experiments, writing papers, and developing insights, from someone with experience. Ask questions, solicit feedback, and learn everything you can. What you learn will be helpful when you have to take the reins in the future. If you’re considering grad school, this is a great chance to ask questions about life as a grad student.
The primary outcome of your collaboration with a grad student is usually a publication where you’ll be listed as second or third or fourth author. Having a publication as an undergrad is a big deal. It shows that you’re not completely green when it comes to research. Even if you didn’t contribute grand insights or novel algorithms, you contributed something that helped the paper get published. Some experience, however small, is infinitely better than no experience.
However, if you’re pursuing an undergrad thesis like I was, you’ll be put in charge of your own research project. This is quite difficult but extremely rewarding. Starting your own novel work means you’ll be the first author and might have a first author publication, which is a big deal as an undergrad. You’ll still be working closely with your professor (or a postdoctoral researcher, i.e., a postdoc, or a PhD/Masters student) since you’re just starting in research!
The summer before joining the lab, I was working on computer vision research so I was very interested in that. When I met with Arnab, I emphasized my interest in applied computer vision, and he mentioned a new research direction involving vision-based human-computer interaction with databases. So I was put in charge of my own research working on information extraction from images. At the time, it was just myself and Arnab so he made suggestions on what to look into and which papers to read to get me started.
During this time, I was still a full-time student taking classes and participating in other extra-curricular activities. Research was another activity that I had to balance doing along with coursework. (One of the requirements at OSU for completing an Honors Thesis is a fairly high GPA.) While I was learning to balance these, I learned that research ideas appear more “sporadic” than regular classwork. For example, several times during the semester, I would think of some great improvement and get to work implementing it right away, which usually took a fair amount of time. This was time I wasn’t spending on classwork so I admit I occasionally submitted shoddy work or no work at all. The reverse usually wasn’t true: I usually prioritized paper deadlines over classwork since they were so infrequent and classwork was very frequent, besides exams and large projects worth a large chunk of my grade, of course. Those paper deadlines were the only points in time where I thought I had difficulty balancing research and coursework: there was the occasional deadline missed from exams and miscellaneous classwork. Remember that you’re a student at the university whose overall purpose is to graduate with a degree so dropping a ton of points in classes generally isn’t a good approach to undergrad research.
After a few semesters, a postdoc, Behrooz Omidvar-Tehrani, joined our group and my project. I brought him up to speed, and we started working on writing a paper for a submission. During those semesters, I learned everything I could about paper writing from Behrooz, the other PhD/Masters students in the lab, and Arnab. Since the bulk of the research itself was completed, the only major remaining part was the experiments section, yet another thing I learned how to do well. We discussed and performed the experiments and user studies and everything was coming together quite nicely!
By the time he left his postdoc at the university, we had a full 12-page conference paper ready to go! We proudly submitted this work to VLDB 2017 and were rejected! So we worked on a smaller version of the research that particularly highlighted the application: a demo paper. Again, we submitted this work to SIGMOD Demo and were rejected. Using the reviews from both, we kept making changes to the work. We submitted the paper to several other venues and were rejected each time! However, our reviews were a bit better each time, which indicated we were improving our paper. Toward the end of my undergrad career, the paper solidified into my undergrad thesis, which I successfully defended, earning me the right to graduate “with Honors Research Distinction”.
I was a bit disappointed that I didn’t manage to publish a first-author paper as an undergrad, but the experience in technical and concise writing, designing and running experiments, offloading non-contribution work, and many other valuable skills that I picked up along the way was completely worth it!
Undergrad Research Tips and Tricks
Now that I’ve explained how to get involved with research and shared my own story, there are many little lessons that I chronicled along the way that didn’t fit so neatly into my linear story. So here are some of the lessons that I’ve learned working on my research:
Maintain a website. I highly encourage you to make your own site that highlights your research experiences. Have a blog section of your website where you can write about topics. The more you write, the better you’ll get at writing. For example, write about a position/stance and back it with evidence; this helps your argumentative writing. Write a tutorial with code to help jumpstart/springboard other aspiring researchers; this helps you maintain your coding skills with a particular library/language. Write a post where you explain a complex topic; this helps you learn the topic and get better at explaining complicated material. (I recommend using a lot of pictures!) This last skill is one that I hold in the highest regard: explaining complicated topics simply but completely. Regardless of where you go and what you do, the ability to explain complicated topics concisely, simply, and completely will always be highly valued!
Go to the weekly meetings. This helps hold you accountable to the rest of the lab. Additionally, it gives you the opportunity to ask for help from other lab members if you are blocked with a particular framework or language. Countless times, someone in the lab says “I’m having issues trying to do this with X” and someone on a different project says “oh yeah, I’ve worked with X before and you have to do this: …” Along a similar vein, sharing your updates helps prevent duplicate work if your lab members are working on different aspects of a similar system or architecture. Duplicate work is a huge waste of time so prevent it the best you can by communicating! All of that being said, I admit to skipping the weekly meetings when I was swamped with homework, an upcoming paper submission, or work. I believe these to be valid excuses, but “I don’t feel like it” is not!
Reading papers efficiently. I should start this section by immediately saying that everyone has a different way of reading papers.
When I was first getting started in undergrad research, I had never completely read through a research paper. The first paper I read took me well over an hour to fully digest. I was shocked by how dense and terse it was. I was also reading it very linearly, from start to finish. My problems were twofold: lack of experience reading papers and my linear reading strategy. I was treating the reading like a textbook. As I read more and more papers to familiarize myself with the field, I became better at flying through the lingo and terminology and picking out the important parts. Now, I can read through a conference paper very quickly and remember the important points in minutes. Reading papers is just like any other thing: you get better at it by doing it more. Just like any other acquired skill, don’t expect to be fantastic at it the first time you do it. You’ll start to develop your own reading techniques that help you digest and understand the contents in an efficient manner.
Never reject an idea until you’ve spent quality time thinking about it. When I was getting started, my mind was flooding with ideas for new approaches or techniques. But I was so uncertain about them that I dismissed most of them entirely. What if some of these actually turned out to be really good ideas? I’ll never know! During these brainstorming sessions, fully write down an idea and move on to the next one. Don’t look back until you’re finished and don’t suppress any ideas. Quit judging your own ideas at first glance! I mean it! After you’re finished, spend time with each idea to assess its worth and relevance.
A few weeks before a deadline, I came up with a way to improve the use of the heuristics in my work by building probability distributions. Before, we were using boolean functions, but, after coming up with the new score matrix and classifier approach, I spent most of that weekend implementing it, knowing I’d have to completely re-write the entire heuristics section. I figured I was going to re-write a few parts anyways so might as well add that section to the list. Fortunately, this new approach worked much better than the older approach, and the overall performance increased!
Look out for undergrad grants. At Ohio State, agreeing to work on an undergrad thesis, Honors or non-Honors, provides you with some funding. There are also several other opportunities for getting money for research besides asking for a chunk from your advisor’s grants! Being able to do this on your own can help beef up your C.V. and make you a more independent researcher, and any potential prize money is icing on the cake!
Don’t re-invent the wheel. A significant part of my work’s system architecture is the optical character recognition (OCR), which takes an image and converts it into text. I spent the better part of a few months trying to effectively build my own OCR. In retrospect, that was a terrible idea. What I should have done at the start was conduct a thorough literature review and find some existing libraries that I could have used or that could have helped simplify the process. With this literature review, I would have had a better idea of what had been done before, and I could have used the state-of-the-art technologies instead of trying to roll my own subpar version.
Similarly, it’s best to iron out your work’s primary contributions as soon as you have an idea of what they are, for the same reason: you shouldn’t waste time finding new approaches for things that aren’t your primary contributions. Before building any auxiliary component, see if someone has already built what you’re looking for. If they have, use it! Why waste time building something from scratch that someone else has already done?
Eventually, we decided to treat OCR as a commodity, so I ended up wasting just a few months trying to effectively roll my own OCR. Though it would have been even more of a waste if I didn’t learn this valuable lesson from it! Literature review is the first thing you should do!
Metric-driven Development: Define and use metrics early on. I didn’t really do this until about December of 2016, a few months before the first submission date. But when I did, I managed to do the bulk of the work and improvements in that month. The largest portion of work and new insights came when I actually defined metrics and used them. When I first found values for those metrics (we used precision and recall as quality metrics), the results were appalling. This was not looking good…
But at least I had metrics! I now had a concrete way of answering the question “how good is my work?” And the current answer was “it’s dreadful!” From that point, I could determine which quality values were particularly low and better understand why. Then, I tried to find a generic heuristic that I could code and apply to all of the documents. After writing up that heuristic, I recomputed the quality metrics and started the process over. Through this iterative approach, I was able to come up with most of the new heuristics that we described in the paper as well as how one would go about defining these heuristics.
Looking through the documents in the dataset that result in the lowest quality also helped provide insight into why our heuristics weren’t good enough. If it was unavoidable, then that’s something to discuss in the paper. If there was a way to generalize a new heuristic over it, then I would do just that and watch the quality improve. In either case, it was a win-win! All of that hinged on defining what “quality” meant for IFR.
Later on, I went back and developed a framework for experiments. All of my ground-truth data were logically organized and easily accessible from the experiment code. In retrospect, creating an experiment framework would have helped simplify the experiment code and minimized clutter. It would have also future-proofed experiment code, making it easier to do more experiments in the future. I used the experiment framework I created right after the SIGMOD submission to run more experiments, and the framework made it really easy to quickly write experiment code and run.
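As a rough illustration of the measure-inspect-improve loop described above, here is a minimal Python sketch. It is not the code from my project: the document IDs, field names, and the `score_document` and `worst_documents` helpers are all hypothetical, and it assumes each document’s output and ground truth can be compared as sets of extracted items.

```python
from dataclasses import dataclass


@dataclass
class DocScore:
    doc_id: str
    precision: float
    recall: float

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall; defined as 0 when both are 0.
        total = self.precision + self.recall
        return 0.0 if total == 0 else 2 * self.precision * self.recall / total


def score_document(doc_id, predicted, expected):
    """Compare the set of extracted items against the ground-truth set."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return DocScore(doc_id, precision, recall)


def worst_documents(scores, k=3):
    """Return the k lowest-F1 documents: the ones worth inspecting by hand."""
    return sorted(scores, key=lambda s: s.f1)[:k]


# Hypothetical extraction output and ground truth for three documents.
predictions = {
    "doc_a": ["name", "date", "total"],
    "doc_b": ["name", "phone"],
    "doc_c": [],
}
ground_truth = {
    "doc_a": ["name", "date", "total"],
    "doc_b": ["name", "date", "total", "address"],
    "doc_c": ["name"],
}

scores = [score_document(d, predictions[d], ground_truth[d]) for d in ground_truth]
for s in worst_documents(scores, k=2):
    print(f"{s.doc_id}: precision={s.precision:.2f} recall={s.recall:.2f} f1={s.f1:.2f}")
```

Sorting by the lowest score is the important part: it points you straight at the documents where a new heuristic (or a paragraph in the limitations section) is most needed. Re-run it after every change to see whether the numbers actually moved.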
Similar to the point above, know your statistics! If it’s been some time since you took a statistics course, revisit concepts like significance tests. Also, become familiar with the metrics that are used in your field. For example, much AI research uses precision, recall, and F1 scores instead of simply accuracy.
In some cases, standard metrics, like precision, recall, accuracy, etc., might not be as descriptive for your particular work. In these cases, you’ll have to come up with your own metrics and evaluate your work against them. In my experience, this happens more commonly in human-computer interaction systems or most work involving user interfaces, since the metrics are dependent on what you’re trying to measure with the interface/interaction. For a representative comparison, make sure to evaluate your new metrics on other researchers’ work so you’re consistent! Or, at least, evaluate your new metrics on variants of your own work! These experiments, along with a thorough explanation of the metric, also help dispel any fishiness about why you needed it in the first place.
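For the significance tests mentioned above, one option that needs nothing beyond the Python standard library is a paired permutation (sign-flip) test. The sketch below is only an illustration under some assumptions: the function name `paired_permutation_test` and the per-document scores are made up, and it assumes both variants were evaluated on the same documents so the scores can be paired.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided paired sign-flip permutation test on the mean difference."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs equally long score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical per-document F1 scores for a new and an old variant.
new_variant = [0.82, 0.75, 0.91, 0.68, 0.88, 0.79, 0.85, 0.73, 0.90, 0.77]
old_variant = [0.74, 0.70, 0.85, 0.61, 0.80, 0.72, 0.79, 0.66, 0.83, 0.70]

p_value = paired_permutation_test(new_variant, old_variant)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the improvement is unlikely to be an artifact of which documents happened to be in the test set. If your field has a conventional test, use that one instead so reviewers can compare your numbers with prior work.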
Learn from others and get feedback. Write with someone who knows what they’re doing. I had never written a scientific paper/thesis before so I had no idea what I was doing. I had certainly read papers and articles before, but writing is much different than reading. If you’ve never done it before and try to do so on your own without any guidance at all, it will not end well. While you may be able to conduct the research itself without anyone’s help, you’re going to need help when writing. After all, the paper/thesis is supposed to document the work you did!
I was pretty clueless until a postdoc joined our research team. Having numerous publications, he clearly knew what he was doing, and that was fantastic. When we were actually writing the paper, he told me what sections to work on and the kinds of content and arguments to put in each section. I happily did as I was told because I was just glad that at least one of us knew what was going on. (Hint: it was not me.)
In fact, he had published to the same conference we were submitting to so he also knew the likes and dislikes of reviewers of that particular conference. That was even better because we could predict potential issues that the reviewers might point out and address them before submitting.
That being said, I didn’t just blindly write sections of the paper; I was paying attention to “paper-design” and was beginning to understand why the content of each section was structured the way it was. Just like learning any other skill, writing a paper is easier to understand if you’re actively engaged in doing it.
Know your audience. A related point to the above one: when you submit to a conference or journal, the reviewers reading your work are going to be experts in the field. This is different than undergrad competitions or contests where the judges may be from other fields or not even in computer science at all! In fact, in many cases specifically for undergrads, your audience will most likely not know anything about your field or even computer science! Remember when I said “you should get better at explaining complicated topics simply and completely”? This is where that skill can really shine! Just be aware of the exception to this rule: your undergrad thesis defense. This is where your audience are experts in the field and know more about it than you!
Writing the paper/thesis will probably require more work and effort than actually conducting the research. You have been warned! When you’re actually in the weeds, coding new approaches, uncovering insights, and running experiments, it’s really easy to get lost in the excitement. At the end of the day, you need to write it all down. Instead of writing all of the code and running experiments, then writing the paper/thesis all at the end, it’s less stressful to write it as you go. This way, the approaches and techniques will be fresh in your head. In the end, you’ll probably do more re-writing, but it beats sitting down for hours and churning it all out at once! That being said, I have tried both approaches, and writing as you go works better than writing everything at the end!
Try to finish a draft of the paper a week or two before the deadline. There is no worse stress than trying to write 75% of the paper days before the submission deadline. Lots of little mistakes are made during these rush hours. To prevent all of this, aim to finish a draft of your paper a week or so ahead of the deadline. That gives your advisor time to read it and comment as well, and that starts the review-revise cycle, which usually takes more than two rounds. You can also put it up for internal review in your lab at that time, but give the other group members some time as well, i.e., avoid submitting it for internal review the day of the deadline. You’re far more likely to get others to respond if you give them ample time to read and digest.
Don’t neglect code quality. At the start, code structure and architecture aren’t going to be your top priority. As deadlines approach, your code quality will noticeably drop, and the code repository is going to look like a cluttered mess of experimental results, code snippets, project files, and hacks. Defining an architecture early on is usually not possible because you’ll probably discover some really cool/new approach along the way. But making your code easy to read, change, and maintain will help when this discovery happens.
One thing I didn’t anticipate was storing the dataset. If you’re using an off-the-shelf dataset, you usually don’t have to worry about this as much. The creator of the dataset will have very structured file formats for storing the input and ground-truth data as well as documentation. However, if you’re using your own dataset, it has to be stored cleanly, i.e., it shouldn’t be fragmented across your repository. Pick the simplest structure you can get away with and stick with that!
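To make “the simplest structure you can get away with” concrete, here is one possible layout and a small loader for it. This is a sketch under assumptions, not the structure I actually used: the `inputs/` and `ground_truth/` directory names, the `load_dataset` helper, and the sample file are all hypothetical. The idea is simply that every input file has a ground-truth JSON file with the same stem, and everything lives under a single root.

```python
import json
import tempfile
from pathlib import Path


def load_dataset(root):
    """Pair every input file with its ground-truth JSON by shared file stem."""
    root = Path(root)
    examples = {}
    for input_path in sorted((root / "inputs").iterdir()):
        truth_path = root / "ground_truth" / f"{input_path.stem}.json"
        if not truth_path.exists():
            # Fail loudly: a silently skipped example skews every metric.
            raise FileNotFoundError(f"no ground truth for {input_path.name}")
        examples[input_path.stem] = {
            "input_path": input_path,
            "truth": json.loads(truth_path.read_text(encoding="utf-8")),
        }
    return examples


# Build a tiny throwaway dataset to show the layout in action.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "inputs").mkdir()
    (root / "ground_truth").mkdir()
    (root / "inputs" / "receipt_001.png").write_bytes(b"")
    (root / "ground_truth" / "receipt_001.json").write_text(
        json.dumps({"total": "12.50"}), encoding="utf-8"
    )
    dataset = load_dataset(root)
    print(dataset["receipt_001"]["truth"])
```

With one loader like this, experiment code never needs to know where files live, and a missing ground-truth file surfaces immediately instead of days later in a suspicious results table.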
Don’t rush when planning a user study. Planning a user study, like writing, is also easier if you work with someone who knows what they’re doing. As an undergrad, you probably don’t have any “real” experience with experimental design involving human subjects. Sure, you might have taken a statistics class where they lightly cover some experimental design and techniques, but there’s a huge gap between knowing textbook concepts and actually applying them in the real world.
As the old carpenter’s adage goes: “measure twice, cut once.” Spend that extra time designing your user study, because when you start, there is no going back! Redoing user studies is incredibly time-consuming, may be expensive if you’re paying your participants, and generally infeasible. (Though they should want to help you out for free because science is important!)
Make user studies easier on yourself and your participants. Set up a data-collection framework so you’re not frantically writing down or typing what the participant is saying. For my work, I built a simple Mac app that pages through questions, as well as a web version that does essentially the same thing for a different user study. These collect information under the hood and cache it in files so I don’t have to actually write anything down! I just sit the participant down in front of the computer and stay there to answer questions! Very painless for both the researcher and participant!
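A data-collection tool doesn’t need to be fancy. The following is a minimal command-line sketch of the idea in Python, not a reconstruction of the Mac or web apps mentioned above: the `run_questionnaire` function, the record fields, the questions, and the JSON Lines output format are all assumptions made for illustration.

```python
import json
import tempfile
import time
from pathlib import Path


def run_questionnaire(participant_id, questions, out_path, ask=input):
    """Page through questions and append each answer to a JSON Lines file."""
    out_path = Path(out_path)
    with out_path.open("a", encoding="utf-8") as log:
        for number, question in enumerate(questions, start=1):
            started = time.monotonic()
            answer = ask(f"[{number}/{len(questions)}] {question} ")
            record = {
                "participant": participant_id,
                "question": question,
                "answer": answer,
                "seconds_to_answer": round(time.monotonic() - started, 3),
            }
            # Flush after every answer so a crash mid-session loses nothing.
            log.write(json.dumps(record) + "\n")
            log.flush()


# Scripted answers stand in for a real participant in this demonstration.
scripted = iter(["4", "The search box"])
out_file = Path(tempfile.mkdtemp()) / "responses.jsonl"
run_questionnaire(
    "P01",
    ["How easy was the task (1-5)?", "What was confusing?"],
    out_file,
    ask=lambda prompt: next(scripted),
)
records = [json.loads(line) for line in out_file.read_text(encoding="utf-8").splitlines()]
print(len(records), "responses saved to", out_file)
```

In a real session you would leave `ask` at its default so the participant types directly into the terminal. Appending one record per line means the results from every participant end up in a single file that is trivial to load for analysis later.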
To succeed, you must fail many times. This is the most difficult and important piece of advice I can give. (Notice how it’s the very last point?) Remember my story about how many times my work was rejected? I admit the first paper reject was unpleasant, to say the least. It was incredibly demotivating for a period of time, but I got over it because I believed in my work. (And I still do!) A few years later, I have a successfully-defended thesis!
Closing Remarks
Writing this post has made me look back on the past two-and-a-half years and think about all of the things that I could have done better. But it is in mistakes that we learn the most!
To those undergrads reading this post who are considering getting involved in undergrad research, it’s a fun ride! It has many ups, quite a few downs, and even some loops. But at the end of the day, you go to bed knowing that you’ve contributed something, even if it’s a tiny sliver, to the infinite repository of all human knowledge. And that’s pretty awesome of you!