rohan's github.io Curiosity has its own reason for existing

Getting comfortable with the Unix pipe

Files ain’t going away anytime soon, son. If you’re a developer, engineer, or technical person who’s expected to know something about coding, you’re gonna be working with files. You might as well learn how to glean some information from them without busting open Excel, unless you want to be mistaken for a business person. Pipe it like it’s hot: The Unix pipe is the weapon I use to get a quick picture of what’s going...

Writing on Medium

I found that Medium is a great platform for sharing all kinds of content, and on top of that it uses algorithms to recommend posts to other users. I plan to keep writing technical posts on my github.io, but write more mainstream articles on Medium. Here are some of my posts on Medium: Emacs Org Mode will improve your software engineering; I trained a machine to distinguish between Trump and Clinton. Here’s what it learned...

NLP on Hip Hop Music Part 1

How has Hip Hop changed over the years? It’s a challenging question that is subject to many viewpoints and analyses. Music has changed so much over the years in so many different ways. New artists continue to spring to prominence. Production technology evolves rapidly, allowing for more iterations and interaction of different genres. The internet allows for more collaboration. A small but interesting part of this change lies in how language is embedded within Hip...

Analyzing Twitter Part 3

In the last post, we looked at one way to analyze a collection of documents, tf-idf. This weighting technique is extremely common in Information Retrieval applications, and it is helpful in favoring the discriminatory traits of a document over nondiscriminatory ones, such as ‘Obama’ vs. ‘the’. One issue encountered while performing tf-idf weighting on tweets is the short, constrained nature of tweets. This creates an upper limit on the Term Frequency, reducing the importance of that portion...
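To make the weighting concrete, here is a minimal sketch of tf-idf in plain Python. It is an illustration under stated assumptions, not the code from the post: the function name `tf_idf`, the whitespace tokenizer, the unsmoothed `log(N / df)` form of idf, and the three sample tweets are all made up for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    # Assumption: lowercase and split on whitespace; real tweets need more care.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample tweets, for illustration only.
tweets = [
    "the president spoke to the press",
    "obama met the press",
    "the game went to overtime",
]
scores = tf_idf(tweets)
print(scores[1])
```

Because ‘the’ appears in every sample tweet, its idf is log(1) = 0 and its weight vanishes, while ‘obama’, which appears in only one, gets the largest weight in its tweet. Note also that in texts this short almost every term occurs exactly once, so the term-frequency part barely varies and the idf part does most of the work.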

Analyzing Twitter Part 2

In the last post, we motivated why Twitter is interesting and got started on acquiring a corpus of tweets. In this post, we’ll be talking about getting acquainted with the data. Instead of looking at our data set as a single whole, we’ll slice it in a couple of ways to become familiar with it, and understand what we’re working with. tf-idf: tf-idf stands for term frequency-inverse document frequency. The definition, supplied by Wikipedia,...

Analyzing Twitter Part 1

Why Twitter? There is something fascinating about Twitter. It quivers and shakes. It sings and shouts. It wakes up and sleeps. Each tweet is like a breath of the shared consciousness. If the stock market is a way to track what people think of the market, Twitter is a way to track what people think is important. The restricted character limit encourages brevity and hence creativity. At the same time, the social engine churns out...

Using the Right Datastructure for the job

Searching for a key: The process of choosing the right data structure is one that few ever think about and most take for granted. In my experience, this lack of curiosity and, more importantly, an aversion to experimenting and benchmarking can lead to unnecessary overhead. Let’s take an example. One of the most common operations in any software I’ve seen written is finding a key and acting upon it, whether that means checking...
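The kind of experiment the excerpt argues for can be sketched in a few lines of Python. This is not the benchmark from the post, whose language and choice of structures are not shown here; the helper `time_lookup`, the container sizes, and the repeat count are assumptions picked for illustration.

```python
import timeit


def time_lookup(container, key, repeats=200):
    """Return the seconds taken to evaluate `key in container` `repeats` times."""
    return timeit.timeit(lambda: key in container, number=repeats)


n = 50_000
keys_list = list(range(n))   # membership test scans element by element
keys_set = set(keys_list)    # membership test is a hash lookup
missing = -1                 # absent key: the list has to check every element

list_seconds = time_lookup(keys_list, missing)
set_seconds = time_lookup(keys_set, missing)
print(f"list: {list_seconds:.6f}s  set: {set_seconds:.6f}s")
```

The absolute numbers depend on the machine, so treat them as a prompt to measure your own workload rather than as a result. The point is that the same `key in container` expression has very different costs depending on the structure behind it.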

Return Value Optimization

Motivation: After about a year and a half of coding in C++, things are finally starting to click. Learning the syntax is just the tip of the iceberg; what’s more important is the underlying behavior and possible side effects of our choices. For example, what happens if you don’t make an accessor member function const? Usually, nothing - but as time goes on, and more engineers modify the same component, the chance of...