rohan's github.io Curiosity has its own reason for existing

NLP on Hip Hop Music Part 1

How has Hip Hop changed over the years? It’s a challenging question, open to many viewpoints and analyses. Music has changed over the years in many different ways. New artists continue to spring to prominence. Production technology evolves rapidly, allowing for more iteration and more interaction between genres. The internet allows for more collaboration. A small but interesting part of this change lies in how language is imbibed within Hip...

Analyzing Twitter Part 3

In the last post, we looked at one way to analyze a collection of documents, tf-idf. This weighting technique is extremely common in Information Retrieval applications, and it is helpful in favoring the discriminating terms of a document over non-discriminating ones, for example ‘Obama’ over ‘the’. One issue encountered while performing tf-idf weighting on tweets is their short, constrained nature. This creates an upper limit on the Term Frequency, reducing the importance of that portion...

Analyzing Twitter Part 2

In the last post, we motivated why Twitter is interesting and got started on acquiring a corpus of tweets. In this post, we’ll be getting acquainted with the data. Instead of looking at our data set as a single undifferentiated whole, we’ll slice it in a couple of ways to become familiar with it and understand what we’re working with. tf-idf: tf-idf stands for term frequency-inverse document frequency. The definition, supplied by Wikipedia,...
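As a rough illustration of the weighting this excerpt introduces, here is a minimal sketch of one common tf-idf variant: raw term count multiplied by log(N / document frequency). The exact definition the post quotes is cut off above, so this formula, the `tf_idf` function name, and the sample tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = raw count of the term in the document,
    idf = log(N / df), where df is the number of documents containing the term.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical sample data, not taken from the post's corpus.
tweets = ["the president spoke", "the game ended", "the obama speech"]
scores = tf_idf(tweets)
```

With this variant, a term that appears in every document (such as "the" here) gets a weight of zero, while a term unique to one document gets the largest weight. In very short documents, most term counts are 1, so the idf factor does most of the work.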

Analyzing Twitter Part 1

Why Twitter? There is something fascinating about Twitter. It quivers and shakes. It sings and shouts. It wakes up and sleeps. Each tweet is like a breath of the shared consciousness. If the stock market is a way to track what people think of the market, Twitter is a way to track what people think is important. The restricted character limit encourages brevity and hence creativity. At the same time, the social engine churns out...

Using the Right Datastructure for the job

Searching for a key: Choosing the right data structure is something few people ever think about and most take for granted. In my experience, this lack of curiosity and, more importantly, an aversion to experimenting and benchmarking can lead to unnecessary overhead. Let’s take an example. One of the most common operations in any software I’ve seen is finding a key and acting upon it, whether that means checking...
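To make the "finding a key" operation concrete, here is a minimal sketch contrasting a linear scan over a list of pairs with a hash-table lookup. The post's own example and language are not visible in this excerpt, so the function names (`find_linear`, `find_hashed`) and the sample data are hypothetical.

```python
def find_linear(pairs, key):
    """Scan a list of (key, value) pairs; cost grows with the list length."""
    for k, v in pairs:
        if k == key:
            return v
    return None


def find_hashed(table, key):
    """Look the key up in a dict; average cost is independent of the size."""
    return table.get(key)


# Hypothetical data: the same 1000 entries stored two ways.
pairs = [(f"user{i}", i) for i in range(1000)]
table = dict(pairs)
```

Both functions return the same answers; they differ only in how much work each lookup does. Which one wins in practice depends on the data size and access pattern, which is why measuring matters.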

Return Value Optimization

Motivation: After about a year and a half of coding in C++, things are finally starting to click. Learning the syntax is just the tip of the iceberg; what’s more important is the underlying behavior and possible side effects of our choices. For example, what happens if you don’t make an accessor member function const? Usually, nothing - but as time goes on, and more engineers modify the same component, the chance of...