Saturday, July 18, 2015

Inflection

This is the 18th of July 2015.
Nothing happened to me today.

Saturday, July 4, 2015

Remembering Orkut

Back in the restless days of my undergraduate existence, I was a fairly active orkut person. I spent around two to three hours every week on orkut, and I never just surfed around - that time was spent actively messaging people, or 'scrapping' them, as it was lovingly called in orkut parlance - and I used to almost exclusively scrap girls from my high school who I had been out of touch with for over a year. Two to three hours a week might not sound like a lot to most millennials, but I am talking about 2006, and times, I kid you not, were simpler: walking half an hour at night for a Rs.10 glass of orange juice at a small corner in Rohini sector 16 used to be enough to get us boys all excited, so you can do your own calculations. Anyway, the advent of orkut had made me a bit of a popular kid among my friends. My friends were mostly small town guys: I don't think any of my close friends at DCE, except Adyansh, were Delhiites (Abhineet too, but he became a friend in 2007). At first, I had found them rather different from the kids I grew up with in Delhi. It took me much longer to become thick with them than it took for them to get thick amongst themselves. But soon enough, I had not only accepted the differences, but also realized that the ways in which they were different from me were mostly ways in which they were better than me, ways in which I'd like to change to be more like them. It has to be said, though, that one of those differences worked in my favor. I wrote more educated-sounding English sentences, and boy was that ever more prized than in the orkut days when every girl we wanted to talk to seemed to have a profile written in prim, pristine, Queen's English? True, there were also the 'i luv mah frnz' girls, but they were for some reason never the most sought-after ones. I rose to prominence in my hostel as the guy who could not only write you a fairly eye-catching orkut profile but, given the right incentives, might also write you a stellar 'testimonial'.
Most of my zero-fee clients, engineers that they were, wanted the same things in their testimonial: that they don't study at all and still top their exams, that they were fun-loving and funny but were also daredevils ready to put themselves in any harm's way for their buddies. It is Delhi, mind you, and a facility with hot-blooded brawls was deemed to be a badge of honor for guys, even if the girl on the judgement chair were a Stephens educated, feminist, marxist, artist. One of my clients I wasn't a big fan of, let's call him NR. The testimonial I wrote for him read: "Basically he's a nice guy." I could have written "He's basically an asshole" to be more honest, but such testimonials were common and were supposed to be inferred as tongue-in-cheek jibes, intended actually to enhance the coolness quotient: 'we're so cool we insult each other with orkut testimonials', went the dictum. So that wouldn't have worked. But there was no redeeming factor for praise as mild as someone being "basically a nice guy", as if you couldn't find anything more remarkable to say about him even when writing a testimonial. It was the ultimate put-down. Expectedly, he didn't accept it. Moreover, the offense NR took exceeded even my expectations: he came to my room that night, and told me he had more 'fans' on his profile than I had 'friends'. ('Fans' was another orkut gimmick - basically you could choose to identify as a fan of any of your friends, and everybody's fan-count was displayed atop their profiles. It was the ultimate backscratching rehearsal for people entering the workforce.) I apologized, but he unfriended me anyway.

Monday, June 22, 2015

Talking to myself

When I started this blog a great objective was talking to myself. I talked to myself all the time anyway, but I felt that the written form would provide more structure and clarity to my conversations with myself. The reasoning was, and to this day I believe it to be true, that just by virtue of writing something versus uttering it under my breath, I am able to identify things I knew but never knew I knew. And, of course, record it for future reference. In essence, I aimed to learn more about myself and how I think about other things. This was the objective, anyway. I can't say I came anywhere close to achieving it. Because soon after I started writing here, some people started telling me I wrote well or that I was funny or something. Then I forgot all about my objective and started trying to write in a way that would be considered well-written, or hilarious, or some such thing. It wasn't just the way of writing I altered but also the content. Because, of course, it was hard to write funnily about something inherently uninteresting to any audience wider than myself. So out went the window the idea of talking to myself, and I was writing mainly for applause. In striving to do so, I started working on my use of the language, tried to strengthen my grasp of English grammar, spent time reading books as well as just the dictionary and thesaurus to improve my vocabulary. Even read a couple of books on style and usage - "Mind the Gaffe" and "The Elements of Style" come to mind. All this, from what I recall, was around 2007, although it continued into much of the first half of 2008. It is 2015 now, I am 29 years old, and my preoccupations have changed so remarkably that I find it not merely surprising but mind-blowing that I had spent so much time and energy on something like that, especially since now it seems like the last thing I would bother about. I'm glad I put in that time, though, because if I hadn't then I wouldn't ever.


While until this time I had compromised the objective of this blog for reasons related to the unexpected positive reception that some of my earlier posts received, beginning in 2008 I further compromised the objective for reasons a tad more convoluted. I fell for a girl who was into poetry, and started writing, yes, poems, or whatever half-assed cousins of poems they actually ended up being. Soon, I found myself writing not for general appreciation, but for a focused admiration, from the girl I wanted. Funny thing is, it continued well after I was actually with her. I think it is fair to say that I had got so far away from the aforementioned "objective" that it is not fair to refer to it as the "objective" anymore. The objective now was just to have her see me as a clever, deep writer. I wasn't even going for being funny anymore, and it's safe to say that I had lost most of my natural, unrehearsed funniness as well.


Lately, and that means for the last year or two, I have felt really free to use my blog for its original objective. I haven't had any audience now for quite some time, so it has become easy for me to not fall for the temptation of writing for an audience. On the flip side, since writing well is nowhere near the top of my agenda, I end up writing short, rather to-the-point snippets. This one today is probably the longest post I have written in a very long time, I think.


I went skydiving the day before yesterday, with a couple of colleagues from work. They were not really friends, not even people I work regularly with, but just a couple of guys who, by some accident of common interest, got ready to take the plunge from 15000 ft. The skydiving experience itself was something I will not write much about, because the more I write about it the more I will be understating the magnificence of the true experience. I will post a video though to record the experience for the future:


I wish I knew how to change the thumbnail image on youtube.  Anyway, can only do so much ass-saving with a face like mine.




Thursday, April 30, 2015

Life as the sum of all embarrassments

All of life, I guess, is a giant exercise in gradually humbling yourself further. I find it amazing, and this is truly different from saying that I find it annoying, that there are any people at all who get more sure of what they know as they grow older.

Personally, I think that the more I learn and get to know about things, the more it becomes embarrassingly clear to me that there is so, so much I do not know. One of the reasons I like working in finance is that it has so much to never come to know. (Yes, that sounds convoluted but that is what I wanted to say; it's not in error.)

At the start of this year, I made big, elaborate plans for what I will learn in 2015. Two things have changed since then. One, I realize that to execute only them will take not one year but probably three, and two, that there are going to be many diversions on the way where I'd go about learning things I hadn't initially planned to, so in effect it will take more than three, maybe five or six. But the thing that I am not yet accounting for is that some of those diversions will become full-fledged highways in and of themselves, and that they will have their own diversions. And so on ad infinitum. And with this, I'm pretty confident that the whole thing will take not five years but fifty, or however many I have left.

Which makes me wonder - is it worth planning for a period as long as a year, when so much changes during it? I would argue that there is value in planning, even when the failure of the plan is a foregone conclusion. It reminds us of the highway, when we're on our joyrides along diversions. It enhances our realization that the diversions are lovely, but also forces us to examine how lovely they are compared to what's at the end of the highway. And sometimes, to pave new paths by which the diversion would eventually join the highway again.

That, I guess, makes us more creative.




Friday, March 27, 2015

Not normal

For the last three days I've been sleeping the whole time I'm home. After coming back home at 6 PM, I'm asleep at 6:20, waking up only at about 6:40 AM the next day, at which time I quickly get ready for work and reach the office at 7:30. It's a miracle I've been coming to work on time all throughout. This isn't normal. It has got to be something in the medication.

Thursday, March 26, 2015

Lull

February and March haven't been great, healthwise. In February, I first contracted a seasonal flu that lasted four days, and just when I thought I had recovered, a week later I was down with a terribly irritating allergy that made breathing unbearable for a week. There is nothing like a blocked nose to make you realize how beautiful your immediately preceding life was. Then March began with a stomach that couldn't cope with the move to India, and finally, a day before I was to travel back to the US, I contracted a viral infection that led to a bacterial infection and tonsillitis that I'm still battling. I'm recovering well and hoping to be fine in the next couple of days and hoping, also, that this will be the end of it for this year. All this has already set me back at least a month in my plans, and in addition to the time lost, what I worry about is the momentum lost. I think that it will now take me a couple of weeks just to get back into the groove of things and get the motivation going. Anyway. Hope that happens very very soon.

Saturday, February 21, 2015

Google Chrome

Never open more than 3 tabs at a time. If you must open another tab, first choose a tab to close.

Friday, February 6, 2015

Blogging platforms and moonlight

I'm contemplating moving the blog to wordpress. Since a big chunk of what I post nowadays tends to be code with all these greater than and less than signs, it totally throws blogger's html off. It is amazing to me that blogger won't add this small feature that lets you insert code seamlessly with something like the code tag that wordpress provides. It is 2015 and it is very surprising to see that blogger would let so many users slip away but not pluck the low hanging fruit of adding code tags. I personally do not want to move the blog, but that is out of two entirely irrational reasons: one, that I'm lazy and do not have the time or enthusiasm to set up a blog from scratch, and two, that I've been on here for ten years now and it feels like home and so I don't want to leave.

I've been meaning to add tutorials for some great R libraries like dplyr, ggplot2, and shiny, and also some Python tutorials, but before that I have to fulfill the promise I made to myself of adding tutorials for the relatively uninspiring parts of R that form its basics. And, at any rate, to get started with any of that - it seems like moving to wordpress is the way. A couple of days ago, I tried to embed code snippets in earlier R posts but the results were clumsy and off-putting.

In any case, an interesting thing I learned today (let's assume I won't be adding anything else tonight): according to the leading theory, the earth's moon was born when a body about the size of Mars struck the young earth, and the hot molten debris thrown out by the impact slowly came together to become the moon. All that romantic poetry with the moon as the evergreen metaphor seems a little incongruent now, doesn't it?

Monday, February 2, 2015

Some less talked about positives of the American Civil War

This is not a defense of wars, but the American Civil War had several positives coming out of it, and I thought I'd outline them briefly. Of course, it was at a great cost of 600,000 lives, about 2 percent of the entire US population at the time. Everybody knows about the biggest achievement of the war, which was also in large part the main engine behind the war: the abolition of slavery. Here I will mention some of the other, often overlooked positives:

1. For the first time, the recently discovered Bromine was used for healing and cleaning wounds. It improved standards of hygiene in wartime medical assistance in a big way, significantly reducing casualties where soldiers succumbed not to bullet wounds but to the ensuing gangrene.

2. Before the civil war, nursing was a primarily male occupation. With this war, the need for nursing outgrew the supply that men alone could furnish, bringing women in large numbers into the nursing profession. Alongside textile mills of the same period, this paved the way for women getting out of their houses for work in large numbers.

3. Working as a nurse at the time of the war inspired Clara Barton to start the American Red Cross after the war, an organization that has since saved millions of lives.

4. Embalming, the practice of using zinc chloride and arsenic to preserve the dead bodies of soldiers, came into wide use during the civil war, which meant that the bodies could be received by their families, even weeks later, in recognizable and non-decomposed condition for their last rites.

5. The telegraph got a boost from Lincoln as a means of following the fighting and devising macro- and micro-level strategies. This would continue to remain a masterstroke of wartime planning for decades to come, including during the world wars.

6. In many ways, it was the first modern war. Before the civil war, the battle capacity of any regiment was limited by how much artillery they could carry with them. Once they were out of supplies, they could fight no longer. This was the first war to isolate supplies from actual offensives, by employing railroads to continually and strategically supply new arms to the fighting troops.

Sunday, February 1, 2015

R for Basic Statistics - 1

R for Simulation, Sampling and Inference


Simulation


outcomes = c("heads", "tails")
# prob gives heads a 40% chance and tails 60%, so this coin is not actually fair
sim_unfair_coin = sample(outcomes, prob=c(0.4,0.6), size=100, replace=TRUE)
barplot(table(sim_unfair_coin))


Another use of sample() is to sample n elements randomly from a vector v.
sample(v, n)
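A minimal sketch of both uses of sample(). The vector v, the value of n and the seed passed to set.seed() are made up for illustration; the seed is only there so that the draws can be repeated.

```r
# Fix the random seed so the results can be reproduced (the value is arbitrary).
set.seed(42)

# Use 1: simulate 100 tosses of a coin. Leaving out prob gives equal chances.
outcomes = c("heads", "tails")
sim_fair_coin = sample(outcomes, size = 100, replace = TRUE)
table(sim_fair_coin)

# Use 2: draw n elements from a vector v, without replacement by default.
v = 1:20          # hypothetical vector
n = 5
picked = sample(v, n)
picked
```

Because replace defaults to FALSE, the second call can never return the same element twice, and n cannot exceed length(v).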


To create a vector of size 15 all of whose values are identical:
vector1 = rep(0, 15)
vector2 = rep(NA, 15)
NA is often used as a placeholder for missing data in R.


For loop in R
for (i in 1:50) {}
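A small sketch of how these two pieces usually go together: preallocate a vector with rep(NA, ...) and fill it in inside a for loop. The squares are just a placeholder computation.

```r
# Preallocate a vector of 10 missing values, then fill it inside a for loop.
squares = rep(NA, 10)
for (i in 1:10) {
  squares[i] = i^2
}
squares
```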


Compare to Python (later)


Divide the plotting area into multiple plots using par(mfrow = ...). The following example divides the plotting area into three rows and one column:


par(mfrow = c(3, 1))


Set the scale of any graph using xlim and ylim arguments.


range(), when applied to a vector, gives a vector of length 2 holding the smallest and largest elements of that vector. It is useful for setting the scale of graphs through xlim and ylim. For example:


# Define the limits for the x-axis:
xlimits = range(sample_means10)
# Draw the histogram:
hist(sample_means10, breaks=20, xlim=xlimits)
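The snippet above assumes sample_means10 already exists. Here is a self-contained sketch that builds it from a made-up population (an exponential one, chosen arbitrarily, as are the seed and the sample sizes) and uses range() with xlim to put two stacked histograms on the same scale.

```r
set.seed(1)
# Hypothetical population: 10,000 draws from an exponential distribution.
population = rexp(10000, rate = 1/50)

# Means of 500 samples of size 10 and of 500 samples of size 100.
sample_means10 = rep(NA, 500)
sample_means100 = rep(NA, 500)
for (i in 1:500) {
  sample_means10[i] = mean(sample(population, 10))
  sample_means100[i] = mean(sample(population, 100))
}

# Take the limits from the size-10 means, which are typically the more spread out.
xlimits = range(sample_means10)

# Stack two histograms in 2 rows and 1 column, on the same x-axis scale.
par(mfrow = c(2, 1))
hist(sample_means10, breaks = 20, xlim = xlimits)
hist(sample_means100, breaks = 20, xlim = xlimits)
```

With a shared xlim it is easy to see the means bunch up around the population mean as the sample size grows.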


A complete confidence-interval example (comment code later):


# Initialize 'samp_mean', 'samp_sd' and 'n':
samp_mean = rep(NA, 50)
samp_sd = rep(NA, 50)
n = 60


for (i in 1:50) {
   samp = sample(population, n)
   samp_mean[i] = mean(samp)
   samp_sd[i] = sd(samp)
}


# Calculate the interval bounds here:
lower=samp_mean - 1.96*samp_sd/sqrt(n)
upper=samp_mean + 1.96*samp_sd/sqrt(n)


# Plotting the confidence intervals:
pop_mean = mean(population)
plot_ci(lower, upper, pop_mean)


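The output of the program above is a great use case for the plot_ci chart: every one of the 50 intervals is drawn against the population mean, so the few that miss it stand out. (plot_ci() is not part of base R; as far as I can tell it is a helper that came with the course materials.)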
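Since neither population nor plot_ci() is defined in this post, here is a self-contained sketch of the same experiment. It assumes a made-up normal population and uses base graphics (plot, segments, abline) as a stand-in for plot_ci(); the seed and all the numbers are placeholders, not the course data.

```r
set.seed(2015)
# Hypothetical population (the original used a dataset from the course).
population = rnorm(10000, mean = 100, sd = 15)
n = 60
n_samples = 50

samp_mean = rep(NA, n_samples)
samp_sd = rep(NA, n_samples)
for (i in 1:n_samples) {
  samp = sample(population, n)
  samp_mean[i] = mean(samp)
  samp_sd[i] = sd(samp)
}

# Approximate 95% intervals: sample mean plus or minus 1.96 standard errors.
lower = samp_mean - 1.96 * samp_sd / sqrt(n)
upper = samp_mean + 1.96 * samp_sd / sqrt(n)

# Which intervals contain the true population mean?
pop_mean = mean(population)
covers = lower <= pop_mean & pop_mean <= upper
mean(covers)   # share of intervals that cover the mean; expected to be near 0.95

# One horizontal segment per interval, drawn in red if it misses the mean.
plot(samp_mean, 1:n_samples, xlim = range(c(lower, upper)),
     pch = 19, xlab = "sample mean and interval", ylab = "sample")
segments(lower, 1:n_samples, upper, 1:n_samples,
         col = ifelse(covers, "black", "red"))
abline(v = pop_mean, lty = 2)
```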

Saturday, January 31, 2015

What seems like work to other people that doesn't seem like work to you?

In a recent post, Paul Graham suggests that we ask ourselves this question, and that our answers to it are the things we are well suited for. I totally agree.

I have always wanted to do a lot of things. Ever since I assumed a semblance of adulthood, I have wanted to do many different things, have many different occupations. I don't use the term "wanted to" very loosely. When I say "wanted to" I mean that I have actively tried to better myself at those things for at least a month, with a view to doing them professionally. These have included becoming a poet, a programmer, a short story writer, a singer, an investor, a quant, a film critic, a photographer, a cartoonist. Fundamentally, I am not a subscriber to the notion that one has to become this one specific thing in life. Right from my teens, the one thing that the popularly reinforced idea of "you have just one life" has made me a little frantic about is the desire to pack a number of different professions into this one life. To many other people, this same idea is a great motivator pushing them in the opposite direction, of devoting themselves entirely to one great pursuit, and making a mark in it. I admire those people, but for some reason, doing many different things holds more sway over me than being a master of any one thing, and I think this is guided by my regret minimization utility function. Would I regret it more if I couldn't be great at one thing, or would I regret it more not to have tried so many others? For me, it is the latter.

At the same time, I very much believe in the other popular notion that "if something is worth doing, it is worth doing well". And it would be foolish not to concede the oft-proven point that trying to do several things is a big impediment to developing expertise in any one thing. Therefore, for people with dispositions such as mine, it is all that much more important that they choose their targets well, because they are only going to be good at so many things.

Which brings me back to Paul Graham's question. It is a great guide.
My answers: Writing essays, Studying statistics and probability, walking. I wish debugging was also on this list. But I suppose this list will change.
It is useful to create one's own list in answer to this question, and to come back to it periodically: both as a reminder to follow it, and as a reminder to update it.




Friday, January 30, 2015

Basic R revision - Part 4

Something I should have covered in part 1

Logical Operators in R: & and |

Lists in R

A list in R, much like a Python list, allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. To construct a list, simply wrap the objects in list():
list(var1, var2, var3)

Naming the elements of a list

my_list = list(VECTOR=my_vector,MATRIX=my_matrix,DATAFRAME=my_df)
Now VECTOR, MATRIX and DATAFRAME are names of the first, second and third elements of the list.

Indexing in Lists
[[ ]] is used instead of [ ] to pull out a single element; for example, mylist[[3]] gives the third element of the list mylist.

To append an element to a list use c(): mylist = c (mylist, newelement)
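A short sketch that puts the list operations above together. The contents of my_vector, my_matrix and my_df are made up. One thing worth noticing: when appending with c(), wrapping the new element in list() keeps it as a single element, whereas a bare vector of length greater than one would be spliced in element by element.

```r
my_vector = 1:5
my_matrix = matrix(1:6, nrow = 2)
my_df = data.frame(x = c(1, 2), y = c("a", "b"))

my_list = list(VECTOR = my_vector, MATRIX = my_matrix, DATAFRAME = my_df)

# Three equivalent ways to get the first element itself.
my_list[[1]]
my_list[["VECTOR"]]
my_list$VECTOR

# Single brackets return a shorter list, not the element inside it.
class(my_list[1])      # "list"
class(my_list[[1]])    # "integer"

# Append a new named element with c(); list() keeps it as one element.
my_list = c(my_list, list(NOTE = "added later"))
length(my_list)
names(my_list)
```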


Reading data from web
Use the read.table() function to read data from a url and assign the result to a dataframe, here called present:
present  = read.table("http://s3.amazonaws.com/assets.datacamp.com/course/dasi/present.txt")

If the data is already saved as an R data file, load it with load() instead of read.table(). In the example I was working through, this put a dataframe called cdc into the R workspace.

Plotting

To plot frequency tables use barplot(). This frequency chart function is suitable for a categorical variable, after it has been converted to a frequency table by using table(categoricalVarVectorname) or summary(factor(categoricalVarVectorname)).

To plot a frequency chart for a continuous variable, use a histogram (it buckets the values into ranges, and then draws a bar for each range): hist(vectorname, breaks=50)

To draw a scatter plot on the xy plane, use plot(x,y)

The table() command is used to create a frequency table for a categorical variable. We can also pass more than one categorical variable as input arguments to the table() command. It can give you, for instance, a frequency distribution in 2 variables, such as this:
            nonsmoker  smoker
 excellent       2879    1778
 very good       3758    3214
 good            2782    2893
 fair             911    1108
 poor             229     448
mosaicplot() is a good plot to display this data.

boxplot() can be used on a vector to get a graph showing the various quartiles. A table of values of the various quartiles can be generated by using summary() on the vector.

Another good use is boxplot(aContinuousVarOfDataset ~ aCategoricalVarOfDataset)
This shows a graph of quartiles of continuous var for each value of categorical variable.

Here the continuous var vector can be an existing continuous variable of dataset, of course, but also a constructed vector from various continuous variables of the dataset.
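A sketch of the plotting functions above on mtcars, a dataset that ships with R, so nothing needs to be downloaded. Treating cyl and gear as categorical, and the power_to_weight vector, are my own choices for illustration.

```r
# mtcars ships with R; cyl (cylinders) and gear are treated as categorical here.
cyl = factor(mtcars$cyl)
gear = factor(mtcars$gear)

# One-way frequency table and its bar chart.
cyl_counts = table(cyl)
barplot(cyl_counts)

# Two-way frequency table and a mosaic plot of it.
cyl_by_gear = table(cyl, gear)
mosaicplot(cyl_by_gear)

# Histogram of a continuous variable, bucketed into roughly 10 ranges.
hist(mtcars$mpg, breaks = 10)

# Quartiles as numbers, then as a box plot for each value of a categorical variable.
summary(mtcars$mpg)
boxplot(mtcars$mpg ~ cyl)

# The continuous side can also be a constructed vector.
power_to_weight = mtcars$hp / mtcars$wt
boxplot(power_to_weight ~ cyl)
```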

Thursday, January 29, 2015

Basic R revision - Part 3

Dataframes : Datasets in R


When you work with (extremely) large datasets and data frames, your first task as a data analyst is to develop a clear understanding of its structure and main elements. Therefore, it is often useful to show only a small part of the entire dataset. To broadly view the structure of a dataset use head() to look at the header columns and first few observations. (tail() shows the last few). Another method is to use str() which gives you the number of observations (rows), number of features or columns for each observation, a list of variable or column names and their datatypes, with their first few observations. Other useful functions are names() which gives column names, and dim() which gives a vector of two elements - nrows and ncols of the dataframe.
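For instance, on the built-in mtcars dataset (used here only because it needs no download):

```r
head(mtcars)        # first 6 rows
tail(mtcars, 3)     # last 3 rows
str(mtcars)         # number of rows and columns, column types, first few values
names(mtcars)       # column names
dim(mtcars)         # number of rows, then number of columns
```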


Creating a data frame


Normally you need your data in a very customized form before you can run any statistical algorithms on them. You can either perform that customization at the database level, that is, by querying in SQL to generate your output of the most suitably customized form, or, you can import the raw data onto your R (or Python) environment as it is, and use R (or Python) to create custom dataframes afterwards. (I do not currently have an opinion on what is the best practice - mostly common sense dictates what to do - but I will add to this post if and when I do have any nuggets of wisdom on this).* Here we will learn how to use R to create custom dataframes.


added 21Oct2015
I now feel that it is generally better to do the latter - that is - don't try to work with the SQL query too much to get customized data output - there are much better tools to deal with customization at the language level. R has data.table and dplyr, for example. For an example, suppose there are two cols a and b and you only want to output the part of the whole dataset where a>5. Easily do-able in SQL. Outputting only the part where a+b>5 is also do-able in SQL with a WHERE clause, but as the conditions get more elaborate than that, they are usually easier to write and to change at the R level.
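At the R level, the a+b>5 filter takes one line. A minimal sketch in base R, with a made-up data frame df:

```r
df = data.frame(a = c(1, 4, 6, 2), b = c(1, 3, 0, 9))

# Rows where a alone exceeds 5.
df[df$a > 5, ]

# Rows where the sum of the two columns exceeds 5.
big_sum = df[df$a + df$b > 5, ]
big_sum

# The same filter with subset(), which lets you drop the df$ prefix.
subset(df, a + b > 5)
```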

We can use the data.frame() function to wrap around all the vectors we want to combine in the dataframe. All the vectors, of course, should have the same length (equal number of observations). You can think of this function as similar to cbind, except it deals with vectors of potentially different datatypes. It's not really that similar to cbind actually, as each argument to data.frame() should be a vector, whereas arguments to cbind can be some vectors and some matrices too!


planets     = c("Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune");
type        = c("Terrestrial planet","Terrestrial planet","Terrestrial planet","Terrestrial planet","Gas giant","Gas giant","Gas giant","Gas giant")
diameter    = c(0.382,0.949,1,0.532,11.209,9.449,4.007,3.883);
rotation    = c(58.64,-243.02,1,1.03,0.41,0.43,-0.72,0.67);
rings       = c(FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE);


# Create the data frame:
planets_df  = data.frame(planets,type,diameter,rotation,rings)


Indexing and subsetting in dataframes work much as they do for matrices.


To get diameters of the first 3 planets in planets_df, we can use any of the following:
fpd1 = planets_df[1:3,"diameter"]
fpd2 = planets_df[1:3,3]
fpd3 = planets_df$diameter[1:3]


Example to get only those observations of dataset where planet has rings: planets_df[planets_df$rings,]


For an alternate way to do the same thing, use subset(): subset(planets_df, subset=(planets_df$rings == TRUE))
Use this way to get observations of dataset where planets smaller than earth: subset(planets_df, subset=(planets_df$diameter < 1))


To add a new feature or column or attribute to the dataframe planets_df, let's say sun_closeness_rank, simply define it while referring to it as an attribute of that dataframe:
planets_df$sun_closeness_rank = c(1,2,3,4,5,6,7,8)


Sorting a vector in R


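The order() function, when applied to a vector, returns the positions of its elements in sorted order: the index of the smallest element first, then the index of the next smallest, and so on. (This is not always the same as the rank of each element, which is what rank() gives.)
For example, order(c(6,3,8)) gives the vector 2 1 3. Now this vector can be given as an index to the original vector, to get a sorted version of the original vector.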
a = c(100, 9, 101)
order(a)
[1] 2 1 3
a[order(a)]
[1]   9 100 101
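order() and rank() are easy to mix up because they agree on some small examples, including the ones above. A sketch with a made-up vector where they differ:

```r
x = c(30, 10, 20)

order(x)      # 2 3 1: positions of the smallest, next smallest and largest element
rank(x)       # 3 1 2: where each element stands in the sorted vector
x[order(x)]   # 10 20 30, same as sort(x)

# Descending order.
x[order(x, decreasing = TRUE)]   # 30 20 10
```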


Sorting a dataframe by a particular column


For example, if we want to sort planets_df by diameter descending and create largest_first_df:
positions = order(-planets_df$diameter)
largest_first_df = planets_df[positions, ]

Being regular

1. Regularity is everything. Learning something for 2 hours for 25 alternate days is far, far superior to learning the same thing for 10 hours each on the first 5 days and then not coming back to it. On day 51, you will be in a much better position by following the first strategy.

2. If you don't intend to keep using a skill (WebDev, ML - anything) at least 2 to 3 times a week, for at least a couple of years, you might as well not learn it. You will unlearn it in as little as two to three months, and if you need that skill again you will have to start from the very beginning, making you question why you spent all that time learning it in the first place.

3. Set all non-focus but essential things on a reminder until it becomes an autopilot thing. This includes things such as exercising - essential, yes, but should not occupy your mental space and time. The time you spend actually exercising is all the time you should devote to it, nothing more. 

Basic R revision - Part 2

Factors

Factors are used to store categorical variables, that is, variables whose value can only be one amongst a well-defined, discrete set of values. For example factor_gender is a factor whose elements can only be "male" or "female".

To construct a factor variable out of a vector of values, just wrap the vector using factor(). For example:

> gender_vector = c("Male", "Female", "Female", "Male", "Male")
> factor_gender_vector = factor(gender_vector)
> factor_gender_vector
[1] Male   Female Female Male   Male 
Levels: Female Male

Categorical variables are of two types: nominal and ordinal.
factor_gender would be nominal as there is no grading from lower to higher between male and female unless you are a sexist asshole.
factor_bondratings would be ordinal as there is a natural grading, where we know:

AAA > AA > A > BBB > BB > B > CCC > CC > C > D

In R, a categorical variable is assumed to be nominal by default. If you wish to specify that it is ordinal, use the ordered and levels arguments of factor():

temperature_vector = c("High","Low","High","Low","Medium")
factor_temperature_vector = factor(temperature_vector, ordered=TRUE, levels=c("Low","Medium","High"))
> factor_temperature_vector
[1] High   Low    High   Low    Medium
Levels: Low < Medium < High
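The same idea applied to the bond ratings mentioned earlier, as a sketch. The ratings vector is made up, and rating_scale lists the levels from lowest to highest grade, which is the direction factor() expects for an ordered factor.

```r
ratings = c("BBB", "AAA", "B", "AA", "BBB")

# Levels listed from the lowest grade to the highest.
rating_scale = c("D", "C", "CC", "CCC", "B", "BB", "BBB", "A", "AA", "AAA")
factor_bondratings = factor(ratings, ordered = TRUE, levels = rating_scale)

factor_bondratings[2] > factor_bondratings[1]   # TRUE: AAA outranks BBB
max(factor_bondratings)                         # AAA
sort(factor_bondratings)                        # B BBB BBB AA AAA
```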

Renaming the elements of a factor variable

Use the levels() function to do this.



> survey_vector = c("M", "F", "F", "M", "M")
> factor_survey_vector = factor(survey_vector)
> factor_survey_vector
[1] M F F M M
Levels: F M

> levels(factor_survey_vector) = c("Female", "Male")
> factor_survey_vector
[1] Male   Female Female Male   Male 
Levels: Female Male

Note that it is important to follow the correct order while naming. Using
levels(factor_survey_vector) = c("Male", "Female")
would have been incorrect, since R sorts the levels alphabetically: I had run the code earlier and seen the unnamed output being "Levels: F M", so the first new name replaces F.
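Renaming by position is easy to get wrong, so here is a sketch of an alternative that spells out the pairing: pass levels and labels to factor() side by side. Note that this also makes "M" the first level, which differs from the alphabetical default.

```r
survey_vector = c("M", "F", "F", "M", "M")

# State the levels and their labels together, so the pairing is explicit.
factor_survey_vector = factor(survey_vector,
                              levels = c("M", "F"),
                              labels = c("Male", "Female"))
factor_survey_vector

# Cross-check the mapping: every M should land under Male, every F under Female.
table(survey_vector, factor_survey_vector)
```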

Using summary()

summary() is a general R function but it's very useful with factors. For example:



> summary(factor_survey_vector)
Female   Male
     2      3

If a factor is nominal, then the comparison operator > becomes invalid. See the following (continuation) code for my favorite proof for the equality of sexes:



> # Battle of the sexes:
> # Male
> factor_survey_vector[1]
[1] Male
Levels: Female Male
> # Female
> factor_survey_vector[2]
[1] Female
Levels: Female Male
> # Male larger than female?
> factor_survey_vector[1] > factor_survey_vector[2]
'>' not meaningful for factors

Comparison operators are meaningful for ordinal categorical variables. See:



> speed_vector = c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
> # Add your code below
> factor_speed_vector = factor(speed_vector, order = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
> # Print
> factor_speed_vector
[1] Fast       Slow       Slow       Fast       Ultra-fast
Levels: Slow < Fast < Ultra-fast
> # R prints automagically in the right order
> summary(factor_speed_vector)
      Slow       Fast Ultra-fast
         2          2          1

> compare_them = factor_speed_vector[2] > factor_speed_vector[5]
> # Is data analyst 2 faster than data analyst 5?
> compare_them
[1] FALSE

So Analyst 2 is not faster than Analyst 5.

Wednesday, January 28, 2015

Basic R revision - Part 1



A random useful function

To get the data type of a variable in R, use the function class().

my_numeric = 42
my_character = "forty-two"
my_logical = FALSE

> class(my_numeric)
[1] "numeric"
> class(my_character)
[1] "character"
> class(my_logical)
[1] "logical"

Always remember: Python/C++ vector indices start at 0, R vector indices start at 1.
To subset a vector in R, use vectorname[c(starting index: ending index)]
For disparate (non-adjacent) elements: vectorname[c(index1, index2, index3 ..)]

Compare to Python:
Subset a vector in Python, use vectorname[starting index: ending index + 1]
Note that index numbers will be defined as per Python convention
Suppose from a vector v = ['P','O','K','E','R'], we need to output ['O','K','E']
In Python, use v[1:4]
In R, use v[c(2:4)] or just v[2:4]
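The R side of this comparison as a runnable sketch. The last line shows one more difference worth remembering: a negative index in R drops that element, whereas in Python it counts from the end.

```r
v = c("P", "O", "K", "E", "R")

v[2:4]          # "O" "K" "E": both ends are included, counting from 1
v[c(1, 5)]      # "P" "R": non-adjacent elements
v[-1]           # "O" "K" "E" "R": everything except the first element
```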

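Also, to get all the elements of one row of a matrix in Python (with a NumPy array), one way is Mymatrix[3, : ], which with zero-based counting is the fourth row.
To get a whole row in R, use Mymatrix[3,  ], which is row 3 counting from 1.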

Comparison Operators in R vs VBA and Python

Comparison Operators in R and C++
<, >, <=, >=, ==, !=

Comparison operators in VBA
<, >, <=, >=, =, <>

Python 2 supports both != and <> for inequality; Python 3 supports only !=.

For equality, Python supports == (like R and C++)

In R you can use a comparison operator between a vector and a number and get a logical vector which compares each element of the vector to that number.
(Plain Python lists do not support this, but NumPy arrays do.) Also, you can use that logical vector as an index to get a subset of the original vector.
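A small sketch of comparing a vector to a number and then indexing with the result, using a made-up scores vector:

```r
scores = c(12, 7, 25, 3, 18)

above_ten = scores > 10    # one TRUE or FALSE per element
above_ten
scores[above_ten]          # 12 25 18
sum(above_ten)             # 3, since each TRUE counts as 1

# Conditions combine elementwise with & (and) and | (or).
scores[scores > 5 & scores < 20]   # 12 7 18
```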

Matrix in R: To construct a matrix in R you need to add a matrix() wrapper to a vector. e.g. matrix(c(1:9), byrow=TRUE, nrow=3)

Naming elements of a vector and rows/cols of a matrix

Naming can often be useful later. Syntax is simple:
For vector:
vectorv = c(2,3,4)
names(vectorv)=c("a","b","c")
Now, vectorv["a"] gives 2

For matrix:
new_hope = c( 460.998007, 314.4)
empire_strikes = c(290.475067, 247.900000)
return_jedi = c(309.306177,165.8)
# Construct the matrix
star_wars_matrix = matrix(c(new_hope,empire_strikes,return_jedi), nrow=3, byrow=TRUE)
# Name the rows and columns of star_wars_matrix
rownames(star_wars_matrix) = c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
colnames(star_wars_matrix)= c("US", "non-US")

Another way can be to include these in the matrix definition itself:
movie_names = c("A New Hope","The Empire Strikes Back","Return of the Jedi")
col_titles = c("US","non-US")
box_office_all = c(new_hope, empire_strikes, return_jedi)
star_wars_matrix = matrix(box_office_all, nrow=3, byrow=TRUE, dimnames=list(movie_names,col_titles))

Summing all elements of entire rows or columns, or summing all elements of any vector

To do row sums or column sums in R for a matrix just use rowSums(matrixname) or colSums(matrixname). Note that it is important to capitalize S in rowSums or colSums. Another way can be to reference the needed vector by using something like Mymatrix[3, ] and then wrapping sum() around it.
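A quick sketch on a small made-up matrix, filled row by row as in the matrix() example earlier:

```r
m = matrix(1:9, byrow = TRUE, nrow = 3)
m

rowSums(m)      # 6 15 24
colSums(m)      # 12 15 18
sum(m[3, ])     # 24, the third row on its own
```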

Combining/Appending functions

cbind(matrixname, vectorname) can append a vector to an existing matrix as a new column, provided the vector's length is the same as the number of matrix rows. Similarly, rbind(matrixname, vectorname) appends a new row. Note the similarity to the c() wrapper used to construct any vector.
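A minimal sketch of both functions on a made-up 2 by 3 matrix:

```r
m = matrix(1:6, nrow = 2)        # 2 rows, 3 columns, filled column by column

wider = cbind(m, c(10, 20))      # new 4th column; vector length = number of rows
taller = rbind(m, c(7, 8, 9))    # new 3rd row; vector length = number of columns

dim(wider)     # 2 4
dim(taller)    # 3 3
```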

Arithmetic Operators

+,-,*,/ work in an elementwise way for both vectors and matrices
matrix1 * matrix2 does elementwise multiplication, not matrix multiplication as in Linear Algebra for which we use %*% in R
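The difference between the two kinds of multiplication, sketched on two small made-up matrices:

```r
a = matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
b = matrix(5:8, nrow = 2)   # columns are (5, 6) and (7, 8)

a * b      # elementwise: rows are (5, 21) and (12, 32)
a %*% b    # matrix product: rows are (23, 31) and (34, 46)
```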