Sunday, February 1, 2015

R for Basic Statistics - 1

R for Simulation, Sampling and Inference


Simulation


outcomes = c("heads", "tails")
# For a fair coin the outcomes must be equally likely; sample() defaults to equal
# probabilities. (Passing prob=c(0.4, 0.6) would simulate a biased coin instead.)
sim_fair_coin = sample(outcomes, size=100, replace=TRUE)
barplot(table(sim_fair_coin))


Another use of sample() is to sample n elements randomly from a vector v.
sample(v, n)
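A minimal sketch of this usage (the vector v and the seed are made up for illustration):

```r
set.seed(42)                      # for reproducibility
v = c("a", "b", "c", "d", "e")
picked = sample(v, 3)             # 3 distinct elements, no replacement by default
picked
```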


To create a vector of size 15 all of whose values are identical:
vector1 = rep(0, 15)
vector2 = rep(NA, 15)   # NA is often used as a placeholder for missing data in R


For loop in R
for (i in 1:50) {}
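A for loop is typically combined with a pre-allocated vector, filled one element per iteration - a minimal sketch:

```r
# Pre-allocate with rep(), then fill inside the loop:
squares = rep(NA, 50)
for (i in 1:50) {
  squares[i] = i^2
}
squares[50]   # 2500
```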


Compare to Python (later)


Divide the plotting area into multiple plots using par(). The following example divides it into three rows and one column:


par(mfrow = c(3, 1))


Set the scale of any graph using xlim and ylim arguments.


range(), when applied to a vector, returns a vector of length 2 containing the smallest and largest elements of that vector. It is useful for setting the scale of graphs via xlim and ylim. For example:


# Define the limits for the x-axis:
xlimits = range(sample_means10)
# Draw the histogram:
hist(sample_means10, breaks=20, xlim=xlimits)


A complete confidence-interval example:


# Initialize 'samp_mean', 'samp_sd' and 'n':
samp_mean = rep(NA, 50)
samp_sd = rep(NA, 50)
n = 60


for (i in 1:50) {
   samp = sample(population, n)
   samp_mean[i] = mean(samp)
   samp_sd[i] = sd(samp)
}


# Calculate the interval bounds here:
lower = samp_mean - 1.96 * samp_sd / sqrt(n)
upper = samp_mean + 1.96 * samp_sd / sqrt(n)


# Plotting the confidence intervals:
pop_mean = mean(population)
plot_ci(lower, upper, pop_mean)


The output of the program above - each sample's interval drawn against the true population mean - is a great use case for the plot_ci chart.

Saturday, January 31, 2015

What seems like work to other people that doesn't seem like work to you?

In a recent post, Paul Graham suggests that we ask ourselves this question, and that our answers to it point to things we are well suited for. I totally agree.

I have always wanted to do a lot of things. Ever since I assumed a semblance of adulthood, I have wanted to do many different things, have many different occupations. I don't use the term "wanted to" loosely. When I say "wanted to" I mean that I have actively tried to better myself at those things for at least a month, with a view to doing them professionally. These have included becoming a poet, a programmer, a short story writer, a singer, an investor, a quant, a film critic, a photographer, a cartoonist. Fundamentally, I am not a subscriber to the notion that one has to become this one specific thing in life. Right from my teenage years, the one thing that the popularly reinforced idea of "you have just one life" has made me a little frantic about is the desire to pack a number of different professions into this one life. To many other people, this same idea is a great motivator pushing them in the opposite direction, of devoting themselves entirely to one great pursuit, and making a mark in it. I admire those people, but for some reason, doing many different things holds more sway over me than being a master of any one thing, and I think this is guided by my regret minimization utility function. Would I regret it more if I couldn't be great at one thing, or would I regret it more never to have tried so many others? For me, it is the latter.

At the same time, I very much believe in the other popular notion that "if something is worth doing, it is worth doing well". And it would be foolish not to concede the oft-proven point that trying to do several things is a big impediment to developing expertise in any one of them. Therefore, for people with dispositions such as mine, it is all the more important that they choose their targets well, because they are only going to be good at so many things.

Which brings me back to Paul Graham's question. It is a great guide.
My answers: Writing essays, Studying statistics and probability, walking. I wish debugging was also on this list. But I suppose this list will change.
It is useful to create one's own list in answer to this question, and to come back to it periodically: both as a reminder to follow it, and as a reminder to update it.




Friday, January 30, 2015

Basic R revision - Part 4

Something I should have covered in part 1

Logical Operators in R: & (elementwise AND) and | (elementwise OR). Their scalar, short-circuiting counterparts are && and ||.

Lists in R

A list in R, much like a Python list, allows you to gather a variety of objects under one name (the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists. To construct a list, simply wrap the objects in list():
list(var1, var2, var3)

Naming the elements of a list

my_list = list(VECTOR=my_vector,MATRIX=my_matrix,DATAFRAME=my_df)
Now VECTOR, MATRIX and DATAFRAME are names of the first, second and third elements of the list.

Indexing in Lists
[[ ]] extracts an element itself, while [ ] returns a sub-list. For example, mylist[[3]] gives the third element of the list mylist.

To append an element to a list, use c(): mylist = c(mylist, newelement)
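A small runnable sketch pulling these pieces together (the element names are made up):

```r
my_vector = 1:5
my_matrix = matrix(1:4, nrow = 2)
my_df     = data.frame(x = 1:3, y = c("a", "b", "c"))

# Build a list with named elements:
my_list = list(VECTOR = my_vector, MATRIX = my_matrix, DATAFRAME = my_df)

my_list[[1]]            # the vector, by position
my_list$MATRIX          # the matrix, by name
my_list[["DATAFRAME"]]  # the data frame, by name with [[ ]]

my_list = c(my_list, NEW = "appended")   # append a named element
length(my_list)                          # 4
```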


Reading data from web
Use the read.table() function to read data from a URL and then assign it to a data frame, here called present:
present = read.table("http://s3.amazonaws.com/assets.datacamp.com/course/dasi/present.txt")

If the data is already saved as an R dataset, just load it with load(). For example, after loading the course's cdc dataset, a data frame called cdc is in your R workspace.

Plotting

To plot frequency tables use barplot(). This frequency-chart function is suitable for a categorical variable after it has been converted to a frequency table using table(categoricalVarVectorname) or summary(factor(categoricalVarVectorname)).

To plot a frequency chart for a continuous variable, use a histogram (it buckets values into ranges, then draws a bar for each range): hist(vectorname, breaks=50)

To plot points on the xy-plane, use plot(x, y)

The table() command creates a frequency table for a categorical variable. We can also pass more than one categorical variable as input to table(). It can give you, for instance, a frequency distribution in two variables, such as this:

            nonsmoker  smoker
  excellent      2879    1778
  very good      3758    3214
  good           2782    2893
  fair            911    1108
  poor            229     448

mosaicplot() is a good plot to display this data.
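A minimal sketch with made-up health and smoker vectors (hypothetical names and data, just for illustration):

```r
# Two hypothetical categorical vectors:
health = c("good", "fair", "good", "poor", "fair", "good")
smoker = c("no",   "yes",  "no",   "yes",  "no",   "yes")

freq = table(health, smoker)   # two-way frequency table
freq
mosaicplot(freq, main = "Health by smoking status")
```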

boxplot() can be used on a vector to get a graph showing the various quartiles. A table of the quartile values can be generated by using summary() on the vector.

Another good use is boxplot(aContinuousVarOfDataset ~ aCategoricalVarOfDataset).
This shows a graph of the quartiles of the continuous variable for each value of the categorical variable.

Here the continuous-variable vector can be an existing continuous variable of the dataset, of course, but also a vector constructed from several continuous variables of the dataset.
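A sketch with made-up weight and smokes vectors (hypothetical data):

```r
# Hypothetical continuous and categorical variables:
weight = c(60, 72, 68, 80, 55, 90, 75, 62)
smokes = c("no", "yes", "no", "yes", "no", "yes", "yes", "no")

boxplot(weight ~ smokes)   # one box of quartiles per category
summary(weight)            # the same quartiles as numbers
```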

Thursday, January 29, 2015

Basic R revision - Part 3

Dataframes : Datasets in R


When you work with (extremely) large datasets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire dataset. To get a broad view of a dataset's structure, use head() to look at the column headers and the first few observations (tail() shows the last few). Another method is str(), which gives the number of observations (rows), the number of features or columns per observation, and a list of variable (column) names with their datatypes and first few values. Other useful functions are names(), which gives the column names, and dim(), which gives a vector of two elements: the nrow and ncol of the data frame.
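As a quick illustration using R's built-in mtcars data frame:

```r
# Inspect the structure of the built-in mtcars data frame:
head(mtcars)        # first 6 observations
tail(mtcars, 3)     # last 3 observations
str(mtcars)         # 32 obs. of 11 variables, with types and first values
names(mtcars)       # column names
dim(mtcars)         # 32 11
```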


Creating a data frame


Normally you need your data in a very customized form before you can run any statistical algorithms on it. You can either perform that customization at the database level, that is, by querying in SQL to generate output in the most suitably customized form, or you can import the raw data into your R (or Python) environment as-is, and use R (or Python) to create custom data frames afterwards. (I do not currently have an opinion on what the best practice is - mostly common sense dictates what to do - but I will add to this post if and when I do have any nuggets of wisdom on this.) Here we will learn how to use R to create custom data frames.


added 21Oct2015
I now feel that it is generally better to do the latter - that is, don't work the SQL query too hard to get customized output - there are much better tools for customization at the language level. R has data.table and dplyr, for example. Suppose there are two columns a and b and you only want the part of the dataset where a > 5 - easily doable in SQL. Filtering on a + b > 5 is also possible in SQL (WHERE a + b > 5), but more elaborate reshaping and transformation is much easier to do at the R level.

We can use the data.frame() function to wrap all the vectors we want to combine into the data frame. All the vectors, of course, should have the same length (equal number of observations). You can think of this function as similar to cbind(), except that it deals with vectors of potentially different datatypes. The analogy is not perfect, though: each argument to data.frame() should be a vector, whereas the arguments to cbind() can be a mix of vectors and matrices.


planets     = c("Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune")
type        = c("Terrestrial planet","Terrestrial planet","Terrestrial planet","Terrestrial planet","Gas giant","Gas giant","Gas giant","Gas giant")
diameter    = c(0.382,0.949,1,0.532,11.209,9.449,4.007,3.883)
rotation    = c(58.64,-243.02,1,1.03,0.41,0.43,-0.72,0.67)
rings       = c(FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE)


# Create the data frame:
planets_df  = data.frame(planets,type,diameter,rotation,rings)


Indexing and subsetting in data frames work similarly to matrices.


To get diameters of the first 3 planets in planets_df, we can use any of the following:
fpd1 = planets_df[1:3,"diameter"]
fpd2 = planets_df[1:3,3]
fpd3 = planets_df$diameter[1:3]


Example: to get only those observations of the dataset where the planet has rings: planets_df[planets_df$rings, ]


For an alternate way to do the same thing, use subset(): subset(planets_df, subset=(planets_df$rings == TRUE))
Use this to get observations of the dataset where planets are smaller than Earth: subset(planets_df, subset=(planets_df$diameter < 1))


To add a new feature or column or attribute to the data frame planets_df, let's say sun_closeness_rank, simply define it while referring to it as an attribute of that data frame:
planets_df$sun_closeness_rank = c(1,2,3,4,5,6,7,8)


Sorting a vector in R


The order() function, when applied to a vector, returns the indices that would sort it: the first element of the result is the index of the smallest value, and so on. For example, order(c(6,3,8)) returns the vector {2, 1, 3}. This index vector can then be given as an index to the original vector, to get a sorted version of the original vector.
a = c(100, 9, 101)
order(a)
[1] 2 1 3
a[order(a)]
[1]   9 100 101


Sorting a dataframe by a particular column


For example, if we want to sort planets_df by diameter descending and create largest_first_df:
positions = order(-planets_df$diameter)
largest_first_df = planets_df[positions, ]

Being regular

1. Regularity is everything. Learning something for 2 hours for 25 alternate days is far, far superior to learning the same thing for 10 hours each on the first 5 days and then not coming back to it. On day 51, you will be in a much better position by following the first strategy.

2. If you don't intend to keep using a skill (WebDev, ML - anything) at least 2 to 3 times a week, for at least a couple of years, you might as well not learn it. You will unlearn it in as little as two to three months, and if you need that skill again you will have to start from the very beginning, making you question why you spent all that time learning it in the first place.

3. Set all non-focus but essential things on a reminder until they become autopilot things. This includes things such as exercising - essential, yes, but it should not occupy your mental space and time. The time you spend actually exercising is all the time you should devote to it, nothing more.

Basic R revision - Part 2

Factors

Factors are used to store categorical variables, where categorical variables are those whose value can only be one of a well-defined, discrete set of values. For example, factor_gender is a factor whose elements can only be "male" or "female".

To construct a factor variable out of a vector of values, just wrap the vector using factor(). For example:

> gender_vector = c("Male", "Female", "Female", "Male", "Male")
> factor_gender_vector = factor(gender_vector)
> factor_gender_vector
[1] Male   Female Female Male   Male 
Levels: Female Male

Categorical variables are of two types: nominal and ordinal.
factor_gender would be nominal as there is no grading from lower to higher between male and female unless you are a sexist asshole.
factor_bondratings would be ordinal as there is a natural grading, where we know :



AAA > AA > A > BBB > BB > CCC > CC > C > D

In R, the default assumption is that a categorical variable is nominal. If you wish to specify an ordinal variable, use the order and levels arguments:



temperature_vector = c("High","Low","High","Low","Medium")
factor_temperature_vector = factor(temperature_vector, order=TRUE, levels=c("Low","Medium","High"))
> factor_temperature_vector
[1] High   Low    High   Low    Medium
Levels: Low < Medium < High

Renaming the elements of a factor variable

Use the levels() function to do this.



> survey_vector = c("M", "F", "F", "M", "M")
> factor_survey_vector = factor(survey_vector)
> factor_survey_vector
[1] M F F M M
Levels: F M

> levels(factor_survey_vector) = c("Female", "Male")
> factor_survey_vector
[1] Male   Female Female Male   Male 
Levels: Female Male

Note that it is important to follow the correct order while renaming. Using
levels(factor_survey_vector) = c("Male", "Female")
would have been incorrect, since running the code earlier showed the unnamed levels in the order "Levels: F M".

Using summary()

summary() is a general R function but it's very useful with factors. For example:



> summary(factor_survey_vector)
Female   Male
     2      3

If a factor is nominal, then the comparison operator > becomes invalid. See the following (continuation) code for my favorite proof of the equality of the sexes:



> # Battle of the sexes:
> # Male
> factor_survey_vector[1]
[1] Male
Levels: Female Male
> # Female
> factor_survey_vector[2]
[1] Female
Levels: Female Male
> # Male larger than female?
> factor_survey_vector[1] > factor_survey_vector[2]
'>' not meaningful for factors

Comparison operators are meaningful for ordinal categorical variables. See:



> speed_vector = c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
> # Add your code below
> factor_speed_vector = factor(speed_vector, order = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
> # Print
> factor_speed_vector
[1] Fast       Slow       Slow       Fast       Ultra-fast
Levels: Slow < Fast < Ultra-fast
> # R prints automagically in the right order
> summary(factor_speed_vector)
      Slow       Fast Ultra-fast
         2          2          1

> compare_them = factor_speed_vector[2] > factor_speed_vector[5]
> # Is data analyst 2 faster than data analyst 5?
> compare_them
[1] FALSE

So Analyst 2 is not faster than Analyst 5.

Wednesday, January 28, 2015

Basic R revision - Part 1



A random useful function

To get the data type of a variable in R, use the function class().

my_numeric = 42
my_character = "forty-two"
my_logical = FALSE

> class(my_numeric)
[1] "numeric"
> class(my_character)
[1] "character"
> class(my_logical)
[1] "logical"

Always remember: Python/C++ vector indices start at 0; R vector indices start at 1.
To subset a vector in R, use vectorname[c(startingIndex:endingIndex)]
For disparate (non-adjacent) elements: vectorname[c(index1, index2, index3, ...)]

Compare to Python:
Subset a vector in Python, use vectorname[starting index: ending index + 1]
Note that index numbers will be defined as per Python convention
Suppose from a vector v = ['P','O','K','E','R'], we need to output ['O','K','E']
In Python, use v[1:4]
In R, use v[c(2:4)] or just v[2:4]

Also, to get a full row in Python, one way is Mymatrix[2, :] (gets the third row, since Python is 0-indexed).
To do the same exercise in R, the way to do it is Mymatrix[3, ]

Comparison Operators in R vs VBA and Python

Comparison Operators in R and C++
<, >, <=, >=, ==, !=

Comparison operators in VBA
<, >, <=, >=, =, <>

Python supports != for inequality; Python 2 also allowed <>, but that form was removed in Python 3.

For equality, Python supports == (like R and C++)

In R you can use a comparison operator between a vector and a number and get a logical vector that compares each element of the vector to that number. (In Python, NumPy arrays behave the same way; plain lists do not.) You can also use that logical vector as an index to get a subset of the original vector.
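A quick demonstration of both ideas:

```r
v = c(4, 8, 15, 16, 23, 42)
mask = v > 15        # FALSE FALSE FALSE TRUE TRUE TRUE
v[mask]              # 16 23 42
v[v > 15]            # same thing in one step
```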

Matrix in R: to construct a matrix in R you need to add a matrix() wrapper to a vector, e.g. matrix(1:9, byrow=TRUE, nrow=3)

Naming elements of a vector and rows/cols of a matrix

Naming can often be useful later. Syntax is simple:
For vector:
vectorv = c(2,3,4)
names(vectorv)=c("a","b","c")
Now, vectorv["a"] = 2

For matrix:
new_hope = c( 460.998007, 314.4)
empire_strikes = c(290.475067, 247.900000)
return_jedi = c(309.306177,165.8)
# Construct the matrix
star_wars_matrix = matrix(c(new_hope,empire_strikes,return_jedi), nrow=3, byrow=TRUE)
# Add your code here such that rows and columns of star_wars_matrix have a name!
rownames(star_wars_matrix) = c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
colnames(star_wars_matrix)= c("US", "non-US")

Another way can be to include these in the matrix definition itself:
movie_names = c("A New Hope","The Empire Strikes Back","Return of the Jedi")
col_titles = c("US","non-US")
box_office_all = c(new_hope, empire_strikes, return_jedi)
star_wars_matrix = matrix(box_office_all, nrow=3, byrow=TRUE, dimnames=list(movie_names,col_titles))
Summing all elements of entire rows or columns, or summing all elements of any vector

To do row sums or column sums in R for a matrix just use rowSums(matrixname) or colSums(matrixname). Note that it is important to capitalize S in rowSums or colSums. Another way can be to reference the needed vector by using something like Mymatrix[3, ] and then wrapping sum() around it.

Combining/Appending functions

cbind(vectorname) can append a vector to an existing matrix as a new column, provided the vector's length equals the number of matrix rows. Similarly, rbind() appends rows. Note the similarity to the c() wrapper used to construct any vector.

Arithmetic Operators

+,-,*,/ work in an elementwise way for both vectors and matrices
matrix1 * matrix2 does elementwise multiplication, not matrix multiplication as in Linear Algebra for which we use %*% in R
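A quick contrast of the two operators on small 2x2 matrices:

```r
m1 = matrix(1:4, nrow = 2, byrow = TRUE)           # rows: 1 2 / 3 4
m2 = matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE) # rows: 5 6 / 7 8

m1 * m2      # elementwise:     5 12 / 21 32
m1 %*% m2    # matrix product: 19 22 / 43 50
```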

Tuesday, January 27, 2015

Carbs vs Fats

In a comparison of macronutrients, the often misunderstood fats are way better than carbohydrates. Fats, especially the unsaturated ones, provide several essential functions such as protecting our inner organs, maintaining good cholesterol levels and reducing bad cholesterol. Outside of them, Omega 3 fats found in foods such as Tuna, Walnuts and Beans are one of the healthiest things you can consume for your brain function - improving memory, fighting depression, bipolar disorder and ADHD. In addition, it is good for your cardiovascular system and bone joints. Even some saturated fats, such as those found in Desi Ghee, help break down other hard to digest food and reduce the negative effects of other fried and spicy food you eat, making them easily digestible. The one category of fat that is unequivocally unhealthy is trans fats, which would be all fatty food items with a good shelf life - biscuits, chips, pies, donuts, cake etc. In comparison, there are no “essential” carbohydrates, that is, there are no essential body functions that require carbohydrates. So the only function carbs serve is to provide energy, which is provided in just as much quantity by more functional nutrients like fats and proteins.

The Indian diet is very heavy on carbs, which provide energy but aid no body function, except that a basic amount is needed to aid digestion. You would be surprised that that basic amount is as little as 3 slices of bread a day for a full-grown adult, and that is if that were all the carbs he were consuming. Of course, an average adult will also be consuming carbs in decent quantities from vegetables, beans and fruits.
__


Disclaimer: This is a log for my personal use and does not constitute medical advice from me. I am not qualified to give medical advice to others and you should consult your doctor before making any nutritional or medicinal changes. 

Making good on promises past.

So on the first day of the year I promised to write a blog post here about something I learn "almost daily" and then stopped after day 2. That is a familiar theme with new year resolutions. I see a problem in that statement I made. It was the qualifier "almost". Qualified promises are like qualified love: imaginary. Secondly, vague goals detract from implementation. So, yeah, I shamelessly claim that I'll be updating it daily. Yes, that would be every day. One of the main motivations for writing things down on the blog is the belief that writing notes is not only helpful as a revision tool for committing things to memory, but also that there are things you learn while writing about a subject that you had not learned while reading about it before.

I have been spending a few hours every day learning some interesting stuff, only I never got around to writing about it here - so I'm hoping to also backfill some entries retroactively for the lost 26 days of the year whenever I go back to revising that stuff.

Here we go again.


Tuesday, January 20, 2015

World changing developments

I read a lot of history stuff at the beginning of this year, where my focus was on breadth rather than depth. I did not study any one period in great detail, wanting instead to get some sort of a hang of everything. Of course, I failed miserably. But I did come out at the end of those five or six days subconsciously wondering about the developments, from a very wide-angle camera, that I thought really changed the world the most. A lot of entries on my list are so obvious when I look at it ex-post that I wonder why I needed to spend those days to arrive at it, but, in any case, here's my list:

1) Discovery of fire - Because duh uh.

2) Discovery that seeds can be cultivated - This was the single most important development that changed humans from hunter-gatherer nomads to co-existing people in civilizations.

3) Mass Production and standardized parts - Pioneered by the development of the Chinese crossbow

4) Concrete - Changed the way structures were built forever. Introduced by the Roman emperor Claudius while getting Aqua Claudia built, an aqueduct colossal even by modern standards.

5) Printing Press - Because it changed literacy from 2-3% to 70-80% in the western world in a matter of decades. That is a great leap for mankind.

6) Steam Engine - Because it paved the way for the Industrial revolution which would give us trains, cars, other engines, heavy machinery, and pretty much everything.

7) Malaria and Polio vaccines - Because you can now expect to celebrate your kindergarten birthday party.

8) Internet - Because I say so.

And now, some trivia that I found fascinating. Two of the greatest empires - the Egyptian and the Roman - were decimated by people one would never guess were capable of it. The Egyptian one, the largest empire of its time and perhaps the longest-lasting one ever with a period of continuous rule of between 1,000 and 1,500 years, was brought down by people nobody knew at that time even existed, and historians still don't know where they came from. They were people whom the Egyptians called "People of the Sea", because they came from the sea. The Roman empire, the biggest ever, was brought down, once again, by a small tribe of Germanic people called the Vandals under the leadership of Gaiseric - not by another big empire of that time such as the Persian or the Chinese.

Finally, next on my history reading is "Guns, Germs and Steel", again a book that focuses on breadth (history of the world from pre-civilization to today) rather than depth into a particular place or era, although it does supposedly go into great depths vis-a-vis outlining some specific themes that pervade all of history. I have heard good things about the book, but I cannot say when I will get around to it. 

Friday, January 9, 2015

Filling time

It's 9:23 PM on a Friday night. I'm alone in the office. If I stand up and look across the hall, I see two hundred computer screens, most of them dead black, some of them blinking with screensavers. "We innovate" lights up the screen in bright yellow, followed by a bold screen-size "passion for performance" in green. A few TV screens hanging from the walls, muted of course, show a nameless man opening and closing his mouth in a magnificently determined way, and the old guy with the crew cut sits before him nodding sagely. The recycled fiber glass on my desk was filled with soy milk an hour ago. I drank it with corn chips from the vending machine. Later, I felt unsatisfied and went back and got a pack of cheese crackers. Then I thought I'd go back and walked out into the freezing parking lot. It suddenly dawned on me that it was -12 degrees Celsius and I had forgotten my ear muffs on my desk. Who'll go back, I thought, and ran to my car about fifty meters away. The car's battery had broken down, so I had to come back anyway. I called car services, and the woman asked me my address thrice. I do not understand how my accent could be so awful that each time she mistook my three for a two. You know, it makes no sense to me. The second time, I had made sure to speak three with great emphasis on the ree, and also followed it by saying, you know, like free, like a tree? She repeated the address afterwards, again having noted down two in place of three, and told me she was sorry about my accent. I was, like, I love India. I'm now waiting for the guy to come give my car a jump start.

I never asked for roadside services in India, wasn't even aware that there was some such thing. Three times in my driving life in India, I had to change the tyre. I knew how to do it, and on top of that, each of those times I had friends with me. The first time was late 2009 in Gurgaon, when I used to live in Vipin's house, when Vipin and his dad helped me. Actually, that was the time I was taught how to do it by his dad. Then in the monsoons of 2010, at 2 AM in the night while leaving work, I found my tyre punctured. Vibhor and I fixed it, it took almost forty minutes for us newbies. The third time was a couple of months before I left India, in 2012. I had Kavish, Vaibhav and Gyanesh with me, and this time was the hardest, for none of them helped any more than cracking jokes in the background while I attached the stepney. I do not know why it is a fond memory. Really, I have no idea.

The guy just called. He said he'll be here in five minutes, so I better wind up the post quickly. Actually, it's no admissions essay, so I do not have to worry about an appropriate closing, I can end pretty much anywhere. What'chya gonna do about it?

Friday, January 2, 2015

The fine difference between GDP deflator and CPI

GDP Deflator and CPI or Consumer Price Index are both price level indices, with two crucial differences in the way they are calculated:

1) GDP deflator keeps quantities fixed at the "current year", whereas CPI keeps quantities fixed at the "base year".

So, going from 2013 (assuming that is the base year) to 2014 (current year), the value of the deflator in 2014 will be calculated using weights for different goods that correspond to their weights as observed in the 2014 economy. 

Deflator as of 2014 = Σ [P2014 (i) * X2014 (i)] / Σ [P2013 (i) * X2014 (i)] 

Or in other words, Deflator= (Nominal GDP in 2014 / Real GDP in 2014)

On the other hand CPI keeps quantities fixed at the "base year", so in this case:

CPI as  of 2014 = Σ [P2014 (i) * X2013 (i)] / Σ [P2013 (i) * X2013 (i)] 

(Fairly obvious, but let's note that P's represent prices, X's represent quantities)

2) In both cases above, I used a summation with variable i. It is important to note that in the case of deflator, the "i" loops over all the goods and services produced in the economy of that country, whereas in the case of CPI it loops over a well defined basket of goods that are identified as consumer goods.
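To make the two formulas concrete, here is a toy two-good economy in R (all numbers made up):

```r
# Toy two-good economy:
p2013 = c(10, 20); x2013 = c(100, 50)   # base-year prices and quantities
p2014 = c(12, 22); x2014 = c(110, 40)   # current-year prices and quantities

# Deflator fixes quantities at the CURRENT year (2014):
deflator = sum(p2014 * x2014) / sum(p2013 * x2014)

# CPI fixes quantities at the BASE year (2013):
cpi = sum(p2014 * x2013) / sum(p2013 * x2013)

deflator   # 2200/1900, about 1.158
cpi        # 2300/2000 = 1.15
```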

This distinction is important for quite a few reasons. Inflation can be calculated both as the rate of change of deflator or as the rate of change of CPI, but the purpose for which that inflation estimate is required determines which of the two methods to use. For example, policy decisions often take CPI into account as it is a better indicator of inflation as it is felt by the general population. 

Secondly, CPI is much more volatile than Deflator. You would expect that to be the case since statistically the Deflator calculation seems to average (weighted-average, to be precise) a whole lot more quantities, and you'd expect the law of large numbers to kick in. But there's also an intuitive explanation on top of that: CPI has a large percentage presence of some of the most volatile sectors, such as, housing, energy and food.

For this reason there is another important measure, commonly called "core inflation", which when considering the rate of change of CPI excludes its volatile components like energy and food which vary a lot merely as a result of short term vagaries of weather, yields etc while not representing any structural change in economic trends.

A word of caution. Both these measures are hard to estimate with precision, the GDP deflator more so because it has so many more moving parts. CPI estimations, although relatively simpler, have long been tainted with accusations of manipulation by the government bodies that estimate them, which often have agendas that outweigh the pursuit of precision. Recently, the startup premise dot com has employed crowdsourcing principles to estimate food inflation from the ground up by having people send them prices of the food they buy every day while their engine does the almost real-time number crunching. You would be surprised at how much their estimates of food inflation differ from the "official" figures, especially for developing, populous countries like Brazil and India.

Thursday, January 1, 2015

Turn of the year musings

I used to think until fairly recently that people don't change. I still think so, by and large, but have mellowed down the firmness with which I think so. On the one hand, a person's character is something I'm still positive does not change, but on the other hand, I do agree now that it is possible for people to start experiencing the same old things in new ways that they had not known earlier. Somewhere in the beginning of last year, I found myself having entered a new kind of life where I spend a massive proportion of my waking hours entirely alone. I wouldn't say it changed the person I am, but things have started to happen that never did before. For example, I get unusually pensive while taking a shower. It works like magic, as if the knob of the shower were a switch into my sometimes dark, often colourful past. The moment I stand before the warm water springing onto me, I have recollections, very fine and vivid and almost hyper-real, of episodes of my past. Only I don't get to choose the episodes. It could be anything from a long day spent waiting for a phonecall, an evening where my dad corrected my spellings, to fighting over the sole plate of chowmein with five of my contending classmates, or reliving the heated anticipation of walking into a fresh acquaintance's apartment while she unnecessarily brings her cat for both of us to play with. It's amazing what a hot shower can do. And how life comes back to the forever nondescript present, the second the shower is stopped.

It's not a resolution, but I've planned to update the blog daily with a new thing I learn. Mostly, it's just to impel myself to keep learning, for I'm starting to realize that time spent away from any kind of learning really rusts your brains and makes you slower. And older. Which reminds me, I don't like getting older. In two months, I will be 29. I'm already at the age that if I were a world-class tennis player, which I'm far from, I would be considered a fading veteran by now. On my 29th birthday, I will take a picture of myself, and compare it to my older pictures, and try to conclude that I hardly look older. I am willing to resort to Adobe Photoshop to reach my goal.