Showing posts with label Learning. Show all posts

Saturday, December 26, 2015

Notes on learning how to learn


If you saw the reading list I posted previously on my weblog, you might know that I'm interested in how the mind works and how we learn. Based on my often chaotic and unsystematic reading of some of the books on that list (The Master and His Emissary, The Use of The Margin, Thinking as a Science, Thinking Fast and Slow) and, mostly, on some reading and online videos (the "Learning How to Learn" course videos), I have compiled the following notes to share in this post. Recall that the last time I reviewed a couple of books, I expressed my misgivings about "reviewing" stuff, as I feel it would be a little preposterous of me to review people far more learned than myself. The best I can do is synthesize what I learned from them and try to present that briefly on the blog.

Special thanks, in addition to the authors of the books listed above, go to Barbara Oakley (whom my notes will sometimes copy verbatim) and Terrence Sejnowski, who conduct the course "Learning How to Learn" on Coursera, which forms the basis for a lot of the notes below.

A small note: the notes presented here mostly cover the insights on this topic from the perspective of what modern neuroscience and modern psychology have uncovered. I'm also interested in the ancient Vedantic and Yogic systems that have done immense work on these same things following a very different approach, maybe, than what we call 'the scientific method'. But since much of what the modern methods are uncovering had been articulated by these schools rather impressively thousands of years ago, I'm convinced they were on to something, and see great value in their approach as well. In my opinion, there's a lot in that canon that is not yet uncovered by modern science, so it would be at one's own loss to ignore it. However, I will only touch upon my takeaways from the modern works discussed in the books and in the Coursera course in this post. In case you're interested, you can see certain Vedantic ideas on these topics here and here. A Yogic perspective can be found here and here.

So here we go.

1. The Very Basics

- When learning a subject, learn a little bit every day rather than overworking on one day. The rest period is when the neural connections form. The brain works in two modes: focused and diffuse. While the focused mode is important for learning new material analytically, the diffuse mode helps form connections between separately learned things, and fosters big-picture and creative thinking. It is important to alternate between the two. When you put your head down and study, you employ just the focused mode. Bring the diffuse mode into play by stepping away from the study table for exercise, a walk, a train ride, or a shower. During these activities, what you learned in the focused mode has room to roam around, form associations, consolidate into a bigger-picture context, and offer 'aha' moments. One caveat: insights from the diffuse mode can be forgotten, so carry a notebook.

- Context-switching or multitasking is hard. Human beings are bad at it. Don't. If you do many things, serial-task. But at any given time, focus on one thing.

- Stimulating learning environments are often better than solitude for generating new neurons. If unavailable to you, exercise also provides this benefit.

- Pomodoro: a simple cure for procrastination. The idea is to stay completely focused on studying (don't do anything else) for 25 minutes, then give yourself a 5-minute break (use it to relax, draw a doodle, listen to a song, etc.), and then go back for another pomodoro, that is, 25 minutes of uninterrupted study followed by 5 minutes to relax. Do 4 pomodoros, then give yourself a longer 15-30 minute break before jumping into another set of 4 pomodoros. This has proven to work in numerous studies. To help you stick to this, there are pomodoro timer devices available, or you could just download a pomodoro app on your phone. If you do that, make sure to keep your phone free of socializing apps and in airplane mode, or you'll be distracted by incoming calls and messages.

- Know one thing about procrastination: that it is spurred by a feeling of impending pain (intellectual pain in this case, in contrast to physical or emotional pain). Know that the pain is far less once you’ve begun, than when you’re about to begin.

- Practice and spaced repetition make things learned permanent. Caveat: the gaps between repetitions shouldn't be too long.

- When you study before bed and dream about it, it greatly enhances learning by employing the diffuse mode to augment the focused learning you did. Sleep helps retention by removing toxins, and creativity by employing diffuse mode. Salvador Dali and Thomas Edison made use of this.

- Learning by doing is way more effective in deepening and embedding the material in your circuitry. When just reading or listening, try to aim for active listening (vs passive), meaning ask questions, take notes etc.
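The Pomodoro rhythm described in the list above (25 minutes of focus, a 5-minute break, and a longer break after every fourth round) can be laid out as a simple timetable. This is only a sketch in R: the function name pomodoro_schedule and the 20-minute long break are illustrative assumptions, not something from the course.

```r
# Hypothetical helper: build a timetable for a given number of pomodoros.
# Assumes 25-minute work blocks, 5-minute short breaks, and a long break
# (20 minutes here, anywhere in the 15-30 range) after every 4th pomodoro.
pomodoro_schedule = function(n_pomodoros, work = 25, short_break = 5, long_break = 20) {
  activity = character(0)
  minutes = numeric(0)
  for (i in seq_len(n_pomodoros)) {
    activity = c(activity, paste("pomodoro", i))
    minutes = c(minutes, work)
    # No break is scheduled after the final pomodoro
    if (i < n_pomodoros) {
      is_long = (i %% 4 == 0)
      activity = c(activity, if (is_long) "long break" else "short break")
      minutes = c(minutes, if (is_long) long_break else short_break)
    }
  }
  # start_minute is the offset of each block from the start of the session
  data.frame(activity = activity, minutes = minutes,
             start_minute = cumsum(minutes) - minutes)
}

schedule = pomodoro_schedule(8)
print(schedule)
```

Printing the timetable for a whole session makes it easy to decide a quitting time up front.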


2. Chunking
Most of us are able to store only about four to seven different items in our short-term memory. One way to get past this limit is to use a technique called chunking. The idea is that by grouping several items into one larger whole, you'll be able to remember much more.

A chunk is a grouping of information sets bound together through meaning or use. To form a chunk is mostly to employ your focused mode thinking to tie together information.

How to form chunks? Steps:
1. Focus your undivided attention on the information you want to chunk. Remember that your working memory is very limited (on average it can hold about four items/chunks), so pick a quiet place with no distractions.
2. Understand the gist of the thing, the basic idea of the chunk. For this step, it is useful to alternate between focused and diffuse modes. Note that at this point, while you have in a way understood the concept, it is not yet a chunk, or a primitive that you can call seamlessly.
3. Grasping a concept or a solution by reading it (the "aha" moment) is not sufficient for expertise. Attempt and solve the problems yourself without external help. This will help you focus not just on how individual steps work but also on the connections between the steps. That will glue the steps together to form a chunk. Only doing stuff yourself can create the "mastery" neural patterns in your brain.
4. Context. Learn not only how to use a chunk but also when to use it. Practicing related and unrelated problems helps you see when to use or not to use the chunk. This makes sure the chunk is not only firm but also accessible from many paths. Step 4 combines the bottom-up "chunking" process (Steps 1-3) with the top-down "big picture" process. Complete learning happens as a result of both processes. One tip here is to skim the whole chapter quickly before you read it in detail, to have context. Also see "Interleaving" later.

Illusions of competence
Importance of recall, mini testing and making mistakes

Simply rereading is much less productive than “Recall what you’ve just read without looking at the book” after each reading. This retrieval process itself enhances deeper learning. (note to self: JEE screening less helpful than JEE main as a learning aid as looking at options did away with the need to recall everything. But that practice is important)

Re-reading is useful only after some space in time, as a means of spaced repetition.

Glancing at a solution and thinking you know it yourself is the most common illusion of competence. Do it yourself to have the knowledge persistent in your memory.

Underlining/highlighting also fools us into thinking we understood the material. Do it carefully, and sparsely. Underline lines that synthesize key ideas, or note those ideas in the margin.

A super helpful way to make sure you’re learning, and not fooling yourself with illusions of competence, is to test yourself. In some sense, "recall" does that. If you make a mistake, it is a good thing. Mistakes help correct your thinking.

Recalling material is extra helpful when you’re at various places outside your usual place of study, such as while walking in a park, waiting in a line, riding in a bus.

Motivation
Know what motivates you - if you’re genuinely interested in learning something, it’s easy to learn it.

Chemicals in the brain
- Acetylcholine - for focused learning
- Dopamine - controls motivation
- Serotonin - affects social life. Alpha males have high levels; depressed people have less. Prozac raises the level of serotonin. Low serotonin is also linked to high risk-taking behavior, e.g. among jail inmates.
- Emotions are intertwined with learning and memory. Be happy to be a good learner.

The value of a library of Chunks, Compaction, Transfer, Creativity, Law of Serendipity

Library of Chunks.

Transfer.
Once you have many chunks, you see analogies between physics and business, language and CS. A chunk is a compressor. Chunking is like WinZip.

As you gain more experience in chunking, you are able to create darker and longer chunking ribbons, meaning chunks that are more expansive and better embedded in your head. Once you have a good library of chunks, you can easily get to good solutions by listening to whispers from your diffuse mode. The more you practice, the darker the chunks. If you don't practice, they stay faint and fade away.

In building a library, you're training your brain to recognize not only a specific concept, but different classes of concepts.

There are broadly 2 ways to figure something out or solve a problem or understand a chapter: Sequential thinking using focus mode, and holistic/global/gestalt using mostly diffuse mode. Often, the most difficult concepts are grasped through the latter. Small caveat, solutions provided by the latter are less reliable and should be checked with the former.

Law of serendipity. Lady luck favors the one who tries. Just focus on whatever you're studying, you'll find that once you put the first concept in your mental library, the second will go in a little more easily and so on.

"Serendipity (or what Johnson calls “happy accidents”) accounts for other breakthroughs. He includes dreams, contemplative walks, long showers, and carving out time to read a variety of books and papers that might lead to “serendipitous collisions” of ideas. "- Bill Gates.

Overlearning, Choking, Einstellung Effect, Interleaving.

Overlearning
When learning a new idea/problem solving approach/concept, you may do it over and over again during the same study session. Some of it is useful, but continuing to do it after you've already mastered as much as you can in a session is called overlearning. Overlearning can help produce "automaticity" in playing the piano or tennis. People can talk while driving in complex traffic because they have overlearned driving and "automaticity" has taken over. If you choke on exams, overlearning can be helpful in overcoming that.

But beware of repeated overlearning in a single study session. It can be a waste of valuable learning time. Once you have an idea down, continuing to hammer it down doesn't strengthen it. Using a subsequent study session to strengthen what you learned is just fine, it deepens your chunked neural patterns. Repeating something you already know perfectly well, is, just, easy. (It rarely helps, for example, with hard math). It also promotes illusion of competence that you've mastered the full range of material, when you've only mastered the easy stuff. Instead you should balance your studies by deliberately focusing on what you find more difficult. Deliberate practice is the difference between a good student and a great student.

Einstellung: Blocked thoughts due to your preceding training.
Your initial simple thought, or a neural pattern that you've already strengthened may prevent a better idea or thought from coming, by creating a rut. Inertia. It is important to be able to unlearn your old erroneous ideas while you're learning new stuff.

Interleaving.
Understanding and mastering a new subject means not only learning the basic chunks but also practicing jumping back and forth between problems that require different techniques. This is called interleaving. Once you have a basic idea or technique down, start interleaving your practice with problems of different types or approaches. When you do the problem right after a concept in a book, you already know it's going to use that concept, so it becomes easy and does not let you practice interleaving. That's why it is very important to do end-of-chapter problems. Also, ask yourself, why some problems call for one technique as opposed to another: knowing how to use a concept or technique isn't enough, you also should know when to use it. Interleaving is hugely important when it comes to building flexibility, creativity, or independent mastery. This is where you leave practice and repetition, and get into ‘expertise’.

Learning by teaching, and by doing: very important methods in addition to learning by learning, and more powerful.


3. Procrastination and Memory. 

Procrastination & Memory are related. Why?
For committing to long term memory "spaced repetitions" are a must. But you can only do that if you don't procrastinate, otherwise you'll cram at the last moment. Building solid chunks in long term memory, chunks that are easily accessible by your short term memory takes time. It's not the thing that you want to be putting off till the last minute.
Always remember: Good learning is a bit by bit activity.

How procrastination happens and how to tackle it? (other than Pomodoro)
First things first: willpower is hard. Procrastination, on the other hand, is easy, a negative entropy process, if you will. We procrastinate about things that make us uncomfortable, uneasy, things that trigger our pain centers (intellectual pain). You funnel attention onto a more pleasant task and feel happy temporarily. But sadly, longer term effects of doing this can, in fact, be painful. For example, when you put off study for some time, it can become even more painful to think about studying. The daunting thing that led you into procrastination just became more daunting with less time on your hands. Procrastination begets procrastination. Mark this: procrastination is a monumental, keystone bad habit. It shares features with addiction: you start to tell yourself stories to justify it.

Now, let's tackle it. This journey of tackling procrastination is one from unconscious living to conscious living. You should be making your decisions, not your unthinking zombies. "Zombie mode" means acting out of habit. A habit can be good or bad. "Chunking" is creating good zombies, good habits. Procrastination is also a habit, a bad one.

Habits have four parts:
1. The cue. This is the trigger that launches you into zombie mode. Seeing a text message from a friend is a trigger, a study-reminder is also one. What we do in reaction to these cues is what matters.
2. The routine: the habitual response on receiving the cue. The zombie mode.
3. The reward. Habits develop and continue when they reward us in some way. Procrastination is an easy habit to make, because its rewards are so immediate and easy. But good habits can also be rewarded. Find ways to reward good study habits.
4. The belief. Habits have power because of your beliefs in them. To change your habit you have to change your underlying beliefs.

Mental tools and tricks to inspire and motivate yourself.

It's normal to start with a few negative feelings about beginning a learning session, even when you like the subject. It's how you handle this that matters. Non-procrastinators put their negative feelings aside, telling themselves "quit wasting time, just get on with it; once you get going you'll feel better about it".

Another helpful way: Focus on process, not product or outcome. "I'm gonna spend 1 hour working" is a process-oriented goal vs "I'm gonna finish the homework" which is product or outcome-oriented. To avoid procrastination, focus on process, avoid focusing on the product. Product is what triggers the pain that causes you to procrastinate, because it puts you face to face with the question of whether you'll attain the product, leading to fear, escapism, and procrastination. Focus on the process or processes, the small chunks of time you need over days. Calmly put forth your best effort for a short period. That's easier. Focus "on the moment". Pomodoro works because it is rooted in the same idea. By focusing on process instead of product, you back away from judging yourself and instead relax into the flow of the work. The key is that when a distraction arises, which it inevitably will, you want to train yourself to just let it flow by. Setting yourself up so that distractions are minimal is also a very good idea: think quiet spaces, switched off phones.

Harnessing your zombies to help you. 
Using our understanding of habits to form good ones.

The Cue: Since willpower is hard, let's minimize the use of willpower in tackling procrastination. The only place you need to employ willpower is where you look to change your reaction to the "cue", that is, when you go from cue to routine zombie reaction. Cues fall into 4 categories: location, time, how you feel, and reactions. You can prevent the most damaging cues from striking you by shutting off your cellphone and internet while you're doing pomodoro study sessions. For me food is a distraction/cue; understand what puts you into zombie mode and act to fix it.

The Routine: Key to re-wiring your reactions to the cues is to "have a plan". Plan ahead to "leave your phone in your car when you go to class" etc. By doing this, you took care of the hard part where you struggle with altering your reaction to the cue, by doing something easier: making a conscious decision to cut off the cue before it could strike you. Plans may not work right away but keep at it.

The Reward: Investigate why you are procrastinating: for what reward? Can you substitute an emotional payoff, even if small: a sense of satisfaction, maybe? Make it a personal game: does challenging yourself to do 4 pomodoros, as though it were a game, work? You could reward yourself with something you value: an episode of your favorite show, a phone call to a loved one, an ice cream. A small caveat here: stopping periodically for rewards can hamper "flow". Don't be discouraged though, since it takes a few days of ‘pomodoros’ anyway before "flow" begins to unfold.

Tricks 2.0: The better you get at something the more enjoyable it can become. Deliberately delay rewards until you get task done.

The Belief: Most important part of overcoming procrastination. A strong belief that your new system works is what can take you through. Hang out with non-procrastinators, or people trying hard to be non-procrastinators. Friends who believe in these values.

Juggling Life and Work - Practical tips

- Make a weekly list of key tasks to do (preferably process-oriented goals)
- Make a daily to-do list for the next day the evening before and go to sleep (only 5-6 items, mostly process oriented, product oriented only if totally doable. Some of those can be diffuse mode tasks - such as taking a walk. Get a good mix of tasks, let them not all be similar. Be realistic.) Having it written down the night before helps to internalize it while you sleep and precludes the need to carry the list in your limited working-memory.
- Important: In your daily plan, decide quitting time for the day!
- As you go along with this habit, make notes about what works what doesn't.
- ALWAYS make time for healthy leisure time, it's much better for your productivity than working all day! Preferably play, movement oriented.
- Get at least one pomodoro done as soon as you wake up, preferably the most disliked task.

Ways to access your brain's most powerful long-term memory systems.

Visual memory
- Tap into your naturally great visual spatial memory system.
- The funnier and more evocative (i.e. using other senses than sight) the images the better.
For something to move from working memory (WM) to long-term memory (LTM), the idea should, one, be memorable, and two, be repeated. Repeat not a bunch of times in one day, but sporadically over several days ("spaced repetition").

Index cards
- Hand-writing things more deeply encodes them in your brain. Ever noticed how you learn so much better from blackboard teaching than from teachers using power-point slides?
- Once you have several cards together, try shuffling them and running through them all to see if you can remember them. This is practicing interleaving. Once you've given them a try, put them away. Wait and take them out again before you go to sleep. Briefly repeat what you want to remember, for a few minutes each morning/evening. Gradually expand the time between repetitions as you become more certain.
Another interleaving tip: study every subject every day, even if only for 15 minutes.
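The advice to gradually expand the time between repetitions can be turned into a concrete review calendar. A minimal sketch in R, assuming the gap doubles after each review; the doubling factor and the function name review_days are illustrative assumptions, not a rule from the course.

```r
# Hypothetical review planner: the gap between reviews grows by a fixed
# factor each time (doubling is an assumption chosen for illustration).
review_days = function(n_reviews, first_gap = 1, growth = 2) {
  gaps = first_gap * growth^(0:(n_reviews - 1))  # days between successive reviews
  cumsum(gaps)                                   # day number of each review
}

days = review_days(5)
print(days)
```

If a card still feels shaky at review time, a smaller growth factor (or resetting first_gap) keeps it in more frequent rotation.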

Meaningful groups 
- Acronymizing lists and assigning memorable alternates, e.g. in trigonometry, "Pandit Badri Prasad Sona Chandi Tole har har bole" to remember the Sin, Cos, Tan formulas.
- Memory palace is a powerful technique of grouping things, useful for remembering unrelated items: Walking through a place you know well coupled with shockingly memorable images of things you want to remember. The more you do it the better it gets.

As you begin to internalize the key aspects of the material taking a little time to commit the most important parts to memory, you come to understand it much more deeply. The formulas would mean far more to you, and you develop great flexibility in slinging them around, and navigating through them flexibly, when needed.

4. How to become a better learner.

Two basic tips.
Tip 1. The best gift you can give your brain is physical exercise.
Tip 2. Practice makes perfect but only when your brain is prepared. There are certain critical periods in the development of your brain when sudden improvements occur in specific abilities. Expect them to happen and prepare your brain for them. e.g. The critical period for first language acquisition extends up to puberty.

Renaissance learning and unlocking your potential.

1. Visual metaphors as aids to memory. Discussed before.

2. No need for genius envy. A smaller working memory means fewer Einstellung problems. Forming chunks may take longer, but once formed, you can use them with great versatility. Also, "deliberate practice" makes the gifted. Deliberate practice is the idea that you need to make your practice, the problems you attempt, successively harder. It means doing everything to avoid illusions of competence: being meticulous about identifying your weaker aspects in a subject/skill and inventing methods to work on and test those aspects. Of course, broader planning is necessary to keep this sustained.

3. Change your thoughts, change your life. One, understand fixed mindset vs growth mindset. Two, the ability to change your mind and admit errors is another type of intelligence: the virtue of the less brilliant, as Santiago Ramón y Cajal calls it. Another virtue to imbibe is taking responsibility for your own learning: consulting, by your own choosing, different books and videos on the same topic makes you realize that the subject has more dimensions than what your teacher taught you. One more piece of advice: with dispassion, know when to cut willful detractors out of your circle.

4. Teamwork. The left hemisphere of your brain, which is responsible for focused mode, analytical thinking, also has a tendency for rigidity, dogmatism, clinging to ideas, and egocentricity. For example, when you're absolutely certain that what you've done on a homework or test is fine, and refuse to check it, it means that you are refusing to use your right-brain, the part that makes sense of the whole, the big picture. Be aware that this feeling may be based on overly confident perspectives arising in part from the left hemisphere. When you step back from a problem and recheck the solution, you're allowing for more interaction between the hemispheres, taking advantage of the special perspectives and abilities of each.
Teamwork is a great way to overcome such blind-spots, as working in teams forces you to use your brain in various different ways - focused mode as well as diffuse mode. But group study sessions shouldn't become socializing occasions, in that case you're best off to find another group.

5. Testing. Take frequent mini-tests. While taking a test you concentrate for 3 hours straight; compare this to how hard it is to study with such concentration otherwise! Checklist before exams: Did you make a serious effort to understand the text? (Just hunting for relevant worked out examples doesn't count.) Did you work with classmates or at least check your solutions with others? Did you attempt to do every problem yourself before working with classmates? Did you participate actively in group discussions? Did you consult with the instructor when you were having trouble? Did you understand all your homework problem solutions? Did you ask for explanations for solutions that weren't clear to you? Did you attempt to outline a lot of the solutions quickly without getting into details? Most importantly, did you get a reasonable night's sleep before the test? Hot tip: start with the hard problems, then jump to the easy ones. This activates both the focused and diffuse modes, and also avoids Einstellung.

 

Friday, February 6, 2015

Blogging platforms and moonlight

I'm contemplating moving the blog to wordpress. Since a big chunk of what I post nowadays tends to be code with all these greater than and less than signs, it totally throws the html on blogger off. It is amazing to me that blogger won't add this small little feature that lets you insert code seamlessly with something like the code tag that wordpress provides. It is 2015 and it is very surprising to see that blogger would let so many users slip away but not pluck the low hanging fruit of adding code tags. I personally do not want to move the blog, but that is for two entirely irrational reasons: one, that I'm lazy and do not have the time or enthusiasm to set up a blog from scratch, and two, that I've been on here for ten years now and it feels like home and so I don't want to leave.

I've been meaning to add tutorials for some great R libraries like dplyr, ggplot2, and shiny, and also some Python tutorials, but before that I have to fulfill the promise I made to myself of adding tutorials for the relatively uninspiring parts of R that form its basics. And, at any rate, to get started with any of that, it seems like moving to wordpress is the way. A couple of days ago, I tried to embed code snippets in earlier R posts but the results were clumsy and off-putting.

In any case, an interesting thing I learned today (let's assume I won't be adding anything else tonight): according to the widely accepted giant-impact hypothesis, the earth's moon formed when a roughly Mars-sized body struck the early earth, and the hot molten debris thrown off by the collision coalesced into the moon. All that romantic poetry with the moon as the evergreen metaphor seems a little incongruent now, doesn't it?

Monday, February 2, 2015

Some less talked about positives of the American Civil War

This is not a defense of wars, but the American Civil War had several positives coming out of it, and I thought I'd outline them briefly. Of course, it was at a great cost of 600,000 lives, about 2 percent of the entire US population at the time. Everybody knows about the biggest achievement of the war, which was also in large part the main engine behind the war: the abolition of slavery. Here I will mention some of the other, often overlooked positives:

1. For the first time, the recently discovered Bromine was used for healing and cleaning wounds. It improved standards of hygiene in wartime medical assistance in a big way, significantly reducing casualties where soldiers succumbed not to bullet wounds but to the ensuing gangrene.

2. Before the civil war, nursing was a primarily male occupation. With this war, the need for nursing outgrew the supply that men alone could furnish, bringing women in large numbers into the nursing profession. Alongside textile mills of the same period, this paved the way for women getting out of their houses for work in large numbers.

3. Working as a nurse at the time of the war inspired Clara Barton to start the American Red Cross after the war, an organization that has since saved millions of lives.

4. Embalming, the practice of using zinc chloride and arsenic for the preservation of the dead bodies of soldiers was an innovation of the civil war, which meant that the bodies could be received by their families, even weeks later, in recognizable and non-decomposed conditions for their last rites.

5. The telegraph got a boost from Lincoln as a tool for mapping and devising macro and micro level strategies. It would continue to be a mainstay of wartime planning for decades to come, including during the world wars.

6. In many ways, it was the first modern war. Before the civil war, the battle capacity of any regiment was limited by how much artillery they could carry with them. Once they were out of supplies, they could fight no longer. This was the first war to isolate supplies from actual offensives, by employing railroads to continually and strategically supply new arms to the fighting troops.

Sunday, February 1, 2015

R for Basic Statistics - 1

R for Simulation, Sampling and Inference


Simulation


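outcomes = c("heads", "tails")
# prob = c(0.4, 0.6) weights the coin towards tails, so this is an unfair coin
sim_unfair_coin = sample(outcomes, prob=c(0.4,0.6), size=100, replace=TRUE)
# Bar chart of how many heads and tails came up
barplot(table(sim_unfair_coin))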
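The prob argument is what decides whether a simulated coin is fair or weighted: when it is left out, sample() gives every outcome the same probability. A small sketch comparing the two; the seed, the sample size of 1000, and the variable names fair and weighted are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so the draws are reproducible
outcomes = c("heads", "tails")

# Fair coin: sample() uses equal probabilities when prob is not given
fair = sample(outcomes, size = 1000, replace = TRUE)

# Weighted coin: heads with probability 0.4, tails with probability 0.6
weighted = sample(outcomes, size = 1000, replace = TRUE, prob = c(0.4, 0.6))

# Observed proportions should land near 0.5/0.5 and 0.4/0.6 respectively
print(prop.table(table(fair)))
print(prop.table(table(weighted)))
```

With only 100 flips the proportions wobble a lot more, which is a nice thing to try by changing size.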


Another use of sample() is to sample n elements randomly from a vector v.
sample(v, n)
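A quick concrete version of sample(v, n), with a made-up vector v and n chosen only for illustration. By default the draw is without replacement, so no element is picked twice.

```r
set.seed(1)             # arbitrary seed for reproducibility
v = seq(10, 200, by = 10)  # hypothetical vector of 20 values
n = 5

picked = sample(v, n)   # n distinct elements of v, in random order
print(picked)
```

One thing to watch for: if v is a single number such as 20, sample(v, n) samples from 1:20 rather than from the one-element vector.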


To create a vector of size 15 all of whose values are identical:
vector1 = rep(0, 15)
vector2 = rep(NA, 15)
NA is often used as a placeholder for missing data in R.
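A short sketch of why the NA placeholder is handy: you can fill slots in as results arrive and still tell which ones are empty. The three values assigned here are made up for illustration.

```r
vector2 = rep(NA, 15)

# Fill in the first three slots (made-up values)
vector2[1:3] = c(2.5, 3.1, 4.0)

# is.na() flags the slots that are still empty
print(sum(is.na(vector2)))

# Most summary functions need na.rm = TRUE to skip the empty slots
print(mean(vector2, na.rm = TRUE))
```

Without na.rm = TRUE, mean(vector2) would itself return NA.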


For loop in R
for (i in 1:50) {}
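The loop body above is empty; a typical use is to fill a pre-allocated vector one slot at a time. A minimal sketch, where the vector name squares and the loop length of 10 are just for illustration:

```r
# Pre-allocate the result, then fill it inside the loop
squares = rep(NA, 10)
for (i in 1:10) {
  squares[i] = i^2
}
print(squares)
```

The same pattern (rep(NA, n) followed by a for loop) is what the confidence-interval example further down uses.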


Compare to Python (later)


Divide the plotting area into multiple plots using par(mfrow = ...). The following example divides the plotting area into three rows and one column:


par(mfrow = c(3, 1))
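To see the three-row layout in action, here is a sketch that stacks three histograms of simulated data. The data (standard normal draws of increasing size) is made up for illustration; saving and restoring the old settings is a habit I'd suggest, not something the layout requires.

```r
set.seed(7)                      # arbitrary seed for reproducibility
old_par = par(mfrow = c(3, 1))   # par() returns the previous settings

# Each call to hist() fills the next row of the layout
for (n in c(10, 50, 100)) {
  hist(rnorm(n), main = paste("n =", n), xlab = "value")
}

par(old_par)                     # restore the single-plot layout
```

Restoring old_par matters in an interactive session, where the split layout would otherwise apply to every later plot.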


Set the scale of any graph using xlim and ylim arguments.


range(), when applied to a vector, gives a vector of length 2 holding the smallest and largest elements of that vector. It is useful for setting the scale of graphs through xlim and ylim. For example:


# Define the limits for the x-axis:
xlimits = range(sample_means10)
# Draw the histogram:
hist(sample_means10, breaks=20, xlim=xlimits)
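The snippet above assumes sample_means10 already exists. Here is a self-contained sketch that builds it from a simulated population (an assumption, since the original data is not shown in this post) and uses one shared xlim so that two histograms can be compared on the same scale. The names population and sample_means50 are illustrative.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Assumption: a simulated population stands in for the real data
population = rnorm(10000, mean = 50, sd = 10)

# Means of 500 samples of size 10 and of size 50
sample_means10 = replicate(500, mean(sample(population, 10)))
sample_means50 = replicate(500, mean(sample(population, 50)))

# One set of limits covering both vectors keeps the x-axes comparable
xlimits = range(c(sample_means10, sample_means50))

old_par = par(mfrow = c(2, 1))
hist(sample_means10, breaks = 20, xlim = xlimits)
hist(sample_means50, breaks = 20, xlim = xlimits)
par(old_par)
```

On a shared axis it is easy to see that the means of the larger samples cluster more tightly.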


A complete confidence-interval example (comment code later):


# Initialize 'samp_mean', 'samp_sd' and 'n':
samp_mean = rep(NA, 50)
samp_sd = rep(NA, 50)
n = 60


for (i in 1:50) {
   samp = sample(population, n)
   samp_mean[i] = mean(samp)
   samp_sd[i] = sd(samp)
}


# Calculate the interval bounds here:
lower=samp_mean - 1.96*samp_sd/sqrt(n)
upper=samp_mean + 1.96*samp_sd/sqrt(n)


# Plotting the confidence intervals:
pop_mean = mean(population)
plot_ci(lower, upper, pop_mean)


Please note the output of the program above: it is a great use case for the plot_ci chart.
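The confidence-interval example relies on population and plot_ci(), neither of which is defined in this post. The sketch below is a self-contained variant under two assumptions: a simulated normal population replaces the original data, and base graphics (segments()) stands in for plot_ci(). It also counts how many of the 50 intervals actually contain the population mean.

```r
set.seed(2015)  # arbitrary seed for reproducibility

# Assumption: a simulated population stands in for the original data set
population = rnorm(5000, mean = 100, sd = 15)
pop_mean = mean(population)
n = 60
n_samples = 50

samp_mean = rep(NA, n_samples)
samp_sd = rep(NA, n_samples)
for (i in 1:n_samples) {
  samp = sample(population, n)
  samp_mean[i] = mean(samp)
  samp_sd[i] = sd(samp)
}

# 95% intervals: mean plus or minus 1.96 standard errors
lower = samp_mean - 1.96 * samp_sd / sqrt(n)
upper = samp_mean + 1.96 * samp_sd / sqrt(n)

# An interval "covers" when the population mean lies between its bounds
covers = (lower <= pop_mean) & (pop_mean <= upper)
print(mean(covers))  # share of intervals that cover; varies with the seed

# Base-graphics stand-in for plot_ci(): one horizontal segment per interval,
# drawn in red when it misses the population mean
plot(range(c(lower, upper)), c(1, n_samples), type = "n",
     xlab = "sample mean with 95% interval", ylab = "sample")
segments(lower, 1:n_samples, upper, 1:n_samples,
         col = ifelse(covers, "black", "red"))
abline(v = pop_mean, lty = 2)
```

Roughly 95% of the intervals should cover the population mean in the long run, so a couple of red segments out of 50 is what one would expect.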

Saturday, January 31, 2015

What seems like work to other people that doesn't seem like work to you?

In a recent post, Paul Graham suggests that we ask ourselves this question, and that our answers to this question, are things we are well suited for. I totally agree.

I have always wanted to do a lot of things. Ever since I assumed a semblance of adulthood, I have wanted to to do many different things, have many different occupations. I don't use the term "wanted to" very loosely. When I say "wanted to" I mean that I have actively tried to better myself at those things for at least a month, with a view to do them professionally. These have included becoming a poet, a programmer, a short story writer, a singer, an investor, a quant, a film critic, a photographer, a cartoonist. Fundamentally, I am not a subscriber of the notion that one has to become this one specific thing in life. Right from my teenage, the one thing that the popularly reinforced idea of "you have just one life" has made me a little frantic about is the desire to pack a number of different professions into this one life. To many other people, this same idea is a great motivator pushing themselves in the opposite direction, of devoting themselves entirely to one great pursuit, and making a mark in it. I admire those people, but for some reason, doing many different things holds more sway to me than being a master of any one thing, and I think this is guided by my regret minimization utility function. Would I regret it more if I couldn't be great at one thing, or would I regret it more to not try having done so many others. For me, it is the latter.

At the same time, I very much believe in the other popular notion that "if something is worth doing, it is worth doing well". And it would be foolish not to concede the oft-proven point that trying to do several things is a big impediment to developing expertise in any one thing. Therefore, for people with dispositions such as mine, it is all the more important to choose targets well, because we are only going to be good at so many things.

Which brings me back to Paul Graham's question. It is a great guide.
My answers: writing essays, studying statistics and probability, walking. I wish debugging were also on this list. But I suppose this list will change.
It is useful to create one's own list in answer to this question, and to come back to it periodically: both as a reminder to follow it, and as a reminder to update it.




Friday, January 30, 2015

Basic R revision - Part 4

Something I should have covered in part 1

Logical Operators in R: & and |
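
A minimal sketch of these two operators on small made-up vectors (the names x, y and v are only for illustration). Both work element by element, which is what makes them handy for combining conditions when subsetting:

```r
x = c(TRUE, TRUE, FALSE, FALSE)
y = c(TRUE, FALSE, TRUE, FALSE)

both   = x & y   # TRUE only where both x and y are TRUE
either = x | y   # TRUE where at least one of x and y is TRUE

# Typical use: keep the values of v that satisfy two conditions at once
v = c(2, 7, 11, 4)
in_range = v[v > 3 & v < 10]
```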

Lists in R

A list in R, much like a Python list, allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists. To construct a list, simply wrap the objects in list():
list(var1, var2, var3)

Naming the elements of a list

my_list = list(VECTOR=my_vector,MATRIX=my_matrix,DATAFRAME=my_df)
Now VECTOR, MATRIX and DATAFRAME are names of the first, second and third elements of the list.

Indexing in Lists
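[[ ]] is used instead of [ ] to extract a single element; for example, my_list[[3]] gives the third element of the list my_list. (Single brackets, as in my_list[3], return a list containing that element.)

To append an element to a list use c(): my_list = c(my_list, new_element)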
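
The list operations above can be sketched together as follows. The three objects placed in the list are made up for illustration. One thing worth noticing is that c() splices a vector in element by element, so a vector has to be wrapped in list() if it should be appended as one single element:

```r
# Hypothetical objects to put in the list
my_vector = 1:5
my_matrix = matrix(1:6, nrow = 2)
my_df     = data.frame(x = c(1, 2), y = c("a", "b"))

my_list = list(VECTOR = my_vector, MATRIX = my_matrix, DATAFRAME = my_df)

third_by_position = my_list[[3]]            # the data frame itself
third_by_name     = my_list[["DATAFRAME"]]  # same element, selected by name
sub_list          = my_list[3]              # a list of length 1 holding the data frame

# Appending: compare splicing a vector with appending it as one element
spliced  = c(my_list, c(10, 20))        # adds two elements, one per number
appended = c(my_list, list(c(10, 20)))  # adds one element holding both numbers
```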


Reading data from web
Use the read.table() function to read data from a URL, and assign the result to a dataset called present:
present  = read.table("http://s3.amazonaws.com/assets.datacamp.com/course/dasi/present.txt")

If the table is already in the form of an R dataset, then just load it using:
A dataframe called cdc is now in your R workspace.

Plotting

To plot frequency tables use barplot(). This chart is suitable for a categorical variable once it has been converted to a frequency table using table(categoricalVarVectorname) or summary(factor(categoricalVarVectorname)).

To plot a frequency chart for a continuous variable, use a histogram (it buckets the values into ranges, and then draws a bar for each range): hist(vectorname, breaks=50)

To plot points on the xy plane (a scatter plot) use plot(x,y)

The table() command is used to create a frequency table for a categorical variable. We can also pass more than one categorical variable as input arguments to the table() command. It can give you, for instance, a frequency distribution in 2 variables, such as this:
           nonsmoker smoker
excellent       2879   1778
very good       3758   3214
good            2782   2893
fair             911   1108
poor             229    448
mosaicplot() is a good way to display this data.
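
A small sketch of building such a two-variable table. The six survey answers below are made up purely for illustration, and the names health and smoker are assumptions, not the columns of any real dataset:

```r
# Made-up survey answers, for illustration only
health = c("good", "fair", "good", "excellent", "fair", "good")
smoker = c("smoker", "nonsmoker", "nonsmoker", "nonsmoker", "smoker", "smoker")

counts = table(health, smoker)                 # rows: health, columns: smoker
good_smokers = counts["good", "smoker"]        # one cell of the table

mosaicplot(counts)   # tile areas are proportional to the counts
```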

boxplot() can be used on a vector to get a graph showing the various quartiles. A table of the values of the various quartiles can be generated by using summary() on the vector.

Another good use is boxplot(aContinuousVarOfDataset ~ aCategoricalVarOfDataset)
This shows a graph of the quartiles of the continuous variable for each value of the categorical variable.

Here the continuous variable can be an existing continuous variable of the dataset, of course, but also a vector constructed from several continuous variables of the dataset.
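
As a sketch of the constructed-vector case, here is an example using the mtcars dataset that ships with R (my choice for illustration, not the dataset used in the notes above). The name power_to_weight is made up:

```r
# Constructed variable: horsepower divided by weight (wt is in 1000 lbs)
power_to_weight = mtcars$hp / mtcars$wt

quartiles = summary(power_to_weight)     # table of min, quartiles, mean, max

# One box per number of cylinders
boxplot(power_to_weight ~ mtcars$cyl)
```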

Thursday, January 29, 2015

Basic R revision - Part 3

Dataframes : Datasets in R


When you work with (extremely) large datasets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire dataset. To broadly view the structure of a dataset use head() to look at the column headers and the first few observations (tail() shows the last few). Another method is to use str(), which gives you the number of observations (rows), the number of features (columns), and a list of the column names and their datatypes, with their first few values. Other useful functions are names(), which gives the column names, and dim(), which gives a vector of two elements: the number of rows and the number of columns of the dataframe.
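
A quick sketch of these inspection functions, tried on the built-in mtcars dataset (used here only because it is available in every R installation):

```r
first_rows = head(mtcars)      # first 6 observations by default
last_rows  = tail(mtcars, 3)   # last 3 observations

str(mtcars)                    # rows, columns, column names, types, first values

col_names = names(mtcars)      # column names
size      = dim(mtcars)        # c(number of rows, number of columns)
```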


Creating a data frame


Normally you need your data in a very customized form before you can run any statistical algorithms on them. You can either perform that customization at the database level, that is, by querying in SQL to generate your output of the most suitably customized form, or, you can import the raw data onto your R (or Python) environment as it is, and use R (or Python) to create custom dataframes afterwards. (I do not currently have an opinion on what is the best practice - mostly common sense dictates what to do - but I will add to this post if and when I do have any nuggets of wisdom on this).* Here we will learn how to use R to create custom dataframes.


added 21Oct2015
I now feel that it is generally better to do the latter, that is, don't try to work with the SQL query too much to get customized data output. There are much better tools to deal with customization at the language level; R has data.table and dplyr, for example. Suppose there are two columns a and b and you only want to output the part of the whole dataset where a>5. That is easily doable in SQL, and so is a+b>5, since SQL allows expressions in a WHERE clause. But as the conditions and derived columns grow more elaborate, they are usually easier to write and revise at the R level.
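
A minimal sketch of such row filters at the R level, using base R only. The data frame df and its values are made up for illustration:

```r
# Hypothetical data with two columns a and b
df = data.frame(a = c(1, 6, 3, 4), b = c(2, 1, 3, 0))

a_big   = df[df$a > 5, ]          # rows where a > 5
sum_big = df[df$a + df$b > 5, ]   # rows where a + b > 5
```

The condition inside the brackets is just a logical vector, so any expression built from the columns can be used there.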

We can use the data.frame() function to wrap around all the vectors we want to combine in the dataframe. All the vectors, of course, should have the same length (equal number of observations). You can think of this function as similar to cbind, except that it deals with vectors of potentially different datatypes. It's not really that similar to cbind actually: cbind builds a matrix, so it coerces all its arguments to one common datatype, whereas data.frame() keeps each column's own datatype.


planets     = c("Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune");
type        = c("Terrestrial planet","Terrestrial planet","Terrestrial planet","Terrestrial planet","Gas giant","Gas giant","Gas giant","Gas giant")
diameter    = c(0.382,0.949,1,0.532,11.209,9.449,4.007,3.883);
rotation    = c(58.64,-243.02,1,1.03,0.41,0.43,-0.72,0.67);
rings       = c(FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE);


# Create the data frame:
planets_df  = data.frame(planets,type,diameter,rotation,rings)


Indexing and subsetting in dataframes work similarly to matrices.


To get diameters of the first 3 planets in planets_df, we can use any of the following:
fpd1 = planets_df[1:3,"diameter"]
fpd2 = planets_df[1:3,3]
fpd3 = planets_df$diameter[1:3]


Example to get only those observations of dataset where planet has rings: planets_df[planets_df$rings,]


For an alternative way to do the same thing, use subset(): subset(planets_df, subset=(planets_df$rings == TRUE))
Use the same approach to get the observations where the planet is smaller than Earth: subset(planets_df, subset=(planets_df$diameter<1))
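
Inside subset() the columns can also be referred to by name directly, without the planets_df$ prefix. A small self-contained sketch, using a cut-down three-planet version of the data frame (small_df is a name made up for this example):

```r
small_df = data.frame(
  planets  = c("Mercury", "Earth", "Jupiter"),
  diameter = c(0.382, 1, 11.209),
  rings    = c(FALSE, FALSE, TRUE)
)

ringed  = subset(small_df, subset = rings)          # planets with rings
smaller = subset(small_df, subset = diameter < 1)   # planets smaller than Earth
```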


To add a new feature or column or attribute to the dataframe planets_df, let's say sun_closeness_rank, simply define it while referring to it as an attribute of that dataframe:
planets_df$sun_closeness_rank = c(1,2,3,4,5,6,7,8)


Sorting a vector in R


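The order() function, when applied to a vector, returns the positions of its elements in sorted order: the first entry is the index of the smallest element, the second the index of the next smallest, and so on. (This is not the rank of each element, which is what rank() returns, although the two happen to coincide in the examples here.)
For example, order(c(6,3,8)) gives the vector 2 1 3. This vector can then be used as an index into the original vector to get a sorted version of it.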
a = c(100, 9, 101)
order(a)
[1] 2 1 3
a[order(a)]
[1]   9 100 101
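
Since order() is easy to confuse with rank(), here is a small sketch with a made-up vector on which the two give different answers:

```r
x = c(30, 10, 20)

positions = order(x)   # 2 3 1: where the smallest, middle and largest values sit
ranks     = rank(x)    # 3 1 2: the rank of each value, in place
sorted    = x[positions]   # 10 20 30, the same result as sort(x)
```

Only the output of order() is suitable as an index for sorting; indexing with the ranks would scramble the vector instead.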


Sorting a dataframe by a particular column


For example, if we want to sort planets_df by diameter descending and create largest_first_df:
positions = order(-planets_df$diameter)
largest_first_df = planets_df[positions, ]
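
The minus sign only works for numeric columns. A sketch of an alternative that also handles text columns, using the decreasing argument of order() on a small made-up data frame:

```r
# Hypothetical data frame with one text and one numeric column
df = data.frame(name = c("b", "a", "c"), size = c(2, 3, 1))

by_size_desc = df[order(df$size, decreasing = TRUE), ]   # largest size first
by_name_desc = df[order(df$name, decreasing = TRUE), ]   # reverse alphabetical
```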