Saturday, December 22, 2018

Building a chatbot using TF-IDF


We want to build a basic chatbot that trains on previous messages and responses. In this tutorial we look at the math used to convert the messages and their associated responses into weights using term frequency and inverse document frequency (tf-idf).

Once we have the appropriate weights for the words present in the messages and responses, we write each message and response as a vector of those weights. We then measure how similar these vectors are using cosine similarity.

We multiply the term frequency by the inverse document frequency to obtain the final weight of each word, which is then used to construct the vector.
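
To make the weighting concrete, here is a minimal sketch in plain Python. The exact formulas are an assumption, since the post does not spell them out: tf is taken as the count of a word divided by the length of the message, and idf as log(N / number of messages containing the word). Libraries such as scikit-learn use a smoothed variant, so their numbers will differ. The sample messages and the function names are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return (vocabulary, vectors) with one tf-idf vector per document.

    Assumed definitions (not taken from the original post):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = log(N / number of documents containing t)
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Final weight of each word = tf * idf (0.0 for an empty document).
        vectors.append(
            [(counts[term] / total) * idf[term] if total else 0.0
             for term in vocabulary]
        )
    return vocabulary, vectors


# Hypothetical training messages.
messages = ["hello how are you", "what is your name", "how old are you"]
vocabulary, vectors = tf_idf_vectors(messages)

for term, weight in zip(vocabulary, vectors[0]):
    print(f"{term:>6}: {weight:.3f}")
```

Words that appear in only one message (such as "hello") get a higher weight than words shared across messages (such as "how"), which is the behaviour we want from the idf factor.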

Cosine Similarity:
This is a measure of orientation, not magnitude. We do not consider the magnitude of the vectors because it grows with the length of the query or response, and length tells us nothing about how similar the query is to the messages in our training data.

The angle gives us the direction in which a vector points. If the query has only 5 words and the message has 500 words, but their words carry similar relative weights, then the two vectors point in the same direction and are considered similar.

We choose cos(theta) because it is a monotonically decreasing function on [0, pi/2]: the smaller the angle between two vectors, the larger the value. Since tf-idf weights are never negative, the angle always lies in this range. We use the dot product to calculate cos(theta), as shown in the figure.
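
As a small illustration of the dot-product calculation, cos(theta) = (A . B) / (|A| |B|), here is a sketch in plain Python. The vectors are made-up numbers, not real tf-idf output, and returning 0.0 for an all-zero vector is a convention chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) between two equal-length vectors via the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical weight vectors: same direction, very different magnitudes.
short_query = [1.0, 2.0, 0.0]
long_message = [100.0, 200.0, 0.0]
unrelated = [0.0, 0.0, 3.0]

print(cosine_similarity(short_query, long_message))  # close to 1.0
print(cosine_similarity(short_query, unrelated))     # 0.0
```

The short query and the long message score as nearly identical even though one vector is a hundred times longer, which is exactly why orientation is used instead of magnitude.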


In this tutorial we give a walkthrough of the code. The libraries used are scikit-learn and NumPy.
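
For orientation, here is a minimal sketch of how such a chatbot could be put together with scikit-learn and NumPy. It is not the code from the repository: the training pairs and the `reply` function are hypothetical, and `TfidfVectorizer` applies its own smoothed tf-idf formula and normalisation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical training data: messages[i] was answered with responses[i].
messages = [
    "hello how are you",
    "what is your name",
    "what time is it",
]
responses = [
    "I am fine, thanks",
    "I am a tf-idf bot",
    "Time to write more code",
]

# Learn the vocabulary and idf weights from the previous messages.
vectorizer = TfidfVectorizer()
message_vectors = vectorizer.fit_transform(messages)


def reply(query):
    """Return the response paired with the most similar known message."""
    query_vector = vectorizer.transform([query])
    similarities = cosine_similarity(query_vector, message_vectors)[0]
    best = int(np.argmax(similarities))
    return responses[best]


print(reply("tell me your name"))
```

Note that a query sharing no words with the training messages scores 0 against everything, so this sketch would simply return the first response; a real bot would probably want a similarity threshold and a fallback reply.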

The full code is available on GitHub.

Friday, December 21, 2018

(College Education - 1) Data Analytics for Teachers/Students.

Disclaimer: the views presented in this article are personal. 

Initially, I was going to write a rant on how teachers are shit in colleges and continue the age-old blame game. In this game, the teachers think that students are stupid or uninterested, and the students think that teachers don't know how to teach. It is true (to a certain extent, of course), but the problem is that no one ever addresses it. No one thinks of any innovative methods that can be adopted to address what's wrong. Most people (including me, and including teachers and students both) are wrapped up in their own self-interest, and to some extent rightfully so.

Before I propose the solution, I would like you to go through my line of thought.

Teachers of today feel inclined to play entertainers rather than knowledge imparters. In the information age we now live in, fuelling curiosity is far more important than imparting knowledge. Students need to be introduced to concepts in a way that makes the learning process heuristic, enabling them to relate these concepts to life and the applications around them, and to have a positive impact using that knowledge.

A lot of students tend to blame the syllabus, but I disagree. I think that the syllabus is well defined and in accordance with the given branch of study. The reason most students feel disengaged from the syllabus is that they are unaware of the possibilities it holds. As students move away from immersive learning and focus only on the parts that are necessary to get better grades, the whole idea of a model student and a model teacher changes drastically. A model teacher becomes one who makes sure that knowledge (or the method involved in its dissemination) is transmitted to students in a way that helps them remember it for a duration often limited to the exam period. If a teacher can assign tasks that lead to good marks, then they are a model teacher (and hence diligent in their duties), and a model student becomes one who duly completes the assigned tasks: the student who is regular, sincere and completes everything on time.

Let us consider the problems that arise because of this.
  • Fewer than 10% of students and teachers fall into the model student - model teacher zone.
  • There is little accountability and few deliverables on the teachers' part.
  • Independent thinking by students is not given proper importance.
  • Fuelling and engaging with the community (online forums) is more important than completing the assigned tasks.
  • Holistic development is not taken as seriously as imparting knowledge.

Solution: Proper Data Analytics for students and teachers. 

1.) Actionable insights for teachers and students.

Teachers often do not have time for every student, and students struggle needlessly on things that could be quickly understood. By enabling the collection of proper data (for both students and teachers), actionable insights can be generated.

2.) Regular after-class tests instead of end-semester / mid-semester examination patterns.

In order to create real-time data for analysis and actionable insights for teachers and students, it is important to create data points on a short-term basis. This would also allow machine learning techniques such as reinforcement learning to come into play and interact with students, thus reducing the workload for teachers.
Not just that, more data points would result in more accountability on the teachers' part.

3.) Venn-like diagram for multi-discipline projects, and grading on the basis of those projects.

I personally think this would be super cool if implemented. The idea is to use the graphical representation shown below to grade projects.

Here is how it could work.

  • the radius of a circle would be determined by the number of topics covered by the project.
  • the colour of the circle would be determined by how deeply the person understands the topic. A darker shade would imply better understanding.
  • community comments (feedback) from people who have expertise in that area would also be listed for every project.
  • a deep learning model would estimate the employability of these projects, using the above data as input parameters.

4.) Rewarding in-depth knowledge and understanding in a unique way.

Instead of assignment submissions (which have been reduced to handwriting practice for the majority of students), assignments should include engaging with the online community (such as StackExchange/Medium) on different topics of interest. The idea is to reinforce students' interests instead of adding work pressure. By communicating with a community, students would feel more appreciated for their work than they do now.

5.) Incorporating extra-curricular activities (sports) as an important part of the system.

There is nothing more important than sports. Playing a sport consistently should carry some weightage in all educational institutes of every field, as it teaches teamwork, risk-taking and communication.

Tuesday, December 4, 2018

Installing Anaconda, Running Jupyter on Google Cloud Remotely

I was just using Google Colab when I realised it cannot really replace a remote server with a GPU. It is super awesome if you are trying to collaborate on a notebook with multiple authors, but it does not really give you the flexibility of a terminal. There is only so far that "!" can go. Had Google Colab provided a virtual instance, it would have been super.
It is already amazing that they provide GPUs and TPUs completely free of cost. It is too much to ask for free shell access as well, and if they gave it, it would be hard for them to clamp down on activities such as mining or torrenting, so people would end up making money on hardware meant for educational purposes.

This post is about how to set up NGINX along with Jupyter Notebook.

1.) Let us first install Anaconda by downloading it from here.

Once you have installed Anaconda on your virtual machine, it is time to install NGINX and make sure it is running.

2.) Start Jupyter Notebook using the following command. Copy the link.

3.) Go to the terminal and type the following command.

Wednesday, September 5, 2018

A happy teachers' day! Startups in Colleges.

This post was made by one of my LinkedIn connections (Ramesh Kumar).

Transcript of that post:
Startups by Engineering Students
I am glad to see so many Engineering students getting interested in having their startups while studying. And every University has an Entrepreneur wing! There is huge interest, which is good. But, I do not see many startups coming into existence! Why?

#1. Focus - They seem to lack focus. They get one 'great' idea today and even before they do something on this, they get 'another greater' idea. And it is a loop.

#2. lack of time - They are students and are expected to attend classes, prepare for exams, practicals etc. They may find it difficult to spend more time on their venture. They knew this and can not give this as excuse for shutting down the project.

#3. Unrealistic Projects- If they see and experience the pain point of some one and if they find a solution, they are likely to succeed. Many times they seem to identify projects which they can not complete!

#4. Lack of Industry Support - Few companies are investing in the talent in the colleges to get their projects done by students along with the Professors

#5. Glamour - 'Entrepreneurship' seems to be a glamour word for most of them. They seem to enjoy living in that glamour for a short time and then get back to fun life!

I found this assessment super accurate, and it got me thinking about what can be done to solve these issues and have more students pursue entrepreneurship. Here are the solutions that I think are feasible.

#1 The reason students lack focus is that nothing is at stake for them if the startup does not happen. They can simply write it off as a failure rather than fight an uphill battle. Another reason for the lack of focus is that there are no continuous, gamified benchmark ratings for startups. I think giving stars to startups based on benchmarks agreed upon by the e-cells and the startups would help student startups be more productive. (Hey, are you a 2-star startup or a 3-star startup?)

#2 Lack of time is very real. Some allowance in attendance should be given for project work.

#3 I think we need proper brainstorming sessions that revolve not only around the novelty of the idea but also around practical things like the amount of time required, the amount of money required, and the importance of generating proper numbers in terms of deliverables from a startup. A proper document should be created and updated on a weekly basis.

#4 I think a long-term platform of communication between colleges and companies should exist (instead of speaker sessions), where professors, CEOs and students can share their opinions (a dedicated college site). If that is too hard to accomplish, maybe a group of students can list all the events pertaining to a specific niche that are happening in the country and forward them to students who are interested in that niche.

#5 When work comes into the picture, glamour shits itself.

In closing, I think it is very important to introduce an X vs. time graph for every startup. X may change over time, but the startup needs to keep updating its progress and the direction it is heading in.

Wednesday, August 8, 2018

Rick Sanchez on School

I'll tell you how I feel about school, Jerry: it's a waste of time. Bunch of people runnin' around bumpin' into each other, got a guy up front says, '2 + 2,' and the people in the back say, '4.' Then the bell rings and they give you a carton of milk and a piece of paper that says you can go take a dump or somethin'. I mean, it's not a place for smart people, Jerry. I know that's not a popular opinion, but that's my two cents on the issue.

This is one of those shows that you can rewatch a million times and still find something fascinating. The first time you watch it, you are like "Oh! Is that what it meant... I think I get it." The second time you watch it, you are like "Holy crap! This is genius". The third time, you cannot stop laughing.

I think here is what Rick might say about college:
"It is petty capitalism rolled up to brainwash students' minds, Jerry. It'ssss *burp* a make-believe world with make-believe leaders that hand down scribbled notes or something to mass-produce people with alter egos. Not worth ruining a person's prime years, Jerry. Not at all worth it. What is the opposite of Wubba lubba dub dub?"

Sunday, July 8, 2018

Sacred Games : Short Review

Just completed binge-watching the show. It is a great series with amazing plot twists and surprises. It runs as parallel stories of past and present. It is about one man's sins and another man's redemption. You will keep guessing whose sins and whose redemption the plot wants to show you. It tries to play devil's advocate to some measure in the portrayal of Ganesh Gaitonde's character.

Both characters complemented each other, and for some reason skinny, short Nawaz seems completely believable as a mafia head.

I would give it 9/10 for story, 9/10 for acting and 8.5/10 for direction and cinematography. Some of the scenes were thrilling and action-packed. The series will definitely keep you on the edge of your seat. Must watch.

Friday, June 22, 2018

How to use Redis for caching data in Rails

Why is Redis used?

This is a very basic question, and it comes up because, functionally, you can do what Redis does with PostgreSQL. What Redis does is store key-value pairs in memory, and many of its basic operations (such as GET and SET) run in O(1) time (for the time complexity of each command, check the Redis documentation). Caching thus enables retrieval of data at much higher speeds, and Redis is well suited to tasks such as pub/sub and queries that need quicker access.

Redis Configuration 

Initialization - redis/config/ini_redis.rb
redis_host = Rails.application.secrets.redis && Rails.application.secrets.redis['host'] || 'localhost'
redis_port = Rails.application.secrets.redis && Rails.application.secrets.redis['port'] || 6379

# The constant below represents ONE connection, available globally in models, controllers, views etc. for the instance, so there is no need to reconnect every time.
REDIS = Redis.new(host: redis_host, port: redis_port.to_i)

model.rb file - models/user.rb

def online?
You can then use it in views such as views/users/show.html.erb:
@users_ol = User.where(:id => @id).select(&:online?)

Thursday, June 14, 2018

Monday, May 28, 2018

CodeChef Workshop Experience

A little self bragging (About the Author)

I have been involved in, and have enjoyed, building projects for a very long time. As of now I am doing my integrated MSc in Mathematics at NIT Surat (aka SVNIT). I have had very little experience with competitive coding problems, so the workshop had a lot of new things in store for me. It taught us concepts with a problem-solving approach, which I enjoyed thoroughly. My main aim in attending was to improve my coding skills and become a more productive and efficient developer than before.
I heard about this workshop through the CodeChef website and was one of the early-bird registrants. The workshop was well timed and started 4 days after my end-semester exams, so I packed my bags and left for Hyderabad.


The workshop was conducted at MLRIT (stands for something something Institute of Technology). The campus was not widespread or huge but had some really amazing facilities. It had an Olympic-size indoor badminton stadium, a well-maintained cricket ground, a flourishing incubation cell for startups and some pretty amazing eating outlets. The hostels' hygiene was not up to the mark, and apparently the institute took as good care of its mosquitoes and insects as of the students in it. The quality of the food was pretty good considering it was mess food, although a little variety for students coming from the north would have been appreciated. (Yes. I come from the north and I like eating chapatti and using spoons, for god's sake!)

I arrived at MLRIT on 14th May and the workshop kicked off on 15th May.


The main objective of the workshop was to inculcate problem-solving skills using data structures and algorithms. It was a beginner-level workshop which covered Greedy Algorithms, Arrays, Stacks, Queues and Dynamic Programming. We had 7-12 questions discussed on a daily basis, out of which 4-5 were compulsory problems whereas the rest were optional. The questions were from the CodeChef website. As the workshop was targeted at people who wanted to be introduced to programming, the questions were kept fairly simple and basic (and rightly so!).


The attendees of the workshop were a pretty good mix of people. We had a couple of school students and a few professionals, but most of the crowd was college students. My team of 8 also had pretty high diversity and included a pro coder, a born management guy, some sporty folks, some sincere folks and some seriously lazy people. To everyone's surprise (ours too), we ended up coming third as a team in the complete workshop.
The lectures were conducted by Arjun Arul (google his resume, it's pretty dope) and he made sure that we understood what he was saying. He very quickly understood the wrong approaches as well as the correct ones, and took us on a lullaby ride of a wrong approach just to prove it was wrong and show how to arrive at the right one. We also had trainers who helped us during the lab sessions. All of them were 5-star rated CodeChef members and were really helpful, getting us all the way from understanding the logic to debugging our code!

Stuff I need to mention

We also had a lot of games. These included badminton, table tennis, volleyball, dodgeball, chess, scrabble, othello and football. Our team went to the semi-finals of Volleyball, Chess and Scrabble. The points from the games were added to our team score, on which the top 3 teams were declared. Overall, the workshop served as a complete package which had sports, learning, coding and a lot of fun.


The people were easy to make friends with (including the trainers and the lecturer), and the environment was very receptive to differences in ideologies and intellectual capacities. It gave me a good grasp of the basic concepts and broke the inertia which had prevented me from participating and being involved in coding contests. I plan to carry the learning forward by myself now and see where it leads me.

P.S. Will update this post with some pics as soon as I receive them.

Tuesday, February 27, 2018

Mid Semester Exams are arriving

In one of the leading universities across India, there has been a massive revolution. The students of SVNIT (Seriously Vella National Institute of TimePass) have started to laugh at the professors' jokes. When we reached out to the students and asked them why this sudden degradation in sense of humour, the students were just as clueless as if they had been asked about the topics they are studying. "Students are just reacting to the stress; this is temporary and will soon go away," we overheard one watchman telling another. One of the students, Shashank, has already started with the syllabus. When our spokesperson reached out to him, he denied all the allegations and vehemently accused our great education system to show his loyalty to the majority. He then quickly disappeared into the abyss of the library. Another student, Rakesh, has also picked up a book... no, no, not the subject book, but the book "Exam Warriors" by our honourable Prime Minister, which talks about how to reduce stress among students. After reading the book, Rakesh has decided to increase his sleeping quota from 14 hours a day to 16 hours a day, as there is a complete chapter in the book titled "Sleep is your greatest weapon, embrace it". Rakesh also claims that he has been supporting the Prime Minister's campaign of "Exam pe charcha" since last year, but the faculty here had him detained for the same. He says, "Exam pe charcha is one of the best ways to reduce stress during exams, as most students study just a night before exams," but for some reason the authorities have not allowed discussions during the exam as of the time of writing this article. Some of the students have even reduced meme-tagging their friends from 4 a day to just 2 a day.
Inspired by this sudden shift in students' behaviour, the UN has amended its Article 3 of the Geneva Convention, which now states that all prisoners must be treated humanely and not tortured with techniques like induced psychosis, physical torture and one-night exam cramming.

Tuesday, February 20, 2018

Amazon Slay - Available only for Prime members.

Amazon has been creating disruptive innovations to get into our mobile phones and bedrooms, and now they have hit the high road with the launch of their new service - Amazon Slay. Amazon Slay provides a one-day killing service for Prime members, whereas normal users would have to wait longer. It also integrates well with other Amazon services such as Kindle and Echo, and comes bundled with a developer-ready API. A spokesperson with knowledge of the subject told us that "buyers can choose from hanging, gun or getting beaten to death while ordering the service, as compared to just gunshot offered by their competitors". This was clearly a reference to the service Flipkart Bhai, which was started a month ago but has faced very stiff competition from global players. We tried reaching out to people from Flipkart Bhai but they declined to comment on the issue. Amazon claims that they have managed to improve suicide attempts by a whopping 11% and are a one-stop service for anyone planning any type of homicide. Not only this, Amazon offers real-time notifications of the killer and gives an option to kill the neighbour in case the person in question is not available. This service is currently available to Prime members only and will be expanded soon given the unprecedented demand.

My experiments on Optimizing Profits in CryptoCurrencies.

Sometimes you just know that something won't work, and yet somehow you need to watch it fail just to believe that it won't. Some people may call it a foolhardy attempt, but it is this kind of incredulity towards new ideas, held until one has had a good amount of exposure to them, that often brings about disruptive innovation.

Cryptocurrency is the new buzzword around town and everyone wants to get their hands dirty. I decided to give it a shot myself. The idea was simple: get the live feed of a cryptocurrency (using the Binance API) and somehow optimize the algorithm for buying and selling to maximise profits. Fuelled by this idea, I signed up on Binance (an exchange for trading cryptocurrencies) and got on board their API platform.

Approach #1.
I began hitting the API at an interval of 0.5 seconds and recorded the highs and lows. At each point where the slope changed, I decided to create a sandbox transaction. If the slope changes from positive to negative, it is a good time to sell, because the probability that the price will keep following the decreasing pattern is high. Similarly