Posts

Showing posts from July, 2018

Week 4 (23/07/18 - 27/07/18)

As part of the Udemy course's assignment we implemented a Blackjack card game using objects and classes in Python.

Blackjack Game Play

To play a hand of Blackjack, the following steps must be followed:
1. Create a deck of 52 cards
2. Shuffle the deck
3. Ask the Player for their bet
4. Make sure that the Player's bet does not exceed their available chips
5. Deal two cards to the Dealer and two cards to the Player
6. Show only one of the Dealer's cards; the other remains hidden
7. Show both of the Player's cards
8. Ask the Player if they wish to Hit, and take another card
9. If the Player's hand doesn't Bust (go over 21), ask if they'd like to Hit again
10. If the Player Stands, play the Dealer's hand; the Dealer will always Hit until the Dealer's value meets or exceeds 17
11. Determine the winner and adjust the Player's chips accordingly
12. Ask the Player if they'd like to play again

Implementing it gave us a clearer perspective on classes and objects.
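The first two steps above (building and shuffling a 52-card deck) can be sketched with small classes. The names below (Card, Deck, SUITS, RANKS, VALUES) are illustrative, not necessarily the ones used in the course assignment:

```python
import random

SUITS = ("Hearts", "Diamonds", "Spades", "Clubs")
RANKS = ("Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
         "Nine", "Ten", "Jack", "Queen", "King", "Ace")
VALUES = {"Two": 2, "Three": 3, "Four": 4, "Five": 5, "Six": 6, "Seven": 7,
          "Eight": 8, "Nine": 9, "Ten": 10, "Jack": 10, "Queen": 10,
          "King": 10, "Ace": 11}

class Card:
    def __init__(self, suit, rank):
        self.suit = suit
        self.rank = rank
        self.value = VALUES[rank]

    def __str__(self):
        return f"{self.rank} of {self.suit}"

class Deck:
    def __init__(self):
        # step 1: create a deck of 52 cards
        self.cards = [Card(suit, rank) for suit in SUITS for rank in RANKS]

    def shuffle(self):
        # step 2: shuffle the deck in place
        random.shuffle(self.cards)

    def deal(self):
        # remove and return the top card
        return self.cards.pop()

deck = Deck()
deck.shuffle()
print(len(deck.cards))  # 52
```

Dealing (step 5) then just calls `deck.deal()` twice each for the Dealer and the Player.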

Week 3 (16/07/18 - 20/07/18)

In week 3, I was introduced to a new algorithm in unsupervised learning: K-means. I watched the lecture video of the ISB course, understood it, discussed it with my peers and teacher, and then implemented it in Python using Jupyter Notebook. Later I used it to find results in other applications, such as clustering news articles in a recommendation system and with a tf-idf vectorizer. To understand it better I implemented it in two ways: using a for-loop and without a for-loop. Along with this I was introduced to visualization, which I will be covering in my next blog.

K-means

K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.
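A minimal sketch of the two ways mentioned above, applied to the assignment step of K-means: once with a plain for-loop over points, and once vectorized with NumPy broadcasting. The function and variable names are illustrative, not the course's:

```python
import numpy as np

def assign_loop(X, centroids):
    # for each point, find the index of its nearest centroid
    labels = []
    for x in X:
        dists = [np.sum((x - c) ** 2) for c in centroids]
        labels.append(int(np.argmin(dists)))
    return np.array(labels)

def assign_vectorized(X, centroids):
    # (n,1,d) - (1,k,d) broadcasts to (n,k,d); summing over d
    # gives all n*k squared distances in one shot, no Python loop
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # initialize centroids as k distinct random points from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = assign_vectorized(X, centroids)
        # update step: each centroid moves to the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Both assignment functions return the same labels; the vectorized one is much faster for large n because the distance computation stays inside NumPy.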

Week 3 continued

The third week also gave me a chance to learn about visualisation and storytelling. I was introduced to practical visualisation by Dr. Sawinder Pal Kaur, Data Science Expert, SAP Labs India, in one of her sessions, where she gave a code walkthrough on bank loan defaulter detection. The code was implemented in Python using libraries such as seaborn, matplotlib and pandas, and it was explained very lucidly by ma'am. From it I came to know various functionalities I can use to draw results from my visualisations. Visualisation and storytelling is one of the most important parts of data science, and it comes before model training and feature engineering. After that we were given an assignment to choose our own dataset and practise visualisation. I really enjoyed doing visualisation and found some cool insights.
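The kind of insight such a walkthrough surfaces usually starts with a simple aggregation, which is then handed to seaborn or matplotlib to plot. A stdlib-only sketch of that aggregation step; the column names and records here are entirely made up, not the session's dataset:

```python
from collections import defaultdict

# hypothetical loan records: (income_bracket, defaulted) pairs for illustration
loans = [
    ("low", True), ("low", False), ("low", True),
    ("mid", False), ("mid", False), ("mid", True),
    ("high", False), ("high", False), ("high", False),
]

def default_rate_by_group(records):
    # count totals and defaults per group, then divide
    totals, defaults = defaultdict(int), defaultdict(int)
    for group, defaulted in records:
        totals[group] += 1
        defaults[group] += int(defaulted)
    return {g: defaults[g] / totals[g] for g in totals}

rates = default_rate_by_group(loans)
print(rates)
```

A seaborn barplot of such per-group rates is one typical way a defaulter-detection walkthrough turns numbers into a story.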

Week 2 (09/07/18 - 13/07/18)

Day 1

In today's Udemy course we learned about the following topics:
- Methods and functions
- *args and **kwargs
- Lambda expressions, maps and filters
Along with this we solved a few coding exercises related to these topics and decided to work on the Udemy course Milestone Project 1.

Day 2

Today we first discussed the milestone project and sorted out the problems related to it. After that we continued with our Python course and learnt about object-oriented programming in Python, which includes the following topics:
- Class object attributes and methods
- Inheritance and polymorphism
- Special (Magic/Dunder) methods
We also solved some homework exercises related to it.

Day 3

Today we started with the scikit-learn implementation of the linear regression model and compared its results with our previously implemented code of linear regression.
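The Day 1 topics above can be illustrated in a few lines; the function names here are made up for the example:

```python
# *args collects extra positional arguments into a tuple,
# **kwargs collects extra keyword arguments into a dict
def summarize(*args, **kwargs):
    total = sum(args)
    return total, kwargs

total, opts = summarize(1, 2, 3, verbose=True)

# lambda expressions used with map and filter
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))
evens = list(filter(lambda x: x % 2 == 0, [1, 2, 3, 4]))

print(total, squares, evens)  # 6 [1, 4, 9, 16] [2, 4]
```

`map` applies the lambda to every element, while `filter` keeps only the elements for which the lambda returns a truthy value.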

Day 5 - Linear Regression Numpy Code and Python

Python

In the course today we learned about the following concepts:
- Lists
- Dictionaries
- Tuples
- Sets
- Booleans
- Dealing with files in Python
- Iterating over a file

Linear Regression Numpy Code

The gradient-descent code was completed. It continues from the Day 4 data-generation code, which defines x_data, x_inputs, y_true and step; beta_zero is the initial guess for the coefficients:

import math

beta = beta_zero  # e.g. beta_zero = np.zeros((x_data.shape[1], 1))
rmse = -1
for i in range(10000):
    old_rmse = rmse
    y_hatnew = x_data.dot(beta)
    y_diff = y_true.reshape(len(x_inputs), 1) - y_hatnew
    rmse = math.sqrt(y_diff.T.dot(y_diff) / x_data.shape[0])
    print(i, ":", rmse)
    # stop once the RMSE no longer changes
    if abs(rmse - old_rmse) < 1e-12:
        break
    derivative = 2 * y_diff.T.dot(x_data) / x_data.shape[0]
    beta = beta + step * derivative.T
print(beta)

The next task given to us was to implement the sklearn function for linear regression and to compare the results of our version with the sklearn function.
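That comparison can be sketched as follows. This assumes scikit-learn is installed and regenerates data in the style of Day 4 with a seed of my choosing, so the exact numbers differ from ours:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# regenerate data in the style of Day 4: random X and betas, plus noise
rng = np.random.default_rng(0)
n, p = 1000, 2
X = rng.random((n, p))
betas = rng.random((p + 1, 1))  # true intercept followed by coefficients
y = betas[0] + X.dot(betas[1:]) + 0.1 * rng.standard_normal((n, 1))

# sklearn fits the intercept separately, so X is passed without the ones column
model = LinearRegression()
model.fit(X, y)
print(model.intercept_, model.coef_)  # should land close to the true betas
```

With enough samples and small noise, both the hand-written gradient descent and sklearn's closed-form fit should recover nearly the same coefficients.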

Day 4 - Linear Regression Numpy code and Python course

Linear Regression Numpy code

We finished coding the data generation for the NumPy version of linear regression. We didn't use data from an Excel sheet or a Kaggle dataset, so we had to create our own. For this, we created random data for our X and betas. Then we created noise, as real data always has noise, and used it to create the Y data. The code was as follows:

import numpy as np

samplesize = 1000
num_attrs = 3
step = 0.1
x_inputs = np.random.rand(samplesize, num_attrs - 1)
x0 = np.ones((samplesize, 1))
x_data = np.concatenate((x0, x_inputs), axis=1)
noise = np.random.randn(len(x_inputs), 1)
betas = np.random.rand(num_attrs, 1)
y_true = x_data.dot(betas) + noise  # understand this
y_true.reshape(1000, 1)

Python course

We started a Udemy course on Python. The concepts we covered today were:
- Pros and cons of dynamic typing
- String indexing and slicing
- Various string methods
- String interpolation: a) format()

Day 3 - Introduction to Logistic regression

Introduction to Logistic Regression

While we continued to write the NumPy code for linear regression, we were introduced to logistic regression. It is a statistical method for analysing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (one with only two possible values). It is used to predict a binary outcome (1/0, Yes/No, True/False) given a set of independent variables. To represent the binary/categorical outcome, we use dummy variables. You can also think of logistic regression as a special case of linear regression where the outcome variable is categorical and we use the log of odds as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function. This required us to understand the sigmoid function. In order to map predicted values to probabilities, we use the sigmoid function, which maps any real value into the range (0, 1).
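The sigmoid is the standard formula 1 / (1 + e^-z); a quick stdlib sketch of how it turns a linear score into a probability (the coefficients below are made-up examples):

```python
import math

def sigmoid(z):
    # maps any real z into (0, 1); z = 0 gives exactly 0.5
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, b0, b1):
    # a linear score b0 + b1*x pushed through the sigmoid becomes a probability
    return sigmoid(b0 + b1 * x)

print(sigmoid(0))  # 0.5
print(predict_proba(2.0, -1.0, 1.5))  # probability of the positive class
```

Large positive scores map close to 1 and large negative scores close to 0, which is exactly what makes the sigmoid suitable for binary outcomes.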

Day 2 - Introduction to Linear Regression

Introduction to Linear Regression

Simple linear regression is useful for finding the relationship between two continuous variables: one is the predictor or independent variable and the other is the response or dependent variable. It looks for a statistical relationship, not a deterministic one. The relationship between two variables is deterministic if one variable can be accurately expressed by the other; for example, using a temperature in degrees Celsius it is possible to accurately compute the temperature in Fahrenheit. A statistical relationship is not exact; consider, for example, the relationship between height and weight. With simple linear regression we want to model our data as follows:

y = B0 + B1 * x

This is a line where y is the output variable we want to predict, x is the input variable we know, and B0 and B1 are the coefficients we need to estimate. It also required us to understand the concept of gradient descent.
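A minimal stdlib sketch of estimating B0 and B1 by gradient descent on the mean squared error; the data, learning rate and iteration count here are my own choices for illustration:

```python
# toy data generated from y = 2 + 3*x (no noise, so the fit should recover B0=2, B1=3)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * x for x in xs]

b0, b1 = 0.0, 0.0
lr = 0.05  # learning rate, small enough to converge on this data
n = len(xs)
for _ in range(5000):
    # gradients of the mean squared error with respect to b0 and b1
    errs = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
    g0 = 2 * sum(errs) / n
    g1 = 2 * sum(e * x for e, x in zip(errs, xs)) / n
    # move both coefficients a small step against the gradient
    b0 -= lr * g0
    b1 -= lr * g1

print(round(b0, 3), round(b1, 3))  # 2.0 3.0
```

Each iteration nudges the line slightly toward the data; with noiseless data the estimates converge to the true coefficients.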

Day 1 - Numpy Revision

Numpy Revision

We made a Python program for the following random walk question:

Say you are standing at the bottom of a staircase with a dice. With each throw of the dice, you either move down one step (if you get a 1 or 2) or move up one step (if you get a 3, 4 or 5). If you throw a 6, you throw the dice again and move up the staircase by the number you get on that second throw. Note that if you are at the base of the staircase you cannot move down! What is the probability that you will reach more than 60 steps after 250 throws of the dice?

This question required knowledge of np.random.seed, which seeds the generator and makes the random numbers predictable: when the seed is reset, the same numbers appear every time. If a seed is not assigned, NumPy automatically selects one based on the system's random number generator device or on the clock. The code began as follows:

import numpy as np
np.random.seed(1
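The post is cut off at this point; a complete sketch of the simulation under the rules above follows. The seed and the number of simulated walks are my own choices, so the estimated probability need not match the original program's output:

```python
import numpy as np

np.random.seed(123)  # illustrative seed, not necessarily the one used in the post

def final_step(n_throws=250):
    step = 0
    for _ in range(n_throws):
        dice = np.random.randint(1, 7)  # one throw: integer from 1 to 6
        if dice <= 2:
            step = max(0, step - 1)  # move down, but never below the base
        elif dice <= 5:
            step += 1  # move up one step
        else:
            step += np.random.randint(1, 7)  # throw again, move up by that amount
    return step

# estimate P(final step > 60) over many simulated walks
walks = np.array([final_step() for _ in range(500)])
print((walks > 60).mean())
```

Since the expected gain per throw is positive (about 0.75 steps), over 250 throws the walk almost always ends well above 60, so the estimated probability comes out close to 1.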