Week 7 (13/8/18 - 17/8/18)

- August 17, 2018

DAY 1

Today we started with ISB video lecture no. 3, which was about "Bayesian Learning". In this lecture we learnt about different distributions (Bernoulli, categorical and continuous probability densities). Next we learnt about joint probability distributions and marginalisation, which were explained using the concepts of generative and discriminative models.
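Marginalisation is easiest to see on a small joint table. Here is a minimal sketch, assuming a made-up joint distribution over two variables (weather, and whether a train is late). The numbers and names are hypothetical and not taken from the lecture.

```python
# Hypothetical joint distribution P(weather, status); the values are made up
# for illustration and sum to 1.
joint = {
    ("rain", "late"): 0.20,
    ("rain", "on_time"): 0.10,
    ("sun", "late"): 0.10,
    ("sun", "on_time"): 0.60,
}

def marginal(joint, axis):
    """Sum the joint over the other variable to get the marginal on `axis`."""
    result = {}
    for outcome, p in joint.items():
        key = outcome[axis]
        result[key] = result.get(key, 0.0) + p
    return result

p_weather = marginal(joint, 0)  # P(weather)
p_status = marginal(joint, 1)   # P(status)

# Conditional P(late | rain) = P(rain, late) / P(rain)
p_late_given_rain = joint[("rain", "late")] / p_weather["rain"]
```

A generative model learns the whole joint table, while a discriminative model learns only a conditional such as P(status | weather).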

After the break, Vikram sir gave us an overview of "Feature Engineering". Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Done correctly, it increases the predictive power of machine learning algorithms by creating features from raw data that help the learning process. After that, sir discussed the problems we were facing in EDA.
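To make the idea of "creating features from raw data" concrete, here is a small sketch. The record layout (timestamp, total_price, n_items) is a hypothetical schema chosen for illustration, not the dataset we used.

```python
from datetime import datetime

def engineer_features(raw):
    """Turn a raw order record into model-ready features (hypothetical schema)."""
    ts = datetime.strptime(raw["timestamp"], "%Y-%m-%d %H:%M:%S")
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),           # Monday = 0
        "is_weekend": int(ts.weekday() >= 5),  # Saturday or Sunday
        "price_per_item": raw["total_price"] / raw["n_items"],
    }

features = engineer_features(
    {"timestamp": "2018-08-18 21:30:00", "total_price": 450.0, "n_items": 3}
)
```

The raw timestamp string is of little use to most models, but the hour and weekend flag derived from it encode domain knowledge about when people shop.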

DAY 2

This morning we first had a session with Vikram sir, who guided us further in feature engineering. He discussed feature selection: we train a model on a subset of the features and store the result, then repeat this with different subsets and compare them. This is known as sequential feature selection. He also gave an overview of Lasso regression, which is a form of selection by modelling (its L1 penalty shrinks the coefficients of unhelpful features to exactly zero).
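The train, store and compare loop can be sketched as a greedy forward search. This is only an illustration on synthetic data, assuming scikit-learn and NumPy are installed. The function name forward_selection and the data are our own, not something from the session.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration): only columns 0 and 2 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

def forward_selection(X, y, n_features):
    """Greedily add the feature that gives the best cross-validated R^2."""
    selected, history = [], []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            # Train on this subset and store its mean cross-validation score
            scores[j] = cross_val_score(
                LinearRegression(), X[:, cols], y, cv=5
            ).mean()
        best = max(scores, key=scores.get)  # compare the stored results
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), scores[best]))
    return selected, history

selected, history = forward_selection(X, y, n_features=2)
```

Each entry of history holds the chosen subset and its score, so the subsets can be compared afterwards.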

Next, we continued with the 3rd ISB video. We learnt about probabilistic independence, prior and posterior probabilities, Bayes' rule with an example, and Bayesian belief networks. Samples and estimation were then illustrated using examples of binary and continuous observations.
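Bayes' rule turns a prior into a posterior once evidence is observed. Here is a minimal sketch with hypothetical numbers (a 1% prior, a 90% true positive rate and a 5% false positive rate). These values are ours and not the lecture's example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' rule for a binary hypothesis H and evidence E."""
    # P(E), obtained by marginalising over H and not-H
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers chosen only for illustration
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
```

Even with a fairly accurate test, the posterior stays well below one half because the prior is so small.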

DAY 3

Today Vikram sir gave us a hands-on session on "RFE", i.e. Recursive Feature Elimination. RFE uses a model (e.g. linear regression or SVM) to rank the features, removes the weakest one, and refits on those that remain. The whole process is then iterated until all features in the dataset have been ranked (or a user-defined limit is reached). Sklearn provides an RFE class in sklearn.feature_selection, and we use it along with a simple linear regression model for our ranking search as follows:

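from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Construct our linear regression model.
# normalize=True was removed in scikit-learn 1.2; scale X beforehand if needed.
lr = LinearRegression()
lr.fit(X, Y)
# Stop the search when only the last feature is left
rfe = RFE(lr, n_features_to_select=1, verbose=3)
rfe.fit(X, Y)
# ranks, colnames and the ranking() helper are assumed to be defined earlier in the notebook
ranks["RFE"] = ranking(list(map(float, rfe.ranking_)), colnames, order=-1)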

After that we practised coding the same, then continued with the 3rd ISB lecture and finished it with the model for cascading, along with some probability-based examples.

DAY 4

Today Vikram sir introduced us to a new topic, "Featuretools". Featuretools is a Python library for automated feature engineering. It uses Deep Feature Synthesis (DFS) to automatically create a single table of features for any "target entity". Sir explained the concept with an example that uses a multi-table dataset of 3 million online grocery orders from Instacart to predict what a customer will buy next. The example shows how to generate features with automated feature engineering and build an accurate machine learning pipeline with Featuretools, which can be reused for multiple prediction problems.

Next, we started the next ISB video, on "Clustering". The lecture began with a description of our first project, the News Recommender. It is a cold-start project, which means we have to generate even the data for first-time users of the news recommender app. After that, sir explained two basic approaches to clustering: the partitioning method (k-means) and the density-based approach (DBSCAN).
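The difference between the two approaches shows up clearly when the data contains an outlier. Here is a rough sketch on synthetic data, assuming scikit-learn and NumPy. The blobs, the outlier and the eps and min_samples values are our own choices for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Synthetic data (an assumption): two tight blobs plus one far-away outlier.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
outlier = np.array([[20.0, -20.0]])
X = np.vstack([blob_a, blob_b, outlier])

# Partitioning: every point, including the outlier, is forced into one of k clusters.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Density-based: points in sparse regions are labelled -1 (noise).
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
```

k-means needs the number of clusters up front, while DBSCAN discovers it from the density and can leave isolated points unassigned.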

DAY 5

Yesterday Vikram sir told us to do some R&D on hypertools. So before the session we all explored hypertools, and we found the following points:

- Hypertools is a Python library that reduces high-dimensional data and plots it
- It uses PCA at its core and is built on top of libraries like seaborn, scikit-learn and matplotlib
- The visualizations it produces are intuitive and amazing
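The reduction step at the core of this can be sketched with scikit-learn's PCA directly. This is a stand-in for illustration and not hypertools' own API. The 10-dimensional data below is synthetic and built from 3 hidden factors.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic 10-dimensional data (an assumption) that really lives on 3 latent factors
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
data = latent @ mixing + rng.normal(scale=0.01, size=(300, 10))

# Project down to 3 dimensions, which is what makes a 3-D scatter plot possible
pca = PCA(n_components=3)
reduced = pca.fit_transform(data)

# Fraction of the total variance kept by the 3 components
explained = pca.explained_variance_ratio_.sum()
```

Hypertools wraps this kind of reduction together with the plotting, so the 3-D coordinates in reduced are roughly what ends up on screen.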

After that we had a call with Vikram sir. He explained hypertools thoroughly and told us to practise it on a large, complex dataset.

After half-time, we continued with the clustering lecture. Today we learnt some more approaches, such as the hierarchical and model-based approaches. Gurmukh sir had already discussed the k-means clustering algorithm, so it was easy to follow now.
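Both of these approaches are available in scikit-learn. Here is a minimal sketch on two synthetic blobs. The choice of Ward linkage and of a Gaussian mixture as the model-based method is our assumption and may differ from the lecture.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption): two well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2)),
])

# Hierarchical: merge the closest clusters bottom-up until 2 remain
hier_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

# Model-based: fit a 2-component Gaussian mixture to the data
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
gmm_labels = gmm.predict(X)
gmm_probs = gmm.predict_proba(X)  # soft memberships, one row per point
```

Unlike k-means, the mixture model returns a probability of membership in each cluster instead of only a hard label.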
