Posts

Week 12 and 13: 17/9/2018 - 30/9/2018

Anomaly Detection

Session with Dr. Satnam Singh

All the interns had a two-hour session with Dr. Satnam Singh, Chief Data Scientist at Acalvio Technologies, Bengaluru, India. In the session he discussed various points and issues related to cyber security and online fraud, and he shared some domain knowledge on the related topics and his team's work. It was an interactive session, and he also asked about the work the interns were doing. He shared his experience, which benefited the students, and we learned some new approaches and terminology.

Problem statement

Satnam sir shared a Kaggle problem and asked all of us to work on it. It was a credit card fraud detection problem, to be solved as an anomaly detection problem in a statistical way, without using libraries such as scikit-learn. Sir gave us ample time to work on it before he would review all our progress and code. So after the session, we started exploring different...
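One common statistical approach to this kind of problem (not necessarily the one we submitted) is to fit a Gaussian to each feature on normal transactions and flag points whose density falls below a threshold. A minimal sketch with made-up toy data, using no libraries beyond the standard `math` module:

```python
import math

def fit_gaussian(values):
    """Estimate mean and variance of one feature from normal (non-fraud) data."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var

def gaussian_pdf(x, mu, var):
    """Probability density of x under N(mu, var)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def anomaly_score(row, params):
    """Product of per-feature densities; low values flag anomalies."""
    p = 1.0
    for x, (mu, var) in zip(row, params):
        p *= gaussian_pdf(x, mu, var)
    return p

# Toy data: two features per transaction, clustered near (0, 0)
normal = [(0.1, -0.2), (0.0, 0.1), (-0.1, 0.0), (0.2, 0.2), (-0.2, -0.1)]
params = [fit_gaussian([r[i] for r in normal]) for i in range(2)]

epsilon = 1e-3  # in practice, tune this threshold on a validation set
print(anomaly_score((0.0, 0.0), params) > epsilon)   # typical point: True
print(anomaly_score((5.0, -4.0), params) > epsilon)  # far-off point: False
```

The per-feature independence assumption is crude but works surprisingly well as a baseline for fraud-style data.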

Week 11 (10/8/18 - 14/8/18)

After completing my news article recommender last week, this week brought me the opportunity to explore t-SNE, a technique for visualizing high-dimensional data. My mentor Mr. Vikram Jha asked me to explore it and share my insights.

t-SNE

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited to visualizing high-dimensional datasets.

Before jumping to t-SNE, I already knew about an older dimensionality reduction technique, PCA (Principal Component Analysis). I first studied it in the ISB videos, but my understanding only became thorough when Sarabjot sir explained it.

PCA

Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize.
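PCA can be sketched in a few lines of NumPy: centre the data, take the eigenvectors of the covariance matrix, and project onto the top components. This is a minimal illustration with synthetic data, not the library implementation:

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                 # centre each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # largest variance first
    components = eigvecs[:, order[:k]]
    return Xc @ components

# Synthetic 3-D points that mostly vary along a single direction
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t, 0.5 * t]) + 0.01 * rng.normal(size=(100, 3))

reduced = pca(X, 2)
print(reduced.shape)  # (100, 2)
```

Since the data varies mainly along one line, almost all the variance ends up in the first projected column; t-SNE, by contrast, is non-linear and optimises neighbour probabilities rather than variance.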

Week 10 (3/8/18 - 8/8/18)

The start of September, that is week 10, brought something interesting: I was introduced to recommendation systems by Dr. Sarabjot Singh Anand, Co-Founder of Sabudh Foundation, Er. Niharika Arora, Data Scientist at Tatras Data, and Er. Gurmukh Singh, Trainee Data Scientist at Tatras Data. Recommender systems are one of the most successful and widespread applications of machine learning technologies in business. You can apply recommender systems in scenarios where many users interact with many items, and you can find large-scale recommender systems in retail, video on demand, and music streaming. To develop and maintain such systems, a company typically needs a group of expensive data scientists and engineers, which is why even large corporations such as the BBC decided to outsource their recommendation services. Machine learning algorithms in recommender systems are typically classified into two categories, content based and collaborative filtering methods al...
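Of the two categories, user-based collaborative filtering is the easier one to sketch: score an unseen item for a user by averaging other users' ratings, weighted by how similar those users are. The rating matrix and names below are made up purely for illustration:

```python
import math

# Hypothetical user -> {item: rating} matrix
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None

print(round(predict("alice", "D"), 2))  # 3.18
```

Alice never rated item D, so her predicted score leans towards Bob's rating of 4, because her rating pattern is much closer to Bob's than to Carol's. Content-based methods would instead compare item features rather than user rating patterns.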

Week 9 (27/08/18 - 31/08/18)

ISB Videos

This week we ended our ISB Machine Learning course with the last two lectures, which were on Text Analysis and Mining Graphs. The topics covered were as follows:

Word2vec

Word2vec is a two-layer neural net that processes text. Its input is a text corpus and its output is a set of vectors: feature vectors for the words in that corpus. While Word2vec is not a deep neural network, it turns text into a numerical form that deep nets can understand. The purpose and usefulness of Word2vec is to group the vectors of similar words together in vector space; that is, it detects similarities mathematically. Word2vec creates vectors that are distributed numerical representations of word features, such as the context of individual words, and it does so without human intervention. There are two types of Word2vec: Skip-gram and Continuous Bag of Words (CBOW). I will briefly describe how these two methods work in the following paragraphs.

Skip-gram

Words are ...
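The key preprocessing step both variants share is sliding a context window over the corpus. For skip-gram, each word becomes a target that predicts its neighbours, so training examples are (target, context) pairs. A small sketch of that pair generation (the training of the network itself is omitted):

```python
def skipgram_pairs(tokens, window=2):
    """Generate the (target, context) pairs a skip-gram model trains on."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps".split()
for target, context in skipgram_pairs(sentence, window=1):
    print(target, "->", context)
```

CBOW simply reverses the direction: the surrounding context words jointly predict the target word in the middle.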

Week 8 ( 20/08/2018 - 24/08/2018)

Week 7 (13/8/18 - 17/8/18)

DAY 1

Today, we first started with ISB video lecture no. 3, which was about "Bayesian Learning". In this lecture we learnt about different distributions (Bernoulli, categorical and continuous probability densities). Next we learnt about joint probability distributions and marginalisation, explained using the concept of generative and discriminative models. After the break, Vikram sir gave us an overview of feature engineering. Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. If feature engineering is done correctly, it increases the predictive power of machine learning algorithms by creating features from raw data that help facilitate the machine learning process. After that, sir discussed the problems that...
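To make "creating features from raw data" concrete, here is a small illustrative sketch; the record fields and derived features are entirely hypothetical, just the kind of transformation feature engineering involves:

```python
import math
from datetime import datetime

def engineer_features(raw):
    """Derive model-friendly features from a raw record (hypothetical fields)."""
    ts = datetime.fromisoformat(raw["timestamp"])
    return {
        "hour": ts.hour,                          # time-of-day pattern
        "is_weekend": ts.weekday() >= 5,          # weekday vs weekend behaviour
        "log_amount": math.log1p(raw["amount"]),  # compress a skewed amount
        "desc_len": len(raw["description"]),      # crude text-length feature
    }

raw = {"timestamp": "2018-08-18 14:30:00", "amount": 2500.0,
       "description": "online purchase"}
print(engineer_features(raw))
```

None of these derived columns exist in the raw record; each encodes a piece of domain knowledge (spending is skewed, behaviour differs on weekends) in a form an algorithm can use.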

Week 6 (6/8/2018 - 10/8/2018)

ISB Videos

Continuing our ISB course on Unsupervised Learning, we completed 2 videos this week, whose topics are as follows:

Introduction to Bayesian Learning

Bayes' Theorem: Bayes' theorem describes how the conditional probability of an event or a hypothesis can be computed using evidence and prior knowledge. Bayes' theorem is given by:

P(θ | X) = P(X | θ) P(θ) / P(X)

P(θ) - Prior probability is the probability of the hypothesis θ being true before applying Bayes' theorem. The prior represents the beliefs we have gained through past experience, which refers to either common sense or the outcome of Bayes' theorem for some past observations.

P(X | θ) - Likelihood is the conditional probability of the evidence given a hypothesis.

P(X) - Evidence term denotes the probability of the evidence or data.

Types of distributions: Binomial distribution ...
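The theorem above is easy to check numerically. A small worked example with illustrative numbers (a test with 90% sensitivity and 95% specificity for a condition with a 1% prior), expanding the evidence P(X) over θ and not-θ:

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(theta | X) via Bayes' theorem.

    The evidence P(X) is expanded by marginalisation:
    P(X) = P(X | theta) P(theta) + P(X | not theta) P(not theta)
    """
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

p = posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, the posterior is only about 15%, because the 1% prior dominates; this is exactly the kind of intuition Bayes' theorem formalises.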