Posts

Showing posts from September, 2018

Week 12 and 13: 17/9/2018 - 30/9/2018

Anomaly Detection

Session with Satnam Singh sir

All the interns had a two-hour session with Dr. Satnam Singh, Chief Data Scientist at Acalvio Technologies, Bengaluru, India. He discussed various issues related to cyber security and online fraud, and shared domain knowledge on these topics along with his team's work. It was an interactive session, and he also asked about the work the interns were doing. He shared his experience, which benefited the students, and we learned some new approaches and terminology.

Problem statement

Satnam sir shared a Kaggle problem and asked all of us to work on it. It was a credit card fraud detection problem, to be solved as an anomaly detection problem using statistical methods, without libraries such as scikit-learn. Sir gave us ample time to work on it before reviewing our progress and code. So after the session, we started exploring different ways to approach this pr
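To give a feel for what a statistical approach without scikit-learn can look like, here is a minimal sketch. It is only one possible approach and not necessarily the one we ended up using: it assumes each feature is roughly Gaussian and independent of the others, fits a mean and variance per feature, and flags rows whose log-density falls below a threshold. The toy data, the function names and the threshold rule are all made up for illustration.

```python
import math


def fit_gaussian(rows):
    """Estimate a per-feature mean and variance from equal-length rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    variances = [
        sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(dims)
    ]
    return means, variances


def log_density(row, means, variances, eps=1e-9):
    """Sum of per-feature Gaussian log-densities (independence assumed)."""
    total = 0.0
    for x, mu, var in zip(row, means, variances):
        var = var + eps  # guard against a zero variance
        total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total


def is_anomaly(row, means, variances, threshold):
    """A row is flagged when it is less likely than the threshold allows."""
    return log_density(row, means, variances) < threshold


# Hypothetical "normal" transactions: [amount, some scaled feature]
normal = [[10.0, 1.0], [12.0, 1.2], [11.0, 0.9], [9.5, 1.1], [10.5, 1.0]]
means, variances = fit_gaussian(normal)

# Assumed threshold rule: a little below the least likely normal row
threshold = min(log_density(r, means, variances) for r in normal) - 1.0

print(is_anomaly([500.0, 9.0], means, variances, threshold))
print(is_anomaly([10.5, 1.0], means, variances, threshold))
```

On real data the threshold would normally be tuned on a labelled validation set rather than picked by a fixed margin like this.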

Week 11 (10/9/18 - 14/9/18)

After completing my news article recommender last week, this week brought me the opportunity to explore t-SNE, a technique for visualizing high-dimensional spaces. My mentor, Mr. Vikram Jha, asked me to explore it and report my insights.

t-SNE

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets. Before jumping to t-SNE, I already knew an older dimensionality reduction technique, Principal Component Analysis (PCA). I first studied it in the ISB videos, but it only became thoroughly clear when Sarabjot sir explained it.

PCA

Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize.
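To make the idea of "emphasizing variation" concrete, here is a small sketch of PCA in plain Python. It is an illustration under my own assumptions, not code from the course: it centers the data, builds the covariance matrix, and finds the first principal component with power iteration (a stand-in for the eigen-decomposition or SVD that a library would normally use). The toy points and function names are made up.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by cov and renormalize."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical 2-D points lying close to the line y = 2x
points = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]]
centered = center(points)
pc1 = first_component(covariance(centered))

# Project each point onto the first component: 2-D data reduced to 1-D
projected = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print(pc1)
print(projected)
```

Because the points lie almost on a line, the first component points along that line and the one-dimensional projection keeps nearly all of the variation.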

Week 10 (3/9/18 - 8/9/18)

The start of September, that is week 10, brought something interesting: I was introduced to recommendation systems by Dr. Sarabjot Singh Anand, Co-Founder of Sabudh Foundation; Er. Niharika Arora, Data Scientist at Tatras Data; and Er. Gurmukh Singh, Trainee Data Scientist at Tatras Data. Recommender systems are one of the most successful and widespread applications of machine learning technologies in business. You can apply recommender systems in scenarios where many users interact with many items. You can find large-scale recommender systems in retail, video on demand, or music streaming. In order to develop and maintain such systems, a company typically needs a group of expensive data scientists and engineers. That is why even large corporations such as the BBC decided to outsource their recommendation services. Machine learning algorithms in recommender systems are typically classified into two categories, content-based and collaborative filtering methods, although modern recom

Week 9 (27/08/18 - 31/08/18)

ISB Videos

This week we finished ISB's Machine Learning course with the last two lectures, which were on Text Analysis and Mining Graphs. The topics covered were as follows:

Word2vec

Word2vec is a two-layer neural net that processes text. Its input is a text corpus and its output is a set of vectors: feature vectors for words in that corpus. While Word2vec is not a deep neural network, it turns text into a numerical form that deep nets can understand. The purpose and usefulness of Word2vec is to group the vectors of similar words together in vector space. That is, it detects similarities mathematically. Word2vec creates vectors that are distributed numerical representations of word features, such as the context of individual words. It does so without human intervention. There are two types of Word2vec, Skip-gram and Continuous Bag of Words (CBOW). I will briefly describe how these two methods work in the following paragraphs.

Skip-gram

Words are
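As a rough illustration of how the two Word2vec variants differ, here is a small sketch of the training examples each one is built from. It covers only the data preparation step, not the neural network itself, and the function names, window size and sample sentence are my own assumptions: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: the center word is used to predict each neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """(context, center) examples: the neighbors together predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


# Hypothetical sample sentence
sentence = "the quick brown fox".split()
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

A real implementation would then feed these examples to the two-layer network and learn one vector per word.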