Scikit-learn is a popular Python machine learning library. In this tutorial, I’ll give an introduction to the core concepts of machine learning, using scikit-learn to demonstrate applications of these concepts on real-world datasets. We’ll cover some of the most powerful and popular supervised and unsupervised learning techniques, including classification and regression models like Support Vector Machines and Random Forests, clustering models like K-Means and Gaussian Mixtures, and dimensionality reduction models like PCA and manifold learning. Throughout, I’ll emphasize the key features of the scikit-learn API, so that participants will be well-poised to begin exploring their own datasets using the wide array of algorithms implemented in scikit-learn.
Jake Vanderplas is an NSF Postdoctoral Fellow working jointly in the Computer Science and Astronomy departments at the University of Washington. His research involves large-scale machine learning applications within astronomy and astrophysics. He is a maintainer of the Python packages scikit-learn and SciPy, and regularly contributes to several other packages within the NumPy/SciPy ecosystem. He occasionally blogs about Python-related topics at Pythonic Perambulations - jakevdp.github.com.
What is PyData?
PyData.org is the home for all things related to the use of Python in data management and analysis. This site aims to make open source data science tools easily accessible by listing the links in one location. If you would like to submit a download link or any items to be listed in PyData News, please let us know at: firstname.lastname@example.org
PyData conferences are a gathering of users and developers of data analysis tools in Python. The goals are to provide Python enthusiasts a place to share ideas and learn from each other about how best to apply the language and tools to ever-evolving challenges in the vast realm of data management, processing, analytics, and visualization.
We aim to be an accessible, community-driven conference, with tutorials for novices, advanced topical workshops for practitioners, and opportunities for package developers and users to meet in person.
A major goal of PyData events and conferences is to provide a venue for users across all the various domains of data analysis to share their experiences and their techniques, as well as highlight the triumphs and potential pitfalls of using Python for certain kinds of problems.
PyData is organized by NumFOCUS with the generous help and support of our sponsors. Proceeds from PyData are donated to NumFOCUS and used for the continued development of the open-source tools used by data scientists. If you would like to volunteer to be part of the PyData team, contact us at: email@example.com
Nakul Verma (Janelia Farm Research Campus, HHMI)
A tutorial on metric learning with some recent advances.
The goal of metric learning is to learn a notion of distance in the representation space that yields good prediction performance on data. In this tutorial we explore some classic ways to efficiently find good metrics. Starting from the basics, we’ll cover classic techniques like Large Margin Nearest Neighbor (LMNN) and Information Theoretic Metric Learning (ITML) and discuss the key principles that make these techniques effective. We will also study some extensions and see how metric learning has helped in ranking problems (information retrieval) and large-scale classification.
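To make "learning a notion of distance" concrete, here is a minimal sketch, assuming the common Mahalanobis-style family used by methods such as LMNN and ITML: the distance is d_L(x, y) = ||L(x - y)||, equivalently sqrt((x - y)^T M (x - y)) with M = L^T L. The matrix `learned` below is hand-picked for illustration, not the output of LMNN or ITML; the function name `mahalanobis_distance` is likewise just an illustrative helper. The point is that changing L can change which point counts as the nearest neighbor.

```python
import math


def mahalanobis_distance(x, y, L):
    """Return ||L (x - y)||_2, i.e. sqrt((x - y)^T M (x - y)) with M = L^T L."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    # Apply the linear map L (given as a list of rows) to the difference vector.
    projected = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in projected))


# L = identity recovers the ordinary Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical "learned" map: stretches feature 0 and ignores feature 1.
learned = [[2.0, 0.0], [0.0, 0.0]]

a, b, c = [0.0, 0.0], [0.0, 3.0], [1.0, 0.0]
print(mahalanobis_distance(a, b, identity))  # 3.0
print(mahalanobis_distance(a, c, identity))  # 1.0 -> c is a's nearest neighbor
print(mahalanobis_distance(a, b, learned))   # 0.0 -> now b is a's nearest neighbor
print(mahalanobis_distance(a, c, learned))   # 2.0
```

Actual metric learning algorithms choose M (or L) by optimizing an objective over labeled data, for example pulling same-class neighbors together and pushing differently labeled points apart; this sketch only shows the distance family being searched over.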
Dr. Nakul Verma is a research specialist at Janelia Farm Research Campus, a center for conducting fundamental research in basic sciences, where he is developing novel statistical techniques to help biologists quantitatively analyze behavioral phenotypes in model organisms and better understand the underlying neuroscience and genetic principles. His interests include high dimensional data analysis and exploiting intrinsic structure in data to design effective learning algorithms. Previously Dr. Verma worked at Amazon as a research scientist developing risk assessment models for real-time fraud detection. Dr. Verma received his PhD in Computer Science from UC San Diego specializing in Machine Learning.
-Flurry for hosting
-Tommy Chheng for recording
Main Talk: Neural Networks for Machine Perception
Speaker: Ilya Sutskever (Google)
Neural Networks are computational learning models that are loosely based on real neurons. They can learn to perform various tasks by iteratively adjusting their connections. Recently, Neural Networks have enjoyed considerable success in speech recognition and visual object recognition. In this introductory talk, I will explain how neural networks learn and why they succeed, then describe how they’ve been used to achieve true state-of-the-art results on speech and visual object recognition.
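As a rough illustration of "learning by iteratively adjusting connections", the sketch below trains a single sigmoid neuron (logistic regression) on the logical OR function with plain gradient descent. This is a deliberately tiny stand-in, not the deep networks discussed in the talk; the helper names, learning rate, and epoch count are assumptions chosen only for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=2000, lr=0.5):
    """Fit weights w and bias b of one sigmoid unit by per-sample gradient descent.

    Uses the cross-entropy loss, for which dLoss/dz = output - target.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = out - target
            # Adjust each "connection" (weight) against the gradient.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Threshold the neuron's output at 0.5."""
    return round(sigmoid(w[0] * x[0] + w[1] * x[1] + b))


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_neuron(OR_DATA)
for x, target in OR_DATA:
    print(x, "target:", target, "predicted:", predict(w, b, x))
```

Deep networks stack many such units and compute the same kind of gradient for every weight via backpropagation, but the update rule per weight has this same "nudge against the error gradient" shape.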
Lightning Talk: Data Science at Flurry
Speaker: Soups Ranjan (Flurry)
Flurry provides mobile app analytics and mobile advertising products to app developers. In this talk I will provide insights into how our Data Science team applies machine learning to a variety of problems, including Ad Revenue Optimization, Real-Time Bidding (RTB) strategy to purchase ad inventory programmatically, and Recommender Systems.
-Flurry for hosting!
-CayMay Education for recording the event.
Speaker: Paco Nathan
We compare and contrast several open source frameworks that have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Python libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis identifies several “best of breed” features that would be valuable to see across many frameworks, leading up to a “scorecard” to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
Paco Nathan is a “player/coach” who has led innovative Data teams building large-scale apps for 10+ years, and has worked as an OSS evangelist for the past 2+ years. He is an expert in distributed systems, machine learning, cloud computing, and functional programming, with a focus on Enterprise data workflows. Paco is an O’Reilly author, and an advisor for several firms including The Data Guild and Zettacap. Paco received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 30+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.
Special thanks to The Climate Corporation for hosting the event, and Tommy Chheng for recording.
While Machine Learning practitioners routinely use a wide range of tools and languages, C# is conspicuously absent from that arsenal. Is .NET inadequate for Machine Learning? In this talk, I’ll argue that it can be a great fit, as long as you use the right language for the job, namely F#.
F# is a functional-first language, with a concise and expressive syntax that will feel familiar to data scientists used to Python or Matlab. It combines the performance and maintainability benefits of statically typed languages with the flexibility of Type Providers, a unique mechanism that enables seamless consumption of virtually any data source. And as a first-class .NET citizen, it interoperates smoothly with C#. So if you are interested in a language that can handle both flexible data exploration and the pressure of a real production system, come check out what F# has to offer.
This video shows how you can use the iOS Deep Belief SDK to build a custom object recognizer for your own iPhone apps, using the example of Dude the cat! See jetpac.com/deepbelief for more details.
An unofficial guide to setting up the Caffe open-source deep learning machine vision project on Ubuntu 14.04 by Pete Warden (firstname.lastname@example.org)
Making the IIFL commercial for IIFL’s new finance benefits was the most fun, as I got a chance to use stop motion and mixed media and to explore something beyond traditional animation. It was a change from my regular work, and I loved it because I could see the videos being played all over Mumbai buses. You will see more and better IIFL promos.
It's an old film I made in college. Darpan is a clay-animated film about an individual's desires. The story centers on a family in which everyone is so concerned about each other that they forget they are unknowingly suppressing each other's hidden desires.
We made this short film in a group of 5 in our 2nd year of college. It was made in 10 days using Toon Boom and classical animation. I just wanted to share it as I was missing college fun :D
This video is part of Artist Alert, an informative film sponsored by IIT IDC, Mumbai. The film aims to raise awareness among female artists and painters who paint during pregnancy that doing so can harm their bodies: their backs bend, causing permanent back aches, and inhaling turpentine oil harms the child.
15-second web banner to promote the new Cranberry Bournville