Machine Learning with Spark - Second Edition.

Bibliographic Details
Main Author:	Dua, Rajdeep
Other Authors:	Ghotra, Manpreet Singh, Pentreath, Nick
Format:	eBook
Language:	English
Published:	Birmingham : Packt Publishing, 2016.
Edition:	2nd ed.
Subjects:	Spark (Electronic resource : Apache Software Foundation); Machine learning.
Online Access:	Click for online access

MARC


LEADER	00000cam a2200000Mu 4500
001	ocn990674354
003	OCoLC
005	20240909213021.0
006	m o d
007	cr \|n\|---\|\|\|\|\|
008	170624s2016 enk o 000 0 eng d
040			\|a EBLCP \|b eng \|e pn \|c EBLCP \|d MERUC \|d CHVBK \|d OCLCQ \|d OCLCO \|d OCLCF \|d OCLCQ \|d LVT \|d OCLCQ \|d LOA \|d OCLCO \|d K6U \|d OCLCQ \|d OCLCO \|d OCLCL
020			\|a 9781785886423
020			\|a 1785886428
035			\|a (OCoLC)990674354
050		4	\|a QA76.9.D343 \|b .D83 2017
049			\|a HCDD
100	1		\|a Dua, Rajdeep.
245	1	0	\|a Machine Learning with Spark - Second Edition.
250			\|a 2nd ed.
260			\|a Birmingham : \|b Packt Publishing, \|c 2016.
300			\|a 1 online resource (523 pages)
336			\|a text \|b txt \|2 rdacontent
337			\|a computer \|b c \|2 rdamedia
338			\|a online resource \|b cr \|2 rdacarrier
588	0		\|a Print version record.
505	0		\|a Cover -- Credits -- About the Authors -- About the Reviewer -- www.PacktPub.com -- Customer Feedback -- Table of Contents -- Preface -- Chapter 1: Getting Up and Running with Spark -- Installing and setting up Spark locally -- Spark clusters -- The Spark programming model -- SparkContext and SparkConf -- SparkSession -- The Spark shell -- Resilient Distributed Datasets -- Creating RDDs -- Spark operations -- Caching RDDs -- Broadcast variables and accumulators -- SchemaRDD -- Spark data frame -- The first step to a Spark program in Scala -- The first step to a Spark program in Java -- The first step to a Spark program in Python -- The first step to a Spark program in R -- SparkR DataFrames -- Getting Spark running on Amazon EC2 -- Launching an EC2 Spark cluster -- Configuring and running Spark on Amazon Elastic Map Reduce -- UI in Spark -- Supported machine learning algorithms by Spark -- Benefits of using Spark ML as compared to existing libraries -- Spark Cluster on Google Compute Engine -- DataProc -- Hadoop and Spark Versions -- Creating a Cluster -- Submitting a Job -- Summary -- Chapter 2: Math for Machine Learning -- Linear algebra -- Setting up the Scala environment in Intellij -- Setting up the Scala environment on the Command Line -- Fields -- Real numbers -- Complex numbers -- Vectors -- Vector spaces -- Vector types -- Vectors in Breeze -- Vectors in Spark -- Vector operations -- Hyperplanes -- Vectors in machine learning -- Matrix -- Types of matrices -- Matrix in Spark -- Distributed matrix in Spark -- Matrix operations -- Determinant -- Eigenvalues and eigenvectors -- Singular value decomposition -- Matrices in machine learning -- Functions -- Function types -- Functional composition -- Hypothesis -- Gradient descent -- Prior, likelihood, and posterior -- Calculus -- Differential calculus -- Integral calculus.
505	8		\|a Lagranges multipliers -- Plotting -- Summary -- Chapter 3: Designing a Machine Learning System -- What is Machine Learning? -- Introducing MovieStream -- Business use cases for a machine learning system -- Personalization -- Targeted marketing and customer segmentation -- Predictive modeling and analytics -- Types of machine learning models -- The components of a data-driven machine learning system -- Data ingestion and storage -- Data cleansing and transformation -- Model training and testing loop -- Model deployment and integration -- Model monitoring and feedback -- Batch versus real time -- Data Pipeline in Apache Spark -- An architecture for a machine learning system -- Spark MLlib -- Performance improvements in Spark ML over Spark MLlib -- Comparing algorithms supported by MLlib -- Classification -- Clustering -- Regression -- MLlib supported methods and developer APIs -- Spark Integration -- MLlib vision -- MLlib versions compared -- Spark 1.6 to 2.0 -- Summary -- Chapter 4: Obtaining, Processing, and Preparing Data with Spark -- Accessing publicly available datasets -- The MovieLens 100k dataset -- Exploring and visualizing your data -- Exploring the user dataset -- Count by occupation -- Movie dataset -- Exploring the rating dataset -- Rating count bar chart -- Distribution of number ratings -- Processing and transforming your data -- Filling in bad or missing data -- Extracting useful features from your data -- Numerical features -- Categorical features -- Derived features -- Transforming timestamps into categorical features -- Extract time of Day -- Extract time of day -- Text features -- Simple text feature extraction -- Sparse Vectors from Titles -- Normalizing features -- Using ML for feature normalization -- Using packages for feature extraction -- TFID -- IDF -- Word2Vector -- Skip-gram model -- Standard scalar -- Summary.
505	8		\|a Chapter 5: Building a Recommendation Engine with Spark -- Types of recommendation models -- Content-based filtering -- Collaborative filtering -- Matrix factorization -- Explicit matrix factorization -- Implicit Matrix Factorization -- Basic model for Matrix Factorization -- Alternating least squares -- Extracting the right features from your data -- Extracting features from the MovieLens 100k dataset -- Training the recommendation model -- Training a model on the MovieLens 100k dataset -- Training a model using Implicit feedback data -- Using the recommendation model -- ALS Model recommendations -- User recommendations -- Generating movie recommendations from the MovieLens 100k dataset -- Inspecting the recommendations -- Item recommendations -- Generating similar movies for the MovieLens 100k dataset -- Inspecting the similar items -- Evaluating the performance of recommendation models -- ALS Model Evaluation -- Mean Squared Error -- Mean Average Precision at K -- Using MLlib's built-in evaluation functions -- RMSE and MSE -- MAP -- FP-Growth algorithm -- FP-Growth Basic Sample -- FP-Growth Applied to Movie Lens Data -- Summary -- Chapter 6: Building a Classification Model with Spark -- Types of classification models -- Linear models -- Logistic regression -- Multinomial logistic regression -- Visualizing the StumbleUpon dataset -- Extracting features from the Kaggle/StumbleUpon evergreen classification dataset -- StumbleUponExecutor -- Linear support vector machines -- The naive Bayes model -- Decision trees -- Ensembles of trees -- Random Forests -- Gradient-Boosted Trees -- Multilayer perceptron classifier -- Extracting the right features from your data -- Training classification models -- Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset -- Using classification models.
505	8		\|a Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset -- Evaluating the performance of classification models -- Accuracy and prediction error -- Precision and recall -- ROC curve and AUC -- Improving model performance and tuning parameters -- Feature standardization -- Additional features -- Using the correct form of data -- Tuning model parameters -- Linear models -- Iterations -- Step size -- Regularization -- Decision trees -- Tuning tree depth and impurity -- The naive Bayes model -- Cross-validation -- Summary -- Chapter 7: Building a Regression Model with Spark -- Types of regression models -- Least squares regression -- Decision trees for regression -- Evaluating the performance of regression models -- Mean Squared Error and Root Mean Squared Error -- Mean Absolute Error -- Root Mean Squared Log Error -- The R-squared coefficient -- Extracting the right features from your data -- Extracting features from the bike sharing dataset -- Training and using regression models -- BikeSharingExecutor -- Training a regression model on the bike sharing dataset -- Linear regression -- Generalized linear regression -- Decision tree regression -- Ensembles of trees -- Random forest regression -- Gradient boosted tree regression -- Improving model performance and tuning parameters -- Transforming the target variable -- Impact of training on log-transformed targets -- Tuning model parameters -- Creating training and testing sets to evaluate parameters -- Splitting data for Decision tree -- The impact of parameter settings for linear models -- Iterations -- Step size -- L2 regularization -- L1 regularization -- Intercept -- The impact of parameter settings for the decision tree -- Tree depth -- Maximum bins -- The impact of parameter settings for the Gradient Boosted Trees -- Iterations -- MaxBins -- Summary.
505	8		\|a Chapter 8: Building a Clustering Model with Spark -- Types of clustering models -- k-means clustering -- Initialization methods -- Mixture models -- Hierarchical clustering -- Extracting the right features from your data -- Extracting features from the MovieLens dataset -- K-means -- training a clustering model -- Training a clustering model on the MovieLens dataset -- K-means -- interpreting cluster predictions on the MovieLens dataset -- Interpreting the movie clusters -- Interpreting the movie clusters -- K-means -- evaluating the performance of clustering models -- Internal evaluation metrics -- External evaluation metrics -- Computing performance metrics on the MovieLens dataset -- Effect of iterations on WSSSE -- Bisecting KMeans -- Bisecting K-means -- training a clustering model -- WSSSE and iterations -- Gaussian Mixture Model -- Clustering using GMM -- Plotting the user and item data with GMM clustering -- GMM -- effect of iterations on cluster boundaries -- Summary -- Chapter 9: Dimensionality Reduction with Spark -- Types of dimensionality reduction -- Principal components analysis -- Singular value decomposition -- Relationship with matrix factorization -- Clustering as dimensionality reduction -- Extracting the right features from your data -- Extracting features from the LFW dataset -- Exploring the face data -- Visualizing the face data -- Extracting facial images as vectors -- Loading images -- Converting to grayscale and resizing the images -- Extracting feature vectors -- Normalization -- Training a dimensionality reduction model -- Running PCA on the LFW dataset -- Visualizing the Eigenfaces -- Interpreting the Eigenfaces -- Using a dimensionality reduction model -- Projecting data using PCA on the LFW dataset -- The relationship between PCA and SVD -- Evaluating dimensionality reduction models.
630	0	0	\|a Spark (Electronic resource : Apache Software Foundation)
630	0	7	\|a Spark (Electronic resource : Apache Software Foundation) \|2 fast
650		0	\|a Machine learning.
650		7	\|a Machine learning \|2 fast
700	1		\|a Ghotra, Manpreet Singh.
700	1		\|a Pentreath, Nick.
758			\|i has work: \|a Machine Learning with Spark - Second Edition (Text) \|1 https://id.oclc.org/worldcat/entity/E39PCYTyG3Gm8QRPqQCBFw8kCP \|4 https://id.oclc.org/worldcat/ontology/hasWork
776	0	8	\|i Print version: \|a Dua, Rajdeep. \|t Machine Learning with Spark - Second Edition. \|d Birmingham : Packt Publishing, ©2016
856	4	0	\|u https://ebookcentral.proquest.com/lib/holycrosscollege-ebooks/detail.action?docID=4853045 \|y Click for online access
903			\|a EBC-AC
994			\|a 92 \|b HCD
