Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, by Mohammed Guller
Big Data Analytics with Spark is a step-by-step guide for learning Spark, a fast, general-purpose, open-source cluster computing framework for large-scale data analysis. You will learn how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. In addition, this book will help you become a much sought-after Spark expert.
Spark is one of the hottest big data technologies. The amount of data generated today by devices, applications, and users is exploding, so there is a critical need for tools that can analyze large-scale data and unlock value from it. Spark is a powerful technology that meets that need. You can, for example, use Spark to perform low-latency computations through efficient caching and iterative algorithms; use its shell for easy, interactive data analysis; and employ its fast batch processing and low-latency features to process real-time data streams. As a result, adoption of Spark is growing rapidly, and it is replacing Hadoop MapReduce as the technology of choice for big data analytics.
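The point about caching and iterative algorithms can be illustrated with a small sketch. The code below is plain Python, not Spark's API; the function names and the tiny dataset are hypothetical stand-ins chosen for illustration. It assumes that loading the data is the expensive step and shows that an iterative computation which rereads its input on every pass benefits when the loaded data is kept in memory.

```python
from functools import lru_cache

load_calls = {"count": 0}  # how many times the expensive load really ran


def load_dataset():
    """Hypothetical stand-in for a slow read from disk or a remote store."""
    load_calls["count"] += 1
    return (2.0, 4.0, 6.0, 8.0)


@lru_cache(maxsize=None)
def load_dataset_cached():
    """Same load, but the result stays in memory after the first call."""
    return load_dataset()


def estimate_mean(get_data, iterations=20, step=0.5):
    """Iteratively nudge an estimate toward the mean; each pass rereads the data."""
    estimate = 0.0
    for _ in range(iterations):
        data = get_data()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= step * gradient
    return estimate


# Without caching, the data is loaded once per iteration.
uncached_result = estimate_mean(load_dataset)
uncached_loads = load_calls["count"]

# With caching, the data is loaded once and reused by every later iteration.
load_calls["count"] = 0
cached_result = estimate_mean(load_dataset_cached)
cached_loads = load_calls["count"]

print("loads without cache:", uncached_loads)
print("loads with cache:", cached_loads)
```

Both runs produce the same estimate; only the number of loads differs. The more passes an algorithm makes over the same data, the more an in-memory copy saves.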
This book provides an introduction to Spark and related big-data technologies. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, and MLlib. Big Data Analytics with Spark is therefore written for busy professionals who prefer learning a new technology from a consolidated source instead of spending countless hours on the Internet trying to pick bits and pieces from different sources.
The book also provides a chapter on Scala, the hottest functional programming language and the language in which Spark is written. You'll learn the basics of functional programming in Scala, so that you can write Spark applications in it.
What's more, Big Data Analytics with Spark provides an introduction to other big data technologies that are commonly used along with Spark, like Hive, Avro, Kafka and so on. So the book is self-sufficient; all the technologies that you need to know to use Spark are covered. The only thing that you are expected to know is programming in any language.
There is a critical shortage of people with big data expertise, so companies are willing to pay top dollar for people with skills in areas like Spark and Scala. So reading this book and absorbing its principles will provide a boost, possibly a big boost, to your career.
What you'll learn
1) Interactively analyze large-scale data with Spark
2) Write Spark applications in Scala for analyzing large-scale data in batch mode
3) Use Spark SQL to analyze large-scale data using standard SQL and Hive Query Language
4) Analyze large volumes of stream data with Spark Streaming
5) Develop machine learning applications with MLlib
6) Deploy Spark in a variety of situations
Who this book is for
Big Data Analytics with Spark is for data scientists, business analysts, data architects, and data analysts looking for a better and faster tool for large-scale data analysis. It is also for software engineers and developers building big data products.
- Amazon Sales Rank: #339942 in eBooks
- Published on: 2015-11-25
- Released on: 2015-11-25
- Format: Kindle eBook
About the Author
Mohammed Guller is the principal architect at Glassbeam, where he leads the development of advanced and predictive analytics products. He is a big data and Spark expert. He is frequently invited to speak at big data–related conferences. He is passionate about building new products, big data analytics, and machine learning.
Over the last 20 years, Mohammed has successfully led the development of several innovative technology products from concept to release. Prior to joining Glassbeam, he was the founder of TrustRecs.com, which he started after working at IBM for five years. Before IBM, he worked in a number of hi-tech start-ups, leading new product development.
Mohammed has a master's of business administration from the University of California, Berkeley, and a master's of computer applications from RCC, Gujarat University, India.
Most helpful customer reviews
3 of 3 people found the following review helpful.
If you want to learn Spark, buy this book. Highly recommended
By Ian Stirk
Hi, I have written a detailed chapter-by-chapter review of this book on www DOT i-programmer DOT info; the first and last parts of this review are given here. For my review of all chapters, search i-programmer DOT info for STIRK together with the book's title.
This book aims to provide a “...concise and easy-to-understand tutorial for big data and Spark”. How does it fare?
Spark is increasingly the tool of choice for big data processing, being much faster than Hadoop’s MapReduce. After putting Spark into a big data context, the book aims to cover Spark’s core library, together with its more specialized libraries for Streaming, Machine Learning, SQL, and Graphing.
The book is aimed at developers who are new to Spark. Some general background programming knowledge is required, but little else.
Chapter 1 Big Data Technology Landscape
This chapter opens with a discussion about the current big data age, with data as the lifeblood of organizations, and growing exponentially. The standard 3Vs definition of big data is explored (velocity, variety, volume). Traditional relational database management systems (RDBMS) are unable to process these large volumes in a timely manner; this is where the scalability of big data systems comes into its own.
Next, the chapter discusses some technologies that are either used with Spark or that Spark competes with. The first technology is Hadoop, which is fault tolerant and scalable, and runs on commodity hardware. The three major components of Hadoop are discussed: YARN (Yet Another Resource Negotiator), MapReduce (distributed processing model), and HDFS (Hadoop Distributed File System). Spark is increasingly being used in place of MapReduce owing to its faster speed. The section briefly discusses Hive, a data warehouse with a SQL-like interface; Spark SQL is expected to supersede Hive on many systems.
The chapter continues with a look at some common binary formats for serializing (storing on disk) big data, and their pros and cons. Specifically, Avro, Thrift, Protocol Buffers, and SequenceFile are examined. Next, some column storage formats, which have performance advantages when the client requires a subset of columns, are briefly discussed, namely RCFile, ORC, and Parquet.
Then a brief overview of messaging systems is provided, together with the advantages of having a layer of abstraction between producers and consumers. Specifically, Kafka and ZeroMQ are discussed with the aid of useful supporting diagrams.
NoSQL is then examined. The various types of NoSQL databases have different aims from the traditional RDBMS, typically trading Atomicity, Consistency, Isolation, Durability (ACID) for scalability and flexibility. The specific NoSQL databases briefly discussed are Cassandra and HBase. I sometimes wonder if it is meaningful to group NoSQL databases together. Is it meaningful to divide sports into Football and NoFootball? Are all the NoFootball sports meaningful as a group?
The chapter ends with a look at some distributed SQL query engines. These do not use MapReduce batch jobs, and are thus more oriented to interactive querying. The engines briefly examined are Impala, Presto, and Apache Drill.
This chapter provides an excellent overview of big data technology. It should be noted there are many more technologies than described, but the examples given are sufficient to explain the topic areas. This is possibly the best backgrounder to big data I’ve read.
The discussions are very well written, concise and clear, with helpful diagrams, and no wasted words. There’s a good flow between the topics, and useful links between chapters. There are website links for further information. These traits apply to all the chapters in the book.
...
Conclusion
This book aims to provide a “...concise and easy-to-understand tutorial for big data and Spark”, and clearly succeeds. The book is exceptionally well written. Helpful explanations, diagrams, practical step-by-step walkthroughs, annotated code, inter-chapter links, and website links abound throughout.
The book is aimed at developers who are new to Spark, and explains concepts from the beginning. If you work through the book you should become competent in the use of Spark. There is much more to learn, of course, but this book gives a solid foundation in both core Spark and its major specialized libraries: Streaming, Machine Learning, SQL, and Graphing.
The book is based on workshops given by the author, and clearly the feedback from these has been useful in creating this book, since it seems to have answered all the questions I had.
This book provides everything you need to know to get started with Spark, explained in an easy-to-follow manner. If you want to learn Spark, buy this book. Highly recommended
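The review's remark that column storage formats help when a client needs only a subset of columns can be sketched in a few lines. This is a plain Python model, not the on-disk layout of RCFile, ORC, or Parquet; the records and the "fields touched" counts are hypothetical and only model how much data a reader would have to scan under each layout.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user": "a", "country": "US", "amount": 10.0},
    {"user": "b", "country": "IN", "amount": 25.0},
    {"user": "c", "country": "US", "amount": 5.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Modelled cost for the row layout: a reader scans whole records,
# so every field of every record is touched to total one column.
fields_touched_rows = sum(len(record) for record in rows)
total_from_rows = sum(record["amount"] for record in rows)

# Modelled cost for the column layout: only the requested column is read.
fields_touched_columns = len(columns["amount"])
total_from_columns = sum(columns["amount"])

print("row layout:", total_from_rows, "fields touched:", fields_touched_rows)
print("column layout:", total_from_columns, "fields touched:", fields_touched_columns)
```

Both layouts give the same total, but in this model the column layout touches a third of the values, and the gap widens as records gain more fields that the query does not need.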
2 of 2 people found the following review helpful.
An excellent book that covers depth and is also readable
By A. Jaokar
Cross-posted from my blog review: I have been reading and reviewing a number of excellent books for the Data Science for IoT course and also my Oxford University course.
Big Data Analytics with Spark by Mohammed Guller is for data scientists, business analysts, data architects, and data analysts looking for a better and faster tool for large-scale data analysis. It is also for software engineers and developers building big data products. The book covers a subject which I have been focussing on through my teaching and research. It provides a step-by-step guide for learning how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. The book covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML.
My analysis:
The book covers MLlib, Scala, Spark, and analytics in detail, but it is also readable. It also includes code for all these sections. The only recommendations I would make are a better index and releasing the code on GitHub. However, the book PDF can be bought for an extra $5 (so you can copy and paste the code if you need it).
I see the book comprising three sections:
a) The main theme of the book, i.e. Big Data Analytics with Spark
b) The first five chapters leading up to the theme
c) The last three chapters on Spark deployment
The main theme of the book, i.e. Big Data Analytics with Spark
Chapter 6: Spark Streaming (10 pages): Introduces Spark Streaming and shows an example app. Includes a Spark Streaming introduction, how Spark Streaming works, and a Spark Streaming example app.
Chapter 7: Spark SQL (20 pages): Introduces Spark SQL along with a few examples.
Chapter 8: MLlib (40 pages): Introduces machine learning and MLlib along with a few examples. Covers a machine learning introduction, linear regression, logistic regression, classification, clustering, recommender systems, building a machine learning application with MLlib, and MLBase.
The first five chapters leading up to the theme
Chapter 1: Big Data Technology Landscape: Cluster computing (Hadoop MapReduce, HDFS, Hive), data serialization (Avro, Protocol Buffers), columnar storage (Parquet), messaging systems (Kafka, ZeroMQ), NoSQL databases (HBase, Cassandra), distributed SQL query engines (Apache Drill, Impala, PrestoDB).
Chapter 2: Functional Programming in Scala (30 pages): Introduces Scala so that readers can understand and write Spark applications in Scala, which is the primary language supported by Spark. Includes key functional programming concepts, basic Scala constructs, the Scala shell, etc.
Chapter 3: Spark’s Essentials (35 pages): Introduces Spark fundamentals and key concepts: what Spark is, why Spark is hot, why Spark is faster than Hadoop MapReduce, and Resilient Distributed Datasets (RDD).
Chapter 4: Spark Shell (10 pages): Introduces the Spark shell and shows how it can be used for interactive data analysis. Includes a Spark shell introduction and interactive data analysis in the Spark shell.
Chapter 5: A Stand-alone Spark Application (10 pages): Provides step-by-step directions for writing and running a Spark application. Includes the basic structure of a stand-alone Spark application and compiling a Spark application.
The last three chapters (Deployment Chapters)
Chapter 9: GraphX: Introduces graph analysis and GraphX along with a few examples.
Chapter 10: Deploying Spark: A walkthrough of Spark deployment with different cluster management technologies such as YARN and Mesos, and services like AWS (EC2).
Chapter 11: Monitoring a Spark Cluster (20 pages)
Overall, I very much recommend this book.
I also plan to use this book in the Data Science for IoT course and also my Oxford University course, which I will teach later in the year.
1 of 1 people found the following review helpful.
Well Organized and informative book
By DiptiB
This book is a very well written, definitive overview of Spark. It is a great book for those who want to learn about Spark but don't know where to start.
Fundamentals are very well explained in the book for developers who are new to Spark. It starts with a great overview of big data technology, helps in building the basics, and then moves on to explore more advanced topics. The book covers Spark core and its specialized add-on libraries too. It also contains plenty of sample examples, which are really useful. Even if you are new to the subject, this book has enough information to get a developer started on Spark projects.
In short, this book is really well organized, very informative, and easy to follow. Highly recommended.