olyamasaeva/Made_mlbd

Title: Machine Learning in Big Data - Laboratory Works in MADE Repository

Introduction:

Welcome to the GitHub repository for the laboratory works completed at MADE academy in Machine Learning for Big Data! This repository aims to provide hands-on experience with machine learning techniques and their application in big data environments. It is organized into four branches, one per homework topic: HDFS and MapReduce, Hive, Scala, and Spark ML.

Project Overview:

In this repository, you'll find a collection of laboratory works that delve into key concepts and tools essential for understanding and implementing machine learning algorithms in big data settings. From data storage and processing to advanced machine learning models, each branch focuses on a distinct aspect of big data analytics.

Branches:

  • HDFS and MapReduce: This branch explores the fundamentals of Hadoop Distributed File System (HDFS) and MapReduce programming. It covers how to manage large-scale data storage and leverage MapReduce for parallel processing and distributed computing.

  • Hive: The Hive branch introduces the popular data warehouse infrastructure for Hadoop. It covers how to query and analyze structured data using HiveQL, making data exploration and manipulation efficient and intuitive.

  • Scala: In this branch, I dived into the Scala programming language, a versatile and powerful language for working with big data tools. It was my first attempt at Scala and at exploring its role in big data processing.

  • Spark ML: This branch focuses on Apache Spark's MLlib, a library for scalable machine learning. It covers Spark's distributed machine learning algorithms, which make it possible to build and deploy ML models on large datasets.
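
As a rough illustration of the MapReduce model mentioned in the branch list above, here is a minimal word-count sketch in plain Python. It is not code from this repository: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and everything runs in a single process on an in-memory list, whereas a real Hadoop job would distribute the same three phases across a cluster and read its input from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file on HDFS.
    sample = ["big data big models", "data pipelines"]
    print(word_count(sample))
```

The mapper and reducer are kept free of shared state on purpose: that independence is what lets a framework run many mappers and reducers in parallel on different slices of the data.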

Project Structure:

  • homework_1: Contains laboratory works and code samples related to HDFS and MapReduce.
  • homework_2: Includes queries, data samples, and HiveQL scripts for data analysis using Hive.
  • homework_3: Contains my first-attempt Scala code and examples for big data processing.
  • homework_4: Contains notebooks and code snippets demonstrating Spark MLlib's machine learning capabilities.

Getting Started:

I hope this repository serves as a valuable resource for your machine learning journey in the realm of big data analytics. Happy learning and exploring the exciting world of machine learning in big data! 🚀