DSC 232R: Big Data Analytics Using Spark

Jan 5, 2026

Course Overview

DSC 232R: Big Data Analytics Using Spark is a graduate-level course in the Master of Data Science program at UC San Diego’s Halicioglu Data Science Institute.

Topics Covered

  • Distributed computing fundamentals
  • Apache Spark architecture and programming model
  • Large-scale data processing with PySpark
  • Machine learning at scale with MLlib
  • Real-world applications in genomics and industry
  • Cloud computing and cluster management

Learning Outcomes

Students completing this course will be able to:

  1. Design and implement distributed data processing pipelines
  2. Apply machine learning algorithms to large-scale datasets
  3. Optimize Spark applications for performance
  4. Work with real-world big data from genomics and other domains

Offering Schedule

  • Spring 2024
  • Fall 2025
  • Winter 2026

Authors

Edwin Solares
Lecturer in Computer Science & Data Science
I am a computational biologist and data scientist working at the intersection of artificial intelligence, evolutionary genomics, and climate-resilient agriculture. My research applies machine learning and bioinformatics to global food security challenges in the face of rapid climate change. I have published in journals including Nature Plants, PNAS, and Genome Research (h-index: 7), and I develop tools and methods that advance both computational science and real-world applications.