DSC 232R: Big Data Analytics Using Spark
Jan 5, 2026
Course Overview
DSC 232R: Big Data Analytics Using Spark is a graduate-level course in the Master of Data Science program at UC San Diego’s Halicioglu Data Science Institute.
Topics Covered
- Distributed computing fundamentals
- Apache Spark architecture and programming model
- Large-scale data processing with PySpark
- Machine learning at scale with MLlib
- Real-world applications in genomics and industry
- Cloud computing and cluster management
Learning Outcomes
Students completing this course will be able to:
- Design and implement distributed data processing pipelines
- Apply machine learning algorithms to large-scale datasets
- Optimize Spark applications for performance
- Work with real-world big data from genomics and other domains
Offering Schedule
- Spring 2024
- Fall 2025
- Winter 2026

Authors
Edwin Solares
(he/him)
Lecturer in Computer Science & Data Science
I am a computational biologist and data scientist bridging artificial intelligence,
evolutionary genomics, and climate-resilient agriculture. My research leverages
cutting-edge machine learning and bioinformatics to address global food security
challenges in the face of rapid climate change. I have published in high-impact
journals including Nature Plants, PNAS, and Genome Research, with an h-index of 7, and I develop
tools and methods that advance both computational science and real-world applications.