Data Engineering
2 skills with this tag
wshobson
Passed
Spark Optimization
A comprehensive reference guide for optimizing Apache Spark jobs. Covers partitioning strategies, join optimization (broadcast, sort-merge, bucket joins), caching patterns, memory tuning, shuffle optimization, and data format best practices with PySpark code examples.
Spark, Data Engineering, Performance, +3
8024.0k
wshobson
Passed
ML Pipeline Workflow
This skill provides comprehensive guidance for building production machine learning pipelines. It covers the full MLOps lifecycle including data preparation, model training, validation, and deployment, with templates and best practices for workflow orchestration tools like Airflow, Dagster, and Kubeflow.
MLOps, Machine Learning, Pipeline, +3
8024.0k