Big Data
2 skills with this tag
wshobson
Passed
Spark Optimization
A comprehensive reference guide for optimizing Apache Spark jobs. Covers partitioning strategies, join optimization (broadcast, sort-merge, bucket joins), caching patterns, memory tuning, shuffle optimization, and data format best practices with PySpark code examples.
Spark, Data Engineering, Performance +3
K-Dense-AI
Passed
Vaex
Use this skill to process and analyze large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply it when users need to work with large CSV/HDF5/Arrow/Parquet files, compute fast statistics on massive datasets, create visualizations of big data, or build ML pipelines on data that does not fit in memory.
Big Data, Data Processing, Dataframes +3