Big Data
4 skills with this tag
wshobson
Passed
spark-optimization
A comprehensive reference guide for optimizing Apache Spark jobs. Covers partitioning strategies, join optimization (broadcast, sort-merge, bucket joins), caching patterns, memory configuration, shuffle reduction techniques, and data format optimization with practical PySpark code examples.
Spark, Data Engineering, Performance, +3
515 · 27.0k
K-Dense-AI
Passed
Vaex
A comprehensive reference skill for Vaex, a high-performance Python library for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Covers DataFrame operations, data loading, filtering, aggregations, machine learning pipelines, visualization, and performance optimization strategies.
Data Analysis, Python, Big Data, +3
150 · 7.3k
K-Dense-AI
Passed
Dask
A comprehensive documentation skill for Dask, a Python library for parallel and distributed computing. It provides detailed reference guides for working with larger-than-memory datasets using DataFrames, Arrays, Bags, and Futures, along with scheduler selection and best practices for performance optimization.
Dask, Parallel Computing, Data Processing, +3
703 · 7.3k
wgzhao
Passed
SKILL: Addax Project Knowledge
This skill provides AI assistants with deep knowledge about the Addax open-source ETL tool. It covers the project's architecture, plugin system for 20+ data sources (MySQL, PostgreSQL, MongoDB, Kafka, HDFS, etc.), job configuration format, data transformation capabilities, and development guidelines for building custom reader/writer plugins.
ETL, Data Integration, Java, +3
62 · 1.4k