Data Science

9 skills with this tag

wshobson
Passed
ML Pipeline Workflow
A comprehensive MLOps skill that provides specialized AI agents (data scientist, ML engineer, MLOps engineer) and workflow templates for building production machine learning pipelines. It guides users through data preparation, model training, validation, deployment, and monitoring stages using modern ML tools like MLflow, Kubeflow, and Feast.
MLOps, Machine Learning, Pipeline, +3
116127.0k
K-Dense-AI
Passed
Datamol
This skill provides comprehensive documentation and guidance for using datamol, a Python library that simplifies molecular cheminformatics tasks. It covers SMILES parsing, molecular descriptors, fingerprints, clustering, 3D conformer generation, visualization, and chemical reactions with sensible defaults built on top of RDKit.
Cheminformatics, Python, RDKit, +3
4077.3k
K-Dense-AI
Passed
Umap Learn
This skill provides comprehensive documentation and guidance for using UMAP (Uniform Manifold Approximation and Projection), a fast dimensionality reduction technique for visualization and machine learning. It covers installation, parameter tuning, supervised/unsupervised learning, clustering preprocessing with HDBSCAN, and advanced features like Parametric UMAP and inverse transforms.
Machine Learning, Dimensionality Reduction, Visualization, +3
9257.3k
K-Dense-AI
Passed
Pydeseq2
PyDESeq2 is a bioinformatics skill for analyzing bulk RNA-seq count data to identify differentially expressed genes. It provides a complete workflow from data loading through statistical testing (Wald tests with FDR correction), including support for single-factor and multi-factor experimental designs, optional LFC shrinkage, and visualization with volcano/MA plots.
Bioinformatics, RNA-Seq, Gene Expression, +3
8117.3k
K-Dense-AI
Passed
Cellxgene Census
This skill provides comprehensive guidance for programmatically accessing the CZ CELLxGENE Census, a collection of single-cell genomics data covering 61+ million cells. It covers querying expression data by cell type, tissue, or disease, integrating with PyTorch for machine learning, and using scanpy for analysis workflows.
Bioinformatics, Single Cell, Genomics, +3
4997.3k
K-Dense-AI
Passed
Shap
Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model.
Machine Learning, Model Interpretability, Explainability, +3
7463.0k
K-Dense-AI
Passed
Scikit Learn
Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.
Machine Learning, Scikit Learn, Python, +3
5573.0k
K-Dense-AI
Passed
Gtars
High-performance toolkit for genomic interval analysis in Rust with Python bindings. Use when working with genomic regions, BED files, coverage tracks, overlap detection, tokenization for ML models, or fragment analysis in computational genomics and machine learning applications.
Genomics, Bioinformatics, Rust, +3
4413.0k
K-Dense-AI
Passed
Aeon
This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.
Time Series, Machine Learning, Python, +3
4883.0k