Machine Learning

31 skills with this tag

ml-pipeline-workflow

This skill provides comprehensive documentation and best practices for building production MLOps pipelines. It covers the full ML lifecycle including data preparation, model training, validation, and deployment strategies with guidance on using orchestration tools like Airflow, Dagster, and Kubeflow.

Mlops, Machine Learning, Pipeline, +3

ML Pipeline Workflow

A comprehensive MLOps skill that provides specialized AI agents (data scientist, ML engineer, MLOps engineer) and workflow templates for building production machine learning pipelines. It guides users through data preparation, model training, validation, deployment, and monitoring stages using modern ML tools like MLflow, Kubeflow, and Feast.

Mlops, Machine Learning, Pipeline, +3

A comprehensive reference skill for Vaex, a high-performance Python library for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Covers DataFrame operations, data loading, filtering, aggregations, machine learning pipelines, visualization, and performance optimization strategies.

Data Analysis, Python, Big Data, +3

This skill provides comprehensive documentation and guidance for using UMAP (Uniform Manifold Approximation and Projection), a fast dimensionality reduction technique for visualization and machine learning. It covers installation, parameter tuning, supervised/unsupervised learning, clustering preprocessing with HDBSCAN, and advanced features like Parametric UMAP and inverse transforms.

Machine Learning, Dimensionality Reduction, Visualization, +3

TorchDrug is a documentation skill that provides comprehensive guidance for using the TorchDrug PyTorch library in drug discovery and molecular science. It covers graph neural networks for molecules and proteins, including molecular property prediction, protein modeling, knowledge graph reasoning, molecular generation, and retrosynthesis planning with 40+ curated datasets and 20+ model architectures.

Drug Discovery, Machine Learning, Pytorch, +3

Torch Geometric

This skill provides comprehensive guidance for PyTorch Geometric (PyG), a library for developing Graph Neural Networks. It covers graph creation, GNN architectures (GCN, GAT, GraphSAGE, GIN), node/graph classification, molecular property prediction, and large-scale graph learning with extensive reference documentation and utility scripts.

Graph Neural Networks, Pytorch, Deep Learning, +3

A comprehensive reference skill for the statsmodels Python library, covering statistical modeling techniques including linear regression, generalized linear models, discrete choice models, time series analysis, and statistical diagnostics. Provides code examples, best practices, and detailed explanations for econometrics and rigorous statistical inference.

Statistics, Python, Data Analysis, +3

Stable Baselines3

A comprehensive reference skill for reinforcement learning with Stable Baselines3. It provides algorithm selection guides, training templates, custom environment creation tutorials, callback documentation, and vectorized environment usage patterns for efficient RL agent development.

Reinforcement Learning, Machine Learning, Pytorch, +3

A comprehensive documentation skill for scvi-tools, a Python framework for probabilistic deep generative models in single-cell genomics. It provides guidance on models for RNA-seq, ATAC-seq, multimodal data integration, spatial transcriptomics, and specialized modalities like methylation and cytometry analysis.

Bioinformatics, Single Cell, Genomics, +3

PyTorch Lightning

This skill provides comprehensive documentation and templates for PyTorch Lightning, a framework that organizes PyTorch code for scalable deep learning. It includes ready-to-use templates for LightningModules and DataModules, Trainer configurations for various scenarios (single GPU, multi-GPU, FSDP, DeepSpeed), and detailed guides for callbacks, logging, distributed training, and best practices.

Pytorch Lightning, Deep Learning, Machine Learning, +3

PyTDC (Therapeutics Data Commons) provides AI-ready datasets and benchmarks for drug discovery and development. It offers curated datasets spanning ADME, toxicity, drug-target interactions, and molecular generation with standardized evaluation metrics and meaningful data splits for therapeutic machine learning applications.

Drug Discovery, Machine Learning, Therapeutics, +3

PufferLib is a high-performance reinforcement learning framework optimized for fast parallel training. It provides templates for creating custom environments, training scripts with PPO, and seamless integration with popular RL frameworks like Gymnasium and PettingZoo, achieving millions of steps per second.

Reinforcement Learning, Machine Learning, Pytorch, +3

This skill provides comprehensive documentation and code examples for PennyLane, a quantum computing library for training quantum circuits in the same way as neural networks. It covers quantum machine learning, chemistry simulations (VQE), optimization algorithms (QAOA), and integration with classical ML frameworks like PyTorch, JAX, and TensorFlow.

Quantum Computing, Machine Learning, Pennylane, +3

Molfeat is a comprehensive guide for molecular featurization in machine learning. It provides documentation for converting chemical structures (SMILES strings) into numerical representations using 100+ featurizers including fingerprints (ECFP, MACCS), descriptors (RDKit, Mordred), and pretrained models (ChemBERTa, GIN). Ideal for QSAR modeling, virtual screening, and cheminformatics tasks.

Cheminformatics, Machine Learning, Molecular Features, +3

Modal is a documentation skill that guides users in running Python code on Modal's serverless cloud platform. It covers GPU-accelerated computing, autoscaling, persistent storage with Volumes, scheduled jobs, and web endpoints for ML workloads and batch processing.

Modal, Serverless, Gpu, +3

Hypogenic is a framework for automated scientific hypothesis generation using large language models. It can generate testable hypotheses from data alone (HypoGeniC), combine literature insights with empirical patterns (HypoRefine), or use both approaches together (Union methods). The skill provides configuration templates and documentation for accelerating research discovery across domains like deception detection, content analysis, and predictive modeling.

Hypothesis Generation, Scientific Research, Llm Application, +3

Histolab is a documentation skill for the histolab Python library used in digital pathology. It provides comprehensive guidance on processing whole slide images (WSI), including tissue detection, tile extraction strategies, preprocessing filters, and visualization techniques for preparing datasets for deep learning pipelines.

Digital Pathology, Image Processing, Python, +3

A comprehensive molecular machine learning skill using DeepChem for predicting chemical properties like solubility and toxicity. It supports graph neural networks, transfer learning with pretrained models (ChemBERTa, GROVER), and MoleculeNet benchmarks for drug discovery and materials science applications.

Chemistry, Machine Learning, Drug Discovery, +3

Cellxgene Census

This skill provides comprehensive guidance for programmatically accessing the CZ CELLxGENE Census, a collection of single-cell genomics data covering 61+ million cells. It covers querying expression data by cell type, tissue, or disease, integrating with PyTorch for machine learning, and using scanpy for analysis workflows.

Bioinformatics, Single Cell, Genomics, +3

DSPy Ruby is a comprehensive guide for building LLM-powered Ruby applications using the DSPy.rb framework. It provides type-safe signatures, composable modules, multi-provider support (OpenAI, Anthropic, Gemini, Ollama), and patterns for testing, optimization, and production monitoring of AI applications.

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

Machine Learning, Transformers, Huggingface, +3

Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model.

Machine Learning, Model Interpretability, Explainability, +3

Scikit Survival

Comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival. Use this skill when working with censored survival data, performing time-to-event analysis, fitting Cox models, Random Survival Forests, Gradient Boosting models, or Survival SVMs, evaluating survival predictions with concordance index or Brier score, handling competing risks, or implementing any survival analysis workflow with the scikit-survival library.

Survival Analysis, Machine Learning, Statistics, +3

Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.

Machine Learning, Scikit Learn, Python, +3

High-performance toolkit for genomic interval analysis in Rust with Python bindings. Use when working with genomic regions, BED files, coverage tracks, overlap detection, tokenization for ML models, or fragment analysis in computational genomics and machine learning applications.

Genomics, Bioinformatics, Rust, +3

This skill should be used when working with genomic interval data (BED files) for machine learning tasks. Use for training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions. Applies to BED file collections, scATAC-seq data, chromatin accessibility datasets, and region-based genomic feature learning.

Genomics, Machine Learning, Bioinformatics, +3

Comprehensive toolkit for protein language models including ESM3 (generative multimodal protein design across sequence, structure, and function) and ESM C (efficient protein embeddings and representations). Use this skill when working with protein sequences, structures, or function prediction; designing novel proteins; generating protein embeddings; performing inverse folding; or conducting protein engineering tasks. Supports both local model usage and cloud-based Forge API for scalable inference.

Protein Engineering, Bioinformatics, Machine Learning, +3

Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets.

Bioinformatics, Gene Regulatory Networks, Transcriptomics, +3

This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.

Time Series, Machine Learning, Python, +3

Hugging Face CLI

Execute Hugging Face Hub operations using the hf CLI. Download models and datasets, upload files to Hub repositories, create repos, manage local cache, and run compute jobs on HF infrastructure. Supports authentication, file transfers, and cloud compute management.

Huggingface, Machine Learning, Cli, +3