Machine Learning

25 skills with this tag

wshobson
Passed
Embedding Strategies
A comprehensive reference guide for selecting and optimizing embedding models for vector search and RAG applications. Covers model comparisons (OpenAI, Voyage, BGE, E5), chunking strategies (token-based, sentence-based, semantic), domain-specific pipelines, and retrieval quality evaluation metrics.
Embeddings, Vector Search, Rag, +3 more
80 · 24.0k
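Token-based chunking with overlap, one of the strategies listed above, can be sketched in a few lines of Python. This is a minimal illustration, not code from the skill: the function name `chunk_tokens` and its defaults are hypothetical, and whitespace splitting is an assumed stand-in for the embedding model's real tokenizer.

```python
from typing import List


def chunk_tokens(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size token windows that overlap by `overlap` tokens.

    Whitespace splitting is an assumption standing in for a real tokenizer;
    use your embedding model's tokenizer for accurate token counts.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    tokens = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

In this sketch, consecutive chunks share `overlap` tokens, so any passage no longer than the overlap that straddles a boundary is still fully contained in at least one chunk.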
wshobson
Passed
Ml Pipeline Workflow
This skill provides comprehensive guidance for building production machine learning pipelines. It covers the full MLOps lifecycle including data preparation, model training, validation, and deployment, with templates and best practices for workflow orchestration tools like Airflow, Dagster, and Kubeflow.
Mlops, Machine Learning, Pipeline, +3 more
80 · 24.0k
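The orchestration tools named above schedule pipeline stages as a dependency graph (DAG). As a hedged, library-free sketch of that idea, Python's standard `graphlib.TopologicalSorter` can compute a valid run order and the groups of stages whose dependencies are all satisfied, which an orchestrator could run in parallel. The stage names are hypothetical, and this is not the API of Airflow, Dagster, or Kubeflow.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical stages: each maps to the set of stages it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "prepare_data": set(),
    "train_model": {"prepare_data"},
    "validate_model": {"train_model"},
    "deploy_model": {"validate_model"},
    "build_report": {"validate_model"},
}


def run_order(dag: Dict[str, Set[str]]) -> List[str]:
    """One sequential order that respects every dependency (raises CycleError on cycles)."""
    return list(TopologicalSorter(dag).static_order())


def parallel_batches(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Groups of stages whose dependencies are all done, in execution order."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished before moving on
    return batches


if __name__ == "__main__":
    for step, batch in enumerate(parallel_batches(PIPELINE), start=1):
        print(step, batch)
```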
wshobson
Passed
Llm Evaluation
This skill helps you implement comprehensive evaluation strategies for LLM applications. It covers automated metrics like BLEU, ROUGE, and BERTScore for measuring text quality, LLM-as-judge patterns for using stronger models to evaluate outputs, human evaluation frameworks with inter-rater agreement calculations, and statistical A/B testing for comparing model variants with proper significance testing.
Llm Evaluation, Machine Learning, Metrics, +3 more
100 · 24.0k
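For the inter-rater agreement calculations mentioned above, Cohen's kappa is a common choice when two human raters label the same outputs. A minimal pure-Python sketch; the function name is illustrative, and treating the degenerate case where both raters use one identical label throughout as perfect agreement is a convention chosen here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention: both raters used a single identical label
    return (p_o - p_e) / (1 - p_e)
```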
K-Dense-AI
Passed
Vaex
Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines on data that doesn't fit in memory.
Big Data, Data Processing, Dataframes, +3 more
80 · 2.5k
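The core idea behind out-of-core processing, touching one chunk at a time so memory use does not grow with the file, can be sketched with the standard library alone. This is not Vaex's API or implementation; `iter_chunks` and `column_stats` are hypothetical helpers that compute a streaming count, mean, and sample standard deviation of one CSV column using Welford's update.

```python
import csv
import io
import math
from itertools import islice
from typing import Iterable, Iterator, List, TextIO, Tuple


def iter_chunks(rows: Iterable, size: int) -> Iterator[List]:
    """Yield lists of at most `size` rows so only one chunk is held in memory."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def column_stats(f: TextIO, column: str, chunk_size: int = 100_000) -> Tuple[int, float, float]:
    """Streaming count, mean, and sample std of one numeric CSV column.

    Welford's update keeps memory constant regardless of file size.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for chunk in iter_chunks(csv.DictReader(f), chunk_size):
        for row in chunk:
            x = float(row[column])
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)  # uses the updated mean
    std = math.sqrt(m2 / (count - 1)) if count > 1 else float("nan")
    return count, mean, std


if __name__ == "__main__":
    data = io.StringIO("x,y\n1,10\n2,20\n3,30\n4,40\n")
    print(column_stats(data, "y", chunk_size=2))
```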
K-Dense-AI
Passed
Umap Learn
UMAP dimensionality reduction: fast nonlinear manifold learning on high-dimensional data for 2D/3D visualization, clustering preprocessing (e.g., before HDBSCAN), and supervised or parametric UMAP.
Dimensionality Reduction, Machine Learning, Visualization, +3 more
70 · 2.5k
K-Dense-AI
Passed
Transformers
This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.
Machine Learning, Transformers, Nlp, +3 more
50 · 2.5k
K-Dense-AI
Passed
Torchdrug
Graph-based drug discovery toolkit for PyTorch-based ML on molecules, proteins, and biomedical graphs. Covers molecular property prediction (ADMET), protein modeling, knowledge graph reasoning, molecular generation, retrosynthesis, GNNs (GIN, GAT, SchNet), and 40+ datasets.
Drug Discovery, Machine Learning, Graph Neural Networks, +3 more
30 · 2.5k
K-Dense-AI
Passed
Torch Geometric
Graph neural networks with PyTorch Geometric (PyG) for geometric deep learning: node and graph classification, link prediction, GCN, GAT, and GraphSAGE, heterogeneous graphs, and molecular property prediction.
Graph Neural Networks, Pytorch, Machine Learning, +3 more
70 · 2.5k
K-Dense-AI
Passed
Stable Baselines3
Use this skill for reinforcement learning tasks including training RL agents (PPO, SAC, DQN, TD3, DDPG, A2C, etc.), creating custom Gym environments, implementing callbacks for monitoring and control, using vectorized environments for parallel training, and integrating with deep RL workflows. This skill should be used when users request RL algorithm implementation, agent training, environment design, or RL experimentation.
Reinforcement Learning, Machine Learning, Pytorch, +3 more
70 · 2.5k
K-Dense-AI
Passed
Shap
Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model.
Machine Learning, Model Interpretability, Explainability, +3 more
70 · 2.5k
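To show what a SHAP value means, the sketch below computes exact Shapley values by brute force over every feature coalition. It is illustrative only: the shap library relies on faster model-specific or approximate explainers, the name `exact_shapley` is hypothetical, and replacing "absent" features with a fixed baseline vector is an assumption made here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value (an assumption).
    Each feature needs 2^n model calls, so this only suits a few features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Coalition members come from x; everything else from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model with a zero baseline, each value reduces to coefficient times feature value, and the values always sum to `model(x) - model(baseline)`.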
K-Dense-AI
Passed
Scvi Tools
This skill should be used when working with single-cell omics data analysis using scvi-tools, including scRNA-seq, scATAC-seq, CITE-seq, spatial transcriptomics, and other single-cell modalities. Use this skill for probabilistic modeling, batch correction, dimensionality reduction, differential expression, cell type annotation, multimodal integration, and spatial analysis tasks.
Single Cell, Genomics, Machine Learning, +3 more
50 · 2.5k
K-Dense-AI
Passed
Scikit Survival
Comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival. Use this skill when working with censored survival data, performing time-to-event analysis, fitting Cox models, Random Survival Forests, Gradient Boosting models, or Survival SVMs, evaluating survival predictions with concordance index or Brier score, handling competing risks, or implementing any survival analysis workflow with the scikit-survival library.
Survival Analysis, Machine Learning, Statistics, +3 more
50 · 2.5k
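The concordance index mentioned above can be written directly from its definition. This pure-Python version of Harrell's C is a sketch, not scikit-survival's `concordance_index_censored`; skipping pairs with tied event times is a simplification made here, and library implementations may treat ties differently.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[bool],
                      risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when i has the higher risk score;
    tied risk scores count as half.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```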
K-Dense-AI
Passed
Scikit Learn
Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.
Machine Learning, Scikit Learn, Python, +3 more
100 · 2.5k
K-Dense-AI
Passed
Pytorch Lightning
Deep learning framework (PyTorch Lightning) for scalable neural network training. Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU runs, and implement data pipelines, callbacks, logging (W&B, TensorBoard), and distributed training (DDP, FSDP, DeepSpeed).
Pytorch Lightning, Deep Learning, Distributed Training, +3 more
60 · 2.5k
K-Dense-AI
Passed
Pytdc
Therapeutics Data Commons (TDC): AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, and molecular oracles for therapeutic ML and pharmacological prediction.
Drug Discovery, Machine Learning, Therapeutics, +3 more
30 · 2.5k
K-Dense-AI
Passed
Pufferlib
This skill should be used when working with reinforcement learning tasks including high-performance RL training, custom environment development, vectorized parallel simulation, multi-agent systems, or integration with existing RL environments (Gymnasium, PettingZoo, Atari, Procgen, etc.). Use this skill for implementing PPO training, creating PufferEnv environments, optimizing RL performance, or developing policies with CNNs/LSTMs.
Reinforcement Learning, Machine Learning, Ppo, +3 more
60 · 2.5k
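The PPO training mentioned above optimizes a clipped surrogate objective. A minimal per-sample sketch of that formula in plain Python; real trainers compute it over batched tensors, and the function below is illustrative rather than part of PufferLib's API.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    ratio = exp(new_logp - old_logp); each sample contributes
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for new, old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic of the two
    return total / len(advantages)
```

Trainers usually minimize the negative of this value; taking the minimum removes any gain from pushing the probability ratio outside [1 - eps, 1 + eps].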
K-Dense-AI
Passed
Molfeat
Molecular featurization for ML with 100+ featurizers, including ECFP, MACCS keys, molecular descriptors, and pretrained models (ChemBERTa). Converts SMILES strings to features for QSAR and molecular ML.
Molecular Featurization, Cheminformatics, Machine Learning, +3 more
50 · 2.5k
K-Dense-AI
Passed
Modal
Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling.
Serverless, Gpu Computing, Machine Learning, +3 more
40 · 2.5k
K-Dense-AI
Passed
Hypogenic
Automated hypothesis generation and testing using large language models. Use this skill when generating scientific hypotheses from datasets, combining literature insights with empirical data, testing hypotheses against observational data, or conducting systematic hypothesis exploration for research discovery in domains like deception detection, AI content detection, mental health analysis, or other empirical research tasks.
Hypothesis Generation, Scientific Discovery, Llm, +3 more
70 · 2.5k
K-Dense-AI
Passed
Gtars
High-performance toolkit for genomic interval analysis in Rust with Python bindings. Use when working with genomic regions, BED files, coverage tracks, overlap detection, tokenization for ML models, or fragment analysis in computational genomics and machine learning applications.
Genomics, Bioinformatics, Rust, +3 more
60 · 2.5k
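Overlap detection between two sets of genomic intervals is commonly done by sorting and sweeping rather than comparing every pair. A hedged pure-Python sketch using BED's 0-based, half-open coordinates; this is not the gtars API, and the `Interval` tuple layout and `find_overlaps` name are assumptions.

```python
from typing import List, Tuple

# Hypothetical layout: (chrom, start, end) with BED-style 0-based, half-open coords.
Interval = Tuple[str, int, int]


def find_overlaps(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (a_interval, b_interval) pair that shares at least one base."""
    a, b = sorted(a), sorted(b)
    hits = []
    first = 0  # index of the first b interval that may still overlap a query
    for query in a:
        chrom, start, end = query
        # Skip b intervals on earlier chromosomes or ending at/before this start;
        # later queries start no earlier, so they cannot overlap these either.
        while first < len(b) and (b[first][0] < chrom or
                                  (b[first][0] == chrom and b[first][2] <= start)):
            first += 1
        for j in range(first, len(b)):
            b_chrom, b_start, b_end = b[j]
            if (b_chrom, b_start) >= (chrom, end):
                break  # b is sorted, so nothing further can overlap this query
            if b_chrom == chrom and b_start < end and b_end > start:
                hits.append((query, b[j]))
    return hits
```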
K-Dense-AI
Passed
Geniml
This skill should be used when working with genomic interval data (BED files) for machine learning tasks. Use for training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions. Applies to BED file collections, scATAC-seq data, chromatin accessibility datasets, and region-based genomic feature learning.
Genomics, Machine Learning, Bioinformatics, +3 more
40 · 2.5k
K-Dense-AI
Passed
Esm
Comprehensive toolkit for protein language models including ESM3 (generative multimodal protein design across sequence, structure, and function) and ESM C (efficient protein embeddings and representations). Use this skill when working with protein sequences, structures, or function prediction; designing novel proteins; generating protein embeddings; performing inverse folding; or conducting protein engineering tasks. Supports both local model usage and cloud-based Forge API for scalable inference.
Protein Engineering, Bioinformatics, Machine Learning, +3 more
40 · 2.5k
K-Dense-AI
Passed
Deepchem
Molecular machine learning toolkit for drug discovery: property prediction (ADMET, toxicity), GNNs (GCN, MPNN), MoleculeNet benchmarks, pretrained models, and featurization.
Machine Learning, Drug Discovery, Chemistry, +3 more
50 · 2.5k
K-Dense-AI
Passed
Arboreto
Infer gene regulatory networks (GRNs) from gene expression data using scalable algorithms (GRNBoost2, GENIE3). Use when analyzing transcriptomics data (bulk RNA-seq, single-cell RNA-seq) to identify transcription factor-target gene relationships and regulatory interactions. Supports distributed computation for large-scale datasets.
Bioinformatics, Gene Regulatory Networks, Transcriptomics, +3 more
20 · 2.5k
K-Dense-AI
Passed
Aeon
This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.
Time Series, Machine Learning, Python, +3 more
30 · 2.5k
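For the similarity search mentioned above, dynamic time warping (DTW) is a classic elastic distance between time series. The sketch below is a plain O(n·m) version with absolute-difference cost and no warping window, plus a 1-nearest-neighbor lookup; both are illustrative, and the names and defaults are assumptions that may not match aeon's own distance functions.

```python
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """DTW distance with absolute-difference local cost and no window.

    D[i][j] is the cheapest alignment cost of x[:i] and y[:j]; each step
    matches two elements or repeats an element of either series.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def nearest(query: Sequence[float], candidates: Sequence[Sequence[float]]) -> int:
    """Index of the candidate series closest to `query` under DTW."""
    return min(range(len(candidates)), key=lambda k: dtw_distance(query, candidates[k]))
```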