API atlas¶
The API atlas combines the package story, the capability map, and the API handbook so a user can answer three questions in one place:
- What job am I trying to do?
- Which API layer exists for that job?
- In which environment should I use that API?
Read this page in order¶
- Read the package story so you know what tscfbench is for.
- Read the capability map so you know which layer solves your problem.
- Read the API handbook so you know the exact entry points.
tscfbench¶
A benchmark-and-workflow package for time-series counterfactual inference.
tscfbench helps you turn a counterfactual question into a reproducible study, a readable report, and a reusable workflow. It is not only a model package: it also provides benchmark protocols, canonical studies, teaching surfaces, and agent-friendly artifacts.
What it is¶
- A stable schema for impact and panel counterfactual tasks.
- A benchmark layer for single studies, canonical studies, and model sweeps.
- A workflow layer for reports, notebooks, docs, CI, and coding-agent use.
What it is not¶
- It is not a claim that one built-in baseline is the last word in methodology.
- It is not a giant all-in-one causal inference framework.
- It is not only a demo notebook; it is meant to survive in real research workflows.
Why people adopt it¶
- It starts from recognizable research jobs instead of source files.
- It tells users why each API exists, where it works best, and what it returns.
- It ships canonical studies, benchmark cards, tutorials, and release-facing docs.
- It is also designed for token-aware, agent-driven research workflows.
First commands to run¶
python -m tscfbench package-story
python -m tscfbench capability-map
python -m tscfbench api-atlas
python -m tscfbench scenario-matrix
python -m tscfbench tutorial-index
Capability map¶
This page explains what each part of tscfbench is for, why that part exists, and where it works best.
Orientation and package framing¶
Question it answers: What is this package and where should I start?
Why this exists: Research packages are hard to adopt when a newcomer has to reverse-engineer the repo before they can run a first result.
Primary APIs: package_overview, recommend_start_path, workflow_recipes
Primary CLI commands
python -m tscfbench intro
python -m tscfbench start-here
python -m tscfbench workflow-recipes
Best environments: docs site, CLI, teaching, notebook onboarding
Typical outputs
- package mental model
- recommended first path
- onboarding reading order
Counterfactual task schema¶
Question it answers: How do I express my own data so different models and workflows share one protocol?
Why this exists: Counterfactual tooling is fragmented across panel, impact, and forecasting ecosystems; the schema layer keeps them interoperable.
Primary APIs: ImpactCase, PanelCase, PredictionResult
Primary CLI commands
python -m tscfbench make-panel-spec
Best environments: notebook, python script, library integration
Typical outputs
- validated case objects
- shared prediction contract
- JSON specs
Single-study benchmarking¶
Question it answers: How do I run one interpretable benchmark with diagnostics instead of just a fitted curve?
Why this exists: Researchers need metrics, placebo logic, and readable outputs, not only predictions.
Primary APIs: benchmark, benchmark_panel, PanelProtocolConfig, render_panel_markdown
Primary CLI commands
python -m tscfbench demo
python -m tscfbench run-panel-spec
python -m tscfbench render-panel-report
Best environments: notebook, script, CLI
Typical outputs
- metrics
- placebo tables
- markdown report
Canonical benchmark studies¶
Question it answers: How do I benchmark on recognizable cases that other researchers already know?
Why this exists: A benchmark package becomes more legible when it ships with a small number of public landmark studies.
Primary APIs: CanonicalBenchmarkSpec, run_canonical_benchmark, render_canonical_markdown
Primary CLI commands
python -m tscfbench make-canonical-spec
python -m tscfbench run-canonical
python -m tscfbench render-canonical-report
Best environments: CLI, paper companion, docs site, CI
Typical outputs
- canonical benchmark JSON
- cross-study report
- snapshot regression runs
Model comparison and ecosystem planning¶
Question it answers: How do I compare built-in and external models under one protocol?
Why this exists: Benchmark stacks often break because dependency planning and experiment comparison live in different places.
Primary APIs: SweepMatrixSpec, run_sweep, adapter_catalog, install_matrix
Primary CLI commands
python -m tscfbench make-sweep-spec
python -m tscfbench run-sweep
python -m tscfbench install-matrix
python -m tscfbench list-adapters
Best environments: CLI, CI, shared server, methods notebook
Typical outputs
- sweep specs
- comparison reports
- install plans
Teaching, tutorials, and dissemination¶
Question it answers: How do I make the package understandable to collaborators, reviewers, and students?
Why this exists: Good code still fails to spread if there is no public-facing package story, tutorial order, or benchmark card layer.
Primary APIs: render_benchmark_cards_markdown, render_workflow_recipes_markdown
Primary CLI commands
python -m tscfbench benchmark-cards
python -m tscfbench tutorial-index
python -m tscfbench package-story
Best environments: README, docs site, teaching, conference tutorial
Typical outputs
- benchmark cards
- tutorial reading order
- release-facing markdown
Agent-native research workflows¶
Question it answers: How do I use coding agents without sending the whole repo and the whole dataset every turn?
Why this exists: Agent-friendly workflows need specs, bundles, handles, and context plans rather than giant free-form prompts.
Primary APIs: AgentResearchTaskSpec, build_panel_agent_bundle, build_context_plan, export_openai_function_tools
Primary CLI commands
python -m tscfbench make-agent-spec
python -m tscfbench build-agent-bundle
python -m tscfbench plan-context
python -m tscfbench export-openai-tools
Best environments: agent IDE, tool-calling runtime, CI
Typical outputs
- agent spec JSON
- bundle manifest
- context plan
- tool schemas
API handbook¶
This page organizes the package by jobs rather than by source files. Each section explains why that API layer exists, when to use it, and what it returns.
Core data model¶
Layer: foundation
Entry points: ImpactCase, PanelCase, PredictionResult
Why this API exists
This layer exists because most counterfactual libraries encode data differently. tscfbench needs a small, stable schema that lets the same benchmark protocol work with built-in models, external adapters, and custom user data.
When to use it
Use these classes whenever you want to bring your own dataset into the package or write a new model adapter.
What it returns
- Validated case objects with explicit intervention boundaries.
- PredictionResult objects with counterfactual path, effect path, and optional intervals.
Works well in: notebook, python script, library integration, teaching
Notes
- ImpactCase is for one treated series plus controls/covariates.
- PanelCase is for one treated unit in a long-format panel.
- PredictionResult is the common output contract for all model wrappers.
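A minimal sketch of bringing your own panel into the schema. Only the class name PanelCase comes from the entry points above; the constructor keywords (data, unit_col, time_col, outcome_col, treated_unit, treatment_time) are illustrative assumptions, not a documented signature.

```python
import pandas as pd

from tscfbench import PanelCase  # entry point listed above

# Hypothetical long-format panel: one treated unit and two controls over four periods.
panel = pd.DataFrame({
    "unit": ["treated", "control_1", "control_2"] * 4,
    "time": [t for t in range(4) for _ in range(3)],
    "outcome": [1.0, 0.9, 1.1, 1.2, 0.95, 1.05, 1.4, 1.0, 1.1, 1.3, 1.05, 1.15],
})

# All keyword names below are assumptions about the schema made for illustration.
case = PanelCase(
    data=panel,
    unit_col="unit",
    time_col="time",
    outcome_col="outcome",
    treated_unit="treated",
    treatment_time=2,
)
```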
Single-case benchmarking¶
Layer: benchmark protocol
Entry points: benchmark, benchmark_panel, PanelProtocolConfig
Why this API exists
Researchers usually need more than raw predictions: they need comparable metrics and, in panel studies, placebo-based diagnostics. This layer turns one model + one case into a protocol-aware result object.
When to use it
Use this when you have one case and one model and want an interpretable benchmark result quickly.
What it returns
- Point metrics such as RMSE, MAE, R², and cumulative-effect error for synthetic tasks.
- Panel diagnostics such as pre/post RMSPE-style summaries and placebo tables.
Works well in: notebook, python script, quick experiment, teaching
CLI counterparts
python -m tscfbench demo
python -m tscfbench make-panel-spec
python -m tscfbench run-panel-spec
Notes
- benchmark() is the generic entry point for cases with ground-truth counterfactuals.
- benchmark_panel() adds panel-specific placebo logic and reporting metadata.
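A minimal sketch under the same assumptions as the schema example above; the model identifier and keyword names are placeholders rather than documented arguments.

```python
from tscfbench import PanelProtocolConfig, benchmark_panel

# `case` is the PanelCase built in the schema sketch above.
config = PanelProtocolConfig()                               # default protocol settings
result = benchmark_panel(case, model="did", config=config)   # "did" is a placeholder model id

# The result object is expected to expose metrics and placebo diagnostics
# that the Markdown renderer can consume.
print(result)
```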
Experiment specs and reproducibility¶
Layer: experiment definition
Entry points: PanelExperimentSpec, ImpactExperimentSpec, run_panel_experiment
Why this API exists
Once a benchmark leaves a notebook, ad hoc parameter passing becomes fragile. The spec layer exists so experiments can be serialized, versioned, diffed, and rerun by humans, CI jobs, or agents.
When to use it
Use this layer when you want JSON-first reproducibility or when you want CLI and Python workflows to mirror each other.
What it returns
- Serializable experiment specifications.
- Protocol outputs that can be rendered into Markdown or packed into bundles.
Works well in: CLI, git-based collaboration, CI, agent workflows
CLI counterparts
python -m tscfbench make-panel-spec
python -m tscfbench run-panel-spec
python -m tscfbench render-panel-report
Notes
- This is the best entry point for people who want reproducible experiments without writing lots of orchestration code.
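A minimal sketch of the JSON-first flow. It assumes a spec file already written by python -m tscfbench make-panel-spec and a keyword-argument constructor for the spec class; both are assumptions made for illustration.

```python
import json

from tscfbench import PanelExperimentSpec, run_panel_experiment

# Load a spec previously written by `python -m tscfbench make-panel-spec`.
# Rebuilding it from keyword arguments is an assumption; the intent is simply
# a JSON round trip that humans, CI jobs, and agents can all rerun.
with open("panel_spec.json") as fh:
    spec = PanelExperimentSpec(**json.load(fh))

result = run_panel_experiment(spec)
```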
Canonical benchmark studies¶
Layer: research benchmarks
Entry points: list_canonical_studies, CanonicalBenchmarkSpec, run_canonical_benchmark, render_canonical_markdown
Why this API exists
A benchmark package becomes easier to trust and teach when it offers a small set of recognizable studies. This layer is the package's public face for empirical panel counterfactual benchmarking.
When to use it
Use this layer when you want a standard study battery rather than a single custom case.
What it returns
- A study catalog with Germany, Prop99, and Basque metadata.
- Cross-study benchmark runs and a shareable Markdown report.
Works well in: paper companion, tutorials, teaching, benchmark release
CLI counterparts
python -m tscfbench list-canonical-studies
python -m tscfbench make-canonical-spec
python -m tscfbench run-canonical
python -m tscfbench render-canonical-report
Notes
- Use snapshot mode for reproducible tutorials and CI.
- Use auto/remote mode when you want fuller study data in normal research runs.
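A minimal sketch of a canonical run. The studies and data_mode fields on the spec are illustrative assumptions; the function names come from the entry points above.

```python
from tscfbench import (
    CanonicalBenchmarkSpec,
    list_canonical_studies,
    render_canonical_markdown,
    run_canonical_benchmark,
)

print(list_canonical_studies())     # catalog entries for Germany, Prop99, Basque

# The `studies` and `data_mode` keywords are assumptions; snapshot mode is the
# reproducible choice for tutorials and CI, per the notes above.
spec = CanonicalBenchmarkSpec(studies=["prop99"], data_mode="snapshot")
runs = run_canonical_benchmark(spec)
print(render_canonical_markdown(runs))
```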
Model discovery and ecosystem planning¶
Layer: ecosystem navigation
Entry points: install_matrix, adapter_catalog, recommend_adapter_stack, list_model_ids
Why this API exists
Researchers rarely know up front which package stack is easiest to install, easiest to explain, or most suitable for a given task family. This layer exists to make that choice explicit rather than tribal knowledge.
When to use it
Use this layer before you commit to a benchmark stack or when you need to explain optional dependencies to users.
What it returns
- Structured install metadata and import/package names.
- Adapter cards that describe strengths, caveats, and runtime characteristics.
- Recommendations for a small, research-oriented starting stack.
Works well in: package maintenance, onboarding, teaching, agent planning
CLI counterparts
python -m tscfbench install-matrix
python -m tscfbench list-adapters
python -m tscfbench recommend-stack
python -m tscfbench list-model-ids
Notes
- This layer is especially useful when your audience is broad and distributed and needs an explicit install story.
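A minimal sketch of planning a stack before committing to it. The call pattern is inferred from the entry-point names above, and the task keyword on the recommender is an assumption.

```python
from tscfbench import (
    adapter_catalog,
    install_matrix,
    list_model_ids,
    recommend_adapter_stack,
)

print(list_model_ids())        # model ids accepted by the benchmark entry points
print(install_matrix())        # structured install metadata for optional backends
print(adapter_catalog())       # adapter cards: strengths, caveats, runtime notes

# The `task` keyword is an illustrative assumption; the intent is a small,
# research-oriented starting stack for a given task family.
print(recommend_adapter_stack(task="panel"))
```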
Sweep studies and comparison grids¶
Layer: multi-run orchestration
Entry points: SweepMatrixSpec, make_default_sweep_spec, run_sweep, render_sweep_markdown
Why this API exists
Researchers often compare several model/dataset combinations at once. The sweep layer exists so those comparisons are explicit, machine-readable, and robust to partial adapter failures.
When to use it
Use this layer when you are comparing multiple models, datasets, or backends in a single benchmark run.
What it returns
- Per-cell results with success/error status.
- Comparison tables and study-level summaries.
Works well in: benchmarking, CI, method comparison, release validation
CLI counterparts
python -m tscfbench make-sweep-spec
python -m tscfbench run-sweep
python -m tscfbench render-sweep-report
Notes
- By default, external-package failures are recorded as cell-level errors rather than crashing the full sweep.
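A minimal sketch of a default sweep; the zero-argument defaults are an assumption, and only the function names come from the entry points above.

```python
from tscfbench import make_default_sweep_spec, render_sweep_markdown, run_sweep

spec = make_default_sweep_spec()        # a SweepMatrixSpec covering the default grid
results = run_sweep(spec)               # per-cell results with success/error status
print(render_sweep_markdown(results))   # comparison tables and study-level summaries
```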
Agent-native workflow layer¶
Layer: automation
Entry points: AgentResearchTaskSpec, build_panel_agent_bundle, build_context_plan, export_openai_function_tools, TSCFBenchMCPServer
Why this API exists
Agent workflows need smaller, more structured artifacts than notebook-centric research code. This layer exists to turn benchmark runs into token-bounded specs, manifests, digests, and tool surfaces.
When to use it
Use this layer when a coding agent or tool-calling runtime participates in your research workflow.
What it returns
- Compact JSON specs and bundles.
- Repo maps, context plans, and manifest-based artifact access.
- Function-tool and MCP surfaces so the package can explain itself to agents.
Works well in: Cursor/Codex/ChatGPT, tool-calling backends, CI automation, multi-step research assistants
CLI counterparts
python -m tscfbench make-agent-spec
python -m tscfbench build-agent-bundle
python -m tscfbench plan-context
python -m tscfbench export-openai-tools
python -m tscfbench mcp-server
Notes
- This layer matters when you want lower token usage, smaller context windows, and resumable research tasks.
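A minimal sketch of producing agent-facing artifacts. The spec constructor keyword and the argument each builder takes are assumptions made for illustration.

```python
import json

from tscfbench import (
    AgentResearchTaskSpec,
    build_context_plan,
    build_panel_agent_bundle,
    export_openai_function_tools,
)

# Keyword names and the objects each builder accepts are assumptions.
task = AgentResearchTaskSpec(name="panel-placebo-check")
bundle = build_panel_agent_bundle(task)     # manifest plus token-bounded artifacts
plan = build_context_plan(bundle)           # which artifacts to load, and in what order

tools = export_openai_function_tools()      # function-tool schemas for a tool-calling runtime
with open("tools.json", "w") as fh:
    json.dump(tools, fh, indent=2)
```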
Reports, teaching surfaces, and project communication¶
Layer: dissemination
Entry points: render_panel_markdown, render_sweep_markdown, render_canonical_markdown
Why this API exists
A benchmark package spreads only if the outputs are understandable outside the codebase. This layer exists so results can become readable artifacts for papers, tutorials, internal memos, and classrooms.
When to use it
Use this layer whenever you need a human-readable output rather than raw Python objects.
What it returns
- Markdown reports that summarize configuration, metrics, and comparison tables.
- A cleaner handoff from computation to writing or teaching.
Works well in: paper writing, teaching, project website, release notes
CLI counterparts
python -m tscfbench render-panel-report
python -m tscfbench render-sweep-report
python -m tscfbench render-canonical-report
Notes
- These renderers are intentionally simple so they are easy to diff and easy to post-process.
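A minimal sketch of the handoff from computation to writing. It reuses the benchmark_panel() result from the single-case sketch and assumes the renderer accepts that result object directly.

```python
from pathlib import Path

from tscfbench import render_panel_markdown

# `result` is a benchmark_panel() output as in the single-case sketch above;
# passing it straight to the renderer is an assumption about the signature.
report = render_panel_markdown(result)
Path("panel_report.md").write_text(report)
```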