API handbook¶
This page organizes the package by jobs rather than by source files. Each section explains why that API layer exists, when to use it, and what it returns.
Core data model¶
Layer: foundation
Entry points: ImpactCase, PanelCase, PredictionResult
Why this API exists
This layer exists because most counterfactual libraries encode data differently. tscfbench needs a small, stable schema that lets the same benchmark protocol work with built-in models, external adapters, and custom user data.
When to use it
Use these classes whenever you want to bring your own dataset into the package or write a new model adapter.
What it returns
- Validated case objects with explicit intervention boundaries.
- PredictionResult objects with counterfactual path, effect path, and optional intervals.
Works well in: notebook, python script, library integration, teaching
Notes
- ImpactCase is for one treated series plus controls/covariates.
- PanelCase is for one treated unit in a long-format panel.
- PredictionResult is the common output contract for all model wrappers.
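The shape of this contract can be sketched with plain dataclasses. This is an illustrative model only: the field names and validation rule below are assumptions, not the package's actual signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImpactCase:
    """One treated series plus controls/covariates (illustrative sketch)."""
    treated: list[float]           # outcome path for the treated series
    controls: list[list[float]]    # control/covariate series of the same length
    intervention_index: int        # first post-intervention time step

    def __post_init__(self):
        # An explicit intervention boundary must fall strictly inside the series.
        if not 0 < self.intervention_index < len(self.treated):
            raise ValueError("intervention boundary must fall inside the series")


@dataclass
class PredictionResult:
    """Common output contract for all model wrappers (illustrative sketch)."""
    counterfactual: list[float]                            # predicted untreated path
    effect: list[float]                                    # observed minus counterfactual
    intervals: Optional[list[tuple[float, float]]] = None  # optional uncertainty bands


case = ImpactCase(
    treated=[1.0, 1.1, 2.0],
    controls=[[1.0, 1.0, 1.0]],
    intervention_index=2,
)
```

The point of a schema like this is that a model adapter only ever needs to consume a case object and return a result object; everything else in the protocol stays model-agnostic.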
Single-case benchmarking¶
Layer: benchmark protocol
Entry points: benchmark, benchmark_panel, PanelProtocolConfig
Why this API exists
Researchers usually need more than raw predictions: they need comparable metrics and, in panel studies, placebo-based diagnostics. This layer turns one model + one case into a protocol-aware result object.
When to use it
Use this when you have one case and one model and want an interpretable benchmark result quickly.
What it returns
- Point metrics such as RMSE, MAE, R², and cumulative-effect error for synthetic tasks.
- Panel diagnostics such as pre/post RMSPE-style summaries and placebo tables.
Works well in: notebook, python script, quick experiment, teaching
CLI counterparts
python -m tscfbench demo
python -m tscfbench make-panel-spec
python -m tscfbench run-panel-spec
Notes
- benchmark() is the generic entry point for cases with ground-truth counterfactuals.
- benchmark_panel() adds panel-specific placebo logic and reporting metadata.
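The panel diagnostics follow conventions from the synthetic-control literature. The sketch below shows the idea behind a post/pre RMSPE ratio and a placebo-based p-value; it is not benchmark_panel()'s implementation, just a minimal stdlib rendering of the metrics it reports.

```python
import math


def rmspe(actual, predicted):
    """Root mean squared prediction error over one window."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmspe_ratio(actual, predicted, t0):
    """Post/pre RMSPE ratio: large values suggest a real post-intervention effect."""
    pre = rmspe(actual[:t0], predicted[:t0])
    post = rmspe(actual[t0:], predicted[t0:])
    return post / pre


def placebo_p_value(treated_ratio, placebo_ratios):
    """Share of units (placebos plus treated) with a ratio at least as extreme."""
    all_ratios = placebo_ratios + [treated_ratio]
    return sum(r >= treated_ratio for r in all_ratios) / len(all_ratios)
```

For example, a treated ratio of 5.0 against four placebo ratios below 1.3 yields a p-value of 0.2, the smallest value attainable with five units; this is why placebo diagnostics need a reasonably sized donor pool to be informative.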
Experiment specs and reproducibility¶
Layer: experiment definition
Entry points: PanelExperimentSpec, ImpactExperimentSpec, run_panel_experiment
Why this API exists
Once a benchmark leaves a notebook, ad hoc parameter passing becomes fragile. The spec layer exists so experiments can be serialized, versioned, diffed, and rerun by humans, CI jobs, or agents.
When to use it
Use this layer when you want JSON-first reproducibility or when you want CLI and Python workflows to mirror each other.
What it returns
- Serializable experiment specifications.
- Protocol outputs that can be rendered into Markdown or packed into bundles.
Works well in: CLI, git-based collaboration, CI, agent workflows
CLI counterparts
python -m tscfbench make-panel-spec
python -m tscfbench run-panel-spec
python -m tscfbench render-panel-report
Notes
- This is the best entry point for people who want reproducible experiments without writing much orchestration code.
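The core idea of JSON-first reproducibility is a lossless serialize/rerun round trip. The sketch below illustrates it with an assumed spec shape; the real PanelExperimentSpec's fields will differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PanelExperimentSpec:
    """Illustrative spec: serializable, diffable, rerunnable (field names assumed)."""
    study: str
    model_id: str
    intervention_time: int
    seed: int = 0


spec = PanelExperimentSpec(study="germany", model_id="sc", intervention_time=1990)

# Serialize with stable key order so git diffs stay meaningful.
payload = json.dumps(asdict(spec), indent=2, sort_keys=True)

# A CI job, an agent, or a colleague reconstructs the exact same experiment.
restored = PanelExperimentSpec(**json.loads(payload))
assert restored == spec
```

Sorting keys and pinning a seed inside the spec are the two habits that make "diff the JSON" a reliable way to review experiment changes.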
Canonical benchmark studies¶
Layer: research benchmarks
Entry points: list_canonical_studies, CanonicalBenchmarkSpec, run_canonical_benchmark, render_canonical_markdown
Why this API exists
A benchmark package becomes easier to trust and teach when it offers a small set of recognizable studies. This layer is the package's public face for empirical panel counterfactual benchmarking.
When to use it
Use this layer when you want a standard study battery rather than a single custom case.
What it returns
- A study catalog with metadata for the Germany, Prop99, and Basque studies.
- Cross-study benchmark runs and a shareable Markdown report.
Works well in: paper companion, tutorials, teaching, benchmark release
CLI counterparts
python -m tscfbench list-canonical-studies
python -m tscfbench make-canonical-spec
python -m tscfbench run-canonical
python -m tscfbench render-canonical-report
Notes
- Use snapshot mode for reproducible tutorials and CI.
- Use auto/remote mode when you want fuller study data in normal research runs.
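Conceptually, the layer pairs a small study catalog with a data-mode switch. The sketch below is illustrative only: the catalog values are placeholders, and the real list_canonical_studies() may expose different fields and resolution logic.

```python
# Placeholder catalog entries; the actual metadata lives in the package.
CANONICAL_STUDIES = {
    "germany": {"outcome": "gdp_per_capita", "intervention_time": 1990},
    "prop99": {"outcome": "cigarette_sales", "intervention_time": 1988},
    "basque": {"outcome": "gdp_per_capita", "intervention_time": 1970},
}


def resolve_data_mode(mode: str, offline: bool) -> str:
    """Snapshot mode pins bundled data; auto prefers remote when the network allows."""
    if mode == "snapshot" or offline:
        return "snapshot"  # reproducible tutorials and CI
    return "remote"        # fuller study data in normal research runs
```

The practical rule of thumb from the notes above: pin snapshot mode anywhere a rerun must produce byte-identical results, and reserve auto/remote mode for exploratory research.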
Model discovery and ecosystem planning¶
Layer: ecosystem navigation
Entry points: install_matrix, adapter_catalog, recommend_adapter_stack, list_model_ids
Why this API exists
Researchers rarely know up front which package stack is easiest to install, easiest to explain, or most suitable for a given task family. This layer exists to make that choice explicit rather than tribal knowledge.
When to use it
Use this layer before you commit to a benchmark stack or when you need to explain optional dependencies to users.
What it returns
- Structured install metadata and import/package names.
- Adapter cards that describe strengths, caveats, and runtime characteristics.
- Recommendations for a small, research-oriented starting stack.
Works well in: package maintenance, onboarding, teaching, agent planning
CLI counterparts
python -m tscfbench install-matrix
python -m tscfbench list-adapters
python -m tscfbench recommend-stack
python -m tscfbench list-model-ids
Notes
- This layer is especially useful when your audience is global and needs a clear install story.
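An adapter card plus a filter is the essence of this layer. The sketch below assumes a card shape and two example entries for illustration; the real catalog's fields, model ids, and recommendations are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class AdapterCard:
    """Illustrative adapter card; field names are assumptions, not the real catalog."""
    model_id: str
    pip_package: str       # what users `pip install`
    import_name: str       # what Python actually imports
    strengths: list = field(default_factory=list)
    heavy_deps: bool = False


CATALOG = [
    AdapterCard("sc", "pysyncon", "pysyncon", ["panel", "interpretable"]),
    AdapterCard("bsts", "orbit-ml", "orbit", ["uncertainty"], heavy_deps=True),
]


def recommend_stack(catalog, allow_heavy=False):
    """A small starting stack: skip heavy optional dependencies unless asked."""
    return [c.model_id for c in catalog if allow_heavy or not c.heavy_deps]
```

Keeping pip_package and import_name as separate fields is the detail that saves the most onboarding pain, since the two names frequently differ.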
Sweep studies and comparison grids¶
Layer: multi-run orchestration
Entry points: SweepMatrixSpec, make_default_sweep_spec, run_sweep, render_sweep_markdown
Why this API exists
Researchers often compare several model/dataset combinations at once. The sweep layer exists so those comparisons are explicit, machine-readable, and robust to partial adapter failures.
When to use it
Use this layer when you are comparing multiple models, datasets, or backends in a single benchmark run.
What it returns
- Per-cell results with success/error status.
- Comparison tables and study-level summaries.
Works well in: benchmarking, CI, method comparison, release validation
CLI counterparts
python -m tscfbench make-sweep-spec
python -m tscfbench run-sweep
python -m tscfbench render-sweep-report
Notes
- By default, external-package failures are recorded as cell-level errors rather than crashing the full sweep.
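The failure-isolation pattern behind that note can be sketched in a few lines. This is not run_sweep()'s actual signature; it just shows how per-cell try/except keeps one broken adapter from taking down the whole grid.

```python
import itertools


def run_sweep(models, studies, run_cell):
    """Run every model/study cell, recording failures instead of aborting the grid."""
    results = []
    for model_id, study in itertools.product(models, studies):
        cell = {"model": model_id, "study": study}
        try:
            cell.update(status="ok", metrics=run_cell(model_id, study))
        except Exception as exc:  # external-adapter failures stay cell-local
            cell.update(status="error", error=f"{type(exc).__name__}: {exc}")
        results.append(cell)
    return results
```

Because every cell carries an explicit status, downstream comparison tables can render successes and failures side by side instead of silently dropping broken combinations.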
Agent-native workflow layer¶
Layer: automation
Entry points: AgentResearchTaskSpec, build_panel_agent_bundle, build_context_plan, export_openai_function_tools, TSCFBenchMCPServer
Why this API exists
Agent workflows need smaller, more structured artifacts than notebook-centric research code. This layer exists to turn benchmark runs into token-bounded specs, manifests, digests, and tool surfaces.
When to use it
Use this layer when a coding agent or tool-calling runtime participates in your research workflow.
What it returns
- Compact JSON specs and bundles.
- Repo maps, context plans, and manifest-based artifact access.
- Function-tool and MCP surfaces so the package can explain itself to agents.
Works well in: Cursor/Codex/ChatGPT, tool-calling backends, CI automation, multi-step research assistants
CLI counterparts
python -m tscfbench make-agent-spec
python -m tscfbench build-agent-bundle
python -m tscfbench plan-context
python -m tscfbench export-openai-tools
python -m tscfbench mcp-server
Notes
- This layer matters when you want lower token usage, smaller context windows, and resumable research tasks.
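The token-bounded digest idea can be illustrated with a small stdlib sketch. The field names and budget policy here are assumptions, not how build_panel_agent_bundle() actually packs artifacts.

```python
import json


def build_digest(result: dict, max_chars: int = 400) -> str:
    """Pack a benchmark result into a bounded JSON digest for agent context windows."""
    keep = ("study", "model", "metrics", "status")  # drop logs, paths, raw arrays
    digest = {k: result[k] for k in keep if k in result}
    text = json.dumps(digest, sort_keys=True)
    if len(text) > max_chars:            # stay inside the token budget
        digest.pop("metrics", None)      # shed the bulkiest field first
        text = json.dumps(digest, sort_keys=True)
    return text[:max_chars]
```

The same discipline (whitelist the fields, enforce a hard size cap) is what makes manifests and context plans resumable: an agent can always afford to re-read them.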
Reports, teaching surfaces, and project communication¶
Layer: dissemination
Entry points: render_panel_markdown, render_sweep_markdown, render_canonical_markdown
Why this API exists
A benchmark package spreads only if the outputs are understandable outside the codebase. This layer exists so results can become readable artifacts for papers, tutorials, internal memos, and classrooms.
When to use it
Use this layer whenever you need a human-readable output rather than raw Python objects.
What it returns
- Markdown reports that summarize configuration, metrics, and comparison tables.
- A cleaner handoff from computation to writing or teaching.
Works well in: paper writing, teaching, project website, release notes
CLI counterparts
python -m tscfbench render-panel-report
python -m tscfbench render-sweep-report
python -m tscfbench render-canonical-report
Notes
- These renderers are intentionally simple so they are easy to diff and easy to post-process.
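"Intentionally simple" here means roughly this shape: flat Markdown built line by line, with no templating engine in the way. The sketch below is an assumed rendering, not render_panel_markdown() itself.

```python
def render_metrics_markdown(title: str, rows: list[dict]) -> str:
    """Render benchmark rows as a plain Markdown table: easy to diff, easy to post-process."""
    cols = list(rows[0])  # assume every row shares the first row's keys
    lines = [
        f"# {title}",
        "",
        "| " + " | ".join(cols) + " |",
        "| " + " | ".join("---" for _ in cols) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[c]) for c in cols) + " |")
    return "\n".join(lines)
```

Because the output is deterministic line-oriented text, a changed metric shows up as a one-line git diff, which is exactly the handoff to writing and teaching that this layer is for.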