
Bring your own data

If you already have a CSV, start here instead of the demo gallery.

tscfbench now has two direct Python entry points for real data:

  • run_panel_data: one treated unit plus comparison units over time
  • run_impact_data: one treated series plus one or more control series

The CLI wrappers run-csv-panel and run-csv-impact are still available. This page leads with Python because it is written for users rather than agents.

1. Panel data: one treated unit plus a donor pool

Use this when you have many units over time and exactly one treated unit.

Expected CSV shape:

city,date,traffic_index
Harbor City,2024-03-01,101.2
Harbor City,2024-03-02,100.7
North City,2024-03-01,98.4
North City,2024-03-02,98.9
...

Required columns:

  • one unit column such as city or region
  • one time column such as date or year
  • one outcome column such as traffic_index or employment_index
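Before you call run_panel_data, you can check these requirements yourself. The sketch below uses plain pandas. check_panel_frame is a hypothetical helper and is not part of tscfbench. It tests the column requirements above, the single treated unit, and the presence of at least one donor unit. It also makes two assumptions of its own: there are no duplicate unit/time rows, and the outcome is numeric.

```python
import pandas as pd


def check_panel_frame(df, unit_col, time_col, y_col, treated_unit):
    """Return a list of problems in a long-format panel; an empty list means OK.

    Hypothetical pre-flight helper, not part of tscfbench.
    """
    missing = [c for c in (unit_col, time_col, y_col) if c not in df.columns]
    if missing:
        # Nothing else can be checked without the required columns.
        return [f"missing columns: {missing}"]
    problems = []
    # Assumption: each (unit, time) pair should appear at most once.
    n_dupes = int(df.duplicated(subset=[unit_col, time_col]).sum())
    if n_dupes:
        problems.append(f"{n_dupes} duplicate ({unit_col}, {time_col}) rows")
    units = set(df[unit_col].unique())
    if treated_unit not in units:
        problems.append(f"treated unit {treated_unit!r} not found in {unit_col}")
    elif len(units) < 2:
        problems.append("no comparison (donor) units besides the treated unit")
    # Assumption: the outcome column should be numeric.
    if not pd.api.types.is_numeric_dtype(df[y_col]):
        problems.append(f"{y_col} is not numeric")
    return problems


# The sample rows from the CSV shape above.
sample = pd.DataFrame({
    "city": ["Harbor City", "Harbor City", "North City", "North City"],
    "date": ["2024-03-01", "2024-03-02", "2024-03-01", "2024-03-02"],
    "traffic_index": [101.2, 100.7, 98.4, 98.9],
})
print(check_panel_frame(sample, "city", "date", "traffic_index", "Harbor City"))  # → []
```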

Run it in Python:

import pandas as pd
from tscfbench import run_panel_data

df = pd.read_csv("my_panel.csv")
result = run_panel_data(
    df,
    unit_col="city",
    time_col="date",
    y_col="traffic_index",
    treated_unit="Harbor City",
    intervention_t="2024-03-06",
    output_dir="my_panel_run",
)

result["summary"]

CLI equivalent:

python -m tscfbench run-csv-panel my_panel.csv --unit-col city --time-col date --y-col traffic_index --treated-unit "Harbor City" --intervention-t 2024-03-06 --output my_panel_run

This writes the following to the output directory:

  • panel_prediction_frame.csv
  • panel_metrics.json
  • panel_report.md
  • treated-vs-counterfactual charts
  • point-effect and cumulative-impact charts
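To inspect the tabular outputs in Python after a run, you can load them with pandas and json. This sketch assumes the files sit directly inside the output directory under the names listed above. load_run_outputs is a hypothetical helper. The column names and metric keys are not described on this page, so the sketch does not rely on any of them.

```python
import json
from pathlib import Path

import pandas as pd


def load_run_outputs(output_dir, prefix="panel"):
    """Load <prefix>_prediction_frame.csv and <prefix>_metrics.json.

    Hypothetical helper. Assumes both files live directly in output_dir;
    use prefix="impact" for a run_impact_data output directory.
    """
    out = Path(output_dir)
    frame = pd.read_csv(out / f"{prefix}_prediction_frame.csv")
    with open(out / f"{prefix}_metrics.json", encoding="utf-8") as fh:
        metrics = json.load(fh)
    return frame, metrics


# Example (after running the panel example above):
# frame, metrics = load_run_outputs("my_panel_run")
# print(frame.head()); print(metrics)
```

The same helper works for the impact workflow below, for example load_run_outputs("my_impact_run", prefix="impact").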

2. Impact data: one treated series plus controls

Use this when you have one main outcome series and one or more control series in the same table.

Expected CSV shape:

date,signups,peer_signups,search_interest
2024-04-01,120,116,54
2024-04-02,123,117,53
2024-04-03,121,115,55
...

Required columns:

  • one time column
  • one outcome column
  • one or more control columns
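A similar pre-flight check works for the impact layout. check_impact_frame is a hypothetical helper and is not part of tscfbench. It checks the required columns above and that each time value appears only once, which means the controls are aligned by time. It also flags non-numeric columns and missing values so you can inspect them. This page does not say whether tscfbench tolerates gaps, so treat that check as an assumption.

```python
import pandas as pd


def check_impact_frame(df, time_col, y_col, x_cols):
    """Return a list of problems in a wide-format impact table; empty means OK.

    Hypothetical pre-flight helper, not part of tscfbench.
    """
    missing = [c for c in [time_col, y_col, *x_cols] if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if not x_cols:
        problems.append("at least one control column is required")
    # One row per time point, so outcome and controls line up by time.
    if df[time_col].duplicated().any():
        problems.append(f"repeated values in {time_col}")
    for col in [y_col, *x_cols]:
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
        elif df[col].isna().any():
            # Assumption: flag gaps for inspection before running.
            problems.append(f"{col} has missing values")
    return problems
```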

Run it in Python:

import pandas as pd
from tscfbench import run_impact_data

df = pd.read_csv("my_impact.csv")
result = run_impact_data(
    df,
    time_col="date",
    y_col="signups",
    x_cols=["peer_signups", "search_interest"],
    intervention_t="2024-04-23",
    output_dir="my_impact_run",
)

result["summary"]

CLI equivalent:

python -m tscfbench run-csv-impact my_impact.csv --time-col date --y-col signups --x-cols peer_signups search_interest --intervention-t 2024-04-23 --output my_impact_run

This writes the following to the output directory:

  • impact_prediction_frame.csv
  • impact_metrics.json
  • impact_report.md
  • treated-vs-counterfactual charts
  • point-effect and cumulative-impact charts

3. How to choose between the two

  • Use run_panel_data when your data is naturally unit × time, in long format with one row per unit per time period
  • Use run_impact_data when your data is one treated series with control columns already aligned by time, in wide format with one row per time period
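If your data is in long unit × time form and you want to try run_impact_data, you can reshape it so that each unit becomes its own column. The pandas sketch below does this. panel_to_impact_frame is a hypothetical helper, and this page does not say whether the two entry points produce comparable estimates on the same data.

```python
import pandas as pd


def panel_to_impact_frame(df, unit_col, time_col, y_col, treated_unit):
    """Reshape a long unit x time panel into one column per unit.

    The treated unit's column becomes the outcome; every other unit becomes
    a control column. Hypothetical helper, not part of tscfbench.
    """
    # pivot raises ValueError if any (time, unit) pair appears more than once.
    wide = df.pivot(index=time_col, columns=unit_col, values=y_col).reset_index()
    wide.columns.name = None  # drop the leftover unit-column axis label
    x_cols = [c for c in wide.columns if c not in (time_col, treated_unit)]
    return wide, x_cols
```

You could then pass the reshaped frame to run_impact_data with y_col set to the treated unit's name (for example "Harbor City") and x_cols=x_cols. Because pivot refuses duplicate unit/time pairs, the reshape also works as a duplicate check.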

4. Time column and intervention format

  • date columns can be normal date strings such as 2024-07-14
  • integer-like time columns such as year also work
  • intervention_t should match one value in your time column
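A quick way to confirm that intervention_t appears in your time column before a run is sketched below. intervention_matches is a hypothetical helper. This page does not say how tscfbench itself compares the values, so the sketch uses the strictest reasonable reading. Integer columns such as years are compared as integers. Columns you have already parsed to datetimes are compared as timestamps. Anything else must match the text exactly, as with "2024-03-06" against date strings read by pd.read_csv.

```python
import pandas as pd


def intervention_matches(df, time_col, intervention_t):
    """Return True if intervention_t matches a value in df[time_col].

    Hypothetical helper, not part of tscfbench.
    """
    col = df[time_col]
    if pd.api.types.is_integer_dtype(col):
        # e.g. a year column read as int64; accepts 2020 or "2020"
        return bool((col == int(intervention_t)).any())
    if pd.api.types.is_datetime64_any_dtype(col):
        # column already parsed, e.g. via read_csv(parse_dates=[...])
        return bool((col == pd.Timestamp(intervention_t)).any())
    # default: exact text match against date strings
    return bool((col.astype(str) == str(intervention_t)).any())
```

A False result often means the intervention date falls in a gap in the series, such as a missing day, or is written in a different format from the column.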

5. If you still prefer the CLI

Use run-csv-panel or run-csv-impact when you want copy-paste terminal commands, CI jobs, or shell scripts. They are thin wrappers around the same workflow.
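For CI jobs or scripts that drive the CLI from Python, you can build the command as an argument list. A list avoids shell-quoting problems with values such as "Harbor City". The sketch below uses only the flags shown on this page. panel_cli_args is a hypothetical helper. It uses sys.executable in place of a bare python so that the command runs with the current interpreter.

```python
import shlex
import sys


def panel_cli_args(csv_path, unit_col, time_col, y_col, treated_unit,
                   intervention_t, output):
    """Build the run-csv-panel command as an argument list (hypothetical helper)."""
    return [
        sys.executable, "-m", "tscfbench", "run-csv-panel", str(csv_path),
        "--unit-col", unit_col,
        "--time-col", time_col,
        "--y-col", y_col,
        "--treated-unit", treated_unit,
        "--intervention-t", str(intervention_t),
        "--output", str(output),
    ]


args = panel_cli_args("my_panel.csv", "city", "date", "traffic_index",
                      "Harbor City", "2024-03-06", "my_panel_run")
# Print the equivalent shell command with safe quoting.
print(shlex.join(args))
```

To run the command, pass args to subprocess.run(args, check=True), which raises an error if the command exits with a non-zero status.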