# Irregular earthquake streams can still be compared
A real irregular-data example using 2024 USGS earthquakes from California and Alaska, with explicit timestamps kept in the comparison instead of being flattened away.
The headline result: only weak pointwise agreement (Pearson r 0.12, Spearman rho 0.08, Kendall tau 0.05), with the lowest scores in spectral and derivative similarity, so timing or regime differences probably matter.
This is the minimal script a user would actually run: load the data, call EchoTime, and inspect the returned objects.
```python
from pathlib import Path

import pandas as pd

from echotime import compare_series, profile_dataset

data_path = Path(__file__).resolve().parents[1] / "data" / "real_usgs_earthquakes_ca_ak_2024.csv"
df = pd.read_csv(data_path)

# Convert ISO timestamps to float seconds since the epoch, keeping the real event times.
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True, format="mixed").astype("int64") / 1_000_000_000
df["event_type"] = df["magnitude"].map(lambda value: "M4+" if value >= 4.0 else "M2.5-4")

california = df.loc[df["region"] == "California"].sort_values("timestamp")
alaska = df.loc[df["region"] == "Alaska"].sort_values("timestamp")

report = compare_series(
    california["magnitude"],
    alaska["magnitude"],
    left_timestamps=california["timestamp"],
    right_timestamps=alaska["timestamp"],
    left_name="California earthquakes",
    right_name="Alaska earthquakes",
)

profile = profile_dataset(
    df.rename(columns={"region": "subject", "magnitude": "value"})[["timestamp", "value", "subject", "event_type"]],
    domain="earth_science",
)

print(report.to_summary_card_markdown())
print(profile.to_summary_card_markdown())
```
You should get a usable similarity verdict plus an event-stream profile explaining why irregularity and burstiness matter. The similarity card looks like this:
```markdown
# EchoTime similarity summary

**Compared:** California earthquakes vs Alaska earthquakes

## Headline

California earthquakes vs Alaska earthquakes: Pearson r 0.12, Spearman rho 0.08, Kendall tau 0.05. The weakest agreement appears in spectral similarity and derivative similarity, so timing or regime differences probably matter.

## Familiar statistics

| metric | value |
|---|---:|
| Pearson r | 0.121 |
| Spearman rho | 0.082 |
| Kendall tau | 0.050 |
| Best-lag Pearson r | 0.132 |
| Mutual info | 0.068 |
| First-difference r | 0.133 |

## Time-series-specific metrics

| plain-language label | score |
|---|---:|
| dtw similarity | 0.471 |
| spectral similarity | 0.349 |
| derivative similarity | 0.133 |
| shape similarity | 0.127 |

## Recommended next actions

- Plot both series after z-score normalization to show the shared shape without scale differences.
- Run rolling or windowed similarity if you expect the relationship to change over time.
- Use structural-profile similarity when scales, frequencies, or observation modes differ too much for raw-shape comparison.
```
Why this example matters:

- Pearson 0.12 / Spearman 0.08 / mutual information 0.07 is only part of the story; the event-timestamp handling matters just as much here.
- The comparison respects event timing instead of first forcing both regions onto a regular daily grid.
- The dataset profile adds event-stream context around burstiness, irregularity, and heterogeneity.
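The "rolling or windowed similarity" recommendation in the card can be sketched with plain NumPy. Everything here is illustrative: the synthetic magnitude-like series, the window size, and the variable names are assumptions, not part of EchoTime's API.

```python
import numpy as np

# Synthetic magnitude-like series standing in for the two regions;
# windowed similarity is easiest to show on aligned arrays.
rng = np.random.default_rng(0)
left = rng.normal(3.0, 0.6, 500)
right = 0.2 * left + rng.normal(3.2, 0.7, 500)

def zscore(x):
    # Z-score normalization, as the first recommendation suggests for plotting.
    return (x - x.mean()) / x.std()

# One Pearson r per non-overlapping window of 100 events; a drifting value
# across windows suggests the relationship changes over time.
window = 100
windowed_r = [
    float(np.corrcoef(zscore(left[i:i + window]), zscore(right[i:i + window]))[0, 1])
    for i in range(0, len(left) - window + 1, window)
]
```

If the per-window values scatter widely around the global r, a single headline correlation is hiding regime structure.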
This is the right pattern when your own data live in a long table with real timestamps:

- Rename long-table columns to `subject`, `timestamp`, `channel`, and `value` before calling `profile_dataset(df, domain='generic')`.
- If you want to compare two specific trajectories directly, pass both the values and their timestamps into `compare_series(...)`.
- Do not regularize away the gaps before the first pass; EchoTime is designed to read them.
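A minimal sketch of that column preparation using only pandas; the input column names `site`, `time_utc`, and `mag` are hypothetical stand-ins for whatever your table uses.

```python
import pandas as pd

# Hypothetical long table with one row per event.
raw = pd.DataFrame({
    "site": ["CA", "CA", "AK", "AK"],
    "time_utc": ["2024-01-01T00:00:00Z", "2024-01-03T07:30:00Z",
                 "2024-01-01T02:15:00Z", "2024-01-02T18:00:00Z"],
    "mag": [4.1, 2.9, 3.3, 5.0],
})

# Rename to the long-table column names the profiler expects.
long_df = raw.rename(columns={"site": "subject", "time_utc": "timestamp", "mag": "value"})

# Keep the real, irregular event times as float seconds; do not resample to a grid.
long_df["timestamp"] = pd.to_datetime(long_df["timestamp"], utc=True).astype("int64") / 1_000_000_000
long_df = long_df.sort_values(["subject", "timestamp"])
```

From here, `long_df` is in the shape the example above feeds to `profile_dataset`, with the gaps between events preserved.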
The overlay is shown in event order, but the actual comparison also uses the real timestamps from the USGS feed.
The radar keeps the irregular-event comparison interpretable without pretending one scalar is enough.
The profile makes burstiness, irregularity, and heterogeneity visible before you start modelling.
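What the profile means by burstiness and irregularity can be approximated from inter-event gaps alone. This sketch uses the gap coefficient of variation and the Goh-Barabasi burstiness index; these are assumed stand-ins for the kind of statistic the profile reports, not EchoTime's exact definitions.

```python
import numpy as np

# Irregular event times in seconds: a tight cluster, a long quiet gap, more events.
timestamps = np.array([0.0, 5.0, 6.0, 6.5, 40.0, 41.0, 90.0])
gaps = np.diff(timestamps)

# Coefficient of variation of gaps: ~0 for a regular grid, above 1 for bursty streams.
cv = float(gaps.std() / gaps.mean())

# Goh-Barabasi burstiness: -1 perfectly regular, 0 Poisson-like, toward +1 very bursty.
burstiness = float((gaps.std() - gaps.mean()) / (gaps.std() + gaps.mean()))
```

Regularizing such a stream onto a daily grid would collapse the cluster and pad the quiet stretch with fill values, which is exactly the information the profile is built to surface.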