Quickstart Guide

Get started with grid_data_retrieval in 5 minutes.

Prerequisites

Before you begin, ensure you have:

Python ≥ 3.11
The OSME repository cloned
osme_common package installed (shared utilities)

Installation

1. Install the Module

From the repository root:

# Install in editable mode
pip install -e packages/grid_data_retrieval

This registers two CLI commands:

osme-grid (recommended)
gdr (short alias)

2. Verify Installation

osme-grid --help

You should see usage information and available options.

First Run: Fetch 2 Months of Data

Let's fetch grid data for November-December 2018.

Using CLI Arguments

osme-grid \
  --start-date "2018-11-01 00:00:00" \
  --end-date "2018-12-31 23:55:00" \
  --verbose

Using a Configuration File

Create configs/grid/quickstart.json:

{
  "start_date": "2018-11-01 00:00:00",
  "end_date": "2018-12-31 23:55:00"
}

Run:

osme-grid --config configs/grid/quickstart.json --verbose

Expected Output

============================================================
Starting Grid Data Retrieval at 2025-02-10T14:30:00
============================================================

============================================================
Fetching Monthly Batches from API
============================================================
Fetching monthly batches from 2018-11-01 00:00:00 → 2018-12-31 23:55:00
API URL: https://32u36xakx6.execute-api.us-east-2.amazonaws.com/v4/get-merit-data
Output directory: /path/to/data/grid_data/raw/monthly

Fetching monthly data: 100%|████████████████| 2/2 [00:15<00:00,  7.5s/it]
Saved: carbontracker_grid-data_2018_11.parquet
Saved: carbontracker_grid-data_2018_12.parquet
Fetched 2 monthly file(s).

============================================================
Combining Monthly Files
============================================================
Found 2 monthly file(s) to combine.
Date range: 2018-11-01 00:00:00 → 2018-12-31 23:55:00
Writing combined file: carbontracker_grid-data_2018-11_2018-12.parquet
Combined file created successfully.

============================================================
Retrieval completed successfully!
Raw data saved to: /path/to/data/grid_data/raw/carbontracker_grid-data_2018-11_2018-12.parquet
============================================================

Next steps:
  - Use data_cleaning_and_joining module for processing
  - Apply gap-filling, resampling, timezone conversion as needed

Check Your Output

File Structure

data/grid_data/raw/
├── monthly/
│   ├── carbontracker_grid-data_2018_11.parquet
│   └── carbontracker_grid-data_2018_12.parquet
└── carbontracker_grid-data_2018-11_2018-12.parquet  # Combined

logs/grid_data_retrieval/
└── grid_retrieval_20250210_143000.log
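
The file names above suggest a simple pattern: one file per calendar month, plus a combined file named after the first and last month. The sketch below reproduces that pattern so you can predict which files a date range should yield. The function names are hypothetical and the naming rule is inferred from the listing, not taken from the module's source.

```python
from datetime import datetime


def months_in_range(start: datetime, end: datetime) -> list[tuple[int, int]]:
    """Return (year, month) for every calendar month touched by [start, end]."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        # Roll over to January of the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def monthly_filename(year: int, month: int) -> str:
    # Pattern inferred from the listing: carbontracker_grid-data_YYYY_MM.parquet
    return f"carbontracker_grid-data_{year}_{month:02d}.parquet"


def combined_filename(start: datetime, end: datetime) -> str:
    # Pattern inferred from the listing: ..._YYYY-MM_YYYY-MM.parquet
    return f"carbontracker_grid-data_{start:%Y-%m}_{end:%Y-%m}.parquet"
```

For the quickstart range, months_in_range yields November and December 2018, which matches the two files under monthly/.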

Inspect Data

Using Python:

import polars as pl

# Load combined file
df = pl.read_parquet("data/grid_data/raw/carbontracker_grid-data_2018-11_2018-12.parquet")

print(f"Shape: {df.shape}")
print(f"Columns: {df.columns}")
print("\nFirst 5 rows:")
print(df.head())

# Summary stats
print("\nSummary:")
print(df.describe())

Expected Variables

Your dataset should include:

timestamp - UTC datetime (5-min intervals)
thermal_generation - MW
gas_generation - MW
hydro_generation - MW
nuclear_generation - MW
renewable_generation - MW
total_generation - MW
demand_met - MW
net_demand - MW
g_co2_per_kwh - g CO₂/kWh
tons_co2_per_mwh - tons CO₂/MWh
tons_co2 - tons CO₂
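
A quick sanity check on a loaded file is to compare its columns with the list above and its row count with what a gap-free 5-minute series would contain. The helpers below are a minimal sketch using only the standard library. Their names are hypothetical, and the row count assumes both endpoints are included and no intervals are missing.

```python
from datetime import datetime, timedelta

# Column names copied from the list above
EXPECTED_COLUMNS = [
    "timestamp",
    "thermal_generation",
    "gas_generation",
    "hydro_generation",
    "nuclear_generation",
    "renewable_generation",
    "total_generation",
    "demand_met",
    "net_demand",
    "g_co2_per_kwh",
    "tons_co2_per_mwh",
    "tons_co2",
]


def missing_columns(columns: list[str]) -> list[str]:
    """Return the expected columns that are absent, in documentation order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def expected_row_count(
    start: datetime, end: datetime, step: timedelta = timedelta(minutes=5)
) -> int:
    """Number of timestamps in [start, end] at the given step, endpoints included."""
    return (end - start) // step + 1
```

With the DataFrame from the previous section, missing_columns(df.columns) should return an empty list, and a df.height below expected_row_count(...) points to gaps in the raw data.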

Common Use Cases

Fetch Full Year

osme-grid \
  --start-date "2020-01-01 00:00:00" \
  --end-date "2020-12-31 23:55:00" \
  --verbose

Re-Download Existing Data

By default, existing months are skipped. To re-download:

osme-grid \
  --config configs/grid/my_config.json \
  --overwrite-existing
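
To make the skip behaviour concrete, here is a rough sketch of how such a check could work: a month is fetched only if its file is missing or overwriting is requested. This illustrates the idea and is not the module's actual code. The function name is hypothetical and the file-name pattern is inferred from the output listing earlier in this guide.

```python
from pathlib import Path


def months_to_fetch(
    months: list[tuple[int, int]],
    monthly_dir: str,
    overwrite_existing: bool = False,
) -> list[tuple[int, int]]:
    """Return the (year, month) pairs that still need to be downloaded."""
    pending = []
    for year, month in months:
        # Assumed naming pattern: carbontracker_grid-data_YYYY_MM.parquet
        target = Path(monthly_dir) / f"carbontracker_grid-data_{year}_{month:02d}.parquet"
        if overwrite_existing or not target.exists():
            pending.append((year, month))
    return pending
```

This is also why re-running an interrupted command is cheap: months already on disk drop out of the pending list.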

Keep Monthly Files Separate

If you don't want a combined file:

osme-grid \
  --start-date "..." \
  --end-date "..." \
  --no-combine

Custom Output Directory

osme-grid \
  --config configs/grid/my_config.json \
  --output-dir /path/to/custom/directory

Configuration File Examples

Minimal Config

{
  "start_date": "2020-01-01 00:00:00",
  "end_date": "2020-12-31 23:55:00"
}

Full Config

{
  "start_date": "2020-01-01 00:00:00",
  "end_date": "2020-12-31 23:55:00",
  "api_url": "https://32u36xakx6.execute-api.us-east-2.amazonaws.com/v4/get-merit-data",
  "output_dir": null,
  "overwrite_existing": false,
  "combine_files": true
}
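
Before launching a long download, it can help to check that a config file parses and that its dates are well formed. The sketch below assumes the date format shown in the examples ("YYYY-MM-DD HH:MM:SS") and that both date keys are present, as in the minimal config. The function is hypothetical and independent of whatever validation the module performs itself.

```python
import json
from datetime import datetime

# Format inferred from the example configs
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def validate_config(raw_json: str) -> dict:
    """Parse a config string and check its date range; raise ValueError if invalid."""
    config = json.loads(raw_json)
    # strptime raises ValueError when a date does not match DATE_FORMAT
    start = datetime.strptime(config["start_date"], DATE_FORMAT)
    end = datetime.strptime(config["end_date"], DATE_FORMAT)
    if start > end:
        raise ValueError("start_date must not be later than end_date")
    return config
```

A typical call would be validate_config(Path("configs/grid/quickstart.json").read_text()), with Path imported from pathlib.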

Multi-Year Batch

For large downloads, create separate configs per year:

configs/grid/india_2019.json:

{
  "start_date": "2019-01-01 00:00:00",
  "end_date": "2019-12-31 23:55:00"
}

configs/grid/india_2020.json:

{
  "start_date": "2020-01-01 00:00:00",
  "end_date": "2020-12-31 23:55:00"
}

Run sequentially or in parallel:

# Sequential
osme-grid --config configs/grid/india_2019.json
osme-grid --config configs/grid/india_2020.json

# Parallel (background jobs in one shell, each with its own output directory)
osme-grid --config configs/grid/india_2019.json --output-dir data/grid_data/2019 &
osme-grid --config configs/grid/india_2020.json --output-dir data/grid_data/2020 &
wait  # Block until both background jobs finish
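
When the batch spans more than a couple of years, writing each config by hand gets tedious. The following sketch generates one minimal config per calendar year. The function name and the "india" file prefix are illustrative choices that mirror the examples above, not part of the package.

```python
import json
from pathlib import Path


def write_yearly_configs(
    first_year: int, last_year: int, config_dir: str, prefix: str = "india"
) -> list[Path]:
    """Write one minimal JSON config per year and return the created paths."""
    out_dir = Path(config_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for year in range(first_year, last_year + 1):
        config = {
            "start_date": f"{year}-01-01 00:00:00",
            "end_date": f"{year}-12-31 23:55:00",
        }
        path = out_dir / f"{prefix}_{year}.json"
        path.write_text(json.dumps(config, indent=2) + "\n")
        paths.append(path)
    return paths
```

Calling write_yearly_configs(2019, 2020, "configs/grid") would produce the two files shown above, which can then be passed to osme-grid --config one at a time.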

Next Steps

1. Process Your Data

Raw grid data is at 5-minute intervals in UTC. You'll likely want to:

Resample to 30-minute intervals (to match weather data)
Convert timezone to local time (e.g., Asia/Kolkata for India)
Check for gaps and fill if necessary

These operations are handled by the data_cleaning_and_joining module:

# Resample (example - not yet implemented)
python -m data_cleaning_and_joining.grid.resample \
  --input data/grid_data/raw/carbontracker_grid-data_2020-01_2020-12.parquet \
  --output data/grid_data/processed/grid_30min.parquet \
  --frequency "30min"

# Convert timezone (example - not yet implemented)
python -m data_cleaning_and_joining.grid.set_timezone \
  --input data/grid_data/processed/grid_30min.parquet \
  --output data/grid_data/processed/grid_30min_ist.parquet \
  --timezone "Asia/Kolkata"
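
Until those commands exist, the two operations can be illustrated in plain Python. This is a sketch of the idea only, not the data_cleaning_and_joining implementation. It assumes timestamps are naive UTC datetimes, averages values within each 30-minute bucket (reasonable for MW readings, while a cumulative column such as tons_co2 may need a sum instead), and uses a fixed +05:30 offset for India, which observes no daylight saving time.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Fixed offset is sufficient for IST because India has no daylight saving time
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def floor_to_30min(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 30-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 30, second=0, microsecond=0)


def resample_mean(rows: list[tuple[datetime, float]]) -> dict[datetime, float]:
    """Average 5-minute (timestamp, value) pairs into 30-minute buckets."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[floor_to_30min(ts)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def to_ist(ts: datetime) -> datetime:
    """Interpret a naive timestamp as UTC and convert it to IST."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(IST)
```

Resampling before converting the timezone keeps bucket boundaries aligned in UTC. Because IST is offset by a multiple of 30 minutes, the order does not change the bucket contents here, but it would for a coarser frequency such as hourly.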

2. Fetch Weather Data

To train MEF models, you'll also need weather data:

osme-weather --config configs/weather/india_era5_2020.json

See the weather_data_retrieval documentation for details.

3. Join Datasets

Once you have both grid and weather data processed to the same temporal resolution and timezone, join them:

import polars as pl

grid = pl.read_parquet("data/grid_data/processed/grid_30min_ist.parquet")
weather = pl.read_parquet("data/weather_data/processed/weather_30min_ist.parquet")

# Join on timestamp
joined = grid.join(weather, on="timestamp", how="inner")

joined.write_parquet("data/processed/grid_weather_joined.parquet")
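
An inner join silently drops every timestamp that appears in only one of the two datasets, so it is worth measuring the overlap before trusting the joined file. The helper below is a small standard-library sketch with a hypothetical name. It works on plain lists of timestamps.

```python
from datetime import datetime


def join_coverage(
    grid_timestamps: list[datetime], weather_timestamps: list[datetime]
) -> dict[str, int]:
    """Count timestamps shared by both datasets and those unique to each."""
    grid_set = set(grid_timestamps)
    weather_set = set(weather_timestamps)
    return {
        "matched": len(grid_set & weather_set),
        "grid_only": len(grid_set - weather_set),
        "weather_only": len(weather_set - grid_set),
    }
```

With the frames above, the inputs can be obtained via grid["timestamp"].to_list() and weather["timestamp"].to_list(). Large grid_only or weather_only counts usually indicate mismatched resolution, timezone, or date range.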

4. Build MEF Models

Use marginal_emissions_modelling to train and evaluate models.

Troubleshooting

Command not found

# Make sure package is installed
pip install -e packages/grid_data_retrieval

# Verify entry points (pip only lists them with --verbose)
pip show --verbose grid-data-retrieval | grep -A 5 "Entry-points"

Rate limiting errors

The client waits 5 seconds between API requests. If you still see rate limit errors, increase the delay in sources/carbontracker.py:

API_DELAY_SECONDS = 10  # Change from 5 to 10
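
A fixed delay is the simplest fix. If you script your own requests, a common alternative is to retry with exponentially growing waits. The sketch below shows that technique in isolation. RateLimitError and call_with_backoff are hypothetical names, and nothing here reflects how sources/carbontracker.py actually handles errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by a request function when it is throttled."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), doubling the wait after each rate-limit failure."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller
            sleep(base_delay * 2**attempt)  # 5 s, 10 s, 20 s, ...
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the retry schedule can be checked without actually waiting.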

Path resolution issues

Check your data directory:

python -c "from osme_common.paths import data_dir; print(data_dir())"

If it's not what you expect, set the environment variable:

export OSME_DATA_DIR="/path/to/your/data"

Network errors

If downloads are interrupted:

Re-run the same command - existing months will be skipped automatically
Check your internet connection
Verify that CarbonTracker is reachable: curl https://carbontracker.in/ (this checks the website, not the API endpoint itself)

CLI Options Reference

Quick reference for all CLI options:

Option	Type	Default	Description
`--config`	path	-	JSON config file
`--start-date`	datetime	2018-11-21 00:00:00	Start date
`--end-date`	datetime	2019-01-31 23:55:00	End date
`--api-url`	URL	CarbonTracker	API endpoint
`--output-dir`	path	data/grid_data/raw/	Output directory
`--overwrite-existing`	flag	false	Re-download existing
`--no-combine`	flag	false	Don't combine files
`--verbose`	flag	false	Console output
`--quiet`	flag	false	Suppress console

Python API

If you prefer to script your data retrieval:

from grid_data_retrieval.runner import run_grid_retrieval

config = {
    "start_date": "2020-01-01 00:00:00",
    "end_date": "2020-12-31 23:55:00",
    "overwrite_existing": False,
    "combine_files": True,
}

exit_code = run_grid_retrieval(config, verbose=True)

if exit_code == 0:
    print("Success!")
else:
    print("Failed!")

What's Next?

✅ You've successfully retrieved grid data
✅ You understand the output structure
✅ You know how to customize the retrieval

Next:

Read the Codebase Reference to understand the module architecture
Check the main README for detailed API documentation
Explore the OSME documentation for the full MEF workflow

Need help? Open an issue on GitHub or email daniel.kaupa@outlook.com