Codebase Reference¶
This document provides an overview of the main components of the weather_data_retrieval package, detailing the primary modules and their functionalities.
weather_data_retrieval.runner ¶
run ¶
run(
config,
run_mode="interactive",
verbose=True,
logger=None,
)
Unified orchestration entry point for both interactive and automatic runs. Handles validation, logging, estimation, and download orchestration.
Returns: 0=success, 1=fatal error, 2=some downloads failed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | dict | Configuration dictionary with all required parameters. | required |
| run_mode | str | Run mode, either 'interactive' or 'automatic', by default "interactive". | 'interactive' |
| logger | Logger | Pre-configured logger instance, by default None. | None |
Returns:
| Type | Description |
|---|---|
| int | Exit code: 0=success, 1=fatal error, 2=some downloads failed. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/runner.py
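The exit codes documented for run can be handled by a caller along these lines. This is a minimal sketch: the names ExitCode and describe_exit_code are hypothetical and not part of the package; only the numeric values 0, 1 and 2 come from the documentation above.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Hypothetical names for the exit codes documented for run()."""

    SUCCESS = 0
    FATAL_ERROR = 1
    PARTIAL_FAILURE = 2


def describe_exit_code(code: int) -> str:
    """Translate a documented exit code into a short human-readable label."""
    messages = {
        ExitCode.SUCCESS: "success",
        ExitCode.FATAL_ERROR: "fatal error",
        ExitCode.PARTIAL_FAILURE: "some downloads failed",
    }
    # Anything outside the documented range is reported as unknown.
    return messages.get(code, "unknown exit code")
```

A wrapper script could pass the value returned by run to describe_exit_code before handing it to sys.exit.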
run_batch_from_config ¶
run_batch_from_config(cfg_path, logger=None)
Run automatic batch from a config file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg_path | str | Path to the config file for the batch run. | required |
| logger | Logger | Pre-configured logger instance, by default None. | None |
Returns:
| Type | Description |
|---|---|
| int | Exit code: 0=success, 1=fatal error, 2=some downloads failed. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/runner.py
weather_data_retrieval.io.cli ¶
parse_args ¶
parse_args()
Parse command-line arguments.
Parameters:
None.
Returns:
| Type | Description |
|---|---|
| Namespace | Parsed arguments. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/cli.py
run_prompt_wizard ¶
run_prompt_wizard(session, logger=None)
Drives the interactive prompt flow (no config-source step). Returns True if all fields completed; False if user exits.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | The session state to populate. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| bool | True if completed; False if exited early. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/cli.py
weather_data_retrieval.io.config_loader ¶
load_and_validate_config ¶
load_and_validate_config(
path, *, logger=None, run_mode="automatic"
)
Load JSON config and validate it using the centralized validator. This lets the validator log coercions/warnings (e.g., case_by_case → skip_all).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to JSON config file. | required |
| logger | Logger | Logger instance for validation messages, by default None. | None |
| run_mode | str | Run mode, either 'interactive' or 'automatic', by default "automatic". | 'automatic' |
Returns:
| Type | Description |
|---|---|
| dict | Validated configuration dictionary. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/config_loader.py
load_config ¶
load_config(file_path)
Load configuration from a JSON requirements file (without validation).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str | Path to JSON config file. | required |
Returns:
| Type | Description |
|---|---|
| dict | Configuration dictionary. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/config_loader.py
weather_data_retrieval.io.prompts ¶
read_input ¶
read_input(prompt, *, logger=None)
Centralized input handler with built-in 'exit' and 'back' controls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prompt | str | The prompt to display to the user. | required |
| logger | Logger | Logger to log the prompt message, by default None. | None |
Returns:
| Type | Description |
|---|---|
| str | The user input, or special command indicators. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
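The 'exit' and 'back' handling described for read_input can be pictured with the sketch below. It is an illustration under stated assumptions: the helper name classify_input is hypothetical, and the upper-case "EXIT" / "BACK" tokens and case-insensitive matching are assumptions based on the control tokens mentioned elsewhere in this reference.

```python
def classify_input(raw: str) -> str:
    """Map raw user input to a control token or return it cleaned.

    Assumption: control words are matched case-insensitively and
    surrounding whitespace is ignored.
    """
    cleaned = raw.strip()
    lowered = cleaned.lower()
    if lowered == "exit":
        return "EXIT"  # assumed control token for leaving the wizard
    if lowered == "back":
        return "BACK"  # assumed control token for returning to the previous step
    return cleaned
```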
say ¶
say(text, *, logger=None)
Centralized output handler to log and print messages.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The message to display. | required |
| logger | Logger | Logger to log the message, by default None. | None |
Returns:
| Type | Description |
|---|---|
| None | |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_data_provider ¶
prompt_data_provider(session, *, logger=None)
Prompt user for which data provider to use (CDS or Open-Meteo).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store selected data provider. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| str | Normalized provider name ("cds" or "open-meteo"), or special control token "BACK" or "EXIT". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_dataset_short_name ¶
prompt_dataset_short_name(
session, provider, *, logger=None
)
Prompt for dataset choice.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store selected dataset. | required |
| provider | str | Data provider name. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| str | Normalized dataset name or 'exit' / 'back'. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_cds_url ¶
prompt_cds_url(
session,
api_url_default="https://cds.climate.copernicus.eu/api",
*,
logger=None
)
Prompt for CDS API URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store API URL. | required |
| api_url_default | str | Default CDS API URL. | 'https://cds.climate.copernicus.eu/api' |
Returns:
| Type | Description |
|---|---|
| str | CDS API URL or 'exit' / 'back'. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_cds_api_key ¶
prompt_cds_api_key(session, *, logger=None)
Prompt only for the CDS API key (hidden input).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store API key. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| str | CDS API key or 'exit' / 'back'. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_save_directory ¶
prompt_save_directory(session, default_dir, *, logger=None)
Ask for save directory, create if necessary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store save directory. | required |
| default_dir | Path | Default directory to suggest. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| Path \| str | Path to save directory, or control token "BACK" / "EXIT". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_date_range ¶
prompt_date_range(session, *, logger=None)
Ask user for start and end date, with validation. Accepts formats: YYYY-MM-DD or YYYY-MM.
- Start dates without day default to first day of month (YYYY-MM-01).
- End dates without day default to last day of month (YYYY-MM-[last day]).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store date range. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| tuple[str, str] | (start_date_str, end_date_str) in ISO format (YYYY-MM-DD), or ("EXIT", "EXIT") / ("BACK", "BACK"). |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_coordinates ¶
prompt_coordinates(session, *, logger=None)
Prompt user for geographic boundaries (N, S, W, E) with validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store geographic boundaries. | required |
Returns:
| Type | Description |
|---|---|
| list[float] | [north, west, south, east] boundaries or special tokens "EXIT" / "BACK". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_variables ¶
prompt_variables(
session,
variable_restrictions_list,
*args,
restriction_allow=False,
logger=None
)
Ask for variables to download, validate each against allowed/disallowed list, and only update session if the full set is valid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store selected variables. | required |
| variable_restrictions_list | list[str] | List of variables that are either allowed or disallowed. | required |
| restriction_allow | bool | If True, variable_restrictions_list is an allowlist (i.e. in). If False, it's a denylist (i.e. not in). | False |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| list[str] \| str | List of selected variable names, or control token "BACK" / "EXIT". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_skip_overwrite_files ¶
prompt_skip_overwrite_files(session, *, logger=None)
Prompt user to choose skip/overwrite/case-by-case for existing files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Session state to store user choice. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| str | One of "overwrite_all", "skip_all", "case_by_case". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_parallelisation_settings ¶
prompt_parallelisation_settings(session, *, logger=None)
Ask user about parallel downloads and concurrency cap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store parallelisation settings. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| dict \| str | Dictionary with parallelisation settings, or control token "BACK" / "EXIT". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_retry_settings ¶
prompt_retry_settings(
session,
default_retries=6,
default_delay=15,
*,
logger=None
)
Ask user for retry limits.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state to store retry settings. | required |
| default_retries | int | Default number of retry attempts (default = 6). | 6 |
| default_delay | int | Default delay (in seconds) between retries (default = 15). | 15 |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| dict \| str | Dictionary with 'max_retries' and 'retry_delay_sec', or control token "BACK" / "EXIT". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
prompt_continue_confirmation ¶
prompt_continue_confirmation(session, *, logger=None)
Display a formatted download summary and confirm before starting downloads.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Session state to summarise. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
Returns:
| Type | Description |
|---|---|
| bool \| str | True if user confirms, False if user declines, or control token "BACK" / "EXIT". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/io/prompts.py
weather_data_retrieval.sources.cds_era5 ¶
prepare_cds_download ¶
prepare_cds_download(
session,
filename_base,
year,
month,
*,
logger,
echo_console,
allow_prompts,
dataset_config_mapping=CDS_DATASETS
)
Check if a monthly ERA5 file already exists and decide whether to download.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Session containing user configuration. | required |
| filename_base | str | Base name for the file. | required |
| year | int | Year of the data to download. | required |
| month | int | Month of the data to download. | required |
| logger | Logger | Logger for logging messages. | required |
| echo_console | bool | Whether to echo prompts to console. | required |
| allow_prompts | bool | Whether to allow interactive prompts. | required |
| dataset_config_mapping | dict | Mapping of dataset short names to their configurations. | CDS_DATASETS |
Returns:
| Type | Description |
|---|---|
| tuple | (download: bool, save_path: str). download: whether to perform the download. save_path: full path for the target file. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/sources/cds_era5.py
execute_cds_download ¶
execute_cds_download(
session,
save_path,
year,
month,
*,
logger,
echo_console,
dataset_config_mapping=CDS_DATASETS
)
Execute a single ERA5 monthly download with retry logic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Session state containing the authenticated CDS API client. | required |
| save_path | str | Full path to save the downloaded file. | required |
| year | int | Year of the data to download. | required |
| month | int | Month of the data to download. | required |
| logger | Logger | Logger for logging messages. | required |
| echo_console | bool | Whether to echo prompts to console. | required |
| dataset_config_mapping | dict | Mapping of dataset short names to their configurations. | CDS_DATASETS |
Returns:
| Type | Description |
|---|---|
| tuple | (year, month, status), where status is "success" or "failed". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/sources/cds_era5.py
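The retry logic mentioned for execute_cds_download is not shown in this reference. The sketch below illustrates one common shape for it, assuming that max_retries counts total attempts and that a fixed delay separates them; download_with_retries, fetch and sleep are hypothetical names, while the defaults 6 and 15 mirror those of prompt_retry_settings.

```python
import time
from typing import Callable


def download_with_retries(
    fetch: Callable[[], None],
    max_retries: int = 6,
    retry_delay_sec: float = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it succeeds or the attempts are exhausted.

    Assumption: max_retries is the total number of attempts, and the
    delay between attempts is constant (no backoff).
    """
    for attempt in range(1, max_retries + 1):
        try:
            fetch()
            return "success"
        except Exception:
            # Wait before the next attempt, but not after the final one.
            if attempt < max_retries:
                sleep(retry_delay_sec)
    return "failed"
```

Passing sleep as a parameter keeps the sketch testable without real delays.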
download_cds_month ¶
download_cds_month(
session,
filename_base,
year,
month,
*,
logger,
echo_console,
allow_prompts,
successful_downloads,
failed_downloads,
skipped_downloads
)
Orchestrate ERA5 monthly download: handle file checks, then execute download.
Returns:
| Type | Description |
|---|---|
| tuple | (year, month, status), where status is "success", "failed" or "skipped". |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/sources/cds_era5.py
plan_cds_months ¶
plan_cds_months(
session,
filename_base,
*,
logger,
echo_console,
allow_prompts
)
Build the list of months to download and which are being skipped due to existing files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Session containing user configuration. | required |
| filename_base | str | Base filename (without date or extension). | required |
| logger | Logger | Logger for logging messages. | required |
| echo_console | bool | Whether to echo prompts to console. | required |
| allow_prompts | bool | Whether to allow interactive prompts. | required |
Returns:
| Type | Description |
|---|---|
| tuple | (months_to_download, months_skipped) |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/sources/cds_era5.py
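Planning monthly downloads starts from enumerating the (year, month) pairs in the requested date range. The sketch below shows that step only, under the assumption that both endpoint months are included; months_between is a hypothetical helper, and the existing-file check that plan_cds_months also performs is left out.

```python
from datetime import date


def months_between(start: date, end: date) -> list[tuple[int, int]]:
    """Return every (year, month) pair from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    # Tuple comparison orders by year first, then by month.
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months
```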
orchestrate_cds_downloads ¶
orchestrate_cds_downloads(
session,
filename_base,
successful_downloads,
failed_downloads,
skipped_downloads,
*,
logger,
echo_console,
allow_prompts,
dataset_config_mapping=CDS_DATASETS
)
Handle and orchestrate ERA5 monthly downloads, supporting parallel or sequential execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Session containing user configuration and authenticated client. | required |
| successful_downloads | list | Mutable list to collect (year, month) tuples for successful downloads. | required |
| failed_downloads | list | Mutable list to collect (year, month) tuples for failed downloads. | required |
| skipped_downloads | list | Mutable list to collect (year, month) tuples for skipped downloads. | required |
| logger | Logger | Logger for logging messages. | required |
| echo_console | bool | Whether to echo prompts to console. | required |
| allow_prompts | bool | Whether to allow interactive prompts. | required |
| dataset_config_mapping | dict | Mapping of dataset configurations, by default CDS_DATASETS. | CDS_DATASETS |
Returns:
| Type | Description |
|---|---|
| None | |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/sources/cds_era5.py
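Switching between sequential and parallel execution, as described for orchestrate_cds_downloads, might look like the sketch below. It is not the package's implementation: run_months, download_one and max_workers are hypothetical names, and unlike the documented function (which fills caller-supplied lists and returns None) this version returns the three lists for brevity.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_months(months, download_one, max_workers=1):
    """Run download_one(year, month) for each month and bucket the results.

    download_one is assumed to return (year, month, status) with status
    in {"success", "failed", "skipped"}.
    """
    successful, failed, skipped = [], [], []
    buckets = {"success": successful, "failed": failed, "skipped": skipped}

    if max_workers <= 1:
        # Sequential path: one month at a time, in order.
        results = [download_one(y, m) for y, m in months]
    else:
        # Parallel path: results arrive in completion order, not input order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(download_one, y, m) for y, m in months]
            results = [f.result() for f in as_completed(futures)]

    for year, month, status in results:
        buckets[status].append((year, month))
    return successful, failed, skipped
```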
weather_data_retrieval.sources.open_meteo ¶
weather_data_retrieval.utils.data_validation ¶
normalize_input ¶
normalize_input(value, category)
Normalize user input to canonical internal value as defined in NORMALIZATION_MAP.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | The user input value to normalize. | required |
| category | str | The category of normalization (e.g., 'data_provider', 'dataset_short_name'). | required |
Returns:
| Type | Description |
|---|---|
| str | The normalized value. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
format_duration ¶
format_duration(seconds)
Convert seconds to a nice Hh Mm Ss string (with decimal seconds).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seconds | float | Duration in seconds. | required |
Returns:
| Type | Description |
|---|---|
| str | Formatted duration string. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
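A conversion of this kind can be written with two divmod steps. The sketch below is illustrative only: format_duration_sketch is a hypothetical name, and the exact output layout (spacing, one decimal place for seconds, always showing hours and minutes) is an assumption, since the reference only states "Hh Mm Ss" with decimal seconds.

```python
def format_duration_sketch(seconds: float) -> str:
    """Format a duration in seconds as 'Hh Mm S.Ss' (assumed layout)."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    # Hours and minutes are whole numbers; seconds keep one decimal place.
    return f"{int(hours)}h {int(minutes)}m {secs:.1f}s"
```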
format_coordinates_nwse ¶
format_coordinates_nwse(boundaries)
Extracts and formats coordinates as integers in N-W-S-E order. Used for compact representation in filenames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| boundaries | list | List of boundaries in the order [north, west, south, east]. | required |
Returns:
| Type | Description |
|---|---|
| str | Formatted string in the format 'N{north}W{west}S{south}E{east}'. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
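The 'N{north}W{west}S{south}E{east}' pattern can be produced as in the sketch below. How the package converts fractional coordinates to integers is not stated here; truncation with int() is an assumption, and format_nwse_sketch is a hypothetical name.

```python
def format_nwse_sketch(boundaries) -> str:
    """Format [north, west, south, east] as 'N{north}W{west}S{south}E{east}'.

    Assumption: fractional values are truncated toward zero with int().
    """
    north, west, south, east = (int(value) for value in boundaries)
    return f"N{north}W{west}S{south}E{east}"
```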
month_days ¶
month_days(year, month)
Get list of days in a month formatted as two-digit strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| year | int | Year of interest. | required |
| month | int | Month of interest. | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of days in the month as two-digit strings. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
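A list of two-digit day strings can be derived from the standard library's calendar.monthrange, as sketched below. The name month_days_sketch is hypothetical; the package may compute the list differently.

```python
import calendar


def month_days_sketch(year: int, month: int) -> list[str]:
    """Return the days of the given month as zero-padded strings."""
    # monthrange returns (weekday of the first day, number of days in month).
    _, days_in_month = calendar.monthrange(year, month)
    return [f"{day:02d}" for day in range(1, days_in_month + 1)]
```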
validate_data_provider ¶
validate_data_provider(provider)
Ensure the data provider is recognized and implemented.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider | str | Name of the data provider. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if valid, False otherwise. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
validate_dataset_short_name ¶
validate_dataset_short_name(dataset_short_name, provider)
Check dataset compatibility with provider.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_short_name | str | Dataset short name. | required |
| provider | str | Name of the data provider. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if valid, False otherwise. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
validate_cds_api_key ¶
validate_cds_api_key(
url, key, *, logger=None, echo_console=False
)
Validate CDS API credentials by attempting to initialize a cdsapi.Client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | CDS API URL. | required |
| key | str | CDS API key. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
| echo_console | bool | Whether to echo messages to console, by default False. | False |
Returns:
| Type | Description |
|---|---|
| Client \| None | Authenticated client if successful, otherwise None. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
validate_directory ¶
validate_directory(path)
Check if path exists or can be created.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Directory path to validate. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if path exists or was created successfully, False otherwise. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
validate_date ¶
validate_date(value, allow_month_only=False)
Validate date format as YYYY-MM-DD or optionally YYYY-MM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | Date string to validate. | required |
| allow_month_only | bool | If True, also accept YYYY-MM format, by default False. | False |
Returns:
| Type | Description |
|---|---|
| bool | True if valid, False otherwise. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
parse_date_with_defaults ¶
parse_date_with_defaults(
date_str, default_to_month_end=False
)
Parse date string and apply defaults for incomplete dates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date_str | str | Date string in format YYYY-MM-DD or YYYY-MM. | required |
| default_to_month_end | bool | If True and date is YYYY-MM format, default to last day of month. If False and date is YYYY-MM format, default to first day of month. By default False. | False |
Returns:
| Type | Description |
|---|---|
| tuple[datetime, str] | Tuple of (parsed datetime object, ISO format string YYYY-MM-DD). |
Raises:
| Type | Description |
|---|---|
| ValueError | If date string is invalid. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
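The defaulting rules for YYYY-MM input can be sketched with datetime.strptime and calendar.monthrange. This is an illustration under the assumption that a full date is tried first and the month-only format second; parse_with_defaults is a hypothetical name, not the package function.

```python
import calendar
from datetime import datetime


def parse_with_defaults(date_str: str, default_to_month_end: bool = False):
    """Parse YYYY-MM-DD or YYYY-MM and return (datetime, ISO string).

    Raises ValueError if the string matches neither format.
    """
    try:
        parsed = datetime.strptime(date_str, "%Y-%m-%d")
    except ValueError:
        # Month-only input: strptime fills in day 1 by default.
        parsed = datetime.strptime(date_str, "%Y-%m")
        if default_to_month_end:
            last_day = calendar.monthrange(parsed.year, parsed.month)[1]
            parsed = parsed.replace(day=last_day)
    return parsed, parsed.strftime("%Y-%m-%d")
```

With this sketch a start date of 2024-02 resolves to the first of the month, while the same string passed with default_to_month_end=True resolves to the 29th, since 2024 is a leap year.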
clamp_era5_available_end_date ¶
clamp_era5_available_end_date(end)
Clamp end date to ERA5 data availability boundary (8 days ago).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| end | datetime | Desired end date. | required |
Returns:
| Type | Description |
|---|---|
| datetime | Clamped end date. |
Notes:
ERA5 data is available up to 8 days prior to the current date. The 8-day lag is used to ensure data availability.
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
validate_coordinates ¶
validate_coordinates(north, west, south, east)
Ensure coordinates are within realistic bounds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| north | int \| float | Northern latitude boundary. | required |
| west | int \| float | Western longitude boundary. | required |
| south | int \| float | Southern latitude boundary. | required |
| east | int \| float | Eastern longitude boundary. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if coordinates are valid, False otherwise. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
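The exact bounds checked by validate_coordinates are not listed in this reference. The sketch below assumes the usual geographic limits (latitude within [-90, 90], longitude within [-180, 180]) and that north must lie above south; it deliberately does not require west to be less than east. coordinates_in_bounds is a hypothetical name.

```python
def coordinates_in_bounds(north, west, south, east) -> bool:
    """Check a bounding box against assumed geographic limits."""
    # Latitudes must be within range and north must be above south.
    lat_ok = -90 <= south < north <= 90
    # Longitudes are only range-checked (assumption: no west < east rule).
    lon_ok = -180 <= west <= 180 and -180 <= east <= 180
    return lat_ok and lon_ok
```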
validate_variables ¶
validate_variables(
variable_list,
variable_restrictions,
restriction_allow=False,
)
Ensure user-specified variables are available for this dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variable_list | list[str] | List of variable names to validate. | required |
| variable_restrictions | list[str] | List of variables that are either allowed or disallowed. | required |
| restriction_allow | bool | If True, variable_restrictions is an allowlist (i.e. in). If False, it's a denylist (i.e. not in). | False |
Returns:
| Type | Description |
|---|---|
| bool | True if all variables are valid, False otherwise. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
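The allowlist/denylist switch controlled by restriction_allow can be illustrated as follows. This is a minimal sketch with a hypothetical name (variables_allowed) and hypothetical variable names in the usage; it mirrors the "in" versus "not in" semantics described in the table above.

```python
def variables_allowed(variable_list, variable_restrictions, restriction_allow=False) -> bool:
    """Return True if every variable passes the restriction list."""
    restricted = set(variable_restrictions)
    if restriction_allow:
        # Allowlist: every requested variable must appear in the list.
        return all(name in restricted for name in variable_list)
    # Denylist: no requested variable may appear in the list.
    return all(name not in restricted for name in variable_list)
```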
validate_existing_file_action ¶
validate_existing_file_action(
session, *, allow_prompts, logger, echo_console=False
)
Normalize existing_file_action for the current run-mode.
- If 'case_by_case' is set but prompts are not allowed (automatic mode), coerce to 'skip_all' and warn.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | Any | Current session state. | required |
| allow_prompts | bool | Whether prompts are allowed (i.e., interactive mode). | required |
| logger | Logger | Logger for logging messages. | required |
| echo_console | bool | Whether to echo messages to console. | False |
Returns:
| Type | Description |
|---|---|
| str | Normalized existing_file_action policy. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
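The coercion rule can be reduced to a few lines. In the sketch below the policy is passed in directly as a string rather than read from the session, which is a simplification; coerce_existing_file_action is a hypothetical name and the warning text is invented for illustration.

```python
import logging


def coerce_existing_file_action(action: str, allow_prompts: bool, logger: logging.Logger) -> str:
    """Return a policy that is usable in the current run mode."""
    if action == "case_by_case" and not allow_prompts:
        # Per-file prompts are impossible without interaction, so fall back.
        logger.warning("case_by_case is unavailable without prompts; using skip_all")
        return "skip_all"
    return action
```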
validate_config ¶
validate_config(
config,
*,
logger=None,
run_mode="automatic",
echo_console=False,
live_auth_check=False
)
Entry point. Validates common shape then dispatches to provider-specific validator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | dict | Configuration dictionary. | required |
| logger | Logger | Logger for logging messages, by default None. | None |
| run_mode | str | Run mode, either 'interactive' or 'automatic', by default "automatic". | 'automatic' |
| echo_console | bool | Whether to echo messages to console, by default False. | False |
| live_auth_check | bool | Whether to perform live authentication checks (e.g., CDS API), by default False. | False |
Returns:
| Type | Description |
|---|---|
| None | |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/data_validation.py
weather_data_retrieval.utils.file_management ¶
generate_filename_hash ¶
generate_filename_hash(
dataset_short_name, variables, boundaries
)
Generate a unique hash for the download parameters that will be used to create the filename.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_short_name | str | The dataset short name (e.g. era5-world). | required |
| variables | list[str] | List of variable names. | required |
| boundaries | list[float] | List of boundaries [north, west, south, east]. | required |
Returns:
| Type | Description |
|---|---|
| str | A unique hash string representing the download parameters. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/file_management.py
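One way to derive a stable hash from the download parameters is sketched below. The choice of SHA-256, the JSON serialization, the sorting of variables, and the `length` parameter are all assumptions made for illustration; the package may use a different scheme.

```python
import hashlib
import json


def sketch_filename_hash(dataset_short_name, variables, boundaries, length=12):
    """Hypothetical sketch: hash the parameters that identify a download."""
    payload = json.dumps(
        {
            "dataset": dataset_short_name,
            # Sorting makes the hash independent of variable order (an assumption).
            "variables": sorted(variables),
            "boundaries": [float(b) for b in boundaries],
        },
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Truncate so the hash stays short enough to embed in a filename.
    return digest[:length]


print(sketch_filename_hash("era5-world", ["2m_temperature"], [60, -10, 35, 30]))
```

Hashing a canonical serialization means two runs with the same dataset, variables, and bounding box map to the same filename, which is what lets existing files be recognized later.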
find_existing_month_file ¶
find_existing_month_file(
save_dir, filename_base, year, month
)
Tolerant matcher that finds an existing file for the given month.
Accepts both _YYYY-MM.ext and _YYYY_MM.ext patterns and any extension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| save_dir | Path | Directory where files are saved. | required |
| filename_base | str | Base filename (without date or extension). | required |
| year | int | Year of the file. | required |
| month | int | Month of the file. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Path] | Path to the existing file if found, else None. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/file_management.py
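The tolerant matching described above (either `_YYYY-MM` or `_YYYY_MM`, any extension) could look roughly like this. The function name `find_month_file` is hypothetical and the matching strategy is an assumption, not the package's code.

```python
from pathlib import Path
from typing import Optional


def find_month_file(save_dir, filename_base, year, month) -> Optional[Path]:
    """Hypothetical sketch of a matcher accepting both date separators."""
    directory = Path(save_dir)
    if not directory.is_dir():
        return None
    # Both accepted stems: <base>_YYYY-MM and <base>_YYYY_MM.
    stems = {
        f"{filename_base}_{year:04d}-{month:02d}",
        f"{filename_base}_{year:04d}_{month:02d}",
    }
    for candidate in sorted(directory.iterdir()):
        # Path.stem drops the final extension, so any extension matches.
        if candidate.is_file() and candidate.stem in stems:
            return candidate
    return None
```

Comparing on the stem rather than the full name is what makes the extension irrelevant, so a `.grib` and a `.nc` file for the same month are both found.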
estimate_era5_monthly_file_size ¶
estimate_era5_monthly_file_size(
variables,
area,
grid_resolution=0.25,
timestep_hours=1.0,
bytes_per_value=4.0,
)
Estimate ERA5 GRIB file size (MB) for a monthly dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variables | list[str] | Variables requested (e.g. ['2m_temperature', 'total_precipitation']). | required |
| area | list[float] | [north, west, south, east] geographic bounds in degrees. | required |
| grid_resolution | float | Grid spacing in degrees (default 0.25° for ERA5). | 0.25 |
| timestep_hours | float | Temporal resolution in hours (1 = hourly, 3 = 3-hourly, 6 = 6-hourly, etc.). | 1.0 |
| bytes_per_value | float | Bytes per gridpoint per variable (float32 = 4 bytes). | 4.0 |
Returns:
| Type | Description |
|---|---|
| float | Estimated monthly file size in MB. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/file_management.py
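As a rough illustration of how these parameters combine, the sketch below multiplies grid points, time steps, variables, and bytes per value. It is a naive uncompressed estimate under stated assumptions: the `days_in_month` parameter, the inclusive grid-point count, and the 1024² divisor are all choices made here, and the package's model may apply additional empirical factors.

```python
def sketch_monthly_size_mb(
    variables,
    area,
    grid_resolution=0.25,
    timestep_hours=1.0,
    bytes_per_value=4.0,
    days_in_month=30,  # hypothetical parameter, not part of the documented API
):
    """Naive size estimate: grid points x time steps x variables x bytes."""
    north, west, south, east = area
    # Assume grid points on both edges of the box are included.
    n_lat = int(round(abs(north - south) / grid_resolution)) + 1
    n_lon = int(round(abs(east - west) / grid_resolution)) + 1
    n_steps = int(days_in_month * 24 / timestep_hours)
    total_bytes = n_lat * n_lon * n_steps * len(variables) * bytes_per_value
    return total_bytes / (1024 ** 2)


size = sketch_monthly_size_mb(["2m_temperature"], [60, -10, 35, 30])
print(f"{size:.1f} MB")
```

Because every factor is multiplicative, halving the grid resolution roughly quadruples the estimate, and moving from hourly to 3-hourly data cuts it to a third.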
estimate_cds_download ¶
estimate_cds_download(
variables,
area,
start_date,
end_date,
observed_speed_mbps,
grid_resolution=0.25,
timestep_hours=1.0,
bytes_per_value=4.0,
overhead_per_request_s=180.0,
overhead_per_var_s=12.0,
)
Estimate per-file and total download size/time for CDS (ERA5) retrievals, using an empirically grounded file size model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variables | list[str] | Variables selected (e.g. ['2m_temperature', 'total_precipitation']). | required |
| area | list[float] | [north, west, south, east] bounds in degrees. | required |
| start_date | str | Start of the date range (YYYY-MM-DD). | required |
| end_date | str | End of the date range (YYYY-MM-DD). | required |
| observed_speed_mbps | float | Measured internet speed in megabits per second (Mbps). | required |
| grid_resolution | float | Grid resolution in degrees (default 0.25°). | 0.25 |
| timestep_hours | float | Temporal resolution in hours (default 1-hourly). | 1.0 |
| bytes_per_value | float | Bytes per stored value (float32 = 4). | 4.0 |
| overhead_per_request_s | float | Fixed CDS request overhead time (queue/prep). | 180.0 |
| overhead_per_var_s | float | Per-variable overhead for CDS throttling/prep. | 12.0 |
Returns:
| Type | Description |
|---|---|
| dict | { "months": int, "file_size_MB": float, "total_size_MB": float, "time_per_file_sec": float, "total_time_sec": float } |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/file_management.py
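The shape of the returned dictionary can be illustrated with a simplified sketch. It assumes one file per calendar month and that the time per file is the transfer time plus the fixed and per-variable overheads; how the package actually combines these terms is not stated here, and the helper names are hypothetical. The per-file size is passed in directly rather than derived from the grid.

```python
from datetime import date


def count_months(start_date, end_date):
    """Number of calendar months touched by the inclusive date range."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def sketch_download_estimate(
    file_size_mb,
    n_variables,
    start_date,
    end_date,
    observed_speed_mbps,
    overhead_per_request_s=180.0,
    overhead_per_var_s=12.0,
):
    """Hypothetical estimate: one request (and one file) per month."""
    months = count_months(start_date, end_date)
    # Megabytes to megabits (x8), then divide by the link speed in Mbps.
    transfer_s = file_size_mb * 8.0 / observed_speed_mbps
    time_per_file = transfer_s + overhead_per_request_s + overhead_per_var_s * n_variables
    return {
        "months": months,
        "file_size_MB": file_size_mb,
        "total_size_MB": file_size_mb * months,
        "time_per_file_sec": time_per_file,
        "total_time_sec": time_per_file * months,
    }


print(sketch_download_estimate(100.0, 2, "2020-01-01", "2020-03-31", 80.0))
```

Note the unit conversion: sizes are in megabytes while the speed is in megabits per second, so the size is multiplied by 8 before dividing.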
expected_save_path ¶
expected_save_path(
save_dir, filename_base, year, month, data_format="grib"
)
Construct canonical save path for monthly data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| save_dir | str, Path, or None | Base directory for saving. If None, defaults to osme_common.paths.data_dir(). | required |
| filename_base | str | Base name without date or extension. | required |
| year | int | Year of the file. | required |
| month | int | Month of the file. | required |
| data_format | str | File extension, e.g., 'grib' or 'nc'. | 'grib' |
Returns:
| Type | Description |
|---|---|
| Path | Resolved path under the proper data directory. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/file_management.py
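A minimal sketch of building a monthly path from these parts follows. The `_YYYY-MM` naming is an assumption borrowed from the patterns accepted by find_existing_month_file, the function name is hypothetical, and the fallback to osme_common.paths.data_dir() when save_dir is None is deliberately left out.

```python
from pathlib import Path


def sketch_save_path(save_dir, filename_base, year, month, data_format="grib"):
    """Hypothetical sketch: <save_dir>/<filename_base>_YYYY-MM.<data_format>."""
    # Tolerate a leading dot in the extension, e.g. '.nc'.
    extension = data_format.lstrip(".")
    return Path(save_dir) / f"{filename_base}_{year:04d}-{month:02d}.{extension}"


print(sketch_save_path("data", "era5-world_abc123", 2020, 3, "nc"))
```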
weather_data_retrieval.utils.logging ¶
build_download_summary ¶
build_download_summary(session, estimates, speed_mbps)
Construct a formatted summary string of the current download configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | Current session state containing all parameters. | required |
| estimates | dict | Dictionary containing download size and time estimates. | required |
| speed_mbps | float | Measured or estimated internet speed in Mbps. | required |
Returns:
| Type | Description |
|---|---|
| str | Nicely formatted summary string for display or logging. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/logging.py
setup_logger ¶
setup_logger(
save_dir=None, run_mode="interactive", verbose=False
)
Initialize and return a configured logger.
Logs are written to
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| save_dir | str or None | Directory to save log files. If None, defaults to osme_common.paths.log_dir(). | None |
| run_mode | str | Either 'interactive' or 'automatic', by default 'interactive'. | 'interactive' |
| verbose | bool | Whether to echo logs to console in automatic mode, by default False. | False |
Returns:
| Type | Description |
|---|---|
| Logger | Configured logger instance. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/logging.py
log_msg ¶
log_msg(msg, logger, *, level='info', echo_console=False)
Unified logging utility.
- Always logs to file.
- Echoes to console (via tqdm.write) only in interactive mode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| msg | str | Message to log. | required |
| logger | Logger | Logger instance. | required |
| level | str | Logging level: 'info', 'warning', 'error', 'exception', by default "info". | 'info' |
| echo_console | bool | Whether to also echo to console, by default False. | False |
Returns:
| Type | Description |
|---|---|
| None | |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/logging.py
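The level dispatch and optional console echo can be sketched with the standard logging module. This is an illustration only: the function name `log_message` and the `echo` parameter are hypothetical, and `print` stands in for `tqdm.write` so the sketch has no third-party dependency. Falling back to 'info' for unknown levels is also an assumption.

```python
import logging

_LEVELS = {"info", "warning", "error", "exception"}


def log_message(msg, logger, *, level="info", echo_console=False, echo=print):
    """Hypothetical sketch: log at the named level, optionally echo to console."""
    # Unknown level names fall back to 'info' (an assumption of this sketch).
    method_name = level if level in _LEVELS else "info"
    getattr(logger, method_name)(msg)
    if echo_console:
        # The documented utility uses tqdm.write so progress bars stay intact.
        echo(msg)


logger = logging.getLogger("log_message_sketch")
log_message("Download finished.", logger, level="warning", echo_console=True)
```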
create_final_log_file ¶
create_final_log_file(
session,
filename_base,
original_logger,
*,
delete_original=True,
reattach_to_final=True
)
Create a final log file with the same naming pattern as data files. Copies content from the original log file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | Any (SessionState) | Current session state. | required |
| filename_base | str | Base filename pattern (same as data files). | required |
| original_logger | Logger | The original logger instance. | required |
| delete_original | bool | Whether to delete the original log file after creating the final one, by default True. | True |
| reattach_to_final | bool | Whether to reattach the logger to the final log file, by default True. | True |
Returns:
| Type | Description |
|---|---|
| str | Path to the final log file. |
Source code in packages/weather_data_retrieval/src/weather_data_retrieval/utils/logging.py
weather_data_retrieval.utils.session_management ¶
SessionState ¶
first_unfilled_key ¶
first_unfilled_key()
Return the first key in the ordered fields that is not filled.
This enables a simple wizard-like progression and supports
backtracking by clearing fields with unset(key).
to_dict ¶
to_dict(only_filled=False)
Flatten the session into a plain dict suitable for runner.run(...). If only_filled=True, include only keys that have been filled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| only_filled | bool | Whether to include only filled keys, by default False. | False |
Returns:
| Type | Description |
|---|---|
| dict | Flattened session dictionary. |
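The wizard-like progression behind first_unfilled_key and to_dict can be sketched with a small stand-in class. `SketchSession`, its `set` method, its internal layout, and the example keys are hypothetical; only `first_unfilled_key`, `unset`, and `to_dict(only_filled=...)` are named in the reference above.

```python
class SketchSession:
    """Hypothetical stand-in for an ordered, fill-as-you-go session."""

    def __init__(self, ordered_keys):
        # Dict insertion order doubles as the wizard's question order.
        self._fields = {key: {"value": None, "filled": False} for key in ordered_keys}

    def set(self, key, value):
        self._fields[key] = {"value": value, "filled": True}

    def unset(self, key):
        # Clearing a field lets the wizard step back to it.
        self._fields[key] = {"value": None, "filled": False}

    def first_unfilled_key(self):
        for key, field in self._fields.items():
            if not field["filled"]:
                return key
        return None

    def to_dict(self, only_filled=False):
        return {
            key: field["value"]
            for key, field in self._fields.items()
            if field["filled"] or not only_filled
        }


session = SketchSession(["dataset_short_name", "variables", "start_date"])
session.set("dataset_short_name", "era5-world")
print(session.first_unfilled_key())
print(session.to_dict(only_filled=True))
```

Calling `unset` on an earlier key makes it the first unfilled key again, which is how backtracking falls out of the same loop that drives forward progress.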
get_cds_dataset_config ¶
get_cds_dataset_config(session, dataset_config_mapping)
Return dataset configuration dictionary based on session short name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| session | SessionState | The current session state containing user selections. | required |
| dataset_config_mapping | dict | The mapping of dataset short names to their configurations. | required |
Returns:
| Type | Description |
|---|---|
| dict | The configuration dictionary for the selected dataset. |
map_config_to_session ¶
map_config_to_session(cfg, session, *, logger=None)
Validate and map a loaded JSON config into SessionState.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | dict | Loaded configuration dictionary. | required |
| session | SessionState | The session state to populate. | required |
Returns:
| Type | Description |
|---|---|
| tuple (bool, list[str]) | (ok, messages): ok=False if any hard error prevents continuing. |
ensure_cds_connection ¶
ensure_cds_connection(
client,
creds,
max_reauth_attempts=6,
wait_between_attempts=15,
)
Ensure a valid CDS API client. Re-authenticate automatically if the connection drops.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | Client | Current CDS API client. | required |
| creds | dict | {'url': str, 'key': str} stored from initial login. | required |
| max_reauth_attempts | int | Maximum reconnection attempts before aborting. | 6 |
| wait_between_attempts | int | Wait time (seconds) between re-auth attempts. | 15 |
Returns:
| Type | Description |
|---|---|
| Client or None | Valid client or None if re-authentication ultimately fails. |
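The retry behaviour described for ensure_cds_connection can be sketched as a bounded loop. The `is_alive`, `reconnect`, and `sleep` parameters are hypothetical injection points added so the sketch runs without the CDS API; the documented function takes only the client, credentials, and the two retry settings.

```python
import time


def ensure_connection(
    client,
    creds,
    *,
    is_alive,   # hypothetical callable: client -> bool
    reconnect,  # hypothetical callable: creds -> new client
    max_reauth_attempts=6,
    wait_between_attempts=15,
    sleep=time.sleep,
):
    """Hypothetical sketch: reuse a healthy client, else retry re-authentication."""
    if client is not None and is_alive(client):
        return client
    for attempt in range(1, max_reauth_attempts + 1):
        try:
            candidate = reconnect(creds)
            if is_alive(candidate):
                return candidate
        except Exception:
            # Treat any failure as a dropped connection and try again.
            pass
        if attempt < max_reauth_attempts:
            sleep(wait_between_attempts)
    # All attempts exhausted.
    return None
```

Returning None rather than raising leaves the caller to decide whether a lost connection is fatal or whether the remaining downloads should simply be marked as failed.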
internet_speedtest ¶
internet_speedtest(
test_urls=None,
max_seconds=15,
logger=None,
echo_console=True,
)
Download a test file of roughly 100 MB from a fast CDN to estimate download speed (Mbps).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| test_urls | list[str] | List of URLs of the test files. | None |
| max_seconds | int | Maximum time to wait for a response, by default 15 seconds. | 15 |
Returns:
| Type | Description |
|---|---|
| float | Estimated download speed in Mbps. |
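Since download sizes are expressed in megabytes and speeds in megabits per second, the conversion is worth spelling out. The helper below is a hypothetical sketch of the arithmetic only, with no network access; it assumes decimal units (1 MB = 1,000,000 bytes).

```python
def throughput(bytes_received, elapsed_seconds):
    """Return (megabytes per second, megabits per second) for a timed transfer."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    megabytes_per_s = bytes_received / 1_000_000 / elapsed_seconds
    # One byte is eight bits, so Mbps is eight times MB/s.
    megabits_per_s = megabytes_per_s * 8
    return megabytes_per_s, megabits_per_s


mb_s, mbps = throughput(100_000_000, 10)
print(f"{mb_s:.1f} MB/s = {mbps:.1f} Mbps")
```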