API reference
Dataset
Bases: BaseModel, Generic[T]
Store and operate on a collection of Timeseries.
Attributes:
| Name | Type | Description |
|---|---|---|
| timeseries | list[Timeseries] | A list of Timeseries objects. |
Source code in gensor/core/dataset.py
coverage: Coverage
property
Coverage summary of the dataset.
Renders as a per-timeseries table (records and time span per location /
variable / sensor) and exposes :meth:Coverage.plot for a coverage timeline.
Examples:
>>> ds.coverage # the table
>>> ds.coverage.plot() # the timeline
info: pd.DataFrame
property
Per-timeseries metadata summary, rendered as a table.
One row per timeseries — location, variable, sensor, the number of
records, and the start / end of its time span. A quick look at what
a Dataset holds before processing it (the default repr only shows the timeseries
count). See :attr:coverage for a plottable version and :func:gensor.diff to
line this up across datasets.
Examples:
>>> ds.info
loc: DatasetIndexer
property
Label-based selection applied to every timeseries in the dataset.
ds.loc[start:end] returns a new Dataset where each timeseries is sliced by
.loc[start:end] (e.g. a date range), forwarding the key to each series' own
pandas .loc. Empty slices yield empty timeseries (every series is kept).
Examples:
>>> ds.loc["2021-01-01":"2021-12-31"]
__contains__(location)
Return True if any timeseries in the dataset has the given location.
Source code in gensor/core/dataset.py
__getitem__(key)
Retrieve Timeseries by integer index, location name, or (location, variable[, unit]) tuple.
- dataset[0] returns the Timeseries at that position (a reference).
- dataset["PB01A"] returns the Timeseries at that location, or a Dataset if the location has several timeseries (e.g. pressure and temperature). A list of names (dataset[["PB01A", "PB02A"]]) always returns a Dataset.
- dataset["PB01A", "pressure"] (or ["PB01A", "pressure", "cmh2o"]) narrows by variable/unit, returning a single Timeseries when one matches.
For full control use :meth:filter / :meth:one.
Warning
Integer indexing returns a reference to the timeseries. Location /
tuple indexing returns copies (it delegates to .filter()).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | int \| str \| list \| tuple | Position, location name, list of names, or a (location, variable[, unit]) tuple. | required |
Returns:
| Type | Description |
|---|---|
| T \| None \| Dataset | Timeseries \| Dataset: The matching timeseries or a dataset of them. |
Raises:
| Type | Description |
|---|---|
| IndexOutOfRangeError | If an integer index is out of range. |
| KeyError | If no timeseries matches the given location(s)/filters. |
Source code in gensor/core/dataset.py
__iter__()
Allows iterating directly over the timeseries in the dataset.
Source code in gensor/core/dataset.py
__len__()
Gives the number of timeseries in the Dataset.
Source code in gensor/core/dataset.py
add(other)
Appends new Timeseries to the Dataset.
If an equal Timeseries already exists, merge the new data into the existing Timeseries, dropping duplicate timestamps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | Timeseries | The Timeseries object to add. | required |
Source code in gensor/core/dataset.py
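The merge rule described for add() can be pictured with plain dictionaries. This is a minimal sketch, not gensor's implementation: `merge_records` is a hypothetical helper, records are `{timestamp: value}` mappings rather than Timeseries objects, and keeping the existing value when a timestamp appears twice is an assumption (the text only says that duplicate timestamps are dropped).

```python
from datetime import datetime


def merge_records(existing, new):
    """Merge two {timestamp: value} mappings, one entry per timestamp.

    Assumption: on a duplicate timestamp the existing value wins.
    """
    merged = dict(existing)
    for stamp, value in new.items():
        merged.setdefault(stamp, value)  # skip timestamps already present
    # Return the records in chronological order.
    return dict(sorted(merged.items()))


existing = {datetime(2021, 1, 1, 0): 1010.0, datetime(2021, 1, 1, 1): 1011.0}
new = {datetime(2021, 1, 1, 1): 999.0, datetime(2021, 1, 1, 2): 1012.0}
merged = merge_records(existing, new)  # three records, 01:00 keeps 1011.0
```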
diff(*others, labels=None, key=('location', 'variable'))
Compare this dataset's coverage with one or more others.
Convenience wrapper over :func:gensor.diff. labels names this dataset
and the others (default ds0, ds1 ...).
Examples:
>>> raw.diff(trimmed, labels=["raw", "trimmed"]).plot()
Source code in gensor/core/dataset.py
filter(*predicates, location=None, variable=None, unit=None, **kwargs)
Return a Timeseries or a new Dataset filtered by station, sensor, and/or variable.
Any of location/variable/unit (and the keyword attributes) may be
a single value or a list of values, matching a timeseries when its attribute
equals (or is in) the given value(s).
Prefix a value with ~ to negate it - drop timeseries with that value
rather than keep them (e.g. location="~PB16D" keeps everything except
PB16D; sensor="~AV319" drops just that sensor). Positive and negated
values may be mixed within one attribute and across attributes; for a given
attribute a timeseries is kept when its value is in the positives (if any are
given) and not in the negatives, and attributes are AND-ed together.
For conditions the per-attribute keywords can't express - notably a combined
match across attributes - pass one or more :class:Where predicates
positionally. filter(~Where(location="PB03B", sensor="AV319")) drops only that
sensor at that location (the whole combination negated as a unit), while
filter(Where(location="PB16A") | Where(location="PB16B")) keeps either.
Predicates are AND-ed with the keyword filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *predicates | Where | Predicate objects; all must match for a timeseries to be kept (combine with | () |
| location | str \| list | The location name(s); | None |
| variable | str \| list | The variable(s) being measured; | None |
| unit | str \| list | Unit(s) of the measurement; | None |
| **kwargs | str \| list | Attributes of subclassed timeseries used for filtering (e.g., sensor, method); | {} |
Returns:
| Type | Description |
|---|---|
| T \| Dataset | Timeseries \| Dataset: A single Timeseries if exactly one match is found, or a new Dataset if multiple matches are found. |
Source code in gensor/core/dataset.py
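The keep/drop rule for `~`-prefixed values can be sketched in plain Python. This is only an illustration under assumptions: `split_spec` and `keeps` are hypothetical helpers, records are plain dicts rather than Timeseries objects, and gensor's actual implementation may differ.

```python
def split_spec(spec):
    """Split a value or a list of values into (positives, negatives)."""
    values = [spec] if isinstance(spec, str) else list(spec)
    positives = {v for v in values if not v.startswith("~")}
    negatives = {v[1:] for v in values if v.startswith("~")}
    return positives, negatives


def keeps(attributes, **filters):
    """Return True when a record's attributes pass every keyword filter."""
    for name, spec in filters.items():
        if spec is None:
            continue  # an unset filter places no constraint
        positives, negatives = split_spec(spec)
        value = attributes.get(name)
        if positives and value not in positives:
            return False  # positives were given and this value is not one
        if value in negatives:
            return False  # value was explicitly negated
    return True  # attributes are AND-ed: nothing rejected the record


records = [
    {"location": "PB16A", "variable": "pressure"},
    {"location": "PB16D", "variable": "pressure"},
    {"location": "PB16D", "variable": "temperature"},
]
# Keep pressure series everywhere except PB16D.
kept = [r for r in records if keeps(r, location="~PB16D", variable="pressure")]
```

Because each attribute is checked independently, this form cannot express "drop this sensor only at this location"; that is the case the Where predicates cover.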
get_locations()
List all unique locations in the dataset, preserving first-seen order.
Source code in gensor/core/dataset.py
one(**filters)
Return exactly one matching Timeseries.
A convenience over :meth:filter for when a single result is expected:
it always returns a Timeseries (never a Dataset) and raises if zero or
more than one timeseries match - avoiding the "is it a Timeseries or a
Dataset?" ambiguity of :meth:filter / dataset[name].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **filters | Any | Same keyword filters as :meth: | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| Timeseries | T | The single matching timeseries. |
Raises:
| Type | Description |
|---|---|
| ValueError | If zero or more than one timeseries match the filters. |
Source code in gensor/core/dataset.py
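The "exactly one" contract of one() can be sketched as a small guard around any collection of matches. `exactly_one` is a hypothetical helper written for illustration; it is not part of gensor and only mirrors the behaviour described above (return the single match, raise ValueError otherwise).

```python
def exactly_one(matches):
    """Return the only element of `matches`, or raise ValueError."""
    matches = list(matches)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


single = exactly_one(["PB01A pressure"])  # fine: one match
```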
plot(facet='variable', variable=None, ncols=5, sharex=False, include_outliers=False, plot_kwargs=None, legend_kwargs=None)
Plot the dataset's timeseries, in one of two layouts.
- facet="variable" (default): one subplot per variable (pressure, temperature, ...), every location's series overlaid on that axis. Returns (fig, axes) where axes is a list (one per variable).
- facet="location": a separate figure per variable, each a grid with one panel per location (ncols wide). Every location gets a panel - left empty if it has no (or empty) series for that variable - and unused trailing cells are hidden. Multiple sensors at a location are overlaid in the same panel, and a legend (labelled by sensor serial) is shown only then; single-series panels get no legend. Panels are titled by location and carry no x-label (the dates are on the shared/rotated ticks). Returns {variable: (fig, axes)}.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| facet | str | | 'variable' |
| variable | str \| list | restrict to these variable(s); default is every unique variable in the dataset. | None |
| ncols | int | panels per row for the | 5 |
| sharex | bool | for | False |
| include_outliers | bool | Whether to include outliers in the plot. | False |
| plot_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.plot(). | None |
| legend_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.legend(). | None |
Returns:
| Type | Description |
|---|---|
| tuple[Figure, list] \| dict[str, tuple[Figure, list]] | |
| tuple[Figure, list] \| dict[str, tuple[Figure, list]] | for |
Source code in gensor/core/dataset.py
pop(*predicates, location=None, variable=None, unit=None, **kwargs)
Remove and return the matching timeseries, mutating the Dataset in place.
Selection works exactly like :meth:filter (same location / variable /
unit / keyword filters, ~ negation, and :class:Where predicates), but
the matched timeseries are removed from this Dataset and returned by
reference (not copied) - so you can alter them and add() them back in their
new form::
ts = ds.pop(location="PB03B", sensor="AV319") # taken out of ds
ts.ts = ts.ts - 300 # edit the live series
ds.add(ts) # put it back, changed
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *predicates | Where | Predicate objects; all must match (combine with | () |
| location | str \| list | The location name(s); | None |
| variable | str \| list | The variable(s) being measured; | None |
| unit | str \| list | Unit(s) of the measurement; | None |
| **kwargs | str \| list | Other timeseries attributes to match (e.g., sensor). | {} |
Returns:
| Type | Description |
|---|---|
| T \| Dataset | Timeseries \| Dataset: A single Timeseries if exactly one match is removed, a new Dataset of them if several match, or an empty Dataset if none match (in which case nothing is removed). |
Source code in gensor/core/dataset.py
to_sql(db)
Save every timeseries in the dataset to a SQLite database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| db | DatabaseConnection | SQLite database connection object. | required |
Source code in gensor/core/dataset.py
Timeseries
Bases: BaseTimeseries
Timeseries of groundwater sensor data.
Attributes:
| Name | Type | Description |
|---|---|---|
| ts | Series | The timeseries data. |
| variable | Literal['temperature', 'pressure', 'conductivity', 'flux'] | The type of the measurement. |
| unit | Literal['degC', 'mmH2O', 'mS/cm', 'm/s'] | The unit of the measurement. |
| sensor | str | The serial number of the sensor. |
| sensor_alt | float | Altitude of the sensor (necessary to compute groundwater levels). |
Source code in gensor/core/timeseries.py
__eq__(other)
Check equality based on location, sensor, variable, unit and sensor_alt.
Source code in gensor/core/timeseries.py
plot(include_outliers=False, ax=None, plot_kwargs=None, legend_kwargs=None)
Plots the timeseries data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_outliers | bool | Whether to include outliers in the plot. | False |
| ax | Axes | Matplotlib axes object to plot on. If None, a new figure and axes are created. | None |
| plot_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.plot() method to customize the plot. | None |
| legend_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.legend() to customize the legend. | None |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | Matplotlib figure and axes to allow further customization. |
Source code in gensor/core/timeseries.py
Where
A composable predicate over a Timeseries' attributes, for Dataset.filter/drop.
A leaf Where(**conditions) matches a Timeseries when every condition holds;
each condition matches when the timeseries' attribute equals (or is in, for a list)
the given value(s), and a leading ~ on a value negates that single condition.
Compose leaves with & (and), | (or) and ~ (not) to express anything the
per-attribute keyword filters can't - in particular a combined exclusion::
~Where(location="PB03B", sensor="AV319") # not (PB03B and AV319)
Where(variable="pressure") & ~Where(location="PB16D")
Where(location="PB16A") | Where(location="PB16B")
Pass instances straight to Dataset.filter (keep matches) or Dataset.drop
(remove matches); they are AND-ed with the keyword filters in the same call.
Source code in gensor/core/dataset.py
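How predicates compose with `&`, `|` and `~` can be illustrated with a few operator overloads. The `Pred` class below is a hypothetical stand-in written for this sketch, not gensor's Where: its leaves only test equality (no list membership, no `~` prefix on values) and it operates on plain dicts.

```python
class Pred:
    """Hypothetical composable predicate over a dict of attributes."""

    def __init__(self, test):
        self.test = test  # callable: attributes dict -> bool

    @classmethod
    def where(cls, **conditions):
        # Leaf predicate: every condition must equal the record's attribute.
        return cls(lambda rec: all(rec.get(k) == v for k, v in conditions.items()))

    def __call__(self, rec):
        return self.test(rec)

    def __and__(self, other):
        return Pred(lambda rec: self(rec) and other(rec))

    def __or__(self, other):
        return Pred(lambda rec: self(rec) or other(rec))

    def __invert__(self):
        return Pred(lambda rec: not self(rec))


records = [
    {"location": "PB03B", "sensor": "AV319"},
    {"location": "PB03B", "sensor": "AV320"},
    {"location": "PB16A", "sensor": "AV319"},
]
# Negate the combination as a unit: drop AV319 only where it sits at PB03B.
not_both = ~Pred.where(location="PB03B", sensor="AV319")
kept = [r for r in records if not_both(r)]
```

Negating the whole leaf is what separates this from two per-attribute negations, which would also drop the other sensor at PB03B and the same sensor at PB16A.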
compensate(raw, barometric, alignment_period='h', threshold_wc=0.025, fieldwork_dates=None, interpolate_method=None)
Compensate raw sensor pressure to groundwater head (m asl).
Computes the water column (see :func:water_column) and adds the sensor altitude.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw | Timeseries \| Dataset | Raw sensor timeseries | required |
| barometric | Timeseries \| float | Barometric pressure timeseries or a single float value. If a float value is provided, it is assumed to be in cmH2O. | required |
| alignment_period | Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's'] | The alignment period for the timeseries. Default is 'h'. See pandas offset aliases for definitions. | 'h' |
| threshold_wc | float \| None | Lower cutoff (in m) for the water column; records at or below it are dropped. Defaults to 0.025 m (25 mm) and is always applied; lower it to keep shallower columns, or set 0 to drop only negatives. Negative water columns are always dropped regardless, being physically impossible. | 0.025 |
| fieldwork_dates | Dict[str, list] | Dictionary of location name and a list of fieldwork days. All records on the fieldwork day are set to None. | None |
| interpolate_method | str | String representing the interpolate method as in pd.Series.interpolate() method. | None |
Returns:
| Type | Description |
|---|---|
| Timeseries \| Dataset \| None | Timeseries \| Dataset \| None: head (variable 'head', unit 'm asl'). |
Source code in gensor/processing/compensation.py
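The final step of compensate, adding the sensor altitude to the water column, is plain addition. The sketch below shows only that step with made-up numbers; `to_head` is a hypothetical helper and the values are illustrative, not measurements.

```python
def to_head(water_column_m, sensor_alt_m):
    """Groundwater head (m asl) = water column above the sensor + sensor altitude."""
    return [column + sensor_alt_m for column in water_column_m]


# A sensor installed at 30.0 m asl with 1.5 m, 1.25 m and 2.0 m of water above it.
heads = to_head([1.5, 1.25, 2.0], sensor_alt_m=30.0)
```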
diff(datasets, key=('location', 'variable'))
Compare the coverage of two or more datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasets | dict[str, Dataset] \| list[Dataset] | a mapping | required |
| key | tuple[str, ...] | attributes used to align series across datasets (default | ('location', 'variable') |
Returns:
| Name | Type | Description |
|---|---|---|
| CoverageDiff | CoverageDiff | renders as a comparison table; |
Source code in gensor/core/dataset.py
read_from_csv(path, file_format='vanessen', **kwargs)
Loads the data from csv files with given file_format and returns a list of Timeseries objects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | The path to the file or directory containing the files. | required |
| **kwargs | dict | Optional keyword arguments passed to the parsers: serial_number_pattern (str), the regex pattern to extract the serial number from the file; location_pattern (str), the regex pattern to extract the station from the file; col_names (list), the column names for the dataframe; location (str), name of the location of the timeseries; sensor (str), sensor serial number. | {} |
Source code in gensor/io/read.py
read_from_sql(db, load_all=True, location=None, variable=None, unit=None, timestamp_start=None, timestamp_stop=None, **kwargs)
Returns the timeseries or a dataset from a SQL database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| db | DatabaseConnection | The database connection object. | required |
| load_all | bool | Whether to load all timeseries from the database. | True |
| location | str | The station name. | None |
| variable | str | The measurement type. | None |
| unit | str | The unit of the measurement. | None |
| timestamp_start | Timestamp | Start timestamp filter. | None |
| timestamp_stop | Timestamp | End timestamp filter. | None |
| **kwargs | dict | Any additional filters matching attributes of the particular timeseries. | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| Dataset | Timeseries \| Dataset | Dataset with retrieved objects or an empty Dataset. |
Source code in gensor/io/read.py
set_log_level(level)
Set the logging level for the package.
Source code in gensor/log.py
water_column(raw, barometric, alignment_period='h', threshold_wc=0.025, fieldwork_dates=None, interpolate_method=None)
Barometrically compensate raw sensor pressure to the water column above the sensor.
This is the first step of :func:compensate exposed on its own: subtract the
barometric pressure, convert to mH2O, mask fieldwork days, and drop out-of-water
records (see threshold_wc) - without adding the sensor altitude, so the result is
the water column height in metres (variable 'water_column', unit 'm') rather than head.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw | Timeseries \| Dataset | Raw sensor timeseries | required |
| barometric | Timeseries \| float | Barometric pressure timeseries or a single float value. If a float value is provided, it is assumed to be in cmH2O. | required |
| alignment_period | Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's'] | The alignment period for the timeseries. Default is 'h'. See pandas offset aliases for definitions. | 'h' |
| threshold_wc | float \| None | Lower cutoff (in m) for the water column; records at or below it are dropped. Defaults to 0.025 m (25 mm) and is always applied; lower it to keep shallower columns, or set 0 to drop only negatives. Negative water columns are always dropped regardless, being physically impossible. | 0.025 |
| fieldwork_dates | Dict[str, list] | Dictionary of location name and a list of fieldwork days. All records on the fieldwork day are set to None. | None |
| interpolate_method | str | String representing the interpolate method as in pd.Series.interpolate() method. | None |
Returns:
| Type | Description |
|---|---|
| Timeseries \| Dataset \| None | Timeseries \| Dataset \| None: the water column height (variable 'water_column', unit 'm'). |
Source code in gensor/processing/compensation.py
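The steps listed for water_column (subtract barometric pressure, convert to mH2O, mask fieldwork days, drop records at or below threshold_wc) can be walked through with plain dictionaries. This is a sketch under stated assumptions, not the library's code: `water_column_m` is a hypothetical helper, both pressures are assumed to be in cmH2O and already aligned on the same timestamps, and masked records are stored as None.

```python
from datetime import date, datetime

CM_PER_M = 100.0


def water_column_m(raw_cmh2o, baro_cmh2o, threshold_wc=0.025, fieldwork_days=()):
    """Return {timestamp: water column in m} for aligned pressure records."""
    result = {}
    for stamp, raw in raw_cmh2o.items():
        # Subtract the barometric pressure and convert cmH2O to mH2O.
        column = (raw - baro_cmh2o[stamp]) / CM_PER_M
        if stamp.date() in fieldwork_days:
            result[stamp] = None  # mask every record on a fieldwork day
        elif column > threshold_wc:
            result[stamp] = column  # keep: sensor is under water
        # Records at or below the cutoff are dropped entirely.
    return result


t0, t1, t2 = (datetime(2021, 5, d, 12) for d in (1, 2, 3))
raw = {t0: 1150.0, t1: 1002.0, t2: 1200.0}
baro = {t0: 1000.0, t1: 1000.0, t2: 1000.0}
columns = water_column_m(raw, baro, fieldwork_days={date(2021, 5, 3)})
# t0 is kept at 1.5 m, t1 (0.02 m) is dropped, t2 is masked as None.
```

Adding the sensor altitude to each remaining value would give the head that compensate returns.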
analysis
outliers
OutlierDetection
Detecting outliers in groundwater timeseries data.
Each method in this class returns a pandas.Series containing predicted outliers in the dataset.
Methods:
| Name | Description |
|---|---|
| iqr | Use interquartile range (IQR). |
| zscore | Use the z-score method. |
| isolation_forest | Use the isolation forest algorithm. |
| lof | Use the local outlier factor (LOF) method. |
Source code in gensor/analysis/outliers.py
__init__(data, method, rolling, window, **kwargs)
Find outliers in a time series using the specified method, with an option for rolling window.
Source code in gensor/analysis/outliers.py
iqr(data, k, rolling)
staticmethod
Use interquartile range (IQR).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Series | The time series data. | required |
Other Parameters:
| Name | Type | Description |
|---|---|---|
| k | float | The multiplier for the IQR to define the range. Defaults to 1.5. |
Returns:
| Type | Description |
|---|---|
| ndarray | np.ndarray: Binary mask representing the outliers as 1. |
Source code in gensor/analysis/outliers.py
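The IQR rule itself is short enough to sketch with the standard library. The function below is an illustration only: `iqr_mask` is a hypothetical name, the rolling option is ignored, and `statistics.quantiles` interpolates quartiles differently from NumPy or pandas, so the fences may differ slightly from what the library computes.

```python
import statistics


def iqr_mask(values, k=1.5):
    """Return a 0/1 mask flagging values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [int(v < lower or v > upper) for v in values]


mask = iqr_mask([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # only the 100 is flagged
```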
isolation_forest(data, **kwargs)
Using the isolation forest algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Series | The time series data. | required |
Other Parameters:
| Name | Type | Description |
|---|---|---|
| n_estimators | int | The number of base estimators in the ensemble. Defaults to 100. |
| max_samples | int \| auto \| float | The number of samples to draw from X to train each base estimator. Defaults to 'auto'. |
| contamination | float | The proportion of outliers in the data. Defaults to 0.01. |
| max_features | int \| float | The number of features to draw from X to train each base estimator. Defaults to 1.0. |
| bootstrap | bool | Whether to use bootstrapping when sampling the data. Defaults to False. |
| n_jobs | int | The number of jobs to run in parallel. Defaults to 1. |
| random_state | int \| RandomState \| None | The random state to use. Defaults to None. |
| verbose | int | The verbosity level. Defaults to 0. |
| warm_start | bool | Whether to reuse the solution of the previous call to fit and add more estimators to the ensemble. Defaults to False. |
Note
For details on kwargs see: sklearn.ensemble.IsolationForest.
Source code in gensor/analysis/outliers.py
lof(data, **kwargs)
Using the local outlier factor (LOF) method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Series | The time series data. | required |
Other Parameters:
| Name | Type | Description |
|---|---|---|
| n_neighbors | int | The number of neighbors to consider for each sample. Defaults to 20. |
| algorithm | str | The algorithm to use. Either 'auto', 'ball_tree', 'kd_tree' or 'brute'. Defaults to 'auto'. |
| leaf_size | int | The leaf size of the tree. Defaults to 30. |
| metric | str | The distance metric to use. Defaults to 'minkowski'. |
| p | int | The power parameter for the Minkowski metric. Defaults to 2. |
| contamination | float | The proportion of outliers in the data. Defaults to 0.01. |
| novelty | bool | Whether to consider the samples as normal or outliers. Defaults to False. |
| n_jobs | int | The number of jobs to run in parallel. Defaults to 1. |
Note: For details on kwargs see: sklearn.neighbors.LocalOutlierFactor.
Source code in gensor/analysis/outliers.py
zscore(data, threshold, rolling)
staticmethod
Use the z-score method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Series | The time series data. | required |
Other Parameters:
| Name | Type | Description |
|---|---|---|
| threshold | float | The threshold for the z-score method. Defaults to 3.0. |
Returns:
| Type | Description |
|---|---|
| ndarray | pandas.Series: Binary mask representing outliers. |
Source code in gensor/analysis/outliers.py
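A z-score mask can be sketched in a few lines of standard-library Python. This is an illustration, not the library's code: `zscore_mask` is a hypothetical name, the rolling option is ignored, and using the population standard deviation (`pstdev`) is an assumption, since the text does not say which estimator is used.

```python
import statistics


def zscore_mask(values, threshold=3.0):
    """Return a 0/1 mask flagging values whose |z-score| exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        return [0] * len(values)  # constant series: nothing can be flagged
    return [int(abs((v - mean) / std) > threshold) for v in values]


mask = zscore_mask([10.0] * 20 + [100.0])  # flags only the final spike
```

With very short series no point can reach a threshold of 3.0, because the largest possible z-score grows only with the square root of the sample size.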
stats
Module to compute timeseries statistics, similar to the pastas.stats.signatures module and following Heudorfer et al. 2019.
To be implemented:
- Structure
- Flashiness
- Distribution
- Modality
- Density
- Shape
- Scale
- Slope
config
Warning
Whenever Timeseries objects are created via read_from_csv with a parser (e.g., 'vanessen'), the timestamps are localized and converted to UTC. Users who create their own timeseries outside read_from_csv should therefore ensure that the timestamps are in UTC.
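For timeseries built by hand, the same localize-then-convert step can be done before the data reaches gensor. The sketch below uses only the standard library; `to_utc` is a hypothetical helper and the fixed UTC+1 logger clock is an assumption made for the example. With pandas, the analogous calls on a DatetimeIndex are tz_localize followed by tz_convert.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this example: the logger clock runs at a fixed UTC+1 offset.
LOCAL = timezone(timedelta(hours=1))


def to_utc(naive_stamp, local_tz=LOCAL):
    """Attach the local timezone to a naive timestamp, then convert to UTC."""
    return naive_stamp.replace(tzinfo=local_tz).astimezone(timezone.utc)


stamp_utc = to_utc(datetime(2021, 6, 1, 12, 0))  # 12:00 at UTC+1 is 11:00 UTC
```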
core
base
BaseTimeseries
Bases: BaseModel
Generic base class for timeseries with metadata.
Timeseries is a series of measurements of a single variable, in the same unit, from a single location with unique timestamps.
Attributes:
| Name | Type | Description |
|---|---|---|
| ts | Series | The timeseries data. |
| variable | Literal['temperature', 'pressure', 'conductivity', 'flux'] | The type of the measurement. |
| unit | Literal['degC', 'mmH2O', 'mS/cm', 'm/s'] | The unit of the measurement. |
| outliers | Series | Measurements marked as outliers. |
| transformation | Any | Metadata of the transformations the timeseries has undergone. |
Methods:
| Name | Description |
|---|---|
| validate_ts | Coerce the pd.Series if it is not exactly what is required. |
Source code in gensor/core/base.py
__eq__(other)
Check equality based on location, sensor, variable, unit and sensor_alt.
Source code in gensor/core/base.py
__getattr__(attr)
Delegate attribute access to the underlying pandas Series if it exists.
Special handling is implemented for pandas indexer.
Source code in gensor/core/base.py
concatenate(other)
Concatenate two Timeseries objects if they are considered equal.
Source code in gensor/core/base.py
detect_outliers(method, rolling=False, window=6, remove=True, **kwargs)
Detects outliers in the timeseries using the specified method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | Literal['iqr', 'zscore', 'isolation_forest', 'lof'] | The method to use for outlier detection. | required |
| **kwargs | Any | Additional keyword arguments for OutlierDetection. | {} |
Returns:
| Type | Description |
|---|---|
| T | Updated deep copy of the Timeseries object with outliers, optionally removed from the original timeseries. |
Source code in gensor/core/base.py
mask_with(other, mode='remove')
Removes records not present in 'other' by index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | Timeseries | Another Timeseries whose indices are used to mask the current one. | required |
| mode | Literal['keep', 'remove'] | | 'remove' |
Returns:
| Name | Type | Description |
|---|---|---|
| Timeseries | T | A new Timeseries object with the filtered data. |
Source code in gensor/core/base.py
plot(include_outliers=False, ax=None, plot_kwargs=None, legend_kwargs=None)
Plots the timeseries data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_outliers | bool | Whether to include outliers in the plot. | False |
| ax | Axes | Matplotlib axes object to plot on. If None, a new figure and axes are created. | None |
| plot_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.plot() method to customize the plot. | None |
| legend_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.legend() to customize the legend. | None |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | Matplotlib figure and axes to allow further customization. |
Source code in gensor/core/base.py
resample(freq, agg_func=pd.Series.mean, **resample_kwargs)
¶
Resample the timeseries to a new frequency with a specified aggregation function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| freq | Any | The offset string or object representing target conversion (e.g., 'D' for daily, 'W' for weekly). | required |
| agg_func | Any | The aggregation function to apply after resampling. Defaults to pd.Series.mean. | mean |
| **resample_kwargs | Any | Additional keyword arguments passed to the pandas.Series.resample method. | {} |
Returns:
| Type | Description |
|---|---|
| T | Updated deep copy of the Timeseries object with the resampled timeseries data. |
Source code in gensor/core/base.py
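To make the idea of resampling with an aggregation function concrete, here is a stdlib-only sketch of daily bucketing. It is a simplified stand-in for what pandas does with a 'D' frequency, with hypothetical names; unlike pandas, it does not emit empty bins for days without data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def resample_daily(records, agg_func=mean):
    """Group (timestamp, value) pairs by calendar day and aggregate each group."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        buckets[timestamp.date()].append(value)
    return {day: agg_func(vals) for day, vals in sorted(buckets.items())}


records = [
    (datetime(2021, 1, 1, 0), 1.0),
    (datetime(2021, 1, 1, 12), 3.0),
    (datetime(2021, 1, 2, 6), 5.0),
]
daily = resample_daily(records)                    # mean per day
daily_max = resample_daily(records, agg_func=max)  # any callable works
```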
serialize_timestamps(value)
¶
Serialize pd.Timestamp to ISO format.
Source code in gensor/core/base.py
to_sql(db)
¶
Converts the timeseries to a list of dictionaries and uploads it to the database.
The Timeseries data is uploaded to the SQL database by using the pandas
to_sql method. Additionally, metadata about the timeseries is stored in the
'timeseries_metadata' table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| db | DatabaseConnection | The database connection object. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | A message indicating the number of rows inserted into the database. |
Source code in gensor/core/base.py
transform(method, **transformer_kwargs)
¶
Transforms the timeseries using the specified method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | The method to use for transformation ('minmax', 'standard', 'robust'). | required |
| transformer_kwargs | Any | Additional keyword arguments passed to the transformer definition. See gensor.preprocessing. | {} |
Returns:
| Type | Description |
|---|---|
| T | Updated deep copy of the Timeseries object with the transformed timeseries data. |
Source code in gensor/core/base.py
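The method names 'minmax' and 'standard' match common scaler definitions. The sketch below shows those textbook formulas in plain Python; it assumes the usual meanings (rescale to [0, 1], and zero mean with unit population standard deviation) and is not taken from gensor.preprocessing.

```python
from statistics import mean, pstdev


def minmax_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


levels = [2.0, 4.0, 6.0]
scaled = minmax_scale(levels)
standardized = standard_scale(levels)
```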
dataset
¶
Coverage
¶
Coverage summary of a Dataset, returned by Dataset.coverage.
Holds a per-timeseries table (one row per location / variable / sensor with
its record count and time span) and renders as that table in a notebook. Call
plot() for a coverage timeline (one row per location; bars span contiguous
data, breaks mark gaps longer than max_gap).
The table is Dataset.info with a derived duration column appended, so
the per-series summary has a single source.
Source code in gensor/core/dataset.py
plot(max_gap='7D', ax=None, color='#1f4e79')
¶
Plot a coverage timeline: one row per location, with bars spanning
contiguous data and breaks wherever the gap between consecutive samples
exceeds max_gap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_gap | str | pandas timedelta string; a gap longer than this splits a bar so within-record holes (e.g. a missing season) stay visible. | '7D' |
| ax | Axes \| None | existing axes to draw on; a new figure is created if None. | None |
| color | str | bar colour. | '#1f4e79' |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | Matplotlib figure and axes. |
Source code in gensor/core/dataset.py
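The bar-splitting rule (break wherever consecutive samples are more than max_gap apart) can be sketched without any plotting. This is an illustrative reconstruction with hypothetical names; a `datetime.timedelta` stands in for the pandas timedelta string.

```python
from datetime import datetime, timedelta


def split_on_gaps(timestamps, max_gap=timedelta(days=7)):
    """Return (start, end) spans; a new span starts after any gap > max_gap."""
    ordered = sorted(timestamps)
    if not ordered:
        return []
    spans = []
    start = previous = ordered[0]
    for current in ordered[1:]:
        if current - previous > max_gap:
            spans.append((start, previous))  # close the contiguous run
            start = current
        previous = current
    spans.append((start, previous))
    return spans


stamps = [datetime(2021, 1, d) for d in (1, 2, 3)]
stamps += [datetime(2021, 3, 1), datetime(2021, 3, 2)]
spans = split_on_gaps(stamps)
```

Each returned span would correspond to one bar on a location's row of the timeline.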
CoverageDiff
¶
Coverage comparison of two or more datasets, returned by gensor.diff
(or Dataset.diff).
Series are aligned across datasets by key (default ("location",
"variable")); multiple sensors sharing a key are unioned and the sensor(s)
reported. Renders as a wide table (per-dataset record count / start / end,
plus present and status summary columns) and exposes plot() for an
N-way coverage timeline grouped by timeseries.
Source code in gensor/core/dataset.py
plot(max_gap='7D', ax=None, colors=None)
¶
Plot an N-way coverage timeline grouped by timeseries.
One row per key (e.g. location + variable); within each row a coverage
sub-bar per dataset (colour-coded, with a legend). Series present in only one
dataset, or covering different spans, are immediately visible.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_gap | str | pandas timedelta string; gaps longer than this split a bar. | '7D' |
| ax | Axes \| None | existing axes to draw on; a new figure is created if None. | None |
| colors | dict \| None | optional | None |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | Matplotlib figure and axes. |
Source code in gensor/core/dataset.py
Dataset
¶
Bases: BaseModel, Generic[T]
Store and operate on a collection of Timeseries.
Attributes:
| Name | Type | Description |
|---|---|---|
| timeseries | list[Timeseries] | A list of Timeseries objects. |
Source code in gensor/core/dataset.py
coverage: Coverage
property
¶
Coverage summary of the dataset.
Renders as a per-timeseries table (records and time span per location /
variable / sensor) and exposes Coverage.plot() for a coverage timeline.
Examples:
>>> ds.coverage # the table
>>> ds.coverage.plot() # the timeline
info: pd.DataFrame
property
¶
Per-timeseries metadata summary, rendered as a table.
One row per timeseries — location, variable, sensor, the number of
records, and the start / end of its time span. A quick look at what
a Dataset holds before processing it (the default repr only shows the timeseries
count). See coverage for a plottable version and gensor.diff to
line this up across datasets.
Examples:
>>> ds.info
loc: DatasetIndexer
property
¶
Label-based selection applied to every timeseries in the dataset.
ds.loc[start:end] returns a new Dataset where each timeseries is sliced by
.loc[start:end] (e.g. a date range), forwarding the key to each series' own
pandas .loc. Empty slices yield empty timeseries (every series is kept).
Examples:
>>> ds.loc["2021-01-01":"2021-12-31"]
__contains__(location)
¶
Return True if any timeseries in the dataset has the given location.
Source code in gensor/core/dataset.py
__getitem__(key)
¶
Retrieve Timeseries by integer index, location name, or (location, variable[, unit]) tuple.
dataset[0] returns the Timeseries at that position (a reference).
dataset["PB01A"] returns the Timeseries at that location, or a Dataset if the location has several timeseries (e.g. pressure and temperature). A list of names (dataset[["PB01A", "PB02A"]]) always returns a Dataset.
dataset["PB01A", "pressure"] (or ["PB01A", "pressure", "cmh2o"]) narrows by variable/unit, returning a single Timeseries when one matches.
For full control use filter() / one().
Warning
Integer indexing returns a reference to the timeseries. Location /
tuple indexing returns copies (it delegates to .filter()).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | int \| str \| list \| tuple | Position, location name, list of names, or a (location, variable[, unit]) tuple. | required |
Returns:
| Type | Description |
|---|---|
| T \| None \| Dataset | Timeseries \| Dataset: The matching timeseries or a dataset of them. |
Raises:
| Type | Description |
|---|---|
| IndexOutOfRangeError | If an integer index is out of range. |
| KeyError | If no timeseries matches the given location(s)/filters. |
Source code in gensor/core/dataset.py
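The reference-versus-copy distinction in the warning above is ordinary Python semantics. The sketch below demonstrates it with plain lists and dicts standing in for a Dataset and its Timeseries; it does not call gensor.

```python
import copy

# A list of dicts stands in for a Dataset holding Timeseries objects.
series = [{"location": "PB01A", "values": [1, 2, 3]}]

by_reference = series[0]             # like integer indexing
by_copy = copy.deepcopy(series[0])   # like location / tuple indexing

by_copy["values"].append(99)         # the stored object is unaffected
by_reference["values"].append(4)     # the stored object changes too
```

So edits made through an integer-indexed result are visible in the collection, while edits to a copied result must be written back explicitly.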
__iter__()
¶
Allows iterating directly over the dataset.
Source code in gensor/core/dataset.py
__len__()
¶
Gives the number of timeseries in the Dataset.
Source code in gensor/core/dataset.py
add(other)
¶
Appends new Timeseries to the Dataset.
If an equal Timeseries already exists, merge the new data into the existing Timeseries, dropping duplicate timestamps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | Timeseries | The Timeseries object to add. | required |
Source code in gensor/core/dataset.py
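Merging while dropping duplicate timestamps can be pictured with two mappings. This is a sketch only: the function name is hypothetical, and keeping the value that was already present for a duplicated timestamp is an assumption of this example, since the text above does not say which record wins.

```python
def merge_records(existing, incoming):
    """Merge two {timestamp: value} mappings, one entry per timestamp."""
    merged = dict(existing)
    for timestamp, value in incoming.items():
        # Assumption: the value already present wins on a duplicate.
        merged.setdefault(timestamp, value)
    # ISO-formatted date strings sort chronologically.
    return dict(sorted(merged.items()))


old = {"2021-01-01": 1.0, "2021-01-02": 2.0}
new = {"2021-01-02": 9.9, "2021-01-03": 3.0}
merged = merge_records(old, new)
```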
diff(*others, labels=None, key=('location', 'variable'))
¶
Compare this dataset's coverage with one or more others.
Convenience wrapper over gensor.diff. labels names this dataset
and the others (default ds0, ds1 ...).
Examples:
>>> raw.diff(trimmed, labels=["raw", "trimmed"]).plot()
Source code in gensor/core/dataset.py
filter(*predicates, location=None, variable=None, unit=None, **kwargs)
¶
Return a Timeseries or a new Dataset filtered by station, sensor, and/or variable.
Any of location/variable/unit (and the keyword attributes) may be
a single value or a list of values, matching a timeseries when its attribute
equals (or is in) the given value(s).
Prefix a value with ~ to negate it - drop timeseries with that value
rather than keep them (e.g. location="~PB16D" keeps everything except
PB16D; sensor="~AV319" drops just that sensor). Positive and negated
values may be mixed within one attribute and across attributes; for a given
attribute a timeseries is kept when its value is in the positives (if any are
given) and not in the negatives, and attributes are AND-ed together.
For conditions the per-attribute keywords can't express - notably a combined
match across attributes - pass one or more Where predicates
positionally. filter(~Where(location="PB03B", sensor="AV319")) drops only that
sensor at that location (the whole combination negated as a unit), while
filter(Where(location="PB16A") | Where(location="PB16B")) keeps either.
Predicates are AND-ed with the keyword filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *predicates | Where | Predicate objects; all must match for a timeseries to be kept (combine with | () |
| location | str \| list | The location name(s); | None |
| variable | str \| list | The variable(s) being measured; | None |
| unit | str \| list | Unit(s) of the measurement; | None |
| **kwargs | str \| list | Attributes of subclassed timeseries used for filtering (e.g., sensor, method); | {} |
Returns:
| Type | Description |
|---|---|
| T \| Dataset | Timeseries \| Dataset: A single Timeseries if exactly one match is found, or a new Dataset if multiple matches are found. |
Source code in gensor/core/dataset.py
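The keep/drop rule for positive and ~-negated values can be written out in a few lines. The sketch below is a plain-Python reading of the rule stated above (positives, if any, must contain the value; negatives must not; attributes are AND-ed). The helper names are hypothetical and dicts stand in for timeseries.

```python
def attribute_matches(value, wanted):
    """Apply the positive/negated rule for a single attribute.

    `wanted` is None (no constraint), one string, or a list of strings.
    """
    if wanted is None:
        return True
    items = [wanted] if isinstance(wanted, str) else list(wanted)
    positives = {w for w in items if not w.startswith("~")}
    negatives = {w[1:] for w in items if w.startswith("~")}
    if positives and value not in positives:
        return False
    return value not in negatives


def keep(series, **filters):
    """AND the per-attribute results together."""
    return all(
        attribute_matches(series.get(name), wanted)
        for name, wanted in filters.items()
    )


catalog = [
    {"location": "PB16A", "sensor": "AV319"},
    {"location": "PB16D", "sensor": "AV320"},
    {"location": "PB03B", "sensor": "AV319"},
]
not_pb16d = [s["location"] for s in catalog if keep(s, location="~PB16D")]
mixed = [
    s["location"]
    for s in catalog
    if keep(s, location=["PB16A", "PB16D"], sensor="~AV320")
]
```

Because attributes are AND-ed, this form cannot express "drop only AV319 at PB03B", which is the case the positional predicates cover.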
get_locations()
¶
List all unique locations in the dataset, preserving first-seen order.
Source code in gensor/core/dataset.py
one(**filters)
¶
Return exactly one matching Timeseries.
A convenience over filter() for when a single result is expected:
it always returns a Timeseries (never a Dataset) and raises if zero or
more than one timeseries match - avoiding the "is it a Timeseries or a
Dataset?" ambiguity of filter() / dataset[name].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **filters | Any | Same keyword filters as :meth: | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| Timeseries | T | The single matching timeseries. |
Raises:
| Type | Description |
|---|---|
| ValueError | If zero or more than one timeseries match the filters. |
Source code in gensor/core/dataset.py
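The "exactly one or raise" contract is easy to picture as a small helper. This sketch uses a hypothetical free function over a list of dicts with simple equality filters; it is not the Dataset method itself.

```python
def one(items, **filters):
    """Return the single item whose attributes equal all `filters`."""
    matches = [
        item
        for item in items
        if all(item.get(name) == value for name, value in filters.items())
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


catalog = [
    {"location": "PB01A", "variable": "pressure"},
    {"location": "PB01A", "variable": "temperature"},
]
single = one(catalog, location="PB01A", variable="pressure")
```

Calling `one(catalog, location="PB01A")` here would raise ValueError, because two entries match.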
plot(facet='variable', variable=None, ncols=5, sharex=False, include_outliers=False, plot_kwargs=None, legend_kwargs=None)
¶
Plot the dataset's timeseries, in one of two layouts.
facet="variable" (default): one subplot per variable (pressure, temperature, ...), every location's series overlaid on that axis. Returns (fig, axes) where axes is a list (one per variable).
facet="location": a separate figure per variable, each a grid with one panel per location (ncols wide). Every location gets a panel - left empty if it has no (or empty) series for that variable - and unused trailing cells are hidden. Multiple sensors at a location are overlaid in the same panel, and a legend (labelled by sensor serial) is shown only then; single-series panels get no legend. Panels are titled by location and carry no x-label (the dates are on the shared/rotated ticks). Returns {variable: (fig, axes)}.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| facet | str | | 'variable' |
| variable | str \| list | restrict to these variable(s); default is every unique variable in the dataset. | None |
| ncols | int | panels per row for the | 5 |
| sharex | bool | for | False |
| include_outliers | bool | Whether to include outliers in the plot. | False |
| plot_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.plot(). | None |
| legend_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.legend(). | None |
Returns:
| Type | Description |
|---|---|
| tuple[Figure, list] \| dict[str, tuple[Figure, list]] | |
| tuple[Figure, list] \| dict[str, tuple[Figure, list]] | for |
Source code in gensor/core/dataset.py
pop(*predicates, location=None, variable=None, unit=None, **kwargs)
¶
Remove and return the matching timeseries, mutating the Dataset in place.
Selection works exactly like filter() (same location / variable /
unit / keyword filters, ~ negation, and Where predicates), but
the matched timeseries are removed from this Dataset and returned by
reference (not copied) - so you can alter them and add() them back in their
new form:
ts = ds.pop(location="PB03B", sensor="AV319") # taken out of ds
ts.ts = ts.ts - 300 # edit the live series
ds.add(ts) # put it back, changed
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *predicates | Where | Predicate objects; all must match (combine with | () |
| location | str \| list | The location name(s); | None |
| variable | str \| list | The variable(s) being measured; | None |
| unit | str \| list | Unit(s) of the measurement; | None |
| **kwargs | str \| list | Other timeseries attributes to match (e.g., sensor). | {} |
Returns:
| Type | Description |
|---|---|
| T \| Dataset | Timeseries \| Dataset: A single Timeseries if exactly one match is removed, a new Dataset of them if several match, or an empty Dataset if none match (in which case nothing is removed). |
Source code in gensor/core/dataset.py
to_sql(db)
¶
Save the entire timeseries to a SQLite database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| db | DatabaseConnection | SQLite database connection object. | required |
Source code in gensor/core/dataset.py
DatasetIndexer
¶
Applies a pandas .loc selection to every Timeseries in a Dataset.
Returned by Dataset.loc. ds.loc[start:end] slices each timeseries by label
(e.g. a date range) via its own .loc and returns a new Dataset of the results.
Intended for label slices; a key that selects a single scalar from a timeseries (a
point lookup) is rejected, since the per-series scalars can't form a Dataset.
Source code in gensor/core/dataset.py
Where
¶
A composable predicate over a Timeseries' attributes, for Dataset.filter/drop.
A leaf Where(**conditions) matches a Timeseries when every condition holds;
each condition matches when the timeseries' attribute equals (or is in, for a list)
the given value(s), and a leading ~ on a value negates that single condition.
Compose leaves with & (and), | (or) and ~ (not) to express anything the
per-attribute keyword filters can't - in particular a combined exclusion:
~Where(location="PB03B", sensor="AV319") # not (PB03B and AV319)
Where(variable="pressure") & ~Where(location="PB16D")
Where(location="PB16A") | Where(location="PB16B")
Pass instances straight to Dataset.filter (keep matches) or Dataset.drop
(remove matches); they are AND-ed with the keyword filters in the same call.
Source code in gensor/core/dataset.py
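Composition with &, | and ~ relies on Python's operator overloading. The sketch below shows one way such a predicate could be built; `Pred` is a hypothetical class, not gensor's Where, and its leaves only test equality (no list membership or ~-prefixed values).

```python
class Pred:
    """Hypothetical composable predicate over dict-like records."""

    def __init__(self, test):
        self.test = test

    @classmethod
    def where(cls, **conditions):
        # A leaf matches when every condition equals the record's value.
        return cls(
            lambda rec: all(rec.get(k) == v for k, v in conditions.items())
        )

    def __call__(self, rec):
        return self.test(rec)

    def __and__(self, other):
        return Pred(lambda rec: self(rec) and other(rec))

    def __or__(self, other):
        return Pred(lambda rec: self(rec) or other(rec))

    def __invert__(self):
        return Pred(lambda rec: not self(rec))


not_combo = ~Pred.where(location="PB03B", sensor="AV319")
either = Pred.where(location="PB16A") | Pred.where(location="PB16B")

a = {"location": "PB03B", "sensor": "AV319"}
b = {"location": "PB03B", "sensor": "AV400"}
```

Negating the whole leaf excludes only the combination: record `b` shares the location with `a` but still passes `not_combo`.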
diff(datasets, key=('location', 'variable'))
¶
Compare the coverage of two or more datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasets | dict[str, Dataset] \| list[Dataset] | a mapping | required |
| key | tuple[str, ...] | attributes used to align series across datasets (default | ('location', 'variable') |
Returns:
| Name | Type | Description |
|---|---|---|
| CoverageDiff | CoverageDiff | renders as a comparison table; |
Source code in gensor/core/dataset.py
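Aligning series across datasets by a key tuple amounts to grouping on that key and recording which dataset each group came from. The sketch below illustrates the idea with lists of dicts; the function name and the returned shape are assumptions of this example, not the CoverageDiff layout.

```python
def align_presence(datasets, key=("location", "variable")):
    """Map each key tuple to the sorted labels of the datasets containing it."""
    presence = {}
    for label, records in datasets.items():
        for record in records:
            k = tuple(record[name] for name in key)
            presence.setdefault(k, set()).add(label)
    return {k: sorted(labels) for k, labels in presence.items()}


raw = [
    {"location": "PB01A", "variable": "pressure"},
    {"location": "PB02A", "variable": "pressure"},
]
trimmed = [{"location": "PB01A", "variable": "pressure"}]
presence = align_presence({"raw": raw, "trimmed": trimmed})
```

A key listed under a single label is a series present in only one dataset.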
indexer
¶
TimeseriesIndexer
¶
A wrapper for the Pandas indexers (e.g., loc, iloc) to return Timeseries objects.
Source code in gensor/core/indexer.py
__getitem__(key)
¶
Allows using the indexer (e.g., loc) and wraps the result in the parent Timeseries.
Source code in gensor/core/indexer.py
__setitem__(key, value)
¶
Allows setting values directly using the indexer (e.g., loc, iloc).
Source code in gensor/core/indexer.py
timeseries
¶
Timeseries
¶
Bases: BaseTimeseries
Timeseries of groundwater sensor data.
Attributes:
| Name | Type | Description |
|---|---|---|
| ts | Series | The timeseries data. |
| variable | Literal['temperature', 'pressure', 'conductivity', 'flux'] | The type of the measurement. |
| unit | Literal['degC', 'mmH2O', 'mS/cm', 'm/s'] | The unit of the measurement. |
| sensor | str | The serial number of the sensor. |
| sensor_alt | float | Altitude of the sensor (necessary to compute groundwater levels). |
Source code in gensor/core/timeseries.py
__eq__(other)
¶
Check equality based on location, sensor, variable, unit and sensor_alt.
Source code in gensor/core/timeseries.py
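Equality that looks only at metadata, and ignores the measured values, can be sketched with a dataclass. `SensorRecord` is a hypothetical stand-in for a Timeseries; the field names follow the attributes listed above, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class SensorRecord:
    location: str
    sensor: str
    variable: str
    unit: str
    sensor_alt: float
    # Excluded from the generated __eq__, so data never affects equality.
    values: list = field(default_factory=list, compare=False)


a = SensorRecord("PB01A", "AV319", "pressure", "cmh2o", 31.5, values=[1, 2])
b = SensorRecord("PB01A", "AV319", "pressure", "cmh2o", 31.5, values=[3])
c = SensorRecord("PB01A", "AV320", "pressure", "cmh2o", 31.5)

same_series = a == b        # same metadata, different data
different_sensor = a != c   # one metadata field differs
```

This kind of equality is what lets two chunks of the same series be recognised as mergeable.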
plot(include_outliers=False, ax=None, plot_kwargs=None, legend_kwargs=None)
¶
Plots the timeseries data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_outliers | bool | Whether to include outliers in the plot. | False |
| ax | Axes | Matplotlib axes object to plot on. If None, a new figure and axes are created. | None |
| plot_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.plot() method to customize the plot. | None |
| legend_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.legend() to customize the legend. | None |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | Matplotlib figure and axes to allow further customization. |
Source code in gensor/core/timeseries.py
db
¶
DB¶
Module handling the database connection used when saving to and loading from a SQLite database.
Modules:
connection.py
DatabaseConnection
¶
Bases: BaseModel
Database connection object. If no database exists at the specified path, it will be created. If no database is specified, an in-memory database will be used.
Attributes:
| Name | Type | Description |
|---|---|---|
| metadata | MetaData | SQLAlchemy metadata object. |
| db_directory | Path | Path to the database to connect to. |
| db_name | str | Name for the database to connect to. |
| engine | Engine \| None | SQLAlchemy Engine instance. |
Source code in gensor/db/connection.py
__enter__()
¶
Enable usage in a with block by returning the engine.
Source code in gensor/db/connection.py
__exit__(exc_type, exc_val, exc_tb)
¶
Dispose of the engine when exiting the with block.
Source code in gensor/db/connection.py
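The enter/exit pairing follows Python's context-manager protocol. The sketch below shows the same shape with the stdlib `sqlite3` module instead of SQLAlchemy: `SqliteSession` is a hypothetical class, a connection takes the place of the engine, and the table contents are invented for the example.

```python
import sqlite3


class SqliteSession:
    """Hypothetical stand-in: opens a connection on enter, closes it on exit."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = None

    def __enter__(self):
        self.conn = sqlite3.connect(self.path)
        return self.conn

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.conn.close()
        self.conn = None
        return False  # do not swallow exceptions raised in the block


session = SqliteSession()
with session as conn:
    conn.execute("CREATE TABLE timeseries_metadata (location TEXT)")
    conn.execute("INSERT INTO timeseries_metadata VALUES ('PB01A')")
    rows = conn.execute("SELECT location FROM timeseries_metadata").fetchall()
```

Releasing the resource in `__exit__` means it is cleaned up even if the body of the with block raises.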
connect()
¶
Connect to the database and initialize the engine. If the engine is None, it is created with a verified path and the existing schema is reflected. After connecting, ensure the timeseries_metadata table is present.
Source code in gensor/db/connection.py
create_metadata()
¶
Create a metadata table if it doesn't exist yet and store ts metadata.
Source code in gensor/db/connection.py
create_table(schema_name, column_name)
¶
Create a table in the database.
Schema name is a string representing the location, sensor, variable measured and
unit of measurement. This is a way of preserving the metadata of the Timeseries.
The index is always the timestamp and the column name is dynamically created from
the measured variable.
Source code in gensor/db/connection.py
dispose()
¶
Dispose of the engine, closing all connections.
Source code in gensor/db/connection.py
get_timeseries_metadata(location=None, variable=None, unit=None, **kwargs)
¶
List timeseries available in the database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| location | str | Location attribute to match. | None |
| variable | str | Variable attribute to match. | None |
| unit | str | Unit attribute to match. | None |
| **kwargs | dict | Additional filters. Must match the attributes of the Timeseries instance the user is trying to retrieve. | {} |
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: The name of the matching table or None if no table is found. |
Source code in gensor/db/connection.py
connection
¶
Module defining database connection object.
Classes:
| Name | Description |
|---|---|
| DatabaseConnection | Database connection object |
exceptions
¶
IndexOutOfRangeError
¶
Bases: IndexError
Custom exception raised when an index is out of range in the dataset.
Source code in gensor/exceptions.py
InvalidMeasurementTypeError
¶
Bases: ValueError
Raised when a timeseries of a wrong measurement type is operated upon.
Source code in gensor/exceptions.py
MissingInputError
¶
Bases: ValueError
Raised when a required input is missing.
Source code in gensor/exceptions.py
TimeseriesUnequal
¶
Bases: ValueError
Raised when Timeseries objects are compared and are unequal.
Source code in gensor/exceptions.py
io
¶
read
¶
Fetching the data from various sources.
TODO: Fix up the read_from_sql() function to actually work properly.
read_from_api()
¶
Fetch data from the API.
Source code in gensor/io/read.py
read_from_csv(path, file_format='vanessen', **kwargs)
¶
Loads the data from csv files with given file_format and returns a list of Timeseries objects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path |
Path
|
The path to the file or directory containing the files. |
required |
**kwargs |
dict
|
Optional keyword arguments passed to the parsers: * serial_number_pattern (str): The regex pattern to extract the serial number from the file. * location_pattern (str): The regex pattern to extract the station from the file. * col_names (list): The column names for the dataframe. * location (str): Name of the location of the timeseries. * sensor (str): Sensor serial number. |
{}
|
Source code in gensor/io/read.py
read_from_sql(db, load_all=True, location=None, variable=None, unit=None, timestamp_start=None, timestamp_stop=None, **kwargs)
¶
Returns the timeseries or a dataset from a SQL database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
db |
DatabaseConnection
|
The database connection object. |
required |
load_all |
bool
|
Whether to load all timeseries from the database. |
True
|
location |
str
|
The station name. |
None
|
variable |
str
|
The measurement type. |
None
|
unit |
str
|
The unit of the measurement. |
None
|
timestamp_start |
Timestamp
|
Start timestamp filter. |
None
|
timestamp_stop |
Timestamp
|
End timestamp filter. |
None
|
**kwargs |
dict
|
Any additional filters matching attributes of the particular timeseries. |
{}
|
Returns:
| Name | Type | Description |
|---|---|---|
Dataset |
Timeseries | Dataset
|
Dataset with retrieved objects or an empty Dataset. |
Source code in gensor/io/read.py
log
¶
set_log_level(level)
¶
Set the logging level for the package.
Source code in gensor/log.py
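A package-wide log level is commonly set on the package's named logger. The sketch below shows that pattern with the standard library only; the logger name "gensor" and the helper name `set_package_log_level` are assumptions for illustration, not the library's confirmed implementation.

```python
import logging


def set_package_log_level(level, package: str = "gensor") -> None:
    """Set the level of the named package logger (logger name is assumed)."""
    # setLevel accepts either an int (logging.DEBUG) or a name ("DEBUG").
    logging.getLogger(package).setLevel(level)
```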
parse
¶
parse_plain(path, **kwargs)
¶
Parse a simple CSV without a metadata header, just columns with variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path |
Path
|
The path to the file. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
list[Timeseries]
|
A list of Timeseries objects. |
Source code in gensor/parse/plain.py
parse_vanessen_csv(path, **kwargs)
¶
Parses a van Essen csv file and returns a list of Timeseries objects. At this point it does not matter whether the file is a barometric or piezometric logger file.
The function uses regex patterns to extract the serial number and station from the file. It is important to use appropriate regex patterns, particularly for the station. If the default patterns do not work (which will most likely be the case), the user should provide their own patterns. The patterns can be provided as keyword arguments to the function, and it is possible to use OR (|) in the regex pattern.
Warning
A better check for the variable type and units has to be implemented.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path |
Path
|
The path to the file. |
required |
Other Parameters:
| Name | Type | Description |
|---|---|---|
serial_number_pattern |
str
|
The regex pattern to extract the serial number from the file. |
location_pattern |
str
|
The regex pattern to extract the station from the file. |
col_names |
list
|
The column names for the dataframe. |
Returns:
| Name | Type | Description |
|---|---|---|
list |
list[Timeseries]
|
A list of Timeseries objects. |
Source code in gensor/parse/vanessen.py
plain
¶
parse_plain(path, **kwargs)
¶
Parse a simple CSV without a metadata header, just columns with variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path |
Path
|
The path to the file. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
list |
list[Timeseries]
|
A list of Timeseries objects. |
Source code in gensor/parse/plain.py
utils
¶
detect_encoding(path, num_bytes=1024)
¶
Detect the encoding of a file using chardet.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path |
Path
|
The path to the file. |
required |
num_bytes |
int
|
Number of bytes to read for encoding detection (default is 1024). |
1024
|
Returns:
| Name | Type | Description |
|---|---|---|
str |
str
|
The detected encoding of the file. |
Source code in gensor/parse/utils.py
get_data(text, data_start, data_end, column_names)
¶
Search for data in the file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text |
str
|
string obtained from the CSV file. |
required |
data_start |
str
|
string marking the data header row. |
required |
data_end |
str
|
string marking the end of the data block. When it is not present (some exports omit the trailing marker), the data is read to the end of the file. |
required |
column_names |
list
|
list of expected column names. |
required |
Returns:
| Type | Description |
|---|---|
DataFrame
|
pd.DataFrame |
Source code in gensor/parse/utils.py
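The marker-based slicing described for get_data, including the fallback when the end marker is missing, can be sketched with the standard library. This is an illustration only: `slice_data_block` is a hypothetical name, it returns a list of dicts rather than a DataFrame, and it assumes comma-separated rows.

```python
import csv
import io


def slice_data_block(text: str, data_start: str, data_end: str,
                     column_names: list) -> list:
    """Return the rows between the header row and the end marker as dicts."""
    start = text.find(data_start)
    if start == -1:
        return []  # no data header row found
    # Skip the header row itself; the data begins on the following line.
    newline = text.find("\n", start)
    if newline == -1:
        return []
    body_start = newline + 1
    end = text.find(data_end, body_start)
    if end == -1:
        end = len(text)  # trailing marker omitted: read to the end of the file
    block = text[body_start:end]
    # Assumption: comma-separated values; real exports may use another delimiter.
    reader = csv.DictReader(io.StringIO(block), fieldnames=column_names)
    return [row for row in reader if any(row.values())]
```

Searching for the end marker only after the header row avoids a false match if the marker text also appears in the metadata header.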
get_header_fields(text)
¶
Parse the key = value lines of a Diver-Office header into a dict.
Diver-Office files carry labelled fields in the header (e.g. Location and
Serial number); reading those directly is far more reliable than matching a
regex against the whole file, which can pick up stray matches from the embedded
FILENAME path. Parsing stops at the data block, section markers
([Logger settings] ...) and lines without a key = value shape are
skipped, and the first occurrence of each key wins.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text |
str
|
string obtained from the CSV file. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dict |
dict
|
header field name -> value (both stripped). |
Source code in gensor/parse/utils.py
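The rules described for get_header_fields (stop at the data block, skip section markers and malformed lines, first occurrence wins) can be sketched as follows. `parse_header_fields` is a hypothetical name, and the "[Data]" marker used to detect the data block is an assumption about the file layout, not taken from the library.

```python
def parse_header_fields(text: str, data_marker: str = "[Data]") -> dict:
    """Collect 'key = value' header lines into a dict (first key wins)."""
    fields: dict = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == data_marker:
            break  # stop at the data block (marker name is an assumption)
        if stripped.startswith("[") or "=" not in stripped:
            continue  # section marker or line without a key = value shape
        key, _, value = stripped.partition("=")
        key = key.strip()
        if key:
            # setdefault keeps the first occurrence of each key.
            fields.setdefault(key, value.strip())
    return fields
```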
get_metadata(text, patterns)
¶
Search for metadata in the file header with given regex patterns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text |
str
|
string obtained from the CSV file. |
required |
patterns |
dict
|
regex patterns matching the location and sensor information. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
dict |
dict
|
metadata of the timeseries. |
Source code in gensor/parse/utils.py
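Regex-based metadata extraction, as described for get_metadata, can be sketched with the standard `re` module. The helper name `extract_metadata`, the example patterns, and the choice to return the whole match (rather than a capture group) are assumptions for illustration; the library's behaviour may differ.

```python
import re
from typing import Optional


def extract_metadata(text: str, patterns: dict) -> dict:
    """Apply each named regex to the text; missing matches map to None."""
    metadata: dict = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        value: Optional[str] = match.group(0) if match else None
        metadata[name] = value
    return metadata


# Hypothetical patterns; alternation (|) lets one pattern cover several
# naming schemes, e.g. piezometer codes or the barometric logger.
example_patterns = {
    "sensor": r"[A-Z]{2}\d{3}",
    "location": r"PB\d{2}[A-Z]|Barodiver",
}
```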
handle_timestamps(df, tz_string)
¶
Converts timestamps in the dataframe to the specified timezone (e.g., 'UTC+1').
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df |
DataFrame
|
The dataframe with timestamps. |
required |
tz_string |
str
|
A timezone string like 'UTC+1' or 'UTC-5'. |
required |
Returns:
| Type | Description |
|---|---|
DataFrame
|
pd.DataFrame: The dataframe with timestamps converted to UTC. |
Source code in gensor/parse/utils.py
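A fixed-offset string such as 'UTC+1' can be turned into a timezone and used to convert naive timestamps to UTC. The sketch below uses only the standard library and reads the description as: timestamps are local to tz_string and the result is in UTC. The helper names are hypothetical, and it assumes 'UTC+1' means one hour ahead of UTC with whole-hour offsets only.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_utc_offset(tz_string: str) -> timezone:
    """Turn 'UTC+1' or 'UTC-5' into a fixed-offset timezone."""
    match = re.fullmatch(r"UTC([+-])(\d{1,2})", tz_string.strip())
    if match is None:
        raise ValueError(f"unsupported timezone string: {tz_string!r}")
    sign = 1 if match.group(1) == "+" else -1
    # Assumption: 'UTC+1' is one hour ahead of UTC (not the POSIX convention).
    return timezone(timedelta(hours=sign * int(match.group(2))))


def to_utc(naive: datetime, tz_string: str) -> datetime:
    """Interpret a naive timestamp as local to tz_string and convert to UTC."""
    local = naive.replace(tzinfo=parse_utc_offset(tz_string))
    return local.astimezone(timezone.utc)
```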
vanessen
¶
Logic parsing CSV files from van Essen Instruments Divers.
parse_vanessen_csv(path, **kwargs)
¶
Parses a van Essen csv file and returns a list of Timeseries objects. At this point it does not matter whether the file is a barometric or piezometric logger file.
The function uses regex patterns to extract the serial number and station from the file. It is important to use appropriate regex patterns, particularly for the station. If the default patterns do not work (which will most likely be the case), the user should provide their own patterns. The patterns can be provided as keyword arguments to the function, and it is possible to use OR (|) in the regex pattern.
Warning
A better check for the variable type and units has to be implemented.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path |
Path
|
The path to the file. |
required |
Other Parameters:
| Name | Type | Description |
|---|---|---|
serial_number_pattern |
str
|
The regex pattern to extract the serial number from the file. |
location_pattern |
str
|
The regex pattern to extract the station from the file. |
col_names |
list
|
The column names for the dataframe. |
Returns:
| Name | Type | Description |
|---|---|---|
list |
list[Timeseries]
|
A list of Timeseries objects. |
Source code in gensor/parse/vanessen.py
processing
¶
compensation
¶
Compensating the raw data from the absolute pressure transducer to the actual water level using the barometric pressure data.
Because van Essen Instruments Divers are non-vented pressure transducers, the barometric pressure must be subtracted from the raw pressure measurements to obtain the pressure exerted by the water column above the logger (i.e. the water level). The function first aligns the two series to the same time step and then subtracts the barometric pressure from the raw pressure measurements. For short time periods (for instance during a slug test), the barometric pressure can be provided as a single float value.
Subsequently the function filters out all records where the water column is less than or
equal to the cutoff value, and - always, regardless of the cutoff - every record with a
negative water column. The water column above a submerged sensor is physically
non-negative, so the near-zero readings taken while the logger is out of the water (which
produce erroneous results and spikes in the plots) and any negative values (out-of-water
/ noise / barometric-alignment artefacts) are all erroneous. The comparison is signed,
not on the absolute value, so large negative spikes are dropped rather than kept. The
cutoff defaults to 25 mm (threshold_wc=0.025) and is always applied; lower it to keep
shallower columns, or set it to 0 to drop only negatives.
Functions:
water_column: Barometrically compensate raw pressure to the water column above the
sensor (the first step, without adding the sensor altitude).
compensate: Full compensation of raw sensor pressure to groundwater head, using
``water_column`` and then adding the sensor altitude.
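The two steps and the filtering rule described above can be sketched on single readings. This is a minimal illustration, assuming pressures in cmH2O and plain floats rather than Timeseries objects; `water_column_m` and `head_m_asl` are hypothetical names, and time alignment and fieldwork masking are omitted.

```python
from typing import Optional


def water_column_m(raw_cmh2o: float, baro_cmh2o: float,
                   threshold_wc: float = 0.025) -> Optional[float]:
    """Water column in metres, or None when the record would be dropped."""
    wc = (raw_cmh2o - baro_cmh2o) / 100.0  # cmH2O -> mH2O
    # Signed comparison: negative columns are always dropped, and so is
    # anything at or below the cutoff.
    if wc < 0 or wc <= threshold_wc:
        return None
    return wc


def head_m_asl(raw_cmh2o: float, baro_cmh2o: float, sensor_alt: float,
               threshold_wc: float = 0.025) -> Optional[float]:
    """Groundwater head: water column plus sensor altitude (m asl)."""
    wc = water_column_m(raw_cmh2o, baro_cmh2o, threshold_wc)
    return None if wc is None else wc + sensor_alt
```

Comparing the signed value rather than its absolute value is what removes large negative spikes: a reading of -1.0 m fails the check, whereas `abs(-1.0) > 0.025` would have kept it.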
Compensator
¶
Bases: BaseModel
Compensate raw sensor pressure measurement with barometric pressure.
Attributes:
| Name | Type | Description |
|---|---|---|
ts |
Timeseries
|
Raw sensor timeseries |
barometric |
Timeseries | float
|
Barometric pressure timeseries or a single float value. If a float value is provided, it is assumed to be in cmH2O. |
Source code in gensor/processing/compensation.py
compensate(alignment_period, threshold_wc, fieldwork_dates)
¶
Perform full compensation to groundwater head (m asl).
Computes the water column with :meth:water_column, then adds the sensor
altitude (sensor_alt) to express it as head above the reference datum.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
alignment_period |
Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's']
|
The alignment period for the timeseries. Default is 'h'. See pandas offset aliases for definitions. |
required |
threshold_wc |
float | None
|
Lower cutoff (in m) for the water column; see
:meth: |
required |
fieldwork_dates |
Optional[list]
|
List of dates when fieldwork was done. All measurements from a fieldwork day will be set to None. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Timeseries |
Timeseries | None
|
A new Timeseries instance with the compensated data and updated unit and variable. Optionally removed outliers are included. |
Source code in gensor/processing/compensation.py
water_column(alignment_period, threshold_wc, fieldwork_dates)
¶
Compute the barometrically compensated water column above the sensor.
Aligns the raw and barometric series to alignment_period, subtracts the
barometric pressure, converts cmH2O to mH2O, masks fieldwork days, and drops the
out-of-water records (see threshold_wc). This is the first step of
:meth:compensate and can be used on its own to obtain just the water column
height (it does not require sensor_alt).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
alignment_period |
Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's']
|
The alignment period for the timeseries. Default is 'h'. See pandas offset aliases for definitions. |
required |
threshold_wc |
float | None
|
Lower cutoff (in m) for the water column.
Records at or below it are dropped, along with all negative water columns
(which are always dropped as physically impossible). |
required |
fieldwork_dates |
Optional[list]
|
List of dates when fieldwork was done. All measurements from a fieldwork day will be set to None. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
Timeseries |
Timeseries | None
|
A new Timeseries of the water column height in metres (variable
'water_column', unit 'm'); dropped out-of-water records are kept in
|
Source code in gensor/processing/compensation.py
compensate(raw, barometric, alignment_period='h', threshold_wc=0.025, fieldwork_dates=None, interpolate_method=None)
¶
Compensate raw sensor pressure to groundwater head (m asl).
Computes the water column (see :func:water_column) and adds the sensor altitude.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
raw |
Timeseries | Dataset
|
Raw sensor timeseries |
required |
barometric |
Timeseries | float
|
Barometric pressure timeseries or a single float value. If a float value is provided, it is assumed to be in cmH2O. |
required |
alignment_period |
Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's']
|
The alignment period for the timeseries. Default is 'h'. See pandas offset aliases for definitions. |
'h'
|
threshold_wc |
float | None
|
Lower cutoff (in m) for the water column; records at or below it are dropped. Defaults to 0.025 m (25 mm) and is always applied; lower it to keep shallower columns, or set 0 to drop only negatives. Negative water columns are always dropped regardless, being physically impossible. |
0.025
|
fieldwork_dates |
Dict[str, list]
|
Dictionary of location name and a list of fieldwork days. All records on the fieldwork day are set to None. |
None
|
interpolate_method |
str
|
String representing the interpolate method as in pd.Series.interpolate() method. |
None
|
Returns:
| Type | Description |
|---|---|
Timeseries | Dataset | None
|
Timeseries | Dataset | None: head (variable 'head', unit 'm asl'). |
Source code in gensor/processing/compensation.py
water_column(raw, barometric, alignment_period='h', threshold_wc=0.025, fieldwork_dates=None, interpolate_method=None)
¶
Barometrically compensate raw sensor pressure to the water column above the sensor.
This is the first step of :func:compensate exposed on its own: subtract the
barometric pressure, convert to mH2O, mask fieldwork days, and drop out-of-water
records (see threshold_wc) - without adding the sensor altitude, so the result is
the water column height in metres (variable 'water_column', unit 'm') rather than head.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
raw |
Timeseries | Dataset
|
Raw sensor timeseries |
required |
barometric |
Timeseries | float
|
Barometric pressure timeseries or a single float value. If a float value is provided, it is assumed to be in cmH2O. |
required |
alignment_period |
Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's']
|
The alignment period for the timeseries. Default is 'h'. See pandas offset aliases for definitions. |
'h'
|
threshold_wc |
float | None
|
Lower cutoff (in m) for the water column; records at or below it are dropped. Defaults to 0.025 m (25 mm) and is always applied; lower it to keep shallower columns, or set 0 to drop only negatives. Negative water columns are always dropped regardless, being physically impossible. |
0.025
|
fieldwork_dates |
Dict[str, list]
|
Dictionary of location name and a list of fieldwork days. All records on the fieldwork day are set to None. |
None
|
interpolate_method |
str
|
String representing the interpolate method as in pd.Series.interpolate() method. |
None
|
Returns:
| Type | Description |
|---|---|
Timeseries | Dataset | None
|
Timeseries | Dataset | None: the water column height (variable 'water_column', unit 'm'). |
Source code in gensor/processing/compensation.py
smoothing
¶
Tools for smoothing the data.
smooth_data(data, window=5, method='rolling_mean', print_statistics=False, inplace=False, plot=False)
¶
Smooth a time series using a rolling mean or median.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data |
Series
|
The time series data. |
required |
window |
int
|
The size of the window for the rolling mean or median. Defaults to 5. |
5
|
method |
str
|
The method to use for smoothing. Either 'rolling_mean' or 'rolling_median'. Defaults to 'rolling_mean'. |
'rolling_mean'
|
Returns:
| Type | Description |
|---|---|
Series | None
|
pandas.Series: The smoothed time series. |
Source code in gensor/processing/smoothing.py
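The two smoothing methods can be illustrated on a plain list. This is a sketch using only the standard library, not the library's pandas-based implementation; `rolling_smooth` is a hypothetical name, and the choice of a trailing window with None for incomplete windows is an assumption modelled on pandas' default rolling behaviour.

```python
from statistics import mean, median


def rolling_smooth(values: list, window: int = 5,
                   method: str = "rolling_mean") -> list:
    """Trailing-window rolling mean or median over a list of numbers."""
    funcs = {"rolling_mean": mean, "rolling_median": median}
    if method not in funcs:
        raise ValueError(f"unknown method: {method!r}")
    func = funcs[method]
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # window not yet full
        else:
            smoothed.append(func(values[i + 1 - window:i + 1]))
    return smoothed
```

The median is less sensitive to isolated spikes than the mean, which is the usual reason to prefer 'rolling_median' for sensor data with occasional outliers.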
transform
¶
Transformation
¶
Source code in gensor/processing/transform.py
box_cox(**kwargs)
¶
Apply the Box-Cox transformation to the time series data. Only works when all values are strictly positive.
Other Parameters:
| Name | Type | Description |
|---|---|---|
lmbda |
float
|
The transformation parameter. If not provided, it is automatically estimated. |
Returns:
| Type | Description |
|---|---|
tuple[Series, str]
|
pandas.Series: The Box-Cox transformed time series data. |
Source code in gensor/processing/transform.py
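For a fixed parameter, the Box-Cox transform is (x^lmbda - 1) / lmbda when lmbda is non-zero and ln(x) when lmbda is zero. The sketch below applies that formula to a list; `box_cox_fixed` is a hypothetical name, and the automatic estimation of lmbda mentioned above is not covered.

```python
import math


def box_cox_fixed(values: list, lmbda: float) -> list:
    """Box-Cox transform for a given lmbda; requires strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lmbda == 0:
        return [math.log(v) for v in values]  # limiting case lmbda -> 0
    return [(v ** lmbda - 1.0) / lmbda for v in values]
```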
difference(**kwargs)
¶
Difference the time series data.
Keyword Arguments:
periods (int): The number of periods to shift. Defaults to 1.
Returns:
| Type | Description |
|---|---|
tuple[Series, str]
|
pandas.Series: The differenced time series data. |
Source code in gensor/processing/transform.py
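Differencing replaces each value with its change relative to the value `periods` steps earlier. A minimal sketch on a plain list, assuming a positive `periods` and using None where no earlier value exists (pandas marks those positions as NaN); `difference_list` is a hypothetical name.

```python
def difference_list(values: list, periods: int = 1) -> list:
    """Return values[i] - values[i - periods], with None for the first rows."""
    if periods < 1:
        raise ValueError("this sketch only supports periods >= 1")
    return [
        None if i < periods else values[i] - values[i - periods]
        for i in range(len(values))
    ]
```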
log()
¶
Take the natural logarithm of the time series data.
Returns:
| Type | Description |
|---|---|
tuple[Series, str]
|
pandas.Series: The natural logarithm of the time series data. |
Source code in gensor/processing/transform.py
maxabs_scaler()
¶
Normalize a pandas Series using MaxAbsScaler.
Source code in gensor/processing/transform.py
minmax_scaler()
¶
Normalize a pandas Series using MinMaxScaler.
Source code in gensor/processing/transform.py
robust_scaler()
¶
Normalize a pandas Series using RobustScaler.
Source code in gensor/processing/transform.py
square_root()
¶
Take the square root of the time series data.
Returns:
| Type | Description |
|---|---|
tuple[Series, str]
|
pandas.Series: The square root of the time series data. |
Source code in gensor/processing/transform.py
standard_scaler()
¶
Normalize a pandas Series using StandardScaler.
Source code in gensor/processing/transform.py
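The four scalers (maxabs_scaler, minmax_scaler, robust_scaler, standard_scaler) appear to be named after scikit-learn's scaler classes. Assuming the default settings of those classes, the arithmetic behind each can be sketched with the standard library; the function names below are hypothetical, and none of them guard against a zero scale (constant data).

```python
from statistics import fmean, median, pstdev, quantiles


def standard_scale(values: list) -> list:
    """Zero mean and unit variance, using the population standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def minmax_scale(values: list) -> list:
    """Map the minimum to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def maxabs_scale(values: list) -> list:
    """Divide by the largest absolute value, keeping the sign and zero."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]


def robust_scale(values: list) -> list:
    """Centre on the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    centre = median(values)
    return [(v - centre) / (q3 - q1) for v in values]
```

The robust variant is the least affected by outliers, since the median and interquartile range change little when a few extreme values are present.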
testdata
¶
Test data for Gensor package:
Attributes:
all (Path): The whole directory of test groundwater sensor data.
baro (Path): Timeseries of barometric pressure measurements.
pb01a (Path): Timeseries of a submerged logger.
pb02a_plain (Path): Timeseries from PB02A with the metadata removed.
all_paths: Traversable = resources.files(__name__)
module-attribute
¶
The whole directory of test groundwater sensor data.
baro: Traversable = all_paths / 'Barodiver_220427183008_BY222.csv'
module-attribute
¶
Timeseries of barometric pressure measurements.
pb01a: Traversable = all_paths / 'PB01A_moni_AV319_220427183019_AV319.csv'
module-attribute
¶
Timeseries of a submerged logger.
pb02a_plain: Traversable = all_paths / 'PB02A_plain.csv'
module-attribute
¶
Timeseries from PB02A with the metadata removed.