API reference
Dataset
Bases: BaseModel, Generic[T]
Store and operate on a collection of Timeseries.
Attributes:
Name | Type | Description |
---|---|---|
timeseries | list[Timeseries] | A list of Timeseries objects. |
Source code in gensor/core/dataset.py (lines 16-207)
__getitem__(index)
Retrieve a Timeseries object by its index in the dataset.
Warning
Indexing returns a reference to the stored timeseries, not a copy. If you need a copy, use .filter() instead of Dataset[index].
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | The index of the Timeseries to retrieve. | required |
Returns:
Name | Type | Description |
---|---|---|
Timeseries | T \| None | The Timeseries object at the specified index. |
Raises:
Type | Description |
---|---|
IndexError | If the index is out of range. |
Source code in gensor/core/dataset.py (lines 36-55)
__iter__()
Allow direct iteration over the timeseries in the dataset.
Source code in gensor/core/dataset.py (lines 25-27)
__len__()
Return the number of timeseries in the Dataset.
Source code in gensor/core/dataset.py (lines 29-31)
add(other)
Append a new Timeseries to the Dataset.
If an equal Timeseries already exists, the new data are merged into the existing Timeseries and duplicate timestamps are dropped.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Timeseries | The Timeseries object to add. | required |
Source code in gensor/core/dataset.py (lines 61-81)
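The merge step of add() ("merge the new data into the existing Timeseries, dropping duplicate timestamps") can be sketched in plain pandas. This is only an illustration and not gensor's code. The helper name merge_without_duplicates is hypothetical. When two records share a timestamp, the sketch keeps the value that was already stored; that tie-breaking rule is an assumption, and the library may resolve clashes differently.

```python
import pandas as pd


def merge_without_duplicates(existing: pd.Series, new: pd.Series) -> pd.Series:
    """Append `new` to `existing`, dropping timestamps already present.

    Hypothetical helper. Assumption: on a timestamp clash the value already
    stored is kept (keep="first"); gensor may resolve clashes differently.
    """
    combined = pd.concat([existing, new])
    # Drop repeated index labels, keeping the first (i.e. existing) value.
    combined = combined[~combined.index.duplicated(keep="first")]
    return combined.sort_index()


idx_a = pd.date_range("2024-01-01", periods=3, freq="h", tz="UTC")
idx_b = pd.date_range("2024-01-01 02:00", periods=3, freq="h", tz="UTC")
existing = pd.Series([1.0, 2.0, 3.0], index=idx_a)
new = pd.Series([30.0, 4.0, 5.0], index=idx_b)
merged = merge_without_duplicates(existing, new)
print(merged.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

In this example the overlapping 02:00 record keeps its original value (3.0), and the new value (30.0) is discarded.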
filter(location=None, variable=None, unit=None, **kwargs)
Return a Timeseries or a new Dataset filtered by location, variable, unit, and/or other attributes (e.g., sensor).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | Optional[str] | The location name. | None |
variable | Optional[str] | The variable being measured. | None |
unit | Optional[str] | Unit of the measurement. | None |
**kwargs | dict | Attributes of subclassed timeseries used for filtering (e.g., sensor, method). | {} |
Returns:
Type | Description |
---|---|
T \| Dataset | A single Timeseries if exactly one match is found, or a new Dataset if multiple matches are found. |
Source code in gensor/core/dataset.py (lines 94-149)
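The return rule of filter() is "one match gives a single object, several matches give a collection". It can be illustrated with a small stand-alone sketch. Record and filter_records are hypothetical stand-ins for Timeseries and Dataset.filter. Returning an empty list when nothing matches is an assumption made only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a Timeseries: metadata only, no data."""
    location: str
    variable: str
    unit: str
    sensor: str = ""


def filter_records(records, **criteria):
    """Return one record if exactly one matches, otherwise a list of matches.

    Criteria set to None are ignored, mirroring optional filter arguments.
    Assumption: zero matches yield an empty list.
    """
    active = {k: v for k, v in criteria.items() if v is not None}
    matches = [
        r for r in records
        if all(getattr(r, k, None) == v for k, v in active.items())
    ]
    return matches[0] if len(matches) == 1 else matches


records = [
    Record("PB01", "pressure", "mmH2O", sensor="AB123"),
    Record("PB01", "temperature", "degC", sensor="AB123"),
    Record("PB02", "pressure", "mmH2O", sensor="CD456"),
]
single = filter_records(records, location="PB01", variable="pressure")
several = filter_records(records, variable="pressure", unit=None)
print(type(single).__name__, len(several))  # Record 2
```

Because the return type depends on the number of matches, calling code that expects a collection may want to handle the single-object case explicitly.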
get_locations()
List all unique locations in the dataset.
Source code in gensor/core/dataset.py (lines 57-59)
plot(include_outliers=False, plot_kwargs=None, legend_kwargs=None)
Plots the timeseries data, grouping by variable type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include_outliers | bool | Whether to include outliers in the plot. | False |
plot_kwargs | dict[str, Any] \| None | kwargs passed to the matplotlib.axes.Axes.plot() method to customize the plot. | None |
legend_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.legend() to customize the legend. | None |
Returns:
Type | Description |
---|---|
(fig, ax) | Matplotlib figure and axes to allow further customization. |
Source code in gensor/core/dataset.py (lines 162-207)
to_sql(db)
Save all timeseries in the dataset to a SQLite database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | DatabaseConnection | SQLite database connection object. | required |
Source code in gensor/core/dataset.py (lines 151-160)
Timeseries
Bases: BaseTimeseries
Timeseries of groundwater sensor data.
Attributes:
Name | Type | Description |
---|---|---|
ts | Series | The timeseries data. |
variable | Literal['temperature', 'pressure', 'conductivity', 'flux'] | The type of the measurement. |
unit | Literal['degC', 'mmH2O', 'mS/cm', 'm/s'] | The unit of the measurement. |
sensor | str | The serial number of the sensor. |
sensor_alt | float | Altitude of the sensor (necessary to compute groundwater levels). |
Source code in gensor/core/timeseries.py (lines 20-78)
__eq__(other)
Check equality based on location, sensor, variable, unit and sensor_alt.
Source code in gensor/core/timeseries.py (lines 40-48)
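With this kind of equality, two timeseries count as equal when their metadata match, even if their measurements differ. That is the condition Dataset.add relies on to decide whether to merge. The dataclass below is a hypothetical stand-in, not gensor's class. It reproduces the rule by excluding the data field from comparison.

```python
from dataclasses import dataclass, field


@dataclass
class SensorSeries:
    """Hypothetical stand-in: equality uses metadata only, never the data."""
    location: str
    sensor: str
    variable: str
    unit: str
    sensor_alt: float
    # compare=False excludes the measurements from the generated __eq__.
    values: list = field(default_factory=list, compare=False)


a = SensorSeries("PB01", "AB123", "pressure", "mmH2O", 12.5, values=[1, 2])
b = SensorSeries("PB01", "AB123", "pressure", "mmH2O", 12.5, values=[3, 4])
c = SensorSeries("PB01", "AB123", "pressure", "mmH2O", 13.0, values=[1, 2])
print(a == b, a == c)  # True False
```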
plot(include_outliers=False, ax=None, plot_kwargs=None, legend_kwargs=None)
Plots the timeseries data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include_outliers | bool | Whether to include outliers in the plot. | False |
ax | Axes | Matplotlib axes object to plot on. If None, a new figure and axes are created. | None |
plot_kwargs | dict[str, Any] \| None | kwargs passed to the matplotlib.axes.Axes.plot() method to customize the plot. | None |
legend_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.legend() to customize the legend. | None |
Returns:
Type | Description |
---|---|
(fig, ax) | Matplotlib figure and axes to allow further customization. |
Source code in gensor/core/timeseries.py (lines 50-78)
compensate(raw, barometric, alignment_period='h', threshold_wc=None, fieldwork_dates=None, interpolate_method=None)
Constructor for the Compensator object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raw | Timeseries \| Dataset | Raw sensor timeseries. | required |
barometric | Timeseries \| float | Barometric pressure timeseries or a single float value. If a float value is provided, it is assumed to be in cmH2O. | required |
alignment_period | Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's'] | The alignment period for the timeseries. See the pandas offset aliases for definitions. | 'h' |
threshold_wc | float | The threshold for the absolute water column. If provided, records below this threshold are dropped. | None |
fieldwork_dates | Dict[str, list] | Dictionary mapping a location name to a list of fieldwork days. All records on a fieldwork day are set to None. | None |
interpolate_method | str | Interpolation method, as accepted by the pd.Series.interpolate() method. | None |
Source code in gensor/processing/compensation.py (lines 140-195)
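The core arithmetic of barometric compensation is: align both series to a common period, subtract the barometric pressure from the raw pressure, and drop records below threshold_wc. The sketch below shows that arithmetic in plain pandas. It is not the Compensator implementation. The function name compensate_sketch is hypothetical, and it assumes both inputs are already in the same unit. It omits the cmH2O handling for float inputs, fieldwork_dates, and interpolation.

```python
import pandas as pd


def compensate_sketch(raw, barometric, period="h", threshold_wc=None):
    """Return the water column: raw pressure minus barometric pressure.

    Hypothetical helper. Assumptions (not taken from gensor): both inputs
    share one unit, records are aligned by averaging over `period`, and
    values below `threshold_wc` are dropped.
    """
    raw_aligned = raw.resample(period).mean()
    if isinstance(barometric, (int, float)):
        # A constant barometric pressure is broadcast over every record.
        baro_aligned = pd.Series(float(barometric), index=raw_aligned.index)
    else:
        baro_aligned = barometric.resample(period).mean().reindex(raw_aligned.index)
    water_column = (raw_aligned - baro_aligned).dropna()
    if threshold_wc is not None:
        water_column = water_column[water_column >= threshold_wc]
    return water_column


index = pd.date_range("2024-01-01", periods=4, freq="30min", tz="UTC")
raw = pd.Series([1100.0, 1110.0, 1030.0, 1040.0], index=index)
print(compensate_sketch(raw, 1000.0).tolist())  # [105.0, 35.0]
print(compensate_sketch(raw, 1000.0, threshold_wc=50.0).tolist())  # [105.0]
```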
read_from_csv(path, file_format='vanessen', **kwargs)
Load data from CSV files in the given file_format and return a list of Timeseries objects.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The path to the file or directory containing the files. | required |
**kwargs | dict | Optional keyword arguments passed to the parsers: serial_number_pattern (str), the regex pattern to extract the serial number from the file; location_pattern (str), the regex pattern to extract the station from the file; col_names (list), the column names for the dataframe; location (str), name of the location of the timeseries; sensor (str), sensor serial number. | {} |
Source code in gensor/io/read.py (lines 22-80)
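The serial_number_pattern and location_pattern arguments are regular expressions applied to the file. The sketch below shows how such patterns could pull metadata out of a file name with Python's re module. Both patterns and the file-naming scheme are hypothetical; real logger files and the parser defaults may look different. The sketch also assumes the patterns are matched against the file name rather than the file contents.

```python
import re
from pathlib import Path

# Hypothetical patterns; real file names and the parser defaults may differ.
SERIAL_PATTERN = r"[A-Z]{2}\d{3}"
LOCATION_PATTERN = r"PB\d{2}[A-Z]?"


def parse_file_name(path):
    """Extract (location, serial number) from a file name, or None when absent."""
    name = Path(path).stem
    serial = re.search(SERIAL_PATTERN, name)
    location = re.search(LOCATION_PATTERN, name)
    return (
        location.group(0) if location else None,
        serial.group(0) if serial else None,
    )


print(parse_file_name("data/PB01A_AB123_240101.csv"))  # ('PB01A', 'AB123')
```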
read_from_sql(db, load_all=True, location=None, variable=None, unit=None, timestamp_start=None, timestamp_stop=None, **kwargs)
Return a Timeseries or a Dataset loaded from a SQL database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | DatabaseConnection | The database connection object. | required |
load_all | bool | Whether to load all timeseries from the database. | True |
location | str | The station name. | None |
variable | str | The measurement type. | None |
unit | str | The unit of the measurement. | None |
timestamp_start | Timestamp | Start timestamp filter. | None |
timestamp_stop | Timestamp | End timestamp filter. | None |
**kwargs | dict | Any additional filters matching attributes of the particular timeseries. | {} |
Returns:
Name | Type | Description |
---|---|---|
Dataset | Timeseries \| Dataset | Dataset with retrieved objects or an empty Dataset. |
Source code in gensor/io/read.py (lines 83-187)
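The timestamp_start and timestamp_stop filters amount to a range query on the timestamp column. The sketch below uses the standard-library sqlite3 module instead of gensor's DatabaseConnection. The table name, the column layout, and the inclusive bounds on both ends are all assumptions for illustration. It stores ISO-8601 UTC timestamps as TEXT, so string comparison matches chronological order.

```python
import sqlite3

# Hypothetical table layout: one table per timeseries, ISO-8601 UTC timestamps
# stored as TEXT so that string comparison follows chronological order.
TABLE = "pb01_ab123_pressure_mmh2o"

conn = sqlite3.connect(":memory:")
conn.execute(f'CREATE TABLE "{TABLE}" (timestamp TEXT PRIMARY KEY, pressure REAL)')
conn.executemany(
    f'INSERT INTO "{TABLE}" VALUES (?, ?)',
    [
        ("2024-01-01T00:00:00+00:00", 1100.0),
        ("2024-01-01T01:00:00+00:00", 1105.0),
        ("2024-01-01T02:00:00+00:00", 1110.0),
    ],
)


def load_range(conn, table, start=None, stop=None):
    """Return (timestamp, value) rows, optionally limited to [start, stop]."""
    clauses, params = [], []
    if start is not None:
        clauses.append("timestamp >= ?")
        params.append(start)
    if stop is not None:
        clauses.append("timestamp <= ?")
        params.append(stop)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    # Identifiers cannot be bound as parameters, so the table name is quoted.
    query = f'SELECT * FROM "{table}"{where} ORDER BY timestamp'
    return conn.execute(query, params).fetchall()


rows = load_range(conn, TABLE, start="2024-01-01T01:00:00+00:00")
print(rows)  # [('2024-01-01T01:00:00+00:00', 1105.0), ('2024-01-01T02:00:00+00:00', 1110.0)]
```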
set_log_level(level)
Set the logging level for the package.
Source code in gensor/log.py (lines 4-7)
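A package-wide log level is usually set on the package's top-level logger; child loggers inherit it. The sketch below shows that standard-library pattern. It assumes, without confirmation from the source, that the package logs through a logger named "gensor". The helper set_package_log_level is hypothetical and is not the package's set_log_level.

```python
import logging


def set_package_log_level(level, package="gensor"):
    """Set the level of a package-wide logger (the logger name is an assumption)."""
    logger = logging.getLogger(package)
    logger.setLevel(level)  # accepts names such as "WARNING" or ints such as logging.DEBUG
    return logger


logger = set_package_log_level("WARNING")
print(logger.getEffectiveLevel() == logging.WARNING)  # True
```

Child loggers such as "gensor.io" inherit this level unless their own level is set explicitly.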
analysis
outliers
OutlierDetection
Detect outliers in groundwater timeseries data.
Each method in this class returns a pandas.Series containing predicted outliers in the dataset.
Methods:
Name | Description |
---|---|
iqr | Use the interquartile range (IQR). |
zscore | Use the z-score method. |
isolation_forest | Use the isolation forest algorithm. |
lof | Use the local outlier factor (LOF) method. |
Source code in gensor/analysis/outliers.py (lines 10-173)
__init__(data, method, rolling, window, **kwargs)
Find outliers in a time series using the specified method, optionally over a rolling window.
Source code in gensor/analysis/outliers.py (lines 23-61)
iqr(data, k, rolling)
staticmethod
Use the interquartile range (IQR).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Series | The time series data. | required |
Other Parameters:
Name | Type | Description |
---|---|---|
k | float | The multiplier for the IQR to define the range. Defaults to 1.5. |
Returns:
Type | Description |
---|---|
ndarray | Binary mask representing the outliers as 1. |
Source code in gensor/analysis/outliers.py (lines 63-91)
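The textbook IQR rule flags a value as an outlier when it lies outside [Q1 - k·IQR, Q3 + k·IQR]. The non-rolling sketch below shows that rule and returns the same kind of 0/1 mask described above. It is a generic illustration with a hypothetical function name, not the OutlierDetection.iqr source.

```python
import numpy as np
import pandas as pd


def iqr_mask(data: pd.Series, k: float = 1.5) -> np.ndarray:
    """Return 1 where a value lies outside [Q1 - k*IQR, Q3 + k*IQR], else 0."""
    q1, q3 = data.quantile(0.25), data.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return ((data < lower) | (data > upper)).astype(int).to_numpy()


series = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
print(iqr_mask(series))  # [0 0 0 0 1]
```

Here Q1 = 2 and Q3 = 4, so the accepted range is [-1, 7], and only 100 is flagged.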
isolation_forest(data, **kwargs)
Use the isolation forest algorithm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Series | The time series data. | required |
Other Parameters:
Name | Type | Description |
---|---|---|
n_estimators | int | The number of base estimators in the ensemble. Defaults to 100. |
max_samples | int \| 'auto' \| float | The number of samples to draw from X to train each base estimator. Defaults to 'auto'. |
contamination | float | The proportion of outliers in the data. Defaults to 0.01. |
max_features | int \| float | The number of features to draw from X to train each base estimator. Defaults to 1.0. |
bootstrap | bool | Whether to use bootstrapping when sampling the data. Defaults to False. |
n_jobs | int | The number of jobs to run in parallel. Defaults to 1. |
random_state | int \| RandomState \| None | The random state to use. Defaults to None. |
verbose | int | The verbosity level. Defaults to 0. |
warm_start | bool | Whether to reuse the solution of the previous call to fit and add more estimators to the ensemble. Defaults to False. |
Note
For details on kwargs, see sklearn.ensemble.IsolationForest.
Source code in gensor/analysis/outliers.py (lines 116-145)
lof(data, **kwargs)
Use the local outlier factor (LOF) method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Series | The time series data. | required |
Other Parameters:
Name | Type | Description |
---|---|---|
n_neighbors | int | The number of neighbors to consider for each sample. Defaults to 20. |
algorithm | str | The algorithm used to compute the nearest neighbors: 'auto', 'ball_tree', 'kd_tree' or 'brute'. Defaults to 'auto'. |
leaf_size | int | The leaf size of the tree. Defaults to 30. |
metric | str | The distance metric to use. Defaults to 'minkowski'. |
p | int | The power parameter for the Minkowski metric. Defaults to 2. |
contamination | float | The proportion of outliers in the data. Defaults to 0.01. |
novelty | bool | Whether to use LOF for novelty detection (True) instead of outlier detection (False). Defaults to False. |
n_jobs | int | The number of jobs to run in parallel. Defaults to 1. |
Note
For details on kwargs, see sklearn.neighbors.LocalOutlierFactor.
Source code in gensor/analysis/outliers.py (lines 147-173)
zscore(data, threshold, rolling)
staticmethod
Use the z-score method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Series | The time series data. | required |
Other Parameters:
Name | Type | Description |
---|---|---|
threshold | float | The threshold for the z-score method. Defaults to 3.0. |
Returns:
Type | Description |
---|---|
ndarray | Binary mask representing outliers. |
Source code in gensor/analysis/outliers.py (lines 93-114)
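A plain (non-rolling) z-score rule standardizes each value by the series mean and standard deviation, then flags values whose absolute z exceeds the threshold. The sketch below uses pandas' default sample standard deviation (ddof=1). It is a generic illustration with a hypothetical function name, and gensor's own normalization may differ.

```python
import pandas as pd


def zscore_mask(data: pd.Series, threshold: float = 3.0):
    """Return 1 where |(x - mean) / std| exceeds `threshold`, else 0."""
    z = (data - data.mean()) / data.std()  # pandas std uses ddof=1 by default
    return (z.abs() > threshold).astype(int).to_numpy()


series = pd.Series([10.0] * 19 + [100.0])
mask = zscore_mask(series)
print(mask.sum(), mask[-1])  # 1 1
```

With the sample standard deviation, no value in a series of n points can reach a z-score above (n - 1)/sqrt(n). A threshold of 3.0 can therefore never flag anything in a series shorter than 11 points. In the 20-point example above, the single outlier reaches about 4.25.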
stats
Module to compute timeseries statistics, similar to the pastas.stats.signatures module and following Heudorfer et al. (2019).
To be implemented:
- Structure
- Flashiness
- Distribution
- Modality
- Density
- Shape
- Scale
- Slope
config
Warning
Whenever Timeseries objects are created via read_from_csv using a parser (e.g., 'vanessen'), the timestamps are localized and converted to UTC. Users who create their own timeseries outside read_from_csv should therefore make sure that the timestamps are in UTC.
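For timeseries built by hand, the usual pandas route is to localize naive timestamps to the zone they were recorded in, then convert them to UTC. In the sketch below, the Europe/Amsterdam zone is only an illustrative assumption about where a logger might be configured. Note how the offset differs between winter (UTC+1) and summer (UTC+2).

```python
import pandas as pd

# Naive local timestamps, e.g. from a logger set to Dutch local time
# (the time zone is an illustrative assumption).
local = pd.Series(
    [1.0, 2.0],
    index=pd.DatetimeIndex(["2024-01-01 12:00", "2024-07-01 12:00"]),
)
# Attach the recording time zone, then convert the index to UTC.
utc = local.tz_localize("Europe/Amsterdam").tz_convert("UTC")
print(utc.index[0], utc.index[1])  # 2024-01-01 11:00:00+00:00 2024-07-01 10:00:00+00:00
```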
core
base
BaseTimeseries
Bases: BaseModel
Generic base class for timeseries with metadata.
A timeseries is a series of measurements of a single variable, in the same unit, from a single location, with unique timestamps.
Attributes:
Name | Type | Description |
---|---|---|
ts | Series | The timeseries data. |
variable | Literal['temperature', 'pressure', 'conductivity', 'flux'] | The type of the measurement. |
unit | Literal['degC', 'mmH2O', 'mS/cm', 'm/s'] | The unit of the measurement. |
outliers | Series | Measurements marked as outliers. |
transformation | Any | Metadata of the transformation the timeseries has undergone. |
Methods:
Name | Description |
---|---|
validate_ts | Coerce the pd.Series into the required format if it does not already conform. |
Source code in gensor/core/base.py (lines 31-412)
__eq__(other)
Check equality based on location, sensor, variable, unit and sensor_alt.
Source code in gensor/core/base.py (lines 78-87)
__getattr__(attr)
Delegate attribute access to the underlying pandas Series if it exists.
Special handling is implemented for pandas indexers.
Source code in gensor/core/base.py (lines 89-120)
concatenate(other)
Concatenate two Timeseries objects if they are considered equal.
Source code in gensor/core/base.py (lines 134-145)
detect_outliers(method, rolling=False, window=6, remove=True, **kwargs)
Detect outliers in the timeseries using the specified method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method | Literal['iqr', 'zscore', 'isolation_forest', 'lof'] | The method to use for outlier detection. | required |
**kwargs | Any | Additional keyword arguments for OutlierDetection. | {} |
Returns:
Type | Description |
---|---|
T | Updated deep copy of the Timeseries object with outliers, optionally removed from the original timeseries. |
Source code in gensor/core/base.py (lines 207-235)
mask_with(other, mode='remove')
Removes records not present in 'other' by index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Timeseries | Another Timeseries whose indices are used to mask the current one. | required |
mode | Literal['keep', 'remove'] | | 'remove' |
Returns:
Name | Type | Description |
---|---|---|
Timeseries | T | A new Timeseries object with the filtered data. |
Source code in gensor/core/base.py (lines 237-265)
plot(include_outliers=False, ax=None, plot_kwargs=None, legend_kwargs=None)
Plots the timeseries data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include_outliers | bool | Whether to include outliers in the plot. | False |
ax | Axes | Matplotlib axes object to plot on. If None, a new figure and axes are created. | None |
plot_kwargs | dict[str, Any] \| None | kwargs passed to the matplotlib.axes.Axes.plot() method to customize the plot. | None |
legend_kwargs | dict[str, Any] \| None | kwargs passed to matplotlib.axes.Axes.legend() to customize the legend. | None |
Returns:
Type | Description |
---|---|
(fig, ax) | Matplotlib figure and axes to allow further customization. |
Source code in gensor/core/base.py (lines 362-412)
resample(freq, agg_func=pd.Series.mean, **resample_kwargs)
Resample the timeseries to a new frequency with a specified aggregation function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
freq | Any | The offset string or object representing the target conversion (e.g., 'D' for daily, 'W' for weekly). | required |
agg_func | Any | The aggregation function to apply after resampling. Defaults to pd.Series.mean. | mean |
**resample_kwargs | Any | Additional keyword arguments passed to the pandas.Series.resample method. | {} |
Returns:
Type | Description |
---|---|
T | Updated deep copy of the Timeseries object with the resampled timeseries data. |
Source code in gensor/core/base.py (lines 147-170)
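Conceptually, resample() groups the underlying Series into bins of the target frequency and applies agg_func to each bin. The plain-pandas sketch below shows that operation on a bare Series. It is an approximation of what the method is assumed to wrap, not its actual code: hourly values are aggregated to daily values with the default mean and with pd.Series.max passed as a callable.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=48, freq="h", tz="UTC")
hourly = pd.Series(range(48), index=index, dtype=float)

# Default aggregation (mean) per calendar day.
daily = hourly.resample("D").mean()
# A callable aggregation, analogous to passing agg_func=pd.Series.max.
daily_max = hourly.resample("D").agg(pd.Series.max)
print(daily.tolist(), daily_max.tolist())  # [11.5, 35.5] [23.0, 47.0]
```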
serialize_timestamps(value)
Serialize pd.Timestamp to ISO format.
Source code in gensor/core/base.py (lines 73-76)
to_sql(db)
Convert the timeseries to a list of dictionaries and upload it to the database.
The Timeseries data is uploaded to the SQL database using the pandas to_sql method. Additionally, metadata about the timeseries is stored in the 'timeseries_metadata' table.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | DatabaseConnection | The database connection object. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | A message indicating the number of rows inserted into the database. |
Source code in gensor/core/base.py (lines 267-360)
transform(method, **transformer_kwargs)
Transforms the timeseries using the specified method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method | str | The method to use for transformation ('minmax', 'standard', 'robust'). | required |
**transformer_kwargs | Any | Additional keyword arguments passed to the transformer definition. See gensor.preprocessing. | {} |
Returns:
Type | Description |
---|---|
T | Updated deep copy of the Timeseries object with the transformed timeseries data. |
Source code in gensor/core/base.py (lines 172-205)
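The three method names match common scaling schemes: min-max scaling to [0, 1], standardization to zero mean and unit variance, and robust scaling by median and interquartile range. The sketch below implements the textbook formulas in pandas. It assumes, without confirmation from the source, that gensor.preprocessing follows the same definitions. The function name scale is hypothetical.

```python
import pandas as pd


def scale(series: pd.Series, method: str) -> pd.Series:
    """Textbook scalers; gensor.preprocessing may differ in detail."""
    if method == "minmax":
        return (series - series.min()) / (series.max() - series.min())
    if method == "standard":
        # ddof=0 (population std), the convention used by sklearn's StandardScaler.
        return (series - series.mean()) / series.std(ddof=0)
    if method == "robust":
        # Centre on the median and divide by the interquartile range.
        iqr = series.quantile(0.75) - series.quantile(0.25)
        return (series - series.median()) / iqr
    raise ValueError(f"Unknown method: {method!r}")


s = pd.Series([0.0, 5.0, 10.0])
print(scale(s, "minmax").tolist())  # [0.0, 0.5, 1.0]
print(scale(s, "robust").tolist())  # [-1.0, 0.0, 1.0]
```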
dataset
Dataset
Identical to the Dataset class documented at the top of this reference. Source code in gensor/core/dataset.py (lines 16-207)
indexer
TimeseriesIndexer
A wrapper for the pandas indexers (e.g., loc, iloc) that returns Timeseries objects.
Source code in gensor/core/indexer.py (lines 9-37)
__getitem__(key)
Allows using the indexer (e.g., loc) and wraps the result in the parent Timeseries.
Source code in gensor/core/indexer.py (lines 20-32)
__setitem__(key, value)
Allows setting values directly using the indexer (e.g., loc, iloc).
Source code in gensor/core/indexer.py (lines 34-37)
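The wrapping-indexer pattern forwards [] to a pandas indexer and re-wraps any Series result, so slicing keeps the metadata object. Both classes in the sketch below are hypothetical stand-ins for Timeseries and TimeseriesIndexer. Returning scalars unwrapped is an assumption of the sketch.

```python
import pandas as pd


class MiniSeries:
    """Hypothetical minimal wrapper holding a Series plus metadata."""

    def __init__(self, ts: pd.Series, variable: str):
        self.ts = ts
        self.variable = variable

    @property
    def loc(self):
        return _WrappingIndexer(self, "loc")


class _WrappingIndexer:
    """Forward [] to a pandas indexer and re-wrap Series results."""

    def __init__(self, parent: MiniSeries, name: str):
        self.parent = parent
        self.indexer = getattr(parent.ts, name)

    def __getitem__(self, key):
        result = self.indexer[key]
        if isinstance(result, pd.Series):
            # Keep the metadata by wrapping the selection in a new parent object.
            return MiniSeries(result, self.parent.variable)
        return result  # scalars are returned unchanged

    def __setitem__(self, key, value):
        # Writes go straight to the parent's underlying Series.
        self.indexer[key] = value


idx = pd.date_range("2024-01-01", periods=4, freq="h")
ms = MiniSeries(pd.Series([1.0, 2.0, 3.0, 4.0], index=idx), "pressure")
part = ms.loc["2024-01-01 01:00":"2024-01-01 02:00"]
print(type(part).__name__, part.variable, part.ts.tolist())  # MiniSeries pressure [2.0, 3.0]
ms.loc[idx[0]] = 10.0
print(ms.ts.iloc[0])  # 10.0
```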
timeseries
Timeseries
Identical to the Timeseries class documented near the top of this reference. Source code in gensor/core/timeseries.py (lines 20-78)
db
DB
Module handling the database connection when saving to and loading from a SQLite database.
Modules:
connection.py
DatabaseConnection
Bases: BaseModel
Database connection object. If no database exists at the specified path, it will be created. If no database is specified, an in-memory database will be used.
Attributes:
Name | Type | Description |
---|---|---|
metadata | MetaData | SQLAlchemy metadata object. |
db_directory | Path | Path to the database to connect to. |
db_name | str | Name of the database to connect to. |
engine | Engine \| None | SQLAlchemy Engine instance. |
Source code in gensor/db/connection.py (lines 33-197)
__enter__()
Enable usage in a with block by returning the engine.
Source code in gensor/db/connection.py (lines 83-88)
__exit__(exc_type, exc_val, exc_tb)
Dispose of the engine when exiting the with block.
Source code in gensor/db/connection.py (lines 90-92)
connect()
Connect to the database and initialize the engine. If the engine is None, create it using the verified path, then reflect the existing database schema. After connecting, ensure that the timeseries_metadata table is present.
Source code in gensor/db/connection.py (lines 61-74)
create_metadata()
Create a metadata table if it does not exist yet and store timeseries metadata.
Source code in gensor/db/connection.py (lines 146-169)
create_table(schema_name, column_name)
Create a table in the database.
The schema name is a string representing the location, sensor, measured variable, and unit of measurement; this preserves the metadata of the Timeseries. The index is always timestamp, and the column name is created dynamically from the measured variable.
Source code in gensor/db/connection.py (lines 171-197)
dispose()
¶
Dispose of the engine, closing all connections.
Source code in gensor/db/connection.py
get_timeseries_metadata(location=None, variable=None, unit=None, **kwargs)
List timeseries available in the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
location | str | Location attribute to match. | None |
variable | str | Variable attribute to match. | None |
unit | str | Unit attribute to match. | None |
**kwargs | dict | Additional filters. Must match the attributes of the Timeseries instance the user is trying to retrieve. | {} |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: The name of the matching table or None if no table is found. |
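A sketch of how optional filters like location, variable, unit and **kwargs can be turned into a parameterized query. It uses sqlite3 and returns plain row tuples instead of a DataFrame; the function name find_metadata and the table columns are hypothetical.

```python
import sqlite3
from typing import Any, Optional


def find_metadata(conn: sqlite3.Connection, location: Optional[str] = None,
                  variable: Optional[str] = None, unit: Optional[str] = None,
                  **kwargs: Any) -> list[tuple]:
    filters = {"location": location, "variable": variable, "unit": unit, **kwargs}
    # Only filters that were actually given take part in the WHERE clause.
    active = {k: v for k, v in filters.items() if v is not None}
    # Column names cannot be bound as parameters, so validate them first.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(timeseries_metadata)")}
    unknown = set(active) - columns
    if unknown:
        raise ValueError(f"Unknown metadata fields: {sorted(unknown)}")
    sql = "SELECT * FROM timeseries_metadata"
    if active:
        sql += " WHERE " + " AND ".join(f'"{k}" = ?' for k in active)
    return conn.execute(sql, tuple(active.values())).fetchall()
```

Validating the keyword names against the table's columns is what makes it safe to splice them into the SQL string, while the values still go through placeholders.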
Source code in gensor/db/connection.py
connection
Module defining database connection object.
Classes:
Name | Description |
---|---|
DatabaseConnection | Database connection object |
exceptions
IndexOutOfRangeError
Bases: IndexError
Custom exception raised when an index is out of range in the dataset.
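Because IndexOutOfRangeError subclasses IndexError, callers can catch either class. A small sketch with a hypothetical TinyDataset and a hypothetical error message also shows a useful side effect: Python's fallback iteration over __getitem__ stops on any IndexError, including subclasses.

```python
from typing import Any, Iterable


class IndexOutOfRangeError(IndexError):
    """Local re-creation for illustration; the message format is hypothetical."""

    def __init__(self, index: int, size: int) -> None:
        super().__init__(f"Index {index} is out of range for a dataset of size {size}.")


class TinyDataset:
    """Hypothetical container that raises the custom error on a bad index."""

    def __init__(self, items: Iterable[Any]) -> None:
        self.items = list(items)

    def __getitem__(self, index: int) -> Any:
        try:
            return self.items[index]
        except IndexError:
            # Re-raise as the more specific, still IndexError-compatible type.
            raise IndexOutOfRangeError(index, len(self.items)) from None
```

Code written against the plain IndexError contract, such as an except IndexError clause, keeps working unchanged.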
Source code in gensor/exceptions.py
InvalidMeasurementTypeError
Bases: ValueError
Raised when a timeseries of a wrong measurement type is operated upon.
Source code in gensor/exceptions.py
MissingInputError
Bases: ValueError
Raised when a required input is missing.
Source code in gensor/exceptions.py
TimeseriesUnequal
Bases: ValueError
Raised when Timeseries objects are compared and are unequal.
Source code in gensor/exceptions.py
io
read
Fetch data from various sources.
TODO: Fix the read_from_sql() function so that it works properly.
read_from_api()
Fetch data from the API.
Source code in gensor/io/read.py
read_from_csv(path, file_format='vanessen', **kwargs)
Loads the data from CSV files with the given file_format and returns a list of Timeseries objects.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The path to the file or directory containing the files. | required |
**kwargs | dict | Optional keyword arguments passed to the parsers: serial_number_pattern (str), the regex pattern to extract the serial number from the file; location_pattern (str), the regex pattern to extract the station from the file; col_names (list), the column names for the dataframe; location (str), the name of the location of the timeseries; sensor (str), the sensor serial number. | {} |
Source code in gensor/io/read.py
read_from_sql(db, load_all=True, location=None, variable=None, unit=None, timestamp_start=None, timestamp_stop=None, **kwargs)
Returns the timeseries or a dataset from a SQL database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | DatabaseConnection | The database connection object. | required |
load_all | bool | Whether to load all timeseries from the database. | True |
location | str | The station name. | None |
variable | str | The measurement type. | None |
unit | str | The unit of the measurement. | None |
timestamp_start | Timestamp | Start timestamp filter. | None |
timestamp_stop | Timestamp | End timestamp filter. | None |
**kwargs | dict | Any additional filters matching attributes of the particular timeseries. | {} |
Returns:
Name | Type | Description |
---|---|---|
Dataset | Timeseries \| Dataset | Dataset with retrieved objects or an empty Dataset. |
Source code in gensor/io/read.py
log
set_log_level(level)
Set the logging level for the package.
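A minimal sketch of a package-wide log-level setter built on the standard logging module. It assumes the package logs through a logger named "gensor" (an assumption, not confirmed by this reference); setting the level on that parent logger also affects child loggers that have no level of their own.

```python
import logging
from typing import Union


def set_log_level(level: Union[int, str]) -> None:
    # Assumption: the package logger is named "gensor". Child loggers such as
    # "gensor.io.read" inherit this level unless they set one explicitly.
    logging.getLogger("gensor").setLevel(level)


set_log_level("DEBUG")
```

logging.Logger.setLevel accepts both level names ("DEBUG") and numeric constants (logging.DEBUG).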
Source code in gensor/log.py
parse
parse_plain(path, **kwargs)
Parse a simple CSV file without a metadata header, containing only columns with variables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The path to the file. | required |
Returns:
Name | Type | Description |
---|---|---|
list | list[Timeseries] | A list of Timeseries objects. |
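A standard-library sketch of the idea behind parsing a plain CSV: one series per variable column. It assumes the first column holds timestamps and every other column holds one variable, and it returns lists of (timestamp, value) pairs rather than Timeseries objects; parse_plain_text is a hypothetical name.

```python
import csv
import io


def parse_plain_text(text: str) -> dict[str, list[tuple[str, float]]]:
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Assumption: the first column holds timestamps, every other column one variable.
    variables = header[1:]
    series: dict[str, list[tuple[str, float]]] = {name: [] for name in variables}
    for row in reader:
        if not row:
            continue  # skip blank lines
        timestamp, values = row[0], row[1:]
        for name, value in zip(variables, values):
            if value.strip():  # empty cells are treated as missing values
                series[name].append((timestamp, float(value)))
    return series
```

Splitting by column is what lets a single plain file yield several timeseries, one per measured variable.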
Source code in gensor/parse/plain.py
parse_vanessen_csv(path, **kwargs)
Parses a van Essen CSV file and returns a list of Timeseries objects. At this point, it does not matter whether the file comes from a barometric or a piezometric logger.
The function uses regex patterns to extract the serial number and station from the file. It is important to use appropriate regex patterns, particularly for the station. If the default patterns do not work (which will most likely be the case), provide your own patterns as keyword arguments to the function; the patterns may use the OR operator (|).
Warning
A better check for the variable type and units has to be implemented.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The path to the file. | required |
Other Parameters:
Name | Type | Description |
---|---|---|
serial_number_pattern | str | The regex pattern to extract the serial number from the file. |
location_pattern | str | The regex pattern to extract the station from the file. |
col_names | list | The column names for the dataframe. |
Returns:
Name | Type | Description |
---|---|---|
list | list[Timeseries] | A list of Timeseries objects. |
Source code in gensor/parse/vanessen.py
plain
parse_plain(path, **kwargs)
Parse a simple CSV file without a metadata header, containing only columns with variables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The path to the file. | required |
Returns:
Name | Type | Description |
---|---|---|
list | list[Timeseries] | A list of Timeseries objects. |
Source code in gensor/parse/plain.py
utils
detect_encoding(path, num_bytes=1024)
Detect the encoding of a file using chardet.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The path to the file. | required |
num_bytes | int | Number of bytes to read for encoding detection. | 1024 |
Returns:
Name | Type | Description |
---|---|---|
str | str | The detected encoding of the file. |
Source code in gensor/parse/utils.py
get_data(text, data_start, data_end, column_names)
Search for data in the file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | String obtained from the CSV file. | required |
data_start | str | String at the first row of the data. | required |
data_end | str | String at the last row of the data. | required |
column_names | list | List of expected column names. | required |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame |
Source code in gensor/parse/utils.py
get_metadata(text, patterns)
Search for metadata in the file header with given regex patterns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | String obtained from the CSV file. | required |
patterns | dict | Regex patterns matching the location and sensor information. | required |
Returns:
Name | Type | Description |
---|---|---|
dict | dict | Metadata of the timeseries. |
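A sketch of the dictionary-of-patterns approach: each key maps to a regex, and the result maps the same keys to the matched text, or None when a pattern finds nothing. The function name extract_metadata, the capture-group convention, and the header lines in the usage are hypothetical, not taken from gensor.

```python
import re
from typing import Optional


def extract_metadata(text: str, patterns: dict[str, str]) -> dict[str, Optional[str]]:
    metadata: dict[str, Optional[str]] = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            metadata[key] = None  # pattern not found in the header
        elif match.groups():
            metadata[key] = match.group(1)  # prefer the first capture group
        else:
            metadata[key] = match.group(0)  # otherwise use the whole match
    return metadata


header = "Location=PB01A\nSerial number=AV319\n"
result = extract_metadata(header, {
    "location": r"Location=(\w+)",
    "sensor": r"Serial number=(\w+)",
    "unit": r"Unit=(\w+)",
})
```

Returning None for missing keys, instead of raising, lets the caller decide whether a missing field is fatal.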
Source code in gensor/parse/utils.py
handle_timestamps(df, tz_string)
Converts timestamps in the dataframe to the specified timezone (e.g., 'UTC+1').
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe with timestamps. | required |
tz_string | str | A timezone string like 'UTC+1' or 'UTC-5'. | required |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: The dataframe with timestamps converted to UTC. |
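A standard-library sketch of handling a 'UTC+1'-style string, assuming (as the Returns entry states) that the raw timestamps are naive local times recorded at the offset given by tz_string and that the goal is UTC. Only whole-hour offsets are accepted here; parse_utc_offset and to_utc are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_utc_offset(tz_string: str) -> timezone:
    # Accepts strings such as 'UTC+1' or 'UTC-5' (whole hours only).
    match = re.fullmatch(r"UTC([+-])(\d{1,2})", tz_string.strip())
    if match is None:
        raise ValueError(f"Unsupported timezone string: {tz_string!r}")
    sign = 1 if match.group(1) == "+" else -1
    return timezone(sign * timedelta(hours=int(match.group(2))))


def to_utc(naive: datetime, tz_string: str) -> datetime:
    # Attach the fixed offset to the naive local timestamp, then convert to UTC.
    return naive.replace(tzinfo=parse_utc_offset(tz_string)).astimezone(timezone.utc)
```

The same two steps (attach the offset, then convert) correspond to localizing and converting a pandas datetime column with tz_localize and tz_convert.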
Source code in gensor/parse/utils.py
vanessen
Logic parsing CSV files from van Essen Instruments Divers.
processing
compensation
Compensating the raw data from the absolute pressure transducer to the actual water level using the barometric pressure data.
Because van Essen Instruments Divers are non-vented pressure transducers, the barometric pressure must be subtracted from the raw pressure measurements to obtain the pressure of the water column above the logger (i.e., the water level). First, the function aligns the two series to the same time step; then it subtracts the barometric pressure from the raw pressure measurements. For short time periods (for instance, when a slug test is performed), the barometric pressure can be provided as a single float value.
Subsequently, the function can filter out all records where the absolute water column is less than or equal to a threshold value. When the logger is out of the water during a measurement, the absolute water column is close to zero, which produces erroneous results and spikes in the plots. The threshold is set with the threshold_wc argument of compensate(); when it is not provided (the default), no records are dropped.
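The arithmetic of compensation can be sketched in plain Python: subtract barometric from raw pressure at each already-aligned time step (or subtract a single float), then drop records at or below the threshold. The sketch assumes both inputs are in the same unit (cmH2O) and uses dictionaries keyed by time step instead of Timeseries objects; compensate_values is a hypothetical name.

```python
from typing import Dict, Optional, Union


def compensate_values(
    raw: Dict[str, float],
    barometric: Union[Dict[str, float], float],
    threshold_wc: Optional[float] = None,
) -> Dict[str, float]:
    """Subtract barometric pressure from raw pressure (both assumed in cmH2O)."""
    result: Dict[str, float] = {}
    for timestamp, pressure in raw.items():
        if isinstance(barometric, dict):
            if timestamp not in barometric:
                continue  # no barometric reading for this (already aligned) time step
            baro = barometric[timestamp]
        else:
            baro = float(barometric)  # single value, e.g. for a short slug test
        water_column = pressure - baro
        if threshold_wc is not None and water_column <= threshold_wc:
            continue  # logger probably out of the water: drop the record
        result[timestamp] = water_column
    return result


raw = {"10:00": 1100.0, "11:00": 1020.0, "12:00": 1150.0}
baro = {"10:00": 1015.0, "11:00": 1016.0, "12:00": 1014.0}
```

With threshold_wc=5.0, the 11:00 record (4 cmH2O of water column) would be dropped as a likely out-of-water reading.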
Functions:
compensate: Compensate raw sensor pressure measurement with barometric pressure.
Compensator
Bases: BaseModel
Compensate raw sensor pressure measurement with barometric pressure.
Attributes:
Name | Type | Description |
---|---|---|
ts | Timeseries | Raw sensor timeseries. |
barometric | Timeseries \| float | Barometric pressure timeseries or a single float value. If a float value is provided, it is assumed to be in cmH2O. |
Source code in gensor/processing/compensation.py
compensate(alignment_period, threshold_wc, fieldwork_dates)
Perform compensation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
alignment_period | Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's'] | The alignment period for the timeseries. See pandas offset aliases for definitions. | required |
threshold_wc | float | The threshold for the absolute water column. | required |
fieldwork_dates | Optional[list] | List of dates when fieldwork was done. All measurements from a fieldwork day will be set to None. | required |
Returns:
Name | Type | Description |
---|---|---|
Timeseries | Timeseries \| None | A new Timeseries instance with the compensated data and updated unit and variable. Optionally removed outliers are included. |
Source code in gensor/processing/compensation.py
compensate(raw, barometric, alignment_period='h', threshold_wc=None, fieldwork_dates=None, interpolate_method=None)
Constructor for the Compensator object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raw | Timeseries \| Dataset | Raw sensor timeseries. | required |
barometric | Timeseries \| float | Barometric pressure timeseries or a single float value. If a float value is provided, it is assumed to be in cmH2O. | required |
alignment_period | Literal['D', 'ME', 'SME', 'MS', 'YE', 'YS', 'h', 'min', 's'] | The alignment period for the timeseries. Default is 'h'. See pandas offset aliases for definitions. | 'h' |
threshold_wc | float | The threshold for the absolute water column. If it is provided, the records below that threshold are dropped. | None |
fieldwork_dates | Dict[str, list] | Dictionary mapping location names to lists of fieldwork days. All records on a fieldwork day are set to None. | None |
interpolate_method | str | String naming the interpolation method, as accepted by pd.Series.interpolate(). | None |
Source code in gensor/processing/compensation.py
smoothing
Tools for smoothing the data.
smooth_data(data, window=5, method='rolling_mean', print_statistics=False, inplace=False, plot=False)
Smooth a time series using a rolling mean or median.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Series | The time series data. | required |
window | int | The size of the window for the rolling mean or median. Defaults to 5. | 5 |
method | str | The method to use for smoothing. Either 'rolling_mean' or 'rolling_median'. Defaults to 'rolling_mean'. | 'rolling_mean' |
Returns:
Type | Description |
---|---|
Series \| None | pandas.Series: The smoothed time series. |
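A pure-Python sketch of the two smoothing methods, assuming a trailing window that mirrors pandas' default Series.rolling(window) behaviour (not centered, and no result until a full window is available). Whether smooth_data uses those exact settings is an assumption; rolling is a hypothetical name.

```python
from statistics import mean, median
from typing import List, Optional, Sequence


def rolling(values: Sequence[float], window: int = 5,
            method: str = "rolling_mean") -> List[Optional[float]]:
    if window < 1:
        raise ValueError("window must be at least 1")
    funcs = {"rolling_mean": mean, "rolling_median": median}
    if method not in funcs:
        raise ValueError(f"Unknown method: {method!r}")
    func = funcs[method]
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet (pandas would give NaN)
        else:
            out.append(func(values[i + 1 - window : i + 1]))
    return out
```

The median variant is less sensitive to single spikes: a window containing 1, 100 and 2 yields 2 rather than a mean pulled up by the outlier.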
Source code in gensor/processing/smoothing.py
transform
Transformation
Source code in gensor/processing/transform.py
box_cox(**kwargs)
Apply the Box-Cox transformation to the time series data. This only works if all values are strictly positive.
Other Parameters:
Name | Type | Description |
---|---|---|
lmbda | float | The transformation parameter. If not provided, it is automatically estimated. |
Returns:
Type | Description |
---|---|
tuple[Series, str] | pandas.Series: The Box-Cox transformed time series data. |
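For reference, the standard Box-Cox definition for a value x and parameter λ (lmbda) is given below. The logarithm and non-integer powers are undefined for non-positive x, which is why the transformation requires strictly positive data. How gensor estimates λ when it is omitted is not stated here; scipy.stats.boxcox, for example, uses maximum likelihood in that case.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \ln x, & \lambda = 0.
\end{cases}
```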
Source code in gensor/processing/transform.py
difference(**kwargs)
Difference the time series data.
Other Parameters:
Name | Type | Description |
---|---|---|
periods | int | The number of periods to shift. Defaults to 1. |
Returns:
Type | Description |
---|---|
tuple[Series, str] | pandas.Series: The differenced time series data. |
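Differencing with periods = k computes each value minus the value k steps earlier. The sketch below uses None for the first k positions, where pandas' Series.diff(periods) would give NaN; difference here is a hypothetical pure-Python stand-in.

```python
from typing import List, Optional, Sequence


def difference(values: Sequence[float], periods: int = 1) -> List[Optional[float]]:
    if periods < 1:
        raise ValueError("periods must be a positive integer")
    # The first `periods` entries have no earlier value to subtract.
    return [
        None if i < periods else values[i] - values[i - periods]
        for i in range(len(values))
    ]
```

Differencing removes a linear trend with periods=1; a seasonal cycle of length k can be removed with periods=k.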
Source code in gensor/processing/transform.py
log()
Take the natural logarithm of the time series data.
Returns:
Type | Description |
---|---|
tuple[Series, str] | pandas.Series: The natural logarithm of the time series data. |
Source code in gensor/processing/transform.py
maxabs_scaler()
Normalize a pandas Series using MaxAbsScaler.
Source code in gensor/processing/transform.py
minmax_scaler()
Normalize a pandas Series using MinMaxScaler.
Source code in gensor/processing/transform.py
robust_scaler()
Normalize a pandas Series using RobustScaler.
Source code in gensor/processing/transform.py
square_root()
Take the square root of the time series data.
Returns:
Type | Description |
---|---|
tuple[Series, str] | pandas.Series: The square root of the time series data. |
Source code in gensor/processing/transform.py
standard_scaler()
Normalize a pandas Series using StandardScaler.
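The four scaler methods name scikit-learn's MinMaxScaler, MaxAbsScaler, RobustScaler and StandardScaler. Assuming they apply those scalers with default settings, the transformations reduce to the formulas in this pure-Python sketch: RobustScaler centres on the median and divides by the interquartile range, and StandardScaler uses the population standard deviation. The function names are hypothetical, and substituting 1 for a zero scale is a choice made here to avoid division by zero on constant data.

```python
from statistics import mean, median, pstdev, quantiles
from typing import List, Sequence


def _scale(value: float) -> float:
    # Guard against division by zero for constant data.
    return value if value != 0 else 1.0


def minmax(xs: Sequence[float]) -> List[float]:
    lo, hi = min(xs), max(xs)
    return [(x - lo) / _scale(hi - lo) for x in xs]


def maxabs(xs: Sequence[float]) -> List[float]:
    peak = max(abs(x) for x in xs)
    return [x / _scale(peak) for x in xs]


def standard(xs: Sequence[float]) -> List[float]:
    mu, sigma = mean(xs), pstdev(xs)  # population standard deviation
    return [(x - mu) / _scale(sigma) for x in xs]


def robust(xs: Sequence[float]) -> List[float]:
    # "inclusive" quartiles use linear interpolation between data points.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    med = median(xs)
    return [(x - med) / _scale(q3 - q1) for x in xs]
```

The robust variant is the one to prefer when the series contains spikes, because neither the median nor the interquartile range is dominated by a few extreme values.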
Source code in gensor/processing/transform.py
testdata
Test data for the Gensor package.
Attributes:
all_paths (Traversable): The whole directory of test groundwater sensor data.
baro (Traversable): Timeseries of barometric pressure measurements.
pb01a (Traversable): Timeseries of a submerged logger.
pb02a_plain (Traversable): Timeseries from PB02A with the metadata removed.
all_paths: Traversable = resources.files(__name__)
module-attribute
The whole directory of test groundwater sensor data.
baro: Traversable = all_paths / 'Barodiver_220427183008_BY222.csv'
module-attribute
Timeseries of barometric pressure measurements.
pb01a: Traversable = all_paths / 'PB01A_moni_AV319_220427183019_AV319.csv'
module-attribute
Timeseries of a submerged logger.
pb02a_plain: Traversable = all_paths / 'PB02A_plain.csv'
module-attribute
Timeseries from PB02A with the metadata removed.
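The test-data attributes are importlib.resources Traversable objects rather than pathlib.Path objects. This sketch shows the standard way to obtain a real file-system path from such a resource with importlib.resources.as_file. It uses the standard-library json package as a stand-in, since the gensor package may not be installed where the snippet runs.

```python
from importlib import resources

# Stand-in resource: any file inside an importable package behaves the same way.
resource = resources.files("json") / "__init__.py"

with resources.as_file(resource) as path:
    # Inside the block, `path` is a pathlib.Path that exists on the file system,
    # which suits functions annotated with `path: Path`.
    exists = path.exists()
    name = path.name
```

Whether read_from_csv accepts a Traversable directly is not stated in this reference; wrapping a test-data attribute in resources.as_file and calling read_from_csv on the resulting path inside the with block would be the cautious option.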