SQLite itegration¶
If you want to process your sensor data and store it for later, you can use the sqlite integration. Gensor's Timeseries
and Dataset
come with a .to_sql()
method which is uses pandas.Series.to_sql()
method under the hood to save the data in a SQLite database.
It is a simple implementation, where each timeseries is stored in a separate schema (database table) which is named in the following pattern: f"{location}_{sensor}_{variable}_{unit}".lower()
. There is a double check on duplicates. First, when you create a Dataset
, duplicates are nicely handled by merging timeseries from the same location, sensor and of the same variable and unit. Secondly the Timeseries.to_sql()
method is designed to ignore conflicts, so only new records are inserted into the database if you attempt to run the same commend twice.
Load test data¶
import gensor as gs
from gensor.testdata import all_paths, pb02a_plain
pattern = r"[A-Za-z]{2}\d{2}[A-Za-z]{1}|Barodiver"
ds = gs.read_from_csv(path=all_paths, file_format="vanessen", location_pattern=pattern)
ds2 = gs.read_from_csv(
path=pb02a_plain, file_format="plain", location="PB02A", sensor="AV336"
)
ds.add(ds2)
INFO: Loading file: /home/runner/work/gensor/gensor/gensor/testdata/PB02A_plain.csv
INFO: Skipping file /home/runner/work/gensor/gensor/gensor/testdata/PB02A_plain.csv due to missing metadata.
INFO: Loading file: /home/runner/work/gensor/gensor/gensor/testdata/Barodiver_220427183008_BY222.csv
INFO: Loading file: /home/runner/work/gensor/gensor/gensor/testdata/PB01A_moni_AV319_220427183019_AV319.csv
INFO: Loading file: /home/runner/work/gensor/gensor/gensor/testdata/PB02A_plain.csv
Dataset(6)
Create DatabaseConnection
¶
Both saving and loading data from sqlite require a DatabaseConnection
object to be passed as attribute. You can just instanciate it with empty parentheses to create a new database in the current working directory, or specify the path and name of the database.
If you have an existing Gensor database, you can use DatabaseConnection.get_timeseries_metadata()
to see if there already are some tables in the database that you want to use. If no arguments are provided, all records are returned.
db = gs.db.DatabaseConnection()
df = db.get_timeseries_metadata()
Loading the dataset to the database is straightforward. You just need to call .to_sql()
on the dataset instance and check the tables again to see that now there are a few.
Saving dataset to SQLite database¶
Dataset, like Timeseries, can be saved to a SQLite database by simply calling .to_sql()
method and passing the DatabaseConneciton
object as argument.
You can also check which tables are currently in the database by calling DatabaseConnection.get_timeseries_metadata()
. That method will give you a dataframe with all the tables in the database. The names of the tables are composed of the location name, variable measured, unit and a uniqur 5 character hash. This is a compromise between ensuring possible addition of slightly varrying timeseries to the dataset (e.g., the same sensor at the same location but with different rope length).
After running the cells below, you should see a dataframe with 6 entries.
ds.to_sql(db)
df = db.get_timeseries_metadata()
df
table_name | location | variable | unit | start | end | extra | cls | |
---|---|---|---|---|---|---|---|---|
id | ||||||||
1 | barodiver_pressure_cmh2o_6e17d | Barodiver | pressure | cmh2o | 20200704040000 | 20220330130000 | {'sensor': 'BY222', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
2 | barodiver_temperature_degc_f0e2c | Barodiver | temperature | degc | 20200704040000 | 20220330130000 | {'sensor': 'BY222', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
3 | pb01a_pressure_cmh2o_76f11 | PB01A | pressure | cmh2o | 20200704040000 | 20220330090000 | {'sensor': 'AV319', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
4 | pb01a_temperature_degc_a791c | PB01A | temperature | degc | 20200704040000 | 20220330090000 | {'sensor': 'AV319', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
5 | pb02a_pressure_cmh2o_f8fbe | PB02A | pressure | cmh2o | 20200704060000 | 20220207160000 | {'sensor': 'AV336', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
6 | pb02a_temperature_degc_9f9a7 | PB02A | temperature | degc | 20200704060000 | 20220207160000 | {'sensor': 'AV336', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
Reading data from SQLite¶
Use read_from_sql()
to retrieve timeseries from the database. By default, load_all
parameter is set to True, so all tables from the database are loaded as Dataset
. You can also provide parameters to retrieve only some of the tables.
new_ds: gs.Dataset = gs.read_from_sql(db)
new_ds
Dataset(6)
Adding more timeseries to SQLite¶
You can always add more timeseries to the same database. Below, we make a copy of one of the timeseries, updating it's sensor_alt
, hence, making it slightly different from the origina. Then we add it to the dataset and call to_sql()
method again with the same DatabaseConnection
object.
ts_with_sensor_alt = new_ds[2].model_copy(update={"sensor_alt": 32.0}, deep=True)
ts_with_sensor_alt
Timeseries(variable='pressure', unit='cmh2o', location='PB01A', sensor='AV319', sensor_alt=32.0, start=Timestamp('2020-07-04 04:00:00+0000', tz='UTC'), end=Timestamp('2022-03-30 09:00:00+0000', tz='UTC'))
amended_ds = new_ds.add(ts_with_sensor_alt)
amended_ds.to_sql(db)
As you see now, we have a Dataset of 7, because the new timeseries is not equal to any of the existing timeseries (differs by sensor_alt
).
Even though we called to_sql()
again on the same dataset extended by just one timeseries, we see that only one new table was created. This is because the method will figure out which timeseries are already there, and at best update those that have new records in the amended Dataset
.
amended_ds
Dataset(7)
df = db.get_timeseries_metadata()
df
table_name | location | variable | unit | start | end | extra | cls | |
---|---|---|---|---|---|---|---|---|
id | ||||||||
1 | barodiver_pressure_cmh2o_6e17d | Barodiver | pressure | cmh2o | 20200704040000 | 20220330130000 | {'sensor': 'BY222', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
2 | barodiver_temperature_degc_f0e2c | Barodiver | temperature | degc | 20200704040000 | 20220330130000 | {'sensor': 'BY222', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
3 | pb01a_pressure_cmh2o_76f11 | PB01A | pressure | cmh2o | 20200704040000 | 20220330090000 | {'sensor': 'AV319', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
4 | pb01a_temperature_degc_a791c | PB01A | temperature | degc | 20200704040000 | 20220330090000 | {'sensor': 'AV319', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
5 | pb02a_pressure_cmh2o_f8fbe | PB02A | pressure | cmh2o | 20200704060000 | 20220207160000 | {'sensor': 'AV336', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
6 | pb02a_temperature_degc_9f9a7 | PB02A | temperature | degc | 20200704060000 | 20220207160000 | {'sensor': 'AV336', 'sensor_alt': None} | gensor.core.timeseries.Timeseries |
7 | pb01a_pressure_cmh2o_36145 | PB01A | pressure | cmh2o | 20200704040000 | 20220330090000 | {'sensor': 'AV319', 'sensor_alt': 32.0} | gensor.core.timeseries.Timeseries |