Opening and processing actigraphy data#

Welcome to circStudio! This introduction provides an overview of how to load actigraphy data using the Raw class and demonstrates how adaptor subclasses can be used to convert common actigraphy file formats into Raw instances.

We begin the tutorial by importing circStudio, NumPy, pandas, and os. We also set Plotly's default renderer so that interactive plots display inside the notebook:

import circstudio as cs
import numpy as np
import pandas as pd
import os
import plotly.io as pio
pio.renderers.default = "notebook"

Reading actigraphy files using the Raw class directly#

The preferred method for loading actigraphy data in circStudio is to create a new instance of the Raw class. This requires the actigraphy data to be preformatted as a table containing an activity and light series.

NOTE Unlike pyActigraphy, circStudio decouples the computation of actigraphy metrics from the Raw class, which is used exclusively for preprocessing the actigraphy signal, allowing users greater flexibility. For instance, users may define custom preprocessing pipelines or apply circStudio metrics to other types of time series data, such as skin temperature.

To illustrate the creation of a Raw object, we build a pd.DataFrame filled with random data sampled from a normal distribution. First, we initialize a default random number generator (RNG):

rng = np.random.default_rng()
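
Called without arguments, default_rng() is seeded from fresh operating-system entropy, so the synthetic values differ on every run. The following is a minimal sketch, assuming reproducible numbers are wanted; the seed value 42 is arbitrary and not part of the original tutorial:

```python
import numpy as np

# Two generators built from the same (arbitrary) seed yield identical draws
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.normal(loc=10, scale=1, size=5)
sample_b = rng_b.normal(loc=10, scale=1, size=5)

# True when both arrays hold exactly the same values
same_draws = np.array_equal(sample_a, sample_b)
print(same_draws)
```

Replacing the unseeded call with a seeded one leaves the rest of the tutorial unchanged.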

Next, we generate synthetic actigraphy data by creating random time series for activity and light. These values are indexed using a DatetimeIndex to simulate a seven-day recording period with a sampling interval of sixty seconds (one sample per minute). This step is intended for demonstration purposes and can be skipped if the user already has actigraphy data stored in a pandas.DataFrame:

# Define the number of samples for seven days of data at a 60s interval
n = 1440*7 # 1440 minutes/day x 7 days

# Generate a random time series to simulate activity counts
activity = rng.normal(loc=10, scale=1, size=n)

# Generate a random time series to simulate light exposure (in lux)
light = rng.normal(loc=100, scale=10, size=n)

# Create a datetime index starting on January 1st, 2025, with 60s intervals
index = pd.date_range(start='2025-01-01', freq='60s', periods=n)

# Store the synthetic data into a pd.DataFrame
data = pd.DataFrame(index=index,
                    data={
                        'Activity': activity,
                        'Light': light
                    })

# Display the resulting DataFrame
data
                      Activity       Light
2025-01-01 00:00:00   9.972924  101.290872
2025-01-01 00:01:00  11.201004  106.171440
2025-01-01 00:02:00   9.341955   90.995576
2025-01-01 00:03:00   9.973207   87.974919
2025-01-01 00:04:00   9.714965   98.316874
...                        ...         ...
2025-01-07 23:55:00  11.685443   98.549054
2025-01-07 23:56:00  11.018233  103.698525
2025-01-07 23:57:00  10.162152   93.505352
2025-01-07 23:58:00   9.488018  106.932148
2025-01-07 23:59:00  11.765320   97.512196

10080 rows × 2 columns

Finally, we use the Raw class from circStudio.io to create a new Raw instance from the synthetic data. We specify the DataFrame (df), the activity (activity) and light (light) time series, the start time (start_time), the total duration (period), and the sampling frequency (frequency):

# Create a new Raw instance
raw = cs.io.Raw(
    df=data, # pd.DataFrame
    activity=data['Activity'], # Activity time series
    light=data['Light'], # Light time series
    start_time=data.index[0], # Start time
    period=(data.index[-1]-data.index[0]), # Total duration
    frequency=data.index.freq # Sampling frequency
)
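
To make the last three arguments concrete, the sketch below rebuilds only the index of the synthetic recording with pandas and prints what each expression evaluates to. It does not call circStudio. Converting the index frequency to a Timedelta is done here purely for display and is an assumption of this sketch, not something the Raw constructor is known to require:

```python
import pandas as pd

# Rebuild only the index of the synthetic recording (7 days at 60 s)
n = 1440 * 7
index = pd.date_range(start="2025-01-01", freq="60s", periods=n)

start_time = index[0]                  # first timestamp
period = index[-1] - index[0]          # span between first and last sample
frequency = pd.Timedelta(index.freq)   # sampling interval as a Timedelta

print(start_time)     # 2025-01-01 00:00:00
print(period)         # 6 days 23:59:00
print(frequency)      # 0 days 00:01:00

# The span is one sampling interval shorter than n * frequency
print(n * frequency)  # 7 days 00:00:00
```

Note that the span between the first and last timestamps is one sample short of the full seven days, which is worth keeping in mind when comparing it with durations computed from the number of samples.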

Reading actigraphy files using adaptor subclasses#

To simplify the data import process, circStudio includes several adaptor subclasses for commonly used actigraphy file formats. These adaptors enable users to easily convert supported file types into Raw instances.

NOTE The Raw class is designed to be format-agnostic and flexible. Users can implement custom functions to convert data from actigraphy file formats for which circStudio has no native adaptor, as long as the resulting data is compatible with the structure expected by the Raw class. As shown in the previous section, a Raw object requires the user to provide a DataFrame containing all the data (df), the activity (activity) and light (light) time series, the start time (start_time), the total duration (period), and the sampling frequency (frequency).
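
As an illustration of such a custom conversion, the sketch below parses a made-up CSV export with pandas and collects the six arguments listed above into a dictionary. The helper name load_custom_csv, the column names (timestamp, counts, lux), and the file content are all hypothetical; only the argument names come from the Raw call shown earlier:

```python
import io
import pandas as pd

# Made-up export from an unsupported device (hypothetical column names)
csv_text = """timestamp,counts,lux
2025-01-01 00:00:00,12,0.5
2025-01-01 00:01:00,30,0.7
2025-01-01 00:02:00,7,0.4
2025-01-01 00:03:00,0,0.1
"""

def load_custom_csv(source):
    """Hypothetical helper: parse the file and collect the Raw arguments."""
    df = pd.read_csv(source, parse_dates=["timestamp"], index_col="timestamp")
    df = df.sort_index()
    # Infer the sampling interval from the timestamps themselves
    freq = pd.infer_freq(df.index)
    if freq is None:
        raise ValueError("timestamps are not regularly spaced")
    # Attach the inferred frequency to the index
    df = df.asfreq(freq)
    return {
        "df": df,
        "activity": df["counts"],
        "light": df["lux"],
        "start_time": df.index[0],
        "period": df.index[-1] - df.index[0],
        "frequency": df.index.freq,
    }

raw_kwargs = load_custom_csv(io.StringIO(csv_text))
print(raw_kwargs["start_time"], raw_kwargs["period"], raw_kwargs["frequency"])
# With circStudio imported as cs, the dictionary could then be unpacked:
# raw = cs.io.Raw(**raw_kwargs)
```

Returning a dictionary keeps the parsing step testable on its own, without requiring circStudio to be installed.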

In the following example, we open a .txt file generated by an ActTrust (Condor Instruments) actigraphy device. To access example files included with circStudio, we construct a file path using os.path:

fpath = os.path.dirname(cs.__file__)

This retrieves the directory where circStudio is installed by referencing its __file__ attribute, which stores the path from which circStudio was imported. os.path.dirname extracts the directory name from that path.
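
The same mechanism can be tried on any importable package. The sketch below uses the standard-library json package as a stand-in for circStudio so that it runs without any third-party installation; the pathlib line is an equivalent alternative, not something the tutorial itself uses:

```python
import json  # stand-in for any importable package
import os
from pathlib import Path

# __file__ is the path of the package's __init__.py
print(json.__file__)

# dirname strips the file name and leaves the package directory
pkg_dir = os.path.dirname(json.__file__)

# Equivalent pathlib spelling
pkg_dir_alt = Path(json.__file__).parent

print(pkg_dir)
```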

Next, we create a new Raw instance using the auxiliary function read_atr, which is located in circStudio.io:

raw = cs.io.read_atr(os.path.join(fpath, 'data', 'test_sample_atr.txt'))
WARNING Some ATR files contain an extra line above the header, such as #ActLogModel=2.0.0, which should be excluded when importing data. The optional skip_rows parameter allows users to skip lines during import: cs.io.read_atr(os.path.join(fpath, 'data', 'test_sample_atr.txt'), skip_rows=1)
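
When it is not known in advance whether a file carries such an extra line, the number of lines to skip can be checked before importing. The sketch below is a hypothetical helper (count_leading_comment_lines is not part of circStudio) and assumes that the extra lines start with "#", as in the example above. The toy file content is a placeholder and does not reproduce the real ATR layout:

```python
import os
import tempfile

def count_leading_comment_lines(path, marker="#"):
    """Hypothetical helper: count consecutive top lines starting with marker."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(marker):
                count += 1
            else:
                break
    return count

# Toy file whose body is a placeholder, not a real ATR layout
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_atr.txt")
    with open(toy_path, "w", encoding="utf-8") as handle:
        handle.write("#ActLogModel=2.0.0\nheader line\ndata line\n")
    n_skip = count_leading_comment_lines(toy_path)

print(n_skip)  # 1
# The result could then be passed on, for example:
# raw = cs.io.read_atr(path_to_file, skip_rows=n_skip)
```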

Each adaptor subclass can be accessed using a helper function with the format read_XXX, where XXX indicates the file format. Besides read_atr, the available adaptor functions are:

# Actiwatch
raw = cs.io.read_awd(os.path.join(fpath, 'data', 'example_01.AWD'))
# Actigraph
raw = cs.io.read_agd(os.path.join(fpath, 'data', 'test_sample.agd'))
# Daqtometer
raw = cs.io.read_dqt(os.path.join(fpath, 'data', 'test_sample_dqt.csv'))
# Multi-Ethnic Study of Atherosclerosis (MESA)
raw = cs.io.read_mesa(os.path.join(fpath, 'data', 'test_sample_mesa.csv'))
# Respironics
raw = cs.io.read_rpx(os.path.join(fpath, 'data', 'test_sample_rpx_eng.csv'))
# Tempatilumi
raw = cs.io.read_tal(os.path.join(fpath, 'data', 'test_sample_tal.txt'))

Retrieving information from Raw instances#

After converting an actigraphy file into a Raw object, users can access the underlying data and its metadata. Below are some examples; for a complete description of the available attributes and methods, please refer to the API reference.

NOTE Only the information required for calculations is imported from the original file. Metadata not used by circStudio is not imported.
  • Activity (pd.Series):

raw.activity
DATE/TIME
1918-01-01 09:00:00    1851
1918-01-01 09:01:00    2683
1918-01-01 09:02:00    5260
1918-01-01 09:03:00       8
1918-01-01 09:04:00       4
                       ... 
1918-01-05 08:55:00       0
1918-01-05 08:56:00    1271
1918-01-05 08:57:00     543
1918-01-05 08:58:00     837
1918-01-05 08:59:00    4817
Freq: min, Name: PIM, Length: 5760, dtype: int64
  • Interactive activity plot:

raw.plot(mode='activity', log=False)
  • Light (pd.Series):

raw.light
DATE/TIME
1918-01-01 09:00:00    384.44
1918-01-01 09:01:00    324.20
1918-01-01 09:02:00    320.28
1918-01-01 09:03:00    312.00
1918-01-01 09:04:00    309.69
                        ...  
1918-01-05 08:55:00      0.01
1918-01-05 08:56:00      0.01
1918-01-05 08:57:00      0.02
1918-01-05 08:58:00      0.02
1918-01-05 08:59:00      0.03
Freq: min, Name: LIGHT, Length: 5760, dtype: float64
  • Interactive light plot (log scale):

raw.plot(mode='light', log=True)
  • Plot of the temperature signal (extracted from raw.df):

raw.plot(ts='TEMPERATURE', log=True)
  • First five rows of the pd.DataFrame:

raw.df.head()
                     MS  EVENT  TEMPERATURE  EXT TEMPERATURE  ORIENTATION   PIM       PIMn  TAT      TATn  ZCM      ZCMn   LIGHT  AMB LIGHT  RED LIGHT  GREEN LIGHT  BLUE LIGHT  IR LIGHT  UVA LIGHT  UVB LIGHT  STATE
DATE/TIME
1918-01-01 09:00:00   0      0        29.38            28.19            0  1851  30.850000  109  1.816670   41  0.683333  384.44     155.78      51.36        73.89       37.99     13.36        0.0        0.0      0
1918-01-01 09:01:00   0      0        29.47            28.25            0  2683  44.716700  134  2.233330   64  1.066670  324.20     131.37      43.02        62.64       31.88     11.46        0.0        0.0      0
1918-01-01 09:02:00   0      0        29.56            28.38            0  5260  87.666700  199  3.316670  171  2.850000  320.28     129.78      41.71        62.46       31.69     11.69        0.0        0.0      0
1918-01-01 09:03:00   0      0        29.75            28.56            0     8   0.133333    1  0.016667    0  0.000000  312.00     126.43      41.20        60.67       30.30     11.16        0.0        0.0      0
1918-01-01 09:04:00   0      0        29.98            28.75            0     4   0.066667    0  0.000000    0  0.000000  309.69     125.49      40.63        60.21       30.14     11.19        0.0        0.0      0
  • Acquisition frequency:

raw.frequency
Timedelta('0 days 00:01:00')
  • Duration of the data acquisition period:

raw.duration()
Timedelta('4 days 00:00:00')
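
The reported frequency and duration are consistent with the series length shown earlier: 5760 samples at one-minute intervals amount to four days. The sketch below reproduces this arithmetic with pandas alone, on an index of the same shape as the example recording. It does not call circStudio, and whether duration() is computed this way internally is an assumption:

```python
import pandas as pd

# Index with the same shape as the example recording:
# 5760 one-minute samples starting 1918-01-01 09:00:00
index = pd.date_range(start="1918-01-01 09:00:00", freq="min", periods=5760)

frequency = pd.Timedelta(index.freq)  # sampling interval
n_samples = len(index)

print(n_samples * frequency)   # 4 days 00:00:00
print(index[-1] - index[0])    # 3 days 23:59:00
print(index[-1])               # 1918-01-05 08:59:00
```

The difference between the two printed durations is exactly one sampling interval, because the last sample also covers one minute of recording.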

Next Steps#

In the next section, we will explore data masking and resampling using methods provided by the Raw class.