Preprocessing actigraphy data
In this section, we will learn how to detect and handle spurious periods of inactivity in actigraphy recordings. These periods may occur at the start, at the end, or in the middle of a recording, for example because of nonwear, a device malfunction, or a dead battery.
We will use the circStudio package along with Python’s built-in os module to load and preprocess the data.
import circstudio as cs
import os
import plotly.io as pio
pio.renderers.default = "notebook"
First, let’s open a sample AWD file and plot the activity trace to identify potentially spurious periods of inactivity:
# Locate the installed circStudio package, which ships with the sample files
fpath = os.path.dirname(cs.__file__)
# Create a new Raw instance using the AWD adaptor
raw = cs.io.read_awd(os.path.join(fpath, 'data', 'example_01.AWD'))
# Plot activity signal
raw.plot(mode='activity')
From the plot above, we can identify two potentially problematic regions in the signal:
At the beginning, the activity signal shows extended periods of zero counts interspersed with sudden spikes. This pattern is typical of the device being handled but not yet worn on the wrist.
At the end, activity counts drop abruptly to zero and remain there, apart from occasional spikes. This pattern is consistent with the participant having removed the actimeter before the study officially ended.
1. Discarding invalid sequences at the start and end of the recording
You can discard these invalid sequences at the beginning and end of the recording by adjusting the start_time and period parameters; in this case, by skipping the first hour and retaining nine days of valid data:
raw = cs.io.read_awd(
    os.path.join(fpath, 'data', 'example_01.AWD'),
    start_time='1918-01-24 08:00:00',
    period='9 days'
)
raw.plot(mode='activity')
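To make the effect of start_time and period concrete, here is a minimal sketch in plain Python of trimming a time series to a window. It is not circStudio's implementation: the helper name trim_recording and the half-open window (start included, end excluded) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def trim_recording(timestamps, values, start_time, period):
    """Keep samples with start_time <= t < start_time + period (hypothetical helper)."""
    stop_time = start_time + period
    return [(t, v) for t, v in zip(timestamps, values)
            if start_time <= t < stop_time]

# Toy recording: one sample per hour over three days, starting at 07:00
t0 = datetime(1918, 1, 24, 7, 0)
timestamps = [t0 + timedelta(hours=h) for h in range(72)]
values = list(range(72))

# Skip the first hour and keep two days of data
trimmed = trim_recording(timestamps, values,
                         start_time=datetime(1918, 1, 24, 8, 0),
                         period=timedelta(days=2))
print(len(trimmed), trimmed[0][0], trimmed[-1][0])
```

With hourly samples, a two-day window keeps 48 samples, the first at 08:00 on the start day.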
2. Discarding invalid sequences during the recording
Discarding invalid sequences within the actigraphy recording can be more challenging, as care must be taken not to confuse periods of sleep or naps with nonwear artifacts.
Similar to pyActigraphy, circStudio allows users to mask spurious periods of inactivity by marking them as NaN. For some analyses, such as fitting a Cosinor model or applying mathematical models of circadian rhythms, it may be preferable to impute missing data instead. In addition, the original sampling frequency may not always be optimal for capturing the circadian patterns present in actigraphy data. Thus, circStudio provides resampling capabilities.
All of these filters (masking, imputation, binarization, and resampling) can be toggled on or off, and they are applied simultaneously to both the activity and light time series.
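As an illustration of what resampling to a coarser epoch means, the sketch below sums consecutive samples into larger bins using plain Python. This is an assumption-laden example rather than circStudio's resampling code: the helper downsample_sum is hypothetical, and summing is only one possible aggregation (averaging may be more appropriate for light data).

```python
def downsample_sum(values, factor):
    """Sum every `factor` consecutive samples; drop an incomplete trailing bin.

    Hypothetical helper for illustration only.
    """
    usable = len(values) - len(values) % factor
    return [sum(values[i:i + factor]) for i in range(0, usable, factor)]

# Seven 1-minute counts resampled to 3-minute bins (the last sample is dropped)
counts = [1, 2, 3, 4, 5, 6, 7]
resampled = downsample_sum(counts, factor=3)
print(resampled)
```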
2.1. Create an inactivity mask for periods of inactivity longer than a given threshold
Similar to pyActigraphy, you can create a mask to identify potential nonwear periods by specifying a minimum duration threshold (e.g., 120 minutes). Any continuous stretch of zero activity counts longer than this threshold is classified as an invalid segment and marked accordingly.
First, open a sample AWD file to illustrate the use of filters:
# Locate the installed circStudio package, which ships with the sample files
fpath = os.path.dirname(cs.__file__)
# Create a new Raw instance using the AWD adaptor
raw = cs.io.read_awd(os.path.join(fpath, 'data', 'example_01_mask.AWD'))
# Plot activity signal
raw.plot(mode='activity')
Second, create and visualize a mask with a 120-minute threshold:
raw.create_inactivity_mask(duration='120min')
raw.plot(mode='mask')
If satisfied, you can apply the mask to the original trace:
raw.apply_filters(apply_mask=True)
With the mask applied, the invalid sequences are now marked as NaN:
# Plot the masked activity signal
raw.plot(mode='activity')
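The thresholding rule described in this section can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code behind create_inactivity_mask: the helper names are hypothetical, the threshold is expressed in samples rather than as a duration string, and the strict "longer than" comparison follows the wording above.

```python
import math
from itertools import groupby

def find_inactivity(counts, threshold):
    """Flag samples inside a run of zeros strictly longer than `threshold` samples.

    Returns a list of booleans (True = flagged as invalid). Hypothetical helper.
    """
    invalid = []
    for is_zero, run in groupby(counts, key=lambda c: c == 0):
        length = len(list(run))
        invalid.extend([is_zero and length > threshold] * length)
    return invalid

def mask_invalid(counts, invalid):
    """Replace flagged samples with NaN and leave the rest untouched."""
    return [math.nan if flag else c for c, flag in zip(counts, invalid)]

counts = [5, 0, 0, 3, 0, 0, 0, 0, 2]
invalid = find_inactivity(counts, threshold=3)
masked = mask_invalid(counts, invalid)
print(invalid)
print(masked)
```

The short run of two zeros survives, since it could be genuine rest, while the run of four zeros is flagged. With one-minute epochs, a 120-minute threshold would correspond to threshold=120 in this sketch.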
2.2. Impute missing values from invalid regions of the signal
In some cases, you may prefer to work with a continuous time series. circStudio allows users to create a mask for invalid periods and impute missing values using the mean activity or light intensity for the corresponding hour of the day.
To illustrate this functionality, we call apply_filters with apply_mask=True and impute_nan=True, which masks the invalid periods and then fills the resulting gaps with the hourly average. Keep in mind that each time you invoke apply_filters, all previously applied filters are cleared and replaced by the new settings.
# Apply the mask and impute NaN values using the hourly mean
raw.apply_filters(apply_mask=True, impute_nan=True)
Let’s visualize the result:
# Plot the treated activity signal
raw.plot(mode='activity')
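The idea of imputing with the mean for the corresponding hour of the day can be sketched as follows. This is a simplified illustration under stated assumptions, not circStudio's imputation routine: the helper impute_hourly_mean is hypothetical, and the hourly means are computed from the valid samples of the same recording.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def impute_hourly_mean(timestamps, values):
    """Replace NaN with the mean of valid samples from the same hour of day."""
    sums, counts = defaultdict(float), defaultdict(int)
    for t, v in zip(timestamps, values):
        if not math.isnan(v):
            sums[t.hour] += v
            counts[t.hour] += 1
    imputed = []
    for t, v in zip(timestamps, values):
        if math.isnan(v) and counts[t.hour] > 0:
            imputed.append(sums[t.hour] / counts[t.hour])
        else:
            imputed.append(v)  # valid sample, or no data available for this hour
    return imputed

# Three days of hourly samples: day 1 is all 10, day 2 all 20, day 3 all 30
t0 = datetime(1918, 1, 24, 0, 0)
timestamps = [t0 + timedelta(hours=i) for i in range(72)]
values = [float(10 * (i // 24 + 1)) for i in range(72)]
values[34] = math.nan  # masked sample on day 2 at 10:00

imputed = impute_hourly_mean(timestamps, values)
print(imputed[34])  # mean of the 10:00 samples from days 1 and 3
```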
2.3. Build a custom mask
It may be helpful to store the start time, end time, and label of each invalid segment in a separate .csv file. This file could be either the result of visual inspection or the output of an algorithm that detects spurious periods of inactivity. In this section, you will learn how to load such a file and use it to construct a custom mask.
To begin, make sure to reset any existing mask:
# Reset mask
raw.inactivity_length = None
# Confirm that it was reset
raw.mask
Inactivity length set to None. Could not create a mask.
After resetting any existing mask, you can use add_mask_periods to load a .csv file containing the start and stop times of the inactivity periods, along with an optional label.
The CSV file should be formatted as:
| Mask | Start_time | Stop_time | Label (optional) |
|---|---|---|---|
| mask_01 | 1918-02-01 20:42:00 | 1918-02-01 21:35:00 | Reported in the sleep diary. |
| mask_02 | 1918-02-02 13:20:00 | 1918-02-02 14:14:00 | Found by visual inspection. |
# Locate the installed circStudio package, which ships with the sample files
fpath = os.path.dirname(cs.__file__)
# Add mask periods from a sample mask log
raw.add_mask_periods(os.path.join(fpath, 'data', 'example_masklog.csv'))
# Visualize the mask
raw.plot(mode='mask')
Under the hood, add_mask_periods iterates over the mask log and calls add_mask_period for each entry. The add_mask_period function is also available to users for manually adding individual segments, or even for a custom implementation of an algorithm to identify invalid sequences:
# Add a single mask period
raw.add_mask_period(start='1918-01-27 09:30:00', stop='1918-01-27 17:48:00')
# Visualize the mask
raw.plot(mode='mask')
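For readers who want to see how a mask log translates into a per-sample mask, here is a minimal standalone sketch using only the standard library. It does not reproduce add_mask_periods: the helper names are hypothetical, the comma delimiter and the "Label" column name are assumptions about the file layout, and treating both endpoints as inclusive is also an assumption.

```python
import csv
import io
from datetime import datetime, timedelta

# In-memory stand-in for a mask log file (comma-separated layout is assumed)
MASKLOG = """Mask,Start_time,Stop_time,Label
mask_01,1918-02-01 20:42:00,1918-02-01 21:35:00,Reported in the sleep diary.
mask_02,1918-02-02 13:20:00,1918-02-02 14:14:00,Found by visual inspection.
"""

def read_mask_periods(handle):
    """Parse (start, stop) pairs from a mask log (hypothetical helper)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return [(datetime.strptime(row["Start_time"], fmt),
             datetime.strptime(row["Stop_time"], fmt))
            for row in csv.DictReader(handle)]

def build_mask(timestamps, periods):
    """Return True for every timestamp that falls inside any mask period."""
    return [any(start <= t <= stop for start, stop in periods)
            for t in timestamps]

periods = read_mask_periods(io.StringIO(MASKLOG))

# Eight samples at 15-minute intervals, from 20:00 to 21:45 on 1918-02-01
t0 = datetime(1918, 2, 1, 20, 0)
timestamps = [t0 + timedelta(minutes=15 * i) for i in range(8)]
invalid = build_mask(timestamps, periods)
print(invalid)
```

Only the samples at 20:45, 21:00, 21:15, and 21:30 fall inside the first mask period, so only those four are flagged.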
2.4. Binarizing the signal
In some applications, it may be useful to binarize the activity or light signal. The apply_filters method allows users to enable binarization and specify a custom threshold.
# Binarize the data using a custom threshold of 4 activity counts
raw.apply_filters(binarize=True, threshold=4)
# Plot the treated activity signal
raw.plot(mode='activity')
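Conceptually, binarization maps each sample to 0 or 1 depending on which side of the threshold it falls. The sketch below shows the idea in plain Python; it is not circStudio's code, the helper binarize is hypothetical, and both the strict comparison (values equal to the threshold map to 0) and the pass-through of NaN are assumptions.

```python
import math

def binarize(values, threshold):
    """Map samples above `threshold` to 1.0 and the rest to 0.0; keep NaN as NaN."""
    return [v if math.isnan(v) else (1.0 if v > threshold else 0.0)
            for v in values]

counts = [0, 3, 4, 5, 12, math.nan]
binary = binarize(counts, threshold=4)
print(binary)
```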
2.5. Reset all filters
Finally, you can manually reset all filters by calling the reset_filters method:
# Clear all the filters
raw.reset_filters()
# Plot the activity signal with all filters cleared
raw.plot(mode='activity')
Next steps
Now that we have preprocessed actigraphy data using the Raw class, the next section shows how to calculate several activity, light, and sleep-related metrics.