Data Collection Package

Submodules

causal_testing.data_collection.data_collector module

class causal_testing.data_collection.data_collector.DataCollector(scenario: causal_testing.specification.scenario.Scenario)

Bases: abc.ABC

A data collector is a mechanism which generates or collects data from a system for a given scenario.

abstract collect_data(**kwargs) → pandas.core.frame.DataFrame

Populate the dataframe with execution data.

Returns

A pandas dataframe containing execution data for the system-under-test.

filter_valid_data(data: pandas.core.frame.DataFrame, check_pos: bool = True) → pandas.core.frame.DataFrame

Check whether execution data is valid for the scenario-under-test.

Data is invalid if it does not meet the constraints specified in the scenario-under-test.

Parameters
  • data – A pandas dataframe containing execution data from the system-under-test.

  • check_pos – Whether to check the data for positivity violations (defaults to true).

Returns

A pandas dataframe containing execution data that satisfy the constraints specified in the scenario-under-test.
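The idea of keeping only rows that satisfy the scenario's constraints can be illustrated with plain pandas. This is a conceptual sketch, not the library's implementation: the column names, the `constraints` list of query strings, and the `filter_by_constraints` helper are all hypothetical, and the positivity check controlled by `check_pos` is not modelled.

```python
import pandas as pd

# Hypothetical execution data; the column names are illustrative only.
data = pd.DataFrame({"x": [1, 5, 12, 7], "y": [2.0, 10.1, 24.3, 13.9]})

# Stand-in for scenario constraints such as "0 < x < 10", written as pandas
# query strings. The library's own constraint representation may differ.
constraints = ["x > 0", "x < 10"]


def filter_by_constraints(df: pd.DataFrame, constraints: list) -> pd.DataFrame:
    """Keep only the rows that satisfy every constraint (hypothetical helper)."""
    satisfying = df
    for constraint in constraints:
        satisfying = satisfying.query(constraint)
    return satisfying


satisfying_data = filter_by_constraints(data, constraints)
```

Here the row with `x = 12` violates the second constraint and is dropped, while the other three rows are kept.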

class causal_testing.data_collection.data_collector.ExperimentalDataCollector(scenario: causal_testing.specification.scenario.Scenario, control_input_configuration: dict, treatment_input_configuration: dict, n_repeats: int = 1)

Bases: causal_testing.data_collection.data_collector.DataCollector

A data collector that generates data directly by running the system-under-test in the desired conditions.

Users should implement the abstract methods below to collect data from their system.

abstract collect_data(**kwargs) → pandas.core.frame.DataFrame

Populate the dataframe with execution data.

Returns

A pandas dataframe containing execution data for the system-under-test in both control and treatment executions.

abstract run_system_with_input_configuration(input_configuration: dict) → pandas.core.frame.DataFrame

Run the system with a given input configuration and return the resulting execution data.

Parameters

input_configuration – A dictionary which maps a subset of inputs to values.

Returns

A pandas dataframe containing execution data obtained by executing the system-under-test with the specified input configuration.
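A subclass might implement the two abstract methods roughly as follows. To keep the sketch self-contained, it defines a stand-in base class (`ExperimentalCollectorSketch`) that mirrors the documented interface but omits the `scenario` argument; in real use one would presumably subclass `ExperimentalDataCollector` instead. The `toy_system` function, the `ToyCollector` class, and the column names are hypothetical.

```python
from abc import ABC, abstractmethod

import pandas as pd


class ExperimentalCollectorSketch(ABC):
    """Stand-in mirroring the documented interface; not the library class."""

    def __init__(self, control_input_configuration: dict,
                 treatment_input_configuration: dict, n_repeats: int = 1):
        self.control_input_configuration = control_input_configuration
        self.treatment_input_configuration = treatment_input_configuration
        self.n_repeats = n_repeats

    @abstractmethod
    def collect_data(self, **kwargs) -> pd.DataFrame:
        ...

    @abstractmethod
    def run_system_with_input_configuration(self, input_configuration: dict) -> pd.DataFrame:
        ...


def toy_system(x: float) -> float:
    """Hypothetical system-under-test."""
    return 2 * x + 1


class ToyCollector(ExperimentalCollectorSketch):
    def run_system_with_input_configuration(self, input_configuration: dict) -> pd.DataFrame:
        # Execute the system n_repeats times with the given inputs.
        x = input_configuration["x"]
        rows = [{"x": x, "y": toy_system(x)} for _ in range(self.n_repeats)]
        return pd.DataFrame(rows)

    def collect_data(self, **kwargs) -> pd.DataFrame:
        # Run the control and treatment configurations and stack the results.
        control = self.run_system_with_input_configuration(self.control_input_configuration)
        treatment = self.run_system_with_input_configuration(self.treatment_input_configuration)
        return pd.concat([control, treatment], ignore_index=True)


collector = ToyCollector({"x": 1}, {"x": 2}, n_repeats=3)
df = collector.collect_data()
```

In this sketch `collect_data` simply delegates to `run_system_with_input_configuration` once per configuration, so the returned dataframe holds `n_repeats` control rows followed by `n_repeats` treatment rows.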

class causal_testing.data_collection.data_collector.ObservationalDataCollector(scenario: causal_testing.specification.scenario.Scenario, csv_path: str)

Bases: causal_testing.data_collection.data_collector.DataCollector

A data collector that extracts data relevant to the specified scenario from a CSV file of execution data.

collect_data(**kwargs) → pandas.core.frame.DataFrame

Read a CSV file containing execution data for the system-under-test into a pandas dataframe, then filter out any data that is invalid for the scenario-under-test.

Data is invalid if it does not meet the constraints outlined in the scenario-under-test (Scenario).

Returns

A pandas dataframe containing execution data that is valid for the scenario-under-test.
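The read-then-filter flow described above can be sketched with pandas alone. This is an illustration under assumptions, not the collector's actual code: the CSV content is made up, an in-memory buffer stands in for `csv_path`, and the single condition `x < 10` stands in for the scenario's constraints.

```python
import io

import pandas as pd

# Hypothetical execution data; in practice this would be a file at csv_path.
csv_text = """x,y
1,3
15,31
4,9
"""

# Step 1: read the CSV of execution data into a dataframe.
execution_data = pd.read_csv(io.StringIO(csv_text))

# Step 2: drop rows that are invalid for the scenario. The condition below is
# an assumed stand-in for the scenario's constraints.
valid_mask = execution_data["x"] < 10
valid_data = execution_data[valid_mask].reset_index(drop=True)
```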