Contributing
Starting to contribute to a new repo can be daunting, and an unfamiliar architecture can be off-putting. This section introduces the codebase and helps you make your first contribution.
Where to start?
Jump in and break things!
If your preferred way of learning is to jump in and break things, a good starting point is to run an example file, e.g.:
vai_lab --file ./examples/xml_files/user_feedback_demo.xml
which will call a module and plugin and give an example of the call stack.
Get a good grip on the code
If you prefer to learn by reading, the following sections will give you a high level overview of the codebase.
Overview
The VAI-lab codebase consists of individual modules representing individual processes; for each module there are multiple plugins, which are specific methods or implementations of performing these processes.
Example
For example, DataProcessing is a Module which will manipulate data in some way. The specific type of processing is determined by the plugin that is chosen.
Say we have some data and we want to change all the values to either a 0 or a 1. In that case we would use the binarizer plugin for the DataProcessing module.
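To make the binarizer example concrete, here is a minimal sketch in plain Python of what binarizing a list of values means. It is not the VAI-lab plugin itself: the function name binarize and the threshold parameter are assumptions made purely for illustration.

```python
from typing import List


def binarize(values: List[float], threshold: float = 0.0) -> List[int]:
    """Map each value to 1 if it is strictly above `threshold`, else 0.

    Hypothetical helper for illustration only; not part of VAI-lab.
    """
    return [1 if v > threshold else 0 for v in values]


data = [-1.5, 0.0, 0.2, 3.0]
binary = binarize(data)
print(binary)
```

In VAI-lab the equivalent step would be selected by choosing a plugin for the DataProcessing module rather than by calling a function directly.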
While a plugin specifies the exact implementation to perform on a process, the core deals with setting up and executing the plugin.
Each module contains a Core which dictates the required methods and attributes of a compatible plugin. It also instantiates the plugin and executes it.
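The relationship between a core and its plugins can be sketched roughly as below. This is a toy illustration under stated assumptions, not VAI-lab's implementation: the class names ExampleCore, ThresholdPlugin and IncompletePlugin, and the method names configure, run and launch, are all hypothetical.

```python
from typing import Any, List


class ExampleCore:
    """Hypothetical module core: checks, instantiates and runs a plugin."""

    # Methods a plugin must provide to be compatible (assumed names).
    required_methods = ("configure", "run")

    def __init__(self, plugin_class: type) -> None:
        missing = [
            name for name in self.required_methods
            if not callable(getattr(plugin_class, name, None))
        ]
        if missing:
            raise TypeError(f"{plugin_class.__name__} is missing: {missing}")
        self.plugin = plugin_class()  # the core instantiates the plugin

    def launch(self, data: List[float], **options: Any) -> List[int]:
        # The core, not the caller, drives the plugin.
        self.plugin.configure(**options)
        return self.plugin.run(data)


class ThresholdPlugin:
    """Hypothetical plugin that satisfies the contract above."""

    def configure(self, threshold: float = 0.0) -> None:
        self.threshold = threshold

    def run(self, data: List[float]) -> List[int]:
        return [1 if v > self.threshold else 0 for v in data]


class IncompletePlugin:
    """Hypothetical plugin that lacks `configure`, so it is rejected."""

    def run(self, data: List[float]) -> List[float]:
        return data


core = ExampleCore(ThresholdPlugin)
result = core.launch([0.1, 0.9], threshold=0.5)
```

Checking the contract when the plugin is loaded means an incompatible plugin fails early with a clear message, rather than partway through a pipeline run.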
Supervisor Core
As well as each module having a Core, there is an overarching Supervisor Core which calls each module sequentially; each module in turn executes its plugin. The supervisor module is the top module in Fig 1. above.
The supervisor core script is named vai_lab_core.py and can be found in the src/vai_lab/Core directory.
Within this script there are private handler functions for the different types of pipeline components. The name of each handler follows the pattern _execute_<name of component>.
Function Names
- Functions which handle component executions are named according to the following convention:
  - _execute_module: instantiates a module and executes the plugin
  - _execute_loop: generic handler for loops, calls the specific _execute_for_loop or _execute_while_loop functions in turn
  - _execute_entry_point: instantiates the unique Initialiser entry point to the pipeline, which deals with data definitions and config information
  - _execute_exit_point: handles the exiting of the pipeline, such as saving data to file
The naming convention of these functions is important, as the functions themselves are looked up using the Python getattr function, which takes the name of the function as a string. That string is built from the type of component. The component type is determined during setup and appended to the supervisor config.
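The getattr-based dispatch described above can be sketched as follows. This is a simplified stand-in, not the code in vai_lab_core.py: the class MiniSupervisor, the _execute and run methods, and the dictionary keys type, kind, name, iterations and children are all assumptions chosen for illustration. Only the _execute_<name of component> handler names follow the convention from the text.

```python
from typing import Any, Dict, List


class MiniSupervisor:
    """Toy stand-in for a supervisor core; not VAI-lab's actual class."""

    def __init__(self) -> None:
        self.trace: List[str] = []  # records which handlers ran, in order

    def _execute_entry_point(self, spec: Dict[str, Any]) -> None:
        self.trace.append(f"entry_point:{spec['name']}")

    def _execute_module(self, spec: Dict[str, Any]) -> None:
        self.trace.append(f"module:{spec['name']}")

    def _execute_loop(self, spec: Dict[str, Any]) -> None:
        # Generic loop handler: forwards to the handler for the specific
        # loop kind, e.g. _execute_for_loop (the "kind" key is assumed).
        getattr(self, f"_execute_{spec['kind']}_loop")(spec)

    def _execute_for_loop(self, spec: Dict[str, Any]) -> None:
        for _ in range(spec["iterations"]):
            for child in spec["children"]:
                self._execute(child)

    def _execute_exit_point(self, spec: Dict[str, Any]) -> None:
        self.trace.append(f"exit_point:{spec['name']}")

    def _execute(self, spec: Dict[str, Any]) -> None:
        # Build the handler name from the component type and look it up.
        handler = getattr(self, f"_execute_{spec['type']}")
        handler(spec)

    def run(self, pipeline: List[Dict[str, Any]]) -> List[str]:
        for spec in pipeline:
            self._execute(spec)
        return self.trace


pipeline = [
    {"type": "entry_point", "name": "Initialiser"},
    {"type": "loop", "kind": "for", "iterations": 2, "name": "loop0",
     "children": [{"type": "module", "name": "DataProcessing"}]},
    {"type": "exit_point", "name": "Output"},
]
trace = MiniSupervisor().run(pipeline)
```

Because the handler is found by name at run time, a component type with no matching _execute_ method raises AttributeError only when that component is reached. A new component type therefore needs a handler whose name matches the convention exactly.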
Glossary
Module: A container representing a generic process to manipulate or produce data; all modules are to be populated by a plugin.
Plugin: A specific implementation or method to carry out a process. Each plugin will be inserted into a module.
Core: Each module consists of a Core and a set of plugins. The core is responsible for performing the required background processes and handling the plugins.