# Getting started The Comets python module is intended to offer a programatic, easy and intuitive interface to `COMETS`. While it internally uses the same `COMETS` Java engine as always, it replaces the legacy parameter files, simulation layouts or models, bash scripts and output files, by python objects in a single environment, which users deal with to perform simulations and analyze the results. Any comets simulation starts from a **layout** and a set of **parameters**. The layout specifies the environment (media metabolites, refresh values, periodic dilutions) and the species present in it, that is, the **models**. The parameters specify many simulation characteristics, such as number of iterations, timestep, type of metabolite exchange or whether to record different output logs. Therefore, the two main types of objects are `layout` and `params`. These two are passed to the `comets` class, which perform simulations and contain their output. In this section, we will walk through the basic functionalities of `COMETS` using the Python Toolbox, and more specific examples of usage will be provided in the next sections. ## Preparing a model for `COMETS` The class `model` is used to store the genome-scale metabolic models used in `COMETS` simulations. Most frequently, we will first load a model using `COBRAPy`. Then, we can pass it to the `COMETS` `model` class, which allows us to change `COMETS`-specific model parameters, such as initial population sizes. ```python import cobra import cobra.test import cometspy as c # Load a textbook example model using the COBRAPy toolbox test_model = cobra.test.create_test_model('textbook') # Use the above model to create a COMETS model test_model = c.model(test_model) # Change comets specific parameters, e.g. the initial biomass of the model # Notre test_model.initial_pop = [0, 0, 1e-7] ``` Using license file /home/djordje/gurobi.lic Academic license - for non-commercial use only ## Setting `COMETS` simulation parameters `COMETS` simulation parameters are stored in the `params` class, which contains just a `dict` object with the parameter names and values. If we initialize the class without arguments, it will contain the default parameter values (see [here]()). Once loaded, the parameter values can be visualized and modified as desired. ```python # Create a parameters object with default values my_params = c.params() # Change the value of a parameter, for example number of simulation cycles my_params.set_param('maxCycles', 100) # Set some writeTotalBiomassLog parameter to True, in order to save the output my_params.set_param('writeTotalBiomassLog', True) # See avaliable parameters and their values my_params.show_params() ```
VALUE UNITS
BiomassLogName biomass.txt
BiomassLogRate 1 cycles
FluxLogName flux_out
FluxLogRate 5 cycles
MediaLogName media_out
... ... ...
writeBiomassLog False logical
writeFluxLog False logical
writeMediaLog False logical
writeSpecificMediaLog False logical
writeTotalBiomassLog True logical

62 rows × 2 columns

### Preparing a `COMETS` simulation layout The layout class describes the characteristics of the environment, i.e. the "world", including which species (models) are in it. It can be instantiated in empty or using `COMETS` models: * If instantiated without arguments (as `my_layout = c.layout()`), an empty layout is created with all necessary fields that have to be populated. * If a layout is instantiated passing a `model` (or several models), it will generate a layout with all metabolites those models can exchange with the environment at zero concentration, plus metals and ions at unlimited concentration (default -1000). To examine the different parts of a Comets `layout`, let\'s first create one from the above loaded textbook model: ```python my_layout = c.layout(test_model) ``` The layout stores information about the species (`my_layout.models`) and spatial structure (`my_layout.grid`) in the environment. In this case, the model is only the textbook one, and the grid is the default one, which is $1 \times 1$ i.e. only one cell. The layout stores also information about the **media** as a pandas dataframe. In this case, no amount of any media component is present. ```python my_layout.media ```
diff_c g_refresh g_static g_static_val init_amount metabolite
0 0.000005 0 0 0 0 ac_e
1 0.000005 0 0 0 0 acald_e
2 0.000005 0 0 0 0 akg_e
3 0.000005 0 0 0 0 co2_e
4 0.000005 0 0 0 0 etoh_e
5 0.000005 0 0 0 0 for_e
6 0.000005 0 0 0 0 fru_e
7 0.000005 0 0 0 0 fum_e
8 0.000005 0 0 0 0 glc__D_e
9 0.000005 0 0 0 0 gln__L_e
10 0.000005 0 0 0 0 glu__L_e
11 0.000005 0 0 0 0 h2o_e
12 0.000005 0 0 0 0 h_e
13 0.000005 0 0 0 0 lac__D_e
14 0.000005 0 0 0 0 mal__L_e
15 0.000005 0 0 0 0 nh4_e
16 0.000005 0 0 0 0 o2_e
17 0.000005 0 0 0 0 pi_e
18 0.000005 0 0 0 0 pyr_e
19 0.000005 0 0 0 0 succ_e
| metabolite | init_amount | diff_c | g_static | g_static_val | g_refresh_val | |------------|-------------|--------|----------|--------------|---------------| | ca2_e | 1000 | NaN | 1 | 1000 | 0 | | cbl_e | 1000 | NaN | 1 | 1000 | 0 | | cl_e | 1000 | NaN | 1 | 1000 | 0 | | . . . | . . . | . . . | . . . | . . . | . . . | When initiated from models, the media compounds that can be in the environment are all those for which there is an exchange reaction in at least one of the models. The media, shown in the table above, is a `pandas` dataframe where several pieces of information are stored: * `init_amount` is the initial amount to be added to each cell of the simulation grid (in mmol). * `diff_c` indicates whether the molecule has a diffusion constant different than the default one (stored in `ec_layout.global_diff`) * `g_static` indicates whether the component should remain at a static value, i.e. without change due to consumption and other effects of the simulation. This is useful for example, for setting some nutrients as unlimited. * `g_static_val` indicates at which value shouold the nutrient remain static, if the previous coulmn value is 1. * `g_refresh_val` indicates the amount of the metabolite that should be added after each simulation cycle to each cell of the grid. In addition, we can set local `static` and `refresh` values, specific to a cell of the simulation grid. When a media component is `static`, this means that its concentration is returned in each cycle to the set static value. This is used when we want a media component to remain virtually unlimited during a simulation. When a media component has a `refresh` value, this means it will be replenished by adding the set amount at every simulation cycle. Local refresh values are stored in a list, `my_layout.local_refresh`, where each element of the list is itself a list with the form `[ x y m1_r m2_r m3_r ... ]`. The first two elements `x` and `y` represent the coordinates, and are followed by the refresh values for all metabolites, in the same order as in `media`. Local static values are stored in a similar way. Each element of the `my_layout.local_static` list is itself a list with the form `[ x y m1_s m1_s_v m2_s m2_s_v ... ]`. The difference here is that for each metabolite, there are two values, one defining whether the molecule is to be static at that coordinate (`m1_s`, `m2_s`, ... ) and another with the value at which it should be kept (`m1_s_v`, `m2_s_v`, ... ). Note that both `local_refresh` and `local_static` can be empty (the default), or contain only entries for the coordinates where there is at least one nonzero refresh or static value, respectively. Finally, the layout also contains information about the starting biomass of each model. This information is stored in the `initial_pop` list. Each component of `initial_pop` is itself a list with the format `[x y biomass_1 biomass_2 ...]`specifying the amount of biomass of each model in each coordinate. ### Running a `COMETS` simulation The `comets` class uses a layout object and a parameters object to run simulations and store the output. Running a comets simulation is pretty straightforward. We firstly define the `comets` object by passing it a `layout` and a `params` objects as arguments. Then, we `run()` the simulation: ```python my_simulation = c.comets(my_layout, my_params) my_simulation.run() ``` Running COMETS simulation ... Done! ### Checking simulation output and possible errors In the background, this command invokes the `COMETS` Java engine in a console, giving a standard output (stdout) and standard error (stderr) logs. Both can be acessed through the fields `run_outputs` and `run_errors`, respectively. ```python print(my_simulation.run_output) ``` -script running script file: /home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_script Current Java version: 11.0.7 Loading layout file '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_layout'... Found 1 model files! Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ... Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ... Academic license - for non-commercial use only Academic license - for non-commercial use only Done! Testing default parameters... Done! Optimizer status code = 5 (looks ok!) objective solution = 0.8739215069684305 Constructing world... Done! medialist ac_e acald_e akg_e co2_e etoh_e for_e fru_e fum_e glc__D_e gln__L_e glu__L_e h2o_e h_e lac__D_e mal__L_e nh4_e o2_e pi_e pyr_e succ_e Cycle 1 Total biomass: Model e_coli_core.cmd: 1.0E-7 Cycle 2 Total biomass: Model e_coli_core.cmd: 1.0E-7 ... Total time = 0.312s ```python print(my_simulation.run_errors) ``` STDERR empty. ### Accessing the results of the simulation The results of the successful simulation are stored in several fields in the `comets` object, depending on whether the parameters `writeTotalBiomasslog`, `writeBiomassLog`, `writeFluxLog` and `writeMediaLog` were set to `True`. * The field `total_biomass` stores the total biomass (summed up over all coordinates) for each timepoint and species. * The field `biomass` stores detailed biomass values for each timepoint, coordinate and species. * The field `media` stores the composition of the media at each timepoint. * The field `fluxes` stores the metabolic fluxes for each species, coordinate and timepoint. Additionally, specific comets modes will have additional output fields; for instance, if we run an evolution simulation, the field `genotypes` will store information about each species such as its ancestor and which mutation it suffered. All of the output files ae `pandas` dataframes which can be further analyzed or plotted using standard Python tools. ```python my_simulation.total_biomass ```
cycle e_coli_core
0 0.0 1.000000e-07
1 1.0 1.000000e-07
2 2.0 1.000000e-07
3 3.0 1.000000e-07
4 4.0 1.000000e-07
... ... ...
96 96.0 1.000000e-07
97 97.0 1.000000e-07
98 98.0 1.000000e-07
99 99.0 1.000000e-07
100 100.0 1.000000e-07

101 rows × 2 columns