# Getting started
The Comets python module is intended to offer a programatic, easy and intuitive interface to `COMETS`. While it internally uses the same `COMETS` Java engine as always, it replaces the legacy parameter files, simulation layouts or models, bash scripts and output files, by python objects in a single environment, which users deal with to perform simulations and analyze the results.
Any comets simulation starts from a **layout** and a set of **parameters**. The layout specifies the environment (media metabolites, refresh values, periodic dilutions) and the species present in it, that is, the **models**. The parameters specify many simulation characteristics, such as number of iterations, timestep, type of metabolite exchange or whether to record different output logs. Therefore, the two main types of objects are `layout` and `params`. These two are passed to the `comets` class, which perform simulations and contain their output.
In this section, we will walk through the basic functionalities of `COMETS` using the Python Toolbox, and more specific examples of usage will be provided in the next sections.
## Preparing a model for `COMETS`
The class `model` is used to store the genome-scale metabolic models used in `COMETS` simulations. Most frequently, we will first load a model using `COBRAPy`. Then, we can pass it to the `COMETS` `model` class, which allows us to change `COMETS`-specific model parameters, such as initial population sizes.
```python
import cobra
import cobra.test
import cometspy as c
# Load a textbook example model using the COBRAPy toolbox
test_model = cobra.test.create_test_model('textbook')
# Use the above model to create a COMETS model
test_model = c.model(test_model)
# Change comets specific parameters, e.g. the initial biomass of the model
# Notre
test_model.initial_pop = [0, 0, 1e-7]
```
Using license file /home/djordje/gurobi.lic
Academic license - for non-commercial use only
## Setting `COMETS` simulation parameters
`COMETS` simulation parameters are stored in the `params` class, which contains just a `dict` object with the parameter names and values. If we initialize the class without arguments, it will contain the default parameter values (see [here]()). Once loaded, the parameter values can be visualized and modified as desired.
```python
# Create a parameters object with default values
my_params = c.params()
# Change the value of a parameter, for example number of simulation cycles
my_params.set_param('maxCycles', 100)
# Set some writeTotalBiomassLog parameter to True, in order to save the output
my_params.set_param('writeTotalBiomassLog', True)
# See avaliable parameters and their values
my_params.show_params()
```
|
VALUE |
UNITS |
| BiomassLogName |
biomass.txt |
|
| BiomassLogRate |
1 |
cycles |
| FluxLogName |
flux_out |
|
| FluxLogRate |
5 |
cycles |
| MediaLogName |
media_out |
|
| ... |
... |
... |
| writeBiomassLog |
False |
logical |
| writeFluxLog |
False |
logical |
| writeMediaLog |
False |
logical |
| writeSpecificMediaLog |
False |
logical |
| writeTotalBiomassLog |
True |
logical |
62 rows × 2 columns
### Preparing a `COMETS` simulation layout
The layout class describes the characteristics of the environment, i.e. the "world", including which species (models) are in it. It can be instantiated in empty or using `COMETS` models:
* If instantiated without arguments (as `my_layout = c.layout()`), an empty layout is created with all necessary fields that have to be populated.
* If a layout is instantiated passing a `model` (or several models), it will generate a layout with all metabolites those models can exchange with the environment at zero concentration, plus metals and ions at unlimited concentration (default -1000).
To examine the different parts of a Comets `layout`, let\'s first create one from the above loaded textbook model:
```python
my_layout = c.layout(test_model)
```
The layout stores information about the species (`my_layout.models`) and spatial structure (`my_layout.grid`) in the environment. In this case, the model is only the textbook one, and the grid is the default one, which is $1 \times 1$ i.e. only one cell.
The layout stores also information about the **media** as a pandas dataframe. In this case, no amount of any media component is present.
```python
my_layout.media
```
|
diff_c |
g_refresh |
g_static |
g_static_val |
init_amount |
metabolite |
| 0 |
0.000005 |
0 |
0 |
0 |
0 |
ac_e |
| 1 |
0.000005 |
0 |
0 |
0 |
0 |
acald_e |
| 2 |
0.000005 |
0 |
0 |
0 |
0 |
akg_e |
| 3 |
0.000005 |
0 |
0 |
0 |
0 |
co2_e |
| 4 |
0.000005 |
0 |
0 |
0 |
0 |
etoh_e |
| 5 |
0.000005 |
0 |
0 |
0 |
0 |
for_e |
| 6 |
0.000005 |
0 |
0 |
0 |
0 |
fru_e |
| 7 |
0.000005 |
0 |
0 |
0 |
0 |
fum_e |
| 8 |
0.000005 |
0 |
0 |
0 |
0 |
glc__D_e |
| 9 |
0.000005 |
0 |
0 |
0 |
0 |
gln__L_e |
| 10 |
0.000005 |
0 |
0 |
0 |
0 |
glu__L_e |
| 11 |
0.000005 |
0 |
0 |
0 |
0 |
h2o_e |
| 12 |
0.000005 |
0 |
0 |
0 |
0 |
h_e |
| 13 |
0.000005 |
0 |
0 |
0 |
0 |
lac__D_e |
| 14 |
0.000005 |
0 |
0 |
0 |
0 |
mal__L_e |
| 15 |
0.000005 |
0 |
0 |
0 |
0 |
nh4_e |
| 16 |
0.000005 |
0 |
0 |
0 |
0 |
o2_e |
| 17 |
0.000005 |
0 |
0 |
0 |
0 |
pi_e |
| 18 |
0.000005 |
0 |
0 |
0 |
0 |
pyr_e |
| 19 |
0.000005 |
0 |
0 |
0 |
0 |
succ_e |
| metabolite | init_amount | diff_c | g_static | g_static_val | g_refresh_val |
|------------|-------------|--------|----------|--------------|---------------|
| ca2_e | 1000 | NaN | 1 | 1000 | 0 |
| cbl_e | 1000 | NaN | 1 | 1000 | 0 |
| cl_e | 1000 | NaN | 1 | 1000 | 0 |
| . . . | . . . | . . . | . . . | . . . | . . . |
When initiated from models, the media compounds that can be in the environment are all those for which there is an exchange reaction in at least one of the models. The media, shown in the table above, is a `pandas` dataframe where several pieces of information are stored:
* `init_amount` is the initial amount to be added to each cell of the simulation grid (in mmol).
* `diff_c` indicates whether the molecule has a diffusion constant different than the default one (stored in `ec_layout.global_diff`)
* `g_static` indicates whether the component should remain at a static value, i.e. without change due to consumption and other effects of the simulation. This is useful for example, for setting some nutrients as unlimited.
* `g_static_val` indicates at which value shouold the nutrient remain static, if the previous coulmn value is 1.
* `g_refresh_val` indicates the amount of the metabolite that should be added after each simulation cycle to each cell of the grid.
In addition, we can set local `static` and `refresh` values, specific to a cell of the simulation grid.
When a media component is `static`, this means that its concentration is returned in each cycle to the set static value. This is used when we want a media component to remain virtually unlimited during a simulation.
When a media component has a `refresh` value, this means it will be replenished by adding the set amount at every simulation cycle.
Local refresh values are stored in a list, `my_layout.local_refresh`, where each element of the list is itself a list with the form `[ x y m1_r m2_r m3_r ... ]`. The first two elements `x` and `y` represent the coordinates, and are followed by the refresh values for all metabolites, in the same order as in `media`.
Local static values are stored in a similar way. Each element of the `my_layout.local_static` list is itself a list with the form `[ x y m1_s m1_s_v m2_s m2_s_v ... ]`. The difference here is that for each metabolite, there are two values, one defining whether the molecule is to be static at that coordinate (`m1_s`, `m2_s`, ... ) and another with the value at which it should be kept (`m1_s_v`, `m2_s_v`, ... ).
Note that both `local_refresh` and `local_static` can be empty (the default), or contain only entries for the coordinates where there is at least one nonzero refresh or static value, respectively.
Finally, the layout also contains information about the starting biomass of each model. This information is stored in the `initial_pop` list. Each component of `initial_pop` is itself a list with the format `[x y biomass_1 biomass_2 ...]`specifying the amount of biomass of each model in each coordinate.
### Running a `COMETS` simulation
The `comets` class uses a layout object and a parameters object to run simulations and store the output. Running a comets simulation is pretty straightforward. We firstly define the `comets` object by passing it a `layout` and a `params` objects as arguments. Then, we `run()` the simulation:
```python
my_simulation = c.comets(my_layout, my_params)
my_simulation.run()
```
Running COMETS simulation ...
Done!
### Checking simulation output and possible errors
In the background, this command invokes the `COMETS` Java engine in a console, giving a standard output (stdout) and standard error (stderr) logs. Both can be acessed through the fields `run_outputs` and `run_errors`, respectively.
```python
print(my_simulation.run_output)
```
-script
running script file: /home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_script
Current Java version: 11.0.7
Loading layout file '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/.current_layout'...
Found 1 model files!
Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ...
Loading '/home/djordje/Dropbox/projects/comets_paper/Methods_Paper_Materials/COMETS_Examples/COMETS_getting_started/Python/e_coli_core.cmd' ...
Academic license - for non-commercial use only
Academic license - for non-commercial use only
Done!
Testing default parameters...
Done!
Optimizer status code = 5 (looks ok!)
objective solution = 0.8739215069684305
Constructing world...
Done!
medialist ac_e acald_e akg_e co2_e etoh_e for_e fru_e fum_e glc__D_e gln__L_e glu__L_e h2o_e h_e lac__D_e mal__L_e nh4_e o2_e pi_e pyr_e succ_e
Cycle 1
Total biomass:
Model e_coli_core.cmd: 1.0E-7
Cycle 2
Total biomass:
Model e_coli_core.cmd: 1.0E-7
...
Total time = 0.312s
```python
print(my_simulation.run_errors)
```
STDERR empty.
### Accessing the results of the simulation
The results of the successful simulation are stored in several fields in the `comets` object, depending on whether the parameters `writeTotalBiomasslog`, `writeBiomassLog`, `writeFluxLog` and `writeMediaLog` were set to `True`.
* The field `total_biomass` stores the total biomass (summed up over all coordinates) for each timepoint and species.
* The field `biomass` stores detailed biomass values for each timepoint, coordinate and species.
* The field `media` stores the composition of the media at each timepoint.
* The field `fluxes` stores the metabolic fluxes for each species, coordinate and timepoint.
Additionally, specific comets modes will have additional output fields; for instance, if we run an evolution simulation, the field `genotypes` will store information about each species such as its ancestor and which mutation it suffered.
All of the output files ae `pandas` dataframes which can be further analyzed or plotted using standard Python tools.
```python
my_simulation.total_biomass
```
|
cycle |
e_coli_core |
| 0 |
0.0 |
1.000000e-07 |
| 1 |
1.0 |
1.000000e-07 |
| 2 |
2.0 |
1.000000e-07 |
| 3 |
3.0 |
1.000000e-07 |
| 4 |
4.0 |
1.000000e-07 |
| ... |
... |
... |
| 96 |
96.0 |
1.000000e-07 |
| 97 |
97.0 |
1.000000e-07 |
| 98 |
98.0 |
1.000000e-07 |
| 99 |
99.0 |
1.000000e-07 |
| 100 |
100.0 |
1.000000e-07 |
101 rows × 2 columns