Assembler Module
Assembler Input
The DART Assembler Module generates 3D structures of novel transition metal complexes from a database of ligands, which can either be the full MetaLig database or a subset from a user-defined chemical space. While this page focuses on the input options for the assembler, you can read more about how the DART Assembler Module works here.
The assembler module is run from the command line by providing a single configuration file:
DARTassembler assembler --path assembler_input.yml
Copy-Paste Template:
################## Settings for the DART assembler module. ##################
# Everything after '#' is ignored by the program and only there for the user.
output_directory: DART # Path to a directory for the output files.
batches: # List of batches to generate.
- name: First_batch # Name of the batch.
metal_center: Fe # Chemical symbol of the desired metal center.
metal_oxidation_state: 2 # Oxidation state of the desired metal center.
total_charge: 0 # Total charge of the complex.
geometry: 2-1-1 # Geometry of the complexes. Options: `2-1-1`, `2-2`, `mer-3-2-1`, `mer-4-1-1`, `5-1`
ligand_db_file: metalig # Path to the ligand db file. Options: `metalig`, `test_metalig`, filepath or list of paths/keywords (see documentation).
max_num_complexes: 100 # Maximum number of complexes/isomers to generate. Integer or `all`.
isomers: all # Which isomers to save for each complex. Options: `lowest_energy`, `all`
#random_seed: 0 # Optional. Random seed for reproducibility of results. Choose any integer.
#forcefield: false # Optional. Whether to optimize the structures after generation with a UFF force field. Recommended: `false`.
#bidentate_rotator: auto # Optional. How to rotate bidentate ligands in square-planar complexes. Options: `auto`, `horseshoe`, `slab`. Recommended: `auto`.
#geometry_modifier_filepath: # Optional. Path to a geometry modifier file to shift atoms in complexes.
#complex_name_appendix: # Optional. String to append to each randomly generated complex name for labeling purposes.
#ffmovie: false # Optional. Whether to output a movie (concatenated xyz file) of the forcefield optimization process.
#concatenate_xyz: true # Optional. Whether to save concatenated xyz files with all passed/failed complexes respectively.
#verbosity: 2 # Optional. Output verbosity level (0-3), recommended: `2`.
#same_isomer_names: true # Optional. Whether to give the same name to isomers of the same complex and then to number them.
#complex_name_length: 8 # Optional. Length for generated complex names, recommended: 8.
Users can download this template into their current directory by running:
DARTassembler configs --path .
Batch Options
Batch settings are mostly mandatory and specify details concerning the metal center and the ligands. Multiple batches can be specified and will be assembled in sequence. For each batch, the first option must be preceded by a hyphen (‘-’) to denote the start of the list.
Mandatory batch settings:
- name
- Options:
string
- Required:
true
Unique name for the batch for easy identification. Each batch must have a different name.
- metal_center
- Options:
chemical symbol
- Required:
true
Chemical symbol of the desired metal center, e.g.
PdorFe.
- metal_oxidation_state
- Options:
integer > 0
- Required:
true
Oxidation state of the desired metal center, e.g.
2.
- total_charge
- Options:
integer
- Required:
true
Total charge of the complex. Can be positive, negative or zero.
- geometry
- Options:
mer-3-2-1,mer-4-1-1,5-1,2-1-1,2-2- Required:
true
The geometry specifies the denticities of the ligands around the complex. For example,
mer-3-2-1would generate a complex with one mer-tridentate, one bidentate and one monodentate ligand. Currently, the following topologies are supported:Octahedral complexes:
mer-3-2-1,mer-4-1-1,5-1Square planar complexes:
2-1-1,2-2
- ligand_db_file
- Options:
empty/metalig,test_metalig, filepath OR list(filepath / keyword )- Required:
false- Default:
metalig
Specifies the source databases for ligands used in complex assembly. This option can be configured in two ways:
List of Filepaths and/or Keywords: A list where each entry is either a path to a ligand database file or the keyword
same_ligand_as_previous. The list should match the number of ligand sites as defined in thegeometryoption. For instance, in amer-3-2-1geometry, the first database in the list supplies tridentate ligands, the second supplies bidentate, and the third supplies monodentate ligands. Thesame_ligand_as_previouskeyword can be used in place of a path to indicate that the ligand for the current site should be identical to the one used in the previous site for each assembled complex. This feature is useful for creating complexes with symmetrical or repeating ligand structures.Single Filepath or Empty: When a single path is provided, ligands for all sites will be drawn from this database. Identical to specifying a list with the same ligand db path for each ligand site. If empty or
metalig, the entire MetaLig database will be used. Iftest_metalig, a small subset of the MetaLig database will be used to speed up testing.
Note: Ligands in the database with a denticity not matching the specified
geometrywill be ignored during the assembly process. This ensures that only compatible ligands are selected for complex formation.
- max_num_complexes
- Options:
integer > 0 OR
all- Required:
true
Maximum number of complexes to generate. If
max_num_complexesis set toall, it will generate all combinatorically possible complexes.Note: If
isomersis set toall, each isomer is counted as different complex. Note that the actual number of complexes generated can be a little higher in this case because for the last complex, all isomers are saved, even if this exceedsmax_num_complexes.
- isomers
- Options:
lowest_energy,all- Required:
true
The assembler will always generate all possible geometric isomers. The option
isomersdetermines which isomers are saved. Iflowest_energy, only the lowest energy isomer is saved as determined by a UFF forcefield. Ifall, all isomers are saved.
Optional batch settings:
- random_seed
- Options:
integer
- Required:
false- Default:
Randomly chosen between 1000 and 9999
Sets a seed for the random number generator to make the assembly of complexes exactly reproducible for each individual batch. If not set, a random seed between 1000 and 9999 is completely randomly chosen for each batch and recorded. That means the run is still reproducible by checking which random seed was chosen, but the seed is not known in advance.
- forcefield
- Options:
true,false- Required:
false
- Default:
false
Whether to relax the generated structures with a Universal Force Field (UFF) as implemented in the
openbabelsoftware. Because force fields often struggle to describe metal atoms, the metal center and the donor atoms are kept fixed and only the rest of the ligand atoms is relaxed.
- bidentate_rotator
- Options:
auto,horseshoe,slab- Required:
false
- Default:
auto
This option specifies how to assemble bidentate ligands in square-planar complexes. It effects only the topologies
2-2or2-1-1.horseshoeandslabare the shapes of the underlying potential energy surfaces.horseshoeworks best for ligands with a planar metallacycle, while non-planar ligands often give better results withslab.autowill choose the shape automatically based on the ligand geometry.Tip
This option can strongly affect the quality of generated complexes and how many make it through the post-assembly filter. For serious applications we recommend to set
max_num_complexesto100, try all three options and check how many complexes fail the post-assembly filter for each option (this info is displayed at the end of the assembly). Whichever option has the least number of complexes failing the post-assembly filter usually gives the highest quality geometries. This method takes only a few minutes and is demonstrated in the advanced example.
- geometry_modifier_filepath
- Options:
empty OR filepath
- Required:
false
- Default:
empty
Path to the geometry modifier file. If left empty, no geometry modification is performed.
The geometry modifier file allows very advanced and fine-grained control over the geometry of the generated complexes. Usually it is not needed, since a forcefield optimization will often be a better option. However, there might be cases where it is desired to move atoms in an assembled ligand from one position to another position for all complexes with this ligand. This can be achieved with the geometry modifier file as shown in the Pd/Ni cross coupling example.
For moving an atom to another position you need to supply the chemical symbol and the coordinates of the original atom and the coordinates the atom at it’s new coordinates. The geometry modifier file is an .xyz file with two sets of atoms: The first set is all atoms that should be moved, the second set is the new positions of these atoms. Both sets of atoms are provided as “molecule” in the .xyz format and concatenated. The order and the chemical elements of both sets of atoms have to match up. In the assembly, for each generated complex, the atoms with coordinates in the first set are moved to the coordinates in the second set.
- complex_name_appendix
- Options:
empty or string
- Required:
false
- Default:
empty
Appends a custom string to the randomly generated name of each assembled complex. For example, if the appendix is set to
_charge1, a generated complex will be named ‘ZUMUVAMI_charge1’ if otherwise it would have been named ‘ZUMUVAMI’. This can be helpful in organizing DART generated complexes and keeping track which complex belongs to which batch.
Global Options
Global options are all optional and specify settings that apply to all batches.
- output_directory
- Options:
dirpath
- Required:
false
- Default:
DART
Path to directory in which the output will be saved.
- ffmovie
- Options:
true,false- Required:
false
- Default:
false
Whether to output a movie (i.e. a concatenated .xyz file displaying multiple frames) of the forcefield optimization process. Useful for visualization e.g. with
ase gui FILE.xyz.
- concatenate_xyz
- Options:
true,false- Required:
false
- Default:
true
Whether to save concatenated xyz files with all passed/failed complexes respectively. Useful for quick visualization and browsing of the generated complexes e.g. with
ase gui FILE.xyz.
- verbosity
- Options:
0,1,2,3- Required:
false
- Default:
2
How much output to print (except the progress bars, which are always printed).
0means only errors,1means also warnings,2means also normal info,3means also debug info.
- same_isomer_names
- Options:
true,false- Required:
false
- Default:
true
If
true, isomers of the same complex will get the same name but enumerated. Very handy to see quickly which complexes are isomers of each other. If set tofalse, each isomer will get a completely unique name.
- complex_name_length
- Options:
integer > 0
- Required:
false
- Default:
8
Length of the randomly generated name for each generated complex (e.g. ‘ZUMUVAMI’).
Assembler Output
The output of the DART assembler will be saved in a specific folder. This folder is determined by the output_directory you set in your assembly input file. Within this folder, each generated metal complex has a unique name such as ‘IKOTENIC’, which is automatically generated based on its coordinates.
The assembler module creates not only the xyz files for each complex but also various other files that could be of interest. Below, you’ll find an overview of all files and folders generated by the DART assembler.
Folder Structure
Here’s what the output folder will look like:
output_directory/
├── info_table.csv (Summary Table)
├── concat_passed_complexes.xyz (Successful Complexes)
├── ffmovie.xyz (Forcefield Trajectories)
└── batches/ (Batch Folders)
├── batch_1/
│ ├── concat_passed_complexes.xyz
│ ├── concat_failed_complexes.xyz
│ ├── concat_passed_ffmovie.xyz
│ ├── concat_failed_ffmovie.xyz
│ └── complexes/ (Individual Complex Folders)
│ ├── complex_1/
│ │ ├── complex_1_structure.xyz
│ │ ├── complex_1_ligandinfo.csv
│ │ ├── complex_1_ffmovie.xyz
│ │ └── complex_1_data.json
│ ├── complex_2/
│ └── ...
└── batch_2/
└── ...
Output Files
- General Output Files:
These files provide a broad overview of the assembly process:
batches/: This is a folder that contains all the batches of assembled complexes.info_table.csv: This is a summary table listing all generated complexes and their characteristics. It’s a good starting point for understanding the results.concat_passed_complexes.xyz: This file contains the coordinates of all successfully generated complexes, bundled together. You can visualize these using software like ASE.ffmovie.xyz: This contains forcefield optimization trajectories for the successfully generated complexes.
- Batch-Specific Files:
Inside the
batches/folder, you’ll find separate folders for each batch. These folders may contain:complexes/: This is a folder that contains all the complexes for that batch.concat_passed_complexes.xyz: Coordinates of successful complexes for that specific batch.concat_failed_complexes.xyz: Coordinates of complexes that failed to assemble correctly.concat_passed_ffmovie.xyz: Forcefield optimization trajectories for successful complexes.concat_failed_ffmovie.xyz: Forcefield optimization trajectories for failed complexes.
If a file is missing, that means no complexes fall into that category for the batch (e.g., no failed complexes).
- Complex-Specific Files:
Within each batch, each complex has its own folder under
complexes/. These folders contain:NAME_structure.xyz: The 3D coordinates of the complex.NAME_ligandinfo.csv: Detailed information about the ligands in the complex.NAME_ffmovie.xyz: Forcefield optimization trajectory for the complex, if activated.NAME_data.json: This is a machine-readable data file, useful for further computational analysis.