Ligand Filters Module
The Ligand Filters Module enables users to obtain a set of ligands with well-defined properties from the entire MetaLig Database. These filters are invaluable for assembling complexes targeted to a user-defined chemical space.
Users can apply a wide range of predefined filters. For those requiring precise control over the structures, the smarts filter applies SMARTS patterns to filter ligands based on their 2D chemical structure. Furthermore, the graph_IDs filter enables the selection of individual ligands. Alternatively, instead of using the Ligand Filters Module with predefined filters, users can explore the MetaLig and create custom filters using Python.
The Ligand Filters Module is run from the terminal by providing a single configuration file:
DARTassembler ligandfilters --path ligandfilters_input.yml
The following filters are currently implemented:
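- Physical Property Filters: denticities, ligand_charges, ligand_composition, coordinating_atoms_composition, number_of_atoms, molecular_weight, interatomic_distances, planarity
- Molecular Graph Filters: remove_ligands_with_adjacent_coordinating_atoms, remove_ligands_with_beta_hydrogens, remove_ligands_with_missing_bond_orders, atomic_neighbors, smarts, graph_IDs
- Statistical CSD Filters: occurrences, metal_ligand_binding_history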
Input File
Users interact with the Ligand Filters Module by providing an input file in YAML format. In this file, users can specify parameters for each filter, repeat the same filter with different parameters, or omit filters they don’t need. The order of filters doesn’t matter.
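The order independence can be pictured with a short Python sketch. It assumes that a ligand is kept only if it passes every listed filter; this AND behavior is inferred from the description above and is not taken from DART's source code. The ligand records and the three checks are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, Iterable, List

Ligand = Dict[str, int]            # hypothetical minimal ligand record
Filter = Callable[[Ligand], bool]  # returns True if the ligand passes


def apply_filters(ligands: Iterable[Ligand], filters: Iterable[Filter]) -> List[Ligand]:
    """Keep only ligands that pass every filter (assumed AND semantics)."""
    kept = list(ligands)
    for check in filters:
        kept = [lig for lig in kept if check(lig)]
    return kept


ligands = [
    {"denticity": 2, "charge": 0, "n_atoms": 24},
    {"denticity": 2, "charge": -2, "n_atoms": 12},
    {"denticity": 4, "charge": -1, "n_atoms": 150},
]
filters = [
    lambda lig: lig["denticity"] in (2, 3, 4),
    lambda lig: lig["charge"] in (-1, 0, 1),
    lambda lig: 10 <= lig["n_atoms"] <= 100,
]

# Every ordering of the same filters yields the same set of kept ligands.
results = [apply_filters(ligands, order) for order in permutations(filters)]
print(all(result == results[0] for result in results))
print(results[0])
```

Because each filter only removes ligands and never modifies them, the final result is the intersection of what each filter allows, regardless of the order in the input file.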
This template specifies all available filters and examples of their parameters:
Copy-Paste Template:
################## Settings for the DART ligand filters module. ##################
# Everything after '#' is ignored by the program and only there for the user.
input_db_file: metalig # path, 'metalig' or 'test_metalig'. Default: 'metalig'
output_db_file: filtered_ligand_db.jsonlines # path. Default: 'filtered_ligand_db.jsonlines'
output_ligands_info: true # true or false. If true, an overview of the filtered and passed ligands will be saved. Default: true
filters:
  ####### Physical Property Filters #######
  - filter: denticities
    denticities: [2, 3, 4] # Only keep ligands with these denticities
  - filter: ligand_charges
    ligand_charges: [-1, 0, 1] # Only keep ligands with these charges
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: ligand_composition # Filters ligands by their stoichiometry
    elements: CHN # Stoichiometry/list of elements to apply this filter to
    instruction: must_only_contain_in_any_amount # Instruction for how to apply this filter. Options: 'must_contain_and_only_contain', 'must_at_least_contain', 'must_exclude', 'must_only_contain_in_any_amount'
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: coordinating_atoms_composition # Filters ligands by their donor atoms
    elements: CN # Stoichiometry/list of elements to apply this filter to
    instruction: must_contain_and_only_contain # Instruction for how to apply this filter. Options: 'must_contain_and_only_contain', 'must_at_least_contain', 'must_exclude', 'must_only_contain_in_any_amount'
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: number_of_atoms # Filters ligands by their total atom count.
    min: 10 # If empty, defaults to 0.
    max: 100 # If empty, defaults to infinity.
    apply_to_denticities: [1] # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: molecular_weight # Filters ligands by their molecular weight (in g/mol).
    min: # If empty, defaults to 0.
    max: 200 # If empty, defaults to infinity.
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: interatomic_distances # Filters ligands by interatomic distances (in Angstrom), but not only bonds.
    min: 0.6 # If empty, defaults to 0.
    max: # If empty, defaults to infinity.
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: planarity # The 'planarity score' is a number between 0 and 1. 1 means all ligand atoms are perfectly planar.
    min: 0.9 # If empty, defaults to 0.
    max: 1.0 # If empty, defaults to 1.0.
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  ####### Molecular Graph Filters #######
  - filter: remove_ligands_with_adjacent_coordinating_atoms # Filter out ligands with neighboring coordinating atoms
    remove_ligands_with_adjacent_coordinating_atoms: true # true or false. If false, filter will have no effect.
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: remove_ligands_with_beta_hydrogens # Filter out ligands with beta hydrogens
    remove_ligands_with_beta_hydrogens: true # true or false. If false, filter will have no effect.
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: remove_ligands_with_missing_bond_orders # Filter out ligands with missing bond orders
    remove_ligands_with_missing_bond_orders: true # true or false. If false, filter will be ignored.
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: atomic_neighbors # Filters out ligands in which a chemical element is connected to the specified neighbors
    atom: C # Chemical element of the central atom
    neighbors: H2 # List of chemical elements/stoichiometry of the neighbors
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: smarts # Filter ligands using SMARTS patterns. Recommended to be used with filter:remove_ligands_with_missing_bond_orders
    smarts: '[C&H2]' # SMARTS pattern to match. Important: use single quotes around the SMARTS pattern.
    should_contain: false # If true, the ligand must contain the SMARTS pattern to pass the filter. If false, the ligand must not contain the SMARTS pattern to pass.
    include_metal: false # If true, the ligand structure will contain a 'Cu' metal center connected to the coordinating atoms when matching the SMARTS pattern.
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: graph_IDs # Only keep ligands with specified graph IDs
    graph_IDs: [a2b7bbb6ca4ce36dc3147760335e7374, 53b7a3d91a1be6e167a3975bb7921206] # List of graph IDs to keep
  ####### Statistical CSD Filters #######
  - filter: occurrences # Filter out ligands based on the number of times they have been observed in the CSD
    min: 20 # If empty, defaults to 0.
    max: # If empty, defaults to infinity.
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
  - filter: metal_ligand_binding_history # Only keep ligands which have been observed to coordinate to these metals
    metal_ligand_binding_history: [Pd, Ni] # List of metals to keep
    apply_to_denticities: # List of denticities to apply this filter to. If empty, applies to all denticities.
You can also download this template into your current directory by running:
DARTassembler configs --path .
Tip
Every filter, except denticities and graph_IDs, has an optional parameter apply_to_denticities. This parameter allows users to apply the respective filter only to ligands with the specified denticities, which can be very useful. If this parameter is empty or omitted, the filter will be applied to all ligands.
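The effect of apply_to_denticities can be sketched in a few lines of Python. This is an illustration of the rule described above, not DART's implementation; the Ligand class and the passes_filter helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Ligand:
    """Hypothetical stand-in for a MetaLig ligand (illustration only)."""
    name: str
    denticity: int
    charge: int


def passes_filter(
    ligand: Ligand,
    condition: Callable[[Ligand], bool],
    apply_to_denticities: Optional[Sequence[int]] = None,
) -> bool:
    """Return True if the ligand passes the filter.

    With an empty or missing apply_to_denticities, the condition is checked
    for every ligand. Otherwise, ligands whose denticity is not listed pass
    without being checked.
    """
    if apply_to_denticities and ligand.denticity not in apply_to_denticities:
        return True
    return condition(ligand)


def charge_ok(ligand: Ligand) -> bool:
    """Condition of a ligand_charges filter with ligand_charges: [-1, 0, 1]."""
    return ligand.charge in (-1, 0, 1)


ligands = [
    Ligand("A", denticity=1, charge=-3),  # denticity not targeted, always passes
    Ligand("B", denticity=2, charge=-3),  # targeted and fails the charge check
    Ligand("C", denticity=2, charge=0),   # targeted and passes the charge check
]

kept = [lig.name for lig in ligands if passes_filter(lig, charge_ok, [2, 3])]
print(kept)  # ['A', 'C']
```

Leaving apply_to_denticities empty in this sketch would also remove ligand A, since the charge check would then apply to all ligands.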
Global Options
The following options specify global settings for the Ligand Filters Module. If a setting is missing, the default value is used.
- input_db_file
- Type: filepath, metalig, test_metalig
- Default: metalig
Path to the input ligand database. If empty, the entire MetaLig ligand database will be used as input.
- output_db_file
- Type: filepath
- Default: filtered_ligand_db.jsonlines
Path to where the filtered ligand database will be saved.
- output_ligands_info
- Type: true, false
- Default: true
If false, only the ligand database file will be saved. If true, a directory with info files about the database and the filtering process will be saved.
Physical Property Filters
- denticities
Keeps only ligands with denticities specified in the list.
- Options:
- denticities :
List of denticities to keep.
- Example:
This example will keep only ligands with denticity 2, 3 and 5.
- filter: denticities
  denticities: [2, 3, 5]
- ligand_charges
Keep only ligands with formal charges specified in the list.
- Options:
- ligand_charges :
List of formal charges to keep.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
For ligands with denticity of 2 or 3, this example will keep only ligands which have a formal charge of -1, 0 or 1. Ligands with denticities other than 2 or 3 will always pass.
- filter: ligand_charges
  ligand_charges: [-1, 0, 1]
  apply_to_denticities: [2, 3]
- ligand_composition
Filter ligands based on their chemical composition, e.g. C6H5 for phenyl. The filter has four different modes: depending on the value of instruction, the specified elements are used to check a different condition. This filter works exactly like the coordinating_atoms_composition filter, except that it applies to all atoms of the ligand instead of only the set of coordinating atoms.
- Options:
- elements :
Stoichiometry or list of chemical elements to apply this filter to. For example, specifying CH2N is equivalent to [C, H, H, N]. For most instructions, the atom count is irrelevant and only the specified elements are used by the filter.
- instruction :
Instruction for how to apply this filter. The following instructions are available:
- must_contain_and_only_contain: Ligands must consist of exactly these atoms in exactly this count. Use this to filter for exact stoichiometry.
- must_at_least_contain: Ligands must contain all specified elements but can also contain other elements. Atom count is ignored, only elements are important.
- must_exclude: Ligands must not contain any of the specified elements. Atom count is ignored, only elements are important.
- must_only_contain_in_any_amount: Ligands must contain no elements other than the specified ones, but need not contain all of them. Atom count is ignored, only elements are important.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This will keep only ligands with exact stoichiometry of C2H6N.
- filter: ligand_composition
  elements: C2H6N
  instruction: must_contain_and_only_contain
  apply_to_denticities:
- Example:
This will keep only ligands which contain at least the elements C, H, N and may contain other elements.
- filter: ligand_composition
  elements: CHN
  instruction: must_at_least_contain
  apply_to_denticities:
- Example:
This will keep only ligands which do not contain any C, H or N atoms.
- filter: ligand_composition
  elements: CHN
  instruction: must_exclude
  apply_to_denticities:
- Example:
This will keep only ligands which contain C, H, N or subsets of these elements (e.g. C, H or only H).
- filter: ligand_composition
  elements: CHN
  instruction: must_only_contain_in_any_amount
  apply_to_denticities:
- coordinating_atoms_composition
Filter ligands based on their donor atoms. The filter has four different modes: depending on the value of instruction, the specified elements are used to check a different condition. This filter works exactly like the ligand_composition filter, except that it applies only to the set of donor atoms instead of all atoms in the ligand.
- Options:
- elements :
Stoichiometry or list of chemical elements to apply this filter to. For example, specifying N2 is equivalent to [N, N]. For most instructions, the atom count is irrelevant and only the specified elements are used by the filter.
- instruction :
Instruction for how to apply this filter. The following instructions are available:
- must_contain_and_only_contain: Donor atoms must consist of exactly these atoms in exactly this count. Use this to filter for an exact list of donor atoms, e.g. N-N ligands.
- must_at_least_contain: Donor atoms must contain all specified elements but can also contain other elements. Atom count is ignored, only elements are important.
- must_exclude: Donor atoms must not contain any of the specified elements. Atom count is ignored, only elements are important.
- must_only_contain_in_any_amount: Donor atoms must contain no elements other than the specified ones, but need not contain all of them. Atom count is ignored, only elements are important.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This will keep only bidentate N-N donors.
- filter: coordinating_atoms_composition
  elements: N2
  instruction: must_contain_and_only_contain
  apply_to_denticities:
- Example:
This will keep only ligands which coordinate via at least one C and one N atom, such as C-N or C-N-H donors.
- filter: coordinating_atoms_composition
  elements: CN
  instruction: must_at_least_contain
  apply_to_denticities:
- Example:
This will keep only ligands which do not coordinate via any C or N atoms, such as O-O donors.
- filter: coordinating_atoms_composition
  elements: CN
  instruction: must_exclude
  apply_to_denticities:
- Example:
This will keep only ligands which coordinate only via C and N atoms or subsets of these atoms, such as C-N-N or N-N donors.
- filter: coordinating_atoms_composition
  elements: CN
  instruction: must_only_contain_in_any_amount
  apply_to_denticities:
Tip
The ligand_composition and coordinating_atoms_composition filters have four different modes depending on the instruction parameter. On first glance, these modes might seem too general to make a useful filter, but by combining the same filter multiple times with different instructions, users can achieve very specific filters.
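The four instructions can be expressed as simple set and multiset comparisons. The following Python sketch follows the descriptions above and is not DART's implementation; expand_formula and composition_passes are hypothetical helper names, and the sketch assumes elements is given as a stoichiometry string rather than a list.

```python
import re
from collections import Counter
from typing import List


def expand_formula(formula: str) -> List[str]:
    """Expand a stoichiometry such as 'CH2N' into ['C', 'H', 'H', 'N']."""
    atoms: List[str] = []
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms.extend([element] * (int(count) if count else 1))
    return atoms


def composition_passes(atoms: List[str], elements: str, instruction: str) -> bool:
    """Check a list of atoms (all atoms or only donor atoms) against one instruction."""
    wanted = expand_formula(elements)
    present, specified = set(atoms), set(wanted)
    if instruction == "must_contain_and_only_contain":
        return Counter(atoms) == Counter(wanted)  # exact elements and counts
    if instruction == "must_at_least_contain":
        return specified <= present               # all specified elements occur
    if instruction == "must_exclude":
        return not (specified & present)          # no specified element occurs
    if instruction == "must_only_contain_in_any_amount":
        return present <= specified               # no unspecified element occurs
    raise ValueError(f"Unknown instruction: {instruction}")


# Combining two instructions: donors drawn only from C and N, with at least one N.
donors = ["N", "N", "C"]
combined = (
    composition_passes(donors, "CN", "must_only_contain_in_any_amount")
    and composition_passes(donors, "N", "must_at_least_contain")
)
print(combined)  # True
```

Passing all atoms of a ligand corresponds to ligand_composition, while passing only the donor atoms corresponds to coordinating_atoms_composition.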
- number_of_atoms
Removes ligands with number of atoms outside of the specified range.
- Options:
- min :
Minimum number of atoms. If empty, will be set to 0.
- max :
Maximum number of atoms. If empty, will be treated as infinity.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This example will remove all monodentate ligands with fewer than 10 atoms or more than 100 atoms. Ligands with denticities other than 1 will always pass.
- filter: number_of_atoms
  min: 10
  max: 100
  apply_to_denticities: [1]
- molecular_weight
Only keeps ligands with molecular weight within the specified range.
- Options:
- min :
Minimum molecular weight in g/mol. If empty, will be set to 0.
- max :
Maximum molecular weight in g/mol. If empty, will be treated as infinity.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This example will keep only ligands with a maximum molecular weight of 200 g/mol.
- filter: molecular_weight
  min:
  max: 200
  apply_to_denticities:
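The handling of empty min and max values in the range-based filters (number_of_atoms, molecular_weight and others) can be sketched as follows. The helper in_range is a hypothetical name, and treating both bounds as inclusive is an assumption that the text above does not confirm.

```python
import math
from typing import Optional


def in_range(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> bool:
    """Range check where an empty bound means 0 (min) or infinity (max).

    Bounds are treated as inclusive here, which is an assumption.
    """
    low = 0.0 if min_value is None else min_value
    high = math.inf if max_value is None else max_value
    return low <= value <= high


# molecular_weight with empty min and max: 200
print(in_range(150.2, None, 200))  # True
print(in_range(250.0, None, 200))  # False
```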
- interatomic_distances
Only keeps ligands in which all interatomic distances are within the specified range. The calculated interatomic distances are not only between atoms with a bond, but between all atoms in the ligand. The maximum interatomic distance can be used as a measure for the size of a ligand, while the minimum interatomic distance can be used as a measure for how close atoms are in the ligand. Therefore, this filter is essentially a 2-in-1 filter which can be used to remove ligands which are either too big or have atoms which are too close to each other.
- Options:
- min :
Minimum interatomic distance in Angstrom. If empty, will be set to 0.
- max :
Maximum interatomic distance in Angstrom. If empty, will be treated as infinity.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This filter will remove ligands if any two atoms in the ligand are closer than 0.6 Angstrom.
- filter: interatomic_distances
  min: 0.6
  max:
  apply_to_denticities:
- Example:
This filter will remove “big” ligands which are more than 30 Angstroms long in any direction, without considering bulkiness.
- filter: interatomic_distances
  min:
  max: 30
  apply_to_denticities:
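The 2-in-1 behavior of this filter can be sketched in Python: the shortest pairwise distance is compared against min and the longest against max. This is a minimal illustration using made-up coordinates, not DART's code; the function names are hypothetical, and at least two atoms are assumed.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def distance_extremes(coords: Sequence[Coordinate]) -> Tuple[float, float]:
    """Return the shortest and longest distance over all atom pairs, bonded or not."""
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    return min(distances), max(distances)


def passes_interatomic_distances(
    coords: Sequence[Coordinate],
    min_dist: Optional[float] = None,
    max_dist: Optional[float] = None,
) -> bool:
    """Pass only if every pairwise distance lies between min_dist and max_dist."""
    low = 0.0 if min_dist is None else min_dist
    high = math.inf if max_dist is None else max_dist
    shortest, longest = distance_extremes(coords)
    return low <= shortest and longest <= high


# Three atoms with coordinates in Angstrom; two of them are only 0.5 Angstrom apart.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.5, 0.0)]
print(passes_interatomic_distances(coords, min_dist=0.6))  # False
print(passes_interatomic_distances(coords, max_dist=30))   # True
```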
- planarity
This filter uses a ‘planarity score’ to filter ligands based on how planar all their atoms are. Very planar ligands are ones in which all atoms lie in one plane, while very non-planar ligands are ones which are sphere-like. The planarity score is a number between 0 and 1, where 0 is not planar (a perfect sphere) and 1 is perfectly planar. Because this planarity score has no physical intuition behind it, it is recommended to try different values and see what works best for your application.
- Options:
- min :
Minimum planarity score. If empty, will be set to 0.
- max :
Maximum planarity score. If empty, will be set to 1.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This filter will keep only relatively planar ligands in which most atoms lie mostly in the same plane.
- filter: planarity
  min: 0.9
  max: 1
  apply_to_denticities:
Tip
There are four filters which can be used as a measure for the size and bulkiness of a ligand: number_of_atoms, molecular_weight, interatomic_distances and planarity. They all measure different aspects and can be used in combination to define the dimension of your ligands.
Molecular Graph Filters
- remove_ligands_with_adjacent_coordinating_atoms
Removes ligands that have a donor atom bonding to another donor atom, which often correlates with haptic interactions. It is recommended to always apply this filter because DART in its current version cannot assemble these ligands yet and they are filtered out during the assembly anyway.
- Options:
- remove_ligands_with_adjacent_coordinating_atoms :
If true, apply this filter. If false, this filter has no effect.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This example will remove all ligands with neighboring coordinating atoms.
- filter: remove_ligands_with_adjacent_coordinating_atoms
  remove_ligands_with_adjacent_coordinating_atoms: true
  apply_to_denticities:
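On a molecular graph, the condition checked by this filter amounts to asking whether any bond connects two donor atoms. A minimal Python sketch, assuming the ligand is available as a list of bonds between atom indices plus a set of donor atom indices (both hypothetical representations chosen for this example):

```python
from typing import Iterable, Set, Tuple


def has_adjacent_donors(bonds: Iterable[Tuple[int, int]], donor_indices: Set[int]) -> bool:
    """Return True if any bond connects two coordinating (donor) atoms."""
    return any(a in donor_indices and b in donor_indices for a, b in bonds)


# N0-C1-C2-N3 chain coordinating via both N atoms: donors are not bonded to each other.
chain_bonds = [(0, 1), (1, 2), (2, 3)]
print(has_adjacent_donors(chain_bonds, {0, 3}))  # False, ligand is kept

# C0=C1 coordinating side-on via both C atoms: donors are directly bonded.
alkene_bonds = [(0, 1)]
print(has_adjacent_donors(alkene_bonds, {0, 1}))  # True, ligand is removed
```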
- remove_ligands_with_beta_hydrogens
Removes ligands with beta hydrogen atoms, i.e. hydrogen atoms bound to donor atoms.
- Options:
- remove_ligands_with_beta_hydrogens :
If true, apply this filter. If false, this filter has no effect.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This example will remove all ligands with beta hydrogen atoms.
- filter: remove_ligands_with_beta_hydrogens
  remove_ligands_with_beta_hydrogens: true
  apply_to_denticities:
- remove_ligands_with_missing_bond_orders
Removes ligands with missing bond orders (~4% of ligands in the MetaLig). Most helpful in concert with the smarts filter, since that filter will automatically pass ligands with unknown bond orders. If you want to be sure that all passed ligands obey the SMARTS filter, it is recommended to apply this filter together with the SMARTS filter.
- Options:
- remove_ligands_with_missing_bond_orders :
If true, apply this filter. If false, this filter has no effect.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This example will remove all ligands with missing bond orders.
- filter: remove_ligands_with_missing_bond_orders
  remove_ligands_with_missing_bond_orders: true
  apply_to_denticities:
- atomic_neighbors
This filter removes all ligands in which a chemical element atom is connected to the atoms specified in neighbors. Importantly, this filter only checks if the specified atom has at least the specified neighbors, but there might be more neighbors than specified and the ligand will still be removed. For more control, use the smarts filter.
- Options:
- atom :
Chemical element of the central atom.
- neighbors :
List of chemical elements or stoichiometry. The ligand will be removed if the atom is connected to at least the specified neighbors.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This example removes all ligands in which a C is connected to 2 H atoms, plus potentially other neighbors.
- filter: atomic_neighbors
  atom: C
  neighbors: H2
  apply_to_denticities:
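The "at least the specified neighbors" rule is a multiset containment check. The Python sketch below illustrates it under the assumption that the ligand is given as a list of element symbols and an adjacency mapping between atom indices; these representations and the function name are hypothetical and not DART's API.

```python
from collections import Counter
from typing import Dict, List


def has_atom_with_neighbors(
    elements: List[str],
    adjacency: Dict[int, List[int]],
    atom: str,
    neighbors: List[str],
) -> bool:
    """Return True if any `atom` has at least the given neighbors (extra ones allowed)."""
    required = Counter(neighbors)
    for index, element in enumerate(elements):
        if element != atom:
            continue
        found = Counter(elements[j] for j in adjacency[index])
        if all(found[el] >= count for el, count in required.items()):
            return True
    return False


# A C atom bonded to three H atoms and one N atom.
elements = ["C", "H", "H", "H", "N"]
adjacency = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

# neighbors: H2 matches because the C has at least two H neighbors.
print(has_atom_with_neighbors(elements, adjacency, "C", ["H", "H"]))  # True
```

In this sketch a CH3 group matches neighbors: H2, so the ligand would be removed. The SMARTS pattern '[C&H2]', by contrast, only matches C atoms with exactly two hydrogens.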
- smarts
This filter is a very powerful tool to filter ligands based on their 2D chemical structure, including bond orders. SMARTS is a language to describe and match chemical patterns and motifs in molecules. It can be thought of as a way to search chemical motifs in SMILES strings.
The smarts filter works by first computing the SMILES string of the ligand (with or without a 'Cu' metal center, depending on the parameter include_metal) and then matching the specified SMARTS pattern against it using RDKit.
Warning
If a ligand has unknown bond orders (~4% of ligands in the MetaLig), it will automatically pass this filter. If you want to be sure that all passed ligands obey the SMARTS filter, it is recommended to apply this filter together with the filter remove_ligands_with_missing_bond_orders.
Note
SMARTS patterns are very expressive, but can be difficult to come up with. We recommend using tools like SMARTSviewer to design your SMARTS pattern. We have also had very good experiences using Large Language Models like ChatGPT. Either way, always make sure your SMARTS pattern works as intended by checking the passed and failed output ligands of the filter.
- Options:
- smarts :
SMARTS pattern to match. Please note that the SMARTS pattern must be enclosed in single or double quotes, e.g. '[C&H2]'. Otherwise it is likely that the YAML parser will throw an error.
- should_contain :
If true, the ligand must contain the SMARTS pattern to pass. If false, the ligand must not contain the SMARTS pattern to pass.
- include_metal :
If true, the ligand's coordinating atoms will be connected to a Cu metal center. The bonds between Cu and the coordinating atoms are defined as single bonds. This makes it possible to target coordinating atoms specifically in the SMARTS pattern, as opposed to other atoms. If false, the ligand will be treated as just the ligand structure without a metal center.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This example will remove all ligands in which any C atom bonds to exactly 2 H atoms.
- filter: smarts
  smarts: '[C&H2]'
  should_contain: false
  include_metal: false
  apply_to_denticities:
- graph_IDs
A filter to keep only individually specified ligands. Graph IDs are unique IDs for each ligand which can be found in all ligand .csv files, generated e.g. by the dbinfo module. Together with writing custom filters in Python, this filter is very useful for special requirements.
- Options:
- graph_IDs :
List of graph IDs of the ligands to keep.
- Example:
This example will keep only the 2 ligands with the graph IDs a2b7bbb6ca4ce36dc3147760335e7374 and 53b7a3d91a1be6e167a3975bb7921206.
- filter: graph_IDs
  graph_IDs: [a2b7bbb6ca4ce36dc3147760335e7374, 53b7a3d91a1be6e167a3975bb7921206]
Statistical CSD Filters
- occurrences
Filters ligands based on how often they were observed in the Cambridge Structural Database (CSD).
- Options:
- min :
Minimum number of occurrences. If empty, will be set to 0.
- max :
Maximum number of occurrences. If empty, will be treated as infinity.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This example will keep only ligands which have been observed in the CSD at least 20 times. This might be helpful to avoid exotic ligands and help with synthetic feasibility.
- filter: occurrences
  min: 20
  max:
  apply_to_denticities:
- metal_ligand_binding_history
Keep only ligands which have been observed in the Cambridge Structural Database to coordinate to specific metals. If a ligand has never been observed coordinating to any of the specified metals it will be filtered out.
- Options:
- metal_ligand_binding_history :
List of metals, e.g. [Pd, Ni]. Any metal from the d- or f-block can be specified.
- apply_to_denticities :
Denticity or list of denticities. This filter will be applied only to ligands with the specified denticities. If empty or omitted, will apply to all ligands.
- Example:
This filter will keep only ligands which have been observed to coordinate to Pd or Ni.
- filter: metal_ligand_binding_history
  metal_ligand_binding_history: [Pd, Ni]
  apply_to_denticities:
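The rule described for metal_ligand_binding_history is a set overlap: a ligand passes if at least one of the specified metals appears among the metals it has been observed with. A minimal Python sketch, in which the function name and the example data are hypothetical:

```python
from typing import Iterable


def passes_binding_history(observed_metals: Iterable[str], allowed_metals: Iterable[str]) -> bool:
    """Pass if the ligand has been observed with at least one of the allowed metals."""
    return bool(set(observed_metals) & set(allowed_metals))


# metal_ligand_binding_history: [Pd, Ni]
print(passes_binding_history(["Pd", "Pt"], ["Pd", "Ni"]))  # True, observed with Pd
print(passes_binding_history(["Fe", "Cu"], ["Pd", "Ni"]))  # False, never seen with Pd or Ni
```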