LigandFilters Module
LigandFilters Input
The LigandFilters Module enables users to obtain a set of ligands with well-defined properties from the entire MetaLig Database. These filters are invaluable for assembling complexes targeted to a user-defined chemical space.
There are four types of filters, which act on the ligand properties stored in MetaLig:
property: filter by a simple named property such as charge, archetype or n_haptic_groups.
composition: filter by the composition of the donor atoms or of the entire ligand.
smarts: filter by a SMARTS pattern to include or exclude ligands with specific substructures, e.g. '[N]=[N]' for azo groups.
parents: filter by known parent metal centers of the ligand, e.g. ['Pt+2', 'Pd'].
The LigandFilters module is run in the terminal by providing a single configuration file:
DARTassembler ligandfilters --input ligandfilters.yml
Copy-Paste Template:
################## Settings for the DART ligandfilters module ##################
# file: ligandfilters.yml
# Everything after '#' is a comment that is ignored by the program and only there for the user.
# See the 'LigandFilters Module' section in the DART documentation for more information.

db: 'metalig' # Input ligand database. Either path to a .jsonlines file or 'metalig' to load the full MetaLig database (see docs). Leaving this empty is equivalent to 'metalig'.
outpath: 'filtered_ligand_db.jsonlines' # Output file path for the filtered ligand database. Should end with .jsonlines.
n: 5000 # Speed up test runs by limiting the number of ligands loaded from MetaLig. Default: None (load all ligands).
dbinfo: True # Output an info directory about the filtering (see dbinfo module docs).
metal: True # Include metal atoms in the concatenated .xyz files if dbinfo is True (see dbinfo module docs).
filters:
  #============ Simple property filters ============#
  # Keep only monodentate ligands
  - filter: 'property'
    name: 'archetype'
    values: ['1-mono']

  # Keep only ligands with formal charge -2, -1 or 0
  - filter: 'property'
    name: 'charge'
    values: [-2, -1, 0]

  # Keep only ligands with molecular weight between 30.1 and 300.2 g/mol
  - filter: 'property'
    name: 'molecular_weight'
    range: [30.1, 300.2]

  # Exclude ligands which appeared fewer than 5 times in the CSD source data
  - filter: 'property'
    name: 'n_ligand_instances'
    range: [5, 999999999]

  #============ Composition filters ============#
  # Keep only ligands containing C, H, N, O atoms (in any amount)
  - filter: 'composition'
    elements: 'CHNO'
    instruction: 'must_only_contain_in_any_amount'
    only_donors: False

  # Keep only ligands with exactly 1 N donor
  - filter: 'composition'
    elements: 'N'
    instruction: 'must_contain_and_only_contain'
    only_donors: True

  #============ SMARTS substructure filters ============#
  # Keep only amide ligands
  - filter: 'smarts'
    smarts: '[N&D3X3!a](-[Hg])(-[C,Si])(-[C,Si])'
    should_contain: True
    include_metal: True

  # Exclude ligands containing methylene groups (CH2)
  - filter: 'smarts'
    smarts: '[C&H2]'
    should_contain: False
    include_metal: True

  #============ Parent CSD complex filters ============#
  # Keep only ligands which have coordinated to Pt(II), Pt(IV), Pd or Ni in at least one of their parent CSD complexes.
  - filter: 'parents'
    metal_centers: ['Pt+2', 'Pt+4', 'Pd', 'Ni']
Users can download this template into their current directory as ligandfilters.yml by running:
DARTassembler configs --outdir .
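Each entry of filters is a plain dictionary, so a configuration can be sanity-checked in Python before starting a long run. The following is a minimal sketch and not part of DARTassembler: check_filters, REQUIRED_KEYS and OPTIONAL_KEYS are hypothetical names, and the accepted keys are inferred from the template above and from the Ligand method signatures documented under LigandFilters Options. The dictionaries could, for example, come from loading ligandfilters.yml with yaml.safe_load from PyYAML.

```python
# Hypothetical pre-flight check for the `filters` list of ligandfilters.yml.
# Required/optional keys are inferred from the template and the documented
# Ligand method signatures; this is NOT DARTassembler's own validation logic.

REQUIRED_KEYS = {
    "property": {"name"},
    "composition": {"elements", "instruction"},
    "smarts": {"smarts", "should_contain"},
    "parents": {"metal_centers"},
}
OPTIONAL_KEYS = {
    "property": {"range", "values"},
    "composition": {"only_donors"},
    "smarts": {"include_metal"},
    "parents": set(),
}


def check_filters(filters: list[dict]) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for i, entry in enumerate(filters):
        ftype = entry.get("filter")
        if ftype not in REQUIRED_KEYS:
            problems.append(f"filter #{i}: unknown filter type {ftype!r}")
            continue
        keys = set(entry) - {"filter"}
        missing = REQUIRED_KEYS[ftype] - keys
        unknown = keys - REQUIRED_KEYS[ftype] - OPTIONAL_KEYS[ftype]
        if missing:
            problems.append(f"filter #{i} ({ftype}): missing {sorted(missing)}")
        if unknown:
            problems.append(f"filter #{i} ({ftype}): unexpected {sorted(unknown)}")
    return problems


# Usage: the third entry misspells the 'metal_centers' key
filters = [
    {"filter": "property", "name": "charge", "values": [-2, -1, 0]},
    {"filter": "smarts", "smarts": "[C&H2]", "should_contain": False},
    {"filter": "parents", "metals": ["Pt+2"]},
]
for problem in check_filters(filters):
    print(problem)
# filter #2 (parents): missing ['metal_centers']
# filter #2 (parents): unexpected ['metals']
```

Checking keys up front is cheap compared to discovering a typo after loading the full MetaLig database.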
LigandFilters Output
The LigandFilters module writes its result to the file given by outpath (e.g. filtered_ligand_db.jsonlines), which contains all ligands that passed all specified filters. This file can be passed to the Assembler Module via ligand_db_files, so that complexes are assembled only from ligands with exactly the desired properties.
If dbinfo is True, an additional directory info_filtered_ligand_db/ is created, which contains summary information about the filtered ligands, including:
ligands_overview.csv: A summary table of all ligands and their filtering results. The column filter indicates which filter each ligand failed. If a ligand passed all filters, this column contains 'Passed'. The passed ligands are listed at the very end of the table.
filters.txt: This is the main log file of the filtering process, recording all messages, warnings, and errors during filtering.
concat_xyz/: This folder contains one concatenated .xyz file per filter with all ligands that failed that filter. Each file is named after the filter it corresponds to. All ligands which passed all filters are stored in concat_Passed.xyz. These concatenated .xyz files can easily be browsed using the ase gui command from the ASE package:
ase gui concat_Passed.xyz
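To see at a glance how many ligands each filter removed, the filter column of ligands_overview.csv can be tallied with the Python standard library. This is a minimal sketch: it relies only on the filter column described above, assumes a comma-separated file with a header row, assumes the default output names from the template with the script run in the same directory, and the helper summarize_overview is a hypothetical name.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_overview(csv_path) -> Counter:
    """Count ligands per value of the 'filter' column of ligands_overview.csv.

    The key 'Passed' counts ligands that passed all filters; every other key
    is the name of the filter a ligand failed.
    """
    with open(csv_path, newline="") as f:
        return Counter(row["filter"] for row in csv.DictReader(f))


path = Path("info_filtered_ligand_db") / "ligands_overview.csv"
if path.exists():  # only runs where the dbinfo output actually exists
    counts = summarize_overview(path)
    total = sum(counts.values())
    for name, n in counts.most_common():
        print(f"{name:<40} {n:>7} ({100 * n / total:.1f}%)")
```

A filter that removes almost every ligand usually shows up immediately in this tally, which helps when tuning thresholds.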
LigandFilters Options
The filters are applied in the order in which they appear in the filters list. Each filter is a dictionary with a key 'filter' specifying the filter type, plus filter-specific keys. Each filter is then passed to the corresponding method of the DARTassembler.src.metalig.mol.Ligand class. Please refer to the docstrings of these four methods below for a detailed description of all available options; their arguments correspond one-to-one to the keys in the .yml configuration template above.
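Because the filters run in list order, the filter column of ligands_overview.csv can report a single failed filter per ligand. The sketch below illustrates that bookkeeping, assuming each ligand is attributed to the first filter it fails and is not tested further. It is a hypothetical illustration, not DARTassembler's implementation: apply_filters and the toy ligand dictionaries are made up, and plain Python callables stand in for the four Ligand methods.

```python
from typing import Callable

Ligand = dict  # toy stand-in for DARTassembler's Ligand object (hypothetical)


def apply_filters(
    ligands: list[Ligand],
    filters: list[tuple[str, Callable[[Ligand], bool]]],
) -> tuple[list[Ligand], dict[str, str]]:
    """Apply named filters in list order.

    Returns the ligands that passed every filter and a mapping from ligand
    name to the first filter it failed, or 'Passed'.
    """
    passed, outcome = [], {}
    for lig in ligands:
        for filter_name, test in filters:
            if not test(lig):
                outcome[lig["name"]] = filter_name  # later filters are skipped
                break
        else:  # no break: every filter returned True
            outcome[lig["name"]] = "Passed"
            passed.append(lig)
    return passed, outcome


ligands = [
    {"name": "L1", "charge": -1, "n_atoms": 12},
    {"name": "L2", "charge": -3, "n_atoms": 60},  # would fail both filters
    {"name": "L3", "charge": 0, "n_atoms": 55},
]
filters = [
    ("charge", lambda lig: lig["charge"] in (-2, -1, 0)),
    ("n_atoms", lambda lig: 10 <= lig["n_atoms"] <= 50),
]
kept, outcome = apply_filters(ligands, filters)
print(outcome)  # {'L1': 'Passed', 'L2': 'charge', 'L3': 'n_atoms'}
```

Note that L2 is only reported under charge although it would also fail n_atoms, so reordering the filters changes the per-filter counts but not the final set of passed ligands.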
Usage: To apply these filters to a database, the recommended approach is the terminal command DARTassembler ligandfilters --input ligandfilters.yml, with the provided configuration file edited as needed. The Python API below mainly serves to document the available options in one place and to support users who want to write their own filtering scripts in Python.
- DARTassembler.src.metalig.mol.Ligand.property_filter(self, name, range=None, values=None)
Test whether a certain simple property falls within a numeric range or matches one of the provided values.
- Parameters:
name (str) – The property name of one of the MetaLig properties.
range (list[tuple[float, float]] | None) – Single range tuple or list of (min, max) tuples specifying allowed numerical intervals.
values (list | None) – List of allowed exact values or None to skip value matching.
- Returns:
True if property value satisfies any provided test, otherwise False.
- Return type:
bool
- Raises:
ValueError – If the requested property is not present or a non-numeric property is tested against numeric ranges.
Examples:
To test whether a ligand has a ‘2-cis’ archetype:
ligand.property_filter(name='archetype', values=['2-cis'])
To test whether a ligand has between 1 and 5 haptic atoms:
ligand.property_filter(name='n_haptic_atoms', range=[1,5])
To test whether a ligand has between 10 and 20 atoms or between 30 and 40 atoms:
ligand.property_filter(name='n_atoms', range=[(10,20),(30,40)])
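The range/values semantics above can be mimicked in a few lines, which may help when predicting what a filter will keep. This is a hypothetical stand-in that operates on a plain value (check_property is not a DARTassembler function). It assumes that range bounds are inclusive and that a value passes if it matches the allowed values or falls in any range, as the Returns description suggests.

```python
from numbers import Real


def check_property(value, range=None, values=None) -> bool:
    """Hypothetical mimic of the documented property_filter semantics.

    range:  a single (min, max) pair or a list of (min, max) pairs;
            bounds are assumed to be inclusive. The parameter name mirrors
            the documented API and shadows the builtin only inside here.
    values: list of allowed exact values.
    Returns True if the value satisfies ANY provided test.
    """
    if values is not None and value in values:
        return True
    if range is not None:
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"Cannot test non-numeric value {value!r} against a range.")
        # Normalise [lo, hi] into [(lo, hi)] so both documented forms work.
        intervals = [range] if isinstance(range[0], Real) else range
        return any(lo <= value <= hi for lo, hi in intervals)
    return False


print(check_property("2-cis", values=["2-cis"]))       # True
print(check_property(3, range=[1, 5]))                 # True
print(check_property(25, range=[(10, 20), (30, 40)]))  # False
```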
- DARTassembler.src.metalig.mol.Ligand.composition_filter(self, elements, instruction, only_donors=False)
Test whether a ligand satisfies a specified chemical composition criterion.
Depending on the value of instruction, this method applies one of several composition-based filters using the provided elements. If only_donors is True, only the donor atoms are considered; otherwise all atoms in the ligand are included.
- Parameters:
elements (str | list[str]) – Stoichiometry (e.g., 'H2O') or list of element symbols (e.g., ['H', 'H', 'O']) used to define the expected atomic composition.
instruction (str) –
Specifies the composition rule to apply. Supported options are:
must_contain_and_only_contain – The ligand must consist of exactly these atoms in exactly these counts. Use this to match an exact stoichiometry (e.g., C6H6 for benzene).
must_at_least_contain – The ligand must contain all specified elements, but may also include others.
must_exclude – The ligand must not contain any of the specified elements.
must_only_contain_in_any_amount – The ligand may contain any count, including zero, of the specified elements, but must not contain any other elements.
Tip
At first glance, these instructions might seem too general to make useful filters, but by combining the same filter several times with different instructions, users can build very specific filters.
only_donors (bool) – If True, evaluate the composition using only donor atoms; otherwise all atoms in the ligand are considered.
- Returns:
True if the ligand satisfies the specified composition criterion, otherwise False.
- Return type:
bool
- Raises:
ValueError – If an unrecognized instruction string is provided.
Examples:
To select tridentate ligands whose donors are exactly two N and one C:
ligand.composition_filter(elements='CN2', instruction='must_contain_and_only_contain', only_donors=True)
To select ligands whose donors may contain only C and N atoms (or zero of either):
ligand.composition_filter(elements='CN', instruction='must_only_contain_in_any_amount', only_donors=True)
To select ligands that contain zero or more C, N, and H atoms, but no other elements:
ligand.composition_filter(elements='CNH', instruction='must_only_contain_in_any_amount', only_donors=False)
To exclude ligands containing sulfur atoms:
ligand.composition_filter(elements='S', instruction='must_exclude', only_donors=False)
To select ligands that contain at least one O atom:
ligand.composition_filter(elements='O', instruction='must_at_least_contain', only_donors=False)
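To make the four instructions concrete, and to illustrate the Tip about combining them, here is a hypothetical stand-in that works on a plain list of element symbols (for example, only the donor atoms when only_donors is True). The names parse_elements, check_composition and keep are made up for this sketch. It assumes that must_at_least_contain checks only for the presence of each element and ignores counts, which the description above does not settle.

```python
import re
from collections import Counter


def parse_elements(elements) -> Counter:
    """'CN2' -> Counter({'N': 2, 'C': 1}); a list like ['H', 'H', 'O'] is counted as-is."""
    if isinstance(elements, str):
        counts = Counter()
        for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", elements):
            counts[symbol] += int(n) if n else 1
        return counts
    return Counter(elements)


def check_composition(atoms: list[str], elements, instruction: str) -> bool:
    """Hypothetical mimic of the documented composition_filter instructions."""
    have, want = Counter(atoms), parse_elements(elements)
    if instruction == "must_contain_and_only_contain":
        return have == want  # exact stoichiometry
    if instruction == "must_at_least_contain":
        return set(want) <= set(have)  # assumption: presence only, counts ignored
    if instruction == "must_exclude":
        return not set(want) & set(have)
    if instruction == "must_only_contain_in_any_amount":
        return set(have) <= set(want)
    raise ValueError(f"Unknown instruction: {instruction!r}")


# Combining two instructions: only C/H/N/O atoms AND at least one O atom
def keep(atoms: list[str]) -> bool:
    return (check_composition(atoms, "CHNO", "must_only_contain_in_any_amount")
            and check_composition(atoms, "O", "must_at_least_contain"))


print(keep(["C", "H", "H", "O"]))  # True
print(keep(["C", "H", "H", "N"]))  # False (no O)
print(keep(["C", "H", "S", "O"]))  # False (contains S)
```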
- DARTassembler.src.metalig.mol.Ligand.smarts_filter(self, smarts, should_contain, include_metal=None)
Test whether the ligand matches (or not) a SMARTS substructure pattern.
If include_metal is True, a pseudo metal center (Hg) is added to the ligand SMILES string before testing. The Hg atom is connected to all donor atoms via single bonds, which allows SMARTS patterns to specifically target donor atoms.
- Parameters:
smarts (str) – SMARTS pattern string to search for.
should_contain (bool) – If True, ligand must contain the pattern; if False, ligand must not contain it.
include_metal (bool | None) – If True, include the pseudo Hg metal center in the SMILES generation (default).
- Returns:
True if the ligand satisfies the SMARTS pattern condition, otherwise False.
- Return type:
bool
Tip
SMARTS patterns are very useful for defining chemical substructures, but they can be difficult to write. AI assistants can help, and you can verify the correctness of your SMARTS patterns using tools such as the online SMARTS tester.
Examples:
To include only amide ligands where the N donor is coordinated to the metal and two C or Si atoms:
ligand.smarts_filter(smarts='[N&D3X3!a](-[Hg])(-[C,Si])(-[C,Si])', should_contain=True, include_metal=True)
To exclude ligands that contain at least one azo (N=N) group:
ligand.smarts_filter(smarts='[N]=[N]', should_contain=False)
- DARTassembler.src.metalig.mol.Ligand.parents_filter(self, metal_centers)
Test whether the ligand has been observed in the CSD source complexes with any of the specified parent metal centers.
While this filter does not directly test chemical properties, it is useful for maximizing the compatibility of the ligand with new metal centers by selecting only ligands that have previously coordinated to similar metals.
- Parameters:
metal_centers (list[str]) – List of metal center strings to check. Each string must be either a valid element symbol or an element symbol followed by an oxidation state.
- Returns:
True if any provided metal_center string matches recorded parent metals or oxidation states.
- Return type:
bool
Examples:
To filter for ligands that have been observed coordinating to either Fe or Cu(II):
ligand.parents_filter(metal_centers=['Fe', 'Cu+2'])
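A hypothetical stand-in showing how the two accepted string forms could be interpreted: a bare element symbol such as 'Fe' is assumed to match that metal in any oxidation state, while 'Cu+2' additionally requires oxidation state +2. The record format (a list of (element, oxidation state) pairs) and the names parse_metal_center and check_parents are assumptions made for this sketch only and do not describe how MetaLig stores parent metals.

```python
import re
from typing import Optional

# Element symbol, optionally followed by a signed oxidation state (e.g. 'Cu+2')
_METAL_RE = re.compile(r"([A-Z][a-z]?)([+-]\d+)?")


def parse_metal_center(text: str) -> tuple[str, Optional[int]]:
    """'Cu+2' -> ('Cu', 2); 'Fe' -> ('Fe', None)."""
    match = _METAL_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"Not a metal center string: {text!r}")
    symbol, ox = match.groups()
    return symbol, (int(ox) if ox else None)


def check_parents(parents: list[tuple[str, Optional[int]]], metal_centers: list[str]) -> bool:
    """True if any requested metal center matches any recorded parent metal.

    A bare symbol matches every oxidation state of that element (assumption).
    """
    for center in metal_centers:
        symbol, ox = parse_metal_center(center)
        for p_symbol, p_ox in parents:
            if p_symbol == symbol and (ox is None or p_ox == ox):
                return True
    return False


# Hypothetical parent record of one ligand: seen with Cu(I) and Pd(II)
parents = [("Cu", 1), ("Pd", 2)]
print(check_parents(parents, ["Fe", "Cu+2"]))  # False: no Fe, and Cu only as +1
print(check_parents(parents, ["Pd"]))          # True: any oxidation state of Pd
```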