LigandFilters API
Example Usage
The following filters will return cis-bidentate N-O donors with up to 50 atoms that do not contain any CH2 groups, only contain C, H, N, O atoms in total, and have been observed to coordinate to Pt+2, Pt+4, Pd, or Ni metal centers. We speed up the filtering by only loading 1000 ligands from the MetaLig database.
from DARTassembler import LigandFilters

filters = LigandFilters(db='metalig', n=1000)
db = filters.run(
    filters=[
        {'filter': 'property', 'name': 'n_atoms', 'range': [1, 50]},
        {'filter': 'property', 'name': 'archetype', 'values': ['2-cis']},
        {'filter': 'smarts', 'smarts': '[C&H2]', 'should_contain': False, 'include_metal': True},
        {'filter': 'composition', 'elements': 'CHNO', 'instruction': 'must_at_least_contain', 'only_donors': False},
        {'filter': 'composition', 'elements': 'NO', 'instruction': 'must_contain_and_only_contain', 'only_donors': True},
        {'filter': 'parents', 'metal_centers': ['Pt+2', 'Pt+4', 'Pd', 'Ni']},
    ],
    outpath='filtered_ligand_db.jsonlines',
    dbinfo=True,
    metal=True,
)
- class DARTassembler.src.metalig.ligandfilters.LigandFilters(db, n=None)
Bases: BaseModule
This module applies user-defined filters to a ligand database to obtain a subset of ligands with desired properties.
Initialize the DART LigandFilters module. The options set here are applied to all batches.
Tip
All the parameters below are also available as global options in the ligandfilters .yml file.
- Parameters:
db (str | None) – .jsonlines ligand db filepath or None to use the entire MetaLig database.
n (int | None) – Maximum number of ligands to load from the database. If None, load all ligands.
- Returns:
None
- Return type:
None
- apply_filters(filters)
Apply a sequence of filters to the loaded ligand database and return the filtered database.
Each filter in the list must be a dictionary with a key ‘filter’ specifying the filter type. Supported filters:
‘property’ : filter by a named global property (e.g. ‘charge’, ‘archetype’).
‘composition’ : filter by element composition or stoichiometry (elements may be a string that is converted to an atom list).
‘parents’ : filter by parent metal centers (e.g. [‘Fe’, ‘Co’]).
‘smarts’ : filter by a SMARTS pattern; note that a bond-order validity property filter is prepended automatically.
- Parameters:
filters (list[dict]) – Ordered list of filter specification dictionaries. Each dictionary must contain at least the key ‘filter’ and other filter-specific keys.
- Raises:
ValueError – If a filter type specified in any filter dict is not recognized.
- Returns:
LigandDB object containing only the ligands that passed all filters.
- Return type:
LigandDB
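The required structure of the filter list can be checked before any ligands are loaded. The following is a minimal sketch, not part of DART: `check_filter_specs` and `SUPPORTED_FILTERS` are hypothetical names introduced here, and the check only mirrors the documented rules (each entry is a dictionary with a ‘filter’ key naming one of the four supported types). The library's own validation may differ in detail.

```python
# Filter types named in the apply_filters documentation.
SUPPORTED_FILTERS = {'property', 'composition', 'parents', 'smarts'}


def check_filter_specs(filters):
    """Hypothetical pre-check (not a DART function).

    Raises ValueError if an entry is not a dict with a 'filter' key
    or if its filter type is not one of the documented types.
    Returns the unchanged list so the call can be chained.
    """
    for i, spec in enumerate(filters):
        if not isinstance(spec, dict) or 'filter' not in spec:
            raise ValueError(f"Filter #{i} must be a dict with a 'filter' key: {spec!r}")
        if spec['filter'] not in SUPPORTED_FILTERS:
            raise ValueError(f"Filter #{i} has unrecognized type {spec['filter']!r}")
    return filters


# Example filter list using keys shown in the usage example of this page.
example_filters = [
    {'filter': 'property', 'name': 'n_atoms', 'range': [1, 50]},
    {'filter': 'parents', 'metal_centers': ['Pd', 'Ni']},
]
checked = check_filter_specs(example_filters)
```

Running such a check first surfaces typos in the filter type before the database is read, which can take a while for the full MetaLig database.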
- save_filtered_ligands_output()
Write informational outputs for filtered ligands: a summary text, a CSV overview, and concatenated XYZ files.
The method creates an ‘info’ directory next to the specified output ligand DB path and writes:
filters.txt : human-readable filter summary produced by _get_filter_tracking_string,
ligands_overview.csv : table with ligand metadata and filter assignment,
concat_*.xyz : concatenated xyz files for ‘Passed’ ligands and for each filter-specific removal group.
- Returns:
None
- Return type:
None
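To locate these files after a run, the paths can be derived from the output path. This is a small sketch under the assumption that "next to" means the ‘info’ directory sits in the same parent directory as the output ligand DB file; `expected_info_paths` is a hypothetical helper, not a DART function, and the exact names of the concatenated XYZ files are not assumed beyond the documented `concat_*.xyz` pattern.

```python
from pathlib import Path


def expected_info_paths(outpath):
    """Hypothetical helper: expected locations of the info files.

    Assumes the 'info' directory is created in the parent directory
    of the output ligand DB file.
    """
    info_dir = Path(outpath).parent / 'info'
    return {
        'info_dir': info_dir,
        'filters': info_dir / 'filters.txt',
        'overview': info_dir / 'ligands_overview.csv',
        # Glob pattern only; the individual file names depend on the filters used.
        'xyz_pattern': 'concat_*.xyz',
    }


paths = expected_info_paths('results/filtered_ligand_db.jsonlines')
# After a run, the XYZ files could be listed with:
# sorted(paths['info_dir'].glob(paths['xyz_pattern']))
```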
- classmethod run_from_yaml(input, n=None)
Create and run a LigandFilters instance from a YAML specification file.
If input is None, a default template ligandfilters.yml is used. The YAML file may contain the top-level keys ‘db’ and ‘n’ (both optional) and must contain the filter list and options required by the run(…) method.
- Parameters:
input (Union[str, Path, None]) – Path to the filter input file (.yml) or None to use a default template.
n (Union[int, None]) – Number of ligand objects to include in the output. Takes precedence over the ‘n’ value in the YAML file if provided.
- Returns:
A LigandFilters instance after executing the configured filters.
- Return type:
LigandFilters
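The precedence rule for n described above can be illustrated with a short sketch. `resolve_n` is a hypothetical helper written for this illustration and is not taken from the DART source; it only assumes that the parsed YAML file is available as a dictionary with an optional ‘n’ key.

```python
def resolve_n(n_argument, yaml_config):
    """Hypothetical helper mirroring the documented precedence.

    An explicitly passed n wins; otherwise the optional 'n' from the
    YAML configuration is used; None means no limit was specified.
    """
    if n_argument is not None:
        return n_argument
    return yaml_config.get('n')


n_from_argument = resolve_n(500, {'db': 'metalig', 'n': 1000})
n_from_yaml = resolve_n(None, {'db': 'metalig', 'n': 1000})
n_unset = resolve_n(None, {'db': 'metalig'})
```

In this sketch the explicit argument overrides the file value, so a quick test run with a small n does not require editing the YAML file.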
- classmethod run_from_cli(input=None, n=None)
Run ligand filtering using command-line style setup helpers and a YAML input.
This method wraps run_from_yaml with BaseModule CLI pre/post hooks to provide standardized CLI logging and argument printing.
- Parameters:
input (Union[str, Path, None]) – Path to the filter input file (.yml) or None to use the default template.
n (Union[int, None]) – Number of ligand objects to include in the output. Takes precedence over the ‘n’ value in the YAML file if provided.
- Returns:
A LigandFilters instance after executing the configured filters.
- Return type:
LigandFilters
- run(filters, outpath='filtered_ligand_db.jsonlines', dbinfo=True, metal=True)
Apply provided filters, save the filtered ligand database, and optionally save auxiliary info.
Tip
All the parameters below are also available via the assembler .yml file as batch options (i.e. indented in the batches: list).
The method executes the filtering pipeline, writes the filtered LigandDB to outpath (if provided), and optionally writes human-readable information (CSV, XYZ concatenations) controlled by dbinfo. It returns the filtered LigandDB.
- Parameters:
filters (list[dict]) – List of filter specification dictionaries to apply in sequence.
outpath (str | None) – Path to the output ligand database file. If None, no ligand DB file is written.
dbinfo (bool) – If True, write additional info files (CSV, concatenated XYZ) to an info directory.
metal (bool) – If True, in the concatenated XYZ files, include a pseudo metal center in the ligand structure for visualization.
- Returns:
LigandDB object containing only the ligands that passed all filters.
- Return type:
LigandDB