Ligands API

The MetaLig database contains ligands, each of which is represented by an instance of the Ligand class.

class DARTassembler.src.metalig.mol.BaseMolecule(graph, atomic_props, global_props=None, validity_check=True, node_label='node_label', bond_label='bond_type')[source]

Bases: object

Initialize a BaseMolecule instance, which represents a molecule with a molecular graph (atomic connectivity) and atomic properties such as 3D coordinates.

Parameters:
  • graph (nx.Graph) – NetworkX Graph describing connectivity. Nodes should correspond to atomic_props ordering.

  • atomic_props (dict | ase.Atoms) –

    ASE Atoms object or atomic properties dictionary. Example format:

    {
        'atoms': ['C','H','O',...],
        'x': [x1, x2, ...],
        'y': [y1, y2, ...],
        'z': [z1, z2, ...],
        ... (optional per-atom arrays)
    }
    

  • global_props (dict | None) –

    Optional flat dictionary of global properties (charge, stoichiometry, etc.). Example format:

    {
        'charge': -1,
        'stoichiometry': 'C6H6O',
        ...
    }
    

  • validity_check (bool) – If True, perform internal consistency checks comparing graph and coordinates.

  • node_label (str) – Node attribute key used for element symbols in graphs.

  • bond_label (str) – Edge attribute key used for bond type information in graphs.

Raises:

AssertionError – If graph and atomic_props are inconsistent when validity_check is True.
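To illustrate the input format described above, the following minimal sketch builds an atomic_props dictionary for water and runs a simple length and label check. The helper check_consistency and the plain dictionary standing in for the graph's node labels are hypothetical and are not part of DARTassembler; the checks actually performed when validity_check is True may differ.

```python
# Hypothetical sketch, not DARTassembler code.
atomic_props = {
    'atoms': ['O', 'H', 'H'],
    'x': [0.000, 0.757, -0.757],
    'y': [0.000, 0.586, 0.586],
    'z': [0.000, 0.000, 0.000],
}

# Stand-in for the 'node_label' attribute of each graph node, keyed by node index.
node_labels = {0: 'O', 1: 'H', 2: 'H'}


def check_consistency(atomic_props, node_labels):
    """Raise AssertionError if per-atom arrays or node labels disagree."""
    n_atoms = len(atomic_props['atoms'])
    # Every per-atom array must have one entry per atom.
    for key, values in atomic_props.items():
        assert len(values) == n_atoms, f"'{key}' has {len(values)} entries, expected {n_atoms}"
    # Node labels, read in node-index order, must match the atom ordering.
    labels_in_order = [node_labels[i] for i in sorted(node_labels)]
    assert labels_in_order == atomic_props['atoms'], 'node labels do not match atom order'
    return True


is_consistent = check_consistency(atomic_props, node_labels)
```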

property n_atoms[source]

Return the number of atoms in the molecule.

Returns:

Atom count.

Return type:

int

property n_hydrogens[source]

Return the number of hydrogen atoms in the molecule.

Returns:

Count of hydrogen atoms.

Return type:

int

property n_protons[source]

Return the total number of protons (sum of atomic numbers) present in the molecule.

Returns:

Sum of atomic numbers across all atoms.

Return type:

int

property n_bonds[source]

Return the number of bonds as given by the molecular graph.

Returns:

Number of edges in the molecular graph.

Return type:

int

property has_bond_order_attribute[source]

Indicate whether graph edges expose the configured bond order attribute.

Returns:

True when every edge contains the configured bond_label.

Return type:

bool

property has_unknown_bond_orders[source]

Indicate whether the graph contains unknown or placeholder RDKit bond orders.

If the bond order attribute is missing, the property returns True (unknown).

Returns:

True when any edge has an unknown bond order or bond orders are missing/invalid.

Return type:

bool

property has_all_bond_orders_valid[source]

Indicate whether all bond orders in the graph are present and valid.

Returns:

True if bond orders exist and none are unknown.

Return type:

bool

property graph_hash[source]

Compute the Weisfeiler–Lehman graph hash for the molecular graph (element labels only).

Returns:

String fingerprint of the graph topology.

Return type:

str

property heavy_atoms_graph_hash[source]

Compute the WL graph hash after removing hydrogen atoms.

Returns:

Hash string for heavy-atom-only graph.

Return type:

str

property bond_order_graph_hash[source]

Compute a graph hash that includes bond order information when available.

Returns:

Hash string including bond orders, or None if bond orders are missing/invalid.

Return type:

str | None

property stoichiometry[source]

Return the canonical stoichiometry string for the molecule (e.g. ‘C6H6’).

Returns:

Standardized stoichiometry string.

Return type:

str

get_reindexed_graph()[source]

Return a reindexed copy of the internal molecular graph (0..n-1 node indices).

Returns:

NetworkX Graph with contiguous node indices.

Return type:

nx.Graph

get_ase_atoms(remove_elements=None, add_atoms=None)[source]

Produce an ASE Atoms object from internal atomic_props with optional filtering and additions.

Parameters:
  • remove_elements (list[str] | None) – List of element symbols to remove from the returned ASE Atoms (e.g. [‘H’]).

  • add_atoms (list[ tuple[str, tuple[float, float, float]] ] | None) – List of tuples (element_symbol, (x, y, z)) specifying atoms to append.

Returns:

ASE Atoms object representing the molecule with requested modifications.

Return type:

ase.Atoms

get_smiles()[source]

Return a SMILES string of the molecular graph if bond orders are available.

Returns:

SMILES string or None when bond orders are unknown/invalid.

Return type:

str | None

count_C_H_bonds()[source]

Count the number of C–H bonds in the molecular graph.

Iterates over edges and counts edges between ‘C’ and ‘H’.

Returns:

Number of C–H bonds.

Return type:

int

count_bond_types(bond_types)[source]

Count occurrences of specified RDKit bond-type codes in the molecular graph.

Parameters:

bond_types (list[int]) – Iterable of RDKit bond-type integers to count (e.g. aromatic, single, double codes).

Returns:

Number of edges whose bond_type attribute matches any entry in bond_types.

Return type:

int

Raises:

ValueError – If the graph lacks the configured bond order attribute.

get_graph_fragments()[source]

Return connected component indices and corresponding element-lists for each fragment.

Returns:

Tuple (fragment_indices, fragment_elements) where each entry corresponds to a disconnected component.

Return type:

tuple[list, list]
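The return structure can be pictured with a small standalone sketch. It finds connected components with a breadth-first search over a plain edge list; the function graph_fragments is hypothetical and only mirrors the documented (fragment_indices, fragment_elements) shape, not the library's implementation, which operates on a NetworkX graph.

```python
from collections import deque


def graph_fragments(elements, edges):
    """Return (fragment_indices, fragment_elements) for an undirected graph.

    elements: list of element symbols, indexed by node.
    edges: iterable of (i, j) node-index pairs.
    """
    neighbours = {i: set() for i in range(len(elements))}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    seen = set()
    fragment_indices = []
    for start in range(len(elements)):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        queue = deque([start])
        seen.add(start)
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nb in sorted(neighbours[node]):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        fragment_indices.append(sorted(component))

    fragment_elements = [[elements[i] for i in frag] for frag in fragment_indices]
    return fragment_indices, fragment_elements


# Water plus a separate chloride ion: two disconnected fragments.
indices, symbols = graph_fragments(['O', 'H', 'H', 'Cl'], [(0, 1), (0, 2)])
```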

view_3D()[source]

Open an ASE-based 3D viewer for the current ASE Atoms representation.

Returns:

None

Return type:

None

view_graph(node_size=150)[source]

Visualize the molecular graph using the project helper.

Parameters:

node_size (int) – Display size for graph nodes in the viewer.

Returns:

None

Return type:

None

get_coordinates()[source]

Return the atomic coordinates as a list of [x, y, z] triples.

Returns:

Coordinates in format [[x1, y1, z1], [x2, y2, z2], …].

Return type:

list[list[float]]

get_interatomic_distances(skip_elements=None)[source]

Compute pairwise interatomic distances and return minimum, maximum and full distance matrix.

Parameters:

skip_elements (str | list[str] | None) – Element symbol or list of symbols to exclude from distance calculations.

Returns:

Tuple (min_distance, max_distance, distances_matrix).

Return type:

tuple[ float, float, np.ndarray ]
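As a rough illustration of the returned triple, the sketch below computes the same three quantities in plain Python. The function interatomic_distances is hypothetical; it returns nested lists rather than an np.ndarray, and it assumes that at least two atoms remain after skipping.

```python
import math
from itertools import combinations


def interatomic_distances(elements, coordinates, skip_elements=None):
    """Return (min_distance, max_distance, matrix) over the kept atoms."""
    # Accept a single symbol, a list of symbols, or None.
    if skip_elements is None:
        skip = set()
    elif isinstance(skip_elements, str):
        skip = {skip_elements}
    else:
        skip = set(skip_elements)

    kept = [xyz for el, xyz in zip(elements, coordinates) if el not in skip]
    # Full symmetric matrix, with zeros on the diagonal.
    matrix = [[math.dist(a, b) for b in kept] for a in kept]
    # Each unordered pair once, so the zero self-distances are excluded.
    pair_distances = [math.dist(a, b) for a, b in combinations(kept, 2)]
    return min(pair_distances), max(pair_distances), matrix


elements = ['C', 'O', 'H']
coordinates = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
d_min, d_max, d_matrix = interatomic_distances(elements, coordinates)
d_min_heavy, d_max_heavy, _ = interatomic_distances(elements, coordinates, skip_elements='H')
```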

get_all_interatomic_distances_flat()[source]

Return unique pairwise interatomic distances as a flattened list (upper triangle).

Returns:

List of pairwise distances (each pair reported once).

Return type:

list[float]

get_xyz_string(comment='')[source]

Produce a string in XYZ file format for the molecule.

Parameters:

comment (str) – Optional single-line comment to include in the XYZ header.

Returns:

Multiline string conforming to the XYZ format.

Return type:

str
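For readers unfamiliar with the XYZ layout, the sketch below assembles such a string by hand: the atom count on the first line, the comment on the second, then one line per atom. The function xyz_string is hypothetical, and the number formatting (six decimals, single spaces) is an assumption rather than the library's exact output.

```python
def xyz_string(elements, coordinates, comment=''):
    """Build an XYZ-format string: atom count, comment line, then one atom per line."""
    lines = [str(len(elements)), comment]
    for el, (x, y, z) in zip(elements, coordinates):
        lines.append(f'{el} {x:.6f} {y:.6f} {z:.6f}')
    return '\n'.join(lines) + '\n'


water_xyz = xyz_string(
    ['O', 'H', 'H'],
    [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)],
    comment='water',
)
```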

to_dict(include_graph=True)[source]

Serialize the molecule to a dictionary suitable for JSON output.

Parameters:

include_graph (bool) – If True, include graph representation with node labels.

Returns:

Dictionary with keys ‘atomic_props’, ‘global_props’ and optionally ‘graph’.

Return type:

dict

class DARTassembler.src.metalig.mol.Ligand(atomic_props, donor_idc, graph, unique_name, charge, global_props=None, ligand_instances=None, hapdent_idc=None, geometric_isomers_hapdent_idc=None, validity_check=False)[source]

Bases: BaseMolecule

Initialize a Ligand instance representing a metal-coordinating ligand from the MetaLig database.

The constructor accepts atomic and graph data plus metadata describing donor indices, known parent complex instances and optional precomputed hapdent (denticity/hapticity) tuples.

Parameters:
  • atomic_props (dict | ase.Atoms) – ASE Atoms object or atomic properties dictionary with keys ‘atoms’, ‘x’, ‘y’, ‘z’, etc.

  • donor_idc (list[int]) – List of atomic indices that act as donor atoms coordinating to a metal.

  • graph (nx.Graph) – NetworkX Graph describing connectivity (node labels must match atomic_props[‘atoms’] ordering).

  • unique_name (str) – Unique identifier string for the ligand (database key).

  • charge (int | float) – Formal charge of the ligand. May be np.nan when unknown.

  • global_props (dict | None) – Optional dictionary of global properties.

  • ligand_instances (dict | None) –

    Optional dictionary describing parent complex instances. Example format:

    {
        'ligand_name': ['lig1','lig2',...],
        'parent_complex_id': ['CSD1','CSD2',...],
        'parent_complex_charge': [0, +1, ...],
        'parent_metal': ['Cu','Fe',...],
        'parent_metal_os': [2, 2, ...],
    }
    

  • hapdent_idc (tuple | None) – Optional denticity/hapticity tuple structure (haptic groups as sub-tuples).

  • geometric_isomers_hapdent_idc (list[ tuple[int | tuple[int]] ] | None) – Optional list of hapdent tuples for geometric isomers.

  • validity_check (bool) – If True, run consistency checks after initialization.

Raises:

ValueError – If ligand_instances lacks required keys.

property has_confident_charge[source]

Return whether the stored ligand charge is flagged as confident.

If the flag is present in global_props, that value is returned; otherwise True is assumed.

Returns:

Boolean indicator whether the charge is considered confident.

Return type:

bool

property n_ligand_instances[source]

Return the number of recorded parent complex instances for this ligand.

Returns:

Number of parent occurrences recorded in ligand_instances.

Return type:

int

property donor_metal_planarity[source]

Return the planarity score of donor atoms including the original metal center.

Returns:

Planarity (0..1) of donor atoms with the metal included.

Return type:

float

property donor_planarity[source]

Return the planarity score of donor atoms alone, excluding the metal.

Returns:

Planarity (0..1) of donor atoms.

Return type:

float

property planarity[source]

Return the overall planarity score of the ligand.

Returns:

Planarity (0..1) of the ligand.

Return type:

float

property n_donors[source]

Return the number of donor atoms defined for this ligand.

Returns:

Integer number of donor atoms.

Return type:

int

property donor_elements[source]

Return the element symbols of the donor atoms in ligand order.

Returns:

List of element symbols for donor atoms (length n_donors).

Return type:

list[str]

property donor_positions[source]

Return the 3D coordinates of donor atoms as an (n_donors, 3) array.

Returns:

numpy array of donor coordinates.

Return type:

np.ndarray

property parent_complex_id[source]

Return the identifier of the first parent complex instance for this ligand.

Returns:

Parent complex ID string.

Return type:

str

property parent_metal_position[source]

Return the 3D coordinates of the parent metal center observed in the source complex.

Returns:

3‑vector coordinates of the parent metal.

Return type:

np.ndarray

property parent_metal[source]

Return the chemical symbol of the parent metal observed for this ligand.

Returns:

Element symbol (e.g. ‘Cu’, ‘Fe’).

Return type:

str

property hapdent_idc[source]

Return the donor index tuple structure with haptic groups as sub-tuples.

Returns:

Tuple combining integer donor indices and sub-tuples for haptic groups.

Return type:

tuple

property n_eff_denticities[source]

Return the effective denticity counting each haptic group as a single donor.

Returns:

Effective denticity integer.

Return type:

int

property n_denticities[source]

Return the classical denticity (count of integer donor entries, ignoring haptic groups).

Returns:

Integer denticity count.

Return type:

int

property n_haptic_atoms[source]

Return the total number of atoms that participate in haptic coordination.

Returns:

Integer count of atoms that are part of haptic groups.

Return type:

int

property n_haptic_groups[source]

Return the number of haptic groups (hapticity units) present in the ligand.

Returns:

Integer number of haptic groups.

Return type:

int
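The four counts above can be read directly off a hapdent tuple. The sketch below derives them from the property descriptions for a hypothetical ligand with one ordinary donor atom and one five-membered haptic group; it illustrates the definitions and is not the library's code.

```python
# Hypothetical hapdent tuple: donor atom 0 plus one haptic group of five atoms.
hapdent_idc = (0, (3, 4, 5, 6, 7))

haptic_groups = [entry for entry in hapdent_idc if isinstance(entry, tuple)]

# Classical denticity: integer entries only.
n_denticities = sum(1 for entry in hapdent_idc if isinstance(entry, int))
# Number of haptic groups (sub-tuples).
n_haptic_groups = len(haptic_groups)
# Total number of atoms inside haptic groups.
n_haptic_atoms = sum(len(group) for group in haptic_groups)
# Effective denticity: every top-level entry counts once.
n_eff_denticities = len(hapdent_idc)
```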

property archetype[source]

Return the best-matching ligand archetype string (e.g. ‘2-cis’).

Returns:

Archetype identifier.

Return type:

str

property geometric_isomers_hapdent_idc[source]

Return hapdent tuples for each recognised geometric isomer of the ligand.

Returns:

List of hapdent tuples (each tuple contains int or sub-tuple entries).

Return type:

list

property archetype_rssd[source]

Return the root-sum-of-squared-differences (RSSD) metric of archetype assignment.

Lower values indicate a closer match to the archetype's ideal geometry.

Returns:

RSSD value.

Return type:

float

property archetype_confidence[source]

Return a confidence metric for the archetype assignment.

Higher values indicate greater confidence in the assigned archetype.

Returns:

Archetype confidence.

Return type:

float

property min_interatomic_distance[source]

Return the minimum interatomic distance observed in the ligand (Å).

Returns:

Minimum positive pairwise distance.

Return type:

float

property max_ligand_extension[source]

Return the maximal interatomic distance (molecular extension) in the ligand (Å).

Returns:

Maximum pairwise distance.

Return type:

float

property smiles[source]

Return the ligand SMILES string if bond orders are present.

Returns:

SMILES string or None.

Return type:

str | None

property metal_counts[source]

Return a dict mapping observed parent metal symbols to their counts.

Returns:

Ordered dict-like mapping ‘Element’ -> count observed in parent complexes.

Return type:

dict

property metal_os_counts[source]

Return parent metal oxidation-state counts as formatted strings.

Returns:

Mapping like ‘Fe+2’ -> count.

Return type:

dict
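The two count properties can be illustrated with collections.Counter applied to a ligand_instances dictionary in the format shown for the constructor. This is a sketch with made-up data; the '+d' formatting of the oxidation state reproduces the documented 'Fe+2' style, but how the library formats zero or negative oxidation states is an assumption.

```python
from collections import Counter

# Made-up example data in the documented ligand_instances format.
ligand_instances = {
    'parent_metal': ['Cu', 'Fe', 'Fe'],
    'parent_metal_os': [2, 2, 3],
}

# Element symbol -> number of parent complexes.
metal_counts = dict(Counter(ligand_instances['parent_metal']))

# 'Fe+2'-style key -> number of parent complexes.
metal_os_counts = dict(Counter(
    f'{metal}{os:+d}'
    for metal, os in zip(ligand_instances['parent_metal'],
                         ligand_instances['parent_metal_os'])
))
```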

property smiles_with_metal[source]

Return a SMILES string of the ligand with a pseudo-metal appended (Hg) for visualization/analysis.

Returns:

SMILES string including a pseudo-metal or None.

Return type:

str | None

property is_2D_symmetrical[source]

Return whether the ligand graph is topologically symmetrical between any pair of donor atoms.

This is a 2D graph symmetry check and does not guarantee 3D symmetry.

Returns:

True if any donor-pair symmetry is detected, False otherwise.

Return type:

bool

property graph_hash_with_metal[source]

Return the WL graph hash of the ligand graph after connecting a pseudo-metal.

Returns:

Hash string of ligand+metal graph.

Return type:

str

property heavy_atoms_graph_hash_with_metal[source]

Return the heavy-atom-only graph hash for the ligand with a pseudo-metal attached.

Returns:

Hash string for heavy-atom-only ligand+metal graph.

Return type:

str

property n_beta_hydrogens[source]

Return the count of beta-hydrogen atoms (two bonds from coordinating atom, excluding alpha-H).

Returns:

Integer number of beta hydrogens.

Return type:

int

property is_haptic[source]

Return whether the ligand contains any haptic coordination atoms.

Returns:

True when haptic atoms/groups exist, False otherwise.

Return type:

bool

get_smiles(with_metal=None)[source]

Return the SMILES string for the ligand, optionally including a specified metal node.

Parameters:

with_metal (str | None) – Element symbol of the metal to attach to the ligand graph. If None, do not attach a metal.

Returns:

SMILES string or None if bond orders are unknown.

Return type:

str | None

Raises:

ValueError – If with_metal is provided and is not a metal element.

get_ase_atoms_with_metal(metal=None)[source]

Return an ASE Atoms object of the ligand with the parent metal placed at the recorded metal position.

Parameters:

metal (str | None) – Element symbol of the metal to add; if None the default parent metal from metadata is used.

Returns:

ASE Atoms with the metal atom appended at parent_metal_position.

Return type:

ase.Atoms

property_filter(name, range=None, values=None)[source]

Test whether a certain simple property falls within a numeric range or matches one of the provided values.

Parameters:
  • name (str) – The property name of one of the MetaLig properties.

  • range (list[ tuple[float, float] ] | None) – Single range tuple or list of (min, max) tuples specifying allowed numerical intervals.

  • values (list | None) – List of allowed exact values or None to skip value matching.

Returns:

True if property value satisfies any provided test, otherwise False.

Return type:

bool

Raises:

ValueError – If the requested property is not present or a non-numeric property is tested against numeric ranges.
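The test described above can be pictured with the following minimal sketch. The function value_passes is hypothetical; it uses the name ranges instead of range to avoid shadowing the Python builtin, and it assumes inclusive bounds, which the documentation does not state explicitly.

```python
def value_passes(value, ranges=None, values=None):
    """Return True if value lies in any (min, max) interval or equals any allowed value."""
    if ranges is not None:
        # Accept a single (min, max) pair as well as a list of pairs.
        intervals = ranges if isinstance(ranges[0], (list, tuple)) else [ranges]
        if any(low <= value <= high for low, high in intervals):
            return True
    if values is not None and value in values:
        return True
    return False


in_single_range = value_passes(3, ranges=[1, 5])
in_any_range = value_passes(25, ranges=[(10, 20), (30, 40)])
matches_value = value_passes('2-cis', values=['2-cis'])
```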

Examples :

To test whether a ligand has a ‘2-cis’ archetype:

ligand.property_filter(name='archetype', values=['2-cis'])

To test whether a ligand has between 1 and 5 haptic atoms:

ligand.property_filter(name='n_haptic_atoms', range=[1,5])

To test whether a ligand has between 10 and 20 atoms or between 30 and 40 atoms:

ligand.property_filter(name='n_atoms', range=[(10,20),(30,40)])

composition_filter(elements, instruction, only_donors=False)[source]

Test whether a ligand satisfies a specified chemical composition criterion.

Depending on the value of instruction, this method applies one of several composition-based filters using the provided elements. If only_donors is True, only the donor atoms are considered; otherwise all atoms in the ligand are included.

Parameters:
  • elements (str | list[str]) – Stoichiometry (e.g., 'H2O') or list of element symbols (e.g., ['H', 'H', 'O']) used to define the expected atomic composition.

  • instruction (str) –

    Specifies the composition rule to apply. Supported options are:

    • must_contain_and_only_contain – The ligand must consist of exactly these atoms in exactly this count. Use this to match an exact stoichiometry (e.g., C6H6 for benzene).

    • must_at_least_contain – The ligand must contain all specified elements, but may also include others.

    • must_exclude – The ligand must not contain any of the specified elements.

    • must_only_contain_in_any_amount – The ligand may contain any count, including zero, of the specified elements, but must not contain any other elements.

    Tip

    At first glance, these instructions might seem too general to make a useful filter, but by applying the same filter several times with different instructions, users can build very specific filters.

  • only_donors (bool) – If True, evaluate the composition using only donor atoms; otherwise all atoms in the ligand are considered.

Returns:

True if the ligand satisfies the specified composition criterion.

Return type:

bool

Raises:

ValueError – If an unrecognized instruction string is provided.
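One way to read the four instructions is as comparisons between element counts or element sets. The sketch below is a hypothetical reimplementation for illustration only, not the library's code. In particular, it assumes that must_at_least_contain checks only the presence of each element and ignores counts, which the description above leaves open.

```python
import re
from collections import Counter


def parse_elements(elements):
    """Turn 'CN2' or ['C', 'N', 'N'] into a Counter of element symbols."""
    if isinstance(elements, str):
        counts = Counter()
        for symbol, number in re.findall(r'([A-Z][a-z]?)(\d*)', elements):
            counts[symbol] += int(number) if number else 1
        return counts
    return Counter(elements)


def composition_matches(ligand_atoms, elements, instruction):
    """Apply one composition rule to a list of element symbols."""
    ligand = Counter(ligand_atoms)
    wanted = parse_elements(elements)
    if instruction == 'must_contain_and_only_contain':
        return ligand == wanted                  # exact stoichiometry
    if instruction == 'must_at_least_contain':
        return set(wanted) <= set(ligand)        # assumption: presence only
    if instruction == 'must_exclude':
        return not (set(wanted) & set(ligand))   # no overlap allowed
    if instruction == 'must_only_contain_in_any_amount':
        return set(ligand) <= set(wanted)        # no foreign elements
    raise ValueError(f'Unknown instruction: {instruction}')


# Donor atoms of a hypothetical tridentate ligand.
donors = ['N', 'N', 'C']
exact = composition_matches(donors, 'CN2', 'must_contain_and_only_contain')
only_c_n = composition_matches(donors, 'CN', 'must_only_contain_in_any_amount')
```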

Examples :

To select tridentate ligands whose donors are exactly two N and one C:

ligand.composition_filter(
    elements='CN2',
    instruction='must_contain_and_only_contain',
    only_donors=True
)

To select ligands whose donors may contain only C and N atoms (or zero of either):

ligand.composition_filter(
    elements='CN',
    instruction='must_only_contain_in_any_amount',
    only_donors=True
)

To select ligands that contain zero or more C, N, and H atoms, but no other elements:

ligand.composition_filter(
    elements='CNH',
    instruction='must_only_contain_in_any_amount',
    only_donors=False
)

To exclude ligands containing sulfur atoms:

ligand.composition_filter(
    elements='S',
    instruction='must_exclude',
    only_donors=False
)

To select ligands that contain at least one O atom:

ligand.composition_filter(
    elements='O',
    instruction='must_at_least_contain',
    only_donors=False
)

parents_filter(metal_centers)[source]

Test whether the ligand has been observed in the CSD source complexes with any of the specified parent metal centers.

While this filter does not directly check for chemistry, it is useful for maximizing compatibility of the ligand with new metal centers by selecting only those ligands that have previously coordinated to similar metals.

Parameters:

metal_centers (list[str]) – List of metal center strings to check. Each string must be either a valid element symbol or an element symbol followed by an oxidation state.

Returns:

True if any provided metal_center string matches recorded parent metals or oxidation states.

Return type:

bool

Examples :

To filter for ligands that have been observed coordinating to either Fe or Cu(II):

ligand.parents_filter(metal_centers=['Fe', 'Cu+2'])

smarts_filter(smarts, should_contain, include_metal=None)[source]

Test whether the ligand matches (or not) a SMARTS substructure pattern.

If include_metal is True, a pseudo metal center (Hg) is added to the ligand SMILES string before testing. The Hg atom is connected to all donor atoms via single bonds. This allows SMARTS patterns to target only donor atoms.

Parameters:
  • smarts (str) – SMARTS pattern string to search for.

  • should_contain (bool) – If True, ligand must contain the pattern; if False, ligand must not contain it.

  • include_metal (bool | None) – If True include a Hg metal center in SMILES generation (default).

Returns:

True if the ligand satisfies the SMARTS pattern condition, otherwise False.

Return type:

bool

Tip

While SMARTS patterns are very useful to define chemical substructures, they can be difficult to create. However, AI assistants can help you, and you can verify the correctness of your SMARTS patterns using tools such as the online SMARTS tester.

Examples :

To include only amide ligands in which the N donor is bonded to the metal and to two C or Si atoms:

ligand.smarts_filter(smarts='[N&D3X3!a](-[Hg])(-[C,Si])(-[C,Si])', should_contain=True, include_metal=True)

To exclude ligands that contain at least one azo (N=N) group:

ligand.smarts_filter(smarts='[N]=[N]', should_contain=False)

get_graph_with_metal(metal_symbol)[source]

Return a copy of the ligand graph with an added metal node connected to donor atoms.

The metal node receives a node attribute self.node_label equal to metal_symbol and metal‑donor edges are assigned bond_type = 1.

Parameters:

metal_symbol (str | None) – Element symbol of the metal to attach (e.g. ‘Fe’).

Returns:

Tuple (graph_with_metal, metal_node_index).

Return type:

tuple[nx.Graph, int]

get_graph_hash_with_metal(metal_symbol)[source]

Return a graph hash for the ligand after attaching a specified metal node.

Parameters:

metal_symbol (str) – Metal element symbol used for hashing.

Returns:

Graph hash string.

Return type:

str

get_heavy_atoms_graph_hash_with_metal(metal_symbol)[source]

Return the heavy-atom-only graph hash for the ligand when a metal is attached.

Parameters:

metal_symbol (str) – Metal element symbol used for hashing.

Returns:

Hash for heavy-atom-only ligand+metal graph.

Return type:

str

get_xyz_string(comment='', with_metal=True)[source]

Return an XYZ-format string for the ligand, optionally including a pseudo-metal.

Parameters:
  • comment (str) – Optional comment line; if None a default informative comment is generated.

  • with_metal (bool) – If True include a pseudo metal at recorded parent_metal_position.

Returns:

XYZ-format string (header + coordinates).

Return type:

str

get_all_effective_ligand_atoms_with_effective_donor_indices(dummy='Cu')[source]

Build an effective ligand ASE Atoms object that replaces haptic groups by dummy atoms.

The function returns (atoms_with_dummies, effective_donor_indices) where dummy atoms represent haptic groups to simplify geometric operations.

Parameters:

dummy (str) – Element symbol used for dummy atoms representing haptic groups.

Returns:

Tuple (ASE Atoms with dummy atoms appended, list of effective donor indices).

Return type:

tuple[ase.Atoms, list[int]]

get_isomers_effective_ligand_atoms_with_effective_donor_indices(dummy='Cu')[source]

Produce effective-ligand ASE Atoms for each geometric isomer by replacing haptic groups with dummy atoms.

Returns a tuple (atoms_with_dummies, isomer_effective_donor_idc) where the second element is a list of donor-index lists for each geometrical isomer, suitable for rotation/assembly routines.

Parameters:

dummy (str) – Element symbol to use for dummy atoms representing haptic groups.

Returns:

(ASE Atoms for effective ligand, list of donor-index lists for each isomer).

Return type:

tuple[ase.Atoms, list[list[int]]]

get_ligand_archetype_and_isomers()[source]

Determine ligand archetype and geometric isomers, handling haptic donors via dummies.

The routine returns:

(archetype, real_isomers, hapdent_isomer_idc, rssd, second_archetype, weight_for_change)

Returns:

Tuple containing archetype string, list of ASE Atoms for best isomers, hapdent tuples for isomers, RSSD float, second-best archetype string and weight needed to change archetype.

Return type:

tuple[str, list[ase.Atoms], tuple[Union[int, tuple[int]]], float, str, float]

get_csv_info(max_entries=5)[source]

Build a flat dictionary of ligand metadata suitable for CSV export.

Long lists (e.g. parent complex IDs) are truncated to at most max_entries entries and represented as comma-separated strings with an ellipsis indicator.

Parameters:

max_entries (int) – Maximum number of list entries to present before truncation.

Returns:

Dictionary containing ligand global properties augmented with CSV-friendly fields.

Return type:

dict
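The truncation behaviour can be sketched as follows. The helper truncate_for_csv is hypothetical, and the exact separator and ellipsis marker used by the library are assumptions.

```python
def truncate_for_csv(entries, max_entries=5):
    """Join at most max_entries items, appending '...' if the list was cut."""
    shown = [str(entry) for entry in entries[:max_entries]]
    if len(entries) > max_entries:
        shown.append('...')
    return ', '.join(shown)


# Made-up parent complex identifiers.
short_field = truncate_for_csv(['ID1', 'ID2'], max_entries=5)
long_field = truncate_for_csv(['ID1', 'ID2', 'ID3'], max_entries=2)
```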

to_dict(include_graph=True, copy=False, full_global_props=False)[source]

Serialize the ligand to a dictionary matching the MetaLig format.

Parameters:
  • include_graph (bool) – If True, include the connectivity graph in the output.

  • copy (bool) – If True, return a deep copy of the dictionary.

  • full_global_props (bool) – If True, compute and include all expected global_props fields before serializing.

Returns:

Dictionary with keys identical to MetaLig ligand entries.

Return type:

dict

Raises:

AssertionError – If returned dictionary keys differ from the expected ligand_dict_props.

classmethod from_dict(d, validity_check=True)[source]

Construct a Ligand instance from a MetaLig-style dictionary.

The method handles backward-compatible refactoring of deprecated formats and validates presence of required fields before instantiation.

Parameters:
  • d (dict) –

    Dictionary matching the MetaLig ligand entry format. Example keys:

    {
        'atomic_props': {...},
        'global_props': {...},
        'graph': {...},
        'donor_idc': [...],
        'ligand_instances': {...},
        'hapdent_idc': ...,
        'geometric_isomers_hapdent_idc': ...
    }
    

  • validity_check (bool) – Whether to run post-construction consistency checks.

Returns:

Instantiated Ligand object.

Return type:

Ligand

Raises:

ValueError – If required top-level keys or global_props[‘charge’] are missing.
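The documented failure mode can be illustrated with a small pre-check that a caller might run before from_dict. The helper validate_ligand_dict and the REQUIRED_KEYS tuple are hypothetical; which keys the library actually treats as required is an assumption based on the example keys listed above.

```python
# Assumed set of required top-level keys (hypothetical, based on the example above).
REQUIRED_KEYS = ('atomic_props', 'global_props', 'graph', 'donor_idc', 'ligand_instances')


def validate_ligand_dict(d):
    """Raise ValueError if top-level keys or global_props['charge'] are missing."""
    missing = [key for key in REQUIRED_KEYS if key not in d]
    if missing:
        raise ValueError(f'Missing required keys: {missing}')
    if 'charge' not in d['global_props']:
        raise ValueError("global_props['charge'] is missing")
    return True


# Placeholder entry with empty payloads, only to exercise the check.
entry = {
    'atomic_props': {},
    'global_props': {'charge': -1},
    'graph': {},
    'donor_idc': [0],
    'ligand_instances': {},
}
entry_is_valid = validate_ligand_dict(entry)
```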