Ligands API
The MetaLig database contains ligands, each of which is represented by the Ligand class.
- class DARTassembler.src.metalig.mol.BaseMolecule(graph, atomic_props, global_props=None, validity_check=True, node_label='node_label', bond_label='bond_type')
Bases: object
Initialize a BaseMolecule instance, which represents a molecule with a molecular graph (atomic connectivity) and atomic properties such as 3D coordinates.
- Parameters:
graph (nx.Graph) – NetworkX Graph describing connectivity. Nodes should correspond to atomic_props ordering.
atomic_props (dict | ase.Atoms) –
ASE Atoms object or atomic properties dictionary. Example format:
{ 'atoms': ['C','H','O',...], 'x': [x1, x2, ...], 'y': [y1, y2, ...], 'z': [z1, z2, ...], ... (optional per-atom arrays) }
global_props (dict | None) –
Optional flat dictionary of global properties (charge, stoichiometry, etc.). Example format:
{ 'charge': -1, 'stoichiometry': 'C6H6O', ... }
validity_check (bool) – If True, perform internal consistency checks comparing graph and coordinates.
node_label (str) – Node attribute key used for element symbols in graphs.
bond_label (str) – Edge attribute key used for bond type information in graphs.
- Raises:
AssertionError – If graph and atomic_props are inconsistent when validity_check is True.
- property n_atoms
Return the number of atoms in the molecule.
- Returns:
Atom count.
- Return type:
int
- property n_hydrogens
Return the number of hydrogen atoms in the molecule.
- Returns:
Count of hydrogen atoms.
- Return type:
int
- property n_protons
Return the total number of protons (sum of atomic numbers) present in the molecule.
- Returns:
Sum of atomic numbers across all atoms.
- Return type:
int
- property n_bonds
Return the number of bonds as given by the molecular graph.
- Returns:
Number of edges in the molecular graph.
- Return type:
int
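To show how these count properties relate to the underlying data, here is a minimal sketch that derives them from a plain atomic_props dictionary (in the format shown for the constructor) plus an edge list. The helper molecule_counts and its partial atomic-number table are hypothetical and not part of DARTassembler; the real properties read from a BaseMolecule instance.

```python
# Hypothetical helper; DARTassembler's own properties operate on a BaseMolecule instance.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}  # partial table, enough for this example


def molecule_counts(atomic_props, edges):
    """Return (n_atoms, n_hydrogens, n_protons, n_bonds) for a simple molecule."""
    atoms = atomic_props["atoms"]
    n_atoms = len(atoms)
    n_hydrogens = sum(1 for el in atoms if el == "H")
    n_protons = sum(ATOMIC_NUMBERS[el] for el in atoms)  # sum of atomic numbers
    n_bonds = len(edges)  # one edge per bond in the molecular graph
    return n_atoms, n_hydrogens, n_protons, n_bonds


# Water: O bonded to two H atoms
water = {"atoms": ["O", "H", "H"], "x": [0.0, 0.96, -0.24], "y": [0.0, 0.0, 0.93], "z": [0.0, 0.0, 0.0]}
print(molecule_counts(water, [(0, 1), (0, 2)]))  # (3, 2, 10, 2)
```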
- property has_bond_order_attribute
Indicate whether graph edges expose the configured bond order attribute.
- Returns:
True when every edge contains the configured bond_label.
- Return type:
bool
- property has_unknown_bond_orders
Indicate whether the graph contains unknown or placeholder RDKit bond orders.
If the bond order attribute is missing, the property returns True (unknown).
- Returns:
True when any edge has an unknown bond order or bond orders are missing/invalid.
- Return type:
bool
- property has_all_bond_orders_valid
Indicate whether all bond orders in the graph are present and valid.
- Returns:
True if bond orders exist and none are unknown.
- Return type:
bool
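The three bond-order properties fit together as follows: bond orders are valid only if every edge carries the attribute and no edge holds an unknown code. The sketch below models edges as a plain dict of attribute dicts (not the NetworkX API). Treating RDKit code 0 (BondType.UNSPECIFIED) as the only unknown value is an assumption for illustration, since the exact set of placeholder codes is not documented here.

```python
# Edges modeled as {(u, v): {attribute: value}}; a stand-in for nx.Graph edge data.
UNKNOWN_BOND_CODES = {0}  # assumption: only RDKit BondType.UNSPECIFIED counts as unknown


def has_bond_order_attribute(edges, bond_label="bond_type"):
    return all(bond_label in attrs for attrs in edges.values())


def has_unknown_bond_orders(edges, bond_label="bond_type"):
    if not has_bond_order_attribute(edges, bond_label):
        return True  # a missing attribute counts as unknown
    return any(attrs[bond_label] in UNKNOWN_BOND_CODES for attrs in edges.values())


def has_all_bond_orders_valid(edges, bond_label="bond_type"):
    return has_bond_order_attribute(edges, bond_label) and not has_unknown_bond_orders(edges, bond_label)


fragment = {(0, 1): {"bond_type": 2}, (0, 2): {"bond_type": 1}}
print(has_all_bond_orders_valid(fragment))      # True
print(has_all_bond_orders_valid({(0, 1): {}}))  # False
```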
- property graph_hash
Compute the Weisfeiler–Lehman graph hash for the molecular graph (element labels only).
- Returns:
String fingerprint of the graph topology.
- Return type:
str
- property heavy_atoms_graph_hash
Compute the WL graph hash after removing hydrogen atoms.
- Returns:
Hash string for heavy-atom-only graph.
- Return type:
str
- property bond_order_graph_hash
Compute a graph hash that includes bond order information when available.
- Returns:
Hash string including bond orders, or None if bond orders are missing/invalid.
- Return type:
str | None
- property stoichiometry
Return the canonical stoichiometry string for the molecule (e.g. ‘C6H6’).
- Returns:
Standardized stoichiometry string.
- Return type:
str
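A canonical stoichiometry string is commonly written in Hill order: carbon first, then hydrogen, then all other elements alphabetically (or purely alphabetically when no carbon is present). The example 'C6H6O' given for global_props is consistent with this convention, but whether DARTassembler uses exactly Hill order is an assumption; hill_stoichiometry is a hypothetical helper.

```python
from collections import Counter


def hill_stoichiometry(atoms):
    """Return a Hill-order formula string for a list of element symbols."""
    counts = Counter(atoms)
    if "C" in counts:
        others = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        order = sorted(counts)  # no carbon: strictly alphabetical, H included
    # Omit the count when an element occurs once (e.g. 'O' rather than 'O1')
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


print(hill_stoichiometry(["C"] * 6 + ["H"] * 6 + ["O"]))  # C6H6O
print(hill_stoichiometry(["O", "H", "H"]))                # H2O
```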
- get_reindexed_graph()
Return a reindexed copy of the internal molecular graph (0..n-1 node indices).
- Returns:
NetworkX Graph with contiguous node indices.
- Return type:
nx.Graph
- get_ase_atoms(remove_elements=None, add_atoms=None)
Produce an ASE Atoms object from internal atomic_props with optional filtering and additions.
- Parameters:
remove_elements (list[str] | None) – List of element symbols to remove from the returned ASE Atoms (e.g. [‘H’]).
add_atoms (list[ tuple[str, tuple[float, float, float]] ] | None) – List of tuples (element_symbol, (x, y, z)) specifying atoms to append.
- Returns:
ASE Atoms object representing the molecule with requested modifications.
- Return type:
ase.Atoms
- get_smiles()
Return a SMILES string of the molecular graph if bond orders are available.
- Returns:
SMILES string or None when bond orders are unknown/invalid.
- Return type:
str | None
- count_C_H_bonds()
Count the number of C–H bonds in the molecular graph.
Iterates over edges and counts edges between ‘C’ and ‘H’.
- Returns:
Number of C–H bonds.
- Return type:
int
- count_bond_types(bond_types)
Count occurrences of specified RDKit bond-type codes in the molecular graph.
- Parameters:
bond_types (list[int]) – Iterable of RDKit bond-type integers to count (e.g. aromatic, single, double codes).
- Returns:
Number of edges whose bond_type attribute matches any entry in bond_types.
- Return type:
int
- Raises:
ValueError – If the graph lacks the configured bond order attribute.
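Both counting methods reduce to a single pass over the graph edges. The following sketch reproduces that logic on a plain edge dictionary with one element label per node; the helper names are hypothetical, and the RDKit codes used (1 = single, 2 = double, 12 = aromatic, as in rdkit.Chem.BondType) serve only as example values.

```python
def count_c_h_bonds(elements, edges):
    """Count edges joining a C atom and an H atom (in either order)."""
    return sum(1 for u, v in edges if {elements[u], elements[v]} == {"C", "H"})


def count_bond_types(edges, bond_types, bond_label="bond_type"):
    """Count edges whose bond-type code appears in bond_types."""
    if not all(bond_label in attrs for attrs in edges.values()):
        raise ValueError(f"Graph lacks the '{bond_label}' edge attribute.")
    wanted = set(bond_types)
    return sum(1 for attrs in edges.values() if attrs[bond_label] in wanted)


# Methanol, CH3OH: atom 0 = C, 1 = O, 2-4 = H on C, 5 = H on O
elements = ["C", "O", "H", "H", "H", "H"]
edges = {(0, 1): {"bond_type": 1}, (0, 2): {"bond_type": 1}, (0, 3): {"bond_type": 1},
         (0, 4): {"bond_type": 1}, (1, 5): {"bond_type": 1}}
print(count_c_h_bonds(elements, edges))  # 3
print(count_bond_types(edges, [2, 12]))  # 0
```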
- get_graph_fragments()
Return connected component indices and corresponding element-lists for each fragment.
- Returns:
Tuple (fragment_indices, fragment_elements) where each entry corresponds to a disconnected component.
- Return type:
tuple[list, list]
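Splitting a molecular graph into fragments is a connected-components search. The helper below is a hypothetical, dependency-free sketch of the kind of result get_graph_fragments returns; the actual method works on the NetworkX graph (where networkx.connected_components does the same job), and its fragment ordering may differ.

```python
from collections import defaultdict


def graph_fragments(elements, edges):
    """Return (fragment_indices, fragment_elements), one entry per connected component."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, fragment_indices = set(), []
    for start in range(len(elements)):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        fragment_indices.append(sorted(component))
    fragment_elements = [[elements[i] for i in frag] for frag in fragment_indices]
    return fragment_indices, fragment_elements


# Two disconnected fragments: an O-H unit and an isolated Cl atom
print(graph_fragments(["O", "H", "Cl"], [(0, 1)]))  # ([[0, 1], [2]], [['O', 'H'], ['Cl']])
```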
- view_3D()
Open an ASE-based 3D viewer for the current ASE Atoms representation.
- Returns:
None
- Return type:
None
- view_graph(node_size=150)
Visualize the molecular graph using the project helper.
- Parameters:
node_size (int) – Display size for graph nodes in the viewer.
- Returns:
None
- Return type:
None
- get_coordinates()
Return the atomic coordinates as a list of [x, y, z] lists.
- Returns:
Coordinates in format [[x1, y1, z1], [x2, y2, z2], …].
- Return type:
list[list[float]]
- get_interatomic_distances(skip_elements=None)
Compute pairwise interatomic distances and return minimum, maximum and full distance matrix.
- Parameters:
skip_elements (str | list[str] | None) – Element symbol or list of symbols to exclude from distance calculations.
- Returns:
Tuple (min_distance, max_distance, distances_matrix).
- Return type:
tuple[ float, float, np.ndarray ]
- get_all_interatomic_distances_flat()
Return unique pairwise interatomic distances as a flattened list (upper triangle).
- Returns:
List of pairwise distances (each pair reported once).
- Return type:
list[float]
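The two distance methods share the same pairwise computation. This stdlib-only sketch (hypothetical helper interatomic_distances) returns a nested list where the real method returns an np.ndarray, and it assumes self-distances are excluded from the minimum, in line with min_interatomic_distance being described as the minimum positive distance. It needs at least two atoms left after skipping.

```python
import math
from itertools import combinations


def interatomic_distances(elements, coords, skip_elements=None):
    """Return (min_distance, max_distance, full distance matrix) over the kept atoms."""
    if isinstance(skip_elements, str):
        skip_elements = [skip_elements]  # accept a single symbol, as documented
    skip = set(skip_elements or [])
    kept = [xyz for el, xyz in zip(elements, coords) if el not in skip]
    matrix = [[math.dist(a, b) for b in kept] for a in kept]
    # Unique pairs only (upper triangle), so the zero self-distances are excluded
    flat = [math.dist(a, b) for a, b in combinations(kept, 2)]
    return min(flat), max(flat), matrix


elements = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
d_min, d_max, _ = interatomic_distances(elements, coords)
print(d_min, round(d_max, 4))  # 1.0 2.2361
```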
- class DARTassembler.src.metalig.mol.Ligand(atomic_props, donor_idc, graph, unique_name, charge, global_props=None, ligand_instances=None, hapdent_idc=None, geometric_isomers_hapdent_idc=None, validity_check=False)
Bases: BaseMolecule
Initialize a Ligand instance representing a metal-coordinating ligand from the MetaLig database. The constructor accepts atomic and graph data plus metadata describing donor indices, known parent complex instances, and optional precomputed hapdent (denticity/hapticity) tuples.
- Parameters:
atomic_props (dict | ase.Atoms) – ASE Atoms object or atomic properties dictionary with keys ‘atoms’, ‘x’, ‘y’, ‘z’, etc.
donor_idc (list[int]) – List of atomic indices that act as donor atoms coordinating to a metal.
graph (nx.Graph) – NetworkX Graph describing connectivity (node labels must match atomic_props[‘atoms’] ordering).
unique_name (str) – Unique identifier string for the ligand (database key).
charge (int | float) – Formal charge of the ligand. May be np.nan when unknown.
global_props (dict | None) – Optional dictionary of global properties.
ligand_instances (dict | None) –
Optional dictionary describing parent complex instances. Example format:
{ 'ligand_name': ['lig1','lig2',...], 'parent_complex_id': ['CSD1','CSD2',...], 'parent_complex_charge': [0, +1, ...], 'parent_metal': ['Cu','Fe',...], 'parent_metal_os': [2, 2, ...], }
hapdent_idc (tuple | None) – Optional denticity/hapticity tuple structure (haptic groups as sub-tuples).
geometric_isomers_hapdent_idc (list[ tuple[int | tuple[int]] ] | None) – Optional list of hapdent tuples for geometric isomers.
validity_check (bool) – If True, run consistency checks after initialization.
- Raises:
ValueError – If ligand_instances lacks required keys.
- property has_confident_charge
Return whether the stored ligand charge is flagged as confident.
If the flag is present in global_props, that value is returned; otherwise True is assumed.
- Returns:
Boolean indicator whether the charge is considered confident.
- Return type:
bool
- property n_ligand_instances
Return the number of recorded parent complex instances for this ligand.
- Returns:
Number of parent occurrences recorded in ligand_instances.
- Return type:
int
- property donor_metal_planarity
Return the planarity score of donor atoms including the original metal center.
- Returns:
Planarity (0..1) of donor atoms with the metal included.
- Return type:
float
- property donor_planarity
Return the planarity score of donor atoms alone, excluding the metal.
- Returns:
Planarity (0..1) of donor atoms.
- Return type:
float
- property planarity
Return the overall planarity score of the ligand.
- Returns:
Planarity (0..1) of the ligand.
- Return type:
float
- property n_donors
Return the number of donor atoms defined for this ligand.
- Returns:
Integer number of donor atoms.
- Return type:
int
- property donor_elements
Return the element symbols of the donor atoms in ligand order.
- Returns:
List of element symbols for donor atoms (length n_donors).
- Return type:
list[str]
- property donor_positions
Return the 3D coordinates of donor atoms as an (n_donors, 3) array.
- Returns:
numpy array of donor coordinates.
- Return type:
np.ndarray
- property parent_complex_id
Return the identifier of the first parent complex instance for this ligand.
- Returns:
Parent complex ID string.
- Return type:
str
- property parent_metal_position
Return the 3D coordinates of the parent metal center observed in the source complex.
- Returns:
3‑vector coordinates of the parent metal.
- Return type:
np.ndarray
- property parent_metal
Return the chemical symbol of the parent metal observed for this ligand.
- Returns:
Element symbol (e.g. ‘Cu’, ‘Fe’).
- Return type:
str
- property hapdent_idc
Return the donor index tuple structure with haptic groups as sub-tuples.
- Returns:
Tuple combining integer donor indices and sub-tuples for haptic groups.
- Return type:
tuple
- property n_eff_denticities
Return the effective denticity counting each haptic group as a single donor.
- Returns:
Effective denticity integer.
- Return type:
int
- property n_denticities
Return the classical denticity (count of integer donor entries, ignoring haptic groups).
- Returns:
Integer denticity count.
- Return type:
int
- property n_haptic_atoms
Return the total number of atoms that participate in haptic coordination.
- Returns:
Integer count of atoms that are part of haptic groups.
- Return type:
int
- property n_haptic_groups
Return the number of haptic groups (hapticity units) present in the ligand.
- Returns:
Integer number of haptic groups.
- Return type:
int
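The hapdent tuple encodes all four counts above. Following the stated definitions (integer entries are classical donors, sub-tuples are haptic groups), they can be derived as in this sketch; hapdent_counts is a hypothetical helper and assumes the entries are plain Python int and tuple objects.

```python
def hapdent_counts(hapdent_idc):
    """Derive denticity/hapticity counts from a hapdent tuple such as (0, (3, 4, 5), 7)."""
    haptic_groups = [entry for entry in hapdent_idc if isinstance(entry, tuple)]
    return {
        "n_eff_denticities": len(hapdent_idc),  # each haptic group counts as one donor
        "n_denticities": sum(1 for entry in hapdent_idc if isinstance(entry, int)),
        "n_haptic_atoms": sum(len(group) for group in haptic_groups),
        "n_haptic_groups": len(haptic_groups),
    }


# Two classical donors (atoms 0 and 7) plus one eta3 group (atoms 3, 4, 5)
print(hapdent_counts((0, (3, 4, 5), 7)))
# {'n_eff_denticities': 3, 'n_denticities': 2, 'n_haptic_atoms': 3, 'n_haptic_groups': 1}
```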
- property archetype
Return the best-matching ligand archetype string (e.g. ‘2-cis’).
- Returns:
Archetype identifier.
- Return type:
str
- property geometric_isomers_hapdent_idc
Return hapdent tuples for each recognised geometric isomer of the ligand.
- Returns:
List of hapdent tuples (each tuple contains int or sub-tuple entries).
- Return type:
list
- property archetype_rssd
Return the root-sum-of-squared-differences (RSSD) metric of the archetype assignment.
Lower values indicate a closer match to the ideal archetype geometry.
- Returns:
RSSD value.
- Return type:
float
- property archetype_confidence
Return a confidence metric for the archetype assignment.
Higher values indicate greater confidence in the assigned archetype.
- Returns:
Archetype confidence value.
- Return type:
float
- property min_interatomic_distance
Return the minimum interatomic distance observed in the ligand (Å).
- Returns:
Minimum positive pairwise distance.
- Return type:
float
- property max_ligand_extension
Return the maximal interatomic distance (molecular extension) in the ligand (Å).
- Returns:
Maximum pairwise distance.
- Return type:
float
- property smiles
Return the ligand SMILES string if bond orders are present.
- Returns:
SMILES string or None.
- Return type:
str | None
- property metal_counts
Return a dict mapping observed parent metal symbols to their counts.
- Returns:
Ordered dict-like mapping ‘Element’ -> count observed in parent complexes.
- Return type:
dict
- property metal_os_counts
Return parent metal oxidation-state counts as formatted strings.
- Returns:
Mapping like ‘Fe+2’ -> count.
- Return type:
dict
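Both count properties summarize the parent-complex records stored in ligand_instances. Below is a sketch with collections.Counter, using the ligand_instances format shown for the constructor; the helper names are hypothetical, and sorting by frequency as well as building labels like 'Fe+2' with a signed integer format are assumptions about the output shape.

```python
from collections import Counter


def metal_counts(ligand_instances):
    """Map each parent metal symbol to how often it was observed."""
    return dict(Counter(ligand_instances["parent_metal"]).most_common())


def metal_os_counts(ligand_instances):
    """Map labels such as 'Fe+2' (symbol plus signed oxidation state) to counts."""
    labels = (f"{metal}{os:+d}" for metal, os in
              zip(ligand_instances["parent_metal"], ligand_instances["parent_metal_os"]))
    return dict(Counter(labels).most_common())


instances = {
    "parent_complex_id": ["CSD1", "CSD2", "CSD3"],
    "parent_metal": ["Fe", "Cu", "Fe"],
    "parent_metal_os": [2, 2, 3],
}
print(metal_counts(instances))     # {'Fe': 2, 'Cu': 1}
print(metal_os_counts(instances))  # {'Fe+2': 1, 'Cu+2': 1, 'Fe+3': 1}
```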
- property smiles_with_metal
Return a SMILES string of the ligand with a pseudo-metal (Hg) attached, for visualization and analysis.
- Returns:
SMILES string including a pseudo-metal or None.
- Return type:
str | None
- property is_2D_symmetrical
Return whether the ligand graph is topologically symmetrical between any pair of donor atoms.
This is a 2D graph symmetry check and does not guarantee 3D symmetry.
- Returns:
True if any donor-pair symmetry is detected, False otherwise.
- Return type:
bool
- property graph_hash_with_metal
Return the WL graph hash of the ligand graph after connecting a pseudo-metal.
- Returns:
Hash string of ligand+metal graph.
- Return type:
str
- property heavy_atoms_graph_hash_with_metal
Return the heavy-atom-only graph hash for the ligand with a pseudo-metal attached.
- Returns:
Hash string for heavy-atom-only ligand+metal graph.
- Return type:
str
- property n_beta_hydrogens
Return the count of beta-hydrogen atoms (two bonds from coordinating atom, excluding alpha-H).
- Returns:
Integer number of beta hydrogens.
- Return type:
int
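Counting beta-hydrogens is a two-step neighborhood walk from each donor atom. This hypothetical helper follows the definition above (H atoms two bonds from a coordinating atom, excluding alpha-H) and counts each H atom once; whether DARTassembler handles haptic donors or H atoms reachable from several donors the same way is not specified here.

```python
from collections import defaultdict


def count_beta_hydrogens(elements, edges, donor_idc):
    """Count H atoms two bonds from a donor atom, excluding H bonded directly to a donor."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    donors = set(donor_idc)
    alpha_h = {n for d in donors for n in neighbors[d] if elements[n] == "H"}
    beta_h = set()
    for d in donors:
        for alpha in neighbors[d]:          # atoms directly bonded to the donor
            for n in neighbors[alpha]:      # atoms two bonds from the donor
                if elements[n] == "H" and n not in alpha_h:
                    beta_h.add(n)
    return len(beta_h)


# Ethyl ligand bound through C0: C0(H2)-C1(H3); donor is atom 0
elements = ["C", "C", "H", "H", "H", "H", "H"]
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (1, 6)]
print(count_beta_hydrogens(elements, edges, [0]))  # 3
```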
- property is_haptic
Return whether the ligand contains any haptic coordination atoms.
- Returns:
True when haptic atoms/groups exist, False otherwise.
- Return type:
bool
- get_smiles(with_metal=None)
Return the SMILES string for the ligand, optionally including a specified metal node.
- Parameters:
with_metal (str | None) – Element symbol of the metal to attach to the ligand graph. If None, do not attach a metal.
- Returns:
SMILES string or None if bond orders are unknown.
- Return type:
str | None
- Raises:
ValueError – If with_metal is provided and is not a metal element.
- get_ase_atoms_with_metal(metal=None)
Return an ASE Atoms object of the ligand with the parent metal placed at the recorded metal position.
- Parameters:
metal (str | None) – Element symbol of the metal to add; if None, the parent metal recorded in the metadata is used.
- Returns:
ASE Atoms with the metal atom appended at parent_metal_position.
- Return type:
ase.Atoms
- property_filter(name, range=None, values=None)
Test whether a certain simple property falls within a numeric range or matches one of the provided values.
- Parameters:
name (str) – The property name of one of the MetaLig properties.
range (list[ tuple[float, float] ] | None) – Single range tuple or list of (min, max) tuples specifying allowed numerical intervals.
values (list | None) – List of allowed exact values or None to skip value matching.
- Returns:
True if property value satisfies any provided test, otherwise False.
- Return type:
bool
- Raises:
ValueError – If the requested property is not present or a non-numeric property is tested against numeric ranges.
Examples:
To test whether a ligand has a ‘2-cis’ archetype:
ligand.property_filter(name='archetype', values=['2-cis'])
To test whether a ligand has between 1 and 5 haptic atoms:
ligand.property_filter(name='n_haptic_atoms', range=[1,5])
To test whether a ligand has between 10 and 20 atoms or between 30 and 40 atoms:
ligand.property_filter(name='n_atoms', range=[(10,20),(30,40)])
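The matching rule described above (pass if any provided test succeeds) can be sketched as a standalone predicate on an already-extracted property value. value_passes is hypothetical; inclusive bounds and the way a single [min, max] pair is told apart from a list of pairs are assumptions, and the parameter is named range only to mirror the documented signature.

```python
def value_passes(value, range=None, values=None):
    """Return True if value lies in any (min, max) interval or equals any allowed value."""
    if range is not None:
        # Accept a single (min, max) pair as well as a list of pairs
        if len(range) == 2 and all(isinstance(b, (int, float)) for b in range):
            range = [tuple(range)]
        if any(lo <= value <= hi for lo, hi in range):  # bounds assumed inclusive
            return True
    if values is not None and value in values:
        return True
    return False


print(value_passes(3, range=[1, 5]))                 # True
print(value_passes(25, range=[(10, 20), (30, 40)]))  # False
print(value_passes("2-cis", values=["2-cis"]))       # True
```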
- composition_filter(elements, instruction, only_donors=False)
Test whether a ligand satisfies a specified chemical composition criterion.
Depending on the value of instruction, this method applies one of several composition-based filters using the provided elements. If only_donors is True, only the donor atoms are considered; otherwise all atoms in the ligand are included.
- Parameters:
elements (str | list[str]) – Stoichiometry (e.g., 'H2O') or list of element symbols (e.g., ['H', 'H', 'O']) used to define the expected atomic composition.
instruction (str) –
Specifies the composition rule to apply. Supported options are:
must_contain_and_only_contain – The ligand must consist of exactly these atoms in exactly this count. Use this to match an exact stoichiometry (e.g., C6H6 for benzene).
must_at_least_contain – The ligand must contain all specified elements, but may also include others.
must_exclude – The ligand must not contain any of the specified elements.
must_only_contain_in_any_amount – The ligand may contain any count, including zero, of the specified elements, but must not contain any other elements.
Tip
At first glance, these instructions might seem too general to make a useful filter, but by combining the same filter multiple times with different instructions, users can build very specific filters.
only_donors (bool) – If True, evaluate the composition using only donor atoms; otherwise all atoms in the ligand are considered.
- Returns:
True if the ligand satisfies the specified composition criterion.
- Return type:
bool
- Raises:
ValueError – If an unrecognized instruction string is provided.
Examples:
To select tridentate ligands whose donors are exactly two N and one C:
ligand.composition_filter( elements='CN2', instruction='must_contain_and_only_contain', only_donors=True )
To select ligands whose donors may contain only C and N atoms (or zero of either):
ligand.composition_filter( elements='CN', instruction='must_only_contain_in_any_amount', only_donors=True )
To select ligands that contain zero or more C, N, and H atoms, but no other elements:
ligand.composition_filter( elements='CNH', instruction='must_only_contain_in_any_amount', only_donors=False )
To exclude ligands containing sulfur atoms:
ligand.composition_filter( elements='S', instruction='must_exclude', only_donors=False )
To select ligands that contain at least one O atom:
ligand.composition_filter( elements='O', instruction='must_at_least_contain', only_donors=False )
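The four instructions map onto simple multiset and set comparisons. In this hypothetical sketch, ligand_atoms stands for either the donor elements (only_donors=True) or all ligand elements; treating must_at_least_contain as an element-presence test that ignores counts is an assumption, since the documentation does not say whether counts are compared.

```python
import re
from collections import Counter


def parse_elements(elements):
    """Turn 'CN2' or ['C', 'N', 'N'] into a Counter of element symbols."""
    if isinstance(elements, str):
        counts = Counter()
        for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", elements):
            counts[symbol] += int(n) if n else 1
        return counts
    return Counter(elements)


def composition_passes(ligand_atoms, elements, instruction):
    have, want = Counter(ligand_atoms), parse_elements(elements)
    if instruction == "must_contain_and_only_contain":
        return have == want  # exact stoichiometry
    if instruction == "must_at_least_contain":
        return set(want) <= set(have)  # element presence only (assumption)
    if instruction == "must_exclude":
        return not (set(want) & set(have))
    if instruction == "must_only_contain_in_any_amount":
        return set(have) <= set(want)
    raise ValueError(f"Unknown instruction: {instruction}")


donors = ["N", "C", "N"]
print(composition_passes(donors, "CN2", "must_contain_and_only_contain"))   # True
print(composition_passes(donors, "CN", "must_only_contain_in_any_amount"))  # True
print(composition_passes(donors, "S", "must_exclude"))                      # True
```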
- parents_filter(metal_centers)
Test whether the ligand has been observed in the CSD source complexes with any of the specified parent metal centers.
While this filter does not directly check for chemistry, it is useful for maximizing compatibility of the ligand with new metal centers by selecting only those ligands that have previously coordinated to similar metals.
- Parameters:
metal_centers (list[str]) – List of metal center strings to check. Each string must be either a valid element symbol (e.g. 'Fe') or an element symbol followed by an oxidation state (e.g. 'Cu+2').
- Returns:
True if any provided metal_center string matches recorded parent metals or oxidation states.
- Return type:
bool
Examples:
To filter for ligands that have been observed coordinating to either Fe or Cu(II):
ligand.parents_filter(metal_centers=['Fe', 'Cu+2'])
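One plausible way to implement this matching is to compare bare element symbols against the observed metals, and symbol-plus-oxidation-state strings against labels like those produced by metal_os_counts. parents_match is a hypothetical sketch under that assumption, not the library's code.

```python
import re


def parents_match(metal_centers, metal_os_counts):
    """metal_os_counts maps labels such as 'Fe+2' to observation counts."""
    observed_elements = {re.match(r"[A-Z][a-z]?", label).group() for label in metal_os_counts}
    # 'Cu+2' must match a recorded label exactly; 'Fe' matches any oxidation state
    return any(c in metal_os_counts or c in observed_elements for c in metal_centers)


observed = {"Fe+3": 4, "Cu+1": 2}
print(parents_match(["Fe", "Cu+2"], observed))  # True
print(parents_match(["Cu+2"], observed))        # False
```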
- smarts_filter(smarts, should_contain, include_metal=None)
Test whether the ligand matches (or does not match) a SMARTS substructure pattern.
If include_metal is True, a pseudo-metal center (Hg) is added to the ligand SMILES string before testing. The Hg atom is connected to all donor atoms via single bonds, which allows SMARTS patterns to target donor atoms specifically.
- Parameters:
smarts (str) – SMARTS pattern string to search for.
should_contain (bool) – If True, ligand must contain the pattern; if False, ligand must not contain it.
include_metal (bool | None) – If True, include an Hg pseudo-metal center when generating the SMILES string (default).
- Returns:
True if the ligand satisfies the SMARTS pattern condition, otherwise False.
- Return type:
bool
Tip
While SMARTS patterns are very useful to define chemical substructures, they can be difficult to create. However, AI assistants can help you, and you can verify the correctness of your SMARTS patterns using tools such as the online SMARTS tester.
Examples:
To include only amide ligands where the N donor is coordinated to the metal and two C or Si atoms:
ligand.smarts_filter(smarts='[N&D3X3!a](-[Hg])(-[C,Si])(-[C,Si])', should_contain=True, include_metal=True)
To exclude ligands that contain at least one azo (N=N) group:
ligand.smarts_filter(smarts='[N]=[N]', should_contain=False)
- get_graph_with_metal(metal_symbol)
Return a copy of the ligand graph with an added metal node connected to donor atoms.
The metal node receives a node attribute self.node_label equal to metal_symbol and metal‑donor edges are assigned bond_type = 1.
- Parameters:
metal_symbol (str | None) – Element symbol of the metal to attach (e.g. ‘Fe’).
- Returns:
Tuple (graph_with_metal, metal_node_index).
- Return type:
tuple[nx.Graph, int]
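The operation described above is a copy followed by one new node and one single bond per donor. In NetworkX this would use G.copy(), G.add_node and G.add_edge; the dependency-free sketch below uses node and edge dictionaries instead. graph_with_metal is hypothetical, and choosing the first unused integer as the metal index is an assumption.

```python
import copy


def graph_with_metal(nodes, edges, donor_idc, metal_symbol, node_label="node_label"):
    """Return (nodes, edges, metal_index) with a metal node bonded to every donor."""
    nodes, edges = copy.deepcopy(nodes), copy.deepcopy(edges)  # leave the input untouched
    metal_idx = max(nodes) + 1  # assumption: first unused integer index
    nodes[metal_idx] = {node_label: metal_symbol}
    for donor in donor_idc:
        edges[(donor, metal_idx)] = {"bond_type": 1}  # metal-donor bonds are single
    return nodes, edges, metal_idx


nodes = {0: {"node_label": "N"}, 1: {"node_label": "H"}}
edges = {(0, 1): {"bond_type": 1}}
new_nodes, new_edges, m = graph_with_metal(nodes, edges, [0], "Fe")
print(m, new_nodes[m], new_edges[(0, m)])  # 2 {'node_label': 'Fe'} {'bond_type': 1}
```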
- get_graph_hash_with_metal(metal_symbol)
Return a graph hash for the ligand after attaching a specified metal node.
- Parameters:
metal_symbol (str) – Metal element symbol used for hashing.
- Returns:
Graph hash string.
- Return type:
str
- get_heavy_atoms_graph_hash_with_metal(metal_symbol)
Return the heavy-atom-only graph hash for the ligand when a metal is attached.
- Parameters:
metal_symbol (str) – Metal element symbol used for hashing.
- Returns:
Hash for heavy-atom-only ligand+metal graph.
- Return type:
str
- get_xyz_string(comment='', with_metal=True)
Return an XYZ-format string for the ligand, optionally including a pseudo-metal.
- Parameters:
comment (str) – Optional comment line; if None a default informative comment is generated.
with_metal (bool) – If True, include a pseudo-metal at the recorded parent_metal_position.
- Returns:
XYZ-format string (header + coordinates).
- Return type:
str
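The XYZ format itself is simple: the first line holds the atom count, the second a free-form comment, and each further line an element symbol with Cartesian coordinates in Å. The sketch below is hypothetical; the number formatting and placing the metal as the last atom are assumptions.

```python
def xyz_string(elements, coords, comment="", metal=None, metal_position=None):
    """Build an XYZ block: atom count, comment line, then one 'El x y z' line per atom."""
    rows = list(zip(elements, coords))
    if metal is not None:
        rows.append((metal, metal_position))  # pseudo-metal appended last
    lines = [str(len(rows)), comment]
    lines += [f"{el} {x:.6f} {y:.6f} {z:.6f}" for el, (x, y, z) in rows]
    return "\n".join(lines) + "\n"


print(xyz_string(["N", "H"], [(0.0, 0.0, 2.0), (0.0, 0.9, 2.4)],
                 "NH fragment", "Fe", (0.0, 0.0, 0.0)))
```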
- get_all_effective_ligand_atoms_with_effective_donor_indices(dummy='Cu')
Build an effective ligand ASE Atoms object that replaces haptic groups by dummy atoms.
The function returns (atoms_with_dummies, effective_donor_indices) where dummy atoms represent haptic groups to simplify geometric operations.
- Parameters:
dummy (str) – Element symbol used for dummy atoms representing haptic groups.
- Returns:
Tuple (ASE Atoms with dummy atoms appended, list of effective donor indices).
- Return type:
tuple[ase.Atoms, list[int]]
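Replacing each haptic group by a single dummy atom turns a haptic ligand into an effective ligand whose donors are all point-like. The sketch works on element and coordinate lists rather than ase.Atoms; placing the dummy at the centroid of the group's atoms is an assumption, and effective_ligand is a hypothetical helper.

```python
def effective_ligand(elements, coords, hapdent_idc, dummy="Cu"):
    """Append one dummy atom per haptic group; return (elements, coords, effective donor indices)."""
    elements, coords = list(elements), list(coords)
    effective_donors = []
    for entry in hapdent_idc:
        if isinstance(entry, tuple):
            # Assumption: the dummy sits at the centroid of the haptic group's atoms
            centroid = tuple(sum(coords[i][k] for i in entry) / len(entry) for k in range(3))
            elements.append(dummy)
            coords.append(centroid)
            effective_donors.append(len(elements) - 1)
        else:
            effective_donors.append(entry)  # classical donor keeps its own index
    return elements, coords, effective_donors


els, xyz, donors = effective_ligand(["C", "C", "N"], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)], ((0, 1), 2))
print(els, xyz[-1], donors)  # ['C', 'C', 'N', 'Cu'] (1.0, 0.0, 0.0) [3, 2]
```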
- get_isomers_effective_ligand_atoms_with_effective_donor_indices(dummy='Cu')
Produce effective-ligand ASE Atoms for each geometric isomer by replacing haptic groups with dummy atoms.
Returns a tuple (atoms_with_dummies, isomer_effective_donor_idc) where the second element is a list of donor-index lists for each geometrical isomer, suitable for rotation/assembly routines.
- Parameters:
dummy (str) – Element symbol to use for dummy atoms representing haptic groups.
- Returns:
(ASE Atoms for effective ligand, list of donor-index lists for each isomer).
- Return type:
tuple[ase.Atoms, list[list[int]]]
- get_ligand_archetype_and_isomers()
Determine ligand archetype and geometric isomers, handling haptic donors via dummies.
- The routine returns:
(archetype, real_isomers, hapdent_isomer_idc, rssd, second_archetype, weight_for_change)
- Returns:
Tuple containing archetype string, list of ASE Atoms for best isomers, hapdent tuples for isomers, RSSD float, second-best archetype string and weight needed to change archetype.
- Return type:
tuple[str, list[ase.Atoms], tuple[Union[int, tuple[int]]], float, str, float]
- get_csv_info(max_entries=5)
Build a flat dictionary of ligand metadata suitable for CSV export.
Long lists (e.g. parent complex IDs) are truncated to at most max_entries entries and represented as comma-separated strings with an ellipsis indicator.
- Parameters:
max_entries (int) – Maximum number of list entries to present before truncation.
- Returns:
Dictionary containing ligand global properties augmented with CSV-friendly fields.
- Return type:
dict
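The truncation rule for long lists can be sketched in a few lines. truncate_entries is a hypothetical helper; the ', ' separator and the trailing ', ...' marker are assumptions about the exact formatting get_csv_info uses.

```python
def truncate_entries(entries, max_entries=5):
    """Join entries with commas, keeping at most max_entries and marking truncation."""
    shown = ", ".join(str(e) for e in entries[:max_entries])
    return shown + ", ..." if len(entries) > max_entries else shown


print(truncate_entries(["CSD1", "CSD2", "CSD3"], max_entries=2))  # CSD1, CSD2, ...
print(truncate_entries(["CSD1", "CSD2"], max_entries=5))          # CSD1, CSD2
```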
- to_dict(include_graph=True, copy=False, full_global_props=False)
Serialize the ligand to a dictionary matching the MetaLig format.
- Parameters:
include_graph (bool) – If True, include the connectivity graph in the output.
copy (bool) – If True, return a deep copy of the dictionary.
full_global_props (bool) – If True, compute and include all expected global_props fields before serializing.
- Returns:
Dictionary with keys identical to MetaLig ligand entries.
- Return type:
dict
- Raises:
AssertionError – If returned dictionary keys differ from the expected ligand_dict_props.
- classmethod from_dict(d, validity_check=True)
Construct a Ligand instance from a MetaLig-style dictionary.
The method handles backward-compatible refactoring of deprecated formats and validates presence of required fields before instantiation.
- Parameters:
d (dict) –
Dictionary matching the MetaLig ligand entry format. Example keys:
{ 'atomic_props': {...}, 'global_props': {...}, 'graph': {...}, 'donor_idc': [...], 'ligand_instances': {...}, 'hapdent_idc': ..., 'geometric_isomers_hapdent_idc': ... }
validity_check (bool) – Whether to run post-construction consistency checks.
- Returns:
Instantiated Ligand object.
- Return type:
Ligand
- Raises:
ValueError – If required top-level keys or global_props[‘charge’] are missing.
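The pre-instantiation validation described above can be sketched as a small checker. check_ligand_dict is hypothetical, and treating atomic_props, global_props, graph and donor_idc as the required top-level keys is an assumption drawn from the example keys (ligand_instances and the hapdent fields are optional constructor arguments).

```python
REQUIRED_KEYS = {"atomic_props", "global_props", "graph", "donor_idc"}  # assumption


def check_ligand_dict(d):
    """Raise ValueError if a MetaLig-style ligand dict lacks required fields."""
    missing = REQUIRED_KEYS - set(d)
    if missing:
        raise ValueError(f"Missing top-level keys: {sorted(missing)}")
    if "charge" not in d["global_props"]:
        raise ValueError("global_props['charge'] is missing.")


try:
    check_ligand_dict({"atomic_props": {}, "global_props": {}, "graph": {}, "donor_idc": [0]})
except ValueError as err:
    print(err)  # global_props['charge'] is missing.
```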