MetaLig Ligand Database
The organoMetallic Ligand database (MetaLig) contains 41,018 ligands extracted from 107,185 complexes from the Cambridge Structural Database (CSD). It contains 3D coordinates, formal charge, molecular graph and a variety of physical properties. Each ligand also includes statistical data about its occurrences in the CSD, such as which metals it typically coordinates to.
The MetaLig ligand database can be used in a variety of applications:
DART Assembler: As a source of ligands for the DART Assembler module.
DART LigandFilters: Filter ligands based on their properties to target specific chemical spaces in the Assembler module.
Ligand Analysis: To analyze and explore ligands across the CSD.
Ligand Property Prediction: As a dataset for training machine learning models.
To explore the ligands in the MetaLig, use the terminal to run the command
DARTassembler dbinfo --db metalig
This will generate two files, an .xyz file and a .csv file:
The .xyz file contains the 3D structures of all ligands. To view and browse through the ligands with ase, you can use the command ase gui concat_MetaLigDB_v1.1.0.xyz. Each ligand is coordinated to a Cu metal center for visualization purposes. The Cu metal center is not part of the ligands in the MetaLig, it is only added to the .xyz file to display the coordination of each ligand.
The MetaLigDB_v1.1.0.csv file displays a tabular overview of all ligands and their properties (see below). You can open this file with a program like Excel to sort and filter the ligands based on their properties.
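If you prefer a script to a spreadsheet program, the same kind of sorting and filtering can be done with Python's standard csv module. The following is only a minimal sketch and not part of DART: the helper name select_ligands is made up for illustration, the column names are taken from the property table below, and it assumes the .csv file has a header row with exactly those names. A small inline sample with illustrative values stands in for MetaLigDB_v1.1.0.csv so that the snippet runs on its own.

```python
import csv
import io

# Tiny stand-in for MetaLigDB_v1.1.0.csv (illustrative values, assumed header names).
SAMPLE_CSV = """unique_name,charge,n_donors,n_atoms
lig_a,-1,2,20
lig_b,0,1,5
lig_c,-1,2,12
"""

def select_ligands(handle, charge, n_donors):
    """Return rows matching charge and n_donors, sorted by number of atoms."""
    rows = [
        row for row in csv.DictReader(handle)
        if int(row["charge"]) == charge and int(row["n_donors"]) == n_donors
    ]
    return sorted(rows, key=lambda row: int(row["n_atoms"]))

# For the real file, replace io.StringIO(...) with
# open("MetaLigDB_v1.1.0.csv", newline="").
selected = select_ligands(io.StringIO(SAMPLE_CSV), charge=-1, n_donors=2)
print([row["unique_name"] for row in selected])
```

Reading every value as text and converting only the columns you filter on keeps the sketch independent of how the other columns are formatted.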
MetaLig Ligand Properties
The MetaLig contains a wide range of properties, such as the 3D geometry and the molecular graph of each ligand. You can find all properties under the Ligand class documentation. The MetaLig also contains 38 tabular properties for each ligand. 29 of these are useful for filtering ligands in the DART LigandFilters module:
| property | type | filter | description | example |
|---|---|---|---|---|
| unique_name | str | property | Unique ID for this ligand in the MetaLig | unq_CSD-OZIYON-02-b |
| archetype | str | property | Coordination archetype of the ligand, e.g. 2-cis or 3-meridional | 2-cis |
| charge | int | property | Formal charge | -1 |
| n_donors | int | property | Number of donor atoms (both haptic and dentic) | 2 |
| n_eff_denticities | int | property | Effective denticity count (haptic groups count as one effective donor each) | 2 |
| n_denticities | int | property | Denticity (number of dentic donors, excluding haptic donors) | 2 |
| n_haptic_atoms | int | property | Number of haptically coordinating atoms in the ligand | 0 |
| n_haptic_groups | int | property | Number of distinct haptic donor groups in the ligand | 0 |
| n_atoms | int | property | Number of atoms in the ligand | 20 |
| n_elements | int | property | Number of distinct chemical elements | 4 |
| n_bonds | int | property | Number of atomic bonds (without donor-metal bonds) | 21 |
| n_electrons | int | property | Number of electrons | 98 |
| n_protons | int | property | Number of protons | 97 |
| n_beta_hydrogens | int | property | Number of beta hydrogen atoms | 2 |
| molecular_weight | float | property | Molecular weight in atomic mass units (amu) | 190.1728063 |
| planarity | float | property | How planar the ligand atoms are (0.0: spherical, 1.0: perfectly planar) | 0.721616381 |
| donor_planarity | float | property | How planar the donor atoms are (0.0: spherical, 1.0: perfectly planar) | 1 |
| donor_metal_planarity | float | property | How planar the donor atoms + metal center are (0.0: spherical, 1.0: perfectly planar) | 1 |
| min_interatomic_distance | float | property | Shortest interatomic distance within the ligand (Å) | 0.929167854 |
| max_ligand_extension | float | property | Maximum distance between any two ligand atoms (Å) | 9.172156942 |
| is_2D_symmetrical | bool | property | Whether the 2D molecular graph is symmetric | FALSE |
| has_all_bond_orders_valid | bool | property | Whether all bond orders are provided and valid | TRUE |
| n_ligand_instances | int | property | Number of occurrences/instances of this ligand in the ~107k CSD source complexes | 478 |
| stoichiometry | str | composition | Chemical formula | C11H6F2N |
| donors | str | composition | Donor atom elements | C-N |
| smiles | str | smarts | SMILES string (without metal center) | FC:1:C(:[C]:C:C(:C1)F)C:1:N:C:C:C:C1 |
| smiles_with_metal | str | smarts | SMILES string (with Hg pseudo metal center) | FC:1:C:2C:3:[N]([Hg]C2:C:C(:C1)F):C:C:C:C3 |
| csd_metal_count | list | parents | Metal centers (and counts) with coordination precedence in the CSD | Ir(428), Pt(44), Os(4), Au(1), Ru(1) |
| csd_metal_os_count | list | parents | Metal centers by oxidation state (and counts) with coordination precedence in the CSD | Ir+3(161), Pt+2(23), Pt+4(11), Ir+2(4), Os+2(4) |
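The csd_metal_count and csd_metal_os_count examples above are shown as text such as Ir(428), Pt(44). If you want to work with these counts numerically, they can be parsed into a dictionary. The snippet below is a minimal sketch under the assumption that the stored text follows the same format as the displayed examples; the function name parse_counts is hypothetical and not part of DART.

```python
import re

# Pattern for entries such as "Ir(428)" or "Ir+3(161)"; this format is taken
# from the example column above and is assumed to match the stored text.
ENTRY = re.compile(r"\s*([A-Z][a-z]?(?:[+-]\d+)?)\((\d+)\)\s*")

def parse_counts(text):
    """Turn 'Ir(428), Pt(44)' into {'Ir': 428, 'Pt': 44}."""
    counts = {}
    for part in text.split(","):
        match = ENTRY.fullmatch(part)
        if match is None:
            raise ValueError(f"Unexpected entry: {part!r}")
        counts[match.group(1)] = int(match.group(2))
    return counts

metal_counts = parse_counts("Ir(428), Pt(44), Os(4), Au(1), Ru(1)")
print(metal_counts)
print(sum(metal_counts.values()))
```

For the example ligand, the metal counts add up to 478, which is the value listed for n_ligand_instances.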
Another 9 properties are mostly useful for inspection and analysis:
| property | type | description | example |
|---|---|---|---|
| archetype_rssd | float | Root sum square distance from the ideal archetype geometry | 0.114051734 |
| archetype_confidence | float | Confidence score for the archetype assignment (higher is better) | 10.40652244 |
| has_confident_charge | bool | Legacy property. All ligands in the MetaLig now have confident charge assignments, so this is TRUE for every ligand and can be ignored in the LigandFilters. | TRUE |
| graph_hash | str | Graph hash identifier of the ligand (inter-ligand only) | e018fbba4037dd7fbcada96133e091df |
| graph_hash_with_metal | str | Graph hash including pseudo metal center | 2a93874ecbf095e56bd1ffb4f397fca5 |
| heavy_atoms_graph_hash | str | Graph hash computed using heavy atoms only | 62196285009c06c5a0dbb8690a5bad97 |
| heavy_atoms_graph_hash_with_metal | str | Heavy-atom graph hash including pseudo metal center | 1fd0bddc7139608dcf31f53d967593c7 |
| bond_order_graph_hash | str | Graph hash including bond order information | b24c25d7d7b989beb0143dea3590010e |
| csd_complex_ids | list | CSD complex identifiers where the ligand was observed (comma-separated if multiple) | OZIYON, OZIYON, PAQJAV, PAQJAV, PEDFOW, … (473 more) |
Filter the MetaLig in Python
For many users, the DART LigandFilters module will be enough to filter ligands by well-defined properties. For complete freedom in filtering and exploring, the MetaLig database can be accessed via the DARTassembler Python API, specifically the LigandDB and Ligand classes. This allows you to write your own custom filtering scripts in Python to target ligands with exactly the properties you need.
As an example, let us extract Cp-like ligands from the MetaLig database. First, read in the MetaLig. To speed things up in this example, let’s only load the first 5000 ligands.
from DARTassembler import LigandDB
# Load the first 5000 out of 41,018 ligands in the MetaLig database.
metalig = LigandDB.from_json(path='metalig', n_max=5000)
Now, you can filter the MetaLig database based on your requirements. For example, let’s filter the MetaLig so that we retain only Cp-like ligands with an archetype of 1-mono, a charge of -1 and 5 C donor atoms. For more information on how to use the Ligand objects from the MetaLig, see the Ligand class documentation.
# Set some criteria to filter Cp-like ligands
archetype = '1-mono'
charge = -1
donor_elements = ['C', 'C', 'C', 'C', 'C']
# Filter ligands and keep only those which adhere to all the above criteria
ligands_to_keep = []
for ligand_name, ligand in metalig.db.items():
    # Evaluate each criterion separately so it is easy to see which one fails
    correct_archetype = ligand.archetype == archetype
    correct_charge = ligand.charge == charge
    correct_donor_elements = ligand.donor_elements == donor_elements
    if correct_archetype and correct_charge and correct_donor_elements:
        ligands_to_keep.append(ligand_name)
# Reduce MetaLig database to only keep ligands which adhere to the above criteria
filtered_metalig = metalig.get_sub_db(ligand_names=ligands_to_keep)
print(f'Number of ligands after filtering: {len(filtered_metalig.db)}')
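When the number of criteria grows, it can be convenient to express each one as a small function and keep only the ligands that pass all of them. The snippet below is a minimal sketch of that pattern and not part of the DART API: LigandRecord is a hypothetical stand-in for the Ligand class that carries only the three attributes used above, filter_names is a made-up helper, and the toy entries are illustrative rather than taken from the MetaLig.

```python
from dataclasses import dataclass, field

@dataclass
class LigandRecord:
    """Hypothetical stand-in for a DART Ligand, with only three attributes."""
    archetype: str
    charge: int
    donor_elements: list = field(default_factory=list)

def filter_names(db, predicates):
    """Return the names of all entries that satisfy every predicate."""
    return [
        name for name, ligand in db.items()
        if all(predicate(ligand) for predicate in predicates)
    ]

# Illustrative entries, not taken from the MetaLig.
toy_db = {
    "cp_like": LigandRecord("1-mono", -1, ["C"] * 5),
    "neutral_arene": LigandRecord("1-mono", 0, ["C"] * 6),
    "bidentate": LigandRecord("2-cis", -1, ["C", "N"]),
}

# One predicate per criterion; add or remove entries to change the filter.
cp_criteria = [
    lambda lig: lig.archetype == "1-mono",
    lambda lig: lig.charge == -1,
    lambda lig: lig.donor_elements == ["C"] * 5,
]
print(filter_names(toy_db, cp_criteria))
```

With the real database, metalig.db would take the place of toy_db, and the returned names could be passed to get_sub_db in the same way as ligands_to_keep.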
Now, we can save the filtered MetaLig database to a .jsonlines file, a concatenated .xyz file and a .csv file.
filtered_metalig.save_to_file('filtered_metalig.jsonlines')
filtered_metalig.save_to_concat_xyz('filtered_metalig.xyz')
filtered_metalig.save_to_csv('filtered_metalig.csv')
This .jsonlines file can be used in the DART Assembler module as a source of ligands. By opening the .csv file with a program like Excel, you will see that the table lists 7 Cp-like ligands with a formal charge of -1. You can also inspect the ligand structures in the concatenated .xyz file using ase gui filtered_metalig.xyz. In this way, you can use Python to filter the MetaLig database to your exact requirements and then save the filtered database to a .jsonlines file for use in the DART Assembler module.
Ligand Statistics
Bar chart of donor atoms in the MetaLig. For instance, there are nearly 8,000 N-N donor ligands present.
Bar chart showing the prevalence of ligands coordinating to specific metals, such as over 8,000 instances of ligands which were found in the CSD coordinating to Cu.
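Counts like those in the two charts can be tallied for any column of the table, for example the donors column. The snippet below is a minimal sketch using only the standard library; it does not claim to be how the charts above were produced, the helper name tally is made up, and the donor strings are illustrative values in the format of the donors column.

```python
from collections import Counter

def tally(values):
    """Count how often each value occurs, most frequent first."""
    return Counter(values).most_common()

# Illustrative donor strings in the format of the `donors` column.
donor_column = ["N-N", "C-N", "N-N", "P", "N-N", "C-N"]

# Print a simple text bar chart, one row per donor combination.
for donors, count in tally(donor_column):
    print(f"{donors:<5} {'#' * count} {count}")
```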