MetaLig Ligand Database

The organoMetallic Ligand database (MetaLig) contains 41,018 ligands extracted from 107,185 complexes in the Cambridge Structural Database (CSD). For each ligand, it stores the 3D coordinates, formal charge, molecular graph and a variety of physical properties. Each ligand also includes statistical data about its occurrences in the CSD, such as which metals it typically coordinates to.

The MetaLig ligand database can be used in a variety of applications:

  • DART Assembler: As a source of ligands for the DART Assembler module.

  • DART LigandFilters: Filter ligands based on their properties to target specific chemical spaces in the Assembler module.

  • Ligand Analysis: To analyze and explore ligands across the CSD.

  • Ligand Property Prediction: As a dataset for training machine learning models.

../_images/metalig_fig.png

To explore the ligands in the MetaLig, use the terminal to run the command

DARTassembler dbinfo --db metalig

This will generate two files, an .xyz file and a .csv file:

The .xyz file contains the 3D structures of all ligands. To view and browse through the ligands with ase, you can use the command ase gui concat_MetaLigDB_v1.1.0.xyz. Each ligand is coordinated to a Cu metal center for visualization purposes. The Cu metal center is not part of the ligands in the MetaLig, it is only added to the .xyz file to display the coordination of each ligand.

The MetaLigDB_v1.1.0.csv file displays a tabular overview of all ligands and their properties (see below). You can open this file with a program like Excel to sort and filter the ligands based on their properties.
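
As an alternative to a spreadsheet program, the .csv file can also be filtered with a few lines of Python. The following is a minimal sketch using only the standard library, not an official DART workflow. It assumes that the column names in the file match the property names listed in the table below (unique_name, charge, donors, n_ligand_instances) and that charge is stored as a number. The in-memory sample stands in for MetaLigDB_v1.1.0.csv: its first row reuses the example values from the table, and its second row is made up for illustration. filter_rows is a hypothetical helper.

```python
import csv
import io

def filter_rows(rows, charge, donors):
    """Keep rows with the given formal charge and donor string, most frequent first."""
    kept = [
        r for r in rows
        if int(float(r["charge"])) == charge and r["donors"] == donors
    ]
    return sorted(kept, key=lambda r: int(r["n_ligand_instances"]), reverse=True)

# Stand-in for the real file. The first row uses the example values from the
# property table; the second row is hypothetical.
sample_csv = io.StringIO(
    "unique_name,charge,donors,n_ligand_instances\n"
    "unq_CSD-OZIYON-02-b,-1,C-N,478\n"
    "hypothetical_ligand,0,N-N,12\n"
)
rows = list(csv.DictReader(sample_csv))

# For the real database, reading the file should look roughly like this
# (assuming the file name given above):
# with open("MetaLigDB_v1.1.0.csv", newline="") as f:
#     rows = list(csv.DictReader(f))

selected = filter_rows(rows, charge=-1, donors="C-N")
for row in selected:
    print(row["unique_name"], row["n_ligand_instances"])
```

Sorting by n_ligand_instances puts the ligands with the most CSD precedent first, which is often a sensible starting point when browsing.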

MetaLig Ligand Properties

The MetaLig contains a wide range of properties, such as the 3D geometry and the molecular graph of each ligand. You can find all properties under the Ligand class documentation. The MetaLig also contains 38 tabular properties for each ligand. 29 of these are useful for filtering ligands in the DART LigandFilters module:

| property | type | filter | description | example |
|---|---|---|---|---|
| unique_name | str | property | Unique ID for this ligand in the MetaLig | unq_CSD-OZIYON-02-b |
| archetype | str | property | Coordination archetype of the ligand, e.g. 2-cis or 3-meridional | 2-cis |
| charge | int | property | Formal charge | -1 |
| n_donors | int | property | Number of donor atoms (both haptic and dentic) | 2 |
| n_eff_denticities | int | property | Effective denticity count (haptic groups count as one effective donor each) | 2 |
| n_denticities | int | property | Denticity (number of dentic donors, excluding haptic donors) | 2 |
| n_haptic_atoms | int | property | Number of haptically coordinating atoms in the ligand | 0 |
| n_haptic_groups | int | property | Number of distinct haptic donor groups in the ligand | 0 |
| n_atoms | int | property | Number of atoms in the ligand | 20 |
| n_elements | int | property | Number of distinct chemical elements | 4 |
| n_bonds | int | property | Number of atomic bonds (without donor-metal bonds) | 21 |
| n_electrons | int | property | Number of electrons | 98 |
| n_protons | int | property | Number of protons | 97 |
| n_beta_hydrogens | int | property | Number of beta hydrogen atoms | 2 |
| molecular_weight | float | property | Molecular weight in atomic mass units (amu) | 190.1728063 |
| planarity | float | property | How planar the ligand atoms are (0.0: spherical, 1.0: perfectly planar) | 0.721616381 |
| donor_planarity | float | property | How planar the donor atoms are (0.0: spherical, 1.0: perfectly planar) | 1 |
| donor_metal_planarity | float | property | How planar the donor atoms + metal center are (0.0: spherical, 1.0: perfectly planar) | 1 |
| min_interatomic_distance | float | property | Shortest interatomic distance within the ligand (Å) | 0.929167854 |
| max_ligand_extension | float | property | Maximum distance between any two ligand atoms (Å) | 9.172156942 |
| is_2D_symmetrical | bool | property | Whether the 2D molecular graph is symmetric | FALSE |
| has_all_bond_orders_valid | bool | property | Whether all bond orders are provided and valid | TRUE |
| n_ligand_instances | int | property | Number of occurrences/instances of this ligand in the ~107k CSD source complexes | 478 |
| stoichiometry | str | composition | Chemical formula | C11H6F2N |
| donors | str | composition | Donor atom elements | C-N |
| smiles | str | smarts | SMILES string (without metal center) | FC:1:C(:[C]:C:C(:C1)F)C:1:N:C:C:C:C1 |
| smiles_with_metal | str | smarts | SMILES string (with Hg pseudo metal center) | FC:1:C:2C:3:[N]([Hg]C2:C:C(:C1)F):C:C:C:C3 |
| csd_metal_count | list | parents | Metal centers (and counts) with coordination precedence in the CSD | Ir(428), Pt(44), Os(4), Au(1), Ru(1) |
| csd_metal_os_count | list | parents | Metal centers by oxidation state (and counts) with coordination precedence in the CSD | Ir+3(161), Pt+2(23), Pt+4(11), Ir+2(4), Os+2(4) |

Another 9 properties are mostly useful for inspection and analysis:

| property | type | description | example |
|---|---|---|---|
| archetype_rssd | float | Root sum square distance from the ideal archetype geometry | 0.114051734 |
| archetype_confidence | float | Confidence score for the archetype assignment (higher is better) | 10.40652244 |
| has_confident_charge | bool | Legacy property: all ligands in the MetaLig now have confident charge assignments, so this is TRUE for every ligand and can be ignored in the DART LigandFilters module | TRUE |
| graph_hash | str | Graph hash identifier of the ligand (inter-ligand only) | e018fbba4037dd7fbcada96133e091df |
| graph_hash_with_metal | str | Graph hash including pseudo metal center | 2a93874ecbf095e56bd1ffb4f397fca5 |
| heavy_atoms_graph_hash | str | Graph hash computed using heavy atoms only | 62196285009c06c5a0dbb8690a5bad97 |
| heavy_atoms_graph_hash_with_metal | str | Heavy-atom graph hash including pseudo metal center | 1fd0bddc7139608dcf31f53d967593c7 |
| bond_order_graph_hash | str | Graph hash including bond order information | b24c25d7d7b989beb0143dea3590010e |
| csd_complex_ids | list | CSD complex identifiers where the ligand was observed (comma-separated if multiple) | OZIYON, OZIYON, PAQJAV, PAQJAV, PEDFOW, … (473 more) |
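
The csd_metal_count and csd_metal_os_count properties pack several counts into a single value, which is awkward to sort or aggregate directly. The sketch below parses such values into dictionaries. It assumes the values appear as strings in exactly the format of the examples listed for these two properties ("Ir(428), Pt(44)" and "Ir+3(161), Pt+2(23)"); the actual storage format in the .csv file may differ. parse_metal_counts is a hypothetical helper and not part of DARTassembler.

```python
import re

# Assumed entry formats, taken from the documented example values:
#   csd_metal_count:    "Ir(428)"
#   csd_metal_os_count: "Ir+3(161)"
_ENTRY = re.compile(r"([A-Z][a-z]?)([+-]\d+)?\((\d+)\)")

def parse_metal_counts(text):
    """Map each metal (or (metal, oxidation state) pair) to its count."""
    counts = {}
    for metal, ox_state, count in _ENTRY.findall(text):
        # Entries without an oxidation state are keyed by the element symbol only.
        key = (metal, int(ox_state)) if ox_state else metal
        counts[key] = counts.get(key, 0) + int(count)
    return counts

metals = parse_metal_counts("Ir(428), Pt(44), Os(4), Au(1), Ru(1)")
metal_os = parse_metal_counts("Ir+3(161), Pt+2(23), Pt+4(11), Ir+2(4), Os+2(4)")

print(metals["Ir"])            # count of Ir complexes for the example ligand
print(metal_os[("Pt", 2)])     # count of Pt(+2) complexes
print(sum(metals.values()))    # total over all listed metals
```

For the example ligand, the per-metal counts add up to 478, the same value listed for n_ligand_instances. Whether this holds for every ligand is not stated here, so treat it as an observation about this one example.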

Filter the MetaLig in Python

For many users, the DART Ligand Filters module will be enough to filter ligands with exactly defined properties. For complete freedom in filtering and exploring, the MetaLig database can be accessed via the DARTassembler Python API, specifically the LigandDB and Ligand classes. This allows you to write your own custom filtering scripts in Python to target ligands with exactly the properties you need.

As an example, let us extract Cp-like ligands from the MetaLig database. First, read in the MetaLig. To speed things up in this example, let’s only load the first 5000 ligands.

from DARTassembler import LigandDB

# Load the first 5000 out of 41,018 ligands in the MetaLig database.
metalig = LigandDB.from_json(path='metalig', n_max=5000)

Now, you can filter the MetaLig database based on your requirements. For example, let’s filter the MetaLig so that we retain only Cp-like ligands with an archetype of 1-mono, a charge of -1 and 5 C donor atoms. For more information on how to use the Ligand objects from the MetaLig, see the Ligand class documentation.

# Set some criteria to filter Cp-like ligands
archetype = '1-mono'
charge = -1
donor_elements = ['C', 'C', 'C', 'C', 'C']

# Filter ligands and keep only those which adhere to all the above criteria
ligands_to_keep = []
for ligand_name, ligand in metalig.db.items():
    correct_archetype = ligand.archetype == archetype
    correct_charge = ligand.charge == charge
    correct_donor_elements = ligand.donor_elements == donor_elements
    if correct_archetype and correct_charge and correct_donor_elements:
        ligands_to_keep.append(ligand_name)

# Reduce MetaLig database to only keep ligands which adhere to the above criteria
filtered_metalig = metalig.get_sub_db(ligand_names=ligands_to_keep)
print(f'Number of ligands after filtering: {len(filtered_metalig.db)}')
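
The loop above can be generalized so that all filter criteria live in a single function, which makes it easier to swap in a different definition later. This is a sketch rather than part of the DARTassembler API: select_ligand_names and is_cp_like are hypothetical helpers, and the SimpleNamespace objects are stand-ins that only mimic the three Ligand attributes used above (archetype, charge, donor_elements), so the snippet runs without loading the database.

```python
from types import SimpleNamespace

def select_ligand_names(db, predicate):
    """Return the names of all ligands in a name -> ligand mapping that satisfy predicate."""
    return [name for name, ligand in db.items() if predicate(ligand)]

def is_cp_like(ligand):
    """Cp-like as defined above: 1-mono archetype, charge -1, five carbon donors."""
    return (
        ligand.archetype == '1-mono'
        and ligand.charge == -1
        and ligand.donor_elements == ['C'] * 5
    )

# Hypothetical stand-ins for Ligand objects, for illustration only
toy_db = {
    'cp_like': SimpleNamespace(archetype='1-mono', charge=-1, donor_elements=['C'] * 5),
    'bidentate': SimpleNamespace(archetype='2-cis', charge=-1, donor_elements=['C', 'N']),
}

names = select_ligand_names(toy_db, is_cp_like)
print(names)
```

With the real database, the equivalent call would presumably be select_ligand_names(metalig.db, is_cp_like), with the result passed to metalig.get_sub_db(ligand_names=...) as in the loop above.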

Now, we can save the filtered MetaLig database to a .jsonlines file, a concatenated .xyz file and a .csv file.

filtered_metalig.save_to_file('filtered_metalig.jsonlines')
filtered_metalig.save_to_concat_xyz('filtered_metalig.xyz')
filtered_metalig.save_to_csv('filtered_metalig.csv')

This .jsonlines file can be used in the DART Assembler module as a source of ligands. Opening the .csv file with a program like Excel shows that the table contains 7 Cp-like ligands, each with a formal charge of -1. You can also inspect the ligand structures in the concatenated .xyz file using ase gui filtered_metalig.xyz. In this way, you can use Python to filter the MetaLig database to your exact requirements and then save the filtered database to a .jsonlines file for use in the DART Assembler module.

Ligand Statistics

../_images/hist_donors.png

Bar chart of donor atoms in the MetaLig. For instance, there are nearly 8,000 N-N donor ligands present.

../_images/hist_metal_center.png

Bar chart showing how often ligands coordinate to each metal in the CSD. For instance, over 8,000 ligand instances were found coordinating to Cu.