alvaDesc – Python

The alvaDescCLIWrapper package gives access to alvaDesc functionality from Python (3.5 or higher).

In order to work, the package requires a licensed version of alvaDesc installed on the same computer.
Minimum alvaDesc version: 1.0.14 (some functions require a newer version).

Example

1. Calculate two descriptors for three molecules on Windows:

from alvadesccliwrapper.alvadesc import AlvaDesc

aDesc = AlvaDesc('C:\\Program Files\\Alvascience\\alvaDesc\\alvaDescCLI.exe') # Windows default alvaDescCLI.exe location
aDesc.set_input_SMILES(['C#N', 'CCCC', 'CC(=O)OC1=CC=CC=C1C(=O)O'])
if not aDesc.calculate_descriptors(['MW', 'AMW']):
  print('Error: ' + aDesc.get_error())
else:
  print(aDesc.get_output_descriptors()) # descriptor names
  print(aDesc.get_output()) # one list of values per molecule

get_output_descriptors returns the names of the requested descriptors, and get_output returns a list of lists of floats with one inner list per molecule:

['MW', 'AMW']
[[58.14, 4.15285714285714], [180.17, 8.57952380952381]]

The same values in tabular form:

Molecule                  MW      AMW
CCCC                      58.14   4.15285714285714
CC(=O)OC1=CC=CC=C1C(=O)O  180.17  8.57952380952381
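
Because the descriptor names and the rows of values come back as two separate lists, it can be convenient to pair them into one dictionary per molecule. The following is a minimal sketch, not part of the wrapper: rows_to_dicts is a hypothetical helper, and the literal lists stand in for the return values of get_output_descriptors and get_output so that the snippet runs without alvaDesc installed.

```python
def rows_to_dicts(descriptor_names, rows):
    """Pair each row of values with the descriptor names (hypothetical helper)."""
    return [dict(zip(descriptor_names, row)) for row in rows]

# Stand-ins for aDesc.get_output_descriptors() and aDesc.get_output(),
# copied from the sample output shown above.
names = ['MW', 'AMW']
values = [[58.14, 4.15285714285714], [180.17, 8.57952380952381]]

records = rows_to_dicts(names, values)
for record in records:
    print(record)
```

Each entry of records can then be read by descriptor name, for example records[0]['MW'].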

2. Calculate all descriptors for an input file on Linux:

from alvadesccliwrapper.alvadesc import AlvaDesc

aDesc = AlvaDesc('/usr/bin/alvaDescCLI') # Linux default alvaDescCLI location
aDesc.set_input_file('./myfile.sdf', 'MDL') # input file and its format
if not aDesc.calculate_descriptors('ALL'):
  print('Error: ' + aDesc.get_error())
else:
  print(aDesc.get_output())

3. Calculate the ECFP fingerprint with size 1024 saving the result to a text file on macOS:

from alvadesccliwrapper.alvadesc import AlvaDesc

aDesc = AlvaDesc('/Applications/alvaDesc.app/Contents/MacOS/alvaDescCLI') # macOS default alvaDescCLI location
aDesc.set_input_file('./myfile.sdf', 'MDL')
aDesc.set_output_file('./test.txt')
if not aDesc.calculate_fingerprint('ECFP', 1024):
  print('Error: ' + aDesc.get_error())
# the result is in the output file
#else:
#  print('Results: ' + aDesc.get_output())

Notes on set_output_file:

  • When set_output_file is used, the results are saved to the specified file and are not available through the get_output function.
  • set_output_file writes the output in the alvaDesc standard format (which can be influenced by alvaDesc settings). Do not use this function if you need a specific output file format.
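
If a specific file format is needed, one option is to skip set_output_file, keep the results in memory, and write them out yourself. The following is a minimal sketch using Python's standard csv module. write_descriptors_csv is a hypothetical helper, not part of the wrapper, and the literal lists are sample values standing in for get_output_molecule_names, get_output_descriptors and get_output.

```python
import csv
import io

def write_descriptors_csv(handle, mol_names, desc_names, rows):
    """Write a header line followed by one CSV line per molecule (hypothetical helper)."""
    writer = csv.writer(handle)
    writer.writerow(['NAME'] + list(desc_names))
    for name, row in zip(mol_names, rows):
        writer.writerow([name] + list(row))

# Stand-ins for aDesc.get_output_molecule_names(),
# aDesc.get_output_descriptors() and aDesc.get_output().
mol_names = ['Molecule1', 'Molecule2']
desc_names = ['MW', 'AMW']
rows = [[58.14, 4.1529], [180.17, 8.5795]]

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open('results.csv', 'w', newline='') instead.
buffer = io.StringIO()
write_descriptors_csv(buffer, mol_names, desc_names, rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The delimiter and the column order are then under the script's control rather than depending on alvaDesc settings.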

4. Convert descriptor output to NumPy / pandas:
The results of get_output can be converted to a NumPy matrix or a pandas DataFrame.

import numpy as np
import pandas as pd
from alvadesccliwrapper.alvadesc import AlvaDesc

aDesc = AlvaDesc() # Windows is the default
aDesc.set_input_SMILES(['C#N', 'CCCC', 'CC(=O)OC1=CC=CC=C1C(=O)O'])
if not aDesc.calculate_descriptors(['AMW', 'MW', 'nBT']):
  print('Error: ' + aDesc.get_error())
  exit()

res_out = aDesc.get_output()
res_mol_names = aDesc.get_output_molecule_names() # get molecule names according to alvaDescCLI standard
res_desc_names = aDesc.get_output_descriptors()

# NumPy array of arrays and matrix
numpy_array_of_array = np.array([np.array(xs) for xs in res_out])
# np.matrix still works, but NumPy's documentation recommends regular arrays for new code
numpy_matrix = np.matrix(res_out)
print('NumPy matrix')
print(numpy_matrix)
print()

# pandas DataFrame: one column per descriptor, plus a leading NAME column
pandas_df = pd.DataFrame(res_out)
pandas_df.columns = res_desc_names
pandas_df.insert(loc=0, column='NAME', value=res_mol_names)
print('Pandas dataframe')
print(pandas_df)

The result is:

NumPy matrix
[[ 4.1529 58.14 13. ]
[ 8.5795 180.17 21. ]]

Pandas dataframe
        NAME     AMW      MW   nBT
0  Molecule1  4.1529   58.14  13.0
1  Molecule2  8.5795  180.17  21.0

5. Calculate structural patterns (SMARTS) for an input file:

from alvadesccliwrapper.alvadesc import AlvaDesc

aDesc = AlvaDesc() # Minimum alvaDesc version: 3.0.0
aDesc.set_input_file('./myfile.sdf', 'MDL')
list_patterns = ['CCC', 'CNC']
if not aDesc.calculate_patterns(list_patterns):
  print('Error: ' + aDesc.get_error())
  exit()

# f-strings require Python 3.6 or higher
print(f'Pattern names: {aDesc.get_output_descriptors()}')
res_out = aDesc.get_output()
for i, res_mol in enumerate(res_out):
  print(f'Pattern results for molecule {i+1}: {res_mol}')
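
All of the examples above check the boolean returned by a calculate call and read get_error on failure. In longer scripts it may be convenient to turn that check into an exception. The following is a minimal sketch under stated assumptions: AlvaDescError and check are hypothetical names, not part of the wrapper, and FakeAlvaDesc is a stand-in that always fails so that the snippet runs without alvaDesc installed.

```python
class AlvaDescError(RuntimeError):
    """Hypothetical exception type; not defined by the wrapper."""

def check(ok, adesc):
    """Raise AlvaDescError with the wrapper's error message if a call failed."""
    if not ok:
        raise AlvaDescError(adesc.get_error())

class FakeAlvaDesc:
    """Stand-in object that mimics a failing calculation."""
    def calculate_descriptors(self, descriptors):
        return False

    def get_error(self):
        return 'simulated failure'

adesc = FakeAlvaDesc()
try:
    check(adesc.calculate_descriptors(['MW']), adesc)
    message = None
except AlvaDescError as exc:
    message = str(exc)
print(message)
```

With a real AlvaDesc object the call would look the same, for example check(aDesc.calculate_patterns(list_patterns), aDesc).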

More examples are available in the documentation contained in the alvaDescCLIWrapper zip file.
