The alvaDescCLIWrapper package can be used to access alvaDesc functionalities from Python (3.5 or higher).
In order to work, the package requires a licensed version of alvaDesc installed on the same computer.
Minimum alvaDesc version: 1.0.14 (some functions require a newer version).
Examples
1. Calculate two descriptors for two molecules on Windows:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc('C:\\Program Files\\Alvascience\\alvaDesc\\alvaDescCLI.exe')  # Windows default alvaDescCLI.exe location
aDesc.set_input_SMILES(['CCCC', 'CC(=O)OC1=CC=CC=C1C(=O)O'])
if not aDesc.calculate_descriptors(['MW', 'AMW']):
    print('Error: ' + aDesc.get_error())
else:
    print(aDesc.get_output_descriptors())  # descriptor names, in column order
    print(aDesc.get_output())  # one row of values per molecule
get_output returns a list of lists of floats, one inner list per molecule, containing the requested descriptors:
[[58.14, 4.15285714285714], [180.17, 8.57952380952381]]
Molecule | MW | AMW |
---|---|---|
CCCC | 58.14 | 4.15285714285714 |
CC(=O)OC1=CC=CC=C1C(=O)O | 180.17 | 8.57952380952381 |
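The table above pairs each row of values with the descriptor names. A minimal sketch of doing the same in plain Python follows. It assumes that get_output_descriptors returns the names in the same column order as the rows from get_output, as the table suggests. The helper rows_to_records is hypothetical and not part of the wrapper, and the values are copied from the example output rather than computed.

```python
def rows_to_records(descriptor_names, rows, molecule_names=None):
    """Pair each row of descriptor values with the descriptor names.

    Hypothetical helper, not part of alvaDescCLIWrapper.
    """
    records = []
    for index, row in enumerate(rows):
        if len(row) != len(descriptor_names):
            raise ValueError(
                "row {} has {} values, expected {}".format(
                    index, len(row), len(descriptor_names)
                )
            )
        record = dict(zip(descriptor_names, row))
        if molecule_names is not None:
            record["NAME"] = molecule_names[index]
        records.append(record)
    return records


# Stand-in data copied from the example output; in real use these would come
# from aDesc.get_output_descriptors() and aDesc.get_output().
names = ["MW", "AMW"]
values = [[58.14, 4.15285714285714], [180.17, 8.57952380952381]]

records = rows_to_records(names, values)
for record in records:
    print(record)
```

Keeping the length check makes a mismatch between names and values fail loudly instead of silently truncating a row.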
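2. Calculate all descriptors for an input file on Linux:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc('/usr/bin/alvaDescCLI')  # Linux default alvaDescCLI location
aDesc.set_input_file('./myfile.sdf', 'MDL')  # input file and its format, as in the other file-based examples
if not aDesc.calculate_descriptors('ALL'):
    print('Error: ' + aDesc.get_error())
else:
    print(aDesc.get_output())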
3. Calculate the ECFP fingerprint with size 1024, saving the result to a text file on macOS:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc('/Applications/alvaDesc.app/Contents/MacOS/alvaDescCLI')  # macOS default alvaDescCLI location
aDesc.set_input_file('./myfile.sdf', 'MDL')
aDesc.set_output_file('./test.txt')
if not aDesc.calculate_fingerprint('ECFP', 1024):
    print('Error: ' + aDesc.get_error())
# the result is in the output file
# else:
#     print('Results: ' + aDesc.get_output())
Notes on set_output_file:
- When set_output_file is used, the results are saved to the specified file and are not available through get_output.
- set_output_file writes the output in the alvaDesc standard format, which can be affected by alvaDesc settings. Do not use this function if you need a specific output file format.
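When a specific file format is needed, one option is to skip set_output_file, read the results with get_output, and write the file yourself. The sketch below does this with the standard csv module. The helper write_descriptor_csv, the NAME column header and the semicolon delimiter are illustrative choices, not part of the wrapper, and the data is a stand-in for what get_output_molecule_names, get_output_descriptors and get_output would return.

```python
import csv
import os
import tempfile


def write_descriptor_csv(path, molecule_names, descriptor_names, rows, delimiter=","):
    """Write a header line, then one line per molecule.

    Hypothetical helper, not part of alvaDescCLIWrapper.
    """
    if len(molecule_names) != len(rows):
        raise ValueError("molecule_names and rows must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerow(["NAME"] + list(descriptor_names))
        for name, row in zip(molecule_names, rows):
            writer.writerow([name] + list(row))


# Stand-in data; in real use these would come from the AlvaDesc object.
mol_names = ["Molecule1", "Molecule2"]
desc_names = ["MW", "AMW"]
rows = [[58.14, 4.15285714285714], [180.17, 8.57952380952381]]

# Write to a temporary directory so the sketch does not touch the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "descriptors.csv")
write_descriptor_csv(out_path, mol_names, desc_names, rows, delimiter=";")

with open(out_path, newline="") as handle:
    written = handle.read()
print(written)
```

Because the file is produced by your own code, its layout does not change when alvaDesc settings change.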
4. Convert descriptors output to NumPy / Pandas:
The results returned by get_output can be converted to a NumPy matrix or a pandas DataFrame.
import numpy as np
import pandas as pd
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc()  # Windows is the default
aDesc.set_input_SMILES(['CCCC', 'CC(=O)OC1=CC=CC=C1C(=O)O'])
if not aDesc.calculate_descriptors(['AMW', 'MW', 'nBT']):
    print('Error: ' + aDesc.get_error())
    exit()
res_out = aDesc.get_output()
res_mol_names = aDesc.get_output_molecule_names()  # get molecule names according to alvaDescCLI standard
res_desc_names = aDesc.get_output_descriptors()
# NumPy array of arrays and matrix
numpy_array_of_array = np.array([np.array(xs) for xs in res_out])
numpy_matrix = np.matrix(res_out)  # NumPy matrix
print('NumPy matrix')
print(numpy_matrix)
print()
# Pandas dataframe
pandas_df = pd.DataFrame(res_out)
pandas_df.columns = res_desc_names
pandas_df.insert(loc=0, column='NAME', value=res_mol_names)
print('Pandas dataframe')
print(pandas_df)
The result is:
NumPy matrix
[[  4.1529  58.14    13.    ]
 [  8.5795 180.17    21.    ]]

Pandas dataframe
        NAME     AMW      MW   nBT
0  Molecule1  4.1529   58.14  13.0
1  Molecule2  8.5795  180.17  21.0
5. Calculate structural patterns (SMARTS) for an input file:
from alvadesccliwrapper.alvadesc import AlvaDesc
aDesc = AlvaDesc()  # Minimum alvaDesc version: 3.0.0
aDesc.set_input_file('./myfile.sdf', 'MDL')
list_patterns = ['CCC', 'CNC']
if not aDesc.calculate_patterns(list_patterns):
    print('Error: ' + aDesc.get_error())
    exit()
# the f-strings below need Python 3.6 or later
print(f'Pattern names: {aDesc.get_output_descriptors()}')
res_out = aDesc.get_output()
for i, res_mol in enumerate(res_out):
    print(f'Pattern results for molecule {i+1}: {res_mol}')
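The examples use a different alvaDescCLI location on each operating system. For a script that should run on all three, a small lookup keyed on platform.system() is one possible approach. This is a sketch only: the paths are the default locations quoted in the examples, while DEFAULT_CLI_PATHS and default_cli_path are hypothetical names, not part of the wrapper. A non-default installation would need its own path.

```python
import platform

# Default alvaDescCLI locations, as quoted in the examples.
# Keys are the values returned by platform.system().
DEFAULT_CLI_PATHS = {
    "Windows": "C:\\Program Files\\Alvascience\\alvaDesc\\alvaDescCLI.exe",
    "Linux": "/usr/bin/alvaDescCLI",
    "Darwin": "/Applications/alvaDesc.app/Contents/MacOS/alvaDescCLI",  # macOS
}


def default_cli_path(system=None):
    """Return the default alvaDescCLI location for an OS name.

    Hypothetical helper; `system` defaults to the current platform.
    """
    system = system or platform.system()
    try:
        return DEFAULT_CLI_PATHS[system]
    except KeyError:
        raise ValueError(
            "no default alvaDescCLI path known for {!r}".format(system)
        )


print(default_cli_path("Linux"))

# With the wrapper and a licensed alvaDesc installed, the path could be
# passed to the constructor:
# aDesc = AlvaDesc(default_cli_path())
```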
More examples are available in the documentation contained in the alvaDescCLIWrapper zip file.