PyQAlloy Custom Dataset from BSON

Welcome to a minimal Jupyter notebook that shows how to use the PyQAlloy package with a Binary JSON (BSON) document native to MongoDB (or plain JSON).

Unlike the longer UserCuration.ipynb tutorial, this notebook does not require ULTERA (ultera.org) access. Instead, it uses a persisted snapshot of the database, stored in this repository under devTools/ULTERA_sample.bson, which contains a 300-document subset of the ULTERA data.

This notebook should also allow for plug-and-play compatibility with ULTERA Database snapshots stored in the Zenodo repository at doi.org/10.5281/zenodo.7566415, or with any data conforming to the ULTERA schema standards.

Single Composition Scope Example

We will now go through an example for a single composition scope, but you can pass the same custom collection to any of the methods described in the pyqalloy.curation.analysis module, which are covered in the main UserCuration.ipynb tutorial.

Start by importing PyQAlloy and MontyDB. The latter will pretend to be a MongoDB client and allow us to load the BSON file. It is significantly faster than the actual MongoDB client, but it does not support all the features of MongoDB. That is sufficient for the purposes of this tutorial. In production environments, however, you should use the actual MongoDB because, for instance, its performance-critical aggregation framework is not supported by MontyDB.

from pyqalloy.curation.analysis import SingleCompositionAnalyzer
from montydb import MontyClient
import bson

Set up the customCollection with the convenient one-liner below:

customCollection = MontyClient(":memory:").db.test

Then load the BSON file into the collection. Please note that you need to be in the root directory of the repository, or else adjust the path to the BSON file.

# Read-only binary mode is enough; nothing is written back to the snapshot.
with open('devTools/ULTERA_sample.bson', 'rb') as f:
    customCollection.insert_many(bson.decode_all(f.read()))
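
The relative path used above only resolves when the working directory is the repository root. As a minimal sketch that relies only on the Python standard library, a hypothetical helper (find_upwards is an illustrative name, not part of PyQAlloy) could walk up from the current directory until it finds the snapshot:

```python
from pathlib import Path
from typing import Optional


def find_upwards(relative_path: str, start: Optional[Path] = None) -> Optional[Path]:
    """Hypothetical helper (not part of PyQAlloy): look for `relative_path`
    in `start` and then in each of its parent directories."""
    base = (start or Path.cwd()).resolve()
    for directory in (base, *base.parents):
        candidate = directory / relative_path
        if candidate.is_file():
            return candidate
    return None


bson_path = find_upwards('devTools/ULTERA_sample.bson')
if bson_path is None:
    print('Snapshot not found; adjust the path manually.')
else:
    print(f'Using snapshot at {bson_path}')
```

If a path is found, it can be passed to open() in place of the hard-coded relative path.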

Set up the sC (single Composition) Analyzer Object with our custom collection:

sC = SingleCompositionAnalyzer(collectionManualOverride=customCollection)

Now let’s see if it works!

Scan through all the compositions, looking for the ones whose components sum to a value that is close to, but not exactly, 100 (the total is printed after the --> at the end of each result). Request up to 10 results and then stop.

sC.scanCompositionsAround100(resultLimit=10, 
                             printOnFly=True)
DOI: 10.1016/j.msea.2012.04.067  --> F9
F:   Hf1.4 Zr0.007 Ti0.4 Ta3.3 W9.4 Mo0.5 Cr8.1 Co9.3 Ni61.5 Al5.7 B0.017 C0.07
PF:  Hf1.4 Zr0 Ti0.4 Ta3.3 W9.4 Mo0.5 Cr8.1 Co9.3 Ni61.7 Al5.7 B0 C0.1
Raw:  Ni61.5 W9.4 Co9.3 Cr8.1 Al5.7 Ta3.3 Hf1.4 Ti0.4 Mo0.5 C0.07 B0.017 Zr0.007
RF:  Hf2.8 Zr0.01 Ti0.8 Ta6.6 W18.8 Mo1 Cr16.2 Co18.6 Ni123 Al11.4 B0.03 C0.14
[1.4, 0.007, 0.4, 3.3, 9.4, 0.5, 8.1, 9.3, 61.5, 5.7, 0.017, 0.07]
-->  99.694

DOI: 10.1016/j.ijfatigue.2018.08.029  --> T6
F:   Ti86.2 V3.15 Al10.2
PF:  Ti86.6 V3.2 Al10.2
Raw:  Ti86.2 Al10.2 V3.15
RF:  Ti27.37 V1 Al3.24
[86.2, 3.15, 10.2]
-->  99.55

DOI: 10.1016/j.actamat.2016.06.063
F:   Mo7 Cr23 Fe23 Co23 Ni23
PF:  Mo7.1 Cr23.2 Fe23.2 Co23.2 Ni23.2
Raw:  Co23Cr23Fe23Ni23Mo7
RF:  Mo1 Cr3.29 Fe3.29 Co3.29 Ni3.29
[7.0, 23.0, 23.0, 23.0, 23.0]
-->  99.0

DOI: 10.1016/j.actamat.2016.11.016
F:   Cr16 Fe16 Co16 Ni34.4 Al16
PF:  Cr16.3 Fe16.3 Co16.3 Ni35 Al16.3
Raw:  Al16Co16Cr16Fe16Ni34.4
RF:  Cr1 Fe1 Co1 Ni2.15 Al1
[16.0, 16.0, 16.0, 34.4, 16.0]
-->  98.4

DOI: 10.1016/j.msea.2017.04.111
F:   Cr19 Fe19 Co19 Ni37 Cu4 Al4
PF:  Cr18.6 Fe18.6 Co18.6 Ni36.3 Cu3.9 Al3.9
Raw:  Al4Co19Cr19Cu4Fe19Ni37
RF:  Cr4.75 Fe4.75 Co4.75 Ni9.25 Cu1 Al1
[19.0, 19.0, 19.0, 37.0, 4.0, 4.0]
-->  102.0

DOI: 10.1016/j.matlet.2017.04.072
F:   Cr23 Fe23 Co23 Ni23 Al7
PF:  Cr23.2 Fe23.2 Co23.2 Ni23.2 Al7.1
Raw:  Al7Co23Cr23Fe23Ni23
RF:  Cr3.29 Fe3.29 Co3.29 Ni3.29 Al1
[23.0, 23.0, 23.0, 23.0, 7.0]
-->  99.0

If some of the above seem like simple numerical precision problems, you can re-initialize the sC object and run it again with custom settings (uncertainty=1, i.e., +/-1% is accepted as close enough to 100%). There are quite a few settings you can modify to suit your needs.

sC = SingleCompositionAnalyzer(collectionManualOverride=customCollection)
sC.scanCompositionsAround100(resultLimit=10, 
                             printOnFly=True, 
                             uncertainty=1)
DOI: 10.1016/j.actamat.2016.11.016
F:   Cr16 Fe16 Co16 Ni34.4 Al16
PF:  Cr16.3 Fe16.3 Co16.3 Ni35 Al16.3
Raw:  Al16Co16Cr16Fe16Ni34.4
RF:  Cr1 Fe1 Co1 Ni2.15 Al1
[16.0, 16.0, 16.0, 34.4, 16.0]
-->  98.4

DOI: 10.1016/j.msea.2017.04.111
F:   Cr19 Fe19 Co19 Ni37 Cu4 Al4
PF:  Cr18.6 Fe18.6 Co18.6 Ni36.3 Cu3.9 Al3.9
Raw:  Al4Co19Cr19Cu4Fe19Ni37
RF:  Cr4.75 Fe4.75 Co4.75 Ni9.25 Cu1 Al1
[19.0, 19.0, 19.0, 37.0, 4.0, 4.0]
-->  102.0
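
The effect of the uncertainty setting in the two runs above can be reproduced with a few lines of plain Python. This is only an illustrative sketch: is_flagged is a hypothetical function, and its rule (flag when the total deviates from 100 by more than uncertainty) is an assumption that happens to be consistent with the printed results, not PyQAlloy's actual implementation.

```python
def is_flagged(percentages, uncertainty):
    """Illustrative check (an assumption, not PyQAlloy's code): flag a
    composition whose total deviates from 100 by more than `uncertainty`."""
    return abs(sum(percentages) - 100) > uncertainty


# (DOI, component list) pairs copied from the outputs above.
reported = [
    ('10.1016/j.actamat.2016.06.063', [7.0, 23.0, 23.0, 23.0, 23.0]),
    ('10.1016/j.actamat.2016.11.016', [16.0, 16.0, 16.0, 34.4, 16.0]),
    ('10.1016/j.msea.2017.04.111', [19.0, 19.0, 19.0, 37.0, 4.0, 4.0]),
]

for doi, percentages in reported:
    total = round(sum(percentages), 3)
    print(doi, total, is_flagged(percentages, uncertainty=1))
```

Under this assumed rule, the composition summing to 99.0 is no longer reported once uncertainty=1, while the ones summing to 98.4 and 102.0 still are, which matches the second run.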

Lastly, run the same procedure, but only look at compositions that a specific researcher uploaded (not necessarily parsed themselves) by initializing the sC with a name specified.

This time printOnFly is set to False, so the results are not printed on the fly but are instead stored in a list, accessible as sC.printOuts.

sC = SingleCompositionAnalyzer(name='Adam Krajewski', 
                               collectionManualOverride=customCollection)
sC.scanCompositionsAround100(printOnFly=False, 
                             resultLimit=10, 
                             uncertainty=0.21)
sC.printOuts
['DOI: 10.1016/j.actamat.2016.06.063\nF:   Mo7 Cr23 Fe23 Co23 Ni23\nPF:  Mo7.1 Cr23.2 Fe23.2 Co23.2 Ni23.2\nRaw:  Co23Cr23Fe23Ni23Mo7\nRF:  Mo1 Cr3.29 Fe3.29 Co3.29 Ni3.29\n[7.0, 23.0, 23.0, 23.0, 23.0]\n-->  99.0\n',
 'DOI: 10.1016/j.actamat.2016.11.016\nF:   Cr16 Fe16 Co16 Ni34.4 Al16\nPF:  Cr16.3 Fe16.3 Co16.3 Ni35 Al16.3\nRaw:  Al16Co16Cr16Fe16Ni34.4\nRF:  Cr1 Fe1 Co1 Ni2.15 Al1\n[16.0, 16.0, 16.0, 34.4, 16.0]\n-->  98.4\n',
 'DOI: 10.1016/j.msea.2017.04.111\nF:   Cr19 Fe19 Co19 Ni37 Cu4 Al4\nPF:  Cr18.6 Fe18.6 Co18.6 Ni36.3 Cu3.9 Al3.9\nRaw:  Al4Co19Cr19Cu4Fe19Ni37\nRF:  Cr4.75 Fe4.75 Co4.75 Ni9.25 Cu1 Al1\n[19.0, 19.0, 19.0, 37.0, 4.0, 4.0]\n-->  102.0\n',
 'DOI: 10.1016/j.matlet.2017.04.072\nF:   Cr23 Fe23 Co23 Ni23 Al7\nPF:  Cr23.2 Fe23.2 Co23.2 Ni23.2 Al7.1\nRaw:  Al7Co23Cr23Fe23Ni23\nRF:  Cr3.29 Fe3.29 Co3.29 Ni3.29 Al1\n[23.0, 23.0, 23.0, 23.0, 7.0]\n-->  99.0\n']

And now, save that list for later analysis!

sC.writeResultsToFile('customSingleComp_Adam.txt')
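
The stored strings can also be post-processed directly in Python. The sketch below uses a hypothetical helper, summarize_entry (not part of PyQAlloy), which assumes the layout shown in the output above: a first line starting with "DOI: " and a last line of the form "-->  total". The two sample entries are copied from that output.

```python
def summarize_entry(entry):
    """Hypothetical helper: extract (DOI, total) from one printOuts-style string."""
    lines = entry.strip().split('\n')
    # The DOI line may carry a trailing '  --> <label>' suffix; keep only the DOI.
    doi = lines[0][len('DOI: '):].split('-->')[0].strip()
    # The last line holds the summed composition after the arrow.
    total = float(lines[-1].replace('-->', '').strip())
    return doi, total


entries = [
    'DOI: 10.1016/j.actamat.2016.06.063\nF:   Mo7 Cr23 Fe23 Co23 Ni23\n'
    'PF:  Mo7.1 Cr23.2 Fe23.2 Co23.2 Ni23.2\nRaw:  Co23Cr23Fe23Ni23Mo7\n'
    'RF:  Mo1 Cr3.29 Fe3.29 Co3.29 Ni3.29\n[7.0, 23.0, 23.0, 23.0, 23.0]\n-->  99.0\n',
    'DOI: 10.1016/j.actamat.2016.11.016\nF:   Cr16 Fe16 Co16 Ni34.4 Al16\n'
    'PF:  Cr16.3 Fe16.3 Co16.3 Ni35 Al16.3\nRaw:  Al16Co16Cr16Fe16Ni34.4\n'
    'RF:  Cr1 Fe1 Co1 Ni2.15 Al1\n[16.0, 16.0, 16.0, 34.4, 16.0]\n-->  98.4\n',
]

for doi, total in map(summarize_entry, entries):
    print(f'{doi}: total = {total}')
```

In the notebook itself, entries would simply be replaced by sC.printOuts.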

Now you know how to utilize a custom dataset with PyQAlloy!