Dear Colleagues,
Several weeks ago I asked this list for scanned SDS that our AI team could use in its work on reading scanned versions of SDS, and a number of you responded.
I was also challenged to report back to the group on the results of our efforts.
Well, the team has achieved about 97% accuracy for the specific parts of the SDS they were testing, such as boiling point, flash point, and Hazard Statements.
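For anyone curious how a figure like that can be computed, here is a minimal sketch in Python. It assumes "accuracy" means the share of extracted fields that match a chemist-verified value. The field names, the sample values, and the function field_accuracy are illustrative assumptions, not the team's actual code or data.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences are not counted as errors."""
    return " ".join(str(value).lower().split())


def field_accuracy(extracted, verified):
    """Return the fraction of verified fields whose extracted value matches after normalization.

    Both arguments are lists of dicts, one dict per SDS, keyed by field name.
    A field that is missing from the extraction counts as wrong.
    """
    total = 0
    correct = 0
    for got, want in zip(extracted, verified):
        for field, want_value in want.items():
            total += 1
            if field in got and normalize(got[field]) == normalize(want_value):
                correct += 1
    return correct / total if total else 0.0


# Hypothetical example data for illustration only (not our results).
verified = [
    {"boiling_point": "56 °C", "flash_point": "-20 °C", "hazard_statements": "H225, H319, H336"},
    {"boiling_point": "78 °C", "flash_point": "13 °C", "hazard_statements": "H225, H319"},
]
extracted = [
    {"boiling_point": "56 °c", "flash_point": "-20 °C", "hazard_statements": "H225, H319, H336"},
    {"boiling_point": "78 °C", "flash_point": "18 °C", "hazard_statements": "H225, H319"},
]

print(f"{field_accuracy(extracted, verified):.0%}")
```

Counting per field rather than per document keeps one bad value from marking a whole SDS as a failure, though either convention is defensible.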
The machine learning model they started with is NER (Named Entity Recognition). They are moving on to testing NLP (advanced Natural Language Processing) and then plan to try out the
NLP and Supervised Learning Model afterward.
With the NER model, we have discovered enough differences between vendor SDS structures that we've had to train on each vendor's documents separately. We hope that will be less of a problem with the other models.
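To illustrate what handling each vendor separately can look like, here is a minimal Python sketch that routes a document to a vendor-specific extractor. Simple regular expressions stand in for the trained NER models, and the vendor names, patterns, and functions detect_vendor and extract are all hypothetical, not our team's implementation.

```python
import re

# Stand-in "models": one set of regular expressions per vendor layout.
# The real system uses trained NER models; these patterns and vendor names are hypothetical.
VENDOR_PATTERNS = {
    "vendor_a": {"flash_point": re.compile(r"Flash point\s*:\s*(-?\d+(?:\.\d+)?)\s*°C")},
    "vendor_b": {"flash_point": re.compile(r"Flashpoint\s+(-?\d+(?:\.\d+)?)\s*deg C")},
}


def detect_vendor(text):
    """Pick a vendor key by looking for its name in the document text; None if unknown."""
    lowered = text.lower()
    for vendor in VENDOR_PATTERNS:
        if vendor.replace("_", " ") in lowered:
            return vendor
    return None


def extract(text):
    """Route the document to its vendor-specific extractor and return the fields found."""
    vendor = detect_vendor(text)
    if vendor is None:
        # Unknown layout: return nothing so the whole document goes to manual review.
        return {"vendor": None, "fields": {}}
    fields = {}
    for name, pattern in VENDOR_PATTERNS[vendor].items():
        match = pattern.search(text)
        if match:
            fields[name] = float(match.group(1))
    return {"vendor": vendor, "fields": fields}


# Two made-up documents that state the same kind of value in different layouts.
doc_a = "Vendor A Safety Data Sheet\nFlash point: -20 °C"
doc_b = "VENDOR B SDS\nFlashpoint 13 deg C"
print(extract(doc_a))
print(extract(doc_b))
```

The point of the sketch is only the routing step: each vendor layout gets its own extractor, and anything from an unrecognized vendor falls through to a person.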
In summary, it is working but not ready for prime time yet.
Our plan is still to have a chemist review the extracted data and fix any errors as we add data to our chemical inventory library.
Sincerely,
-Russ
--
---
For more information about the DCHAS-L e-mail list, contact the Divisional membership chair at