2. Cleaning