Introduction and background

Overview of the working principles of LLMs

A. Structured Extraction Workflow

1. Obtaining data
2. Cleaning
- 2.1. Document parsing with OCR tools
- 2.2. Document cleaning
3. Strategies to tackle context window limitations
4. Choosing the learning paradigm
5. Beyond text
6. Agents
7. Constrained generation to guarantee syntactic correctness
8. Evaluations

B. Case Studies

9. Research articles vs datasets in chemistry and materials science
10. Collecting data on the synthesis procedures of bio-based adsorbents
11. Retrieving data from chalcogenide perovskites
12. Validation case study: Matching NMR spectra to composition of the molecule
13. Collecting data for reaction procedures

2. Cleaning

By Mara Schilling-Wilhelmi, Martiño Ríos-García, Sherjeel Shabih, María Victoria Gil, Santiago Miret, Christoph Koch, Pepe Márquez, and Kevin Maik Jablonka