12. Validation case study: Matching NMR spectra to composition of the molecule

After data extraction, automated checks are needed to confirm that the extracted data are valid. One effective check matches each extracted NMR spectrum to the molecule it was reported for: the number of protons and peaks expected from the molecule's theoretical NMR spectrum is compared with the number found in the extracted spectrum. This approach is similar to that employed by Patiny and Godin [2023].

This notebook demonstrates how to perform this automated validation check. Checks of this kind make extracted spectroscopic data more reliable and thereby improve the quality of any analysis built on them.
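The proton-count half of this check can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: the helper names `count_reported_protons` and `count_formula_hydrogens` are hypothetical, the regular expressions only cover simple `3H`-style integrals and plain molecular formulas, and the formula `C15H11F6NO` for compound 3a is our own reading of the garbled OCR text and should be treated as an assumption. Comparing the number of peaks would additionally require a predicted spectrum, which is not covered here.

```python
import re


def count_reported_protons(spectrum: str) -> int:
    """Sum the integrals (e.g. '3H') reported in a 1H NMR peak list."""
    # A number followed by 'H', but not 'Hz' and not part of a token like 'CH2'.
    integrals = re.findall(r"(?<![A-Za-z0-9])(\d+)\s?H(?![A-Za-z])", spectrum)
    return sum(int(n) for n in integrals)


def count_formula_hydrogens(formula: str) -> int:
    """Count H atoms in a simple molecular formula such as 'C15H11F6NO'."""
    total = 0
    # Each token is an element symbol followed by an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element == "H":
            total += int(count) if count else 1
    return total


# Peak list for compound 3a as it appears in the article text (cleaned by hand).
spectrum_3a = (
    "3.66 (s, 3H, OMe), 5.66 (br s, 1H, NH), 6.75-6.79 (m, 2H, Ar), "
    "6.93-6.97 (m, 2H, Ar), 7.00 (s, 2H, Ar), 7.07 (s, 1H, Ar)"
)
formula_3a = "C15H11F6NO"  # assumed formula, see the note above

reported = count_reported_protons(spectrum_3a)
expected = count_formula_hydrogens(formula_3a)
is_consistent = reported == expected
print(f"reported={reported}, expected={expected}, consistent={is_consistent}")
```

A mismatch does not prove that the extraction is wrong, since exchangeable protons are sometimes not reported, but it flags the entry for manual review.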

12.1. Data extraction

The first step in our process involves extracting the NMR spectra and the analyzed molecule using a Large Language Model (LLM). To accomplish this, we developed a basic prompt that includes the desired information and the content of the article. As a result, we obtain the names of the molecules and the NMR spectra of all included molecules in a structured JSON format.
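As an illustration of what such a prompt and its structured answer could look like, here is a minimal sketch. The template text, the JSON keys `molecules`, `name` and `nmr`, and the `example_response` string are all assumptions made for this example; they are not the notebook's actual prompt or a real model reply. The one detail worth noting is that literal JSON braces in the template must be doubled, because `str.format` treats single braces as placeholders.

```python
import json

# Hypothetical template; the only placeholder is {data}.
PROMPT_TEMPLATE = """Extract every molecule and its 1H NMR data from the text below.
Answer with JSON of the form
{{"molecules": [{{"name": "...", "nmr": "..."}}]}}

Text:
{data}
"""


def format_prompt(template: str, text: str) -> str:
    # str.format fills {data}; doubled braces come out as literal braces.
    return template.format(data=text)


prompt = format_prompt(PROMPT_TEMPLATE, "1H NMR (400 MHz, CDCl3): 3.66 (s, 3H, OMe)")

# Made-up stand-in for the model's reply, used only to show the parsing step.
example_response = '{"molecules": [{"name": "3a", "nmr": "3.66 (s, 3H, OMe)"}]}'
molecules = json.loads(example_response)["molecules"]
for molecule in molecules:
    print(molecule["name"], "->", molecule["nmr"])
```

Keeping one flat record per molecule makes it straightforward to run a validation check on each entry independently.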

We start with data from an article by Gevorgyan et al. [2021], which we downloaded manually.

We first define some logic to extract the text from the PDF file and to call the LLM.

import matextract  # noqa: F401
from litellm import completion
import json
from doctr.io import DocumentFile
from doctr.models import ocr_predictor


def convert_pdf_with_doctr(pdf_path, det_arch="db_resnet50", reco_arch="crnn_vgg16_bn"):
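    # Build the OCR pipeline from the requested detection and recognition models
    model = ocr_predictor(det_arch=det_arch, reco_arch=reco_arch, pretrained=True)
    # Load the PDF pages as images
    doc = DocumentFile.from_pdf(pdf_path)
    # Run OCR on every page
    result = model(doc)

    # Return the recognized text as a single string
    return result.render()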


# Insert the extracted article text into the prompt template
def format_prompt(template: str, text: str) -> str:
    return template.format(data=text)


# Define the function to call the LiteLLM API
def call_litellm(
    prompt: str, model: str = "gpt-4o", temperature: float = 0.0, **kwargs
) -> tuple:
    """Call LiteLLM model

    Args:
        prompt (str): Prompt to send to model
        model (str, optional): Name of the API. Defaults to "gpt-4o".
        temperature (float, optional): Inference temperature. Defaults to 0.
        kwargs (dict, optional): Additional arguments to pass to the API.

    Returns:
        tuple: message content and token usage (message_content, input_tokens, output_tokens)
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a scientific assistant, extracting NMR spectra and the analyzed molecule "
                "out of XML documents in valid JSON format. Extract just data which you are 100% confident about the "
                "accuracy. Keep the entries short without details. Be careful with numbers."
            ),
        },
        {"role": "user", "content": prompt},
    ]

    response = completion(
        model=model,
        messages=messages,
        temperature=temperature,
        response_format={"type": "json_object"},
        **kwargs,
    )

    # Extract and return the message content and token usage
    message_content = response["choices"][0]["message"]["content"]
    input_tokens = response["usage"]["prompt_tokens"]
    output_tokens = response["usage"]["completion_tokens"]
    return message_content, input_tokens, output_tokens


# Run OCR on the manually downloaded article
text = convert_pdf_with_doctr("./gevorgyan.pdf")
ORGANOMETALLICS

AGO
Article

pubxasoOgpmomelics

Improved Buchwald-Hartwig Amination by the Use of Lipids and

Lipid Impurities

Ashot Gevorgyan, * Kathrin H. Hopmann, and Annette Bayer

Cite This: Organometalics 2022, 41, 1777-1785

Read Online

ACCESSI

Lil Metrics & More

Article Recommendations

Supporting Information
Catayst,
Solvent,
Additive

ABSTRACT: The development of green Buchwald-Hartwig aminations has long been Buchwald-Hartwig/ Aminationi in' Vegetable Oilsa and RelatedLipids

considered challenging, due to thel high sensitivity oft the reaction to the environment. Here
we show that food-grade and waste vegetable oils, triglycerides originating from animals,
and natural waxes can serve as excellent green solvents for Buchwald-Hartwig amination.
We further demonstrate that amphiphiles and trace ingredients present in triglycerides as
additives have a decisive effect on the yields of Buchwald-Hartwig aminations.

SMOR

INTRODUCTION
bonds
C-N

containing different functional groups to establish how these
would interfere. The results of over 3000
indicated that the majority of functional additives have a
For functional additives inhibiting the reaction, the authors had
In a recent study, we showed that vegetable oils and related
lipids excellent, sustainable, and safe solvents for Pd-
catalyzed C-C forming cross-coupling reactions. In the
present study, we developed a protocol for the more
challenging Buchwald-Hartwig amination in vegetable oils
and related lipids (Figure IB; for the description ofused lipids
see Figures S1-S19 in the Supporting Information). We also
solvents, including lipids, traditional and green solvents.
RESULTS AND DISCUSSION
First, we were interested in the development of conditions
(Table 1; for a detailed description of the
Special Issue: Sustainable Organometallic Chemistry
Published: October 25, 2021

are omnipresent in natural products and

experiments

pharmaceuticals. According to a recent study, over 62% of

bioactive molecules described in the medicinal chemistry negative influence on the outcome of Buchwald-Hartwig
literature possess a C-N bond in the form of primary, aminations, whereas some completely terminate the reaction.
secondary, or tertiary amines." While different variations of to conduct a full set of optimizations in order to find new

C(sp')-N bond forming reactions were invented over a productive conditions.

century ago, C(sp?)-N bond construction was quite
challenging until the late 1990s but was effectively resolved
by the invention of the Buchwald-Hartwig: amination. 2,3 The
significance of the Buchwald-Hartwig amination was demon-
strated by Brown et al. in their study on and present
synthetic methodologies used in medicinal chemistry, where
the Buchwald-Hartwig amination was found to be among the
Notebooks of a major pharmaceutical company further
evidenced the importance of Buchwald-Hartwig amination
Among Pd-catalyzed cross-coupling reactions, the Buch-
recently (Figure IA). One reason for the slow development
may be that the Buchwald-Hartwig amination is very sensitive
to the reaction conditions, including the Pd precatalyst, the
ligand, the additives, and the solvent. In fact, the reproduci-
drastically depending on the origin and quality of the reagents
used, the Pd source, and solvents. 2a,7 This was well illustrated
by Richardson et al., who performed a model Buchwald-
Hartwig amination in the presence of a range of chemicals

are
bond

-
-
3

8a

past

top 20 most frequently used reactions." Similar surveys by found that trace ingredients, originating from and present in
Schneider et al. on the methodologies used in pharmaceutical triglycerides, are valuable additives to improve the yields of
patents and Gillet et al. on an analysis of Electronic Lab Buchwald-Hartwig amination performed in a wide range of

for the pharmaceutical industry.

wald-Hartwig amination was invented and established most suitable for Buchwald-Hartwig amination in vegetable oils

of experiments
setup
ORGANOMEIALLKGS
/

bility of previously developed methodologies can vary Received: September 13, 2021

02021 The Authors. Published by
American Chemical Society 1777

ACS Publications

pye Organometallics 2022, 41, 1777-1785 1c00517



Organometallics

abiasoyOgmonelis
Pd-catalyst
Non-renewable
solvents

Article

(A) Previous work: C-N bond forming reactions in non-renewable solvents

H2N.

(B) Present study: C-N bond forming reactions in vegetable oils

HzN.

Pd-catalyst
Vegetable oil,
Additive

Figure 1. Previous work on Buchwald-Hartwig: amination in nonrenewable solvents (A)23 and present research on the use of lipids for C-N bond

forming reactions (B) (picture taken by A.G.).
Rapeseed Oil from Askim

Table 1. Optimization of Buchwald-Hartwig Amination in

Similarly, in the case of XPhos it was possible to significantly
reduce the catalyst loading by switching to XPhos Pd G3
Encouraged by the good performance of various catalytic
systems in rapeseed oil from Askim, we examined a
lipids as solvents for Buchwald-Hartwig amination with
XPhos Pd G3/XPhos as the catalytic system (Chart 1).
Initially, we compared reactions performed in rapeseed oils
from six different producers (orange columns). In our previous
studies on C-C bond forming cross-couplings, we found that
the origin of the rapeseed oil has little influence on the
efficiency of the reactions. 8a However, for Buchwald-Hartwig
amination, an initial screening of rapeseed oils showed that the
results were significantly dependent on the choice of supplier
(Chart 1, orange columns). Quantitative yields were only
maintained in rapeseed oils from Odelia (9796) and Anglamark
(9996). For all the other rapeseed oils, the yields went down
significantly (Coop (23%), Rema (43%), Sigma-Aldrich
In addition to rapeseed oils we examined the performance of
nine vegetable oils (orange columns), two triglycerides
originating from animals (pink columns), semisynthetic
triacetin and tributyrin (green columns),10 and three natural
waxes (blue columns) (Chart 1). When other types of lipids
were tested as solvents, similar varying yields were observed.
Low yields were found for sunflower oils (18-34%), soybean
oils (26-44%), corn oil (51%), avocado oil (43%6), a mixture
coconut oil (99%6), butter (99%), fish oil from Sigma-Aldrich
(8296), and waxes (95-100%) (Chart 1). Changing the
(Table S3 in the Supporting Information), which prompted us
A HRMS analysis of rapeseed oil from Askim showed the
presence of glycerol, free fatty acids, monoglycerides, and
surfactants, thus improving the solubility ofthe base and other
which have improved solubility in fats and can act as shuttle
bases or phase transfer catalysts. Amphiphiles can form

Pd-catalyst (mol%),
12
CF3 Ligand (mol%),
Rapeseedo K,CO, (2€ oil equiv.) (Askim),
CF3
110°C, 24h
1a, 1.56 equiv. 2a,1 1equiv.
Pd,(dba); (2.5)
tBuXPhos (10)
2 Pda(dba); (2.5)
XPhos (10)
3 Pda(dba); (0.5)
XPhos (2)
4 Pd,(dba); (2.5)
DavePhos (10)
5 Pd,(dba); (2.5)
SPhos (10)
6 Pd,(dba); (2.5)
BrettPhos (10)
7 Pd,(dba); (2.5)
RuPhos (10)
8 Pd,(dba); (2.5)
JohnPhos (10)
9 XPhos Pd G3 (2)
XPhos (2)
10 fBuXPhos Pd G3 (2) tBuXPhos (2)
11 Pda(dba); (2.5)
XantPhos (6)
12 Pda(dba); (2.5)
Ad,BuPHI (10)
13 Pd,(dba); (2.5)
Bu,PHBF, (10)
14 Pd,(dba); (2.5)
QPhos (10)
15 [PdcI(allyl)), (2.5) IPrHCI (6)
16 Pd(PPh,) (5)
as an internal standard. 'Isolated yield.

(entry 9).

of
range

Meo"

Meo
CF3
3a
77
100
92
84
97
100
100
31
100/99h
100
98
42
20
0
13
3

entry Pd catalyst (mol %) ligand (mol %) yield of 3a (9)-

(26%6)).

Yields were determined by 'H NMR using 135trimethoybenzene

see Figures S20-S29 in the Supporting Information). The of oils (6396), and fish oil (1496), while good to quantitative
reaction was initially examined in rapeseed oil from the brand yields were obtained in triacetin (92%6), tributyrin (100%),
Askim, using reference substrates 4-methoxyaniline (la) and olive oil (100%), sesame oil (88%6), rice bran oil (81%6),
Information). A number of sterically constrained strong 0- catalyst or catalyst loading did not improve the low yields
catalyst precursor. Among the tested Buchwald ligands, the to examine the existence and effects of various ingredients
while XPhos, SPhos, BrettPhos, and RuPhos resulted in diglycerides. These compounds are amphiphiles that can act as
chelating ligand XantPhos gave excellent yields (98%, entry ingredients of the reaction in oils. 11 Fatty acids can react with
11), whereas other bulky phosphines (Ad,BuPHI, tBu,PHBF the base (K,CO3), generating corresponding potassium salts,
possible to improve the yield from 77% to quantitative by reversed micelles'2 that have been shown to act as micro-
changing the source of Pd to tBuXPhos Pd G3 (entry 10). reactors for Pd-catalyzed transformations. 13 Accordingly, we

3-bis(trifluorometlybromobenzene (2a) (for complete
optimization tables see Tables S1-S5 in the Supporting
donor phosphine ligands were combined with Pd,(dba); as the
results were unsatisfactory only for JohnPhos (31%, Table 1,
entry 8)." The yields oft the amination product 3a were good for
fBuXPhos (7796) and DavePhos (8496) (entries 1 and 4),
quantitative yields (entries 2 and 5-7). Moreover, the
QPhos) and NHC ligands (IPrHCI) were not effective (entries
12-15). It is worth noting that in the case of (BuXPhos it was

present in natural lipids.

1778

pye 1c00517
Organometallics 2022, 41, 1777-1785



Organometallics

abiasoyOgmonelis
XPhos PdG3 (2 mol%),
CFs XPhos (2 mol%),
KCO, (2 equiv.)
Solvent,
Meo
110°C, 24h
100%00%
100% 92%
90%
99% 99%
80%
88%
70%
81%
60%
50%
63%
40%
43%
44% 51%
30%
23%
34%
43%
20%
26%
10%
18%
26%
0%
14%

a &
A C
& à
& &
B - -
d &
&
C
à
à &
a
& o 1 CoR
& & &
&
à
P
- &
-
-
M -
1 -
/ & 
- -
 7
-
&
/ 7 7
/
7
&
7

Article

Chart 1. Screening of Solvents for Buchwaldl-Hartwig Amination"

NH2
or
Meo
CFs
1a
2a
97% 99%
100%

CF3
3a CF3
10096100% 100% 100%
95%

100%
82%
48%

82%

75%

/ / /. A

/
d
/ /

"Yields were determined by 'H NMR using 3-trimctholybenzene as an internal standard.
Chart 2. Screening of Additives for Buchwald-Hartwig Amination*

Br
NH2
2a

CFs XPhos Pd G3 (2 mol%),
XPhos (2r mol%),
KCO, (2e equiv.)
Additive (50 mg), Meo"
CF3 Rapeseed oil (Sigma) (2 mL),
110°C, 24h
92% 92% 91%

CF3
CF3
3a
94% 98%
100%

Meo
100% 1a
90%
80%
70%
60%
50%
40%
30% 26%
20%
10%
0%
*  
& &
spl ol
ZA - 4
cpl
P 7

77%
65%
46%
29%
23%
5%

53%

37%

20%

d &
7 7
-

- * se
o
-
A
-
s

M - 1 1

"The quantity of glycerol was 1 drop. bThe quantity of monolaurin was 10 mg. Magic Mix consists of a mixture of soy phospholipids (5 mg), soy
PC (209) (5 mg), glycerol (1 drop), palmitic acid (5 mg), behenic acid (5 mg), monopalmitin (5 mg), and monolaurin (5 mg). dYields were
determined' by 'H NMR using 13trimethonybenzene as an internal standard.. All experiments were performed in rapeseed oil from Sigma-Aldrich
set a series of control experiments to investigate the effect of additives showed notable improvements in yields (from 26% to
various commercially available natural amphiphiles (50 mg) on 7796). A considerable increase in yields of Buchwald-Hartwig
the reaction in rapeseed oil (2 mL) from Sigma-Aldrich (Chart amination reaction was observed when glycerol (929), fatty

(2 mL) using 0.683 mmol (200 mg) of the limiting reagent (aryl halide 2a).

2). Our focus was on amphiphiles found in lipids and
originating from vegetable oils, such as phospholipids, the
products of hydrolysis of triglycerides, and fat-soluble vitamins
(for a detailed description of the additives used, see general
considerations in the Supporting Information). 11
phospholipids, soy PC (20%), and soy PC (959)) used as

acids such as palmitic acid (92%) and behenic acid (91%), and
monoglycerides such as monopalmitin (9496) and monolaurin
(9896) were used. On the other hand, (+)-s-tocopherol (29%),
cholesterol (5%), and cholecalciferol (23%6) were not effective,
distearin (53%) and 1,2-dipalmitin (20%). Addition of water

Initial trials with three different sets of phospholipids (soy and the same was observed with diglycerides, such as 1,2-

1779

https/doiorg/10.1 1021/acs.organomet: 1c00517
Organometallics 2022, 41, 1777-1785



Organometallics

abiasoyOgmonelis

Article

Chart 3. Buchwald-Hartwig Amination in the Presence of the Magic Mix"

Yields of Buchwald-Hartwig amination in lipids and common solvents
Vields oft the reactions performed in the presence of the Magic Mix

XPhos Pd G3 (2r mol9),
CF3 XPhos (2r mol%),
KCO, (26 equiv.)
Magic Mix,
Solvent (2 mL),
110°C, 24h

NH2
Meo
1a
100%
90%
80%
70%
60%
63%
50%
43%
51%
40%
1%
4%
43%
30% 23%
6%
20%
6%
8%
10%
0%
14%

à
-
&
&
a
-
& &
&
-
& &
à
-
& C
à &
- 7
A 1
7
-
7
7 7

Br

CF3

Meo
95%
75%

CF3
2a

3a CF3
100%
82%

100%1 100%1 100%: 100% 100%

95% 100% 98% 97% 100% 100% 100%

48%

A / /

See Chart 2. Yields were determined by 'H NMR using S-trimctholybenzene as an internal standard.
Scheme 1. Influence of the Magic Mix on Hiyama (A) and Heck (B) Cross-Couplings"

(A)7 The influence of the Magic Mix on Hiyama coupling

Pd(OAc)2 (4r mol%),
CF3 Ad,BuPHI (10 mol%),
TBAFX3H20 (2€ equiv.)
Sunflower oil
(Sigma) (2 mL),
Additive, 80°C, 24h
Pd2dbas (0.5 mol%),
CF3 (Bu,PHBF4 (2r mol%),
Bu4NOAc (2 equiv.),
Solvent (3 mL),
Additive,
120°C, 30h

CF3
CFs

Br
CF3
2a
Br
CF3
2a

Si(OEt)3
4a

5a, No Additive, 72%
Magic Mix, 80%

(B) The influence oft thel Magic Mix on Heck coupling

3
CF3

Me
6a

Me

7a, Tributyrin, No Additive, 65%
Tributyrin, Magic Mix, 73%
Toluene, No Additive, 65%
Toluene, Magic Mix, 84%

Yields were determined by 'H NMR.
hydrolysis generating amphiphiles.

slightly improved the yield (37%), probably due to partial
Eventually, we found that a combination of the best
additives, including soy phospholipids (5 mg), soy PC (20%)
(5 mg), glycerol (1 drop), palmitic acid (5 mg), behenic acid
Mix) in rapeseed oil (2 mL) from Sigma-Aldrich, increases the
combination of additives (Magic Mix) was also effective in

soybean oil from 34% and 44%, respectively, to quantitative
Intrigued by the good effect ofthe Magic Mix on the yield of
Buchwald-Hartwig amination in lipids, we tested the Magic
Mix as an additive for reactions in traditional and green
find that the low yields of Buchwald-Hartwig aminations in
(from 48% to 100% yield), and nonane (from 82% to 100%

(Chart 3, orange columns).

(5r mg), monopalmitin (5 mg), and monolaurin (5 mg) (Magic solvents, which gave unsatisfactory results. We were pleased to
yield of the reaction from 26% to quantitative. The acetal (l-diethoxzyethane, from 75% to 100% yield), toluene
increasing the yields in other lipids such as sunflower oil and yield) can be significantly improved (Chart 3).

1780

https/doiorg/10.1 1021/acs.organomet: 1c00517
Organometallics 2022, 41, 1777-1785



Organometallics

abiasoyOgmonelis

Article

Chart 4. Buchwald-Hartwig Amination in Waste and Inedible Oils"

Yields ofE Buchwald-Hartwig amination in waste and inedible oils
Yields of the reactions performed in the presence of the Magic Mix

XPhos Pd G3 (2 mo1%),
CF3 XPhos (2r mol%),
K,CO3 (26 equiv.) -
Magic Mix,
Solvent (2 mL),
110°C, 24h

H2
Br
Meo
CF3
1a
2a
100%
80%
60%
40%
23%
23%
20%
0%
s
s
6
d



A A A A
a
de
de
de
h
A A
3 / F F F
/ /
a
de
de
1
d8
de
A
/ A
de
de
3
A A /
2
2
2

N.
CF3
3a CF3
92%

Meo"

100% 100% 100% 100% 100% 100%

72%

45%

38%

35%

"Yields were determined by 'H NMR using 3S-trimethonybenzene as an internal standard.

The influence of the Magic Mix on the outcome of other 929, while the 45% yield obtained in castor oil could be
low-yielding cross-couplings in lipids, 8a such as Hiyama and a increased to quantitative by application of the Magic Mix.
1). For these transformations, the improvement in yield due to Buchwald-Hartwig amination in lipids as solvents. First, we

Heck cross-coupling reactions, was briefly examined (Scheme
the addition of the Magic Mix was between 10% and 20%. The
yield of the Hiyama cross-coupling of phenyltriethonyailane
(4a) and s-bis(rifluoromethybromobenzene (2a) in sun-
flower oil from Sigma-Aldrich was increased from 72% yield
in tributyrin or toluene was improved from 65% to 73% and
from 65% to 84% using the Magic Mix, respectively (Scheme
Magic Mix for other reactions and traditional solvents.
The majority of examined vegetable oils described above
with inedible castor and jojoba oils (Chart 4). Waste rapeseed
oils were obtained by frying potatoes at 130 C for 2, 4, 6, and

Eventually, we analyzed the scope and limitations of the
identified the most efficient catalytic systems for unactivated
aryl halides and sulfonates (Table S5 in the Supporting
Information). Here, the best yields were reached when a
BrettPhos Pd G3/BrettPhos- or tBuXPhos Pd G3/tBuXPhos-
anilines (3a-c; 94-99%) and heterocycles like indole and
Pyrrole (3g,h; 65-919) (Scheme 2). Satisfactory yields were
observed for secondary anilines (3d,e; 55-62%6),
amines (3f, 5096) and phenols (3i; 569).. Attempts to improve
57%) by the use of the Magic Mix were not successful.
Screening of aryl halides and sulfonates showed that the
(Scheme 2). Both electron-rich (3j-m; 96-999) and
electron-deficient (3n,0; 97-99%) aryl bromides and chlorides

(no additive) to 80% (with the Magic Mix) (Scheme IA). The based catalytic system was used. Using these catalysts, good
yield of the Heck coupling of p-methylstyrene (6a) performed yields were obtained for various nucleophiles, such as primary
(7a), observed in toluene, illuminates the potential of the the yields for a tertiary aniline (3e; 51%) and an ether (3i;
were: food grade. To avoid a competition between the need for developed methodology can be an excellent tool for the
food and chemicals, we examined waste rapeseed oils, along production of secondary anilines in quantitative yields
81 h, respectively, in rapeseed oil of the brand Coop (for details as well as aryl triflates gave excellent results. The yields of
see, general considerations in the Supporting Information). The Buchwald-Hartwig amination were moderate only in the case
yields of Buchwald-Hartwig amination in waste rapeseed oils of electron-rich and labile heterocycles (3p,9; 49-71%), such
used for frying for 2, 4, and 6 h were close to the yield as 3-bromothiophene and 2bromobemzothiophene For the
observed for the corresponding virgin rapeseed oil (23-389). last two systems, the amination products are highly unstable
during frying. Addition of the Magic Mix to the reactions in We have shown that Buchwald-Hartwig: aminations can be
waste rapeseed oils increased the yields to quantitative (Chart successfully realized in vegetable oils, triglycerides originating
4, orange columns). In the case of inedible oils, the yield of from animals and natural waxes as solvents. The presented

near
increase
IB). A
20%

primary

in the yield of stilbene derivative

However, the yield of the reaction in waste rapeseed oil used and must be stored in the freezer.

for frying for 8 h was increased to 72%, probably due to the
formation of amphiphiles by partial oxidation and hydrolysis
Buchwald-Hartwig amination performed in jojoba oil was

CONCLUSIONS

results highlight the excellent performance of safe and cheap

1781

https.Idoiorg/10.1 ONIAsOgAOmstIoRIY
Organometallics 2022, 41, 1777-1785



Organometallics

abiasoyOgmonelis
Method A: XPhos PdG3 (2r mol%),
XPhos (2 mol%6), K,CO3 (2 equiv.), R2
R3 Rapeseed oil (Askim), 110°C, 24h,
Method B: (BuXPhos PdG3 (2r mol%), R1
(BuXPhos (2 mol%), K2CO3 (2 equiv.),
Rapeseed oil (Askim), 110°C, 24h

Article

Scheme 2. Scope of Buchwald-Hartwig Amination in Rapeseed Oil from Askim"

R2
NH
R1
Meo,

X.
2
CF3
CF3 F3c
H
3a, Method A
X= Br, 99%
CF3
CF3
Me
3d, Method A
X= Br, 62%
CFs
CF3

R3
3
CF:
CF3
H
3c, Method A
X=E Br, 97%
CF3
CF3
CFs
CFs

CF3
Me.
CF3
3b, Method A
X= Br, 94%
CF3
CF3
CF3
Me.
CF3
3h, Method A
X= Br, 65%
Me.
Me
3j, Method B
X=E Br, 96%

3e, Method A Method B 3f, Method A Method B
X=E Br, 55% X=E Br, 0% X= Br, 50% X=Br, 0%

3g, Method A
X=E Br, 91%
Me,

3i, Method A Method B
X= Br, 28% X: =Br, 56%
Me Me.

3k, Method B
X=E Br, 99%
X=CI, 99%
Me,

31, Method B
X=CI,9 98%
X=OTf,99%
3 Me,

Me,

SFS

3m, Method B
X= Br, 96%
Me,

3n, Method B
X= Br, 99%
X= CI, 99%
Me,

30, Method B
X= Br, 97%

3p, Method B
X=E Br, 71%

3q, Method B
X= Br, 49%

"The yields refer to isolated products.

lipids as replacements for traditional solvents. Unlike other
cross-couplings, the Buchwald-Hartwig: amination is very
that small quantities of hydrolysis products from
triglycerides as well as phospholipids can have a decisive effect
on the efficiency of Buchwald-Hartwig amination. On the
of this observation, we introduced a mixture of
amphiphiles, originating from triglycerides, which can be
aminations and other cross-couplings in both lipids and

EXPERIMENTAL SECTION
General Considerations. Commercially available starting materi-
used without further purification. Flash column
performed with Merck silica gel 60 (230-400 mesh). chromatography The solvents for
column chromatography were distilled before use (in the case of
technical solvents). Thin-layer chromatography was carried out using
Merck TLC silica 60 and visualized by short-wavelength
ultraviolet light or by treatment with potassium permanganate
(KMnO4) staining. 'H, 13C, and 19F NMR spectra were recorded
spectra are reported in parts per million (ppm) downfield of TMS and
were measured relative to the signal for CHCI (7.26 ppm). All 13C
NMR spectra are reported in ppm relative to residual CDCI; (77.20

sensitive to the nature of the used solvents. Our studies als, reagents, catalysts, and anhydrous and degassed solvents were

was

indicate
basis

gel Fzs

used to improve the performance of Buchwald-Hartwig on a Bruker Avance 400 MHz spectrometer at 20 oC. All 'H NMR

classical organic solvents.

1782

https/doiorg/10.1 1021/acs.organomet: 1c00517
Organometallics 2022, 41, 1777-1785



Organometallics

abiasoyOgmonelis

Article

ppm) and were obtained with 'H decoupling. Coupling constants, 1 product was obtained as a colorless oil: yield 94% (0.239 & method
are reported in Hertz (Hz). High-resolution mass spectra (HRMS) A). 'H NMR (400 MHz, CDCI): 8 6.11 (brs s, IH, NH), 7.33-7.36
were recorded from methanol solutions on an LTQ Orbitrap XL (m, 3H, Ar), 7.44 (s, 3H, Ar), 7.49 (6J = 7.8 Hz, 1H,Ar). 13C NMR
(Thermo Scientific) in either negative or positive electrospray (101 MHz, CDCI): 8 114.5 (hept, J = 3.9 Hz), 116.3-116.5 (m),
General Experimental Procedure for Buchwald-Hartwig Hz), 130.6, 132.6 (g, J: = 32 Hz), 133.2 (q,J = 33 Hz), 141.6, 144.4.

ionization (ESI) mode.

120.1 (q,J = 3.8 Hz), 122.5, 123.4 (g, J = 271 Hz), 124.0 (q,J: = 270
HRMS-EI (m/z): [M + HJ* calcd for CIsH,F,N 374.0586, found
M-p-loyp-35bistrifluromethylonline (3c). Starting from
0.683 mmol of the corresponding aryl halide, the product was
obtained as a white solid: yield 97% (0.211 g method A). 'H NMR
(400 MHz, CDCI3): 82.40 (s,3H, Me), 5.89 (brs s, 1H,NH), 7.09 (d,

Amination.. Method A. Inside of an Ar-filled glovebox an oven-dried
10 mL flask was sequentially charged with XPhos Pd G3 (2 mol %), 374.0583.
XPhos (2 mol 9), K,CO; (2 equiv), and the appropriate nucleophile
(1.5 equiv). The flask was sealed with a rubber septum, removed from
the glovebox, and equipped with an Ar balloon. Next, rapeseed oil
from. Askim (2 mL) andt the corresponding aryl halide (1 equiv, 0.683
product according to one of the following methods.
(I) The reaction mixture was transferred onto the top of a column,
reaction vial was washed with 1 mL of DCM that was diluted
with 1 mL of heptane and transferred onto the top of the
column. This was followed by a classical column separation
applied for another experiment. NMR spectra of the rapeseed
oil before and after the reaction wer identical.
distillation apparatus (Kugelrohr) and
heated to 250 °C under reduced pressure. The product was
in the receiving bulb within 40-60 min. If
necessary, the condensed product could be further purified
containing the reaction mixture could be filtered through a
another experiment. NMR spectra of rapeseed oil before and
(II) The reaction mixture was transferred into a 250 mL round-
bottom flask. The reaction vial was washed using EtOAc or
THF and transferred into the 250 mL flask containing the
reaction mixture. Afterward, the volatiles were removed using a
stirred under reflux for 4 h. The resulting mixture was treated
with 100 mL of saturated NaCl solution, transferred into a 500
and further purified using column chromatography.
Method B. Inside of an Ar-filled glovebox an oven-dried 10 mL
flask was sequentially charged with tBuXPhos Pd G3 (2 mol 96),
tBuXPhos (2 mol %), K,CO; (2 equiv), and the appropriate
rapeseed oil from Askim (3 mL) and the corresponding aryl halide/
110 oC for 24 h. Afterward, the reaction mixture was cooled, which
Characterization of Products. MMetophemy35b5
trifluoromethylaniline (3a). Starting from 0.683 mmol of the
yield 99% (0.228 g, method A). 'H NMR (400 MHz, CDCI): 8 3.66
(s, 3H, OMe), 5.66 (br s, 1H, NH), 6.75-6.79 (m, 2H, Ar), 6.93-
6.97 (m, 2H, Ar), 7.00 (s, 2H, Ar), 7.07 (s, 1H, Ar). 13C NMR (101
147.3, 157.2. HRMS-EI (m/z): [M + H]* calcd for CIsH,,FNO
(3b). Starting from 0.683 mmol of the corresponding aryl halide, the

mmol) were added sequentially. The Ar balloon was removed, and the J = 8.1 Hz, 2H, Ar),7.23 (d,J = 8.0 Hz, 2H, Ar), 7.32 (s, 3H,Ar). 13C
resulting mixture was stirred at 110 "C for 24 h. Afterward, the NMR (101 MHz, CDCI3): & 20.9, 112.5 (hept,] J = 4.0 Hz), 114.6 (g,
reaction mixture was cooled, which was followed by isolation of the J = 4.0 Hz), 121.6, 123.6 (q,J = 271 Hz), 130.6, 132.8 (q, = 32.9
filled with silica gel, using a disposable Pasteur pipet. The from 0.683 mmol of the corresponding aryl halide, the product was
using mixtures of heptane as eluent. Rapeseed oil can be 13CI NMR (101 MHz, CDCI3): 0 40.6, 111.2 (P,J= = 3.9 Hz), 114.4(g,
washed out from the column using ethyl acetate, dried, and J = 4.0 Hz), 123.8 (q,J = 271 Hz), 125.7, 126.0, 130.4, 132.4 (<J =
() The flask containing the reaction mixture was attached to a 0.683 mmol of the corresponding aryl halide, the product was

Hz), 134.2, 137.7, 146.3.

MMetly-Mpheyl35osmethylanine (3d). Starting
obtained as a white solid: yield 62% (0.134 g, method A). 'H NMR
(400 MHz, CDCI3): 8 3.37 (s, 3H, NMe), 7.14 (s, 2H, Ar), 7.18-
7.20 (m, 2H, Ar), 7.22-7.26 (m, 2H, Ar), 7.40-7.46 (m, 2H, Ar).
NM-Dpheny3Sbistrmluoromethylonline (3e). Starting from
obtained as a white solid: 55% (0.142 method 0%
method B). 'H NMR (400 MHz, CDCI3): 8 7.15-7.24
7.34-7.43 (m, 7H, Ar). 13C NMR (101
- 4.01 Hz), 120.5 (q,J = 4.01 Hz), 123.5 (q,J = 271 Hz), 125.2, 125.6,
N-Benzyl35-bistrifluoromethylanline (3f). Starting from 0.683
colorless oil: 50% (0.109 method 0% method 'H
NMR (400 MHz, CDCI;): 8 4.38-4.46 3H, CH,/NH), 6.99
2H, Ar), 7.19 (s, 1H, Ar), 7.32-7.42 (m, SH, Ar). i3c NMR (101
MHz, CDCI): 8 48.2, 110.6 (P,J= 4.0 Hz), 112.1 (q,J = 4.0 Hz),
123.7 (q,J = 271 Hz), 127.8, 128.1, 129.1, 132.6 (g,J = 32.8 Hz),
85Bstrifluoromethylnenyl-1Hindole (3g). Starting from
(400 MHz, CDCI3): 8 6.80 (d,J= 3.4 Hz, 1H, Ar), 7.25-7.29 (m,
1H, Ar), 7.30-7.38 (m, 2H, Ar), 7.58 (d,J = 8.2. Hz, IH,Ar),7.74(3,
123.1 (q, = 271 Hz), 123.7, 124.0 (@ I = 3.7 Hz), 127.3, 130.0,
B5-BstrMloromethylphenyl-1lHpymole (3h). Starting from
0.683 mmol of the corresponding aryl halide, the product was
2H, pyrrole), 7.76 (s, 1H, Ar), 7.82 (s, 2H, Ar). 13C NMR (101 MHz,
HPloyoy35bstrnlluoromethylben.ene (3i). Starting from
obtained as a colorless oil: yield 28% (0.062 g, method A), 56%
(0.122 g method B). 'H NMR (400 MHz, CDCI3): 8 2.40 (s, 3H,
Me), 6.96-7.00 (m, 2H, Ar), 7.23-7.26 (m, 2H, Ar), 7.38 (s, 2H,
(hept,J=4 4.0 Hz), 117.6 (q,J= = 3.8 Hz), 120.2, 123.2 (q,J = 271 Hz),
131.1, 133.3 (g, J: = 33.6 Hz), 135.3, 152.8, 159.5.
Methy-Mpiolylaniline (3j). Starting from 0.877 mmol of the
(s, 3H, Me), 2.40 (s, 3H, Me), 5.38 (br s, 1H, NH), 6.96-7.02 (m,
3H, Ar), 7.16-7.23 (m, 3H, Ar), 7.26-7.28 (m, 2H, Ar). 13C NMR

32.7 Hz), 147.2, 150.0.

vacuum
short-path
condensed

yield

&
A), (0g
(m, 6H, Ar),
CDCI;): 8 114.2
MHz,
(p,]
B).
(s,

by column chromatography. Rapeseed oil from the flask 130.1, 132.6 (q1 = 33.0 Hz), 146.3, 149.5.

short pad ofs silica gel using ethyl acetate, dried, and applied for mmol of the corresponding aryl halide, the product was obtained as a

yield

g,

A), (0 g,
(m,

after
the reaction

were identical.

137.8, 148.8.

rotary evaporator followed by addition of 40 mL of a 5 M 0.683 mmol of the corresponding aryl halide, the product was
NaOH solution. The flask was equipped with a condenser and obtained as a white solid: yield 91% (0.204 g method A). 'H NMR
mL separating funnel, and extracted with DCM (3 x: 50 mL). J=7 7.7Hz, 1H,Ar), 7.90 (s, 1H, Ar), 8.02 (s, 2H,Ar). 13CNMR (101
The organic fractions were collected, evaporated to dryness, MHz, CDCI): 8 106.1, 109.9, 119.8 (p,] = 3.8 Hz), 121.7, 121.9,

133.5 (q,J= 33.8 Hz), 135.6, 141.5.

nucleophile (1.5 equiv). The flask was sealed with a rubber septum, obtained as a white solid: yield 65% (0.123 g method A). 'H NMR
removed from the glovebox, and equipped with an Ar balloon. Next, (400 MHz, CDCI,): 8 6.44-6.45 (m, 2H, pyrrole), 7.15-7.16 (m,
sulfonate ester (1 equiv, 0.869-0.948 mmol) were added sequentially. CDCI3): 8 112.5, 119.0 (P,J: = 3.8 Hz), 119.3, 120.2 (g, J = 4.0 Hz),
The Ar balloon was removed, and the resulting mixture was stirred at 123.2 (q,]: = 271 Hz), 133.4 (gJ= 33.7 Hz), 142.0.
was followed by isolation of the product according to one of the 0.683 mmol of the corresponding aryl halide, the product was
corresponding aryl halide, the product was obtained as a white solid: Ar), 7.56 (s, 1H, Ar). 13C NMR (101 MHz, CDCI): 8 21.0, 116.0
MHz, CDCI3): 855.7, 111.9 (hept,) J = 4.0 Hz), 113.8 (q,J=3 3.8 Hz), corresponding aryl halide, the product was obtained as a white solid:
115.3, 123.7 (q, J = 271 Hz), 124.8, 132.8 (q, J = 33 Hz), 133.0, yield 969 (0.166 g method B). 'H NMR (400 MHz, CDCI3): 82.34
35-8strifluoromethy)N-.tinuoromethylphenylanline (101 MHz, CDCI3): 8 18.0, 20.8, 117.4, 118.8, 121.2, 126.9, 127.2,

methods described in Method A.

336.0818, found 336.0823.

130.0, 130.5, 131.0, 141.2, 142.2.

1783

https//doi.org/10.1 1021/acs.organomet: 1c00517
Organometallics 2022, 41, 1777-1785



Organometallics

abiasoyOgmonelis

Article

Di-p-tolylamine (3k). Starting from 0.877 mmol of the
corresponding aryl bromide and 0.948 mmol of the corresponding
(0.172 g, X=1 Br, method B), 99% (0.185 g, X= CI, method B). 'H
NMR (400 MHz, CDCI3): 8 2.43 (s, 6H, 2xMe), 5.54 (br S, 1H,
NH), 7.06-7.09 (m, 4H, Ar), 7.18-7.21 (m, 4H,Ar). 3CI NMR (101
MHz, CDCI3): 8 20.7, 118.0, 129.9, 130.2, 141.3.
Methy-n-pheny/anline (31). Starting from 0.888 mmol of the
corresponding aryl chloride and 0.884 mmol oft the corresponding aryl
triflate, the product was obtained as a white solid: yield 98% (0.160 g
X = CI, method B), 99% (0.160 g, X= OTf, method B). 'H NMR
(400 MHz, CDCI,): 8 2.45 (s, 3H, Me), 5.66 (br s, 1H, NH), 7.00-
7.04 (m, 1H, Ar), 7.10-7.14 (m, 4H, Ar), 7.22 (a,J = 8.1 Hz, 2H,
(m, 2H, Ar). 13C NMR (101 MHz, CDCI3): 8 20.8,
117.0, 119.0, 120.4, 129.4, 130.0, 130.9, 140.4, 144.0.
Mploymophtaen2omine (3m). Starting from 0.869 mmol
of the corresponding aryl halide, the product was obtained as a white
solid: yield 96% (0.195 g, method B). 'H NMR (400 MHz, CDCI3):
8 2.44 (s, 3H, Me), 5.78 (br s, 1H, NH), 7.16-7.18 (m, 2H, Ar),
IH, Ar), 7.47-7.51 (m, 1H,Ar), 7.71 (d,J = 8.2 Hz, 1H, Ar), 7.81 (t,
J = 8.9 Hz, 2H, Ar). 13C NMR (101 MHz, CDCI3): 6 20.9, 110.5,
119.5, 119.7, 123.3, 126.5, 126.5, 127.8, 129.0, 129.2, 130.1, 131.5,
0.889 mmol of the corresponding aryll bromide and 0.886 mmol ofthe
solid: yield 99% (0.220 g, X= Br, method B), 99% (0.220 g, X= CI,
method B). 'H NMR (400 MHz, CDCI;): 8 2.43 (s, 3H, Me), 5.85
(br s, 1H, NH), 7.02 (d,J = 8.5 Hz, 2H, Ar), 7.11-7.13 (m, 2H,Ar),
(101 MHz, CDCIz): 8 20.9, 114.7, 121.1 (q, J= = 33 Hz), 121.2, 125.0
(qJ = 269 Hz), 126.8 (q,J= 3.8 Hz), 130.2, 133.1, 138.5, 147.7.
0.883 mmol of the corresponding aryl halide, the product was
obtained as a white solid: yield 97% (0.265 g, method B). 'H NMR
(400 MHz, CDCI,): 8 2.40 (s, 3H, Me), 5.77 (br s, IH, NH), 7.06-
7.13 (m, 3H, Ar), 7.20-7.22 (m, 2H, Ar), 7.25-7.33 (m, 2H, Ar),
7.39-7.40 (m, IH, Ar). 13C NMR (101 MHz, CDCI3): 8 20.9, 113.5
(p,J= 4.7 Hz), 117.0 (P,J= 4.7 Hz), 118.7, 120.3, 129.5, 130.3,
132.9, 138.9, 145.0, 155.1 (P,J: = 16.5 Hz). HRMS-EI (m/z): [M +
H]* calcd for CIH,FNS 310.0683, found 310.0679.
the corresponding aryl halide, the product was obtained as a brown
solid: yield 71% (0.124 g, method B). 'HI NMR (400 MHz, CDCI3):
2H, Ar), 7.28 (dd, I = 5.1, 3.1 Hz, 1H, thiophene). 13C NMR (101

AUTHOR INFORMATION

aryl chloride, the product was obtained as a white solid: yield 99% Corresponding Author

Ashot Gevorgyan Department of Chemistry, UiT The. Arctic
University of Norway, 9037 Tromse, Norway; o orcid.org/
0000-0002-7968-6695; Email: evorgyanashot@uitno
Kathrin H. Hopmann - Department of Chemistry, UiT The
Arctic University of Norway, 9037 Tromse, Norway;
0 alagnoo0ns2Ps7es
Annette Bayer - Department of Chemistry, UiT The Arctic
University
0000-0003-3481-200X ofNorway,
Complete contact information is available at:
https//pubsacsorg/10.1021/acsorganometl00517

Authors

9037 Tromse, Norway; e orcid.org/

7.35-7.39
Ar),

Author Contributions

7.22-7.25 (m, 3H, Ar), 7.36-7.40 (m, 1H, Ar), 7.43 (d,J = 2.3 Hz, A.G. directed the project, designed and carried out the
Methy-M-ctrluoromethyphenylantine (3n). Starting from All authors discussed the results and reviewed the manuscript.
corresponding aryl chloride, the product was obtained as a white The authors declare no competing financial interest.
7.23 (a,J = 8.1 Hz, 2H, Ar), 7.52 (d,J = 8.5 Hz, 2H, Ar). 13C NMR This work was performed with support from NordForsk
3-Pemtaluoro."sulany-Mp-olylentine (30). Starting from No. TFS2016KHH). We thank Truls E. Ingebrigtsen for

experiments, and analyzed the data. A.G. wrote the main
manuscript text. K.H.H. and A.B. provided advice to the
research and manuscript and granted funding for the research.

134.8, 140.2, 141.8.

Notes

ACKNOWLEDGMENTS

(Grant No. 85378), the Research Council of Norway (Grant
No. 313462), and the Tromso Research Foundation (Grant

technical
support.
REFERENCES

(1) Ertl, P.; Altmann, E.; McKenna, J. M. The Most Common
Functional Groups in Bioactive Molecules and How Their Popularity
Has Evolved over Time. J. Med. Chem. 2020, 63, 8408-8418.
(2) For selected reviews on Buchwald-Hartwig amination, see:
Palladium-Catalyzed Amination. Angew. Chem., Int. Ed. 2008, 47,
6338-6361. (b) Ruiz-Castillo, P.; Buchwald, S. L. Applications of
M. The Buchwald-Hartwig: Amination After 25 Years.. Angew. Chem,
(3) For selected studies on Buchwald-Hartwig amination, see:

Mploylhephenjamine (3p). Starting from 0.920 mmol of (a) Surry, D. S.; Buchwald, S. L. Biaryl Phosphane Ligands in
82.36 (s, 3H, Me), 5.66 (br s, 1H, NH), 6.70 (dd, J=3 3.1,1.5Hz, 1H, Palladum-Catalyzed C-N Cross- Coupling Reactions. Chem. Rev.
thiophene), 6.92-6.97 (m, 3H, Ar/thiophene), 7.13 (d,J= 8.1 Hz, 2016, 116, 12564-12649. (c) Dorel, R.; Grugel, C. P.; Haydl, A.
N-p-Tolylbenzolbthiophen-2-amine (3q). Starting from 0.892 (a) Ali, M. H.; Buchwald, S. L. An Improved Method for the
mmol oft the corresponding aryl halide, the product was obtained as a Palladum-Catalyzed Amination of Aryl Iodides. J. Org. Chem. 2001,
brown solid: yield 49% (0.104 g method B). 'H NMR (400 MHz, 66, 2560-2565. (b) Kataoka, N.; Shelby, Q:; Stambuli, J. P.; Hartwig,
CDCI3): 8 2.35 (s, 3H, Me), 5.91 (br s, 1H, NH), 6.76 (s, 1H, Ar), J. F. Air Stable, Sterically Hindered Ferrocenyl Dialkylphosphines for
7.04-7.07 (m, 2H,Ar), 7.14 (a,] = 8.1 Hz, 2H, Ar), 7.20-7.25 (m, Palladium-Catalyzed C-C, C-N, and C-O Bond-Forming Cross-
J= 8.0 Hz, 1H, Ar). isC NMR (101 MHz, CDCI): 8 20.8, 107.5, Utsunomiya, M.; Hartwig, J. F. Scope and Mechanism of Palladium-
147.6. HRMS-EI (m/z): [M + H]* calcd for CIH,4NS 240.0841, Chem. 2003, 68, 2861-2873. (d) Huang, X.; Anderson, K. W.; Zim,

MHz, CDCI): 8 20.7, 105.0, 116.4, 122.6, 125.2, 129.6, 130.0, 142.2, Int. Ed. 2019, 58, 17118-17129.

142.4.

1H,Ar), 7.29-7.34 (m, 1H,Ar), 7.56 (d,J= 7.9 Hz, 1H,Ar), 7.67 (d,
117.1, 121.8, 122.0, 122.7, 124.7, 130.1, 131.1, 134.1, 139.9, 141.2,

Couplings. J. Org. Chem. 2002, 67, 5553-5566. (c) Hooper, M. W.;
Catalyzed Amination of Five-Membered Heterocyclic Halides. J. Org.
D.;. Jiang, L; Klapars, A.; Buchwald, S. L. Expanding Pd-Catalyzed C-
N Bond-Forming Processes: The First Amidation of Aryl Sulfonates,
Aqueous Amination, and Complementarity with Cu-Catalyzed
Reactions. J. Am. Chem. Soc. 2003, 125, 6653-6655. Nishio, R.;
Wessely, S.; Sugiura, M.; Kobayashi, S. Synthesis of Acridone
Derivatives Using Polymer-Supported Palladium and Scandium
Catalysts. J. Comb. Chem. 2006, 8, 459-461. (f) Ikawa, T.; Barder,
T. E.; Biscoe, M. R.; Buchwald, S. L. PaCAtlyedAmidtions of Aryl
Chlorides Using Monodentate Biaryl Phosphine Ligands: A Kinetic,
Computational, and Synthetic Investigation. J.. Am. Chem. Soc. 2007,

found 240.0841.
ASSOCIATED CONTENT
Supporting

Information

(e)

The Supporting Information is available free of charge at
htps//pubsacsorg/da/10101/Acogrganomet.l.00517.
Materials, methods, optimization tables, synthetic
procedures, characterization of products, and relevant

NMR spectra (PDF)

129, 13001-13007.

1784

https./doiorg/10.1 1021/acs.organomet: 1c00517
Organometallics 2022, 41, 1777-1785



Organometallics

abiasoyOgmonelis

Article

(4) Brown, D. G.; Bostrom,] J. Analysis of Past and Present Synthetic
Methodologies on Medicinal Chemistry: Where Have All the New
Reactions Gone? J. Med. Chem. 2016, 59, 4443-4458.

(12) For selected reviews on reversed micelles, see: (a) Melo, E. E.;
Aires-Barros, M. R.; Cabral, J. M. S. Reverse micelles and protein
biotechnology. Biotechnol. Annu. Rev. 2001, 7, 87-129. (b) Ganguli,
Nonaqueous Polar Solvents in Reverse Micelle Systems. Chem. Rev.
2012, 112, 4569--4602. (a) Das, A.; Yadav, N.; Manchala, S.; Bungla,
M.; Ganguli, A. K. Mechanistic Investigations of Growth of
(13) For selected reviews on reactions enabled by micelles, see:
(a) Zhao, Y. Surface-Cros-Linked Micelles as Multifunctionalized
Organic Nanoparticles for Controlled Release, Light Harvesting, and
Does Organic Chemistry Follow Nature's Lead and "Make the
Switch"? J. Org. Chem. 2017, 82, 2806-2816. (c) Lipshutz, B. H.;
Organic Synthesis: Recent Synthetic Chemistry "in Water. Chem.
Eur. J. 2018, 24, 6672-6695. (a) Serrano-Luginbuh, S.; Ruiz-Mirazo,
Rev. Chem. 2018, 2, 306-327. (e) Cortes-Clerget, M.; Yu,J.; Kincaid,

(5) Schneider, N.; Lowe, D. M.; Sayle, R. A.; Tarselli, M. A.; A. K.; Ganguly, A.; Vaidya, S. Microemulsion-based synthesis of
Landrum, G. A. Big Data from Pharmaceutical Patents: A Computa- nanocrystalline materials. Chem. Soc. Rev. 2010, 39, 474-485.
tional Analysis of Medicinal Chemists' Bread and Butter. J. Med. (c) Correa, N. M.; Silber, J. J.; Riter, R. E.; Levinger, N. E.
Application of a Data-Driven Reaction Classification Model: Anisotropic Nanostructures in Reverse Micelles. ACS Omega 2021,

Chem. 2016, 59, 4385-4402.

(6) (a) Ghiandoni, G. M.; Bodkin, M.J.; Chen, B.; Hristozov, D.;
Wallace, J. E. A.; Webster, J.; Gillet, V. J. Development and
Comparison of an Electronic Lab Notebook and Medicinal Chemistry 6, 1007-1029.
Literature. J. Chem. Inf. Model. 2019, 59, 4167-4187. (b) See also:
htp/Pnstmoveofhware/N420y02/plamar-fvourite-
and other parameters on Buchwald-Hartwig amination, see:
(a) Carole, W. A.; Bradley, J.; Sarwar, M.; Colacot, T. J. Can
Impurities in Palladium Acetate. Org. Lett. 2015, 17, 5472-5475.
(b) Carole, W.. A.; Colacot, T.J. Understanding Palladium Acetate
Identifying and Developing Functional Group Tolerant Catalytic
Dibenzylideneacetone Palladium Complexes in Catalysis. Org. Process
Res. Dev. 2019, 23, 1462-1470. (e) Pentsak, E. O.; Eremin, D. B.;
Gordeev, E. G.; Ananikov, V. P. Phantom Reactivity in Organic and
Catalytic Reactions as a Consequence ofl Microscale Destruction and
Contamination Trapping Effects of Magnetic Stir Bars. ACS Catal.
(8) For our recent study on the application of vegetable oils in
chemical synthesis, see: (a) Gevorgyan, A.; Hopmann, K. H.; Bayer,
A. Lipids as Versatile Solvents for Chemical Synthesis. Green Chem.
2021, 23, 7219-7227. See also: (b) Noppawan, P.; Sangon, S.;
Supanchaiyamat, N.; Hunt, A. J. Vegetable oil as a highly effective
100% bio-based alternative solvent for the one-pot multicomponent
Biginelli reaction. Green Chem. 2021, 23, 5766-5774. (c) Ishizuka, F.;
Stenzel, M. H.; Zetterlund, P. B. Microcapsule Synthesis via RAFT
Photopolymerization in Vegetable Oil as a Green Solvent. J. Polym.
Sci, Part A: Polym. Chem. 2018, 56, 831-839.
(9) For selected studies on sustainable amination reactions, see:
(a) Wagner, P.; Bollenbach, M.; Doebelin, C.; Bihel, F.; Bourguignon,
J-J.; Salome, C.; Schmitt, M. t-BuXPhos: a highly efficient ligand for
Buchwald-Hartwig coupling in water. Green Chem. 2014, 16, 4170-
4178. (b) Sa, S.; Gawande, M. B.; Velhinho, A.; Veiga, J. P.;
Bundaleski, N.; Trigueiro,J; Tolstogouzov, A.; Teodoro, O. M. N.
D.; Zboril, R.; Varma, R. S.; Branco, P. S. Magnetically recyclable
magnetit-palladium (Nanocat-Fe-Pd) nanocatalyst for the Buch-
wald-Hartwig reaction. Green Chem. 2014, 16, 3494-3500.
(c) Petkova, D.; Borlinghaus, N.; Sharma, S.; Kaschel, J.; Lindner,
T.; Klee,J-Jolit, A.; Haller, V.; Heitz, S.; Britze, K.; Dietrich,J.; Braje,
W. M.; Handa, S. Hydrophobic Pockets of HPMC Enable Extremely
Short Reaction Times in Water. ACS Sustainable Chem. Eng. 2020, 8,
12612-12617. (a) Srivastava, A. K.; Sharma, C.; Joshi, R. K.
Cp*Co(III) and Cu(OAc), bimetallic catalysis for Buchwald-type C-
N cross coupling of aryl chlorides and amines under base, inert gas &
solvent-free conditions. Green Chem. 2020, 22, 8248-8253.
(e) Kubota, K.; Takahashi, R.; Uesugi, M.; Ito, H. A Glove-Box-
and Schlenk-Line-Free Protocol for Solid-State C-N Cross-Coupling
Reactions Using Mechanochemistry. ACS Sustainable Chem. Eng.
(10) Triacetin and tributyrin are present in small quantities in
(11) For an overview of natural amphiphiles, see: Foley, P.;
Kermanshahi pour, A.; Beach, E. S.; Zimmerman,J.1 B. Derivation and
synthesis of renewable surfactants. Chem. Soc. Rev. 2012, 41, 1499-

reactions/, last accessed at 30.06.2021.

(7) For selected studies on the influence oft the nature of precatalysts Catalysis. Langmuir 2016, 32, 5703-5713. (b) Lipshutz, B.H. When
Palladium Acetate Lose Its "Saltiness"? Catalytic Activities of the Ghorai, S.; Cortes-Clerget, M. The Hydrophobic Effect Applied to
from a User Perspective. Chem. Eur. J. 2016, 22, 7686-7695. K.; Ostaszewski, R.; Gallou, F.; Walde, P. Soft and dispersed interface-
(c) Richardson, J.; Ruble, J. C.; Love, E. A.; Berritt, S. A Method for rich aqueous systems that promote and guide chemical reactions. Nat.
Reactions: Application to the Buchwall-Hartwig Amination. J. Org. J. R. A.; Walde, P.; Gallou, F.; Lipshutz, B. H. Water as the reaction
Chem. 2017, 82, 3741-3750. (d) Weber, P.; Biafora, A.; Doppiu, A.; medium in organic chemistry: from our worst enemy to our best

Bongard, H.-J.; Kelm, H.; GooBen, L.J J. A Comparative Study of friend. Chem. Sci. 2021, 12, 4237-4266.

2019, 9, 3070-3081.

Recommended by ACS
Facile. Amide Bond Formation with TCFH-NMI in an
Organic Laboratory Course
Oliver W. M. Baldwin, David A. Vosburg, et al.
OCTOBER 03, 2022
JOURNAL OF CHEMICALI EDUCATION
Low- Valent Molybdenum PNP Pincer Complexes as
Catalysts for the Semihydrogenation of Alkynes
Niklas F. Both, Matthias Beller, et al.
MARCH 15, 2022
ORGANOMETALLICS
Acids
Anil Kumar, et al.
JANUARY: 21,2022
THE. JOURNAL OF ORGANIC CHEMISTRY
Orthogonal Scope
Jason P. Hibbard, Ana Bahamonde, et al.
AUGUST 24, 2022
THE. JOUR NAL OF ORGANIC CHEMISTRY
Get More Suggestions >

READI C

READI C

KPF-Mediated Esterification and Amidation of Carboxylic

READE

Mild Sustainable Amide. Alkylation Protocol Enables a Broad

2020, 8, 16577-16582.
vegetable oils and butter.
1518.

READI C

1785

https.Idoiorg/10.1 1021/acs.organomet: 1c00517
Organometallics 2022, 41, 1777-1785
# Define the prompt template
prompt_template = """Extract all 1H-NMR-spectra and the related analyzed molecule out of this XML file: {data}. 
Extract the complete 1-H-NMR-spectra as text. Extract the full IUPAC name of the molecules without abbreviations and details.
Extract the data in the following JSON format:"
    {{"molecules": [
        {{
            "molecule":
            "nmr_spectra":
        }},
        {{
            "molecule":
            "nmr_spectra":
        }}
        ]
    }}"""

# Add the extracted text to the prompt
prompt = format_prompt(prompt_template, text)

Now we can perform the actual call to the LLM.

# Call the LiteLLM API and print the output and token usage
output, input_tokens, output_tokens = call_litellm(prompt=prompt)
output = json.loads(output)
print("Output: ", output)
print("Input tokens used:", input_tokens, "Output tokens used:", output_tokens)

with open("NMR_data.json", "w", encoding="utf-8") as json_file:
    json.dump(output, json_file, indent=4)
Output:  {'molecules': [{'molecule': '4-methoxy-3,5-bis(trifluoromethyl)aniline', 'nmr_spectra': '3.66 (s, 3H, OMe), 5.66 (br s, 1H, NH), 6.75-6.79 (m, 2H, Ar), 6.93-6.97 (m, 2H, Ar), 7.00 (s, 2H, Ar), 7.07 (s, 1H, Ar)'}, {'molecule': '4-methyl-3,5-bis(trifluoromethyl)aniline', 'nmr_spectra': '2.40 (s, 3H, Me), 5.89 (br s, 1H, NH), 7.09 (d, J = 8.1 Hz, 2H, Ar), 7.23 (d, J = 8.0 Hz, 2H, Ar), 7.32 (s, 3H, Ar)'}, {'molecule': '4-methoxy-3,5-bis(trifluoromethyl)aniline', 'nmr_spectra': '6.11 (br s, 1H, NH), 7.33-7.36 (m, 3H, Ar), 7.44 (s, 3H, Ar), 7.49 (d, J = 7.8 Hz, 1H, Ar)'}, {'molecule': '4-methyl-N-phenyl-3,5-bis(trifluoromethyl)aniline', 'nmr_spectra': '3.37 (s, 3H, NMe), 7.14 (s, 2H, Ar), 7.18-7.20 (m, 2H, Ar), 7.22-7.26 (m, 2H, Ar), 7.40-7.46 (m, 2H, Ar)'}, {'molecule': 'N-benzyl-3,5-bis(trifluoromethyl)aniline', 'nmr_spectra': '4.38-4.46 (m, 3H, CH2/NH), 6.99 (d, J = 8.1 Hz, 2H, Ar), 7.19 (s, 1H, Ar), 7.32-7.42 (m, 5H, Ar)'}, {'molecule': '3-(3,5-bis(trifluoromethyl)phenyl)-1H-indole', 'nmr_spectra': '6.80 (d, J = 3.4 Hz, 1H, Ar), 7.25-7.29 (m, 1H, Ar), 7.30-7.38 (m, 2H, Ar), 7.58 (d, J = 8.2 Hz, 1H, Ar), 7.74 (s, 1H, Ar)'}, {'molecule': '3-(3,5-bis(trifluoromethyl)phenyl)-1H-pyrrole', 'nmr_spectra': '6.44-6.45 (m, 2H, pyrrole), 7.15-7.16 (m, 2H, pyrrole), 7.76 (s, 1H, Ar), 7.82 (s, 2H, Ar)'}, {'molecule': '4-methyl-3,5-bis(trifluoromethyl)phenol', 'nmr_spectra': '2.40 (s, 3H, Me), 6.96-7.00 (m, 2H, Ar), 7.23-7.26 (m, 2H, Ar), 7.38 (s, 2H, Ar)'}, {'molecule': '4-methyl-N-phenylaniline', 'nmr_spectra': '2.43 (s, 6H, 2xMe), 5.54 (br s, 1H, NH), 7.06-7.09 (m, 4H, Ar), 7.18-7.21 (m, 4H, Ar)'}, {'molecule': '4-methyl-N-phenylaniline', 'nmr_spectra': '2.45 (s, 3H, Me), 5.66 (br s, 1H, NH), 7.00-7.04 (m, 1H, Ar), 7.10-7.14 (m, 4H, Ar), 7.22 (d, J = 8.1 Hz, 2H, Ar)'}, {'molecule': '4-methyl-N-phenylaniline', 'nmr_spectra': '2.44 (s, 3H, Me), 5.78 (br s, 1H, NH), 7.16-7.18 (m, 2H, Ar), 7.47-7.51 (m, 1H, Ar), 7.71 (d, J = 8.2 Hz, 1H, Ar), 7.81 (t, J = 8.9 Hz, 2H, Ar)'}, 
{'molecule': '4-methyl-N-phenylaniline', 'nmr_spectra': '2.43 (s, 3H, Me), 5.85 (br s, 1H, NH), 7.02 (d, J = 8.5 Hz, 2H, Ar), 7.11-7.13 (m, 2H, Ar)'}, {'molecule': '4-methyl-N-phenylaniline', 'nmr_spectra': '2.40 (s, 3H, Me), 5.77 (br s, 1H, NH), 7.06-7.13 (m, 3H, Ar), 7.20-7.22 (m, 2H, Ar), 7.25-7.33 (m, 2H, Ar), 7.39-7.40 (m, 1H, Ar)'}, {'molecule': '4-methyl-N-phenylaniline', 'nmr_spectra': '2.36 (s, 3H, Me), 5.66 (br s, 1H, NH), 6.70 (dd, J = 3.1, 1.5 Hz, 1H, thiophene), 6.92-6.97 (m, 3H, Ar/thiophene), 7.13 (d, J = 8.1 Hz, 2H, Ar), 7.29-7.34 (m, 1H, Ar), 7.56 (d, J = 7.9 Hz, 1H, Ar), 7.67 (d, J = 8.0 Hz, 1H, Ar)'}, {'molecule': '4-methyl-N-phenylaniline', 'nmr_spectra': '2.35 (s, 3H, Me), 5.91 (br s, 1H, NH), 6.76 (s, 1H, Ar), 7.04-7.07 (m, 2H, Ar), 7.14 (d, J = 8.1 Hz, 2H, Ar), 7.20-7.25 (m, 2H, Ar), 7.43 (d, J = 2.3 Hz, 1H, Ar)'}]}
Input tokens used: 15888 Output tokens used: 1785
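
The call to `json.loads` above assumes that the model returns bare, well-formed JSON. A minimal sketch of a more defensive parsing step is shown below. The helper name `parse_llm_json` and the toy reply are hypothetical and not part of `matextract`; the sketch only assumes that a reply may be wrapped in a Markdown code fence and must follow the structure requested in the prompt.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_llm_json(raw: str) -> dict:
    """Parse an LLM reply into the expected {"molecules": [...]} structure.

    Hypothetical helper for illustration; not part of matextract.
    """
    text = raw.strip()
    # Drop a surrounding Markdown code fence if the model added one
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # remove the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # remove the closing fence line
        text = "\n".join(lines)
    data = json.loads(text)
    if not isinstance(data, dict) or not isinstance(data.get("molecules"), list):
        raise ValueError("expected a JSON object with a 'molecules' list")
    for entry in data["molecules"]:
        if not isinstance(entry, dict):
            raise ValueError("each entry in 'molecules' must be an object")
        missing = {"molecule", "nmr_spectra"} - set(entry)
        if missing:
            raise ValueError(f"entry is missing keys: {sorted(missing)}")
    return data


# Toy reply (made up for illustration) wrapped in a code fence
reply = (
    FENCE
    + 'json\n{"molecules": [{"molecule": "ethanol", "nmr_spectra": "1.25 (t, 3H)"}]}\n'
    + FENCE
)
print(parse_llm_json(reply)["molecules"][0]["molecule"])
```

Failing early on a malformed reply keeps structural problems in the LLM output separate from the chemical plausibility checks that follow.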

12.2. Validity check with NMR spectra and SMILES#

Next, we count the hydrogen atoms reported in each extracted NMR spectrum and compare them with the hydrogen atoms of the corresponding molecule. We also compare the number of peaks in the spectrum with the number of topologically distinct hydrogen environments in the molecule. If the numbers do not match, the extraction likely contains an error.

To do so, we need to define a few helper functions. The first one computes the number of symmetry-distinct hydrogen environments.

import numpy as np
import rdkit
from rdkit import Chem


def get_number_of_topologically_distinct_atoms(molecule, atomic_number: int = 1):
    """Return the number of unique environments of one element based on molecular topology.

    Args:
        molecule (rdkit.Chem.rdchem.Mol): Molecular instance.
        atomic_number (int, optional): Atomic number. Defaults to 1.

    Returns:
        int: Number of unique environments.
    """
    if atomic_number == 1:
        # add hydrogen
        mol = Chem.AddHs(molecule)
    else:
        mol = molecule

    # Get unique canonical atom rankings
    atom_ranks = list(rdkit.Chem.rdmolfiles.CanonicalRankAtoms(mol, breakTies=False))

    # Select the unique element environments
    atom_ranks = np.array(atom_ranks)

    # Atom indices
    atom_indices = [
        atom.GetIdx() for atom in mol.GetAtoms() if atom.GetAtomicNum() == atomic_number
    ]
    # Count them
    return len(set(atom_ranks[atom_indices]))

If we look at an example, e.g., for benzene c1ccccc1, we can see that the number of topologically distinct hydrogen atoms is 1. In contrast, if we look at ethanol, CCO, we can see that the number of topologically distinct hydrogen atoms is 3.

get_number_of_topologically_distinct_atoms(
    Chem.MolFromSmiles("c1ccccc1"), atomic_number=1
)
1
get_number_of_topologically_distinct_atoms(Chem.MolFromSmiles("CCO"), atomic_number=1)
3
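
The proton-count comparison also needs the total number of hydrogen atoms in the molecule. One possible way to obtain it with RDKit is sketched below; the function name `count_hydrogens_from_smiles` is an assumption introduced here for illustration and is not defined elsewhere in this notebook.

```python
from rdkit import Chem


def count_hydrogens_from_smiles(smiles: str) -> int:
    """Return the total number of hydrogen atoms implied by a SMILES string.

    Illustrative helper; the name is not part of the original notebook.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Make implicit hydrogens explicit, then count atoms with atomic number 1
    mol_with_hs = Chem.AddHs(mol)
    return sum(1 for atom in mol_with_hs.GetAtoms() if atom.GetAtomicNum() == 1)


print(count_hydrogens_from_smiles("CCO"))  # ethanol, C2H6O
print(count_hydrogens_from_smiles("Cc1ccc(Nc2ccccc2)cc1"))  # C13H13N
```

This value can then be compared directly with the sum of the integrations reported in the extracted spectrum.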

In addition, we need to count the hydrogen atoms reported in the NMR spectra. For this, we will use a regular expression that sums the integrations (e.g., 3H) of all peaks.

import re


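def count_hydrogens_from_nmr(nmr_spectra: str) -> int:
    """Sum the integrations (e.g., "3H") reported in an NMR peak list."""
    # Match integers directly followed by "H", such as "2H" in "(m, 2H, Ar)"
    hydrogen_pattern = r"\b(\d+)H\b"
    matches = re.findall(hydrogen_pattern, nmr_spectra)
    return sum(int(match) for match in matches)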
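
As a quick check of this pattern, the sketch below applies the same regular expression to two peak lists taken from the article text shown above: one clean, and one in which the OCR misread "1H" as "IH". The function is repeated so that the snippet runs on its own; the variable names are chosen here for illustration.

```python
import re


def count_hydrogens_from_nmr(nmr_spectra: str) -> int:
    # Same pattern as above: integers directly followed by "H"
    return sum(int(m) for m in re.findall(r"\b(\d+)H\b", nmr_spectra))


clean = "2.43 (s, 6H, 2xMe), 5.54 (br s, 1H, NH), 7.06-7.09 (m, 4H, Ar), 7.18-7.21 (m, 4H, Ar)"
# "IH" is an OCR misreading of "1H" and is silently skipped by the pattern
ocr_damaged = "2.40 (s, 3H, Me), 5.77 (br s, IH, NH), 7.06-7.13 (m, 3H, Ar)"

print(count_hydrogens_from_nmr(clean))
print(count_hydrogens_from_nmr(ocr_damaged))
```

Two caveats follow from how the pattern works. OCR misreadings such as "IH" or "SH" lower the count without raising an error, and a leading "1H NMR" header would be counted as one extra proton if it were passed in together with the peak list.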

Using those two functions, we can calculate how often the extraction matches our expectation.

import json
import pandas as pd
import matextract.utils as utils

results = []
pattern = re.compile(r"\d+\.\d+\s*\([^)]*\)")

# Load JSON NMR data
with open("NMR_data.json", "r") as file:
    data = json.load(file)


# Loop over all molecules in data
for molecule_data in data["molecules"]:
    molecule_name = molecule_data["molecule"]
    nmr_spectra = molecule_data["nmr_spectra"]

    print(f"Processing molecule: {molecule_name}")

    # Calculate number of hydrogen atoms in NMR data
    H_number_nmr = count_hydrogens_from_nmr(nmr_spectra)

    # Count the number of peaks in the NMR spectra
    peaks = pattern.findall(nmr_spectra)
    found_number_of_peaks = len(peaks)

    # Defaults, so a failed conversion never reuses values from a previous iteration
    mol = None
    expected_number_of_peaks = None

    # Convert the molecule name into SMILES
    mol_smiles = utils.name_to_smiles(molecule_name)
    if mol_smiles:
        # Convert SMILES into an RDKit molecule object (None if parsing fails)
        mol = Chem.MolFromSmiles(mol_smiles)
        if mol:
            expected_number_of_peaks = get_number_of_topologically_distinct_atoms(mol)
        else:
            print(
                f"Failed to create RDKit molecule object from SMILES for {molecule_name}"
            )
    else:
        print(f"Failed to convert {molecule_name} to SMILES")

    results.append(
        {
            "peaks": peaks,
            "molecule": molecule_name,
            "H_number_nmr": H_number_nmr,
            "rdkit_mol": mol,
            "mol_smiles": mol_smiles,
            "found_number_of_peaks": found_number_of_peaks,
            "expected_number_of_peaks": expected_number_of_peaks,
        }
    )
Processing molecule: 4-methoxy-3,5-bis(trifluoromethyl)aniline
Processing molecule: 4-methyl-3,5-bis(trifluoromethyl)aniline
Processing molecule: 4-methoxy-3,5-bis(trifluoromethyl)aniline
Processing molecule: 4-methyl-N-phenyl-3,5-bis(trifluoromethyl)aniline
Processing molecule: N-benzyl-3,5-bis(trifluoromethyl)aniline
Processing molecule: 3-(3,5-bis(trifluoromethyl)phenyl)-1H-indole
Processing molecule: 3-(3,5-bis(trifluoromethyl)phenyl)-1H-pyrrole
Processing molecule: 4-methyl-3,5-bis(trifluoromethyl)phenol
Processing molecule: 4-methyl-N-phenylaniline
Processing molecule: 4-methyl-N-phenylaniline
Processing molecule: 4-methyl-N-phenylaniline
Processing molecule: 4-methyl-N-phenylaniline
Processing molecule: 4-methyl-N-phenylaniline
Processing molecule: 4-methyl-N-phenylaniline
Processing molecule: 4-methyl-N-phenylaniline
df = pd.DataFrame(results)
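
The peak pattern used in the loop above only matches a single decimal shift directly before the parentheses, so for a range such as 6.75-6.79 only the upper bound ends up in the `peaks` column. The count is unaffected, but the stored peak text is truncated. A minimal sketch of a variant that keeps the whole range is shown below; the names `single_shift` and `shift_or_range` are introduced here for illustration, and the spectrum is a shortened version of the first extracted entry.

```python
import re

# Pattern used in the loop above: one decimal shift followed by "(...)"
single_shift = re.compile(r"\d+\.\d+\s*\([^)]*\)")
# Variant that also accepts an optional "low-high" range before the parentheses
shift_or_range = re.compile(r"\d+\.\d+(?:\s*[-–]\s*\d+\.\d+)?\s*\([^)]*\)")

spectrum = "3.66 (s, 3H, OMe), 6.75-6.79 (m, 2H, Ar), 7.00 (s, 2H, Ar)"

print(single_shift.findall(spectrum))
print(shift_or_range.findall(spectrum))
```

Both patterns still assume that every shift is written with a decimal point and that the assignment in parentheses contains no nested parentheses.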

We can now also use a utility from RDKit to visualize the molecules in the dataframe.

df.dropna(subset=["rdkit_mol"], inplace=True)
from rdkit.Chem import PandasTools

PandasTools.AddMoleculeColumnToFrame(df, molCol="rdkit_mol", smilesCol="mol_smiles")
df
index | peaks | molecule | H_number_nmr | rdkit_mol | mol_smiles | found_number_of_peaks | expected_number_of_peaks
0 | [3.66 (s, 3H, OMe), 5.66 (br s, 1H, NH), 6.79 ... | 4-methoxy-3,5-bis(trifluoromethyl)aniline | 11 | Mol | COc1c(C(F)(F)F)cc(N)cc1C(F)(F)F | 6 | 3
1 | [2.40 (s, 3H, Me), 5.89 (br s, 1H, NH), 7.09 (... | 4-methyl-3,5-bis(trifluoromethyl)aniline | 11 | Mol | Cc1c(C(F)(F)F)cc(N)cc1C(F)(F)F | 5 | 3
2 | [6.11 (br s, 1H, NH), 7.36 (m, 3H, Ar), 7.44 (... | 4-methoxy-3,5-bis(trifluoromethyl)aniline | 8 | Mol | COc1c(C(F)(F)F)cc(N)cc1C(F)(F)F | 4 | 3
3 | [3.37 (s, 3H, NMe), 7.14 (s, 2H, Ar), 7.20 (m,... | 4-methyl-N-phenyl-3,5-bis(trifluoromethyl)aniline | 11 | Mol | Cc1c(C(F)(F)F)cc(Nc2ccccc2)cc1C(F)(F)F | 5 | 6
4 | [4.46 (m, 3H, CH2/NH), 6.99 (d, J = 8.1 Hz, 2H... | N-benzyl-3,5-bis(trifluoromethyl)aniline | 11 | Mol | FC(F)(F)c1cc(NCc2ccccc2)cc(C(F)(F)F)c1 | 4 | 7
5 | [6.80 (d, J = 3.4 Hz, 1H, Ar), 7.29 (m, 1H, Ar... | 3-(3,5-bis(trifluoromethyl)phenyl)-1H-indole | 6 | Mol | FC(F)(F)c1cc(-c2c[nH]c3ccccc23)cc(C(F)(F)F)c1 | 5 | 8
6 | [6.45 (m, 2H, pyrrole), 7.16 (m, 2H, pyrrole),... | 3-(3,5-bis(trifluoromethyl)phenyl)-1H-pyrrole | 7 | Mol | FC(F)(F)c1cc(-c2cc[nH]c2)cc(C(F)(F)F)c1 | 4 | 6
7 | [2.40 (s, 3H, Me), 7.00 (m, 2H, Ar), 7.26 (m, ... | 4-methyl-3,5-bis(trifluoromethyl)phenol | 9 | Mol | Cc1c(C(F)(F)F)cc(O)cc1C(F)(F)F | 4 | 3
8 | [2.43 (s, 6H, 2xMe), 5.54 (br s, 1H, NH), 7.09... | 4-methyl-N-phenylaniline | 15 | Mol | Cc1ccc(Nc2ccccc2)cc1 | 4 | 7
9 | [2.45 (s, 3H, Me), 5.66 (br s, 1H, NH), 7.04 (... | 4-methyl-N-phenylaniline | 11 | Mol | Cc1ccc(Nc2ccccc2)cc1 | 5 | 7
10 | [2.44 (s, 3H, Me), 5.78 (br s, 1H, NH), 7.18 (... | 4-methyl-N-phenylaniline | 10 | Mol | Cc1ccc(Nc2ccccc2)cc1 | 6 | 7
11 | [2.43 (s, 3H, Me), 5.85 (br s, 1H, NH), 7.02 (... | 4-methyl-N-phenylaniline | 8 | Mol | Cc1ccc(Nc2ccccc2)cc1 | 4 | 7
12 | [2.40 (s, 3H, Me), 5.77 (br s, 1H, NH), 7.13 (... | 4-methyl-N-phenylaniline | 12 | Mol | Cc1ccc(Nc2ccccc2)cc1 | 6 | 7
13 | [2.36 (s, 3H, Me), 5.66 (br s, 1H, NH), 6.70 (... | 4-methyl-N-phenylaniline | 13 | Mol | Cc1ccc(Nc2ccccc2)cc1 | 8 | 7
14 | [2.35 (s, 3H, Me), 5.91 (br s, 1H, NH), 6.76 (... | 4-methyl-N-phenylaniline | 12 | Mol | Cc1ccc(Nc2ccccc2)cc1 | 7 | 7

We see that the number of expected peaks matches the number of observed peaks in only one of the fifteen entries. Interestingly, the model assigned the name 4-methyl-N-phenylaniline to seven different spectra whose peak counts vary, and one of these is the only entry for which the expected and extracted numbers match.
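
The agreement can also be summarized programmatically. The sketch below is a minimal, pandas-free illustration that uses the `found_number_of_peaks` and `expected_number_of_peaks` values from the table above; the helper name `summarize_peak_matches` is an assumption introduced here.

```python
def summarize_peak_matches(found, expected):
    """Return the indices where both counts agree and the fraction of agreeing entries.

    Entries without an expected value (None) are skipped.
    """
    pairs = [
        (i, f, e) for i, (f, e) in enumerate(zip(found, expected)) if e is not None
    ]
    matching = [i for i, f, e in pairs if f == e]
    fraction = len(matching) / len(pairs) if pairs else 0.0
    return matching, fraction


# Values copied from the dataframe shown above
found = [6, 5, 4, 5, 4, 5, 4, 4, 4, 5, 6, 4, 6, 8, 7]
expected = [3, 3, 3, 6, 7, 8, 6, 3, 7, 7, 7, 7, 7, 7, 7]

matching, fraction = summarize_peak_matches(found, expected)
print(f"Matching rows: {matching}, fraction: {fraction:.2f}")
```

With the dataframe at hand, the same comparison could be written as a column-wise equality check; the list-based version is used here only to keep the snippet self-contained.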

Since hydrogen atoms in very similar environments can appear in an NMR spectrum as one overlapped peak, the calculated number of peaks can deviate from the observed one. To address this, instead of considering only the symmetry-equivalent environments, one could perform a basic simulation of the NMR spectrum.

12.3. Bibliography#

[GHB21]

Ashot Gevorgyan, Kathrin H. Hopmann, and Annette Bayer. Improved Buchwald–Hartwig amination by the use of lipids and lipid impurities. Organometallics, 41(14):1777–1785, October 2021. URL: http://dx.doi.org/10.1021/acs.organomet.1c00517, doi:10.1021/acs.organomet.1c00517.

[PG23]

Luc Patiny and Guillaume Godin. Automatic extraction of FAIR data from publications using LLM. ChemRxiv preprint, December 2023. URL: http://dx.doi.org/10.26434/chemrxiv-2023-05v1b-v2, doi:10.26434/chemrxiv-2023-05v1b-v2.