Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
freebsd
GitHub Repository: freebsd/freebsd-src
Path: blob/main/contrib/libyaml/src/scanner.c
39536 views
1
2
/*
3
* Introduction
4
* ************
5
*
6
* The following notes assume that you are familiar with the YAML specification
7
* (http://yaml.org/spec/cvs/current.html). We mostly follow it, although in
8
* some cases we are less restrictive that it requires.
9
*
10
* The process of transforming a YAML stream into a sequence of events is
11
* divided on two steps: Scanning and Parsing.
12
*
13
* The Scanner transforms the input stream into a sequence of tokens, while the
14
* parser transform the sequence of tokens produced by the Scanner into a
15
* sequence of parsing events.
16
*
17
* The Scanner is rather clever and complicated. The Parser, on the contrary,
18
* is a straightforward implementation of a recursive-descendant parser (or,
19
* LL(1) parser, as it is usually called).
20
*
21
* Actually there are two issues of Scanning that might be called "clever", the
22
* rest is quite straightforward. The issues are "block collection start" and
23
* "simple keys". Both issues are explained below in details.
24
*
25
* Here the Scanning step is explained and implemented. We start with the list
26
* of all the tokens produced by the Scanner together with short descriptions.
27
*
28
* Now, tokens:
29
*
30
* STREAM-START(encoding) # The stream start.
31
* STREAM-END # The stream end.
32
* VERSION-DIRECTIVE(major,minor) # The '%YAML' directive.
33
* TAG-DIRECTIVE(handle,prefix) # The '%TAG' directive.
34
* DOCUMENT-START # '---'
35
* DOCUMENT-END # '...'
36
* BLOCK-SEQUENCE-START # Indentation increase denoting a block
37
* BLOCK-MAPPING-START # sequence or a block mapping.
38
* BLOCK-END # Indentation decrease.
39
* FLOW-SEQUENCE-START # '['
40
* FLOW-SEQUENCE-END # ']'
41
* FLOW-MAPPING-START # '{'
42
* FLOW-MAPPING-END # '}'
43
* BLOCK-ENTRY # '-'
44
* FLOW-ENTRY # ','
45
* KEY # '?' or nothing (simple keys).
46
* VALUE # ':'
47
* ALIAS(anchor) # '*anchor'
48
* ANCHOR(anchor) # '&anchor'
49
* TAG(handle,suffix) # '!handle!suffix'
50
* SCALAR(value,style) # A scalar.
51
*
52
* The following two tokens are "virtual" tokens denoting the beginning and the
53
* end of the stream:
54
*
55
* STREAM-START(encoding)
56
* STREAM-END
57
*
58
* We pass the information about the input stream encoding with the
59
* STREAM-START token.
60
*
61
* The next two tokens are responsible for tags:
62
*
63
* VERSION-DIRECTIVE(major,minor)
64
* TAG-DIRECTIVE(handle,prefix)
65
*
66
* Example:
67
*
68
* %YAML 1.1
69
* %TAG ! !foo
70
* %TAG !yaml! tag:yaml.org,2002:
71
* ---
72
*
73
* The corresponding sequence of tokens:
74
*
75
* STREAM-START(utf-8)
76
* VERSION-DIRECTIVE(1,1)
77
* TAG-DIRECTIVE("!","!foo")
78
* TAG-DIRECTIVE("!yaml","tag:yaml.org,2002:")
79
* DOCUMENT-START
80
* STREAM-END
81
*
82
* Note that the VERSION-DIRECTIVE and TAG-DIRECTIVE tokens occupy a whole
83
* line.
84
*
85
* The document start and end indicators are represented by:
86
*
87
* DOCUMENT-START
88
* DOCUMENT-END
89
*
90
* Note that if a YAML stream contains an implicit document (without '---'
91
* and '...' indicators), no DOCUMENT-START and DOCUMENT-END tokens will be
92
* produced.
93
*
94
* In the following examples, we present whole documents together with the
95
* produced tokens.
96
*
97
* 1. An implicit document:
98
*
99
* 'a scalar'
100
*
101
* Tokens:
102
*
103
* STREAM-START(utf-8)
104
* SCALAR("a scalar",single-quoted)
105
* STREAM-END
106
*
107
* 2. An explicit document:
108
*
109
* ---
110
* 'a scalar'
111
* ...
112
*
113
* Tokens:
114
*
115
* STREAM-START(utf-8)
116
* DOCUMENT-START
117
* SCALAR("a scalar",single-quoted)
118
* DOCUMENT-END
119
* STREAM-END
120
*
121
* 3. Several documents in a stream:
122
*
123
* 'a scalar'
124
* ---
125
* 'another scalar'
126
* ---
127
* 'yet another scalar'
128
*
129
* Tokens:
130
*
131
* STREAM-START(utf-8)
132
* SCALAR("a scalar",single-quoted)
133
* DOCUMENT-START
134
* SCALAR("another scalar",single-quoted)
135
* DOCUMENT-START
136
* SCALAR("yet another scalar",single-quoted)
137
* STREAM-END
138
*
139
* We have already introduced the SCALAR token above. The following tokens are
140
* used to describe aliases, anchors, tag, and scalars:
141
*
142
* ALIAS(anchor)
143
* ANCHOR(anchor)
144
* TAG(handle,suffix)
145
* SCALAR(value,style)
146
*
147
* The following series of examples illustrate the usage of these tokens:
148
*
149
* 1. A recursive sequence:
150
*
151
* &A [ *A ]
152
*
153
* Tokens:
154
*
155
* STREAM-START(utf-8)
156
* ANCHOR("A")
157
* FLOW-SEQUENCE-START
158
* ALIAS("A")
159
* FLOW-SEQUENCE-END
160
* STREAM-END
161
*
162
* 2. A tagged scalar:
163
*
164
* !!float "3.14" # A good approximation.
165
*
166
* Tokens:
167
*
168
* STREAM-START(utf-8)
169
* TAG("!!","float")
170
* SCALAR("3.14",double-quoted)
171
* STREAM-END
172
*
173
* 3. Various scalar styles:
174
*
175
* --- # Implicit empty plain scalars do not produce tokens.
176
* --- a plain scalar
177
* --- 'a single-quoted scalar'
178
* --- "a double-quoted scalar"
179
* --- |-
180
* a literal scalar
181
* --- >-
182
* a folded
183
* scalar
184
*
185
* Tokens:
186
*
187
* STREAM-START(utf-8)
188
* DOCUMENT-START
189
* DOCUMENT-START
190
* SCALAR("a plain scalar",plain)
191
* DOCUMENT-START
192
* SCALAR("a single-quoted scalar",single-quoted)
193
* DOCUMENT-START
194
* SCALAR("a double-quoted scalar",double-quoted)
195
* DOCUMENT-START
196
* SCALAR("a literal scalar",literal)
197
* DOCUMENT-START
198
* SCALAR("a folded scalar",folded)
199
* STREAM-END
200
*
201
* Now it's time to review collection-related tokens. We will start with
202
* flow collections:
203
*
204
* FLOW-SEQUENCE-START
205
* FLOW-SEQUENCE-END
206
* FLOW-MAPPING-START
207
* FLOW-MAPPING-END
208
* FLOW-ENTRY
209
* KEY
210
* VALUE
211
*
212
* The tokens FLOW-SEQUENCE-START, FLOW-SEQUENCE-END, FLOW-MAPPING-START, and
213
* FLOW-MAPPING-END represent the indicators '[', ']', '{', and '}'
214
* correspondingly. FLOW-ENTRY represent the ',' indicator. Finally the
215
* indicators '?' and ':', which are used for denoting mapping keys and values,
216
* are represented by the KEY and VALUE tokens.
217
*
218
* The following examples show flow collections:
219
*
220
* 1. A flow sequence:
221
*
222
* [item 1, item 2, item 3]
223
*
224
* Tokens:
225
*
226
* STREAM-START(utf-8)
227
* FLOW-SEQUENCE-START
228
* SCALAR("item 1",plain)
229
* FLOW-ENTRY
230
* SCALAR("item 2",plain)
231
* FLOW-ENTRY
232
* SCALAR("item 3",plain)
233
* FLOW-SEQUENCE-END
234
* STREAM-END
235
*
236
* 2. A flow mapping:
237
*
238
* {
239
* a simple key: a value, # Note that the KEY token is produced.
240
* ? a complex key: another value,
241
* }
242
*
243
* Tokens:
244
*
245
* STREAM-START(utf-8)
246
* FLOW-MAPPING-START
247
* KEY
248
* SCALAR("a simple key",plain)
249
* VALUE
250
* SCALAR("a value",plain)
251
* FLOW-ENTRY
252
* KEY
253
* SCALAR("a complex key",plain)
254
* VALUE
255
* SCALAR("another value",plain)
256
* FLOW-ENTRY
257
* FLOW-MAPPING-END
258
* STREAM-END
259
*
260
* A simple key is a key which is not denoted by the '?' indicator. Note that
261
* the Scanner still produce the KEY token whenever it encounters a simple key.
262
*
263
* For scanning block collections, the following tokens are used (note that we
264
* repeat KEY and VALUE here):
265
*
266
* BLOCK-SEQUENCE-START
267
* BLOCK-MAPPING-START
268
* BLOCK-END
269
* BLOCK-ENTRY
270
* KEY
271
* VALUE
272
*
273
* The tokens BLOCK-SEQUENCE-START and BLOCK-MAPPING-START denote indentation
274
* increase that precedes a block collection (cf. the INDENT token in Python).
275
* The token BLOCK-END denote indentation decrease that ends a block collection
276
* (cf. the DEDENT token in Python). However YAML has some syntax pecularities
277
* that makes detections of these tokens more complex.
278
*
279
* The tokens BLOCK-ENTRY, KEY, and VALUE are used to represent the indicators
280
* '-', '?', and ':' correspondingly.
281
*
282
* The following examples show how the tokens BLOCK-SEQUENCE-START,
283
* BLOCK-MAPPING-START, and BLOCK-END are emitted by the Scanner:
284
*
285
* 1. Block sequences:
286
*
287
* - item 1
288
* - item 2
289
* -
290
* - item 3.1
291
* - item 3.2
292
* -
293
* key 1: value 1
294
* key 2: value 2
295
*
296
* Tokens:
297
*
298
* STREAM-START(utf-8)
299
* BLOCK-SEQUENCE-START
300
* BLOCK-ENTRY
301
* SCALAR("item 1",plain)
302
* BLOCK-ENTRY
303
* SCALAR("item 2",plain)
304
* BLOCK-ENTRY
305
* BLOCK-SEQUENCE-START
306
* BLOCK-ENTRY
307
* SCALAR("item 3.1",plain)
308
* BLOCK-ENTRY
309
* SCALAR("item 3.2",plain)
310
* BLOCK-END
311
* BLOCK-ENTRY
312
* BLOCK-MAPPING-START
313
* KEY
314
* SCALAR("key 1",plain)
315
* VALUE
316
* SCALAR("value 1",plain)
317
* KEY
318
* SCALAR("key 2",plain)
319
* VALUE
320
* SCALAR("value 2",plain)
321
* BLOCK-END
322
* BLOCK-END
323
* STREAM-END
324
*
325
* 2. Block mappings:
326
*
327
* a simple key: a value # The KEY token is produced here.
328
* ? a complex key
329
* : another value
330
* a mapping:
331
* key 1: value 1
332
* key 2: value 2
333
* a sequence:
334
* - item 1
335
* - item 2
336
*
337
* Tokens:
338
*
339
* STREAM-START(utf-8)
340
* BLOCK-MAPPING-START
341
* KEY
342
* SCALAR("a simple key",plain)
343
* VALUE
344
* SCALAR("a value",plain)
345
* KEY
346
* SCALAR("a complex key",plain)
347
* VALUE
348
* SCALAR("another value",plain)
349
* KEY
350
* SCALAR("a mapping",plain)
351
* VALUE
352
* BLOCK-MAPPING-START
353
* KEY
354
* SCALAR("key 1",plain)
355
* VALUE
356
* SCALAR("value 1",plain)
357
* KEY
358
* SCALAR("key 2",plain)
359
* VALUE
360
* SCALAR("value 2",plain)
361
* BLOCK-END
362
* KEY
363
* SCALAR("a sequence",plain)
364
* VALUE
365
* BLOCK-SEQUENCE-START
366
* BLOCK-ENTRY
367
* SCALAR("item 1",plain)
368
* BLOCK-ENTRY
369
* SCALAR("item 2",plain)
370
* BLOCK-END
371
* BLOCK-END
372
* STREAM-END
373
*
374
* YAML does not always require to start a new block collection from a new
375
* line. If the current line contains only '-', '?', and ':' indicators, a new
376
* block collection may start at the current line. The following examples
377
* illustrate this case:
378
*
379
* 1. Collections in a sequence:
380
*
381
* - - item 1
382
* - item 2
383
* - key 1: value 1
384
* key 2: value 2
385
* - ? complex key
386
* : complex value
387
*
388
* Tokens:
389
*
390
* STREAM-START(utf-8)
391
* BLOCK-SEQUENCE-START
392
* BLOCK-ENTRY
393
* BLOCK-SEQUENCE-START
394
* BLOCK-ENTRY
395
* SCALAR("item 1",plain)
396
* BLOCK-ENTRY
397
* SCALAR("item 2",plain)
398
* BLOCK-END
399
* BLOCK-ENTRY
400
* BLOCK-MAPPING-START
401
* KEY
402
* SCALAR("key 1",plain)
403
* VALUE
404
* SCALAR("value 1",plain)
405
* KEY
406
* SCALAR("key 2",plain)
407
* VALUE
408
* SCALAR("value 2",plain)
409
* BLOCK-END
410
* BLOCK-ENTRY
411
* BLOCK-MAPPING-START
412
* KEY
413
* SCALAR("complex key")
414
* VALUE
415
* SCALAR("complex value")
416
* BLOCK-END
417
* BLOCK-END
418
* STREAM-END
419
*
420
* 2. Collections in a mapping:
421
*
422
* ? a sequence
423
* : - item 1
424
* - item 2
425
* ? a mapping
426
* : key 1: value 1
427
* key 2: value 2
428
*
429
* Tokens:
430
*
431
* STREAM-START(utf-8)
432
* BLOCK-MAPPING-START
433
* KEY
434
* SCALAR("a sequence",plain)
435
* VALUE
436
* BLOCK-SEQUENCE-START
437
* BLOCK-ENTRY
438
* SCALAR("item 1",plain)
439
* BLOCK-ENTRY
440
* SCALAR("item 2",plain)
441
* BLOCK-END
442
* KEY
443
* SCALAR("a mapping",plain)
444
* VALUE
445
* BLOCK-MAPPING-START
446
* KEY
447
* SCALAR("key 1",plain)
448
* VALUE
449
* SCALAR("value 1",plain)
450
* KEY
451
* SCALAR("key 2",plain)
452
* VALUE
453
* SCALAR("value 2",plain)
454
* BLOCK-END
455
* BLOCK-END
456
* STREAM-END
457
*
458
* YAML also permits non-indented sequences if they are included into a block
459
* mapping. In this case, the token BLOCK-SEQUENCE-START is not produced:
460
*
461
* key:
462
* - item 1 # BLOCK-SEQUENCE-START is NOT produced here.
463
* - item 2
464
*
465
* Tokens:
466
*
467
* STREAM-START(utf-8)
468
* BLOCK-MAPPING-START
469
* KEY
470
* SCALAR("key",plain)
471
* VALUE
472
* BLOCK-ENTRY
473
* SCALAR("item 1",plain)
474
* BLOCK-ENTRY
475
* SCALAR("item 2",plain)
476
* BLOCK-END
477
*/
478
479
#include "yaml_private.h"
480
481
/*
482
* Ensure that the buffer contains the required number of characters.
483
* Return 1 on success, 0 on failure (reader error or memory error).
484
*/
485
486
#define CACHE(parser,length) \
487
(parser->unread >= (length) \
488
? 1 \
489
: yaml_parser_update_buffer(parser, (length)))
490
491
/*
492
* Advance the buffer pointer.
493
*/
494
495
#define SKIP(parser) \
496
(parser->mark.index ++, \
497
parser->mark.column ++, \
498
parser->unread --, \
499
parser->buffer.pointer += WIDTH(parser->buffer))
500
501
#define SKIP_LINE(parser) \
502
(IS_CRLF(parser->buffer) ? \
503
(parser->mark.index += 2, \
504
parser->mark.column = 0, \
505
parser->mark.line ++, \
506
parser->unread -= 2, \
507
parser->buffer.pointer += 2) : \
508
IS_BREAK(parser->buffer) ? \
509
(parser->mark.index ++, \
510
parser->mark.column = 0, \
511
parser->mark.line ++, \
512
parser->unread --, \
513
parser->buffer.pointer += WIDTH(parser->buffer)) : 0)
514
515
/*
516
* Copy a character to a string buffer and advance pointers.
517
*/
518
519
#define READ(parser,string) \
520
(STRING_EXTEND(parser,string) ? \
521
(COPY(string,parser->buffer), \
522
parser->mark.index ++, \
523
parser->mark.column ++, \
524
parser->unread --, \
525
1) : 0)
526
527
/*
528
* Copy a line break character to a string buffer and advance pointers.
529
*/
530
531
#define READ_LINE(parser,string) \
532
(STRING_EXTEND(parser,string) ? \
533
(((CHECK_AT(parser->buffer,'\r',0) \
534
&& CHECK_AT(parser->buffer,'\n',1)) ? /* CR LF -> LF */ \
535
(*((string).pointer++) = (yaml_char_t) '\n', \
536
parser->buffer.pointer += 2, \
537
parser->mark.index += 2, \
538
parser->mark.column = 0, \
539
parser->mark.line ++, \
540
parser->unread -= 2) : \
541
(CHECK_AT(parser->buffer,'\r',0) \
542
|| CHECK_AT(parser->buffer,'\n',0)) ? /* CR|LF -> LF */ \
543
(*((string).pointer++) = (yaml_char_t) '\n', \
544
parser->buffer.pointer ++, \
545
parser->mark.index ++, \
546
parser->mark.column = 0, \
547
parser->mark.line ++, \
548
parser->unread --) : \
549
(CHECK_AT(parser->buffer,'\xC2',0) \
550
&& CHECK_AT(parser->buffer,'\x85',1)) ? /* NEL -> LF */ \
551
(*((string).pointer++) = (yaml_char_t) '\n', \
552
parser->buffer.pointer += 2, \
553
parser->mark.index ++, \
554
parser->mark.column = 0, \
555
parser->mark.line ++, \
556
parser->unread --) : \
557
(CHECK_AT(parser->buffer,'\xE2',0) && \
558
CHECK_AT(parser->buffer,'\x80',1) && \
559
(CHECK_AT(parser->buffer,'\xA8',2) || \
560
CHECK_AT(parser->buffer,'\xA9',2))) ? /* LS|PS -> LS|PS */ \
561
(*((string).pointer++) = *(parser->buffer.pointer++), \
562
*((string).pointer++) = *(parser->buffer.pointer++), \
563
*((string).pointer++) = *(parser->buffer.pointer++), \
564
parser->mark.index ++, \
565
parser->mark.column = 0, \
566
parser->mark.line ++, \
567
parser->unread --) : 0), \
568
1) : 0)
569
570
/*
571
* Public API declarations.
572
*/
573
574
YAML_DECLARE(int)
575
yaml_parser_scan(yaml_parser_t *parser, yaml_token_t *token);
576
577
/*
578
* Error handling.
579
*/
580
581
static int
582
yaml_parser_set_scanner_error(yaml_parser_t *parser, const char *context,
583
yaml_mark_t context_mark, const char *problem);
584
585
/*
586
* High-level token API.
587
*/
588
589
YAML_DECLARE(int)
590
yaml_parser_fetch_more_tokens(yaml_parser_t *parser);
591
592
static int
593
yaml_parser_fetch_next_token(yaml_parser_t *parser);
594
595
/*
596
* Potential simple keys.
597
*/
598
599
static int
600
yaml_parser_stale_simple_keys(yaml_parser_t *parser);
601
602
static int
603
yaml_parser_save_simple_key(yaml_parser_t *parser);
604
605
static int
606
yaml_parser_remove_simple_key(yaml_parser_t *parser);
607
608
static int
609
yaml_parser_increase_flow_level(yaml_parser_t *parser);
610
611
static int
612
yaml_parser_decrease_flow_level(yaml_parser_t *parser);
613
614
/*
615
* Indentation treatment.
616
*/
617
618
static int
619
yaml_parser_roll_indent(yaml_parser_t *parser, ptrdiff_t column,
620
ptrdiff_t number, yaml_token_type_t type, yaml_mark_t mark);
621
622
static int
623
yaml_parser_unroll_indent(yaml_parser_t *parser, ptrdiff_t column);
624
625
/*
626
* Token fetchers.
627
*/
628
629
static int
630
yaml_parser_fetch_stream_start(yaml_parser_t *parser);
631
632
static int
633
yaml_parser_fetch_stream_end(yaml_parser_t *parser);
634
635
static int
636
yaml_parser_fetch_directive(yaml_parser_t *parser);
637
638
static int
639
yaml_parser_fetch_document_indicator(yaml_parser_t *parser,
640
yaml_token_type_t type);
641
642
static int
643
yaml_parser_fetch_flow_collection_start(yaml_parser_t *parser,
644
yaml_token_type_t type);
645
646
static int
647
yaml_parser_fetch_flow_collection_end(yaml_parser_t *parser,
648
yaml_token_type_t type);
649
650
static int
651
yaml_parser_fetch_flow_entry(yaml_parser_t *parser);
652
653
static int
654
yaml_parser_fetch_block_entry(yaml_parser_t *parser);
655
656
static int
657
yaml_parser_fetch_key(yaml_parser_t *parser);
658
659
static int
660
yaml_parser_fetch_value(yaml_parser_t *parser);
661
662
static int
663
yaml_parser_fetch_anchor(yaml_parser_t *parser, yaml_token_type_t type);
664
665
static int
666
yaml_parser_fetch_tag(yaml_parser_t *parser);
667
668
static int
669
yaml_parser_fetch_block_scalar(yaml_parser_t *parser, int literal);
670
671
static int
672
yaml_parser_fetch_flow_scalar(yaml_parser_t *parser, int single);
673
674
static int
675
yaml_parser_fetch_plain_scalar(yaml_parser_t *parser);
676
677
/*
678
* Token scanners.
679
*/
680
681
static int
682
yaml_parser_scan_to_next_token(yaml_parser_t *parser);
683
684
static int
685
yaml_parser_scan_directive(yaml_parser_t *parser, yaml_token_t *token);
686
687
static int
688
yaml_parser_scan_directive_name(yaml_parser_t *parser,
689
yaml_mark_t start_mark, yaml_char_t **name);
690
691
static int
692
yaml_parser_scan_version_directive_value(yaml_parser_t *parser,
693
yaml_mark_t start_mark, int *major, int *minor);
694
695
static int
696
yaml_parser_scan_version_directive_number(yaml_parser_t *parser,
697
yaml_mark_t start_mark, int *number);
698
699
static int
700
yaml_parser_scan_tag_directive_value(yaml_parser_t *parser,
701
yaml_mark_t mark, yaml_char_t **handle, yaml_char_t **prefix);
702
703
static int
704
yaml_parser_scan_anchor(yaml_parser_t *parser, yaml_token_t *token,
705
yaml_token_type_t type);
706
707
static int
708
yaml_parser_scan_tag(yaml_parser_t *parser, yaml_token_t *token);
709
710
static int
711
yaml_parser_scan_tag_handle(yaml_parser_t *parser, int directive,
712
yaml_mark_t start_mark, yaml_char_t **handle);
713
714
static int
715
yaml_parser_scan_tag_uri(yaml_parser_t *parser, int uri_char, int directive,
716
yaml_char_t *head, yaml_mark_t start_mark, yaml_char_t **uri);
717
718
static int
719
yaml_parser_scan_uri_escapes(yaml_parser_t *parser, int directive,
720
yaml_mark_t start_mark, yaml_string_t *string);
721
722
static int
723
yaml_parser_scan_block_scalar(yaml_parser_t *parser, yaml_token_t *token,
724
int literal);
725
726
static int
727
yaml_parser_scan_block_scalar_breaks(yaml_parser_t *parser,
728
int *indent, yaml_string_t *breaks,
729
yaml_mark_t start_mark, yaml_mark_t *end_mark);
730
731
static int
732
yaml_parser_scan_flow_scalar(yaml_parser_t *parser, yaml_token_t *token,
733
int single);
734
735
static int
736
yaml_parser_scan_plain_scalar(yaml_parser_t *parser, yaml_token_t *token);
737
738
/*
739
* Get the next token.
740
*/
741
742
YAML_DECLARE(int)
743
yaml_parser_scan(yaml_parser_t *parser, yaml_token_t *token)
744
{
745
assert(parser); /* Non-NULL parser object is expected. */
746
assert(token); /* Non-NULL token object is expected. */
747
748
/* Erase the token object. */
749
750
memset(token, 0, sizeof(yaml_token_t));
751
752
/* No tokens after STREAM-END or error. */
753
754
if (parser->stream_end_produced || parser->error) {
755
return 1;
756
}
757
758
/* Ensure that the tokens queue contains enough tokens. */
759
760
if (!parser->token_available) {
761
if (!yaml_parser_fetch_more_tokens(parser))
762
return 0;
763
}
764
765
/* Fetch the next token from the queue. */
766
767
*token = DEQUEUE(parser, parser->tokens);
768
parser->token_available = 0;
769
parser->tokens_parsed ++;
770
771
if (token->type == YAML_STREAM_END_TOKEN) {
772
parser->stream_end_produced = 1;
773
}
774
775
return 1;
776
}
777
778
/*
779
* Set the scanner error and return 0.
780
*/
781
782
static int
783
yaml_parser_set_scanner_error(yaml_parser_t *parser, const char *context,
784
yaml_mark_t context_mark, const char *problem)
785
{
786
parser->error = YAML_SCANNER_ERROR;
787
parser->context = context;
788
parser->context_mark = context_mark;
789
parser->problem = problem;
790
parser->problem_mark = parser->mark;
791
792
return 0;
793
}
794
795
/*
796
* Ensure that the tokens queue contains at least one token which can be
797
* returned to the Parser.
798
*/
799
800
YAML_DECLARE(int)
801
yaml_parser_fetch_more_tokens(yaml_parser_t *parser)
802
{
803
int need_more_tokens;
804
805
/* While we need more tokens to fetch, do it. */
806
807
while (1)
808
{
809
/*
810
* Check if we really need to fetch more tokens.
811
*/
812
813
need_more_tokens = 0;
814
815
if (parser->tokens.head == parser->tokens.tail)
816
{
817
/* Queue is empty. */
818
819
need_more_tokens = 1;
820
}
821
else
822
{
823
yaml_simple_key_t *simple_key;
824
825
/* Check if any potential simple key may occupy the head position. */
826
827
if (!yaml_parser_stale_simple_keys(parser))
828
return 0;
829
830
for (simple_key = parser->simple_keys.start;
831
simple_key != parser->simple_keys.top; simple_key++) {
832
if (simple_key->possible
833
&& simple_key->token_number == parser->tokens_parsed) {
834
need_more_tokens = 1;
835
break;
836
}
837
}
838
}
839
840
/* We are finished. */
841
842
if (!need_more_tokens)
843
break;
844
845
/* Fetch the next token. */
846
847
if (!yaml_parser_fetch_next_token(parser))
848
return 0;
849
}
850
851
parser->token_available = 1;
852
853
return 1;
854
}
855
856
/*
857
* The dispatcher for token fetchers.
858
*/
859
860
static int
861
yaml_parser_fetch_next_token(yaml_parser_t *parser)
862
{
863
/* Ensure that the buffer is initialized. */
864
865
if (!CACHE(parser, 1))
866
return 0;
867
868
/* Check if we just started scanning. Fetch STREAM-START then. */
869
870
if (!parser->stream_start_produced)
871
return yaml_parser_fetch_stream_start(parser);
872
873
/* Eat whitespaces and comments until we reach the next token. */
874
875
if (!yaml_parser_scan_to_next_token(parser))
876
return 0;
877
878
/* Remove obsolete potential simple keys. */
879
880
if (!yaml_parser_stale_simple_keys(parser))
881
return 0;
882
883
/* Check the indentation level against the current column. */
884
885
if (!yaml_parser_unroll_indent(parser, parser->mark.column))
886
return 0;
887
888
/*
889
* Ensure that the buffer contains at least 4 characters. 4 is the length
890
* of the longest indicators ('--- ' and '... ').
891
*/
892
893
if (!CACHE(parser, 4))
894
return 0;
895
896
/* Is it the end of the stream? */
897
898
if (IS_Z(parser->buffer))
899
return yaml_parser_fetch_stream_end(parser);
900
901
/* Is it a directive? */
902
903
if (parser->mark.column == 0 && CHECK(parser->buffer, '%'))
904
return yaml_parser_fetch_directive(parser);
905
906
/* Is it the document start indicator? */
907
908
if (parser->mark.column == 0
909
&& CHECK_AT(parser->buffer, '-', 0)
910
&& CHECK_AT(parser->buffer, '-', 1)
911
&& CHECK_AT(parser->buffer, '-', 2)
912
&& IS_BLANKZ_AT(parser->buffer, 3))
913
return yaml_parser_fetch_document_indicator(parser,
914
YAML_DOCUMENT_START_TOKEN);
915
916
/* Is it the document end indicator? */
917
918
if (parser->mark.column == 0
919
&& CHECK_AT(parser->buffer, '.', 0)
920
&& CHECK_AT(parser->buffer, '.', 1)
921
&& CHECK_AT(parser->buffer, '.', 2)
922
&& IS_BLANKZ_AT(parser->buffer, 3))
923
return yaml_parser_fetch_document_indicator(parser,
924
YAML_DOCUMENT_END_TOKEN);
925
926
/* Is it the flow sequence start indicator? */
927
928
if (CHECK(parser->buffer, '['))
929
return yaml_parser_fetch_flow_collection_start(parser,
930
YAML_FLOW_SEQUENCE_START_TOKEN);
931
932
/* Is it the flow mapping start indicator? */
933
934
if (CHECK(parser->buffer, '{'))
935
return yaml_parser_fetch_flow_collection_start(parser,
936
YAML_FLOW_MAPPING_START_TOKEN);
937
938
/* Is it the flow sequence end indicator? */
939
940
if (CHECK(parser->buffer, ']'))
941
return yaml_parser_fetch_flow_collection_end(parser,
942
YAML_FLOW_SEQUENCE_END_TOKEN);
943
944
/* Is it the flow mapping end indicator? */
945
946
if (CHECK(parser->buffer, '}'))
947
return yaml_parser_fetch_flow_collection_end(parser,
948
YAML_FLOW_MAPPING_END_TOKEN);
949
950
/* Is it the flow entry indicator? */
951
952
if (CHECK(parser->buffer, ','))
953
return yaml_parser_fetch_flow_entry(parser);
954
955
/* Is it the block entry indicator? */
956
957
if (CHECK(parser->buffer, '-') && IS_BLANKZ_AT(parser->buffer, 1))
958
return yaml_parser_fetch_block_entry(parser);
959
960
/* Is it the key indicator? */
961
962
if (CHECK(parser->buffer, '?')
963
&& (parser->flow_level || IS_BLANKZ_AT(parser->buffer, 1)))
964
return yaml_parser_fetch_key(parser);
965
966
/* Is it the value indicator? */
967
968
if (CHECK(parser->buffer, ':')
969
&& (parser->flow_level || IS_BLANKZ_AT(parser->buffer, 1)))
970
return yaml_parser_fetch_value(parser);
971
972
/* Is it an alias? */
973
974
if (CHECK(parser->buffer, '*'))
975
return yaml_parser_fetch_anchor(parser, YAML_ALIAS_TOKEN);
976
977
/* Is it an anchor? */
978
979
if (CHECK(parser->buffer, '&'))
980
return yaml_parser_fetch_anchor(parser, YAML_ANCHOR_TOKEN);
981
982
/* Is it a tag? */
983
984
if (CHECK(parser->buffer, '!'))
985
return yaml_parser_fetch_tag(parser);
986
987
/* Is it a literal scalar? */
988
989
if (CHECK(parser->buffer, '|') && !parser->flow_level)
990
return yaml_parser_fetch_block_scalar(parser, 1);
991
992
/* Is it a folded scalar? */
993
994
if (CHECK(parser->buffer, '>') && !parser->flow_level)
995
return yaml_parser_fetch_block_scalar(parser, 0);
996
997
/* Is it a single-quoted scalar? */
998
999
if (CHECK(parser->buffer, '\''))
1000
return yaml_parser_fetch_flow_scalar(parser, 1);
1001
1002
/* Is it a double-quoted scalar? */
1003
1004
if (CHECK(parser->buffer, '"'))
1005
return yaml_parser_fetch_flow_scalar(parser, 0);
1006
1007
/*
1008
* Is it a plain scalar?
1009
*
1010
* A plain scalar may start with any non-blank characters except
1011
*
1012
* '-', '?', ':', ',', '[', ']', '{', '}',
1013
* '#', '&', '*', '!', '|', '>', '\'', '\"',
1014
* '%', '@', '`'.
1015
*
1016
* In the block context (and, for the '-' indicator, in the flow context
1017
* too), it may also start with the characters
1018
*
1019
* '-', '?', ':'
1020
*
1021
* if it is followed by a non-space character.
1022
*
1023
* The last rule is more restrictive than the specification requires.
1024
*/
1025
1026
if (!(IS_BLANKZ(parser->buffer) || CHECK(parser->buffer, '-')
1027
|| CHECK(parser->buffer, '?') || CHECK(parser->buffer, ':')
1028
|| CHECK(parser->buffer, ',') || CHECK(parser->buffer, '[')
1029
|| CHECK(parser->buffer, ']') || CHECK(parser->buffer, '{')
1030
|| CHECK(parser->buffer, '}') || CHECK(parser->buffer, '#')
1031
|| CHECK(parser->buffer, '&') || CHECK(parser->buffer, '*')
1032
|| CHECK(parser->buffer, '!') || CHECK(parser->buffer, '|')
1033
|| CHECK(parser->buffer, '>') || CHECK(parser->buffer, '\'')
1034
|| CHECK(parser->buffer, '"') || CHECK(parser->buffer, '%')
1035
|| CHECK(parser->buffer, '@') || CHECK(parser->buffer, '`')) ||
1036
(CHECK(parser->buffer, '-') && !IS_BLANK_AT(parser->buffer, 1)) ||
1037
(!parser->flow_level &&
1038
(CHECK(parser->buffer, '?') || CHECK(parser->buffer, ':'))
1039
&& !IS_BLANKZ_AT(parser->buffer, 1)))
1040
return yaml_parser_fetch_plain_scalar(parser);
1041
1042
/*
1043
* If we don't determine the token type so far, it is an error.
1044
*/
1045
1046
return yaml_parser_set_scanner_error(parser,
1047
"while scanning for the next token", parser->mark,
1048
"found character that cannot start any token");
1049
}
1050
1051
/*
1052
* Check the list of potential simple keys and remove the positions that
1053
* cannot contain simple keys anymore.
1054
*/
1055
1056
static int
1057
yaml_parser_stale_simple_keys(yaml_parser_t *parser)
1058
{
1059
yaml_simple_key_t *simple_key;
1060
1061
/* Check for a potential simple key for each flow level. */
1062
1063
for (simple_key = parser->simple_keys.start;
1064
simple_key != parser->simple_keys.top; simple_key ++)
1065
{
1066
/*
1067
* The specification requires that a simple key
1068
*
1069
* - is limited to a single line,
1070
* - is shorter than 1024 characters.
1071
*/
1072
1073
if (simple_key->possible
1074
&& (simple_key->mark.line < parser->mark.line
1075
|| simple_key->mark.index+1024 < parser->mark.index)) {
1076
1077
/* Check if the potential simple key to be removed is required. */
1078
1079
if (simple_key->required) {
1080
return yaml_parser_set_scanner_error(parser,
1081
"while scanning a simple key", simple_key->mark,
1082
"could not find expected ':'");
1083
}
1084
1085
simple_key->possible = 0;
1086
}
1087
}
1088
1089
return 1;
1090
}
1091
1092
/*
1093
* Check if a simple key may start at the current position and add it if
1094
* needed.
1095
*/
1096
1097
static int
1098
yaml_parser_save_simple_key(yaml_parser_t *parser)
1099
{
1100
/*
1101
* A simple key is required at the current position if the scanner is in
1102
* the block context and the current column coincides with the indentation
1103
* level.
1104
*/
1105
1106
int required = (!parser->flow_level
1107
&& parser->indent == (ptrdiff_t)parser->mark.column);
1108
1109
/*
1110
* If the current position may start a simple key, save it.
1111
*/
1112
1113
if (parser->simple_key_allowed)
1114
{
1115
yaml_simple_key_t simple_key;
1116
simple_key.possible = 1;
1117
simple_key.required = required;
1118
simple_key.token_number =
1119
parser->tokens_parsed + (parser->tokens.tail - parser->tokens.head);
1120
simple_key.mark = parser->mark;
1121
1122
if (!yaml_parser_remove_simple_key(parser)) return 0;
1123
1124
*(parser->simple_keys.top-1) = simple_key;
1125
}
1126
1127
return 1;
1128
}
1129
1130
/*
1131
* Remove a potential simple key at the current flow level.
1132
*/
1133
1134
static int
1135
yaml_parser_remove_simple_key(yaml_parser_t *parser)
1136
{
1137
yaml_simple_key_t *simple_key = parser->simple_keys.top-1;
1138
1139
if (simple_key->possible)
1140
{
1141
/* If the key is required, it is an error. */
1142
1143
if (simple_key->required) {
1144
return yaml_parser_set_scanner_error(parser,
1145
"while scanning a simple key", simple_key->mark,
1146
"could not find expected ':'");
1147
}
1148
}
1149
1150
/* Remove the key from the stack. */
1151
1152
simple_key->possible = 0;
1153
1154
return 1;
1155
}
1156
1157
/*
1158
* Increase the flow level and resize the simple key list if needed.
1159
*/
1160
1161
static int
1162
yaml_parser_increase_flow_level(yaml_parser_t *parser)
1163
{
1164
yaml_simple_key_t empty_simple_key = { 0, 0, 0, { 0, 0, 0 } };
1165
1166
/* Reset the simple key on the next level. */
1167
1168
if (!PUSH(parser, parser->simple_keys, empty_simple_key))
1169
return 0;
1170
1171
/* Increase the flow level. */
1172
1173
if (parser->flow_level == INT_MAX) {
1174
parser->error = YAML_MEMORY_ERROR;
1175
return 0;
1176
}
1177
1178
parser->flow_level++;
1179
1180
return 1;
1181
}
1182
1183
/*
1184
* Decrease the flow level.
1185
*/
1186
1187
static int
1188
yaml_parser_decrease_flow_level(yaml_parser_t *parser)
1189
{
1190
if (parser->flow_level) {
1191
parser->flow_level --;
1192
(void)POP(parser, parser->simple_keys);
1193
}
1194
1195
return 1;
1196
}
1197
1198
/*
1199
* Push the current indentation level to the stack and set the new level
1200
* the current column is greater than the indentation level. In this case,
1201
* append or insert the specified token into the token queue.
1202
*
1203
*/
1204
1205
static int
1206
yaml_parser_roll_indent(yaml_parser_t *parser, ptrdiff_t column,
1207
ptrdiff_t number, yaml_token_type_t type, yaml_mark_t mark)
1208
{
1209
yaml_token_t token;
1210
1211
/* In the flow context, do nothing. */
1212
1213
if (parser->flow_level)
1214
return 1;
1215
1216
if (parser->indent < column)
1217
{
1218
/*
1219
* Push the current indentation level to the stack and set the new
1220
* indentation level.
1221
*/
1222
1223
if (!PUSH(parser, parser->indents, parser->indent))
1224
return 0;
1225
1226
if (column > INT_MAX) {
1227
parser->error = YAML_MEMORY_ERROR;
1228
return 0;
1229
}
1230
1231
parser->indent = column;
1232
1233
/* Create a token and insert it into the queue. */
1234
1235
TOKEN_INIT(token, type, mark, mark);
1236
1237
if (number == -1) {
1238
if (!ENQUEUE(parser, parser->tokens, token))
1239
return 0;
1240
}
1241
else {
1242
if (!QUEUE_INSERT(parser,
1243
parser->tokens, number - parser->tokens_parsed, token))
1244
return 0;
1245
}
1246
}
1247
1248
return 1;
1249
}
1250
1251
/*
1252
* Pop indentation levels from the indents stack until the current level
1253
* becomes less or equal to the column. For each indentation level, append
1254
* the BLOCK-END token.
1255
*/
1256
1257
1258
static int
1259
yaml_parser_unroll_indent(yaml_parser_t *parser, ptrdiff_t column)
1260
{
1261
yaml_token_t token;
1262
1263
/* In the flow context, do nothing. */
1264
1265
if (parser->flow_level)
1266
return 1;
1267
1268
/* Loop through the indentation levels in the stack. */
1269
1270
while (parser->indent > column)
1271
{
1272
/* Create a token and append it to the queue. */
1273
1274
TOKEN_INIT(token, YAML_BLOCK_END_TOKEN, parser->mark, parser->mark);
1275
1276
if (!ENQUEUE(parser, parser->tokens, token))
1277
return 0;
1278
1279
/* Pop the indentation level. */
1280
1281
parser->indent = POP(parser, parser->indents);
1282
}
1283
1284
return 1;
1285
}
1286
1287
/*
1288
* Initialize the scanner and produce the STREAM-START token.
1289
*/
1290
1291
static int
1292
yaml_parser_fetch_stream_start(yaml_parser_t *parser)
1293
{
1294
yaml_simple_key_t simple_key = { 0, 0, 0, { 0, 0, 0 } };
1295
yaml_token_t token;
1296
1297
/* Set the initial indentation. */
1298
1299
parser->indent = -1;
1300
1301
/* Initialize the simple key stack. */
1302
1303
if (!PUSH(parser, parser->simple_keys, simple_key))
1304
return 0;
1305
1306
/* A simple key is allowed at the beginning of the stream. */
1307
1308
parser->simple_key_allowed = 1;
1309
1310
/* We have started. */
1311
1312
parser->stream_start_produced = 1;
1313
1314
/* Create the STREAM-START token and append it to the queue. */
1315
1316
STREAM_START_TOKEN_INIT(token, parser->encoding,
1317
parser->mark, parser->mark);
1318
1319
if (!ENQUEUE(parser, parser->tokens, token))
1320
return 0;
1321
1322
return 1;
1323
}
1324
1325
/*
1326
* Produce the STREAM-END token and shut down the scanner.
1327
*/
1328
1329
static int
1330
yaml_parser_fetch_stream_end(yaml_parser_t *parser)
1331
{
1332
yaml_token_t token;
1333
1334
/* Force new line. */
1335
1336
if (parser->mark.column != 0) {
1337
parser->mark.column = 0;
1338
parser->mark.line ++;
1339
}
1340
1341
/* Reset the indentation level. */
1342
1343
if (!yaml_parser_unroll_indent(parser, -1))
1344
return 0;
1345
1346
/* Reset simple keys. */
1347
1348
if (!yaml_parser_remove_simple_key(parser))
1349
return 0;
1350
1351
parser->simple_key_allowed = 0;
1352
1353
/* Create the STREAM-END token and append it to the queue. */
1354
1355
STREAM_END_TOKEN_INIT(token, parser->mark, parser->mark);
1356
1357
if (!ENQUEUE(parser, parser->tokens, token))
1358
return 0;
1359
1360
return 1;
1361
}
1362
1363
/*
1364
* Produce a VERSION-DIRECTIVE or TAG-DIRECTIVE token.
1365
*/
1366
1367
static int
1368
yaml_parser_fetch_directive(yaml_parser_t *parser)
1369
{
1370
yaml_token_t token;
1371
1372
/* Reset the indentation level. */
1373
1374
if (!yaml_parser_unroll_indent(parser, -1))
1375
return 0;
1376
1377
/* Reset simple keys. */
1378
1379
if (!yaml_parser_remove_simple_key(parser))
1380
return 0;
1381
1382
parser->simple_key_allowed = 0;
1383
1384
/* Create the YAML-DIRECTIVE or TAG-DIRECTIVE token. */
1385
1386
if (!yaml_parser_scan_directive(parser, &token))
1387
return 0;
1388
1389
/* Append the token to the queue. */
1390
1391
if (!ENQUEUE(parser, parser->tokens, token)) {
1392
yaml_token_delete(&token);
1393
return 0;
1394
}
1395
1396
return 1;
1397
}
1398
1399
/*
1400
* Produce the DOCUMENT-START or DOCUMENT-END token.
1401
*/
1402
1403
static int
1404
yaml_parser_fetch_document_indicator(yaml_parser_t *parser,
1405
yaml_token_type_t type)
1406
{
1407
yaml_mark_t start_mark, end_mark;
1408
yaml_token_t token;
1409
1410
/* Reset the indentation level. */
1411
1412
if (!yaml_parser_unroll_indent(parser, -1))
1413
return 0;
1414
1415
/* Reset simple keys. */
1416
1417
if (!yaml_parser_remove_simple_key(parser))
1418
return 0;
1419
1420
parser->simple_key_allowed = 0;
1421
1422
/* Consume the token. */
1423
1424
start_mark = parser->mark;
1425
1426
SKIP(parser);
1427
SKIP(parser);
1428
SKIP(parser);
1429
1430
end_mark = parser->mark;
1431
1432
/* Create the DOCUMENT-START or DOCUMENT-END token. */
1433
1434
TOKEN_INIT(token, type, start_mark, end_mark);
1435
1436
/* Append the token to the queue. */
1437
1438
if (!ENQUEUE(parser, parser->tokens, token))
1439
return 0;
1440
1441
return 1;
1442
}
1443
1444
/*
1445
* Produce the FLOW-SEQUENCE-START or FLOW-MAPPING-START token.
1446
*/
1447
1448
static int
1449
yaml_parser_fetch_flow_collection_start(yaml_parser_t *parser,
1450
yaml_token_type_t type)
1451
{
1452
yaml_mark_t start_mark, end_mark;
1453
yaml_token_t token;
1454
1455
/* The indicators '[' and '{' may start a simple key. */
1456
1457
if (!yaml_parser_save_simple_key(parser))
1458
return 0;
1459
1460
/* Increase the flow level. */
1461
1462
if (!yaml_parser_increase_flow_level(parser))
1463
return 0;
1464
1465
/* A simple key may follow the indicators '[' and '{'. */
1466
1467
parser->simple_key_allowed = 1;
1468
1469
/* Consume the token. */
1470
1471
start_mark = parser->mark;
1472
SKIP(parser);
1473
end_mark = parser->mark;
1474
1475
/* Create the FLOW-SEQUENCE-START of FLOW-MAPPING-START token. */
1476
1477
TOKEN_INIT(token, type, start_mark, end_mark);
1478
1479
/* Append the token to the queue. */
1480
1481
if (!ENQUEUE(parser, parser->tokens, token))
1482
return 0;
1483
1484
return 1;
1485
}
1486
1487
/*
1488
* Produce the FLOW-SEQUENCE-END or FLOW-MAPPING-END token.
1489
*/
1490
1491
static int
1492
yaml_parser_fetch_flow_collection_end(yaml_parser_t *parser,
1493
yaml_token_type_t type)
1494
{
1495
yaml_mark_t start_mark, end_mark;
1496
yaml_token_t token;
1497
1498
/* Reset any potential simple key on the current flow level. */
1499
1500
if (!yaml_parser_remove_simple_key(parser))
1501
return 0;
1502
1503
/* Decrease the flow level. */
1504
1505
if (!yaml_parser_decrease_flow_level(parser))
1506
return 0;
1507
1508
/* No simple keys after the indicators ']' and '}'. */
1509
1510
parser->simple_key_allowed = 0;
1511
1512
/* Consume the token. */
1513
1514
start_mark = parser->mark;
1515
SKIP(parser);
1516
end_mark = parser->mark;
1517
1518
/* Create the FLOW-SEQUENCE-END of FLOW-MAPPING-END token. */
1519
1520
TOKEN_INIT(token, type, start_mark, end_mark);
1521
1522
/* Append the token to the queue. */
1523
1524
if (!ENQUEUE(parser, parser->tokens, token))
1525
return 0;
1526
1527
return 1;
1528
}
1529
1530
/*
1531
* Produce the FLOW-ENTRY token.
1532
*/
1533
1534
static int
1535
yaml_parser_fetch_flow_entry(yaml_parser_t *parser)
1536
{
1537
yaml_mark_t start_mark, end_mark;
1538
yaml_token_t token;
1539
1540
/* Reset any potential simple keys on the current flow level. */
1541
1542
if (!yaml_parser_remove_simple_key(parser))
1543
return 0;
1544
1545
/* Simple keys are allowed after ','. */
1546
1547
parser->simple_key_allowed = 1;
1548
1549
/* Consume the token. */
1550
1551
start_mark = parser->mark;
1552
SKIP(parser);
1553
end_mark = parser->mark;
1554
1555
/* Create the FLOW-ENTRY token and append it to the queue. */
1556
1557
TOKEN_INIT(token, YAML_FLOW_ENTRY_TOKEN, start_mark, end_mark);
1558
1559
if (!ENQUEUE(parser, parser->tokens, token))
1560
return 0;
1561
1562
return 1;
1563
}
1564
1565
/*
1566
* Produce the BLOCK-ENTRY token.
1567
*/
1568
1569
static int
1570
yaml_parser_fetch_block_entry(yaml_parser_t *parser)
1571
{
1572
yaml_mark_t start_mark, end_mark;
1573
yaml_token_t token;
1574
1575
/* Check if the scanner is in the block context. */
1576
1577
if (!parser->flow_level)
1578
{
1579
/* Check if we are allowed to start a new entry. */
1580
1581
if (!parser->simple_key_allowed) {
1582
return yaml_parser_set_scanner_error(parser, NULL, parser->mark,
1583
"block sequence entries are not allowed in this context");
1584
}
1585
1586
/* Add the BLOCK-SEQUENCE-START token if needed. */
1587
1588
if (!yaml_parser_roll_indent(parser, parser->mark.column, -1,
1589
YAML_BLOCK_SEQUENCE_START_TOKEN, parser->mark))
1590
return 0;
1591
}
1592
else
1593
{
1594
/*
1595
* It is an error for the '-' indicator to occur in the flow context,
1596
* but we let the Parser detect and report about it because the Parser
1597
* is able to point to the context.
1598
*/
1599
}
1600
1601
/* Reset any potential simple keys on the current flow level. */
1602
1603
if (!yaml_parser_remove_simple_key(parser))
1604
return 0;
1605
1606
/* Simple keys are allowed after '-'. */
1607
1608
parser->simple_key_allowed = 1;
1609
1610
/* Consume the token. */
1611
1612
start_mark = parser->mark;
1613
SKIP(parser);
1614
end_mark = parser->mark;
1615
1616
/* Create the BLOCK-ENTRY token and append it to the queue. */
1617
1618
TOKEN_INIT(token, YAML_BLOCK_ENTRY_TOKEN, start_mark, end_mark);
1619
1620
if (!ENQUEUE(parser, parser->tokens, token))
1621
return 0;
1622
1623
return 1;
1624
}
1625
1626
/*
1627
* Produce the KEY token.
1628
*/
1629
1630
static int
1631
yaml_parser_fetch_key(yaml_parser_t *parser)
1632
{
1633
yaml_mark_t start_mark, end_mark;
1634
yaml_token_t token;
1635
1636
/* In the block context, additional checks are required. */
1637
1638
if (!parser->flow_level)
1639
{
1640
/* Check if we are allowed to start a new key (not necessary simple). */
1641
1642
if (!parser->simple_key_allowed) {
1643
return yaml_parser_set_scanner_error(parser, NULL, parser->mark,
1644
"mapping keys are not allowed in this context");
1645
}
1646
1647
/* Add the BLOCK-MAPPING-START token if needed. */
1648
1649
if (!yaml_parser_roll_indent(parser, parser->mark.column, -1,
1650
YAML_BLOCK_MAPPING_START_TOKEN, parser->mark))
1651
return 0;
1652
}
1653
1654
/* Reset any potential simple keys on the current flow level. */
1655
1656
if (!yaml_parser_remove_simple_key(parser))
1657
return 0;
1658
1659
/* Simple keys are allowed after '?' in the block context. */
1660
1661
parser->simple_key_allowed = (!parser->flow_level);
1662
1663
/* Consume the token. */
1664
1665
start_mark = parser->mark;
1666
SKIP(parser);
1667
end_mark = parser->mark;
1668
1669
/* Create the KEY token and append it to the queue. */
1670
1671
TOKEN_INIT(token, YAML_KEY_TOKEN, start_mark, end_mark);
1672
1673
if (!ENQUEUE(parser, parser->tokens, token))
1674
return 0;
1675
1676
return 1;
1677
}
1678
1679
/*
1680
* Produce the VALUE token.
1681
*/
1682
1683
static int
1684
yaml_parser_fetch_value(yaml_parser_t *parser)
1685
{
1686
yaml_mark_t start_mark, end_mark;
1687
yaml_token_t token;
1688
yaml_simple_key_t *simple_key = parser->simple_keys.top-1;
1689
1690
/* Have we found a simple key? */
1691
1692
if (simple_key->possible)
1693
{
1694
1695
/* Create the KEY token and insert it into the queue. */
1696
1697
TOKEN_INIT(token, YAML_KEY_TOKEN, simple_key->mark, simple_key->mark);
1698
1699
if (!QUEUE_INSERT(parser, parser->tokens,
1700
simple_key->token_number - parser->tokens_parsed, token))
1701
return 0;
1702
1703
/* In the block context, we may need to add the BLOCK-MAPPING-START token. */
1704
1705
if (!yaml_parser_roll_indent(parser, simple_key->mark.column,
1706
simple_key->token_number,
1707
YAML_BLOCK_MAPPING_START_TOKEN, simple_key->mark))
1708
return 0;
1709
1710
/* Remove the simple key. */
1711
1712
simple_key->possible = 0;
1713
1714
/* A simple key cannot follow another simple key. */
1715
1716
parser->simple_key_allowed = 0;
1717
}
1718
else
1719
{
1720
/* The ':' indicator follows a complex key. */
1721
1722
/* In the block context, extra checks are required. */
1723
1724
if (!parser->flow_level)
1725
{
1726
/* Check if we are allowed to start a complex value. */
1727
1728
if (!parser->simple_key_allowed) {
1729
return yaml_parser_set_scanner_error(parser, NULL, parser->mark,
1730
"mapping values are not allowed in this context");
1731
}
1732
1733
/* Add the BLOCK-MAPPING-START token if needed. */
1734
1735
if (!yaml_parser_roll_indent(parser, parser->mark.column, -1,
1736
YAML_BLOCK_MAPPING_START_TOKEN, parser->mark))
1737
return 0;
1738
}
1739
1740
/* Simple keys after ':' are allowed in the block context. */
1741
1742
parser->simple_key_allowed = (!parser->flow_level);
1743
}
1744
1745
/* Consume the token. */
1746
1747
start_mark = parser->mark;
1748
SKIP(parser);
1749
end_mark = parser->mark;
1750
1751
/* Create the VALUE token and append it to the queue. */
1752
1753
TOKEN_INIT(token, YAML_VALUE_TOKEN, start_mark, end_mark);
1754
1755
if (!ENQUEUE(parser, parser->tokens, token))
1756
return 0;
1757
1758
return 1;
1759
}
1760
1761
/*
1762
* Produce the ALIAS or ANCHOR token.
1763
*/
1764
1765
static int
1766
yaml_parser_fetch_anchor(yaml_parser_t *parser, yaml_token_type_t type)
1767
{
1768
yaml_token_t token;
1769
1770
/* An anchor or an alias could be a simple key. */
1771
1772
if (!yaml_parser_save_simple_key(parser))
1773
return 0;
1774
1775
/* A simple key cannot follow an anchor or an alias. */
1776
1777
parser->simple_key_allowed = 0;
1778
1779
/* Create the ALIAS or ANCHOR token and append it to the queue. */
1780
1781
if (!yaml_parser_scan_anchor(parser, &token, type))
1782
return 0;
1783
1784
if (!ENQUEUE(parser, parser->tokens, token)) {
1785
yaml_token_delete(&token);
1786
return 0;
1787
}
1788
return 1;
1789
}
1790
1791
/*
1792
* Produce the TAG token.
1793
*/
1794
1795
static int
1796
yaml_parser_fetch_tag(yaml_parser_t *parser)
1797
{
1798
yaml_token_t token;
1799
1800
/* A tag could be a simple key. */
1801
1802
if (!yaml_parser_save_simple_key(parser))
1803
return 0;
1804
1805
/* A simple key cannot follow a tag. */
1806
1807
parser->simple_key_allowed = 0;
1808
1809
/* Create the TAG token and append it to the queue. */
1810
1811
if (!yaml_parser_scan_tag(parser, &token))
1812
return 0;
1813
1814
if (!ENQUEUE(parser, parser->tokens, token)) {
1815
yaml_token_delete(&token);
1816
return 0;
1817
}
1818
1819
return 1;
1820
}
1821
1822
/*
1823
* Produce the SCALAR(...,literal) or SCALAR(...,folded) tokens.
1824
*/
1825
1826
static int
1827
yaml_parser_fetch_block_scalar(yaml_parser_t *parser, int literal)
1828
{
1829
yaml_token_t token;
1830
1831
/* Remove any potential simple keys. */
1832
1833
if (!yaml_parser_remove_simple_key(parser))
1834
return 0;
1835
1836
/* A simple key may follow a block scalar. */
1837
1838
parser->simple_key_allowed = 1;
1839
1840
/* Create the SCALAR token and append it to the queue. */
1841
1842
if (!yaml_parser_scan_block_scalar(parser, &token, literal))
1843
return 0;
1844
1845
if (!ENQUEUE(parser, parser->tokens, token)) {
1846
yaml_token_delete(&token);
1847
return 0;
1848
}
1849
1850
return 1;
1851
}
1852
1853
/*
1854
* Produce the SCALAR(...,single-quoted) or SCALAR(...,double-quoted) tokens.
1855
*/
1856
1857
static int
1858
yaml_parser_fetch_flow_scalar(yaml_parser_t *parser, int single)
1859
{
1860
yaml_token_t token;
1861
1862
/* A plain scalar could be a simple key. */
1863
1864
if (!yaml_parser_save_simple_key(parser))
1865
return 0;
1866
1867
/* A simple key cannot follow a flow scalar. */
1868
1869
parser->simple_key_allowed = 0;
1870
1871
/* Create the SCALAR token and append it to the queue. */
1872
1873
if (!yaml_parser_scan_flow_scalar(parser, &token, single))
1874
return 0;
1875
1876
if (!ENQUEUE(parser, parser->tokens, token)) {
1877
yaml_token_delete(&token);
1878
return 0;
1879
}
1880
1881
return 1;
1882
}
1883
1884
/*
1885
* Produce the SCALAR(...,plain) token.
1886
*/
1887
1888
static int
1889
yaml_parser_fetch_plain_scalar(yaml_parser_t *parser)
1890
{
1891
yaml_token_t token;
1892
1893
/* A plain scalar could be a simple key. */
1894
1895
if (!yaml_parser_save_simple_key(parser))
1896
return 0;
1897
1898
/* A simple key cannot follow a flow scalar. */
1899
1900
parser->simple_key_allowed = 0;
1901
1902
/* Create the SCALAR token and append it to the queue. */
1903
1904
if (!yaml_parser_scan_plain_scalar(parser, &token))
1905
return 0;
1906
1907
if (!ENQUEUE(parser, parser->tokens, token)) {
1908
yaml_token_delete(&token);
1909
return 0;
1910
}
1911
1912
return 1;
1913
}
1914
1915
/*
1916
* Eat whitespaces and comments until the next token is found.
1917
*/
1918
1919
static int
1920
yaml_parser_scan_to_next_token(yaml_parser_t *parser)
1921
{
1922
/* Until the next token is not found. */
1923
1924
while (1)
1925
{
1926
/* Allow the BOM mark to start a line. */
1927
1928
if (!CACHE(parser, 1)) return 0;
1929
1930
if (parser->mark.column == 0 && IS_BOM(parser->buffer))
1931
SKIP(parser);
1932
1933
/*
1934
* Eat whitespaces.
1935
*
1936
* Tabs are allowed:
1937
*
1938
* - in the flow context;
1939
* - in the block context, but not at the beginning of the line or
1940
* after '-', '?', or ':' (complex value).
1941
*/
1942
1943
if (!CACHE(parser, 1)) return 0;
1944
1945
while (CHECK(parser->buffer,' ') ||
1946
((parser->flow_level || !parser->simple_key_allowed) &&
1947
CHECK(parser->buffer, '\t'))) {
1948
SKIP(parser);
1949
if (!CACHE(parser, 1)) return 0;
1950
}
1951
1952
/* Eat a comment until a line break. */
1953
1954
if (CHECK(parser->buffer, '#')) {
1955
while (!IS_BREAKZ(parser->buffer)) {
1956
SKIP(parser);
1957
if (!CACHE(parser, 1)) return 0;
1958
}
1959
}
1960
1961
/* If it is a line break, eat it. */
1962
1963
if (IS_BREAK(parser->buffer))
1964
{
1965
if (!CACHE(parser, 2)) return 0;
1966
SKIP_LINE(parser);
1967
1968
/* In the block context, a new line may start a simple key. */
1969
1970
if (!parser->flow_level) {
1971
parser->simple_key_allowed = 1;
1972
}
1973
}
1974
else
1975
{
1976
/* We have found a token. */
1977
1978
break;
1979
}
1980
}
1981
1982
return 1;
1983
}
1984
1985
/*
1986
* Scan a YAML-DIRECTIVE or TAG-DIRECTIVE token.
1987
*
1988
* Scope:
1989
* %YAML 1.1 # a comment \n
1990
* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1991
* %TAG !yaml! tag:yaml.org,2002: \n
1992
* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1993
*/
1994
1995
int
1996
yaml_parser_scan_directive(yaml_parser_t *parser, yaml_token_t *token)
1997
{
1998
yaml_mark_t start_mark, end_mark;
1999
yaml_char_t *name = NULL;
2000
int major, minor;
2001
yaml_char_t *handle = NULL, *prefix = NULL;
2002
2003
/* Eat '%'. */
2004
2005
start_mark = parser->mark;
2006
2007
SKIP(parser);
2008
2009
/* Scan the directive name. */
2010
2011
if (!yaml_parser_scan_directive_name(parser, start_mark, &name))
2012
goto error;
2013
2014
/* Is it a YAML directive? */
2015
2016
if (strcmp((char *)name, "YAML") == 0)
2017
{
2018
/* Scan the VERSION directive value. */
2019
2020
if (!yaml_parser_scan_version_directive_value(parser, start_mark,
2021
&major, &minor))
2022
goto error;
2023
2024
end_mark = parser->mark;
2025
2026
/* Create a VERSION-DIRECTIVE token. */
2027
2028
VERSION_DIRECTIVE_TOKEN_INIT(*token, major, minor,
2029
start_mark, end_mark);
2030
}
2031
2032
/* Is it a TAG directive? */
2033
2034
else if (strcmp((char *)name, "TAG") == 0)
2035
{
2036
/* Scan the TAG directive value. */
2037
2038
if (!yaml_parser_scan_tag_directive_value(parser, start_mark,
2039
&handle, &prefix))
2040
goto error;
2041
2042
end_mark = parser->mark;
2043
2044
/* Create a TAG-DIRECTIVE token. */
2045
2046
TAG_DIRECTIVE_TOKEN_INIT(*token, handle, prefix,
2047
start_mark, end_mark);
2048
}
2049
2050
/* Unknown directive. */
2051
2052
else
2053
{
2054
yaml_parser_set_scanner_error(parser, "while scanning a directive",
2055
start_mark, "found unknown directive name");
2056
goto error;
2057
}
2058
2059
/* Eat the rest of the line including any comments. */
2060
2061
if (!CACHE(parser, 1)) goto error;
2062
2063
while (IS_BLANK(parser->buffer)) {
2064
SKIP(parser);
2065
if (!CACHE(parser, 1)) goto error;
2066
}
2067
2068
if (CHECK(parser->buffer, '#')) {
2069
while (!IS_BREAKZ(parser->buffer)) {
2070
SKIP(parser);
2071
if (!CACHE(parser, 1)) goto error;
2072
}
2073
}
2074
2075
/* Check if we are at the end of the line. */
2076
2077
if (!IS_BREAKZ(parser->buffer)) {
2078
yaml_parser_set_scanner_error(parser, "while scanning a directive",
2079
start_mark, "did not find expected comment or line break");
2080
goto error;
2081
}
2082
2083
/* Eat a line break. */
2084
2085
if (IS_BREAK(parser->buffer)) {
2086
if (!CACHE(parser, 2)) goto error;
2087
SKIP_LINE(parser);
2088
}
2089
2090
yaml_free(name);
2091
2092
return 1;
2093
2094
error:
2095
yaml_free(prefix);
2096
yaml_free(handle);
2097
yaml_free(name);
2098
return 0;
2099
}
2100
2101
/*
2102
* Scan the directive name.
2103
*
2104
* Scope:
2105
* %YAML 1.1 # a comment \n
2106
* ^^^^
2107
* %TAG !yaml! tag:yaml.org,2002: \n
2108
* ^^^
2109
*/
2110
2111
static int
2112
yaml_parser_scan_directive_name(yaml_parser_t *parser,
2113
yaml_mark_t start_mark, yaml_char_t **name)
2114
{
2115
yaml_string_t string = NULL_STRING;
2116
2117
if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error;
2118
2119
/* Consume the directive name. */
2120
2121
if (!CACHE(parser, 1)) goto error;
2122
2123
while (IS_ALPHA(parser->buffer))
2124
{
2125
if (!READ(parser, string)) goto error;
2126
if (!CACHE(parser, 1)) goto error;
2127
}
2128
2129
/* Check if the name is empty. */
2130
2131
if (string.start == string.pointer) {
2132
yaml_parser_set_scanner_error(parser, "while scanning a directive",
2133
start_mark, "could not find expected directive name");
2134
goto error;
2135
}
2136
2137
/* Check for an blank character after the name. */
2138
2139
if (!IS_BLANKZ(parser->buffer)) {
2140
yaml_parser_set_scanner_error(parser, "while scanning a directive",
2141
start_mark, "found unexpected non-alphabetical character");
2142
goto error;
2143
}
2144
2145
*name = string.start;
2146
2147
return 1;
2148
2149
error:
2150
STRING_DEL(parser, string);
2151
return 0;
2152
}
2153
2154
/*
2155
* Scan the value of VERSION-DIRECTIVE.
2156
*
2157
* Scope:
2158
* %YAML 1.1 # a comment \n
2159
* ^^^^^^
2160
*/
2161
2162
static int
2163
yaml_parser_scan_version_directive_value(yaml_parser_t *parser,
2164
yaml_mark_t start_mark, int *major, int *minor)
2165
{
2166
/* Eat whitespaces. */
2167
2168
if (!CACHE(parser, 1)) return 0;
2169
2170
while (IS_BLANK(parser->buffer)) {
2171
SKIP(parser);
2172
if (!CACHE(parser, 1)) return 0;
2173
}
2174
2175
/* Consume the major version number. */
2176
2177
if (!yaml_parser_scan_version_directive_number(parser, start_mark, major))
2178
return 0;
2179
2180
/* Eat '.'. */
2181
2182
if (!CHECK(parser->buffer, '.')) {
2183
return yaml_parser_set_scanner_error(parser, "while scanning a %YAML directive",
2184
start_mark, "did not find expected digit or '.' character");
2185
}
2186
2187
SKIP(parser);
2188
2189
/* Consume the minor version number. */
2190
2191
if (!yaml_parser_scan_version_directive_number(parser, start_mark, minor))
2192
return 0;
2193
2194
return 1;
2195
}
2196
2197
#define MAX_NUMBER_LENGTH 9
2198
2199
/*
2200
* Scan the version number of VERSION-DIRECTIVE.
2201
*
2202
* Scope:
2203
* %YAML 1.1 # a comment \n
2204
* ^
2205
* %YAML 1.1 # a comment \n
2206
* ^
2207
*/
2208
2209
static int
2210
yaml_parser_scan_version_directive_number(yaml_parser_t *parser,
2211
yaml_mark_t start_mark, int *number)
2212
{
2213
int value = 0;
2214
size_t length = 0;
2215
2216
/* Repeat while the next character is digit. */
2217
2218
if (!CACHE(parser, 1)) return 0;
2219
2220
while (IS_DIGIT(parser->buffer))
2221
{
2222
/* Check if the number is too long. */
2223
2224
if (++length > MAX_NUMBER_LENGTH) {
2225
return yaml_parser_set_scanner_error(parser, "while scanning a %YAML directive",
2226
start_mark, "found extremely long version number");
2227
}
2228
2229
value = value*10 + AS_DIGIT(parser->buffer);
2230
2231
SKIP(parser);
2232
2233
if (!CACHE(parser, 1)) return 0;
2234
}
2235
2236
/* Check if the number was present. */
2237
2238
if (!length) {
2239
return yaml_parser_set_scanner_error(parser, "while scanning a %YAML directive",
2240
start_mark, "did not find expected version number");
2241
}
2242
2243
*number = value;
2244
2245
return 1;
2246
}
2247
2248
/*
2249
* Scan the value of a TAG-DIRECTIVE token.
2250
*
2251
* Scope:
2252
* %TAG !yaml! tag:yaml.org,2002: \n
2253
* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2254
*/
2255
2256
static int
2257
yaml_parser_scan_tag_directive_value(yaml_parser_t *parser,
2258
yaml_mark_t start_mark, yaml_char_t **handle, yaml_char_t **prefix)
2259
{
2260
yaml_char_t *handle_value = NULL;
2261
yaml_char_t *prefix_value = NULL;
2262
2263
/* Eat whitespaces. */
2264
2265
if (!CACHE(parser, 1)) goto error;
2266
2267
while (IS_BLANK(parser->buffer)) {
2268
SKIP(parser);
2269
if (!CACHE(parser, 1)) goto error;
2270
}
2271
2272
/* Scan a handle. */
2273
2274
if (!yaml_parser_scan_tag_handle(parser, 1, start_mark, &handle_value))
2275
goto error;
2276
2277
/* Expect a whitespace. */
2278
2279
if (!CACHE(parser, 1)) goto error;
2280
2281
if (!IS_BLANK(parser->buffer)) {
2282
yaml_parser_set_scanner_error(parser, "while scanning a %TAG directive",
2283
start_mark, "did not find expected whitespace");
2284
goto error;
2285
}
2286
2287
/* Eat whitespaces. */
2288
2289
while (IS_BLANK(parser->buffer)) {
2290
SKIP(parser);
2291
if (!CACHE(parser, 1)) goto error;
2292
}
2293
2294
/* Scan a prefix. */
2295
2296
if (!yaml_parser_scan_tag_uri(parser, 1, 1, NULL, start_mark, &prefix_value))
2297
goto error;
2298
2299
/* Expect a whitespace or line break. */
2300
2301
if (!CACHE(parser, 1)) goto error;
2302
2303
if (!IS_BLANKZ(parser->buffer)) {
2304
yaml_parser_set_scanner_error(parser, "while scanning a %TAG directive",
2305
start_mark, "did not find expected whitespace or line break");
2306
goto error;
2307
}
2308
2309
*handle = handle_value;
2310
*prefix = prefix_value;
2311
2312
return 1;
2313
2314
error:
2315
yaml_free(handle_value);
2316
yaml_free(prefix_value);
2317
return 0;
2318
}
2319
2320
static int
2321
yaml_parser_scan_anchor(yaml_parser_t *parser, yaml_token_t *token,
2322
yaml_token_type_t type)
2323
{
2324
int length = 0;
2325
yaml_mark_t start_mark, end_mark;
2326
yaml_string_t string = NULL_STRING;
2327
2328
if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error;
2329
2330
/* Eat the indicator character. */
2331
2332
start_mark = parser->mark;
2333
2334
SKIP(parser);
2335
2336
/* Consume the value. */
2337
2338
if (!CACHE(parser, 1)) goto error;
2339
2340
while (IS_ALPHA(parser->buffer)) {
2341
if (!READ(parser, string)) goto error;
2342
if (!CACHE(parser, 1)) goto error;
2343
length ++;
2344
}
2345
2346
end_mark = parser->mark;
2347
2348
/*
2349
* Check if length of the anchor is greater than 0 and it is followed by
2350
* a whitespace character or one of the indicators:
2351
*
2352
* '?', ':', ',', ']', '}', '%', '@', '`'.
2353
*/
2354
2355
if (!length || !(IS_BLANKZ(parser->buffer) || CHECK(parser->buffer, '?')
2356
|| CHECK(parser->buffer, ':') || CHECK(parser->buffer, ',')
2357
|| CHECK(parser->buffer, ']') || CHECK(parser->buffer, '}')
2358
|| CHECK(parser->buffer, '%') || CHECK(parser->buffer, '@')
2359
|| CHECK(parser->buffer, '`'))) {
2360
yaml_parser_set_scanner_error(parser, type == YAML_ANCHOR_TOKEN ?
2361
"while scanning an anchor" : "while scanning an alias", start_mark,
2362
"did not find expected alphabetic or numeric character");
2363
goto error;
2364
}
2365
2366
/* Create a token. */
2367
2368
if (type == YAML_ANCHOR_TOKEN) {
2369
ANCHOR_TOKEN_INIT(*token, string.start, start_mark, end_mark);
2370
}
2371
else {
2372
ALIAS_TOKEN_INIT(*token, string.start, start_mark, end_mark);
2373
}
2374
2375
return 1;
2376
2377
error:
2378
STRING_DEL(parser, string);
2379
return 0;
2380
}
2381
2382
/*
2383
* Scan a TAG token.
2384
*/
2385
2386
static int
2387
yaml_parser_scan_tag(yaml_parser_t *parser, yaml_token_t *token)
2388
{
2389
yaml_char_t *handle = NULL;
2390
yaml_char_t *suffix = NULL;
2391
yaml_mark_t start_mark, end_mark;
2392
2393
start_mark = parser->mark;
2394
2395
/* Check if the tag is in the canonical form. */
2396
2397
if (!CACHE(parser, 2)) goto error;
2398
2399
if (CHECK_AT(parser->buffer, '<', 1))
2400
{
2401
/* Set the handle to '' */
2402
2403
handle = YAML_MALLOC(1);
2404
if (!handle) goto error;
2405
handle[0] = '\0';
2406
2407
/* Eat '!<' */
2408
2409
SKIP(parser);
2410
SKIP(parser);
2411
2412
/* Consume the tag value. */
2413
2414
if (!yaml_parser_scan_tag_uri(parser, 1, 0, NULL, start_mark, &suffix))
2415
goto error;
2416
2417
/* Check for '>' and eat it. */
2418
2419
if (!CHECK(parser->buffer, '>')) {
2420
yaml_parser_set_scanner_error(parser, "while scanning a tag",
2421
start_mark, "did not find the expected '>'");
2422
goto error;
2423
}
2424
2425
SKIP(parser);
2426
}
2427
else
2428
{
2429
/* The tag has either the '!suffix' or the '!handle!suffix' form. */
2430
2431
/* First, try to scan a handle. */
2432
2433
if (!yaml_parser_scan_tag_handle(parser, 0, start_mark, &handle))
2434
goto error;
2435
2436
/* Check if it is, indeed, handle. */
2437
2438
if (handle[0] == '!' && handle[1] != '\0' && handle[strlen((char *)handle)-1] == '!')
2439
{
2440
/* Scan the suffix now. */
2441
2442
if (!yaml_parser_scan_tag_uri(parser, 0, 0, NULL, start_mark, &suffix))
2443
goto error;
2444
}
2445
else
2446
{
2447
/* It wasn't a handle after all. Scan the rest of the tag. */
2448
2449
if (!yaml_parser_scan_tag_uri(parser, 0, 0, handle, start_mark, &suffix))
2450
goto error;
2451
2452
/* Set the handle to '!'. */
2453
2454
yaml_free(handle);
2455
handle = YAML_MALLOC(2);
2456
if (!handle) goto error;
2457
handle[0] = '!';
2458
handle[1] = '\0';
2459
2460
/*
2461
* A special case: the '!' tag. Set the handle to '' and the
2462
* suffix to '!'.
2463
*/
2464
2465
if (suffix[0] == '\0') {
2466
yaml_char_t *tmp = handle;
2467
handle = suffix;
2468
suffix = tmp;
2469
}
2470
}
2471
}
2472
2473
/* Check the character which ends the tag. */
2474
2475
if (!CACHE(parser, 1)) goto error;
2476
2477
if (!IS_BLANKZ(parser->buffer)) {
2478
if (!parser->flow_level || !CHECK(parser->buffer, ',') ) {
2479
yaml_parser_set_scanner_error(parser, "while scanning a tag",
2480
start_mark, "did not find expected whitespace or line break");
2481
goto error;
2482
}
2483
}
2484
2485
end_mark = parser->mark;
2486
2487
/* Create a token. */
2488
2489
TAG_TOKEN_INIT(*token, handle, suffix, start_mark, end_mark);
2490
2491
return 1;
2492
2493
error:
2494
yaml_free(handle);
2495
yaml_free(suffix);
2496
return 0;
2497
}
2498
2499
/*
2500
* Scan a tag handle.
2501
*/
2502
2503
static int
2504
yaml_parser_scan_tag_handle(yaml_parser_t *parser, int directive,
2505
yaml_mark_t start_mark, yaml_char_t **handle)
2506
{
2507
yaml_string_t string = NULL_STRING;
2508
2509
if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error;
2510
2511
/* Check the initial '!' character. */
2512
2513
if (!CACHE(parser, 1)) goto error;
2514
2515
if (!CHECK(parser->buffer, '!')) {
2516
yaml_parser_set_scanner_error(parser, directive ?
2517
"while scanning a tag directive" : "while scanning a tag",
2518
start_mark, "did not find expected '!'");
2519
goto error;
2520
}
2521
2522
/* Copy the '!' character. */
2523
2524
if (!READ(parser, string)) goto error;
2525
2526
/* Copy all subsequent alphabetical and numerical characters. */
2527
2528
if (!CACHE(parser, 1)) goto error;
2529
2530
while (IS_ALPHA(parser->buffer))
2531
{
2532
if (!READ(parser, string)) goto error;
2533
if (!CACHE(parser, 1)) goto error;
2534
}
2535
2536
/* Check if the trailing character is '!' and copy it. */
2537
2538
if (CHECK(parser->buffer, '!'))
2539
{
2540
if (!READ(parser, string)) goto error;
2541
}
2542
else
2543
{
2544
/*
2545
* It's either the '!' tag or not really a tag handle. If it's a %TAG
2546
* directive, it's an error. If it's a tag token, it must be a part of
2547
* URI.
2548
*/
2549
2550
if (directive && !(string.start[0] == '!' && string.start[1] == '\0')) {
2551
yaml_parser_set_scanner_error(parser, "while parsing a tag directive",
2552
start_mark, "did not find expected '!'");
2553
goto error;
2554
}
2555
}
2556
2557
*handle = string.start;
2558
2559
return 1;
2560
2561
error:
2562
STRING_DEL(parser, string);
2563
return 0;
2564
}
2565
2566
/*
2567
* Scan a tag.
2568
*/
2569
2570
static int
2571
yaml_parser_scan_tag_uri(yaml_parser_t *parser, int uri_char, int directive,
2572
yaml_char_t *head, yaml_mark_t start_mark, yaml_char_t **uri)
2573
{
2574
size_t length = head ? strlen((char *)head) : 0;
2575
yaml_string_t string = NULL_STRING;
2576
2577
if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error;
2578
2579
/* Resize the string to include the head. */
2580
2581
while ((size_t)(string.end - string.start) <= length) {
2582
if (!yaml_string_extend(&string.start, &string.pointer, &string.end)) {
2583
parser->error = YAML_MEMORY_ERROR;
2584
goto error;
2585
}
2586
}
2587
2588
/*
2589
* Copy the head if needed.
2590
*
2591
* Note that we don't copy the leading '!' character.
2592
*/
2593
2594
if (length > 1) {
2595
memcpy(string.start, head+1, length-1);
2596
string.pointer += length-1;
2597
}
2598
2599
/* Scan the tag. */
2600
2601
if (!CACHE(parser, 1)) goto error;
2602
2603
/*
2604
* The set of characters that may appear in URI is as follows:
2605
*
2606
* '0'-'9', 'A'-'Z', 'a'-'z', '_', '-', ';', '/', '?', ':', '@', '&',
2607
* '=', '+', '$', '.', '!', '~', '*', '\'', '(', ')', '%'.
2608
*
2609
* If we are inside a verbatim tag <...> (parameter uri_char is true)
2610
* then also the following flow indicators are allowed:
2611
* ',', '[', ']'
2612
*/
2613
2614
while (IS_ALPHA(parser->buffer) || CHECK(parser->buffer, ';')
2615
|| CHECK(parser->buffer, '/') || CHECK(parser->buffer, '?')
2616
|| CHECK(parser->buffer, ':') || CHECK(parser->buffer, '@')
2617
|| CHECK(parser->buffer, '&') || CHECK(parser->buffer, '=')
2618
|| CHECK(parser->buffer, '+') || CHECK(parser->buffer, '$')
2619
|| CHECK(parser->buffer, '.') || CHECK(parser->buffer, '%')
2620
|| CHECK(parser->buffer, '!') || CHECK(parser->buffer, '~')
2621
|| CHECK(parser->buffer, '*') || CHECK(parser->buffer, '\'')
2622
|| CHECK(parser->buffer, '(') || CHECK(parser->buffer, ')')
2623
|| (uri_char && (
2624
CHECK(parser->buffer, ',')
2625
|| CHECK(parser->buffer, '[') || CHECK(parser->buffer, ']')
2626
)
2627
))
2628
{
2629
/* Check if it is a URI-escape sequence. */
2630
2631
if (CHECK(parser->buffer, '%')) {
2632
if (!STRING_EXTEND(parser, string))
2633
goto error;
2634
2635
if (!yaml_parser_scan_uri_escapes(parser,
2636
directive, start_mark, &string)) goto error;
2637
}
2638
else {
2639
if (!READ(parser, string)) goto error;
2640
}
2641
2642
length ++;
2643
if (!CACHE(parser, 1)) goto error;
2644
}
2645
2646
/* Check if the tag is non-empty. */
2647
2648
if (!length) {
2649
if (!STRING_EXTEND(parser, string))
2650
goto error;
2651
2652
yaml_parser_set_scanner_error(parser, directive ?
2653
"while parsing a %TAG directive" : "while parsing a tag",
2654
start_mark, "did not find expected tag URI");
2655
goto error;
2656
}
2657
2658
*uri = string.start;
2659
2660
return 1;
2661
2662
error:
2663
STRING_DEL(parser, string);
2664
return 0;
2665
}
2666
2667
/*
2668
* Decode an URI-escape sequence corresponding to a single UTF-8 character.
2669
*/
2670
2671
static int
2672
yaml_parser_scan_uri_escapes(yaml_parser_t *parser, int directive,
2673
yaml_mark_t start_mark, yaml_string_t *string)
2674
{
2675
int width = 0;
2676
2677
/* Decode the required number of characters. */
2678
2679
do {
2680
2681
unsigned char octet = 0;
2682
2683
/* Check for a URI-escaped octet. */
2684
2685
if (!CACHE(parser, 3)) return 0;
2686
2687
if (!(CHECK(parser->buffer, '%')
2688
&& IS_HEX_AT(parser->buffer, 1)
2689
&& IS_HEX_AT(parser->buffer, 2))) {
2690
return yaml_parser_set_scanner_error(parser, directive ?
2691
"while parsing a %TAG directive" : "while parsing a tag",
2692
start_mark, "did not find URI escaped octet");
2693
}
2694
2695
/* Get the octet. */
2696
2697
octet = (AS_HEX_AT(parser->buffer, 1) << 4) + AS_HEX_AT(parser->buffer, 2);
2698
2699
/* If it is the leading octet, determine the length of the UTF-8 sequence. */
2700
2701
if (!width)
2702
{
2703
width = (octet & 0x80) == 0x00 ? 1 :
2704
(octet & 0xE0) == 0xC0 ? 2 :
2705
(octet & 0xF0) == 0xE0 ? 3 :
2706
(octet & 0xF8) == 0xF0 ? 4 : 0;
2707
if (!width) {
2708
return yaml_parser_set_scanner_error(parser, directive ?
2709
"while parsing a %TAG directive" : "while parsing a tag",
2710
start_mark, "found an incorrect leading UTF-8 octet");
2711
}
2712
}
2713
else
2714
{
2715
/* Check if the trailing octet is correct. */
2716
2717
if ((octet & 0xC0) != 0x80) {
2718
return yaml_parser_set_scanner_error(parser, directive ?
2719
"while parsing a %TAG directive" : "while parsing a tag",
2720
start_mark, "found an incorrect trailing UTF-8 octet");
2721
}
2722
}
2723
2724
/* Copy the octet and move the pointers. */
2725
2726
*(string->pointer++) = octet;
2727
SKIP(parser);
2728
SKIP(parser);
2729
SKIP(parser);
2730
2731
} while (--width);
2732
2733
return 1;
2734
}
2735
2736
/*
2737
* Scan a block scalar.
2738
*/
2739
2740
static int
2741
yaml_parser_scan_block_scalar(yaml_parser_t *parser, yaml_token_t *token,
2742
int literal)
2743
{
2744
yaml_mark_t start_mark;
2745
yaml_mark_t end_mark;
2746
yaml_string_t string = NULL_STRING;
2747
yaml_string_t leading_break = NULL_STRING;
2748
yaml_string_t trailing_breaks = NULL_STRING;
2749
int chomping = 0;
2750
int increment = 0;
2751
int indent = 0;
2752
int leading_blank = 0;
2753
int trailing_blank = 0;
2754
2755
if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error;
2756
if (!STRING_INIT(parser, leading_break, INITIAL_STRING_SIZE)) goto error;
2757
if (!STRING_INIT(parser, trailing_breaks, INITIAL_STRING_SIZE)) goto error;
2758
2759
/* Eat the indicator '|' or '>'. */
2760
2761
start_mark = parser->mark;
2762
2763
SKIP(parser);
2764
2765
/* Scan the additional block scalar indicators. */
2766
2767
if (!CACHE(parser, 1)) goto error;
2768
2769
/* Check for a chomping indicator. */
2770
2771
if (CHECK(parser->buffer, '+') || CHECK(parser->buffer, '-'))
2772
{
2773
/* Set the chomping method and eat the indicator. */
2774
2775
chomping = CHECK(parser->buffer, '+') ? +1 : -1;
2776
2777
SKIP(parser);
2778
2779
/* Check for an indentation indicator. */
2780
2781
if (!CACHE(parser, 1)) goto error;
2782
2783
if (IS_DIGIT(parser->buffer))
2784
{
2785
/* Check that the indentation is greater than 0. */
2786
2787
if (CHECK(parser->buffer, '0')) {
2788
yaml_parser_set_scanner_error(parser, "while scanning a block scalar",
2789
start_mark, "found an indentation indicator equal to 0");
2790
goto error;
2791
}
2792
2793
/* Get the indentation level and eat the indicator. */
2794
2795
increment = AS_DIGIT(parser->buffer);
2796
2797
SKIP(parser);
2798
}
2799
}
2800
2801
/* Do the same as above, but in the opposite order. */
2802
2803
else if (IS_DIGIT(parser->buffer))
2804
{
2805
if (CHECK(parser->buffer, '0')) {
2806
yaml_parser_set_scanner_error(parser, "while scanning a block scalar",
2807
start_mark, "found an indentation indicator equal to 0");
2808
goto error;
2809
}
2810
2811
increment = AS_DIGIT(parser->buffer);
2812
2813
SKIP(parser);
2814
2815
if (!CACHE(parser, 1)) goto error;
2816
2817
if (CHECK(parser->buffer, '+') || CHECK(parser->buffer, '-')) {
2818
chomping = CHECK(parser->buffer, '+') ? +1 : -1;
2819
2820
SKIP(parser);
2821
}
2822
}
2823
2824
/* Eat whitespaces and comments to the end of the line. */
2825
2826
if (!CACHE(parser, 1)) goto error;
2827
2828
while (IS_BLANK(parser->buffer)) {
2829
SKIP(parser);
2830
if (!CACHE(parser, 1)) goto error;
2831
}
2832
2833
if (CHECK(parser->buffer, '#')) {
2834
while (!IS_BREAKZ(parser->buffer)) {
2835
SKIP(parser);
2836
if (!CACHE(parser, 1)) goto error;
2837
}
2838
}
2839
2840
/* Check if we are at the end of the line. */
2841
2842
if (!IS_BREAKZ(parser->buffer)) {
2843
yaml_parser_set_scanner_error(parser, "while scanning a block scalar",
2844
start_mark, "did not find expected comment or line break");
2845
goto error;
2846
}
2847
2848
/* Eat a line break. */
2849
2850
if (IS_BREAK(parser->buffer)) {
2851
if (!CACHE(parser, 2)) goto error;
2852
SKIP_LINE(parser);
2853
}
2854
2855
end_mark = parser->mark;
2856
2857
/* Set the indentation level if it was specified. */
2858
2859
if (increment) {
2860
indent = parser->indent >= 0 ? parser->indent+increment : increment;
2861
}
2862
2863
/* Scan the leading line breaks and determine the indentation level if needed. */
2864
2865
if (!yaml_parser_scan_block_scalar_breaks(parser, &indent, &trailing_breaks,
2866
start_mark, &end_mark)) goto error;
2867
2868
/* Scan the block scalar content. */
2869
2870
if (!CACHE(parser, 1)) goto error;
2871
2872
while ((int)parser->mark.column == indent && !(IS_Z(parser->buffer)))
2873
{
2874
/*
2875
* We are at the beginning of a non-empty line.
2876
*/
2877
2878
/* Is it a trailing whitespace? */
2879
2880
trailing_blank = IS_BLANK(parser->buffer);
2881
2882
/* Check if we need to fold the leading line break. */
2883
2884
if (!literal && (*leading_break.start == '\n')
2885
&& !leading_blank && !trailing_blank)
2886
{
2887
/* Do we need to join the lines by space? */
2888
2889
if (*trailing_breaks.start == '\0') {
2890
if (!STRING_EXTEND(parser, string)) goto error;
2891
*(string.pointer ++) = ' ';
2892
}
2893
2894
CLEAR(parser, leading_break);
2895
}
2896
else {
2897
if (!JOIN(parser, string, leading_break)) goto error;
2898
CLEAR(parser, leading_break);
2899
}
2900
2901
/* Append the remaining line breaks. */
2902
2903
if (!JOIN(parser, string, trailing_breaks)) goto error;
2904
CLEAR(parser, trailing_breaks);
2905
2906
/* Is it a leading whitespace? */
2907
2908
leading_blank = IS_BLANK(parser->buffer);
2909
2910
/* Consume the current line. */
2911
2912
while (!IS_BREAKZ(parser->buffer)) {
2913
if (!READ(parser, string)) goto error;
2914
if (!CACHE(parser, 1)) goto error;
2915
}
2916
2917
/* Consume the line break. */
2918
2919
if (!CACHE(parser, 2)) goto error;
2920
2921
if (!READ_LINE(parser, leading_break)) goto error;
2922
2923
/* Eat the following indentation spaces and line breaks. */
2924
2925
if (!yaml_parser_scan_block_scalar_breaks(parser,
2926
&indent, &trailing_breaks, start_mark, &end_mark)) goto error;
2927
}
2928
2929
/* Chomp the tail. */
2930
2931
if (chomping != -1) {
2932
if (!JOIN(parser, string, leading_break)) goto error;
2933
}
2934
if (chomping == 1) {
2935
if (!JOIN(parser, string, trailing_breaks)) goto error;
2936
}
2937
2938
/* Create a token. */
2939
2940
SCALAR_TOKEN_INIT(*token, string.start, string.pointer-string.start,
2941
literal ? YAML_LITERAL_SCALAR_STYLE : YAML_FOLDED_SCALAR_STYLE,
2942
start_mark, end_mark);
2943
2944
STRING_DEL(parser, leading_break);
2945
STRING_DEL(parser, trailing_breaks);
2946
2947
return 1;
2948
2949
error:
2950
STRING_DEL(parser, string);
2951
STRING_DEL(parser, leading_break);
2952
STRING_DEL(parser, trailing_breaks);
2953
2954
return 0;
2955
}
2956
2957
/*
2958
* Scan indentation spaces and line breaks for a block scalar. Determine the
2959
* indentation level if needed.
2960
*/
2961
2962
static int
2963
yaml_parser_scan_block_scalar_breaks(yaml_parser_t *parser,
2964
int *indent, yaml_string_t *breaks,
2965
yaml_mark_t start_mark, yaml_mark_t *end_mark)
2966
{
2967
int max_indent = 0;
2968
2969
*end_mark = parser->mark;
2970
2971
/* Eat the indentation spaces and line breaks. */
2972
2973
while (1)
2974
{
2975
/* Eat the indentation spaces. */
2976
2977
if (!CACHE(parser, 1)) return 0;
2978
2979
while ((!*indent || (int)parser->mark.column < *indent)
2980
&& IS_SPACE(parser->buffer)) {
2981
SKIP(parser);
2982
if (!CACHE(parser, 1)) return 0;
2983
}
2984
2985
if ((int)parser->mark.column > max_indent)
2986
max_indent = (int)parser->mark.column;
2987
2988
/* Check for a tab character messing the indentation. */
2989
2990
if ((!*indent || (int)parser->mark.column < *indent)
2991
&& IS_TAB(parser->buffer)) {
2992
return yaml_parser_set_scanner_error(parser, "while scanning a block scalar",
2993
start_mark, "found a tab character where an indentation space is expected");
2994
}
2995
2996
/* Have we found a non-empty line? */
2997
2998
if (!IS_BREAK(parser->buffer)) break;
2999
3000
/* Consume the line break. */
3001
3002
if (!CACHE(parser, 2)) return 0;
3003
if (!READ_LINE(parser, *breaks)) return 0;
3004
*end_mark = parser->mark;
3005
}
3006
3007
/* Determine the indentation level if needed. */
3008
3009
if (!*indent) {
3010
*indent = max_indent;
3011
if (*indent < parser->indent + 1)
3012
*indent = parser->indent + 1;
3013
if (*indent < 1)
3014
*indent = 1;
3015
}
3016
3017
return 1;
3018
}
3019
3020
/*
3021
* Scan a quoted scalar.
3022
*/
3023
3024
static int
3025
yaml_parser_scan_flow_scalar(yaml_parser_t *parser, yaml_token_t *token,
3026
int single)
3027
{
3028
yaml_mark_t start_mark;
3029
yaml_mark_t end_mark;
3030
yaml_string_t string = NULL_STRING;
3031
yaml_string_t leading_break = NULL_STRING;
3032
yaml_string_t trailing_breaks = NULL_STRING;
3033
yaml_string_t whitespaces = NULL_STRING;
3034
int leading_blanks;
3035
3036
if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error;
3037
if (!STRING_INIT(parser, leading_break, INITIAL_STRING_SIZE)) goto error;
3038
if (!STRING_INIT(parser, trailing_breaks, INITIAL_STRING_SIZE)) goto error;
3039
if (!STRING_INIT(parser, whitespaces, INITIAL_STRING_SIZE)) goto error;
3040
3041
/* Eat the left quote. */
3042
3043
start_mark = parser->mark;
3044
3045
SKIP(parser);
3046
3047
/* Consume the content of the quoted scalar. */
3048
3049
while (1)
3050
{
3051
/* Check that there are no document indicators at the beginning of the line. */
3052
3053
if (!CACHE(parser, 4)) goto error;
3054
3055
if (parser->mark.column == 0 &&
3056
((CHECK_AT(parser->buffer, '-', 0) &&
3057
CHECK_AT(parser->buffer, '-', 1) &&
3058
CHECK_AT(parser->buffer, '-', 2)) ||
3059
(CHECK_AT(parser->buffer, '.', 0) &&
3060
CHECK_AT(parser->buffer, '.', 1) &&
3061
CHECK_AT(parser->buffer, '.', 2))) &&
3062
IS_BLANKZ_AT(parser->buffer, 3))
3063
{
3064
yaml_parser_set_scanner_error(parser, "while scanning a quoted scalar",
3065
start_mark, "found unexpected document indicator");
3066
goto error;
3067
}
3068
3069
/* Check for EOF. */
3070
3071
if (IS_Z(parser->buffer)) {
3072
yaml_parser_set_scanner_error(parser, "while scanning a quoted scalar",
3073
start_mark, "found unexpected end of stream");
3074
goto error;
3075
}
3076
3077
/* Consume non-blank characters. */
3078
3079
if (!CACHE(parser, 2)) goto error;
3080
3081
leading_blanks = 0;
3082
3083
while (!IS_BLANKZ(parser->buffer))
3084
{
3085
/* Check for an escaped single quote. */
3086
3087
if (single && CHECK_AT(parser->buffer, '\'', 0)
3088
&& CHECK_AT(parser->buffer, '\'', 1))
3089
{
3090
if (!STRING_EXTEND(parser, string)) goto error;
3091
*(string.pointer++) = '\'';
3092
SKIP(parser);
3093
SKIP(parser);
3094
}
3095
3096
/* Check for the right quote. */
3097
3098
else if (CHECK(parser->buffer, single ? '\'' : '"'))
3099
{
3100
break;
3101
}
3102
3103
/* Check for an escaped line break. */
3104
3105
else if (!single && CHECK(parser->buffer, '\\')
3106
&& IS_BREAK_AT(parser->buffer, 1))
3107
{
3108
if (!CACHE(parser, 3)) goto error;
3109
SKIP(parser);
3110
SKIP_LINE(parser);
3111
leading_blanks = 1;
3112
break;
3113
}
3114
3115
/* Check for an escape sequence. */
3116
3117
else if (!single && CHECK(parser->buffer, '\\'))
3118
{
3119
size_t code_length = 0;
3120
3121
if (!STRING_EXTEND(parser, string)) goto error;
3122
3123
/* Check the escape character. */
3124
3125
switch (parser->buffer.pointer[1])
3126
{
3127
case '0':
3128
*(string.pointer++) = '\0';
3129
break;
3130
3131
case 'a':
3132
*(string.pointer++) = '\x07';
3133
break;
3134
3135
case 'b':
3136
*(string.pointer++) = '\x08';
3137
break;
3138
3139
case 't':
3140
case '\t':
3141
*(string.pointer++) = '\x09';
3142
break;
3143
3144
case 'n':
3145
*(string.pointer++) = '\x0A';
3146
break;
3147
3148
case 'v':
3149
*(string.pointer++) = '\x0B';
3150
break;
3151
3152
case 'f':
3153
*(string.pointer++) = '\x0C';
3154
break;
3155
3156
case 'r':
3157
*(string.pointer++) = '\x0D';
3158
break;
3159
3160
case 'e':
3161
*(string.pointer++) = '\x1B';
3162
break;
3163
3164
case ' ':
3165
*(string.pointer++) = '\x20';
3166
break;
3167
3168
case '"':
3169
*(string.pointer++) = '"';
3170
break;
3171
3172
case '/':
3173
*(string.pointer++) = '/';
3174
break;
3175
3176
case '\\':
3177
*(string.pointer++) = '\\';
3178
break;
3179
3180
case 'N': /* NEL (#x85) */
3181
*(string.pointer++) = '\xC2';
3182
*(string.pointer++) = '\x85';
3183
break;
3184
3185
case '_': /* #xA0 */
3186
*(string.pointer++) = '\xC2';
3187
*(string.pointer++) = '\xA0';
3188
break;
3189
3190
case 'L': /* LS (#x2028) */
3191
*(string.pointer++) = '\xE2';
3192
*(string.pointer++) = '\x80';
3193
*(string.pointer++) = '\xA8';
3194
break;
3195
3196
case 'P': /* PS (#x2029) */
3197
*(string.pointer++) = '\xE2';
3198
*(string.pointer++) = '\x80';
3199
*(string.pointer++) = '\xA9';
3200
break;
3201
3202
case 'x':
3203
code_length = 2;
3204
break;
3205
3206
case 'u':
3207
code_length = 4;
3208
break;
3209
3210
case 'U':
3211
code_length = 8;
3212
break;
3213
3214
default:
3215
yaml_parser_set_scanner_error(parser, "while parsing a quoted scalar",
3216
start_mark, "found unknown escape character");
3217
goto error;
3218
}
3219
3220
SKIP(parser);
3221
SKIP(parser);
3222
3223
/* Consume an arbitrary escape code. */
3224
3225
if (code_length)
3226
{
3227
unsigned int value = 0;
3228
size_t k;
3229
3230
/* Scan the character value. */
3231
3232
if (!CACHE(parser, code_length)) goto error;
3233
3234
for (k = 0; k < code_length; k ++) {
3235
if (!IS_HEX_AT(parser->buffer, k)) {
3236
yaml_parser_set_scanner_error(parser, "while parsing a quoted scalar",
3237
start_mark, "did not find expected hexdecimal number");
3238
goto error;
3239
}
3240
value = (value << 4) + AS_HEX_AT(parser->buffer, k);
3241
}
3242
3243
/* Check the value and write the character. */
3244
3245
if ((value >= 0xD800 && value <= 0xDFFF) || value > 0x10FFFF) {
3246
yaml_parser_set_scanner_error(parser, "while parsing a quoted scalar",
3247
start_mark, "found invalid Unicode character escape code");
3248
goto error;
3249
}
3250
3251
if (value <= 0x7F) {
3252
*(string.pointer++) = value;
3253
}
3254
else if (value <= 0x7FF) {
3255
*(string.pointer++) = 0xC0 + (value >> 6);
3256
*(string.pointer++) = 0x80 + (value & 0x3F);
3257
}
3258
else if (value <= 0xFFFF) {
3259
*(string.pointer++) = 0xE0 + (value >> 12);
3260
*(string.pointer++) = 0x80 + ((value >> 6) & 0x3F);
3261
*(string.pointer++) = 0x80 + (value & 0x3F);
3262
}
3263
else {
3264
*(string.pointer++) = 0xF0 + (value >> 18);
3265
*(string.pointer++) = 0x80 + ((value >> 12) & 0x3F);
3266
*(string.pointer++) = 0x80 + ((value >> 6) & 0x3F);
3267
*(string.pointer++) = 0x80 + (value & 0x3F);
3268
}
3269
3270
/* Advance the pointer. */
3271
3272
for (k = 0; k < code_length; k ++) {
3273
SKIP(parser);
3274
}
3275
}
3276
}
3277
3278
else
3279
{
3280
/* It is a non-escaped non-blank character. */
3281
3282
if (!READ(parser, string)) goto error;
3283
}
3284
3285
if (!CACHE(parser, 2)) goto error;
3286
}
3287
3288
/* Check if we are at the end of the scalar. */
3289
3290
/* Fix for crash unitialized value crash
3291
* Credit for the bug and input is to OSS Fuzz
3292
* Credit for the fix to Alex Gaynor
3293
*/
3294
if (!CACHE(parser, 1)) goto error;
3295
if (CHECK(parser->buffer, single ? '\'' : '"'))
3296
break;
3297
3298
/* Consume blank characters. */
3299
3300
if (!CACHE(parser, 1)) goto error;
3301
3302
while (IS_BLANK(parser->buffer) || IS_BREAK(parser->buffer))
3303
{
3304
if (IS_BLANK(parser->buffer))
3305
{
3306
/* Consume a space or a tab character. */
3307
3308
if (!leading_blanks) {
3309
if (!READ(parser, whitespaces)) goto error;
3310
}
3311
else {
3312
SKIP(parser);
3313
}
3314
}
3315
else
3316
{
3317
if (!CACHE(parser, 2)) goto error;
3318
3319
/* Check if it is a first line break. */
3320
3321
if (!leading_blanks)
3322
{
3323
CLEAR(parser, whitespaces);
3324
if (!READ_LINE(parser, leading_break)) goto error;
3325
leading_blanks = 1;
3326
}
3327
else
3328
{
3329
if (!READ_LINE(parser, trailing_breaks)) goto error;
3330
}
3331
}
3332
if (!CACHE(parser, 1)) goto error;
3333
}
3334
3335
/* Join the whitespaces or fold line breaks. */
3336
3337
if (leading_blanks)
3338
{
3339
/* Do we need to fold line breaks? */
3340
3341
if (leading_break.start[0] == '\n') {
3342
if (trailing_breaks.start[0] == '\0') {
3343
if (!STRING_EXTEND(parser, string)) goto error;
3344
*(string.pointer++) = ' ';
3345
}
3346
else {
3347
if (!JOIN(parser, string, trailing_breaks)) goto error;
3348
CLEAR(parser, trailing_breaks);
3349
}
3350
CLEAR(parser, leading_break);
3351
}
3352
else {
3353
if (!JOIN(parser, string, leading_break)) goto error;
3354
if (!JOIN(parser, string, trailing_breaks)) goto error;
3355
CLEAR(parser, leading_break);
3356
CLEAR(parser, trailing_breaks);
3357
}
3358
}
3359
else
3360
{
3361
if (!JOIN(parser, string, whitespaces)) goto error;
3362
CLEAR(parser, whitespaces);
3363
}
3364
}
3365
3366
/* Eat the right quote. */
3367
3368
SKIP(parser);
3369
3370
end_mark = parser->mark;
3371
3372
/* Create a token. */
3373
3374
SCALAR_TOKEN_INIT(*token, string.start, string.pointer-string.start,
3375
single ? YAML_SINGLE_QUOTED_SCALAR_STYLE : YAML_DOUBLE_QUOTED_SCALAR_STYLE,
3376
start_mark, end_mark);
3377
3378
STRING_DEL(parser, leading_break);
3379
STRING_DEL(parser, trailing_breaks);
3380
STRING_DEL(parser, whitespaces);
3381
3382
return 1;
3383
3384
error:
3385
STRING_DEL(parser, string);
3386
STRING_DEL(parser, leading_break);
3387
STRING_DEL(parser, trailing_breaks);
3388
STRING_DEL(parser, whitespaces);
3389
3390
return 0;
3391
}
3392
3393
/*
3394
* Scan a plain scalar.
3395
*/
3396
3397
static int
3398
yaml_parser_scan_plain_scalar(yaml_parser_t *parser, yaml_token_t *token)
3399
{
3400
yaml_mark_t start_mark;
3401
yaml_mark_t end_mark;
3402
yaml_string_t string = NULL_STRING;
3403
yaml_string_t leading_break = NULL_STRING;
3404
yaml_string_t trailing_breaks = NULL_STRING;
3405
yaml_string_t whitespaces = NULL_STRING;
3406
int leading_blanks = 0;
3407
int indent = parser->indent+1;
3408
3409
if (!STRING_INIT(parser, string, INITIAL_STRING_SIZE)) goto error;
3410
if (!STRING_INIT(parser, leading_break, INITIAL_STRING_SIZE)) goto error;
3411
if (!STRING_INIT(parser, trailing_breaks, INITIAL_STRING_SIZE)) goto error;
3412
if (!STRING_INIT(parser, whitespaces, INITIAL_STRING_SIZE)) goto error;
3413
3414
start_mark = end_mark = parser->mark;
3415
3416
/* Consume the content of the plain scalar. */
3417
3418
while (1)
3419
{
3420
/* Check for a document indicator. */
3421
3422
if (!CACHE(parser, 4)) goto error;
3423
3424
if (parser->mark.column == 0 &&
3425
((CHECK_AT(parser->buffer, '-', 0) &&
3426
CHECK_AT(parser->buffer, '-', 1) &&
3427
CHECK_AT(parser->buffer, '-', 2)) ||
3428
(CHECK_AT(parser->buffer, '.', 0) &&
3429
CHECK_AT(parser->buffer, '.', 1) &&
3430
CHECK_AT(parser->buffer, '.', 2))) &&
3431
IS_BLANKZ_AT(parser->buffer, 3)) break;
3432
3433
/* Check for a comment. */
3434
3435
if (CHECK(parser->buffer, '#'))
3436
break;
3437
3438
/* Consume non-blank characters. */
3439
3440
while (!IS_BLANKZ(parser->buffer))
3441
{
3442
/* Check for "x:" + one of ',?[]{}' in the flow context. TODO: Fix the test "spec-08-13".
3443
* This is not completely according to the spec
3444
* See http://yaml.org/spec/1.1/#id907281 9.1.3. Plain
3445
*/
3446
3447
if (parser->flow_level
3448
&& CHECK(parser->buffer, ':')
3449
&& (
3450
CHECK_AT(parser->buffer, ',', 1)
3451
|| CHECK_AT(parser->buffer, '?', 1)
3452
|| CHECK_AT(parser->buffer, '[', 1)
3453
|| CHECK_AT(parser->buffer, ']', 1)
3454
|| CHECK_AT(parser->buffer, '{', 1)
3455
|| CHECK_AT(parser->buffer, '}', 1)
3456
)
3457
) {
3458
yaml_parser_set_scanner_error(parser, "while scanning a plain scalar",
3459
start_mark, "found unexpected ':'");
3460
goto error;
3461
}
3462
3463
/* Check for indicators that may end a plain scalar. */
3464
3465
if ((CHECK(parser->buffer, ':') && IS_BLANKZ_AT(parser->buffer, 1))
3466
|| (parser->flow_level &&
3467
(CHECK(parser->buffer, ',')
3468
|| CHECK(parser->buffer, '[')
3469
|| CHECK(parser->buffer, ']') || CHECK(parser->buffer, '{')
3470
|| CHECK(parser->buffer, '}'))))
3471
break;
3472
3473
/* Check if we need to join whitespaces and breaks. */
3474
3475
if (leading_blanks || whitespaces.start != whitespaces.pointer)
3476
{
3477
if (leading_blanks)
3478
{
3479
/* Do we need to fold line breaks? */
3480
3481
if (leading_break.start[0] == '\n') {
3482
if (trailing_breaks.start[0] == '\0') {
3483
if (!STRING_EXTEND(parser, string)) goto error;
3484
*(string.pointer++) = ' ';
3485
}
3486
else {
3487
if (!JOIN(parser, string, trailing_breaks)) goto error;
3488
CLEAR(parser, trailing_breaks);
3489
}
3490
CLEAR(parser, leading_break);
3491
}
3492
else {
3493
if (!JOIN(parser, string, leading_break)) goto error;
3494
if (!JOIN(parser, string, trailing_breaks)) goto error;
3495
CLEAR(parser, leading_break);
3496
CLEAR(parser, trailing_breaks);
3497
}
3498
3499
leading_blanks = 0;
3500
}
3501
else
3502
{
3503
if (!JOIN(parser, string, whitespaces)) goto error;
3504
CLEAR(parser, whitespaces);
3505
}
3506
}
3507
3508
/* Copy the character. */
3509
3510
if (!READ(parser, string)) goto error;
3511
3512
end_mark = parser->mark;
3513
3514
if (!CACHE(parser, 2)) goto error;
3515
}
3516
3517
/* Is it the end? */
3518
3519
if (!(IS_BLANK(parser->buffer) || IS_BREAK(parser->buffer)))
3520
break;
3521
3522
/* Consume blank characters. */
3523
3524
if (!CACHE(parser, 1)) goto error;
3525
3526
while (IS_BLANK(parser->buffer) || IS_BREAK(parser->buffer))
3527
{
3528
if (IS_BLANK(parser->buffer))
3529
{
3530
/* Check for tab characters that abuse indentation. */
3531
3532
if (leading_blanks && (int)parser->mark.column < indent
3533
&& IS_TAB(parser->buffer)) {
3534
yaml_parser_set_scanner_error(parser, "while scanning a plain scalar",
3535
start_mark, "found a tab character that violates indentation");
3536
goto error;
3537
}
3538
3539
/* Consume a space or a tab character. */
3540
3541
if (!leading_blanks) {
3542
if (!READ(parser, whitespaces)) goto error;
3543
}
3544
else {
3545
SKIP(parser);
3546
}
3547
}
3548
else
3549
{
3550
if (!CACHE(parser, 2)) goto error;
3551
3552
/* Check if it is a first line break. */
3553
3554
if (!leading_blanks)
3555
{
3556
CLEAR(parser, whitespaces);
3557
if (!READ_LINE(parser, leading_break)) goto error;
3558
leading_blanks = 1;
3559
}
3560
else
3561
{
3562
if (!READ_LINE(parser, trailing_breaks)) goto error;
3563
}
3564
}
3565
if (!CACHE(parser, 1)) goto error;
3566
}
3567
3568
/* Check indentation level. */
3569
3570
if (!parser->flow_level && (int)parser->mark.column < indent)
3571
break;
3572
}
3573
3574
/* Create a token. */
3575
3576
SCALAR_TOKEN_INIT(*token, string.start, string.pointer-string.start,
3577
YAML_PLAIN_SCALAR_STYLE, start_mark, end_mark);
3578
3579
/* Note that we change the 'simple_key_allowed' flag. */
3580
3581
if (leading_blanks) {
3582
parser->simple_key_allowed = 1;
3583
}
3584
3585
STRING_DEL(parser, leading_break);
3586
STRING_DEL(parser, trailing_breaks);
3587
STRING_DEL(parser, whitespaces);
3588
3589
return 1;
3590
3591
error:
3592
STRING_DEL(parser, string);
3593
STRING_DEL(parser, leading_break);
3594
STRING_DEL(parser, trailing_breaks);
3595
STRING_DEL(parser, whitespaces);
3596
3597
return 0;
3598
}
3599
3600