GitHub Repository: stenzek/duckstation
Path: blob/master/dep/libchdr/include/dr_libs/dr_flac.h
/*
FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
dr_flac - v0.12.42 - 2023-11-02

David Reid - [email protected]

GitHub: https://github.com/mackron/dr_libs
*/

/*
RELEASE NOTES - v0.12.0
=======================
Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.


Improved Client-Defined Memory Allocation
-----------------------------------------
The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE is still in place and will be used by default when no custom
allocation callbacks are specified.

To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:

    void* my_malloc(size_t sz, void* pUserData)
    {
        return malloc(sz);
    }
    void* my_realloc(void* p, size_t sz, void* pUserData)
    {
        return realloc(p, sz);
    }
    void my_free(void* p, void* pUserData)
    {
        free(p);
    }

    ...

    drflac_allocation_callbacks allocationCallbacks;
    allocationCallbacks.pUserData = &myData;
    allocationCallbacks.onMalloc  = my_malloc;
    allocationCallbacks.onRealloc = my_realloc;
    allocationCallbacks.onFree    = my_free;
    drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);

The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
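
As a rough illustration of what the user data pointer makes possible, the sketch below counts live allocations through the callbacks.
The alloc_callbacks struct is a local stand-in laid out like drflac_allocation_callbacks so that the snippet compiles on its own.
alloc_stats, counting_malloc(), counting_realloc(), counting_free() and make_counting_callbacks() are hypothetical names and not part
of dr_flac.

```c
#include <stddef.h>
#include <stdlib.h>

// Local stand-in mirroring drflac_allocation_callbacks (assumption: same members, same order).
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} alloc_callbacks;

// Hypothetical application state reached through pUserData.
typedef struct
{
    size_t liveAllocations;
    size_t totalBytesRequested;
} alloc_stats;

static void* counting_malloc(size_t sz, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;
    void* p = malloc(sz);
    if (p != NULL) {
        pStats->liveAllocations += 1;
        pStats->totalBytesRequested += sz;
    }
    return p;
}

// Note: a size of 0 is not given any special treatment in this sketch.
static void* counting_realloc(void* p, size_t sz, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;
    void* pNew = realloc(p, sz);
    if (pNew != NULL) {
        if (p == NULL) {
            pStats->liveAllocations += 1;   // realloc(NULL, sz) behaves like malloc(sz).
        }
        pStats->totalBytesRequested += sz;
    }
    return pNew;
}

static void counting_free(void* p, void* pUserData)
{
    alloc_stats* pStats = (alloc_stats*)pUserData;
    if (p != NULL) {
        pStats->liveAllocations -= 1;
        free(p);
    }
}

// Bundles the three routines with the stats object they should update.
static alloc_callbacks make_counting_callbacks(alloc_stats* pStats)
{
    alloc_callbacks cb;
    cb.pUserData = pStats;
    cb.onMalloc  = counting_malloc;
    cb.onRealloc = counting_realloc;
    cb.onFree    = counting_free;
    return cb;
}
```

With the real type, the same three routines would be assigned to a drflac_allocation_callbacks object and passed to drflac_open_file()
as in the example above. A non-zero liveAllocations after closing the decoder would then point at a leak.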

Passing in NULL for the allocation callbacks object causes dr_flac to fall back to its defaults, DRFLAC_MALLOC, DRFLAC_REALLOC and
DRFLAC_FREE, which is equivalent to how it worked in previous versions.

Every API that opens a drflac object now takes this extra parameter. These include the following:

    drflac_open()
    drflac_open_relaxed()
    drflac_open_with_metadata()
    drflac_open_with_metadata_relaxed()
    drflac_open_file()
    drflac_open_file_with_metadata()
    drflac_open_memory()
    drflac_open_memory_with_metadata()
    drflac_open_and_read_pcm_frames_s32()
    drflac_open_and_read_pcm_frames_s16()
    drflac_open_and_read_pcm_frames_f32()
    drflac_open_file_and_read_pcm_frames_s32()
    drflac_open_file_and_read_pcm_frames_s16()
    drflac_open_file_and_read_pcm_frames_f32()
    drflac_open_memory_and_read_pcm_frames_s32()
    drflac_open_memory_and_read_pcm_frames_s16()
    drflac_open_memory_and_read_pcm_frames_f32()



Optimizations
-------------
Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
means it will be disabled when DR_FLAC_NO_CRC is used.
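
The general idea of binary search seeking can be sketched as follows. This is not dr_flac's implementation, only a simplified model
of the technique: the stream is a hypothetical array of frame_info records, and next_frame_at_or_after() stands in for scanning
forward from a byte offset until a valid frame is found. In a real decoder that scan has to validate candidate frames, which is
presumably where the CRC comes in. All names here are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical model of one FLAC frame as seen by a seeker.
typedef struct
{
    uint64_t byteOffset;     // Where the frame starts in the stream.
    uint64_t firstPCMFrame;  // Index of the first PCM frame stored in this FLAC frame.
    uint32_t pcmFrameCount;  // Number of PCM frames stored in this FLAC frame.
} frame_info;

// Stand-in for "scan forward from byteOffset until a valid frame is found".
static const frame_info* next_frame_at_or_after(const frame_info* pFrames, size_t frameCount, uint64_t byteOffset)
{
    size_t i;
    for (i = 0; i < frameCount; i += 1) {
        if (pFrames[i].byteOffset >= byteOffset) {
            return &pFrames[i];
        }
    }
    return NULL;
}

// Binary search over the byte range [0, streamSize) for the frame containing targetPCMFrame.
static const frame_info* binary_search_seek(const frame_info* pFrames, size_t frameCount, uint64_t streamSize, uint64_t targetPCMFrame)
{
    uint64_t lo = 0;
    uint64_t hi = streamSize;

    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2;
        const frame_info* pFrame = next_frame_at_or_after(pFrames, frameCount, mid);

        if (pFrame == NULL || pFrame->byteOffset >= hi || pFrame->firstPCMFrame > targetPCMFrame) {
            hi = mid;                       // The target frame starts before the probe point.
        } else if (targetPCMFrame < pFrame->firstPCMFrame + pFrame->pcmFrameCount) {
            return pFrame;                  // The probed frame contains the target.
        } else {
            lo = pFrame->byteOffset + 1;    // The target frame starts after the probed frame.
        }
    }

    return NULL;    // The target is beyond the end of the stream.
}
```

Each iteration either lowers hi or raises lo, so the number of probes grows with the logarithm of the stream size rather than with
the number of frames, which is the gain over walking the stream frame by frame.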

The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
particular. 16-bit streams should also see some improvement.

drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed its s32
to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.

A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
channel reconstruction which is the last part of the decoding process.

The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
compile time and the REV instruction requires ARM architecture version 6.

An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.


Removed APIs
------------
The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:

    drflac_read_s32()                   -> drflac_read_pcm_frames_s32()
    drflac_read_s16()                   -> drflac_read_pcm_frames_s16()
    drflac_read_f32()                   -> drflac_read_pcm_frames_f32()
    drflac_seek_to_sample()             -> drflac_seek_to_pcm_frame()
    drflac_open_and_decode_s32()        -> drflac_open_and_read_pcm_frames_s32()
    drflac_open_and_decode_s16()        -> drflac_open_and_read_pcm_frames_s16()
    drflac_open_and_decode_f32()        -> drflac_open_and_read_pcm_frames_f32()
    drflac_open_and_decode_file_s32()   -> drflac_open_file_and_read_pcm_frames_s32()
    drflac_open_and_decode_file_s16()   -> drflac_open_file_and_read_pcm_frames_s16()
    drflac_open_and_decode_file_f32()   -> drflac_open_file_and_read_pcm_frames_f32()
    drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
    drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
    drflac_open_and_decode_memory_f32() -> drflac_open_memory_and_read_pcm_frames_f32()

Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
to the old per-sample APIs. You now need to use the "pcm_frame" versions.
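
When porting code from the per-sample APIs, counts need converting between interleaved samples and PCM frames. A minimal sketch,
assuming one PCM frame holds exactly one sample per channel (the helper names are hypothetical and not part of dr_flac):

```c
#include <stdint.h>

// A count of interleaved samples converts to PCM frames by dividing by the channel count.
static uint64_t samples_to_pcm_frames(uint64_t sampleCount, uint32_t channels)
{
    if (channels == 0) {
        return 0;   // Guard against division by zero on invalid input.
    }
    return sampleCount / channels;
}

// The reverse conversion, useful for sizing an interleaved output buffer.
static uint64_t pcm_frames_to_samples(uint64_t pcmFrameCount, uint32_t channels)
{
    return pcmFrameCount * channels;
}
```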
*/


/*
Introduction
============
dr_flac is a single file library. To use it, do something like the following in one .c file.

    ```c
    #define DR_FLAC_IMPLEMENTATION
    #include "dr_flac.h"
    ```

You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:

    ```c
    drflac* pFlac = drflac_open_file("MySong.flac", NULL);
    if (pFlac == NULL) {
        // Failed to open FLAC file
    }

    drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
    drflac_uint64 numberOfPCMFramesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
    ```

The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
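
To make the interleaved layout concrete, here is a small sketch of how a sample is addressed in such a buffer and how one channel
can be copied out. It assumes the usual interleaving where the samples of PCM frame i occupy indices i*channels to
i*channels + channels - 1. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Returns the sample for the given channel of the given PCM frame in an interleaved buffer.
static int32_t interleaved_sample_at(const int32_t* pInterleaved, uint32_t channels, uint64_t pcmFrameIndex, uint32_t channel)
{
    return pInterleaved[pcmFrameIndex * channels + channel];
}

// Copies one channel out of an interleaved buffer into a buffer of pcmFrameCount samples.
static void extract_channel(const int32_t* pInterleaved, uint32_t channels, uint64_t pcmFrameCount, uint32_t channel, int32_t* pChannelOut)
{
    uint64_t i;
    for (i = 0; i < pcmFrameCount; i += 1) {
        pChannelOut[i] = interleaved_sample_at(pInterleaved, channels, i, channel);
    }
}
```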

You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:

    ```c
    while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
        do_something();
    }
    ```

You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.

If you just want to quickly decode an entire FLAC file in one go you can do something like this:

    ```c
    unsigned int channels;
    unsigned int sampleRate;
    drflac_uint64 totalPCMFrameCount;
    drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
    if (pSampleData == NULL) {
        // Failed to open and decode FLAC file.
    }

    ...

    drflac_free(pSampleData, NULL);
    ```

You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
should be considered lossy.
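
The following sketch shows why such conversions lose information. It assumes the 32-bit samples span the full 32-bit range, and it
is not taken from dr_flac's own conversion code. The function names are hypothetical.

```c
#include <stdint.h>

// Keeps the top 16 bits of a 32-bit sample and discards the rest. Assumes >> on a negative
// value is an arithmetic shift, which is implementation-defined in C but usual in practice.
static int16_t s32_to_s16(int32_t sample)
{
    return (int16_t)(sample >> 16);
}

// Maps the 32-bit integer range onto [-1, 1). A float has a 24-bit significand, so the
// lowest bits of a full 32-bit sample cannot all be preserved.
static float s32_to_f32(int32_t sample)
{
    return (float)(sample / 2147483648.0);
}
```

For example, 0x12345600 and 0x123456FF both become 0x1234 after the s16 conversion, so a 24-bit source cannot be recovered from
16-bit output.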


If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metadata()` or `drflac_open_memory_with_metadata()`.
The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metadata()` returns.

The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. This presents a problem in certain scenarios such as broadcast style
streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:

    `drflac_open_relaxed()`
    `drflac_open_with_metadata_relaxed()`

It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
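
To give a feel for what finding the first frame involves, the sketch below scans a buffer for the FLAC frame sync code: 14 one-bits
followed by a zero bit and a reserved zero bit, which shows up as the byte 0xFF followed by 0xF8 or 0xF9. This is only the first
step and not dr_flac's implementation. The same byte pattern can occur inside audio data, so a real decoder has to validate each
candidate further. find_sync_candidate() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

// Returns the offset of the first candidate frame sync code, or byteCount if none is found.
static size_t find_sync_candidate(const uint8_t* pData, size_t byteCount)
{
    size_t i;
    for (i = 0; i + 1 < byteCount; i += 1) {
        // Masking off the lowest bit accepts both blocking strategies (0xF8 and 0xF9).
        if (pData[i] == 0xFF && (pData[i + 1] & 0xFE) == 0xF8) {
            return i;
        }
    }
    return byteCount;
}
```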



Build Options
=============
#define these options before including this file.

#define DR_FLAC_NO_STDIO
  Disable `drflac_open_file()` and family.

#define DR_FLAC_NO_OGG
  Disables support for Ogg/FLAC streams.

#define DR_FLAC_BUFFER_SIZE <number>
  Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
  Larger values mean more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
  you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.

#define DR_FLAC_NO_CRC
  Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
  be used if available. Otherwise the seek will be performed using brute force.

#define DR_FLAC_NO_SIMD
  Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.

#define DR_FLAC_NO_WCHAR
  Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.



Notes
=====
- dr_flac does not support changing the sample rate nor channel count mid stream.
- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
  to differences in corrupted stream recovery logic between the two APIs.
*/

#ifndef dr_flac_h
#define dr_flac_h

#ifdef __cplusplus
extern "C" {
#endif

#define DRFLAC_STRINGIFY(x)      #x
#define DRFLAC_XSTRINGIFY(x)     DRFLAC_STRINGIFY(x)

#define DRFLAC_VERSION_MAJOR     0
#define DRFLAC_VERSION_MINOR     12
#define DRFLAC_VERSION_REVISION  42
#define DRFLAC_VERSION_STRING    DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)

#include <stddef.h> /* For size_t. */

/* Sized Types */
typedef   signed char           drflac_int8;
typedef unsigned char           drflac_uint8;
typedef   signed short          drflac_int16;
typedef unsigned short          drflac_uint16;
typedef   signed int            drflac_int32;
typedef unsigned int            drflac_uint32;
#if defined(_MSC_VER) && !defined(__clang__)
    typedef   signed __int64    drflac_int64;
    typedef unsigned __int64    drflac_uint64;
#else
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic push
        #pragma GCC diagnostic ignored "-Wlong-long"
        #if defined(__clang__)
            #pragma GCC diagnostic ignored "-Wc++11-long-long"
        #endif
    #endif
    typedef   signed long long  drflac_int64;
    typedef unsigned long long  drflac_uint64;
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic pop
    #endif
#endif
#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
    typedef drflac_uint64       drflac_uintptr;
#else
    typedef drflac_uint32       drflac_uintptr;
#endif
typedef drflac_uint8            drflac_bool8;
typedef drflac_uint32           drflac_bool32;
#define DRFLAC_TRUE             1
#define DRFLAC_FALSE            0
/* End Sized Types */

/* Decorations */
#if !defined(DRFLAC_API)
    #if defined(DRFLAC_DLL)
        #if defined(_WIN32)
            #define DRFLAC_DLL_IMPORT  __declspec(dllimport)
            #define DRFLAC_DLL_EXPORT  __declspec(dllexport)
            #define DRFLAC_DLL_PRIVATE static
        #else
            #if defined(__GNUC__) && __GNUC__ >= 4
                #define DRFLAC_DLL_IMPORT  __attribute__((visibility("default")))
                #define DRFLAC_DLL_EXPORT  __attribute__((visibility("default")))
                #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
            #else
                #define DRFLAC_DLL_IMPORT
                #define DRFLAC_DLL_EXPORT
                #define DRFLAC_DLL_PRIVATE static
            #endif
        #endif

        #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
            #define DRFLAC_API  DRFLAC_DLL_EXPORT
        #else
            #define DRFLAC_API  DRFLAC_DLL_IMPORT
        #endif
        #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
    #else
        #define DRFLAC_API extern
        #define DRFLAC_PRIVATE static
    #endif
#endif
/* End Decorations */

#if defined(_MSC_VER) && _MSC_VER >= 1700   /* Visual Studio 2012 */
    #define DRFLAC_DEPRECATED       __declspec(deprecated)
#elif (defined(__GNUC__) && __GNUC__ >= 4)  /* GCC 4 */
    #define DRFLAC_DEPRECATED       __attribute__((deprecated))
#elif defined(__has_feature)                /* Clang */
    #if __has_feature(attribute_deprecated)
        #define DRFLAC_DEPRECATED   __attribute__((deprecated))
    #else
        #define DRFLAC_DEPRECATED
    #endif
#else
    #define DRFLAC_DEPRECATED
#endif

DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
DRFLAC_API const char* drflac_version_string(void);

/* Allocation Callbacks */
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} drflac_allocation_callbacks;
/* End Allocation Callbacks */

/*
As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values mean more speed,
but also more memory. In my testing there are diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
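
The multiple-of-8 requirement can be checked at compile time. The sketch below is one possible way of doing that and is not
something this header does itself: the typedef is a pre-C11 static assertion trick, and buffer_size_in_words() is a hypothetical
helper that assumes the buffer is consumed in 64-bit words.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef DR_FLAC_BUFFER_SIZE
#define DR_FLAC_BUFFER_SIZE   4096
#endif

// The array size is -1, and compilation fails, when the size is not a multiple of 8.
typedef char buffer_size_must_be_a_multiple_of_8[(DR_FLAC_BUFFER_SIZE % 8 == 0) ? 1 : -1];

// Number of 64-bit words that fit in the buffer.
static size_t buffer_size_in_words(void)
{
    return DR_FLAC_BUFFER_SIZE / sizeof(uint64_t);
}
```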
*/
#ifndef DR_FLAC_BUFFER_SIZE
#define DR_FLAC_BUFFER_SIZE   4096
#endif


/* Architecture Detection */
#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
#define DRFLAC_64BIT
#endif

#if defined(__x86_64__) || defined(_M_X64)
    #define DRFLAC_X64
#elif defined(__i386) || defined(_M_IX86)
    #define DRFLAC_X86
#elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64)
    #define DRFLAC_ARM
#endif
/* End Architecture Detection */


#ifdef DRFLAC_64BIT
typedef drflac_uint64 drflac_cache_t;
#else
typedef drflac_uint32 drflac_cache_t;
#endif

/* The various metadata block types. */
#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO       0
#define DRFLAC_METADATA_BLOCK_TYPE_PADDING          1
#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION      2
#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE        3
#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT   4
#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET         5
#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE          6
#define DRFLAC_METADATA_BLOCK_TYPE_INVALID          127

/* The various picture types specified in the PICTURE block. */
#define DRFLAC_PICTURE_TYPE_OTHER                   0
#define DRFLAC_PICTURE_TYPE_FILE_ICON               1
#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON         2
#define DRFLAC_PICTURE_TYPE_COVER_FRONT             3
#define DRFLAC_PICTURE_TYPE_COVER_BACK              4
#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE            5
#define DRFLAC_PICTURE_TYPE_MEDIA                   6
#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST             7
#define DRFLAC_PICTURE_TYPE_ARTIST                  8
#define DRFLAC_PICTURE_TYPE_CONDUCTOR               9
#define DRFLAC_PICTURE_TYPE_BAND                    10
#define DRFLAC_PICTURE_TYPE_COMPOSER                11
#define DRFLAC_PICTURE_TYPE_LYRICIST                12
#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION      13
#define DRFLAC_PICTURE_TYPE_DURING_RECORDING        14
#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE      15
#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE          16
#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH     17
#define DRFLAC_PICTURE_TYPE_ILLUSTRATION            18
#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE           19
#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE      20

typedef enum
{
    drflac_container_native,
    drflac_container_ogg,
    drflac_container_unknown
} drflac_container;

typedef enum
{
    drflac_seek_origin_start,
    drflac_seek_origin_current
} drflac_seek_origin;

/* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
typedef struct
{
    drflac_uint64 firstPCMFrame;
    drflac_uint64 flacFrameOffset;   /* The offset from the first byte of the header of the first frame. */
    drflac_uint16 pcmFrameCount;
} drflac_seekpoint;

typedef struct
{
    drflac_uint16 minBlockSizeInPCMFrames;
    drflac_uint16 maxBlockSizeInPCMFrames;
    drflac_uint32 minFrameSizeInPCMFrames;
    drflac_uint32 maxFrameSizeInPCMFrames;
    drflac_uint32 sampleRate;
    drflac_uint8  channels;
    drflac_uint8  bitsPerSample;
    drflac_uint64 totalPCMFrameCount;
    drflac_uint8  md5[16];
} drflac_streaminfo;

typedef struct
{
    /*
    The metadata type. Use this to know how to interpret the data below. Will be set to one of the
    DRFLAC_METADATA_BLOCK_TYPE_* tokens.
    */
    drflac_uint32 type;

    /*
    A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
    not modify the contents of this buffer. Use the structures below for more meaningful and structured
    information about the metadata. It's possible for this to be null.
    */
    const void* pRawData;

    /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
    drflac_uint32 rawDataSize;

    union
    {
        drflac_streaminfo streaminfo;

        struct
        {
            int unused;
        } padding;

        struct
        {
            drflac_uint32 id;
            const void* pData;
            drflac_uint32 dataSize;
        } application;

        struct
        {
            drflac_uint32 seekpointCount;
            const drflac_seekpoint* pSeekpoints;
        } seektable;

        struct
        {
            drflac_uint32 vendorLength;
            const char* vendor;
            drflac_uint32 commentCount;
            const void* pComments;
        } vorbis_comment;

        struct
        {
            char catalog[128];
            drflac_uint64 leadInSampleCount;
            drflac_bool32 isCD;
            drflac_uint8 trackCount;
            const void* pTrackData;
        } cuesheet;

        struct
        {
            drflac_uint32 type;
            drflac_uint32 mimeLength;
            const char* mime;
            drflac_uint32 descriptionLength;
            const char* description;
            drflac_uint32 width;
            drflac_uint32 height;
            drflac_uint32 colorDepth;
            drflac_uint32 indexColorCount;
            drflac_uint32 pictureDataSize;
            const drflac_uint8* pPictureData;
        } picture;
    } data;
} drflac_metadata;


/*
Callback for when data needs to be read from the client.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pBufferOut (out)
    The output buffer.

bytesToRead (in)
    The number of bytes to read.


Return Value
------------
The number of bytes actually read.


Remarks
-------
A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
you have reached the end of the stream.
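
As an illustration of these rules, here is a minimal sketch of a read callback backed by a block of memory. memory_source and
memory_source_read() are hypothetical names. The function has the same signature as the callback type declared below, and it only
returns fewer bytes than requested once the data has run out.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical user data: a read cursor over a block of memory.
typedef struct
{
    const unsigned char* pData;
    size_t dataSize;
    size_t readPos;     // Always <= dataSize.
} memory_source;

static size_t memory_source_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    memory_source* pSource = (memory_source*)pUserData;
    size_t bytesRemaining = pSource->dataSize - pSource->readPos;

    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;   // Short read: signals the end of the stream.
    }

    if (bytesToRead > 0) {
        memcpy(pBufferOut, pSource->pData + pSource->readPos, bytesToRead);
        pSource->readPos += bytesToRead;
    }

    return bytesToRead;
}
```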
*/
typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);

/*
Callback for when data needs to be seeked.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

offset (in)
    The number of bytes to move, relative to the origin. Will never be negative.

origin (in)
    The origin of the seek - the current position or the start of the stream.


Return Value
------------
Whether or not the seek was successful.


Remarks
-------
The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
either drflac_seek_origin_start or drflac_seek_origin_current.

When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
and handled by returning DRFLAC_FALSE.
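
A minimal sketch of a seek callback that follows these rules is shown below. bool32, seek_origin and memory_cursor are local
stand-ins so that the snippet compiles on its own. With the real header you would use drflac_bool32, drflac_seek_origin and your own
user data type. Treating a seek to exactly the end of the data as valid is a choice made for this example.

```c
#include <stddef.h>

// Local stand-ins for drflac_bool32 and drflac_seek_origin.
typedef unsigned int bool32;
typedef enum { seek_origin_start, seek_origin_current } seek_origin;

// Hypothetical user data: a cursor over dataSize bytes. readPos is always <= dataSize.
typedef struct
{
    size_t dataSize;
    size_t readPos;
} memory_cursor;

// Returns 1 on success. Returns 0 and leaves the cursor untouched when the target is out of range.
static bool32 memory_cursor_seek(void* pUserData, int offset, seek_origin origin)
{
    memory_cursor* pCursor = (memory_cursor*)pUserData;
    size_t base = (origin == seek_origin_start) ? 0 : pCursor->readPos;

    if (offset < 0) {
        return 0;   // Documented as never happening, so it is rejected defensively.
    }
    if ((size_t)offset > pCursor->dataSize - base) {
        return 0;   // Beyond the end of the stream.
    }

    pCursor->readPos = base + (size_t)offset;
    return 1;
}
```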
*/
typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);

/*
Callback for when a metadata block is read.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pMetadata (in)
    A pointer to a structure containing the data of the metadata block.


Remarks
-------
Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
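
The dispatch on the block type could look something like the sketch below. To keep it compilable on its own it uses metadata_stub,
a cut-down stand-in for drflac_metadata, and local copies of two block type values. metadata_summary and summarize_metadata() are
hypothetical names.

```c
#include <stdint.h>

// Values copied from the DRFLAC_METADATA_BLOCK_TYPE_* tokens defined earlier in this file.
#define BLOCK_TYPE_VORBIS_COMMENT  4
#define BLOCK_TYPE_PICTURE         6

// Stand-in for drflac_metadata holding only the members this sketch reads.
typedef struct
{
    uint32_t type;
    uint32_t rawDataSize;
} metadata_stub;

// Hypothetical user data accumulating a summary of the blocks seen.
typedef struct
{
    uint32_t pictureCount;
    uint32_t commentBlockCount;
    uint32_t otherCount;
    uint64_t totalRawBytes;
} metadata_summary;

// Same shape as the metadata callback, with the stand-in type in place of drflac_metadata.
static void summarize_metadata(void* pUserData, metadata_stub* pMetadata)
{
    metadata_summary* pSummary = (metadata_summary*)pUserData;

    pSummary->totalRawBytes += pMetadata->rawDataSize;

    switch (pMetadata->type) {
        case BLOCK_TYPE_PICTURE:        pSummary->pictureCount      += 1; break;
        case BLOCK_TYPE_VORBIS_COMMENT: pSummary->commentBlockCount += 1; break;
        default:                        pSummary->otherCount        += 1; break;
    }
}
```

Since the raw data pointer refers to a temporary buffer, a real callback that wants to keep anything (album art, for example) needs
to copy it out before returning.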
*/
typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);


/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
typedef struct
{
    const drflac_uint8* data;
    size_t dataSize;
    size_t currentReadPos;
} drflac__memory_stream;

/* Structure for internal use. Used for bit streaming. */
typedef struct
{
    /* The function to call when more data needs to be read. */
    drflac_read_proc onRead;

    /* The function to call when the current read position needs to be moved. */
    drflac_seek_proc onSeek;

    /* The user data to pass around to onRead and onSeek. */
    void* pUserData;


    /*
    The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
    stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
    or not the bitstreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
    */
    size_t unalignedByteCount;

    /* The content of the unaligned bytes. */
    drflac_cache_t unalignedCache;

    /* The index of the next valid cache line in the "L2" cache. */
    drflac_uint32 nextL2Line;

    /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
    drflac_uint32 consumedBits;

    /*
    The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
    Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
    */
    drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
    drflac_cache_t cache;

    /*
    CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
    is reset to 0 at the beginning of each frame.
    */
    drflac_uint16 crc16;
    drflac_cache_t crc16Cache;              /* A cache for optimizing CRC calculations. This is filled when the L1 cache is reloaded. */
    drflac_uint32 crc16CacheIgnoredBytes;   /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
} drflac_bs;

typedef struct
{
    /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
    drflac_uint8 subframeType;

    /* The number of wasted bits per sample as specified by the sub-frame header. */
    drflac_uint8 wastedBitsPerSample;

    /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
    drflac_uint8 lpcOrder;

    /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
    drflac_int32* pSamplesS32;
} drflac_subframe;

typedef struct
{
    /*
    If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
    always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
    */
    drflac_uint64 pcmFrameNumber;

    /*
    If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
    is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
    */
    drflac_uint32 flacFrameNumber;

    /* The sample rate of this frame. */
    drflac_uint32 sampleRate;

    /* The number of PCM frames in each sub-frame within this frame. */
    drflac_uint16 blockSizeInPCMFrames;

    /*
    The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
    will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
    */
    drflac_uint8 channelAssignment;

    /* The number of bits per sample within this frame. */
    drflac_uint8 bitsPerSample;

    /* The frame's CRC. */
    drflac_uint8 crc8;
} drflac_frame_header;

typedef struct
{
    /* The header. */
    drflac_frame_header header;

    /*
    The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
    this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
    */
    drflac_uint32 pcmFramesRemaining;

    /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
    drflac_subframe subframes[8];
} drflac_frame;
704

705
typedef struct
706
{
707
    /* The function to call when a metadata block is read. */
708
    drflac_meta_proc onMeta;
709

710
    /* The user data posted to the metadata callback function. */
711
    void* pUserDataMD;
712

713
    /* Memory allocation callbacks. */
714
    drflac_allocation_callbacks allocationCallbacks;
715

716

717
    /* The sample rate. Will be set to something like 44100. */
718
    drflac_uint32 sampleRate;
719

720
    /*
721
    The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
722
    value specified in the STREAMINFO block.
723
    */
724
    drflac_uint8 channels;
725

726
    /* The bits per sample. Will be set to something like 16, 24, etc. */
727
    drflac_uint8 bitsPerSample;
728

729
    /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
730
    drflac_uint16 maxBlockSizeInPCMFrames;
731

732
    /*
733
    The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
734
    the total PCM frame count is unknown. Likely the case with streams like internet radio.
735
    */
736
    drflac_uint64 totalPCMFrameCount;
737

738

739
    /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
740
    drflac_container container;
741

742
    /* The number of seekpoints in the seektable. */
743
    drflac_uint32 seekpointCount;
744

745

746
    /* Information about the frame the decoder is currently sitting on. */
747
    drflac_frame currentFLACFrame;
748

749

750
    /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
751
    drflac_uint64 currentPCMFrame;
752

753
    /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
754
    drflac_uint64 firstFLACFramePosInBytes;
755

756

757
    /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
758
    drflac__memory_stream memoryStream;
759

760

761
    /* A pointer to the decoded sample data. This is an offset of pExtraData. */
762
    drflac_int32* pDecodedSamples;
763

764
    /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
765
    drflac_seekpoint* pSeekpoints;
766

767
    /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
768
    void* _oggbs;
769

770
    /* Internal use only. Used for profiling and testing different seeking modes. */
771
    drflac_bool32 _noSeekTableSeek    : 1;
772
    drflac_bool32 _noBinarySearchSeek : 1;
773
    drflac_bool32 _noBruteForceSeek   : 1;
774

775
    /* The bit streamer. The raw FLAC data is fed through this object. */
776
    drflac_bs bs;
777

778
    /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
779
    drflac_uint8 pExtraData[1];
780
} drflac;
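
/*
Example
-------
A minimal sketch of the trailing-data idiom behind `pExtraData`: a single allocation holds both the fixed-size object and its variable-length tail. The `blob`
type and `blob_create()` helper are hypothetical and not part of dr_flac. The sketch uses a C99 flexible array member rather than the `[1]` array used above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    size_t extraSize;          // Number of bytes available in pExtra.
    unsigned char pExtra[];    // Variable-length tail, allocated together with the struct.
} blob;

// Allocates a blob with room for extraSize zero-initialized trailing bytes.
// Returns NULL on failure. Release with free().
static blob* blob_create(size_t extraSize)
{
    blob* pBlob;

    if (extraSize > (size_t)-1 - sizeof(blob)) {
        return NULL;    // The size computation would overflow.
    }

    pBlob = (blob*)malloc(sizeof(blob) + extraSize);
    if (pBlob == NULL) {
        return NULL;
    }

    pBlob->extraSize = extraSize;
    memset(pBlob->pExtra, 0, extraSize);
    return pBlob;
}
```
*/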
781

782

783
/*
784
Opens a FLAC decoder.
785

786

787
Parameters
788
----------
789
onRead (in)
790
    The function to call when data needs to be read from the client.
791

792
onSeek (in)
793
    The function to call when the read position of the client data needs to move.
794

795
pUserData (in, optional)
796
    A pointer to application defined data that will be passed to onRead and onSeek.
797

798
pAllocationCallbacks (in, optional)
799
    A pointer to application defined callbacks for managing memory allocations.
800

801

802
Return Value
803
------------
804
Returns a pointer to an object representing the decoder.
805

806

807
Remarks
808
-------
809
Close the decoder with `drflac_close()`.
810

811
`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
812

813
This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
814
without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
815

816
This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
817
from a block of memory respectively.
818

819
The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
820

821
Use `drflac_open_with_metadata()` if you need access to metadata.
822

823

See Also
--------
drflac_open_file()
drflac_open_memory()
drflac_open_with_metadata()
drflac_close()
830
*/
831
DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
832

833
/*
834
Opens a FLAC stream with relaxed validation of the header block.
835

836

837
Parameters
838
----------
839
onRead (in)
840
    The function to call when data needs to be read from the client.
841

842
onSeek (in)
843
    The function to call when the read position of the client data needs to move.
844

845
container (in)
846
    Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
847

848
pUserData (in, optional)
849
    A pointer to application defined data that will be passed to onRead and onSeek.
850

851
pAllocationCallbacks (in, optional)
852
    A pointer to application defined callbacks for managing memory allocations.
853

854

855
Return Value
856
------------
857
A pointer to an object representing the decoder.
858

859

860
Remarks
861
-------
862
The same as drflac_open(), except attempts to open the stream even when a header block is not present.
863

864
Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
865
as that is for internal use only.
866

867
Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
868
force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
869

870
Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
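

Example
-------
A minimal sketch of an abortable `onRead` callback, assuming the `drflac_read_proc` shape declared earlier in this header
(`size_t (void* pUserData, void* pBufferOut, size_t bytesToRead)`). The `abortable_source` type and its fields are hypothetical. Once `abortRequested` is
set, the callback returns 0, which a relaxed open treats as the end of the stream.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    const unsigned char* pData;    // Backing bytes of the stream.
    size_t dataSize;               // Total number of bytes in pData.
    size_t cursor;                 // Current read position.
    int abortRequested;            // Set to non-zero to stop a relaxed open.
} abortable_source;

// Copies up to bytesToRead bytes into pBufferOut and returns the number copied.
static size_t abortable_on_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    abortable_source* pSource = (abortable_source*)pUserData;
    size_t bytesRemaining;

    if (pSource->abortRequested) {
        return 0;    // Reported to the decoder as the end of the stream.
    }

    bytesRemaining = pSource->dataSize - pSource->cursor;
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    memcpy(pBufferOut, pSource->pData + pSource->cursor, bytesToRead);
    pSource->cursor += bytesToRead;
    return bytesToRead;
}
```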
871
*/
872
DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
873

874
/*
875
Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
876

877

878
Parameters
879
----------
880
onRead (in)
881
    The function to call when data needs to be read from the client.
882

883
onSeek (in)
884
    The function to call when the read position of the client data needs to move.
885

886
onMeta (in)
887
    The function to call for every metadata block.
888

889
pUserData (in, optional)
890
    A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
891

892
pAllocationCallbacks (in, optional)
893
    A pointer to application defined callbacks for managing memory allocations.
894

895

896
Return Value
897
------------
898
A pointer to an object representing the decoder.
899

900

901
Remarks
902
-------
903
Close the decoder with `drflac_close()`.
904

905
`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
906

907
This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
908
metadata block except for STREAMINFO and PADDING blocks.
909

The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate between
the different metadata types.
913

914
The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
915

916
Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
917
the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
918
metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
919
returned depending on whether or not the stream is being opened with metadata.
920

921

See Also
--------
drflac_open_file_with_metadata()
drflac_open_memory_with_metadata()
drflac_open()
drflac_close()
928
*/
929
DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
930

931
/*
932
The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
933

934
See Also
935
--------
936
drflac_open_with_metadata()
937
drflac_open_relaxed()
938
*/
939
DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
940

941
/*
942
Closes the given FLAC decoder.
943

944

945
Parameters
946
----------
947
pFlac (in)
948
    The decoder to close.
949

950

951
Remarks
952
-------
953
This will destroy the decoder object.
954

955

956
See Also
957
--------
958
drflac_open()
959
drflac_open_with_metadata()
960
drflac_open_file()
961
drflac_open_file_w()
962
drflac_open_file_with_metadata()
963
drflac_open_file_with_metadata_w()
964
drflac_open_memory()
965
drflac_open_memory_with_metadata()
966
*/
967
DRFLAC_API void drflac_close(drflac* pFlac);
968

969

970
/*
971
Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
972

973

974
Parameters
975
----------
976
pFlac (in)
977
    The decoder.
978

979
framesToRead (in)
980
    The number of PCM frames to read.
981

982
pBufferOut (out, optional)
983
    A pointer to the buffer that will receive the decoded samples.
984

985

986
Return Value
987
------------
988
Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
989

990

991
Remarks
992
-------
993
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
994
*/
995
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
996

997

998
/*
999
Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
1000

1001

1002
Parameters
1003
----------
1004
pFlac (in)
1005
    The decoder.
1006

1007
framesToRead (in)
1008
    The number of PCM frames to read.
1009

1010
pBufferOut (out, optional)
1011
    A pointer to the buffer that will receive the decoded samples.
1012

1013

1014
Return Value
1015
------------
1016
Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1017

1018

1019
Remarks
1020
-------
1021
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1022

1023
Note that this is lossy for streams where the bits per sample is larger than 16.
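

Example
-------
A minimal sketch of why the conversion is lossy, assuming decoded samples are left-justified in 32 bits and the 16-bit output keeps only the upper 16 bits.
The helper name is hypothetical and this is not necessarily how dr_flac performs the conversion internally.

```c
#include <stdint.h>

// Keeps the upper 16 bits of a 32-bit sample. Assumes the compiler performs an
// arithmetic right shift on negative values, which is implementation-defined in C.
static int16_t sample_s32_to_s16(int32_t sample)
{
    return (int16_t)(sample >> 16);
}
```

Two 24-bit samples that differ only in their low 8 bits, such as 0x123456 and 0x1234FF (stored as 0x12345600 and 0x1234FF00), both become 0x1234.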
1024
*/
1025
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
1026

1027
/*
1028
Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
1029

1030

1031
Parameters
1032
----------
1033
pFlac (in)
1034
    The decoder.
1035

1036
framesToRead (in)
1037
    The number of PCM frames to read.
1038

1039
pBufferOut (out, optional)
1040
    A pointer to the buffer that will receive the decoded samples.
1041

1042

1043
Return Value
1044
------------
1045
Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1046

1047

1048
Remarks
1049
-------
1050
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1051

1052
Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
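

Example
-------
A minimal sketch of the precision limit, assuming a plain scale by 1/2^31. The helper name is hypothetical and this is not necessarily dr_flac's internal
conversion. A 32-bit float carries a 24-bit significand, so distinct 32-bit integer samples can map to the same float.

```c
#include <stdint.h>

// Scales a signed 32-bit sample into the range [-1, 1).
static float sample_s32_to_f32(int32_t sample)
{
    return (float)sample / 2147483648.0f;
}
```

For instance, 16777216 and 16777217 (2^24 and 2^24 + 1) convert to the same float value.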
1053
*/
1054
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
1055

1056
/*
1057
Seeks to the PCM frame at the given index.
1058

1059

1060
Parameters
1061
----------
1062
pFlac (in)
1063
    The decoder.
1064

pcmFrameIndex (in)
    The index of the PCM frame to seek to.


Return Value
------------
`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
1072
*/
1073
DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
1074

1075

1076

1077
#ifndef DR_FLAC_NO_STDIO
1078
/*
1079
Opens a FLAC decoder from the file at the given path.
1080

1081

1082
Parameters
1083
----------
1084
pFileName (in)
1085
    The path of the file to open, either absolute or relative to the current directory.
1086

1087
pAllocationCallbacks (in, optional)
1088
    A pointer to application defined callbacks for managing memory allocations.
1089

1090

1091
Return Value
1092
------------
1093
A pointer to an object representing the decoder.
1094

1095

Remarks
-------
Close the decoder with drflac_close().

This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
at any given time, so keep this in mind if you have many decoders open at the same time.
1105

1106

1107
See Also
1108
--------
1109
drflac_open_file_with_metadata()
1110
drflac_open()
1111
drflac_close()
1112
*/
1113
DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1114
DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1115

1116
/*
Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.).


Parameters
----------
pFileName (in)
    The path of the file to open, either absolute or relative to the current directory.

onMeta (in)
    The callback to fire for each metadata block.

pUserData (in, optional)
    A pointer to the user data to pass to the metadata callback.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.
1136

1137

1138
Remarks
1139
-------
1140
Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1141

1142

1143
See Also
1144
--------
1145
drflac_open_with_metadata()
1146
drflac_open()
1147
drflac_close()
1148
*/
1149
DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1150
DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1151
#endif
1152

1153
/*
Opens a FLAC decoder from a pre-allocated block of memory.


Parameters
----------
pData (in)
    A pointer to the raw encoded FLAC data.

dataSize (in)
    The size in bytes of `pData`.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.
1167

1168

1169
Return Value
1170
------------
1171
A pointer to an object representing the decoder.
1172

1173

1174
Remarks
1175
-------
1176
This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
1177

1178

1179
See Also
1180
--------
1181
drflac_open()
1182
drflac_close()
1183
*/
1184
DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1185

1186
/*
Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.).


Parameters
----------
pData (in)
    A pointer to the raw encoded FLAC data.

dataSize (in)
    The size in bytes of `pData`.

onMeta (in)
    The callback to fire for each metadata block.

pUserData (in, optional)
    A pointer to the user data to pass to the metadata callback.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.
1206

1207

1208
Remarks
1209
-------
1210
Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1211

1212

See Also
--------
drflac_open_with_metadata()
drflac_open()
drflac_close()
1218
*/
1219
DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1220

1221

1222

1223
/* High Level APIs */
1224

1225
/*
1226
Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1227
pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1228

1229
You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1230
case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1231

1232
Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1233
read samples into a dynamically sized buffer on the heap until no samples are left.
1234

1235
Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
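
A minimal sketch of the read-until-exhausted approach described above, for the case where the total frame count is unknown. The `read_frames_proc`,
`ramp_source` and `read_all_frames` names are hypothetical stand-ins and not part of dr_flac. A real caller would read interleaved frames with
`drflac_read_pcm_frames_s32()`, whereas this sketch uses mono frames from a test source.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_FRAMES 1024

// Hypothetical stand-in for a decoder read call. Writes up to framesToRead mono
// frames to pOut and returns the number of frames actually written.
typedef uint64_t (*read_frames_proc)(void* pUserData, uint64_t framesToRead, int32_t* pOut);

// Hypothetical test source producing the values 0, 1, 2, ... until it runs out.
typedef struct { uint64_t remaining; int32_t next; } ramp_source;

static uint64_t ramp_read(void* pUserData, uint64_t framesToRead, int32_t* pOut)
{
    ramp_source* pRamp = (ramp_source*)pUserData;
    uint64_t n = (framesToRead < pRamp->remaining) ? framesToRead : pRamp->remaining;
    uint64_t i;
    for (i = 0; i < n; i += 1) {
        pOut[i] = pRamp->next++;
    }
    pRamp->remaining -= n;
    return n;
}

// Reads chunks until a short read signals the end, doubling the buffer on demand.
// Returns a malloc'd buffer (release with free()) or NULL on allocation failure.
static int32_t* read_all_frames(read_frames_proc onRead, void* pUserData, uint64_t* pFrameCountOut)
{
    int32_t chunk[CHUNK_FRAMES];
    size_t capacity = CHUNK_FRAMES;
    size_t count = 0;
    int32_t* pFrames = (int32_t*)malloc(capacity * sizeof(int32_t));
    if (pFrames == NULL) {
        return NULL;
    }

    for (;;) {
        size_t got = (size_t)onRead(pUserData, CHUNK_FRAMES, chunk);
        if (count + got > capacity) {
            int32_t* pNew;
            capacity *= 2;    // One doubling always suffices because got <= CHUNK_FRAMES <= capacity.
            pNew = (int32_t*)realloc(pFrames, capacity * sizeof(int32_t));
            if (pNew == NULL) {
                free(pFrames);
                return NULL;
            }
            pFrames = pNew;
        }
        memcpy(pFrames + count, chunk, got * sizeof(int32_t));
        count += got;
        if (got < CHUNK_FRAMES) {
            break;    // Short read: the source is exhausted.
        }
    }

    *pFrameCountOut = count;
    return pFrames;
}
```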
1236
*/
1237
DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1238

1239
/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1240
DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1241

1242
/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1243
DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1244

1245
#ifndef DR_FLAC_NO_STDIO
1246
/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
1247
DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1248

1249
/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1250
DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1251

1252
/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1253
DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1254
#endif
1255

1256
/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
1257
DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1258

1259
/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1260
DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1261

1262
/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1263
DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1264

1265
/*
1266
Frees memory that was allocated internally by dr_flac.
1267

1268
Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
1269
*/
1270
DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1271

1272

1273
/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
1274
typedef struct
1275
{
1276
    drflac_uint32 countRemaining;
1277
    const char* pRunningData;
1278
} drflac_vorbis_comment_iterator;
1279

1280
/*
1281
Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1282
metadata block.
1283
*/
1284
DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1285

1286
/*
1287
Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1288
returned string is NOT null terminated.
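
Because the returned string is length-delimited, it needs to be copied before being passed to functions that expect a C string. A minimal sketch follows,
where `copy_comment()` is a hypothetical helper and not part of dr_flac:

```c
#include <stddef.h>
#include <string.h>

// Copies a length-delimited comment into pDst and appends a null terminator.
// Truncates if the comment does not fit. Returns the number of characters copied.
static size_t copy_comment(const char* pComment, size_t commentLength, char* pDst, size_t dstCapacity)
{
    size_t n;

    if (dstCapacity == 0) {
        return 0;
    }

    n = (commentLength < dstCapacity - 1) ? commentLength : dstCapacity - 1;
    memcpy(pDst, pComment, n);
    pDst[n] = '\0';
    return n;
}
```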
1289
*/
1290
DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
1291

1292

1293
/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
1294
typedef struct
1295
{
1296
    drflac_uint32 countRemaining;
1297
    const char* pRunningData;
1298
} drflac_cuesheet_track_iterator;
1299

1300
/* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
1301
typedef struct
1302
{
1303
    drflac_uint64 offset;
1304
    drflac_uint8 index;
1305
    drflac_uint8 reserved[3];
1306
} drflac_cuesheet_track_index;
1307

1308
typedef struct
1309
{
1310
    drflac_uint64 offset;
1311
    drflac_uint8 trackNumber;
1312
    char ISRC[12];
1313
    drflac_bool8 isAudio;
1314
    drflac_bool8 preEmphasis;
1315
    drflac_uint8 indexCount;
1316
    const drflac_cuesheet_track_index* pIndexPoints;
1317
} drflac_cuesheet_track;
1318

1319
/*
1320
Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1321
block.
1322
*/
1323
DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
1324

/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more tracks. */
1326
DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1327

1328

1329
#ifdef __cplusplus
1330
}
1331
#endif
1332
#endif  /* dr_flac_h */
1333

1334

1335
/************************************************************************************************************************************************************
1336
 ************************************************************************************************************************************************************
1337

1338
 IMPLEMENTATION
1339

1340
 ************************************************************************************************************************************************************
1341
 ************************************************************************************************************************************************************/
1342
#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
1343
#ifndef dr_flac_c
1344
#define dr_flac_c
1345

1346
/* Disable some annoying warnings. */
1347
#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1348
    #pragma GCC diagnostic push
1349
    #if __GNUC__ >= 7
1350
    #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1351
    #endif
1352
#endif
1353

1354
#ifdef __linux__
1355
    #ifndef _BSD_SOURCE
1356
        #define _BSD_SOURCE
1357
    #endif
1358
    #ifndef _DEFAULT_SOURCE
1359
        #define _DEFAULT_SOURCE
1360
    #endif
1361
    #ifndef __USE_BSD
1362
        #define __USE_BSD
1363
    #endif
1364
    #include <endian.h>
1365
#endif
1366

1367
#include <stdlib.h>
1368
#include <string.h>
1369

1370
/* Inline */
1371
#ifdef _MSC_VER
1372
    #define DRFLAC_INLINE __forceinline
1373
#elif defined(__GNUC__)
1374
    /*
1375
    I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1376
    the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1377
    case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1378
    command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1379
    I am using "__inline__" only when we're compiling in strict ANSI mode.
1380
    */
1381
    #if defined(__STRICT_ANSI__)
1382
        #define DRFLAC_GNUC_INLINE_HINT __inline__
1383
    #else
1384
        #define DRFLAC_GNUC_INLINE_HINT inline
1385
    #endif
1386

1387
    #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
1388
        #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
1389
    #else
1390
        #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
1391
    #endif
1392
#elif defined(__WATCOMC__)
1393
    #define DRFLAC_INLINE __inline
1394
#else
1395
    #define DRFLAC_INLINE
1396
#endif
1397
/* End Inline */
1398

/*
Intrinsics Support

There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with

    "error: shift must be an immediate"

Unfortunately dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
*/
1408
#if !defined(DR_FLAC_NO_SIMD)
1409
    #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
1410
        #if defined(_MSC_VER) && !defined(__clang__)
1411
            /* MSVC. */
1412
            #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2)    /* 2005 */
1413
                #define DRFLAC_SUPPORT_SSE2
1414
            #endif
1415
            #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41)   /* 2010 */
1416
                #define DRFLAC_SUPPORT_SSE41
1417
            #endif
1418
        #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1419
            /* Assume GNUC-style. */
1420
            #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1421
                #define DRFLAC_SUPPORT_SSE2
1422
            #endif
1423
            #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1424
                #define DRFLAC_SUPPORT_SSE41
1425
            #endif
1426
        #endif
1427

1428
        /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
1429
        #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1430
            #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1431
                #define DRFLAC_SUPPORT_SSE2
1432
            #endif
1433
            #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1434
                #define DRFLAC_SUPPORT_SSE41
1435
            #endif
1436
        #endif
1437

1438
        #if defined(DRFLAC_SUPPORT_SSE41)
1439
            #include <smmintrin.h>
1440
        #elif defined(DRFLAC_SUPPORT_SSE2)
1441
            #include <emmintrin.h>
1442
        #endif
1443
    #endif
1444

1445
    #if defined(DRFLAC_ARM)
1446
        #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1447
            #define DRFLAC_SUPPORT_NEON
1448
            #include <arm_neon.h>
1449
        #endif
1450
    #endif
1451
#endif
1452

/* CPUID support, used for detecting CPU features at run time. */
1454
#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
1455
    #if defined(_MSC_VER) && !defined(__clang__)
1456
        #if _MSC_VER >= 1400
1457
            #include <intrin.h>
1458
            static void drflac__cpuid(int info[4], int fid)
1459
            {
1460
                __cpuid(info, fid);
1461
            }
1462
        #else
1463
            #define DRFLAC_NO_CPUID
1464
        #endif
1465
    #else
1466
        #if defined(__GNUC__) || defined(__clang__)
1467
            static void drflac__cpuid(int info[4], int fid)
1468
            {
1469
                /*
1470
                It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1471
                specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1472
                supporting different assembly dialects.
1473

1474
                What's basically happening is that we're saving and restoring the ebx register manually.
1475
                */
1476
                #if defined(DRFLAC_X86) && defined(__PIC__)
1477
                    __asm__ __volatile__ (
1478
                        "xchg{l} {%%}ebx, %k1;"
1479
                        "cpuid;"
1480
                        "xchg{l} {%%}ebx, %k1;"
1481
                        : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1482
                    );
1483
                #else
1484
                    __asm__ __volatile__ (
1485
                        "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1486
                    );
1487
                #endif
1488
            }
1489
        #else
1490
            #define DRFLAC_NO_CPUID
1491
        #endif
1492
    #endif
1493
#else
1494
    #define DRFLAC_NO_CPUID
#endif

static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
        #if defined(DRFLAC_X64)
            return DRFLAC_TRUE;    /* 64-bit targets always support SSE2. */
        #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
        #else
            #if defined(DRFLAC_NO_CPUID)
                return DRFLAC_FALSE;
            #else
                int info[4];
                drflac__cpuid(info, 1);
                return (info[3] & (1 << 26)) != 0;    /* CPUID leaf 1: EDX bit 26 reports SSE2. */
            #endif
        #endif
    #else
        return DRFLAC_FALSE;       /* SSE2 is only supported on x86 and x64 architectures. */
    #endif
#else
    return DRFLAC_FALSE;           /* No compiler support. */
#endif
}

static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
{
#if defined(DRFLAC_SUPPORT_SSE41)
    #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
        #if defined(__SSE4_1__) || defined(__AVX__)
            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE4.1 code we can assume support. */
        #else
            #if defined(DRFLAC_NO_CPUID)
                return DRFLAC_FALSE;
            #else
                int info[4];
                drflac__cpuid(info, 1);
                return (info[2] & (1 << 19)) != 0;    /* CPUID leaf 1: ECX bit 19 reports SSE4.1. */
            #endif
        #endif
    #else
        return DRFLAC_FALSE;       /* SSE4.1 is only supported on x86 and x64 architectures. */
    #endif
#else
    return DRFLAC_FALSE;           /* No compiler support. */
#endif
}
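/*
Example (not part of dr_flac): a minimal sketch of the feature-bit tests used by drflac_has_sse2() and drflac_has_sse41(),
written so it can be exercised without executing CPUID. It assumes the four registers are stored in EAX, EBX, ECX, EDX
order, which is how the functions above index them. All ex_* names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical indices for the four registers returned by CPUID.
enum { EX_EAX = 0, EX_EBX = 1, EX_ECX = 2, EX_EDX = 3 };

// Returns 1 when the given bit is set in the selected register of a CPUID result.
static int ex_cpuid_bit(const int32_t info[4], int reg, int bit)
{
    assert(reg >= 0 && reg < 4 && bit >= 0 && bit < 32);
    return (int)(((uint32_t)info[reg] >> bit) & 1u);
}

// CPUID leaf 1: SSE2 is reported in EDX bit 26.
static int ex_has_sse2(const int32_t leaf1[4])
{
    return ex_cpuid_bit(leaf1, EX_EDX, 26);
}

// CPUID leaf 1: SSE4.1 is reported in ECX bit 19.
static int ex_has_sse41(const int32_t leaf1[4])
{
    return ex_cpuid_bit(leaf1, EX_ECX, 19);
}
```

Keeping the bit tests separate from the CPUID call makes them easy to check with hand-built register values.
*/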

#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
    #define DRFLAC_HAS_LZCNT_INTRINSIC
#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
    #define DRFLAC_HAS_LZCNT_INTRINSIC
#elif defined(__clang__)
    #if defined(__has_builtin)
        #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
            #define DRFLAC_HAS_LZCNT_INTRINSIC
        #endif
    #endif
#endif

#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
    #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
    #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
#elif defined(__clang__)
    #if defined(__has_builtin)
        #if __has_builtin(__builtin_bswap16)
            #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
        #endif
        #if __has_builtin(__builtin_bswap32)
            #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
        #endif
        #if __has_builtin(__builtin_bswap64)
            #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
        #endif
    #endif
#elif defined(__GNUC__)
    #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
        #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
        #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
    #endif
    #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
        #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #endif
#elif defined(__WATCOMC__) && defined(__386__)
    #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
    #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
    extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
    extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
    extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
#pragma aux _watcom_bswap16 = \
    "xchg al, ah" \
    parm  [ax]    \
    value [ax]    \
    modify nomemory;
#pragma aux _watcom_bswap32 = \
    "bswap eax" \
    parm  [eax] \
    value [eax] \
    modify nomemory;
#pragma aux _watcom_bswap64 = \
    "bswap eax"     \
    "bswap edx"     \
    "xchg eax,edx"  \
    parm [eax edx]  \
    value [eax edx] \
    modify nomemory;
#endif

/* Standard library stuff. */
#ifndef DRFLAC_ASSERT
#include <assert.h>
#define DRFLAC_ASSERT(expression)           assert(expression)
#endif
#ifndef DRFLAC_MALLOC
#define DRFLAC_MALLOC(sz)                   malloc((sz))
#endif
#ifndef DRFLAC_REALLOC
#define DRFLAC_REALLOC(p, sz)               realloc((p), (sz))
#endif
#ifndef DRFLAC_FREE
#define DRFLAC_FREE(p)                      free((p))
#endif
#ifndef DRFLAC_COPY_MEMORY
#define DRFLAC_COPY_MEMORY(dst, src, sz)    memcpy((dst), (src), (sz))
#endif
#ifndef DRFLAC_ZERO_MEMORY
#define DRFLAC_ZERO_MEMORY(p, sz)           memset((p), 0, (sz))
#endif
#ifndef DRFLAC_ZERO_OBJECT
#define DRFLAC_ZERO_OBJECT(p)               DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
#endif

#define DRFLAC_MAX_SIMD_VECTOR_SIZE                     64  /* 64 for AVX-512 in the future. */

/* Result Codes */
typedef drflac_int32 drflac_result;
#define DRFLAC_SUCCESS                                   0
#define DRFLAC_ERROR                                    -1   /* A generic error. */
#define DRFLAC_INVALID_ARGS                             -2
#define DRFLAC_INVALID_OPERATION                        -3
#define DRFLAC_OUT_OF_MEMORY                            -4
#define DRFLAC_OUT_OF_RANGE                             -5
#define DRFLAC_ACCESS_DENIED                            -6
#define DRFLAC_DOES_NOT_EXIST                           -7
#define DRFLAC_ALREADY_EXISTS                           -8
#define DRFLAC_TOO_MANY_OPEN_FILES                      -9
#define DRFLAC_INVALID_FILE                             -10
#define DRFLAC_TOO_BIG                                  -11
#define DRFLAC_PATH_TOO_LONG                            -12
#define DRFLAC_NAME_TOO_LONG                            -13
#define DRFLAC_NOT_DIRECTORY                            -14
#define DRFLAC_IS_DIRECTORY                             -15
#define DRFLAC_DIRECTORY_NOT_EMPTY                      -16
#define DRFLAC_END_OF_FILE                              -17
#define DRFLAC_NO_SPACE                                 -18
#define DRFLAC_BUSY                                     -19
#define DRFLAC_IO_ERROR                                 -20
#define DRFLAC_INTERRUPT                                -21
#define DRFLAC_UNAVAILABLE                              -22
#define DRFLAC_ALREADY_IN_USE                           -23
#define DRFLAC_BAD_ADDRESS                              -24
#define DRFLAC_BAD_SEEK                                 -25
#define DRFLAC_BAD_PIPE                                 -26
#define DRFLAC_DEADLOCK                                 -27
#define DRFLAC_TOO_MANY_LINKS                           -28
#define DRFLAC_NOT_IMPLEMENTED                          -29
#define DRFLAC_NO_MESSAGE                               -30
#define DRFLAC_BAD_MESSAGE                              -31
#define DRFLAC_NO_DATA_AVAILABLE                        -32
#define DRFLAC_INVALID_DATA                             -33
#define DRFLAC_TIMEOUT                                  -34
#define DRFLAC_NO_NETWORK                               -35
#define DRFLAC_NOT_UNIQUE                               -36
#define DRFLAC_NOT_SOCKET                               -37
#define DRFLAC_NO_ADDRESS                               -38
#define DRFLAC_BAD_PROTOCOL                             -39
#define DRFLAC_PROTOCOL_UNAVAILABLE                     -40
#define DRFLAC_PROTOCOL_NOT_SUPPORTED                   -41
#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED            -42
#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED             -43
#define DRFLAC_SOCKET_NOT_SUPPORTED                     -44
#define DRFLAC_CONNECTION_RESET                         -45
#define DRFLAC_ALREADY_CONNECTED                        -46
#define DRFLAC_NOT_CONNECTED                            -47
#define DRFLAC_CONNECTION_REFUSED                       -48
#define DRFLAC_NO_HOST                                  -49
#define DRFLAC_IN_PROGRESS                              -50
#define DRFLAC_CANCELLED                                -51
#define DRFLAC_MEMORY_ALREADY_MAPPED                    -52
#define DRFLAC_AT_END                                   -53

#define DRFLAC_CRC_MISMATCH                             -100
/* End Result Codes */

#define DRFLAC_SUBFRAME_CONSTANT                        0
#define DRFLAC_SUBFRAME_VERBATIM                        1
#define DRFLAC_SUBFRAME_FIXED                           8
#define DRFLAC_SUBFRAME_LPC                             32
#define DRFLAC_SUBFRAME_RESERVED                        255

#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE  0
#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1

#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT           0
#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE             8
#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE            9
#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE              10

#define DRFLAC_SEEKPOINT_SIZE_IN_BYTES                  18
#define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES             36
#define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES       12

#define drflac_align(x, a)                              ((((x) + (a) - 1) / (a)) * (a))

DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
{
    if (pMajor) {
        *pMajor = DRFLAC_VERSION_MAJOR;
    }

    if (pMinor) {
        *pMinor = DRFLAC_VERSION_MINOR;
    }

    if (pRevision) {
        *pRevision = DRFLAC_VERSION_REVISION;
    }
}

DRFLAC_API const char* drflac_version_string(void)
{
    return DRFLAC_VERSION_STRING;
}

/* CPU caps. */
#if defined(__has_feature)
    #if __has_feature(thread_sanitizer)
        #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
    #else
        #define DRFLAC_NO_THREAD_SANITIZE
    #endif
#else
    #define DRFLAC_NO_THREAD_SANITIZE
#endif

#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
#endif

#ifndef DRFLAC_NO_CPUID
static drflac_bool32 drflac__gIsSSE2Supported  = DRFLAC_FALSE;
static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;

/*
I've had a bug report that Clang's ThreadSanitizer warns about this function. Having reviewed it, the warning is valid: the
globals below are written without synchronization. However, since CPU caps never differ within a running process, every
thread writes the same values, and I don't think complicating internal APIs by passing CPU caps around is a worthwhile
trade-off compared to disabling the warning. The warning is therefore disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
*/
DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
{
    static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;

    if (!isCPUCapsInitialized) {
        /* LZCNT. CPUID leaf 0x80000001: ECX bit 5 reports LZCNT. */
#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
        int info[4] = {0};
        drflac__cpuid(info, 0x80000001);
        drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
#endif

        /* SSE2 */
        drflac__gIsSSE2Supported = drflac_has_sse2();

        /* SSE4.1 */
        drflac__gIsSSE41Supported = drflac_has_sse41();

        /* Initialized. */
        isCPUCapsInitialized = DRFLAC_TRUE;
    }
}
#else
static drflac_bool32 drflac__gIsNEONSupported  = DRFLAC_FALSE;

static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
{
#if defined(DRFLAC_SUPPORT_NEON)
    #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
        #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate NEON code we can assume support. */
        #else
            /* TODO: Runtime check. */
            return DRFLAC_FALSE;
        #endif
    #else
        return DRFLAC_FALSE;       /* NEON is only supported on ARM architectures. */
    #endif
#else
    return DRFLAC_FALSE;           /* No compiler support. */
#endif
}

DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
{
    drflac__gIsNEONSupported = drflac__has_neon();

#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
    drflac__gIsLZCNTSupported = DRFLAC_TRUE;
#endif
}
#endif

/* Endian Management */
static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
{
#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
    return DRFLAC_TRUE;
#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
    return DRFLAC_TRUE;
#else
    int n = 1;
    return (*(char*)&n) == 1;
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
{
#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_ushort(n);
    #elif defined(__GNUC__) || defined(__clang__)
        return __builtin_bswap16(n);
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap16(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    return ((n & 0xFF00) >> 8) |
           ((n & 0x00FF) << 8);
#endif
}

static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
{
#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_ulong(n);
    #elif defined(__GNUC__) || defined(__clang__)
        #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT)   /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
            /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
            drflac_uint32 r;
            __asm__ __volatile__ (
            #if defined(DRFLAC_64BIT)
                "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
            #else
                "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
            #endif
            );
            return r;
        #else
            return __builtin_bswap32(n);
        #endif
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap32(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    return ((n & 0xFF000000) >> 24) |
           ((n & 0x00FF0000) >>  8) |
           ((n & 0x0000FF00) <<  8) |
           ((n & 0x000000FF) << 24);
#endif
}

static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
{
#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
    #if defined(_MSC_VER) && !defined(__clang__)
        return _byteswap_uint64(n);
    #elif defined(__GNUC__) || defined(__clang__)
        return __builtin_bswap64(n);
    #elif defined(__WATCOMC__) && defined(__386__)
        return _watcom_bswap64(n);
    #else
        #error "This compiler does not support the byte swap intrinsic."
    #endif
#else
    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
    return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
           ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
           ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
           ((n & ((drflac_uint64)0x000000FF << 32)) >>  8) |
           ((n & ((drflac_uint64)0xFF000000      )) <<  8) |
           ((n & ((drflac_uint64)0x00FF0000      )) << 24) |
           ((n & ((drflac_uint64)0x0000FF00      )) << 40) |
           ((n & ((drflac_uint64)0x000000FF      )) << 56);
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint16(n);
    }

    return n;
}

static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint32(n);
    }

    return n;
}

static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
{
    /* Each byte is cast before shifting so that a set top bit is never shifted into the sign bit of an int. */
    const drflac_uint8* pNum = (const drflac_uint8*)pData;
    return ((drflac_uint32)pNum[0] << 24) | ((drflac_uint32)pNum[1] << 16) | ((drflac_uint32)pNum[2] << 8) | (drflac_uint32)pNum[3];
}

static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
{
    if (drflac__is_little_endian()) {
        return drflac__swap_endian_uint64(n);
    }

    return n;
}


static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
{
    if (!drflac__is_little_endian()) {
        return drflac__swap_endian_uint32(n);
    }

    return n;
}

static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
{
    /* Each byte is cast before shifting so that a set top bit is never shifted into the sign bit of an int. */
    const drflac_uint8* pNum = (const drflac_uint8*)pData;
    return (drflac_uint32)pNum[0] | ((drflac_uint32)pNum[1] << 8) | ((drflac_uint32)pNum[2] << 16) | ((drflac_uint32)pNum[3] << 24);
}
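/*
Example (not part of dr_flac): a minimal, portable sketch of the endian helpers above, assuming only the C99 fixed-width
types. It shows the shift-and-mask byte swap used when no intrinsic is available, the runtime endianness probe, and a
byte-wise big-endian read that works on unaligned buffers. All ex_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Portable 32-bit byte swap using only shifts and masks.
static uint32_t ex_swap32(uint32_t n)
{
    return ((n & 0xFF000000u) >> 24) |
           ((n & 0x00FF0000u) >>  8) |
           ((n & 0x0000FF00u) <<  8) |
           ((n & 0x000000FFu) << 24);
}

// Runtime endianness probe: inspects the first byte of a known integer.
static int ex_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char*)&probe == 1;
}

// Reads a big-endian 32-bit value one byte at a time, so alignment does not matter.
static uint32_t ex_read_be32(const void* data)
{
    const uint8_t* p = (const uint8_t*)data;
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Converts a big-endian value that was loaded as a native integer.
static uint32_t ex_be2host32(uint32_t n)
{
    return ex_is_little_endian() ? ex_swap32(n) : n;
}
```

The byte-wise read gives the same result on any host, which is why it needs no endianness check at all.
*/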

/* Converts a synchsafe integer (7 significant bits per byte, as used by ID3v2) into a regular 28-bit value. */
static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
{
    drflac_uint32 result = 0;
    result |= (n & 0x7F000000) >> 3;
    result |= (n & 0x007F0000) >> 2;
    result |= (n & 0x00007F00) >> 1;
    result |= (n & 0x0000007F) >> 0;

    return result;
}
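/*
Example (not part of dr_flac): a small sketch of the synchsafe decoding performed by drflac__unsynchsafe_32(). Each byte
contributes its low 7 bits, so the four groups are packed together by shifting them right by 3, 2, 1 and 0 bits. The name
ex_unsynchsafe32 and the precondition check are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Packs four 7-bit groups (one per byte, top bit clear) into a 28-bit value.
static uint32_t ex_unsynchsafe32(uint32_t n)
{
    assert((n & 0x80808080u) == 0);   // A synchsafe integer never sets bit 7 of any byte.
    return ((n & 0x7F000000u) >> 3) |
           ((n & 0x007F0000u) >> 2) |
           ((n & 0x00007F00u) >> 1) |
            (n & 0x0000007Fu);
}
```

For instance, the bytes 00 00 02 01 carry the groups 2 and 1, which combine to (2 << 7) | 1 = 257.
*/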

/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
static drflac_uint8 drflac__crc8_table[] = {
    0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
    0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
    0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
    0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
    0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
    0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
    0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
    0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
    0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
    0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
    0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
    0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
    0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
    0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
    0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
    0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
};

static drflac_uint16 drflac__crc16_table[] = {
    0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
    0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
    0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
    0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
    0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
    0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
    0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
    0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
    0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
    0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
    0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
    0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
    0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
    0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
    0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
    0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
    0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
    0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
    0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
    0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
    0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
    0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
    0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
    0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
    0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
    0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
    0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
    0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
    0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
    0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
    0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
    0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
};
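/*
Example (not part of dr_flac): a sketch of how lookup tables like the two above can be generated, assuming the MSB-first
polynomials 0x07 (CRC-8) and 0x8005 (CRC-16) that appear in the disabled reference implementations below. Each entry is
the result of pushing one byte value through eight shift-and-xor steps. The ex_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Fills a 256-entry table for an 8-bit, MSB-first CRC polynomial.
static void ex_make_crc8_table(uint8_t table[256], uint8_t poly)
{
    int i, bit;
    assert(table != 0);
    for (i = 0; i < 256; ++i) {
        uint8_t crc = (uint8_t)i;
        for (bit = 0; bit < 8; ++bit) {
            crc = (uint8_t)((crc & 0x80) ? ((crc << 1) ^ poly) : (crc << 1));
        }
        table[i] = crc;
    }
}

// Fills a 256-entry table for a 16-bit, MSB-first CRC polynomial.
static void ex_make_crc16_table(uint16_t table[256], uint16_t poly)
{
    int i, bit;
    assert(table != 0);
    for (i = 0; i < 256; ++i) {
        uint16_t crc = (uint16_t)(i << 8);   // The byte enters at the top of the register.
        for (bit = 0; bit < 8; ++bit) {
            crc = (uint16_t)((crc & 0x8000) ? ((crc << 1) ^ poly) : (crc << 1));
        }
        table[i] = crc;
    }
}
```

Comparing a few generated entries against the hard-coded tables is a cheap way to catch a transcription error.
*/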

static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
{
    return drflac__crc8_table[crc ^ data];
}

static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
#if 0
    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
    drflac_uint8 p = 0x07;
    for (int i = count-1; i >= 0; --i) {
        drflac_uint8 bit = (data & (1 << i)) >> i;
        if (crc & 0x80) {
            crc = ((crc << 1) | bit) ^ p;
        } else {
            crc = ((crc << 1) | bit);
        }
    }
    return crc;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8] = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 32);

    wholeBytes = count >> 3;
    leftoverBits = count - (wholeBytes*8);
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
    }
    return crc;
#endif
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
{
    return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
}
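/*
Example (not part of dr_flac): a sketch showing that the table-driven step in drflac_crc16_byte() matches a bit-by-bit
update, assuming polynomial 0x8005, MSB-first processing and an initial value of 0. The table here is built lazily from
the bitwise routine instead of being hard-coded, which is a simplification for illustration. The ex_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Bit-by-bit CRC-16 step for one byte, MSB first, polynomial 0x8005.
static uint16_t ex_crc16_byte_bitwise(uint16_t crc, uint8_t data)
{
    int bit;
    crc = (uint16_t)(crc ^ ((uint16_t)data << 8));
    for (bit = 0; bit < 8; ++bit) {
        crc = (uint16_t)((crc & 0x8000) ? ((crc << 1) ^ 0x8005) : (crc << 1));
    }
    return crc;
}

// Table-driven step with the same shape as drflac_crc16_byte(). The table is built on first use.
static uint16_t ex_crc16_byte_table(uint16_t crc, uint8_t data)
{
    static uint16_t table[256];
    static int isTableReady = 0;
    if (!isTableReady) {
        int i;
        for (i = 0; i < 256; ++i) {
            table[i] = ex_crc16_byte_bitwise(0, (uint8_t)i);
        }
        isTableReady = 1;
    }
    return (uint16_t)((crc << 8) ^ table[(uint8_t)(crc >> 8) ^ data]);
}

// Runs the table-driven step over a whole buffer, starting from the given CRC.
static uint16_t ex_crc16_buffer(uint16_t crc, const uint8_t* data, size_t size)
{
    size_t i;
    assert(data != NULL || size == 0);
    for (i = 0; i < size; ++i) {
        crc = ex_crc16_byte_table(crc, data[i]);
    }
    return crc;
}
```

The lazy table initialization is not thread-safe. A real decoder would more likely keep a constant table, as the code above does.
*/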

static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
{
#ifdef DRFLAC_64BIT
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
#endif
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));

    return crc;
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
{
    switch (byteCount)
    {
#ifdef DRFLAC_64BIT
    case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
    case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
    case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
    case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
#endif
    case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
    case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
    case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
    case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
    }

    return crc;
}

#if 0
static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
#if 0
    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
    drflac_uint16 p = 0x8005;
    for (int i = count-1; i >= 0; --i) {
        drflac_uint16 bit = (data & (1ULL << i)) >> i;
        if (crc & 0x8000) {
            crc = ((crc << 1) | bit) ^ p;
        } else {
            crc = ((crc << 1) | bit);
        }
    }

    return crc;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8] = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 64);

    wholeBytes = count >> 3;
    leftoverBits = count & 7;
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        default:
        case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
    }
    return crc;
#endif
#endif
}

static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
{
#ifdef DR_FLAC_NO_CRC
    (void)crc;
    (void)data;
    (void)count;
    return 0;
#else
    drflac_uint32 wholeBytes;
    drflac_uint32 leftoverBits;
    drflac_uint64 leftoverDataMask;

    static drflac_uint64 leftoverDataMaskTable[8] = {
        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    };

    DRFLAC_ASSERT(count <= 64);

    wholeBytes = count >> 3;
    leftoverBits = count & 7;
    leftoverDataMask = leftoverDataMaskTable[leftoverBits];

    switch (wholeBytes) {
        default:
        case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits)));    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
        case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
        case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
        case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
        case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000      ) << leftoverBits)) >> (24 + leftoverBits)));
        case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000      ) << leftoverBits)) >> (16 + leftoverBits)));
        case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00      ) << leftoverBits)) >> ( 8 + leftoverBits)));
        case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF      ) << leftoverBits)) >> ( 0 + leftoverBits)));
        case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
    }
    return crc;
#endif
}


static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
{
#ifdef DRFLAC_64BIT
    return drflac_crc16__64bit(crc, data, count);
#else
    return drflac_crc16__32bit(crc, data, count);
#endif
}
#endif

2217
#ifdef DRFLAC_64BIT
2218
#define drflac__be2host__cache_line drflac__be2host_64
2219
#else
2220
#define drflac__be2host__cache_line drflac__be2host_32
2221
#endif
2222

2223
/*
2224
BIT READING ATTEMPT #2
2225

2226
This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
2227
on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
2228
is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
2229
array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
2230
from onRead() is read into.
2231
*/
2232
#define DRFLAC_CACHE_L1_SIZE_BYTES(bs)                      (sizeof((bs)->cache))
2233
#define DRFLAC_CACHE_L1_SIZE_BITS(bs)                       (sizeof((bs)->cache)*8)
2234
#define DRFLAC_CACHE_L1_BITS_REMAINING(bs)                  (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
2235
#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount)           (~((~(drflac_cache_t)0) >> (_bitCount)))
2236
#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount)      (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
2237
#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount)               (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
2238
#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount)     (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >>  DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
2239
#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
2240
#define DRFLAC_CACHE_L2_SIZE_BYTES(bs)                      (sizeof((bs)->cacheL2))
2241
#define DRFLAC_CACHE_L2_LINE_COUNT(bs)                      (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
2242
#define DRFLAC_CACHE_L2_LINES_REMAINING(bs)                 (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
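
/*
The select-and-shift idea behind the L1 macros can be illustrated in isolation. What follows is a minimal sketch and not
dr_flac code: it assumes a single 32-bit cache refilled directly from a byte buffer (no L2, no CRC tracking), and every name
prefixed with sketch_ is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified reader: one 32-bit cache refilled from a byte buffer.
typedef struct {
    const uint8_t* data;
    size_t size;
    size_t pos;             // Next byte to load.
    uint32_t cache;         // Unread bits sit at the most significant end.
    uint32_t consumedBits;  // Bits of the cache already handed out (0..32).
} sketch_bs;

static void sketch_init(sketch_bs* bs, const uint8_t* data, size_t size)
{
    bs->data = data; bs->size = size; bs->pos = 0;
    bs->cache = 0;
    bs->consumedBits = 32;  // Start empty so the first read forces a reload.
}

// Loads up to 4 bytes, big-endian, into the top of the cache. Returns 0 when no data remains.
static int sketch_reload(sketch_bs* bs)
{
    uint32_t loaded = 0;
    bs->cache = 0;
    while (loaded < 4 && bs->pos < bs->size) {
        bs->cache |= (uint32_t)bs->data[bs->pos++] << (24 - 8*loaded);
        loaded += 1;
    }
    bs->consumedBits = 32 - 8*loaded;   // A short final load leaves fewer than 32 valid bits.
    return loaded > 0;
}

// Reads 1..32 bits, most significant bit first. Returns 0 when the data runs out.
static int sketch_read_bits(sketch_bs* bs, uint32_t bitCount, uint32_t* pOut)
{
    uint32_t result = 0;
    while (bitCount > 0) {
        uint32_t remaining = 32 - bs->consumedBits;
        uint32_t take;
        if (remaining == 0) {
            if (!sketch_reload(bs)) return 0;
            remaining = 32 - bs->consumedBits;
        }
        take = (bitCount < remaining) ? bitCount : remaining;
        if (take == 32) {
            result = bs->cache;     // Shifting a 32-bit value by 32 is undefined, so special-case it.
            bs->cache = 0;
        } else {
            result = (result << take) | (bs->cache >> (32 - take));
            bs->cache <<= take;     // Keep the next unread bit on the most significant bit.
        }
        bs->consumedBits += take;
        bitCount -= take;
    }
    *pOut = result;
    return 1;
}
```

Because unread bits always sit at the top of the cache, a read is one shift to extract and one shift to discard. A read that
straddles two cache words is assembled from the tail of one word and the head of the next.
*/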


#ifndef DR_FLAC_NO_CRC
static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
{
    bs->crc16 = 0;
    bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
}

static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
{
    if (bs->crc16CacheIgnoredBytes == 0) {
        bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
    } else {
        bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
        bs->crc16CacheIgnoredBytes = 0;
    }
}

static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
{
    /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
    DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);

    /*
    The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
    by the number of bits that have been consumed.
    */
    if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
        drflac__update_crc16(bs);
    } else {
        /* We only accumulate the consumed bits. */
        bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);

        /*
        The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
        so we can handle that later.
        */
        bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
    }

    return bs->crc16;
}
#endif

static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
{
    size_t bytesRead;
    size_t alignedL1LineCount;

    /* Fast path. Try loading straight from L2. */
    if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    }

    /*
    If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
    any left.
    */
    if (bs->unalignedByteCount > 0) {
        return DRFLAC_FALSE;   /* If we have any unaligned bytes it means there are no more aligned bytes left in the client. */
    }

    bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));

    bs->nextL2Line = 0;
    if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    }


    /*
    If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
    means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
    and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
    the size of the L1 so we'll need to seek backwards by any misaligned bytes.
    */
    alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);

    /* We need to keep track of any unaligned bytes for later use. */
    bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
    if (bs->unalignedByteCount > 0) {
        bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
    }

    if (alignedL1LineCount > 0) {
        size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
        size_t i;
        for (i = alignedL1LineCount; i > 0; --i) {
            bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
        }

        bs->nextL2Line = (drflac_uint32)offset;
        bs->cache = bs->cacheL2[bs->nextL2Line++];
        return DRFLAC_TRUE;
    } else {
        /* If we get into this branch it means we weren't able to load any L1-aligned data. */
        bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
        return DRFLAC_FALSE;
    }
}

static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
{
    size_t bytesRead;

#ifndef DR_FLAC_NO_CRC
    drflac__update_crc16(bs);
#endif

    /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
    if (drflac__reload_l1_cache_from_l2(bs)) {
        bs->cache = drflac__be2host__cache_line(bs->cache);
        bs->consumedBits = 0;
#ifndef DR_FLAC_NO_CRC
        bs->crc16Cache = bs->cache;
#endif
        return DRFLAC_TRUE;
    }

    /* Slow path. */

    /*
    If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
    few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
    data from the unaligned cache.
    */
    bytesRead = bs->unalignedByteCount;
    if (bytesRead == 0) {
        bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- The stream has been exhausted, so mark the bits as consumed. */
        return DRFLAC_FALSE;
    }

    DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
    bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;

    bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
    bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs));    /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
    bs->unalignedByteCount = 0;     /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */

#ifndef DR_FLAC_NO_CRC
    bs->crc16Cache = bs->cache >> bs->consumedBits;
    bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
#endif
    return DRFLAC_TRUE;
}

static void drflac__reset_cache(drflac_bs* bs)
{
    bs->nextL2Line   = DRFLAC_CACHE_L2_LINE_COUNT(bs);  /* <-- This clears the L2 cache. */
    bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- This clears the L1 cache. */
    bs->cache = 0;
    bs->unalignedByteCount = 0;                         /* <-- This clears the trailing unaligned bytes. */
    bs->unalignedCache = 0;

#ifndef DR_FLAC_NO_CRC
    bs->crc16Cache = 0;
    bs->crc16CacheIgnoredBytes = 0;
#endif
}


static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResultOut != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 32);

    if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
    }

    if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        /*
        If we want to load all 32 bits from a 32-bit cache we need to do it slightly differently because we can't do
        a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
        more optimal solution for this.
        */
#ifdef DRFLAC_64BIT
        *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
        bs->consumedBits += bitCount;
        bs->cache <<= bitCount;
#else
        if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
            *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
            bs->consumedBits += bitCount;
            bs->cache <<= bitCount;
        } else {
            /* Cannot shift by 32 bits, so need to do it differently. */
            *pResultOut = (drflac_uint32)bs->cache;
            bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
            bs->cache = 0;
        }
#endif

        return DRFLAC_TRUE;
    } else {
        /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
        drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        drflac_uint32 bitCountLo = bitCount - bitCountHi;
        drflac_uint32 resultHi;

        DRFLAC_ASSERT(bitCountHi > 0);
        DRFLAC_ASSERT(bitCountHi < 32);
        resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);

        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
        if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
            /* This happens when we get to the end of the stream. */
            return DRFLAC_FALSE;
        }

        *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
        bs->consumedBits += bitCountLo;
        bs->cache <<= bitCountLo;
        return DRFLAC_TRUE;
    }
}

static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 32);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    /* Do not attempt to shift by 32 as it's undefined. */
    if (bitCount < 32) {
        drflac_uint32 signbit;
        signbit = ((result >> (bitCount-1)) & 0x01);
        result |= (~signbit + 1) << bitCount;
    }

    *pResult = (drflac_int32)result;
    return DRFLAC_TRUE;
}
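
/*
The sign extension in drflac__read_int32() can be tried on its own. The helper below is a hypothetical standalone sketch of the
same trick, assuming two's complement and that only the low bitCount bits of the input are populated: (~signbit + 1) is all
ones when the sign bit is set and zero otherwise, so shifting it left by bitCount fills the bits above the field.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: interprets the low bitCount bits (1..32) of value as a two's complement number.
static int32_t sketch_sign_extend(uint32_t value, unsigned int bitCount)
{
    if (bitCount < 32) {    // Shifting a 32-bit value by 32 is undefined, so that case is skipped.
        uint32_t signbit = (value >> (bitCount - 1)) & 0x01;
        value |= (~signbit + 1) << bitCount;    // All ones above the field when the sign bit is set.
    }
    return (int32_t)value;
}
```

For example, the 3-bit pattern 111 becomes -1 and the 8-bit pattern 0x80 becomes -128, while 011 stays 3.
*/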

#ifdef DRFLAC_64BIT
static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
{
    drflac_uint32 resultHi;
    drflac_uint32 resultLo;

    DRFLAC_ASSERT(bitCount <= 64);
    DRFLAC_ASSERT(bitCount >  32);

    if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
        return DRFLAC_FALSE;
    }

    if (!drflac__read_uint32(bs, 32, &resultLo)) {
        return DRFLAC_FALSE;
    }

    *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
    return DRFLAC_TRUE;
}
#endif

/* Function below is unused, but leaving it here in case I need to quickly add it again. */
#if 0
static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
{
    drflac_uint64 result;
    drflac_uint64 signbit;

    DRFLAC_ASSERT(bitCount <= 64);

    if (!drflac__read_uint64(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    signbit = ((result >> (bitCount-1)) & 0x01);
    result |= (~signbit + 1) << bitCount;

    *pResultOut = (drflac_int64)result;
    return DRFLAC_TRUE;
}
#endif

static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 16);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_uint16)result;
    return DRFLAC_TRUE;
}

#if 0
static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
{
    drflac_int32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 16);

    if (!drflac__read_int32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_int16)result;
    return DRFLAC_TRUE;
}
#endif

static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 8);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_uint8)result;
    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
{
    drflac_int32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 8);

    if (!drflac__read_int32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    *pResult = (drflac_int8)result;
    return DRFLAC_TRUE;
}


static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
{
    if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        bs->consumedBits += (drflac_uint32)bitsToSeek;
        bs->cache <<= bitsToSeek;
        return DRFLAC_TRUE;
    } else {
        /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
        bitsToSeek       -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        bs->cache         = 0;

        /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
#ifdef DRFLAC_64BIT
        while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
            drflac_uint64 bin;
            if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
                return DRFLAC_FALSE;
            }
            bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
        }
#else
        while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
            drflac_uint32 bin;
            if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
                return DRFLAC_FALSE;
            }
            bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
        }
#endif

        /* Whole leftover bytes. */
        while (bitsToSeek >= 8) {
            drflac_uint8 bin;
            if (!drflac__read_uint8(bs, 8, &bin)) {
                return DRFLAC_FALSE;
            }
            bitsToSeek -= 8;
        }

        /* Leftover bits. */
        if (bitsToSeek > 0) {
            drflac_uint8 bin;
            if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
                return DRFLAC_FALSE;
            }
            bitsToSeek = 0; /* <-- Necessary for the assert below. */
        }

        DRFLAC_ASSERT(bitsToSeek == 0);
        return DRFLAC_TRUE;
    }
}


/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the frame header). It will also update the CRC-16. */
static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
{
    DRFLAC_ASSERT(bs != NULL);

    /*
    The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
    thing to do is align to the next byte.
    */
    if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
        return DRFLAC_FALSE;
    }

    for (;;) {
        drflac_uint8 hi;

#ifndef DR_FLAC_NO_CRC
        drflac__reset_crc16(bs);
#endif

        if (!drflac__read_uint8(bs, 8, &hi)) {
            return DRFLAC_FALSE;
        }

        if (hi == 0xFF) {
            drflac_uint8 lo;
            if (!drflac__read_uint8(bs, 6, &lo)) {
                return DRFLAC_FALSE;
            }

            if (lo == 0x3E) {
                return DRFLAC_TRUE;
            } else {
                if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
                    return DRFLAC_FALSE;
                }
            }
        }
    }

    /* Should never get here. */
    /*return DRFLAC_FALSE;*/
}


#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
#define DRFLAC_IMPLEMENT_CLZ_LZCNT
#endif
#if  defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
#define DRFLAC_IMPLEMENT_CLZ_MSVC
#endif
#if  defined(__WATCOMC__) && defined(__386__)
#define DRFLAC_IMPLEMENT_CLZ_WATCOM
#endif
#ifdef __MRC__
#include <intrinsics.h>
#define DRFLAC_IMPLEMENT_CLZ_MRC
#endif

static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
{
    drflac_uint32 n;
    static drflac_uint32 clz_table_4[] = {
        0,
        4,
        3, 3,
        2, 2, 2, 2,
        1, 1, 1, 1, 1, 1, 1, 1
    };

    if (x == 0) {
        return sizeof(x)*8;
    }

    n = clz_table_4[x >> (sizeof(x)*8 - 4)];
    if (n == 0) {
#ifdef DRFLAC_64BIT
        if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n  = 32; x <<= 32; }
        if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
        if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8;  x <<= 8;  }
        if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4;  x <<= 4;  }
#else
        if ((x & 0xFFFF0000) == 0) { n  = 16; x <<= 16; }
        if ((x & 0xFF000000) == 0) { n += 8;  x <<= 8;  }
        if ((x & 0xF0000000) == 0) { n += 4;  x <<= 4;  }
#endif
        n += clz_table_4[x >> (sizeof(x)*8 - 4)];
    }

    return n - 1;
}
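
/*
drflac__clz_software() combines a binary search over the word with a 4-bit lookup table. For comparison, the binary search
alone, carried down to a single bit, might look like the sketch below. This is a hypothetical standalone helper fixed at
32 bits and is not used by dr_flac. Like the function above, it returns the full width (32) for an input of zero, which is
the well-defined result the bit reader relies on.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: counts leading zero bits of a 32-bit value; returns 32 for zero.
static uint32_t sketch_clz32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0) {
        return 32;
    }
    // Each step halves the search window: if the top half is empty, count it and shift it out.
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}
```

The table in drflac__clz_software() replaces the last two steps with a single indexed load.
*/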

#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
{
    /* Fast compile time check for ARM. */
#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
    return DRFLAC_TRUE;
#elif defined(__MRC__)
    return DRFLAC_TRUE;
#else
    /* If the compiler itself does not support the intrinsic then we'll need to return false. */
    #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
        return drflac__gIsLZCNTSupported;
    #else
        return DRFLAC_FALSE;
    #endif
#endif
}

static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
{
    /*
    It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
    to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
    it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
    64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
    around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
    the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
    in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
    getting clobbered?

    I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
    assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.

    Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
    compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
    to know how to fix the inlined assembly for correctness' sake, however.
    */

#if defined(_MSC_VER) /*&& !defined(__clang__)*/    /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
    #ifdef DRFLAC_64BIT
        return (drflac_uint32)__lzcnt64(x);
    #else
        return (drflac_uint32)__lzcnt(x);
    #endif
#else
    #if defined(__GNUC__) || defined(__clang__)
        #if defined(DRFLAC_X64)
            {
                drflac_uint64 r;
                __asm__ __volatile__ (
                    "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
                );

                return (drflac_uint32)r;
            }
        #elif defined(DRFLAC_X86)
            {
                drflac_uint32 r;
                __asm__ __volatile__ (
                    "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
                );

                return r;
            }
        #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT)   /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
            {
                unsigned int r;
                __asm__ __volatile__ (
                #if defined(DRFLAC_64BIT)
                    "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
                #else
                    "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
                #endif
                );

                return r;
            }
        #else
            if (x == 0) {
                return sizeof(x)*8;
            }
            #ifdef DRFLAC_64BIT
                return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
            #else
                return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
            #endif
        #endif
    #else
        /* Unsupported compiler. */
        #error "This compiler does not support the lzcnt intrinsic."
    #endif
#endif
}
#endif

#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
#include <intrin.h> /* For BitScanReverse(). */

static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
{
    drflac_uint32 n;

    if (x == 0) {
        return sizeof(x)*8;
    }

#ifdef DRFLAC_64BIT
    _BitScanReverse64((unsigned long*)&n, x);
#else
    _BitScanReverse((unsigned long*)&n, x);
#endif
    return sizeof(x)*8 - n - 1;
}
#endif

#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
/* Use the LZCNT instruction (only available on some processors since the 2010s). */
#pragma aux drflac__clz_watcom_lzcnt = \
    "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
    parm [eax] \
    value [eax] \
    modify nomemory;
#else
/* Use the 386+-compatible implementation. */
#pragma aux drflac__clz_watcom = \
    "bsr eax, eax" \
    "xor eax, 31" \
    parm [eax] nomemory \
    value [eax] \
    modify exact [eax] nomemory;
#endif
#endif

static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
{
#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
    if (drflac__is_lzcnt_supported()) {
        return drflac__clz_lzcnt(x);
    } else
#endif
    {
#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
        return drflac__clz_msvc(x);
#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
        return drflac__clz_watcom_lzcnt(x);
#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
        return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
#elif defined(__MRC__)
        return __cntlzw(x);
#else
        return drflac__clz_software(x);
#endif
    }
}


static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
{
    drflac_uint32 zeroCounter = 0;
    drflac_uint32 setBitOffsetPlus1;

    while (bs->cache == 0) {
        zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
    }

    if (bs->cache == 1) {
        /* Not catching this would lead to undefined behaviour: a shift of a 32-bit number by 32 or more is undefined. */
        *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }

        return DRFLAC_TRUE;
    }

    setBitOffsetPlus1 = drflac__clz(bs->cache);
    setBitOffsetPlus1 += 1;

    if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        /* This happens when we get to the end of the stream. */
        return DRFLAC_FALSE;
    }

    bs->consumedBits += setBitOffsetPlus1;
    bs->cache <<= setBitOffsetPlus1;

    *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
    return DRFLAC_TRUE;
}



static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(offsetFromStart > 0);

    /*
    Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
    is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
    To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
    */
    if (offsetFromStart > 0x7FFFFFFF) {
        drflac_uint64 bytesRemaining = offsetFromStart;
        if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
            return DRFLAC_FALSE;
        }
        bytesRemaining -= 0x7FFFFFFF;

        while (bytesRemaining > 0x7FFFFFFF) {
            if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
            bytesRemaining -= 0x7FFFFFFF;
        }

        if (bytesRemaining > 0) {
            if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
        }
    } else {
        if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
            return DRFLAC_FALSE;
        }
    }

    /* The cache should be reset to force a reload of fresh data from the client. */
    drflac__reset_cache(bs);
    return DRFLAC_TRUE;
}
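
/*
The chunked seek in drflac__seek_to_byte() can be exercised without a real stream. The sketch below is hypothetical and not
part of dr_flac: sketch_stream stands in for the client, and sketch_on_seek() mimics a callback that only accepts a signed
32-bit offset. It folds the two branches above into one loop but performs the same sequence of seeks.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for a client stream whose seek callback takes a signed 32-bit offset.
typedef struct {
    uint64_t position;
    uint32_t callCount;
} sketch_stream;

static int sketch_on_seek(sketch_stream* s, int32_t offset, int fromStart)
{
    s->position = (fromStart ? 0 : s->position) + (uint64_t)offset;
    s->callCount += 1;
    return 1;
}

// Seeks to an absolute 64-bit offset using steps that each fit in a signed 32-bit integer.
static int sketch_seek_to_byte(sketch_stream* s, uint64_t offsetFromStart)
{
    const uint64_t maxStep = 0x7FFFFFFF;
    uint64_t first = (offsetFromStart > maxStep) ? maxStep : offsetFromStart;
    uint64_t remaining = offsetFromStart - first;

    // The first seek is absolute; every later one is relative to the current position.
    if (!sketch_on_seek(s, (int32_t)first, 1)) {
        return 0;
    }
    while (remaining > 0) {
        uint64_t step = (remaining > maxStep) ? maxStep : remaining;
        if (!sketch_on_seek(s, (int32_t)step, 0)) {
            return 0;
        }
        remaining -= step;
    }
    return 1;
}
```

Seeking to byte 5000000000 this way takes three callback invocations: two full steps of 0x7FFFFFFF and one shorter step.
*/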


static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
{
    drflac_uint8 crc;
    drflac_uint64 result;
    drflac_uint8 utf8[7] = {0};
    int byteCount;
    int i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pNumberOut != NULL);
    DRFLAC_ASSERT(pCRCOut != NULL);

    crc = *pCRCOut;

    if (!drflac__read_uint8(bs, 8, utf8)) {
        *pNumberOut = 0;
        return DRFLAC_AT_END;
    }
    crc = drflac_crc8(crc, utf8[0], 8);

    if ((utf8[0] & 0x80) == 0) {
        *pNumberOut = utf8[0];
        *pCRCOut = crc;
        return DRFLAC_SUCCESS;
    }

    /*byteCount = 1;*/
    if ((utf8[0] & 0xE0) == 0xC0) {
        byteCount = 2;
    } else if ((utf8[0] & 0xF0) == 0xE0) {
        byteCount = 3;
    } else if ((utf8[0] & 0xF8) == 0xF0) {
        byteCount = 4;
    } else if ((utf8[0] & 0xFC) == 0xF8) {
        byteCount = 5;
    } else if ((utf8[0] & 0xFE) == 0xFC) {
        byteCount = 6;
    } else if ((utf8[0] & 0xFF) == 0xFE) {
        byteCount = 7;
    } else {
        *pNumberOut = 0;
        return DRFLAC_CRC_MISMATCH;     /* Bad UTF-8 encoding. */
    }

    /* Read extra bytes. */
    DRFLAC_ASSERT(byteCount > 1);

    result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
    for (i = 1; i < byteCount; ++i) {
        if (!drflac__read_uint8(bs, 8, utf8 + i)) {
            *pNumberOut = 0;
            return DRFLAC_AT_END;
        }
        crc = drflac_crc8(crc, utf8[i], 8);

        result = (result << 6) | (utf8[i] & 0x3F);
    }

    *pNumberOut = result;
    *pCRCOut = crc;
    return DRFLAC_SUCCESS;
}
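
/*
The leading-byte rules used by drflac__read_utf8_coded_number() can be checked against known byte sequences. The sketch below
is a hypothetical standalone variant that decodes from a memory buffer instead of the bit streamer and leaves out the CRC-8
update. Like the function above, it does not validate that continuation bytes have the form 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: returns the number of bytes consumed, or 0 on a bad leading byte or truncated input.
static size_t sketch_decode_utf8_number(const uint8_t* p, size_t size, uint64_t* pOut)
{
    size_t byteCount;
    size_t i;
    uint64_t result;

    if (size == 0) {
        return 0;
    }
    if ((p[0] & 0x80) == 0) {   // Single byte: 0xxxxxxx.
        *pOut = p[0];
        return 1;
    }

    // The number of leading one bits gives the total length of the sequence.
    if      ((p[0] & 0xE0) == 0xC0) { byteCount = 2; }
    else if ((p[0] & 0xF0) == 0xE0) { byteCount = 3; }
    else if ((p[0] & 0xF8) == 0xF0) { byteCount = 4; }
    else if ((p[0] & 0xFC) == 0xF8) { byteCount = 5; }
    else if ((p[0] & 0xFE) == 0xFC) { byteCount = 6; }
    else if (p[0] == 0xFE)          { byteCount = 7; }
    else { return 0; }          // 10xxxxxx and 0xFF cannot start a sequence.

    if (size < byteCount) {
        return 0;
    }

    // Payload bits of the first byte, then 6 bits from each following byte.
    result = (uint64_t)(p[0] & (0xFF >> (byteCount + 1)));
    for (i = 1; i < byteCount; ++i) {
        result = (result << 6) | (uint64_t)(p[i] & 0x3F);
    }

    *pOut = result;
    return byteCount;
}
```

For instance, the bytes E2 82 AC decode to 0x20AC, and a 7-byte sequence starting with FE carries 36 payload bits.
*/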


static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
{
#if 1   /* Needs optimizing. */
    drflac_uint32 result = 0;
    while (x > 0) {
        result += 1;
        x >>= 1;
    }

    return result;
#endif
}

static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
{
    /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
    return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
}
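
/*
A few worked values make the rule in drflac__use_64_bit_prediction() concrete. Note that drflac__ilog2_u32() as written returns
the bit length of its argument (4 for an input of 8), not floor(log2(x)). The sketch below restates both functions under
hypothetical names so the arithmetic can be checked in isolation; the sample parameters are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: number of bits needed to represent x (0 for x == 0).
static uint32_t sketch_bit_length(uint32_t x)
{
    uint32_t n = 0;
    while (x > 0) {
        n += 1;
        x >>= 1;
    }
    return n;
}

// Hypothetical helper: nonzero when the bound on the prediction sum exceeds 32 bits.
static int sketch_needs_64_bit(uint32_t bitsPerSample, uint32_t order, uint32_t precision)
{
    return bitsPerSample + precision + sketch_bit_length(order) > 32;
}
```

With 16-bit samples, 12-bit coefficients and order 8 the bound is 16 + 12 + 4 = 32, so 32-bit arithmetic is enough. With 24-bit
samples, 15-bit coefficients and order 12 it is 24 + 15 + 4 = 43, so the 64-bit path is needed.
*/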


/*
The next two functions are responsible for calculating the prediction.

The 64-bit version is needed whenever the intermediate sum may not fit in 32 bits, which is what drflac__use_64_bit_prediction()
checks. It's safe to assume 64-bit arithmetic will be slower on 32-bit platforms, so the cheaper 32-bit version is used whenever
it's sufficient.
*/
#if defined(__clang__)
__attribute__((no_sanitize("signed-integer-overflow")))
#endif
static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
{
    drflac_int32 prediction = 0;

    DRFLAC_ASSERT(order <= 32);

    /* 32-bit version. */

    /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
    switch (order)
    {
    case 32: prediction += coefficients[31] * pDecodedSamples[-32];
    case 31: prediction += coefficients[30] * pDecodedSamples[-31];
    case 30: prediction += coefficients[29] * pDecodedSamples[-30];
    case 29: prediction += coefficients[28] * pDecodedSamples[-29];
    case 28: prediction += coefficients[27] * pDecodedSamples[-28];
    case 27: prediction += coefficients[26] * pDecodedSamples[-27];
    case 26: prediction += coefficients[25] * pDecodedSamples[-26];
    case 25: prediction += coefficients[24] * pDecodedSamples[-25];
    case 24: prediction += coefficients[23] * pDecodedSamples[-24];
    case 23: prediction += coefficients[22] * pDecodedSamples[-23];
    case 22: prediction += coefficients[21] * pDecodedSamples[-22];
    case 21: prediction += coefficients[20] * pDecodedSamples[-21];
    case 20: prediction += coefficients[19] * pDecodedSamples[-20];
    case 19: prediction += coefficients[18] * pDecodedSamples[-19];
    case 18: prediction += coefficients[17] * pDecodedSamples[-18];
    case 17: prediction += coefficients[16] * pDecodedSamples[-17];
    case 16: prediction += coefficients[15] * pDecodedSamples[-16];
    case 15: prediction += coefficients[14] * pDecodedSamples[-15];
    case 14: prediction += coefficients[13] * pDecodedSamples[-14];
    case 13: prediction += coefficients[12] * pDecodedSamples[-13];
    case 12: prediction += coefficients[11] * pDecodedSamples[-12];
    case 11: prediction += coefficients[10] * pDecodedSamples[-11];
    case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
    case  9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
    case  8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
    case  7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
    case  6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
    case  5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
    case  4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
    case  3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
    case  2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
    case  1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
    }

    return (drflac_int32)(prediction >> shift);
}

static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
{
    drflac_int64 prediction;

    DRFLAC_ASSERT(order <= 32);

    /* 64-bit version. */

    /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
#ifndef DRFLAC_64BIT
    if (order == 8)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
    }
    else if (order == 7)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
    }
    else if (order == 3)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
    }
    else if (order == 6)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
    }
    else if (order == 5)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
    }
    else if (order == 4)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
    }
    else if (order == 12)
    {
        prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
        prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
        prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
        prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
        prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
    }
    else if (order == 2)
    {
        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
    }
    else if (order == 1)
    {
        prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
    }
    else if (order == 10)
    {
        prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
        prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
        prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
    }
    else if (order == 9)
    {
        prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
        prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
    }
    else if (order == 11)
    {
        prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
        prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
        prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
        prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
        prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
        prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
        prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
        prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
        prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
        prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
        prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
    }
    else
    {
        int j;

        prediction = 0;
        for (j = 0; j < (int)order; ++j) {
            prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
        }
    }
#endif

    /*
    VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
    reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
    */
#ifdef DRFLAC_64BIT
    prediction = 0;
    switch (order)
    {
    case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
    case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
    case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
    case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
    case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
    case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
    case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
    case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
    case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
    case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
    case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
    case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
    case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
    case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
    case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
    case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
    case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
    case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
    case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
    case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
    case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
    case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
    case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
    case  9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
    case  8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
    case  7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
    case  6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
    case  5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
    case  4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
    case  3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
    case  2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
    case  1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
    }
#endif

    return (drflac_int32)(prediction >> shift);
}
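/*
Illustration only, not part of dr_flac. Both prediction functions above compute the same thing as the plain loop sketched
below: a dot product of the coefficients with the most recent decoded samples, followed by a right shift. The name
example_predict is hypothetical. The sketch always accumulates in 64 bits, and like the functions above it relies on right
shifts of negative values being arithmetic, which is implementation-defined in C but true of the common compilers.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper. history points one past the newest decoded sample, so history[-1] is the newest.
static int32_t example_predict(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* history)
{
    int64_t sum = 0;
    uint32_t j;

    assert(order <= 32 && shift >= 0);
    for (j = 0; j < order; ++j) {
        // coefficients[0] pairs with the newest sample, coefficients[1] with the one before it, and so on.
        sum += (int64_t)coefficients[j] * (int64_t)history[-(int32_t)j - 1];
    }

    return (int32_t)(sum >> shift);
}
```

For the samples 10, 20, 30 with coefficients {2, -1} and a shift of 0, the sketch predicts 2*30 - 20 = 40, which is the
linear extrapolation of the last two samples. A single product such as 16384 * 1048576 already exceeds 32 bits, which is
the situation the 64-bit version exists for.
*/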


#if 0
/*
Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
sake of readability and should only be used as a reference.
*/
static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    for (i = 0; i < count; ++i) {
        drflac_uint32 zeroCounter = 0;
        for (;;) {
            drflac_uint8 bit;
            if (!drflac__read_uint8(bs, 1, &bit)) {
                return DRFLAC_FALSE;
            }

            if (bit == 0) {
                zeroCounter += 1;
            } else {
                break;
            }
        }

        drflac_uint32 decodedRice;
        if (riceParam > 0) {
            if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
                return DRFLAC_FALSE;
            }
        } else {
            decodedRice = 0;
        }

        decodedRice |= (zeroCounter << riceParam);
        if ((decodedRice & 0x01)) {
            decodedRice = ~(decodedRice >> 1);
        } else {
            decodedRice =  (decodedRice >> 1);
        }


        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
        } else {
            pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
        }
    }

    return DRFLAC_TRUE;
}
#endif
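/*
Illustration only, not part of dr_flac. The Rice decoding step of the reference implementation above can be sketched on
its own with a tiny MSB-first bit reader over a byte buffer. The example_bits type and both function names are
hypothetical, and the LPC prediction step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical minimal MSB-first bit reader; dr_flac's real reader is drflac_bs.
typedef struct
{
    const uint8_t* data;
    size_t bitCount;   // Total number of valid bits in data.
    size_t bitPos;     // Index of the next bit to read.
} example_bits;

static int example_read_bit(example_bits* b, uint32_t* bitOut)
{
    if (b->bitPos >= b->bitCount) {
        return 0;   // Out of data.
    }
    *bitOut = (uint32_t)(b->data[b->bitPos >> 3] >> (7 - (b->bitPos & 7))) & 1u;
    b->bitPos += 1;
    return 1;
}

// Decodes one Rice-coded signed residual. Returns 1 on success, 0 when the data runs out.
static int example_decode_rice(example_bits* b, uint32_t riceParam, int32_t* residualOut)
{
    uint32_t zeroCounter = 0;
    uint32_t low = 0;
    uint32_t folded;
    uint32_t bit;
    uint32_t i;

    assert(riceParam < 32);

    // Unary part: count zero bits up to the terminating 1 bit.
    for (;;) {
        if (!example_read_bit(b, &bit)) return 0;
        if (bit != 0) break;
        zeroCounter += 1;
    }

    // Binary part: riceParam bits, most significant first.
    for (i = 0; i < riceParam; ++i) {
        if (!example_read_bit(b, &bit)) return 0;
        low = (low << 1) | bit;
    }

    // Undo the sign folding: even values map to 0, 1, 2, ... and odd values to -1, -2, -3, ...
    folded = (zeroCounter << riceParam) | low;
    *residualOut = (folded & 1u) ? (int32_t)~(folded >> 1) : (int32_t)(folded >> 1);
    return 1;
}
```

With riceParam = 2, the byte 0x54 (bits 0101 0100) holds two codes: "01" + "01" folds to 5 and decodes to -3, and
"01" + "00" folds to 4 and decodes to 2.
*/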

#if 0
static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
{
    drflac_uint32 zeroCounter = 0;
    drflac_uint32 decodedRice;

    for (;;) {
        drflac_uint8 bit;
        if (!drflac__read_uint8(bs, 1, &bit)) {
            return DRFLAC_FALSE;
        }

        if (bit == 0) {
            zeroCounter += 1;
        } else {
            break;
        }
    }

    if (riceParam > 0) {
        if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
            return DRFLAC_FALSE;
        }
    } else {
        decodedRice = 0;
    }

    *pZeroCounterOut = zeroCounter;
    *pRiceParamPartOut = decodedRice;
    return DRFLAC_TRUE;
}
#endif

#if 0
static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
{
    drflac_cache_t riceParamMask;
    drflac_uint32 zeroCounter;
    drflac_uint32 setBitOffsetPlus1;
    drflac_uint32 riceParamPart;
    drflac_uint32 riceLength;

    DRFLAC_ASSERT(riceParam > 0);   /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */

    riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);

    zeroCounter = 0;
    while (bs->cache == 0) {
        zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
        if (!drflac__reload_cache(bs)) {
            return DRFLAC_FALSE;
        }
    }

    setBitOffsetPlus1 = drflac__clz(bs->cache);
    zeroCounter += setBitOffsetPlus1;
    setBitOffsetPlus1 += 1;

    riceLength = setBitOffsetPlus1 + riceParam;
    if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));

        bs->consumedBits += riceLength;
        bs->cache <<= riceLength;
    } else {
        drflac_uint32 bitCountLo;
        drflac_cache_t resultHi;

        bs->consumedBits += riceLength;
        bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1);    /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */

        /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
        bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
        resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam);  /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */

        if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
#ifndef DR_FLAC_NO_CRC
            drflac__update_crc16(bs);
#endif
            bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
            bs->consumedBits = 0;
#ifndef DR_FLAC_NO_CRC
            bs->crc16Cache = bs->cache;
#endif
        } else {
            /* Slow path. We need to fetch more data from the client. */
            if (!drflac__reload_cache(bs)) {
                return DRFLAC_FALSE;
            }
            if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
                /* This happens when we get to end of stream */
                return DRFLAC_FALSE;
            }
        }

        riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));

        bs->consumedBits += bitCountLo;
        bs->cache <<= bitCountLo;
    }

    pZeroCounterOut[0] = zeroCounter;
    pRiceParamPartOut[0] = riceParamPart;

    return DRFLAC_TRUE;
}
#endif

static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
{
    drflac_uint32  riceParamPlus1 = riceParam + 1;
    /*drflac_cache_t riceParamPlus1Mask  = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
    drflac_uint32  riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
    drflac_uint32  riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;

    /*
    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
    no idea how this will work in practice...
    */
    drflac_cache_t bs_cache = bs->cache;
    drflac_uint32  bs_consumedBits = bs->consumedBits;

    /* The first thing to do is find the first set bit, which terminates the unary coded part. Most likely a bit will be set in the current cache line. */
    drflac_uint32  lzcount = drflac__clz(bs_cache);
    if (lzcount < sizeof(bs_cache)*8) {
        pZeroCounterOut[0] = lzcount;

        /*
        It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
        this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
        outside of this function at a higher level.
        */
    extract_rice_param_part:
        bs_cache       <<= lzcount;
        bs_consumedBits += lzcount;

        if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
            /* Getting here means the rice parameter part is wholly contained within the current cache line. */
            pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
            bs_cache       <<= riceParamPlus1;
            bs_consumedBits += riceParamPlus1;
        } else {
            drflac_uint32 riceParamPartHi;
            drflac_uint32 riceParamPartLo;
            drflac_uint32 riceParamPartLoBitCount;

            /*
            Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
            line, reload the cache, and then combine it with the head of the next cache line.
            */

            /* Grab the high part of the rice parameter part. */
            riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);

            /* Before reloading the cache we need to grab the size in bits of the low part. */
            riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
            DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);

            /* Now reload the cache. */
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = riceParamPartLoBitCount;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }
                if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
                    /* This happens when we get to end of stream */
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
            }

            /* We should now have enough information to construct the rice parameter part. */
            riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
            pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;

            bs_cache <<= riceParamPartLoBitCount;
        }
    } else {
        /*
        Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
        to drflac__clz() and we need to reload the cache.
        */
        drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
        for (;;) {
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = 0;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits;
            }

            lzcount = drflac__clz(bs_cache);
            zeroCounter += lzcount;

            if (lzcount < sizeof(bs_cache)*8) {
                break;
            }
        }

        pZeroCounterOut[0] = zeroCounter;
        goto extract_rice_param_part;
    }

    /* Make sure the cache is restored at the end of it all. */
    bs->cache = bs_cache;
    bs->consumedBits = bs_consumedBits;

    return DRFLAC_TRUE;
}
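/*
Illustration only, not part of dr_flac. The fast path above, where the whole Rice code sits inside the current cache
word, can be sketched with a 64-bit word and a portable leading-zero count. Both function names are hypothetical, the
cache is assumed to be 64 bits wide with unread bits left-aligned, and the straddling and reload paths are simply
reported as "not handled".

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical portable leading-zero count. Returns 64 when x is 0.
static uint32_t example_clz64(uint64_t x)
{
    uint32_t n = 0;
    if (x == 0) {
        return 64;
    }
    while ((x & 0x8000000000000000ULL) == 0) {
        n += 1;
        x <<= 1;
    }
    return n;
}

// Splits the leading Rice code out of cache. Returns 0 if the code is not wholly inside this word.
// As in the function above, partOut keeps the terminating set bit on top of the riceParam bits.
static int example_split_rice(uint64_t cache, uint32_t riceParam, uint32_t* zerosOut, uint32_t* partOut)
{
    uint32_t lz = example_clz64(cache);
    uint32_t width = riceParam + 1;   // Stop bit plus riceParam binary bits.

    assert(riceParam < 32);
    if (lz + width > 64) {
        return 0;   // The real decoder reloads the cache here.
    }

    *zerosOut = lz;
    *partOut  = (uint32_t)((cache << lz) >> (64 - width));
    return 1;
}
```

For a cache word starting with the bits 0001 0110 and riceParam = 2, the sketch reports 3 zeros and a part of 5 (binary
101). Masking off the stop bit, as the callers further down do with riceParamMask, leaves the two binary bits 01.
*/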

static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
{
    drflac_uint32  riceParamPlus1 = riceParam + 1;
    drflac_uint32  riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;

    /*
    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
    no idea how this will work in practice...
    */
    drflac_cache_t bs_cache = bs->cache;
    drflac_uint32  bs_consumedBits = bs->consumedBits;

    /* The first thing to do is find the first set bit, which terminates the unary coded part. Most likely a bit will be set in the current cache line. */
    drflac_uint32  lzcount = drflac__clz(bs_cache);
    if (lzcount < sizeof(bs_cache)*8) {
        /*
        It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. Since we're only
        seeking, this part is skipped rather than extracted. The set bit from the unary coded part is skipped together with it
        because it simplifies cache management.
        */
    extract_rice_param_part:
        bs_cache       <<= lzcount;
        bs_consumedBits += lzcount;

        if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
            /* Getting here means the rice parameter part is wholly contained within the current cache line. */
            bs_cache       <<= riceParamPlus1;
            bs_consumedBits += riceParamPlus1;
        } else {
            /*
            Getting here means the rice parameter part straddles the cache line. We need to skip the tail of the current cache
            line, reload the cache, and then skip the remaining bits at the head of the next cache line.
            */

            /* Before reloading the cache we need to grab the size in bits of the low part. */
            drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
            DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);

            /* Now reload the cache. */
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = riceParamPartLoBitCount;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
                    /* This happens when we get to end of stream */
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
            }

            bs_cache <<= riceParamPartLoBitCount;
        }
    } else {
        /*
        Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
        to drflac__clz() and we need to reload the cache.
        */
        for (;;) {
            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
            #ifndef DR_FLAC_NO_CRC
                drflac__update_crc16(bs);
            #endif
                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
                bs_consumedBits = 0;
            #ifndef DR_FLAC_NO_CRC
                bs->crc16Cache = bs_cache;
            #endif
            } else {
                /* Slow path. We need to fetch more data from the client. */
                if (!drflac__reload_cache(bs)) {
                    return DRFLAC_FALSE;
                }

                bs_cache = bs->cache;
                bs_consumedBits = bs->consumedBits;
            }

            lzcount = drflac__clz(bs_cache);
            if (lzcount < sizeof(bs_cache)*8) {
                break;
            }
        }

        goto extract_rice_param_part;
    }

    /* Make sure the cache is restored at the end of it all. */
    bs->cache = bs_cache;
    bs->consumedBits = bs_consumedBits;

    return DRFLAC_TRUE;
}


static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
    drflac_uint32 zeroCountPart0;
    drflac_uint32 riceParamPart0;
    drflac_uint32 riceParamMask;
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    (void)bitsPerSample;
    (void)order;
    (void)shift;
    (void)coefficients;

    riceParamMask  = (drflac_uint32)~((~0UL) << riceParam);

    i = 0;
    while (i < count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamPart0 &= riceParamMask;
        riceParamPart0 |= (zeroCountPart0 << riceParam);
        riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];

        pSamplesOut[i] = riceParamPart0;

        i += 1;
    }

    return DRFLAC_TRUE;
}
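
/*
Illustrative sketch only, not part of dr_flac's API. The "Rice reconstruction" steps in the loop above can be read as one
standalone helper. The name example_rice_reconstruct and the use of <stdint.h> types are assumptions made for this sketch.
The unary zero count supplies the high bits, the low riceParam bits of the remainder supply the low bits, and the final
zigzag step maps even values to non-negative results and odd values to negative results.
*/
#include <stdint.h>

static int32_t example_rice_reconstruct(uint32_t zeroCount, uint32_t remainderBits, uint8_t riceParam)
{
    /* Keeps only the low riceParam bits. Assumes riceParam < 32. */
    uint32_t mask  = (uint32_t)~((~(uint32_t)0) << riceParam);
    uint32_t value = (remainderBits & mask) | (zeroCount << riceParam);

    /* Zigzag decode: 0, 1, 2, 3, 4 -> 0, -1, 1, -2, 2. Subtracting the low bit from zero yields an all-ones mask for odd values. */
    value = (value >> 1) ^ (0u - (value & 1u));

    /* Assumes the usual two's complement conversion from uint32_t to int32_t. */
    return (int32_t)value;
}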

static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
    drflac_uint32 zeroCountPart0 = 0;
    drflac_uint32 zeroCountPart1 = 0;
    drflac_uint32 zeroCountPart2 = 0;
    drflac_uint32 zeroCountPart3 = 0;
    drflac_uint32 riceParamPart0 = 0;
    drflac_uint32 riceParamPart1 = 0;
    drflac_uint32 riceParamPart2 = 0;
    drflac_uint32 riceParamPart3 = 0;
    drflac_uint32 riceParamMask;
    const drflac_int32* pSamplesOutEnd;
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    if (lpcOrder == 0) {
        return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
    }

    riceParamMask  = (drflac_uint32)~((~0UL) << riceParam);
    pSamplesOutEnd = pSamplesOut + (count & ~3);

    if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
        while (pSamplesOut < pSamplesOutEnd) {
            /*
            Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
            against an array. Not sure why, but perhaps it's making more efficient use of registers?
            */
            if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
                return DRFLAC_FALSE;
            }

            riceParamPart0 &= riceParamMask;
            riceParamPart1 &= riceParamMask;
            riceParamPart2 &= riceParamMask;
            riceParamPart3 &= riceParamMask;

            riceParamPart0 |= (zeroCountPart0 << riceParam);
            riceParamPart1 |= (zeroCountPart1 << riceParam);
            riceParamPart2 |= (zeroCountPart2 << riceParam);
            riceParamPart3 |= (zeroCountPart3 << riceParam);

            riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
            riceParamPart1  = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
            riceParamPart2  = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
            riceParamPart3  = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];

            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
            pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
            pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
            pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);

            pSamplesOut += 4;
        }
    } else {
        while (pSamplesOut < pSamplesOutEnd) {
            if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
                return DRFLAC_FALSE;
            }

            riceParamPart0 &= riceParamMask;
            riceParamPart1 &= riceParamMask;
            riceParamPart2 &= riceParamMask;
            riceParamPart3 &= riceParamMask;

            riceParamPart0 |= (zeroCountPart0 << riceParam);
            riceParamPart1 |= (zeroCountPart1 << riceParam);
            riceParamPart2 |= (zeroCountPart2 << riceParam);
            riceParamPart3 |= (zeroCountPart3 << riceParam);

            riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
            riceParamPart1  = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
            riceParamPart2  = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
            riceParamPart3  = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];

            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
            pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
            pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
            pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);

            pSamplesOut += 4;
        }
    }

    i = (count & ~3);
    while (i < count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamPart0 &= riceParamMask;
        riceParamPart0 |= (zeroCountPart0 << riceParam);
        riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
        /*riceParamPart0  = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/

        /* Sample reconstruction. */
        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
        } else {
            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
        }

        i += 1;
        pSamplesOut += 1;
    }

    return DRFLAC_TRUE;
}
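
/*
Illustrative sketch only, not dr_flac's actual prediction routine. The sample reconstruction above adds each decoded
residual to an LPC prediction computed from the samples already written. Assuming the prediction is the usual dot product
of the coefficients with the preceding samples followed by a right shift, it could be modelled as below. The name
example_lpc_predict and the <stdint.h> types are assumptions made for this sketch.
*/
#include <stdint.h>

static int32_t example_lpc_predict(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* pDecodedSamples)
{
    int64_t prediction = 0;
    uint32_t j;

    /* coefficients[0] pairs with the most recent sample, pDecodedSamples[-1]. The caller must supply `order` prior samples. */
    for (j = 0; j < order; j += 1) {
        prediction += (int64_t)coefficients[j] * (int64_t)pDecodedSamples[-1 - (int32_t)j];
    }

    /* Assumes >> on a negative value is an arithmetic shift, which holds on mainstream compilers. */
    return (int32_t)(prediction >> shift);
}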

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
{
    __m128i r;

    /* Pack. */
    r = _mm_packs_epi32(a, b);

    /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
    r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));

    /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
    r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
    r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));

    return r;
}
#endif

#if defined(DRFLAC_SUPPORT_SSE41)
static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
{
    return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
}

static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
{
    __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
    __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
    return _mm_add_epi32(x64, x32);
}

static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
{
    return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
}

static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
{
    /*
    To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
    is shifted with zero bits, whereas the high side is shifted with sign bits.
    */
    __m128i lo = _mm_srli_epi64(x, count);
    __m128i hi = _mm_srai_epi32(x, count);

    hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0));    /* The high part needs to have the low part cleared. */

    return _mm_or_si128(lo, hi);
}
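
/*
Illustrative scalar model only, not part of dr_flac. It mirrors the lo/hi decomposition used by drflac__mm_srai_epi64()
above for a single 64-bit lane, under the same assumption that count < 32. The name example_srai64_lane and the <stdint.h>
types are assumptions made for this sketch.
*/
#include <stdint.h>

static uint64_t example_srai64_lane(uint64_t x, int count)
{
    /* Low side: logical shift of the whole lane, so zero bits enter at the top (as with _mm_srli_epi64). */
    uint64_t lo = x >> count;

    /* High side: arithmetic shift of the upper 32 bits only (as with _mm_srai_epi32). Assumes >> on a negative int32_t is arithmetic. */
    int32_t  hi32 = (int32_t)(uint32_t)(x >> 32);
    uint64_t hi   = ((uint64_t)(uint32_t)(hi32 >> count)) << 32;   /* Shifting back up leaves the low 32 bits cleared. */

    /* The two sides agree wherever both carry data, so OR-ing them only adds the sign fill. */
    return lo | hi;
}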

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples    = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts0 = 0;
    drflac_uint32 zeroCountParts1 = 0;
    drflac_uint32 zeroCountParts2 = 0;
    drflac_uint32 zeroCountParts3 = 0;
    drflac_uint32 riceParamParts0 = 0;
    drflac_uint32 riceParamParts1 = 0;
    drflac_uint32 riceParamParts2 = 0;
    drflac_uint32 riceParamParts3 = 0;
    __m128i coefficients128_0;
    __m128i coefficients128_4;
    __m128i coefficients128_8;
    __m128i samples128_0;
    __m128i samples128_4;
    __m128i samples128_8;
    __m128i riceParamMask128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = _mm_set1_epi32(riceParamMask);

    /* Pre-load. */
    coefficients128_0 = _mm_setzero_si128();
    coefficients128_4 = _mm_setzero_si128();
    coefficients128_8 = _mm_setzero_si128();

    samples128_0 = _mm_setzero_si128();
    samples128_4 = _mm_setzero_si128();
    samples128_8 = _mm_setzero_si128();

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
    so I think there's opportunity for this to be simplified.
    */
#if 1
    {
        int runningOrder = order;

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
            samples128_0      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 4));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
                case 2: coefficients128_0 = _mm_set_epi32(0, 0,               coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0,               0); break;
                case 1: coefficients128_0 = _mm_set_epi32(0, 0,               0,               coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0,               0,               0); break;
            }
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
            samples128_4      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 8));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
                case 2: coefficients128_4 = _mm_set_epi32(0, 0,               coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0,               0); break;
                case 1: coefficients128_4 = _mm_set_epi32(0, 0,               0,               coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0,               0,               0); break;
            }
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
            samples128_8      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 12));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
                case 2: coefficients128_8 = _mm_set_epi32(0, 0,                coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0,                0); break;
                case 1: coefficients128_8 = _mm_set_epi32(0, 0,                0,               coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0,                0,                0); break;
            }
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
    }
#else
    /* This causes strict-aliasing warnings with GCC. */
    switch (order)
    {
    case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
    case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
    case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
    case 9:  ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
    case 8:  ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
    case 7:  ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
    case 6:  ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
    case 5:  ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
    case 4:  ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
    case 3:  ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
    case 2:  ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
    case 1:  ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
    }
#endif

    /* For this version we are doing one sample at a time. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        __m128i prediction128;
        __m128i zeroCountPart128;
        __m128i riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
        riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);

        riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
        riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
        riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01)));  /* <-- SSE2 compatible */
        /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/   /* <-- Only supported from SSE4.1 and is slower in my testing... */

        if (order <= 4) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        } else if (order <= 8) {
            for (i = 0; i < 4; i += 1) {
                prediction128 =                              _mm_mullo_epi32(coefficients128_4, samples128_4);
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));

                /* Horizontal add and shift. */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        } else {
            for (i = 0; i < 4; i += 1) {
                prediction128 =                              _mm_mullo_epi32(coefficients128_8, samples128_8);
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));

                /* Horizontal add and shift. */
                prediction128 = drflac__mm_hadd_epi32(prediction128);
                prediction128 = _mm_srai_epi32(prediction128, shift);
                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

                samples128_8 = _mm_alignr_epi8(samples128_4,  samples128_8, 4);
                samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
            }
        }

        /* We store samples in groups of 4. */
        _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts0 &= riceParamMask;
        riceParamParts0 |= (zeroCountParts0 << riceParam);
        riceParamParts0  = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples    = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts0 = 0;
    drflac_uint32 zeroCountParts1 = 0;
    drflac_uint32 zeroCountParts2 = 0;
    drflac_uint32 zeroCountParts3 = 0;
    drflac_uint32 riceParamParts0 = 0;
    drflac_uint32 riceParamParts1 = 0;
    drflac_uint32 riceParamParts2 = 0;
    drflac_uint32 riceParamParts3 = 0;
    __m128i coefficients128_0;
    __m128i coefficients128_4;
    __m128i coefficients128_8;
    __m128i samples128_0;
    __m128i samples128_4;
    __m128i samples128_8;
    __m128i prediction128;
    __m128i riceParamMask128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    DRFLAC_ASSERT(order <= 12);

    riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = _mm_set1_epi32(riceParamMask);

    prediction128 = _mm_setzero_si128();

    /* Pre-load. */
    coefficients128_0  = _mm_setzero_si128();
    coefficients128_4  = _mm_setzero_si128();
    coefficients128_8  = _mm_setzero_si128();

    samples128_0  = _mm_setzero_si128();
    samples128_4  = _mm_setzero_si128();
    samples128_8  = _mm_setzero_si128();

#if 1
    {
        int runningOrder = order;

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
            samples128_0      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 4));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
                case 2: coefficients128_0 = _mm_set_epi32(0, 0,               coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0,               0); break;
                case 1: coefficients128_0 = _mm_set_epi32(0, 0,               0,               coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0,               0,               0); break;
            }
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
            samples128_4      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 8));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
                case 2: coefficients128_4 = _mm_set_epi32(0, 0,               coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0,               0); break;
                case 1: coefficients128_4 = _mm_set_epi32(0, 0,               0,               coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0,               0,               0); break;
            }
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
            samples128_8      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 12));
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
                case 2: coefficients128_8 = _mm_set_epi32(0, 0,                coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0,                0); break;
                case 1: coefficients128_8 = _mm_set_epi32(0, 0,                0,               coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0,                0,                0); break;
            }
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
        coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
    }
#else
    switch (order)
    {
    case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
    case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
    case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
    case 9:  ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
    case 8:  ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
    case 7:  ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
    case 6:  ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
    case 5:  ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
    case 4:  ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
    case 3:  ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
    case 2:  ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
    case 1:  ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
    }
#endif

    /* For this version we are doing one sample at a time. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        __m128i zeroCountPart128;
        __m128i riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
        riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);

        riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
        riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
        riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));

        for (i = 0; i < 4; i += 1) {
            prediction128 = _mm_xor_si128(prediction128, prediction128);    /* Reset to 0. */

            switch (order)
            {
            case 12:
            case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
            case 10:
            case  9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
            case  8:
            case  7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
            case  6:
            case  5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
            case  4:
            case  3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
            case  2:
            case  1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
            }

            /* Horizontal add and shift. */
            prediction128 = drflac__mm_hadd_epi64(prediction128);
            prediction128 = drflac__mm_srai_epi64(prediction128, shift);
            prediction128 = _mm_add_epi32(riceParamPart128, prediction128);

            /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
            samples128_8 = _mm_alignr_epi8(samples128_4,  samples128_8, 4);
            samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
            samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);

            /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
            riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
        }

        /* We store samples in groups of 4. */
        _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts0 &= riceParamMask;
        riceParamParts0 |= (zeroCountParts0 << riceParam);
        riceParamParts0  = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}
4313

static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
    if (lpcOrder > 0 && lpcOrder <= 12) {
        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
        } else {
            return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
        }
    } else {
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
{
    vst1q_s32(p+0, x.val[0]);
    vst1q_s32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
{
    vst1q_u32(p+0, x.val[0]);
    vst1q_u32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
{
    vst1q_f32(p+0, x.val[0]);
    vst1q_f32(p+4, x.val[1]);
}

static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
{
    vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
}

static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
{
    vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
}

static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
{
    drflac_int32 x[4];
    x[3] = x3;
    x[2] = x2;
    x[1] = x1;
    x[0] = x0;
    return vld1q_s32(x);
}

static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
{
    /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */

    /* Reference */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(a, 0),
        vgetq_lane_s32(b, 3),
        vgetq_lane_s32(b, 2),
        vgetq_lane_s32(b, 1)
    );*/

    return vextq_s32(b, a, 1);
}

static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
{
    /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */

    /* Reference. The lane order is the same as in the signed variant above. This is pseudocode: the helpers named here are the signed ones. */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(a, 0),
        vgetq_lane_s32(b, 3),
        vgetq_lane_s32(b, 2),
        vgetq_lane_s32(b, 1)
    );*/

    return vextq_u32(b, a, 1);
}
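/*
Illustration only, not part of dr_flac: a minimal scalar sketch of the lane movement performed by
drflac__valignrq_s32_1() and drflac__valignrq_u32_1(), assuming lane 0 is the lowest lane. The sketch_vec4 type and the
sketch_* functions are hypothetical names invented for this example.

```c
#include <stdint.h>

// Hypothetical four-lane vector used only for this sketch; x0 is lane 0.
typedef struct { int32_t x0, x1, x2, x3; } sketch_vec4;

static sketch_vec4 sketch_vec4_make(int32_t x0, int32_t x1, int32_t x2, int32_t x3)
{
    sketch_vec4 v;
    v.x0 = x0; v.x1 = x1; v.x2 = x2; v.x3 = x3;
    return v;
}

// Scalar model of drflac__valignrq_s32_1(a, b): lanes 1..3 of b slide down
// to lanes 0..2, and lane 0 of a enters at lane 3.
static sketch_vec4 sketch_alignr_1(sketch_vec4 a, sketch_vec4 b)
{
    return sketch_vec4_make(b.x1, b.x2, b.x3, a.x0);
}
```

In the decoders below, b is the sample history and lane 0 of a carries the newly decoded sample, so each call drops the
oldest sample and appends the newest one.
*/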

static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
{
    /* The sum must end up in position 0. */

    /* Reference */
    /*return vdup_n_s32(
        vgetq_lane_s32(x, 3) +
        vgetq_lane_s32(x, 2) +
        vgetq_lane_s32(x, 1) +
        vgetq_lane_s32(x, 0)
    );*/

    int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
    return vpadd_s32(r, r);
}

static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
{
    return vadd_s64(vget_high_s64(x), vget_low_s64(x));
}

static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
{
    /* Reference */
    /*return drflac__vdupq_n_s32x4(
        vgetq_lane_s32(x, 0),
        vgetq_lane_s32(x, 1),
        vgetq_lane_s32(x, 2),
        vgetq_lane_s32(x, 3)
    );*/

    return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
}

static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
{
    return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
}

static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
{
    return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples    = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts[4];
    drflac_uint32 riceParamParts[4];
    int32x4_t coefficients128_0;
    int32x4_t coefficients128_4;
    int32x4_t coefficients128_8;
    int32x4_t samples128_0;
    int32x4_t samples128_4;
    int32x4_t samples128_8;
    uint32x4_t riceParamMask128;
    int32x4_t riceParam128;
    int32x2_t shift64;
    uint32x4_t one128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = vdupq_n_u32(riceParamMask);

    riceParam128 = vdupq_n_s32(riceParam);
    shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s32(), which shifts right for negative counts. */
    one128 = vdupq_n_u32(1);

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
    so I think there's opportunity for this to be simplified.
    */
    {
        int runningOrder = order;
        drflac_int32 tempC[4] = {0, 0, 0, 0};
        drflac_int32 tempS[4] = {0, 0, 0, 0};

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = vld1q_s32(coefficients + 0);
            samples128_0      = vld1q_s32(pSamplesOut  - 4);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
                case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
                case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
            }

            coefficients128_0 = vld1q_s32(tempC);
            samples128_0      = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = vld1q_s32(coefficients + 4);
            samples128_4      = vld1q_s32(pSamplesOut  - 8);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
                case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
                case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
            }

            coefficients128_4 = vld1q_s32(tempC);
            samples128_4      = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = vld1q_s32(coefficients + 8);
            samples128_8      = vld1q_s32(pSamplesOut  - 12);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
                case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
                case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
            }

            coefficients128_8 = vld1q_s32(tempC);
            samples128_8      = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
        coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
        coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
    }

    /* Rice parts are read four at a time, but the prediction is still computed one sample at a time. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        int32x4_t prediction128;
        int32x2_t prediction64;
        uint32x4_t zeroCountPart128;
        uint32x4_t riceParamPart128;

        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = vld1q_u32(zeroCountParts);
        riceParamPart128 = vld1q_u32(riceParamParts);

        riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
        riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
        riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));

        if (order <= 4) {
            for (i = 0; i < 4; i += 1) {
                prediction128 = vmulq_s32(coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        } else if (order <= 8) {
            for (i = 0; i < 4; i += 1) {
                prediction128 =                vmulq_s32(coefficients128_4, samples128_4);
                prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        } else {
            for (i = 0; i < 4; i += 1) {
                prediction128 =                vmulq_s32(coefficients128_8, samples128_8);
                prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
                prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);

                /* Horizontal add and shift. */
                prediction64 = drflac__vhaddq_s32(prediction128);
                prediction64 = vshl_s32(prediction64, shift64);
                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));

                samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
                samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
            }
        }

        /* We store samples in groups of 4. */
        vst1q_s32(pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts[0] &= riceParamMask;
        riceParamParts[0] |= (zeroCountParts[0] << riceParam);
        riceParamParts[0]  = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    int i;
    drflac_uint32 riceParamMask;
    drflac_int32* pDecodedSamples    = pSamplesOut;
    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
    drflac_uint32 zeroCountParts[4];
    drflac_uint32 riceParamParts[4];
    int32x4_t coefficients128_0;
    int32x4_t coefficients128_4;
    int32x4_t coefficients128_8;
    int32x4_t samples128_0;
    int32x4_t samples128_4;
    int32x4_t samples128_8;
    uint32x4_t riceParamMask128;
    int32x4_t riceParam128;
    int64x1_t shift64;
    uint32x4_t one128;
    int64x2_t prediction128 = { 0 };
    uint32x4_t zeroCountPart128;
    uint32x4_t riceParamPart128;

    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};

    riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
    riceParamMask128 = vdupq_n_u32(riceParamMask);

    riceParam128 = vdupq_n_s32(riceParam);
    shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s64(), which shifts right for negative counts. */
    one128 = vdupq_n_u32(1);

    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
    so I think there's opportunity for this to be simplified.
    */
    {
        int runningOrder = order;
        drflac_int32 tempC[4] = {0, 0, 0, 0};
        drflac_int32 tempS[4] = {0, 0, 0, 0};

        /* 0 - 3. */
        if (runningOrder >= 4) {
            coefficients128_0 = vld1q_s32(coefficients + 0);
            samples128_0      = vld1q_s32(pSamplesOut  - 4);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
                case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
                case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
            }

            coefficients128_0 = vld1q_s32(tempC);
            samples128_0      = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 4 - 7 */
        if (runningOrder >= 4) {
            coefficients128_4 = vld1q_s32(coefficients + 4);
            samples128_4      = vld1q_s32(pSamplesOut  - 8);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
                case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
                case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
            }

            coefficients128_4 = vld1q_s32(tempC);
            samples128_4      = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* 8 - 11 */
        if (runningOrder == 4) {
            coefficients128_8 = vld1q_s32(coefficients + 8);
            samples128_8      = vld1q_s32(pSamplesOut  - 12);
            runningOrder -= 4;
        } else {
            switch (runningOrder) {
                case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
                case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
                case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
            }

            coefficients128_8 = vld1q_s32(tempC);
            samples128_8      = vld1q_s32(tempS);
            runningOrder = 0;
        }

        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
        coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
        coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
        coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
    }

    /* Rice parts are read four at a time, but the prediction is still computed one sample at a time. */
    while (pDecodedSamples < pDecodedSamplesEnd) {
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
            return DRFLAC_FALSE;
        }

        zeroCountPart128 = vld1q_u32(zeroCountParts);
        riceParamPart128 = vld1q_u32(riceParamParts);

        riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
        riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
        riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));

        for (i = 0; i < 4; i += 1) {
            int64x1_t prediction64;

            prediction128 = veorq_s64(prediction128, prediction128);    /* Reset to 0. */

            /* Every case intentionally falls through so that the terms for all lower orders are accumulated as well. */
            switch (order)
            {
            case 12:
            case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
            case 10:
            case  9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
            case  8:
            case  7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
            case  6:
            case  5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
            case  4:
            case  3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
            case  2:
            case  1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
            }

            /* Horizontal add and shift. */
            prediction64 = drflac__vhaddq_s64(prediction128);
            prediction64 = vshl_s64(prediction64, shift64);
            prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));

            /* Our value should be sitting in prediction64[0]. We need to combine this with our NEON samples. */
            samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
            samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
            samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);

            /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
            riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
        }

        /* We store samples in groups of 4. */
        vst1q_s32(pDecodedSamples, samples128_0);
        pDecodedSamples += 4;
    }

    /* Make sure we process the last few samples. */
    i = (count & ~3);
    while (i < (int)count) {
        /* Rice extraction. */
        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
            return DRFLAC_FALSE;
        }

        /* Rice reconstruction. */
        riceParamParts[0] &= riceParamMask;
        riceParamParts[0] |= (zeroCountParts[0] << riceParam);
        riceParamParts[0]  = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];

        /* Sample reconstruction. */
        pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);

        i += 1;
        pDecodedSamples += 1;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pSamplesOut != NULL);

    /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
    if (lpcOrder > 0 && lpcOrder <= 12) {
        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
        } else {
            return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
        }
    } else {
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    }
}
#endif

static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
#if defined(DRFLAC_SUPPORT_SSE41)
    if (drflac__gIsSSE41Supported) {
        return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported) {
        return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    } else
#endif
    {
        /* Scalar fallback. */
    #if 0
        return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    #else
        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
    #endif
    }
}
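/*
Illustration only, not part of dr_flac: a minimal scalar sketch of the "Rice reconstruction" step that the decoders
above apply to every residual. It assumes the unary part (zero count) and the low riceParam bits have already been read
from the bitstream. sketch_rice_reconstruct() is a hypothetical name. Where the decoders use the lookup table
t[] = {0x00000000, 0xFFFFFFFF}, this sketch computes the same mask arithmetically.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: rebuild a signed residual from the two parts of a Rice code.
// zeroCount is the unary-coded quotient; lowBits carries the riceParam-bit remainder.
static int32_t sketch_rice_reconstruct(uint32_t zeroCount, uint32_t lowBits, unsigned riceParam)
{
    uint32_t mask;
    uint32_t folded;

    assert(riceParam < 32);    // Keeps the shifts below well defined.

    mask   = (uint32_t)~(~(uint32_t)0 << riceParam);
    folded = (lowBits & mask) | (zeroCount << riceParam);

    // Undo the sign folding: even values map to folded/2, odd values to -(folded/2) - 1.
    return (int32_t)((folded >> 1) ^ (0u - (folded & 1u)));
}
```

The decoded residual is then added to the LPC prediction to obtain the output sample.
*/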

/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
{
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);

    for (i = 0; i < count; ++i) {
        if (!drflac__seek_rice_parts(bs, riceParam)) {
            return DRFLAC_FALSE;
        }
    }

    return DRFLAC_TRUE;
}

#if defined(__clang__)
__attribute__((no_sanitize("signed-integer-overflow")))
#endif
static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
{
    drflac_uint32 i;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(unencodedBitsPerSample <= 31);    /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
    DRFLAC_ASSERT(pSamplesOut != NULL);

    for (i = 0; i < count; ++i) {
        if (unencodedBitsPerSample > 0) {
            if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
                return DRFLAC_FALSE;
            }
        } else {
            pSamplesOut[i] = 0;
        }

        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
            pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
        } else {
            pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
        }
    }

    return DRFLAC_TRUE;
}


4897
/*
4898
Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
4899
when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
4900
<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4901
*/
4902
static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
4903
{
4904
    drflac_uint8 residualMethod;
4905
    drflac_uint8 partitionOrder;
    drflac_uint32 samplesInPartition;
    drflac_uint32 partitionsRemaining;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(blockSize != 0);
    DRFLAC_ASSERT(pDecodedSamples != NULL);       /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */

    if (!drflac__read_uint8(bs, 2, &residualMethod)) {
        return DRFLAC_FALSE;
    }

    if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
        return DRFLAC_FALSE;    /* Unknown or unsupported residual coding method. */
    }

    /* Ignore the first <order> values. */
    pDecodedSamples += lpcOrder;

    if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
        return DRFLAC_FALSE;
    }

    /*
    From the FLAC spec:
      The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
    */
    if (partitionOrder > 8) {
        return DRFLAC_FALSE;
    }

    /* Validation check. */
    if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
        return DRFLAC_FALSE;
    }

    samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
    partitionsRemaining = (1 << partitionOrder);
    for (;;) {
        drflac_uint8 riceParam = 0;
        if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
            if (!drflac__read_uint8(bs, 4, &riceParam)) {
                return DRFLAC_FALSE;
            }
            if (riceParam == 15) {
                riceParam = 0xFF;
            }
        } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
            if (!drflac__read_uint8(bs, 5, &riceParam)) {
                return DRFLAC_FALSE;
            }
            if (riceParam == 31) {
                riceParam = 0xFF;
            }
        }

        if (riceParam != 0xFF) {
            if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
                return DRFLAC_FALSE;
            }
        } else {
            drflac_uint8 unencodedBitsPerSample = 0;
            if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
                return DRFLAC_FALSE;
            }

            if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
                return DRFLAC_FALSE;
            }
        }

        pDecodedSamples += samplesInPartition;

        if (partitionsRemaining == 1) {
            break;
        }

        partitionsRemaining -= 1;

        if (partitionOrder != 0) {
            samplesInPartition = blockSize / (1 << partitionOrder);
        }
    }

    return DRFLAC_TRUE;
}
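/*
Illustrative sketch only, not part of dr_flac. The loop above splits the residual into (1 << partitionOrder) partitions,
where the first partition holds <order> fewer residuals than the others because the warm-up samples are stored
separately. The helper names below (sketch_residuals_in_partition, sketch_total_residuals) are hypothetical and exist
only to show the arithmetic; the caller is assumed to have already checked that (blockSize >> partitionOrder) is not
smaller than the predictor order.
*/

```c
#include <stdint.h>

/* Hypothetical helper: number of residuals stored in the partition at `partitionIndex`. */
static uint32_t sketch_residuals_in_partition(uint32_t blockSize, uint32_t partitionOrder, uint32_t predictorOrder, uint32_t partitionIndex)
{
    uint32_t samplesPerPartition = blockSize >> partitionOrder;

    if (partitionIndex == 0) {
        /* The first partition is shortened by the warm-up samples. */
        return samplesPerPartition - predictorOrder;
    }

    return samplesPerPartition;
}

/* Hypothetical helper: sums every partition. For a block that divides evenly this is blockSize - predictorOrder. */
static uint32_t sketch_total_residuals(uint32_t blockSize, uint32_t partitionOrder, uint32_t predictorOrder)
{
    uint32_t total = 0;
    uint32_t i;

    for (i = 0; i < (1u << partitionOrder); ++i) {
        total += sketch_residuals_in_partition(blockSize, partitionOrder, predictorOrder, i);
    }

    return total;
}
```

/* For example, a 4096-sample block with partition order 3 and predictor order 4 has one partition of 508 residuals followed by seven of 512. */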

/*
Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
when the decoder is sitting at the very start of the RESIDUAL block. Nothing is decoded into an output buffer; the
<blockSize> and <order> parameters are only used to determine how many residual values need to be skipped.
*/
static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
{
    drflac_uint8 residualMethod;
    drflac_uint8 partitionOrder;
    drflac_uint32 samplesInPartition;
    drflac_uint32 partitionsRemaining;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(blockSize != 0);

    if (!drflac__read_uint8(bs, 2, &residualMethod)) {
        return DRFLAC_FALSE;
    }

    if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
        return DRFLAC_FALSE;    /* Unknown or unsupported residual coding method. */
    }

    if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
        return DRFLAC_FALSE;
    }

    /*
    From the FLAC spec:
      The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
    */
    if (partitionOrder > 8) {
        return DRFLAC_FALSE;
    }

    /* Validation check. */
    if ((blockSize / (1 << partitionOrder)) <= order) {
        return DRFLAC_FALSE;
    }

    samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
    partitionsRemaining = (1 << partitionOrder);
    for (;;)
    {
        drflac_uint8 riceParam = 0;
        if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
            if (!drflac__read_uint8(bs, 4, &riceParam)) {
                return DRFLAC_FALSE;
            }
            if (riceParam == 15) {
                riceParam = 0xFF;
            }
        } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
            if (!drflac__read_uint8(bs, 5, &riceParam)) {
                return DRFLAC_FALSE;
            }
            if (riceParam == 31) {
                riceParam = 0xFF;
            }
        }

        if (riceParam != 0xFF) {
            if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
                return DRFLAC_FALSE;
            }
        } else {
            drflac_uint8 unencodedBitsPerSample = 0;
            if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
                return DRFLAC_FALSE;
            }

            if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
                return DRFLAC_FALSE;
            }
        }


        if (partitionsRemaining == 1) {
            break;
        }

        partitionsRemaining -= 1;
        samplesInPartition = blockSize / (1 << partitionOrder);
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
{
    drflac_uint32 i;

    /* Only a single sample needs to be decoded here. */
    drflac_int32 sample;
    if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
        return DRFLAC_FALSE;
    }

    /*
    We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
    we'll want to look at a more efficient way.
    */
    for (i = 0; i < blockSize; ++i) {
        pDecodedSamples[i] = sample;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
{
    drflac_uint32 i;

    for (i = 0; i < blockSize; ++i) {
        drflac_int32 sample;
        if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
            return DRFLAC_FALSE;
        }

        pDecodedSamples[i] = sample;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
{
    drflac_uint32 i;

    static drflac_int32 lpcCoefficientsTable[5][4] = {
        {0,  0, 0,  0},
        {1,  0, 0,  0},
        {2, -1, 0,  0},
        {3, -3, 1,  0},
        {4, -6, 4, -1}
    };

    /* Warm-up samples. The coefficients come from the fixed table above, indexed by the predictor order. */
    for (i = 0; i < lpcOrder; ++i) {
        drflac_int32 sample;
        if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
            return DRFLAC_FALSE;
        }

        pDecodedSamples[i] = sample;
    }

    if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
        return DRFLAC_FALSE;
    }

    return DRFLAC_TRUE;
}
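/*
Illustrative sketch only, not part of dr_flac. It shows what the rows of lpcCoefficientsTable mean: row N is the
order-N fixed predictor, and each sample is restored as (prediction from the previous N samples) + residual, with no
shift applied. The helper name sketch_restore_fixed and its plain residual array are assumptions made for the example;
the real decoder reads Rice-coded residuals from the bitstream instead.
*/

```c
#include <stdint.h>

/* Same values as lpcCoefficientsTable above. */
static const int32_t sketchFixedCoefficients[5][4] = {
    {0,  0, 0,  0},
    {1,  0, 0,  0},
    {2, -1, 0,  0},
    {3, -3, 1,  0},
    {4, -6, 4, -1}
};

/*
Hypothetical helper: samples[0..order-1] must already hold the warm-up samples. Fills samples[order..count-1] from
residuals[0..count-order-1]. `order` must be in the range 0..4.
*/
static void sketch_restore_fixed(int32_t* samples, const int32_t* residuals, uint32_t count, uint32_t order)
{
    uint32_t i;
    uint32_t j;

    for (i = order; i < count; ++i) {
        int64_t prediction = 0;

        /* Coefficient j applies to the sample (j + 1) positions back. */
        for (j = 0; j < order; ++j) {
            prediction += (int64_t)sketchFixedCoefficients[order][j] * samples[i - 1 - j];
        }

        samples[i] = (int32_t)(prediction + residuals[i - order]);
    }
}
```

/* With order 2 the prediction is 2*s[n-1] - s[n-2], so a straight ramp such as 10, 13, 16, ... is restored from all-zero residuals. */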

static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
{
    drflac_uint8 i;
    drflac_uint8 lpcPrecision;
    drflac_int8 lpcShift;
    drflac_int32 coefficients[32];

    /* Warm up samples. */
    for (i = 0; i < lpcOrder; ++i) {
        drflac_int32 sample;
        if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
            return DRFLAC_FALSE;
        }

        pDecodedSamples[i] = sample;
    }

    if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
        return DRFLAC_FALSE;
    }
    if (lpcPrecision == 15) {
        return DRFLAC_FALSE;    /* Invalid. */
    }
    lpcPrecision += 1;

    if (!drflac__read_int8(bs, 5, &lpcShift)) {
        return DRFLAC_FALSE;
    }

    /*
    From the FLAC specification:

        Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)

    Emphasis on the "signed two's-complement". In practice there do not seem to be any encoders or decoders that support negative shifts. For now dr_flac
    does not support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
    */
    if (lpcShift < 0) {
        return DRFLAC_FALSE;
    }

    DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
    for (i = 0; i < lpcOrder; ++i) {
        if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
            return DRFLAC_FALSE;
        }
    }

    if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
        return DRFLAC_FALSE;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
{
    const drflac_uint32 sampleRateTable[12]  = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
    const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1};   /* -1 = reserved. */

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(header != NULL);

    /* Keep looping until we find a valid sync code. */
    for (;;) {
        drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
        drflac_uint8 reserved = 0;
        drflac_uint8 blockingStrategy = 0;
        drflac_uint8 blockSize = 0;
        drflac_uint8 sampleRate = 0;
        drflac_uint8 channelAssignment = 0;
        drflac_uint8 bitsPerSample = 0;
        drflac_bool32 isVariableBlockSize;

        if (!drflac__find_and_seek_to_next_sync_code(bs)) {
            return DRFLAC_FALSE;
        }

        if (!drflac__read_uint8(bs, 1, &reserved)) {
            return DRFLAC_FALSE;
        }
        if (reserved == 1) {
            continue;
        }
        crc8 = drflac_crc8(crc8, reserved, 1);

        if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
            return DRFLAC_FALSE;
        }
        crc8 = drflac_crc8(crc8, blockingStrategy, 1);

        if (!drflac__read_uint8(bs, 4, &blockSize)) {
            return DRFLAC_FALSE;
        }
        if (blockSize == 0) {
            continue;
        }
        crc8 = drflac_crc8(crc8, blockSize, 4);

        if (!drflac__read_uint8(bs, 4, &sampleRate)) {
            return DRFLAC_FALSE;
        }
        crc8 = drflac_crc8(crc8, sampleRate, 4);

        if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
            return DRFLAC_FALSE;
        }
        if (channelAssignment > 10) {
            continue;
        }
        crc8 = drflac_crc8(crc8, channelAssignment, 4);

        if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
            return DRFLAC_FALSE;
        }
        if (bitsPerSample == 3 || bitsPerSample == 7) {
            continue;
        }
        crc8 = drflac_crc8(crc8, bitsPerSample, 3);


        if (!drflac__read_uint8(bs, 1, &reserved)) {
            return DRFLAC_FALSE;
        }
        if (reserved == 1) {
            continue;
        }
        crc8 = drflac_crc8(crc8, reserved, 1);


        isVariableBlockSize = blockingStrategy == 1;
        if (isVariableBlockSize) {
            drflac_uint64 pcmFrameNumber;
            drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
            if (result != DRFLAC_SUCCESS) {
                if (result == DRFLAC_AT_END) {
                    return DRFLAC_FALSE;
                } else {
                    continue;
                }
            }
            header->flacFrameNumber  = 0;
            header->pcmFrameNumber = pcmFrameNumber;
        } else {
            drflac_uint64 flacFrameNumber = 0;
            drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
            if (result != DRFLAC_SUCCESS) {
                if (result == DRFLAC_AT_END) {
                    return DRFLAC_FALSE;
                } else {
                    continue;
                }
            }
            header->flacFrameNumber  = (drflac_uint32)flacFrameNumber;   /* <-- Safe cast. */
            header->pcmFrameNumber = 0;
        }


        DRFLAC_ASSERT(blockSize > 0);
        if (blockSize == 1) {
            header->blockSizeInPCMFrames = 192;
        } else if (blockSize <= 5) {
            DRFLAC_ASSERT(blockSize >= 2);
            header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
        } else if (blockSize == 6) {
            if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
                return DRFLAC_FALSE;
            }
            crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
            header->blockSizeInPCMFrames += 1;
        } else if (blockSize == 7) {
            if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
                return DRFLAC_FALSE;
            }
            crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
            if (header->blockSizeInPCMFrames == 0xFFFF) {
                return DRFLAC_FALSE;    /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. */
            }
            header->blockSizeInPCMFrames += 1;
        } else {
            DRFLAC_ASSERT(blockSize >= 8);
            header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
        }


        if (sampleRate <= 11) {
            header->sampleRate = sampleRateTable[sampleRate];
        } else if (sampleRate == 12) {
            if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
                return DRFLAC_FALSE;
            }
            crc8 = drflac_crc8(crc8, header->sampleRate, 8);
            header->sampleRate *= 1000;
        } else if (sampleRate == 13) {
            if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
                return DRFLAC_FALSE;
            }
            crc8 = drflac_crc8(crc8, header->sampleRate, 16);
        } else if (sampleRate == 14) {
            if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
                return DRFLAC_FALSE;
            }
            crc8 = drflac_crc8(crc8, header->sampleRate, 16);
            header->sampleRate *= 10;
        } else {
            continue;  /* Invalid. Assume an invalid block. */
        }


        header->channelAssignment = channelAssignment;

        header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
        if (header->bitsPerSample == 0) {
            header->bitsPerSample = streaminfoBitsPerSample;
        }

        if (header->bitsPerSample != streaminfoBitsPerSample) {
            /* If this frame has a different bitsPerSample than STREAMINFO or the first frame, reject it. */
            return DRFLAC_FALSE;
        }

        if (!drflac__read_uint8(bs, 8, &header->crc8)) {
            return DRFLAC_FALSE;
        }

#ifndef DR_FLAC_NO_CRC
        if (header->crc8 != crc8) {
            continue;    /* CRC mismatch. Loop back to the top and find the next sync code. */
        }
#endif
        return DRFLAC_TRUE;
    }
}
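/*
Illustrative sketch only, not part of dr_flac. Two details of the header parser above are easy to check by hand:

  1) The CRC-8 starts at 0xCE because the 14-bit sync code 0x3FFE has already been consumed. Assuming the FLAC header
     CRC-8 uses the polynomial x^8 + x^2 + x + 1 (0x07) with a zero initial value, a bit-at-a-time CRC reproduces that
     constant.
  2) Block size codes 1..5 and 8..15 map directly to a block size, while codes 6 and 7 read it from the stream.

The helper names below (sketch_crc8_bits, sketch_block_size_from_code) are hypothetical. drflac_crc8() itself is
defined elsewhere in this file and may well be implemented differently (for example with a lookup table).
*/

```c
#include <stdint.h>

/* Hypothetical helper: feeds the low `count` bits of `data` (most significant first) into a CRC-8 with polynomial 0x07. */
static uint8_t sketch_crc8_bits(uint8_t crc, uint32_t data, uint32_t count)
{
    while (count > 0) {
        uint8_t inBit  = (uint8_t)((data >> (count - 1)) & 1);
        uint8_t topBit = (uint8_t)((crc >> 7) & 1);

        crc = (uint8_t)(crc << 1);
        if ((inBit ^ topBit) != 0) {
            crc ^= 0x07;
        }

        count -= 1;
    }

    return crc;
}

/* Hypothetical helper: returns 0 for code 0 (rejected above) and for codes 6 and 7 (size read from the stream). */
static uint32_t sketch_block_size_from_code(uint8_t code)
{
    if (code == 1) {
        return 192;
    }
    if (code >= 2 && code <= 5) {
        return 576u << (code - 2);
    }
    if (code >= 8 && code <= 15) {
        return 256u << (code - 8);
    }

    return 0;
}
```

/* sketch_crc8_bits(0, 0x3FFE, 14) evaluates to 0xCE, matching the initial value used in the loop above. */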

static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
{
    drflac_uint8 header;
    int type;

    if (!drflac__read_uint8(bs, 8, &header)) {
        return DRFLAC_FALSE;
    }

    /* First bit should always be 0. */
    if ((header & 0x80) != 0) {
        return DRFLAC_FALSE;
    }

    type = (header & 0x7E) >> 1;
    if (type == 0) {
        pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
    } else if (type == 1) {
        pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
    } else {
        if ((type & 0x20) != 0) {
            pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
            pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
        } else if ((type & 0x38) == 0x08) {
            /* 0b001xxx = fixed predictor. Checking all three upper bits keeps the reserved 0b01xxxx range out of this branch. */
            pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
            pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
            if (pSubframe->lpcOrder > 4) {
                pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
                pSubframe->lpcOrder = 0;
            }
        } else {
            pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
        }
    }

    if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
        return DRFLAC_FALSE;
    }

    /* Wasted bits per sample. */
    pSubframe->wastedBitsPerSample = 0;
    if ((header & 0x01) == 1) {
        unsigned int wastedBitsPerSample;
        if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
            return DRFLAC_FALSE;
        }
        pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
    }

    return DRFLAC_TRUE;
}
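/*
Illustrative sketch only, not part of dr_flac. It classifies the first byte of a subframe header using the type
ranges from the FLAC format specification: 0b000000 is constant, 0b000001 is verbatim, 0b001000..0b001100 is a fixed
predictor of order 0..4, 0b1xxxxx is LPC of order 1..32, and everything else is reserved. The enum and the function
name sketch_classify_subframe are hypothetical; the wasted-bits flag in bit 0 is ignored here.
*/

```c
#include <stdint.h>

typedef enum
{
    SKETCH_SUBFRAME_CONSTANT,
    SKETCH_SUBFRAME_VERBATIM,
    SKETCH_SUBFRAME_FIXED,
    SKETCH_SUBFRAME_LPC,
    SKETCH_SUBFRAME_RESERVED
} sketch_subframe_type;

/* Hypothetical helper: returns the subframe type and writes the predictor order (0 when not applicable) to *pOrder. */
static sketch_subframe_type sketch_classify_subframe(uint8_t header, uint8_t* pOrder)
{
    uint8_t type = (uint8_t)((header & 0x7E) >> 1);

    *pOrder = 0;

    if ((header & 0x80) != 0) {
        return SKETCH_SUBFRAME_RESERVED;    /* The first bit must be zero. */
    }
    if (type == 0) {
        return SKETCH_SUBFRAME_CONSTANT;
    }
    if (type == 1) {
        return SKETCH_SUBFRAME_VERBATIM;
    }
    if (type >= 0x20) {
        *pOrder = (uint8_t)((type & 0x1F) + 1);
        return SKETCH_SUBFRAME_LPC;
    }
    if (type >= 0x08 && type <= 0x0C) {
        *pOrder = (uint8_t)(type & 0x07);
        return SKETCH_SUBFRAME_FIXED;
    }

    return SKETCH_SUBFRAME_RESERVED;
}
```

/* For example, the header byte 0x7E has type 0b111111 and is classified as LPC with order 32. */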

static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
{
    drflac_subframe* pSubframe;
    drflac_uint32 subframeBitsPerSample;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(frame != NULL);

    pSubframe = frame->subframes + subframeIndex;
    if (!drflac__read_subframe_header(bs, pSubframe)) {
        return DRFLAC_FALSE;
    }

    /* Side channels require an extra bit per sample. Took a while to figure that one out... */
    subframeBitsPerSample = frame->header.bitsPerSample;
    if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
        subframeBitsPerSample += 1;
    } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
        subframeBitsPerSample += 1;
    }

    if (subframeBitsPerSample > 32) {
        /* libFLAC and ffmpeg reject 33-bit subframes as well */
        return DRFLAC_FALSE;
    }

    /* Need to handle wasted bits per sample. */
    if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
        return DRFLAC_FALSE;
    }
    subframeBitsPerSample -= pSubframe->wastedBitsPerSample;

    pSubframe->pSamplesS32 = pDecodedSamplesOut;

    /*
    Note that the return values of the drflac__decode_samples__*() calls below are not checked. A failed decode is left to surface
    later, for example as a CRC-16 mismatch in drflac__decode_flac_frame() when CRC checking is enabled.
    */
    switch (pSubframe->subframeType)
    {
        case DRFLAC_SUBFRAME_CONSTANT:
        {
            drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
        } break;

        case DRFLAC_SUBFRAME_VERBATIM:
        {
            drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
        } break;

        case DRFLAC_SUBFRAME_FIXED:
        {
            drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
        } break;

        case DRFLAC_SUBFRAME_LPC:
        {
            drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
        } break;

        default: return DRFLAC_FALSE;
    }

    return DRFLAC_TRUE;
}
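/*
Illustrative sketch only, not part of dr_flac. It isolates the bit-depth arithmetic from the function above: the side
channel of a stereo-decorrelated frame carries one extra bit, the result must not exceed 32 bits, and wasted bits are
then subtracted. The channel assignment codes 8, 9 and 10 are taken from the FLAC frame header layout and are assumed
to match the DRFLAC_CHANNEL_ASSIGNMENT_* constants; sketch_subframe_bits is a hypothetical name.
*/

```c
#include <stdint.h>

enum
{
    SKETCH_CHANNEL_ASSIGNMENT_LEFT_SIDE  = 8,
    SKETCH_CHANNEL_ASSIGNMENT_RIGHT_SIDE = 9,
    SKETCH_CHANNEL_ASSIGNMENT_MID_SIDE   = 10
};

/* Hypothetical helper: returns the number of bits stored per sample in the subframe, or 0 if the subframe would be rejected. */
static uint32_t sketch_subframe_bits(uint32_t frameBitsPerSample, int channelAssignment, int subframeIndex, uint32_t wastedBitsPerSample)
{
    uint32_t bits = frameBitsPerSample;

    /* The side channel is the second subframe for left/side and mid/side, and the first for right/side. */
    if ((channelAssignment == SKETCH_CHANNEL_ASSIGNMENT_LEFT_SIDE || channelAssignment == SKETCH_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
        bits += 1;
    } else if (channelAssignment == SKETCH_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
        bits += 1;
    }

    if (bits > 32) {
        return 0;   /* 33-bit subframes are rejected. */
    }
    if (wastedBitsPerSample >= bits) {
        return 0;   /* Nothing would be left to decode. */
    }

    return bits - wastedBitsPerSample;
}
```

/* For a 16-bit mid/side frame, the side subframe with 2 wasted bits is stored with 15 bits per sample. */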

static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
{
    drflac_subframe* pSubframe;
    drflac_uint32 subframeBitsPerSample;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(frame != NULL);

    pSubframe = frame->subframes + subframeIndex;
    if (!drflac__read_subframe_header(bs, pSubframe)) {
        return DRFLAC_FALSE;
    }

    /* Side channels require an extra bit per sample. Took a while to figure that one out... */
    subframeBitsPerSample = frame->header.bitsPerSample;
    if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
        subframeBitsPerSample += 1;
    } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
        subframeBitsPerSample += 1;
    }

    /* Need to handle wasted bits per sample. */
    if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
        return DRFLAC_FALSE;
    }
    subframeBitsPerSample -= pSubframe->wastedBitsPerSample;

    pSubframe->pSamplesS32 = NULL;

    switch (pSubframe->subframeType)
    {
        case DRFLAC_SUBFRAME_CONSTANT:
        {
            if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
                return DRFLAC_FALSE;
            }
        } break;

        case DRFLAC_SUBFRAME_VERBATIM:
        {
            unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
            if (!drflac__seek_bits(bs, bitsToSeek)) {
                return DRFLAC_FALSE;
            }
        } break;

        case DRFLAC_SUBFRAME_FIXED:
        {
            unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
            if (!drflac__seek_bits(bs, bitsToSeek)) {
                return DRFLAC_FALSE;
            }

            if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
                return DRFLAC_FALSE;
            }
        } break;

        case DRFLAC_SUBFRAME_LPC:
        {
            drflac_uint8 lpcPrecision;

            unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
            if (!drflac__seek_bits(bs, bitsToSeek)) {
                return DRFLAC_FALSE;
            }

            if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
                return DRFLAC_FALSE;
            }
            if (lpcPrecision == 15) {
                return DRFLAC_FALSE;    /* Invalid. */
            }
            lpcPrecision += 1;


            bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5;    /* +5 for shift. */
            if (!drflac__seek_bits(bs, bitsToSeek)) {
                return DRFLAC_FALSE;
            }

            if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
                return DRFLAC_FALSE;
            }
        } break;

        default: return DRFLAC_FALSE;
    }

    return DRFLAC_TRUE;
}

static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
{
    drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};

    DRFLAC_ASSERT(channelAssignment <= 10);
    return lookup[channelAssignment];
}

static drflac_result drflac__decode_flac_frame(drflac* pFlac)
{
    int channelCount;
    int i;
    drflac_uint8 paddingSizeInBits;
    drflac_uint16 desiredCRC16;
#ifndef DR_FLAC_NO_CRC
    drflac_uint16 actualCRC16;
#endif

    /* This function should be called while the stream is sitting on the first byte after the frame header. */
    DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));

    /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
    if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
        return DRFLAC_ERROR;
    }

    /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
    channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
    if (channelCount != (int)pFlac->channels) {
        return DRFLAC_ERROR;
    }

    for (i = 0; i < channelCount; ++i) {
        if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
            return DRFLAC_ERROR;
        }
    }

    paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
    if (paddingSizeInBits > 0) {
        drflac_uint8 padding = 0;
        if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
            return DRFLAC_AT_END;
        }
    }

#ifndef DR_FLAC_NO_CRC
    actualCRC16 = drflac__flush_crc16(&pFlac->bs);
#endif
    if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
        return DRFLAC_AT_END;
    }

#ifndef DR_FLAC_NO_CRC
    if (actualCRC16 != desiredCRC16) {
        return DRFLAC_CRC_MISMATCH;    /* CRC mismatch. */
    }
#endif

    pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;

    return DRFLAC_SUCCESS;
}

static drflac_result drflac__seek_flac_frame(drflac* pFlac)
{
    int channelCount;
    int i;
    drflac_uint16 desiredCRC16;
#ifndef DR_FLAC_NO_CRC
    drflac_uint16 actualCRC16;
#endif

    channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
    for (i = 0; i < channelCount; ++i) {
        if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
            return DRFLAC_ERROR;
        }
    }

    /* Padding. */
    if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
        return DRFLAC_ERROR;
    }

    /* CRC. */
#ifndef DR_FLAC_NO_CRC
    actualCRC16 = drflac__flush_crc16(&pFlac->bs);
#endif
    if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
        return DRFLAC_AT_END;
    }

#ifndef DR_FLAC_NO_CRC
    if (actualCRC16 != desiredCRC16) {
        return DRFLAC_CRC_MISMATCH;    /* CRC mismatch. */
    }
#endif

    return DRFLAC_SUCCESS;
}

static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
{
    DRFLAC_ASSERT(pFlac != NULL);

    for (;;) {
        drflac_result result;

        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
            return DRFLAC_FALSE;
        }

        result = drflac__decode_flac_frame(pFlac);
        if (result != DRFLAC_SUCCESS) {
            if (result == DRFLAC_CRC_MISMATCH) {
                continue;   /* CRC mismatch. Skip to the next frame. */
            } else {
                return DRFLAC_FALSE;
            }
        }

        return DRFLAC_TRUE;
    }
}

static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
{
    drflac_uint64 firstPCMFrame;
    drflac_uint64 lastPCMFrame;

    DRFLAC_ASSERT(pFlac != NULL);

    firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
    if (firstPCMFrame == 0) {
        firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
    }

    lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
    if (lastPCMFrame > 0) {
        lastPCMFrame -= 1; /* Needs to be zero based. */
    }

    if (pFirstPCMFrame) {
        *pFirstPCMFrame = firstPCMFrame;
    }
    if (pLastPCMFrame) {
        *pLastPCMFrame = lastPCMFrame;
    }
}
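/*
Illustrative sketch only, not part of dr_flac. It restates the range computation above on plain integers so the two
numbering schemes are easy to compare: a variable-block-size stream stores the index of the first PCM frame directly,
while a fixed-block-size stream stores a FLAC frame number that is multiplied by the stream's block size. The name
sketch_pcm_frame_range and its parameter list are hypothetical.
*/

```c
#include <stdint.h>

/* Hypothetical helper: writes the zero-based, inclusive PCM frame range covered by one FLAC frame. */
static void sketch_pcm_frame_range(uint64_t pcmFrameNumber, uint32_t flacFrameNumber, uint32_t maxBlockSize, uint32_t blockSize, uint64_t* pFirst, uint64_t* pLast)
{
    uint64_t first = pcmFrameNumber;
    uint64_t last;

    /* A PCM frame number of zero means the header carried a FLAC frame number instead (or this is the first frame). */
    if (first == 0) {
        first = (uint64_t)flacFrameNumber * maxBlockSize;
    }

    last = first + blockSize;
    if (last > 0) {
        last -= 1;  /* Inclusive, zero-based end. */
    }

    *pFirst = first;
    *pLast  = last;
}
```

/* For example, FLAC frame 3 of a stream with 4096-frame blocks covers PCM frames 12288 through 16383. */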

static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
{
    drflac_bool32 result;

    DRFLAC_ASSERT(pFlac != NULL);

    result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);

    DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
    pFlac->currentPCMFrame = 0;

    return result;
}

static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
{
    /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
    DRFLAC_ASSERT(pFlac != NULL);
    return drflac__seek_flac_frame(pFlac);
}


static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
{
    drflac_uint64 pcmFramesRead = 0;
    while (pcmFramesToSeek > 0) {
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
                pcmFramesRead   += pcmFramesToSeek;
                pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek;   /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
                pcmFramesToSeek  = 0;
            } else {
                pcmFramesRead   += pFlac->currentFLACFrame.pcmFramesRemaining;
                pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
            }
        }
    }

    pFlac->currentPCMFrame += pcmFramesRead;
    return pcmFramesRead;
}

static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_bool32 isMidFrame = DRFLAC_FALSE;
    drflac_uint64 runningPCMFrameCount;

    DRFLAC_ASSERT(pFlac != NULL);

    /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
    if (pcmFrameIndex >= pFlac->currentPCMFrame) {
        /* Seeking forward. Need to seek from the current position. */
        runningPCMFrameCount = pFlac->currentPCMFrame;

        /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
        if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                return DRFLAC_FALSE;
            }
        } else {
            isMidFrame = DRFLAC_TRUE;
        }
    } else {
        /* Seeking backwards. Need to seek from the start of the file. */
        runningPCMFrameCount = 0;

        /* Move back to the start. */
        if (!drflac__seek_to_first_frame(pFlac)) {
            return DRFLAC_FALSE;
        }

        /* Read the header of the first frame in preparation for sample-exact seeking below. */
        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
            return DRFLAC_FALSE;
        }
    }

    /*
    We need to find the frame that contains the target sample as quickly as possible. To do this, we iterate over each frame and inspect its
    header. If the header shows that the frame contains the sample, we do a full decode of that frame.
    */
    for (;;) {
        drflac_uint64 pcmFrameCountInThisFLACFrame;
        drflac_uint64 firstPCMFrameInFLACFrame = 0;
        drflac_uint64 lastPCMFrameInFLACFrame = 0;

        drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5833

5834
        pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
5835
        if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5836
            /*
5837
            The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5838
            it never existed and keep iterating.
5839
            */
5840
            drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5841

5842
            if (!isMidFrame) {
5843
                drflac_result result = drflac__decode_flac_frame(pFlac);
5844
                if (result == DRFLAC_SUCCESS) {
5845
                    /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
5846
                    return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
5847
                } else {
5848
                    if (result == DRFLAC_CRC_MISMATCH) {
5849
                        goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
5850
                    } else {
5851
                        return DRFLAC_FALSE;
5852
                    }
5853
                }
5854
            } else {
5855
                /* We started seeking mid-frame which means we need to skip the frame decoding part. */
5856
                return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5857
            }
5858
        } else {
5859
            /*
5860
            It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5861
            frame never existed and leave the running sample count untouched.
5862
            */
5863
            if (!isMidFrame) {
5864
                drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5865
                if (result == DRFLAC_SUCCESS) {
5866
                    runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5867
                } else {
5868
                    if (result == DRFLAC_CRC_MISMATCH) {
5869
                        goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
5870
                    } else {
5871
                        return DRFLAC_FALSE;
5872
                    }
5873
                }
5874
            } else {
5875
                /*
5876
                We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5877
                drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5878
                */
5879
                runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5880
                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5881
                isMidFrame = DRFLAC_FALSE;
5882
            }
5883

5884
            /* If we are seeking to the end of the file and we've just hit it, we're done. */
5885
            if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5886
                return DRFLAC_TRUE;
5887
            }
5888
        }
5889

5890
    next_iteration:
5891
        /* Grab the next frame in preparation for the next iteration. */
5892
        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5893
            return DRFLAC_FALSE;
5894
        }
5895
    }
5896
}


#if !defined(DR_FLAC_NO_CRC)
/*
We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
location.
*/
#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
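
/*
Illustrative sketch only; not part of dr_flac. It shows how an assumed average compression ratio might turn a distance in PCM frames into a
first guess at a byte offset, clamped to the search range. example_estimate_target_byte() and the EXAMPLE_RATIO_* macros are hypothetical
names, and the sketch uses integer arithmetic (6/10) rather than the floating point factor above so that the result is exact.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_RATIO_NUM 6   // Assumed ratio of 0.6, expressed as 6/10.
#define EXAMPLE_RATIO_DEN 10

// Estimates the byte offset of a PCM frame that lies pcmFrameDistance frames past rangeLo.
static uint64_t example_estimate_target_byte(uint64_t rangeLo, uint64_t rangeHi, uint64_t pcmFrameDistance, uint32_t channels, uint32_t bitsPerSample)
{
    uint64_t uncompressedBytes;
    uint64_t targetByte;

    assert(rangeLo <= rangeHi);

    // Size of the skipped audio as if it were stored uncompressed.
    uncompressedBytes = pcmFrameDistance * channels * bitsPerSample / 8;

    // Scale by the assumed compression ratio and clamp to the search range.
    targetByte = rangeLo + (uncompressedBytes * EXAMPLE_RATIO_NUM / EXAMPLE_RATIO_DEN);
    if (targetByte > rangeHi) {
        targetByte = rangeHi;
    }

    return targetByte;
}
```

For one second of 16-bit stereo audio at 44100 Hz the uncompressed size is 176400 bytes, so the guess lands 105840 bytes past rangeLo unless
rangeHi is smaller. A wrong guess is acceptable because the search refines it on later iterations.
*/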

static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
{
    DRFLAC_ASSERT(pFlac != NULL);
    DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
    DRFLAC_ASSERT(targetByte >= rangeLo);
    DRFLAC_ASSERT(targetByte <= rangeHi);

    *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;

    for (;;) {
        /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
        drflac_uint64 lastTargetByte = targetByte;

        /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
        if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
            /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
            if (targetByte == 0) {
                drflac__seek_to_first_frame(pFlac); /* Try to recover. */
                return DRFLAC_FALSE;
            }

            /* Halve the byte location and continue. */
            targetByte = rangeLo + ((rangeHi - rangeLo)/2);
            rangeHi = targetByte;
        } else {
            /* Getting here should mean that we have seeked to an appropriate byte. */

            /* Clear the details of the FLAC frame so we don't misreport data. */
            DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));

            /*
            Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
            CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
            so it needs to stay this way for now.
            */
#if 1
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                /* Halve the byte location and continue. */
                targetByte = rangeLo + ((rangeHi - rangeLo)/2);
                rangeHi = targetByte;
            } else {
                break;
            }
#else
            if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                /* Halve the byte location and continue. */
                targetByte = rangeLo + ((rangeHi - rangeLo)/2);
                rangeHi = targetByte;
            } else {
                break;
            }
#endif
        }

        /* We already tried this byte and there are no more to try, break out. */
        if (targetByte == lastTargetByte) {
            return DRFLAC_FALSE;
        }
    }

    /* The current PCM frame needs to be updated based on the frame we just seeked to. */
    drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);

    DRFLAC_ASSERT(targetByte <= rangeHi);

    *pLastSuccessfulSeekOffset = targetByte;
    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
{
    /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
#if 0
    if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
        /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
        if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
            return DRFLAC_FALSE;
        }
    }
#endif

    return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
}


static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
{
    /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */

    drflac_uint64 targetByte;
    drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
    drflac_uint64 pcmRangeHi = 0;
    drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
    drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
    drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;

    targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
    if (targetByte > byteRangeHi) {
        targetByte = byteRangeHi;
    }

    for (;;) {
        if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
            /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
            drflac_uint64 newPCMRangeLo;
            drflac_uint64 newPCMRangeHi;
            drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);

            /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
            if (pcmRangeLo == newPCMRangeLo) {
                if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
                    break;  /* Failed to seek to closest frame. */
                }

                if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
                    return DRFLAC_TRUE;
                } else {
                    break;  /* Failed to seek forward. */
                }
            }

            pcmRangeLo = newPCMRangeLo;
            pcmRangeHi = newPCMRangeHi;

            if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
                /* The target PCM frame is in this FLAC frame. */
                if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
                    return DRFLAC_TRUE;
                } else {
                    break;  /* Failed to seek to FLAC frame. */
                }
            } else {
                const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);

                if (pcmRangeLo > pcmFrameIndex) {
                    /* We seeked too far forward. We need to move our target byte backward and try again. */
                    byteRangeHi = lastSuccessfulSeekOffset;
                    if (byteRangeLo > byteRangeHi) {
                        byteRangeLo = byteRangeHi;
                    }

                    targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
                    if (targetByte < byteRangeLo) {
                        targetByte = byteRangeLo;
                    }
                } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
                    /* We didn't seek far enough. We need to move our target byte forward and try again. */

                    /* If we're close enough we can just seek forward. */
                    if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
                        if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
                            return DRFLAC_TRUE;
                        } else {
                            break;  /* Failed to seek to FLAC frame. */
                        }
                    } else {
                        byteRangeLo = lastSuccessfulSeekOffset;
                        if (byteRangeHi < byteRangeLo) {
                            byteRangeHi = byteRangeLo;
                        }

                        targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
                        if (targetByte > byteRangeHi) {
                            targetByte = byteRangeHi;
                        }

                        if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
                            closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
                        }
                    }
                }
            }
        } else {
            /* Getting here is really bad. We just recover as best we can by moving to the first frame in the stream, and then abort. */
            break;
        }
    }

    drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
    return DRFLAC_FALSE;
}

static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_uint64 byteRangeLo;
    drflac_uint64 byteRangeHi;
    drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;

    /* Our algorithm assumes the FLAC stream is sitting at the start. */
    if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
        return DRFLAC_FALSE;
    }

    /* If we're close enough to the start, just move to the start and seek forward. */
    if (pcmFrameIndex < seekForwardThreshold) {
        return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
    }

    /*
    Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
    the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
    */
    byteRangeLo = pFlac->firstFLACFramePosInBytes;
    byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);

    return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
}
#endif  /* !DR_FLAC_NO_CRC */

static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_uint32 iClosestSeekpoint = 0;
    drflac_bool32 isMidFrame = DRFLAC_FALSE;
    drflac_uint64 runningPCMFrameCount;
    drflac_uint32 iSeekpoint;


    DRFLAC_ASSERT(pFlac != NULL);

    if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
        return DRFLAC_FALSE;
    }

    /* Do not use the seek table if pcmFrameIndex is not covered by it. */
    if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
        return DRFLAC_FALSE;
    }

    for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
        if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
            break;
        }

        iClosestSeekpoint = iSeekpoint;
    }

    /* There have been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
    if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
        return DRFLAC_FALSE;
    }
    if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
        return DRFLAC_FALSE;
    }

#if !defined(DR_FLAC_NO_CRC)
    /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. */
    if (pFlac->totalPCMFrameCount > 0) {
        drflac_uint64 byteRangeLo;
        drflac_uint64 byteRangeHi;

        byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
        byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;

        /*
        If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
        value for byteRangeHi which will clamp it appropriately.

        Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
        have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
        */
        if (iClosestSeekpoint < pFlac->seekpointCount-1) {
            drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;

            /* Basic validation on the seekpoints to ensure they're usable. */
            if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
                return DRFLAC_FALSE;    /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
            }

            if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
                byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
            }
        }

        if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
            if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);

                if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
                    return DRFLAC_TRUE;
                }
            }
        }
    }
#endif  /* !DR_FLAC_NO_CRC */

    /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */

    /*
    If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
    from the seekpoint's first sample.
    */
    if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
        /* Optimized case. Just seek forward from where we are. */
        runningPCMFrameCount = pFlac->currentPCMFrame;

        /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
        if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                return DRFLAC_FALSE;
            }
        } else {
            isMidFrame = DRFLAC_TRUE;
        }
    } else {
        /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
        runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;

        if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
            return DRFLAC_FALSE;
        }

        /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
            return DRFLAC_FALSE;
        }
    }

    for (;;) {
        drflac_uint64 pcmFrameCountInThisFLACFrame;
        drflac_uint64 firstPCMFrameInFLACFrame = 0;
        drflac_uint64 lastPCMFrameInFLACFrame = 0;

        drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);

        pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
        if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
            /*
            The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
            it never existed and keep iterating.
            */
            drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;

            if (!isMidFrame) {
                drflac_result result = drflac__decode_flac_frame(pFlac);
                if (result == DRFLAC_SUCCESS) {
                    /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
                    return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
                } else {
                    if (result == DRFLAC_CRC_MISMATCH) {
                        goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
                    } else {
                        return DRFLAC_FALSE;
                    }
                }
            } else {
                /* We started seeking mid-frame which means we need to skip the frame decoding part. */
                return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
            }
        } else {
            /*
            It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
            frame never existed and leave the running sample count untouched.
            */
            if (!isMidFrame) {
                drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
                if (result == DRFLAC_SUCCESS) {
                    runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
                } else {
                    if (result == DRFLAC_CRC_MISMATCH) {
                        goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
                    } else {
                        return DRFLAC_FALSE;
                    }
                }
            } else {
                /*
                We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
                drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
                */
                runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
                isMidFrame = DRFLAC_FALSE;
            }

            /* If we are seeking to the end of the file and we've just hit it, we're done. */
            if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
                return DRFLAC_TRUE;
            }
        }

    next_iteration:
        /* Grab the next frame in preparation for the next iteration. */
        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
            return DRFLAC_FALSE;
        }
    }
}
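
/*
Illustrative sketch only; not part of dr_flac. It isolates the linear scan used above to pick the closest seekpoint: the last entry whose
first PCM frame is strictly before the target. The example_seekpoint type, example_find_closest_seekpoint() and example_table are
hypothetical stand-ins, and the validation of pcmFrameCount and byte offsets done by the real function is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t firstPCMFrame;    // First PCM frame covered by the seekpoint.
    uint64_t flacFrameOffset;  // Byte offset relative to the first FLAC frame.
    uint16_t pcmFrameCount;    // Number of PCM frames in the target FLAC frame.
} example_seekpoint;

// Returns the index of the last seekpoint that starts strictly before target,
// or -1 when the table is empty or its first entry is already past target.
static int example_find_closest_seekpoint(const example_seekpoint* pSeekpoints, size_t count, uint64_t target)
{
    size_t i;
    int iClosest = 0;

    if (pSeekpoints == NULL || count == 0 || pSeekpoints[0].firstPCMFrame > target) {
        return -1;
    }

    for (i = 0; i < count; ++i) {
        if (pSeekpoints[i].firstPCMFrame >= target) {
            break;  // Placeholder seekpoints (all bits set) also stop the scan here.
        }
        iClosest = (int)i;
    }

    return iClosest;
}

// A small table with a trailing placeholder entry.
static const example_seekpoint example_table[4] = {
    { 0,          0,     4096 },
    { 4096,       9000,  4096 },
    { 8192,       18500, 4096 },
    { UINT64_MAX, 0,     0    }
};
```

With example_table, a target of 5000 selects index 1. A target of exactly 4096 selects index 0 because the comparison is strict, which costs
one extra frame of forward seeking but keeps the starting point safely before the target.
*/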


#ifndef DR_FLAC_NO_OGG
typedef struct
{
    drflac_uint8 capturePattern[4];  /* Should be "OggS" */
    drflac_uint8 structureVersion;   /* Always 0. */
    drflac_uint8 headerType;
    drflac_uint64 granulePosition;
    drflac_uint32 serialNumber;
    drflac_uint32 sequenceNumber;
    drflac_uint32 checksum;
    drflac_uint8 segmentCount;
    drflac_uint8 segmentTable[255];
} drflac_ogg_page_header;
#endif

typedef struct
{
    drflac_read_proc onRead;
    drflac_seek_proc onSeek;
    drflac_meta_proc onMeta;
    drflac_container container;
    void* pUserData;
    void* pUserDataMD;
    drflac_uint32 sampleRate;
    drflac_uint8  channels;
    drflac_uint8  bitsPerSample;
    drflac_uint64 totalPCMFrameCount;
    drflac_uint16 maxBlockSizeInPCMFrames;
    drflac_uint64 runningFilePos;
    drflac_bool32 hasStreamInfoBlock;
    drflac_bool32 hasMetadataBlocks;
    drflac_bs bs;                           /* <-- A bit streamer is required for loading data during initialization. */
    drflac_frame_header firstFrameHeader;   /* <-- The header of the first frame that was read during relaxed initialization. Only set if there is no STREAMINFO block. */

#ifndef DR_FLAC_NO_OGG
    drflac_uint32 oggSerial;
    drflac_uint64 oggFirstBytePos;
    drflac_ogg_page_header oggBosHeader;
#endif
} drflac_init_info;

static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
{
    blockHeader = drflac__be2host_32(blockHeader);
    *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
    *blockType   = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
    *blockSize   =                (blockHeader & 0x00FFFFFFUL);
}
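
/*
Illustrative sketch only; not part of dr_flac. It decodes the same 32-bit metadata block header layout as the function above (1 bit
last-block flag, 7 bits block type, 24 bits block size), but assembles the big-endian value byte by byte so that no byte swap helper is
needed. example_block_header and example_decode_block_header() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  isLastBlock;  // 1 when this is the final metadata block.
    uint8_t  blockType;    // 0 = STREAMINFO, 4 = VORBIS_COMMENT, and so on.
    uint32_t blockSize;    // Size in bytes of the block body that follows.
} example_block_header;

// Decodes the four raw header bytes exactly as they appear in the stream.
static example_block_header example_decode_block_header(const uint8_t* bytes)
{
    example_block_header h;
    uint32_t v;

    assert(bytes != NULL);

    // Big-endian assembly; independent of host byte order.
    v = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) | ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];

    h.isLastBlock = (uint8_t)((v & 0x80000000UL) >> 31);
    h.blockType   = (uint8_t)((v & 0x7F000000UL) >> 24);
    h.blockSize   =           (v & 0x00FFFFFFUL);

    return h;
}
```

For the bytes 0x80 0x00 0x00 0x22 this yields a last-block flag of 1, block type 0 (STREAMINFO) and a block size of 34 bytes.
*/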

static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
{
    drflac_uint32 blockHeader;

    *blockSize = 0;
    if (onRead(pUserData, &blockHeader, 4) != 4) {
        return DRFLAC_FALSE;
    }

    drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
    return DRFLAC_TRUE;
}
6357

6358
static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
{
    drflac_uint32 blockSizes;
    drflac_uint64 frameSizes = 0;
    drflac_uint64 importantProps;
    drflac_uint8 md5[16];

    /* min/max block size. */
    if (onRead(pUserData, &blockSizes, 4) != 4) {
        return DRFLAC_FALSE;
    }

    /* min/max frame size. */
    if (onRead(pUserData, &frameSizes, 6) != 6) {
        return DRFLAC_FALSE;
    }

    /* Sample rate, channels, bits per sample and total sample count. */
    if (onRead(pUserData, &importantProps, 8) != 8) {
        return DRFLAC_FALSE;
    }

    /* MD5 */
    if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
        return DRFLAC_FALSE;
    }

    blockSizes     = drflac__be2host_32(blockSizes);
    frameSizes     = drflac__be2host_64(frameSizes);
    importantProps = drflac__be2host_64(importantProps);

    pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
    pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
    pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes     &  (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
    pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes     &  (((drflac_uint64)0x00FFFFFF << 16) <<  0)) >> 16);
    pStreamInfo->sampleRate              = (drflac_uint32)((importantProps &  (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
    pStreamInfo->channels                = (drflac_uint8 )((importantProps &  (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
    pStreamInfo->bitsPerSample           = (drflac_uint8 )((importantProps &  (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
    pStreamInfo->totalPCMFrameCount      =                ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
    DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));

    return DRFLAC_TRUE;
}
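/*
Illustrative sketch (not part of dr_flac): the 64-bit "important properties" word decoded above packs four fields, most significant
bits first: 20 bits of sample rate, 3 bits of (channels - 1), 5 bits of (bits per sample - 1) and 36 bits of total PCM frame count.
The names example_streaminfo_props and example_unpack_props are hypothetical and exist only for this example, which assumes a C99
compiler with <stdint.h>.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
    uint32_t sampleRate;
    uint8_t  channels;
    uint8_t  bitsPerSample;
    uint64_t totalPCMFrameCount;
} example_streaminfo_props;

// Assemble the 8 big-endian bytes into a host-order integer, then pull each field out with a shift and a mask.
static example_streaminfo_props example_unpack_props(const uint8_t bytes[8])
{
    example_streaminfo_props props;
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; ++i) {
        v = (v << 8) | bytes[i];
    }

    props.sampleRate         = (uint32_t)((v >> 44) & 0xFFFFF);     // top 20 bits
    props.channels           = (uint8_t)(((v >> 41) & 0x7) + 1);    // next 3 bits, stored minus one
    props.bitsPerSample      = (uint8_t)(((v >> 36) & 0x1F) + 1);   // next 5 bits, stored minus one
    props.totalPCMFrameCount = v & 0xFFFFFFFFFULL;                  // low 36 bits
    return props;
}
```

For the bytes 0A C4 42 F0 00 00 03 E8 this should give 44100 Hz, 2 channels, 16 bits per sample and 1000 PCM frames.
*/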


static void* drflac__malloc_default(size_t sz, void* pUserData)
{
    (void)pUserData;
    return DRFLAC_MALLOC(sz);
}

static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
{
    (void)pUserData;
    return DRFLAC_REALLOC(p, sz);
}

static void drflac__free_default(void* p, void* pUserData)
{
    (void)pUserData;
    DRFLAC_FREE(p);
}


static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (pAllocationCallbacks == NULL) {
        return NULL;
    }

    if (pAllocationCallbacks->onMalloc != NULL) {
        return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
    }

    /* Try using realloc(). */
    if (pAllocationCallbacks->onRealloc != NULL) {
        return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
    }

    return NULL;
}

static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (pAllocationCallbacks == NULL) {
        return NULL;
    }

    if (pAllocationCallbacks->onRealloc != NULL) {
        return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
    }

    /* Try emulating realloc() in terms of malloc()/free(). */
    if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
        void* p2;

        p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
        if (p2 == NULL) {
            return NULL;
        }

        if (p != NULL) {
            DRFLAC_COPY_MEMORY(p2, p, szOld);
            pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
        }

        return p2;
    }

    return NULL;
}

static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (p == NULL || pAllocationCallbacks == NULL) {
        return;
    }

    if (pAllocationCallbacks->onFree != NULL) {
        pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
    }
}
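/*
Illustrative sketch (not part of dr_flac): the fallback path above emulates realloc() when only malloc/free style callbacks are
supplied. The standalone version below shows the same idea with a hypothetical example_callbacks struct and counting allocator,
all of which are assumptions made for this example. One deliberate difference: it copies only the smaller of the old and new
sizes, so shrinking a block cannot write past the new allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} example_callbacks;

// Hypothetical callbacks that track the number of live allocations through pUserData.
static void* example_counting_malloc(size_t sz, void* pUserData)
{
    void* p = malloc(sz);
    if (p != NULL) {
        *(int*)pUserData += 1;
    }
    return p;
}

static void example_counting_free(void* p, void* pUserData)
{
    if (p != NULL) {
        *(int*)pUserData -= 1;
    }
    free(p);
}

// Emulate realloc(): allocate the new block, copy the old contents across, then release the old block.
static void* example_realloc(void* p, size_t szNew, size_t szOld, const example_callbacks* pCallbacks)
{
    void* p2 = pCallbacks->onMalloc(szNew, pCallbacks->pUserData);
    if (p2 == NULL) {
        return NULL;    // The old block is left untouched, as with realloc().
    }

    if (p != NULL) {
        memcpy(p2, p, (szOld < szNew) ? szOld : szNew);
        pCallbacks->onFree(p, pCallbacks->pUserData);
    }

    return p2;
}
```

Passing NULL for p behaves like a plain allocation, which mirrors how the malloc fallback above calls onRealloc(NULL, ...).
*/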


static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks)
{
    /*
    We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
    we'll be sitting on byte 42.
    */
    drflac_uint64 runningFilePos = 42;
    drflac_uint64 seektablePos   = 0;
    drflac_uint32 seektableSize  = 0;

    for (;;) {
        drflac_metadata metadata;
        drflac_uint8 isLastBlock = 0;
        drflac_uint8 blockType = 0;
        drflac_uint32 blockSize;
        if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
            return DRFLAC_FALSE;
        }
        runningFilePos += 4;

        metadata.type = blockType;
        metadata.pRawData = NULL;
        metadata.rawDataSize = 0;

        switch (blockType)
        {
            case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
            {
                if (blockSize < 4) {
                    return DRFLAC_FALSE;
                }

                if (onMeta) {
                    void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;
                    metadata.data.application.id       = drflac__be2host_32(*(drflac_uint32*)pRawData);
                    metadata.data.application.pData    = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
                    metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
            {
                seektablePos  = runningFilePos;
                seektableSize = blockSize;

                if (onMeta) {
                    drflac_uint32 seekpointCount;
                    drflac_uint32 iSeekpoint;
                    void* pRawData;

                    seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES;

                    pRawData = drflac__malloc_from_callbacks(seekpointCount * sizeof(drflac_seekpoint), pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    /* We need to read seekpoint by seekpoint and do some processing. */
                    for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
                        drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;

                        if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                            return DRFLAC_FALSE;
                        }

                        /* Endian swap. */
                        pSeekpoint->firstPCMFrame   = drflac__be2host_64(pSeekpoint->firstPCMFrame);
                        pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
                        pSeekpoint->pcmFrameCount   = drflac__be2host_16(pSeekpoint->pcmFrameCount);
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;
                    metadata.data.seektable.seekpointCount = seekpointCount;
                    metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;

                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
            {
                if (blockSize < 8) {
                    return DRFLAC_FALSE;
                }

                if (onMeta) {
                    void* pRawData;
                    const char* pRunningData;
                    const char* pRunningDataEnd;
                    drflac_uint32 i;

                    pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;

                    pRunningData    = (const char*)pRawData;
                    pRunningDataEnd = (const char*)pRawData + blockSize;

                    metadata.data.vorbis_comment.vendorLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;

                    /* Need space for the rest of the block */
                    if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }
                    metadata.data.vorbis_comment.vendor       = pRunningData;                                            pRunningData += metadata.data.vorbis_comment.vendorLength;
                    metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;

                    /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
                    if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }
                    metadata.data.vorbis_comment.pComments    = pRunningData;

                    /* Check that the comments section is valid before passing it to the callback */
                    for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
                        drflac_uint32 commentLength;

                        if (pRunningDataEnd - pRunningData < 4) {
                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                            return DRFLAC_FALSE;
                        }

                        commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                        if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                            return DRFLAC_FALSE;
                        }
                        pRunningData += commentLength;
                    }

                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
            {
                if (blockSize < 396) {
                    return DRFLAC_FALSE;
                }

                if (onMeta) {
                    void* pRawData;
                    const char* pRunningData;
                    const char* pRunningDataEnd;
                    size_t bufferSize;
                    drflac_uint8 iTrack;
                    drflac_uint8 iIndex;
                    void* pTrackData;

                    /*
                    This needs to be loaded in two passes. The first pass is used to calculate the size of the memory allocation
                    we need for storing the necessary data. The second pass will fill that buffer with usable data.
                    */
                    pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;

                    pRunningData    = (const char*)pRawData;
                    pRunningDataEnd = (const char*)pRawData + blockSize;

                    DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128);                              pRunningData += 128;
                    metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
                    metadata.data.cuesheet.isCD              = (pRunningData[0] & 0x80) != 0;                           pRunningData += 259;
                    metadata.data.cuesheet.trackCount        = pRunningData[0];                                         pRunningData += 1;
                    metadata.data.cuesheet.pTrackData        = NULL;    /* Will be filled later. */

                    /* Pass 1: Calculate the size of the buffer for the track data. */
                    {
                        const char* pRunningDataSaved = pRunningData;   /* Will be restored at the end in preparation for the second pass. */

                        bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;

                        for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
                            drflac_uint8 indexCount;
                            drflac_uint32 indexPointSize;

                            if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
                                drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                                return DRFLAC_FALSE;
                            }

                            /* Skip to the index point count */
                            pRunningData += 35;

                            indexCount = pRunningData[0];
                            pRunningData += 1;

                            bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);

                            /* Quick validation check. */
                            indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
                            if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
                                drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                                return DRFLAC_FALSE;
                            }

                            pRunningData += indexPointSize;
                        }

                        pRunningData = pRunningDataSaved;
                    }

                    /* Pass 2: Allocate a buffer and fill the data. Validation was done in the step above so can be skipped. */
                    {
                        char* pRunningTrackData;

                        pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
                        if (pTrackData == NULL) {
                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                            return DRFLAC_FALSE;
                        }

                        pRunningTrackData = (char*)pTrackData;

                        for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
                            drflac_uint8 indexCount;

                            DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
                            pRunningData      += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
                            pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;

                            /* Grab the index count for the next part. */
                            indexCount = pRunningData[0];
                            pRunningData      += 1;
                            pRunningTrackData += 1;

                            /* Extract each track index. */
                            for (iIndex = 0; iIndex < indexCount; ++iIndex) {
                                drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;

                                DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
                                pRunningData      += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
                                pRunningTrackData += sizeof(drflac_cuesheet_track_index);

                                pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
                            }
                        }

                        metadata.data.cuesheet.pTrackData = pTrackData;
                    }

                    /* The original data is no longer needed. */
                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                    pRawData = NULL;

                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
                    pTrackData = NULL;
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
            {
                if (blockSize < 32) {
                    return DRFLAC_FALSE;
                }

                if (onMeta) {
                    void* pRawData;
                    const char* pRunningData;
                    const char* pRunningDataEnd;

                    pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;

                    pRunningData    = (const char*)pRawData;
                    pRunningDataEnd = (const char*)pRawData + blockSize;

                    metadata.data.picture.type       = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;

                    /* Need space for the rest of the block */
                    if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }
                    metadata.data.picture.mime              = pRunningData;                                   pRunningData += metadata.data.picture.mimeLength;
                    metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;

                    /* Need space for the rest of the block */
                    if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }
                    metadata.data.picture.description     = pRunningData;                                   pRunningData += metadata.data.picture.descriptionLength;
                    metadata.data.picture.width           = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.height          = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.colorDepth      = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
                    metadata.data.picture.pPictureData    = (const drflac_uint8*)pRunningData;

                    /* Need space for the picture after the block */
                    if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
            {
                if (onMeta) {
                    metadata.data.padding.unused = 0;

                    /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
                    if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
                        isLastBlock = DRFLAC_TRUE;  /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
                    } else {
                        onMeta(pUserDataMD, &metadata);
                    }
                }
            } break;

            case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
            {
                /* Invalid chunk. Just skip over this one. */
                if (onMeta) {
                    if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
                        isLastBlock = DRFLAC_TRUE;  /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
                    }
                }
            } break;

            default:
            {
                /*
                It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
                can at the very least report the chunk to the application and let it look at the raw data.
                */
                if (onMeta) {
                    void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
                    if (pRawData == NULL) {
                        return DRFLAC_FALSE;
                    }

                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                        return DRFLAC_FALSE;
                    }

                    metadata.pRawData = pRawData;
                    metadata.rawDataSize = blockSize;
                    onMeta(pUserDataMD, &metadata);

                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
                }
            } break;
        }

        /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
        if (onMeta == NULL && blockSize > 0) {
            if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
                isLastBlock = DRFLAC_TRUE;
            }
        }

        runningFilePos += blockSize;
        if (isLastBlock) {
            break;
        }
    }

    *pSeektablePos   = seektablePos;
    *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
    *pFirstFramePos  = runningFilePos;

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
{
    /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */

    drflac_uint8 isLastBlock;
    drflac_uint8 blockType;
    drflac_uint32 blockSize;

    (void)onSeek;

    pInit->container = drflac_container_native;

    /* The first metadata block should be the STREAMINFO block. */
    if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
        return DRFLAC_FALSE;
    }

    if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
        if (!relaxed) {
            /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
            return DRFLAC_FALSE;
        } else {
            /*
            Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
            for that frame.
            */
            pInit->hasStreamInfoBlock = DRFLAC_FALSE;
            pInit->hasMetadataBlocks  = DRFLAC_FALSE;

            if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
                return DRFLAC_FALSE;    /* Couldn't find a frame. */
            }

            if (pInit->firstFrameHeader.bitsPerSample == 0) {
                return DRFLAC_FALSE;    /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
            }

            pInit->sampleRate              = pInit->firstFrameHeader.sampleRate;
            pInit->channels                = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
            pInit->bitsPerSample           = pInit->firstFrameHeader.bitsPerSample;
            pInit->maxBlockSizeInPCMFrames = 65535;   /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
            return DRFLAC_TRUE;
        }
    } else {
        drflac_streaminfo streaminfo;
        if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
            return DRFLAC_FALSE;
        }

        pInit->hasStreamInfoBlock      = DRFLAC_TRUE;
        pInit->sampleRate              = streaminfo.sampleRate;
        pInit->channels                = streaminfo.channels;
        pInit->bitsPerSample           = streaminfo.bitsPerSample;
        pInit->totalPCMFrameCount      = streaminfo.totalPCMFrameCount;
        pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;    /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
        pInit->hasMetadataBlocks       = !isLastBlock;

        if (onMeta) {
            drflac_metadata metadata;
            metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
            metadata.pRawData = NULL;
            metadata.rawDataSize = 0;
            metadata.data.streaminfo = streaminfo;
            onMeta(pUserDataMD, &metadata);
        }

        return DRFLAC_TRUE;
    }
}

#ifndef DR_FLAC_NO_OGG
#define DRFLAC_OGG_MAX_PAGE_SIZE            65307
#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32    1605413199  /* CRC-32 of "OggS". */

typedef enum
{
    drflac_ogg_recover_on_crc_mismatch,
    drflac_ogg_fail_on_crc_mismatch
} drflac_ogg_crc_mismatch_recovery;

#ifndef DR_FLAC_NO_CRC
static drflac_uint32 drflac__crc32_table[] = {
    0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
    0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
    0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
    0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
    0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
    0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
    0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
    0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
    0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
    0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
    0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
    0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
    0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
    0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
    0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
    0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
    0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
    0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
    0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
    0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
    0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
    0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
    0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
    0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
    0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
    0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
    0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
    0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
    0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
    0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
    0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
    0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
    0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
    0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
    0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
    0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
    0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
    0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
    0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
    0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
    0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
    0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
    0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
    0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
    0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
    0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
    0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
    0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
    0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
    0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
    0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
    0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
    0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
    0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
    0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
    0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
    0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
    0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
    0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
    0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
    0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
    0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
    0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
    0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
};
#endif
7055

static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
{
#ifndef DR_FLAC_NO_CRC
    return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
#else
    (void)data;
    return crc32;
#endif
}

#if 0
static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
{
    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >>  8) & 0xFF));
    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >>  0) & 0xFF));
    return crc32;
}

static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
{
    crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
    crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >>  0) & 0xFFFFFFFF));
    return crc32;
}
#endif

static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
{
    /* This can be optimized. */
    drflac_uint32 i;
    for (i = 0; i < dataSize; ++i) {
        crc32 = drflac_crc32_byte(crc32, pData[i]);
    }
    return crc32;
}
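/*
Illustrative sketch, not part of dr_flac. The table lookup in drflac_crc32_byte() reads as the byte-at-a-time form of a bitwise CRC-32
that uses the polynomial 0x04C11DB7, processes the most significant bit first, starts from 0 and applies no final XOR. Assuming that
reading is right, each table entry can be regenerated by pushing the index through the polynomial eight times. The sketch_* names are
hypothetical, and the <stdint.h> types stand in for the drflac_* integer types.

```c
#include <stddef.h>
#include <stdint.h>

// Regenerates one table entry: the index sits in the top 8 bits and is shifted
// out MSB-first, XOR-ing in the polynomial whenever a set bit falls off the top.
static uint32_t sketch_crc32_table_entry(uint8_t index)
{
    uint32_t r = (uint32_t)index << 24;
    int bit;
    for (bit = 0; bit < 8; ++bit) {
        r = (r & 0x80000000u) ? ((r << 1) ^ 0x04C11DB7u) : (r << 1);
    }
    return r;
}

// Same update step as the table-driven version, with the entry computed on demand.
static uint32_t sketch_crc32_byte(uint32_t crc, uint8_t data)
{
    return (crc << 8) ^ sketch_crc32_table_entry((uint8_t)((crc >> 24) ^ data));
}

static uint32_t sketch_crc32_buffer(uint32_t crc, const uint8_t* pData, size_t dataSize)
{
    size_t i;
    for (i = 0; i < dataSize; ++i) {
        crc = sketch_crc32_byte(crc, pData[i]);
    }
    return crc;
}
```

With these definitions sketch_crc32_table_entry(255) gives 0xB1F740B4, the last value in the table above, and running the four bytes
"OggS" through sketch_crc32_buffer() from an initial value of 0 gives 0x5FB0A94F. That is presumably the kind of precomputed seed that
DRFLAC_OGG_CAPTURE_PATTERN_CRC32 provides, so the capture pattern does not have to be hashed again for every page.
*/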


static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
{
    return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
}

static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
{
    /* 27 bytes of fixed-size fields (including the capture pattern) plus one lacing value per segment. */
    return 27 + pHeader->segmentCount;
}

static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
{
    /* The body size is not stored directly. It is the sum of the lacing values in the segment table. */
    drflac_uint32 pageBodySize = 0;
    int i;

    for (i = 0; i < pHeader->segmentCount; ++i) {
        pageBodySize += pHeader->segmentTable[i];
    }

    return pageBodySize;
}
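/*
Illustrative sketch, not part of dr_flac. The two size helpers above rely on Ogg's lacing scheme: a packet is split into segments of
at most 255 bytes, each segment contributes one byte to the segment table, and a value below 255 ends the packet. The sketch below
builds a segment table for a packet and then applies the same arithmetic as the helpers. sketch_page_header and the sketch_* functions
are hypothetical stand-ins that only model the two fields the helpers read.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint8_t segmentCount;
    uint8_t segmentTable[255];
} sketch_page_header;

// Appends the lacing values for one packet: as many 255s as fit, then a terminating
// value below 255 (0 when the size is an exact multiple of 255).
// Returns 0 without modifying the header if the segment table would overflow.
static int sketch_lace_packet(sketch_page_header* h, size_t packetSize)
{
    size_t needed = packetSize / 255 + 1;
    if (needed > (size_t)(255 - h->segmentCount)) {
        return 0;
    }
    while (packetSize >= 255) {
        h->segmentTable[h->segmentCount++] = 255;
        packetSize -= 255;
    }
    h->segmentTable[h->segmentCount++] = (uint8_t)packetSize;
    return 1;
}

// 27 fixed bytes plus one lacing byte per segment.
static uint32_t sketch_header_size(const sketch_page_header* h)
{
    return 27u + h->segmentCount;
}

// Sum of all lacing values.
static uint32_t sketch_body_size(const sketch_page_header* h)
{
    uint32_t total = 0;
    int i;
    for (i = 0; i < h->segmentCount; ++i) {
        total += h->segmentTable[i];
    }
    return total;
}
```

A 600-byte packet is laced as 255, 255, 90, which gives a 30-byte page header and a 600-byte body. Because the table holds at most 255
entries of at most 255 each, a page body can never exceed 255 * 255 = 65025 bytes.
*/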

static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
{
    drflac_uint8 data[23];  /* The fixed-size header fields that follow the 4-byte capture pattern. */
    drflac_uint32 i;

    DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);

    if (onRead(pUserData, data, 23) != 23) {
        return DRFLAC_AT_END;
    }
    *pBytesRead += 23;

    /*
    It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
    us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
    like to have it map to the structure of the underlying data.
    */
    pHeader->capturePattern[0] = 'O';
    pHeader->capturePattern[1] = 'g';
    pHeader->capturePattern[2] = 'g';
    pHeader->capturePattern[3] = 'S';

    pHeader->structureVersion = data[0];
    pHeader->headerType       = data[1];
    DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
    DRFLAC_COPY_MEMORY(&pHeader->serialNumber,    &data[10], 4);
    DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber,  &data[14], 4);
    DRFLAC_COPY_MEMORY(&pHeader->checksum,        &data[18], 4);
    pHeader->segmentCount     = data[22];

    /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
    data[18] = 0;
    data[19] = 0;
    data[20] = 0;
    data[21] = 0;

    for (i = 0; i < 23; ++i) {
        *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
    }


    if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
        return DRFLAC_AT_END;
    }
    *pBytesRead += pHeader->segmentCount;

    for (i = 0; i < pHeader->segmentCount; ++i) {
        *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
    }

    return DRFLAC_SUCCESS;
}
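/*
Illustrative sketch, not part of dr_flac. The function above fills the multi-byte header fields with DRFLAC_COPY_MEMORY, so the values
take on the byte order of the host. The Ogg framing format stores these fields least significant byte first, so an alternative is to
assemble them byte by byte, which gives the same result on a little-endian host and does not depend on host byte order. The sketch
assumes the same 23-byte layout as the code above; sketch_ogg_fixed_header and the sketch_* helpers are hypothetical names.

```c
#include <stdint.h>

typedef struct
{
    uint8_t  structureVersion;
    uint8_t  headerType;        // Bit 0x01: continued packet. Bit 0x02: beginning of stream. Bit 0x04: end of stream.
    uint64_t granulePosition;
    uint32_t serialNumber;
    uint32_t sequenceNumber;
    uint32_t checksum;
    uint8_t  segmentCount;
} sketch_ogg_fixed_header;

static uint32_t sketch_read_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t sketch_read_le64(const uint8_t* p)
{
    return (uint64_t)sketch_read_le32(p) | ((uint64_t)sketch_read_le32(p + 4) << 32);
}

// data points at the 23 bytes that follow the "OggS" capture pattern.
static void sketch_decode_fixed_header(const uint8_t* data, sketch_ogg_fixed_header* h)
{
    h->structureVersion = data[0];
    h->headerType       = data[1];
    h->granulePosition  = sketch_read_le64(&data[2]);
    h->serialNumber     = sketch_read_le32(&data[10]);
    h->sequenceNumber   = sketch_read_le32(&data[14]);
    h->checksum         = sketch_read_le32(&data[18]);
    h->segmentCount     = data[22];
}
```

The offsets (2, 10, 14, 18 and 22) are the ones used by the code above. Only the way the bytes are combined differs.
*/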

static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
{
    drflac_uint8 id[4];

    *pBytesRead = 0;

    if (onRead(pUserData, id, 4) != 4) {
        return DRFLAC_AT_END;
    }
    *pBytesRead += 4;

    /* We need to read byte-by-byte until we find the OggS capture pattern. */
    for (;;) {
        if (drflac_ogg__is_capture_pattern(id)) {
            drflac_result result;

            *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;

            result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
            if (result == DRFLAC_SUCCESS) {
                return DRFLAC_SUCCESS;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    continue;
                } else {
                    return result;
                }
            }
        } else {
            /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
            id[0] = id[1];
            id[1] = id[2];
            id[2] = id[3];
            if (onRead(pUserData, &id[3], 1) != 1) {
                return DRFLAC_AT_END;
            }
            *pBytesRead += 1;
        }
    }
}
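/*
Illustrative sketch, not part of dr_flac. The resynchronization loop above keeps a 4-byte window and slides it forward one byte at a
time until the window holds "OggS". The same idea can be shown on an in-memory buffer, where the offset of the match is easy to check.
sketch_find_capture_pattern is a hypothetical helper; dr_flac itself reads through the onRead callback instead of indexing a buffer.

```c
#include <stddef.h>
#include <stdint.h>

// Returns the offset of the first "OggS" in buf, or -1 if the pattern does not occur.
static long sketch_find_capture_pattern(const uint8_t* buf, size_t size)
{
    uint8_t id[4];
    size_t bytesConsumed;

    if (size < 4) {
        return -1;
    }

    id[0] = buf[0];
    id[1] = buf[1];
    id[2] = buf[2];
    id[3] = buf[3];
    bytesConsumed = 4;

    for (;;) {
        if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
            return (long)(bytesConsumed - 4);   // The window starts 4 bytes before the read position.
        }
        if (bytesConsumed == size) {
            return -1;
        }

        // Slide the window forward by one byte.
        id[0] = id[1];
        id[1] = id[2];
        id[2] = id[3];
        id[3] = buf[bytesConsumed++];
    }
}
```

Sliding by a single byte matters for inputs such as "OgOggS", where a partial match overlaps the real pattern. Skipping ahead by four
bytes after a failed comparison would miss it.
*/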


/*
The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
dr_flac supports there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
the physical Ogg bitstream are converted and delivered in native FLAC format.
*/
typedef struct
{
    drflac_read_proc onRead;                /* The original onRead callback from drflac_open() and family. */
    drflac_seek_proc onSeek;                /* The original onSeek callback from drflac_open() and family. */
    void* pUserData;                        /* The user data passed to onRead and onSeek. This is the user data that was passed to drflac_open() and family. */
    drflac_uint64 currentBytePos;           /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
    drflac_uint64 firstBytePos;             /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
    drflac_uint32 serialNumber;             /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
    drflac_ogg_page_header bosPageHeader;   /* Used for seeking. */
    drflac_ogg_page_header currentPageHeader;
    drflac_uint32 bytesRemainingInPage;
    drflac_uint32 pageDataSize;
    drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
} drflac_oggbs; /* oggbs = Ogg Bitstream */

static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
{
    size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
    oggbs->currentBytePos += bytesActuallyRead;

    return bytesActuallyRead;
}

static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
{
    if (origin == drflac_seek_origin_start) {
        if (offset <= 0x7FFFFFFF) {
            if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
                return DRFLAC_FALSE;
            }
            oggbs->currentBytePos = offset;

            return DRFLAC_TRUE;
        } else {
            if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
                return DRFLAC_FALSE;
            }
            oggbs->currentBytePos = 0x7FFFFFFF; /* <-- Only this much has been seeked so far. The relative seek below adds the remainder. */

            return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
        }
    } else {
        while (offset > 0x7FFFFFFF) {
            if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
            oggbs->currentBytePos += 0x7FFFFFFF;
            offset -= 0x7FFFFFFF;
        }

        if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) {    /* <-- Safe cast thanks to the loop above. */
            return DRFLAC_FALSE;
        }
        oggbs->currentBytePos += offset;

        return DRFLAC_TRUE;
    }
}
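/*
Illustrative sketch, not part of dr_flac. The seek callback takes an int offset, so a 64-bit distance has to be covered in steps of
at most 0x7FFFFFFF bytes, which is what the relative-seek branch above does. The sketch below isolates that loop against a mock stream
that only counts calls and tracks its position. sketch_stream, sketch_seek_current and sketch_seek_forward are hypothetical names.

```c
#include <stdint.h>

#define SKETCH_MAX_SEEK 0x7FFFFFFF

typedef struct
{
    uint64_t pos;        // Current position of the mock stream.
    int      seekCalls;  // Number of times the callback was invoked.
} sketch_stream;

// Mock seek callback limited to a non-negative int offset relative to the current position.
static int sketch_seek_current(sketch_stream* s, int offset)
{
    s->pos += (uint64_t)offset;
    s->seekCalls += 1;
    return 1;
}

// Seeks forward by a 64-bit distance using as many maximum-size steps as needed,
// followed by one final step that is guaranteed to fit in an int.
static int sketch_seek_forward(sketch_stream* s, uint64_t offset)
{
    while (offset > SKETCH_MAX_SEEK) {
        if (!sketch_seek_current(s, SKETCH_MAX_SEEK)) {
            return 0;
        }
        offset -= SKETCH_MAX_SEEK;
    }
    return sketch_seek_current(s, (int)offset);
}
```

Seeking forward by 5,000,000,000 bytes takes three callback invocations here: two full steps of 0x7FFFFFFF and one of 705,032,706.
*/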

static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
{
    drflac_ogg_page_header header;
    for (;;) {
        drflac_uint32 crc32 = 0;
        drflac_uint32 bytesRead;
        drflac_uint32 pageBodySize;
#ifndef DR_FLAC_NO_CRC
        drflac_uint32 actualCRC32;
#endif

        if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
            return DRFLAC_FALSE;
        }
        oggbs->currentBytePos += bytesRead;

        pageBodySize = drflac_ogg__get_page_body_size(&header);
        if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
            continue;   /* Invalid page size. Assume it's corrupted and just move to the next page. */
        }

        if (header.serialNumber != oggbs->serialNumber) {
            /* It's not a FLAC page. Skip it. */
            if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
            continue;
        }


        /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
        if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
            return DRFLAC_FALSE;
        }
        oggbs->pageDataSize = pageBodySize;

#ifndef DR_FLAC_NO_CRC
        actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
        if (actualCRC32 != header.checksum) {
            if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
                continue;   /* CRC mismatch. Skip this page. */
            } else {
                /*
                Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
                go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
                seek did not fully complete.
                */
                drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
                return DRFLAC_FALSE;
            }
        }
#else
        (void)recoveryMethod;   /* <-- Silence a warning. */
#endif

        oggbs->currentPageHeader = header;
        oggbs->bytesRemainingInPage = pageBodySize;
        return DRFLAC_TRUE;
    }
}

/* Function below is unused at the moment, but I might be re-adding it later. */
#if 0
static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
{
    drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
    drflac_uint8 iSeg = 0;
    drflac_uint32 iByte = 0;
    while (iByte < bytesConsumedInPage) {
        drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
        if (iByte + segmentSize > bytesConsumedInPage) {
            break;
        } else {
            iSeg += 1;
            iByte += segmentSize;
        }
    }

    *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
    return iSeg;
}

static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
{
    /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
    for (;;) {
        drflac_bool32 atEndOfPage = DRFLAC_FALSE;

        drflac_uint8 bytesRemainingInSeg;
        drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);

        drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
        for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
            drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
            if (segmentSize < 255) {
                if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
                    atEndOfPage = DRFLAC_TRUE;
                }

                break;
            }

            bytesToEndOfPacketOrPage += segmentSize;
        }

        /*
        At this point we will have found either the packet or the end of the page. If we're at the end of the page we'll
        want to load the next page and keep searching for the end of the packet.
        */
        drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
        oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;

        if (atEndOfPage) {
            /*
            We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
            straddle pages.
            */
            if (!drflac_oggbs__goto_next_page(oggbs)) {
                return DRFLAC_FALSE;
            }

            /* If it's a fresh packet it most likely means we're at the next packet. */
            if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
                return DRFLAC_TRUE;
            }
        } else {
            /* We're at the next packet. */
            return DRFLAC_TRUE;
        }
    }
}

static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
{
    /* The bitstream should be sitting on the first byte just after the header of the frame. */

    /* What we're actually doing here is seeking to the start of the next packet. */
    return drflac_oggbs__seek_to_next_packet(oggbs);
}
#endif

static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
    drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
    size_t bytesRead = 0;

    DRFLAC_ASSERT(oggbs != NULL);
    DRFLAC_ASSERT(pRunningBufferOut != NULL);

    /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
    while (bytesRead < bytesToRead) {
        size_t bytesRemainingToRead = bytesToRead - bytesRead;

        if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
            DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
            bytesRead += bytesRemainingToRead;
            oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
            break;
        }

        /* If we get here it means some of the requested data is contained in the next pages. */
        if (oggbs->bytesRemainingInPage > 0) {
            DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
            bytesRead += oggbs->bytesRemainingInPage;
            pRunningBufferOut += oggbs->bytesRemainingInPage;
            oggbs->bytesRemainingInPage = 0;
        }

        DRFLAC_ASSERT(bytesRemainingToRead > 0);
        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
            break;  /* Failed to go to the next page. Might have simply hit the end of the stream. */
        }
    }

    return bytesRead;
}
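/*
Illustrative sketch, not part of dr_flac. The read callback above presents a sequence of page bodies as one continuous byte stream: it
drains whatever is left of the current page, loads the next page, and repeats until the request is satisfied or no pages remain. The
sketch below models that with page bodies held in memory. sketch_paged_reader and the sketch_* functions are hypothetical, and the CRC
checks and serial number filtering done by drflac_oggbs__goto_next_page() are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    const uint8_t* const* pages;    // Page bodies in stream order.
    const size_t* pageSizes;        // Size of each page body.
    size_t pageCount;
    size_t nextPage;                // Index of the next page to load.
    const uint8_t* pageData;        // Body of the page currently being consumed.
    size_t pageDataSize;
    size_t bytesRemainingInPage;
} sketch_paged_reader;

// Loads the next page body. Returns 0 when there are no more pages.
static int sketch_goto_next_page(sketch_paged_reader* r)
{
    if (r->nextPage >= r->pageCount) {
        return 0;
    }
    r->pageData = r->pages[r->nextPage];
    r->pageDataSize = r->pageSizes[r->nextPage];
    r->bytesRemainingInPage = r->pageDataSize;
    r->nextPage += 1;
    return 1;
}

// Copies up to bytesToRead bytes, crossing page boundaries as needed. Returns the number of bytes copied.
static size_t sketch_read(sketch_paged_reader* r, void* bufferOut, size_t bytesToRead)
{
    uint8_t* out = (uint8_t*)bufferOut;
    size_t bytesRead = 0;

    while (bytesRead < bytesToRead) {
        size_t want = bytesToRead - bytesRead;
        size_t take = (r->bytesRemainingInPage < want) ? r->bytesRemainingInPage : want;

        if (take > 0) {
            memcpy(out + bytesRead, r->pageData + (r->pageDataSize - r->bytesRemainingInPage), take);
            bytesRead += take;
            r->bytesRemainingInPage -= take;
        }

        if (bytesRead < bytesToRead && !sketch_goto_next_page(r)) {
            break;  // Ran out of pages. Return the short count.
        }
    }

    return bytesRead;
}
```

As in the function above, a short return value signals the end of the stream rather than an error, and the read offset inside the
page is derived from pageDataSize minus bytesRemainingInPage instead of being stored separately.
*/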

static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
{
    drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
    int bytesSeeked = 0;

    DRFLAC_ASSERT(oggbs != NULL);
    DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */

    /* Seeking is always forward which makes things a lot simpler. */
    if (origin == drflac_seek_origin_start) {
        if (!drflac_oggbs__seek_physical(oggbs, oggbs->firstBytePos, drflac_seek_origin_start)) {   /* <-- No cast. drflac_oggbs__seek_physical() takes a 64-bit offset. */
            return DRFLAC_FALSE;
        }

        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
            return DRFLAC_FALSE;
        }

        return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
    }

    DRFLAC_ASSERT(origin == drflac_seek_origin_current);

    while (bytesSeeked < offset) {
        int bytesRemainingToSeek = offset - bytesSeeked;
        DRFLAC_ASSERT(bytesRemainingToSeek >= 0);

        if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
            bytesSeeked += bytesRemainingToSeek;
            (void)bytesSeeked;  /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
            oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
            break;
        }

        /* If we get here it means some of the requested data is contained in the next pages. */
        if (oggbs->bytesRemainingInPage > 0) {
            bytesSeeked += (int)oggbs->bytesRemainingInPage;
            oggbs->bytesRemainingInPage = 0;
        }

        DRFLAC_ASSERT(bytesRemainingToSeek > 0);
        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
            /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
            return DRFLAC_FALSE;
        }
    }

    return DRFLAC_TRUE;
}


static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
    drflac_uint64 originalBytePos;
    drflac_uint64 runningGranulePosition;
    drflac_uint64 runningFrameBytePos;
    drflac_uint64 runningPCMFrameCount;

    DRFLAC_ASSERT(oggbs != NULL);

    originalBytePos = oggbs->currentBytePos;   /* For recovery. Points to the OggS identifier. */

    /* First seek to the first frame. */
    if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
        return DRFLAC_FALSE;
    }
    oggbs->bytesRemainingInPage = 0;

    runningGranulePosition = 0;
    for (;;) {
        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
            drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
            return DRFLAC_FALSE;   /* Never did find that sample... */
        }

        runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
        if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
            break; /* The sample is somewhere in the previous page. */
        }

        /*
        At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
        disregard any pages that do not begin a fresh packet.
        */
        if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {    /* <-- Is it a fresh page? */
            if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
                drflac_uint8 firstBytesInPage[2];
                firstBytesInPage[0] = oggbs->pageData[0];
                firstBytesInPage[1] = oggbs->pageData[1];

                if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) {    /* <-- Does the page begin with a frame's sync code? */
                    runningGranulePosition = oggbs->currentPageHeader.granulePosition;
                }

                continue;
            }
        }
    }

    /*
    We found the page that is closest to the sample, so now we need to find it. The first thing to do is seek to the
    start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
    a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
    we find the one containing the target sample.
    */
    if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
        return DRFLAC_FALSE;
    }
    if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
        return DRFLAC_FALSE;
    }

    /*
    At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
    looping over these frames until we find the one containing the sample we're after.
    */
    runningPCMFrameCount = runningGranulePosition;
    for (;;) {
        /*
        There are two ways to find the sample and seek past irrelevant frames:
          1) Use the native FLAC decoder.
          2) Use Ogg's framing system.

        Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
        do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
        duplication for the decoding of frame headers.

        Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
        bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
        standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
        the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
        using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), needs to be re-implemented so as to
        avoid the use of the drflac_bs object.

        Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
          1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
          2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
          3) Simplicity.
        */
        drflac_uint64 firstPCMFrameInFLACFrame = 0;
        drflac_uint64 lastPCMFrameInFLACFrame = 0;
        drflac_uint64 pcmFrameCountInThisFrame;

        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
            return DRFLAC_FALSE;
        }

        drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);

        pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;

        /* If we are seeking to the end of the file and we've just hit it, we're done. */
        if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                pFlac->currentPCMFrame = pcmFrameIndex;
                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
                return DRFLAC_TRUE;
            } else {
                return DRFLAC_FALSE;
            }
        }

        if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
            /*
            The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
            it never existed and keep iterating.
            */
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
                drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount);    /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
                if (pcmFramesToDecode == 0) {
                    return DRFLAC_TRUE;
                }

                pFlac->currentPCMFrame = runningPCMFrameCount;

                return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    continue;   /* CRC mismatch. Pretend this frame never existed. */
                } else {
                    return DRFLAC_FALSE;
                }
            }
        } else {
            /*
            It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
            frame never existed and leave the running sample count untouched.
            */
            drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                runningPCMFrameCount += pcmFrameCountInThisFrame;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    continue;   /* CRC mismatch. Pretend this frame never existed. */
                } else {
                    return DRFLAC_FALSE;
                }
            }
        }
    }
}

7662

7663

7664
static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
7665
{
7666
    drflac_ogg_page_header header;
7667
    drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7668
    drflac_uint32 bytesRead = 0;
7669

7670
    /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
7671
    (void)relaxed;
7672

7673
    pInit->container = drflac_container_ogg;
7674
    pInit->oggFirstBytePos = 0;
7675

7676
    /*
7677
    We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
7678
    stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
7679
    any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
7680
    */
7681
    if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7682
        return DRFLAC_FALSE;
7683
    }
    pInit->runningFilePos += bytesRead;

    for (;;) {
        int pageBodySize;

        /* Fail if we're past the beginning-of-stream pages without having found a FLAC stream. */
        if ((header.headerType & 0x02) == 0) {
            return DRFLAC_FALSE;
        }

        /* Check if it's a FLAC header. */
        pageBodySize = drflac_ogg__get_page_body_size(&header);
        if (pageBodySize == 51) {   /* 51 = the lacing value of the FLAC header packet. */
            /* It could be a FLAC page... */
            drflac_uint32 bytesRemainingInPage = pageBodySize;
            drflac_uint8 packetType;

            if (onRead(pUserData, &packetType, 1) != 1) {
                return DRFLAC_FALSE;
            }

            bytesRemainingInPage -= 1;
            if (packetType == 0x7F) {
                /* Increasingly more likely to be a FLAC page... */
                drflac_uint8 sig[4];
                if (onRead(pUserData, sig, 4) != 4) {
                    return DRFLAC_FALSE;
                }

                bytesRemainingInPage -= 4;
                if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
                    /* Almost certainly a FLAC page... */
                    drflac_uint8 mappingVersion[2];
                    if (onRead(pUserData, mappingVersion, 2) != 2) {
                        return DRFLAC_FALSE;
                    }

                    if (mappingVersion[0] != 1) {
                        return DRFLAC_FALSE;   /* Only supporting version 1.x of the Ogg mapping. */
                    }

                    /*
                    The next 2 bytes are the number of non-audio (header) packets, not including this one. We don't care about this
                    because we're going to be handling it in a generic way based on the serial number and packet types.
                    */
                    if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
                        return DRFLAC_FALSE;
                    }

                    /* Expecting the native FLAC signature "fLaC". */
                    if (onRead(pUserData, sig, 4) != 4) {
                        return DRFLAC_FALSE;
                    }

                    if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
                        /* The remaining data in the page should be the STREAMINFO block. */
                        drflac_streaminfo streaminfo;
                        drflac_uint8 isLastBlock;
                        drflac_uint8 blockType;
                        drflac_uint32 blockSize;
                        if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
                            return DRFLAC_FALSE;
                        }

                        if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
                            return DRFLAC_FALSE;    /* Invalid block type. First block must be the STREAMINFO block. */
                        }

                        if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
                            /* Success! */
                            pInit->hasStreamInfoBlock      = DRFLAC_TRUE;
                            pInit->sampleRate              = streaminfo.sampleRate;
                            pInit->channels                = streaminfo.channels;
                            pInit->bitsPerSample           = streaminfo.bitsPerSample;
                            pInit->totalPCMFrameCount      = streaminfo.totalPCMFrameCount;
                            pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
                            pInit->hasMetadataBlocks       = !isLastBlock;

                            if (onMeta) {
                                drflac_metadata metadata;
                                metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
                                metadata.pRawData = NULL;
                                metadata.rawDataSize = 0;
                                metadata.data.streaminfo = streaminfo;
                                onMeta(pUserDataMD, &metadata);
                            }

                            pInit->runningFilePos  += pageBodySize;
                            pInit->oggFirstBytePos  = pInit->runningFilePos - 79;   /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
                            pInit->oggSerial        = header.serialNumber;
                            pInit->oggBosHeader     = header;
                            break;
                        } else {
                            /* Failed to read STREAMINFO block. Aww, so close... */
                            return DRFLAC_FALSE;
                        }
                    } else {
                        /* Invalid file. */
                        return DRFLAC_FALSE;
                    }
                } else {
                    /* Not a FLAC header. Skip it. */
                    if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
                        return DRFLAC_FALSE;
                    }
                }
            } else {
                /* Not a FLAC header. Seek past the entire page and move on to the next. */
                if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
                    return DRFLAC_FALSE;
                }
            }
        } else {
            if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;
            }
        }

        pInit->runningFilePos += pageBodySize;


        /* Read the header of the next page. */
        if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
            return DRFLAC_FALSE;
        }
        pInit->runningFilePos += bytesRead;
    }

    /*
    If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
    packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
    Ogg bitstream object.
    */
    pInit->hasMetadataBlocks = DRFLAC_TRUE;    /* <-- Always have at least VORBIS_COMMENT metadata block. */
    return DRFLAC_TRUE;
}
#endif
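/*
Illustrative sketch, not part of dr_flac: the 51-byte page body tested for in drflac__init_private__ogg() breaks down as
1 (packet type 0x7F) + 4 ("FLAC") + 2 (mapping version) + 2 (header packet count) + 4 ("fLaC") + 4 (metadata block header)
+ 34 (STREAMINFO). Assuming the whole packet has already been read into memory, the same checks could be written against a
buffer roughly as below. The helper name example_is_ogg_flac_id_packet() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns 1 when the bytes at p look like an Ogg FLAC
// mapping 1.x identification packet carrying a STREAMINFO block, 0 otherwise.
static int example_is_ogg_flac_id_packet(const unsigned char* p, size_t size)
{
    assert(p != NULL);

    if (size != 51) return 0;                       // 1 + 4 + 2 + 2 + 4 + 4 + 34
    if (p[0] != 0x7F) return 0;                     // Packet type.
    if (memcmp(p + 1, "FLAC", 4) != 0) return 0;    // Ogg mapping signature.
    if (p[5] != 1) return 0;                        // Major mapping version.
    // p[6] is the minor version and p[7..8] the header packet count; both are ignored here.
    if (memcmp(p + 9, "fLaC", 4) != 0) return 0;    // Native FLAC signature.
    if ((p[13] & 0x7F) != 0) return 0;              // Block type 0 = STREAMINFO (top bit is the "last block" flag).
    if (((p[14] << 16) | (p[15] << 8) | p[16]) != 34) return 0;    // 24-bit big-endian block length.

    return 1;
}
```

The decoder performs the same checks incrementally through onRead/onSeek rather than buffering the packet first.
*/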

static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
{
    drflac_bool32 relaxed;
    drflac_uint8 id[4];

    if (pInit == NULL || onRead == NULL || onSeek == NULL) {
        return DRFLAC_FALSE;
    }

    DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
    pInit->onRead       = onRead;
    pInit->onSeek       = onSeek;
    pInit->onMeta       = onMeta;
    pInit->container    = container;
    pInit->pUserData    = pUserData;
    pInit->pUserDataMD  = pUserDataMD;

    pInit->bs.onRead    = onRead;
    pInit->bs.onSeek    = onSeek;
    pInit->bs.pUserData = pUserData;
    drflac__reset_cache(&pInit->bs);


    /* If the container is explicitly defined then we can try opening in relaxed mode. */
    relaxed = container != drflac_container_unknown;

    /* Skip over any ID3 tags. */
    for (;;) {
        if (onRead(pUserData, id, 4) != 4) {
            return DRFLAC_FALSE;    /* Ran out of data. */
        }
        pInit->runningFilePos += 4;

        if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
            drflac_uint8 header[6];
            drflac_uint8 flags;
            drflac_uint32 headerSize;

            if (onRead(pUserData, header, 6) != 6) {
                return DRFLAC_FALSE;    /* Ran out of data. */
            }
            pInit->runningFilePos += 6;

            flags = header[1];

            /* The tag size is stored as a 4-byte big-endian synchsafe integer. */
            DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
            headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
            if (flags & 0x10) {
                headerSize += 10;   /* A footer is present, which adds another 10 bytes. */
            }

            if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
                return DRFLAC_FALSE;    /* Failed to seek past the tag. */
            }
            pInit->runningFilePos += headerSize;
        } else {
            break;
        }
    }

    if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
        return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
    }
#ifndef DR_FLAC_NO_OGG
    if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
        return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
    }
#endif

    /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
    if (relaxed) {
        if (container == drflac_container_native) {
            return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
        }
#ifndef DR_FLAC_NO_OGG
        if (container == drflac_container_ogg) {
            return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
        }
#endif
    }

    /* Unsupported container. */
    return DRFLAC_FALSE;
}
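/*
Illustrative sketch, not part of dr_flac: the ID3 skipping loop in drflac__init_private() relies on the ID3v2 tag size being a
"synchsafe" integer, where each of the 4 size bytes contributes only its low 7 bits. Assuming the full 10-byte ID3v2 header is
available in one buffer, the number of bytes to skip could be computed roughly as below. The helper name
example_id3v2_bytes_after_header() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: given the 10-byte ID3v2 header ("ID3", 2 version bytes,
// 1 flags byte, 4 size bytes), returns how many bytes follow the header.
static uint32_t example_id3v2_bytes_after_header(const unsigned char* header)
{
    uint32_t size;

    assert(header != NULL);

    // Each size byte carries 7 bits, most significant byte first.
    size = ((uint32_t)(header[6] & 0x7F) << 21)
         | ((uint32_t)(header[7] & 0x7F) << 14)
         | ((uint32_t)(header[8] & 0x7F) <<  7)
         | ((uint32_t)(header[9] & 0x7F) <<  0);

    // Flag bit 0x10 signals a 10-byte footer, which is not counted in the size field.
    if (header[5] & 0x10) {
        size += 10;
    }

    return size;
}
```

The decoder splits the same header across a 4-byte read and a 6-byte read, which is why it indexes the flags as header[1] and
the size as header+2.
*/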

static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
{
    DRFLAC_ASSERT(pFlac != NULL);
    DRFLAC_ASSERT(pInit != NULL);

    DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
    pFlac->bs                      = pInit->bs;
    pFlac->onMeta                  = pInit->onMeta;
    pFlac->pUserDataMD             = pInit->pUserDataMD;
    pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
    pFlac->sampleRate              = pInit->sampleRate;
    pFlac->channels                = (drflac_uint8)pInit->channels;
    pFlac->bitsPerSample           = (drflac_uint8)pInit->bitsPerSample;
    pFlac->totalPCMFrameCount      = pInit->totalPCMFrameCount;
    pFlac->container               = pInit->container;
}


static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac_init_info init;
    drflac_uint32 allocationSize;
    drflac_uint32 wholeSIMDVectorCountPerChannel;
    drflac_uint32 decodedSamplesAllocationSize;
#ifndef DR_FLAC_NO_OGG
    drflac_oggbs* pOggbs = NULL;
#endif
    drflac_uint64 firstFramePos;
    drflac_uint64 seektablePos;
    drflac_uint32 seekpointCount;
    drflac_allocation_callbacks allocationCallbacks;
    drflac* pFlac;

    /* CPU support first. */
    drflac__init_cpu_caps();

    if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
        return NULL;
    }

    if (pAllocationCallbacks != NULL) {
        allocationCallbacks = *pAllocationCallbacks;
        if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
            return NULL;    /* Invalid allocation callbacks. */
        }
    } else {
        allocationCallbacks.pUserData = NULL;
        allocationCallbacks.onMalloc  = drflac__malloc_default;
        allocationCallbacks.onRealloc = drflac__realloc_default;
        allocationCallbacks.onFree    = drflac__free_default;
    }


    /*
    The size of the allocation for the drflac object needs to be large enough to fit the following:
      1) The main members of the drflac structure
      2) A block of memory large enough to store the decoded samples of the largest frame in the stream
      3) If the container is Ogg, a drflac_oggbs object

    The complicated part of the allocation is making sure there's enough room for the decoded samples, taking into consideration
    the different SIMD instruction sets.
    */
    allocationSize = sizeof(drflac);

    /*
    The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
    we are supporting.
    */
    if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
    } else {
        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
    }

    decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;

    allocationSize += decodedSamplesAllocationSize;
    allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE;  /* Allocate extra bytes to ensure we have enough for alignment. */

#ifndef DR_FLAC_NO_OGG
    /* There's additional data required for Ogg streams. */
    if (init.container == drflac_container_ogg) {
        allocationSize += sizeof(drflac_oggbs);

        pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
        if (pOggbs == NULL) {
            return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
        }

        DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
        pOggbs->onRead = onRead;
        pOggbs->onSeek = onSeek;
        pOggbs->pUserData = pUserData;
        pOggbs->currentBytePos = init.oggFirstBytePos;
        pOggbs->firstBytePos = init.oggFirstBytePos;
        pOggbs->serialNumber = init.oggSerial;
        pOggbs->bosPageHeader = init.oggBosHeader;
        pOggbs->bytesRemainingInPage = 0;
    }
#endif

    /*
    This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
    consist of only a single heap allocation. To do this, the size of the seek table needs to be known, which we determine when reading
    and decoding the metadata.
    */
    firstFramePos  = 42;   /* <-- We know we are at byte 42 at this point. */
    seektablePos   = 0;
    seekpointCount = 0;
    if (init.hasMetadataBlocks) {
        drflac_read_proc onReadOverride = onRead;
        drflac_seek_proc onSeekOverride = onSeek;
        void* pUserDataOverride = pUserData;

#ifndef DR_FLAC_NO_OGG
        if (init.container == drflac_container_ogg) {
            onReadOverride = drflac__on_read_ogg;
            onSeekOverride = drflac__on_seek_ogg;
            pUserDataOverride = (void*)pOggbs;
        }
#endif

        if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
        #ifndef DR_FLAC_NO_OGG
            drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
        #endif
            return NULL;
        }

        allocationSize += seekpointCount * sizeof(drflac_seekpoint);
    }


    pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
    if (pFlac == NULL) {
    #ifndef DR_FLAC_NO_OGG
        drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
    #endif
        return NULL;
    }

    drflac__init_from_info(pFlac, &init);
    pFlac->allocationCallbacks = allocationCallbacks;
    pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);

#ifndef DR_FLAC_NO_OGG
    if (init.container == drflac_container_ogg) {
        drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
        DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));

        /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
        drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
        pOggbs = NULL;

        /* The Ogg bitstream needs to be layered on top of the original bitstream. */
        pFlac->bs.onRead = drflac__on_read_ogg;
        pFlac->bs.onSeek = drflac__on_seek_ogg;
        pFlac->bs.pUserData = (void*)pInternalOggbs;
        pFlac->_oggbs = (void*)pInternalOggbs;
    }
#endif

    pFlac->firstFLACFramePosInBytes = firstFramePos;

    /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
#ifndef DR_FLAC_NO_OGG
    if (init.container == drflac_container_ogg)
    {
        pFlac->pSeekpoints = NULL;
        pFlac->seekpointCount = 0;
    }
    else
#endif
    {
        /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
        if (seektablePos != 0) {
            pFlac->seekpointCount = seekpointCount;
            pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);

            DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
            DRFLAC_ASSERT(pFlac->bs.onRead != NULL);

            /* Seek to the seektable, then just read directly into our seektable buffer. */
            if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
                drflac_uint32 iSeekpoint;

                for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
                    if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
                        /* Endian swap. */
                        pFlac->pSeekpoints[iSeekpoint].firstPCMFrame   = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
                        pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
                        pFlac->pSeekpoints[iSeekpoint].pcmFrameCount   = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
                    } else {
                        /* Failed to read the seektable. Pretend we don't have one. */
                        pFlac->pSeekpoints = NULL;
                        pFlac->seekpointCount = 0;
                        break;
                    }
                }

                /* We need to seek back to where we were. If this fails it's a critical error. */
                if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                    return NULL;
                }
            } else {
                /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
                pFlac->pSeekpoints = NULL;
                pFlac->seekpointCount = 0;
            }
        }
    }


    /*
    If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
    the first frame.
    */
    if (!init.hasStreamInfoBlock) {
        pFlac->currentFLACFrame.header = init.firstFrameHeader;
        for (;;) {
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                break;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                        drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                        return NULL;
                    }
                    continue;
                } else {
                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                    return NULL;
                }
            }
        }
    }

    return pFlac;
}
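/*
Illustrative sketch, not part of dr_flac: the decoded-sample buffer in drflac_open_with_metadata_private() is sized by rounding
each channel's block up to a whole number of SIMD vectors. The if/else in that function is equivalent to the usual round-up
division shown below. EXAMPLE_MAX_SIMD_VECTOR_SIZE and example_decoded_samples_size() are hypothetical names, and the 64-byte
vector size is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_SIMD_VECTOR_SIZE 64    // Assumed vector size in bytes, for illustration only.

// Hypothetical helper: bytes needed to hold one block of 32-bit samples for
// every channel, with each channel padded to a whole number of SIMD vectors.
static size_t example_decoded_samples_size(uint32_t maxBlockSizeInPCMFrames, uint32_t channels)
{
    const size_t samplesPerVector = EXAMPLE_MAX_SIMD_VECTOR_SIZE / sizeof(int32_t);
    size_t vectorsPerChannel;

    assert(channels > 0);

    // Round up so that a partially filled final vector still gets a full slot.
    vectorsPerChannel = (maxBlockSizeInPCMFrames + samplesPerVector - 1) / samplesPerVector;

    return vectorsPerChannel * EXAMPLE_MAX_SIMD_VECTOR_SIZE * channels;
}
```

On top of this figure the decoder adds one extra vector's worth of bytes so that the start of the buffer can be aligned.
*/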


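/*
Illustrative sketch, not part of dr_flac: the seektable loader in drflac_open_with_metadata_private() reads each seek point
straight into the in-memory struct and then byte-swaps the fields in place. The FLAC format stores a seek point as 18 big-endian
bytes (8 for the first sample number, 8 for the frame offset, 2 for the sample count). An alternative that does not depend on
host endianness or struct layout is to decode the bytes one at a time, as below. All names prefixed with example_ are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint64_t firstPCMFrame;
    uint64_t flacFrameOffset;
    uint16_t pcmFrameCount;
} example_seekpoint;

// Reads byteCount (at most 8) big-endian bytes into an unsigned integer.
static uint64_t example_read_be(const unsigned char* p, size_t byteCount)
{
    uint64_t value = 0;
    size_t i;

    assert(p != NULL && byteCount <= 8);
    for (i = 0; i < byteCount; i += 1) {
        value = (value << 8) | p[i];
    }

    return value;
}

// Decodes one 18-byte seek point as laid out in a FLAC SEEKTABLE block.
static example_seekpoint example_decode_seekpoint(const unsigned char* raw)
{
    example_seekpoint sp;

    sp.firstPCMFrame   = example_read_be(raw +  0, 8);
    sp.flacFrameOffset = example_read_be(raw +  8, 8);
    sp.pcmFrameCount   = (uint16_t)example_read_be(raw + 16, 2);

    return sp;
}
```

Reading into the struct and swapping in place, as the decoder does, avoids a temporary buffer at the cost of relying on the
struct's field layout.
*/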

#ifndef DR_FLAC_NO_STDIO
#include <stdio.h>
#ifndef DR_FLAC_NO_WCHAR
#include <wchar.h>      /* For wcslen(), wcsrtombs() */
#endif

/* Errno */
/* drflac_result_from_errno() is only used for fopen() and _wfopen() so putting it inside DR_FLAC_NO_STDIO for now. If something else needs this later we can move it out. */
#include <errno.h>
static drflac_result drflac_result_from_errno(int e)
{
    switch (e)
    {
        case 0: return DRFLAC_SUCCESS;
    #ifdef EPERM
        case EPERM: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef ENOENT
        case ENOENT: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef ESRCH
        case ESRCH: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef EINTR
        case EINTR: return DRFLAC_INTERRUPT;
    #endif
    #ifdef EIO
        case EIO: return DRFLAC_IO_ERROR;
    #endif
    #ifdef ENXIO
        case ENXIO: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef E2BIG
        case E2BIG: return DRFLAC_INVALID_ARGS;
    #endif
    #ifdef ENOEXEC
        case ENOEXEC: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef EBADF
        case EBADF: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ECHILD
        case ECHILD: return DRFLAC_ERROR;
    #endif
    #ifdef EAGAIN
        case EAGAIN: return DRFLAC_UNAVAILABLE;
    #endif
    #ifdef ENOMEM
        case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
    #endif
    #ifdef EACCES
        case EACCES: return DRFLAC_ACCESS_DENIED;
    #endif
    #ifdef EFAULT
        case EFAULT: return DRFLAC_BAD_ADDRESS;
    #endif
    #ifdef ENOTBLK
        case ENOTBLK: return DRFLAC_ERROR;
    #endif
    #ifdef EBUSY
        case EBUSY: return DRFLAC_BUSY;
    #endif
    #ifdef EEXIST
        case EEXIST: return DRFLAC_ALREADY_EXISTS;
    #endif
    #ifdef EXDEV
        case EXDEV: return DRFLAC_ERROR;
    #endif
    #ifdef ENODEV
        case ENODEV: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef ENOTDIR
        case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
    #endif
    #ifdef EISDIR
        case EISDIR: return DRFLAC_IS_DIRECTORY;
    #endif
    #ifdef EINVAL
        case EINVAL: return DRFLAC_INVALID_ARGS;
    #endif
    #ifdef ENFILE
        case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
    #endif
    #ifdef EMFILE
        case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
    #endif
    #ifdef ENOTTY
        case ENOTTY: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef ETXTBSY
        case ETXTBSY: return DRFLAC_BUSY;
    #endif
    #ifdef EFBIG
        case EFBIG: return DRFLAC_TOO_BIG;
    #endif
    #ifdef ENOSPC
        case ENOSPC: return DRFLAC_NO_SPACE;
    #endif
    #ifdef ESPIPE
        case ESPIPE: return DRFLAC_BAD_SEEK;
    #endif
    #ifdef EROFS
        case EROFS: return DRFLAC_ACCESS_DENIED;
    #endif
    #ifdef EMLINK
        case EMLINK: return DRFLAC_TOO_MANY_LINKS;
    #endif
    #ifdef EPIPE
        case EPIPE: return DRFLAC_BAD_PIPE;
    #endif
    #ifdef EDOM
        case EDOM: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef ERANGE
        case ERANGE: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef EDEADLK
        case EDEADLK: return DRFLAC_DEADLOCK;
    #endif
    #ifdef ENAMETOOLONG
        case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
    #endif
    #ifdef ENOLCK
        case ENOLCK: return DRFLAC_ERROR;
    #endif
    #ifdef ENOSYS
        case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
    #endif
    #ifdef ENOTEMPTY
        case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
    #endif
    #ifdef ELOOP
        case ELOOP: return DRFLAC_TOO_MANY_LINKS;
    #endif
    #ifdef ENOMSG
        case ENOMSG: return DRFLAC_NO_MESSAGE;
    #endif
    #ifdef EIDRM
        case EIDRM: return DRFLAC_ERROR;
    #endif
    #ifdef ECHRNG
        case ECHRNG: return DRFLAC_ERROR;
    #endif
    #ifdef EL2NSYNC
        case EL2NSYNC: return DRFLAC_ERROR;
    #endif
    #ifdef EL3HLT
        case EL3HLT: return DRFLAC_ERROR;
    #endif
    #ifdef EL3RST
        case EL3RST: return DRFLAC_ERROR;
    #endif
    #ifdef ELNRNG
        case ELNRNG: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef EUNATCH
        case EUNATCH: return DRFLAC_ERROR;
    #endif
    #ifdef ENOCSI
        case ENOCSI: return DRFLAC_ERROR;
    #endif
    #ifdef EL2HLT
        case EL2HLT: return DRFLAC_ERROR;
    #endif
    #ifdef EBADE
        case EBADE: return DRFLAC_ERROR;
    #endif
    #ifdef EBADR
        case EBADR: return DRFLAC_ERROR;
    #endif
    #ifdef EXFULL
        case EXFULL: return DRFLAC_ERROR;
    #endif
    #ifdef ENOANO
        case ENOANO: return DRFLAC_ERROR;
    #endif
    #ifdef EBADRQC
        case EBADRQC: return DRFLAC_ERROR;
    #endif
    #ifdef EBADSLT
        case EBADSLT: return DRFLAC_ERROR;
    #endif
    #ifdef EBFONT
        case EBFONT: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ENOSTR
        case ENOSTR: return DRFLAC_ERROR;
    #endif
    #ifdef ENODATA
        case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
    #endif
    #ifdef ETIME
        case ETIME: return DRFLAC_TIMEOUT;
    #endif
    #ifdef ENOSR
        case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
    #endif
    #ifdef ENONET
        case ENONET: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENOPKG
        case ENOPKG: return DRFLAC_ERROR;
    #endif
    #ifdef EREMOTE
        case EREMOTE: return DRFLAC_ERROR;
    #endif
    #ifdef ENOLINK
        case ENOLINK: return DRFLAC_ERROR;
    #endif
    #ifdef EADV
        case EADV: return DRFLAC_ERROR;
    #endif
    #ifdef ESRMNT
        case ESRMNT: return DRFLAC_ERROR;
    #endif
    #ifdef ECOMM
        case ECOMM: return DRFLAC_ERROR;
    #endif
    #ifdef EPROTO
        case EPROTO: return DRFLAC_ERROR;
    #endif
    #ifdef EMULTIHOP
        case EMULTIHOP: return DRFLAC_ERROR;
    #endif
    #ifdef EDOTDOT
        case EDOTDOT: return DRFLAC_ERROR;
    #endif
    #ifdef EBADMSG
        case EBADMSG: return DRFLAC_BAD_MESSAGE;
    #endif
    #ifdef EOVERFLOW
        case EOVERFLOW: return DRFLAC_TOO_BIG;
    #endif
    #ifdef ENOTUNIQ
        case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
    #endif
    #ifdef EBADFD
        case EBADFD: return DRFLAC_ERROR;
    #endif
    #ifdef EREMCHG
        case EREMCHG: return DRFLAC_ERROR;
    #endif
    #ifdef ELIBACC
        case ELIBACC: return DRFLAC_ACCESS_DENIED;
    #endif
    #ifdef ELIBBAD
        case ELIBBAD: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ELIBSCN
        case ELIBSCN: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ELIBMAX
        case ELIBMAX: return DRFLAC_ERROR;
    #endif
    #ifdef ELIBEXEC
        case ELIBEXEC: return DRFLAC_ERROR;
    #endif
    #ifdef EILSEQ
        case EILSEQ: return DRFLAC_INVALID_DATA;
    #endif
    #ifdef ERESTART
        case ERESTART: return DRFLAC_ERROR;
    #endif
    #ifdef ESTRPIPE
        case ESTRPIPE: return DRFLAC_ERROR;
    #endif
    #ifdef EUSERS
        case EUSERS: return DRFLAC_ERROR;
    #endif
    #ifdef ENOTSOCK
        case ENOTSOCK: return DRFLAC_NOT_SOCKET;
    #endif
    #ifdef EDESTADDRREQ
        case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
    #endif
    #ifdef EMSGSIZE
        case EMSGSIZE: return DRFLAC_TOO_BIG;
    #endif
    #ifdef EPROTOTYPE
        case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
    #endif
    #ifdef ENOPROTOOPT
        case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
    #endif
    #ifdef EPROTONOSUPPORT
        case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
    #endif
    #ifdef ESOCKTNOSUPPORT
        case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
    #endif
    #ifdef EOPNOTSUPP
        case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef EPFNOSUPPORT
        case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
    #endif
    #ifdef EAFNOSUPPORT
        case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
    #endif
    #ifdef EADDRINUSE
        case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
    #endif
    #ifdef EADDRNOTAVAIL
        case EADDRNOTAVAIL: return DRFLAC_ERROR;
    #endif
    #ifdef ENETDOWN
        case ENETDOWN: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENETUNREACH
        case ENETUNREACH: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENETRESET
        case ENETRESET: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ECONNABORTED
        case ECONNABORTED: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ECONNRESET
        case ECONNRESET: return DRFLAC_CONNECTION_RESET;
    #endif
    #ifdef ENOBUFS
        case ENOBUFS: return DRFLAC_NO_SPACE;
    #endif
    #ifdef EISCONN
        case EISCONN: return DRFLAC_ALREADY_CONNECTED;
    #endif
    #ifdef ENOTCONN
        case ENOTCONN: return DRFLAC_NOT_CONNECTED;
    #endif
    #ifdef ESHUTDOWN
        case ESHUTDOWN: return DRFLAC_ERROR;
    #endif
    #ifdef ETOOMANYREFS
        case ETOOMANYREFS: return DRFLAC_ERROR;
    #endif
    #ifdef ETIMEDOUT
        case ETIMEDOUT: return DRFLAC_TIMEOUT;
    #endif
    #ifdef ECONNREFUSED
        case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
    #endif
    #ifdef EHOSTDOWN
        case EHOSTDOWN: return DRFLAC_NO_HOST;
    #endif
    #ifdef EHOSTUNREACH
        case EHOSTUNREACH: return DRFLAC_NO_HOST;
    #endif
    #ifdef EALREADY
        case EALREADY: return DRFLAC_IN_PROGRESS;
    #endif
    #ifdef EINPROGRESS
        case EINPROGRESS: return DRFLAC_IN_PROGRESS;
    #endif
    #ifdef ESTALE
        case ESTALE: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef EUCLEAN
        case EUCLEAN: return DRFLAC_ERROR;
    #endif
    #ifdef ENOTNAM
        case ENOTNAM: return DRFLAC_ERROR;
    #endif
    #ifdef ENAVAIL
        case ENAVAIL: return DRFLAC_ERROR;
    #endif
    #ifdef EISNAM
        case EISNAM: return DRFLAC_ERROR;
    #endif
    #ifdef EREMOTEIO
        case EREMOTEIO: return DRFLAC_IO_ERROR;
    #endif
    #ifdef EDQUOT
        case EDQUOT: return DRFLAC_NO_SPACE;
    #endif
    #ifdef ENOMEDIUM
        case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef EMEDIUMTYPE
        case EMEDIUMTYPE: return DRFLAC_ERROR;
    #endif
    #ifdef ECANCELED
        case ECANCELED: return DRFLAC_CANCELLED;
    #endif
    #ifdef ENOKEY
        case ENOKEY: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYEXPIRED
        case EKEYEXPIRED: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYREVOKED
        case EKEYREVOKED: return DRFLAC_ERROR;
    #endif
    #ifdef EKEYREJECTED
        case EKEYREJECTED: return DRFLAC_ERROR;
    #endif
    #ifdef EOWNERDEAD
        case EOWNERDEAD: return DRFLAC_ERROR;
    #endif
    #ifdef ENOTRECOVERABLE
        case ENOTRECOVERABLE: return DRFLAC_ERROR;
    #endif
    #ifdef ERFKILL
8553
        case ERFKILL: return DRFLAC_ERROR;
8554
    #endif
8555
    #ifdef EHWPOISON
8556
        case EHWPOISON: return DRFLAC_ERROR;
8557
    #endif
8558
        default: return DRFLAC_ERROR;
8559
    }
8560
}
8561
/* End Errno */

/* fopen */
static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
{
#if defined(_MSC_VER) && _MSC_VER >= 1400
    errno_t err;
#endif

    if (ppFile != NULL) {
        *ppFile = NULL;  /* Safety. */
    }

    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
        return DRFLAC_INVALID_ARGS;
    }

#if defined(_MSC_VER) && _MSC_VER >= 1400
    err = fopen_s(ppFile, pFilePath, pOpenMode);
    if (err != 0) {
        return drflac_result_from_errno(err);
    }
#else
#if defined(_WIN32) || defined(__APPLE__)
    *ppFile = fopen(pFilePath, pOpenMode);
#else
    #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
        *ppFile = fopen64(pFilePath, pOpenMode);
    #else
        *ppFile = fopen(pFilePath, pOpenMode);
    #endif
#endif
    if (*ppFile == NULL) {
        drflac_result result = drflac_result_from_errno(errno);
        if (result == DRFLAC_SUCCESS) {
            result = DRFLAC_ERROR;   /* Just a safety check to make sure we never ever return success when *ppFile == NULL. */
        }

        return result;
    }
#endif

    return DRFLAC_SUCCESS;
}

/*
_wfopen() isn't always available in all compilation environments.

    * Windows only.
    * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
    * MinGW-64 (both 32- and 64-bit) seems to support it.
    * MinGW wraps it in !defined(__STRICT_ANSI__).
    * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).

This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
*/
#if defined(_WIN32)
    #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
        #define DRFLAC_HAS_WFOPEN
    #endif
#endif

#ifndef DR_FLAC_NO_WCHAR
static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (ppFile != NULL) {
        *ppFile = NULL;  /* Safety. */
    }

    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
        return DRFLAC_INVALID_ARGS;
    }

#if defined(DRFLAC_HAS_WFOPEN)
    {
        /* Use _wfopen() on Windows. */
    #if defined(_MSC_VER) && _MSC_VER >= 1400
        errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
        if (err != 0) {
            return drflac_result_from_errno(err);
        }
    #else
        *ppFile = _wfopen(pFilePath, pOpenMode);
        if (*ppFile == NULL) {
            return drflac_result_from_errno(errno);
        }
    #endif
        (void)pAllocationCallbacks;
    }
#else
    /*
    Use fopen() on anything other than Windows. Requires a conversion. This is annoying because
    fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
    that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
    maintaining state. I've checked this with -std=c89 and it works, but if somebody gets a compiler
    error I'll look into improving compatibility.
    */

    /*
    Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
    need to abort with an error. If you encounter a compiler lacking such support, add it to this list
    and submit a bug report and it'll be added to the library upstream.
    */
    #if defined(__DJGPP__)
    {
        /* Nothing to do here. This will fall through to the error check below. */
    }
    #else
    {
        mbstate_t mbs;
        size_t lenMB;
        const wchar_t* pFilePathTemp = pFilePath;
        char* pFilePathMB = NULL;
        char pOpenModeMB[32] = {0};

        /* Get the length first. */
        DRFLAC_ZERO_OBJECT(&mbs);
        lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
        if (lenMB == (size_t)-1) {
            return drflac_result_from_errno(errno);
        }

        pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
        if (pFilePathMB == NULL) {
            return DRFLAC_OUT_OF_MEMORY;
        }

        pFilePathTemp = pFilePath;
        DRFLAC_ZERO_OBJECT(&mbs);
        wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);

        /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
        {
            size_t i = 0;
            for (;;) {
                if (pOpenMode[i] == 0) {
                    pOpenModeMB[i] = '\0';
                    break;
                }

                pOpenModeMB[i] = (char)pOpenMode[i];
                i += 1;
            }
        }

        *ppFile = fopen(pFilePathMB, pOpenModeMB);

        drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
    }
    #endif

    if (*ppFile == NULL) {
        return DRFLAC_ERROR;
    }
#endif

    return DRFLAC_SUCCESS;
}
#endif
/* End fopen */

static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
}

static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
{
    DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */

    return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
}


DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return NULL;
    }

    return pFlac;
}

#ifndef DR_FLAC_NO_WCHAR
DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return NULL;
    }

    return pFlac;
}
#endif

DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return pFlac;
    }

    return pFlac;
}

#ifndef DR_FLAC_NO_WCHAR
DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;
    FILE* pFile;

    if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
        return NULL;
    }

    pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        fclose(pFile);
        return pFlac;
    }

    return pFlac;
}
#endif
#endif  /* DR_FLAC_NO_STDIO */

static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
    size_t bytesRemaining;

    DRFLAC_ASSERT(memoryStream != NULL);
    DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);

    bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    if (bytesToRead > 0) {
        DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
        memoryStream->currentReadPos += bytesToRead;
    }

    return bytesToRead;
}

static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
{
    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;

    DRFLAC_ASSERT(memoryStream != NULL);
    DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */

    if (offset > (drflac_int64)memoryStream->dataSize) {
        return DRFLAC_FALSE;
    }

    if (origin == drflac_seek_origin_current) {
        if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
            memoryStream->currentReadPos += offset;
        } else {
            return DRFLAC_FALSE;  /* Trying to seek too far forward. */
        }
    } else {
        if ((drflac_uint32)offset <= memoryStream->dataSize) {
            memoryStream->currentReadPos = offset;
        } else {
            return DRFLAC_FALSE;  /* Trying to seek too far forward. */
        }
    }

    return DRFLAC_TRUE;
}

DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac__memory_stream memoryStream;
    drflac* pFlac;

    memoryStream.data = (const drflac_uint8*)pData;
    memoryStream.dataSize = dataSize;
    memoryStream.currentReadPos = 0;
    pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    pFlac->memoryStream = memoryStream;

    /*
    This is an awful hack... drflac_open() was given a pointer to the stack-local memoryStream, which goes out of scope when this
    function returns. Now that the stream has been copied into pFlac, re-point the user data at that persistent copy.
    */
#ifndef DR_FLAC_NO_OGG
    if (pFlac->container == drflac_container_ogg)
    {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        oggbs->pUserData = &pFlac->memoryStream;
    }
    else
#endif
    {
        pFlac->bs.pUserData = &pFlac->memoryStream;
    }

    return pFlac;
}

DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac__memory_stream memoryStream;
    drflac* pFlac;

    memoryStream.data = (const drflac_uint8*)pData;
    memoryStream.dataSize = dataSize;
    memoryStream.currentReadPos = 0;
    pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    pFlac->memoryStream = memoryStream;

    /*
    This is an awful hack... The open call above was given a pointer to the stack-local memoryStream, which goes out of scope when
    this function returns. Now that the stream has been copied into pFlac, re-point the user data at that persistent copy.
    */
#ifndef DR_FLAC_NO_OGG
    if (pFlac->container == drflac_container_ogg)
    {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        oggbs->pUserData = &pFlac->memoryStream;
    }
    else
#endif
    {
        pFlac->bs.pUserData = &pFlac->memoryStream;
    }

    return pFlac;
}



DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
}
DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
}

DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
}
DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
}

DRFLAC_API void drflac_close(drflac* pFlac)
{
    if (pFlac == NULL) {
        return;
    }

#ifndef DR_FLAC_NO_STDIO
    /*
    If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
    was used by looking at the callbacks.
    */
    if (pFlac->bs.onRead == drflac__on_read_stdio) {
        fclose((FILE*)pFlac->bs.pUserData);
    }

#ifndef DR_FLAC_NO_OGG
    /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
    if (pFlac->container == drflac_container_ogg) {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);

        if (oggbs->onRead == drflac__on_read_stdio) {
            fclose((FILE*)oggbs->pUserData);
        }
    }
#endif
#endif

    drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0;
        pOutputSamples[i*8+1] = (drflac_int32)right0;
        pOutputSamples[i*8+2] = (drflac_int32)left1;
        pOutputSamples[i*8+3] = (drflac_int32)right1;
        pOutputSamples[i*8+4] = (drflac_int32)left2;
        pOutputSamples[i*8+5] = (drflac_int32)right2;
        pOutputSamples[i*8+6] = (drflac_int32)left3;
        pOutputSamples[i*8+7] = (drflac_int32)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;

        left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);

        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0;
        pOutputSamples[i*8+1] = (drflac_int32)right0;
        pOutputSamples[i*8+2] = (drflac_int32)left1;
        pOutputSamples[i*8+3] = (drflac_int32)right1;
        pOutputSamples[i*8+4] = (drflac_int32)left2;
        pOutputSamples[i*8+5] = (drflac_int32)right2;
        pOutputSamples[i*8+6] = (drflac_int32)left3;
        pOutputSamples[i*8+7] = (drflac_int32)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left  = _mm_add_epi32(right, side);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

9222
#if defined(DRFLAC_SUPPORT_NEON)
9223
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9224
{
9225
    drflac_uint64 i;
9226
    drflac_uint64 frameCount4 = frameCount >> 2;
9227
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9228
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9229
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9230
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9231
    int32x4_t shift0_4;
9232
    int32x4_t shift1_4;
9233

9234
    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9235

9236
    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left  = vaddq_u32(right, side);

        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
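{
    drflac_uint64 i;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;

    for (i = 0; i < frameCount; ++i) {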
        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);

            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;
    int32x4_t  wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t  wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
    uint32x4_t one4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
    one4         = vdupq_n_u32(1);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
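
/*
The mid/side path above rebuilds left and right from a halved sum (mid) and a difference (side). A minimal standalone
sketch of that arithmetic follows, ignoring wasted bits and output scaling. example_decode_mid_side is a hypothetical
name used for illustration only and is not part of dr_flac. Like the decoder, the sketch assumes an arithmetic right
shift of negative values and a wrapping unsigned-to-signed conversion, both of which are implementation-defined in C
but hold on mainstream compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac: rebuilds interleaved left/right samples from mid/side.
static void example_decode_mid_side(const int32_t* mid, const int32_t* side, size_t count, int32_t* out)
{
    size_t i;

    assert(mid != NULL && side != NULL && out != NULL);

    for (i = 0; i < count; ++i) {
        // Unsigned arithmetic keeps the left shift and any wraparound well defined.
        uint32_t m = (uint32_t)mid[i];
        uint32_t s = (uint32_t)side[i];

        // The encoder halved (left + right), dropping its low bit. That bit equals the low bit of side.
        m = (m << 1) | (s & 1u);

        out[i*2+0] = (int32_t)(m + s) >> 1;  // left  = ((2*mid + parity) + side) / 2
        out[i*2+1] = (int32_t)(m - s) >> 1;  // right = ((2*mid + parity) - side) / 2
    }
}
```

With mid = 1 and side = 8 this should produce left = 5 and right = -3, which is the pair the encoder would have started
from, since (5 + -3) >> 1 = 1 and 5 - (-3) = 8.
*/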


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        pOutputSamples[i*8+0] = (drflac_int32)tempL0;
        pOutputSamples[i*8+1] = (drflac_int32)tempR0;
        pOutputSamples[i*8+2] = (drflac_int32)tempL1;
        pOutputSamples[i*8+3] = (drflac_int32)tempR1;
        pOutputSamples[i*8+4] = (drflac_int32)tempL2;
        pOutputSamples[i*8+5] = (drflac_int32)tempR2;
        pOutputSamples[i*8+6] = (drflac_int32)tempL3;
        pOutputSamples[i*8+7] = (drflac_int32)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift4_0 = vdupq_n_s32(shift0);
    int32x4_t shift4_1 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));

        drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                    }
                }
            }

            framesRead                += frameCountThisIteration;
            pBufferOut                += frameCountThisIteration * channelCount;
            framesToRead              -= frameCountThisIteration;
            pFlac->currentPCMFrame    += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}
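
/*
The generic interleaving branch above scales every sample so that it fills the full 32-bit range, shifting left by
the number of unused bits (32 minus the stream's bit depth). The following is a minimal standalone sketch of that step
under simplifying assumptions: it ignores per-subframe wasted bits, and example_interleave_s32 and its parameters are
hypothetical names that are not part of dr_flac.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac: interleaves planar channels and scales samples to 32 bits.
static void example_interleave_s32(const int32_t* const* channels, unsigned int channel_count,
                                   size_t frame_count, unsigned int bits_per_sample, int32_t* out)
{
    size_t i;
    unsigned int j;
    unsigned int shift;

    assert(channels != NULL && out != NULL);
    assert(bits_per_sample >= 1 && bits_per_sample <= 32);

    shift = 32 - bits_per_sample;  // number of unused low bits in the 32-bit output

    for (i = 0; i < frame_count; ++i) {
        for (j = 0; j < channel_count; ++j) {
            // Shift as unsigned so that negative samples do not trigger undefined behaviour.
            out[i*channel_count + j] = (int32_t)((uint32_t)channels[j][i] << shift);
        }
    }
}
```

For 16-bit input the shift is 16, so a sample of 1 becomes 65536 and a sample of -1 becomes -65536 on two's-complement
targets.
*/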


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9820
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9821
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9822
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9823

9824
        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9825
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9826
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9827
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9828

9829
        drflac_uint32 right0 = left0 - side0;
9830
        drflac_uint32 right1 = left1 - side1;
9831
        drflac_uint32 right2 = left2 - side2;
9832
        drflac_uint32 right3 = left3 - side3;
9833

9834
        left0  >>= 16;
9835
        left1  >>= 16;
9836
        left2  >>= 16;
9837
        left3  >>= 16;
9838

9839
        right0 >>= 16;
9840
        right1 >>= 16;
9841
        right2 >>= 16;
9842
        right3 >>= 16;
9843

9844
        pOutputSamples[i*8+0] = (drflac_int16)left0;
9845
        pOutputSamples[i*8+1] = (drflac_int16)right0;
9846
        pOutputSamples[i*8+2] = (drflac_int16)left1;
9847
        pOutputSamples[i*8+3] = (drflac_int16)right1;
9848
        pOutputSamples[i*8+4] = (drflac_int16)left2;
9849
        pOutputSamples[i*8+5] = (drflac_int16)right2;
9850
        pOutputSamples[i*8+6] = (drflac_int16)left3;
9851
        pOutputSamples[i*8+7] = (drflac_int16)right3;
9852
    }
9853

9854
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
9855
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
9856
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
9857
        drflac_uint32 right = left - side;
9858

9859
        left  >>= 16;
9860
        right >>= 16;
9861

9862
        pOutputSamples[i*2+0] = (drflac_int16)left;
9863
        pOutputSamples[i*2+1] = (drflac_int16)right;
9864
    }
9865
}
9866

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);

        left  = _mm_srai_epi32(left,  16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;

        left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);

        left  = vshrq_n_u32(left,  16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

/* Dispatches to the SSE2 or NEON decoder when it is compiled in, reported as supported, and bitsPerSample <= 24; otherwise falls back to the scalar decoder. */
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left  = right + side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

/*
Right/side stereo: subframe 0 carries side and subframe 1 carries the right channel, so left = right + side.
The shifting and truncation to s16 match the left/side decoder. The main loop is unrolled to four frames
per iteration.
*/
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        left0  >>= 16;
        left1  >>= 16;
        left2  >>= 16;
        left3  >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    /* Remaining 0 to 3 frames. */
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left  = _mm_add_epi32(right, side);

        left  = _mm_srai_epi32(left,  16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left  = vaddq_u32(right, side);

        left  = vshrq_n_u32(left,  16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

/* Dispatches to the SSE2 or NEON decoder when it is compiled in, reported as supported, and bitsPerSample <= 24; otherwise falls back to the scalar decoder. */
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}
#endif

/*
Mid/side stereo: subframe 0 carries mid and subframe 1 carries side. The low bit of mid is restored from
the low bit of side, then left = (mid + side) >> 1 and right = (mid - side) >> 1. When unusedBitsPerSample
is non-zero, the halving is folded into the upscaling shift (shift - 1). The main loop is unrolled to four
frames per iteration.
*/
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = ((drflac_int32)(mid0 + side0) >> 1);
            temp1L = ((drflac_int32)(mid1 + side1) >> 1);
            temp2L = ((drflac_int32)(mid2 + side2) >> 1);
            temp3L = ((drflac_int32)(mid3 + side3) >> 1);

            temp0R = ((drflac_int32)(mid0 - side0) >> 1);
            temp1R = ((drflac_int32)(mid1 - side1) >> 1);
            temp2R = ((drflac_int32)(mid2 - side2) >> 1);
            temp3R = ((drflac_int32)(mid3 - side3) >> 1);

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            left  = _mm_srai_epi32(left,  16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            left  = _mm_srai_epi32(left,  16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;
    int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            left  = vshrq_n_s32(left,  16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            left  = vshrq_n_s32(left,  16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
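
#if 0
/*
Illustrative sketch only, not part of dr_flac. It isolates the mid/side reconstruction step used by the decoders above for a single
sample pair, ignoring wasted bits and the final conversion to 16-bit. The name example_decode_mid_side is hypothetical. The sketch
assumes that the encoder stored side = left - right and mid = (left + right) >> 1, and that right-shifting a negative signed value
is an arithmetic shift, which the surrounding code also relies on.
*/
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds one left/right pair from one mid/side pair. */
static void example_decode_mid_side(int32_t mid, int32_t side, int32_t* pLeft, int32_t* pRight)
{
    /* Work in unsigned arithmetic so that wrap-around is well defined. */
    uint32_t m = (uint32_t)mid;
    uint32_t s = (uint32_t)side;

    assert(pLeft != NULL && pRight != NULL);

    /* Halving (left + right) dropped its low bit. That bit has the same parity as side, so it can be restored from there. */
    m = (m << 1) | (s & 0x01);

    /* m is now left + right, so (m + s) is 2*left and (m - s) is 2*right. */
    *pLeft  = (int32_t)(m + s) >> 1;
    *pRight = (int32_t)(m - s) >> 1;
}
```
#endif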


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        tempL0 >>= 16;
        tempL1 >>= 16;
        tempL2 >>= 16;
        tempL3 >>= 16;

        tempR0 >>= 16;
        tempR1 >>= 16;
        tempR2 >>= 16;
        tempR3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)tempL0;
        pOutputSamples[i*8+1] = (drflac_int16)tempR0;
        pOutputSamples[i*8+2] = (drflac_int16)tempL1;
        pOutputSamples[i*8+3] = (drflac_int16)tempR1;
        pOutputSamples[i*8+4] = (drflac_int16)tempL2;
        pOutputSamples[i*8+5] = (drflac_int16)tempR2;
        pOutputSamples[i*8+6] = (drflac_int16)tempL3;
        pOutputSamples[i*8+7] = (drflac_int16)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        left  = _mm_srai_epi32(left,  16);
        right = _mm_srai_epi32(right, 16);

        /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store them in the output buffer. */
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift0_4 = vdupq_n_s32(shift0);
    int32x4_t shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));

        left  = vshrq_n_s32(left,  16);
        right = vshrq_n_s32(right, 16);

        drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
                    }
                }
            }

            framesRead                += frameCountThisIteration;
            pBufferOut                += frameCountThisIteration * channelCount;
            framesToRead              -= frameCountThisIteration;
            pFlac->currentPCMFrame    += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}
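
#if 0
/*
Illustrative sketch only, not part of dr_flac. It restates the per-sample conversion used by the generic interleaving path in
drflac_read_pcm_frames_s16(): the decoded sample is left-justified into 32 bits and the top 16 bits are kept. The name
example_s32_to_s16 is hypothetical, and the sketch assumes an arithmetic right shift for negative values, as the code above does.
*/
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts one decoded sample of the given bit depth to signed 16-bit. */
static int16_t example_s32_to_s16(int32_t sample, uint32_t bitsPerSample, uint32_t wastedBitsPerSample)
{
    uint32_t unusedBitsPerSample;
    int32_t  sampleS32;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32);
    unusedBitsPerSample = 32 - bitsPerSample;
    assert(unusedBitsPerSample + wastedBitsPerSample < 32);

    /* Shift in unsigned arithmetic so that moving bits into the sign position is well defined. */
    sampleS32 = (int32_t)((uint32_t)sample << (unusedBitsPerSample + wastedBitsPerSample));

    /* Keep the 16 most significant bits. Lower bits are truncated, not rounded or dithered. */
    return (int16_t)(sampleS32 >> 16);
}
```
#endif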


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    float factor = 1 / 2147483648.0;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0  * factor;
        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)left1  * factor;
        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)left2  * factor;
        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)left3  * factor;
        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left  * factor;
        pOutputSamples[i*2+1] = (drflac_int32)right * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    __m128 factor;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = _mm_set1_ps(1.0f / 8388608.0f);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);
        __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
        __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    float32x4_t factor4;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor4  = vdupq_n_f32(1.0f / 8388608.0f);
    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;
        float32x4_t leftf;
        float32x4_t rightf;

        left   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side   = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right  = vsubq_u32(left, side);
        leftf  = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)),  factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
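
#if 0
/*
Illustrative sketch only, not part of dr_flac. It shows the left/side decode to float for a single sample pair, following the
scalar path above: both channels are left-justified into 32 bits, right is recovered as left - side, and the result is scaled by
1/2^31 into the range [-1, 1). The name example_decode_left_side_f32 is hypothetical, and wasted bits are assumed to be zero to keep
the sketch short.
*/
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one left/side pair of the given bit depth to two floats. */
static void example_decode_left_side_f32(int32_t left, int32_t side, uint32_t bitsPerSample, float* pOutLeft, float* pOutRight)
{
    const float factor = 1.0f / 2147483648.0f;  /* 1 / 2^31 */
    uint32_t shift;
    uint32_t l;
    uint32_t s;
    uint32_t r;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32);
    assert(pOutLeft != NULL && pOutRight != NULL);

    shift = 32 - bitsPerSample;  /* Assumption for this sketch: no wasted bits. */
    l = (uint32_t)left << shift;
    s = (uint32_t)side << shift;
    r = l - s;                   /* side = left - right, so right = left - side. */

    *pOutLeft  = (float)(int32_t)l * factor;
    *pOutRight = (float)(int32_t)r * factor;
}
```
#endif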


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    float factor = 1 / 2147483648.0;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0  * factor;
        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)left1  * factor;
        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)left2  * factor;
        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)left3  * factor;
        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left  * factor;
        pOutputSamples[i*2+1] = (drflac_int32)right * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    __m128 factor;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = _mm_set1_ps(1.0f / 8388608.0f);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left  = _mm_add_epi32(right, side);
        __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
        __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    float32x4_t factor4;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor4  = vdupq_n_f32(1.0f / 8388608.0f);
    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;
        float32x4_t leftf;
        float32x4_t rightf;

        side   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left   = vaddq_u32(right, side);
        leftf  = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)),  factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;
    float factor = 1 / 2147483648.0;

    /*
    In both branches, mid = (mid << 1) | (side & 1) restores the low bit of (left + right), after which
    left = (mid + side) >> 1 and right = (mid - side) >> 1. When shift > 0 the >> 1 is folded into the
    left shift by shifting by (shift - 1) instead.
    */
    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);

            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);

            pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
        }
    }

    /* Remaining frames that did not fit in a group of 4. */
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
    }
}
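/*
The mid/side reconstruction used above can be tried in isolation. The sketch below is not part of dr_flac:
midside_to_f32 is a hypothetical helper, it ignores wasted bits, and it assumes that right-shifting a negative
signed integer is an arithmetic shift (implementation-defined in C, but what mainstream compilers do).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not a dr_flac API): rebuilds interleaved left/right
// floats in [-1, 1) from FLAC-style mid/side samples.
static void midside_to_f32(const int32_t* mid, const int32_t* side, size_t frames,
                           unsigned bitsPerSample, float* out)
{
    uint32_t unusedBits;
    size_t i;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32);
    unusedBits = 32u - bitsPerSample;

    for (i = 0; i < frames; ++i) {
        uint32_t s = (uint32_t)side[i];
        // The encoder stored mid = (left + right) >> 1. The dropped low bit
        // equals the low bit of side, so m is (left + right) again.
        uint32_t m = ((uint32_t)mid[i] << 1) | (s & 1u);
        int32_t left  = (int32_t)(m + s) >> 1;   // (2 * left)  / 2
        int32_t right = (int32_t)(m - s) >> 1;   // (2 * right) / 2

        // Left-align within 32 bits, then normalize by 2^31.
        out[i*2+0] = (float)((int32_t)((uint32_t)left  << unusedBits) / 2147483648.0);
        out[i*2+1] = (float)((int32_t)((uint32_t)right << unusedBits) / 2147483648.0);
    }
}
```

For 16-bit input, left = 3 and right = -2 are encoded as mid = 0 and side = 5; the helper returns 3/32768 and
-2/32768 for that frame.
*/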

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample - 8;
    float factor;
    __m128 factor128;

    /* This path scales to a 24-bit range: the shift is relative to 24 bits and the factor is 1/2^23, so it is only valid for bitsPerSample <= 24. */
    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = 1.0f / 8388608.0f;
    factor128 = _mm_set1_ps(factor);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i tempL;
            __m128i tempR;
            __m128  leftf;
            __m128  rightf;

            mid    = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid    = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            tempL  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            tempR  = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            leftf  = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
            rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);

            _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
            _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
            pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i tempL;
            __m128i tempR;
            __m128 leftf;
            __m128 rightf;

            mid    = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid    = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            tempL  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            tempR  = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            leftf  = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
            rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);

            _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
            _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample - 8;
    float factor;
    float32x4_t factor4;
    int32x4_t shift4;
    int32x4_t wbps0_4;  /* Wasted Bits Per Sample */
    int32x4_t wbps1_4;  /* Wasted Bits Per Sample */

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor  = 1.0f / 8388608.0f;
    factor4 = vdupq_n_f32(factor);
    wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            int32x4_t lefti;
            int32x4_t righti;
            float32x4_t leftf;
            float32x4_t rightf;

            uint32x4_t mid  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
            uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);

            mid    = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            lefti  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
            rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

            drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
            pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
        }
    } else {
        shift -= 1;
        shift4 = vdupq_n_s32(shift);
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t lefti;
            int32x4_t righti;
            float32x4_t leftf;
            float32x4_t rightf;

            mid    = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
            side   = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);

            mid    = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            lefti  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
            rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

            drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    float factor = 1 / 2147483648.0;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
        pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
        pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
        pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
        pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;

    float factor = 1.0f / 8388608.0f;
    __m128 factor128 = _mm_set1_ps(factor);

    for (i = 0; i < frameCount4; ++i) {
        __m128i lefti;
        __m128i righti;
        __m128 leftf;
        __m128 rightf;

        lefti  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        leftf  = _mm_mul_ps(_mm_cvtepi32_ps(lefti),  factor128);
        rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}
#endif
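/*
The scalar path left-aligns samples to 32 bits and scales by 1/2^31, while the SIMD paths shift by 8 bits less
and scale by 1/2^23. For samples of 24 bits or fewer the two should give the same float. The sketch below checks
that in isolation; scale_full and scale_24 are hypothetical helpers, not dr_flac functions, and wasted bits are
ignored.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: left-align to 32 bits, then normalize by 2^31.
static float scale_full(int32_t sample, unsigned bitsPerSample)
{
    uint32_t shift;
    assert(bitsPerSample >= 1 && bitsPerSample <= 32);
    shift = 32u - bitsPerSample;
    return (float)((int32_t)((uint32_t)sample << shift) / 2147483648.0);
}

// Hypothetical helper: left-align to 24 bits, then normalize by 2^23.
// Only meaningful when bitsPerSample <= 24, otherwise the shift underflows.
static float scale_24(int32_t sample, unsigned bitsPerSample)
{
    uint32_t shift;
    assert(bitsPerSample >= 1 && bitsPerSample <= 24);
    shift = (32u - bitsPerSample) - 8u;
    return (float)(int32_t)((uint32_t)sample << shift) * (1.0f / 8388608.0f);
}
```

Both helpers compute sample / 2^(bitsPerSample - 1), so a 16-bit sample of 16384 maps to 0.5 either way.
*/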

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;

    float factor = 1.0f / 8388608.0f;
    float32x4_t factor4 = vdupq_n_f32(factor);
    int32x4_t shift0_4  = vdupq_n_s32(shift0);
    int32x4_t shift1_4  = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t lefti;
        int32x4_t righti;
        float32x4_t leftf;
        float32x4_t rightf;

        lefti  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
        righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));

        leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
        rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);

        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
                    }
                }
            }

            framesRead                += frameCountThisIteration;
            pBufferOut                += frameCountThisIteration * channelCount;
            framesToRead              -= frameCountThisIteration;
            pFlac->currentPCMFrame    += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
        }
    }

    return framesRead;
}
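/*
The read loop above serves a request of any size from fixed-size decoded blocks: it takes what is left of the
current block, loads the next block when that runs dry, and stops early at the end of the stream. The sketch
below shows the same pattern on a mono float array. block_reader, load_next_block and read_frames are
hypothetical names for illustration only; nothing here is dr_flac API.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for a decoder that exposes one block at a time.
typedef struct
{
    const float* data;    // Whole stream (mono), for illustration only.
    size_t total;         // Total frames in the stream.
    size_t blockSize;     // Maximum frames per block.
    size_t nextBlock;     // First frame of the next block to load.
    size_t blockStart;    // First frame of the current block.
    size_t remaining;     // Frames not yet consumed in the current block.
} block_reader;

// "Decodes" the next block. Returns 0 at the end of the stream.
static int load_next_block(block_reader* r)
{
    size_t n;
    if (r->nextBlock >= r->total) {
        return 0;
    }
    n = r->total - r->nextBlock;
    if (n > r->blockSize) {
        n = r->blockSize;
    }
    r->blockStart = r->nextBlock;
    r->remaining  = n;
    r->nextBlock += n;
    return 1;
}

// Reads up to framesToRead frames, crossing block boundaries as needed.
static size_t read_frames(block_reader* r, size_t framesToRead, float* out)
{
    size_t framesRead = 0;
    assert(r != NULL && out != NULL);

    while (framesToRead > 0) {
        if (r->remaining == 0) {
            if (!load_next_block(r)) {
                break;  // End of stream: return what we have.
            }
        } else {
            size_t blockLen = r->nextBlock - r->blockStart;
            size_t first    = blockLen - r->remaining;  // Offset of the first unread frame.
            size_t n        = (framesToRead < r->remaining) ? framesToRead : r->remaining;
            size_t i;
            for (i = 0; i < n; ++i) {
                out[i] = r->data[r->blockStart + first + i];
            }
            framesRead   += n;
            out          += n;
            framesToRead -= n;
            r->remaining -= n;
        }
    }
    return framesRead;
}
```

A short return value means the stream ended, which is also how a caller would detect the end of a file.
*/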


DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    if (pFlac == NULL) {
        return DRFLAC_FALSE;
    }

    /* Don't do anything if we're already on the seek point. */
    if (pFlac->currentPCMFrame == pcmFrameIndex) {
        return DRFLAC_TRUE;
    }

    /*
    If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
    when the decoder was opened.
    */
    if (pFlac->firstFLACFramePosInBytes == 0) {
        return DRFLAC_FALSE;
    }

    if (pcmFrameIndex == 0) {
        pFlac->currentPCMFrame = 0;
        return drflac__seek_to_first_frame(pFlac);
    } else {
        drflac_bool32 wasSuccessful = DRFLAC_FALSE;
        drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;

        /* Clamp the target PCM frame to the end of the stream. */
        if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
            pcmFrameIndex = pFlac->totalPCMFrameCount;
        }

        /* If the target PCM frame lies within the currently decoded FLAC frame we just adjust the position inside that frame. */
        if (pcmFrameIndex > pFlac->currentPCMFrame) {
            /* Forward. */
            drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
            if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
                pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        } else {
            /* Backward. */
            drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
            drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
            drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
            if (currentFLACFramePCMFramesConsumed > offsetAbs) {
                pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        }

        /*
        Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
        we'll instead use Ogg's natural seeking facility.
        */
#ifndef DR_FLAC_NO_OGG
        if (pFlac->container == drflac_container_ogg)
        {
            wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
        }
        else
#endif
        {
            /* First try seeking via the seek table. If this fails, fall back to binary search and then to a brute force seek, which is much slower. */
            if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
            }

#if !defined(DR_FLAC_NO_CRC)
            /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
            if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
                wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
            }
#endif

            /* Fall back to brute force if all else fails. */
            if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
            }
        }

        if (wasSuccessful) {
            pFlac->currentPCMFrame = pcmFrameIndex;
        } else {
            /* Seek failed. Try putting the decoder back to its original state. */
            if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
                /* Failed to seek back to the original PCM frame. Fall back to 0. */
                drflac_seek_to_pcm_frame(pFlac, 0);
            }
        }

        return wasSuccessful;
    }
}



/* High Level APIs */

/* SIZE_MAX */
#if defined(SIZE_MAX)
    #define DRFLAC_SIZE_MAX  SIZE_MAX
#else
    #if defined(DRFLAC_64BIT)
        #define DRFLAC_SIZE_MAX  ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
    #else
        #define DRFLAC_SIZE_MAX  0xFFFFFFFF
    #endif
#endif
/* End SIZE_MAX */


/* Using a macro as the definition of the drflac__full_read_and_close_*() API family. Sue me. */
#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
{                                                                                                                                                                   \
    type* pSampleData = NULL;                                                                                                                                       \
    drflac_uint64 totalPCMFrameCount;                                                                                                                               \
                                                                                                                                                                    \
    DRFLAC_ASSERT(pFlac != NULL);                                                                                                                                   \
                                                                                                                                                                    \
    totalPCMFrameCount = pFlac->totalPCMFrameCount;                                                                                                                 \
                                                                                                                                                                    \
    if (totalPCMFrameCount == 0) {                                                                                                                                  \
        type buffer[4096];                                                                                                                                          \
        drflac_uint64 pcmFramesRead;                                                                                                                                \
        size_t sampleDataBufferSize = sizeof(buffer);                                                                                                               \
                                                                                                                                                                    \
        pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks);                                                      \
        if (pSampleData == NULL) {                                                                                                                                  \
            goto on_error;                                                                                                                                          \
        }                                                                                                                                                           \
                                                                                                                                                                    \
        while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) {          \
            if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) {                                                   \
                type* pNewSampleData;                                                                                                                               \
                size_t newSampleDataBufferSize;                                                                                                                     \
                                                                                                                                                                    \
                newSampleDataBufferSize = sampleDataBufferSize * 2;                                                                                                 \
                pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks);    \
                if (pNewSampleData == NULL) {                                                                                                                       \
                    drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks);                                                                          \
                    goto on_error;                                                                                                                                  \
                }                                                                                                                                                   \
                                                                                                                                                                    \
                sampleDataBufferSize = newSampleDataBufferSize;                                                                                                     \
                pSampleData = pNewSampleData;                                                                                                                       \
            }                                                                                                                                                       \
                                                                                                                                                                    \
            DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type)));                   \
            totalPCMFrameCount += pcmFramesRead;                                                                                                                    \
        }                                                                                                                                                           \
                                                                                                                                                                    \
        /* At this point everything should be decoded, but we just want to fill the unused part of the buffer with silence - need to                                \
           protect those ears from random noise! */                                                                                                                 \
        DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type)));   \
    } else {                                                                                                                                                        \
        drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type);                                                                                   \
        if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) {                                                                                                            \
            goto on_error;  /* The decoded data is too big. */                                                                                                      \
        }                                                                                                                                                           \
                                                                                                                                                                    \
        pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks);    /* <-- Safe cast as per the check above. */           \
        if (pSampleData == NULL) {                                                                                                                                  \
            goto on_error;                                                                                                                                          \
        }                                                                                                                                                           \
                                                                                                                                                                    \
        totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData);                                                     \
    }                                                                                                                                                               \
                                                                                                                                                                    \
    if (sampleRateOut) *sampleRateOut = pFlac->sampleRate;                                                                                                          \
    if (channelsOut) *channelsOut = pFlac->channels;                                                                                                                \
    if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount;                                                                                         \
                                                                                                                                                                    \
    drflac_close(pFlac);                                                                                                                                            \
    return pSampleData;                                                                                                                                             \
                                                                                                                                                                    \
on_error:                                                                                                                                                           \
    drflac_close(pFlac);                                                                                                                                            \
    return NULL;                                                                                                                                                    \
}

DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
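/*
Illustrative sketch, not part of dr_flac: when the total frame count is unknown, the macro above decodes into a fixed-size
stack buffer and appends each chunk to a heap buffer that doubles whenever it would overflow. The standalone version below
shows that growth strategy with plain malloc()/realloc() instead of the allocation callbacks. example_source, example_read()
and example_read_all() are hypothetical stand-ins, with example_read() playing the role of drflac_read_pcm_frames_s16().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical chunk source: hands out samples from a memory block.
typedef struct
{
    const int16_t* data;
    size_t         remaining;
} example_source;

// Copies up to `count` samples into `out` and returns how many were copied.
static size_t example_read(example_source* src, size_t count, int16_t* out)
{
    size_t n = (count < src->remaining) ? count : src->remaining;
    memcpy(out, src->data, n * sizeof(int16_t));
    src->data      += n;
    src->remaining -= n;
    return n;
}

// Reads everything from `src` into a malloc()'d buffer that doubles when full.
// Returns NULL on allocation failure. The caller releases the result with free().
static int16_t* example_read_all(example_source* src, size_t* countOut)
{
    int16_t  chunk[256];
    size_t   capacity = sizeof(chunk) / sizeof(chunk[0]);
    size_t   total    = 0;
    size_t   n;
    int16_t* result;

    assert(src != NULL && countOut != NULL);

    result = (int16_t*)malloc(capacity * sizeof(int16_t));
    if (result == NULL) {
        return NULL;
    }

    while ((n = example_read(src, sizeof(chunk) / sizeof(chunk[0]), chunk)) > 0) {
        if (total + n > capacity) {
            // One doubling is enough because a chunk never exceeds the initial capacity.
            int16_t* grown = (int16_t*)realloc(result, capacity * 2 * sizeof(int16_t));
            if (grown == NULL) {
                free(result);
                return NULL;
            }
            result    = grown;
            capacity *= 2;
        }
        memcpy(result + total, chunk, n * sizeof(int16_t));
        total += n;
    }

    // Zero the unused tail so stale heap contents are never mistaken for audio.
    memset(result + total, 0, (capacity - total) * sizeof(int16_t));
    *countOut = total;
    return result;
}
```

Doubling keeps the number of reallocations logarithmic in the stream length, at the cost of up to half the final buffer being
unused padding. The sketch omits overflow checks on the capacity arithmetic.
*/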

DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (channelsOut) {
        *channelsOut = 0;
    }
    if (sampleRateOut) {
        *sampleRateOut = 0;
    }
    if (totalPCMFrameCountOut) {
        *totalPCMFrameCountOut = 0;
    }

    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
}

#ifndef DR_FLAC_NO_STDIO
DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_file(filename, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
}
#endif

DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
}


DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (pAllocationCallbacks != NULL) {
        drflac__free_from_callbacks(p, pAllocationCallbacks);
    } else {
        drflac__free_default(p, NULL);
    }
}




DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = commentCount;
    pIter->pRunningData   = (const char*)pComments;
}

DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
{
    drflac_int32 length;
    const char* pComment;

    /* Safety. */
    if (pCommentLengthOut) {
        *pCommentLengthOut = 0;
    }

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
    pIter->pRunningData += 4;

    pComment = pIter->pRunningData;
    pIter->pRunningData += length;
    pIter->countRemaining -= 1;

    if (pCommentLengthOut) {
        *pCommentLengthOut = length;
    }

    return pComment;
}
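/*
Illustrative sketch, not part of dr_flac: the iterator above walks a list of comments, each stored as a 32-bit little-endian
length followed by that many bytes, with no null terminator. It trusts the lengths because it has no buffer size to compare
against. The standalone variant below assumes the caller also knows the size of the comment data and shows how a bounds check
could be layered on top. example_comment_iter, example_read_le32() and example_next_comment() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical bounds-checked iterator (not dr_flac's drflac_vorbis_comment_iterator).
typedef struct
{
    const uint8_t* cursor;       // Next unread byte.
    size_t         bytesLeft;    // Bytes available starting at `cursor`.
    uint32_t       countLeft;    // Comments still to be returned.
} example_comment_iter;

// Reads a 32-bit little-endian value byte by byte, so alignment does not matter.
static uint32_t example_read_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns the next comment (not null-terminated) and stores its length, or
// returns NULL when the list is exhausted or the data is truncated.
static const char* example_next_comment(example_comment_iter* it, uint32_t* lengthOut)
{
    uint32_t length;
    const char* comment;

    assert(it != NULL && lengthOut != NULL);
    *lengthOut = 0;

    if (it->countLeft == 0 || it->cursor == NULL || it->bytesLeft < 4) {
        return NULL;
    }

    length = example_read_le32(it->cursor);
    if (length > it->bytesLeft - 4) {
        return NULL;    // Declared length runs past the end of the buffer.
    }

    comment        = (const char*)(it->cursor + 4);
    it->cursor    += 4 + (size_t)length;
    it->bytesLeft -= 4 + (size_t)length;
    it->countLeft -= 1;

    *lengthOut = length;
    return comment;
}
```

Because the returned string is not null-terminated, callers should print it with an explicit length, for example with the
"%.*s" printf format.
*/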




DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = trackCount;
    pIter->pRunningData   = (const char*)pTrackData;
}

DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
{
    drflac_cuesheet_track cuesheetTrack;
    const char* pRunningData;
    drflac_uint64 offsetHi;
    drflac_uint64 offsetLo;

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return DRFLAC_FALSE;
    }

    pRunningData = pIter->pRunningData;

    offsetHi                   = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    offsetLo                   = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    cuesheetTrack.offset       = offsetLo | (offsetHi << 32);
    cuesheetTrack.trackNumber  = pRunningData[0];                                         pRunningData += 1;
    DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC));     pRunningData += 12;
    cuesheetTrack.isAudio      = (pRunningData[0] & 0x80) != 0;
    cuesheetTrack.preEmphasis  = (pRunningData[0] & 0x40) != 0;                           pRunningData += 14;
    cuesheetTrack.indexCount   = pRunningData[0];                                         pRunningData += 1;
    cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData;        pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);

    pIter->pRunningData = pRunningData;
    pIter->countRemaining -= 1;

    if (pCuesheetTrack) {
        *pCuesheetTrack = cuesheetTrack;
    }

    return DRFLAC_TRUE;
}

#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
    #pragma GCC diagnostic pop
#endif
#endif  /* dr_flac_c */
#endif  /* DR_FLAC_IMPLEMENTATION */


/*
REVISION HISTORY
================
v0.12.42 - 2023-11-02
  - Fix build for ARMv6-M.
  - Fix a compilation warning with GCC.

v0.12.41 - 2023-06-17
  - Fix an incorrect date in revision history. No functional change.

v0.12.40 - 2023-05-22
  - Minor code restructure. No functional change.

v0.12.39 - 2022-09-17
  - Fix compilation with DJGPP.
  - Fix compilation error with Visual Studio 2019 and the ARM build.
  - Fix an error with SSE 4.1 detection.
  - Add support for disabling wchar_t with DR_FLAC_NO_WCHAR.
  - Improve compatibility with compilers which lack support for explicit struct packing.
  - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
    allocation when loading an Ogg encapsulated file.

v0.12.38 - 2022-04-10
  - Fix compilation error on older versions of GCC.

v0.12.37 - 2022-02-12
  - Improve ARM detection.

v0.12.36 - 2022-02-07
  - Fix a compilation error with the ARM build.

v0.12.35 - 2022-02-06
  - Fix a bug due to underestimating the amount of precision required for the prediction stage.
  - Fix some bugs found from fuzz testing.

v0.12.34 - 2022-01-07
  - Fix some misalignment bugs when reading metadata.

v0.12.33 - 2021-12-22
  - Fix a bug with seeking when the seek table does not start at PCM frame 0.

v0.12.32 - 2021-12-11
  - Fix a warning with Clang.

v0.12.31 - 2021-08-16
  - Silence some warnings.

v0.12.30 - 2021-07-31
  - Fix platform detection for ARM64.

v0.12.29 - 2021-04-02
  - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
  - Fix a decoding error due to an incorrect validation check.

v0.12.28 - 2021-02-21
  - Fix a warning due to referencing _MSC_VER when it is undefined.

v0.12.27 - 2021-01-31
  - Fix a static analysis warning.

v0.12.26 - 2021-01-17
  - Fix a compilation warning due to _BSD_SOURCE being deprecated.

v0.12.25 - 2020-12-26
  - Update documentation.

v0.12.24 - 2020-11-29
  - Fix ARM64/NEON detection when compiling with MSVC.

v0.12.23 - 2020-11-21
  - Fix compilation with OpenWatcom.

v0.12.22 - 2020-11-01
  - Fix an error with the previous release.

v0.12.21 - 2020-11-01
  - Fix a possible deadlock when seeking.
  - Improve compiler support for older versions of GCC.

v0.12.20 - 2020-09-08
  - Fix a compilation error on older compilers.

v0.12.19 - 2020-08-30
  - Fix a bug due to an undefined 32-bit shift.

v0.12.18 - 2020-08-14
  - Fix a crash when compiling with clang-cl.

v0.12.17 - 2020-08-02
  - Simplify sized types.

v0.12.16 - 2020-07-25
  - Fix a compilation warning.

v0.12.15 - 2020-07-06
  - Check for negative LPC shifts and return an error.

v0.12.14 - 2020-06-23
  - Add include guard for the implementation section.

v0.12.13 - 2020-05-16
  - Add compile-time and run-time version querying.
    - DRFLAC_VERSION_MINOR
    - DRFLAC_VERSION_MAJOR
    - DRFLAC_VERSION_REVISION
    - DRFLAC_VERSION_STRING
    - drflac_version()
    - drflac_version_string()

v0.12.12 - 2020-04-30
  - Fix compilation errors with VC6.

v0.12.11 - 2020-04-19
  - Fix some pedantic warnings.
  - Fix some undefined behaviour warnings.

v0.12.10 - 2020-04-10
  - Fix some bugs when trying to seek with an invalid seek table.

v0.12.9 - 2020-04-05
  - Fix warnings.

v0.12.8 - 2020-04-04
  - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
  - Fix some static analysis warnings.
  - Minor documentation updates.

v0.12.7 - 2020-03-14
  - Fix compilation errors with VC6.

v0.12.6 - 2020-03-07
  - Fix compilation error with Visual Studio .NET 2003.

v0.12.5 - 2020-01-30
  - Silence some static analysis warnings.

v0.12.4 - 2020-01-29
  - Silence some static analysis warnings.

v0.12.3 - 2019-12-02
  - Fix some warnings when compiling with GCC and the -Og flag.
  - Fix a crash in out-of-memory situations.
  - Fix potential integer overflow bug.
  - Fix some static analysis warnings.
  - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
  - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.

v0.12.2 - 2019-10-07
  - Internal code clean up.

v0.12.1 - 2019-09-29
  - Fix some Clang Static Analyzer warnings.
  - Fix an unused variable warning.

v0.12.0 - 2019-09-23
  - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify its own memory allocation
    routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
    - drflac_open()
    - drflac_open_relaxed()
    - drflac_open_with_metadata()
    - drflac_open_with_metadata_relaxed()
    - drflac_open_file()
    - drflac_open_file_with_metadata()
    - drflac_open_memory()
    - drflac_open_memory_with_metadata()
    - drflac_open_and_read_pcm_frames_s32()
    - drflac_open_and_read_pcm_frames_s16()
    - drflac_open_and_read_pcm_frames_f32()
    - drflac_open_file_and_read_pcm_frames_s32()
    - drflac_open_file_and_read_pcm_frames_s16()
    - drflac_open_file_and_read_pcm_frames_f32()
    - drflac_open_memory_and_read_pcm_frames_s32()
    - drflac_open_memory_and_read_pcm_frames_s16()
    - drflac_open_memory_and_read_pcm_frames_f32()
    Set this extra parameter to NULL to use the defaults, which matches the previous behaviour. Setting this to NULL will use
    DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
  - Remove deprecated APIs:
    - drflac_read_s32()
    - drflac_read_s16()
    - drflac_read_f32()
    - drflac_seek_to_sample()
    - drflac_open_and_decode_s32()
    - drflac_open_and_decode_s16()
    - drflac_open_and_decode_f32()
    - drflac_open_and_decode_file_s32()
    - drflac_open_and_decode_file_s16()
    - drflac_open_and_decode_file_f32()
    - drflac_open_and_decode_memory_s32()
    - drflac_open_and_decode_memory_s16()
    - drflac_open_and_decode_memory_f32()
  - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
    by doing pFlac->totalPCMFrameCount*pFlac->channels.
  - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
  - Fix errors when seeking to the end of a stream.
  - Optimizations to seeking.
  - SSE improvements and optimizations.
  - ARM NEON optimizations.
  - Optimizations to drflac_read_pcm_frames_s16().
  - Optimizations to drflac_read_pcm_frames_s32().
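Migration sketch for the totalSampleCount removal in v0.12.0. This is a minimal illustration and not dr_flac code:
example_flac_info is a hypothetical stand-in holding only the two drflac members involved, and
example_total_sample_count() is an assumed helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the two drflac members used in this sketch.
typedef struct
{
    uint64_t totalPCMFrameCount;  // Number of PCM frames; each frame holds one sample per channel.
    uint8_t  channels;            // Channel count.
} example_flac_info;

// Emulates the removed drflac.totalSampleCount (samples summed across all channels).
static uint64_t example_total_sample_count(const example_flac_info* pInfo)
{
    assert(pInfo != NULL);
    return pInfo->totalPCMFrameCount * (uint64_t)pInfo->channels;
}
```

With a real decoder the same multiplication would be applied to pFlac->totalPCMFrameCount and pFlac->channels.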

v0.11.10 - 2019-06-26
  - Fix a compiler error.

v0.11.9 - 2019-06-16
  - Silence some ThreadSanitizer warnings.

v0.11.8 - 2019-05-21
  - Fix warnings.

v0.11.7 - 2019-05-06
  - C89 fixes.

v0.11.6 - 2019-05-05
  - Add support for C89.
  - Fix a compiler warning when CRC is disabled.
  - Change license to choice of public domain or MIT-0.

v0.11.5 - 2019-04-19
  - Fix a compiler error with GCC.

v0.11.4 - 2019-04-17
  - Fix some warnings with GCC when compiling with -std=c99.

v0.11.3 - 2019-04-07
  - Silence warnings with GCC.

v0.11.2 - 2019-03-10
  - Fix a warning.

v0.11.1 - 2019-02-17
  - Fix a potential bug with seeking.

v0.11.0 - 2018-12-16
  - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
    drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
    and return PCM frame counts instead of sample counts. To upgrade you will need to divide the input count by the
    channel count, and multiply the return value by the channel count if you still need a sample count.
  - API CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
    the changes to drflac_read_*() apply.
  - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
    the changes to drflac_read_*() apply.
  - Optimizations.
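The count conversion for the v0.11.0 upgrade can be sketched as follows. Both helpers are hypothetical names used for
illustration and do not exist in dr_flac. They assume interleaved audio, so one PCM frame holds one sample per channel.

```c
#include <assert.h>
#include <stdint.h>

// Old-style sample count -> PCM frame count to pass to the drflac_read_pcm_frames family.
static uint64_t example_samples_to_frames(uint64_t sampleCount, uint32_t channels)
{
    assert(channels > 0);
    return sampleCount / channels;
}

// PCM frame count returned by the new APIs -> old-style sample count.
static uint64_t example_frames_to_samples(uint64_t frameCount, uint32_t channels)
{
    assert(channels > 0);
    return frameCount * channels;
}
```

For a stereo stream, a request that used to ask for 4096 samples becomes a request for 2048 PCM frames.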

v0.10.0 - 2018-09-11
  - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
    need to do it yourself via the callback API.
  - Fix the clang build.
  - Fix undefined behavior.
  - Fix errors with CUESHEET metadata blocks.
  - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
    Vorbis comment API.
  - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
  - Minor optimizations.

v0.9.11 - 2018-08-29
  - Fix a bug with sample reconstruction.

v0.9.10 - 2018-08-07
  - Improve 64-bit detection.

v0.9.9 - 2018-08-05
  - Fix C++ build on older versions of GCC.

v0.9.8 - 2018-07-24
  - Fix compilation errors.

v0.9.7 - 2018-07-05
  - Fix a warning.

v0.9.6 - 2018-06-29
  - Fix some typos.

v0.9.5 - 2018-06-23
  - Fix some warnings.

v0.9.4 - 2018-06-14
  - Optimizations to seeking.
  - Clean up.

v0.9.3 - 2018-05-22
  - Bug fix.

v0.9.2 - 2018-05-12
  - Fix a compilation error due to a missing break statement.

v0.9.1 - 2018-04-29
  - Fix compilation error with Clang.

v0.9 - 2018-04-24
  - Fix Clang build.
  - Start using major.minor.revision versioning.

v0.8g - 2018-04-19
  - Fix build on non-x86/x64 architectures.

v0.8f - 2018-02-02
  - Stop pretending to support changing rate/channels mid stream.

v0.8e - 2018-02-01
  - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
  - Fix a crash when the Rice partition order is invalid.

v0.8d - 2017-09-22
  - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.

v0.8c - 2017-09-07
  - Fix warning on non-x86/x64 architectures.

v0.8b - 2017-08-19
  - Fix build on non-x86/x64 architectures.

v0.8a - 2017-08-13
  - A small optimization for the Clang build.

v0.8 - 2017-08-12
  - API CHANGE: Rename dr_* types to drflac_*.
  - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
  - Add support for custom implementations of malloc(), realloc(), etc.
  - Add CRC checking to Ogg encapsulated streams.
  - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
  - Bug fixes.

v0.7 - 2017-07-23
  - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().

v0.6 - 2017-07-22
  - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
    never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.

v0.5 - 2017-07-16
  - Fix typos.
  - Change drflac_bool* types to unsigned.
  - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.

v0.4f - 2017-03-10
  - Fix a couple of bugs with the bitstreaming code.

v0.4e - 2017-02-17
  - Fix some warnings.

v0.4d - 2016-12-26
  - Add support for 32-bit floating-point PCM decoding.
  - Use drflac_int* and drflac_uint* sized types to improve compiler support.
  - Minor improvements to documentation.

v0.4c - 2016-12-26
  - Add support for signed 16-bit integer PCM decoding.

v0.4b - 2016-10-23
  - A minor change to drflac_bool8 and drflac_bool32 types.

v0.4a - 2016-10-11
  - Rename drBool32 to drflac_bool32 for styling consistency.

v0.4 - 2016-09-29
  - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
  - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
  - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
    keep it consistent with drflac_audio.

v0.3f - 2016-09-21
  - Fix a warning with GCC.

v0.3e - 2016-09-18
  - Fixed a bug where GCC 4.3+ was not getting properly identified.
  - Fixed a few typos.
  - Changed date formats to ISO 8601 (YYYY-MM-DD).

v0.3d - 2016-06-11
  - Minor clean up.

v0.3c - 2016-05-28
  - Fixed compilation error.

v0.3b - 2016-05-16
  - Fixed Linux/GCC build.
  - Updated documentation.

v0.3a - 2016-05-15
  - Minor fixes to documentation.

v0.3 - 2016-05-11
  - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
  - Lots of clean up.

v0.2b - 2016-05-10
  - Bug fixes.

v0.2a - 2016-05-10
  - Made drflac_open_and_decode() more robust.
  - Removed an unused debugging variable.

v0.2 - 2016-05-09
  - Added support for Ogg encapsulation.
  - API CHANGE: Have the onSeek callback take a third argument which specifies whether the seek
    should be relative to the start or to the current position. Also changes the seeking rules such that
    seeking offsets will never be negative.
  - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.

v0.1b - 2016-05-07
  - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
  - Removed a stale comment.

v0.1a - 2016-05-05
  - Minor formatting changes.
  - Fixed a warning on the GCC build.

v0.1 - 2016-05-03
  - Initial versioned release.
*/

/*
This software is available as a choice of the following licenses. Choose
whichever you prefer.

===============================================================================
ALTERNATIVE 1 - Public Domain (www.unlicense.org)
===============================================================================
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
software, either in source code form or as a compiled binary, for any purpose,
commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this
software dedicate any and all copyright interest in the software to the public
domain. We make this dedication for the benefit of the public at large and to
the detriment of our heirs and successors. We intend this dedication to be an
overt act of relinquishment in perpetuity of all present and future rights to
this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>

===============================================================================
ALTERNATIVE 2 - MIT No Attribution
===============================================================================
Copyright 2023 David Reid

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/