mirror of https://github.com/yt-dlp/yt-dlp.git synced 2024-11-26 02:55:17 +00:00
yt-dlp/youtube_dlc
pukkandan 8c04f0be96 batch-file enumeration improvements (https://github.com/ytdl-org/youtube-dl/pull/26813)
Co-authored by: glenn-slayden
Modified from c9a9ccf8a3

These improvements apply to reading the list of URLs from the file supplied via the `--batch-file` (`-a`) command line option.

1. Skip empty and whitespace-only lines in the file. Currently, a line with leading whitespace is skipped only when that whitespace is followed by a comment character (`#`, `;`, or `]`). As a result, empty lines and lines consisting only of whitespace are returned as (trimmed) empty strings in the list of URLs to process.

2. [bug fix] Detect and remove the Unicode BOM when the file descriptor is already decoding Unicode.

With Python 3, the `batch_fd` enumerator returns the lines of the file as Unicode. For UTF-8, this means that the raw BOM bytes from the file (`\xef\xbb\xbf`) show up converted into a single `\ufeff` character prefixed to the first enumerated text line.

This fix resolves several buggy interactions between the presence of a BOM, the skipping of comments and/or blank lines, and the consistent trimming of the list of URLs. For example, if the first line of the file is blank, the BOM is incorrectly returned as a standalone URL. If the first line contains a URL, that URL is prefixed with the unwanted character, and the presence of the BOM also prevents any leading whitespace from being trimmed properly. Currently, the `UnicodeBOMIE` helper attempts to recover from some of these error cases, but this fix prevents the error from happening in the first place (at least on Python 3). In any case, the `UnicodeBOMIE` approach is flawed: a BOM cannot logically appear in the (non-batch) URL(s) specified directly on the command line, nor in URLs *after the first line* of a batch list.

3. Add proper trimming of the " #" sequence to the `read_batch_urls` processing, so that the URLs it enumerates are cleaned and trimmed more consistently.
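
The three points above can be illustrated with a minimal sketch. It is not the code from this commit: the function name `read_batch_urls_sketch` is hypothetical, and treating " #" as the start of a trailing comment (whitespace followed by `#`) is an assumption made for illustration. The sketch also assumes `batch_fd` is an already-decoded text stream, as on Python 3.

```python
import io
import re

BOM = '\ufeff'  # the single character that the UTF-8 BOM bytes EF BB BF decode to
COMMENT_PREFIXES = ('#', ';', ']')


def read_batch_urls_sketch(batch_fd):
    """Return the cleaned URLs from an open text-mode batch file (illustrative only)."""
    urls = []
    for index, line in enumerate(batch_fd):
        # A BOM can only occur at the very start of the file, so check line 0 only.
        # Removing it first lets the whitespace trimming below work as intended.
        if index == 0 and line.startswith(BOM):
            line = line[len(BOM):]
        line = line.strip()
        # Skip empty or whitespace-only lines and whole-line comments.
        if not line or line.startswith(COMMENT_PREFIXES):
            continue
        # Assumption: whitespace followed by '#' starts a trailing comment.
        # A '#' with no whitespace before it may be a URL fragment, so it is kept.
        line = re.split(r'\s#', line, maxsplit=1)[0].rstrip()
        urls.append(line)
    return urls


# Example input: BOM on a blank first line, padded URL, comments, trailing comment.
sample = io.StringIO(
    '\ufeff  \n'
    '  https://example.com/a  \n'
    '# a whole-line comment\n'
    '\n'
    'https://example.com/b#frag  # a trailing comment\n'
)
print(read_batch_urls_sketch(sample))
```

Stripping the BOM before trimming whitespace matters here: `str.strip()` does not treat `\ufeff` as whitespace, so a leading BOM would otherwise shield the characters after it from trimming and cause a blank first line to survive as a one-character "URL".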
2021-01-09 18:08:03 +05:30
..
downloader Update to ytdl-2021.01.08 2021-01-08 21:59:10 +05:30
extractor [youtube] Fix bug (Closes https://github.com/pukkandan/yt-dlc/issues/10) 2021-01-08 23:27:00 +05:30
postprocessor Allow passing different arguments to different postprocessors 2021-01-08 01:41:08 +05:30
__init__.py Allow passing different arguments to different postprocessors 2021-01-08 01:41:08 +05:30
__main__.py [skip travis] renaming 2020-09-02 20:25:25 +02:00
aes.py [skip travis] renaming 2020-09-02 20:25:25 +02:00
cache.py [skip travis] renaming 2020-09-02 20:25:25 +02:00
compat.py Add --write-*-link by h-h-h-h 2020-12-13 20:05:04 +05:30
jsinterp.py [skip travis] renaming 2020-09-02 20:25:25 +02:00
options.py Allow passing different arguments to different postprocessors 2021-01-08 01:41:08 +05:30
socks.py [skip travis] renaming 2020-09-02 20:25:25 +02:00
swfinterp.py [skip travis] renaming 2020-09-02 20:25:25 +02:00
update.py Disable Updates 2021-01-06 17:43:27 +05:30
utils.py batch-file enumeration improvements (https://github.com/ytdl-org/youtube-dl/pull/26813) 2021-01-09 18:08:03 +05:30
version.py [version] update 2021-01-08 22:59:45 +05:30
YoutubeDL.py Add post_hooks option to YoutubeDL.py (https://github.com/ytdl-org/youtube-dl/pull/27573) 2021-01-09 16:00:49 +05:30