
Revert "pull changes from remote master (#190)" (#193)

This reverts commit b827ee921f.
commit 19a107f21c
parent 7f7edf837c
Author: Aakash Gajjar
Date: 2020-08-26 20:22:32 +05:30 (committed by GitHub)
134 changed files with 2623 additions and 4150 deletions


@@ -18,7 +18,7 @@ ## Checklist
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ ## Checklist
 -->
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
+- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ ## Verbose log
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2020.07.28
+[debug] youtube-dl version 2019.11.28
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}


@@ -19,7 +19,7 @@ ## Checklist
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ ## Checklist
 -->
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
+- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones


@@ -18,13 +18,13 @@ ## Checklist
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
+- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones


@@ -18,7 +18,7 @@ ## Checklist
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ ## Checklist
 -->
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
+- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ ## Verbose log
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2020.07.28
+[debug] youtube-dl version 2019.11.28
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}


@@ -19,13 +19,13 @@ ## Checklist
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
+- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones


@@ -13,7 +13,7 @@ dist: trusty
 env:
   - YTDL_TEST_SET=core
   - YTDL_TEST_SET=download
-jobs:
+matrix:
   include:
     - python: 3.7
       dist: xenial
@@ -35,11 +35,6 @@ jobs:
       env: YTDL_TEST_SET=download
     - env: JYTHON=true; YTDL_TEST_SET=core
     - env: JYTHON=true; YTDL_TEST_SET=download
-    - name: flake8
-      python: 3.8
-      dist: xenial
-      install: pip install flake8
-      script: flake8 .
   fast_finish: true
   allow_failures:
     - env: YTDL_TEST_SET=download


@@ -153,7 +153,7 @@ ### Adding support for a new site
 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):

        $ flake8 youtube_dl/extractor/yourextractor.py
ChangeLog

@@ -1,316 +1,10 @@
-version 2020.07.28
+version <unreleased>
-
-Extractors
-* [youtube] Fix sigfunc name extraction (#26134, #26135, #26136, #26137)
-* [youtube] Improve description extraction (#25937, #25980)
-* [wistia] Restrict embed regular expression (#25969)
-* [youtube] Prevent excess HTTP 301 (#25786)
-+ [youtube:playlists] Extend URL regular expression (#25810)
-+ [bellmedia] Add support for cp24.com clip URLs (#25764)
-* [brightcove] Improve embed detection (#25674)
-
-version 2020.06.16.1
-
-Extractors
-* [youtube] Force old layout (#25682, #25683, #25680, #25686)
-* [youtube] Fix categories and improve tags extraction
-
-version 2020.06.16
-
-Extractors
-* [youtube] Fix uploader id and uploader URL extraction
-* [youtube] Improve view count extraction
-* [youtube] Fix upload date extraction (#25677)
-* [youtube] Fix thumbnails extraction (#25676)
-* [youtube] Fix playlist and feed extraction (#25675)
-+ [facebook] Add support for single-video ID links
-+ [youtube] Extract chapters from JSON (#24819)
-+ [kaltura] Add support for multiple embeds on a webpage (#25523)
-
-version 2020.06.06
-
-Extractors
-* [tele5] Bypass geo restriction
-+ [jwplatform] Add support for bypass geo restriction
-* [tele5] Prefer jwplatform over nexx (#25533)
-* [twitch:stream] Expect 400 and 410 HTTP errors from API
-* [twitch:stream] Fix extraction (#25528)
-* [twitch] Fix thumbnails extraction (#25531)
-+ [twitch] Pass v5 Accept HTTP header (#25531)
-* [brightcove] Fix subtitles extraction (#25540)
-+ [malltv] Add support for sk.mall.tv (#25445)
-* [periscope] Fix untitled broadcasts (#25482)
-* [jwplatform] Improve embeds extraction (#25467)
-
-version 2020.05.29
-
-Core
-* [postprocessor/ffmpeg] Embed series metadata with --add-metadata
-* [utils] Fix file permissions in write_json_file (#12471, #25122)
-
-Extractors
-* [ard:beta] Extend URL regular expression (#25405)
-+ [youtube] Add support for more invidious instances (#25417)
-* [giantbomb] Extend URL regular expression (#25222)
-* [ard] Improve URL regular expression (#25134, #25198)
-* [redtube] Improve formats extraction and extract m3u8 formats (#25311,
-  #25321)
-* [indavideo] Switch to HTTPS for API request (#25191)
-* [redtube] Improve title extraction (#25208)
-* [vimeo] Improve format extraction and sorting (#25285)
-* [soundcloud] Reduce API playlist page limit (#25274)
-+ [youtube] Add support for yewtu.be (#25226)
-* [mailru] Fix extraction (#24530, #25239)
-* [bellator] Fix mgid extraction (#25195)
-
-version 2020.05.08
-
-Core
-* [downloader/http] Request last data block of exact remaining size
-* [downloader/http] Finish downloading once received data length matches
-  expected
-* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie to always
-  ensure cookie name and value are bytestrings on python 2 (#23256, #24776)
-+ [compat] Introduce compat_cookiejar_Cookie
-* [utils] Improve cookie files support
-    + Add support for UTF-8 in cookie files
-    * Skip malformed cookie file entries instead of crashing (invalid entry
-      length, invalid expires at)
-
-Extractors
-* [youtube] Improve signature cipher extraction (#25187, #25188)
-* [iprima] Improve extraction (#25138)
-* [uol] Fix extraction (#22007)
-+ [orf] Add support for more radio stations (#24938, #24968)
-* [dailymotion] Fix typo
-- [puhutv] Remove no longer available HTTP formats (#25124)
-
-version 2020.05.03
-
-Core
-+ [extractor/common] Extract multiple JSON-LD entries
-* [options] Clarify doc on --exec command (#19087, #24883)
-* [extractor/common] Skip malformed ISM manifest XMLs while extracting
-  ISM formats (#24667)
-
-Extractors
-* [crunchyroll] Fix and improve extraction (#25096, #25060)
-* [youtube] Improve player id extraction
-* [youtube] Use redirected video id if any (#25063)
-* [yahoo] Fix GYAO Player extraction and relax URL regular expression
-  (#24178, #24778)
-* [tvplay] Fix Viafree extraction (#15189, #24473, #24789)
-* [tenplay] Relax URL regular expression (#25001)
-+ [prosiebensat1] Extract series metadata
-* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)
-- [prosiebensat1] Remove 7tv.de support (#24948)
-* [youtube] Fix DRM videos detection (#24736)
-* [thisoldhouse] Fix video id extraction (#24548, #24549)
-+ [soundcloud] Extract AAC format (#19173, #24708)
-* [youtube] Skip broken multifeed videos (#24711)
-* [nova:embed] Fix extraction (#24700)
-* [motherless] Fix extraction (#24699)
-* [twitch:clips] Extend URL regular expression (#24290, #24642)
-* [tv4] Fix ISM formats extraction (#24667)
-* [tele5] Fix extraction (#24553)
-+ [mofosex] Add support for generic embeds (#24633)
-+ [youporn] Add support for generic embeds
-+ [spankwire] Add support for generic embeds (#24633)
-* [spankwire] Fix extraction (#18924, #20648)
-
-version 2020.03.24
-
-Core
-- [utils] Revert support for cookie files with spaces used instead of tabs
-
-Extractors
-* [teachable] Update upskillcourses and gns3 domains
-* [generic] Look for teachable embeds before wistia
-+ [teachable] Extract chapter metadata (#24421)
-+ [bilibili] Add support for player.bilibili.com (#24402)
-+ [bilibili] Add support for new URL schema with BV ids (#24439, #24442)
-* [limelight] Remove disabled API requests (#24255)
-* [soundcloud] Fix download URL extraction (#24394)
-+ [cbc:watch] Add support for authentication (#19160)
-* [hellporno] Fix extraction (#24399)
-* [xtube] Fix formats extraction (#24348)
-* [ndr] Fix extraction (#24326)
-* [nhk] Update m3u8 URL and use native HLS downloader (#24329)
-- [nhk] Remove obsolete rtmp formats (#24329)
-* [nhk] Relax URL regular expression (#24329)
-- [vimeo] Revert fix showcase password protected video extraction (#24224)
-
-version 2020.03.08
-
-Core
-+ [utils] Add support for cookie files with spaces used instead of tabs
-
-Extractors
-+ [pornhub] Add support for pornhubpremium.com (#24288)
-- [youtube] Remove outdated code and unnecessary requests
-* [youtube] Improve extraction in 429 HTTP error conditions (#24283)
-* [nhk] Update API version (#24270)
-
-version 2020.03.06
-
-Extractors
-* [youtube] Fix age-gated videos support without login (#24248)
-* [vimeo] Fix showcase password protected video extraction (#24224)
-* [pornhub] Improve title extraction (#24184)
-* [peertube] Improve extraction (#23657)
-+ [servus] Add support for new URL schema (#23475, #23583, #24142)
-* [vimeo] Fix subtitles URLs (#24209)
-
-version 2020.03.01
-
-Core
-* [YoutubeDL] Force redirect URL to unicode on python 2
-- [options] Remove duplicate short option -v for --version (#24162)
-
-Extractors
-* [xhamster] Fix extraction (#24205)
-* [franceculture] Fix extraction (#24204)
-+ [telecinco] Add support for article opening videos
-* [telecinco] Fix extraction (#24195)
-* [xtube] Fix metadata extraction (#21073, #22455)
-* [youjizz] Fix extraction (#24181)
-- Remove no longer needed compat_str around geturl
-* [pornhd] Fix extraction (#24128)
-+ [teachable] Add support for multiple videos per lecture (#24101)
-+ [wistia] Add support for multiple generic embeds (#8347, 11385)
-* [imdb] Fix extraction (#23443)
-* [tv2dk:bornholm:play] Fix extraction (#24076)
-
-version 2020.02.16
-
-Core
-* [YoutubeDL] Fix playlist entry indexing with --playlist-items (#10591,
-  #10622)
-* [update] Fix updating via symlinks (#23991)
-+ [compat] Introduce compat_realpath (#23991)
-
-Extractors
-+ [npr] Add support for streams (#24042)
-+ [24video] Add support for porn.24video.net (#23779, #23784)
-- [jpopsuki] Remove extractor (#23858)
-* [nova] Improve extraction (#23690)
-* [nova:embed] Improve (#23690)
-* [nova:embed] Fix extraction (#23672)
-+ [abc:iview] Add support for 720p (#22907, #22921)
-* [nytimes] Improve format sorting (#24010)
-+ [toggle] Add support for mewatch.sg (#23895, #23930)
-* [thisoldhouse] Fix extraction (#23951)
-+ [popcorntimes] Add support for popcorntimes.tv (#23949)
-* [sportdeutschland] Update to new API
-* [twitch:stream] Lowercase channel id for stream request (#23917)
-* [tv5mondeplus] Fix extraction (#23907, #23911)
-* [tva] Relax URL regular expression (#23903)
-* [vimeo] Fix album extraction (#23864)
-* [viewlift] Improve extraction
-    * Fix extraction (#23851)
-    + Add support for authentication
-    + Add support for more domains
-* [svt] Fix series extraction (#22297)
-* [svt] Fix article extraction (#22897, #22919)
-* [soundcloud] Imporve private playlist/set tracks extraction (#3707)
-
-version 2020.01.24
-
-Extractors
-* [youtube] Fix sigfunc name extraction (#23819)
-* [stretchinternet] Fix extraction (#4319)
-* [voicerepublic] Fix extraction
-* [azmedien] Fix extraction (#23783)
-* [businessinsider] Fix jwplatform id extraction (#22929, #22954)
-+ [24video] Add support for 24video.vip (#23753)
-* [ivi:compilation] Fix entries extraction (#23770)
-* [ard] Improve extraction (#23761)
-    * Simplify extraction
-    + Extract age limit and series
-    * Bypass geo-restriction
-+ [nbc] Add support for nbc multi network URLs (#23049)
-* [americastestkitchen] Fix extraction
-* [zype] Improve extraction
-    + Extract subtitles (#21258)
-    + Support URLs with alternative keys/tokens (#21258)
-    + Extract more metadata
-* [orf:tvthek] Improve geo restricted videos detection (#23741)
-* [soundcloud] Restore previews extraction (#23739)
-
-version 2020.01.15
-
-Extractors
-* [yourporn] Fix extraction (#21645, #22255, #23459)
-+ [canvas] Add support for new API endpoint (#17680, #18629)
-* [ndr:base:embed] Improve thumbnails extraction (#23731)
-+ [vodplatform] Add support for embed.kwikmotion.com domain
-+ [twitter] Add support for promo_video_website cards (#23711)
-* [orf:radio] Clean description and improve extraction
-* [orf:fm4] Fix extraction (#23599)
-* [safari] Fix kaltura session extraction (#23679, #23670)
-* [lego] Fix extraction and extract subtitle (#23687)
-* [cloudflarestream] Improve extraction
-    + Add support for bytehighway.net domain
-    + Add support for signed URLs
-    + Extract thumbnail
-* [naver] Improve extraction
-    * Improve geo-restriction handling
-    + Extract automatic captions
-    + Extract uploader metadata
-    + Extract VLive HLS formats
-    * Improve metadata extraction
-- [pandatv] Remove extractor (#23630)
-* [dctp] Fix format extraction (#23656)
-+ [scrippsnetworks] Add support for www.discovery.com videos
-* [discovery] Fix anonymous token extraction (#23650)
-* [nrktv:seriebase] Fix extraction (#23625, #23537)
-* [wistia] Improve format extraction and extract subtitles (#22590)
-* [vice] Improve extraction (#23631)
-* [redtube] Detect private videos (#23518)
-
-version 2020.01.01
-
-Extractors
-* [brightcove] Invalidate policy key cache on failing requests
-* [pornhub] Improve locked videos detection (#22449, #22780)
-+ [pornhub] Add support for m3u8 formats
-* [pornhub] Fix extraction (#22749, #23082)
-* [brightcove] Update policy key on failing requests
-* [spankbang] Improve removed video detection (#23423)
-* [spankbang] Fix extraction (#23307, #23423, #23444)
-* [soundcloud] Automatically update client id on failing requests
-* [prosiebensat1] Improve geo restriction handling (#23571)
-* [brightcove] Cache brightcove player policy keys
-* [teachable] Fail with error message if no video URL found
-* [teachable] Improve locked lessons detection (#23528)
-+ [scrippsnetworks] Add support for Scripps Networks sites (#19857, #22981)
-* [mitele] Fix extraction (#21354, #23456)
-* [soundcloud] Update client id (#23516)
-* [mailru] Relax URL regular expressions (#23509)
-
-version 2019.12.25
-
 Core
 * [utils] Improve str_to_int
 + [downloader/hls] Add ability to override AES decryption key URL (#17521)

 Extractors
-* [mediaset] Fix parse formats (#23508)
 + [tv2dk:bornholm:play] Add support for play.tv2bornholm.dk (#23291)
 + [slideslive] Add support for url and vimeo service names (#23414)
 * [slideslive] Fix extraction (#23413)


@@ -434,9 +434,9 @@ ## Post-processing Options:
                                      either the path to the binary or its
                                      containing directory.
     --exec CMD                       Execute a command on the file after
-                                     downloading and post-processing, similar to
-                                     find's -exec syntax. Example: --exec 'adb
-                                     push {} /sdcard/Music/ && rm {}'
+                                     downloading, similar to find's -exec
+                                     syntax. Example: --exec 'adb push {}
+                                     /sdcard/Music/ && rm {}'
     --convert-subs FORMAT            Convert the subtitles to other format
                                      (currently supported: srt|ass|vtt|lrc)
@@ -835,9 +835,7 @@ ### ExtractorError: Could not find JS function u'OF'
 ### HTTP Error 429: Too Many Requests or 402: Payment Required

-These two error codes indicate that the service is blocking your IP address because of overuse. Usually this is a soft block meaning that you can gain access again after solving CAPTCHA. Just open a browser and solve a CAPTCHA the service suggests you and after that [pass cookies](#how-do-i-pass-cookies-to-youtube-dl) to youtube-dl. Note that if your machine has multiple external IPs then you should also pass exactly the same IP you've used for solving CAPTCHA with [`--source-address`](#network-options). Also you may need to pass a `User-Agent` HTTP header of your browser with [`--user-agent`](#workarounds).
-
-If this is not the case (no CAPTCHA suggested to solve by the service) then you can contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
+These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.

 ### SyntaxError: Non-ASCII character
@@ -1032,7 +1030,7 @@ ### Adding support for a new site
 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):

        $ flake8 youtube_dl/extractor/yourextractor.py


@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 from __future__ import unicode_literals

+import base64
 import io
 import json
 import mimetypes
@@ -14,6 +15,7 @@
 from youtube_dl.compat import (
     compat_basestring,
+    compat_input,
     compat_getpass,
     compat_print,
     compat_urllib_request,
@@ -38,20 +40,28 @@ def _init_github_account(self):
         try:
             info = netrc.netrc().authenticators(self._NETRC_MACHINE)
             if info is not None:
-                self._token = info[2]
+                self._username = info[0]
+                self._password = info[2]
                 compat_print('Using GitHub credentials found in .netrc...')
                 return
             else:
                 compat_print('No GitHub credentials found in .netrc')
         except (IOError, netrc.NetrcParseError):
             compat_print('Unable to parse .netrc')
-        self._token = compat_getpass(
-            'Type your GitHub PAT (personal access token) and press [Return]: ')
+        self._username = compat_input(
+            'Type your GitHub username or email address and press [Return]: ')
+        self._password = compat_getpass(
+            'Type your GitHub password and press [Return]: ')

     def _call(self, req):
         if isinstance(req, compat_basestring):
             req = sanitized_Request(req)
-        req.add_header('Authorization', 'token %s' % self._token)
+        # Authorizing manually since GitHub does not response with 401 with
+        # WWW-Authenticate header set (see
+        # https://developer.github.com/v3/#basic-authentication)
+        b64 = base64.b64encode(
+            ('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
+        req.add_header('Authorization', 'Basic %s' % b64)
         response = self._opener.open(req).read().decode('utf-8')
         return json.loads(response)
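
Note: this hunk swaps the release script from a GitHub personal access token back to HTTP Basic authentication. As a rough standalone illustration (not the script itself, and with hypothetical credential values), the two Authorization header styles differ only in how the value is built:

    import base64

    # Token auth (the side being removed): the PAT is sent as-is.
    token = 'ghp_example_token'  # hypothetical value
    token_header = 'token %s' % token

    # Basic auth (the side being restored): GitHub did not send a
    # WWW-Authenticate challenge, so the header is constructed manually.
    username, password = 'user@example.com', 'secret'  # hypothetical values
    basic_header = 'Basic %s' % base64.b64encode(
        ('%s:%s' % (username, password)).encode('utf-8')).decode('ascii')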


@@ -28,11 +28,10 @@ # Supported sites
 - **acast:channel**
 - **ADN**: Anime Digital Network
 - **AdobeConnect**
-- **adobetv**
-- **adobetv:channel**
-- **adobetv:embed**
-- **adobetv:show**
-- **adobetv:video**
+- **AdobeTV**
+- **AdobeTVChannel**
+- **AdobeTVShow**
+- **AdobeTVVideo**
 - **AdultSwim**
 - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
 - **afreecatv**: afreecatv.com
@@ -98,7 +97,6 @@ # Supported sites
 - **BiliBili**
 - **BilibiliAudio**
 - **BilibiliAudioAlbum**
-- **BiliBiliPlayer**
 - **BioBioChileTV**
 - **BIQLE**
 - **BitChute**
@@ -390,6 +388,7 @@ # Supported sites
 - **JeuxVideo**
 - **Joj**
 - **Jove**
+- **jpopsuki.tv**
 - **JWPlatform**
 - **Kakao**
 - **Kaltura**
@@ -397,7 +396,6 @@ # Supported sites
 - **Kankan**
 - **Karaoketv**
 - **KarriereVideos**
-- **Katsomo**
 - **KeezMovies**
 - **Ketnet**
 - **KhanAcademy**
@@ -405,6 +403,7 @@ # Supported sites
 - **KinjaEmbed**
 - **KinoPoisk**
 - **KonserthusetPlay**
+- **kontrtube**: KontrTube.ru - Труба зовёт
 - **KrasView**: Красвью
 - **Ku6**
 - **KUSI**
@@ -497,7 +496,6 @@ # Supported sites
 - **MNetTV**
 - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
 - **Mofosex**
-- **MofosexEmbed**
 - **Mojvideo**
 - **Morningstar**: morningstar.com
 - **Motherless**
@@ -515,6 +513,7 @@ # Supported sites
 - **mtvjapan**
 - **mtvservices:embedded**
 - **MuenchenTV**: münchen.tv
+- **MusicPlayOn**
 - **mva**: Microsoft Virtual Academy videos
 - **mva:course**: Microsoft Virtual Academy courses
 - **Mwave**
@@ -620,25 +619,16 @@ # Supported sites
 - **Ooyala**
 - **OoyalaExternal**
 - **OraTV**
-- **orf:burgenland**: Radio Burgenland
 - **orf:fm4**: radio FM4
 - **orf:fm4:story**: fm4.orf.at stories
 - **orf:iptv**: iptv.ORF.at
-- **orf:kaernten**: Radio Kärnten
-- **orf:noe**: Radio Niederösterreich
-- **orf:oberoesterreich**: Radio Oberösterreich
 - **orf:oe1**: Radio Österreich 1
-- **orf:oe3**: Radio Österreich 3
-- **orf:salzburg**: Radio Salzburg
-- **orf:steiermark**: Radio Steiermark
-- **orf:tirol**: Radio Tirol
 - **orf:tvthek**: ORF TVthek
-- **orf:vorarlberg**: Radio Vorarlberg
-- **orf:wien**: Radio Wien
 - **OsnatelTV**
 - **OutsideTV**
 - **PacktPub**
 - **PacktPubCourse**
+- **PandaTV**: 熊猫TV
 - **pandora.tv**: 판도라TV
 - **ParamountNetwork**
 - **parliamentlive.tv**: UK parliament videos
@@ -674,7 +664,6 @@ # Supported sites
 - **Pokemon**
 - **PolskieRadio**
 - **PolskieRadioCategory**
-- **Popcorntimes**
 - **PopcornTV**
 - **PornCom**
 - **PornerBros**
@@ -772,7 +761,6 @@ # Supported sites
 - **screen.yahoo:search**: Yahoo screen search
 - **Screencast**
 - **ScreencastOMatic**
-- **ScrippsNetworks**
 - **scrippsnetworks:watch**
 - **SCTE**
 - **SCTECourse**
@@ -925,7 +913,6 @@ # Supported sites
 - **tv2.hu**
 - **TV2Article**
 - **TV2DK**
-- **TV2DKBornholmPlay**
 - **TV4**: tv4.se and tv4play.se
 - **TV5MondePlus**: TV5MONDE+
 - **TVA**
@@ -967,7 +954,6 @@ # Supported sites
 - **udemy**
 - **udemy:course**
 - **UDNEmbed**: 聯合影音
-- **UFCArabia**
 - **UFCTV**
 - **UKTVPlay**
 - **umg:de**: Universal Music Deutschland
@@ -1007,6 +993,7 @@ # Supported sites
 - **videomore**
 - **videomore:season**
 - **videomore:video**
+- **VideoPremium**
 - **VideoPress**
 - **Vidio**
 - **VidLii**
@@ -1016,8 +1003,8 @@ # Supported sites
 - **Vidzi**
 - **vier**: vier.be and vijf.be
 - **vier:videos**
-- **viewlift**
-- **viewlift:embed**
+- **ViewLift**
+- **ViewLiftEmbed**
 - **Viidea**
 - **viki**
 - **viki:channel**


@@ -816,15 +816,11 @@ def test_playlist_items_selection(self):
             'webpage_url': 'http://example.com',
         }

-        def get_downloaded_info_dicts(params):
-            ydl = YDL(params)
-            # make a deep copy because the dictionary and nested entries
-            # can be modified
-            ydl.process_ie_result(copy.deepcopy(playlist))
-            return ydl.downloaded_info_dicts
-
         def get_ids(params):
-            return [int(v['id']) for v in get_downloaded_info_dicts(params)]
+            ydl = YDL(params)
+            # make a copy because the dictionary can be modified
+            ydl.process_ie_result(playlist.copy())
+            return [int(v['id']) for v in ydl.downloaded_info_dicts]

         result = get_ids({})
         self.assertEqual(result, [1, 2, 3, 4])
@@ -856,22 +852,6 @@ def get_ids(params):
         result = get_ids({'playlist_items': '2-4,3-4,3'})
         self.assertEqual(result, [2, 3, 4])

-        # Tests for https://github.com/ytdl-org/youtube-dl/issues/10591
-        # @{
-        result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
-        self.assertEqual(result[0]['playlist_index'], 2)
-        self.assertEqual(result[1]['playlist_index'], 3)
-
-        result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
-        self.assertEqual(result[0]['playlist_index'], 2)
-        self.assertEqual(result[1]['playlist_index'], 3)
-        self.assertEqual(result[2]['playlist_index'], 4)
-
-        result = get_downloaded_info_dicts({'playlist_items': '4,2'})
-        self.assertEqual(result[0]['playlist_index'], 4)
-        self.assertEqual(result[1]['playlist_index'], 2)
-        # @}
-
     def test_urlopen_no_file_protocol(self):
         # see https://github.com/ytdl-org/youtube-dl/issues/8227
         ydl = YDL()
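
Note: the deleted helper deep-copies the playlist because process_ie_result mutates the nested entry dicts, which a shallow dict.copy() still shares. A minimal standalone sketch of the difference, using toy data rather than the real test fixture:

    import copy

    playlist = {'entries': [{'id': 1}, {'id': 2}]}

    shallow = playlist.copy()
    shallow['entries'][0]['id'] = 99       # also visible through playlist

    deep = copy.deepcopy(playlist)
    deep['entries'][1]['id'] = 42          # playlist stays untouched

    assert playlist['entries'][0]['id'] == 99
    assert playlist['entries'][1]['id'] == 2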


@@ -39,13 +39,6 @@ def assert_cookie_has_value(key):
         assert_cookie_has_value('HTTPONLY_COOKIE')
         assert_cookie_has_value('JS_ACCESSIBLE_COOKIE')

-    def test_malformed_cookies(self):
-        cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/malformed_cookies.txt')
-        cookiejar.load(ignore_discard=True, ignore_expires=True)
-        # Cookies should be empty since all malformed cookie file entries
-        # will be ignored
-        self.assertFalse(cookiejar._cookies)
-

 if __name__ == '__main__':
     unittest.main()
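
Note: the deleted test covered the behaviour added in 2020.05.08 ("Skip malformed cookie file entries instead of crashing"). A rough sketch of that kind of per-line validation, assuming the Netscape format's seven tab-separated fields (illustrative only, not the YoutubeDLCookieJar implementation):

    def is_well_formed_cookie_line(line):
        # domain, include-subdomains flag, path, secure, expires, name, value
        fields = line.rstrip('\n').split('\t')
        if len(fields) != 7:
            return False                   # invalid number of fields
        try:
            int(fields[4])                 # expires must be an integer
        except ValueError:
            return False
        return True

    # Both entries from the removed malformed_cookies.txt fixture fail:
    assert not is_well_formed_cookie_line(
        'www.foobar.foobar\tFALSE\t/\tFALSE\t0\tCOOKIE')
    assert not is_well_formed_cookie_line(
        'www.foobar.foobar\tFALSE\t/\tFALSE\t1.7976931348623157e+308\tCOOKIE\tVALUE')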


@@ -26,6 +26,7 @@
     ThePlatformIE,
     ThePlatformFeedIE,
     RTVEALaCartaIE,
+    FunnyOrDieIE,
     DemocracynowIE,
 )

@@ -321,6 +322,18 @@ def test_allsubtitles(self):
         self.assertEqual(md5(subtitles['es']), '69e70cae2d40574fb7316f31d6eb7fca')


+class TestFunnyOrDieSubtitles(BaseTestSubtitles):
+    url = 'http://www.funnyordie.com/videos/224829ff6d/judd-apatow-will-direct-your-vine'
+    IE = FunnyOrDieIE
+
+    def test_allsubtitles(self):
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(set(subtitles.keys()), set(['en']))
+        self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
+
+
 class TestDemocracynowSubtitles(BaseTestSubtitles):
     url = 'http://www.democracynow.org/shows/2015/7/3'
     IE = DemocracynowIE


@@ -267,7 +267,7 @@ def test_youtube_chapters(self):
         for description, duration, expected_chapters in self._TEST_CASES:
             ie = YoutubeIE()
             expect_value(
-                self, ie._extract_chapters_from_description(description, duration),
+                self, ie._extract_chapters(description, duration),
                 expected_chapters, None)


@@ -74,28 +74,6 @@
 ]


-class TestPlayerInfo(unittest.TestCase):
-    def test_youtube_extract_player_info(self):
-        PLAYER_URLS = (
-            ('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
-            # obsolete
-            ('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
-            ('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
-            ('https://www.youtube.com/yts/jsbin/player_ias-vflCPQUIL/en_US/base.js', 'vflCPQUIL'),
-            ('https://www.youtube.com/yts/jsbin/player-vflzQZbt7/en_US/base.js', 'vflzQZbt7'),
-            ('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
-            ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
-            ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
-            ('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
-            ('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
-        )
-        for player_url, expected_player_id in PLAYER_URLS:
-            expected_player_type = player_url.split('.')[-1]
-            player_type, player_id = YoutubeIE._extract_player_info(player_url)
-            self.assertEqual(player_type, expected_player_type)
-            self.assertEqual(player_id, expected_player_id)
-
-
 class TestSignature(unittest.TestCase):
     def setUp(self):
         TEST_DIR = os.path.dirname(os.path.abspath(__file__))


@@ -1,9 +0,0 @@
-# Netscape HTTP Cookie File
-# http://curl.haxx.se/rfc/cookie_spec.html
-# This is a generated file! Do not edit.
-
-# Cookie file entry with invalid number of fields - 6 instead of 7
-www.foobar.foobar	FALSE	/	FALSE	0	COOKIE
-
-# Cookie file entry with invalid expires at
-www.foobar.foobar	FALSE	/	FALSE	1.7976931348623157e+308	COOKIE	VALUE


@@ -92,7 +92,6 @@
     YoutubeDLCookieJar,
     YoutubeDLCookieProcessor,
     YoutubeDLHandler,
-    YoutubeDLRedirectHandler,
 )
 from .cache import Cache
 from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
@@ -991,7 +990,7 @@ def report_download(num_entries):
                     'playlist_title': ie_result.get('title'),
                     'playlist_uploader': ie_result.get('uploader'),
                     'playlist_uploader_id': ie_result.get('uploader_id'),
-                    'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
+                    'playlist_index': i + playliststart,
                     'extractor': ie_result['extractor'],
                     'webpage_url': ie_result['webpage_url'],
                     'webpage_url_basename': url_basename(ie_result['webpage_url']),
@@ -2344,7 +2343,6 @@ def _setup_opener(self):
         debuglevel = 1 if self.params.get('debug_printtraffic') else 0
         https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
         ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
-        redirect_handler = YoutubeDLRedirectHandler()
         data_handler = compat_urllib_request_DataHandler()

         # When passing our own FileHandler instance, build_opener won't add the
@@ -2358,7 +2356,7 @@ def file_open(*args, **kwargs):
         file_handler.file_open = file_open

         opener = compat_urllib_request.build_opener(
-            proxy_handler, https_handler, cookie_processor, ydlh, redirect_handler, data_handler, file_handler)
+            proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)

         # Delete the default user-agent header, which would otherwise apply in
         # cases where our custom HTTP handler doesn't come into play
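
Note: the removed playlist_index expression is the fix for #10591: with --playlist-items, the index reported for each downloaded entry should be the requested playlist position, not the download order. A toy sketch of the two behaviours that mirrors the expression in the hunk above (the offsets here are illustrative; the real defaults differ):

    playlistitems = [4, 2]   # as if --playlist-items 4,2 were given
    playliststart = 0        # toy offset

    for i, entry in enumerate(['entry-a', 'entry-b'], start=1):
        # reverted behaviour: index follows download order (1, then 2)
        old_index = i + playliststart
        # removed fix: index follows the requested positions (4, then 2)
        new_index = playlistitems[i - 1] if playlistitems else i + playliststart
        print(entry, old_index, new_index)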


@@ -57,17 +57,6 @@
 except ImportError:  # Python 2
     import cookielib as compat_cookiejar

-if sys.version_info[0] == 2:
-    class compat_cookiejar_Cookie(compat_cookiejar.Cookie):
-        def __init__(self, version, name, value, *args, **kwargs):
-            if isinstance(name, compat_str):
-                name = name.encode()
-            if isinstance(value, compat_str):
-                value = value.encode()
-            compat_cookiejar.Cookie.__init__(self, version, name, value, *args, **kwargs)
-else:
-    compat_cookiejar_Cookie = compat_cookiejar.Cookie
-
 try:
     import http.cookies as compat_cookies
 except ImportError:  # Python 2
@@ -2765,17 +2754,6 @@ def compat_expanduser(path):
     compat_expanduser = os.path.expanduser

-if compat_os_name == 'nt' and sys.version_info < (3, 8):
-    # os.path.realpath on Windows does not follow symbolic links
-    # prior to Python 3.8 (see https://bugs.python.org/issue9949)
-    def compat_realpath(path):
-        while os.path.islink(path):
-            path = os.path.abspath(os.readlink(path))
-        return path
-else:
-    compat_realpath = os.path.realpath
-
 if sys.version_info < (3, 0):
     def compat_print(s):
         from .utils import preferredencoding
@@ -2998,7 +2976,6 @@ def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
     'compat_basestring',
     'compat_chr',
     'compat_cookiejar',
-    'compat_cookiejar_Cookie',
     'compat_cookies',
     'compat_ctypes_WINFUNCTYPE',
     'compat_etree_Element',
@@ -3021,7 +2998,6 @@ def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
     'compat_os_name',
     'compat_parse_qs',
     'compat_print',
-    'compat_realpath',
     'compat_setenv',
     'compat_shlex_quote',
     'compat_shlex_split',
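
Note: the removed compat_realpath shim exists because os.path.realpath did not follow symbolic links on Windows before Python 3.8 (https://bugs.python.org/issue9949); it resolves link chains manually, one hop at a time. A small self-contained sketch of the same loop, with a hypothetical install path:

    import os

    def resolve_symlinks(path):
        # follow each symlink one hop at a time, like the removed shim
        while os.path.islink(path):
            path = os.path.abspath(os.readlink(path))
        return path

    # e.g. a self-updater must rewrite the real binary, not the symlink
    # pointing at it (hypothetical location):
    target = resolve_symlinks('/usr/local/bin/youtube-dl')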


@@ -227,7 +227,7 @@ def retry(e):
         while True:
             try:
                 # Download and write
-                data_block = ctx.data.read(block_size if data_len is None else min(block_size, data_len - byte_counter))
+                data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
             # socket.timeout is a subclass of socket.error but may not have
             # errno set
             except socket.timeout as e:
@@ -299,7 +299,7 @@ def retry(e):
                 'elapsed': now - ctx.start_time,
             })

-            if data_len is not None and byte_counter == data_len:
+            if is_test and byte_counter == data_len:
                 break

         if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:
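
Note: the removed side of this hunk caps every read at the bytes still outstanding whenever the expected length is known, so the loop can finish at exactly data_len instead of only doing so in test mode. A toy sketch of the capped-read loop, with an in-memory stream standing in for ctx.data:

    import io

    data = io.BytesIO(b'x' * 10)   # stands in for the HTTP response
    data_len, block_size, byte_counter = 10, 4, 0

    while byte_counter < data_len:
        # never ask for more than the remaining bytes
        block = data.read(min(block_size, data_len - byte_counter))
        if not block:
            break
        byte_counter += len(block)

    assert byte_counter == data_len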


@@ -110,17 +110,17 @@ class ABCIViewIE(InfoExtractor):

     # ABC iview programs are normally available for 14 days only.
     _TESTS = [{
-        'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00',
-        'md5': '67715ce3c78426b11ba167d875ac6abf',
+        'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
+        'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
         'info_dict': {
-            'id': 'LE1927H001S00',
+            'id': 'ZX9371A050S00',
             'ext': 'mp4',
-            'title': "Series 11 Ep 1",
-            'series': "Gruen",
-            'description': 'md5:52cc744ad35045baf6aded2ce7287f67',
-            'upload_date': '20190925',
-            'uploader_id': 'abc1',
-            'timestamp': 1569445289,
+            'title': "Gaston's Birthday",
+            'series': "Ben And Holly's Little Kingdom",
+            'description': 'md5:f9de914d02f226968f598ac76f105bcf',
+            'upload_date': '20180604',
+            'uploader_id': 'abc4kids',
+            'timestamp': 1528140219,
         },
         'params': {
             'skip_download': True,
@@ -148,7 +148,7 @@ def tokenize_url(url, token):
             'hdnea': token,
         })

-        for sd in ('720', 'sd', 'sd-low'):
+        for sd in ('sd', 'sd-low'):
             sd_url = try_get(
                 stream, lambda x: x['streams']['hls'][sd], compat_str)
             if not sd_url:


@@ -5,7 +5,6 @@
 from ..utils import (
     clean_html,
     int_or_none,
-    js_to_json,
     try_get,
     unified_strdate,
 )
@@ -14,21 +13,22 @@
 class AmericasTestKitchenIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
     _TESTS = [{
-        'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
+        'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
         'md5': 'b861c3e365ac38ad319cfd509c30577f',
         'info_dict': {
-            'id': '5b400b9ee338f922cb06450c',
-            'title': 'Weeknight Japanese Suppers',
+            'id': '1_5g5zua6e',
+            'title': 'Summer Dinner Party',
             'ext': 'mp4',
-            'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
-            'thumbnail': r're:^https?://',
-            'timestamp': 1523664000,
-            'upload_date': '20180414',
-            'release_date': '20180414',
+            'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'timestamp': 1497285541,
+            'upload_date': '20170612',
+            'uploader_id': 'roger.metcalf@americastestkitchen.com',
+            'release_date': '20170617',
             'series': "America's Test Kitchen",
-            'season_number': 18,
-            'episode': 'Weeknight Japanese Suppers',
-            'episode_number': 15,
+            'season_number': 17,
+            'episode': 'Summer Dinner Party',
+            'episode_number': 24,
         },
         'params': {
             'skip_download': True,
@@ -47,7 +47,7 @@ def _real_extract(self, url):
             self._search_regex(
                 r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
                 webpage, 'initial context'),
-            video_id, js_to_json)
+            video_id)

         ep_data = try_get(
             video_data,
@@ -55,7 +55,17 @@ def _real_extract(self, url):
             lambda x: x['videoDetail']['content']['data']), dict)
         ep_meta = ep_data.get('full_video', {})

-        zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
+        zype_id = ep_meta.get('zype_id')
+        if zype_id:
+            embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
+            ie_key = 'Zype'
+        else:
+            partner_id = self._search_regex(
+                r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
+                webpage, 'kaltura partner id')
+            external_id = ep_data.get('external_id') or ep_meta['external_id']
+            embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
+            ie_key = 'Kaltura'

         title = ep_data.get('title') or ep_meta.get('title')
         description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@@ -69,8 +79,8 @@ def _real_extract(self, url):
         return {
             '_type': 'url_transparent',
-            'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
-            'ie_key': 'Zype',
+            'url': embed_url,
+            'ie_key': ie_key,
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
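
Note: the restored extractor hands the page off to another extractor through a url_transparent result, picking Zype when the episode metadata carries a zype_id and falling back to a Kaltura embed otherwise. A stripped-down sketch of that fallback with toy metadata values:

    ep_meta = {'zype_id': None, 'external_id': 'abc123'}  # toy values
    partner_id = '12345'                                  # toy Kaltura id

    zype_id = ep_meta.get('zype_id')
    if zype_id:
        embed_url, ie_key = 'https://player.zype.com/embed/%s.js' % zype_id, 'Zype'
    else:
        embed_url = 'kaltura:%s:%s' % (partner_id, ep_meta['external_id'])
        ie_key = 'Kaltura'

    print(ie_key, embed_url)   # Kaltura kaltura:12345:abc123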


@@ -1,7 +1,6 @@
 # coding: utf-8
 from __future__ import unicode_literals

-import json
 import re

 from .common import InfoExtractor
@@ -23,101 +22,7 @@
 from ..compat import compat_etree_fromstring


-class ARDMediathekBaseIE(InfoExtractor):
-    _GEO_COUNTRIES = ['DE']
-
-    def _extract_media_info(self, media_info_url, webpage, video_id):
-        media_info = self._download_json(
-            media_info_url, video_id, 'Downloading media JSON')
-        return self._parse_media_info(media_info, video_id, '"fsk"' in webpage)
-
-    def _parse_media_info(self, media_info, video_id, fsk):
-        formats = self._extract_formats(media_info, video_id)
-
-        if not formats:
-            if fsk:
-                raise ExtractorError(
-                    'This video is only available after 20:00', expected=True)
-            elif media_info.get('_geoblocked'):
-                self.raise_geo_restricted(
-                    'This video is not available due to geoblocking',
-                    countries=self._GEO_COUNTRIES)
-
-        self._sort_formats(formats)
-
-        subtitles = {}
-        subtitle_url = media_info.get('_subtitleUrl')
-        if subtitle_url:
-            subtitles['de'] = [{
-                'ext': 'ttml',
-                'url': subtitle_url,
-            }]
-
-        return {
-            'id': video_id,
-            'duration': int_or_none(media_info.get('_duration')),
-            'thumbnail': media_info.get('_previewImage'),
-            'is_live': media_info.get('_isLive') is True,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
-
-    def _extract_formats(self, media_info, video_id):
-        type_ = media_info.get('_type')
-        media_array = media_info.get('_mediaArray', [])
-        formats = []
-        for num, media in enumerate(media_array):
-            for stream in media.get('_mediaStreamArray', []):
-                stream_urls = stream.get('_stream')
-                if not stream_urls:
-                    continue
-                if not isinstance(stream_urls, list):
-                    stream_urls = [stream_urls]
-                quality = stream.get('_quality')
-                server = stream.get('_server')
-                for stream_url in stream_urls:
-                    if not url_or_none(stream_url):
-                        continue
-                    ext = determine_ext(stream_url)
-                    if quality != 'auto' and ext in ('f4m', 'm3u8'):
-                        continue
-                    if ext == 'f4m':
-                        formats.extend(self._extract_f4m_formats(
-                            update_url_query(stream_url, {
-                                'hdcore': '3.1.1',
-                                'plugin': 'aasp-3.1.1.69.124'
-                            }), video_id, f4m_id='hds', fatal=False))
-                    elif ext == 'm3u8':
-                        formats.extend(self._extract_m3u8_formats(
-                            stream_url, video_id, 'mp4', 'm3u8_native',
-                            m3u8_id='hls', fatal=False))
-                    else:
-                        if server and server.startswith('rtmp'):
-                            f = {
-                                'url': server,
-                                'play_path': stream_url,
-                                'format_id': 'a%s-rtmp-%s' % (num, quality),
-                            }
-                        else:
-                            f = {
-                                'url': stream_url,
-                                'format_id': 'a%s-%s-%s' % (num, ext, quality)
-                            }
-                        m = re.search(
-                            r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$',
-                            stream_url)
-                        if m:
-                            f.update({
-                                'width': int(m.group('width')),
-                                'height': int(m.group('height')),
-                            })
-                        if type_ == 'audio':
-                            f['vcodec'] = 'none'
-                        formats.append(f)
-        return formats
-
-
-class ARDMediathekIE(ARDMediathekBaseIE):
+class ARDMediathekIE(InfoExtractor):
     IE_NAME = 'ARD:mediathek'
     _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
@@ -158,6 +63,94 @@ class ARDMediathekIE(ARDMediathekBaseIE):
     def suitable(cls, url):
         return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)

+    def _extract_media_info(self, media_info_url, webpage, video_id):
+        media_info = self._download_json(
+            media_info_url, video_id, 'Downloading media JSON')
+
+        formats = self._extract_formats(media_info, video_id)
+
+        if not formats:
+            if '"fsk"' in webpage:
+                raise ExtractorError(
+                    'This video is only available after 20:00', expected=True)
+            elif media_info.get('_geoblocked'):
+                raise ExtractorError('This video is not available due to geo restriction', expected=True)
+
+        self._sort_formats(formats)
+
+        duration = int_or_none(media_info.get('_duration'))
+        thumbnail = media_info.get('_previewImage')
+        is_live = media_info.get('_isLive') is True
+
+        subtitles = {}
+        subtitle_url = media_info.get('_subtitleUrl')
+        if subtitle_url:
+            subtitles['de'] = [{
+                'ext': 'ttml',
+                'url': subtitle_url,
+            }]
+
+        return {
+            'id': video_id,
+            'duration': duration,
+            'thumbnail': thumbnail,
+            'is_live': is_live,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
+
+    def _extract_formats(self, media_info, video_id):
+        type_ = media_info.get('_type')
+        media_array = media_info.get('_mediaArray', [])
+        formats = []
+        for num, media in enumerate(media_array):
+            for stream in media.get('_mediaStreamArray', []):
+                stream_urls = stream.get('_stream')
+                if not stream_urls:
+                    continue
+                if not isinstance(stream_urls, list):
+                    stream_urls = [stream_urls]
+                quality = stream.get('_quality')
+                server = stream.get('_server')
for stream_url in stream_urls:
if not url_or_none(stream_url):
continue
ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'):
continue
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(stream_url, {
'hdcore': '3.1.1',
'plugin': 'aasp-3.1.1.69.124'
}),
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
else:
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
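
Note: both variants of _extract_formats map a stream entry to the same per-stream format dict, reading the resolution out of the file name. A self-contained sketch of that mapping (the sample URL is invented for illustration):

import re

def stream_to_format(stream_url, num, quality, ext, server=None, is_audio=False):
    if server and server.startswith('rtmp'):
        f = {'url': server, 'play_path': stream_url,
             'format_id': 'a%s-rtmp-%s' % (num, quality)}
    else:
        f = {'url': stream_url, 'format_id': 'a%s-%s-%s' % (num, ext, quality)}
    # The resolution is encoded in the file name, e.g. ..._960x540.mp4
    m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
    if m:
        f.update({'width': int(m.group('width')),
                  'height': int(m.group('height'))})
    if is_audio:
        f['vcodec'] = 'none'
    return f

print(stream_to_format('https://example.com/clip_960x540.mp4', 0, 'high', 'mp4'))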
def _real_extract(self, url): def _real_extract(self, url):
# determine video id from url # determine video id from url
m = re.match(self._VALID_URL, url) m = re.match(self._VALID_URL, url)
@ -249,7 +242,7 @@ def _real_extract(self, url):
class ARDIE(InfoExtractor): class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html' _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TESTS = [{ _TESTS = [{
# available till 14.02.2019 # available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html', 'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
@ -263,9 +256,6 @@ class ARDIE(InfoExtractor):
'upload_date': '20180214', 'upload_date': '20180214',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
'only_matching': True,
}, { }, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html', 'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True, 'only_matching': True,
@ -312,31 +302,21 @@ def _real_extract(self, url):
} }
class ARDBetaMediathekIE(ARDMediathekBaseIE): class ARDBetaMediathekIE(InfoExtractor):
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)' _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
_TESTS = [{ _TESTS = [{
'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE', 'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f', 'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
'info_dict': { 'info_dict': {
'display_id': 'die-robuste-roswita', 'display_id': 'die-robuste-roswita',
'id': '70153354', 'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'title': 'Die robuste Roswita', 'title': 'Tatort: Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.', 'description': r're:^Der Mord.*trüber ist als die Ilm.',
'duration': 5316, 'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard', 'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
'timestamp': 1577047500, 'upload_date': '20180826',
'upload_date': '20191222',
'ext': 'mp4', 'ext': 'mp4',
}, },
}, {
'url': 'https://beta.ardmediathek.de/ard/video/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'only_matching': True,
}, {
'url': 'https://ardmediathek.de/ard/video/saartalk/saartalk-gesellschaftsgift-haltung-gegen-hass/sr-fernsehen/Y3JpZDovL3NyLW9ubGluZS5kZS9TVF84MTY4MA/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/video/trailer/private-eyes-s01-e01/one/Y3JpZDovL3dkci5kZS9CZWl0cmFnLTE1MTgwYzczLWNiMTEtNGNkMS1iMjUyLTg5MGYzOWQxZmQ1YQ/',
'only_matching': True,
}, { }, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/', 'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
'only_matching': True, 'only_matching': True,
@ -348,75 +328,73 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
display_id = mobj.group('display_id') display_id = mobj.group('display_id') or video_id
if display_id:
display_id = display_id.rstrip('/')
if not display_id:
display_id = video_id
player_page = self._download_json( webpage = self._download_webpage(url, display_id)
'https://api.ardmediathek.de/public-gateway', data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
display_id, data=json.dumps({ data = self._parse_json(data_json, display_id)
'query': '''{
playerPage(client:"%s", clipId: "%s") { res = {
blockedByFsk 'id': video_id,
broadcastedOn
maturityContentRating
mediaCollection {
_duration
_geoblocked
_isLive
_mediaArray {
_mediaStreamArray {
_quality
_server
_stream
}
}
_previewImage
_subtitleUrl
_type
}
show {
title
}
synopsis
title
tracking {
atiCustomVars {
contentId
}
}
}
}''' % (mobj.group('client'), video_id),
}).encode(), headers={
'Content-Type': 'application/json'
})['data']['playerPage']
title = player_page['title']
content_id = str_or_none(try_get(
player_page, lambda x: x['tracking']['atiCustomVars']['contentId']))
media_collection = player_page.get('mediaCollection') or {}
if not media_collection and content_id:
media_collection = self._download_json(
'https://www.ardmediathek.de/play/media/' + content_id,
content_id, fatal=False) or {}
info = self._parse_media_info(
media_collection, content_id or video_id,
player_page.get('blockedByFsk'))
age_limit = None
description = player_page.get('synopsis')
maturity_content_rating = player_page.get('maturityContentRating')
if maturity_content_rating:
age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
if not age_limit and description:
age_limit = int_or_none(self._search_regex(
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
info.update({
'age_limit': age_limit,
'display_id': display_id, 'display_id': display_id,
'title': title, }
'description': description, formats = []
'timestamp': unified_timestamp(player_page.get('broadcastedOn')), subtitles = {}
'series': try_get(player_page, lambda x: x['show']['title']), geoblocked = False
for widget in data.values():
if widget.get('_geoblocked') is True:
geoblocked = True
if '_duration' in widget:
res['duration'] = int_or_none(widget['_duration'])
if 'clipTitle' in widget:
res['title'] = widget['clipTitle']
if '_previewImage' in widget:
res['thumbnail'] = widget['_previewImage']
if 'broadcastedOn' in widget:
res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
if 'synopsis' in widget:
res['description'] = widget['synopsis']
subtitle_url = url_or_none(widget.get('_subtitleUrl'))
if subtitle_url:
subtitles.setdefault('de', []).append({
'ext': 'ttml',
'url': subtitle_url,
}) })
return info if '_quality' in widget:
format_url = url_or_none(try_get(
widget, lambda x: x['_stream']['json'][0]))
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
format_url + '?hdcore=3.11.0',
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id='hls',
fatal=False))
else:
# HTTP formats are not available when geoblocked is True,
# other formats are fine though
if geoblocked:
continue
quality = str_or_none(widget.get('_quality'))
formats.append({
'format_id': ('http-' + quality) if quality else 'http',
'url': format_url,
'preference': 10, # Plain HTTP, that's nice
})
if not formats and geoblocked:
self.raise_geo_restricted(
msg='This video is not available due to geoblocking',
countries=['DE'])
self._sort_formats(formats)
res.update({
'subtitles': subtitles,
'formats': formats,
})
return res
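
Note: the restored ARDBetaMediathekIE scrapes window.__APOLLO_STATE__ and cherry-picks fields from every widget instead of querying the public-gateway GraphQL endpoint. A trimmed sketch of that scan, assuming data is the parsed Apollo state dict:

def scan_widgets(data):
    res = {}
    geoblocked = False
    for widget in data.values():
        if not isinstance(widget, dict):
            continue
        if widget.get('_geoblocked') is True:
            geoblocked = True  # suppresses plain HTTP formats later on
        if 'clipTitle' in widget:
            res['title'] = widget['clipTitle']
        if '_duration' in widget:
            res['duration'] = widget['_duration']
        if 'broadcastedOn' in widget:
            res['timestamp'] = widget['broadcastedOn']  # ISO 8601 string
        if 'synopsis' in widget:
            res['description'] = widget['synopsis']
    return res, geoblocked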

View file

@ -47,19 +47,39 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1', 'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True 'only_matching': True
}] }]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
_PARTNER_ID = '1719221' _PARTNER_ID = '1719221'
def _real_extract(self, url): def _real_extract(self, url):
host, display_id, article_id, entry_id = re.match(self._VALID_URL, url).groups() mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
entry_id = mobj.group('kaltura_id')
if not entry_id: if not entry_id:
entry_id = self._download_json( api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
self._API_TEMPL % (host, host.split('.')[0]), display_id, query={ payload = {
'variables': json.dumps({ 'query': '''query VideoContext($articleId: ID!) {
'contextId': 'NewsArticle:' + article_id, article: node(id: $articleId) {
}), ... on Article {
})['data']['context']['mainAsset']['video']['kaltura']['kalturaId'] mainAssetRelation {
asset {
... on VideoAsset {
kalturaId
}
}
}
}
}
}''',
'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
}
json_data = self._download_json(
api_url, video_id, headers={
'Content-Type': 'application/json',
},
data=json.dumps(payload).encode())
entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
return self.url_result( return self.url_result(
'kaltura:%s:%s' % (self._PARTNER_ID, entry_id), 'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
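
Note: both variants resolve the article to a Kaltura entry id over the site's GraphQL gateway. A minimal stdlib-only sketch of the POST the restored version builds (host and article id are placeholders; untested against the live service):

import json
import urllib.request

def fetch_kaltura_id(host, article_id):
    api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
    payload = {
        'query': 'query VideoContext($articleId: ID!) '
                 '{ article: node(id: $articleId) { ... on Article '
                 '{ mainAssetRelation { asset '
                 '{ ... on VideoAsset { kalturaId } } } } } }',
        'variables': {'articleId': 'Article:%s' % article_id},
    }
    req = urllib.request.Request(
        api_url, data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json'})
    data = json.load(urllib.request.urlopen(req))
    return data['data']['article']['mainAssetRelation']['asset']['kalturaId']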

View file

@ -25,8 +25,8 @@ class BellMediaIE(InfoExtractor):
etalk| etalk|
marilyn marilyn
)\.ca| )\.ca|
(?:much|cp24)\.com much\.com
)/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})''' )/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070', 'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '36d3ef559cfe8af8efe15922cd3ce950', 'md5': '36d3ef559cfe8af8efe15922cd3ce950',
@ -62,9 +62,6 @@ class BellMediaIE(InfoExtractor):
}, { }, {
'url': 'http://www.etalk.ca/video?videoid=663455', 'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.cp24.com/video?clipId=1982548',
'only_matching': True,
}] }]
_DOMAINS = { _DOMAINS = {
'thecomedynetwork': 'comedy', 'thecomedynetwork': 'comedy',
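
Note: the removed pattern additionally matched cp24.com URLs and clipId-style ids. A quick probe with a simplified single-domain version of that pattern (the test URL comes from the deleted test case):

import re

VALID_URL = (r'https?://(?:www\.)?(?:much|cp24)\.com'
             r'/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)'
             r'(?P<id>[0-9]{6,})')

print(re.match(VALID_URL,
               'https://www.cp24.com/video?clipId=1982548').group('id'))
# -> 1982548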

View file

@ -24,18 +24,7 @@
class BiliBiliIE(InfoExtractor): class BiliBiliIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
https?://
(?:(?:www|bangumi)\.)?
bilibili\.(?:tv|com)/
(?:
(?:
video/[aA][vV]|
anime/(?P<anime_id>\d+)/play\#
)(?P<id_bv>\d+)|
video/[bB][vV](?P<id>[^/?#&]+)
)
'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/', 'url': 'http://www.bilibili.tv/video/av1074402/',
@ -103,10 +92,6 @@ class BiliBiliIE(InfoExtractor):
'skip_download': True, # Test metadata only 'skip_download': True, # Test metadata only
}, },
}] }]
}, {
# new BV video id format
'url': 'https://www.bilibili.com/video/BV1JE411F741',
'only_matching': True,
}] }]
_APP_KEY = 'iVGUTjsxvpLeuDCf' _APP_KEY = 'iVGUTjsxvpLeuDCf'
@ -124,7 +109,7 @@ def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') or mobj.group('id_bv') video_id = mobj.group('id')
anime_id = mobj.group('anime_id') anime_id = mobj.group('anime_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
@ -434,17 +419,3 @@ def _real_extract(self, url):
entries, am_id, album_title, album_data.get('intro')) entries, am_id, album_title, album_data.get('intro'))
return self.playlist_result(entries, am_id) return self.playlist_result(entries, am_id)
class BiliBiliPlayerIE(InfoExtractor):
_VALID_URL = r'https?://player\.bilibili\.com/player\.html\?.*?\baid=(?P<id>\d+)'
_TEST = {
'url': 'http://player.bilibili.com/player.html?aid=92494333&cid=157926707&page=1',
'only_matching': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://www.bilibili.tv/video/av%s/' % video_id,
ie=BiliBiliIE.ie_key(), video_id=video_id)
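
Note: the removed _VALID_URL accepts the newer alphanumeric BV ids alongside the numeric av ids. A simplified probe of both shapes, with the second URL taken from the deleted test:

import re

VALID_URL = re.compile(r'''(?x)
    https?://(?:www\.)?bilibili\.(?:tv|com)/
    (?:
        video/[aA][vV](?P<id>\d+)|
        video/[bB][vV](?P<id_bv>[^/?#&]+)
    )''')

for url in ('http://www.bilibili.tv/video/av1074402/',
            'https://www.bilibili.com/video/BV1JE411F741'):
    m = VALID_URL.match(url)
    print(m.group('id') or m.group('id_bv'))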

View file

@ -5,34 +5,32 @@
import re import re
import struct import struct
from .adobepass import AdobePassIE
from .common import InfoExtractor from .common import InfoExtractor
from .adobepass import AdobePassIE
from ..compat import ( from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_HTTPError,
compat_parse_qs, compat_parse_qs,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urlparse, compat_urlparse,
compat_xml_parse_error, compat_xml_parse_error,
compat_HTTPError,
) )
from ..utils import ( from ..utils import (
clean_html,
extract_attributes,
ExtractorError, ExtractorError,
extract_attributes,
find_xpath_attr, find_xpath_attr,
fix_xml_ampersands, fix_xml_ampersands,
float_or_none, float_or_none,
int_or_none,
js_to_json, js_to_json,
mimetype2ext, int_or_none,
parse_iso8601, parse_iso8601,
smuggle_url, smuggle_url,
str_or_none,
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
UnsupportedError,
update_url_query, update_url_query,
url_or_none, clean_html,
mimetype2ext,
UnsupportedError,
) )
@ -426,7 +424,7 @@ def _extract_urls(ie, webpage):
# [2] looks like: # [2] looks like:
for video, script_tag, account_id, player_id, embed in re.findall( for video, script_tag, account_id, player_id, embed in re.findall(
r'''(?isx) r'''(?isx)
(<video(?:-js)?\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>) (<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(?:.*? (?:.*?
(<script[^>]+ (<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/ src=["\'](?:https?:)?//players\.brightcove\.net/
@ -555,15 +553,9 @@ def build_format_id(kind):
subtitles = {} subtitles = {}
for text_track in json_data.get('text_tracks', []): for text_track in json_data.get('text_tracks', []):
if text_track.get('kind') != 'captions': if text_track.get('src'):
continue subtitles.setdefault(text_track.get('srclang'), []).append({
text_track_url = url_or_none(text_track.get('src')) 'url': text_track['src'],
if not text_track_url:
continue
lang = (str_or_none(text_track.get('srclang'))
or str_or_none(text_track.get('label')) or 'en').lower()
subtitles.setdefault(lang, []).append({
'url': text_track_url,
}) })
is_live = False is_live = False
@ -594,12 +586,6 @@ def _real_extract(self, url):
account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups() account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
policy_key_id = '%s_%s' % (account_id, player_id)
policy_key = self._downloader.cache.load('brightcove', policy_key_id)
policy_key_extracted = False
store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
def extract_policy_key():
webpage = self._download_webpage( webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js' 'http://players.brightcove.net/%s/%s_%s/index.min.js'
% (account_id, player_id, embed), video_id) % (account_id, player_id, embed), video_id)
@ -619,36 +605,24 @@ def extract_policy_key():
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1', r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk') webpage, 'policy key', group='pk')
store_pk(policy_key)
return policy_key
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id) api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
headers = {} headers = {
'Accept': 'application/json;pk=%s' % policy_key,
}
referrer = smuggled_data.get('referrer') referrer = smuggled_data.get('referrer')
if referrer: if referrer:
headers.update({ headers.update({
'Referer': referrer, 'Referer': referrer,
'Origin': re.search(r'https?://[^/]+', referrer).group(0), 'Origin': re.search(r'https?://[^/]+', referrer).group(0),
}) })
for _ in range(2):
if not policy_key:
policy_key = extract_policy_key()
policy_key_extracted = True
headers['Accept'] = 'application/json;pk=%s' % policy_key
try: try:
json_data = self._download_json(api_url, video_id, headers=headers) json_data = self._download_json(api_url, video_id, headers=headers)
break
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403): if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)[0] json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
message = json_data.get('message') or json_data['error_code'] message = json_data.get('message') or json_data['error_code']
if json_data.get('error_subcode') == 'CLIENT_GEO': if json_data.get('error_subcode') == 'CLIENT_GEO':
self.raise_geo_restricted(msg=message) self.raise_geo_restricted(msg=message)
elif json_data.get('error_code') == 'INVALID_POLICY_KEY' and not policy_key_extracted:
policy_key = None
store_pk(None)
continue
raise ExtractorError(message, expected=True) raise ExtractorError(message, expected=True)
raise raise
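
Note: the removed code caches the Brightcove policy key per account/player and retries the playback API once with a freshly scraped key when it answers INVALID_POLICY_KEY. The control flow, reduced to a sketch with stand-in cache and fetch callables:

class ApiError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code

def fetch_playback_json(cache, scrape_policy_key, call_api):
    policy_key = cache.get('policy_key')
    freshly_scraped = False
    for _ in range(2):
        if not policy_key:
            policy_key = scrape_policy_key()  # parses index.min.js
            freshly_scraped = True
            cache['policy_key'] = policy_key
        try:
            return call_api(policy_key)
        except ApiError as e:
            if e.code == 'INVALID_POLICY_KEY' and not freshly_scraped:
                policy_key = None           # cached key went stale,
                cache['policy_key'] = None  # drop it and retry once
                continue
            raise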

View file

@ -9,26 +9,21 @@ class BusinessInsiderIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6', 'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
'md5': 'ffed3e1e12a6f950aa2f7d83851b497a', 'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
'info_dict': { 'info_dict': {
'id': 'cjGDb0X9', 'id': 'hZRllCfw',
'ext': 'mp4', 'ext': 'mp4',
'title': "Bananas give you more radiation exposure than living next to a nuclear power plant", 'title': "Here's how much radiation you're exposed to in everyday life",
'description': 'md5:0175a3baf200dd8fa658f94cade841b3', 'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
'upload_date': '20160611', 'upload_date': '20170709',
'timestamp': 1465675620, 'timestamp': 1499606400,
},
'params': {
'skip_download': True,
}, },
}, { }, {
'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/', 'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
'md5': '43f438dbc6da0b89f5ac42f68529d84a', 'only_matching': True,
'info_dict': {
'id': '5zJwd4FK',
'ext': 'mp4',
'title': 'Deze dingen zorgen ervoor dat je minder snel een date scoort',
'description': 'md5:2af8975825d38a4fed24717bbe51db49',
'upload_date': '20170705',
'timestamp': 1499270528,
},
}, { }, {
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T', 'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'only_matching': True, 'only_matching': True,
@ -40,8 +35,7 @@ def _real_extract(self, url):
jwplatform_id = self._search_regex( jwplatform_id = self._search_regex(
(r'data-media-id=["\']([a-zA-Z0-9]{8})', (r'data-media-id=["\']([a-zA-Z0-9]{8})',
r'id=["\']jwplayer_([a-zA-Z0-9]{8})', r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})', r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})'),
webpage, 'jwplatform id') webpage, 'jwplatform id')
return self.url_result( return self.url_result(
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(), 'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),

View file

@ -13,8 +13,6 @@
int_or_none, int_or_none,
merge_dicts, merge_dicts,
parse_iso8601, parse_iso8601,
str_or_none,
url_or_none,
) )
@ -22,15 +20,15 @@ class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475', 'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '68993eda72ef62386a15ea2cf3c93107', 'md5': '90139b746a0a9bd7bb631283f6e2a64e',
'info_dict': { 'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475', 'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475', 'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'ext': 'mp4', 'ext': 'flv',
'title': 'Nachtwacht: De Greystook', 'title': 'Nachtwacht: De Greystook',
'description': 'Nachtwacht: De Greystook', 'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1468.04, 'duration': 1468.03,
}, },
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'], 'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
}, { }, {
@ -41,45 +39,23 @@ class CanvasIE(InfoExtractor):
'HLS': 'm3u8_native', 'HLS': 'm3u8_native',
'HLS_AES': 'm3u8', 'HLS_AES': 'm3u8',
} }
_REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id') site_id, video_id = mobj.group('site_id'), mobj.group('id')
# Old API endpoint, serves more formats but may fail for some videos
data = self._download_json( data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s' 'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id, 'Downloading asset JSON', % (site_id, video_id), video_id)
'Unable to download asset JSON', fatal=False)
# New API endpoint
if not data:
token = self._download_json(
'%s/tokens' % self._REST_API_BASE, video_id,
'Downloading token', data=b'',
headers={'Content-Type': 'application/json'})['vrtPlayerToken']
data = self._download_json(
'%s/videos/%s' % (self._REST_API_BASE, video_id),
video_id, 'Downloading video JSON', fatal=False, query={
'vrtPlayerToken': token,
'client': '%s@PROD' % site_id,
}, expected_status=400)
message = data.get('message')
if message and not data.get('title'):
if data.get('code') == 'AUTHENTICATION_REQUIRED':
self.raise_login_required(message)
raise ExtractorError(message, expected=True)
title = data['title'] title = data['title']
description = data.get('description') description = data.get('description')
formats = [] formats = []
for target in data['targetUrls']: for target in data['targetUrls']:
format_url, format_type = url_or_none(target.get('url')), str_or_none(target.get('type')) format_url, format_type = target.get('url'), target.get('type')
if not format_url or not format_type: if not format_url or not format_type:
continue continue
format_type = format_type.upper()
if format_type in self._HLS_ENTRY_PROTOCOLS_MAP: if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type], format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
@ -158,20 +134,20 @@ class CanvasEenIE(InfoExtractor):
}, },
'skip': 'Pagina niet gevonden', 'skip': 'Pagina niet gevonden',
}, { }, {
'url': 'https://www.een.be/thuis/emma-pakt-thilly-aan', 'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'info_dict': { 'info_dict': {
'id': 'md-ast-3a24ced2-64d7-44fb-b4ed-ed1aafbf90b8', 'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
'display_id': 'emma-pakt-thilly-aan', 'display_id': 'herbekijk-sorry-voor-alles',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Emma pakt Thilly aan', 'title': 'Herbekijk Sorry voor alles',
'description': 'md5:c5c9b572388a99b2690030afa3f3bad7', 'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 118.24, 'duration': 3788.06,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['is not a supported codec'], 'skip': 'Episode no longer available',
}, { }, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend', 'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True, 'only_matching': True,
@ -207,44 +183,19 @@ class VrtNUIE(GigyaBaseIE):
IE_DESC = 'VrtNU.be' IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
# Available via old API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/', 'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
'info_dict': { 'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de', 'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'ext': 'mp4', 'ext': 'flv',
'title': 'De zwarte weduwe', 'title': 'De zwarte weduwe',
'description': 'md5:db1227b0f318c849ba5eab1fef895ee4', 'description': 'md5:d90c21dced7db869a85db89a623998d4',
'duration': 1457.04, 'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'season': 'Season 1', 'season': '1',
'season_number': 1, 'season_number': 1,
'episode_number': 1, 'episode_number': 1,
}, },
'skip': 'This video is only available for registered users', 'skip': 'This video is only available for registered users'
'params': {
'username': '<snip>',
'password': '<snip>',
},
'expected_warnings': ['is not a supported codec'],
}, {
# Only available via new API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/kamp-waes/1/kamp-waes-s1a5/',
'info_dict': {
'id': 'pbs-pub-0763b56c-64fb-4d38-b95b-af60bf433c71$vid-ad36a73c-4735-4f1f-b2c0-a38e6e6aa7e1',
'ext': 'mp4',
'title': 'Aflevering 5',
'description': 'Wie valt door de mand tijdens een missie?',
'duration': 2967.06,
'season': 'Season 1',
'season_number': 1,
'episode_number': 5,
},
'skip': 'This video is only available for registered users',
'params': {
'username': '<snip>',
'password': '<snip>',
},
'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
}] }]
_NETRC_MACHINE = 'vrtnu' _NETRC_MACHINE = 'vrtnu'
_APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy' _APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'
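
Note: the removed CanvasIE fallback talks to VRT's newer media-services API: an empty POST to /tokens yields a vrtPlayerToken, which then authorises the per-video metadata call. A compact stdlib-only sketch of the two requests (untested against the live service):

import json
import urllib.parse
import urllib.request

REST_API_BASE = ('https://media-services-public.vrt.be'
                 '/vualto-video-aggregator-web/rest/external/v1')

def fetch_video_json(site_id, video_id):
    # Step 1: an empty POST yields a vrtPlayerToken.
    req = urllib.request.Request(
        REST_API_BASE + '/tokens', data=b'',
        headers={'Content-Type': 'application/json'})
    token = json.load(urllib.request.urlopen(req))['vrtPlayerToken']
    # Step 2: the token authorises the per-video metadata call.
    # The real extractor also tolerates HTTP 400 responses that carry
    # an error message instead of video data.
    query = urllib.parse.urlencode({
        'vrtPlayerToken': token,
        'client': '%s@PROD' % site_id,
    })
    return json.load(urllib.request.urlopen(
        '%s/videos/%s?%s' % (REST_API_BASE, video_id, query)))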

View file

@ -1,10 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import hashlib
import json import json
import re import re
from xml.sax.saxutils import escape
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
@ -218,29 +216,6 @@ class CBCWatchBaseIE(InfoExtractor):
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/', 'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
} }
_GEO_COUNTRIES = ['CA'] _GEO_COUNTRIES = ['CA']
_LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
_TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'
_API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
_NETRC_MACHINE = 'cbcwatch'
def _signature(self, email, password):
data = json.dumps({
'email': email,
'password': password,
}).encode()
headers = {'content-type': 'application/json'}
query = {'apikey': self._API_KEY}
resp = self._download_json(self._LOGIN_URL, None, data=data, headers=headers, query=query)
access_token = resp['access_token']
# token
query = {
'access_token': access_token,
'apikey': self._API_KEY,
'jwtapp': 'jwt',
}
resp = self._download_json(self._TOKEN_URL, None, headers=headers, query=query)
return resp['signature']
def _call_api(self, path, video_id): def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path url = path if path.startswith('http') else self._API_BASE_URL + path
@ -264,8 +239,7 @@ def _call_api(self, path, video_id):
def _real_initialize(self): def _real_initialize(self):
if self._valid_device_token(): if self._valid_device_token():
return return
device = self._downloader.cache.load( device = self._downloader.cache.load('cbcwatch', 'device') or {}
'cbcwatch', self._cache_device_key()) or {}
self._device_id, self._device_token = device.get('id'), device.get('token') self._device_id, self._device_token = device.get('id'), device.get('token')
if self._valid_device_token(): if self._valid_device_token():
return return
@ -274,30 +248,16 @@ def _real_initialize(self):
def _valid_device_token(self): def _valid_device_token(self):
return self._device_id and self._device_token return self._device_id and self._device_token
def _cache_device_key(self):
email, _ = self._get_login_info()
return '%s_device' % hashlib.sha256(email.encode()).hexdigest() if email else 'device'
def _register_device(self): def _register_device(self):
self._device_id = self._device_token = None
result = self._download_xml( result = self._download_xml(
self._API_BASE_URL + 'device/register', self._API_BASE_URL + 'device/register',
None, 'Acquiring device token', None, 'Acquiring device token',
data=b'<device><type>web</type></device>') data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True) self._device_id = xpath_text(result, 'deviceId', fatal=True)
email, password = self._get_login_info()
if email and password:
signature = self._signature(email, password)
data = '<login><token>{0}</token><device><deviceId>{1}</deviceId><type>web</type></device></login>'.format(
escape(signature), escape(self._device_id)).encode()
url = self._API_BASE_URL + 'device/login'
result = self._download_xml(
url, None, data=data,
headers={'content-type': 'application/xml'})
self._device_token = xpath_text(result, 'token', fatal=True)
else:
self._device_token = xpath_text(result, 'deviceToken', fatal=True) self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store( self._downloader.cache.store(
'cbcwatch', self._cache_device_key(), { 'cbcwatch', 'device', {
'id': self._device_id, 'id': self._device_id,
'token': self._device_token, 'token': self._device_token,
}) })
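
Note: the removed CBC Watch login flow trades credentials for a LoginRadius access token, then swaps that for a JWT signature, which is escaped into the device-login XML. A sketch of the two token requests, with endpoints and API key as in the diff (untested against the live service):

import json
import urllib.parse
import urllib.request

LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'
API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'

def signature(email, password):
    body = json.dumps({'email': email, 'password': password}).encode()
    req = urllib.request.Request(
        '%s?%s' % (LOGIN_URL, urllib.parse.urlencode({'apikey': API_KEY})),
        data=body, headers={'content-type': 'application/json'})
    access_token = json.load(urllib.request.urlopen(req))['access_token']
    query = urllib.parse.urlencode({
        'access_token': access_token, 'apikey': API_KEY, 'jwtapp': 'jwt'})
    return json.load(urllib.request.urlopen(
        '%s?%s' % (TOKEN_URL, query)))['signature']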

View file

@ -1,24 +1,20 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import re import re
from .common import InfoExtractor from .common import InfoExtractor
class CloudflareStreamIE(InfoExtractor): class CloudflareStreamIE(InfoExtractor):
_DOMAIN_RE = r'(?:cloudflarestream\.com|(?:videodelivery|bytehighway)\.net)'
_EMBED_RE = r'embed\.%s/embed/[^/]+\.js\?.*?\bvideo=' % _DOMAIN_RE
_ID_RE = r'[\da-f]{32}|[\w-]+\.[\w-]+\.[\w-]+'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:watch\.)?%s/| (?:watch\.)?(?:cloudflarestream\.com|videodelivery\.net)/|
%s embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=
) )
(?P<id>%s) (?P<id>[\da-f]+)
''' % (_DOMAIN_RE, _EMBED_RE, _ID_RE) '''
_TESTS = [{ _TESTS = [{
'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717', 'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
'info_dict': { 'info_dict': {
@ -45,28 +41,23 @@ def _extract_urls(webpage):
return [ return [
mobj.group('url') mobj.group('url')
for mobj in re.finditer( for mobj in re.finditer(
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s(?:%s).*?)\1' % (CloudflareStreamIE._EMBED_RE, CloudflareStreamIE._ID_RE), r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
domain = 'bytehighway.net' if 'bytehighway.net/' in url else 'videodelivery.net'
base_url = 'https://%s/%s/' % (domain, video_id)
if '.' in video_id:
video_id = self._parse_json(base64.urlsafe_b64decode(
video_id.split('.')[1]), video_id)['sub']
manifest_base_url = base_url + 'manifest/video.'
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
manifest_base_url + 'm3u8', video_id, 'mp4', 'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
'm3u8_native', m3u8_id='hls', fatal=False) video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False)
formats.extend(self._extract_mpd_formats( formats.extend(self._extract_mpd_formats(
manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False)) 'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'title': video_id, 'title': video_id,
'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
'formats': formats, 'formats': formats,
} }
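
Note: the removed version also accepts signed video ids (three dot-separated JWT segments) and recovers the real id from the token's sub claim. A standalone sketch; the padding fix is needed because urlsafe_b64decode rejects the stripped padding JWT segments usually have:

import base64
import json

def resolve_video_id(video_id):
    if '.' not in video_id:
        return video_id  # already a plain 32-hex id
    payload = video_id.split('.')[1]
    payload += '=' * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))['sub']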

View file

@ -15,7 +15,7 @@
import math import math
from ..compat import ( from ..compat import (
compat_cookiejar_Cookie, compat_cookiejar,
compat_cookies, compat_cookies,
compat_etree_Element, compat_etree_Element,
compat_etree_fromstring, compat_etree_fromstring,
@ -1182,33 +1182,16 @@ def _twitter_search_player(self, html):
'twitter card player') 'twitter card player')
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs): def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld_list = list(re.finditer(JSON_LD_RE, html)) json_ld = self._search_regex(
JSON_LD_RE, html, 'JSON-LD', group='json_ld', **kwargs)
default = kwargs.get('default', NO_DEFAULT) default = kwargs.get('default', NO_DEFAULT)
if not json_ld:
return default if default is not NO_DEFAULT else {}
# JSON-LD may be malformed and thus `fatal` should be respected. # JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False` # At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well. # for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
json_ld = [] return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
for mobj in json_ld_list:
json_ld_item = self._parse_json(
mobj.group('json_ld'), video_id, fatal=fatal)
if not json_ld_item:
continue
if isinstance(json_ld_item, dict):
json_ld.append(json_ld_item)
elif isinstance(json_ld_item, (list, tuple)):
json_ld.extend(json_ld_item)
if json_ld:
json_ld = self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
if json_ld:
return json_ld
if default is not NO_DEFAULT:
return default
elif fatal:
raise RegexNotFoundError('Unable to extract JSON-LD')
else:
self._downloader.report_warning('unable to extract JSON-LD %s' % bug_reports_message())
return {}
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None): def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str): if isinstance(json_ld, compat_str):
@ -1273,10 +1256,10 @@ def extract_video_object(e):
extract_interaction_statistic(e) extract_interaction_statistic(e)
for e in json_ld: for e in json_ld:
if '@context' in e: if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
item_type = e.get('@type') item_type = e.get('@type')
if expected_type is not None and expected_type != item_type: if expected_type is not None and expected_type != item_type:
continue return info
if item_type in ('TVEpisode', 'Episode'): if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name')) episode_name = unescapeHTML(e.get('name'))
info.update({ info.update({
@ -1310,16 +1293,10 @@ def extract_video_object(e):
}) })
elif item_type == 'VideoObject': elif item_type == 'VideoObject':
extract_video_object(e) extract_video_object(e)
if expected_type is None:
continue continue
else:
break
video = e.get('video') video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject': if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video) extract_video_object(video)
if expected_type is None:
continue
else:
break break
return dict((k, v) for k, v in info.items() if v is not None) return dict((k, v) for k, v in info.items() if v is not None)
@ -2363,8 +2340,6 @@ def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnot
if res is False: if res is False:
return [] return []
ism_doc, urlh = res ism_doc, urlh = res
if ism_doc is None:
return []
return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id) return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)
@ -2843,7 +2818,7 @@ def _float(self, v, name, fatal=False, **kwargs):
def _set_cookie(self, domain, name, value, expire_time=None, port=None, def _set_cookie(self, domain, name, value, expire_time=None, port=None,
path='/', secure=False, discard=False, rest={}, **kwargs): path='/', secure=False, discard=False, rest={}, **kwargs):
cookie = compat_cookiejar_Cookie( cookie = compat_cookiejar.Cookie(
0, name, value, port, port is not None, domain, True, 0, name, value, port, port is not None, domain, True,
domain.startswith('.'), path, True, secure, expire_time, domain.startswith('.'), path, True, secure, expire_time,
discard, None, None, rest) discard, None, None, rest)
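
Note: the removed _search_json_ld no longer trusts the first ld+json block alone: every block on the page is parsed, malformed ones are skipped when not fatal, and list payloads are flattened before _json_ld runs. A reduced sketch of the collection step (the regex here is a simplified stand-in for the JSON_LD_RE in utils.py):

import json
import re

JSON_LD_RE = (r'(?is)<script[^>]+type=(["\']?)application/ld\+json\1[^>]*>'
              r'(?P<json_ld>.+?)</script>')

def collect_json_ld(html):
    items = []
    for mobj in re.finditer(JSON_LD_RE, html):
        try:
            parsed = json.loads(mobj.group('json_ld'))
        except ValueError:
            continue  # malformed blocks are skipped, not fatal
        if isinstance(parsed, dict):
            items.append(parsed)
        elif isinstance(parsed, (list, tuple)):
            items.extend(parsed)
    return items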

View file

@ -13,7 +13,6 @@
compat_b64decode, compat_b64decode,
compat_etree_Element, compat_etree_Element,
compat_etree_fromstring, compat_etree_fromstring,
compat_str,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_request, compat_urllib_request,
compat_urlparse, compat_urlparse,
@ -26,9 +25,9 @@
intlist_to_bytes, intlist_to_bytes,
int_or_none, int_or_none,
lowercase_escape, lowercase_escape,
merge_dicts,
remove_end, remove_end,
sanitized_Request, sanitized_Request,
unified_strdate,
urlencode_postdata, urlencode_postdata,
xpath_text, xpath_text,
) )
@ -137,7 +136,6 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
# rtmp # rtmp
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Video gone',
}, { }, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1', 'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': { 'info_dict': {
@ -159,12 +157,11 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': { 'info_dict': {
'id': '702409', 'id': '702409',
'ext': 'mp4', 'ext': 'mp4',
'title': compat_str, 'title': 'Re:ZERO -Starting Life in Another World- Episode 5 The Morning of Our Promise Is Still Distant',
'description': compat_str, 'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Re:Zero Partners', 'uploader': 'TV TOKYO',
'timestamp': 1462098900, 'upload_date': '20160508',
'upload_date': '20160501',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -175,13 +172,12 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': { 'info_dict': {
'id': '727589', 'id': '727589',
'ext': 'mp4', 'ext': 'mp4',
'title': compat_str, 'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 Give Me Deliverance From This Judicial Injustice!",
'description': compat_str, 'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Kadokawa Pictures Inc.', 'uploader': 'Kadokawa Pictures Inc.',
'timestamp': 1484130900, 'upload_date': '20170118',
'upload_date': '20170111', 'series': "KONOSUBA -God's blessing on this wonderful world!",
'series': compat_str,
'season': "KONOSUBA -God's blessing on this wonderful world! 2", 'season': "KONOSUBA -God's blessing on this wonderful world! 2",
'season_number': 2, 'season_number': 2,
'episode': 'Give Me Deliverance From This Judicial Injustice!', 'episode': 'Give Me Deliverance From This Judicial Injustice!',
@ -204,11 +200,10 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': { 'info_dict': {
'id': '535080', 'id': '535080',
'ext': 'mp4', 'ext': 'mp4',
'title': compat_str, 'title': '11eyes Episode 1 Red Night ~ Piros éjszaka',
'description': compat_str, 'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
'uploader': 'Marvelous AQL Inc.', 'uploader': 'Marvelous AQL Inc.',
'timestamp': 1255512600, 'upload_date': '20091021',
'upload_date': '20091014',
}, },
'params': { 'params': {
# Just test metadata extraction # Just test metadata extraction
@ -229,17 +224,15 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
# just test metadata extraction # just test metadata extraction
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Video gone',
}, { }, {
# A video with a vastly different season name compared to the series name # A video with a vastly different season name compared to the series name
'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532', 'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
'info_dict': { 'info_dict': {
'id': '590532', 'id': '590532',
'ext': 'mp4', 'ext': 'mp4',
'title': compat_str, 'title': 'Haiyoru! Nyaruani (ONA) Episode 1 Test',
'description': compat_str, 'description': 'Mahiro and Nyaruko talk about official certification.',
'uploader': 'TV TOKYO', 'uploader': 'TV TOKYO',
'timestamp': 1330956000,
'upload_date': '20120305', 'upload_date': '20120305',
'series': 'Nyarko-san: Another Crawling Chaos', 'series': 'Nyarko-san: Another Crawling Chaos',
'season': 'Haiyoru! Nyaruani (ONA)', 'season': 'Haiyoru! Nyaruani (ONA)',
@ -449,21 +442,23 @@ def _real_extract(self, url):
webpage, 'language', default=None, group='lang') webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex( video_title = self._html_search_regex(
(r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>', r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
r'<title>(.+?),\s+-\s+.+? Crunchyroll'), webpage, 'video_title')
webpage, 'video_title', default=None)
if not video_title:
video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
video_title = re.sub(r' {2,}', ' ', video_title) video_title = re.sub(r' {2,}', ' ', video_title)
video_description = (self._parse_json(self._html_search_regex( video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id, r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id) or media_metadata).get('description') webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
if video_description: if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n')) video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex(
[r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
if video_upload_date:
video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex( video_uploader = self._html_search_regex(
# try looking for both an uploader that's a link and one that's not # try looking for both an uploader that's a link and one that's not
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'], [r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', default=False) webpage, 'video_uploader', fatal=False)
formats = [] formats = []
for stream in media.get('streams', []): for stream in media.get('streams', []):
@ -616,15 +611,14 @@ def _real_extract(self, url):
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)', r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
webpage, 'season number', default=None)) webpage, 'season number', default=None))
info = self._search_json_ld(webpage, video_id, default={}) return {
return merge_dicts({
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
'description': video_description, 'description': video_description,
'duration': duration, 'duration': duration,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': video_upload_date,
'series': series, 'series': series,
'season': season, 'season': season,
'season_number': season_number, 'season_number': season_number,
@ -632,7 +626,7 @@ def _real_extract(self, url):
'episode_number': episode_number, 'episode_number': episode_number,
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
}, info) }
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE): class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
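
Note: the removed return path merges the scraped fields with whatever _search_json_ld found, and merge_dicts lets the earlier dict win wherever it already has a usable value. A rough approximation of that helper's behaviour (the real one in utils.py only overrides empty strings):

def merge_dicts(*dicts):
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if v is None:
                continue  # None never overrides anything
            if k not in merged or merged[k] == '':
                merged[k] = v
    return merged

print(merge_dicts({'title': 'scraped', 'upload_date': None},
                  {'title': 'json-ld', 'upload_date': '20160501'}))
# -> {'title': 'scraped', 'upload_date': '20160501'}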

View file

@ -32,7 +32,7 @@ def _get_dailymotion_cookies(self):
@staticmethod @staticmethod
def _get_cookie_value(cookies, name): def _get_cookie_value(cookies, name):
cookie = cookies.get(name) cookie = cookies.get('name')
if cookie: if cookie:
return cookie.value return cookie.value
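
Note the one-character difference above: the removed line looks the cookie up by the name argument, while the restored line asks for a cookie literally called 'name'. The helper as the removed version intends it:

def get_cookie_value(cookies, name):
    # Look the cookie up by the caller-supplied name, not the
    # literal string 'name'.
    cookie = cookies.get(name)
    if cookie:
        return cookie.value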

View file

@ -16,11 +16,10 @@ class DctpTvIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
# 4x3 # 4x3
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/', 'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '3ffbd1556c3fe210724d7088fad723e3',
'info_dict': { 'info_dict': {
'id': '95eaa4f33dad413aa17b4ee613cccc6c', 'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade', 'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'm4v', 'ext': 'flv',
'title': 'Videoinstallation für eine Kaufhausfassade', 'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm', 'description': 'Kurzfilm',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
@ -28,6 +27,10 @@ class DctpTvIE(InfoExtractor):
'timestamp': 1302172322, 'timestamp': 1302172322,
'upload_date': '20110407', 'upload_date': '20110407',
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, { }, {
# 16x9 # 16x9
'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/', 'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
@ -56,26 +59,33 @@ def _real_extract(self, url):
uuid = media['uuid'] uuid = media['uuid']
title = media['title'] title = media['title']
is_wide = media.get('is_wide') ratio = '16x9' if media.get('is_wide') else '4x3'
formats = [] play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
def add_formats(suffix): servers = self._download_json(
templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix) 'http://www.dctp.tv/streaming_servers/', display_id,
formats.extend([{ note='Downloading server list JSON', fatal=False)
'format_id': 'hls-' + suffix,
'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
'protocol': 'm3u8_native',
}, {
'format_id': 's3-' + suffix,
'url': templ % 'completed-media.s3.amazonaws.com',
}, {
'format_id': 'http-' + suffix,
'url': templ % 'cdn-media.dctp.tv',
}])
add_formats('0500_' + ('16x9' if is_wide else '4x3')) if servers:
if is_wide: endpoint = next(
add_formats('720p') server['endpoint']
for server in servers
if url_or_none(server.get('endpoint'))
and 'cloudfront' in server['endpoint'])
else:
endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
app = self._search_regex(
r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
formats = [{
'url': endpoint,
'app': app,
'play_path': play_path,
'page_url': url,
'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
'ext': 'flv',
}]
thumbnails = [] thumbnails = []
images = media.get('images') images = media.get('images')
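
Note: the restored code derives an RTMP format from the site's server list, preferring a CloudFront endpoint and falling back to a hard-coded one, then splits the endpoint into app and play path. Condensed into a sketch:

import re

def rtmp_format(servers, uuid, is_wide, page_url):
    ratio = '16x9' if is_wide else '4x3'
    play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
    endpoint = next(
        (s['endpoint'] for s in servers or []
         if 'cloudfront' in (s.get('endpoint') or '')),
        'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/')
    app = re.match(r'rtmpe?://[^/]+/(?P<app>.*)$', endpoint).group('app')
    return {
        'url': endpoint,
        'app': app,             # e.g. 'cfx/st/'
        'play_path': play_path,
        'page_url': page_url,
        'ext': 'flv',
    }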

View file

@ -13,8 +13,8 @@
class DiscoveryIE(DiscoveryGoBaseIE): class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?:// _VALID_URL = r'''(?x)https?://
(?P<site> (?P<site>
go\.discovery| (?:(?:www|go)\.)?discovery|
www\. (?:www\.)?
(?: (?:
investigationdiscovery| investigationdiscovery|
discoverylife| discoverylife|
@ -22,7 +22,8 @@ class DiscoveryIE(DiscoveryGoBaseIE):
ahctv| ahctv|
destinationamerica| destinationamerica|
sciencechannel| sciencechannel|
tlc tlc|
velocity
)| )|
watch\. watch\.
(?: (?:
@ -82,7 +83,7 @@ def _real_extract(self, url):
'authRel': 'authorization', 'authRel': 'authorization',
'client_id': '3020a40c2356a645b4b4', 'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]), 'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://www.discovery.com/', 'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token'] })['access_token']
headers = self.geo_verification_headers() headers = self.geo_verification_headers()

View file

@ -4,6 +4,7 @@
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
encode_base_n, encode_base_n,
ExtractorError, ExtractorError,
@ -54,7 +55,7 @@ def _real_extract(self, url):
webpage, urlh = self._download_webpage_handle(url, display_id) webpage, urlh = self._download_webpage_handle(url, display_id)
video_id = self._match_id(urlh.geturl()) video_id = self._match_id(compat_str(urlh.geturl()))
hash = self._search_regex( hash = self._search_regex(
r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash') r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')

View file

@@ -105,7 +105,6 @@
     BiliBiliBangumiIE,
     BilibiliAudioIE,
     BilibiliAudioAlbumIE,
-    BiliBiliPlayerIE,
 )
 from .biobiochiletv import BioBioChileTVIE
 from .bitchute import (
@@ -498,6 +497,7 @@
 from .jove import JoveIE
 from .joj import JojIE
 from .jwplatform import JWPlatformIE
+from .jpopsukitv import JpopsukiIE
 from .kakao import KakaoIE
 from .kaltura import KalturaIE
 from .kanalplay import KanalPlayIE
@@ -636,10 +636,7 @@
 from .mlb import MLBIE
 from .mnet import MnetIE
 from .moevideo import MoeVideoIE
-from .mofosex import (
-    MofosexIE,
-    MofosexEmbedIE,
-)
+from .mofosex import MofosexIE
 from .mojvideo import MojvideoIE
 from .morningstar import MorningstarIE
 from .motherless import (
@@ -804,16 +801,6 @@
     ORFFM4IE,
     ORFFM4StoryIE,
     ORFOE1IE,
-    ORFOE3IE,
-    ORFNOEIE,
-    ORFWIEIE,
-    ORFBGLIE,
-    ORFOOEIE,
-    ORFSTMIE,
-    ORFKTNIE,
-    ORFSBGIE,
-    ORFTIRIE,
-    ORFVBGIE,
     ORFIPTVIE,
 )
 from .outsidetv import OutsideTVIE
@@ -821,6 +808,7 @@
     PacktPubIE,
     PacktPubCourseIE,
 )
+from .pandatv import PandaTVIE
 from .pandoratv import PandoraTVIE
 from .parliamentliveuk import ParliamentLiveUKIE
 from .patreon import PatreonIE
@@ -863,7 +851,6 @@
     PolskieRadioIE,
     PolskieRadioCategoryIE,
 )
-from .popcorntimes import PopcorntimesIE
 from .popcorntv import PopcornTVIE
 from .porn91 import Porn91IE
 from .porncom import PornComIE
@@ -976,10 +963,7 @@
 from .sbs import SBSIE
 from .screencast import ScreencastIE
 from .screencastomatic import ScreencastOMaticIE
-from .scrippsnetworks import (
-    ScrippsNetworksWatchIE,
-    ScrippsNetworksIE,
-)
+from .scrippsnetworks import ScrippsNetworksWatchIE
 from .scte import (
     SCTEIE,
     SCTECourseIE,

View file

@@ -466,18 +466,15 @@ def _real_extract(self, url):
             return info_dict

         if '/posts/' in url:
-            video_id_json = self._search_regex(
-                r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage, 'video ids', group='ids',
-                default='')
-            if video_id_json:
-                entries = [
-                    self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
-                    for vid in self._parse_json(video_id_json, video_id)]
-                return self.playlist_result(entries, video_id)
+            entries = [
+                self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
+                for vid in self._parse_json(
+                    self._search_regex(
+                        r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
+                        webpage, 'video ids', group='ids'),
+                    video_id)]

-            # Single Video?
-            video_id = self._search_regex(r'video_id:\s*"([0-9]+)"', webpage, 'single video id')
-            return self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
+            return self.playlist_result(entries, video_id)
         else:
             _, info_dict = self._extract_from_url(
                 self._VIDEO_PAGE_TEMPLATE % video_id,
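
The two sides differ mainly in failure behavior: the removed code tolerated posts without a `video_ids` array (falling back to a single `video_id`), while the restored code lets `_search_regex` raise when the array is missing. A rough standalone sketch of the tolerant variant (regexes copied from the hunk; the helper and sample page are ours):

    import json
    import re

    def extract_video_ids(webpage):
        # Prefer the video_ids JSON array; fall back to a single video_id
        m = re.search(r'(["\'])video_ids\1\s*:\s*(\[.+?\])', webpage)
        if m:
            return json.loads(m.group(2))
        m = re.search(r'video_id:\s*"([0-9]+)"', webpage)
        return [m.group(1)] if m else []

    print(extract_video_ids('"video_ids": ["123", "456"]'))  # ['123', '456']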

View file

@@ -31,13 +31,7 @@ def _real_extract(self, url):
         webpage = self._download_webpage(url, display_id)

         video_data = extract_attributes(self._search_regex(
-            r'''(?sx)
-                (?:
-                    </h1>|
-                    <div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
-                ).*?
-                (<button[^>]+data-asset-source="[^"]+"[^>]+>)
-            ''',
+            r'(?s)<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>.*?(<button[^>]+data-asset-source="[^"]+"[^>]+>)',
             webpage, 'video data'))

         video_url = video_data['data-asset-source']

View file

@@ -60,9 +60,6 @@
 from .drtuber import DrTuberIE
 from .redtube import RedTubeIE
 from .tube8 import Tube8IE
-from .mofosex import MofosexEmbedIE
-from .spankwire import SpankwireIE
-from .youporn import YouPornIE
 from .vimeo import VimeoIE
 from .dailymotion import DailymotionIE
 from .dailymail import DailyMailIE
@@ -1708,15 +1705,6 @@ class GenericIE(InfoExtractor):
         },
         'add_ie': ['Kaltura'],
     },
-    {
-        # multiple kaltura embeds, nsfw
-        'url': 'https://www.quartier-rouge.be/prive/femmes/kamila-avec-video-jaime-sadomie.html',
-        'info_dict': {
-            'id': 'kamila-avec-video-jaime-sadomie',
-            'title': "Kamila avec vídeo “J'aime sadomie”",
-        },
-        'playlist_count': 8,
-    },
     {
         # Non-standard Vimeo embed
         'url': 'https://openclassrooms.com/courses/understanding-the-web',
@@ -2110,9 +2098,6 @@ class GenericIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Smoky Barbecue Favorites',
             'thumbnail': r're:^https?://.*\.jpe?g',
-            'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
-            'upload_date': '20170909',
-            'timestamp': 1504915200,
         },
         'add_ie': [ZypeIE.ie_key()],
         'params': {
@@ -2299,7 +2284,7 @@ def _real_extract(self, url):
         if head_response is not False:
             # Check for redirect
-            new_url = head_response.geturl()
+            new_url = compat_str(head_response.geturl())
             if url != new_url:
                 self.report_following_redirect(new_url)
                 if force_videoid:
@@ -2399,12 +2384,12 @@ def _real_extract(self, url):
                 return self.playlist_result(
                     self._parse_xspf(
                         doc, video_id, xspf_url=url,
-                        xspf_base_url=full_response.geturl()),
+                        xspf_base_url=compat_str(full_response.geturl())),
                     video_id)
             elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
                 info_dict['formats'] = self._parse_mpd_formats(
                     doc,
-                    mpd_base_url=full_response.geturl().rpartition('/')[0],
+                    mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
                     mpd_url=url)
                 self._sort_formats(info_dict['formats'])
                 return info_dict
@@ -2548,21 +2533,15 @@ def _real_extract(self, url):
             return self.playlist_from_matches(
                 dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())

-        # Look for Teachable embeds, must be before Wistia
-        teachable_url = TeachableIE._extract_url(webpage, url)
-        if teachable_url:
-            return self.url_result(teachable_url)
-
         # Look for embedded Wistia player
-        wistia_urls = WistiaIE._extract_urls(webpage)
-        if wistia_urls:
-            playlist = self.playlist_from_matches(wistia_urls, video_id, video_title, ie=WistiaIE.ie_key())
-            for entry in playlist['entries']:
-                entry.update({
-                    '_type': 'url_transparent',
-                    'uploader': video_uploader,
-                })
-            return playlist
+        wistia_url = WistiaIE._extract_url(webpage)
+        if wistia_url:
+            return {
+                '_type': 'url_transparent',
+                'url': self._proto_relative_url(wistia_url),
+                'ie_key': WistiaIE.ie_key(),
+                'uploader': video_uploader,
+            }

         # Look for SVT player
         svt_url = SVTIE._extract_url(webpage)
@@ -2727,21 +2706,6 @@ def _real_extract(self, url):
         if tube8_urls:
             return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key())

-        # Look for embedded Mofosex player
-        mofosex_urls = MofosexEmbedIE._extract_urls(webpage)
-        if mofosex_urls:
-            return self.playlist_from_matches(mofosex_urls, video_id, video_title, ie=MofosexEmbedIE.ie_key())
-
-        # Look for embedded Spankwire player
-        spankwire_urls = SpankwireIE._extract_urls(webpage)
-        if spankwire_urls:
-            return self.playlist_from_matches(spankwire_urls, video_id, video_title, ie=SpankwireIE.ie_key())
-
-        # Look for embedded YouPorn player
-        youporn_urls = YouPornIE._extract_urls(webpage)
-        if youporn_urls:
-            return self.playlist_from_matches(youporn_urls, video_id, video_title, ie=YouPornIE.ie_key())
-
         # Look for embedded Tvigle player
         mobj = re.search(
             r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -2853,12 +2817,9 @@ def _real_extract(self, url):
             return self.url_result(mobj.group('url'), 'Zapiks')

         # Look for Kaltura embeds
-        kaltura_urls = KalturaIE._extract_urls(webpage)
-        if kaltura_urls:
-            return self.playlist_from_matches(
-                kaltura_urls, video_id, video_title,
-                getter=lambda x: smuggle_url(x, {'source_url': url}),
-                ie=KalturaIE.ie_key())
+        kaltura_url = KalturaIE._extract_url(webpage)
+        if kaltura_url:
+            return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())

         # Look for EaglePlatform embeds
         eagleplatform_url = EaglePlatformIE._extract_url(webpage)
@@ -2999,7 +2960,7 @@ def _real_extract(self, url):
         # Look for VODPlatform embeds
         mobj = re.search(
-            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1',
             webpage)
         if mobj is not None:
             return self.url_result(
@@ -3176,6 +3137,10 @@ def _real_extract(self, url):
             return self.playlist_from_matches(
                 peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())

+        teachable_url = TeachableIE._extract_url(webpage, url)
+        if teachable_url:
+            return self.url_result(teachable_url)
+
         indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
         if indavideo_urls:
             return self.playlist_from_matches(
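
Several hunks above replace plural `_extract_urls` helpers with their older single-URL counterparts (Wistia, Kaltura). The plural convention exists so the generic extractor can return a playlist when a page embeds several players. A toy sketch of the idea (the iframe pattern and host are invented for illustration):

    import re

    def extract_embed_urls(webpage):
        # Collect every embed URL instead of stopping at the first hit
        return re.findall(
            r'<iframe[^>]+src=["\']((?:https?:)?//player\.example\.com/embed/\w+)',
            webpage)

    page = ('<iframe src="//player.example.com/embed/a1"></iframe>'
            '<iframe src="//player.example.com/embed/b2"></iframe>')
    print(extract_embed_urls(page))  # both embeds, not just the first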

View file

@@ -13,10 +13,10 @@


 class GiantBombIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/(?:videos|shows)/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
-    _TESTS = [{
+    _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
+    _TEST = {
         'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
-        'md5': '132f5a803e7e0ab0e274d84bda1e77ae',
+        'md5': 'c8ea694254a59246a42831155dec57ac',
         'info_dict': {
             'id': '2300-9782',
             'display_id': 'quick-look-destiny-the-dark-below',
@@ -26,10 +26,7 @@ class GiantBombIE(InfoExtractor):
             'duration': 2399,
             'thumbnail': r're:^https?://.*\.jpg$',
         }
-    }, {
-        'url': 'https://www.giantbomb.com/shows/ben-stranding/2970-20212',
-        'only_matching': True,
-    }]
+    }

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)

View file

@@ -1,11 +1,12 @@
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..utils import (
-    int_or_none,
-    merge_dicts,
+    js_to_json,
     remove_end,
-    unified_timestamp,
+    determine_ext,
 )
@@ -13,21 +14,15 @@ class HellPornoIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
-        'md5': 'f0a46ebc0bed0c72ae8fe4629f7de5f3',
+        'md5': '1fee339c610d2049699ef2aa699439f1',
         'info_dict': {
             'id': '149116',
             'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
             'ext': 'mp4',
             'title': 'Dixie is posing with naked ass very erotic',
-            'description': 'md5:9a72922749354edb1c4b6e540ad3d215',
-            'categories': list,
             'thumbnail': r're:https?://.*\.jpg$',
-            'duration': 240,
-            'timestamp': 1398762720,
-            'upload_date': '20140429',
-            'view_count': int,
             'age_limit': 18,
-        },
+        }
     }, {
         'url': 'http://hellporno.net/v/186271/',
         'only_matching': True,
@@ -41,36 +36,40 @@ def _real_extract(self, url):
         title = remove_end(self._html_search_regex(
             r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno')

-        info = self._parse_html5_media_entries(url, webpage, display_id)[0]
-        self._sort_formats(info['formats'])
-
-        video_id = self._search_regex(
-            (r'chs_object\s*=\s*["\'](\d+)',
-             r'params\[["\']video_id["\']\]\s*=\s*(\d+)'), webpage, 'video id',
-            default=display_id)
-        description = self._search_regex(
-            r'class=["\']desc_video_view_v2[^>]+>([^<]+)', webpage,
-            'description', fatal=False)
-        categories = [
-            c.strip()
-            for c in self._html_search_meta(
-                'keywords', webpage, 'categories', default='').split(',')
-            if c.strip()]
-        duration = int_or_none(self._og_search_property(
-            'video:duration', webpage, fatal=False))
-        timestamp = unified_timestamp(self._og_search_property(
-            'video:release_date', webpage, fatal=False))
-        view_count = int_or_none(self._search_regex(
-            r'>Views\s+(\d+)', webpage, 'view count', fatal=False))
-
-        return merge_dicts(info, {
+        flashvars = self._parse_json(self._search_regex(
+            r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
+            display_id, transform_source=js_to_json)
+
+        video_id = flashvars.get('video_id')
+        thumbnail = flashvars.get('preview_url')
+        ext = determine_ext(flashvars.get('postfix'), 'mp4')
+
+        formats = []
+        for video_url_key in ['video_url', 'video_alt_url']:
+            video_url = flashvars.get(video_url_key)
+            if not video_url:
+                continue
+            video_text = flashvars.get('%s_text' % video_url_key)
+            fmt = {
+                'url': video_url,
+                'ext': ext,
+                'format_id': video_text,
+            }
+            m = re.search(r'^(?P<height>\d+)[pP]', video_text)
+            if m:
+                fmt['height'] = int(m.group('height'))
+            formats.append(fmt)
+        self._sort_formats(formats)
+
+        categories = self._html_search_meta(
+            'keywords', webpage, 'categories', default='').split(',')
+
+        return {
             'id': video_id,
             'display_id': display_id,
             'title': title,
-            'description': description,
+            'thumbnail': thumbnail,
             'categories': categories,
-            'duration': duration,
-            'timestamp': timestamp,
-            'view_count': view_count,
             'age_limit': 18,
-        })
+            'formats': formats,
+        }
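
The restored code parses a JavaScript `flashvars` object via `js_to_json`. A rough stdlib-only stand-in for that step (the naive quote replacement below only handles the simple single-quoted case, unlike the real `js_to_json`; the sample page is invented):

    import json
    import re

    def parse_flashvars(webpage):
        raw = re.search(r'var\s+flashvars\s*=\s*({.+?});', webpage, re.DOTALL).group(1)
        return json.loads(raw.replace("'", '"'))

    page = "var flashvars = {'video_id': '149116', 'video_url': 'https://cdn.example.com/v.mp4'};"
    print(parse_flashvars(page)['video_id'])  # 149116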

View file

@@ -1,7 +1,5 @@
 from __future__ import unicode_literals

-import base64
-import json
 import re

 from .common import InfoExtractor
@@ -10,7 +8,6 @@
     mimetype2ext,
     parse_duration,
     qualities,
-    try_get,
     url_or_none,
 )

@@ -18,16 +15,15 @@
 class ImdbIE(InfoExtractor):
     IE_NAME = 'imdb'
     IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).*?[/-]vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'

     _TESTS = [{
         'url': 'http://www.imdb.com/video/imdb/vi2524815897',
         'info_dict': {
             'id': '2524815897',
             'ext': 'mp4',
-            'title': 'No. 2',
+            'title': 'No. 2 from Ice Age: Continental Drift (2012)',
             'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
-            'duration': 152,
         }
     }, {
         'url': 'http://www.imdb.com/video/_/vi2524815897',
@@ -51,23 +47,21 @@ class ImdbIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)

-        data = self._download_json(
-            'https://www.imdb.com/ve/data/VIDEO_PLAYBACK_DATA', video_id,
-            query={
-                'key': base64.b64encode(json.dumps({
-                    'type': 'VIDEO_PLAYER',
-                    'subType': 'FORCE_LEGACY',
-                    'id': 'vi%s' % video_id,
-                }).encode()).decode(),
-            })[0]
+        webpage = self._download_webpage(
+            'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
+        video_metadata = self._parse_json(self._search_regex(
+            r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
+            'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
+        title = self._html_search_meta(
+            ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
+            r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']

         quality = qualities(('SD', '480p', '720p', '1080p'))
         formats = []
-        for encoding in data['videoLegacyEncodings']:
+        for encoding in video_metadata.get('encodings', []):
             if not encoding or not isinstance(encoding, dict):
                 continue
-            video_url = url_or_none(encoding.get('url'))
+            video_url = url_or_none(encoding.get('videoUrl'))
             if not video_url:
                 continue
             ext = mimetype2ext(encoding.get(
@@ -75,7 +69,7 @@ def _real_extract(self, url):
             if ext == 'm3u8':
                 formats.extend(self._extract_m3u8_formats(
                     video_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    preference=1, m3u8_id='hls', fatal=False))
+                    m3u8_id='hls', fatal=False))
                 continue
             format_id = encoding.get('definition')
             formats.append({
@@ -86,33 +80,13 @@ def _real_extract(self, url):
             })
         self._sort_formats(formats)

-        webpage = self._download_webpage(
-            'https://www.imdb.com/video/vi' + video_id, video_id)
-        video_metadata = self._parse_json(self._search_regex(
-            r'args\.push\(\s*({.+?})\s*\)\s*;', webpage,
-            'video metadata'), video_id)
-
-        video_info = video_metadata.get('VIDEO_INFO')
-        if video_info and isinstance(video_info, dict):
-            info = try_get(
-                video_info, lambda x: x[list(video_info.keys())[0]][0], dict)
-        else:
-            info = {}
-
-        title = self._html_search_meta(
-            ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
-            r'<title>(.+?)</title>', webpage, 'title',
-            default=None) or info['videoTitle']
-
         return {
             'id': video_id,
             'title': title,
-            'alt_title': info.get('videoSubTitle'),
             'formats': formats,
-            'description': info.get('videoDescription'),
-            'thumbnail': url_or_none(try_get(
-                video_metadata, lambda x: x['videoSlate']['source'])),
-            'duration': parse_duration(info.get('videoRuntime')),
+            'description': video_metadata.get('description'),
+            'thumbnail': video_metadata.get('slate', {}).get('url'),
+            'duration': parse_duration(video_metadata.get('duration')),
         }
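
The removed IMDb branch builds its `key` query parameter by base64-encoding a small JSON document. That encoding step in isolation (video id taken from the test above):

    import base64
    import json

    key = base64.b64encode(json.dumps({
        'type': 'VIDEO_PLAYER',
        'subType': 'FORCE_LEGACY',
        'id': 'vi2524815897',
    }).encode()).decode()
    print(key)  # opaque token passed as ?key= to /ve/data/VIDEO_PLAYBACK_DATA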

View file

@@ -58,7 +58,7 @@ def _real_extract(self, url):
         video_id = self._match_id(url)

         video = self._download_json(
-            'https://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
+            'http://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
             video_id)['data']

         title = video['title']

View file

@@ -16,22 +16,12 @@ class IPrimaIE(InfoExtractor):
     _GEO_BYPASS = False

     _TESTS = [{
-        'url': 'https://prima.iprima.cz/particka/92-epizoda',
+        'url': 'http://play.iprima.cz/gondici-s-r-o-33',
         'info_dict': {
-            'id': 'p51388',
+            'id': 'p136534',
             'ext': 'mp4',
-            'title': 'Partička (92)',
-            'description': 'md5:859d53beae4609e6dd7796413f1b6cac',
-        },
-        'params': {
-            'skip_download': True,  # m3u8 download
-        },
-    }, {
-        'url': 'https://cnn.iprima.cz/videa/70-epizoda',
-        'info_dict': {
-            'id': 'p681554',
-            'ext': 'mp4',
-            'title': 'HLAVNÍ ZPRÁVY 3.5.2020',
+            'title': 'Gondíci s. r. o. (34)',
+            'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
         },
         'params': {
             'skip_download': True,  # m3u8 download
@@ -78,15 +68,9 @@ def _real_extract(self, url):

         webpage = self._download_webpage(url, video_id)

-        title = self._og_search_title(
-            webpage, default=None) or self._search_regex(
-            r'<h1>([^<]+)', webpage, 'title')
-
         video_id = self._search_regex(
             (r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
-             r'data-product="([^"]+)">',
-             r'id=["\']player-(p\d+)"',
-             r'playerId\s*:\s*["\']player-(p\d+)'),
+             r'data-product="([^"]+)">'),
             webpage, 'real id')

         playerpage = self._download_webpage(
@@ -141,8 +125,8 @@ def extract_formats(format_url, format_key=None, lang=None):

         return {
             'id': video_id,
-            'title': title,
-            'thumbnail': self._og_search_thumbnail(webpage, default=None),
+            'title': self._og_search_title(webpage),
+            'thumbnail': self._og_search_thumbnail(webpage),
             'formats': formats,
-            'description': self._og_search_description(webpage, default=None),
+            'description': self._og_search_description(webpage),
         }

View file

@@ -239,7 +239,7 @@ def _extract_entries(self, html, compilation_id):
             self.url_result(
                 'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
             for serie in re.findall(
-                r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)]
+                r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)

View file

@@ -0,0 +1,68 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+)
+
+
+class JpopsukiIE(InfoExtractor):
+    IE_NAME = 'jpopsuki.tv'
+    _VALID_URL = r'https?://(?:www\.)?jpopsuki\.tv/(?:category/)?video/[^/]+/(?P<id>\S+)'
+
+    _TEST = {
+        'url': 'http://www.jpopsuki.tv/video/ayumi-hamasaki---evolution/00be659d23b0b40508169cdee4545771',
+        'md5': '88018c0c1a9b1387940e90ec9e7e198e',
+        'info_dict': {
+            'id': '00be659d23b0b40508169cdee4545771',
+            'ext': 'mp4',
+            'title': 'ayumi hamasaki - evolution',
+            'description': 'Release date: 2001.01.31\r\n浜崎あゆみ - evolution',
+            'thumbnail': 'http://www.jpopsuki.tv/cache/89722c74d2a2ebe58bcac65321c115b2.jpg',
+            'uploader': 'plama_chan',
+            'uploader_id': '404',
+            'upload_date': '20121101'
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        video_url = 'http://www.jpopsuki.tv' + self._html_search_regex(
+            r'<source src="(.*?)" type', webpage, 'video url')
+
+        video_title = self._og_search_title(webpage)
+        description = self._og_search_description(webpage)
+        thumbnail = self._og_search_thumbnail(webpage)
+        uploader = self._html_search_regex(
+            r'<li>from: <a href="/user/view/user/(.*?)/uid/',
+            webpage, 'video uploader', fatal=False)
+        uploader_id = self._html_search_regex(
+            r'<li>from: <a href="/user/view/user/\S*?/uid/(\d*)',
+            webpage, 'video uploader_id', fatal=False)
+        upload_date = unified_strdate(self._html_search_regex(
+            r'<li>uploaded: (.*?)</li>', webpage, 'video upload_date',
+            fatal=False))
+        view_count_str = self._html_search_regex(
+            r'<li>Hits: ([0-9]+?)</li>', webpage, 'video view_count',
+            fatal=False)
+        comment_count_str = self._html_search_regex(
+            r'<h2>([0-9]+?) comments</h2>', webpage, 'video comment_count',
+            fatal=False)
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'title': video_title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'upload_date': upload_date,
+            'view_count': int_or_none(view_count_str),
+            'comment_count': int_or_none(comment_count_str),
+        }
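
Restoring this module only takes effect together with the `from .jpopsukitv import JpopsukiIE` line re-added in extractors.py above. Assuming both are in place, the extractor can be exercised through the embedding API (URL from the `_TEST` above; `download=False` keeps it to metadata):

    import youtube_dl

    with youtube_dl.YoutubeDL() as ydl:
        info = ydl.extract_info(
            'http://www.jpopsuki.tv/video/ayumi-hamasaki---evolution/00be659d23b0b40508169cdee4545771',
            download=False)
        print(info.get('title'))  # 'ayumi hamasaki - evolution' per the test expectation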

View file

@@ -4,7 +4,6 @@
 import re

 from .common import InfoExtractor
-from ..utils import unsmuggle_url


 class JWPlatformIE(InfoExtractor):
@@ -33,14 +32,10 @@ def _extract_url(webpage):
     @staticmethod
     def _extract_urls(webpage):
         return re.findall(
-            r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//(?:content\.jwplatform|cdn\.jwplayer)\.com/players/[a-zA-Z0-9]{8})',
+            r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
            webpage)

     def _real_extract(self, url):
-        url, smuggled_data = unsmuggle_url(url, {})
-        self._initialize_geo_bypass({
-            'countries': smuggled_data.get('geo_countries'),
-        })
         video_id = self._match_id(url)
         json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
         return self._parse_jwplayer_data(json_data, video_id)

View file

@@ -113,14 +113,9 @@ class KalturaIE(InfoExtractor):
     @staticmethod
     def _extract_url(webpage):
-        urls = KalturaIE._extract_urls(webpage)
-        return urls[0] if urls else None
-
-    @staticmethod
-    def _extract_urls(webpage):
         # Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
-        finditer = (
-            re.finditer(
+        mobj = (
+            re.search(
                 r"""(?xs)
                     kWidget\.(?:thumb)?[Ee]mbed\(
                     \{.*?
@@ -129,7 +124,7 @@ def _extract_urls(webpage):
                     (?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
                     (?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
                 """, webpage)
-            or re.finditer(
+            or re.search(
                 r'''(?xs)
                     (?P<q1>["'])
                         (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@@ -143,7 +138,7 @@ def _extract_urls(webpage):
                     )
                     (?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
                 ''', webpage)
-            or re.finditer(
+            or re.search(
                 r'''(?xs)
                     <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
                       (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
@@ -153,8 +148,7 @@ def _extract_urls(webpage):
                     (?P=q1)
                 ''', webpage)
         )
-        urls = []
-        for mobj in finditer:
+        if mobj:
             embed_info = mobj.groupdict()
             for k, v in embed_info.items():
                 if v:
@@ -166,8 +160,7 @@ def _extract_urls(webpage):
                 webpage)
             if service_mobj:
                 url = smuggle_url(url, {'service_url': service_mobj.group('id')})
-            urls.append(url)
-        return urls
+            return url

     def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
         params = actions[0]
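
The behavioral core of this revert is `re.finditer` (every embed) versus `re.search` (first embed only). The difference on a toy pattern (pattern and page are illustrative, not the real Kaltura regexes):

    import re

    page = 'entry_id:"abc" ... entry_id:"def"'
    pattern = r'entry_id:"(\w+)"'

    first_only = re.search(pattern, page).group(1)
    all_embeds = [m.group(1) for m in re.finditer(pattern, page)]
    print(first_only)   # 'abc'
    print(all_embeds)   # ['abc', 'def']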

View file

@@ -4,6 +4,7 @@
 import re

 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     clean_html,
     determine_ext,
@@ -35,7 +36,7 @@ def _login(self):
             self._LOGIN_URL, None, 'Downloading login popup')

         def is_logged(url_handle):
-            return self._LOGIN_URL not in url_handle.geturl()
+            return self._LOGIN_URL not in compat_str(url_handle.geturl())

         # Already logged in
         if is_logged(urlh):

View file

@@ -2,24 +2,23 @@
 from __future__ import unicode_literals

 import re
-import uuid

 from .common import InfoExtractor
-from ..compat import compat_HTTPError
+from ..compat import compat_str
 from ..utils import (
-    ExtractorError,
-    int_or_none,
-    qualities,
+    unescapeHTML,
+    parse_duration,
+    get_element_by_class,
 )


 class LEGOIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]{32})'
+    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
     _TESTS = [{
         'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
         'md5': 'f34468f176cfd76488767fc162c405fa',
         'info_dict': {
-            'id': '55492d82-3b1b-4d5e-9857-87fa8c2973b1_en-US',
+            'id': '55492d823b1b4d5e985787fa8c2973b1',
             'ext': 'mp4',
             'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
             'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
@@ -27,123 +26,103 @@ class LEGOIE(InfoExtractor):
     }, {
         # geo-restricted but the contentUrl contain a valid url
         'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
-        'md5': 'c7420221f7ffd03ff056f9db7f8d807c',
+        'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
         'info_dict': {
-            'id': '13bdc229-9ab2-4d96-8570-1a915b3d71e7_nl-NL',
+            'id': '13bdc2299ab24d9685701a915b3d71e7',
             'ext': 'mp4',
-            'title': 'Aflevering 20: Helden van het koninkrijk',
+            'title': 'Aflevering 20 - Helden van het koninkrijk',
             'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
-            'age_limit': 5,
         },
     }, {
-        # with subtitle
-        'url': 'https://www.lego.com/nl-nl/kids/videos/classic/creative-storytelling-the-little-puppy-aa24f27c7d5242bc86102ebdc0f24cba',
+        # special characters in title
+        'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
         'info_dict': {
-            'id': 'aa24f27c-7d52-42bc-8610-2ebdc0f24cba_nl-NL',
+            'id': '9685ee9d12e84ff38e84b4e3d0db533d',
             'ext': 'mp4',
-            'title': 'De kleine puppy',
-            'description': 'md5:5b725471f849348ac73f2e12cfb4be06',
-            'age_limit': 1,
-            'subtitles': {
-                'nl': [{
-                    'ext': 'srt',
-                    'url': r're:^https://.+\.srt$',
-                }],
-            },
+            'title': 'Force Surprise LEGO® Star Wars™ Microfighters',
+            'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
         },
         'params': {
             'skip_download': True,
         },
     }]
-    _QUALITIES = {
-        'Lowest': (64, 180, 320),
-        'Low': (64, 270, 480),
-        'Medium': (96, 360, 640),
-        'High': (128, 540, 960),
-        'Highest': (128, 720, 1280),
-    }
+    _BITRATES = [256, 512, 1024, 1536, 2560]

     def _real_extract(self, url):
         locale, video_id = re.match(self._VALID_URL, url).groups()
-        countries = [locale.split('-')[1].upper()]
-        self._initialize_geo_bypass({
-            'countries': countries,
-        })
-
-        try:
-            item = self._download_json(
-                # https://contentfeed.services.lego.com/api/v2/item/[VIDEO_ID]?culture=[LOCALE]&contentType=Video
-                'https://services.slingshot.lego.com/mediaplayer/v2',
-                video_id, query={
-                    'videoId': '%s_%s' % (uuid.UUID(video_id), locale),
-                }, headers=self.geo_verification_headers())
-        except ExtractorError as e:
-            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 451:
-                self.raise_geo_restricted(countries=countries)
-            raise
-
-        video = item['Video']
-        video_id = video['Id']
-        title = video['Title']
-
-        q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
-        formats = []
-        for video_source in item.get('VideoFormats', []):
-            video_source_url = video_source.get('Url')
-            if not video_source_url:
-                continue
-            video_source_format = video_source.get('Format')
-            if video_source_format == 'F4M':
-                formats.extend(self._extract_f4m_formats(
-                    video_source_url, video_id,
-                    f4m_id=video_source_format, fatal=False))
-            elif video_source_format == 'M3U8':
-                formats.extend(self._extract_m3u8_formats(
-                    video_source_url, video_id, 'mp4', 'm3u8_native',
-                    m3u8_id=video_source_format, fatal=False))
-            else:
-                video_source_quality = video_source.get('Quality')
-                format_id = []
-                for v in (video_source_format, video_source_quality):
-                    if v:
-                        format_id.append(v)
-                f = {
-                    'format_id': '-'.join(format_id),
-                    'quality': q(video_source_quality),
-                    'url': video_source_url,
-                }
-                quality = self._QUALITIES.get(video_source_quality)
-                if quality:
-                    f.update({
-                        'abr': quality[0],
-                        'height': quality[1],
-                        'width': quality[2],
-                    }),
-                formats.append(f)
-        self._sort_formats(formats)
-
-        subtitles = {}
-        sub_file_id = video.get('SubFileId')
-        if sub_file_id and sub_file_id != '00000000-0000-0000-0000-000000000000':
-            net_storage_path = video.get('NetstoragePath')
-            invariant_id = video.get('InvariantId')
-            video_file_id = video.get('VideoFileId')
-            video_version = video.get('VideoVersion')
-            if net_storage_path and invariant_id and video_file_id and video_version:
-                subtitles.setdefault(locale[:2], []).append({
-                    'url': 'https://lc-mediaplayerns-live-s.legocdn.com/public/%s/%s_%s_%s_%s_sub.srt' % (net_storage_path, invariant_id, video_file_id, locale, video_version),
-                })
+        webpage = self._download_webpage(url, video_id)
+        title = get_element_by_class('video-header', webpage).strip()
+        progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
+        streaming_base = 'http://legoprod-f.akamaihd.net/'
+        content_url = self._html_search_meta('contentUrl', webpage)
+        path = self._search_regex(
+            r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
+            content_url, 'video path', default=None)
+        if not path:
+            player_url = self._proto_relative_url(self._search_regex(
+                r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
+                webpage, 'player url', default=None))
+            if not player_url:
+                base_url = self._proto_relative_url(self._search_regex(
+                    r'data-baseurl="([^"]+)"', webpage, 'base url',
+                    default='http://www.lego.com/%s/mediaplayer/video/' % locale))
+                player_url = base_url + video_id
+            player_webpage = self._download_webpage(player_url, video_id)
+            video_data = self._parse_json(unescapeHTML(self._search_regex(
+                r"video='([^']+)'", player_webpage, 'video data')), video_id)
+            progressive_base = self._search_regex(
+                r'data-video-progressive-url="([^"]+)"',
+                player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
+            streaming_base = self._search_regex(
+                r'data-video-streaming-url="([^"]+)"',
+                player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
+            item_id = video_data['ItemId']
+
+            net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
+            base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
+            path = '/'.join([net_storage_path, base_path])
+        streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES))
+
+        formats = self._extract_akamai_formats(
+            '%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id)
+        m3u8_formats = list(filter(
+            lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none',
+            formats))
+        if len(m3u8_formats) == len(self._BITRATES):
+            self._sort_formats(m3u8_formats)
+            for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats):
+                progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate)
+                mp4_f = m3u8_format.copy()
+                mp4_f.update({
+                    'url': progressive_base_url + 'mp4',
+                    'format_id': m3u8_format['format_id'].replace('hls', 'mp4'),
+                    'protocol': 'http',
+                })
+                web_f = {
+                    'url': progressive_base_url + 'webm',
+                    'format_id': m3u8_format['format_id'].replace('hls', 'webm'),
+                    'width': m3u8_format['width'],
+                    'height': m3u8_format['height'],
+                    'tbr': m3u8_format.get('tbr'),
+                    'ext': 'webm',
+                }
+                formats.extend([web_f, mp4_f])
+        else:
+            for bitrate in self._BITRATES:
+                for ext in ('web', 'mp4'):
+                    formats.append({
+                        'format_id': '%s-%s' % (ext, bitrate),
+                        'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
+                        'tbr': bitrate,
+                        'ext': ext,
+                    })
+        self._sort_formats(formats)

         return {
             'id': video_id,
             'title': title,
-            'description': video.get('Description'),
-            'thumbnail': video.get('GeneratedCoverImage') or video.get('GeneratedThumbnail'),
-            'duration': int_or_none(video.get('Length')),
+            'description': self._html_search_meta('description', webpage),
+            'thumbnail': self._html_search_meta('thumbnail', webpage),
+            'duration': parse_duration(self._html_search_meta('duration', webpage)),
             'formats': formats,
-            'subtitles': subtitles,
-            'age_limit': int_or_none(video.get('AgeFrom')),
-            'season': video.get('SeasonTitle'),
-            'season_number': int_or_none(video.get('Season')) or None,
-            'episode_number': int_or_none(video.get('Episode')) or None,
         }
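
When the HLS probe does not yield one format per bitrate, the restored LEGO code falls back to synthesizing progressive URLs, one webm and one mp4 per hard-coded bitrate. The expansion in isolation (base and path are placeholders, not real CDN values):

    BITRATES = [256, 512, 1024, 1536, 2560]
    progressive_base = 'https://cdn.example.com/'
    path = 'aa/bb/item_video_en-US_1'

    formats = [{
        'format_id': '%s-%s' % (ext, bitrate),
        'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
        'tbr': bitrate,
        'ext': ext,
    } for bitrate in BITRATES for ext in ('web', 'mp4')]
    print(len(formats))  # 10 candidate formats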

View file

@@ -18,6 +18,7 @@

 class LimelightBaseIE(InfoExtractor):
     _PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
+    _API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'

     @classmethod
     def _extract_urls(cls, webpage, source_url):
@@ -69,8 +70,7 @@ def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
         try:
             return self._download_json(
                 self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
-                item_id, 'Downloading PlaylistService %s JSON' % method,
-                fatal=fatal, headers=headers)
+                item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
         except ExtractorError as e:
             if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
                 error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission']
@@ -79,22 +79,22 @@ def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
                 raise ExtractorError(error, expected=True)
             raise

-    def _extract(self, item_id, pc_method, mobile_method, referer=None):
+    def _call_api(self, organization_id, item_id, method):
+        return self._download_json(
+            self._API_URL % (organization_id, self._API_PATH, item_id, method),
+            item_id, 'Downloading API %s JSON' % method)
+
+    def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
         pc = self._call_playlist_service(item_id, pc_method, referer=referer)
-        mobile = self._call_playlist_service(
-            item_id, mobile_method, fatal=False, referer=referer)
-        return pc, mobile
+        metadata = self._call_api(pc['orgId'], item_id, meta_method)
+        mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
+        return pc, mobile, metadata

-    def _extract_info(self, pc, mobile, i, referer):
-        get_item = lambda x, y: try_get(x, lambda x: x[y][i], dict) or {}
-        pc_item = get_item(pc, 'playlistItems')
-        mobile_item = get_item(mobile, 'mediaList')
-        video_id = pc_item.get('mediaId') or mobile_item['mediaId']
-        title = pc_item.get('title') or mobile_item['title']
-
+    def _extract_info(self, streams, mobile_urls, properties):
+        video_id = properties['media_id']
         formats = []
         urls = []
-        for stream in pc_item.get('streams', []):
+        for stream in streams:
             stream_url = stream.get('url')
             if not stream_url or stream.get('drmProtected') or stream_url in urls:
                 continue
@@ -155,7 +155,7 @@ def _extract_info(self, streams, mobile_urls, properties):
             })
             formats.append(fmt)

-        for mobile_url in mobile_item.get('mobileUrls', []):
+        for mobile_url in mobile_urls:
             media_url = mobile_url.get('mobileUrl')
             format_id = mobile_url.get('targetMediaPlatform')
             if not media_url or format_id in ('Widevine', 'SmoothStreaming') or media_url in urls:
@@ -179,34 +179,54 @@ def _extract_info(self, streams, mobile_urls, properties):
         self._sort_formats(formats)

-        subtitles = {}
-        for flag in mobile_item.get('flags'):
-            if flag == 'ClosedCaptions':
-                closed_captions = self._call_playlist_service(
-                    video_id, 'getClosedCaptionsDetailsByMediaId',
-                    False, referer) or []
-                for cc in closed_captions:
-                    cc_url = cc.get('webvttFileUrl')
-                    if not cc_url:
-                        continue
-                    lang = cc.get('languageCode') or self._search_regex(r'/[a-z]{2}\.vtt', cc_url, 'lang', default='en')
-                    subtitles.setdefault(lang, []).append({
-                        'url': cc_url,
-                    })
-                break
-
-        get_meta = lambda x: pc_item.get(x) or mobile_item.get(x)
+        title = properties['title']
+        description = properties.get('description')
+        timestamp = int_or_none(properties.get('publish_date') or properties.get('create_date'))
+        duration = float_or_none(properties.get('duration_in_milliseconds'), 1000)
+        filesize = int_or_none(properties.get('total_storage_in_bytes'))
+        categories = [properties.get('category')]
+        tags = properties.get('tags', [])
+        thumbnails = [{
+            'url': thumbnail['url'],
+            'width': int_or_none(thumbnail.get('width')),
+            'height': int_or_none(thumbnail.get('height')),
+        } for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
+
+        subtitles = {}
+        for caption in properties.get('captions', []):
+            lang = caption.get('language_code')
+            subtitles_url = caption.get('url')
+            if lang and subtitles_url:
+                subtitles.setdefault(lang, []).append({
+                    'url': subtitles_url,
+                })
+        closed_captions_url = properties.get('closed_captions_url')
+        if closed_captions_url:
+            subtitles.setdefault('en', []).append({
+                'url': closed_captions_url,
+                'ext': 'ttml',
+            })

         return {
             'id': video_id,
             'title': title,
-            'description': get_meta('description'),
+            'description': description,
             'formats': formats,
-            'duration': float_or_none(get_meta('durationInMilliseconds'), 1000),
-            'thumbnail': get_meta('previewImageUrl') or get_meta('thumbnailImageUrl'),
+            'timestamp': timestamp,
+            'duration': duration,
+            'filesize': filesize,
+            'categories': categories,
+            'tags': tags,
+            'thumbnails': thumbnails,
             'subtitles': subtitles,
         }

+    def _extract_info_helper(self, pc, mobile, i, metadata):
+        return self._extract_info(
+            try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
+            try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
+            metadata)
+

 class LimelightMediaIE(LimelightBaseIE):
     IE_NAME = 'limelight'
@@ -231,6 +251,8 @@ class LimelightMediaIE(LimelightBaseIE):
             'description': 'md5:8005b944181778e313d95c1237ddb640',
             'thumbnail': r're:^https?://.*\.jpeg$',
             'duration': 144.23,
+            'timestamp': 1244136834,
+            'upload_date': '20090604',
         },
         'params': {
             # m3u8 download
@@ -246,29 +268,30 @@ class LimelightMediaIE(LimelightBaseIE):
             'title': '3Play Media Overview Video',
             'thumbnail': r're:^https?://.*\.jpeg$',
             'duration': 78.101,
-            # TODO: extract all languages that were accessible via API
-            # 'subtitles': 'mincount:9',
-            'subtitles': 'mincount:1',
+            'timestamp': 1338929955,
+            'upload_date': '20120605',
+            'subtitles': 'mincount:9',
         },
     }, {
         'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
         'only_matching': True,
     }]
     _PLAYLIST_SERVICE_PATH = 'media'
+    _API_PATH = 'media'

     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
         video_id = self._match_id(url)
-        source_url = smuggled_data.get('source_url')
         self._initialize_geo_bypass({
             'countries': smuggled_data.get('geo_countries'),
         })

-        pc, mobile = self._extract(
+        pc, mobile, metadata = self._extract(
             video_id, 'getPlaylistByMediaId',
-            'getMobilePlaylistByMediaId', source_url)
+            'getMobilePlaylistByMediaId', 'properties',
+            smuggled_data.get('source_url'))

-        return self._extract_info(pc, mobile, 0, source_url)
+        return self._extract_info_helper(pc, mobile, 0, metadata)


 class LimelightChannelIE(LimelightBaseIE):
@@ -290,7 +313,6 @@ class LimelightChannelIE(LimelightBaseIE):
         'info_dict': {
             'id': 'ab6a524c379342f9b23642917020c082',
             'title': 'Javascript Sample Code',
-            'description': 'Javascript Sample Code - http://www.delvenetworks.com/sample-code/playerCode-demo.html',
         },
         'playlist_mincount': 3,
     }, {
@@ -298,23 +320,22 @@ class LimelightChannelIE(LimelightBaseIE):
         'only_matching': True,
     }]
     _PLAYLIST_SERVICE_PATH = 'channel'
+    _API_PATH = 'channels'

     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
         channel_id = self._match_id(url)
-        source_url = smuggled_data.get('source_url')

-        pc, mobile = self._extract(
+        pc, mobile, medias = self._extract(
             channel_id, 'getPlaylistByChannelId',
             'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
-            source_url)
+            'media', smuggled_data.get('source_url'))

         entries = [
-            self._extract_info(pc, mobile, i, source_url)
-            for i in range(len(pc['playlistItems']))]
+            self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
+            for i in range(len(medias['media_list']))]

-        return self.playlist_result(
-            entries, channel_id, pc.get('title'), mobile.get('description'))
+        return self.playlist_result(entries, channel_id, pc['title'])


 class LimelightChannelListIE(LimelightBaseIE):
@@ -347,12 +368,10 @@ class LimelightChannelListIE(LimelightBaseIE):
     def _real_extract(self, url):
         channel_list_id = self._match_id(url)

-        channel_list = self._call_playlist_service(
-            channel_list_id, 'getMobileChannelListById')
+        channel_list = self._call_playlist_service(channel_list_id, 'getMobileChannelListById')

         entries = [
             self.url_result('limelight:channel:%s' % channel['id'], 'LimelightChannel')
             for channel in channel_list['channelList']]

-        return self.playlist_result(
-            entries, channel_list_id, channel_list['title'])
+        return self.playlist_result(entries, channel_list_id, channel_list['title'])
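
Both sides of the Limelight diff funnel through the same PlaylistService URL template; only what happens to the response differs. The templating step on its own (the item id below is a dummy hex string, not a live media id):

    PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'

    def playlist_service_url(path, item_id, method):
        return PLAYLIST_SERVICE_URL % (path, item_id, method)

    print(playlist_service_url('media', 'deadbeefdeadbeefdeadbeefdeadbeef', 'getPlaylistByMediaId'))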

View file

@@ -8,6 +8,7 @@
 from ..compat import (
     compat_b64decode,
     compat_HTTPError,
+    compat_str,
 )
 from ..utils import (
     ExtractorError,
@@ -98,7 +99,7 @@ def random_string():
             'sso': 'true',
         })

-        login_state_url = urlh.geturl()
+        login_state_url = compat_str(urlh.geturl())

         try:
             login_page = self._download_webpage(
@@ -128,7 +129,7 @@ def random_string():
         })

         access_token = self._search_regex(
-            r'access_token=([^=&]+)', urlh.geturl(),
+            r'access_token=([^=&]+)', compat_str(urlh.geturl()),
             'access token')

         self._download_webpage(

View file

@@ -20,10 +20,10 @@ class MailRuIE(InfoExtractor):
     IE_DESC = 'Видео@Mail.Ru'
     _VALID_URL = r'''(?x)
                     https?://
-                        (?:(?:www|m)\.)?my\.mail\.ru/+
+                        (?:(?:www|m)\.)?my\.mail\.ru/
                         (?:
                             video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
-                            (?:(?P<idv2prefix>(?:[^/]+/+){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
+                            (?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
                             (?:video/embed|\+/video/meta)/(?P<metaid>\d+)
                         )
                     '''
@@ -85,14 +85,6 @@ class MailRuIE(InfoExtractor):
         {
             'url': 'http://my.mail.ru/+/video/meta/7949340477499637815',
             'only_matching': True,
-        },
-        {
-            'url': 'https://my.mail.ru//list/sinyutin10/video/_myvideo/4.html',
-            'only_matching': True,
-        },
-        {
-            'url': 'https://my.mail.ru//list//sinyutin10/video/_myvideo/4.html',
-            'only_matching': True,
         }
     ]

@@ -128,12 +120,6 @@ def _real_extract(self, url):
                 'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
                 video_id, 'Downloading video JSON')

-        headers = {}
-
-        video_key = self._get_cookies('https://my.mail.ru').get('video_key')
-        if video_key:
-            headers['Cookie'] = 'video_key=%s' % video_key.value
-
         formats = []
         for f in video_data['videos']:
             video_url = f.get('url')
@@ -146,7 +132,6 @@ def _real_extract(self, url):
                 'url': video_url,
                 'format_id': format_id,
                 'height': height,
-                'http_headers': headers,
             })
         self._sort_formats(formats)

@@ -252,7 +237,7 @@ def _extract_track(t, fatal=True):
 class MailRuMusicIE(MailRuMusicSearchBaseIE):
     IE_NAME = 'mailru:music'
     IE_DESC = 'Музыка@Mail.Ru'
-    _VALID_URL = r'https?://my\.mail\.ru/+music/+songs/+[^/?#&]+-(?P<id>[\da-f]+)'
+    _VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)'
     _TESTS = [{
         'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
         'md5': '0f8c22ef8c5d665b13ac709e63025610',
@@ -288,7 +273,7 @@ def _real_extract(self, url):
 class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
     IE_NAME = 'mailru:music:search'
     IE_DESC = 'Музыка@Mail.Ru'
-    _VALID_URL = r'https?://my\.mail\.ru/+music/+search/+(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://my.mail.ru/music/search/black%20shadow',
         'info_dict': {

View file

@@ -8,7 +8,7 @@

 class MallTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:www|sk)\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://www.mall.tv/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
         'md5': '1c4a37f080e1f3023103a7b43458e518',
@@ -26,9 +26,6 @@ class MallTVIE(InfoExtractor):
     }, {
         'url': 'https://www.mall.tv/kdo-to-plati/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
         'only_matching': True,
-    }, {
-        'url': 'https://sk.mall.tv/gejmhaus/reklamacia-nehreje-vyrobnik-tepla-alebo-spekacka',
-        'only_matching': True,
     }]

     def _real_extract(self, url):

View file

@@ -6,6 +6,7 @@
 from .theplatform import ThePlatformBaseIE
 from ..compat import (
     compat_parse_qs,
+    compat_str,
     compat_urllib_parse_urlparse,
 )
 from ..utils import (
@@ -113,7 +114,7 @@ def _program_guid(qs):
                     continue
                 urlh = ie._request_webpage(
                     embed_url, video_id, note='Following embed URL redirect')
-                embed_url = urlh.geturl()
+                embed_url = compat_str(urlh.geturl())
                 program_guid = _program_guid(_qs(embed_url))
                 if program_guid:
                     entries.append(embed_url)
@@ -122,7 +123,7 @@ def _program_guid(qs):
     def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
         for video in smil.findall(self._xpath_ns('.//video', namespace)):
             video.attrib['src'] = re.sub(r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+', r'\1\2', video.attrib['src'])
-        return super(MediasetIE, self)._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
+        return super()._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)

     def _real_extract(self, url):
         guid = self._match_id(url)

View file

@@ -129,7 +129,7 @@ def _real_extract(self, url):
         query = mobj.group('query')

         webpage, urlh = self._download_webpage_handle(url, resource_id)  # XXX: add UrlReferrer?
-        redirect_url = urlh.geturl()
+        redirect_url = compat_str(urlh.geturl())

         # XXX: might have also extracted UrlReferrer and QueryString from the html
         service_path = compat_urlparse.urljoin(redirect_url, self._html_search_regex(

View file

@@ -4,8 +4,8 @@
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
-    parse_iso8601,
     smuggle_url,
+    parse_duration,
 )
@@ -18,18 +18,16 @@ class MiTeleIE(InfoExtractor):
         'info_dict': {
             'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
             'ext': 'mp4',
-            'title': 'Diario de La redacción Programa 144',
-            'description': 'md5:07c35a7b11abb05876a6a79185b58d27',
+            'title': 'Tor, la web invisible',
+            'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
             'series': 'Diario de',
-            'season': 'Season 14',
+            'season': 'La redacción',
             'season_number': 14,
-            'episode': 'Tor, la web invisible',
+            'season_id': 'diario_de_t14_11981',
+            'episode': 'Programa 144',
             'episode_number': 3,
             'thumbnail': r're:(?i)^https?://.*\.jpg$',
             'duration': 2913,
-            'age_limit': 16,
-            'timestamp': 1471209401,
-            'upload_date': '20160814',
         },
         'add_ie': ['Ooyala'],
     }, {
@@ -41,15 +39,13 @@ class MiTeleIE(InfoExtractor):
             'title': 'Cuarto Milenio Temporada 6 Programa 226',
             'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
             'series': 'Cuarto Milenio',
-            'season': 'Season 6',
+            'season': 'Temporada 6',
             'season_number': 6,
-            'episode': 'Episode 24',
+            'season_id': 'cuarto_milenio_t06_12715',
+            'episode': 'Programa 226',
             'episode_number': 24,
             'thumbnail': r're:(?i)^https?://.*\.jpg$',
             'duration': 7313,
-            'age_limit': 12,
-            'timestamp': 1471209021,
-            'upload_date': '20160814',
         },
         'params': {
             'skip_download': True,
@@ -58,36 +54,67 @@ class MiTeleIE(InfoExtractor):
     }, {
         'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
         'only_matching': True,
-    }, {
-        'url': 'https://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144-40_1006364575251/player/',
-        'only_matching': True,
     }]

     def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        pre_player = self._parse_json(self._search_regex(
-            r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
-            webpage, 'Pre Player'), display_id)['prePlayer']
-        title = pre_player['title']
-        video = pre_player['video']
-        video_id = video['dataMediaId']
-        content = pre_player.get('content') or {}
-        info = content.get('info') or {}
+        video_id = self._match_id(url)
+
+        paths = self._download_json(
+            'https://www.mitele.es/amd/agp/web/metadata/general_configuration',
+            video_id, 'Downloading paths JSON')
+
+        ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
+        base_url = ooyala_s.get('base_url', 'cdn-search-mediaset.carbyne.ps.ooyala.com')
+        full_path = ooyala_s.get('full_path', '/search/v1/full/providers/')
+        source = self._download_json(
+            '%s://%s%s%s/docs/%s' % (
+                ooyala_s.get('protocol', 'https'), base_url, full_path,
+                ooyala_s.get('provider_id', '104951'), video_id),
+            video_id, 'Downloading data JSON', query={
+                'include_titles': 'Series,Season',
+                'product_name': ooyala_s.get('product_name', 'test'),
+                'format': 'full',
+            })['hits']['hits'][0]['_source']
+        embedCode = source['offers'][0]['embed_codes'][0]
+        titles = source['localizable_titles'][0]
+        title = titles.get('title_medium') or titles['title_long']
+        description = titles.get('summary_long') or titles.get('summary_medium')
+
+        def get(key1, key2):
+            value1 = source.get(key1)
+            if not value1 or not isinstance(value1, list):
+                return
+            if not isinstance(value1[0], dict):
+                return
+            return value1[0].get(key2)
+
+        series = get('localizable_titles_series', 'title_medium')
+
+        season = get('localizable_titles_season', 'title_medium')
+        season_number = int_or_none(source.get('season_number'))
+        season_id = source.get('season_id')
+
+        episode = titles.get('title_sort_name')
+        episode_number = int_or_none(source.get('episode_number'))
+
+        duration = parse_duration(get('videos', 'duration'))

         return {
             '_type': 'url_transparent',
             # for some reason only HLS is supported
-            'url': smuggle_url('ooyala:' + video_id, {'supportedformats': 'm3u8,dash'}),
+            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}),
             'id': video_id,
             'title': title,
-            'description': info.get('synopsis'),
-            'series': content.get('title'),
-            'season_number': int_or_none(info.get('season_number')),
-            'episode': content.get('subtitle'),
-            'episode_number': int_or_none(info.get('episode_number')),
-            'duration': int_or_none(info.get('duration')),
-            'thumbnail': video.get('dataPoster'),
-            'age_limit': int_or_none(info.get('rating')),
-            'timestamp': parse_iso8601(pre_player.get('publishedTime')),
+            'description': description,
+            'series': series,
+            'season': season,
+            'season_number': season_number,
+            'season_id': season_id,
+            'episode': episode,
+            'episode_number': episode_number,
+            'duration': duration,
+            'thumbnail': get('images', 'url'),
         }
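
The restored extractor's get(key1, key2) helper above is a defensive "field of the first list element" lookup. A standalone sketch of the same pattern, using hypothetical data rather than a live Ooyala response:

    def first_item_field(container, key1, key2):
        # Return container[key1][0][key2], tolerating missing keys,
        # non-list values and non-dict elements.
        value = container.get(key1)
        if not isinstance(value, list) or not value:
            return None
        if not isinstance(value[0], dict):
            return None
        return value[0].get(key2)

    source = {'videos': [{'duration': '00:48:33'}]}
    print(first_item_field(source, 'videos', 'duration'))  # -> '00:48:33'
    print(first_item_field(source, 'images', 'url'))       # -> None, no crash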

View file

@@ -1,8 +1,5 @@
 from __future__ import unicode_literals

-import re
-
-from .common import InfoExtractor
 from ..utils import (
     int_or_none,
     str_to_int,
@@ -57,23 +54,3 @@ def _real_extract(self, url):
         })
         return info
-
-
-class MofosexEmbedIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=(?P<id>\d+)'
-    _TESTS = [{
-        'url': 'https://www.mofosex.com/embed/?videoid=318131&referrer=KM',
-        'only_matching': True,
-    }]
-
-    @staticmethod
-    def _extract_urls(webpage):
-        return re.findall(
-            r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
-            webpage)
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        return self.url_result(
-            'http://www.mofosex.com/videos/{0}/{0}.html'.format(video_id),
-            ie=MofosexIE.ie_key(), video_id=video_id)
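
The deleted _extract_urls boils down to a plain re.findall scan for embed iframes. A self-contained sketch with a made-up page snippet:

    import re

    webpage = '<iframe src="https://www.mofosex.com/embed/?videoid=318131"></iframe>'
    urls = re.findall(
        r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
        webpage)
    print(urls)  # -> ['https://www.mofosex.com/embed/?videoid=318131']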

View file

@@ -26,7 +26,7 @@ class MotherlessIE(InfoExtractor):
             'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
             'upload_date': '20100913',
             'uploader_id': 'famouslyfuckedup',
-            'thumbnail': r're:https?://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
             'age_limit': 18,
         }
     }, {
@@ -40,7 +40,7 @@ class MotherlessIE(InfoExtractor):
                 'game', 'hairy'],
             'upload_date': '20140622',
             'uploader_id': 'Sulivana7x',
-            'thumbnail': r're:https?://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
             'age_limit': 18,
         },
         'skip': '404',
@@ -54,7 +54,7 @@ class MotherlessIE(InfoExtractor):
             'categories': ['superheroine heroine superher'],
             'upload_date': '20140827',
             'uploader_id': 'shade0230',
-            'thumbnail': r're:https?://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
             'age_limit': 18,
         }
     }, {
@@ -76,8 +76,7 @@ def _real_extract(self, url):
             raise ExtractorError('Video %s is for friends only' % video_id, expected=True)

         title = self._html_search_regex(
-            (r'(?s)<div[^>]+\bclass=["\']media-meta-title[^>]+>(.+?)</div>',
-             r'id="view-upload-title">\s+([^<]+)<'), webpage, 'title')
+            r'id="view-upload-title">\s+([^<]+)<', webpage, 'title')
         video_url = (self._html_search_regex(
             (r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
              r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
@@ -85,15 +84,14 @@ def _real_extract(self, url):
             or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id)
         age_limit = self._rta_search(webpage)
         view_count = str_to_int(self._html_search_regex(
-            (r'>(\d+)\s+Views<', r'<strong>Views</strong>\s+([^<]+)<'),
+            r'<strong>Views</strong>\s+([^<]+)<',
             webpage, 'view count', fatal=False))
         like_count = str_to_int(self._html_search_regex(
-            (r'>(\d+)\s+Favorites<', r'<strong>Favorited</strong>\s+([^<]+)<'),
+            r'<strong>Favorited</strong>\s+([^<]+)<',
            webpage, 'like count', fatal=False))
         upload_date = self._html_search_regex(
-            (r'class=["\']count[^>]+>(\d+\s+[a-zA-Z]{3}\s+\d{4})<',
-             r'<strong>Uploaded</strong>\s+([^<]+)<'), webpage, 'upload date')
+            r'<strong>Uploaded</strong>\s+([^<]+)<', webpage, 'upload date')
         if 'Ago' in upload_date:
             days = int(re.search(r'([0-9]+)', upload_date).group(1))
             upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')
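
The 'Ago' branch kept by both sides of this hunk turns a relative age into a YYYYMMDD date. A runnable sketch of that fallback:

    import datetime
    import re

    def relative_upload_date(upload_date):
        # Absolute dates pass through; strings like '12 days Ago' are
        # resolved against today's date, as in the extractor above.
        if 'Ago' not in upload_date:
            return upload_date
        days = int(re.search(r'([0-9]+)', upload_date).group(1))
        return (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')

    print(relative_upload_date('3 days Ago'))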

View file

@@ -1,33 +1,68 @@
 # coding: utf-8
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor
 from ..utils import (
-    clean_html,
-    dict_get,
     ExtractorError,
     int_or_none,
-    parse_duration,
-    try_get,
     update_url_query,
 )


-class NaverBaseIE(InfoExtractor):
-    _CAPTION_EXT_RE = r'\.(?:ttml|vtt)'
-
-    def _extract_video_info(self, video_id, vid, key):
+class NaverIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'
+
+    _TESTS = [{
+        'url': 'http://tv.naver.com/v/81652',
+        'info_dict': {
+            'id': '81652',
+            'ext': 'mp4',
+            'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
+            'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
+            'upload_date': '20130903',
+        },
+    }, {
+        'url': 'http://tv.naver.com/v/395837',
+        'md5': '638ed4c12012c458fefcddfd01f173cd',
+        'info_dict': {
+            'id': '395837',
+            'ext': 'mp4',
+            'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
+            'description': 'md5:5bf200dcbf4b66eb1b350d1eb9c753f7',
+            'upload_date': '20150519',
+        },
+        'skip': 'Georestricted',
+    }, {
+        'url': 'http://tvcast.naver.com/v/81652',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        vid = self._search_regex(
+            r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
+            'video id', fatal=None, group='value')
+        in_key = self._search_regex(
+            r'inKey["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
+            'key', default=None, group='value')
+
+        if not vid or not in_key:
+            error = self._html_search_regex(
+                r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
+                webpage, 'error', default=None)
+            if error:
+                raise ExtractorError(error, expected=True)
+            raise ExtractorError('couldn\'t extract vid and key')
         video_data = self._download_json(
             'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid,
             video_id, query={
-                'key': key,
+                'key': in_key,
             })
         meta = video_data['meta']
         title = meta['subject']
         formats = []
-        get_list = lambda x: try_get(video_data, lambda y: y[x + 's']['list'], list) or []

         def extract_formats(streams, stream_type, query={}):
             for stream in streams:
@@ -38,7 +73,7 @@ def extract_formats(streams, stream_type, query={}):
                 encoding_option = stream.get('encodingOption', {})
                 bitrate = stream.get('bitrate', {})
                 formats.append({
-                    'format_id': '%s_%s' % (stream.get('type') or stream_type, dict_get(encoding_option, ('name', 'id'))),
+                    'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
                     'url': stream_url,
                     'width': int_or_none(encoding_option.get('width')),
                     'height': int_or_none(encoding_option.get('height')),
@@ -48,7 +83,7 @@ def extract_formats(streams, stream_type, query={}):
                     'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
                 })

-        extract_formats(get_list('video'), 'H264')
+        extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
         for stream_set in video_data.get('streams', []):
             query = {}
             for param in stream_set.get('keys', []):
@@ -66,101 +101,28 @@ def extract_formats(streams, stream_type, query={}):
                 'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
         self._sort_formats(formats)

-        replace_ext = lambda x, y: re.sub(self._CAPTION_EXT_RE, '.' + y, x)
-
-        def get_subs(caption_url):
-            if re.search(self._CAPTION_EXT_RE, caption_url):
-                return [{
-                    'url': replace_ext(caption_url, 'ttml'),
-                }, {
-                    'url': replace_ext(caption_url, 'vtt'),
-                }]
-            else:
-                return [{'url': caption_url}]
-
-        automatic_captions = {}
         subtitles = {}
-        for caption in get_list('caption'):
+        for caption in video_data.get('captions', {}).get('list', []):
             caption_url = caption.get('source')
             if not caption_url:
                 continue
-            sub_dict = automatic_captions if caption.get('type') == 'auto' else subtitles
-            sub_dict.setdefault(dict_get(caption, ('locale', 'language')), []).extend(get_subs(caption_url))
+            subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
+                'url': caption_url,
+            })

-        user = meta.get('user', {})
+        upload_date = self._search_regex(
+            r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
+            webpage, 'upload date', fatal=False)
+        if upload_date:
+            upload_date = upload_date.replace('.', '')

         return {
             'id': video_id,
             'title': title,
             'formats': formats,
             'subtitles': subtitles,
-            'automatic_captions': automatic_captions,
-            'thumbnail': try_get(meta, lambda x: x['cover']['source']),
+            'description': self._og_search_description(webpage),
+            'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
             'view_count': int_or_none(meta.get('count')),
-            'uploader_id': user.get('id'),
-            'uploader': user.get('name'),
-            'uploader_url': user.get('url'),
+            'upload_date': upload_date,
         }
-
-
-class NaverIE(NaverBaseIE):
-    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/(?:v|embed)/(?P<id>\d+)'
-    _GEO_BYPASS = False
-    _TESTS = [{
-        'url': 'http://tv.naver.com/v/81652',
-        'info_dict': {
-            'id': '81652',
-            'ext': 'mp4',
-            'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
-            'description': '메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
-            'timestamp': 1378200754,
-            'upload_date': '20130903',
-            'uploader': '메가스터디, 합격불변의 법칙',
-            'uploader_id': 'megastudy',
-        },
-    }, {
-        'url': 'http://tv.naver.com/v/395837',
-        'md5': '8a38e35354d26a17f73f4e90094febd3',
-        'info_dict': {
-            'id': '395837',
-            'ext': 'mp4',
-            'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
-            'description': 'md5:eb6aca9d457b922e43860a2a2b1984d3',
-            'timestamp': 1432030253,
-            'upload_date': '20150519',
-            'uploader': '4가지쇼 시즌2',
-            'uploader_id': 'wrappinguser29',
-        },
-        'skip': 'Georestricted',
-    }, {
-        'url': 'http://tvcast.naver.com/v/81652',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        content = self._download_json(
-            'https://tv.naver.com/api/json/v/' + video_id,
-            video_id, headers=self.geo_verification_headers())
-        player_info_json = content.get('playerInfoJson') or {}
-        current_clip = player_info_json.get('currentClip') or {}
-
-        vid = current_clip.get('videoId')
-        in_key = current_clip.get('inKey')
-
-        if not vid or not in_key:
-            player_auth = try_get(player_info_json, lambda x: x['playerOption']['auth'])
-            if player_auth == 'notCountry':
-                self.raise_geo_restricted(countries=['KR'])
-            elif player_auth == 'notLogin':
-                self.raise_login_required()
-            raise ExtractorError('couldn\'t extract vid and key')
-        info = self._extract_video_info(video_id, vid, in_key)
-        info.update({
-            'description': clean_html(current_clip.get('description')),
-            'timestamp': int_or_none(current_clip.get('firstExposureTime'), 1000),
-            'duration': parse_duration(current_clip.get('displayPlayTime')),
-            'like_count': int_or_none(current_clip.get('recommendPoint')),
-            'age_limit': 19 if current_clip.get('adult') else None,
-        })
-        return info
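
The removed format_id line relies on dict_get from youtube_dl.utils. Roughly, in a simplified sketch reproduced from memory (the real utility also takes a skip_false_values flag):

    def dict_get(d, keys, default=None):
        # Return the first non-empty value among several candidate keys.
        for key in keys:
            value = d.get(key)
            if value:
                return value
        return default

    encoding_option = {'name': '720P', 'id': None}
    print(dict_get(encoding_option, ('name', 'id')))  # -> '720P'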

View file

@@ -87,25 +87,11 @@ class NBCIE(AdobePassIE):
     def _real_extract(self, url):
         permalink, video_id = re.match(self._VALID_URL, url).groups()
         permalink = 'http' + compat_urllib_parse_unquote(permalink)
-        video_data = self._download_json(
+        response = self._download_json(
             'https://friendship.nbc.co/v2/graphql', video_id, query={
-                'query': '''query bonanzaPage(
-  $app: NBCUBrands! = nbc
-  $name: String!
-  $oneApp: Boolean
-  $platform: SupportedPlatforms! = web
-  $type: EntityPageType! = VIDEO
-  $userId: String!
-) {
-  bonanzaPage(
-    app: $app
-    name: $name
-    oneApp: $oneApp
-    platform: $platform
-    type: $type
-    userId: $userId
-  ) {
-    metadata {
+                'query': '''{
+  page(name: "%s", platform: web, type: VIDEO, userId: "0") {
+    data {
       ... on VideoPageData {
         description
         episodeNumber
@@ -114,20 +100,15 @@ def _real_extract(self, url):
         mpxAccountId
         mpxGuid
         rating
-        resourceId
         seasonNumber
         secondaryTitle
         seriesShortTitle
       }
     }
   }
-}''',
-                'variables': json.dumps({
-                    'name': permalink,
-                    'oneApp': True,
-                    'userId': '0',
-                }),
-            })['data']['bonanzaPage']['metadata']
+}''' % permalink,
+            })
+        video_data = response['data']['page']['data']
         query = {
             'mbr': 'true',
             'manifest': 'm3u',
@@ -136,8 +117,8 @@ def _real_extract(self, url):
         title = video_data['secondaryTitle']
         if video_data.get('locked'):
             resource = self._get_mvpd_resource(
-                video_data.get('resourceId') or 'nbcentertainment',
-                title, video_id, video_data.get('rating'))
+                'nbcentertainment', title, video_id,
+                video_data.get('rating'))
             query['auth'] = self._extract_mvpd_auth(
                 url, video_id, 'nbcentertainment', resource)
         theplatform_url = smuggle_url(update_url_query(
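
The two sides of this file differ mainly in how the GraphQL request is built: the removed code sends a parameterized query with a separate variables payload, while the restored code interpolates values straight into the query text. A hypothetical side-by-side, with the query bodies elided:

    import json

    permalink = '/nbc/some-show/video/12345'  # illustrative value

    # Removed side: parameterized query, variables serialized separately.
    params_new = {
        'query': 'query bonanzaPage($name: String!, $userId: String!) { ... }',
        'variables': json.dumps({'name': permalink, 'oneApp': True, 'userId': '0'}),
    }

    # Restored side: values interpolated directly into the query string.
    params_old = {
        'query': '{ page(name: "%s", platform: web, type: VIDEO, userId: "0") { ... } }' % permalink,
    }

    print(params_new['variables'])
    print(params_old['query'])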

View file

@@ -7,11 +7,8 @@
 from ..utils import (
     determine_ext,
     int_or_none,
-    merge_dicts,
     parse_iso8601,
     qualities,
-    try_get,
-    urljoin,
 )
@@ -88,25 +85,21 @@ class NDRIE(NDRBaseIE):
     def _extract_embed(self, webpage, display_id):
         embed_url = self._html_search_meta(
-            'embedURL', webpage, 'embed URL',
-            default=None) or self._search_regex(
-            r'\bembedUrl["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
-            'embed URL', group='url')
+            'embedURL', webpage, 'embed URL', fatal=True)
         description = self._search_regex(
             r'<p[^>]+itemprop="description">([^<]+)</p>',
             webpage, 'description', default=None) or self._og_search_description(webpage)
         timestamp = parse_iso8601(
             self._search_regex(
                 r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"',
-                webpage, 'upload date', default=None))
-        info = self._search_json_ld(webpage, display_id, default={})
-        return merge_dicts({
+                webpage, 'upload date', fatal=False))
+        return {
             '_type': 'url_transparent',
             'url': embed_url,
             'display_id': display_id,
             'description': description,
             'timestamp': timestamp,
-        }, info)
+        }


 class NJoyIE(NDRBaseIE):
@@ -227,17 +220,11 @@ def _real_extract(self, url):
         upload_date = ppjson.get('config', {}).get('publicationDate')
         duration = int_or_none(config.get('duration'))

-        thumbnails = []
-        poster = try_get(config, lambda x: x['poster'], dict) or {}
-        for thumbnail_id, thumbnail in poster.items():
-            thumbnail_url = urljoin(url, thumbnail.get('src'))
-            if not thumbnail_url:
-                continue
-            thumbnails.append({
-                'id': thumbnail.get('quality') or thumbnail_id,
-                'url': thumbnail_url,
-                'preference': quality_key(thumbnail.get('quality')),
-            })
+        thumbnails = [{
+            'id': thumbnail.get('quality') or thumbnail_id,
+            'url': thumbnail['src'],
+            'preference': quality_key(thumbnail.get('quality')),
+        } for thumbnail_id, thumbnail in config.get('poster', {}).items() if thumbnail.get('src')]

         return {
             'id': video_id,

View file

@@ -6,7 +6,7 @@

 class NhkVodIE(InfoExtractor):
-    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[^/]+?-\d{8}-\d+)'
+    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[a-z]+-\d{8}-\d+)'
     # Content available only for a limited period of time. Visit
     # https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
     _TESTS = [{
@@ -30,11 +30,8 @@ class NhkVodIE(InfoExtractor):
     }, {
         'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
         'only_matching': True,
-    }, {
-        'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
-        'only_matching': True,
     }]
-    _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/episode/%s/%s/all%s.json'
+    _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7/episode/%s/%s/all%s.json'

     def _real_extract(self, url):
         lang, m_type, episode_id = re.match(self._VALID_URL, url).groups()
@@ -85,9 +82,15 @@ def get_clean_field(key):
             audio = episode['audio']
             audio_path = audio['audio']
             info['formats'] = self._extract_m3u8_formats(
-                'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
-                episode_id, 'm4a', entry_protocol='m3u8_native',
-                m3u8_id='hls', fatal=False)
+                'https://nhks-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
+                episode_id, 'm4a', m3u8_id='hls', fatal=False)
+            for proto in ('rtmpt', 'rtmp'):
+                info['formats'].append({
+                    'ext': 'flv',
+                    'format_id': proto,
+                    'url': '%s://flv.nhk.or.jp/ondemand/mp4:flv%s' % (proto, audio_path),
+                    'vcodec': 'none',
+                })
             for f in info['formats']:
                 f['language'] = lang
             return info

View file

@@ -6,7 +6,6 @@
 from .common import InfoExtractor
 from ..utils import (
     clean_html,
-    determine_ext,
     int_or_none,
     js_to_json,
     qualities,
@@ -19,7 +18,7 @@ class NovaEmbedIE(InfoExtractor):
     _VALID_URL = r'https?://media\.cms\.nova\.cz/embed/(?P<id>[^/?#&]+)'
     _TEST = {
         'url': 'https://media.cms.nova.cz/embed/8o0n0r?autoplay=1',
-        'md5': 'ee009bafcc794541570edd44b71cbea3',
+        'md5': 'b3834f6de5401baabf31ed57456463f7',
         'info_dict': {
             'id': '8o0n0r',
             'ext': 'mp4',
@@ -34,40 +33,6 @@ def _real_extract(self, url):

         webpage = self._download_webpage(url, video_id)

-        duration = None
-        formats = []
-
-        player = self._parse_json(
-            self._search_regex(
-                r'Player\.init\s*\([^,]+,\s*({.+?})\s*,\s*{.+?}\s*\)\s*;',
-                webpage, 'player', default='{}'), video_id, fatal=False)
-        if player:
-            for format_id, format_list in player['tracks'].items():
-                if not isinstance(format_list, list):
-                    format_list = [format_list]
-                for format_dict in format_list:
-                    if not isinstance(format_dict, dict):
-                        continue
-                    format_url = url_or_none(format_dict.get('src'))
-                    format_type = format_dict.get('type')
-                    ext = determine_ext(format_url)
-                    if (format_type == 'application/x-mpegURL'
-                            or format_id == 'HLS' or ext == 'm3u8'):
-                        formats.extend(self._extract_m3u8_formats(
-                            format_url, video_id, 'mp4',
-                            entry_protocol='m3u8_native', m3u8_id='hls',
-                            fatal=False))
-                    elif (format_type == 'application/dash+xml'
-                          or format_id == 'DASH' or ext == 'mpd'):
-                        formats.extend(self._extract_mpd_formats(
-                            format_url, video_id, mpd_id='dash', fatal=False))
-                    else:
-                        formats.append({
-                            'url': format_url,
-                        })
-            duration = int_or_none(player.get('duration'))
-        else:
-            # Old path, not actual as of 08.04.2020
-            bitrates = self._parse_json(
-                self._search_regex(
-                    r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
+        bitrates = self._parse_json(
+            self._search_regex(
+                r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
@@ -76,19 +41,14 @@ def _real_extract(self, url):
         QUALITIES = ('lq', 'mq', 'hq', 'hd')
         quality_key = qualities(QUALITIES)

+        formats = []
         for format_id, format_list in bitrates.items():
             if not isinstance(format_list, list):
-                format_list = [format_list]
+                continue
             for format_url in format_list:
                 format_url = url_or_none(format_url)
                 if not format_url:
                     continue
-                if format_id == 'hls':
-                    formats.extend(self._extract_m3u8_formats(
-                        format_url, video_id, ext='mp4',
-                        entry_protocol='m3u8_native', m3u8_id='hls',
-                        fatal=False))
-                    continue
                 f = {
                     'url': format_url,
                 }
@@ -103,7 +63,6 @@ def _real_extract(self, url):
                     break
             f['format_id'] = f_id
             formats.append(f)
-
         self._sort_formats(formats)

         title = self._og_search_title(
@@ -116,8 +75,7 @@ def _real_extract(self, url):
             r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
             'thumbnail', fatal=False, group='value')
         duration = int_or_none(self._search_regex(
-            r'videoDuration\s*:\s*(\d+)', webpage, 'duration',
-            default=duration))
+            r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))

         return {
             'id': video_id,
@@ -133,7 +91,7 @@ class NovaIE(InfoExtractor):
     _VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
     _TESTS = [{
         'url': 'http://tn.nova.cz/clanek/tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci.html#player_13260',
-        'md5': '249baab7d0104e186e78b0899c7d5f28',
+        'md5': '1dd7b9d5ea27bc361f110cd855a19bd3',
         'info_dict': {
             'id': '1757139',
             'display_id': 'tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci',
@@ -155,8 +113,7 @@ class NovaIE(InfoExtractor):
         'params': {
             # rtmp download
             'skip_download': True,
-        },
-        'skip': 'gone',
+        }
     }, {
         # media.cms.nova.cz embed
         'url': 'https://novaplus.nova.cz/porad/ulice/epizoda/18760-2180-dil',
@@ -171,7 +128,6 @@ class NovaIE(InfoExtractor):
             'skip_download': True,
         },
         'add_ie': [NovaEmbedIE.ie_key()],
-        'skip': 'CHYBA 404: STRÁNKA NENALEZENA',
     }, {
         'url': 'http://sport.tn.nova.cz/clanek/sport/hokej/nhl/zivot-jde-dal-hodnotil-po-vyrazeni-z-playoff-jiri-sekac.html',
         'only_matching': True,
@@ -196,29 +152,14 @@ def _real_extract(self, url):

         webpage = self._download_webpage(url, display_id)

-        description = clean_html(self._og_search_description(webpage, default=None))
-        if site == 'novaplus':
-            upload_date = unified_strdate(self._search_regex(
-                r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
-        elif site == 'fanda':
-            upload_date = unified_strdate(self._search_regex(
-                r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
-        else:
-            upload_date = None
-
         # novaplus
         embed_id = self._search_regex(
             r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)',
             webpage, 'embed url', default=None)
         if embed_id:
-            return {
-                '_type': 'url_transparent',
-                'url': 'https://media.cms.nova.cz/embed/%s' % embed_id,
-                'ie_key': NovaEmbedIE.ie_key(),
-                'id': embed_id,
-                'description': description,
-                'upload_date': upload_date
-            }
+            return self.url_result(
+                'https://media.cms.nova.cz/embed/%s' % embed_id,
+                ie=NovaEmbedIE.ie_key(), video_id=embed_id)

         video_id = self._search_regex(
             [r"(?:media|video_id)\s*:\s*'(\d+)'",
@@ -292,8 +233,18 @@ def _real_extract(self, url):
         self._sort_formats(formats)

         title = mediafile.get('meta', {}).get('title') or self._og_search_title(webpage)
+        description = clean_html(self._og_search_description(webpage, default=None))
         thumbnail = config.get('poster')

+        if site == 'novaplus':
+            upload_date = unified_strdate(self._search_regex(
+                r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
+        elif site == 'fanda':
+            upload_date = unified_strdate(self._search_regex(
+                r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
+        else:
+            upload_date = None
+
         return {
             'id': video_id,
             'display_id': display_id,
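
Both sides of this file rank formats with the qualities() helper. A sketch of what it provides, simplified from youtube_dl.utils: it maps a preference list to sortable integers.

    def qualities(quality_ids):
        # Return a key function: position in the preference list,
        # or -1 for unknown quality ids.
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    quality_key = qualities(('lq', 'mq', 'hq', 'hd'))
    print(quality_key('hd'))   # 3 -- sorts above 'lq' (0)
    print(quality_key('4k'))   # -1 -- unknown id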

View file

@@ -4,7 +4,6 @@
 from ..utils import (
     int_or_none,
     qualities,
-    url_or_none,
 )
@@ -49,10 +48,6 @@ class NprIE(InfoExtractor):
             },
         }],
         'expected_warnings': ['Failed to download m3u8 information'],
-    }, {
-        # multimedia, no formats, stream
-        'url': 'https://www.npr.org/2020/02/14/805476846/laura-stevenson-tiny-desk-concert',
-        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -100,17 +95,6 @@ def _real_extract(self, url):
                         'format_id': format_id,
                         'quality': quality(format_id),
                     })
-            for stream_id, stream_entry in media.get('stream', {}).items():
-                if not isinstance(stream_entry, dict):
-                    continue
-                if stream_id != 'hlsUrl':
-                    continue
-                stream_url = url_or_none(stream_entry.get('$text'))
-                if not stream_url:
-                    continue
-                formats.extend(self._extract_m3u8_formats(
-                    stream_url, stream_id, 'mp4', 'm3u8_native',
-                    m3u8_id='hls', fatal=False))
             self._sort_formats(formats)

             entries.append({

View file

@@ -12,7 +12,6 @@
     ExtractorError,
     int_or_none,
     JSON_LD_RE,
-    js_to_json,
     NO_DEFAULT,
     parse_age_limit,
     parse_duration,
@@ -106,7 +105,6 @@ def video_id_and_title(idx):
         MESSAGES = {
             'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
             'ProgramRightsHasExpired': 'Programmet har gått ut',
-            'NoProgramRights': 'Ikke tilgjengelig',
             'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
         }
         message_type = data.get('messageType', '')
@@ -257,17 +255,6 @@ class NRKTVIE(NRKBaseIE):
     ''' % _EPISODE_RE
     _API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
     _TESTS = [{
-        'url': 'https://tv.nrk.no/program/MDDP12000117',
-        'md5': '8270824df46ec629b66aeaa5796b36fb',
-        'info_dict': {
-            'id': 'MDDP12000117AA',
-            'ext': 'mp4',
-            'title': 'Alarm Trolltunga',
-            'description': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
-            'duration': 2223,
-            'age_limit': 6,
-        },
-    }, {
         'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
         'md5': '9a167e54d04671eb6317a37b7bc8a280',
         'info_dict': {
@@ -279,7 +266,6 @@ class NRKTVIE(NRKBaseIE):
             'series': '20 spørsmål',
             'episode': '23.05.2014',
         },
-        'skip': 'NoProgramRights',
     }, {
         'url': 'https://tv.nrk.no/program/mdfp15000514',
         'info_dict': {
@@ -384,24 +370,7 @@ class NRKTVIE(NRKBaseIE):

 class NRKTVEpisodeIE(InfoExtractor):
     _VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)'
-    _TESTS = [{
-        'url': 'https://tv.nrk.no/serie/hellums-kro/sesong/1/episode/2',
-        'info_dict': {
-            'id': 'MUHH36005220BA',
-            'ext': 'mp4',
-            'title': 'Kro, krig og kjærlighet 2:6',
-            'description': 'md5:b32a7dc0b1ed27c8064f58b97bda4350',
-            'duration': 1563,
-            'series': 'Hellums kro',
-            'season_number': 1,
-            'episode_number': 2,
-            'episode': '2:6',
-            'age_limit': 6,
-        },
-        'params': {
-            'skip_download': True,
-        },
-    }, {
+    _TEST = {
         'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8',
         'info_dict': {
             'id': 'MSUI14000816AA',
@@ -417,8 +386,7 @@ class NRKTVEpisodeIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
-        'skip': 'ProgramRightsHasExpired',
-    }]
+    }

     def _real_extract(self, url):
         display_id = self._match_id(url)
@@ -441,7 +409,7 @@ def _extract_series(self, webpage, display_id, fatal=True):
                 (r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
                  r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
                 webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
-            display_id, fatal=False, transform_source=js_to_json)
+            display_id, fatal=False)
         if not config:
             return
         return try_get(
@@ -511,14 +479,6 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
     _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
     _ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
     _TESTS = [{
-        'url': 'https://tv.nrk.no/serie/blank',
-        'info_dict': {
-            'id': 'blank',
-            'title': 'Blank',
-            'description': 'md5:7664b4e7e77dc6810cd3bca367c25b6e',
-        },
-        'playlist_mincount': 30,
-    }, {
         # new layout, seasons
         'url': 'https://tv.nrk.no/serie/backstage',
         'info_dict': {
@@ -688,7 +648,7 @@ class NRKSkoleIE(InfoExtractor):
     _TESTS = [{
         'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
-        'md5': '18c12c3d071953c3bf8d54ef6b2587b7',
+        'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
         'info_dict': {
             'id': '6021',
             'ext': 'mp4',
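
The dropped transform_source=js_to_json matters when the embedded INITIAL_DATA blob is JavaScript rather than strict JSON. A rough illustration with a deliberately tiny converter (the real youtube_dl.utils helper covers many more cases, so treat this as a sketch):

    import json
    import re

    def js_to_json_sketch(code):
        # Swap single quotes for double quotes, then quote bare object keys.
        code = re.sub(r"'([^']*)'", r'"\1"', code)
        code = re.sub(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:', r'\1"\2":', code)
        return code

    config = "{title: 'Blank', season: 1}"
    print(json.loads(js_to_json_sketch(config)))  # {'title': 'Blank', 'season': 1}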

View file

@@ -69,10 +69,10 @@ def get_file_size(file_size):
                 'width': int_or_none(video.get('width')),
                 'height': int_or_none(video.get('height')),
                 'filesize': get_file_size(video.get('file_size') or video.get('fileSize')),
-                'tbr': int_or_none(video.get('bitrate'), 1000) or None,
+                'tbr': int_or_none(video.get('bitrate'), 1000),
                 'ext': ext,
             })
-        self._sort_formats(formats, ('height', 'width', 'filesize', 'tbr', 'fps', 'format_id'))
+        self._sort_formats(formats)

         thumbnails = []
         for image in video_data.get('images', []):

View file

@@ -6,14 +6,12 @@
 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
-    clean_html,
     determine_ext,
     float_or_none,
     HEADRequest,
     int_or_none,
     orderedSet,
     remove_end,
-    str_or_none,
     strip_jsonp,
     unescapeHTML,
     unified_strdate,
@@ -90,11 +88,8 @@ def _real_extract(self, url):
                     format_id = '-'.join(format_id_list)
                     ext = determine_ext(src)
                     if ext == 'm3u8':
-                        m3u8_formats = self._extract_m3u8_formats(
-                            src, video_id, 'mp4', m3u8_id=format_id, fatal=False)
-                        if any('/geoprotection' in f['url'] for f in m3u8_formats):
-                            self.raise_geo_restricted()
-                        formats.extend(m3u8_formats)
+                        formats.extend(self._extract_m3u8_formats(
+                            src, video_id, 'mp4', m3u8_id=format_id, fatal=False))
                     elif ext == 'f4m':
                         formats.extend(self._extract_f4m_formats(
                             src, video_id, f4m_id=format_id, fatal=False))
@@ -162,53 +157,48 @@ def _real_extract(self, url):
 class ORFRadioIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
+        station = mobj.group('station')
         show_date = mobj.group('date')
         show_id = mobj.group('show')

+        if station == 'fm4':
+            show_id = '4%s' % show_id
+
         data = self._download_json(
-            'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
-            % (self._API_STATION, show_id, show_date), show_id)
+            'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s' % (station, show_id, show_date),
+            show_id
+        )

-        entries = []
-        for info in data['streams']:
-            loop_stream_id = str_or_none(info.get('loopStreamId'))
-            if not loop_stream_id:
-                continue
-            title = str_or_none(data.get('title'))
-            if not title:
-                continue
-            start = int_or_none(info.get('start'), scale=1000)
-            end = int_or_none(info.get('end'), scale=1000)
-            duration = end - start if end and start else None
-            entries.append({
-                'id': loop_stream_id.replace('.mp3', ''),
-                'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
+        def extract_entry_dict(info, title, subtitle):
+            return {
+                'id': info['loopStreamId'].replace('.mp3', ''),
+                'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, info['loopStreamId']),
                 'title': title,
-                'description': clean_html(data.get('subtitle')),
-                'duration': duration,
-                'timestamp': start,
+                'description': subtitle,
+                'duration': (info['end'] - info['start']) / 1000,
+                'timestamp': info['start'] / 1000,
                 'ext': 'mp3',
-                'series': data.get('programTitle'),
-            })
+                'series': data.get('programTitle')
+            }
+
+        entries = [extract_entry_dict(t, data['title'], data['subtitle']) for t in data['streams']]

         return {
             '_type': 'playlist',
             'id': show_id,
-            'title': data.get('title'),
-            'description': clean_html(data.get('subtitle')),
-            'entries': entries,
+            'title': data['title'],
+            'description': data['subtitle'],
+            'entries': entries
         }


 class ORFFM4IE(ORFRadioIE):
     IE_NAME = 'orf:fm4'
     IE_DESC = 'radio FM4'
-    _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
-    _API_STATION = 'fm4'
-    _LOOP_STATION = 'fm4'
+    _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'

     _TEST = {
-        'url': 'http://fm4.orf.at/player/20170107/4CC',
+        'url': 'http://fm4.orf.at/player/20170107/CC',
         'md5': '2b0be47375432a7ef104453432a19212',
         'info_dict': {
             'id': '2017-01-07_2100_tl_54_7DaysSat18_31295',
@@ -219,138 +209,7 @@ class ORFFM4IE(ORFRadioIE):
             'timestamp': 1483819257,
             'upload_date': '20170107',
         },
-        'skip': 'Shows from ORF radios are only available for 7 days.',
-        'only_matching': True,
-    }
-
-
-class ORFNOEIE(ORFRadioIE):
-    IE_NAME = 'orf:noe'
-    IE_DESC = 'Radio Niederösterreich'
-    _VALID_URL = r'https?://(?P<station>noe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'noe'
-    _LOOP_STATION = 'oe2n'
-
-    _TEST = {
-        'url': 'https://noe.orf.at/player/20200423/NGM',
-        'only_matching': True,
-    }
-
-
-class ORFWIEIE(ORFRadioIE):
-    IE_NAME = 'orf:wien'
-    IE_DESC = 'Radio Wien'
-    _VALID_URL = r'https?://(?P<station>wien)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'wie'
-    _LOOP_STATION = 'oe2w'
-
-    _TEST = {
-        'url': 'https://wien.orf.at/player/20200423/WGUM',
-        'only_matching': True,
-    }
-
-
-class ORFBGLIE(ORFRadioIE):
-    IE_NAME = 'orf:burgenland'
-    IE_DESC = 'Radio Burgenland'
-    _VALID_URL = r'https?://(?P<station>burgenland)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'bgl'
-    _LOOP_STATION = 'oe2b'
-
-    _TEST = {
-        'url': 'https://burgenland.orf.at/player/20200423/BGM',
-        'only_matching': True,
-    }
-
-
-class ORFOOEIE(ORFRadioIE):
-    IE_NAME = 'orf:oberoesterreich'
-    IE_DESC = 'Radio Oberösterreich'
-    _VALID_URL = r'https?://(?P<station>ooe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'ooe'
-    _LOOP_STATION = 'oe2o'
-
-    _TEST = {
-        'url': 'https://ooe.orf.at/player/20200423/OGMO',
-        'only_matching': True,
-    }
-
-
-class ORFSTMIE(ORFRadioIE):
-    IE_NAME = 'orf:steiermark'
-    IE_DESC = 'Radio Steiermark'
-    _VALID_URL = r'https?://(?P<station>steiermark)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'stm'
-    _LOOP_STATION = 'oe2st'
-
-    _TEST = {
-        'url': 'https://steiermark.orf.at/player/20200423/STGMS',
-        'only_matching': True,
-    }
-
-
-class ORFKTNIE(ORFRadioIE):
-    IE_NAME = 'orf:kaernten'
-    IE_DESC = 'Radio Kärnten'
-    _VALID_URL = r'https?://(?P<station>kaernten)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'ktn'
-    _LOOP_STATION = 'oe2k'
-
-    _TEST = {
-        'url': 'https://kaernten.orf.at/player/20200423/KGUMO',
-        'only_matching': True,
-    }
-
-
-class ORFSBGIE(ORFRadioIE):
-    IE_NAME = 'orf:salzburg'
-    IE_DESC = 'Radio Salzburg'
-    _VALID_URL = r'https?://(?P<station>salzburg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'sbg'
-    _LOOP_STATION = 'oe2s'
-
-    _TEST = {
-        'url': 'https://salzburg.orf.at/player/20200423/SGUM',
-        'only_matching': True,
-    }
-
-
-class ORFTIRIE(ORFRadioIE):
-    IE_NAME = 'orf:tirol'
-    IE_DESC = 'Radio Tirol'
-    _VALID_URL = r'https?://(?P<station>tirol)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'tir'
-    _LOOP_STATION = 'oe2t'
-
-    _TEST = {
-        'url': 'https://tirol.orf.at/player/20200423/TGUMO',
-        'only_matching': True,
-    }
-
-
-class ORFVBGIE(ORFRadioIE):
-    IE_NAME = 'orf:vorarlberg'
-    IE_DESC = 'Radio Vorarlberg'
-    _VALID_URL = r'https?://(?P<station>vorarlberg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'vbg'
-    _LOOP_STATION = 'oe2v'
-
-    _TEST = {
-        'url': 'https://vorarlberg.orf.at/player/20200423/VGUM',
-        'only_matching': True,
-    }
-
-
-class ORFOE3IE(ORFRadioIE):
-    IE_NAME = 'orf:oe3'
-    IE_DESC = 'Radio Österreich 3'
-    _VALID_URL = r'https?://(?P<station>oe3)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'oe3'
-    _LOOP_STATION = 'oe3'
-
-    _TEST = {
-        'url': 'https://oe3.orf.at/player/20200424/3WEK',
-        'only_matching': True,
+        'skip': 'Shows from ORF radios are only available for 7 days.'
     }
@@ -358,8 +217,6 @@ class ORFOE1IE(ORFRadioIE):
     IE_NAME = 'orf:oe1'
     IE_DESC = 'Radio Österreich 1'
     _VALID_URL = r'https?://(?P<station>oe1)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
-    _API_STATION = 'oe1'
-    _LOOP_STATION = 'oe1'

     _TEST = {
         'url': 'http://oe1.orf.at/player/20170108/456544',
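
All of the deleted station classes share one shape: the base class does the work, parameterized by the _API_STATION and _LOOP_STATION class attributes. A minimal sketch of that pattern, covering only the API URL assembly:

    class RadioBase(object):
        _API_STATION = None
        _LOOP_STATION = None

        def api_url(self, show_id, show_date):
            # Same template the removed _real_extract uses.
            return ('http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
                    % (self._API_STATION, show_id, show_date))

    class RadioFM4(RadioBase):
        _API_STATION = 'fm4'
        _LOOP_STATION = 'fm4'

    print(RadioFM4().api_url('4CC', '20170107'))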

View file

@@ -0,0 +1,99 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    qualities,
+)
+
+
+class PandaTVIE(InfoExtractor):
+    IE_DESC = '熊猫TV'
+    _VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://www.panda.tv/66666',
+        'info_dict': {
+            'id': '66666',
+            'title': 're:.+',
+            'uploader': '刘杀鸡',
+            'ext': 'flv',
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'skip': 'Live stream is offline',
+    }, {
+        'url': 'https://www.panda.tv/66666',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        config = self._download_json(
+            'https://www.panda.tv/api_room_v2?roomid=%s' % video_id, video_id)
+
+        error_code = config.get('errno', 0)
+        if error_code != 0:
+            raise ExtractorError(
+                '%s returned error %s: %s'
+                % (self.IE_NAME, error_code, config['errmsg']),
+                expected=True)
+
+        data = config['data']
+        video_info = data['videoinfo']
+
+        # 2 = live, 3 = offline
+        if video_info.get('status') != '2':
+            raise ExtractorError(
+                'Live stream is offline', expected=True)
+
+        title = data['roominfo']['name']
+        uploader = data.get('hostinfo', {}).get('name')
+
+        room_key = video_info['room_key']
+        stream_addr = video_info.get(
+            'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
+
+        # Reverse engineered from web player swf
+        # (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
+        # writing).
+        plflag0, plflag1 = video_info['plflag'].split('_')
+        plflag0 = int(plflag0) - 1
+        if plflag1 == '21':
+            plflag0 = 10
+            plflag1 = '4'
+        live_panda = 'live_panda' if plflag0 < 1 else ''
+
+        plflag_auth = self._parse_json(video_info['plflag_list'], video_id)
+        sign = plflag_auth['auth']['sign']
+        ts = plflag_auth['auth']['time']
+        rid = plflag_auth['auth']['rid']
+
+        quality_key = qualities(['OD', 'HD', 'SD'])
+        suffix = ['_small', '_mid', '']
+        formats = []
+        for k, v in stream_addr.items():
+            if v != '1':
+                continue
+            quality = quality_key(k)
+            if quality <= 0:
+                continue
+            for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
+                formats.append({
+                    'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s?sign=%s&ts=%s&rid=%s'
+                    % (pl, plflag1, room_key, live_panda, suffix[quality], ext, sign, ts, rid),
+                    'format_id': '%s-%s' % (k, ext),
+                    'quality': quality,
+                    'source_preference': pref,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': self._live_title(title),
+            'uploader': uploader,
+            'formats': formats,
+            'is_live': True,
+        }

View file

@@ -8,7 +8,6 @@
 from ..utils import (
     int_or_none,
     parse_resolution,
-    str_or_none,
     try_get,
     unified_timestamp,
     url_or_none,
@@ -416,7 +415,6 @@ class PeerTubeIE(InfoExtractor):
                             peertube\.cpy\.re
                         )'''
     _UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
-    _API_BASE = 'https://%s/api/v1/videos/%s/%s'
     _VALID_URL = r'''(?x)
                     (?:
                         peertube:(?P<host>[^:]+):|
@@ -425,30 +423,26 @@ class PeerTubeIE(InfoExtractor):
                         (?P<id>%s)
                     ''' % (_INSTANCES_RE, _UUID_RE)
     _TESTS = [{
-        'url': 'https://framatube.org/videos/watch/9c9de5e8-0a1e-484a-b099-e80766180a6d',
-        'md5': '9bed8c0137913e17b86334e5885aacff',
+        'url': 'https://peertube.cpy.re/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
+        'md5': '80f24ff364cc9d333529506a263e7feb',
         'info_dict': {
-            'id': '9c9de5e8-0a1e-484a-b099-e80766180a6d',
+            'id': '2790feb0-8120-4e63-9af3-c943c69f5e6c',
             'ext': 'mp4',
-            'title': 'What is PeerTube?',
-            'description': 'md5:3fefb8dde2b189186ce0719fda6f7b10',
+            'title': 'wow',
+            'description': 'wow such video, so gif',
             'thumbnail': r're:https?://.*\.(?:jpg|png)',
-            'timestamp': 1538391166,
-            'upload_date': '20181001',
-            'uploader': 'Framasoft',
-            'uploader_id': '3',
-            'uploader_url': 'https://framatube.org/accounts/framasoft',
-            'channel': 'Les vidéos de Framasoft',
-            'channel_id': '2',
-            'channel_url': 'https://framatube.org/video-channels/bf54d359-cfad-4935-9d45-9d6be93f63e8',
-            'language': 'en',
-            'license': 'Attribution - Share Alike',
-            'duration': 113,
+            'timestamp': 1519297480,
+            'upload_date': '20180222',
+            'uploader': 'Luclu7',
+            'uploader_id': '7fc42640-efdb-4505-a45d-a15b1a5496f1',
+            'uploder_url': 'https://peertube.nsa.ovh/accounts/luclu7',
+            'license': 'Unknown',
+            'duration': 3,
             'view_count': int,
             'like_count': int,
             'dislike_count': int,
-            'tags': ['framasoft', 'peertube'],
-            'categories': ['Science & Technology'],
+            'tags': list,
+            'categories': list,
         }
     }, {
         'url': 'https://peertube.tamanoir.foucry.net/videos/watch/0b04f13d-1e18-4f1d-814e-4979aa7c9c44',
@@ -490,38 +484,13 @@ def _extract_urls(webpage, source_url):
             entries = [peertube_url]
         return entries

-    def _call_api(self, host, video_id, path, note=None, errnote=None, fatal=True):
-        return self._download_json(
-            self._API_BASE % (host, video_id, path), video_id,
-            note=note, errnote=errnote, fatal=fatal)
-
-    def _get_subtitles(self, host, video_id):
-        captions = self._call_api(
-            host, video_id, 'captions', note='Downloading captions JSON',
-            fatal=False)
-        if not isinstance(captions, dict):
-            return
-        data = captions.get('data')
-        if not isinstance(data, list):
-            return
-        subtitles = {}
-        for e in data:
-            language_id = try_get(e, lambda x: x['language']['id'], compat_str)
-            caption_url = urljoin('https://%s' % host, e.get('captionPath'))
-            if not caption_url:
-                continue
-            subtitles.setdefault(language_id or 'en', []).append({
-                'url': caption_url,
-            })
-        return subtitles
-
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         host = mobj.group('host') or mobj.group('host_2')
         video_id = mobj.group('id')

-        video = self._call_api(
-            host, video_id, '', note='Downloading video JSON')
+        video = self._download_json(
+            'https://%s/api/v1/videos/%s' % (host, video_id), video_id)

         title = video['name']
@@ -544,28 +513,10 @@ def _real_extract(self, url):
             formats.append(f)
         self._sort_formats(formats)

-        full_description = self._call_api(
-            host, video_id, 'description', note='Downloading description JSON',
-            fatal=False)
+        def account_data(field):
+            return try_get(video, lambda x: x['account'][field], compat_str)

-        description = None
-        if isinstance(full_description, dict):
-            description = str_or_none(full_description.get('description'))
-        if not description:
-            description = video.get('description')
-
-        subtitles = self.extract_subtitles(host, video_id)
-
-        def data(section, field, type_):
-            return try_get(video, lambda x: x[section][field], type_)
-
-        def account_data(field, type_):
-            return data('account', field, type_)
-
-        def channel_data(field, type_):
-            return data('channel', field, type_)
-
-        category = data('category', 'label', compat_str)
+        category = try_get(video, lambda x: x['category']['label'], compat_str)
         categories = [category] if category else None

         nsfw = video.get('nsfw')
@@ -577,17 +528,14 @@ def channel_data(field, type_):
         return {
             'id': video_id,
             'title': title,
-            'description': description,
+            'description': video.get('description'),
             'thumbnail': urljoin(url, video.get('thumbnailPath')),
             'timestamp': unified_timestamp(video.get('publishedAt')),
-            'uploader': account_data('displayName', compat_str),
-            'uploader_id': str_or_none(account_data('id', int)),
-            'uploader_url': url_or_none(account_data('url', compat_str)),
-            'channel': channel_data('displayName', compat_str),
-            'channel_id': str_or_none(channel_data('id', int)),
-            'channel_url': url_or_none(channel_data('url', compat_str)),
-            'language': data('language', 'id', compat_str),
-            'license': data('licence', 'label', compat_str),
+            'uploader': account_data('displayName'),
+            'uploader_id': account_data('uuid'),
+            'uploder_url': account_data('url'),
+            'license': try_get(
+                video, lambda x: x['licence']['label'], compat_str),
             'duration': int_or_none(video.get('duration')),
             'view_count': int_or_none(video.get('views')),
             'like_count': int_or_none(video.get('likes')),
@@ -596,5 +544,4 @@ def channel_data(field, type_):
             'tags': try_get(video, lambda x: x['tags'], list),
             'categories': categories,
             'formats': formats,
-            'subtitles': subtitles
         }
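
Both versions lean on try_get for tolerant nested lookups; the removed code merely wraps it in the data()/account_data()/channel_data() helpers. A simplified sketch of try_get (the real utility in youtube_dl.utils also accepts a list of getters):

    def try_get(src, getter, expected_type=None):
        # Run the getter, swallowing the usual lookup errors, and
        # optionally filter the result by type.
        try:
            value = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            return None
        if expected_type is None or isinstance(value, expected_type):
            return value
        return None

    video = {'category': {'label': 'Science & Technology'}}
    print(try_get(video, lambda x: x['category']['label'], str))
    print(try_get(video, lambda x: x['account']['name'], str))  # None, no crash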

View file

@@ -18,7 +18,7 @@ def _call_api(self, method, query, item_id):
             item_id, query=query)

     def _parse_broadcast_data(self, broadcast, video_id):
-        title = broadcast.get('status') or 'Periscope Broadcast'
+        title = broadcast['status']
         uploader = broadcast.get('user_display_name') or broadcast.get('username')
         title = '%s - %s' % (uploader, title) if uploader else title
         is_live = broadcast.get('state').lower() == 'running'

View file

@@ -46,7 +46,7 @@ def _login(self):
             headers={'Referer': self._LOGIN_URL})

         # login succeeded
-        if 'platzi.com/login' not in urlh.geturl():
+        if 'platzi.com/login' not in compat_str(urlh.geturl()):
             return

         login_error = self._webpage_read_content(

View file

@@ -20,16 +20,20 @@ class PokemonIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'The Ol’ Raise and Switch!',
             'description': 'md5:7db77f7107f98ba88401d3adc80ff7af',
+            'timestamp': 1511824728,
+            'upload_date': '20171127',
         },
         'add_id': ['LimelightMedia'],
     }, {
         # no data-video-title
-        'url': 'https://www.pokemon.com/fr/episodes-pokemon/films-pokemon/pokemon-lascension-de-darkrai-2008',
+        'url': 'https://www.pokemon.com/us/pokemon-episodes/pokemon-movies/pokemon-the-rise-of-darkrai-2008',
         'info_dict': {
-            'id': 'dfbaf830d7e54e179837c50c0c6cc0e1',
+            'id': '99f3bae270bf4e5097274817239ce9c8',
             'ext': 'mp4',
-            'title': "Pokémon : L'ascension de Darkrai",
-            'description': 'md5:d1dbc9e206070c3e14a06ff557659fb5',
+            'title': 'Pokémon: The Rise of Darkrai',
+            'description': 'md5:ea8fbbf942e1e497d54b19025dd57d9d',
+            'timestamp': 1417778347,
+            'upload_date': '20141205',
         },
         'add_id': ['LimelightMedia'],
         'params': {

View file

@ -1,99 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_chr,
)
from ..utils import int_or_none
class PopcorntimesIE(InfoExtractor):
_VALID_URL = r'https?://popcorntimes\.tv/[^/]+/m/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
_TEST = {
'url': 'https://popcorntimes.tv/de/m/A1XCFvz/haensel-und-gretel-opera-fantasy',
'md5': '93f210991ad94ba8c3485950a2453257',
'info_dict': {
'id': 'A1XCFvz',
'display_id': 'haensel-und-gretel-opera-fantasy',
'ext': 'mp4',
'title': 'Hänsel und Gretel',
'description': 'md5:1b8146791726342e7b22ce8125cf6945',
'thumbnail': r're:^https?://.*\.jpg$',
'creator': 'John Paul',
'release_date': '19541009',
'duration': 4260,
'tbr': 5380,
'width': 720,
'height': 540,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
webpage = self._download_webpage(url, display_id)
title = self._search_regex(
r'<h1>([^<]+)', webpage, 'title',
default=None) or self._html_search_meta(
'ya:ovs:original_name', webpage, 'title', fatal=True)
loc = self._search_regex(
r'PCTMLOC\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage, 'loc',
group='value')
loc_b64 = ''
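# The site obfuscates the base64-encoded video URL with ROT13: each ASCII
# letter is shifted by 13 places, wrapping within its own alphabet, so the
# loop below reverses the shift before the payload is base64-decoded.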
for c in loc:
c_ord = ord(c)
if ord('a') <= c_ord <= ord('z') or ord('A') <= c_ord <= ord('Z'):
upper = ord('Z') if c_ord <= ord('Z') else ord('z')
c_ord += 13
if upper < c_ord:
c_ord -= 26
loc_b64 += compat_chr(c_ord)
video_url = compat_b64decode(loc_b64).decode('utf-8')
description = self._html_search_regex(
r'(?s)<div[^>]+class=["\']pt-movie-desc[^>]+>(.+?)</div>', webpage,
'description', fatal=False)
thumbnail = self._search_regex(
r'<img[^>]+class=["\']video-preview[^>]+\bsrc=(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'thumbnail', default=None,
group='value') or self._og_search_thumbnail(webpage)
creator = self._html_search_meta(
'video:director', webpage, 'creator', default=None)
release_date = self._html_search_meta(
'video:release_date', webpage, default=None)
if release_date:
release_date = release_date.replace('-', '')
def int_meta(name):
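# Helper: read a numeric value from the page's <meta> tags, returning None
# when the tag is missing or not parseable as a number.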
return int_or_none(self._html_search_meta(
name, webpage, default=None))
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'creator': creator,
'release_date': release_date,
'duration': int_meta('video:duration'),
'tbr': int_meta('ya:ovs:bitrate'),
'width': int_meta('og:video:width'),
'height': int_meta('og:video:height'),
'http_headers': {
'Referer': url,
},
}

View file

@@ -8,7 +8,6 @@
     ExtractorError,
     int_or_none,
     js_to_json,
-    merge_dicts,
     urljoin,
 )
@@ -28,22 +27,23 @@ class PornHdIE(InfoExtractor):
             'view_count': int,
             'like_count': int,
             'age_limit': 18,
-        },
-        'skip': 'HTTP Error 404: Not Found',
+        }
     }, {
+        # removed video
         'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
-        'md5': '1b7b3a40b9d65a8e5b25f7ab9ee6d6de',
+        'md5': '956b8ca569f7f4d8ec563e2c41598441',
         'info_dict': {
             'id': '1962',
             'display_id': 'sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
             'ext': 'mp4',
-            'title': 'md5:98c6f8b2d9c229d0f0fde47f61a1a759',
+            'title': 'Sierra loves doing laundry',
             'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
             'thumbnail': r're:^https?://.*\.jpg',
             'view_count': int,
             'like_count': int,
             'age_limit': 18,
         },
+        'skip': 'Not available anymore',
     }]

     def _real_extract(self, url):
@@ -61,13 +61,7 @@ def _real_extract(self, url):
                 r"(?s)sources'?\s*[:=]\s*(\{.+?\})",
                 webpage, 'sources', default='{}')), video_id)

-        info = {}
         if not sources:
-            entries = self._parse_html5_media_entries(url, webpage, video_id)
-            if entries:
-                info = entries[0]
-
-        if not sources and not info:
             message = self._html_search_regex(
                 r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1',
                 webpage, 'error message', group='value')
@@ -86,29 +80,23 @@ def _real_extract(self, url):
                 'format_id': format_id,
                 'height': height,
             })
-        if formats:
-            info['formats'] = formats
-            self._sort_formats(info['formats'])
+        self._sort_formats(formats)

         description = self._html_search_regex(
-            (r'(?s)<section[^>]+class=["\']video-description[^>]+>(?P<value>.+?)</section>',
-             r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1'),
-            webpage, 'description', fatal=False,
-            group='value') or self._html_search_meta(
-            'description', webpage, default=None) or self._og_search_description(webpage)
+            r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1',
+            webpage, 'description', fatal=False, group='value')
         view_count = int_or_none(self._html_search_regex(
             r'(\d+) views\s*<', webpage, 'view count', fatal=False))
         thumbnail = self._search_regex(
             r"poster'?\s*:\s*([\"'])(?P<url>(?:(?!\1).)+)\1", webpage,
-            'thumbnail', default=None, group='url')
+            'thumbnail', fatal=False, group='url')
         like_count = int_or_none(self._search_regex(
-            (r'(\d+)</span>\s*likes',
-             r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
+            (r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
              r'class=["\']save-count["\'][^>]*>\s*(\d+)'),
             webpage, 'like count', fatal=False))

-        return merge_dicts(info, {
+        return {
             'id': video_id,
             'display_id': display_id,
             'title': title,
@@ -118,4 +106,4 @@ def _real_extract(self, url):
             'like_count': like_count,
             'formats': formats,
             'age_limit': 18,
-        })
+        }

View file

@@ -17,7 +17,6 @@
     determine_ext,
     ExtractorError,
     int_or_none,
-    NO_DEFAULT,
     orderedSet,
     remove_quotes,
     str_to_int,
@@ -52,7 +51,7 @@ class PornHubIE(PornHubBaseIE):
     _VALID_URL = r'''(?x)
                     https?://
                         (?:
-                            (?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
+                            (?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
                             (?:www\.)?thumbzilla\.com/video/
                         )
                         (?P<id>[\da-z]+)
@@ -149,9 +148,6 @@ class PornHubIE(PornHubBaseIE):
     }, {
         'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933',
         'only_matching': True,
-    }, {
-        'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5e4acdae54a82',
-        'only_matching': True,
     }]

     @staticmethod
@@ -169,13 +165,6 @@ def _real_extract(self, url):
         host = mobj.group('host') or 'pornhub.com'
         video_id = mobj.group('id')

-        if 'premium' in host:
-            if not self._downloader.params.get('cookiefile'):
-                raise ExtractorError(
-                    'PornHub Premium requires authentication.'
-                    ' You may want to use --cookies.',
-                    expected=True)
-
         self._set_cookie(host, 'age_verified', '1')

         def dl_webpage(platform):
@@ -199,10 +188,10 @@ def dl_webpage(platform):
         # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
         # on that anymore.
         title = self._html_search_meta(
-            'twitter:title', webpage, default=None) or self._html_search_regex(
-            (r'(?s)<h1[^>]+class=["\']title["\'][^>]*>(?P<title>.+?)</h1>',
-             r'<div[^>]+data-video-title=(["\'])(?P<title>(?:(?!\1).)+)\1',
-             r'shareTitle["\']\s*[=:]\s*(["\'])(?P<title>(?:(?!\1).)+)\1'),
+            'twitter:title', webpage, default=None) or self._search_regex(
+            (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
+             r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
+             r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1'),
             webpage, 'title', group='title')

         video_urls = []
@@ -238,13 +227,12 @@ def dl_webpage(platform):
         else:
             thumbnail, duration = [None] * 2

-        def extract_js_vars(webpage, pattern, default=NO_DEFAULT):
-            assignments = self._search_regex(
-                pattern, webpage, 'encoded url', default=default)
-            if not assignments:
-                return {}
+        if not video_urls:
+            tv_webpage = dl_webpage('tv')

-            assignments = assignments.split(';')
+            assignments = self._search_regex(
+                r'(var.+?mediastring.+?)</script>', tv_webpage,
+                'encoded url').split(';')

             js_vars = {}

@@ -266,35 +254,11 @@ def parse_js_value(inp):
                 assn = re.sub(r'var\s+', '', assn)
                 vname, value = assn.split('=', 1)
                 js_vars[vname] = parse_js_value(value)
-            return js_vars

-        def add_video_url(video_url):
-            v_url = url_or_none(video_url)
-            if not v_url:
-                return
-            if v_url in video_urls_set:
-                return
-            video_urls.append((v_url, None))
-            video_urls_set.add(v_url)
-
-        if not video_urls:
-            FORMAT_PREFIXES = ('media', 'quality')
-            js_vars = extract_js_vars(
-                webpage, r'(var\s+(?:%s)_.+)' % '|'.join(FORMAT_PREFIXES),
-                default=None)
-            if js_vars:
-                for key, format_url in js_vars.items():
-                    if any(key.startswith(p) for p in FORMAT_PREFIXES):
-                        add_video_url(format_url)
-            if not video_urls and re.search(
-                    r'<[^>]+\bid=["\']lockedPlayer', webpage):
-                raise ExtractorError(
-                    'Video %s is locked' % video_id, expected=True)
-
-        if not video_urls:
-            js_vars = extract_js_vars(
-                dl_webpage('tv'), r'(var.+?mediastring.+?)</script>')
-            add_video_url(js_vars['mediastring'])
+            video_url = js_vars['mediastring']
+            if video_url not in video_urls_set:
+                video_urls.append((video_url, None))
+                video_urls_set.add(video_url)

         for mobj in re.finditer(
                 r'<a[^>]+\bclass=["\']downloadBtn\b[^>]+\bhref=(["\'])(?P<url>(?:(?!\1).)+)\1',
@@ -312,16 +276,10 @@ def add_video_url(video_url):
                 r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
             if upload_date:
                 upload_date = upload_date.replace('/', '')
-            ext = determine_ext(video_url)
-            if ext == 'mpd':
+            if determine_ext(video_url) == 'mpd':
                 formats.extend(self._extract_mpd_formats(
                     video_url, video_id, mpd_id='dash', fatal=False))
                 continue
-            elif ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
-                    video_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False))
-                continue
             tbr = None
             mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
             if mobj:
@@ -415,7 +373,7 @@ def _real_extract(self, url):

 class PornHubUserIE(PornHubPlaylistBaseIE):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?pornhub\.(?:com|net)/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph',
         'playlist_mincount': 118,
@@ -483,7 +441,7 @@ def _real_extract(self, url):

 class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
-    _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
+    _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph/videos',
         'only_matching': True,
@@ -598,7 +556,7 @@ def suitable(cls, url):

 class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
         'info_dict': {

View file

@@ -11,13 +11,12 @@
     determine_ext,
     float_or_none,
     int_or_none,
-    merge_dicts,
     unified_strdate,
 )


 class ProSiebenSat1BaseIE(InfoExtractor):
-    _GEO_BYPASS = False
+    _GEO_COUNTRIES = ['DE']
     _ACCESS_ID = None
     _SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear'
     _V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get'
@@ -40,18 +39,14 @@ def _extract_video_info(self, url, clip_id):
         formats = []
         if self._ACCESS_ID:
             raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID
-            protocols = self._download_json(
+            server_token = (self._download_json(
                 self._V4_BASE_URL + 'protocols', clip_id,
                 'Downloading protocols JSON',
                 headers=self.geo_verification_headers(), query={
                     'access_id': self._ACCESS_ID,
                     'client_token': sha1((raw_ct).encode()).hexdigest(),
                     'video_id': clip_id,
-                }, fatal=False, expected_status=(403,)) or {}
-            error = protocols.get('error') or {}
-            if error.get('title') == 'Geo check failed':
-                self.raise_geo_restricted(countries=['AT', 'CH', 'DE'])
-            server_token = protocols.get('server_token')
+                }, fatal=False) or {}).get('server_token')
             if server_token:
                 urls = (self._download_json(
                     self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={
@@ -176,7 +171,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
                         (?:
                             (?:beta\.)?
                             (?:
-                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|advopedia
+                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
                             )\.(?:de|at|ch)|
                             ran\.de|fem\.com|advopedia\.de|galileo\.tv/video
                         )
@@ -194,14 +189,10 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'info_dict': {
                 'id': '2104602',
                 'ext': 'mp4',
-                'title': 'CIRCUS HALLIGALLI - Episode 18 - Staffel 2',
+                'title': 'Episode 18 - Staffel 2',
                 'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
                 'upload_date': '20131231',
                 'duration': 5845.04,
-                'series': 'CIRCUS HALLIGALLI',
-                'season_number': 2,
-                'episode': 'Episode 18 - Staffel 2',
-                'episode_number': 18,
             },
         },
         {
@@ -305,9 +296,8 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'info_dict': {
                 'id': '2572814',
                 'ext': 'mp4',
-                'title': 'The Voice of Germany - Andreas Kümmert: Rocket Man',
+                'title': 'Andreas Kümmert: Rocket Man',
                 'description': 'md5:6ddb02b0781c6adf778afea606652e38',
-                'timestamp': 1382041620,
                 'upload_date': '20131017',
                 'duration': 469.88,
             },
@@ -316,7 +306,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             },
         },
         {
-            'url': 'http://www.fem.com/videos/beauty-lifestyle/kurztrips-zum-valentinstag',
+            'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
             'info_dict': {
                 'id': '2156342',
                 'ext': 'mp4',
@@ -338,6 +328,19 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'playlist_count': 2,
             'skip': 'This video is unavailable',
         },
+        {
+            'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
+            'info_dict': {
+                'id': '4187506',
+                'ext': 'mp4',
+                'title': 'Best of Circus HalliGalli',
+                'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
+                'upload_date': '20151229',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
         {
             # title in <h2 class="subtitle">
             'url': 'http://www.prosieben.de/stars/oscar-award/videos/jetzt-erst-enthuellt-das-geheimnis-von-emma-stones-oscar-robe-clip',
@@ -414,6 +417,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
         r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>',
     ]
     _UPLOAD_DATE_REGEXES = [
+        r'<meta property="og:published_time" content="(.+?)">',
         r'<span>\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}) \|\s*<span itemprop="duration"',
         r'<footer>\s*(\d{2}\.\d{2}\.\d{4}) \d{2}:\d{2} Uhr',
         r'<span style="padding-left: 4px;line-height:20px; color:#404040">(\d{2}\.\d{2}\.\d{4})</span>',
@@ -443,21 +447,17 @@ def _extract_clip(self, url, webpage):
         if description is None:
             description = self._og_search_description(webpage)
         thumbnail = self._og_search_thumbnail(webpage)
-        upload_date = unified_strdate(
-            self._html_search_meta('og:published_time', webpage,
-                                   'upload date', default=None)
-            or self._html_search_regex(self._UPLOAD_DATE_REGEXES,
-                                       webpage, 'upload date', default=None))
+        upload_date = unified_strdate(self._html_search_regex(
+            self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))

-        json_ld = self._search_json_ld(webpage, clip_id, default={})
-
-        return merge_dicts(info, {
+        info.update({
             'id': clip_id,
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
             'upload_date': upload_date,
-        }, json_ld)
+        })

+        return info

     def _extract_playlist(self, url, webpage):
         playlist_id = self._html_search_regex(

View file

@@ -82,6 +82,17 @@ def _real_extract(self, url):
urls = [] urls = []
formats = [] formats = []
def add_http_from_hls(m3u8_f):
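# The CDN appears to expose a progressive MP4 twin for every HLS rendition;
# rewriting '/hls/…chunklist.m3u8' to '/mp4/….mp4' derives a direct 'http'
# format from each 'hls' one without an extra network request.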
http_url = m3u8_f['url'].replace('/hls/', '/mp4/').replace('/chunklist.m3u8', '.mp4')
if http_url != m3u8_f['url']:
f = m3u8_f.copy()
f.update({
'format_id': f['format_id'].replace('hls', 'http'),
'protocol': 'http',
'url': http_url,
})
formats.append(f)
         for video in videos['data']['videos']:
             media_url = url_or_none(video.get('url'))
             if not media_url or media_url in urls:
@@ -90,9 +101,12 @@ def _real_extract(self, url):
             playlist = video.get('is_playlist')
             if (video.get('stream_type') == 'hls' and playlist is True) or 'playlist.m3u8' in media_url:
-                formats.extend(self._extract_m3u8_formats(
+                m3u8_formats = self._extract_m3u8_formats(
                     media_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False))
+                    m3u8_id='hls', fatal=False)
+                for m3u8_f in m3u8_formats:
+                    formats.append(m3u8_f)
+                    add_http_from_hls(m3u8_f)
                 continue

             quality = int_or_none(video.get('quality'))
@@ -114,6 +128,8 @@ def _real_extract(self, url):
                 format_id += '-%sp' % quality
             f['format_id'] = format_id
             formats.append(f)
+            if is_hls:
+                add_http_from_hls(f)
         self._sort_formats(formats)

         creator = try_get(

View file

@@ -4,7 +4,6 @@

 from .common import InfoExtractor
 from ..utils import (
-    determine_ext,
     ExtractorError,
     int_or_none,
     merge_dicts,
@@ -44,21 +43,14 @@ def _real_extract(self, url):
         webpage = self._download_webpage(
             'http://www.redtube.com/%s' % video_id, video_id)

-        ERRORS = (
-            (('video-deleted-info', '>This video has been removed'), 'has been removed'),
-            (('private_video_text', '>This video is private', '>Send a friend request to its owner to be able to view it'), 'is private'),
-        )
-
-        for patterns, message in ERRORS:
-            if any(p in webpage for p in patterns):
-                raise ExtractorError(
-                    'Video %s %s' % (video_id, message), expected=True)
+        if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
+            raise ExtractorError('Video %s has been removed' % video_id, expected=True)

         info = self._search_json_ld(webpage, video_id, default={})

         if not info.get('title'):
             info['title'] = self._html_search_regex(
-                (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle|video_title)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
+                (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
                 r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
                 webpage, 'title', group='title',
                 default=None) or self._og_search_title(webpage)
@@ -78,7 +70,7 @@ def _real_extract(self, url):
             })
         medias = self._parse_json(
             self._search_regex(
-                r'mediaDefinition["\']?\s*:\s*(\[.+?}\s*\])', webpage,
+                r'mediaDefinition\s*:\s*(\[.+?\])', webpage,
                 'media definitions', default='{}'),
             video_id, fatal=False)
         if medias and isinstance(medias, list):
@@ -86,12 +78,6 @@ def _real_extract(self, url):
                 format_url = url_or_none(media.get('videoUrl'))
                 if not format_url:
                     continue
-                if media.get('format') == 'hls' or determine_ext(format_url) == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        format_url, video_id, 'mp4',
-                        entry_protocol='m3u8_native', m3u8_id='hls',
-                        fatal=False))
-                    continue
                 format_id = media.get('quality')
                 formats.append({
                     'url': format_url,

View file

@@ -8,6 +8,7 @@

 from ..compat import (
     compat_parse_qs,
+    compat_str,
     compat_urlparse,
 )
 from ..utils import (
@@ -38,13 +39,13 @@ def _login(self):
             'Downloading login page')

         def is_logged(urlh):
-            return 'learning.oreilly.com/home/' in urlh.geturl()
+            return 'learning.oreilly.com/home/' in compat_str(urlh.geturl())

         if is_logged(urlh):
             self.LOGGED_IN = True
             return

-        redirect_url = urlh.geturl()
+        redirect_url = compat_str(urlh.geturl())
         parsed_url = compat_urlparse.urlparse(redirect_url)
         qs = compat_parse_qs(parsed_url.query)
         next_uri = compat_urlparse.urljoin(
@@ -164,8 +165,7 @@ def _real_extract(self, url):
         kaltura_session = self._download_json(
             '%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
             video_id, 'Downloading kaltura session JSON',
-            'Unable to download kaltura session JSON', fatal=False,
-            headers={'Accept': 'application/json'})
+            'Unable to download kaltura session JSON', fatal=False)
         if kaltura_session:
             session = kaltura_session.get('session')
             if session:

View file

@@ -7,7 +7,6 @@

 from .aws import AWSIE
 from .anvato import AnvatoIE
-from .common import InfoExtractor
 from ..utils import (
     smuggle_url,
     urlencode_postdata,
@@ -103,50 +102,3 @@ def get(key):
             'anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:%s' % mcp_id,
             {'geo_countries': ['US']}),
             AnvatoIE.ie_key(), video_id=mcp_id)
class ScrippsNetworksIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.cookingchanneltv.com/videos/the-best-of-the-best-0260338',
'info_dict': {
'id': '0260338',
'ext': 'mp4',
'title': 'The Best of the Best',
'description': 'Catch a new episode of MasterChef Canada Tuedsay at 9/8c.',
'timestamp': 1475678834,
'upload_date': '20161005',
'uploader': 'SCNI-SCND',
},
'add_ie': ['ThePlatform'],
}, {
'url': 'https://www.diynetwork.com/videos/diy-barnwood-tablet-stand-0265790',
'only_matching': True,
}, {
'url': 'https://www.foodnetwork.com/videos/chocolate-strawberry-cake-roll-7524591',
'only_matching': True,
}, {
'url': 'https://www.hgtv.com/videos/cookie-decorating-101-0301929',
'only_matching': True,
}, {
'url': 'https://www.travelchannel.com/videos/two-climates-one-bag-5302184',
'only_matching': True,
}, {
'url': 'https://www.discovery.com/videos/guardians-of-the-glades-cooking-with-tom-cobb-5578368',
'only_matching': True,
}]
_ACCOUNT_MAP = {
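# Each supported site slug maps to a ThePlatform account ID; _TP_TEMPL below
# combines that ID with the media GUID, and extraction is then delegated to
# the ThePlatform extractor via url_result.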
'cookingchanneltv': 2433005105,
'discovery': 2706091867,
'diynetwork': 2433004575,
'foodnetwork': 2433005105,
'hgtv': 2433004575,
'travelchannel': 2433005739,
}
_TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'
def _real_extract(self, url):
site, guid = re.match(self._VALID_URL, url).groups()
return self.url_result(smuggle_url(
self._TP_TEMPL % (self._ACCOUNT_MAP[site], guid),
{'force_smil_url': True}), 'ThePlatform', guid)

View file

@@ -7,18 +7,9 @@

 class ServusIE(InfoExtractor):
-    _VALID_URL = r'''(?x)
-                    https?://
-                        (?:www\.)?
-                        (?:
-                            servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)|
-                            servustv\.com/videos
-                        )
-                        /(?P<id>[aA]{2}-\w+|\d+-\d+)
-                    '''
+    _VALID_URL = r'https?://(?:www\.)?servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)/(?P<id>[aA]{2}-\w+|\d+-\d+)'
     _TESTS = [{
-        # new URL schema
-        'url': 'https://www.servustv.com/videos/aa-1t6vbu5pw1w12/',
+        'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
         'md5': '3e1dd16775aa8d5cbef23628cfffc1f4',
         'info_dict': {
             'id': 'AA-1T6VBU5PW1W12',
@@ -27,10 +18,6 @@ class ServusIE(InfoExtractor):
             'description': 'md5:1247204d85783afe3682644398ff2ec4',
             'thumbnail': r're:^https?://.*\.jpg',
         }
-    }, {
-        # old URL schema
-        'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
-        'only_matching': True,
     }, {
         'url': 'https://www.servus.com/at/p/Wie-das-Leben-beginnt/1309984137314-381415152/',
         'only_matching': True,

View file

@@ -9,13 +9,10 @@
     SearchInfoExtractor
 )
 from ..compat import (
-    compat_HTTPError,
-    compat_kwargs,
     compat_str,
     compat_urlparse,
 )
 from ..utils import (
-    error_to_compat_str,
     ExtractorError,
     float_or_none,
     HEADRequest,
@@ -27,7 +24,6 @@
     unified_timestamp,
     update_url_query,
     url_or_none,
-    urlhandle_detect_ext,
 )
@@ -97,7 +93,7 @@ class SoundcloudIE(InfoExtractor):
                 'repost_count': int,
             }
         },
-        # geo-restricted
+        # not streamable song
         {
             'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
             'info_dict': {
@@ -109,13 +105,18 @@ class SoundcloudIE(InfoExtractor):
                 'uploader_id': '9615865',
                 'timestamp': 1337635207,
                 'upload_date': '20120521',
-                'duration': 227.155,
+                'duration': 30,
                 'license': 'all-rights-reserved',
                 'view_count': int,
                 'like_count': int,
                 'comment_count': int,
                 'repost_count': int,
             },
+            'params': {
+                # rtmp
+                'skip_download': True,
+            },
+            'skip': 'Preview',
         },
         # private link
         {
@@ -226,6 +227,7 @@ class SoundcloudIE(InfoExtractor):
                 'skip_download': True,
             },
         },
+        # not available via api.soundcloud.com/i1/tracks/id/streams
        {
            'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
            'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
@@ -234,7 +236,7 @@ class SoundcloudIE(InfoExtractor):
                 'ext': 'mp3',
                 'title': 'Mezzo Valzer',
                 'description': 'md5:4138d582f81866a530317bae316e8b61',
-                'uploader': 'Micronie',
+                'uploader': 'Giovanni Sarani',
                 'uploader_id': '3352531',
                 'timestamp': 1551394171,
                 'upload_date': '20190228',
@@ -246,16 +248,14 @@ class SoundcloudIE(InfoExtractor):
                 'comment_count': int,
                 'repost_count': int,
             },
-        },
-        {
-            # with AAC HQ format available via OAuth token
-            'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',
-            'only_matching': True,
-        },
+            'expected_warnings': ['Unable to download JSON metadata'],
+        }
     ]

-    _API_BASE = 'https://api.soundcloud.com/'
     _API_V2_BASE = 'https://api-v2.soundcloud.com/'
     _BASE_URL = 'https://soundcloud.com/'
+    _CLIENT_ID = 'UW9ajvMgVdMMW3cdeBi8lPfN6dvOVGji'
     _IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'

     _ARTWORK_MAP = {
@@ -271,53 +271,14 @@ class SoundcloudIE(InfoExtractor):
         'original': 0,
     }
def _store_client_id(self, client_id):
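# Persist the working client_id in youtube-dl's cache (passing None clears
# a stale entry) so later invocations can skip the scrape below.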
self._downloader.cache.store('soundcloud', 'client_id', client_id)
def _update_client_id(self):
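# SoundCloud's anonymous API key changes over time: scan the homepage's
# script bundles (newest last, hence reversed) for a fresh 32-character
# client_id and cache it.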
webpage = self._download_webpage('https://soundcloud.com/', None)
for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
script = self._download_webpage(src, None, fatal=False)
if script:
client_id = self._search_regex(
r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
script, 'client id', default=None)
if client_id:
self._CLIENT_ID = client_id
self._store_client_id(client_id)
return
raise ExtractorError('Unable to extract client id')
def _download_json(self, *args, **kwargs):
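# Wrapper around every API request: inject the current client_id and, on an
# HTTP 401, drop the cached id, scrape a fresh one and retry once before
# re-raising (or merely warning when the call was non-fatal).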
non_fatal = kwargs.get('fatal') is False
if non_fatal:
del kwargs['fatal']
query = kwargs.get('query', {}).copy()
for _ in range(2):
query['client_id'] = self._CLIENT_ID
kwargs['query'] = query
try:
return super(SoundcloudIE, self)._download_json(*args, **compat_kwargs(kwargs))
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
self._store_client_id(None)
self._update_client_id()
continue
elif non_fatal:
self._downloader.report_warning(error_to_compat_str(e))
return False
raise
def _real_initialize(self):
self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or 'YUKXoArFcqrlQn9tfNHvvyfnDISj04zk'
     @classmethod
     def _resolv_url(cls, url):
-        return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url
+        return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url + '&client_id=' + cls._CLIENT_ID

-    def _extract_info_dict(self, info, full_title=None, secret_token=None):
+    def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2):
         track_id = compat_str(info['id'])
         title = info['title']
-        track_base_url = self._API_BASE + 'tracks/%s' % track_id

         format_urls = set()
         formats = []
@@ -326,27 +287,26 @@ def _extract_info_dict(self, info, full_title=None, secret_token=None):
             query['secret_token'] = secret_token

         if info.get('downloadable') and info.get('has_downloads_left'):
-            download_url = update_url_query(
-                self._API_V2_BASE + 'tracks/' + track_id + '/download', query)
-            redirect_url = (self._download_json(download_url, track_id, fatal=False) or {}).get('redirectUri')
-            if redirect_url:
-                urlh = self._request_webpage(
-                    HEADRequest(redirect_url), track_id, fatal=False)
-                if urlh:
-                    format_url = urlh.geturl()
-                    format_urls.add(format_url)
-                    formats.append({
-                        'format_id': 'download',
-                        'ext': urlhandle_detect_ext(urlh) or 'mp3',
-                        'filesize': int_or_none(urlh.headers.get('Content-Length')),
-                        'url': format_url,
-                        'preference': 10,
-                    })
+            format_url = update_url_query(
+                info.get('download_url') or track_base_url + '/download', query)
+            format_urls.add(format_url)
+            if version == 2:
+                v1_info = self._download_json(
+                    track_base_url, track_id, query=query, fatal=False) or {}
+            else:
+                v1_info = info
+            formats.append({
+                'format_id': 'download',
+                'ext': v1_info.get('original_format') or 'mp3',
+                'filesize': int_or_none(v1_info.get('original_content_size')),
+                'url': format_url,
+                'preference': 10,
+            })

         def invalid_url(url):
-            return not url or url in format_urls
+            return not url or url in format_urls or re.search(r'/(?:preview|playlist)/0/30/', url)

-        def add_format(f, protocol, is_preview=False):
+        def add_format(f, protocol):
             mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
             if mobj:
                 for k, v in mobj.groupdict().items():
@@ -355,27 +315,16 @@ def add_format(f, protocol, is_preview=False):
             format_id_list = []
             if protocol:
                 format_id_list.append(protocol)
-            ext = f.get('ext')
-            if ext == 'aac':
-                f['abr'] = '256'
             for k in ('ext', 'abr'):
                 v = f.get(k)
                 if v:
                     format_id_list.append(v)
-            preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
-            if preview:
-                format_id_list.append('preview')
             abr = f.get('abr')
             if abr:
                 f['abr'] = int(abr)
-            if protocol == 'hls':
-                protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
-            else:
-                protocol = 'http'
             f.update({
                 'format_id': '_'.join(format_id_list),
-                'protocol': protocol,
-                'preference': -10 if preview else None,
+                'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
             })
             formats.append(f)
@@ -386,7 +335,7 @@ def add_format(f, protocol, is_preview=False):
             if not isinstance(t, dict):
                 continue
             format_url = url_or_none(t.get('url'))
-            if not format_url:
+            if not format_url or t.get('snipped') or '/preview/' in format_url:
                 continue
             stream = self._download_json(
                 format_url, track_id, query=query, fatal=False)
@@ -409,14 +358,44 @@ def add_format(f, protocol, is_preview=False):
                 add_format({
                     'url': stream_url,
                     'ext': ext,
-                }, 'http' if protocol == 'progressive' else protocol,
-                    t.get('snipped') or '/preview/' in format_url)
+                }, 'http' if protocol == 'progressive' else protocol)
+
+        if not formats:
+            # Old API, does not work for some tracks (e.g.
+            # https://soundcloud.com/giovannisarani/mezzo-valzer)
+            # and might serve preview URLs (e.g.
+            # http://www.soundcloud.com/snbrn/ele)
+            format_dict = self._download_json(
+                track_base_url + '/streams', track_id,
+                'Downloading track url', query=query, fatal=False) or {}
+
+            for key, stream_url in format_dict.items():
+                if invalid_url(stream_url):
+                    continue
+                format_urls.add(stream_url)
+                mobj = re.search(r'(http|hls)_([^_]+)_(\d+)_url', key)
+                if mobj:
+                    protocol, ext, abr = mobj.groups()
+                    add_format({
+                        'abr': abr,
+                        'ext': ext,
+                        'url': stream_url,
+                    }, protocol)
+
+        if not formats:
+            # We fallback to the stream_url in the original info, this
+            # cannot be always used, sometimes it can give an HTTP 404 error
+            urlh = self._request_webpage(
+                HEADRequest(info.get('stream_url') or track_base_url + '/stream'),
+                track_id, query=query, fatal=False)
+            if urlh:
+                stream_url = urlh.geturl()
+                if not invalid_url(stream_url):
+                    add_format({'url': stream_url}, 'http')

         for f in formats:
             f['vcodec'] = 'none'

-        if not formats and info.get('policy') == 'BLOCK':
-            self.raise_geo_restricted()
-
         self._sort_formats(formats)

         user = info.get('user') or {}
@@ -472,7 +451,9 @@ def _real_extract(self, url):
         track_id = mobj.group('track_id')

-        query = {}
+        query = {
+            'client_id': self._CLIENT_ID,
+        }
         if track_id:
             info_json_url = self._API_V2_BASE + 'tracks/' + track_id
             full_title = track_id
@@ -486,24 +467,20 @@ def _real_extract(self, url):
                 resolve_title += '/%s' % token
             info_json_url = self._resolv_url(self._BASE_URL + resolve_title)

+        version = 2
         info = self._download_json(
-            info_json_url, full_title, 'Downloading info JSON', query=query)
+            info_json_url, full_title, 'Downloading info JSON', query=query, fatal=False)
+        if not info:
+            info = self._download_json(
+                info_json_url.replace(self._API_V2_BASE, self._API_BASE),
+                full_title, 'Downloading info JSON', query=query)
+            version = 1

-        return self._extract_info_dict(info, full_title, token)
+        return self._extract_info_dict(info, full_title, token, version)
 class SoundcloudPlaylistBaseIE(SoundcloudIE):
-    def _extract_set(self, playlist, token=None):
-        playlist_id = compat_str(playlist['id'])
-        tracks = playlist.get('tracks') or []
-        if not all([t.get('permalink_url') for t in tracks]) and token:
-            tracks = self._download_json(
-                self._API_V2_BASE + 'tracks', playlist_id,
-                'Downloading tracks', query={
-                    'ids': ','.join([compat_str(t['id']) for t in tracks]),
-                    'playlistId': playlist_id,
-                    'playlistSecretToken': token,
-                })
+    def _extract_track_entries(self, tracks, token=None):
         entries = []
         for track in tracks:
             track_id = str_or_none(track.get('id'))
@@ -516,10 +493,7 @@ def _extract_set(self, playlist, token=None):
                 url += '?secret_token=' + token
             entries.append(self.url_result(
                 url, SoundcloudIE.ie_key(), track_id))
-        return self.playlist_result(
-            entries, playlist_id,
-            playlist.get('title'),
-            playlist.get('description'))
+        return entries


 class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@@ -530,7 +504,6 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
         'info_dict': {
             'id': '2284613',
             'title': 'The Royal Concept EP',
-            'description': 'md5:71d07087c7a449e8941a70a29e34671e',
         },
         'playlist_mincount': 5,
     }, {
@@ -553,13 +526,17 @@ def _real_extract(self, url):
             msgs = (compat_str(err['error_message']) for err in info['errors'])
             raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))

-        return self._extract_set(info, token)
+        entries = self._extract_track_entries(info['tracks'], token)
+
+        return self.playlist_result(
+            entries, str_or_none(info.get('id')), info.get('title'))
-class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
+class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
     def _extract_playlist(self, base_url, playlist_id, playlist_title):
         COMMON_QUERY = {
-            'limit': 80000,
+            'limit': 2000000000,
+            'client_id': self._CLIENT_ID,
             'linked_partitioning': '1',
         }
@@ -745,7 +722,9 @@ def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         playlist_id = mobj.group('id')

-        query = {}
+        query = {
+            'client_id': self._CLIENT_ID,
+        }
         token = mobj.group('token')
         if token:
             query['secret_token'] = token
@@ -754,7 +733,10 @@ def _real_extract(self, url):
             self._API_V2_BASE + 'playlists/' + playlist_id,
             playlist_id, 'Downloading playlist', query=query)

-        return self._extract_set(data, token)
+        entries = self._extract_track_entries(data['tracks'], token)
+
+        return self.playlist_result(
+            entries, playlist_id, data.get('title'), data.get('description'))
 class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
@@ -779,6 +761,7 @@ def _get_collection(self, endpoint, collection_id, **query):
             self._MAX_RESULTS_PER_PAGE)
         query.update({
             'limit': limit,
+            'client_id': self._CLIENT_ID,
             'linked_partitioning': 1,
             'offset': 0,
         })

View file

@@ -4,7 +4,6 @@

 from .common import InfoExtractor
 from ..utils import (
-    determine_ext,
     ExtractorError,
     merge_dicts,
     orderedSet,
@@ -65,7 +64,7 @@ def _real_extract(self, url):
             url.replace('/%s/embed' % video_id, '/%s/video' % video_id),
             video_id, headers={'Cookie': 'country=US'})

-        if re.search(r'<[^>]+\b(?:id|class)=["\']video_removed', webpage):
+        if re.search(r'<[^>]+\bid=["\']video_removed', webpage):
             raise ExtractorError(
                 'Video %s is not available' % video_id, expected=True)
@@ -76,15 +75,6 @@ def extract_format(format_id, format_url):
             if not f_url:
                 return
             f = parse_resolution(format_id)
-            ext = determine_ext(f_url)
-            if format_id.startswith('m3u8') or ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
-                    f_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False))
-            elif format_id.startswith('mpd') or ext == 'mpd':
-                formats.extend(self._extract_mpd_formats(
-                    f_url, video_id, mpd_id='dash', fatal=False))
-            elif ext == 'mp4' or f.get('width') or f.get('height'):
             f.update({
                 'url': f_url,
                 'format_id': format_id,
@@ -103,22 +93,28 @@ def extract_format(format_id, format_url):
             r'data-streamkey\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
             webpage, 'stream key', group='value')

+        sb_csrf_session = self._get_cookies(
+            'https://spankbang.com')['sb_csrf_session'].value
+
         stream = self._download_json(
             'https://spankbang.com/api/videos/stream', video_id,
             'Downloading stream JSON', data=urlencode_postdata({
                 'id': stream_key,
                 'data': 0,
+                'sb_csrf_session': sb_csrf_session,
             }), headers={
                 'Referer': url,
-                'X-Requested-With': 'XMLHttpRequest',
+                'X-CSRFToken': sb_csrf_session,
             })

         for format_id, format_url in stream.items():
-            if format_url and isinstance(format_url, list):
-                format_url = format_url[0]
-            extract_format(format_id, format_url)
+            if format_id.startswith(STREAM_URL_PREFIX):
+                if format_url and isinstance(format_url, list):
+                    format_url = format_url[0]
+                extract_format(
+                    format_id[len(STREAM_URL_PREFIX):], format_url)

-        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
+        self._sort_formats(formats)

         info = self._search_json_ld(webpage, video_id, default={})

View file

@@ -3,47 +3,34 @@
 import re

 from .common import InfoExtractor
-from ..utils import (
-    float_or_none,
-    int_or_none,
-    merge_dicts,
-    str_or_none,
-    str_to_int,
-    url_or_none,
+from ..compat import (
+    compat_urllib_parse_unquote,
+    compat_urllib_parse_urlparse,
 )
+from ..utils import (
+    sanitized_Request,
+    str_to_int,
+    unified_strdate,
+)
+from ..aes import aes_decrypt_text


 class SpankwireIE(InfoExtractor):
-    _VALID_URL = r'''(?x)
-                    https?://
-                        (?:www\.)?spankwire\.com/
-                        (?:
-                            [^/]+/video|
-                            EmbedPlayer\.aspx/?\?.*?\bArticleId=
-                        )
-                        (?P<id>\d+)
-                    '''
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<id>[0-9]+)/?)'
     _TESTS = [{
         # download URL pattern: */<height>P_<tbr>K_<video_id>.mp4
         'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
-        'md5': '5aa0e4feef20aad82cbcae3aed7ab7cd',
+        'md5': '8bbfde12b101204b39e4b9fe7eb67095',
         'info_dict': {
             'id': '103545',
             'ext': 'mp4',
             'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
             'description': 'Crazy Bitch X rated music video.',
-            'duration': 222,
             'uploader': 'oreusz',
             'uploader_id': '124697',
-            'timestamp': 1178587885,
-            'upload_date': '20070508',
-            'average_rating': float,
-            'view_count': int,
-            'comment_count': int,
+            'upload_date': '20070507',
             'age_limit': 18,
-            'categories': list,
-            'tags': list,
-        },
+        }
     }, {
         # download URL pattern: */mp4_<format_id>_<video_id>.mp4
         'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/',
@@ -58,125 +45,83 @@ class SpankwireIE(InfoExtractor):
             'upload_date': '20150822',
             'age_limit': 18,
         },
-    }, {
-        'url': 'https://www.spankwire.com/EmbedPlayer.aspx/?ArticleId=156156&autostart=true',
-        'only_matching': True,
+        'params': {
+            'proxy': '127.0.0.1:8118'
+        },
+        'skip': 'removed',
     }]
-    @staticmethod
-    def _extract_urls(webpage):
-        return re.findall(
-            r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?spankwire\.com/EmbedPlayer\.aspx/?\?.*?\bArticleId=\d+)',
-            webpage)
-
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')

-        video = self._download_json(
-            'https://www.spankwire.com/api/video/%s.json' % video_id, video_id)
+        req = sanitized_Request('http://www.' + mobj.group('url'))
+        req.add_header('Cookie', 'age_verified=1')
+        webpage = self._download_webpage(req, video_id)

-        title = video['title']
+        title = self._html_search_regex(
+            r'<h1>([^<]+)', webpage, 'title')
+        description = self._html_search_regex(
+            r'(?s)<div\s+id="descriptionContent">(.+?)</div>',
+            webpage, 'description', fatal=False)
+        thumbnail = self._html_search_regex(
+            r'playerData\.screenShot\s*=\s*["\']([^"\']+)["\']',
+            webpage, 'thumbnail', fatal=False)
+
+        uploader = self._html_search_regex(
+            r'by:\s*<a [^>]*>(.+?)</a>',
+            webpage, 'uploader', fatal=False)
+        uploader_id = self._html_search_regex(
+            r'by:\s*<a href="/(?:user/viewProfile|Profile\.aspx)\?.*?UserId=(\d+).*?"',
+            webpage, 'uploader id', fatal=False)
+        upload_date = unified_strdate(self._html_search_regex(
+            r'</a> on (.+?) at \d+:\d+',
+            webpage, 'upload date', fatal=False))
+
+        view_count = str_to_int(self._html_search_regex(
+            r'<div id="viewsCounter"><span>([\d,\.]+)</span> views</div>',
+            webpage, 'view count', fatal=False))
+        comment_count = str_to_int(self._html_search_regex(
+            r'<span\s+id="spCommentCount"[^>]*>([\d,\.]+)</span>',
+            webpage, 'comment count', fatal=False))
+
+        videos = re.findall(
+            r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
+        heights = [int(video[0]) for video in videos]
+        video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
+        if webpage.find(r'flashvars\.encrypted = "true"') != -1:
+            password = self._search_regex(
+                r'flashvars\.video_title = "([^"]+)',
+                webpage, 'password').replace('+', ' ')
+            video_urls = list(map(
+                lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'),
+                video_urls))

         formats = []
-        videos = video.get('videos')
-        if isinstance(videos, dict):
-            for format_id, format_url in videos.items():
-                video_url = url_or_none(format_url)
-                if not format_url:
-                    continue
-                height = int_or_none(self._search_regex(
-                    r'(\d+)[pP]', format_id, 'height', default=None))
-                m = re.search(
-                    r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', video_url)
-                if m:
-                    tbr = int(m.group('tbr'))
-                    height = height or int(m.group('height'))
-                else:
-                    tbr = None
-                formats.append({
-                    'url': video_url,
-                    'format_id': '%dp' % height if height else format_id,
-                    'height': height,
-                    'tbr': tbr,
-                })
-        m3u8_url = url_or_none(video.get('HLS'))
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                m3u8_id='hls', fatal=False))
-        self._sort_formats(formats, ('height', 'tbr', 'width', 'format_id'))
+        for height, video_url in zip(heights, video_urls):
+            path = compat_urllib_parse_urlparse(video_url).path
+            m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path)
+            if m:
+                tbr = int(m.group('tbr'))
+                height = int(m.group('height'))
+            else:
+                tbr = None
+            formats.append({
+                'url': video_url,
+                'format_id': '%dp' % height,
+                'height': height,
+                'tbr': tbr,
+            })
+        self._sort_formats(formats)

-        view_count = str_to_int(video.get('viewed'))
+        age_limit = self._rta_search(webpage)

-        thumbnails = []
-        for preference, t in enumerate(('', '2x'), start=0):
-            thumbnail_url = url_or_none(video.get('poster%s' % t))
-            if not thumbnail_url:
-                continue
-            thumbnails.append({
-                'url': thumbnail_url,
-                'preference': preference,
-            })
-
-        def extract_names(key):
-            entries_list = video.get(key)
-            if not isinstance(entries_list, list):
-                return
-            entries = []
-            for entry in entries_list:
-                name = str_or_none(entry.get('name'))
-                if name:
-                    entries.append(name)
-            return entries
-
-        categories = extract_names('categories')
-        tags = extract_names('tags')
-
-        uploader = None
-        info = {}
-
-        webpage = self._download_webpage(
-            'https://www.spankwire.com/_/video%s/' % video_id, video_id,
-            fatal=False)
-        if webpage:
-            info = self._search_json_ld(webpage, video_id, default={})
-            thumbnail_url = None
-            if 'thumbnail' in info:
-                thumbnail_url = url_or_none(info['thumbnail'])
-                del info['thumbnail']
-            if not thumbnail_url:
-                thumbnail_url = self._og_search_thumbnail(webpage)
-            if thumbnail_url:
-                thumbnails.append({
-                    'url': thumbnail_url,
-                    'preference': 10,
-                })
-            uploader = self._html_search_regex(
-                r'(?s)by\s*<a[^>]+\bclass=["\']uploaded__by[^>]*>(.+?)</a>',
-                webpage, 'uploader', fatal=False)
-            if not view_count:
-                view_count = str_to_int(self._search_regex(
-                    r'data-views=["\']([\d,.]+)', webpage, 'view count',
-                    fatal=False))
-
-        return merge_dicts({
+        return {
             'id': video_id,
             'title': title,
-            'description': video.get('description'),
-            'duration': int_or_none(video.get('duration')),
-            'thumbnails': thumbnails,
+            'description': description,
+            'thumbnail': thumbnail,
             'uploader': uploader,
-            'uploader_id': str_or_none(video.get('userId')),
-            'timestamp': int_or_none(video.get('time_approved_on')),
-            'average_rating': float_or_none(video.get('rating')),
+            'uploader_id': uploader_id,
+            'upload_date': upload_date,
             'view_count': view_count,
-            'comment_count': int_or_none(video.get('comments')),
-            'age_limit': 18,
-            'categories': categories,
-            'tags': tags,
+            'comment_count': comment_count,
             'formats': formats,
-        }, info)
+            'age_limit': age_limit,
+        }

View file

@@ -8,10 +8,15 @@ class BellatorIE(MTVServicesInfoExtractor):
     _TESTS = [{
         'url': 'http://www.bellator.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
         'info_dict': {
-            'title': 'Michael Page vs. Evangelista Cyborg',
-            'description': 'md5:0d917fc00ffd72dd92814963fc6cbb05',
+            'id': 'b55e434e-fde1-4a98-b7cc-92003a034de4',
+            'ext': 'mp4',
+            'title': 'Douglas Lima vs. Paul Daley - Round 1',
+            'description': 'md5:805a8dd29310fd611d32baba2f767885',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
         },
-        'playlist_count': 3,
     }, {
         'url': 'http://www.bellator.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
         'only_matching': True,
'only_matching': True, 'only_matching': True,
@@ -20,9 +25,6 @@ class BellatorIE(MTVServicesInfoExtractor):
     _FEED_URL = 'http://www.bellator.com/feeds/mrss/'
     _GEO_COUNTRIES = ['US']

-    def _extract_mgid(self, webpage):
-        return self._extract_triforce_mgid(webpage)
-

 class ParamountNetworkIE(MTVServicesInfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?paramountnetwork\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'

View file

@@ -13,18 +13,36 @@
 class SportDeutschlandIE(InfoExtractor):
     _VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
     _TESTS = [{
-        'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
+        'url': 'http://sportdeutschland.tv/badminton/live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
         'info_dict': {
-            'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
+            'id': 'live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
             'ext': 'mp4',
-            'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
-            'categories': ['Badminton-Deutschland'],
+            'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen',
+            'categories': ['Badminton'],
             'view_count': int,
-            'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
             'timestamp': int,
-            'upload_date': '20200201',
-            'description': 're:.*',  # meaningless description for THIS video
+            'upload_date': 're:^201408[23][0-9]$',
         },
+        'params': {
+            'skip_download': 'Live stream',
+        },
+    }, {
+        'url': 'http://sportdeutschland.tv/li-ning-badminton-wm-2014/lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
+        'info_dict': {
+            'id': 'lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
+            'ext': 'mp4',
+            'upload_date': '20140825',
+            'description': 'md5:60a20536b57cee7d9a4ec005e8687504',
+            'timestamp': 1408976060,
+            'duration': 2732,
+            'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'view_count': int,
+            'categories': ['Li-Ning Badminton WM 2014'],
+        }
     }]
def _real_extract(self, url): def _real_extract(self, url):
@ -32,7 +50,7 @@ def _real_extract(self, url):
video_id = mobj.group('id') video_id = mobj.group('id')
sport_id = mobj.group('sport') sport_id = mobj.group('sport')
api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % ( api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id) sport_id, video_id)
req = sanitized_Request(api_url, headers={ req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json', 'Accept': 'application/vnd.vidibus.v2.html+json',

View file

@@ -1,14 +1,14 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-from .ard import ARDMediathekBaseIE
+from .ard import ARDMediathekIE
 from ..utils import (
     ExtractorError,
     get_element_by_attribute,
 )
 
 
-class SRMediathekIE(ARDMediathekBaseIE):
+class SRMediathekIE(ARDMediathekIE):
     IE_NAME = 'sr:mediathek'
     IE_DESC = 'Saarländischer Rundfunk'
     _VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'

View file

@@ -5,28 +5,44 @@
 class StretchInternetIE(InfoExtractor):
-    _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/(?:portal|full)\.htm\?.*?\beventId=(?P<id>\d+)'
+    _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
     _TEST = {
-        'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=573272&streamType=video',
+        'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
         'info_dict': {
-            'id': '573272',
+            'id': '313900',
             'ext': 'mp4',
-            'title': 'University of Mary Wrestling vs. Upper Iowa',
-            'timestamp': 1575668361,
-            'upload_date': '20191206',
+            'title': 'Augustana (S.D.) Baseball vs University of Mary',
+            'description': 'md5:7578478614aae3bdd4a90f578f787438',
+            'timestamp': 1490468400,
+            'upload_date': '20170325',
         }
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
+        stream = self._download_json(
+            'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
+            % video_id, video_id)
+
+        video_url = 'https://%s' % stream['source']
+
         event = self._download_json(
-            'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
-            video_id)[0]
+            'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
+            video_id, query={
+                'clientID': 99997,
+                'eventID': video_id,
+                'token': 'asdf',
+            })['event']
+
+        title = event.get('title') or event['mobileTitle']
+        description = event.get('customText')
+        timestamp = int_or_none(event.get('longtime'))
 
         return {
             'id': video_id,
-            'title': event['title'],
-            'timestamp': int_or_none(event.get('dateCreated'), 1000),
-            'url': 'https://' + event['media'][0]['url'],
+            'title': title,
+            'description': description,
+            'timestamp': timestamp,
+            'url': video_url,
         }

View file

@@ -4,14 +4,19 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+    compat_parse_qs,
+    compat_urllib_parse_urlparse,
+)
 from ..utils import (
     determine_ext,
     dict_get,
     int_or_none,
-    str_or_none,
+    orderedSet,
     strip_or_none,
     try_get,
+    urljoin,
+    compat_str,
 )
@@ -232,23 +237,23 @@ def _real_extract(self, url):
 
 class SVTSeriesIE(SVTPlayBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?'
+    _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)'
     _TESTS = [{
         'url': 'https://www.svtplay.se/rederiet',
         'info_dict': {
-            'id': '14445680',
+            'id': 'rederiet',
             'title': 'Rederiet',
-            'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
+            'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
         },
         'playlist_mincount': 318,
     }, {
-        'url': 'https://www.svtplay.se/rederiet?tab=season-2-14445680',
+        'url': 'https://www.svtplay.se/rederiet?tab=sasong2',
         'info_dict': {
-            'id': 'season-2-14445680',
+            'id': 'rederiet-sasong2',
             'title': 'Rederiet - Säsong 2',
-            'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
+            'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
         },
-        'playlist_mincount': 12,
+        'playlist_count': 12,
     }]
 
     @classmethod
@@ -256,87 +261,83 @@ def suitable(cls, url):
         return False if SVTIE.suitable(url) or SVTPlayIE.suitable(url) else super(SVTSeriesIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        series_slug, season_id = re.match(self._VALID_URL, url).groups()
-
-        series = self._download_json(
-            'https://api.svt.se/contento/graphql', series_slug,
-            'Downloading series page', query={
-                'query': '''{
-  listablesBySlug(slugs: ["%s"]) {
-    associatedContent(include: [productionPeriod, season]) {
-      items {
-        item {
-          ... on Episode {
-            videoSvtId
-          }
-        }
-      }
-      id
-      name
-    }
-    id
-    longDescription
-    name
-    shortDescription
-  }
-}''' % series_slug,
-            })['data']['listablesBySlug'][0]
+        series_id = self._match_id(url)
+
+        qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
+        season_slug = qs.get('tab', [None])[0]
+
+        if season_slug:
+            series_id += '-%s' % season_slug
+
+        webpage = self._download_webpage(
+            url, series_id, 'Downloading series page')
+
+        root = self._parse_json(
+            self._search_regex(
+                self._SVTPLAY_RE, webpage, 'content', group='json'),
+            series_id)
 
         season_name = None
 
         entries = []
-        for season in series['associatedContent']:
+        for season in root['relatedVideoContent']['relatedVideosAccordion']:
             if not isinstance(season, dict):
                 continue
-            if season_id:
-                if season.get('id') != season_id:
+            if season_slug:
+                if season.get('slug') != season_slug:
                     continue
                 season_name = season.get('name')
-            items = season.get('items')
-            if not isinstance(items, list):
+            videos = season.get('videos')
+            if not isinstance(videos, list):
                 continue
-            for item in items:
-                video = item.get('item') or {}
-                content_id = video.get('videoSvtId')
-                if not content_id or not isinstance(content_id, compat_str):
+            for video in videos:
+                content_url = video.get('contentUrl')
+                if not content_url or not isinstance(content_url, compat_str):
                     continue
-                entries.append(self.url_result(
-                    'svt:' + content_id, SVTPlayIE.ie_key(), content_id))
+                entries.append(
+                    self.url_result(
+                        urljoin(url, content_url),
+                        ie=SVTPlayIE.ie_key(),
+                        video_title=video.get('title')
+                    ))
 
-        title = series.get('name')
-        season_name = season_name or season_id
+        metadata = root.get('metaData')
+        if not isinstance(metadata, dict):
+            metadata = {}
+
+        title = metadata.get('title')
+        season_name = season_name or season_slug
 
         if title and season_name:
             title = '%s - %s' % (title, season_name)
-        elif season_id:
-            title = season_id
+        elif season_slug:
+            title = season_slug
 
         return self.playlist_result(
-            entries, season_id or series.get('id'), title,
-            dict_get(series, ('longDescription', 'shortDescription')))
+            entries, series_id, title, metadata.get('description'))
 
 
 class SVTPageIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?svt\.se/(?P<path>(?:[^/]+/)*(?P<id>[^/?&#]+))'
+    _VALID_URL = r'https?://(?:www\.)?svt\.se/(?:[^/]+/)*(?P<id>[^/?&#]+)'
     _TESTS = [{
-        'url': 'https://www.svt.se/sport/ishockey/bakom-masken-lehners-kamp-mot-mental-ohalsa',
+        'url': 'https://www.svt.se/sport/oseedat/guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
         'info_dict': {
-            'id': '25298267',
-            'title': 'Bakom masken – Lehners kamp mot mental ohälsa',
+            'id': 'guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
+            'title': 'GUIDE: Sommarträning du kan göra var och när du vill',
         },
-        'playlist_count': 4,
+        'playlist_count': 7,
     }, {
-        'url': 'https://www.svt.se/nyheter/utrikes/svenska-andrea-ar-en-mil-fran-branderna-i-kalifornien',
+        'url': 'https://www.svt.se/nyheter/inrikes/ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
         'info_dict': {
-            'id': '24243746',
-            'title': 'Svenska Andrea redo att fly sitt hem i Kalifornien',
+            'id': 'ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
+            'title': 'Ebba Busch Thor har bara delvis rätt om ”no-go-zoner”',
         },
-        'playlist_count': 2,
+        'playlist_count': 1,
     }, {
         # only programTitle
         'url': 'http://www.svt.se/sport/ishockey/jagr-tacklar-giroux-under-intervjun',
         'info_dict': {
-            'id': '8439V2K',
+            'id': '2900353',
             'ext': 'mp4',
             'title': 'Stjärnorna skojar till det - under SVT-intervjun',
             'duration': 27,
@@ -355,26 +356,16 @@ def suitable(cls, url):
         return False if SVTIE.suitable(url) else super(SVTPageIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        path, display_id = re.match(self._VALID_URL, url).groups()
+        playlist_id = self._match_id(url)
 
-        article = self._download_json(
-            'https://api.svt.se/nss-api/page/' + path, display_id,
-            query={'q': 'articles'})['articles']['content'][0]
+        webpage = self._download_webpage(url, playlist_id)
 
-        entries = []
+        entries = [
+            self.url_result(
+                'svt:%s' % video_id, ie=SVTPlayIE.ie_key(), video_id=video_id)
+            for video_id in orderedSet(re.findall(
+                r'data-video-id=["\'](\d+)', webpage))]
 
-        def _process_content(content):
-            if content.get('_type') in ('VIDEOCLIP', 'VIDEOEPISODE'):
-                video_id = compat_str(content['image']['svtId'])
-                entries.append(self.url_result(
-                    'svt:' + video_id, SVTPlayIE.ie_key(), video_id))
+        title = strip_or_none(self._og_search_title(webpage, default=None))
 
-        for media in article.get('media', []):
-            _process_content(media)
-
-        for obj in article.get('structuredBody', []):
-            _process_content(obj.get('content') or {})
-
-        return self.playlist_result(
-            entries, str_or_none(article.get('id')),
-            strip_or_none(article.get('title')))
+        return self.playlist_result(entries, playlist_id, title)

View file

@@ -4,12 +4,11 @@
 
 from .common import InfoExtractor
 from .wistia import WistiaIE
+from ..compat import compat_str
 from ..utils import (
     clean_html,
     ExtractorError,
-    int_or_none,
     get_element_by_class,
-    strip_or_none,
     urlencode_postdata,
     urljoin,
 )
@@ -21,8 +20,8 @@ class TeachableBaseIE(InfoExtractor):
     _SITES = {
         # Only notable ones here
-        'v1.upskillcourses.com': 'upskill',
-        'gns3.teachable.com': 'gns3',
+        'upskillcourses.com': 'upskill',
+        'academy.gns3.com': 'gns3',
         'academyhacker.com': 'academyhacker',
         'stackskills.com': 'stackskills',
         'market.saleshacker.com': 'saleshacker',
@@ -59,7 +58,7 @@ def is_logged(webpage):
             self._logged_in = True
             return
 
-        login_url = urlh.geturl()
+        login_url = compat_str(urlh.geturl())
 
         login_form = self._hidden_inputs(login_page)
@@ -111,29 +110,27 @@ class TeachableIE(TeachableBaseIE):
                     ''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
 
     _TESTS = [{
-        'url': 'https://gns3.teachable.com/courses/gns3-certified-associate/lectures/6842364',
+        'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
         'info_dict': {
-            'id': 'untlgzk1v7',
-            'ext': 'bin',
-            'title': 'Overview',
-            'description': 'md5:071463ff08b86c208811130ea1c2464c',
-            'duration': 736.4,
-            'timestamp': 1542315762,
-            'upload_date': '20181115',
-            'chapter': 'Welcome',
-            'chapter_number': 1,
+            'id': 'uzw6zw58or',
+            'ext': 'mp4',
+            'title': 'Welcome to the Course!',
+            'description': 'md5:65edb0affa582974de4625b9cdea1107',
+            'duration': 138.763,
+            'timestamp': 1479846621,
+            'upload_date': '20161122',
         },
         'params': {
             'skip_download': True,
         },
     }, {
-        'url': 'http://v1.upskillcourses.com/courses/119763/lectures/1747100',
+        'url': 'http://upskillcourses.com/courses/119763/lectures/1747100',
         'only_matching': True,
     }, {
-        'url': 'https://gns3.teachable.com/courses/423415/lectures/6885939',
+        'url': 'https://academy.gns3.com/courses/423415/lectures/6885939',
         'only_matching': True,
     }, {
-        'url': 'teachable:https://v1.upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+        'url': 'teachable:https://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
         'only_matching': True,
     }]
@@ -163,51 +160,22 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
-        wistia_urls = WistiaIE._extract_urls(webpage)
-        if not wistia_urls:
+        wistia_url = WistiaIE._extract_url(webpage)
+        if not wistia_url:
             if any(re.search(p, webpage) for p in (
                     r'class=["\']lecture-contents-locked',
                     r'>\s*Lecture contents locked',
-                    r'id=["\']lecture-locked',
-                    # https://academy.tailoredtutors.co.uk/courses/108779/lectures/1955313
-                    r'class=["\'](?:inner-)?lesson-locked',
-                    r'>LESSON LOCKED<')):
+                    r'id=["\']lecture-locked')):
                 self.raise_login_required('Lecture contents locked')
-            raise ExtractorError('Unable to find video URL')
 
         title = self._og_search_title(webpage, default=None)
 
-        chapter = None
-        chapter_number = None
-        section_item = self._search_regex(
-            r'(?s)(?P<li><li[^>]+\bdata-lecture-id=["\']%s[^>]+>.+?</li>)' % video_id,
-            webpage, 'section item', default=None, group='li')
-        if section_item:
-            chapter_number = int_or_none(self._search_regex(
-                r'data-ss-position=["\'](\d+)', section_item, 'section id',
-                default=None))
-            if chapter_number is not None:
-                sections = []
-                for s in re.findall(
-                        r'(?s)<div[^>]+\bclass=["\']section-title[^>]+>(.+?)</div>', webpage):
-                    section = strip_or_none(clean_html(s))
-                    if not section:
-                        sections = []
-                        break
-                    sections.append(section)
-                if chapter_number <= len(sections):
-                    chapter = sections[chapter_number - 1]
-
-        entries = [{
+        return {
             '_type': 'url_transparent',
             'url': wistia_url,
             'ie_key': WistiaIE.ie_key(),
             'title': title,
-            'chapter': chapter,
-            'chapter_number': chapter_number,
-        } for wistia_url in wistia_urls]
-
-        return self.playlist_result(entries, video_id, title)
+        }
 
 
 class TeachableCourseIE(TeachableBaseIE):
@@ -219,20 +187,20 @@ class TeachableCourseIE(TeachableBaseIE):
                     /(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
                     ''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
     _TESTS = [{
-        'url': 'http://v1.upskillcourses.com/courses/essential-web-developer-course/',
+        'url': 'http://upskillcourses.com/courses/essential-web-developer-course/',
         'info_dict': {
             'id': 'essential-web-developer-course',
             'title': 'The Essential Web Developer Course (Free)',
         },
         'playlist_count': 192,
     }, {
-        'url': 'http://v1.upskillcourses.com/courses/119763/',
+        'url': 'http://upskillcourses.com/courses/119763/',
         'only_matching': True,
     }, {
-        'url': 'http://v1.upskillcourses.com/courses/enrolled/119763',
+        'url': 'http://upskillcourses.com/courses/enrolled/119763',
         'only_matching': True,
     }, {
-        'url': 'https://gns3.teachable.com/courses/enrolled/423415',
+        'url': 'https://academy.gns3.com/courses/enrolled/423415',
         'only_matching': True,
     }, {
         'url': 'teachable:https://learn.vrdev.school/p/gear-vr-developer-mini',

View file

@@ -1,21 +1,13 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from .jwplatform import JWPlatformIE
 from .nexx import NexxIE
 from ..compat import compat_urlparse
-from ..utils import (
-    NO_DEFAULT,
-    smuggle_url,
-)
 
 
 class Tele5IE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tele5\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
-    _GEO_COUNTRIES = ['DE']
     _TESTS = [{
         'url': 'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416',
         'info_dict': {
@@ -28,21 +20,6 @@ class Tele5IE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
-    }, {
-        # jwplatform, nexx unavailable
-        'url': 'https://www.tele5.de/filme/ghoul-das-geheimnis-des-friedhofmonsters/',
-        'info_dict': {
-            'id': 'WJuiOlUp',
-            'ext': 'mp4',
-            'upload_date': '20200603',
-            'timestamp': 1591214400,
-            'title': 'Ghoul - Das Geheimnis des Friedhofmonsters',
-            'description': 'md5:42002af1d887ff3d5b2b3ca1f8137d97',
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'add_ie': [JWPlatformIE.ie_key()],
     }, {
         'url': 'https://www.tele5.de/kalkofes-mattscheibe/video-clips/politik-und-gesellschaft?ve_id=1551191',
         'only_matching': True,
@@ -67,42 +44,14 @@ def _real_extract(self, url):
         qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
         video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
 
-        NEXX_ID_RE = r'\d{6,}'
-        JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'
-
-        def nexx_result(nexx_id):
-            return self.url_result(
-                'https://api.nexx.cloud/v3/759/videos/byid/%s' % nexx_id,
-                ie=NexxIE.ie_key(), video_id=nexx_id)
-
-        nexx_id = jwplatform_id = None
-
-        if video_id:
-            if re.match(NEXX_ID_RE, video_id):
-                return nexx_result(video_id)
-            elif re.match(JWPLATFORM_ID_RE, video_id):
-                jwplatform_id = video_id
-
-        if not nexx_id:
+        if not video_id:
             display_id = self._match_id(url)
             webpage = self._download_webpage(url, display_id)
-
-            def extract_id(pattern, name, default=NO_DEFAULT):
-                return self._html_search_regex(
-                    (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](%s)' % pattern,
-                     r'\s+id\s*=\s*["\']player_(%s)' % pattern,
-                     r'\bdata-id\s*=\s*["\'](%s)' % pattern), webpage, name,
-                    default=default)
-
-            nexx_id = extract_id(NEXX_ID_RE, 'nexx id', default=None)
-            if nexx_id:
-                return nexx_result(nexx_id)
-
-            if not jwplatform_id:
-                jwplatform_id = extract_id(JWPLATFORM_ID_RE, 'jwplatform id')
+            video_id = self._html_search_regex(
+                (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
+                 r'\s+id\s*=\s*["\']player_(\d{6,})',
+                 r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
 
         return self.url_result(
-            smuggle_url(
-                'jwplatform:%s' % jwplatform_id,
-                {'geo_countries': self._GEO_COUNTRIES}),
-            ie=JWPlatformIE.ie_key(), video_id=jwplatform_id)
+            'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,
+            ie=NexxIE.ie_key(), video_id=video_id)

View file

@@ -11,7 +11,6 @@
     determine_ext,
     int_or_none,
     str_or_none,
-    try_get,
     urljoin,
 )
@@ -25,7 +24,7 @@ class TelecincoIE(InfoExtractor):
         'info_dict': {
             'id': '1876350223',
             'title': 'Bacalao con kokotxas al pil-pil',
-            'description': 'md5:716caf5601e25c3c5ab6605b1ae71529',
+            'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
         },
         'playlist': [{
             'md5': 'adb28c37238b675dad0f042292f209a7',
@@ -56,26 +55,6 @@ class TelecincoIE(InfoExtractor):
             'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
             'duration': 50,
         },
-    }, {
-        # video in opening's content
-        'url': 'https://www.telecinco.es/vivalavida/fiorella-sobrina-edmundo-arrocet-entrevista_18_2907195140.html',
-        'info_dict': {
-            'id': '2907195140',
-            'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
-            'description': 'md5:73f340a7320143d37ab895375b2bf13a',
-        },
-        'playlist': [{
-            'md5': 'adb28c37238b675dad0f042292f209a7',
-            'info_dict': {
-                'id': 'TpI2EttSDAReWpJ1o0NVh2',
-                'ext': 'mp4',
-                'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
-                'duration': 1015,
-            },
-        }],
-        'params': {
-            'skip_download': True,
-        },
     }, {
         'url': 'http://www.telecinco.es/informativos/nacional/Pablo_Iglesias-Informativos_Telecinco-entrevista-Pedro_Piqueras_2_1945155182.html',
         'only_matching': True,
@@ -156,27 +135,16 @@ def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
         article = self._parse_json(self._search_regex(
-            r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=\s*({.+})',
+            r'window\.\$REACTBASE_STATE\.article\s*=\s*({.+})',
             webpage, 'article'), display_id)['article']
         title = article.get('title')
-        description = clean_html(article.get('leadParagraph')) or ''
+        description = clean_html(article.get('leadParagraph'))
         if article.get('editorialType') != 'VID':
             entries = []
-            body = [article.get('opening')]
-            body.extend(try_get(article, lambda x: x['body'], list) or [])
-            for p in body:
-                if not isinstance(p, dict):
-                    continue
+            for p in article.get('body', []):
                 content = p.get('content')
-                if not content:
+                if p.get('type') != 'video' or not content:
                     continue
-                type_ = p.get('type')
-                if type_ == 'paragraph':
-                    content_str = str_or_none(content)
-                    if content_str:
-                        description += content_str
-                    continue
-                if type_ == 'video' and isinstance(content, dict):
-                    entries.append(self._parse_content(content, url))
+                entries.append(self._parse_content(content, url))
             return self.playlist_result(
                 entries, str_or_none(article.get('id')), title, description)

View file

@@ -38,6 +38,8 @@ class TeleQuebecIE(TeleQuebecBaseIE):
             'ext': 'mp4',
             'title': 'Un petit choc et puis repart!',
             'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
+            'upload_date': '20180222',
+            'timestamp': 1519326631,
         },
         'params': {
             'skip_download': True,

View file

@@ -10,8 +10,8 @@
 
 class TenPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/]+/)+(?P<id>tpv\d{6}[a-z]{5})'
-    _TESTS = [{
+    _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/[^/]+/episodes/[^/]+/[^/]+/(?P<id>tpv\d{6}[a-z]{5})'
+    _TEST = {
         'url': 'https://10play.com.au/masterchef/episodes/season-1/masterchef-s1-ep-1/tpv190718kwzga',
         'info_dict': {
             'id': '6060533435001',
@@ -27,10 +27,7 @@ class TenPlayIE(InfoExtractor):
             'format': 'bestvideo',
             'skip_download': True,
         }
-    }, {
-        'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
-        'only_matching': True,
-    }]
+    }
     BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
 
     def _real_extract(self, url):

View file

@@ -17,12 +17,14 @@ class TFOIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
-        'md5': 'cafbe4f47a8dae0ca0159937878100d6',
+        'md5': '47c987d0515561114cf03d1226a9d4c7',
         'info_dict': {
-            'id': '7da3d50e495c406b8fc0b997659cc075',
+            'id': '100463871',
             'ext': 'mp4',
             'title': 'Video Game Hackathon',
             'description': 'md5:558afeba217c6c8d96c60e5421795c07',
+            'upload_date': '20160212',
+            'timestamp': 1455310233,
         }
     }

View file

@@ -2,46 +2,43 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import try_get
 
 
 class ThisOldHouseIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode|(?:[^/]+/)?\d+)/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
+        'md5': '568acf9ca25a639f0c4ff905826b662f',
         'info_dict': {
-            'id': '5dcdddf673c3f956ef5db202',
+            'id': '2REGtUDQ',
             'ext': 'mp4',
             'title': 'How to Build a Storage Bench',
             'description': 'In the workshop, Tom Silva and Kevin O\'Connor build a storage bench for an entryway.',
             'timestamp': 1442548800,
             'upload_date': '20150918',
-        },
-        'params': {
-            'skip_download': True,
-        },
+        }
     }, {
         'url': 'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
         'only_matching': True,
     }, {
         'url': 'https://www.thisoldhouse.com/tv-episode/ask-toh-shelf-rough-electric',
         'only_matching': True,
-    }, {
-        'url': 'https://www.thisoldhouse.com/furniture/21017078/how-to-build-a-storage-bench',
-        'only_matching': True,
-    }, {
-        'url': 'https://www.thisoldhouse.com/21113884/s41-e13-paradise-lost',
-        'only_matching': True,
-    }, {
-        # iframe www.thisoldhouse.com
-        'url': 'https://www.thisoldhouse.com/21083431/seaside-transformation-the-westerly-project',
-        'only_matching': True,
     }]
-    _ZYPE_TMPL = 'https://player.zype.com/embed/%s.html?api_key=hsOk_yMSPYNrT22e9pu8hihLXjaZf0JW5jsOWv4ZqyHJFvkJn6rtToHl09tbbsbe'
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
 
         webpage = self._download_webpage(url, display_id)
 
         video_id = self._search_regex(
-            r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})',
-            webpage, 'video id')
-        return self.url_result(self._ZYPE_TMPL % video_id, 'Zype', video_id)
+            (r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1',
+             r'id=(["\'])inline-video-player-(?P<id>(?:(?!\1).)+)\1'),
+            webpage, 'video id', default=None, group='id')
+        if not video_id:
+            drupal_settings = self._parse_json(self._search_regex(
+                r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
+                webpage, 'drupal settings'), display_id)
+            video_id = try_get(
+                drupal_settings, lambda x: x['jwplatform']['video_id'],
+                compat_str) or list(drupal_settings['comScore'])[0]
+        return self.url_result('jwplatform:' + video_id, 'JWPlatform', video_id)

View file

@@ -17,9 +17,9 @@
 
 class ToggleIE(InfoExtractor):
     IE_NAME = 'toggle'
-    _VALID_URL = r'https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://video\.toggle\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
     _TESTS = [{
-        'url': 'http://www.mewatch.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
+        'url': 'http://video.toggle.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
         'info_dict': {
             'id': '343115',
             'ext': 'mp4',
@@ -33,7 +33,7 @@ class ToggleIE(InfoExtractor):
         }
     }, {
         'note': 'DRM-protected video',
-        'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
+        'url': 'http://video.toggle.sg/en/movies/dug-s-special-mission/341413',
         'info_dict': {
             'id': '341413',
             'ext': 'wvm',
@@ -48,7 +48,7 @@ class ToggleIE(InfoExtractor):
     }, {
         # this also tests correct video id extraction
         'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
-        'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
+        'url': 'http://video.toggle.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
         'info_dict': {
             'id': '332861',
             'ext': 'mp4',
@@ -65,22 +65,19 @@ class ToggleIE(InfoExtractor):
         'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
         'only_matching': True,
     }, {
-        'url': 'http://www.mewatch.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
+        'url': 'http://video.toggle.sg/zh/series/zero-calling-s2-hd/ep13/336367',
         'only_matching': True,
     }, {
-        'url': 'http://www.mewatch.sg/zh/series/zero-calling-s2-hd/ep13/336367',
+        'url': 'http://video.toggle.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
         'only_matching': True,
     }, {
-        'url': 'http://www.mewatch.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
+        'url': 'http://video.toggle.sg/en/movies/seven-days/321936',
         'only_matching': True,
     }, {
-        'url': 'http://www.mewatch.sg/en/movies/seven-days/321936',
+        'url': 'https://video.toggle.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
         'only_matching': True,
     }, {
-        'url': 'https://www.mewatch.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.mewatch.sg/en/channels/eleven-plus/401585',
+        'url': 'http://video.toggle.sg/en/channels/eleven-plus/401585',
         'only_matching': True,
     }]

Some files were not shown because too many files have changed in this diff.