Merge branch 'master' into akamai-fix

nixxo 2021-01-07 16:49:07 +01:00 committed by GitHub
commit 1c3a61baae
139 changed files with 7582 additions and 4316 deletions

@ -21,15 +21,15 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2020.10.31. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.07. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/blackjack4494/yt-dlc. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/pukkandan/yt-dlc.
- Search the bugtracker for similar issues: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar issues: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a broken site support - [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dlc version **2020.10.31** - [ ] I've verified that I'm running youtube-dlc version **2021.01.07**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones - [ ] I've searched the bugtracker for similar issues including closed ones
@ -44,7 +44,7 @@ ## Verbose log
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dlc version 2020.10.31 [debug] youtube-dlc version 2021.01.07
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}
@ -53,7 +53,11 @@ ## Verbose log
``` ```
PASTE VERBOSE LOG HERE PASTE VERBOSE LOG HERE
``` ```
<!--
Do not remove the above ```
-->
## Description ## Description

@ -21,15 +21,15 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2020.10.31. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.07. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/blackjack4494/yt-dlc. youtube-dlc does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights. - Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/pukkandan/yt-dlc. youtube-dlc does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar site support requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a new site support request - [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dlcc version **2020.10.31** - [ ] I've verified that I'm running youtube-dlc version **2021.01.07**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights - [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones - [ ] I've searched the bugtracker for similar site support requests including closed ones

@ -21,20 +21,20 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2020.10.31. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.07. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar site feature requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a site feature request - [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dlc version **2020.10.31** - [ ] I've verified that I'm running youtube-dlc version **2021.01.07**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones - [ ] I've searched the bugtracker for similar site feature requests including closed ones
## Description ## Description
<!-- <!--
Provide an explanation of your site feature request in an arbitrary form. Please make sure the description is worded well enough to be understood, see https://github.com/ytdl-org/youtube-dlc#is-the-description-of-the-issue-itself-sufficient. Provide any additional information, suggested solution and as much context and examples as possible. Provide an explanation of your site feature request in an arbitrary form. Please make sure the description is worded well enough to be understood, see https://github.com/ytdl-org/youtube-dl#is-the-description-of-the-issue-itself-sufficient. Provide any additional information, suggested solution and as much context and examples as possible.
--> -->
WRITE DESCRIPTION HERE WRITE DESCRIPTION HERE

@ -21,16 +21,16 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2020.10.31. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.07. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/blackjack4494/yt-dlc. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/pukkandan/yt-dlc.
- Search the bugtracker for similar issues: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar issues: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Read bugs section in FAQ: https://github.com/blackjack4494/yt-dlc - Read bugs section in FAQ: https://github.com/pukkandan/yt-dlc
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a broken site support issue - [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dlc version **2020.10.31** - [ ] I've verified that I'm running youtube-dlc version **2021.01.07**
- [ ] I've checked that all provided URLs are alive and playable in a browser - [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones - [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -46,7 +46,7 @@ ## Verbose log
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dlc version 2020.10.31 [debug] youtube-dlc version 2021.01.07
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}
@ -55,13 +55,17 @@ ## Verbose log
``` ```
PASTE VERBOSE LOG HERE PASTE VERBOSE LOG HERE
``` ```
<!--
Do not remove the above ```
-->
## Description ## Description
<!-- <!--
Provide an explanation of your issue in an arbitrary form. Please make sure the description is worded well enough to be understood, see https://github.com/ytdl-org/youtube-dlc#is-the-description-of-the-issue-itself-sufficient. Provide any additional information, suggested solution and as much context and examples as possible. Provide an explanation of your issue in an arbitrary form. Please make sure the description is worded well enough to be understood, see https://github.com/ytdl-org/youtube-dl#is-the-description-of-the-issue-itself-sufficient. Provide any additional information, suggested solution and as much context and examples as possible.
If work on your issue requires account credentials please provide them or explain how one can obtain them. If work on your issue requires account credentials please provide them or explain how one can obtain them.
--> -->

@ -21,20 +21,20 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2020.10.31. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is 2021.01.07. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar feature requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a feature request - [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dlc version **2020.10.31** - [ ] I've verified that I'm running youtube-dlc version **2021.01.07**
- [ ] I've searched the bugtracker for similar feature requests including closed ones - [ ] I've searched the bugtracker for similar feature requests including closed ones
## Description ## Description
<!-- <!--
Provide an explanation of your issue in an arbitrary form. Please make sure the description is worded well enough to be understood, see https://github.com/ytdl-org/youtube-dlc#is-the-description-of-the-issue-itself-sufficient. Provide any additional information, suggested solution and as much context and examples as possible. Provide an explanation of your issue in an arbitrary form. Please make sure the description is worded well enough to be understood, see https://github.com/ytdl-org/youtube-dl#is-the-description-of-the-issue-itself-sufficient. Provide any additional information, suggested solution and as much context and examples as possible.
--> -->
WRITE DESCRIPTION HERE WRITE DESCRIPTION HERE

@ -23,7 +23,7 @@ ## Checklist
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- Look through the README (https://github.com/blackjack4494/yt-dlc) and FAQ (https://github.com/blackjack4494/yt-dlc) for similar questions - Look through the README (https://github.com/blackjack4494/yt-dlc) and FAQ (https://github.com/blackjack4494/yt-dlc) for similar questions
- Search the bugtracker for similar questions: https://github.com/blackjack4494/yt-dlc - Search the bugtracker for similar questions: https://github.com/blackjack4494/yt-dlc
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm asking a question - [ ] I'm asking a question

@ -1,7 +1,10 @@
--- ---
name: Broken site support name: Broken site support
about: Report broken or misfunctioning site about: Report broken or misfunctioning site
title: '' title: "[Broken]"
labels: Broken
assignees: ''
--- ---
<!-- <!--
@ -18,11 +21,11 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/blackjack4494/yt-dlc. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/pukkandan/yt-dlc.
- Search the bugtracker for similar issues: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar issues: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a broken site support - [ ] I'm reporting a broken site support
@ -50,7 +53,11 @@ ## Verbose log
``` ```
PASTE VERBOSE LOG HERE PASTE VERBOSE LOG HERE
``` ```
<!--
Do not remove the above ```
-->
## Description ## Description

@ -1,8 +1,10 @@
--- ---
name: Site support request name: Site support request
about: Request support for a new site about: Request support for a new site
title: '' title: "[Site Request]"
labels: 'site-support-request' labels: Request
assignees: ''
--- ---
<!-- <!--
@ -19,11 +21,11 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/blackjack4494/yt-dlc. youtube-dlc does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights. - Make sure that site you are requesting is not dedicated to copyright infringement, see https://github.com/pukkandan/yt-dlc. youtube-dlc does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar site support requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a new site support request - [ ] I'm reporting a new site support request

@ -1,7 +1,10 @@
--- ---
name: Site feature request name: Site feature request
about: Request a new functionality for a site about: Request a new functionality for a site
title: '' title: "[Site Request]"
labels: Request
assignees: ''
--- ---
<!-- <!--
@ -18,9 +21,9 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar site feature requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a site feature request - [ ] I'm reporting a site feature request

@ -2,6 +2,9 @@
name: Bug report name: Bug report
about: Report a bug unrelated to any particular site or extractor about: Report a bug unrelated to any particular site or extractor
title: '' title: ''
labels: ''
assignees: ''
--- ---
<!-- <!--
@ -18,12 +21,12 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser. - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/blackjack4494/yt-dlc. - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in https://github.com/pukkandan/yt-dlc.
- Search the bugtracker for similar issues: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar issues: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Read bugs section in FAQ: https://github.com/blackjack4494/yt-dlc - Read bugs section in FAQ: https://github.com/pukkandan/yt-dlc
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a broken site support issue - [ ] I'm reporting a broken site support issue
@ -52,7 +55,11 @@ ## Verbose log
``` ```
PASTE VERBOSE LOG HERE PASTE VERBOSE LOG HERE
``` ```
<!--
Do not remove the above ```
-->
## Description ## Description

@ -1,8 +1,10 @@
--- ---
name: Feature request name: Feature request
about: Request a new functionality unrelated to any particular site or extractor about: Request a new functionality unrelated to any particular site or extractor
title: '' title: "[Feature Request]"
labels: 'request' labels: Request
assignees: ''
--- ---
<!-- <!--
@ -19,9 +21,9 @@ ## Checklist
<!-- <!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dlc:
- First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/blackjack4494/yt-dlc on how to update. Issues with outdated version will be REJECTED. - First of, make sure you are using the latest version of youtube-dlc. Run `youtube-dlc --version` and ensure your version is %(version)s. If it's not, see https://github.com/pukkandan/yt-dlc on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: https://github.com/blackjack4494/yt-dlc. DO NOT post duplicates. - Search the bugtracker for similar feature requests: https://github.com/pukkandan/yt-dlc. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x]) - Finally, put x into all relevant boxes like this [x] (Dont forget to delete the empty space)
--> -->
- [ ] I'm reporting a feature request - [ ] I'm reporting a feature request

@ -8,7 +8,7 @@ ## Please follow the guide below
### Before submitting a *pull request* make sure you have: ### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/ytdl-org/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/ytdl-org/youtube-dl#youtube-dl-coding-conventions) sections - [ ] At least skimmed through [adding new extractor tutorial](https://github.com/ytdl-org/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/ytdl-org/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/ytdl-org/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests - [ ] [Searched](https://github.com/pukkandan/yt-dlc/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
- [ ] Checked the code with [flake8](https://pypi.python.org/pypi/flake8) - [ ] Checked the code with [flake8](https://pypi.python.org/pypi/flake8)
### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options: ### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:

@ -58,18 +58,18 @@ jobs:
env: env:
SHA2: ${{ hashFiles('youtube-dlc') }} SHA2: ${{ hashFiles('youtube-dlc') }}
run: echo "::set-output name=sha2_unix::$SHA2" run: echo "::set-output name=sha2_unix::$SHA2"
- name: Install dependencies for pypi # - name: Install dependencies for pypi
run: | # run: |
python -m pip install --upgrade pip # python -m pip install --upgrade pip
pip install setuptools wheel twine # pip install setuptools wheel twine
- name: Build and publish # - name: Build and publish
env: # env:
TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} # TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} # TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
run: | # run: |
rm -rf dist/* # rm -rf dist/*
python setup.py sdist bdist_wheel # python setup.py sdist bdist_wheel
twine upload dist/* # twine upload dist/*
build_windows: build_windows:

.github/workflows/ci.yml (new file)

@ -0,0 +1,75 @@
name: CI
on: [push]
jobs:
  tests:
    name: Tests
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: true
      matrix:
        os: [ubuntu-latest]
        # TODO: python 2.6
        # 3.3, 3.4 are not running
        python-version: [2.7, 3.5, 3.6, 3.7, 3.8, 3.9, pypy-2.7, pypy-3.6, pypy-3.7]
        python-impl: [cpython]
        ytdl-test-set: [core, download]
        run-tests-ext: [sh]
        include:
        # python 3.2 is only available on windows via setup-python
        - os: windows-latest
          python-version: 3.2
          python-impl: cpython
          ytdl-test-set: core
          run-tests-ext: bat
        - os: windows-latest
          python-version: 3.2
          python-impl: cpython
          ytdl-test-set: download
          run-tests-ext: bat
        # jython
        - os: ubuntu-latest
          python-impl: jython
          ytdl-test-set: core
          run-tests-ext: sh
        - os: ubuntu-latest
          python-impl: jython
          ytdl-test-set: download
          run-tests-ext: sh
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      if: ${{ matrix.python-impl == 'cpython' }}
      with:
        python-version: ${{ matrix.python-version }}
    - name: Set up Java 8
      if: ${{ matrix.python-impl == 'jython' }}
      uses: actions/setup-java@v1
      with:
        java-version: 8
    - name: Install Jython
      if: ${{ matrix.python-impl == 'jython' }}
      run: |
        wget http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
        java -jar jython-installer.jar -s -d "$HOME/jython"
        echo "$HOME/jython/bin" >> $GITHUB_PATH
    - name: Install nose
      run: pip install nose
    - name: Run tests
      continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
      env:
        YTDL_TEST_SET: ${{ matrix.ytdl-test-set }}
      run: ./devscripts/run_tests.${{ matrix.run-tests-ext }}
  flake8:
    name: Linter
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
    - name: Install flake8
      run: pip install flake8
    - name: Run flake8
      run: flake8 .

.gitignore

@ -8,6 +8,7 @@ py2exe.log
*.kate-swp *.kate-swp
build/ build/
dist/ dist/
zip/
MANIFEST MANIFEST
README.txt README.txt
youtube-dl.1 youtube-dl.1
@ -46,6 +47,7 @@ updates_key.pem
*.part *.part
*.ytdl *.ytdl
*.swp *.swp
*.spec
test/local_parameters.json test/local_parameters.json
.tox .tox
youtube-dl.zsh youtube-dl.zsh
@ -62,3 +64,5 @@ venv/
.vscode .vscode
cookies.txt cookies.txt
*.sublime-workspace

AUTHORS-Fork (new file)

@ -0,0 +1,3 @@
pukkandan
h-h-h-h
pauldubois98

@ -1,4 +1,5 @@
all: youtube-dlc README.md CONTRIBUTING.md README.txt youtube-dlc.1 youtube-dlc.bash-completion youtube-dlc.zsh youtube-dlc.fish supportedsites all: youtube-dlc README.md CONTRIBUTING.md README.txt issuetemplates youtube-dlc.1 youtube-dlc.bash-completion youtube-dlc.zsh youtube-dlc.fish supportedsites
doc: youtube-dlc README.md CONTRIBUTING.md issuetemplates supportedsites
clean: clean:
rm -rf youtube-dlc.1.temp.md youtube-dlc.1 youtube-dlc.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dlc.tar.gz youtube-dlc.zsh youtube-dlc.fish youtube_dlc/extractor/lazy_extractors.py *.dump *.part* *.ytdl *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.ape *.swf *.jpg *.png CONTRIBUTING.md.tmp youtube-dlc youtube-dlc.exe rm -rf youtube-dlc.1.temp.md youtube-dlc.1 youtube-dlc.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dlc.tar.gz youtube-dlc.zsh youtube-dlc.fish youtube_dlc/extractor/lazy_extractors.py *.dump *.part* *.ytdl *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.ape *.swf *.jpg *.png CONTRIBUTING.md.tmp youtube-dlc youtube-dlc.exe

581
README.md

@ -1,68 +1,100 @@
[![Build Status](https://travis-ci.com/blackjack4494/yt-dlc.svg?branch=master)](https://travis-ci.com/blackjack4494/yt-dlc) [![Build Status](https://github.com/pukkandan/yt-dlc/workflows/CI/badge.svg)](https://github.com/pukkandan/yt-dlc/actions?query=workflow%3ACI)
[![PyPi](https://img.shields.io/pypi/v/youtube-dlc.svg)](https://pypi.org/project/youtube-dlc) [![Release Version](https://img.shields.io/badge/Release-2021.01.07-brightgreen)](https://github.com/pukkandan/yt-dlc/releases/latest)
[![License: Unlicense](https://img.shields.io/badge/License-Unlicense-blue.svg)](https://github.com/pukkandan/yt-dlc/blob/master/LICENSE)
[![Gitter chat](https://img.shields.io/gitter/room/youtube-dlc/community)](https://gitter.im/youtube-dlc) youtube-dlc - download videos from youtube.com and many other [video platforms](docs/supportedsites.md)
[![License: Unlicense](https://img.shields.io/badge/license-Unlicense-blue.svg)](https://github.com/blackjack4494/yt-dlc/blob/master/LICENSE)
youtube-dlc - download videos from youtube.com or other video platforms. This is a fork of [youtube-dlc](https://github.com/blackjack4494/yt-dlc) which is in turn a fork of [youtube-dl](https://github.com/ytdl-org/youtube-dl)
youtube-dlc is a fork of youtube-dl with the intention of getting features tested by the community merged in the tool faster, since youtube-dl's development seems to be slowing down. (https://web.archive.org/web/20201014194602/https://github.com/ytdl-org/youtube-dl/issues/26462) * [CHANGES FROM YOUTUBE-DLC](#changes)
* [INSTALLATION](#installation)
* [UPDATE](#update)
* [COMPILE](#compile)
* [YOUTUBE-DLC](#youtube-dlc)
* [DESCRIPTION](#description)
* [OPTIONS](#options)
* [Network Options](#network-options)
* [Geo Restriction](#geo-restriction)
* [Video Selection](#video-selection)
* [Download Options](#download-options)
* [Filesystem Options](#filesystem-options)
* [Thumbnail images](#thumbnail-images)
* [Internet Shortcut Options](#internet-shortcut-options)
* [Verbosity / Simulation Options](#verbosity--simulation-options)
* [Workarounds](#workarounds)
* [Video Format Options](#video-format-options)
* [Subtitle Options](#subtitle-options)
* [Authentication Options](#authentication-options)
* [Adobe Pass Options](#adobe-pass-options)
* [Post-processing Options](#post-processing-options)
* [SponSkrub Options (SponsorBlock)](#sponskrub-options-sponsorblock)
* [Extractor Options](#extractor-options)
* [CONFIGURATION](#configuration)
* [Authentication with .netrc file](#authentication-with-netrc-file)
* [OUTPUT TEMPLATE](#output-template)
* [Output template and Windows batch files](#output-template-and-windows-batch-files)
* [Output template examples](#output-template-examples)
* [FORMAT SELECTION](#format-selection)
* [Filtering Formats](#filtering-formats)
* [Sorting Formats](#sorting-formats)
* [Format Selection examples](#format-selection-examples)
* [VIDEO SELECTION](#video-selection-1)
* [MORE](#more)
- [INSTALLATION](#installation)
- [UPDATE](#update) # CHANGES
- [DESCRIPTION](#description) See [commits](https://github.com/pukkandan/yt-dlc/commits) for more details
- [OPTIONS](#options)
- [Network Options:](#network-options) ### 2021.01.05
- [Geo Restriction:](#geo-restriction) * **Format Sort:** Added `--format-sort` (`-S`), `--format-sort-force` (`--S-force`) - See [Sorting Formats](#sorting-formats) for details
- [Video Selection:](#video-selection) * **Format Selection:** See [Format Selection](#format-selection) for details
- [Download Options:](#download-options) * New format selectors: `best*`, `worst*`, `bestvideo*`, `bestaudio*`, `worstvideo*`, `worstaudio*`
- [Filesystem Options:](#filesystem-options) * Changed video format sorting to show video only files and video+audio files together.
- [Thumbnail images:](#thumbnail-images) * Added `--video-multistreams`, `--no-video-multistreams`, `--audio-multistreams`, `--no-audio-multistreams`
- [Verbosity / Simulation Options:](#verbosity--simulation-options) * Added `b`,`w`,`v`,`a` as alias for `best`, `worst`, `video` and `audio` respectively
- [Workarounds:](#workarounds) * **Shortcut Options:** Added `--write-link`, `--write-url-link`, `--write-webloc-link`, `--write-desktop-link` by @h-h-h-h - See [Internet Shortcut Options](#internet-shortcut-options) for details
- [Video Format Options:](#video-format-options) * **Sponskrub integration:** Added `--sponskrub`, `--sponskrub-cut`, `--sponskrub-force`, `--sponskrub-location`, `--sponskrub-args` - See [SponSkrub Options](#sponskrub-options-sponsorblock) for details
- [Subtitle Options:](#subtitle-options) * Added `--force-download-archive` (`--force-write-archive`) by by h-h-h-h
- [Authentication Options:](#authentication-options) * Added `--list-formats-as-table`, `--list-formats-old`
- [Adobe Pass Options:](#adobe-pass-options) * **Negative Options:** Makes it possible to negate boolean options by adding a `no-` to the switch
- [Post-processing Options:](#post-processing-options) * Added `--no-ignore-dynamic-mpd`, `--no-allow-dynamic-mpd`, `--allow-dynamic-mpd`, `--youtube-include-hls-manifest`, `--no-youtube-include-hls-manifest`, `--no-youtube-skip-hls-manifest`, `--no-download`, `--no-download-archive`, `--resize-buffer`, `--part`, `--mtime`, `--no-keep-fragments`, `--no-cookies`, `--no-write-annotations`, `--no-write-info-json`, `--no-write-description`, `--no-write-thumbnail`, `--youtube-include-dash-manifest`, `--post-overwrites`, `--no-keep-video`, `--no-embed-subs`, `--no-embed-thumbnail`, `--no-add-metadata`, `--no-include-ads`, `--no-write-sub`, `--no-write-auto-sub`, `--no-playlist-reverse`, `--no-restrict-filenames`, `--youtube-include-dash-manifest`, `--no-format-sort-force`, `--flat-videos`, `--no-list-formats-as-table`, `--no-sponskrub`, `--no-sponskrub-cut`, `--no-sponskrub-force`
- [Extractor Options:](#extractor-options) * Renamed: `--write-subs`, --no-write-subs`, `--no-write-auto-subs, `--write-auto-subs`. Note that these can still be used without the ending "s"
- [CONFIGURATION](#configuration) * Relaxed validation for format filters so that any arbitrary field can be used
- [Authentication with `.netrc` file](#authentication-with-netrc-file) * Fix for embedding thumbnail in mp3 by @pauldubois98
- [OUTPUT TEMPLATE](#output-template) * Make Twitch Video ID output from Playlist and VOD extractor same. This is only a temporary fix
- [Output template and Windows batch files](#output-template-and-windows-batch-files) * **Merge youtube-dl:** Upto [2020.01.03](https://github.com/ytdl-org/youtube-dl/commit/8e953dcbb10a1a42f4e12e4e132657cb0100a1f8) - See [blackjack4494/yt-dlc#280](https://github.com/blackjack4494/yt-dlc/pull/280) for details
- [Output template examples](#output-template-examples) * Cleaned up the fork for public use
- [FORMAT SELECTION](#format-selection)
- [Format selection examples](#format-selection-examples) ### 2021.01.05-2
- [VIDEO SELECTION](#video-selection-1) * **Changed defaults:**
* Enabled `--ignore`
* Disabled `--video-multistreams` and `--audio-multistreams`
* Changed default format selection to `bv*+ba/b` when `--audio-multistreams` is disabled
* Changed default format sort order to `res,fps,codec,size,br,asr,proto,ext,has_audio,source,format_id`
* Changed `webm` to be more preferable than `flv` in format sorting
* Changed default output template to `%(title)s [%(id)s].%(ext)s`
* Enabled `--list-formats-as-table`
### 2021.01.07
* Removed priority of `av01` codec in `-S` since most devices don't support it yet
* Added `duration_string` to be used in `--output`
* Created First Release
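
The default output template listed under 2021.01.05-2 above, `%(title)s [%(id)s].%(ext)s`, uses Python's `%`-style named placeholders. The following is only a minimal sketch of how such a template expands, assuming a hand-written metadata dict. The `render_filename` helper and the sample values are hypothetical; the real program fills these fields from the extractor and also sanitizes them.

```python
# Default template from the 2021.01.05-2 changelog entry.
DEFAULT_TEMPLATE = '%(title)s [%(id)s].%(ext)s'


def render_filename(template, info):
    """Expand %-style named placeholders using the info mapping.

    Hypothetical helper for illustration; not part of youtube-dlc.
    """
    return template % info


# Sample metadata (made-up values).
info = {'title': 'Example clip', 'id': 'abc123', 'ext': 'mp4'}
print(render_filename(DEFAULT_TEMPLATE, info))
```

A field that is missing from the mapping raises `KeyError` in this sketch, which is one reason the real tool substitutes placeholder values instead of formatting the raw dict directly.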
# INSTALLATION # INSTALLATION
[How to update](#update)
**All Platforms** To use the latest version, simply download and run the [latest release](https://github.com/pukkandan/yt-dlc/releases/latest).
Preferred way using pip: Currently, there is no support for any package managers.
You may want to use `python3` instead of `python`
python -m pip install --upgrade youtube-dlc
If you want to install the current master branch If you want to install the current master branch
python -m pip install git+https://github.com/blackjack4494/yt-dlc python -m pip install git+https://github.com/pukkandan/yt-dlc
**UNIX** (Linux, macOS, etc.) ### UPDATE
Using wget: **DO NOT UPDATE using `-U`!** Instead, download the binaries again
sudo wget https://github.com/blackjack4494/yt-dlc/releases/latest/download/youtube-dlc -O /usr/local/bin/youtube-dlc ### COMPILE
sudo chmod a+rx /usr/local/bin/youtube-dlc
Using curl: **For Windows**:
sudo curl -L https://github.com/blackjack4494/yt-dlc/releases/latest/download/youtube-dlc -o /usr/local/bin/youtube-dlc
sudo chmod a+rx /usr/local/bin/youtube-dlc
**Windows** users can download [youtube-dlc.exe](https://github.com/blackjack4494/yt-dlc/releases/latest/download/youtube-dlc.exe) (**do not** put in `C:\Windows\System32`!).
**Compile**
To build the Windows executable yourself (without version info!) To build the Windows executable yourself (without version info!)
python -m pip install --upgrade pyinstaller python -m pip install --upgrade pyinstaller
@ -74,7 +106,7 @@ # INSTALLATION
New way to build Windows is to use `python pyinst.py` (please use python3 64Bit) New way to build Windows is to use `python pyinst.py` (please use python3 64Bit)
For 32Bit Version use a 32Bit Version of python (3 preferred here as well) and run `python pyinst32.py` For 32Bit Version use a 32Bit Version of python (3 preferred here as well) and run `python pyinst32.py`
For Unix: **For Unix**:
You will need the required build tools You will need the required build tools
python, make (GNU), pandoc, zip, nosetests python, make (GNU), pandoc, zip, nosetests
Then simply type this Then simply type this
@ -82,26 +114,27 @@ # INSTALLATION
make make
# UPDATE
**DO NOT UPDATE using `-U` !** instead download binaries again or when installed with pip use a described above when installing.
I will add some memorable short links to the binaries so you can download them easier.
# DESCRIPTION # DESCRIPTION
**youtube-dlc** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like. **youtube-dlc** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
youtube-dlc [OPTIONS] URL [URL...] youtube-dlc [OPTIONS] URL [URL...]
# OPTIONS # OPTIONS
`Ctrl+F` is your friend :D
<!-- Autogenerated -->
## General Options:
-h, --help Print this help text and exit -h, --help Print this help text and exit
--version Print program version and exit --version Print program version and exit
-U, --update Update this program to latest version. Make -U, --update [BROKEN] Update this program to latest
sure that you have sufficient permissions version. Make sure that you have sufficient
(run with sudo if needed) permissions (run with sudo if needed)
-i, --ignore-errors Continue on download errors, for example to -i, --ignore-errors Continue on download errors, for example to
skip unavailable videos in a playlist skip unavailable videos in a playlist
--abort-on-error Abort downloading of further videos (in the (default) (Same as --no-abort-on-error)
playlist or the command line) if an error --abort-on-error Abort downloading of further videos if an
occurs error occurs (Same as --no-ignore-errors)
--dump-user-agent Display the current browser identification --dump-user-agent Display the current browser identification
--list-extractors List all supported extractors --list-extractors List all supported extractors
--extractor-descriptions Output descriptions of all supported --extractor-descriptions Output descriptions of all supported
@ -110,26 +143,28 @@ # OPTIONS
extractor extractor
--default-search PREFIX Use this prefix for unqualified URLs. For --default-search PREFIX Use this prefix for unqualified URLs. For
example "gvsearch2:" downloads two videos example "gvsearch2:" downloads two videos
from google videos for youtube-dlc "large from google videos for youtube-dl "large
apple". Use the value "auto" to let apple". Use the value "auto" to let
youtube-dlc guess ("auto_warning" to emit a youtube-dl guess ("auto_warning" to emit a
warning when guessing). "error" just throws warning when guessing). "error" just throws
an error. The default value "fixup_error" an error. The default value "fixup_error"
repairs broken URLs, but emits an error if repairs broken URLs, but emits an error if
this is not possible instead of searching. this is not possible instead of searching.
--ignore-config Do not read configuration files. When given --ignore-config, --no-config Do not read configuration files. When given
in the global configuration file in the global configuration file
/etc/youtube-dlc.conf: Do not read the user /etc/youtube-dl.conf: Do not read the user
configuration in ~/.config/youtube- configuration in ~/.config/youtube-
dlc/config (%APPDATA%/youtube- dl/config (%APPDATA%/youtube-dl/config.txt
dlc/config.txt on Windows) on Windows)
--config-location PATH Location of the configuration file; either --config-location PATH Location of the configuration file; either
the path to the config or its containing the path to the config or its containing
directory. directory.
--flat-playlist Do not extract the videos of a playlist, --flat-playlist Do not extract the videos of a playlist,
only list them. only list them.
--flat-videos Do not resolve the video urls
--no-flat-playlist Extract the videos of a playlist
--mark-watched Mark videos watched (YouTube only) --mark-watched Mark videos watched (YouTube only)
--no-mark-watched Do not mark videos watched (YouTube only) --no-mark-watched Do not mark videos watched
--no-color Do not emit color codes in output --no-color Do not emit color codes in output
## Network Options: ## Network Options:
@ -180,11 +215,15 @@ ## Video Selection:
SIZE (e.g. 50k or 44.6m) SIZE (e.g. 50k or 44.6m)
--max-filesize SIZE Do not download any videos larger than SIZE --max-filesize SIZE Do not download any videos larger than SIZE
(e.g. 50k or 44.6m) (e.g. 50k or 44.6m)
--date DATE Download only videos uploaded in this date --date DATE Download only videos uploaded in this date.
The date can be "YYYYMMDD" or in the format
"(now|today)[+-][0-9](day|week|month|year)(s)?"
--datebefore DATE Download only videos uploaded on or before --datebefore DATE Download only videos uploaded on or before
this date (i.e. inclusive) this date. The accepted date formats are the
same as for --date
--dateafter DATE Download only videos uploaded on or after --dateafter DATE Download only videos uploaded on or after
this date (i.e. inclusive) this date. The accepted date formats are the
same as for --date
--min-views COUNT Do not download any videos with less than --min-views COUNT Do not download any videos with less than
COUNT views COUNT views
--max-views COUNT Do not download any videos with more than --max-views COUNT Do not download any videos with more than
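
The new `--date` help text in this hunk accepts either `YYYYMMDD` or a relative form matching `(now|today)[+-][0-9](day|week|month|year)(s)?`. As a rough illustration of that grammar (not youtube-dlc's actual implementation), the sketch below resolves both forms to a calendar date. `parse_date_spec` is a hypothetical name, and three behaviours are assumptions of this sketch: multi-digit counts are allowed, a bare `now` or `today` is accepted, and a month and a year are approximated as 30 and 365 days.

```python
import datetime
import re

# Hypothetical pattern: follows the documented format, but allows a
# multi-digit count and a bare "now"/"today" (both are assumptions).
_RELATIVE_DATE = re.compile(
    r'(?:now|today)'
    r'(?:(?P<sign>[+-])(?P<count>\d+)(?P<unit>day|week|month|year)s?)?$')

# Assumed approximations: a month is 30 days, a year is 365 days.
_DAYS_PER_UNIT = {'day': 1, 'week': 7, 'month': 30, 'year': 365}


def parse_date_spec(spec, today=None):
    """Return a datetime.date for 'YYYYMMDD' or a spec like 'today-2weeks'."""
    if today is None:
        today = datetime.date.today()
    # Absolute form: exactly eight digits.
    if re.fullmatch(r'\d{8}', spec):
        return datetime.datetime.strptime(spec, '%Y%m%d').date()
    match = _RELATIVE_DATE.match(spec)
    if match is None:
        raise ValueError('unsupported date spec: %r' % spec)
    # Bare "now"/"today" carries no offset.
    if match.group('sign') is None:
        return today
    offset = datetime.timedelta(
        days=int(match.group('count')) * _DAYS_PER_UNIT[match.group('unit')])
    return today + offset if match.group('sign') == '+' else today - offset


reference = datetime.date(2021, 1, 7)
print(parse_date_spec('20210107'))
print(parse_date_spec('today-1week', today=reference))
print(parse_date_spec('now+2days', today=reference))
```

Passing `today` explicitly keeps the sketch deterministic, which makes the relative forms easy to check by hand.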
@ -208,6 +247,7 @@ ## Video Selection:
service), but who also have a description, service), but who also have a description,
use --match-filter "like_count > 100 & use --match-filter "like_count > 100 &
dislike_count <? 50 & description" . dislike_count <? 50 & description" .
--no-match-filter Do not use generic video filter (default)
--no-playlist Download only the video, if the URL refers --no-playlist Download only the video, if the URL refers
to a video and a playlist. to a video and a playlist.
--yes-playlist Download the playlist, if the URL refers to --yes-playlist Download the playlist, if the URL refers to
@ -219,8 +259,10 @@ ## Video Selection:
downloaded videos in it. downloaded videos in it.
--break-on-existing Stop the download process after attempting --break-on-existing Stop the download process after attempting
to download a file that's in the archive. to download a file that's in the archive.
--no-download-archive Do not use archive file (default)
--include-ads Download advertisements as well --include-ads Download advertisements as well
(experimental) (experimental)
--no-include-ads Do not download advertisements (default)
## Download Options: ## Download Options:
-r, --limit-rate RATE Maximum download rate in bytes per second -r, --limit-rate RATE Maximum download rate in bytes per second
@ -230,25 +272,29 @@ ## Download Options:
--fragment-retries RETRIES Number of retries for a fragment (default --fragment-retries RETRIES Number of retries for a fragment (default
is 10), or "infinite" (DASH, hlsnative and is 10), or "infinite" (DASH, hlsnative and
ISM) ISM)
--skip-unavailable-fragments Skip unavailable fragments (DASH, hlsnative --skip-unavailable-fragments Skip unavailable fragments for DASH,
and ISM) hlsnative and ISM (default)
--abort-on-unavailable-fragment Abort downloading when some fragment is not (Same as --no-abort-on-unavailable-fragment)
available --abort-on-unavailable-fragment Abort downloading if a fragment is unavailable
(Same as --no-skip-unavailable-fragments)
--keep-fragments Keep downloaded fragments on disk after --keep-fragments Keep downloaded fragments on disk after
downloading is finished; fragments are downloading is finished
erased by default --no-keep-fragments Delete downloaded fragments after
downloading is finished (default)
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K) --buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
(default is 1024) (default is 1024)
--no-resize-buffer Do not automatically adjust the buffer --resize-buffer The buffer size is automatically resized
size. By default, the buffer size is from an initial value of --buffer-size
automatically resized from an initial value (default)
of SIZE. --no-resize-buffer Do not automatically adjust the buffer size
--http-chunk-size SIZE Size of a chunk for chunk-based HTTP --http-chunk-size SIZE Size of a chunk for chunk-based HTTP
downloading (e.g. 10485760 or 10M) (default downloading (e.g. 10485760 or 10M) (default
is disabled). May be useful for bypassing is disabled). May be useful for bypassing
bandwidth throttling imposed by a webserver bandwidth throttling imposed by a webserver
(experimental) (experimental)
--playlist-reverse Download playlist videos in reverse order --playlist-reverse Download playlist videos in reverse order
--no-playlist-reverse Download playlist videos in default order
(default)
--playlist-random Download playlist videos in random order --playlist-random Download playlist videos in random order
--xattr-set-filesize Set file xattribute ytdl.filesize with --xattr-set-filesize Set file xattribute ytdl.filesize with
expected file size expected file size
@ -271,53 +317,71 @@ ## Filesystem Options:
stdin), one URL per line. Lines starting stdin), one URL per line. Lines starting
with '#', ';' or ']' are considered as with '#', ';' or ']' are considered as
comments and ignored. comments and ignored.
--id Use only video ID in file name
-o, --output TEMPLATE Output filename template, see the "OUTPUT -o, --output TEMPLATE Output filename template, see the "OUTPUT
TEMPLATE" for all the info TEMPLATE" for details
--autonumber-start NUMBER Specify the start value for %(autonumber)s --autonumber-start NUMBER Specify the start value for %(autonumber)s
(default is 1) (default is 1)
--restrict-filenames Restrict filenames to only ASCII --restrict-filenames Restrict filenames to only ASCII
characters, and avoid "&" and spaces in characters, and avoid "&" and spaces in
filenames filenames
--no-restrict-filenames Allow Unicode characters, "&" and spaces in
filenames (default)
-w, --no-overwrites Do not overwrite files -w, --no-overwrites Do not overwrite files
-c, --continue Force resume of partially downloaded files. -c, --continue Resume partially downloaded files (default)
By default, youtube-dlc will resume --no-continue Restart download of partially downloaded
downloads if possible. files from beginning
--no-continue Do not resume partially downloaded files --part Use .part files instead of writing directly
(restart from beginning) into output file (default)
--no-part Do not use .part files - write directly --no-part Do not use .part files - write directly
into output file into output file
--mtime Use the Last-modified header to set the
file modification time (default)
--no-mtime Do not use the Last-modified header to set --no-mtime Do not use the Last-modified header to set
the file modification time the file modification time
--write-description Write video description to a .description --write-description Write video description to a .description
file file
--no-write-description Do not write video description (default)
--write-info-json Write video metadata to a .info.json file --write-info-json Write video metadata to a .info.json file
--no-write-info-json Do not write video metadata (default)
--write-annotations Write video annotations to a --write-annotations Write video annotations to a
.annotations.xml file .annotations.xml file
--no-write-annotations Do not write video annotations (default)
--load-info-json FILE JSON file containing the video information --load-info-json FILE JSON file containing the video information
(created with the "--write-info-json" (created with the "--write-info-json"
option) option)
--cookies FILE File to read cookies from and dump cookie --cookies FILE File to read cookies from and dump cookie
jar in jar in
--cache-dir DIR Location in the filesystem where youtube- --no-cookies Do not read/dump cookies (default)
dlc can store some downloaded information --cache-dir DIR Location in the filesystem where youtube-dl
can store some downloaded information
permanently. By default permanently. By default
$XDG_CACHE_HOME/youtube-dlc or $XDG_CACHE_HOME/youtube-dl or
~/.cache/youtube-dlc . At the moment, only ~/.cache/youtube-dl . At the moment, only
YouTube player files (for videos with YouTube player files (for videos with
obfuscated signatures) are cached, but that obfuscated signatures) are cached, but that
may change. may change.
--no-cache-dir Disable filesystem caching --no-cache-dir Disable filesystem caching
--rm-cache-dir Delete all filesystem cache files --rm-cache-dir Delete all filesystem cache files
--trim-file-name Limit the filename length (extension --trim-file-name LENGTH Limit the filename length (extension
excluded) excluded)
## Thumbnail images: ## Thumbnail Images:
--write-thumbnail Write thumbnail image to disk --write-thumbnail Write thumbnail image to disk
--no-write-thumbnail Do not write thumbnail image to disk
(default)
--write-all-thumbnails Write all thumbnail image formats to disk --write-all-thumbnails Write all thumbnail image formats to disk
--list-thumbnails Simulate and list all available thumbnail --list-thumbnails Simulate and list all available thumbnail
formats formats
## Internet Shortcut Options:
--write-link Write an internet shortcut file, depending on
the current platform (.url/.webloc/.desktop).
The URL may be cached by the OS.
--write-url-link Write a Windows .url internet shortcut file.
(The OS caches the URL based on the file path)
--write-webloc-link Write a .webloc macOS internet shortcut file
--write-desktop-link Write a .desktop Linux internet shortcut file
## Verbosity / Simulation Options: ## Verbosity / Simulation Options:
-q, --quiet Activate quiet mode -q, --quiet Activate quiet mode
--no-warnings Ignore warnings --no-warnings Ignore warnings
@ -341,6 +405,10 @@ ## Verbosity / Simulation Options:
playlist information in a single line. playlist information in a single line.
--print-json Be quiet and print the video information as --print-json Be quiet and print the video information as
JSON (video is still being downloaded). JSON (video is still being downloaded).
--force-write-archive Force download archive entries to be written
as far as no errors occur, even if -s or
another simulation switch is used.
(Same as --force-download-archive)
--newline Output progress bar as new lines --newline Output progress bar as new lines
--no-progress Do not print progress bar --no-progress Do not print progress bar
--console-title Display progress in console titlebar --console-title Display progress in console titlebar
@ -351,10 +419,9 @@ ## Verbosity / Simulation Options:
files in the current directory to debug files in the current directory to debug
problems problems
--print-traffic Display sent and read HTTP traffic --print-traffic Display sent and read HTTP traffic
-C, --call-home Contact the youtube-dlc server for -C, --call-home Contact the youtube-dlc server for debugging
debugging --no-call-home Do not contact the youtube-dlc server for
--no-call-home Do NOT contact the youtube-dlc server for debugging (default)
debugging
## Workarounds: ## Workarounds:
--encoding ENCODING Force the specified encoding (experimental) --encoding ENCODING Force the specified encoding (experimental)
@ -381,30 +448,60 @@ ## Workarounds:
before each download (maximum possible before each download (maximum possible
number of seconds to sleep). Must only be number of seconds to sleep). Must only be
used along with --min-sleep-interval. used along with --min-sleep-interval.
--sleep-subtitles Enforce sleep interval on subtitles as well. --sleep-subtitles SECONDS Enforce sleep interval on subtitles as well
## Video Format Options: ## Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT -f, --format FORMAT Video format code, see "FORMAT SELECTION"
SELECTION" for all the info for more details
-S, --format-sort SORTORDER Sort the formats by the fields given, see
"Sorting Formats" for more details
--S-force, --format-sort-force Force user specified sort order to have
precedence over all fields, see "Sorting
Formats" for more details
--no-format-sort-force Some fields have precedence over the user
specified sort order (default), see
"Sorting Formats" for more details
--video-multistreams Allow multiple video streams to be merged
into a single file
--no-video-multistreams Only one video stream is downloaded for
each output file (default)
--audio-multistreams Allow multiple audio streams to be merged
into a single file
--no-audio-multistreams Only one audio stream is downloaded for
each output file (default)
--all-formats Download all available video formats --all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific --prefer-free-formats Prefer free video formats unless a specific
one is requested one is requested
-F, --list-formats List all available formats of requested -F, --list-formats List all available formats of requested
videos videos
--list-formats-as-table Present the output of -F in a more tabular
form (default)
(Same as --no-list-formats-as-table)
--list-formats-old Present the output of -F in the old form
--youtube-include-dash-manifest Download the DASH manifests and related data
on YouTube videos (default)
(Same as --no-youtube-skip-dash-manifest)
--youtube-skip-dash-manifest Do not download the DASH manifests and --youtube-skip-dash-manifest Do not download the DASH manifests and
related data on YouTube videos related data on YouTube videos
(Same as --no-youtube-include-dash-manifest)
--youtube-include-hls-manifest Download the HLS manifests and related data
on YouTube videos (default)
(Same as --no-youtube-skip-hls-manifest)
--youtube-skip-hls-manifest Do not download the HLS manifests and --youtube-skip-hls-manifest Do not download the HLS manifests and
related data on YouTube videos related data on YouTube videos
(Same as --no-youtube-include-hls-manifest)
--merge-output-format FORMAT If a merge is required (e.g. --merge-output-format FORMAT If a merge is required (e.g.
bestvideo+bestaudio), output to given bestvideo+bestaudio), output to given
container format. One of mkv, mp4, ogg, container format. One of mkv, mp4, ogg,
webm, flv. Ignored if no merge is required webm, flv. Ignored if no merge is required
## Subtitle Options: ## Subtitle Options:
--write-sub Write subtitle file --write-subs Write subtitle file
--write-auto-sub Write automatically generated subtitle file --no-write-subs Do not write subtitle file (default)
--write-auto-subs Write automatically generated subtitle file
(YouTube only) (YouTube only)
--no-write-auto-subs Do not write automatically generated
subtitle file (default)
--all-subs Download all the available subtitles of the --all-subs Download all the available subtitles of the
video video
--list-subs List all available subtitles for the video --list-subs List all available subtitles for the video
@ -421,7 +518,7 @@ ## Authentication Options:
out, youtube-dlc will ask interactively. out, youtube-dlc will ask interactively.
-2, --twofactor TWOFACTOR Two-factor authentication code -2, --twofactor TWOFACTOR Two-factor authentication code
-n, --netrc Use .netrc authentication data -n, --netrc Use .netrc authentication data
--video-password PASSWORD Video password (vimeo, smotri, youku) --video-password PASSWORD Video password (vimeo, youku)
## Adobe Pass Options: ## Adobe Pass Options:
--ap-mso MSO Adobe Pass multiple-system operator (TV --ap-mso MSO Adobe Pass multiple-system operator (TV
@ -434,7 +531,7 @@ ## Adobe Pass Options:
--ap-list-mso List all supported multiple-system --ap-list-mso List all supported multiple-system
operators operators
## Post-processing Options: ## Post-Processing Options:
-x, --extract-audio Convert video files to audio-only files -x, --extract-audio Convert video files to audio-only files
(requires ffmpeg or avconv and ffprobe or (requires ffmpeg or avconv and ffprobe or
avprobe) avprobe)
@ -446,23 +543,27 @@ ## Post-processing Options:
a value between 0 (better) and 9 (worse) a value between 0 (better) and 9 (worse)
for VBR or a specific bitrate like 128K for VBR or a specific bitrate like 128K
(default 5) (default 5)
--remux-video FORMAT Remux the video to another container format --remux-video FORMAT Remux the video into another container if
if necessary (currently supported: mp4|mkv, necessary (currently supported: mp4|mkv).
target container format must support video If target container does not support the
/ audio encoding, remuxing may fail) video/audio codec, remuxing will fail
--recode-video FORMAT Encode the video to another format if --recode-video FORMAT Re-encode the video into another format if
necessary (currently supported: re-encoding is necessary (currently
mp4|flv|ogg|webm|mkv|avi) supported: mp4|flv|ogg|webm|mkv|avi)
--postprocessor-args ARGS Give these arguments to the postprocessor --postprocessor-args ARGS Give these arguments to the postprocessor
-k, --keep-video Keep the video file on disk after the post- -k, --keep-video Keep the intermediate video file on disk
processing; the video is erased by default after post-processing
--no-post-overwrites Do not overwrite post-processed files; the --no-keep-video Delete the intermediate video file after
post-processed files are overwritten by post-processing (default)
default --post-overwrites Overwrite post-processed files (default)
--no-post-overwrites Do not overwrite post-processed files
--embed-subs Embed subtitles in the video (only for mp4, --embed-subs Embed subtitles in the video (only for mp4,
webm and mkv videos) webm and mkv videos)
--no-embed-subs Do not embed subtitles (default)
--embed-thumbnail Embed thumbnail in the audio as cover art --embed-thumbnail Embed thumbnail in the audio as cover art
--no-embed-thumbnail Do not embed thumbnail (default)
--add-metadata Write metadata to the video file --add-metadata Write metadata to the video file
--no-add-metadata Do not write metadata (default)
--metadata-from-title FORMAT Parse additional metadata like song title / --metadata-from-title FORMAT Parse additional metadata like song title /
artist from the video title. The format artist from the video title. The format
syntax is the same as --output. Regular syntax is the same as --output. Regular
@ -481,9 +582,10 @@ ## Post-processing Options:
default; fix file if we can, warn default; fix file if we can, warn
otherwise) otherwise)
--prefer-avconv Prefer avconv over ffmpeg for running the --prefer-avconv Prefer avconv over ffmpeg for running the
postprocessors postprocessors (Same as --no-prefer-ffmpeg)
--prefer-ffmpeg Prefer ffmpeg over avconv for running the --prefer-ffmpeg Prefer ffmpeg over avconv for running the
postprocessors (default) postprocessors (default)
(Same as --no-prefer-avconv)
--ffmpeg-location PATH Location of the ffmpeg/avconv binary; --ffmpeg-location PATH Location of the ffmpeg/avconv binary;
either the path to the binary or its either the path to the binary or its
containing directory. containing directory.
@ -494,8 +596,30 @@ ## Post-processing Options:
--convert-subs FORMAT Convert the subtitles to other format --convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt|lrc) (currently supported: srt|ass|vtt|lrc)
## [SponSkrub](https://github.com/faissaloo/SponSkrub) Options ([SponsorBlock](https://sponsor.ajay.app)):
--sponskrub Use sponskrub to mark sponsored sections
with the data available in SponsorBlock
API. This is enabled by default if the
sponskrub binary exists (Youtube only)
--no-sponskrub Do not use sponskrub
--sponskrub-cut Cut out the sponsor sections instead of
simply marking them
--no-sponskrub-cut Simply mark the sponsor sections, not cut
them out (default)
--sponskrub-force Run sponskrub even if the video was already
downloaded
--no-sponskrub-force Do not cut out the sponsor sections if the
video was already downloaded (default)
--sponskrub-location PATH Location of the sponskrub binary; either
the path to the binary or its containing
directory.
--sponskrub-args None Give these arguments to sponskrub
## Extractor Options: ## Extractor Options:
--ignore-dynamic-mpd Do not process dynamic DASH manifests --ignore-dynamic-mpd Do not process dynamic DASH manifests
(Same as --no-allow-dynamic-mpd)
--allow-dynamic-mpd Process dynamic DASH manifests (default)
(Same as --no-ignore-dynamic-mpd)
# CONFIGURATION # CONFIGURATION
@ -572,6 +696,7 @@ # OUTPUT TEMPLATE
- `channel_id` (string): Id of the channel - `channel_id` (string): Id of the channel
- `location` (string): Physical location where the video was filmed - `location` (string): Physical location where the video was filmed
- `duration` (numeric): Length of the video in seconds - `duration` (numeric): Length of the video in seconds
- `duration_string` (string): Length of the video (HH-mm-ss)
- `view_count` (numeric): How many users have watched the video on the platform - `view_count` (numeric): How many users have watched the video on the platform
- `like_count` (numeric): Number of positive ratings of the video - `like_count` (numeric): Number of positive ratings of the video
- `dislike_count` (numeric): Number of negative ratings of the video - `dislike_count` (numeric): Number of negative ratings of the video
@ -649,7 +774,7 @@ # OUTPUT TEMPLATE
To use percent literals in an output template use `%%`. To output to stdout use `-o -`. To use percent literals in an output template use `%%`. To output to stdout use `-o -`.
The current default template is `%(title)s-%(id)s.%(ext)s`. The current default template is `%(title)s [%(id)s].%(ext)s`.
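The effect of the new default can be previewed with plain Python `%`-formatting, which output templates are modelled on. This is only a sketch: the `info` dictionary is hypothetical, and youtube-dlc may apply further processing (such as filename sanitization) that is not shown here.

```python
# Hypothetical metadata for one video (illustration only)
info = {'title': 'Some Video', 'id': 'abc123', 'ext': 'mp4'}

old_default = '%(title)s-%(id)s.%(ext)s'
new_default = '%(title)s [%(id)s].%(ext)s'

old_name = old_default % info
new_name = new_default % info

# A literal percent sign is written as %%
literal = '100%% %(title)s' % info

print(old_name)
print(new_name)
print(literal)
```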
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title: In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
@ -686,11 +811,10 @@ # Stream the video being downloaded to stdout
# FORMAT SELECTION # FORMAT SELECTION
By default youtube-dlc tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dlc will guess it for you by **default**. By default, youtube-dlc tries to download the best available quality if you **don't** pass any options.
This is generally equivalent to using `-f bestvideo*+bestaudio/best`. However, if multiple audio streams are enabled (`--audio-multistreams`), the default format changes to `-f bestvideo+bestaudio/best`. Similarly, if ffmpeg and avconv are unavailable, or if you use youtube-dlc to stream to `stdout` (`-o -`), the default becomes `-f best/bestvideo+bestaudio`.
But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so-called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more. The general syntax for format selection is `--f FORMAT` (or `--format FORMAT`) where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
**tl;dr:** [navigate me to examples](#format-selection-examples). **tl;dr:** [navigate me to examples](#format-selection-examples).
@ -700,19 +824,29 @@ # FORMAT SELECTION
You can also use special names to select particular edge case formats: You can also use special names to select particular edge case formats:
- `best`: Select the best quality format represented by a single file with video and audio. - `b*`, `best*`: Select the best quality format irrespective of whether it contains video or audio.
- `worst`: Select the worst quality format represented by a single file with video and audio. - `w*`, `worst*`: Select the worst quality format irrespective of whether it contains video or audio.
- `bestvideo`: Select the best quality video-only format (e.g. DASH video). May not be available. - `b`, `best`: Select the best quality format that contains both video and audio. Equivalent to `best*[vcodec!=none][acodec!=none]`
- `worstvideo`: Select the worst quality video-only format. May not be available. - `w`, `worst`: Select the worst quality format that contains both video and audio. Equivalent to `worst*[vcodec!=none][acodec!=none]`
- `bestaudio`: Select the best quality audio only-format. May not be available. - `bv`, `bestvideo`: Select the best quality video-only format. Equivalent to `best*[acodec=none]`
- `worstaudio`: Select the worst quality audio only-format. May not be available. - `wv`, `worstvideo`: Select the worst quality video-only format. Equivalent to `worst*[acodec=none]`
- `bv*`, `bestvideo*`: Select the best quality format that contains video. It may also contain audio. Equivalent to `best*[vcodec!=none]`
- `wv*`, `worstvideo*`: Select the worst quality format that contains video. It may also contain audio. Equivalent to `worst*[vcodec!=none]`
- `ba`, `bestaudio`: Select the best quality audio-only format. Equivalent to `best*[vcodec=none]`
- `wa`, `worstaudio`: Select the worst quality audio-only format. Equivalent to `worst*[vcodec=none]`
- `ba*`, `bestaudio*`: Select the best quality format that contains audio. It may also contain video. Equivalent to `best*[acodec!=none]`
- `wa*`, `worstaudio*`: Select the worst quality format that contains audio. It may also contain video. Equivalent to `worst*[acodec!=none]`
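The equivalences listed above can be illustrated with a small Python sketch. It is not youtube-dlc's implementation: the `pick` helper, the `has_video`/`has_audio` predicates and the sample formats are hypothetical, and the list is assumed to be already ordered from worst to best.

```python
def pick(formats, predicate=lambda f: True, best=True):
    """Pick from `formats`, which is assumed to be ordered worst to best."""
    candidates = [f for f in formats if predicate(f)]
    if not candidates:
        return None
    return candidates[-1] if best else candidates[0]


def has_video(f):
    return f['vcodec'] != 'none'


def has_audio(f):
    return f['acodec'] != 'none'


# Hypothetical formats, ordered worst to best
formats = [
    {'format_id': 'audio-low', 'vcodec': 'none', 'acodec': 'mp4a'},
    {'format_id': 'combined', 'vcodec': 'avc1', 'acodec': 'mp4a'},
    {'format_id': 'audio-high', 'vcodec': 'none', 'acodec': 'opus'},
    {'format_id': 'video-only', 'vcodec': 'vp9', 'acodec': 'none'},
]

b_star = pick(formats)                                      # best*
b = pick(formats, lambda f: has_video(f) and has_audio(f))  # best
bv = pick(formats, lambda f: not has_audio(f))              # bestvideo
bv_star = pick(formats, has_video)                          # bestvideo*
ba = pick(formats, lambda f: not has_video(f))              # bestaudio
wa_star = pick(formats, has_audio, best=False)              # worstaudio*
```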
For example, to download the worst quality video-only format you can use `-f worstvideo`. For example, to download the worst quality video-only format you can use `-f worstvideo`. However, it is recommended never to use `worst` and related options. When your format selector is `worst`, the format which is worst in all respects is selected. Most of the time, what you actually want is the video with the smallest filesize instead. So it is generally better to use `-f best -S +size,+br,+res,+fps` instead of `-f worst`. See [sorting formats](#sorting-formats) for more details.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download. If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
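As a rough illustration of how slashes and commas combine, the following Python sketch resolves a selector made only of plain format names against a set of available ones. The `resolve` function is hypothetical and deliberately ignores filters, merging and parentheses. How a comma-separated group with no match is treated is an assumption of this sketch: it is skipped, and an error is raised only if nothing at all matches.

```python
def resolve(selector, available):
    """Resolve a selector such as '22/17/18' or '22,17,18'.

    Commas separate independent requests; within each request,
    slashes list alternatives in order of preference.
    """
    chosen = []
    for group in selector.split(','):
        for alternative in group.split('/'):
            if alternative in available:
                chosen.append(alternative)
                break  # the leftmost available alternative wins
    if not chosen:
        raise LookupError('no suitable formats for %r' % selector)
    return chosen


print(resolve('22/17/18', {'17', '18'}))
print(resolve('22,17,18', {'22', '17', '18'}))
print(resolve('136/137/mp4/bestvideo,140/m4a/bestaudio', {'137', 'm4a'}))
```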
You can merge the video and audio of multiple formats into a single file using `-f <format1>+<format2>+...` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg/avconv. If `--no-video-multistreams` is used, all formats with a video stream except the first one are ignored. Similarly, if `--no-audio-multistreams` is used, all formats with an audio stream except the first one are ignored. For example, `-f bestvideo+best+bestaudio` will download and merge all 3 given formats. The resulting file will have 2 video streams and 2 audio streams. But `-f bestvideo+best+bestaudio --no-video-multistreams` will download and merge only `bestvideo` and `bestaudio`. `best` is ignored since another format containing a video stream (`bestvideo`) has already been selected. The order of the formats is therefore important. `-f best+bestaudio --no-audio-multistreams` will download only `best`, since `bestaudio` is ignored once a format containing an audio stream has been selected, while `-f bestaudio+best --no-audio-multistreams` will ignore `best` and download only `bestaudio`.
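The "first one wins" rule for `--no-video-multistreams` and `--no-audio-multistreams` can be sketched in Python as follows. This only models the behaviour described above; the `merge_candidates` function and the dictionary layout are hypothetical and not taken from youtube-dlc's code.

```python
def merge_candidates(formats, allow_multi_video=True, allow_multi_audio=True):
    """Return the formats that would be merged, in the order given.

    Each format is a dict with boolean 'video' and 'audio' keys
    (a hypothetical layout used only for this sketch).
    """
    selected = []
    have_video = have_audio = False
    for fmt in formats:
        # Ignore a format carrying a video stream if one is already
        # selected and multiple video streams are not allowed.
        if fmt['video'] and have_video and not allow_multi_video:
            continue
        # Same rule for audio streams.
        if fmt['audio'] and have_audio and not allow_multi_audio:
            continue
        selected.append(fmt)
        have_video = have_video or fmt['video']
        have_audio = have_audio or fmt['audio']
    return selected


bestvideo = {'name': 'bestvideo', 'video': True, 'audio': False}
best = {'name': 'best', 'video': True, 'audio': True}
bestaudio = {'name': 'bestaudio', 'video': False, 'audio': True}

# -f bestvideo+best+bestaudio --no-video-multistreams
no_multi_video = merge_candidates([bestvideo, best, bestaudio],
                                  allow_multi_video=False)
print([f['name'] for f in no_multi_video])

# -f bestaudio+best --no-audio-multistreams
no_multi_audio = merge_candidates([bestaudio, best], allow_multi_audio=False)
print([f['name'] for f in no_multi_audio])
```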
## Filtering Formats
You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `>=`, `=` (equals), `!=` (not equals): The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `>=`, `=` (equals), `!=` (not equals):
@ -734,60 +868,173 @@ # FORMAT SELECTION
- `container`: Name of the container format - `container`: Name of the container format
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`) - `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
- `format_id`: A short description of the format - `format_id`: A short description of the format
- `language`: Language code
Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain). Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain).
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster. Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster. Any other field made available by the extractor can also be used for filtering.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
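The role of the question mark can be shown with a minimal Python sketch of a numeric filter. The `matches` helper and the sample formats are hypothetical; the sketch only mirrors the rule stated above, namely that a format with an unknown value is excluded unless `?` is given.

```python
import operator

OPERATORS = {
    '<': operator.lt, '<=': operator.le,
    '>': operator.gt, '>=': operator.ge,
    '=': operator.eq, '!=': operator.ne,
}


def matches(fmt, field, op, value, allow_unknown=False):
    """Check one numeric condition, e.g. [height<=720], against a format.

    allow_unknown plays the role of the `?` suffix: a format that lacks
    the field passes the condition instead of being excluded.
    """
    actual = fmt.get(field)
    if actual is None:
        return allow_unknown
    return OPERATORS[op](actual, value)


# Hypothetical formats; 'c' has no known height
formats = [
    {'format_id': 'a', 'height': 1080, 'tbr': 2500},
    {'format_id': 'b', 'height': 720, 'tbr': 1200},
    {'format_id': 'c', 'tbr': 800},
    {'format_id': 'd', 'height': 480, 'tbr': 300},
]

# In the spirit of "[height <=? 720][tbr>500]"
kept = [
    f['format_id'] for f in formats
    if matches(f, 'height', '<=', 720, allow_unknown=True)
    and matches(f, 'tbr', '>', 500)
]
print(kept)
```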
You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg/avconv.
Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`. Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
Since the end of April 2015 and version 2015.04.26, youtube-dlc uses `-f bestvideo+bestaudio/best` as the default format selection (see [#5447](https://github.com/ytdl-org/youtube-dl/issues/5447), [#5456](https://github.com/ytdl-org/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dlc to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dlc still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed. ## Sorting Formats
If you want to preserve the old format selection behavior (prior to youtube-dlc 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dlc. You can change the criteria for being considered the `best` by using `-S` (`--format-sort`). The general format for this is `--format-sort field1,field2...`. The available fields are:
#### Format selection examples - `video`, `has_video`: Gives priority to formats that have a video stream
- `audio`, `has_audio`: Gives priority to formats that have an audio stream
- `extractor`, `preference`, `extractor_preference`: The format preference as given by the extractor
- `lang`, `language_preference`: Language preference as given by the extractor
- `quality`: The quality of the format. This is a metadata field available in some websites
- `source`, `source_preference`: Preference of the source as given by the extractor
- `proto`, `protocol`: Protocol used for download (`https`/`ftps` > `http`/`ftp` > `m3u8_native` > `m3u8` > `http_dash_segments` > other > `mms`/`rtsp` > unknown > `f4f`/`f4m`)
- `vcodec`, `video_codec`: Video Codec (`vp9` > `h265` > `h264` > `vp8` > `h263` > `theora` > other > unknown)
- `acodec`, `audio_codec`: Audio Codec (`opus` > `vorbis` > `aac` > `mp4a` > `mp3` > `ac3` > `dts` > other > unknown)
- `codec`: Equivalent to `vcodec,acodec`
- `vext`, `video_ext`: Video Extension (`mp4` > `webm` > `flv` > other > unknown). If `--prefer-free-formats` is used, `webm` is preferred.
- `aext`, `audio_ext`: Audio Extension (`m4a` > `aac` > `mp3` > `ogg` > `opus` > `webm` > other > unknown). If `--prefer-free-formats` is used, the order changes to `opus` > `ogg` > `webm` > `m4a` > `mp3` > `aac`.
- `ext`, `extension`: Equivalent to `vext,aext`
- `filesize`: Exact filesize, if known in advance. This will be unavailable for m3u8 and DASH formats.
- `filesize_approx`: Approximate filesize calculated from the manifests
- `size`, `filesize_estimate`: Exact filesize if available, otherwise approximate filesize
- `height`: Height of video
- `width`: Width of video
- `res`, `dimension`: Video resolution, calculated as the smallest dimension.
- `fps`, `framerate`: Framerate of video
- `tbr`, `total_bitrate`: Total average bitrate in KBit/s
- `vbr`, `video_bitrate`: Average video bitrate in KBit/s
- `abr`, `audio_bitrate`: Average audio bitrate in KBit/s
- `br`, `bitrate`: Equivalent to using `tbr,vbr,abr`
- `samplerate`, `asr`: Audio sample rate in Hz
Note that any other **numerical** field made available by the extractor can also be used. All fields, unless specified otherwise, are sorted in descending order. To reverse this, prefix the field with a `+`. E.g. `+res` prefers the format with the smallest resolution. Additionally, you can suffix a preferred value for the fields, separated by a `:`. E.g. `res:720` prefers larger videos, but no larger than 720p, and the smallest video if there are no videos less than 720p. For `codec` and `ext`, you can provide two preferred values, the first for video and the second for audio. E.g. `+codec:avc:m4a` (equivalent to `+vcodec:avc,+acodec:m4a`) sets the video codec preference to `h264` > `h265` > `vp9` > `vp8` > `h263` > `theora` and audio codec preference to `mp4a` > `aac` > `vorbis` > `opus` > `mp3` > `ac3` > `dts`. You can also make the sorting prefer the values nearest to the one provided by using `~` as the delimiter. E.g. `filesize~1G` prefers the format with filesize closest to 1 GiB.
The fields `has_video`, `extractor`, `lang`, `quality` are always given highest priority in sorting, irrespective of the user-defined order. This behaviour can be changed by using `--force-format-sort`. Apart from these, the default order used is: `res,fps,codec,size,br,asr,proto,ext,has_audio,source,format_id`. Note that the extractors may override this default order, but they cannot override the user-provided order.
If your format selector is `worst`, the last item is selected after sorting. This means it will select the format that is worst in all respects. Most of the time, what you actually want is the video with the smallest filesize instead. So it is generally better to use `-f best -S +size,+br,+res,+fps`.
**Tip**: You can use `-v -F` to see how the formats have been sorted (worst to best).
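The `:` and `~` suffixes described in this section can be modelled with two small sort keys. This is a sketch of the documented semantics only, not of youtube-dlc's sorting code; the function names are hypothetical and the values stand for any numeric field such as `res` or `filesize`.

```python
def limit_key(value, limit):
    """Key for a `field:limit` preference; larger keys are preferred."""
    if value <= limit:
        return (1, value)   # within the limit: bigger is better
    return (0, -value)      # over the limit: smaller is better


def nearest_key(value, target):
    """Key for a `field~target` preference; larger keys are preferred."""
    return -abs(value - target)


def pick_limited(values, limit):
    return max(values, key=lambda v: limit_key(v, limit))


def pick_nearest(values, target):
    return max(values, key=lambda v: nearest_key(v, target))


# In the spirit of `-S res:720`
print(pick_limited([360, 480, 1080], 720))
print(pick_limited([1080, 1440], 720))

# In the spirit of `-S filesize~1G`, with sizes in MiB for brevity
print(pick_nearest([300, 900, 1500], 1024))
```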
## Format Selection examples
Note that on Windows you may need to use double quotes instead of single. Note that on Windows you may need to use double quotes instead of single.
```bash ```bash
# Download best mp4 format available or any other best if no mp4 available # Download and merge the best video-only format and the best audio-only format,
$ youtube-dlc -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' # or download the best combined format if video-only format is not available
$ youtube-dlc -f 'bv+ba/b'
# Download best format available but no better than 480p # Download best format that contains video,
$ youtube-dlc -f 'bestvideo[height<=480]+bestaudio/best[height<=480]' # and if it doesn't already have an audio stream, merge it with best audio-only format
$ youtube-dlc -f 'bv*+ba/b'
# Download best video only format but no bigger than 50 MB # Same as above
$ youtube-dlc -f 'best[filesize<50M]' $ youtube-dlc
# Download best format available via direct link over HTTP/HTTPS protocol
$ youtube-dlc -f '(bestvideo+bestaudio/best)[protocol^=http]'
# Download the best video format and the best audio format without merging them
$ youtube-dlc -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s' # Download the worst video available
$ youtube-dlc -f 'wv*+wa/w'
# Download the best video available but with the smallest resolution
$ youtube-dlc -S '+res'
# Download the smallest video available
$ youtube-dlc -S '+size,+bitrate'
# Download the best mp4 video available, or the best video if no mp4 available
$ youtube-dlc -f 'bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4] / bv*+ba/b'
# Download the best video with the best extension
# (For video, mp4 > webm > flv. For audio, m4a > aac > mp3 ...)
$ youtube-dlc -S 'ext'
# Download the best video available but no better than 480p,
# or the worst video if there is no video under 480p
$ youtube-dlc -f 'bv*[height<=480]+ba/b[height<=480] / wv*+ba/w'
# Download the best video available with the largest height but no better than 480p,
# or the best video with the smallest resolution if there is no video under 480p
$ youtube-dlc -S 'height:480'
# Download the best video available with the largest resolution but no better than 480p,
# or the best video with the smallest resolution if there is no video under 480p
# Resolution is determined by using the smallest dimension.
# So this works correctly for vertical videos as well
$ youtube-dlc -S 'res:480'
# Download the best video (that also has audio) but no bigger than 50 MB,
# or the worst video (that also has audio) if there is no video under 50 MB
$ youtube-dlc -f 'b[filesize<50M] / w'
# Download largest video (that also has audio) but no bigger than 50 MB,
# or the smallest video (that also has audio) if there is no video under 50 MB
$ youtube-dlc -f 'b' -S 'filesize:50M'
# Download best video (that also has audio) that is closest in size to 50 MB
$ youtube-dlc -f 'b' -S 'filesize~50M'
# Download best video available via direct link over HTTP/HTTPS protocol,
# or the best video available via any protocol if there is no such video
$ youtube-dlc -f '(bv*+ba/b)[protocol^=http][protocol!*=dash] / (bv*+ba/b)'
# Download best video available via the best protocol
# (https/ftps > http/ftp > m3u8_native > m3u8 > http_dash_segments ...)
$ youtube-dlc -S 'protocol'
# Download the best video-only format and the best audio-only format without merging them
# For this case, an output template should be used since
# by default, bestvideo and bestaudio will have the same file name.
$ youtube-dlc -f 'bv,ba' -o '%(title)s.f%(format_id)s.%(ext)s'
# Download the best video with h264 codec, or the best video if there is no such video
$ youtube-dlc -f '(bv*+ba/b)[vcodec^=avc1] / (bv*+ba/b)'
# Download the best video with best codec no better than h264,
# or the best video with worst codec if there is no such video
$ youtube-dlc -S 'codec:h264'
# Download the best video with worst codec no worse than h264,
# or the best video with best codec if there is no such video
$ youtube-dlc -S '+codec:h264'
# More complex examples
# Download the best video no better than 720p preferring framerate greater than 30,
# or the worst video (still preferring framerate greater than 30) if there is no such video
$ youtube-dlc -f '((bv*[fps>30]/bv*)[height<=720]/(wv*[fps>30]/wv*)) + ba / (b[fps>30]/b)[height<=720]/(w[fps>30]/w)'
# Download the video with the largest resolution no better than 720p,
# or the video with the smallest resolution available if there is no such video,
# preferring larger framerate for formats with the same resolution
$ youtube-dlc -S 'res:720,fps'
# Download the video with smallest resolution no worse than 480p,
# or the video with the largest resolution available if there is no such video,
# preferring better codec and then larger total bitrate for the same resolution
$ youtube-dlc -S '+res:480,codec,br'
``` ```
Note that in the last example, an output template is recommended as bestvideo and bestaudio may have the same file name.
# VIDEO SELECTION
Videos can be filtered by their upload date using the options `--date`, `--datebefore` or `--dateafter`. They accept dates in two formats:
- Absolute dates: Dates in the format `YYYYMMDD`.
- Relative dates: Dates in the format `(now|today)[+-][0-9](day|week|month|year)(s)?`
Examples:
```bash # MORE
# Download only the videos uploaded in the last 6 months For FAQ, Developer Instructions etc., see the [original README](https://github.com/ytdl-org/youtube-dl)
$ youtube-dlc --dateafter now-6months
# Download only the videos uploaded on January 1, 1970
$ youtube-dlc --date 19700101
# Download only the videos uploaded in the 200x decade
$ youtube-dlc --dateafter 20000101 --datebefore 20091231
```

View File

@ -1,3 +1,5 @@
# Unused
#!/usr/bin/env python #!/usr/bin/env python
from __future__ import unicode_literals from __future__ import unicode_literals

View File

@ -1,5 +0,0 @@
#!/bin/bash
wget http://central.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar
java -jar jython-installer-2.7.1.jar -s -d "$HOME/jython"
$HOME/jython/bin/jython -m pip install nose

View File

@ -13,14 +13,14 @@
with io.open(README_FILE, encoding='utf-8') as f: with io.open(README_FILE, encoding='utf-8') as f:
oldreadme = f.read() oldreadme = f.read()
header = oldreadme[:oldreadme.index('# OPTIONS')] header = oldreadme[:oldreadme.index('## General Options:')]
# footer = oldreadme[oldreadme.index('# CONFIGURATION'):] footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
options = helptext[helptext.index(' General Options:') + 19:] options = helptext[helptext.index(' General Options:'):]
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options) options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
options = '# OPTIONS\n' + options + '\n' options = options + '\n'
with io.open(README_FILE, 'w', encoding='utf-8') as f: with io.open(README_FILE, 'w', encoding='utf-8') as f:
f.write(header) f.write(header)
f.write(options) f.write(options)
# f.write(footer) f.write(footer)
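The effect of this change is that the script now replaces only the options block, keeping both the text before `## General Options:` and everything from `# CONFIGURATION` onward. A self-contained sketch of that splice is given below. The `splice_options` function and the sample strings are hypothetical, and the two-space indentation of section titles in the help text is an assumption of this sketch.

```python
import re


def splice_options(oldreadme, helptext):
    """Rebuild a README from its old header and footer plus fresh help text."""
    header = oldreadme[:oldreadme.index('## General Options:')]
    footer = oldreadme[oldreadme.index('# CONFIGURATION'):]

    # Assumption: section titles in the help text are indented by two spaces
    options = helptext[helptext.index('  General Options:'):]
    # Turn section titles into second-level headings
    options = re.sub(r'(?m)^  (\w.+)$', r'## \1', options)

    return header + options + '\n' + footer


oldreadme = 'intro\n## General Options:\nold opts\n# CONFIGURATION\nconf\n'
helptext = 'Usage: x\n\n  General Options:\n    -h  help\n'
result = splice_options(oldreadme, helptext)
print(result)
```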

View File

@ -1,3 +1,4 @@
# Unused
#!/bin/bash #!/bin/bash
# IMPORTANT: the following assumptions are made # IMPORTANT: the following assumptions are made

17
devscripts/run_tests.bat Normal file
View File

@ -0,0 +1,17 @@
@echo off
rem Keep this list in sync with the `offlinetest` target in Makefile
set DOWNLOAD_TESTS="age_restriction^|download^|iqiyi_sdk_interpreter^|socks^|subtitles^|write_annotations^|youtube_lists^|youtube_signature"
if "%YTDL_TEST_SET%" == "core" (
set test_set="-I test_("%DOWNLOAD_TESTS%")\.py"
set multiprocess_args=""
) else if "%YTDL_TEST_SET%" == "download" (
set test_set="-I test_(?!"%DOWNLOAD_TESTS%").+\.py"
set multiprocess_args="--processes=4 --process-timeout=540"
) else (
echo YTDL_TEST_SET is not set or invalid
exit /b 1
)
nosetests test --verbose %test_set:"=% %multiprocess_args:"=%

View File

@ -1,3 +1,5 @@
# Unused
#!/usr/bin/env python #!/usr/bin/env python
from __future__ import unicode_literals from __future__ import unicode_literals

View File

@ -34,6 +34,8 @@ # Supported sites
- **adobetv:video** - **adobetv:video**
- **AdultSwim** - **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
- **aenetworks:collection**
- **aenetworks:show**
- **afreecatv**: afreecatv.com - **afreecatv**: afreecatv.com
- **AirMozilla** - **AirMozilla**
- **AliExpressLive** - **AliExpressLive**
@ -42,6 +44,7 @@ # Supported sites
- **AlphaPorno** - **AlphaPorno**
- **Alura** - **Alura**
- **AluraCourse** - **AluraCourse**
- **Amara**
- **AMCNetworks** - **AMCNetworks**
- **AmericasTestKitchen** - **AmericasTestKitchen**
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl - **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
@ -55,10 +58,12 @@ # Supported sites
- **appletrailers** - **appletrailers**
- **appletrailers:section** - **appletrailers:section**
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ArcPublishing**
- **ARD** - **ARD**
- **ARD:mediathek** - **ARD:mediathek**
- **ARDBetaMediathek** - **ARDBetaMediathek**
- **Arkena** - **Arkena**
- **arte.sky.it**
- **ArteTV** - **ArteTV**
- **ArteTVEmbed** - **ArteTVEmbed**
- **ArteTVPlaylist** - **ArteTVPlaylist**
@ -101,15 +106,18 @@ # Supported sites
- **BilibiliAudioAlbum** - **BilibiliAudioAlbum**
- **BiliBiliPlayer** - **BiliBiliPlayer**
- **BioBioChileTV** - **BioBioChileTV**
- **Biography**
- **BIQLE** - **BIQLE**
- **BitChute** - **BitChute**
- **BitChuteChannel** - **BitChuteChannel**
- **bitwave.tv** - **bitwave:replay**
- **bitwave:stream**
- **BleacherReport** - **BleacherReport**
- **BleacherReportCMS** - **BleacherReportCMS**
- **blinkx** - **blinkx**
- **Bloomberg** - **Bloomberg**
- **BokeCC** - **BokeCC**
- **BongaCams**
- **BostonGlobe** - **BostonGlobe**
- **Box** - **Box**
- **Bpb**: Bundeszentrale für politische Bildung - **Bpb**: Bundeszentrale für politische Bildung
@ -144,6 +152,7 @@ # Supported sites
- **CBS** - **CBS**
- **CBSInteractive** - **CBSInteractive**
- **CBSLocal** - **CBSLocal**
- **CBSLocalArticle**
- **cbsnews**: CBS News - **cbsnews**: CBS News
- **cbsnews:embed** - **cbsnews:embed**
- **cbsnews:livevideo**: CBS News Live Videos - **cbsnews:livevideo**: CBS News Live Videos
@ -193,9 +202,9 @@ # Supported sites
- **CrooksAndLiars** - **CrooksAndLiars**
- **crunchyroll** - **crunchyroll**
- **crunchyroll:playlist** - **crunchyroll:playlist**
- **CSNNE**
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CtsNews**: 華視新聞 - **CtsNews**: 華視新聞
- **CTV**
- **CTVNews** - **CTVNews**
- **cu.ntv.co.jp**: Nippon Television Network - **cu.ntv.co.jp**: Nippon Television Network
- **Culturebox** - **Culturebox**
@ -271,7 +280,6 @@ # Supported sites
- **ESPNArticle** - **ESPNArticle**
- **EsriVideo** - **EsriVideo**
- **Europa** - **Europa**
- **EveryonesMixtape**
- **EWETV** - **EWETV**
- **ExpoTV** - **ExpoTV**
- **Expressen** - **Expressen**
@ -313,11 +321,11 @@ # Supported sites
- **FrontendMasters** - **FrontendMasters**
- **FrontendMastersCourse** - **FrontendMastersCourse**
- **FrontendMastersLesson** - **FrontendMastersLesson**
- **FujiTVFODPlus7**
- **Funimation** - **Funimation**
- **Funk** - **Funk**
- **Fusion** - **Fusion**
- **Fux** - **Fux**
- **FXNetworks**
- **Gaia** - **Gaia**
- **GameInformer** - **GameInformer**
- **GameSpot** - **GameSpot**
@ -325,6 +333,8 @@ # Supported sites
- **Gaskrank** - **Gaskrank**
- **Gazeta** - **Gazeta**
- **GDCVault** - **GDCVault**
- **Gedi**
- **GediEmbeds**
- **generic**: Generic downloader that works on some sites - **generic**: Generic downloader that works on some sites
- **Gfycat** - **Gfycat**
- **GiantBomb** - **GiantBomb**
@ -350,6 +360,7 @@ # Supported sites
- **hgtv.com:show** - **hgtv.com:show**
- **HiDive** - **HiDive**
- **HistoricFilms** - **HistoricFilms**
- **history:player**
- **history:topic**: History.com Topic - **history:topic**: History.com Topic
- **hitbox** - **hitbox**
- **hitbox:live** - **hitbox:live**
@ -403,7 +414,6 @@ # Supported sites
- **JWPlatform** - **JWPlatform**
- **Kakao** - **Kakao**
- **Kaltura** - **Kaltura**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan** - **Kankan**
- **Karaoketv** - **Karaoketv**
- **KarriereVideos** - **KarriereVideos**
@ -427,7 +437,8 @@ # Supported sites
- **la7.it** - **la7.it**
- **laola1tv** - **laola1tv**
- **laola1tv:embed** - **laola1tv:embed**
- **lbry.tv** - **lbry**
- **lbry:channel**
- **LCI** - **LCI**
- **Lcp** - **Lcp**
- **LcpPlay** - **LcpPlay**
@ -493,6 +504,7 @@ # Supported sites
- **META** - **META**
- **metacafe** - **metacafe**
- **Metacritic** - **Metacritic**
- **mewatch**
- **Mgoon** - **Mgoon**
- **MGTV**: 芒果TV - **MGTV**: 芒果TV
- **MiaoPai** - **MiaoPai**
@ -503,8 +515,6 @@ # Supported sites
- **mixcloud** - **mixcloud**
- **mixcloud:playlist** - **mixcloud:playlist**
- **mixcloud:user** - **mixcloud:user**
- **Mixer:live**
- **Mixer:vod**
- **MLB** - **MLB**
- **Mnet** - **Mnet**
- **MNetTV** - **MNetTV**
@ -547,6 +557,11 @@ # Supported sites
- **Naver** - **Naver**
- **Naver:live** - **Naver:live**
- **NBA** - **NBA**
- **nba:watch**
- **nba:watch:collection**
- **NBAChannel**
- **NBAEmbed**
- **NBAWatchEmbed**
- **NBC** - **NBC**
- **NBCNews** - **NBCNews**
- **nbcolympics** - **nbcolympics**
@ -576,8 +591,10 @@ # Supported sites
- **NextTV**: 壹電視 - **NextTV**: 壹電視
- **Nexx** - **Nexx**
- **NexxEmbed** - **NexxEmbed**
- **nfl.com** - **nfl.com** (Currently broken)
- **nfl.com:article** (Currently broken)
- **NhkVod** - **NhkVod**
- **NhkVodProgram**
- **nhl.com** - **nhl.com**
- **nick.com** - **nick.com**
- **nick.de** - **nick.de**
@ -592,7 +609,6 @@ # Supported sites
- **njoy:embed** - **njoy:embed**
- **NJPWWorld**: 新日本プロレスワールド - **NJPWWorld**: 新日本プロレスワールド
- **NobelPrize** - **NobelPrize**
- **Noco**
- **NonkTube** - **NonkTube**
- **Noovo** - **Noovo**
- **Normalboots** - **Normalboots**
@ -610,6 +626,7 @@ # Supported sites
- **Npr** - **Npr**
- **NRK** - **NRK**
- **NRKPlaylist** - **NRKPlaylist**
- **NRKRadioPodkast**
- **NRKSkole**: NRK Skole - **NRKSkole**: NRK Skole
- **NRKTV**: NRK TV and NRK Radio - **NRKTV**: NRK TV and NRK Radio
- **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte - **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte
@ -681,6 +698,7 @@ # Supported sites
- **Platzi** - **Platzi**
- **PlatziCourse** - **PlatziCourse**
- **play.fm** - **play.fm**
- **player.sky.it**
- **PlayPlusTV** - **PlayPlusTV**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
@ -719,6 +737,7 @@ # Supported sites
- **qqmusic:singer**: QQ音乐 - 歌手 - **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜 - **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuantumTV** - **QuantumTV**
- **Qub**
- **Quickline** - **Quickline**
- **QuicklineLive** - **QuicklineLive**
- **R7** - **R7**
@ -736,6 +755,9 @@ # Supported sites
- **RayWenderlich** - **RayWenderlich**
- **RayWenderlichCourse** - **RayWenderlichCourse**
- **RBMARadio** - **RBMARadio**
- **RCS**
- **RCSEmbeds**
- **RCSVarious**
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedBull** - **RedBull**
- **RedBullEmbed** - **RedBullEmbed**
@ -811,18 +833,17 @@ # Supported sites
- **Shared**: shared.sx - **Shared**: shared.sx
- **ShowRoomLive** - **ShowRoomLive**
- **Sina** - **Sina**
- **sky.it**
- **sky:news**
- **sky:sports**
- **sky:sports:news**
- **skyacademy.it**
- **SkylineWebcams** - **SkylineWebcams**
- **SkyNews**
- **skynewsarabia:article** - **skynewsarabia:article**
- **skynewsarabia:video** - **skynewsarabia:video**
- **SkySports**
- **Slideshare** - **Slideshare**
- **SlidesLive** - **SlidesLive**
- **Slutload** - **Slutload**
- **smotri**: Smotri.com
- **smotri:broadcast**: Smotri.com broadcasts
- **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos
- **Snotr** - **Snotr**
- **Sohu** - **Sohu**
- **SonyLIV** - **SonyLIV**
@ -883,7 +904,6 @@ # Supported sites
- **Tagesschau** - **Tagesschau**
- **tagesschau:player** - **tagesschau:player**
- **Tass** - **Tass**
- **TastyTrade**
- **TBS** - **TBS**
- **TDSLifeway** - **TDSLifeway**
- **Teachable** - **Teachable**
@ -906,6 +926,7 @@ # Supported sites
- **TeleQuebecEmission** - **TeleQuebecEmission**
- **TeleQuebecLive** - **TeleQuebecLive**
- **TeleQuebecSquat** - **TeleQuebecSquat**
- **TeleQuebecVideo**
- **TeleTask** - **TeleTask**
- **Telewebion** - **Telewebion**
- **TennisTV** - **TennisTV**
@ -922,10 +943,10 @@ # Supported sites
- **ThisAmericanLife** - **ThisAmericanLife**
- **ThisAV** - **ThisAV**
- **ThisOldHouse** - **ThisOldHouse**
- **ThisVid**
- **TikTok** - **TikTok**
- **tinypic**: tinypic.com videos - **tinypic**: tinypic.com videos
- **TMZ** - **TMZ**
- **TMZArticle**
- **TNAFlix** - **TNAFlix**
- **TNAFlixNetworkEmbed** - **TNAFlixNetworkEmbed**
- **toggle** - **toggle**
@ -955,12 +976,15 @@ # Supported sites
- **TV2DKBornholmPlay** - **TV2DKBornholmPlay**
- **TV4**: tv4.se and tv4play.se - **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+ - **TV5MondePlus**: TV5MONDE+
- **tv5unis**
- **tv5unis:video**
- **tv8.it** - **tv8.it**
- **TVA** - **TVA**
- **TVANouvelles** - **TVANouvelles**
- **TVANouvellesArticle** - **TVANouvellesArticle**
- **TVC** - **TVC**
- **TVCArticle** - **TVCArticle**
- **TVer**
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com** - **tvland.com**
- **TVN24** - **TVN24**
@ -1029,6 +1053,8 @@ # Supported sites
- **Viddler** - **Viddler**
- **Videa** - **Videa**
- **video.google:search**: Google Video search - **video.google:search**: Google Video search
- **video.sky.it**
- **video.sky.it:live**
- **VideoDetective** - **VideoDetective**
- **videofy.me** - **videofy.me**
- **videomore** - **videomore**
@ -1089,6 +1115,7 @@ # Supported sites
- **vube**: Vube.com - **vube**: Vube.com
- **VuClip** - **VuClip**
- **VVVVID** - **VVVVID**
- **VVVVIDShow**
- **VyboryMos** - **VyboryMos**
- **Vzaar** - **Vzaar**
- **Wakanim** - **Wakanim**
@ -1111,6 +1138,7 @@ # Supported sites
- **WeiboMobile** - **WeiboMobile**
- **WeiqiTV**: WQTV - **WeiqiTV**: WQTV
- **Wistia** - **Wistia**
- **WistiaPlaylist**
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl - **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **WorldStarHipHop** - **WorldStarHipHop**
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
@ -1142,6 +1170,8 @@ # Supported sites
- **yahoo:japannews**: Yahoo! Japan News - **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk** - **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом - **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:artist:albums**: Яндекс.Музыка - Артист - Альбомы
- **yandexmusic:artist:tracks**: Яндекс.Музыка - Артист - Треки
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист - **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек - **yandexmusic:track**: Яндекс.Музыка - Трек
- **YandexVideo** - **YandexVideo**
@ -1163,18 +1193,19 @@ # Supported sites
- **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication) - **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication)
- **youtube:playlist**: YouTube.com playlists - **youtube:playlist**: YouTube.com playlists
- **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication) - **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication)
- **youtube:search**: YouTube.com searches, "ytsearch" keyword - **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first, "ytsearchdate" keyword - **youtube:search:date**: YouTube.com searches, newest videos first, "ytsearchdate" keyword
- **youtube:search_url**: YouTube.com search URLs - **youtube:search_url**: YouTube.com searches, "ytsearch" keyword
- **youtube:subscriptions**: YouTube.com subscriptions feed, ":ytsubs" for short (requires authentication) - **youtube:subscriptions**: YouTube.com subscriptions feed, ":ytsubs" for short (requires authentication)
- **youtube:tab**: YouTube.com tab - **youtube:tab**: YouTube.com tab
- **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication) - **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
- **YoutubeYtBe**: youtu.be
- **YoutubeYtUser**: YouTube.com user videos, URL or "ytuser" keyword - **YoutubeYtUser**: YouTube.com user videos, URL or "ytuser" keyword
- **Zapiks** - **Zapiks**
- **Zaq1**
- **Zattoo** - **Zattoo**
- **ZattooLive** - **ZattooLive**
- **ZDF-3sat** - **ZDF-3sat**
- **ZDFChannel** - **ZDFChannel**
- **zingmp3**: mp3.zing.vn - **zingmp3**: mp3.zing.vn
- **zoom**
- **Zype** - **Zype**

View File

@ -1,3 +1,5 @@
# Unused
from __future__ import unicode_literals from __future__ import unicode_literals
from datetime import datetime from datetime import datetime
import urllib.request import urllib.request

View File

@ -2,5 +2,5 @@
universal = True universal = True
[flake8] [flake8]
exclude = youtube_dlc/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git,venv exclude = youtube_dlc/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git,venv,devscripts/create-github-release.py,devscripts/release.sh,devscripts/show-downloads-statistics.py,scripts/update-version.py
ignore = E402,E501,E731,E741,W503 ignore = E402,E501,E731,E741,W503

View File

@ -66,7 +66,7 @@ def run(self):
description=DESCRIPTION, description=DESCRIPTION,
long_description=LONG_DESCRIPTION, long_description=LONG_DESCRIPTION,
# long_description_content_type="text/markdown", # long_description_content_type="text/markdown",
url="https://github.com/blackjack4494/yt-dlc", url="https://github.com/pukkandan/yt-dlc",
packages=find_packages(exclude=("youtube_dl","test",)), packages=find_packages(exclude=("youtube_dl","test",)),
#packages=[ #packages=[
# 'youtube_dlc', # 'youtube_dlc',

View File

@ -7,6 +7,7 @@
"forcethumbnail": false, "forcethumbnail": false,
"forcetitle": false, "forcetitle": false,
"forceurl": false, "forceurl": false,
"force_write_download_archive": false,
"format": "best", "format": "best",
"ignoreerrors": false, "ignoreerrors": false,
"listformats": null, "listformats": null,
@ -35,6 +36,11 @@
"verbose": true, "verbose": true,
"writedescription": false, "writedescription": false,
"writeinfojson": true, "writeinfojson": true,
"writeannotations": false,
"writelink": false,
"writeurllink": false,
"writewebloclink": false,
"writedesktoplink": false,
"writesubtitles": false, "writesubtitles": false,
"allsubtitles": false, "allsubtitles": false,
"listsubtitles": false, "listsubtitles": false,

View File

@ -98,6 +98,55 @@ def test_html_search_meta(self):
self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True) self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True) self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)
def test_search_json_ld_realworld(self):
# https://github.com/ytdl-org/youtube-dl/issues/23306
expect_dict(
self,
self.ie._search_json_ld(r'''<script type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "VideoObject",
"name": "1 On 1 With Kleio",
"url": "https://www.eporner.com/hd-porn/xN49A1cT3eB/1-On-1-With-Kleio/",
"duration": "PT0H12M23S",
"thumbnailUrl": ["https://static-eu-cdn.eporner.com/thumbs/static4/7/78/780/780814/9_360.jpg", "https://imggen.eporner.com/780814/1920/1080/9.jpg"],
"contentUrl": "https://gvideo.eporner.com/xN49A1cT3eB/xN49A1cT3eB.mp4",
"embedUrl": "https://www.eporner.com/embed/xN49A1cT3eB/1-On-1-With-Kleio/",
"image": "https://static-eu-cdn.eporner.com/thumbs/static4/7/78/780/780814/9_360.jpg",
"width": "1920",
"height": "1080",
"encodingFormat": "mp4",
"bitrate": "6617kbps",
"isFamilyFriendly": "False",
"description": "Kleio Valentien",
"uploadDate": "2015-12-05T21:24:35+01:00",
"interactionStatistic": {
"@type": "InteractionCounter",
"interactionType": { "@type": "http://schema.org/WatchAction" },
"userInteractionCount": 1120958
}, "aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "88",
"ratingCount": "630",
"bestRating": "100",
"worstRating": "0"
}, "actor": [{
"@type": "Person",
"name": "Kleio Valentien",
"url": "https://www.eporner.com/pornstar/kleio-valentien/"
}]}
</script>''', None),
{
'title': '1 On 1 With Kleio',
'description': 'Kleio Valentien',
'url': 'https://gvideo.eporner.com/xN49A1cT3eB/xN49A1cT3eB.mp4',
'timestamp': 1449347075,
'duration': 743.0,
'view_count': 1120958,
'width': 1920,
'height': 1080,
})
def test_download_json(self): def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json') uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'}) self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'})
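
Review note: the JSON-LD test in this hunk expects `"duration": "PT0H12M23S"` to become `743.0` and `"uploadDate": "2015-12-05T21:24:35+01:00"` to become `1449347075`. The conversion can be sketched with the standard library alone. The helper names below are hypothetical and are not the functions `_search_json_ld` actually calls; this only illustrates the arithmetic behind the expected values.

```python
import re
from datetime import datetime


# Hypothetical helpers for illustration; not part of youtube-dlc.
def parse_iso8601_duration(value):
    """Convert a time-only ISO 8601 duration such as 'PT0H12M23S' to seconds."""
    m = re.fullmatch(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?', value)
    if m is None:
        return None
    # Missing components count as zero.
    hours, minutes, seconds = (float(g) if g else 0.0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def parse_iso8601_timestamp(value):
    """Convert an ISO 8601 date-time with a UTC offset to a Unix timestamp."""
    return int(datetime.fromisoformat(value).timestamp())


if __name__ == '__main__':
    print(parse_iso8601_duration('PT0H12M23S'))                  # 743.0
    print(parse_iso8601_timestamp('2015-12-05T21:24:35+01:00'))  # 1449347075
```

The sketch handles only the `PT…H…M…S` form used in the test data; durations with day or week components would need a wider pattern.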
@ -108,6 +157,18 @@ def test_download_json(self):
self.assertEqual(self.ie._download_json(uri, None, fatal=False), None) self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
def test_parse_html5_media_entries(self): def test_parse_html5_media_entries(self):
# inline video tag
expect_dict(
self,
self.ie._parse_html5_media_entries(
'https://127.0.0.1/video.html',
r'<html><video src="/vid.mp4" /></html>', None)[0],
{
'formats': [{
'url': 'https://127.0.0.1/vid.mp4',
}],
})
# from https://www.r18.com/ # from https://www.r18.com/
# with kbps in label # with kbps in label
expect_dict( expect_dict(

View File

@ -42,6 +42,7 @@ def _make_result(formats, **kwargs):
'title': 'testttitle', 'title': 'testttitle',
'extractor': 'testex', 'extractor': 'testex',
'extractor_key': 'TestEx', 'extractor_key': 'TestEx',
'webpage_url': 'http://example.com/watch?v=shenanigans',
} }
res.update(**kwargs) res.update(**kwargs)
return res return res
@ -77,7 +78,7 @@ def test_prefer_free_formats(self):
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['ext'], 'mp4') self.assertEqual(downloaded['ext'], 'mp4')
# No prefer_free_formats => prefer mp4 and flv for greater compatibility # No prefer_free_formats => prefer mp4 and webm
ydl = YDL() ydl = YDL()
ydl.params['prefer_free_formats'] = False ydl.params['prefer_free_formats'] = False
formats = [ formats = [
@ -103,7 +104,7 @@ def test_prefer_free_formats(self):
yie._sort_formats(info_dict['formats']) yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict) ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['ext'], 'flv') self.assertEqual(downloaded['ext'], 'webm')
def test_format_selection(self): def test_format_selection(self):
formats = [ formats = [
@ -310,6 +311,9 @@ def test_format_selection_string_ops(self):
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy()) self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
def test_youtube_format_selection(self): def test_youtube_format_selection(self):
return
# disabled for now - this needs some changes
order = [ order = [
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13', '38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13',
# Apple HTTP Live Streaming # Apple HTTP Live Streaming
@ -347,7 +351,7 @@ def format_info(f_id):
yie._sort_formats(info_dict['formats']) yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict) ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '137+141') self.assertEqual(downloaded['format_id'], '248+172')
self.assertEqual(downloaded['ext'], 'mp4') self.assertEqual(downloaded['ext'], 'mp4')
info_dict = _make_result(list(formats_order), extractor='youtube') info_dict = _make_result(list(formats_order), extractor='youtube')
@ -534,19 +538,19 @@ def test_format_filtering(self):
def test_default_format_spec(self): def test_default_format_spec(self):
ydl = YDL({'simulate': True}) ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({}), 'bestvideo+bestaudio/best') self.assertEqual(ydl._default_format_spec({}), 'bestvideo*+bestaudio/best')
ydl = YDL({}) ydl = YDL({})
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio') self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
ydl = YDL({'simulate': True}) ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'bestvideo+bestaudio/best') self.assertEqual(ydl._default_format_spec({'is_live': True}), 'bestvideo*+bestaudio/best')
ydl = YDL({'outtmpl': '-'}) ydl = YDL({'outtmpl': '-'})
self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio') self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio')
ydl = YDL({}) ydl = YDL({})
self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo+bestaudio/best') self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo*+bestaudio/best')
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio') self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
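
Review note: the updated assertions in this hunk follow from a small decision rule: prefer a single pre-merged `best` file only when a real download is happening and merging is impossible or undesirable; otherwise ask for separate video and audio streams. A standalone sketch of that rule is below. The function and its keyword arguments are hypothetical stand-ins for `self.params`, `info_dict` and the ffmpeg merger check, and `can_merge=True` is assumed as the default.

```python
# Hypothetical standalone version of the selection rule; the real method
# reads these values from YoutubeDL.params and the info_dict.
def default_format_spec(simulate=False, download=True, can_merge=True,
                        is_live=False, outtmpl='%(title)s.%(ext)s',
                        allow_multiple_audio_streams=False):
    prefer_best = (
        not simulate
        and download
        and (not can_merge or is_live or outtmpl == '-'))
    if prefer_best:
        return 'best/bestvideo+bestaudio'
    if not allow_multiple_audio_streams:
        return 'bestvideo*+bestaudio/best'
    return 'bestvideo+bestaudio/best'


if __name__ == '__main__':
    print(default_format_spec(simulate=True))   # bestvideo*+bestaudio/best
    print(default_format_spec(is_live=True))    # best/bestvideo+bestaudio
    print(default_format_spec(outtmpl='-'))     # best/bestvideo+bestaudio
    print(default_format_spec(download=False))  # bestvideo*+bestaudio/best
```

Note that `simulate=True` wins over `is_live=True`, which is why the third assertion in the hunk expects the merged spec even for a live stream.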
@ -567,6 +571,7 @@ def s_formats(lang, autocaption=False):
'subtitles': subtitles, 'subtitles': subtitles,
'automatic_captions': auto_captions, 'automatic_captions': auto_captions,
'extractor': 'TEST', 'extractor': 'TEST',
'webpage_url': 'http://example.com/watch?v=shenanigans',
} }
def get_info(params={}): def get_info(params={}):
@ -730,6 +735,7 @@ def _match_entry(self, info_dict, incomplete):
'playlist_id': '42', 'playlist_id': '42',
'uploader': "變態妍字幕版 太妍 тест", 'uploader': "變態妍字幕版 太妍 тест",
'creator': "тест ' 123 ' тест--", 'creator': "тест ' 123 ' тест--",
'webpage_url': 'http://example.com/watch?v=shenanigans',
} }
second = { second = {
'id': '2', 'id': '2',
@ -741,6 +747,7 @@ def _match_entry(self, info_dict, incomplete):
'filesize': 5 * 1024, 'filesize': 5 * 1024,
'playlist_id': '43', 'playlist_id': '43',
'uploader': "тест 123", 'uploader': "тест 123",
'webpage_url': 'http://example.com/watch?v=SHENANIGANS',
} }
videos = [first, second] videos = [first, second]

View File

@ -39,7 +39,7 @@ def test_youtube_playlist_matching(self):
assertTab('https://www.youtube.com/embedded') assertTab('https://www.youtube.com/embedded')
assertTab('https://www.youtube.com/feed') # Own channel's home page assertTab('https://www.youtube.com/feed') # Own channel's home page
assertTab('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q') assertTab('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8') assertTab('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertTab('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC') assertTab('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
assertTab('https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') # 668 assertTab('https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') # 668
self.assertFalse('youtube:playlist' in self.matching_ies('PLtS2H6bU1M')) self.assertFalse('youtube:playlist' in self.matching_ies('PLtS2H6bU1M'))
@ -60,8 +60,8 @@ def test_youtube_channel_matching(self):
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec') assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos') assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
# def test_youtube_user_matching(self): def test_youtube_user_matching(self):
# self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:tab']) self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:tab'])
def test_youtube_feeds(self): def test_youtube_feeds(self):
self.assertMatch('https://www.youtube.com/feed/library', ['youtube:tab']) self.assertMatch('https://www.youtube.com/feed/library', ['youtube:tab'])

View File

@ -19,6 +19,8 @@
compat_shlex_split, compat_shlex_split,
compat_str, compat_str,
compat_struct_unpack, compat_struct_unpack,
compat_urllib_parse_quote,
compat_urllib_parse_quote_plus,
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus, compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
@ -53,6 +55,27 @@ def test_all_present(self):
dir(youtube_dlc.compat))) - set(['unicode_literals']) dir(youtube_dlc.compat))) - set(['unicode_literals'])
self.assertEqual(all_names, sorted(present_names)) self.assertEqual(all_names, sorted(present_names))
def test_compat_urllib_parse_quote(self):
self.assertEqual(compat_urllib_parse_quote('abc def'), 'abc%20def')
self.assertEqual(compat_urllib_parse_quote('/user/abc+def'), '/user/abc%2Bdef')
self.assertEqual(compat_urllib_parse_quote('/user/abc+def', safe='+'), '%2Fuser%2Fabc+def')
self.assertEqual(compat_urllib_parse_quote(''), '')
self.assertEqual(compat_urllib_parse_quote('%'), '%25')
self.assertEqual(compat_urllib_parse_quote('%', safe='%'), '%')
self.assertEqual(compat_urllib_parse_quote('津波'), '%E6%B4%A5%E6%B3%A2')
self.assertEqual(
compat_urllib_parse_quote('''<meta property="og:description" content="▁▂▃▄%▅▆▇█" />
%<a href="https://ar.wikipedia.org/wiki/تسونامي">%a''', safe='<>=":%/ \r\n'),
'''<meta property="og:description" content="%E2%96%81%E2%96%82%E2%96%83%E2%96%84%%E2%96%85%E2%96%86%E2%96%87%E2%96%88" />
%<a href="https://ar.wikipedia.org/wiki/%D8%AA%D8%B3%D9%88%D9%86%D8%A7%D9%85%D9%8A">%a''')
self.assertEqual(
compat_urllib_parse_quote('''(^◣_◢^)っ︻デ═一 ⇀ ⇀ ⇀ ⇀ ⇀ ↶%I%Break%25Things%''', safe='% '),
'''%28%5E%E2%97%A3_%E2%97%A2%5E%29%E3%81%A3%EF%B8%BB%E3%83%87%E2%95%90%E4%B8%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%86%B6%I%Break%25Things%''')
def test_compat_urllib_parse_quote_plus(self):
self.assertEqual(compat_urllib_parse_quote_plus('abc def'), 'abc+def')
self.assertEqual(compat_urllib_parse_quote_plus('/abc def'), '%2Fabc+def')
def test_compat_urllib_parse_unquote(self): def test_compat_urllib_parse_unquote(self):
self.assertEqual(compat_urllib_parse_unquote('abc%20def'), 'abc def') self.assertEqual(compat_urllib_parse_unquote('abc%20def'), 'abc def')
self.assertEqual(compat_urllib_parse_unquote('%7e/abc+def'), '~/abc+def') self.assertEqual(compat_urllib_parse_unquote('%7e/abc+def'), '~/abc+def')

View File

@ -104,6 +104,7 @@
cli_valueless_option, cli_valueless_option,
cli_bool_option, cli_bool_option,
parse_codecs, parse_codecs,
iri_to_uri,
) )
from youtube_dlc.compat import ( from youtube_dlc.compat import (
compat_chr, compat_chr,
@ -554,6 +555,11 @@ def test_url_or_none(self):
self.assertEqual(url_or_none('http$://foo.de'), None) self.assertEqual(url_or_none('http$://foo.de'), None)
self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de') self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de')
self.assertEqual(url_or_none('//foo.de'), '//foo.de') self.assertEqual(url_or_none('//foo.de'), '//foo.de')
self.assertEqual(url_or_none('s3://foo.de'), None)
self.assertEqual(url_or_none('rtmpte://foo.de'), 'rtmpte://foo.de')
self.assertEqual(url_or_none('mms://foo.de'), 'mms://foo.de')
self.assertEqual(url_or_none('rtspu://foo.de'), 'rtspu://foo.de')
self.assertEqual(url_or_none('ftps://foo.de'), 'ftps://foo.de')
def test_parse_age_limit(self): def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None) self.assertEqual(parse_age_limit(None), None)
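
Review note: the added `url_or_none` assertions accept streaming and FTP schemes (`rtmpte`, `mms`, `rtspu`, `ftps`) while still rejecting `s3`. A scheme whitelist expressed as one regular expression reproduces all of them. The pattern and function name below are assumptions chosen to satisfy the asserted cases; they are not copied from this diff.

```python
import re

# Assumed whitelist: http(s), the rtmp/rtsp families, mms and ftp(s),
# plus scheme-relative URLs that start with '//'.
_URL_RE = re.compile(
    r'^(?:(?:https?|rt(?:m(?:pt?[es]?|fp)|sp[su]?)|mms|ftps?):)?//')


def url_or_none_sketch(url):
    """Return the stripped URL if its scheme is whitelisted, else None."""
    if not isinstance(url, str):
        return None
    url = url.strip()
    return url if _URL_RE.match(url) else None


if __name__ == '__main__':
    for candidate in ('http://foo.de', '//foo.de', 's3://foo.de',
                      'rtmpte://foo.de', 'mms://foo.de',
                      'rtspu://foo.de', 'ftps://foo.de', 'http$://foo.de'):
        print(candidate, '->', url_or_none_sketch(candidate))
```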
@ -1465,6 +1471,32 @@ def test_get_elements_by_attribute(self):
self.assertEqual(get_elements_by_attribute('class', 'foo', html), []) self.assertEqual(get_elements_by_attribute('class', 'foo', html), [])
self.assertEqual(get_elements_by_attribute('class', 'no-such-foo', html), []) self.assertEqual(get_elements_by_attribute('class', 'no-such-foo', html), [])
def test_iri_to_uri(self):
self.assertEqual(
iri_to_uri('https://www.google.com/search?q=foo&ie=utf-8&oe=utf-8&client=firefox-b'),
'https://www.google.com/search?q=foo&ie=utf-8&oe=utf-8&client=firefox-b') # Same
self.assertEqual(
iri_to_uri('https://www.google.com/search?q=Käsesoßenrührlöffel'), # German for cheese sauce stirring spoon
'https://www.google.com/search?q=K%C3%A4seso%C3%9Fenr%C3%BChrl%C3%B6ffel')
self.assertEqual(
iri_to_uri('https://www.google.com/search?q=lt<+gt>+eq%3D+amp%26+percent%25+hash%23+colon%3A+tilde~#trash=?&garbage=#'),
'https://www.google.com/search?q=lt%3C+gt%3E+eq%3D+amp%26+percent%25+hash%23+colon%3A+tilde~#trash=?&garbage=#')
self.assertEqual(
iri_to_uri('http://правозащита38.рф/category/news/'),
'http://xn--38-6kcaak9aj5chl4a3g.xn--p1ai/category/news/')
self.assertEqual(
iri_to_uri('http://www.правозащита38.рф/category/news/'),
'http://www.xn--38-6kcaak9aj5chl4a3g.xn--p1ai/category/news/')
self.assertEqual(
iri_to_uri('https://i❤.ws/emojidomain/👍👏🤝💪'),
'https://xn--i-7iq.ws/emojidomain/%F0%9F%91%8D%F0%9F%91%8F%F0%9F%A4%9D%F0%9F%92%AA')
self.assertEqual(
iri_to_uri('http://日本語.jp/'),
'http://xn--wgv71a119e.jp/')
self.assertEqual(
iri_to_uri('http://导航.中国/'),
'http://xn--fet810g.xn--fiqs8s/')
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
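
Review note: the `test_iri_to_uri` cases combine two separate encodings: the host is converted with IDNA (Punycode), and the path, query and fragment are percent-encoded as UTF-8 while already-escaped sequences and reserved characters are left alone. A simplified sketch of that split is below. It is not the `iri_to_uri` added to `utils.py`; the function name and the safe-character set are assumptions, and userinfo and IPv6 hosts are deliberately ignored.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Assumed set of characters left untouched in path, query and fragment.
_SAFE = "%!$&'()*+,/:;=@~"


def iri_to_uri_sketch(iri):
    """Convert an IRI to an ASCII-only URI (simplified; no userinfo or IPv6)."""
    parts = urlsplit(iri)
    # Host labels go through the IDNA codec, e.g. '日本語.jp' -> 'xn--wgv71a119e.jp'.
    netloc = parts.hostname.encode('idna').decode('ascii') if parts.hostname else ''
    if parts.port is not None:
        netloc += ':%d' % parts.port
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE + '?'),
        quote(parts.fragment, safe=_SAFE + '?#'),
    ))


if __name__ == '__main__':
    print(iri_to_uri_sketch('http://日本語.jp/'))
    print(iri_to_uri_sketch('https://www.google.com/search?q=Käsesoßenrührlöffel'))
```

Keeping `%` in the safe set is what prevents double-encoding of sequences such as `%3D` that are already escaped in the input.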

1
youtube-dlc.cmd Normal file
View File

@ -0,0 +1 @@
py "%~dp0\youtube_dl\__main__.py"

View File

@ -51,6 +51,9 @@
DEFAULT_OUTTMPL, DEFAULT_OUTTMPL,
determine_ext, determine_ext,
determine_protocol, determine_protocol,
DOT_DESKTOP_LINK_TEMPLATE,
DOT_URL_LINK_TEMPLATE,
DOT_WEBLOC_LINK_TEMPLATE,
DownloadError, DownloadError,
encode_compat_str, encode_compat_str,
encodeFilename, encodeFilename,
@ -58,9 +61,11 @@
expand_path, expand_path,
ExtractorError, ExtractorError,
format_bytes, format_bytes,
format_field,
formatSeconds, formatSeconds,
GeoRestrictedError, GeoRestrictedError,
int_or_none, int_or_none,
iri_to_uri,
ISO3166Utils, ISO3166Utils,
locked_file, locked_file,
make_HTTPS_handler, make_HTTPS_handler,
@ -84,6 +89,7 @@
std_headers, std_headers,
str_or_none, str_or_none,
subtitles_filename, subtitles_filename,
to_high_limit_path,
UnavailableVideoError, UnavailableVideoError,
url_basename, url_basename,
version_tuple, version_tuple,
@ -161,12 +167,18 @@ class YoutubeDL(object):
forcejson: Force printing info_dict as JSON. forcejson: Force printing info_dict as JSON.
dump_single_json: Force printing the info_dict of the whole playlist dump_single_json: Force printing the info_dict of the whole playlist
(or video) as a single JSON line. (or video) as a single JSON line.
force_write_download_archive: Force writing download archive regardless of
'skip_download' or 'simulate'.
simulate: Do not download the video files. simulate: Do not download the video files.
format: Video format code. See options.py for more information. format: Video format code. See "FORMAT SELECTION" for more details.
format_sort: How to sort the video formats. See "Sorting Formats" for more details.
format_sort_force: Force the given format_sort. See "Sorting Formats" for more details.
allow_multiple_video_streams: Allow multiple video streams to be merged into a single file
allow_multiple_audio_streams: Allow multiple audio streams to be merged into a single file
outtmpl: Template for output names. outtmpl: Template for output names.
restrictfilenames: Do not allow "&" and spaces in file names. restrictfilenames: Do not allow "&" and spaces in file names.
trim_file_name: Limit length of filename (extension excluded). trim_file_name: Limit length of filename (extension excluded).
ignoreerrors: Do not stop on download errors. ignoreerrors: Do not stop on download errors. (Default False when running youtube-dlc, but True when directly accessing YoutubeDL class)
force_generic_extractor: Force downloader to use the generic extractor force_generic_extractor: Force downloader to use the generic extractor
nooverwrites: Prevent overwriting files. nooverwrites: Prevent overwriting files.
playliststart: Playlist item to start at. playliststart: Playlist item to start at.
@ -183,6 +195,11 @@ class YoutubeDL(object):
writeannotations: Write the video annotations to a .annotations.xml file writeannotations: Write the video annotations to a .annotations.xml file
writethumbnail: Write the thumbnail image to a file writethumbnail: Write the thumbnail image to a file
write_all_thumbnails: Write all thumbnail formats to files write_all_thumbnails: Write all thumbnail formats to files
writelink: Write an internet shortcut file, depending on the
current platform (.url/.webloc/.desktop)
writeurllink: Write a Windows internet shortcut file (.url)
writewebloclink: Write a macOS internet shortcut file (.webloc)
writedesktoplink: Write a Linux internet shortcut file (.desktop)
writesubtitles: Write the video subtitles to a file writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatically generated subtitles to a file writeautomaticsub: Write the automatically generated subtitles to a file
allsubtitles: Downloads all the subtitles of the video allsubtitles: Downloads all the subtitles of the video
@ -891,6 +908,10 @@ def add_default_extra_info(self, ie_result, ie, url):
self.add_extra_info(ie_result, { self.add_extra_info(ie_result, {
'extractor': ie.IE_NAME, 'extractor': ie.IE_NAME,
'webpage_url': url, 'webpage_url': url,
'duration_string': (
formatSeconds(ie_result['duration'], '-')
if ie_result.get('duration', None) is not None
else None),
'webpage_url_basename': url_basename(url), 'webpage_url_basename': url_basename(url),
'extractor_key': ie.ie_key(), 'extractor_key': ie.ie_key(),
}) })
@ -1138,7 +1159,7 @@ def _build_format_filter(self, filter_spec):
'*=': lambda attr, value: value in attr, '*=': lambda attr, value: value in attr,
} }
str_operator_rex = re.compile(r'''(?x) str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id) \s*(?P<key>[a-zA-Z0-9._-]+)
\s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)? \s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+) \s*(?P<value>[a-zA-Z0-9._-]+)
\s*$ \s*$
@ -1168,23 +1189,20 @@ def can_merge():
merger = FFmpegMergerPP(self) merger = FFmpegMergerPP(self)
return merger.available and merger.can_merge() return merger.available and merger.can_merge()
def prefer_best(): prefer_best = (
if self.params.get('simulate', False): not self.params.get('simulate', False)
return False and download
if not download: and (
return False not can_merge()
if self.params.get('outtmpl', DEFAULT_OUTTMPL) == '-': or info_dict.get('is_live', False)
return True or self.params.get('outtmpl', DEFAULT_OUTTMPL) == '-'))
if info_dict.get('is_live'):
return True
if not can_merge():
return True
return False
req_format_list = ['bestvideo+bestaudio', 'best'] return (
if prefer_best(): 'best/bestvideo+bestaudio'
req_format_list.reverse() if prefer_best
return '/'.join(req_format_list) else 'bestvideo*+bestaudio/best'
if not self.params.get('allow_multiple_audio_streams', False)
else 'bestvideo+bestaudio/best')
def build_format_selector(self, format_spec): def build_format_selector(self, format_spec):
def syntax_error(note, start): def syntax_error(note, start):
@ -1199,6 +1217,9 @@ def syntax_error(note, start):
GROUP = 'GROUP' GROUP = 'GROUP'
FormatSelector = collections.namedtuple('FormatSelector', ['type', 'selector', 'filters']) FormatSelector = collections.namedtuple('FormatSelector', ['type', 'selector', 'filters'])
allow_multiple_streams = {'audio': self.params.get('allow_multiple_audio_streams', False),
'video': self.params.get('allow_multiple_video_streams', False)}
def _parse_filter(tokens): def _parse_filter(tokens):
filter_parts = [] filter_parts = []
for type, string, start, _, _ in tokens: for type, string, start, _, _ in tokens:
@ -1297,7 +1318,7 @@ def _parse_format_selection(tokens, inside_merge=False, inside_choice=False, ins
return selectors return selectors
def _build_selector_function(selector): def _build_selector_function(selector):
if isinstance(selector, list): if isinstance(selector, list): # ,
fs = [_build_selector_function(s) for s in selector] fs = [_build_selector_function(s) for s in selector]
def selector_function(ctx): def selector_function(ctx):
@ -1305,9 +1326,11 @@ def selector_function(ctx):
for format in f(ctx): for format in f(ctx):
yield format yield format
return selector_function return selector_function
elif selector.type == GROUP:
elif selector.type == GROUP: # ()
selector_function = _build_selector_function(selector.selector) selector_function = _build_selector_function(selector.selector)
elif selector.type == PICKFIRST:
elif selector.type == PICKFIRST: # /
fs = [_build_selector_function(s) for s in selector.selector] fs = [_build_selector_function(s) for s in selector.selector]
def selector_function(ctx): def selector_function(ctx):
@ -1316,62 +1339,54 @@ def selector_function(ctx):
if picked_formats: if picked_formats:
return picked_formats return picked_formats
return [] return []
elif selector.type == SINGLE:
format_spec = selector.selector
def selector_function(ctx): elif selector.type == SINGLE: # atom
formats = list(ctx['formats']) format_spec = selector.selector if selector.selector is not None else 'best'
if not formats:
return if format_spec == 'all':
if format_spec == 'all': def selector_function(ctx):
for f in formats: formats = list(ctx['formats'])
yield f if formats:
elif format_spec in ['best', 'worst', None]: for f in formats:
format_idx = 0 if format_spec == 'worst' else -1 yield f
audiovideo_formats = [
f for f in formats else:
if f.get('vcodec') != 'none' and f.get('acodec') != 'none'] format_fallback = False
if audiovideo_formats: format_spec_obj = re.match(r'(best|worst|b|w)(video|audio|v|a)?(\*)?$', format_spec)
yield audiovideo_formats[format_idx] if format_spec_obj is not None:
# for extractors with incomplete formats (audio only (soundcloud) format_idx = 0 if format_spec_obj.group(1)[0] == 'w' else -1
# or video only (imgur)) we will fallback to best/worst format_type = format_spec_obj.group(2)[0] if format_spec_obj.group(2) else False
# {video,audio}-only format not_format_type = 'v' if format_type == 'a' else 'a'
elif ctx['incomplete_formats']: format_modified = format_spec_obj.group(3) is not None
yield formats[format_idx]
elif format_spec == 'bestaudio': format_fallback = not format_type and not format_modified # for b, w
audio_formats = [ filter_f = ((lambda f: f.get(format_type + 'codec') != 'none')
f for f in formats if format_type and format_modified # bv*, ba*, wv*, wa*
if f.get('vcodec') == 'none'] else (lambda f: f.get(not_format_type + 'codec') == 'none')
if audio_formats: if format_type # bv, ba, wv, wa
yield audio_formats[-1] else (lambda f: f.get('vcodec') != 'none' and f.get('acodec') != 'none')
elif format_spec == 'worstaudio': if not format_modified # b, w
audio_formats = [ else None) # b*, w*
f for f in formats
if f.get('vcodec') == 'none']
if audio_formats:
yield audio_formats[0]
elif format_spec == 'bestvideo':
video_formats = [
f for f in formats
if f.get('acodec') == 'none']
if video_formats:
yield video_formats[-1]
elif format_spec == 'worstvideo':
video_formats = [
f for f in formats
if f.get('acodec') == 'none']
if video_formats:
yield video_formats[0]
else: else:
extensions = ['mp4', 'flv', 'webm', '3gp', 'm4a', 'mp3', 'ogg', 'aac', 'wav'] format_idx = -1
if format_spec in extensions: filter_f = ((lambda f: f.get('ext') == format_spec)
filter_f = lambda f: f['ext'] == format_spec if format_spec in ['mp4', 'flv', 'webm', '3gp', 'm4a', 'mp3', 'ogg', 'aac', 'wav'] # extension
else: else (lambda f: f.get('format_id') == format_spec)) # id
filter_f = lambda f: f['format_id'] == format_spec
matches = list(filter(filter_f, formats)) def selector_function(ctx):
formats = list(ctx['formats'])
if not formats:
return
matches = list(filter(filter_f, formats)) if filter_f is not None else formats
if matches: if matches:
yield matches[-1] yield matches[format_idx]
elif selector.type == MERGE: elif format_fallback == 'force' or (format_fallback and ctx['incomplete_formats']):
# for extractors with incomplete formats (audio only (soundcloud)
# or video only (imgur)) best/worst will fallback to
# best/worst {video,audio}-only format
yield formats[format_idx]
elif selector.type == MERGE: # +
def _merge(formats_pair): def _merge(formats_pair):
format_1, format_2 = formats_pair format_1, format_2 = formats_pair
@ -1379,6 +1394,18 @@ def _merge(formats_pair):
formats_info.extend(format_1.get('requested_formats', (format_1,))) formats_info.extend(format_1.get('requested_formats', (format_1,)))
formats_info.extend(format_2.get('requested_formats', (format_2,))) formats_info.extend(format_2.get('requested_formats', (format_2,)))
if not allow_multiple_streams['video'] or not allow_multiple_streams['audio']:
get_no_more = {"video": False, "audio": False}
for (i, fmt_info) in enumerate(formats_info):
for aud_vid in ["audio", "video"]:
if not allow_multiple_streams[aud_vid] and fmt_info.get(aud_vid[0] + 'codec') != 'none':
if get_no_more[aud_vid]:
formats_info.pop(i)
get_no_more[aud_vid] = True
if len(formats_info) == 1:
return formats_info[0]
video_fmts = [fmt_info for fmt_info in formats_info if fmt_info.get('vcodec') != 'none'] video_fmts = [fmt_info for fmt_info in formats_info if fmt_info.get('vcodec') != 'none']
audio_fmts = [fmt_info for fmt_info in formats_info if fmt_info.get('acodec') != 'none'] audio_fmts = [fmt_info for fmt_info in formats_info if fmt_info.get('acodec') != 'none']
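Review note on the `_merge` hunk above: it drops surplus streams with `formats_info.pop(i)` while looping over `enumerate(formats_info)`. Removing items from a list during iteration shifts the remaining elements, so the entry right after a removed one is never examined. Below is a minimal sketch of the same filtering that builds a new list instead. `limit_streams` is a hypothetical name, not part of the commit, and the sketch assumes that a dropped format should not reserve a stream slot.

```python
def limit_streams(formats_info, allow_multiple):
    """Keep at most one video and/or one audio stream unless multiples are allowed.

    `allow_multiple` mirrors the commit's `allow_multiple_streams` dict:
    {'audio': bool, 'video': bool}.
    """
    seen = {'video': False, 'audio': False}
    kept = []
    for fmt in formats_info:
        # Stream kinds carried by this format: 'vcodec' / 'acodec' != 'none'.
        kinds = [k for k in ('video', 'audio') if fmt.get(k[0] + 'codec') != 'none']
        # Skip the format if it repeats a kind that may only appear once.
        if any(seen[k] and not allow_multiple[k] for k in kinds):
            continue
        for k in kinds:
            seen[k] = True
        kept.append(fmt)
    return kept


formats = [
    {'format_id': 'v1', 'vcodec': 'avc1', 'acodec': 'none'},
    {'format_id': 'v2', 'vcodec': 'vp9', 'acodec': 'none'},
    {'format_id': 'a1', 'vcodec': 'none', 'acodec': 'opus'},
]
single = limit_streams(formats, {'video': False, 'audio': False})
multi = limit_streams(formats, {'video': True, 'audio': False})
```

With both flags off, only the first video-only and the first audio-only format survive; enabling `video` keeps all three.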
@ -1679,7 +1706,7 @@ def is_wellformed(f):
if req_format is None: if req_format is None:
req_format = self._default_format_spec(info_dict, download=download) req_format = self._default_format_spec(info_dict, download=download)
if self.params.get('verbose'): if self.params.get('verbose'):
self.to_stdout('[debug] Default format spec: %s' % req_format) self._write_string('[debug] Default format spec: %s\n' % req_format)
format_selector = self.build_format_selector(req_format) format_selector = self.build_format_selector(req_format)
@ -1715,6 +1742,7 @@ def is_wellformed(f):
expected=True) expected=True)
if download: if download:
self.to_screen('[info] Downloading format(s) %s' % ", ".join([f['format_id'] for f in formats_to_download]))
if len(formats_to_download) > 1: if len(formats_to_download) > 1:
self.to_screen('[info] %s: downloading video in %s formats' % (info_dict['id'], len(formats_to_download))) self.to_screen('[info] %s: downloading video in %s formats' % (info_dict['id'], len(formats_to_download)))
for format in formats_to_download: for format in formats_to_download:
@ -1832,8 +1860,11 @@ def process_info(self, info_dict):
# Forced printings # Forced printings
self.__forced_printings(info_dict, filename, incomplete=False) self.__forced_printings(info_dict, filename, incomplete=False)
# Do nothing else if in simulate mode
if self.params.get('simulate', False): if self.params.get('simulate', False):
if self.params.get('force_write_download_archive', False):
self.record_download_archive(info_dict)
# Do nothing else if in simulate mode
return return
if filename is None: if filename is None:
@ -1889,7 +1920,7 @@ def dl(name, info, subtitle=False):
for ph in self._progress_hooks: for ph in self._progress_hooks:
fd.add_progress_hook(ph) fd.add_progress_hook(ph)
if self.params.get('verbose'): if self.params.get('verbose'):
self.to_stdout('[debug] Invoking downloader on %r' % info.get('url')) self.to_screen('[debug] Invoking downloader on %r' % info.get('url'))
return fd.download(name, info, subtitle) return fd.download(name, info, subtitle)
subtitles_are_requested = any([self.params.get('writesubtitles', False), subtitles_are_requested = any([self.params.get('writesubtitles', False),
@ -1970,6 +2001,57 @@ def dl(name, info, subtitle=False):
self._write_thumbnails(info_dict, filename) self._write_thumbnails(info_dict, filename)
# Write internet shortcut files
url_link = webloc_link = desktop_link = False
if self.params.get('writelink', False):
if sys.platform == "darwin": # macOS.
webloc_link = True
elif sys.platform.startswith("linux"):
desktop_link = True
else: # if sys.platform in ['win32', 'cygwin']:
url_link = True
if self.params.get('writeurllink', False):
url_link = True
if self.params.get('writewebloclink', False):
webloc_link = True
if self.params.get('writedesktoplink', False):
desktop_link = True
if url_link or webloc_link or desktop_link:
if 'webpage_url' not in info_dict:
self.report_error('Cannot write internet shortcut file because the "webpage_url" field is missing in the media information')
return
ascii_url = iri_to_uri(info_dict['webpage_url'])
def _write_link_file(extension, template, newline, embed_filename):
linkfn = replace_extension(filename, extension, info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(linkfn)):
self.to_screen('[info] Internet shortcut is already present')
else:
try:
self.to_screen('[info] Writing internet shortcut to: ' + linkfn)
with io.open(encodeFilename(to_high_limit_path(linkfn)), 'w', encoding='utf-8', newline=newline) as linkfile:
template_vars = {'url': ascii_url}
if embed_filename:
template_vars['filename'] = linkfn[:-(len(extension) + 1)]
linkfile.write(template % template_vars)
except (OSError, IOError):
self.report_error('Cannot write internet shortcut ' + linkfn)
return False
return True
if url_link:
if not _write_link_file('url', DOT_URL_LINK_TEMPLATE, '\r\n', embed_filename=False):
return
if webloc_link:
if not _write_link_file('webloc', DOT_WEBLOC_LINK_TEMPLATE, '\n', embed_filename=False):
return
if desktop_link:
if not _write_link_file('desktop', DOT_DESKTOP_LINK_TEMPLATE, '\n', embed_filename=True):
return
# Download
must_record_download_archive = False
if not self.params.get('skip_download', False): if not self.params.get('skip_download', False):
try: try:
if info_dict.get('requested_formats') is not None: if info_dict.get('requested_formats') is not None:
@ -2029,13 +2111,16 @@ def compatible_formats(formats):
if not ensure_dir_exists(fname): if not ensure_dir_exists(fname):
return return
downloaded.append(fname) downloaded.append(fname)
partial_success = dl(fname, new_info) partial_success, real_download = dl(fname, new_info)
success = success and partial_success success = success and partial_success
info_dict['__postprocessors'] = postprocessors info_dict['__postprocessors'] = postprocessors
info_dict['__files_to_merge'] = downloaded info_dict['__files_to_merge'] = downloaded
# Even if there were no downloads, it is being merged only now
info_dict['__real_download'] = True
else: else:
# Just a single file # Just a single file
success = dl(filename, info_dict) success, real_download = dl(filename, info_dict)
info_dict['__real_download'] = real_download
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err: except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_error('unable to download video data: %s' % error_to_compat_str(err)) self.report_error('unable to download video data: %s' % error_to_compat_str(err))
return return
@ -2113,7 +2198,10 @@ def compatible_formats(formats):
except (PostProcessingError) as err: except (PostProcessingError) as err:
self.report_error('postprocessing: %s' % str(err)) self.report_error('postprocessing: %s' % str(err))
return return
self.record_download_archive(info_dict) must_record_download_archive = True
if must_record_download_archive or self.params.get('force_write_download_archive', False):
self.record_download_archive(info_dict)
def download(self, url_list): def download(self, url_list):
"""Download a given list of URLs.""" """Download a given list of URLs."""
@ -2299,19 +2387,62 @@ def _format_note(self, fdict):
res += '~' + format_bytes(fdict['filesize_approx']) res += '~' + format_bytes(fdict['filesize_approx'])
return res return res
def _format_note_table(self, f):
def join_fields(*vargs):
return ', '.join((val for val in vargs if val != ''))
return join_fields(
'UNSUPPORTED' if f.get('ext') in ('f4f', 'f4m') else '',
format_field(f, 'language', '[%s]'),
format_field(f, 'format_note'),
format_field(f, 'container', ignore=(None, f.get('ext'))),
format_field(f, 'asr', '%5dHz'))
def list_formats(self, info_dict): def list_formats(self, info_dict):
formats = info_dict.get('formats', [info_dict]) formats = info_dict.get('formats', [info_dict])
table = [ new_format = self.params.get('listformats_table', False)
[f['format_id'], f['ext'], self.format_resolution(f), self._format_note(f)] if new_format:
for f in formats table = [
if f.get('preference') is None or f['preference'] >= -1000] [
if len(formats) > 1: format_field(f, 'format_id'),
table[-1][-1] += (' ' if table[-1][-1] else '') + '(best)' format_field(f, 'ext'),
self.format_resolution(f),
format_field(f, 'fps', '%d'),
'|',
format_field(f, 'filesize', ' %s', func=format_bytes) + format_field(f, 'filesize_approx', '~%s', func=format_bytes),
format_field(f, 'tbr', '%4dk'),
f.get('protocol').replace('http_dash_segments', 'dash').replace("native", "n"),
'|',
format_field(f, 'vcodec', default='unknown').replace('none', ''),
format_field(f, 'vbr', '%4dk'),
format_field(f, 'acodec', default='unknown').replace('none', ''),
format_field(f, 'abr', '%3dk'),
format_field(f, 'asr', '%5dHz'),
self._format_note_table(f)]
for f in formats
if f.get('preference') is None or f['preference'] >= -1000]
header_line = ['ID', 'EXT', 'RESOLUTION', 'FPS', '|', ' FILESIZE', ' TBR', 'PROTO',
'|', 'VCODEC', ' VBR', 'ACODEC', ' ABR', ' ASR', 'NOTE']
else:
table = [
[
format_field(f, 'format_id'),
format_field(f, 'ext'),
self.format_resolution(f),
self._format_note(f)]
for f in formats
if f.get('preference') is None or f['preference'] >= -1000]
header_line = ['format code', 'extension', 'resolution', 'note']
header_line = ['format code', 'extension', 'resolution', 'note'] # if len(formats) > 1:
# table[-1][-1] += (' ' if table[-1][-1] else '') + '(best)'
self.to_screen( self.to_screen(
'[info] Available formats for %s:\n%s' % '[info] Available formats for %s:\n%s' % (info_dict['id'], render_table(
(info_dict['id'], render_table(header_line, table))) header_line,
table,
delim=new_format,
extraGap=(0 if new_format else 1),
hideEmpty=new_format)))
def list_thumbnails(self, info_dict): def list_thumbnails(self, info_dict):
thumbnails = info_dict.get('thumbnails') thumbnails = info_dict.get('thumbnails')
@ -2505,7 +2636,7 @@ def _write_thumbnails(self, info_dict, filename):
thumb_ext = determine_ext(t['url'], 'jpg') thumb_ext = determine_ext(t['url'], 'jpg')
suffix = '_%s' % t['id'] if len(thumbnails) > 1 else '' suffix = '_%s' % t['id'] if len(thumbnails) > 1 else ''
thumb_display_id = '%s ' % t['id'] if len(thumbnails) > 1 else '' thumb_display_id = '%s ' % t['id'] if len(thumbnails) > 1 else ''
t['filename'] = thumb_filename = os.path.splitext(filename)[0] + suffix + '.' + thumb_ext t['filename'] = thumb_filename = replace_extension(filename + suffix, thumb_ext, info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)): if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)):
self.to_screen('[%s] %s: Thumbnail %sis already present' % self.to_screen('[%s] %s: Thumbnail %sis already present' %
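Review note on the format-selector changes in this file: the new `SINGLE` branch replaces the chain of `bestvideo` / `worstaudio` comparisons with one regular expression, `(best|worst|b|w)(video|audio|v|a)?(\*)?$`. The sketch below shows how that pattern breaks a spec into its parts. `parse_spec` and its dictionary keys are hypothetical names for illustration, and it returns `None` for a missing type where the commit uses `False`.

```python
import re

# Pattern copied from the diff: best/worst, optional stream type, optional '*'.
SPEC_RE = re.compile(r'(best|worst|b|w)(video|audio|v|a)?(\*)?$')


def parse_spec(format_spec):
    """Split a best/worst selector into list index, stream type and '*' flag."""
    m = SPEC_RE.match(format_spec)
    if m is None:
        return None  # not a best/worst selector (an extension or a format id)
    return {
        # 'worst...' picks the first format, 'best...' the last one.
        'index': 0 if m.group(1)[0] == 'w' else -1,
        # 'v', 'a', or None when no stream type was given.
        'type': m.group(2)[0] if m.group(2) else None,
        # True when the trailing '*' is present (e.g. 'bv*').
        'modified': m.group(3) is not None,
    }


print(parse_spec('bv*'))         # {'index': -1, 'type': 'v', 'modified': True}
print(parse_spec('worstaudio'))  # {'index': 0, 'type': 'a', 'modified': False}
print(parse_spec('webm'))        # None
```

In the diff, specs that do not match (such as `webm` or a numeric id) fall through to the extension or `format_id` filter.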

@ -8,6 +8,7 @@
import codecs import codecs
import io import io
import os import os
import re
import random import random
import sys import sys
@ -41,6 +42,7 @@
FileDownloader, FileDownloader,
) )
from .extractor import gen_extractors, list_extractors from .extractor import gen_extractors, list_extractors
from .extractor.common import InfoExtractor
from .extractor.adobepass import MSO_INFO from .extractor.adobepass import MSO_INFO
from .YoutubeDL import YoutubeDL from .YoutubeDL import YoutubeDL
@ -245,6 +247,9 @@ def parse_retries(retries):
parser.error('Cannot download a video and extract audio into the same' parser.error('Cannot download a video and extract audio into the same'
' file! Use "{0}.%(ext)s" instead of "{0}" as the output' ' file! Use "{0}.%(ext)s" instead of "{0}" as the output'
' template'.format(outtmpl)) ' template'.format(outtmpl))
for f in opts.format_sort:
if re.match(InfoExtractor.FormatSort.regex, f) is None:
parser.error('invalid format sort string "%s" specified' % f)
any_getting = opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json any_getting = opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json
any_printing = opts.print_json any_printing = opts.print_json
@ -305,6 +310,17 @@ def parse_retries(retries):
# contents # contents
if opts.xattrs: if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'}) postprocessors.append({'key': 'XAttrMetadata'})
# This should be below all ffmpeg PP because it may cut parts out from the video
# If opts.sponskrub is None, sponskrub is used, but it silently fails if the executable can't be found
if opts.sponskrub is not False:
postprocessors.append({
'key': 'SponSkrub',
'path': opts.sponskrub_path,
'args': opts.sponskrub_args,
'cut': opts.sponskrub_cut,
'force': opts.sponskrub_force,
'ignoreerror': opts.sponskrub is None,
})
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way. # Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems. # So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd: if opts.exec_cmd:
@ -344,10 +360,16 @@ def parse_retries(retries):
'forceformat': opts.getformat, 'forceformat': opts.getformat,
'forcejson': opts.dumpjson or opts.print_json, 'forcejson': opts.dumpjson or opts.print_json,
'dump_single_json': opts.dump_single_json, 'dump_single_json': opts.dump_single_json,
'force_write_download_archive': opts.force_write_download_archive,
'simulate': opts.simulate or any_getting, 'simulate': opts.simulate or any_getting,
'skip_download': opts.skip_download, 'skip_download': opts.skip_download,
'format': opts.format, 'format': opts.format,
'format_sort': opts.format_sort,
'format_sort_force': opts.format_sort_force,
'allow_multiple_video_streams': opts.allow_multiple_video_streams,
'allow_multiple_audio_streams': opts.allow_multiple_audio_streams,
'listformats': opts.listformats, 'listformats': opts.listformats,
'listformats_table': opts.listformats_table,
'outtmpl': outtmpl, 'outtmpl': outtmpl,
'autonumber_size': opts.autonumber_size, 'autonumber_size': opts.autonumber_size,
'autonumber_start': opts.autonumber_start, 'autonumber_start': opts.autonumber_start,
@ -380,6 +402,10 @@ def parse_retries(retries):
'writeinfojson': opts.writeinfojson, 'writeinfojson': opts.writeinfojson,
'writethumbnail': opts.writethumbnail, 'writethumbnail': opts.writethumbnail,
'write_all_thumbnails': opts.write_all_thumbnails, 'write_all_thumbnails': opts.write_all_thumbnails,
'writelink': opts.writelink,
'writeurllink': opts.writeurllink,
'writewebloclink': opts.writewebloclink,
'writedesktoplink': opts.writedesktoplink,
'writesubtitles': opts.writesubtitles, 'writesubtitles': opts.writesubtitles,
'writeautomaticsub': opts.writeautomaticsub, 'writeautomaticsub': opts.writeautomaticsub,
'allsubtitles': opts.allsubtitles, 'allsubtitles': opts.allsubtitles,
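Review note on the SponSkrub hunk in this file: `opts.sponskrub` acts as a three-state option. `False` disables the post-processor, `True` enables it, and `None` (unset) enables it but tolerates a missing executable through `ignoreerror`. Below is a minimal sketch of that mapping. `sponskrub_postprocessor` is a hypothetical helper, not part of the commit, and the keyword defaults are assumptions.

```python
def sponskrub_postprocessor(sponskrub, path=None, args=None, cut=False, force=False):
    """Build the post-processor entry, or return None when explicitly disabled."""
    if sponskrub is False:
        return None
    return {
        'key': 'SponSkrub',
        'path': path,
        'args': args,
        'cut': cut,
        'force': force,
        # Only the "unset" state ignores errors such as a missing executable.
        'ignoreerror': sponskrub is None,
    }


disabled = sponskrub_postprocessor(False)
default = sponskrub_postprocessor(None)
explicit = sponskrub_postprocessor(True, cut=True)
```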

@ -37,15 +37,20 @@
except ImportError: # Python 2 except ImportError: # Python 2
import urllib as compat_urllib_parse import urllib as compat_urllib_parse
try:
import urllib.parse as compat_urlparse
except ImportError: # Python 2
import urlparse as compat_urlparse
try: try:
from urllib.parse import urlparse as compat_urllib_parse_urlparse from urllib.parse import urlparse as compat_urllib_parse_urlparse
except ImportError: # Python 2 except ImportError: # Python 2
from urlparse import urlparse as compat_urllib_parse_urlparse from urlparse import urlparse as compat_urllib_parse_urlparse
try: try:
import urllib.parse as compat_urlparse from urllib.parse import urlunparse as compat_urllib_parse_urlunparse
except ImportError: # Python 2 except ImportError: # Python 2
import urlparse as compat_urlparse from urlparse import urlunparse as compat_urllib_parse_urlunparse
try: try:
import urllib.response as compat_urllib_response import urllib.response as compat_urllib_response
@ -2365,6 +2370,20 @@ class compat_HTMLParseError(Exception):
except NameError: except NameError:
compat_str = str compat_str = str
try:
from urllib.parse import quote as compat_urllib_parse_quote
from urllib.parse import quote_plus as compat_urllib_parse_quote_plus
except ImportError: # Python 2
def compat_urllib_parse_quote(string, safe='/'):
return compat_urllib_parse.quote(
string.encode('utf-8'),
str(safe))
def compat_urllib_parse_quote_plus(string, safe=''):
return compat_urllib_parse.quote_plus(
string.encode('utf-8'),
str(safe))
try: try:
from urllib.parse import unquote_to_bytes as compat_urllib_parse_unquote_to_bytes from urllib.parse import unquote_to_bytes as compat_urllib_parse_unquote_to_bytes
from urllib.parse import unquote as compat_urllib_parse_unquote from urllib.parse import unquote as compat_urllib_parse_unquote
@ -3033,11 +3052,14 @@ def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
'compat_tokenize_tokenize', 'compat_tokenize_tokenize',
'compat_urllib_error', 'compat_urllib_error',
'compat_urllib_parse', 'compat_urllib_parse',
'compat_urllib_parse_quote',
'compat_urllib_parse_quote_plus',
'compat_urllib_parse_unquote', 'compat_urllib_parse_unquote',
'compat_urllib_parse_unquote_plus', 'compat_urllib_parse_unquote_plus',
'compat_urllib_parse_unquote_to_bytes', 'compat_urllib_parse_unquote_to_bytes',
'compat_urllib_parse_urlencode', 'compat_urllib_parse_urlencode',
'compat_urllib_parse_urlparse', 'compat_urllib_parse_urlparse',
'compat_urllib_parse_urlunparse',
'compat_urllib_request', 'compat_urllib_request',
'compat_urllib_request_DataHandler', 'compat_urllib_request_DataHandler',
'compat_urllib_response', 'compat_urllib_response',
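Review note on the new `compat_urllib_parse_quote` and `compat_urllib_parse_quote_plus` shims: on Python 3 they are plain aliases of `urllib.parse.quote` and `urllib.parse.quote_plus`. The short sketch below shows how the two differ, using only the standard library. The sample string is made up for illustration.

```python
from urllib.parse import quote, quote_plus

sample = 'a b/ö'

# quote() keeps '/' by default (safe='/') and encodes spaces as %20.
quoted = quote(sample)
# quote_plus() encodes '/' as well (safe='') and turns spaces into '+'.
quoted_plus = quote_plus(sample)

print(quoted)       # a%20b/%C3%B6
print(quoted_plus)  # a+b%2F%C3%B6
```

The Python 2 branch in the diff encodes the input as UTF-8 before quoting, which is what Python 3 does by default for `str` input.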

@ -351,7 +351,7 @@ def download(self, filename, info_dict, subtitle=False):
'status': 'finished', 'status': 'finished',
'total_bytes': os.path.getsize(encodeFilename(filename)), 'total_bytes': os.path.getsize(encodeFilename(filename)),
}) })
return True return True, False
if subtitle is False: if subtitle is False:
min_sleep_interval = self.params.get('sleep_interval') min_sleep_interval = self.params.get('sleep_interval')
@ -372,7 +372,7 @@ def download(self, filename, info_dict, subtitle=False):
'[download] Sleeping %s seconds...' % ( '[download] Sleeping %s seconds...' % (
sleep_interval_sub)) sleep_interval_sub))
time.sleep(sleep_interval_sub) time.sleep(sleep_interval_sub)
return self.real_download(filename, info_dict) return self.real_download(filename, info_dict), True
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
"""Real download process. Redefine in subclasses.""" """Real download process. Redefine in subclasses."""

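Review note on the downloader hunks above: `download()` now returns a pair `(success, real_download)` instead of a single boolean, and the callers in `YoutubeDL.py` unpack it and store the second value in `info_dict['__real_download']`. Below is a minimal sketch of that contract. The `already_present` flag and the `fetch` callable are hypothetical stand-ins for the checks and the `real_download` method in the actual class.

```python
def download(filename, already_present, fetch):
    """Return (success, real_download) in the style of the patched method."""
    if already_present:
        # Nothing was transferred: report success, but not a real download.
        return True, False
    # A transfer was attempted: its result is the success flag.
    return fetch(filename), True


def record(info_dict, result):
    """Unpack the pair the way the caller does for a single file."""
    success, real_download = result
    info_dict['__real_download'] = real_download
    return success


info = {}
ok_cached = record(info, download('video.mp4', True, lambda name: False))
cached_flag = info['__real_download']
ok_fresh = record(info, download('video.mp4', False, lambda name: True))
fresh_flag = info['__real_download']
```

Any caller that still treats the return value as a plain boolean would see a non-empty tuple, which is always truthy, so every call site needs the unpacking shown in the diff.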
@ -42,11 +42,13 @@ def can_download(manifest, info_dict):
# no segments will definitely be appended to the end of the playlist. # no segments will definitely be appended to the end of the playlist.
# r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of # r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# # event media playlists [4] # # event media playlists [4]
r'#EXT-X-MAP:', # media initialization [5]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4 # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2 # 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2 # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
# 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5 # 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
# 5. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.5
) )
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES] check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest
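Review note on the HLS hunk above: adding `#EXT-X-MAP:` to `UNSUPPORTED_FEATURES` makes the native downloader decline manifests that declare a media initialization section. The sketch below reproduces only that check. The tuple holds just the entry added in this hunk, and `has_only_supported_features` is a hypothetical name.

```python
import re

# Only the pattern added by this hunk; the real tuple holds more entries.
UNSUPPORTED_FEATURES = (
    r'#EXT-X-MAP:',  # media initialization
)


def has_only_supported_features(manifest):
    """True when no unsupported tag occurs anywhere in the manifest text."""
    return all(not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)


plain = '#EXTM3U\n#EXTINF:10,\nsegment0.ts\n#EXT-X-ENDLIST\n'
fmp4 = '#EXTM3U\n#EXT-X-MAP:URI="init.mp4"\n#EXTINF:10,\nsegment0.m4s\n'

plain_ok = has_only_supported_features(plain)
fmp4_ok = has_only_supported_features(fmp4)
```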

@ -61,7 +61,7 @@ def parse_yt_initial_data(data):
else: else:
url = ('https://www.youtube.com/live_chat_replay/get_live_chat_replay' url = ('https://www.youtube.com/live_chat_replay/get_live_chat_replay'
+ '?continuation={}'.format(continuation_id) + '?continuation={}'.format(continuation_id)
+ '&playerOffsetMs={}'.format(offset - 5000) + '&playerOffsetMs={}'.format(max(offset - 5000, 0))
+ '&hidden=false' + '&hidden=false'
+ '&pbj=1') + '&pbj=1')
success, raw_fragment = dl_fragment(url) success, raw_fragment = dl_fragment(url)
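Review note on the live-chat hunk above: `offset - 5000` goes negative for the first five seconds of a replay, and the patch clamps it with `max(..., 0)`. The sketch below builds the same URL in isolation. `replay_url` is a hypothetical wrapper, and the continuation id is a made-up placeholder.

```python
def replay_url(continuation_id, offset):
    """Build the chat replay URL with the player offset clamped at zero."""
    return ('https://www.youtube.com/live_chat_replay/get_live_chat_replay'
            + '?continuation={}'.format(continuation_id)
            # Look 5 seconds back, but never send a negative offset.
            + '&playerOffsetMs={}'.format(max(offset - 5000, 0))
            + '&hidden=false'
            + '&pbj=1')


early = replay_url('abc', 1200)
later = replay_url('abc', 10000)
```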

@ -2,21 +2,47 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import functools
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
clean_html, clean_html,
float_or_none,
int_or_none, int_or_none,
try_get, parse_iso8601,
unified_timestamp,
OnDemandPagedList,
) )
class ACastIE(InfoExtractor): class ACastBaseIE(InfoExtractor):
def _extract_episode(self, episode, show_info):
title = episode['title']
info = {
'id': episode['id'],
'display_id': episode.get('episodeUrl'),
'url': episode['url'],
'title': title,
'description': clean_html(episode.get('description') or episode.get('summary')),
'thumbnail': episode.get('image'),
'timestamp': parse_iso8601(episode.get('publishDate')),
'duration': int_or_none(episode.get('duration')),
'filesize': int_or_none(episode.get('contentLength')),
'season_number': int_or_none(episode.get('season')),
'episode': title,
'episode_number': int_or_none(episode.get('episode')),
}
info.update(show_info)
return info
def _extract_show_info(self, show):
return {
'creator': show.get('author'),
'series': show.get('title'),
}
def _call_api(self, path, video_id, query=None):
return self._download_json(
'https://feeder.acast.com/api/v1/shows/' + path, video_id, query=query)
class ACastIE(ACastBaseIE):
IE_NAME = 'acast' IE_NAME = 'acast'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@ -28,15 +54,15 @@ class ACastIE(InfoExtractor):
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna', 'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': '16d936099ec5ca2d5869e3a813ee8dc4', 'md5': 'f5598f3ad1e4776fed12ec1407153e4b',
'info_dict': { 'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9', 'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3', 'ext': 'mp3',
'title': '2. Raggarmordet - Röster ur det förflutna', 'title': '2. Raggarmordet - Röster ur det förflutna',
'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4', 'description': 'md5:a992ae67f4d98f1c0141598f7bebbf67',
'timestamp': 1477346700, 'timestamp': 1477346700,
'upload_date': '20161024', 'upload_date': '20161024',
'duration': 2766.602563, 'duration': 2766,
'creator': 'Anton Berg & Martin Johnson', 'creator': 'Anton Berg & Martin Johnson',
'series': 'Spår', 'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna', 'episode': '2. Raggarmordet - Röster ur det förflutna',
@ -45,7 +71,7 @@ class ACastIE(InfoExtractor):
'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015', 'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22', 'url': 'https://play.acast.com/s/rattegangspodden/s04e09styckmordetihelenelund-del2-2',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9', 'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
@ -54,40 +80,14 @@ class ACastIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups() channel, display_id = re.match(self._VALID_URL, url).groups()
s = self._download_json( episode = self._call_api(
'https://feeder.acast.com/api/v1/shows/%s/episodes/%s' % (channel, display_id), '%s/episodes/%s' % (channel, display_id),
display_id) display_id, {'showInfo': 'true'})
media_url = s['url'] return self._extract_episode(
if re.search(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}', display_id): episode, self._extract_show_info(episode.get('show') or {}))
episode_url = s.get('episodeUrl')
if episode_url:
display_id = episode_url
else:
channel, display_id = re.match(self._VALID_URL, s['link']).groups()
cast_data = self._download_json(
'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
display_id)['result']
e = cast_data['episode']
title = e.get('name') or s['title']
return {
'id': compat_str(e['id']),
'display_id': display_id,
'url': media_url,
'title': title,
'description': e.get('summary') or clean_html(e.get('description') or s.get('description')),
'thumbnail': e.get('image'),
'timestamp': unified_timestamp(e.get('publishingDate') or s.get('publishDate')),
'duration': float_or_none(e.get('duration') or s.get('duration')),
'filesize': int_or_none(e.get('contentLength')),
'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),
'season_number': int_or_none(e.get('seasonNumber')),
'episode': title,
'episode_number': int_or_none(e.get('episodeNumber')),
}
class ACastChannelIE(InfoExtractor): class ACastChannelIE(ACastBaseIE):
IE_NAME = 'acast:channel' IE_NAME = 'acast:channel'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@ -102,34 +102,24 @@ class ACastChannelIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '4efc5294-5385-4847-98bd-519799ce5786', 'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Today in Focus', 'title': 'Today in Focus',
'description': 'md5:9ba5564de5ce897faeb12963f4537a64', 'description': 'md5:c09ce28c91002ce4ffce71d6504abaae',
}, },
'playlist_mincount': 35, 'playlist_mincount': 200,
}, { }, {
'url': 'http://play.acast.com/s/ft-banking-weekly', 'url': 'http://play.acast.com/s/ft-banking-weekly',
'only_matching': True, 'only_matching': True,
}] }]
_API_BASE_URL = 'https://play.acast.com/api/'
_PAGE_SIZE = 10
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url) return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
def _fetch_page(self, channel_slug, page):
casts = self._download_json(
self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
channel_slug, note='Download page %d of channel data' % page)
for cast in casts:
yield self.url_result(
'https://play.acast.com/s/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id'])
def _real_extract(self, url): def _real_extract(self, url):
channel_slug = self._match_id(url) show_slug = self._match_id(url)
channel_data = self._download_json( show = self._call_api(show_slug, show_slug)
self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug) show_info = self._extract_show_info(show)
entries = OnDemandPagedList(functools.partial( entries = []
self._fetch_page, channel_slug), self._PAGE_SIZE) for episode in (show.get('episodes') or []):
return self.playlist_result(entries, compat_str( entries.append(self._extract_episode(episode, show_info))
channel_data['id']), channel_data['name'], channel_data.get('description')) return self.playlist_result(
entries, show.get('id'), show.get('title'), show.get('description'))


@ -5,20 +5,32 @@
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
extract_attributes,
ExtractorError, ExtractorError,
GeoRestrictedError,
int_or_none, int_or_none,
smuggle_url,
update_url_query, update_url_query,
) urlencode_postdata,
from ..compat import (
compat_urlparse,
) )
class AENetworksBaseIE(ThePlatformIE): class AENetworksBaseIE(ThePlatformIE):
_BASE_URL_REGEX = r'''(?x)https?://
(?:(?:www|play|watch)\.)?
(?P<domain>
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/'''
_THEPLATFORM_KEY = 'crazyjava' _THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t' _THEPLATFORM_SECRET = 's3cr3t'
_DOMAIN_MAP = {
'history.com': ('HISTORY', 'history'),
'aetv.com': ('AETV', 'aetv'),
'mylifetime.com': ('LIFETIME', 'lifetime'),
'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'),
'fyi.tv': ('FYI', 'fyi'),
'historyvault.com': (None, 'historyvault'),
'biography.com': (None, 'biography'),
}
def _extract_aen_smil(self, smil_url, video_id, auth=None): def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {'mbr': 'true'} query = {'mbr': 'true'}
@ -31,7 +43,7 @@ def _extract_aen_smil(self, smil_url, video_id, auth=None):
'assetTypes': 'high_video_s3' 'assetTypes': 'high_video_s3'
}, { }, {
'assetTypes': 'high_video_s3', 'assetTypes': 'high_video_s3',
'switch': 'hls_ingest_fastly' 'switch': 'hls_high_fastly',
}] }]
formats = [] formats = []
subtitles = {} subtitles = {}
@ -44,6 +56,8 @@ def _extract_aen_smil(self, smil_url, video_id, auth=None):
tp_formats, tp_subtitles = self._extract_theplatform_smil( tp_formats, tp_subtitles = self._extract_theplatform_smil(
m_url, video_id, 'Downloading %s SMIL data' % (q.get('switch') or q['assetTypes'])) m_url, video_id, 'Downloading %s SMIL data' % (q.get('switch') or q['assetTypes']))
except ExtractorError as e: except ExtractorError as e:
if isinstance(e, GeoRestrictedError):
raise
last_e = e last_e = e
continue continue
formats.extend(tp_formats) formats.extend(tp_formats)
@ -57,24 +71,45 @@ def _extract_aen_smil(self, smil_url, video_id, auth=None):
'subtitles': subtitles, 'subtitles': subtitles,
} }
def _extract_aetn_info(self, domain, filter_key, filter_value, url):
requestor_id, brand = self._DOMAIN_MAP[domain]
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
filter_value, query={'filter[%s]' % filter_key: filter_value})['results'][0]
title = result['title']
video_id = result['id']
media_url = result['publicUrl']
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
auth = None
if theplatform_metadata.get('AETN$isBehindWall'):
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._extract_aen_smil(media_url, video_id, auth))
info.update({
'title': title,
'series': result.get('seriesName'),
'season_number': int_or_none(result.get('tvSeasonNumber')),
'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')),
})
return info
class AENetworksIE(AENetworksBaseIE): class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = r'''(?x) _VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'''(?P<id>
https?:// shows/[^/]+/season-\d+/episode-\d+|
(?:www\.)? (?:
(?P<domain> (?:movie|special)s/[^/]+|
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com| (?:shows/[^/]+/)?videos
fyi\.tv )/[^/?#&]+
)/ )'''
(?:
shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|
movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?|
specials/(?P<special_display_id>[^/]+)/(?:full-special|preview-)|
collections/[^/]+/(?P<collection_display_id>[^/]+)
)
'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1', 'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'info_dict': { 'info_dict': {
@ -91,22 +126,23 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
}, { 'skip': 'This video is only available for users of participating TV providers.',
'url': 'http://www.history.com/shows/ancient-aliens/season-1',
'info_dict': {
'id': '71889446852',
},
'playlist_mincount': 5,
}, {
'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
'info_dict': {
'id': 'SERIES4317',
'title': 'Atlanta Plastic',
},
'playlist_mincount': 2,
}, { }, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1', 'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'only_matching': True 'info_dict': {
'id': '600587331957',
'ext': 'mp4',
'title': 'Inlawful Entry',
'description': 'md5:57c12115a2b384d883fe64ca50529e08',
'timestamp': 1452634428,
'upload_date': '20160112',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}, { }, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8', 'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True 'only_matching': True
@ -117,78 +153,125 @@ class AENetworksIE(AENetworksBaseIE):
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie', 'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True 'only_matching': True
}, { }, {
'url': 'https://www.lifetimemovieclub.com/movies/a-killer-among-us', 'url': 'https://watch.lifetimemovieclub.com/movies/10-year-reunion/full-movie',
'only_matching': True 'only_matching': True
}, { }, {
'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special', 'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special',
'only_matching': True 'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/america-the-story-of-us/westward',
'only_matching': True
}, { }, {
'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story/preview-hunting-jonbenets-killer-the-untold-story', 'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story/preview-hunting-jonbenets-killer-the-untold-story',
'only_matching': True 'only_matching': True
}, {
'url': 'http://www.history.com/videos/history-of-valentines-day',
'only_matching': True
}, {
'url': 'https://play.aetv.com/shows/duck-dynasty/videos/best-of-duck-dynasty-getting-quack-in-shape',
'only_matching': True
}] }]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME',
'lifetimemovieclub.com': 'LIFETIMEMOVIECLUB',
'fyi.tv': 'FYI',
}
def _real_extract(self, url): def _real_extract(self, url):
domain, show_path, movie_display_id, special_display_id, collection_display_id = re.match(self._VALID_URL, url).groups() domain, canonical = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id or special_display_id or collection_display_id return self._extract_aetn_info(domain, 'canonical', '/' + canonical, url)
webpage = self._download_webpage(url, display_id, headers=self.geo_verification_headers())
if show_path:
url_parts = show_path.split('/')
url_parts_len = len(url_parts)
if url_parts_len == 1:
entries = []
for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
entries.append(self.url_result(
compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
if entries:
return self.playlist_result(
entries, self._html_search_meta('aetn:SeriesId', webpage),
self._html_search_meta('aetn:SeriesTitle', webpage))
else:
# single season
url_parts_len = 2
if url_parts_len == 2:
entries = []
for episode_item in re.findall(r'(?s)<[^>]+class="[^"]*(?:episode|program)-item[^"]*"[^>]*>', webpage):
episode_attributes = extract_attributes(episode_item)
episode_url = compat_urlparse.urljoin(
url, episode_attributes['data-canonical'])
entries.append(self.url_result(
episode_url, 'AENetworks',
episode_attributes.get('data-videoid') or episode_attributes.get('data-video-id')))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeasonId', webpage))
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex( class AENetworksListBaseIE(AENetworksBaseIE):
[r"media_url\s*=\s*'(?P<url>[^']+)'", def _call_api(self, resource, slug, brand, fields):
r'data-media-url=(?P<url>(?:https?:)?//[^\s>]+)', return self._download_json(
r'data-media-url=(["\'])(?P<url>(?:(?!\1).)+?)\1'], 'https://yoga.appsvcs.aetnd.com/graphql',
webpage, 'video url', group='url') slug, query={'brand': brand}, data=urlencode_postdata({
theplatform_metadata = self._download_theplatform_metadata(self._search_regex( 'query': '''{
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id) %s(slug: "%s") {
info = self._parse_theplatform_metadata(theplatform_metadata) %s
auth = None }
if theplatform_metadata.get('AETN$isBehindWall'): }''' % (resource, slug, fields),
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain] }))['data'][resource]
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'], def _real_extract(self, url):
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'), domain, slug = re.match(self._VALID_URL, url).groups()
theplatform_metadata['ratings'][0]['rating']) _, brand = self._DOMAIN_MAP[domain]
auth = self._extract_mvpd_auth( playlist = self._call_api(self._RESOURCE, slug, brand, self._FIELDS)
url, video_id, requestor_id, resource) base_url = 'http://watch.%s' % domain
info.update(self._search_json_ld(webpage, video_id, fatal=False))
info.update(self._extract_aen_smil(media_url, video_id, auth)) entries = []
return info for item in (playlist.get(self._ITEMS_KEY) or []):
doc = self._get_doc(item)
canonical = doc.get('canonical')
if not canonical:
continue
entries.append(self.url_result(
base_url + canonical, AENetworksIE.ie_key(), doc.get('id')))
description = None
if self._PLAYLIST_DESCRIPTION_KEY:
description = playlist.get(self._PLAYLIST_DESCRIPTION_KEY)
return self.playlist_result(
entries, playlist.get('id'),
playlist.get(self._PLAYLIST_TITLE_KEY), description)
class AENetworksCollectionIE(AENetworksListBaseIE):
IE_NAME = 'aenetworks:collection'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'(?:[^/]+/)*(?:list|collections)/(?P<id>[^/?#&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://watch.historyvault.com/list/america-the-story-of-us',
'info_dict': {
'id': '282',
'title': 'America The Story of Us',
},
'playlist_mincount': 12,
}, {
'url': 'https://watch.historyvault.com/shows/america-the-story-of-us-2/season-1/list/america-the-story-of-us',
'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/mysteryquest',
'only_matching': True
}]
_RESOURCE = 'list'
_ITEMS_KEY = 'items'
_PLAYLIST_TITLE_KEY = 'display_title'
_PLAYLIST_DESCRIPTION_KEY = None
_FIELDS = '''id
display_title
items {
... on ListVideoItem {
doc {
canonical
id
}
}
}'''
def _get_doc(self, item):
return item.get('doc') or {}
class AENetworksShowIE(AENetworksListBaseIE):
IE_NAME = 'aenetworks:show'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'shows/(?P<id>[^/?#&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'http://www.history.com/shows/ancient-aliens',
'info_dict': {
'id': 'SH012427480000',
'title': 'Ancient Aliens',
'description': 'md5:3f6d74daf2672ff3ae29ed732e37ea7f',
},
'playlist_mincount': 168,
}]
_RESOURCE = 'series'
_ITEMS_KEY = 'episodes'
_PLAYLIST_TITLE_KEY = 'title'
_PLAYLIST_DESCRIPTION_KEY = 'description'
_FIELDS = '''description
id
title
episodes {
canonical
id
}'''
def _get_doc(self, item):
return item
class HistoryTopicIE(AENetworksBaseIE): class HistoryTopicIE(AENetworksBaseIE):
@ -204,6 +287,7 @@ class HistoryTopicIE(AENetworksBaseIE):
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7', 'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
'timestamp': 1375819729, 'timestamp': 1375819729,
'upload_date': '20130806', 'upload_date': '20130806',
'uploader': 'AENE-NEW',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -212,36 +296,47 @@ class HistoryTopicIE(AENetworksBaseIE):
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
}] }]
def theplatform_url_result(self, theplatform_url, video_id, query): def _real_extract(self, url):
return { display_id = self._match_id(url)
'_type': 'url_transparent', return self.url_result(
'id': video_id, 'http://www.history.com/videos/' + display_id,
'url': smuggle_url( AENetworksIE.ie_key())
update_url_query(theplatform_url, query),
{
'sig': { class HistoryPlayerIE(AENetworksBaseIE):
'key': self._THEPLATFORM_KEY, IE_NAME = 'history:player'
'secret': self._THEPLATFORM_SECRET, _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|biography)\.com)/player/(?P<id>\d+)'
}, _TESTS = []
'force_smil_url': True
}), def _real_extract(self, url):
'ie_key': 'ThePlatform', domain, video_id = re.match(self._VALID_URL, url).groups()
} return self._extract_aetn_info(domain, 'id', video_id, url)
class BiographyIE(AENetworksBaseIE):
_VALID_URL = r'https?://(?:www\.)?biography\.com/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.biography.com/video/vincent-van-gogh-full-episode-2075049808',
'info_dict': {
'id': '30322987',
'ext': 'mp4',
'title': 'Vincent Van Gogh - Full Episode',
'description': 'A full biography about the most influential 20th century painter, Vincent Van Gogh.',
'timestamp': 1311970571,
'upload_date': '20110729',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex( player_url = self._search_regex(
r'<phoenix-iframe[^>]+src="[^"]+\btpid=(\d+)', webpage, 'tpid') r'<phoenix-iframe[^>]+src="(%s)' % HistoryPlayerIE._VALID_URL,
result = self._download_json( webpage, 'player URL')
'https://feeds.video.aetnd.com/api/v2/history/videos', return self.url_result(player_url, HistoryPlayerIE.ie_key())
video_id, query={'filter[id]': video_id})['results'][0]
title = result['title']
info = self._extract_aen_smil(result['publicUrl'], video_id)
info.update({
'title': title,
'description': result.get('description'),
'duration': int_or_none(result.get('duration')),
'timestamp': int_or_none(result.get('added'), 1000),
})
return info
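Reviewer note: the new `AENetworksIE` no longer scrapes the page. It matches the URL against `_BASE_URL_REGEX` plus an `id` pattern, then asks the feed API for the video whose `canonical` path equals that id. The sketch below reproduces only the URL-to-request mapping with the standard `re` module; the patterns and `_DOMAIN_MAP` entries are copied from the diff, while `describe_feed_request` is a hypothetical helper that performs no network access.

```python
import re

# Copied from the diff: shared verbose-mode prefix that captures the domain.
BASE_URL_REGEX = r'''(?x)https?://
    (?:(?:www|play|watch)\.)?
    (?P<domain>
        (?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
        fyi\.tv
    )/'''

# Copied from the diff: the part AENetworksIE appends for single videos.
VIDEO_URL_REGEX = BASE_URL_REGEX + r'''(?P<id>
    shows/[^/]+/season-\d+/episode-\d+|
    (?:
        (?:movie|special)s/[^/]+|
        (?:shows/[^/]+/)?videos
    )/[^/?#&]+
)'''

DOMAIN_MAP = {
    'history.com': ('HISTORY', 'history'),
    'aetv.com': ('AETV', 'aetv'),
    'mylifetime.com': ('LIFETIME', 'lifetime'),
    'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'),
    'fyi.tv': ('FYI', 'fyi'),
    'historyvault.com': (None, 'historyvault'),
}


def describe_feed_request(url):
    # Returns the feed URL and query the extractor would use, or None
    # when the URL is not a supported single-video page.
    match = re.match(VIDEO_URL_REGEX, url)
    if match is None:
        return None
    domain, canonical = match.groups()
    requestor_id, brand = DOMAIN_MAP[domain]
    return {
        'requestor_id': requestor_id,
        'feed_url': 'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
        'query': {'filter[canonical]': '/' + canonical},
    }


episode = describe_feed_request(
    'http://www.history.com/shows/mountain-men/season-1/episode-1')
clip = describe_feed_request(
    'https://play.aetv.com/shows/duck-dynasty/videos/best-of-duck-dynasty-getting-quack-in-shape')
```

Because the prefix starts with `(?x)`, the appended pattern can also be laid out across several lines, and `#` inside the `[^/?#&]` character class is not treated as a comment.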


@ -1,6 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
@ -11,25 +13,22 @@
class AMCNetworksIE(ThePlatformIE): class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?(?P<site>amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?P<id>(?:movies|shows(?:/[^/]+)+)/[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1', 'url': 'https://www.bbcamerica.com/shows/the-graham-norton-show/videos/tina-feys-adorable-airline-themed-family-dinner--51631',
'md5': '',
'info_dict': { 'info_dict': {
'id': 's3MX01Nl4vPH', 'id': '4Lq1dzOnZGt0',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Maron - Season 4 - Step 1', 'title': "The Graham Norton Show - Season 28 - Tina Fey's Adorable Airline-Themed Family Dinner",
'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.', 'description': "It turns out child stewardesses are very generous with the wine! All-new episodes of 'The Graham Norton Show' premiere Fridays at 11/10c on BBC America.",
'age_limit': 17, 'upload_date': '20201120',
'upload_date': '20160505', 'timestamp': 1605904350,
'timestamp': 1462468831,
'uploader': 'AMCN', 'uploader': 'AMCN',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Requires TV provider accounts',
}, { }, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge', 'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True, 'only_matching': True,
@ -55,32 +54,34 @@ class AMCNetworksIE(ThePlatformIE):
'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1', 'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1',
'only_matching': True, 'only_matching': True,
}] }]
_REQUESTOR_ID_MAP = {
'amc': 'AMC',
'bbcamerica': 'BBCA',
'ifc': 'IFC',
'sundancetv': 'SUNDANCE',
'wetv': 'WETV',
}
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id) requestor_id = self._REQUESTOR_ID_MAP[site]
properties = self._download_json(
'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/%s/url/%s' % (requestor_id.lower(), display_id),
display_id)['data']['properties']
query = { query = {
'mbr': 'true', 'mbr': 'true',
'manifest': 'm3u', 'manifest': 'm3u',
} }
media_url = self._search_regex( tp_path = 'M_UwQC/media/' + properties['videoPid']
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', media_url = 'https://link.theplatform.com/s/' + tp_path
webpage, 'media url') theplatform_metadata = self._download_theplatform_metadata(tp_path, display_id)
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata) info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid'] video_id = theplatform_metadata['pid']
title = theplatform_metadata['title'] title = theplatform_metadata['title']
rating = try_get( rating = try_get(
theplatform_metadata, lambda x: x['ratings'][0]['rating']) theplatform_metadata, lambda x: x['ratings'][0]['rating'])
auth_required = self._search_regex( video_category = properties.get('videoCategory')
r'window\.authRequired\s*=\s*(true|false);', if video_category and video_category.endswith('-Auth'):
webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(
r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
webpage, 'requestor id')
resource = self._get_mvpd_resource( resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating) requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth( query['auth'] = self._extract_mvpd_auth(
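Reviewer note: the AMC Networks change replaces three page-scraping regexes with one content API call plus a static `_REQUESTOR_ID_MAP`. The sketch below shows only the two derived values, the API URL and the auth decision, using the map and URL template from the diff. The helper names and the sample category strings are hypothetical; the commit only tells us that categories ending in `-Auth` require provider login.

```python
# Copied from the diff.
REQUESTOR_ID_MAP = {
    'amc': 'AMC',
    'bbcamerica': 'BBCA',
    'ifc': 'IFC',
    'sundancetv': 'SUNDANCE',
    'wetv': 'WETV',
}


def content_api_url(site, display_id):
    # The lower-cased requestor id doubles as the brand segment of the path.
    requestor_id = REQUESTOR_ID_MAP[site]
    return 'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/%s/url/%s' % (
        requestor_id.lower(), display_id)


def needs_provider_auth(properties):
    # Same test as the new code: a missing category means no auth.
    video_category = properties.get('videoCategory')
    return bool(video_category and video_category.endswith('-Auth'))


# Hypothetical inputs, for illustration only.
api_url = content_api_url('bbcamerica', 'shows/example-show/videos/example-clip')
locked = needs_provider_auth({'videoCategory': 'Example-Auth'})
free = needs_provider_auth({})
```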


@ -1,33 +1,33 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html, clean_html,
int_or_none,
js_to_json,
try_get, try_get,
unified_strdate, unified_strdate,
) )
class AmericasTestKitchenIE(InfoExtractor): class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?(?:americastestkitchen|cooks(?:country|illustrated))\.com/(?P<resource_type>episode|videos)/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers', 'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f', 'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': { 'info_dict': {
'id': '5b400b9ee338f922cb06450c', 'id': '5b400b9ee338f922cb06450c',
'title': 'Weeknight Japanese Suppers', 'title': 'Japanese Suppers',
'ext': 'mp4', 'ext': 'mp4',
'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8', 'description': 'md5:64e606bfee910627efc4b5f050de92b3',
'thumbnail': r're:^https?://', 'thumbnail': r're:^https?://',
'timestamp': 1523664000, 'timestamp': 1523664000,
'upload_date': '20180414', 'upload_date': '20180414',
'release_date': '20180414', 'release_date': '20180410',
'series': "America's Test Kitchen", 'series': "America's Test Kitchen",
'season_number': 18, 'season_number': 18,
'episode': 'Weeknight Japanese Suppers', 'episode': 'Japanese Suppers',
'episode_number': 15, 'episode_number': 15,
}, },
'params': { 'params': {
@ -36,47 +36,31 @@ class AmericasTestKitchenIE(InfoExtractor):
}, { }, {
'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon', 'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.cookscountry.com/episode/564-when-only-chocolate-will-do',
'only_matching': True,
}, {
'url': 'https://www.cooksillustrated.com/videos/4478-beef-wellington',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) resource_type, video_id = re.match(self._VALID_URL, url).groups()
is_episode = resource_type == 'episode'
if is_episode:
resource_type = 'episodes'
webpage = self._download_webpage(url, video_id) resource = self._download_json(
'https://www.americastestkitchen.com/api/v6/%s/%s' % (resource_type, video_id), video_id)
video_data = self._parse_json( video = resource['video'] if is_episode else resource
self._search_regex( episode = resource if is_episode else resource.get('episode') or {}
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
video_id, js_to_json)
ep_data = try_get(
video_data,
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
'description') or ep_meta.get('description'))
thumbnail = try_get(ep_meta, lambda x: x['photo']['image_url'])
release_date = unified_strdate(ep_data.get('aired_at'))
season_number = int_or_none(ep_meta.get('season_number'))
episode = ep_meta.get('title')
episode_number = int_or_none(ep_meta.get('episode_number'))
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id, 'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % video['zypeId'],
'ie_key': 'Zype', 'ie_key': 'Zype',
'title': title, 'description': clean_html(video.get('description')),
'description': description, 'release_date': unified_strdate(video.get('publishDate')),
'thumbnail': thumbnail, 'series': try_get(episode, lambda x: x['show']['title']),
'release_date': release_date, 'episode': episode.get('title'),
'series': "America's Test Kitchen",
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
} }
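Reviewer note: the rewritten extractor maps the URL's `resource_type` onto the v6 API path (`episode` becomes `episodes`, `videos` stays as is) and always queries the americastestkitchen.com host, even for the two sister sites. A small sketch of that mapping, using the `_VALID_URL` from the diff; `api_url_for` and `split_resource` are hypothetical names, and the sample resource dict is made up.

```python
import re

# Copied from the diff.
VALID_URL = r'https?://(?:www\.)?(?:americastestkitchen|cooks(?:country|illustrated))\.com/(?P<resource_type>episode|videos)/(?P<id>\d+)'


def api_url_for(page_url):
    match = re.match(VALID_URL, page_url)
    if match is None:
        return None
    resource_type, video_id = match.groups()
    if resource_type == 'episode':
        resource_type = 'episodes'
    return 'https://www.americastestkitchen.com/api/v6/%s/%s' % (resource_type, video_id)


def split_resource(resource, is_episode):
    # An episode resource nests its video; a video resource may nest its episode.
    video = resource['video'] if is_episode else resource
    episode = resource if is_episode else (resource.get('episode') or {})
    return video, episode


episode_api = api_url_for('https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers')
video_api = api_url_for('https://www.cooksillustrated.com/videos/4478-beef-wellington')
video, episode = split_resource({'zypeId': 'hypothetical-id'}, is_episode=False)
```

The explicit parentheses in `split_resource` spell out how the one-line conditional in the commit already parses: the `or {}` fallback applies only to the non-episode branch.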


@ -116,7 +116,76 @@ class AnvatoIE(InfoExtractor):
'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn', 'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn',
'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W', 'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W',
'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ', 'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ',
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ' 'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
'X8POa4zPPaKVZHqmWjuEzfP31b1QM9VN': 'Dn5vOY9ooDw7VSl9qztjZI5o0g08mA0z',
'M2v78QkBMpNJlSPp9diX5F2PBmBy6Bog': 'ka6K32kyo7nDZfNkjQCGWf1lpApXMd1B',
'bvJ0dQpav07l0hG5JgfVLF2dv1vARwpP': 'BzoQW24GrJZoJfmNodiJKSPeB9B8NOxj',
'lxQMLg2XZKuEZaWgsqubBxV9INZ6bryY': 'Vm2Mx6noKds9jB71h6urazwlTG3m9x8l',
'04EnjvXeoSmkbJ9ckPs7oY0mcxv7PlyN': 'aXERQP9LMfQVlEDsgGs6eEA1SWznAQ8P',
'mQbO2ge6BFRWVPYCYpU06YvNt80XLvAX': 'E2BV1NGmasN5v7eujECVPJgwflnLPm2A',
'g43oeBzJrCml7o6fa5fRL1ErCdeD8z4K': 'RX34mZ6zVH4Nr6whbxIGLv9WSbxEKo8V',
'VQrDJoP7mtdBzkxhXbSPwGB1coeElk4x': 'j2VejQx0VFKQepAF7dI0mJLKtOVJE18z',
'WxA5NzLRjCrmq0NUgaU5pdMDuZO7RJ4w': 'lyY5ADLKaIOLEgAsGQCveEMAcqnx3rY9',
'M4lpMXB71ie0PjMCjdFzVXq0SeRVqz49': 'n2zVkOqaLIv3GbLfBjcwW51LcveWOZ2e',
'dyDZGEqN8u8nkJZcJns0oxYmtP7KbGAn': 'VXOEqQW9BtEVLajfZQSLEqxgS5B7qn2D',
'E7QNjrVY5u5mGvgu67IoDgV1CjEND8QR': 'rz8AaDmdKIkLmPNhB5ILPJnjS5PnlL8d',
'a4zrqjoKlfzg0dwHEWtP31VqcLBpjm4g': 'LY9J16gwETdGWa3hjBu5o0RzuoQDjqXQ',
'dQP5BZroMsMVLO1hbmT5r2Enu86GjxA6': '7XR3oOdbPF6x3PRFLDCq9RkgsRjAo48V',
'M4lKNBO1NFe0PjMCj1tzVXq0SeRVqzA9': 'n2zoRqGLRUv3GbLfBmTwW51LcveWOZYe',
'nAZ7MZdpGCGg1pqFEbsoJOz2C60mv143': 'dYJgdqA9aT4yojETqGi7yNgoFADxqmXP',
'3y1MERYgOuE9NzbFgwhV6Wv2F0YKvbyz': '081xpZDQgC4VadLTavhWQxrku56DAgXV',
'bmQvmEXr5HWklBMCZOcpE2Z3HBYwqGyl': 'zxXPbVNyMiMAZldhr9FkOmA0fl4aKr2v',
'wA7oDNYldfr6050Hwxi52lPZiVlB86Ap': 'ZYK16aA7ni0d3l3c34uwpxD7CbReMm8Q',
'g43MbKMWmFml7o7sJoSRkXxZiXRvJ3QK': 'RX3oBJonvs4Nr6rUWBCGn3matRGqJPXV',
'mA9VdlqpLS0raGaSDvtoqNrBTzb8XY4q': '0XN4OjBD3fnW7r7IbmtJB4AyfOmlrE2r',
'mAajOwgkGt17oGoFmEuklMP9H0GnW54d': 'lXbBLPGyzikNGeGujAuAJGjZiwLRxyXR',
'vy8vjJ9kbUwrRqRu59Cj5dWZfzYErlAb': 'K8l7gpwaGcBpnAnCLNCmPZRdin3eaQX0',
'xQMWBpR8oHEZaWaSMGUb0avOHjLVYn4Y': 'm2MrN4vEaf9jB7BFy5Srb40jTrN67AYl',
'xyKEmVO3miRr6D6UVkt7oB8jtD6aJEAv': 'g2ddDebqDfqdgKgswyUKwGjbTWwzq923',
'7Qk0wa2D9FjKapacoJF27aLvUDKkLGA0': 'b2kgBEkephJaMkMTL7s1PLe4Ua6WyP2P',
'3QLg6nqmNTJ5VvVTo7f508LPidz1xwyY': 'g2L1GgpraipmAOAUqmIbBnPxHOmw4MYa',
'3y1B7zZjXTE9NZNSzZSVNPZaTNLjo6Qz': '081b5G6wzH4VagaURmcWbN5mT4JGEe2V',
'lAqnwvkw6SG6D8DSqmUg6DRLUp0w3G4x': 'O2pbP0xPDFNJjpjIEvcdryOJtpkVM4X5',
'awA7xd1N0Hr6050Hw2c52lPZiVlB864p': 'GZYKpn4aoT0d3l3c3PiwpxD7CbReMmXQ',
'jQVqPLl9YHL1WGWtR1HDgWBGT63qRNyV': '6X03ne6vrU4oWyWUN7tQVoajikxJR3Ye',
'GQRMR8mL7uZK797t7xH3eNzPIP5dOny1': 'm2vqPWGd4U31zWzSyasDRAoMT1PKRp8o',
'zydq9RdmRhXLkNkfNoTJlMzaF0lWekQB': '3X7LnvE7vH5nkEkSqLiey793Un7dLB8e',
'VQrDzwkB2IdBzjzu9MHPbEYkSB50gR4x': 'j2VebLzoKUKQeEesmVh0gM1eIp9jKz8z',
'mAa2wMamBs17oGoFmktklMP9H0GnW54d': 'lXbgP74xZTkNGeGujVUAJGjZiwLRxy8R',
'7yjB6ZLG6sW8R6RF2xcan1KGfJ5dNoyd': 'wXQkPorvPHZ45N5t4Jf6qwg5Tp4xvw29',
'a4zPpNeWGuzg0m0iX3tPeanGSkRKWXQg': 'LY9oa3QAyHdGW9Wu3Ri5JGeEik7l1N8Q',
'k2rneA2M38k25cXDwwSknTJlxPxQLZ6M': '61lyA2aEVDzklfdwmmh31saPxQx2VRjp',
'bK9Zk4OvPnvxduLgxvi8VUeojnjA02eV': 'o5jANYjbeMb4nfBaQvcLAt1jzLzYx6ze',
'5VD6EydM3R9orHmNMGInGCJwbxbQvGRw': 'w3zjmX7g4vnxzCxElvUEOiewkokXprkZ',
'70X35QbVYVYNPUmP9YfbzI06YqYQk2R1': 'vG4Aj2BMjMjoztB7zeFOnCVPJpJ8lMOa',
'26qYwQVG9p1Bks2GgBckjfDJOXOAMgG1': 'r4ev9X0mv5zqJc0yk5IBDcQOwZw8mnwQ',
'rvVKpA56MBXWlSxMw3cobT5pdkd4Dm7q': '1J7ZkY53pZ645c93owcLZuveE7E8B3rL',
'qN1zdy1zlYL23IWZGWtDvfV6WeWQWkJo': 'qN1zdy1zlYL23IWZGWtDvfV6WeWQWkJo',
'jdKqRGF16dKsBviMDae7IGDl7oTjEbVV': 'Q09l7vhlNxPFErIOK6BVCe7KnwUW5DVV',
'3QLkogW1OUJ5VvPsrDH56DY2u7lgZWyY': 'g2LRE1V9espmAOPhE4ubj4ZdUA57yDXa',
'wyJvWbXGBSdbkEzhv0CW8meou82aqRy8': 'M2wolPvyBIpQGkbT4juedD4ruzQGdK2y',
'7QkdZrzEkFjKap6IYDU2PB0oCNZORmA0': 'b2kN1l96qhJaMkPs9dt1lpjBfwqZoA8P',
'pvA05113MHG1w3JTYxc6DVlRCjErVz4O': 'gQXeAbblBUnDJ7vujbHvbRd1cxlz3AXO',
'mA9blJDZwT0raG1cvkuoeVjLC7ZWd54q': '0XN9jRPwMHnW7rvumgfJZOD9CJgVkWYr',
'5QwRN5qKJTvGKlDTmnf7xwNZcjRmvEy9': 'R2GP6LWBJU1QlnytwGt0B9pytWwAdDYy',
'eyn5rPPbkfw2KYxH32fG1q58CbLJzM40': 'p2gyqooZnS56JWeiDgfmOy1VugOQEBXn',
'3BABn3b5RfPJGDwilbHe7l82uBoR05Am': '7OYZG7KMVhbPdKJS3xcWEN3AuDlLNmXj',
'xA5zNGXD3HrmqMlF6OS5pdMDuZO7RJ4w': 'yY5DAm6r1IOLE3BCVMFveEMAcqnx3r29',
'g43PgW3JZfml7o6fDEURL1ErCdeD8zyK': 'RX3aQn1zrS4Nr6whDgCGLv9WSbxEKo2V',
'lAqp8WbGgiG6D8LTKJcg3O72CDdre1Qx': 'O2pnm6473HNJjpKuVosd3vVeh975yrX5',
'wyJbYEDxKSdbkJ6S6RhW8meou82aqRy8': 'M2wPm7EgRSpQGlAh70CedD4ruzQGdKYy',
'M4lgW28nLCe0PVdtaXszVXq0SeRVqzA9': 'n2zmJvg4jHv3G0ETNgiwW51LcveWOZ8e',
'5Qw3OVvp9FvGKlDTmOC7xwNZcjRmvEQ9': 'R2GzDdml9F1Qlnytw9s0B9pytWwAdD8y',
'vy8a98X7zCwrRqbHrLUjYzwDiK2b70Qb': 'K8lVwzyjZiBpnAaSGeUmnAgxuGOBxmY0',
'g4eGjJLLoiqRD3Pf9oT5O03LuNbLRDQp': '6XqD59zzpfN4EwQuaGt67qNpSyRBlnYy',
'g43OPp9boIml7o6fDOIRL1ErCdeD8z4K': 'RX33alNB4s4Nr6whDPUGLv9WSbxEKoXV',
'xA2ng9OkBcGKzDbTkKsJlx7dUK8R3dA5': 'z2aPnJvzBfObkwGC3vFaPxeBhxoMqZ8K',
'xyKEgBajZuRr6DEC0Kt7XpD1cnNW9gAv': 'g2ddlEBvRsqdgKaI4jUK9PrgfMexGZ23',
'BAogww51jIMa2JnH1BcYpXM5F658RNAL': 'rYWDmm0KptlkGv4FGJFMdZmjs9RDE6XR',
'BAokpg62VtMa2JnH1mHYpXM5F658RNAL': 'rYWryDnlNslkGv4FG4HMdZmjs9RDE62R',
'a4z1Px5e2hzg0m0iMMCPeanGSkRKWXAg': 'LY9eorNQGUdGW9WuKKf5JGeEik7l1NYQ',
'kAx69R58kF9nY5YcdecJdl2pFXP53WyX': 'gXyRxELpbfPvLeLSaRil0mp6UEzbZJ8L',
'BAoY13nwViMa2J2uo2cY6BlETgmdwryL': 'rYWwKzJmNFlkGvGtNoUM9bzwIJVzB1YR',
} }
_MCP_TO_ACCESS_KEY_TABLE = { _MCP_TO_ACCESS_KEY_TABLE = {
@ -189,19 +258,17 @@ def _get_video_json(self, access_key, video_id):
video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii') video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii')
anvrid = md5_text(time.time() * 1000 * random.random())[:30] anvrid = md5_text(time.time() * 1000 * random.random())[:30]
payload = { api = {
'api': { 'anvrid': anvrid,
'anvrid': anvrid, 'anvts': server_time,
'anvstk': md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY))),
'anvts': server_time,
},
} }
api['anvstk'] = md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY)))
return self._download_json( return self._download_json(
video_data_url, video_id, transform_source=strip_jsonp, video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8')) data=json.dumps({'api': api}).encode('utf-8'))
def _get_anvato_videos(self, access_key, video_id): def _get_anvato_videos(self, access_key, video_id):
video_data = self._get_video_json(access_key, video_id) video_data = self._get_video_json(access_key, video_id)
@ -259,7 +326,7 @@ def _get_anvato_videos(self, access_key, video_id):
'description': video_data.get('def_description'), 'description': video_data.get('def_description'),
'tags': video_data.get('def_tags', '').split(','), 'tags': video_data.get('def_tags', '').split(','),
'categories': video_data.get('categories'), 'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'), 'thumbnail': video_data.get('src_image_url') or video_data.get('thumbnail'),
'timestamp': int_or_none(video_data.get( 'timestamp': int_or_none(video_data.get(
'ts_published') or video_data.get('ts_added')), 'ts_published') or video_data.get('ts_added')),
'uploader': video_data.get('mcp_id'), 'uploader': video_data.get('mcp_id'),
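
Note on the `_get_video_json` hunk above: the signing step can be sketched in isolation. This is a minimal sketch, assuming `md5_text` returns the hex MD5 of the UTF-8 string form of its argument. `build_api_payload` and its `secret` parameter are hypothetical names; `secret` stands in for `self._ANVACK_TABLE.get(access_key, self._API_KEY)`.

```python
import hashlib
import json
import random
import time


def md5_text(value):
    # Stand-in for the md5_text helper used in the diff (assumed behaviour).
    return hashlib.md5(str(value).encode('utf-8')).hexdigest()


def build_api_payload(access_key, server_time, secret, anvrid=None):
    # anvrid is a random 30-character request id unless one is supplied.
    if anvrid is None:
        anvrid = md5_text(time.time() * 1000 * random.random())[:30]
    api = {
        'anvrid': anvrid,
        'anvts': server_time,
    }
    # The token signs access key, request id, server time and secret.
    api['anvstk'] = md5_text('%s|%s|%d|%s' % (
        access_key, anvrid, server_time, secret))
    return json.dumps({'api': api}).encode('utf-8')
```

Passing a fixed `anvrid` makes the resulting token reproducible, which is convenient when comparing against a captured request.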


@ -0,0 +1,7 @@
from __future__ import unicode_literals
from .nfl import NFLTokenGenerator
__all__ = [
'NFLTokenGenerator',
]


@ -0,0 +1,6 @@
from __future__ import unicode_literals
class TokenGenerator:
def generate(self, anvack, mcp_id):
raise NotImplementedError('This method must be implemented by subclasses')


@ -0,0 +1,30 @@
from __future__ import unicode_literals
import json
from .common import TokenGenerator
class NFLTokenGenerator(TokenGenerator):
_AUTHORIZATION = None
def generate(ie, anvack, mcp_id):
if not NFLTokenGenerator._AUTHORIZATION:
reroute = ie._download_json(
'https://api.nfl.com/v1/reroute', mcp_id,
data=b'grant_type=client_credentials',
headers={'X-Domain-Id': 100})
NFLTokenGenerator._AUTHORIZATION = '%s %s' % (reroute.get('token_type') or 'Bearer', reroute['access_token'])
return ie._download_json(
'https://api.nfl.com/v3/shield/', mcp_id, data=json.dumps({
'query': '''{
viewer {
mediaToken(anvack: "%s", id: %s) {
token
}
}
}''' % (anvack, mcp_id),
}).encode(), headers={
'Authorization': NFLTokenGenerator._AUTHORIZATION,
'Content-Type': 'application/json',
})['data']['viewer']['mediaToken']['token']
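
Note on the new `NFLTokenGenerator`: the request body and the response lookup can be separated from the HTTP calls. The sketch below assumes only what the hunk shows; `build_media_token_request` and `extract_token` are hypothetical helper names, and no network access is performed.

```python
import json


def build_media_token_request(anvack, mcp_id):
    # Same GraphQL document as in the diff: anvack is quoted, id is not.
    query = '''{
  viewer {
    mediaToken(anvack: "%s", id: %s) {
      token
    }
  }
}''' % (anvack, mcp_id)
    return json.dumps({'query': query}).encode('utf-8')


def extract_token(response):
    # Mirrors the ['data']['viewer']['mediaToken']['token'] lookup.
    return response['data']['viewer']['mediaToken']['token']
```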


@ -3,6 +3,7 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
get_element_by_id,
int_or_none, int_or_none,
merge_dicts, merge_dicts,
mimetype2ext, mimetype2ext,
@ -39,23 +40,15 @@ def _real_extract(self, url):
webpage = self._download_webpage(url, video_id, fatal=False) webpage = self._download_webpage(url, video_id, fatal=False)
if not webpage: if not webpage:
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
webpage = self._download_webpage( webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id, 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id) video_id)
options = self._parse_json( options = self._parse_json(self._search_regex(
self._search_regex( r'options\s*=\s*({.+?})\s*;', webpage, 'options'), video_id)
r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
webpage, 'options', group='value'),
video_id)
player = options['plugins']['sabaPlayerPlugin']
formats = [] formats = []
for sources in player['multiSRC']: for sources in (options.get('multiSRC') or []):
for item in sources: for item in sources:
if not isinstance(item, dict): if not isinstance(item, dict):
continue continue
@ -85,11 +78,12 @@ def _real_extract(self, url):
info = self._search_json_ld(webpage, video_id, default={}) info = self._search_json_ld(webpage, video_id, default={})
if not info.get('title'): if not info.get('title'):
info['title'] = player['title'] info['title'] = get_element_by_id('videoTitle', webpage) or \
self._html_search_meta(['og:title', 'twitter:title', 'DC.Title', 'title'], webpage, fatal=True)
return merge_dicts(info, { return merge_dicts(info, {
'id': video_id, 'id': video_id,
'thumbnail': url_or_none(options.get('poster')), 'thumbnail': url_or_none(options.get('poster')),
'duration': int_or_none(player.get('duration')), 'duration': int_or_none(options.get('duration')),
'formats': formats, 'formats': formats,
}) })
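
Note on the Aparat change: the new code reads a plain `options = {...};` object literal instead of a `JSON.parse(...)` string. A minimal sketch of that parsing, assuming the object is valid JSON and sits on one line; `parse_player_options`, `iter_sources` and the sample markup are hypothetical.

```python
import json
import re

OPTIONS_RE = r'options\s*=\s*({.+?})\s*;'


def parse_player_options(webpage):
    # The lazy group stops at the first "}" that is followed by ";".
    mobj = re.search(OPTIONS_RE, webpage)
    if not mobj:
        return {}
    return json.loads(mobj.group(1))


def iter_sources(options):
    # multiSRC is a list of lists; non-dict items are skipped as in the diff.
    for sources in (options.get('multiSRC') or []):
        for item in sources:
            if isinstance(item, dict):
                yield item


SAMPLE = ('<script>var options = {"poster": "https://example.invalid/p.jpg", '
          '"multiSRC": [[{"src": "https://example.invalid/v.mp4"}, "x"]]};</script>')
```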


@ -0,0 +1,174 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
parse_iso8601,
try_get,
)
class ArcPublishingIE(InfoExtractor):
_UUID_REGEX = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
_VALID_URL = r'arcpublishing:(?P<org>[a-z]+):(?P<id>%s)' % _UUID_REGEX
_TESTS = [{
# https://www.adn.com/politics/2020/11/02/video-senate-candidates-campaign-in-anchorage-on-eve-of-election-day/
'url': 'arcpublishing:adn:8c99cb6e-b29c-4bc9-9173-7bf9979225ab',
'only_matching': True,
}, {
# https://www.bostonglobe.com/video/2020/12/30/metro/footage-released-showing-officer-talking-about-striking-protesters-with-car/
'url': 'arcpublishing:bostonglobe:232b7ae6-7d73-432d-bc0a-85dbf0119ab1',
'only_matching': True,
}, {
# https://www.actionnewsjax.com/video/live-stream/
'url': 'arcpublishing:cmg:cfb1cf1b-3ab5-4d1b-86c5-a5515d311f2a',
'only_matching': True,
}, {
# https://elcomercio.pe/videos/deportes/deporte-total-futbol-peruano-seleccion-peruana-la-valorizacion-de-los-peruanos-en-el-exterior-tras-un-2020-atipico-nnav-vr-video-noticia/
'url': 'arcpublishing:elcomercio:27a7e1f8-2ec7-4177-874f-a4feed2885b3',
'only_matching': True,
}, {
# https://www.clickondetroit.com/video/community/2020/05/15/events-surrounding-woodward-dream-cruise-being-canceled/
'url': 'arcpublishing:gmg:c8793fb2-8d44-4242-881e-2db31da2d9fe',
'only_matching': True,
}, {
# https://www.wabi.tv/video/2020/12/30/trenton-company-making-equipment-pfizer-covid-vaccine/
'url': 'arcpublishing:gray:0b0ba30e-032a-4598-8810-901d70e6033e',
'only_matching': True,
}, {
# https://www.lateja.cr/el-mundo/video-china-aprueba-con-condiciones-su-primera/dfcbfa57-527f-45ff-a69b-35fe71054143/video/
'url': 'arcpublishing:gruponacion:dfcbfa57-527f-45ff-a69b-35fe71054143',
'only_matching': True,
}, {
# https://www.fifthdomain.com/video/2018/03/09/is-america-vulnerable-to-a-cyber-attack/
'url': 'arcpublishing:mco:aa0ca6fe-1127-46d4-b32c-be0d6fdb8055',
'only_matching': True,
}, {
# https://www.vl.no/kultur/2020/12/09/en-melding-fra-en-lytter-endret-julelista-til-lewi-bergrud/
'url': 'arcpublishing:mentormedier:47a12084-650b-4011-bfd0-3699b6947b2d',
'only_matching': True,
}, {
# https://www.14news.com/2020/12/30/whiskey-theft-caught-camera-henderson-liquor-store/
'url': 'arcpublishing:raycom:b89f61f8-79fa-4c09-8255-e64237119bf7',
'only_matching': True,
}, {
# https://www.theglobeandmail.com/world/video-ethiopian-woman-who-became-symbol-of-integration-in-italy-killed-on/
'url': 'arcpublishing:tgam:411b34c1-8701-4036-9831-26964711664b',
'only_matching': True,
}, {
# https://www.pilotonline.com/460f2931-8130-4719-8ea1-ffcb2d7cb685-132.html
'url': 'arcpublishing:tronc:460f2931-8130-4719-8ea1-ffcb2d7cb685',
'only_matching': True,
}]
_POWA_DEFAULTS = [
(['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
([
'adn', 'advancelocal', 'answers', 'bonnier', 'bostonglobe', 'demo',
'gmg', 'gruponacion', 'infobae', 'mco', 'nzme', 'pmn', 'raycom',
'spectator', 'tbt', 'tgam', 'tronc', 'wapo', 'wweek',
], 'video-api-cdn.%s.arcpublishing.com/api'),
]
@staticmethod
def _extract_urls(webpage):
entries = []
# https://arcpublishing.atlassian.net/wiki/spaces/POWA/overview
for powa_el in re.findall(r'(<div[^>]+class="[^"]*\bpowa\b[^"]*"[^>]+data-uuid="%s"[^>]*>)' % ArcPublishingIE._UUID_REGEX, webpage):
powa = extract_attributes(powa_el) or {}
org = powa.get('data-org')
uuid = powa.get('data-uuid')
if org and uuid:
entries.append('arcpublishing:%s:%s' % (org, uuid))
return entries
def _real_extract(self, url):
org, uuid = re.match(self._VALID_URL, url).groups()
for orgs, tmpl in self._POWA_DEFAULTS:
if org in orgs:
base_api_tmpl = tmpl
break
else:
base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
if org == 'wapo':
org = 'washpost'
video = self._download_json(
'https://%s/v1/ansvideos/findByUuid' % (base_api_tmpl % org),
uuid, query={'uuid': uuid})[0]
title = video['headlines']['basic']
is_live = video.get('status') == 'live'
urls = []
formats = []
for s in video.get('streams', []):
s_url = s.get('url')
if not s_url or s_url in urls:
continue
urls.append(s_url)
stream_type = s.get('stream_type')
if stream_type == 'smil':
smil_formats = self._extract_smil_formats(
s_url, uuid, fatal=False)
for f in smil_formats:
if f['url'].endswith('/cfx/st'):
f['app'] = 'cfx/st'
if not f['play_path'].startswith('mp4:'):
f['play_path'] = 'mp4:' + f['play_path']
if isinstance(f['tbr'], float):
f['vbr'] = f['tbr'] * 1000
del f['tbr']
f['format_id'] = 'rtmp-%d' % f['vbr']
formats.extend(smil_formats)
elif stream_type in ('ts', 'hls'):
m3u8_formats = self._extract_m3u8_formats(
s_url, uuid, 'mp4', 'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False)
if all([f.get('acodec') == 'none' for f in m3u8_formats]):
continue
for f in m3u8_formats:
if f.get('acodec') == 'none':
f['preference'] = -40
elif f.get('vcodec') == 'none':
f['preference'] = -50
height = f.get('height')
if not height:
continue
vbr = self._search_regex(
r'[_x]%d[_-](\d+)' % height, f['url'], 'vbr', default=None)
if vbr:
f['vbr'] = int(vbr)
formats.extend(m3u8_formats)
else:
vbr = int_or_none(s.get('bitrate'))
formats.append({
'format_id': '%s-%d' % (stream_type, vbr) if vbr else stream_type,
'vbr': vbr,
'width': int_or_none(s.get('width')),
'height': int_or_none(s.get('height')),
'filesize': int_or_none(s.get('filesize')),
'url': s_url,
'preference': -1,
})
self._sort_formats(
formats, ('preference', 'width', 'height', 'vbr', 'filesize', 'tbr', 'ext', 'format_id'))
subtitles = {}
for subtitle in (try_get(video, lambda x: x['subtitles']['urls'], list) or []):
subtitle_url = subtitle.get('url')
if subtitle_url:
subtitles.setdefault('en', []).append({'url': subtitle_url})
return {
'id': uuid,
'title': self._live_title(title) if is_live else title,
'thumbnail': try_get(video, lambda x: x['promo_image']['url']),
'description': try_get(video, lambda x: x['subheadlines']['basic']),
'formats': formats,
'duration': int_or_none(video.get('duration'), 100),
'timestamp': parse_iso8601(video.get('created_date')),
'subtitles': subtitles,
'is_live': is_live,
}
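
Note on `ArcPublishingIE._real_extract`: the API host is chosen with a `for`/`else` over `_POWA_DEFAULTS`, and the `wapo` rename happens only after the template is picked. A minimal sketch of that resolution, using the table from the diff; `find_by_uuid_url` is a hypothetical name.

```python
POWA_DEFAULTS = [
    (['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
    ([
        'adn', 'advancelocal', 'answers', 'bonnier', 'bostonglobe', 'demo',
        'gmg', 'gruponacion', 'infobae', 'mco', 'nzme', 'pmn', 'raycom',
        'spectator', 'tbt', 'tgam', 'tronc', 'wapo', 'wweek',
    ], 'video-api-cdn.%s.arcpublishing.com/api'),
]


def find_by_uuid_url(org):
    for orgs, tmpl in POWA_DEFAULTS:
        if org in orgs:
            base_api_tmpl = tmpl
            break
    else:
        # No table entry matched: fall back to the generic host pattern.
        base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
    if org == 'wapo':
        # The template is looked up as 'wapo' but filled in as 'washpost'.
        org = 'washpost'
    return 'https://%s/v1/ansvideos/findByUuid' % (base_api_tmpl % org)
```

Organisations missing from the table, such as `gray` in the tests, take the fallback branch.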


@ -6,13 +6,11 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
mimetype2ext,
parse_iso8601, parse_iso8601,
strip_jsonp, try_get,
) )
@ -20,22 +18,27 @@ class ArkenaIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
video\.arkena\.com/play2/embed/player\?| video\.(?:arkena|qbrick)\.com/play2/embed/player\?|
play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+) play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)
) )
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411', 'url': 'https://video.qbrick.com/play2/embed/player?accountId=1034090&mediaId=d8ab4607-00090107-aab86310',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365', 'md5': '97f117754e5f3c020f5f26da4a44ebaf',
'info_dict': { 'info_dict': {
'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe', 'id': 'd8ab4607-00090107-aab86310',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Big Buck Bunny', 'title': 'EM_HT20_117_roslund_v2.mp4',
'description': 'Royalty free test video', 'timestamp': 1608285912,
'timestamp': 1432816365, 'upload_date': '20201218',
'upload_date': '20150528', 'duration': 1429.162667,
'is_live': False, 'subtitles': {
'sv': 'count:3',
},
}, },
}, {
'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
'only_matching': True,
}, { }, {
'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893', 'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893',
'only_matching': True, 'only_matching': True,
@ -72,62 +75,89 @@ def _real_extract(self, url):
if not video_id or not account_id: if not video_id or not account_id:
raise ExtractorError('Invalid URL', expected=True) raise ExtractorError('Invalid URL', expected=True)
playlist = self._download_json( media = self._download_json(
'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_' 'https://video.qbrick.com/api/v1/public/accounts/%s/medias/%s' % (account_id, video_id),
% (video_id, account_id), video_id, query={
video_id, transform_source=strip_jsonp)['Playlist'][0] # https://video.qbrick.com/docs/api/examples/library-api.html
'fields': 'asset/resources/*/renditions/*(height,id,language,links/*(href,mimeType),type,size,videos/*(audios/*(codec,sampleRate),bitrate,codec,duration,height,width),width),created,metadata/*(title,description),tags',
})
metadata = media.get('metadata') or {}
title = metadata['title']
media_info = playlist['MediaInfo'] duration = None
title = media_info['Title']
media_files = playlist['MediaFiles']
is_live = False
formats = [] formats = []
for kind_case, kind_formats in media_files.items(): thumbnails = []
kind = kind_case.lower() subtitles = {}
for f in kind_formats: for resource in media['asset']['resources']:
f_url = f.get('Url') for rendition in (resource.get('renditions') or []):
if not f_url: rendition_type = rendition.get('type')
continue for i, link in enumerate(rendition.get('links') or []):
is_live = f.get('Live') == 'true' href = link.get('href')
exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None)) if not href:
if kind == 'm3u8' or 'm3u8' in exts: continue
formats.extend(self._extract_m3u8_formats( if rendition_type == 'image':
f_url, video_id, 'mp4', 'm3u8_native', thumbnails.append({
m3u8_id=kind, fatal=False, live=is_live)) 'filesize': int_or_none(rendition.get('size')),
elif kind == 'flash' or 'f4m' in exts: 'height': int_or_none(rendition.get('height')),
formats.extend(self._extract_f4m_formats( 'id': rendition.get('id'),
f_url, video_id, f4m_id=kind, fatal=False)) 'url': href,
elif kind == 'dash' or 'mpd' in exts: 'width': int_or_none(rendition.get('width')),
formats.extend(self._extract_mpd_formats( })
f_url, video_id, mpd_id=kind, fatal=False)) elif rendition_type == 'subtitle':
elif kind == 'silverlight': subtitles.setdefault(rendition.get('language') or 'en', []).append({
# TODO: process when ism is supported (see 'url': href,
# https://github.com/ytdl-org/youtube-dl/issues/8118) })
continue elif rendition_type == 'video':
else: f = {
tbr = float_or_none(f.get('Bitrate'), 1000) 'filesize': int_or_none(rendition.get('size')),
formats.append({ 'format_id': rendition.get('id'),
'url': f_url, 'url': href,
'format_id': '%s-%d' % (kind, tbr) if tbr else kind, }
'tbr': tbr, video = try_get(rendition, lambda x: x['videos'][i], dict)
}) if video:
if not duration:
duration = float_or_none(video.get('duration'))
f.update({
'height': int_or_none(video.get('height')),
'tbr': int_or_none(video.get('bitrate'), 1000),
'vcodec': video.get('codec'),
'width': int_or_none(video.get('width')),
})
audio = try_get(video, lambda x: x['audios'][0], dict)
if audio:
f.update({
'acodec': audio.get('codec'),
'asr': int_or_none(audio.get('sampleRate')),
})
formats.append(f)
elif rendition_type == 'index':
mime_type = link.get('mimeType')
if mime_type == 'application/smil+xml':
formats.extend(self._extract_smil_formats(
href, video_id, fatal=False))
elif mime_type == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
href, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif mime_type == 'application/hds+xml':
formats.extend(self._extract_f4m_formats(
href, video_id, f4m_id='hds', fatal=False))
elif mime_type == 'application/dash+xml':
formats.extend(self._extract_mpd_formats(
href, video_id, mpd_id='dash', fatal=False))
elif mime_type == 'application/vnd.ms-sstr+xml':
formats.extend(self._extract_ism_formats(
href, video_id, ism_id='mss', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
description = media_info.get('Description')
video_id = media_info.get('VideoId') or video_id
timestamp = parse_iso8601(media_info.get('PublishDate'))
thumbnails = [{
'url': thumbnail['Url'],
'width': int_or_none(thumbnail.get('Size')),
} for thumbnail in (media_info.get('Poster') or []) if thumbnail.get('Url')]
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': metadata.get('description'),
'timestamp': timestamp, 'timestamp': parse_iso8601(media.get('created')),
'is_live': is_live,
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'subtitles': subtitles,
'duration': duration,
'tags': media.get('tags'),
'formats': formats, 'formats': formats,
} }
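
Note on the rewritten Arkena extractor: thumbnails and subtitles are now collected from the same `resources` / `renditions` / `links` walk as the formats. A minimal sketch of that walk over an already-downloaded `media` dict; `collect_assets` is a hypothetical name and `int_or_none` is a simplified stand-in for the utility of the same name.

```python
def int_or_none(value):
    # Simplified stand-in: integer conversion that tolerates bad input.
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def collect_assets(media):
    thumbnails, subtitles = [], {}
    for resource in media['asset']['resources']:
        for rendition in (resource.get('renditions') or []):
            rendition_type = rendition.get('type')
            for link in (rendition.get('links') or []):
                href = link.get('href')
                if not href:
                    continue
                if rendition_type == 'image':
                    thumbnails.append({
                        'filesize': int_or_none(rendition.get('size')),
                        'height': int_or_none(rendition.get('height')),
                        'id': rendition.get('id'),
                        'url': href,
                        'width': int_or_none(rendition.get('width')),
                    })
                elif rendition_type == 'subtitle':
                    # Subtitles without a language are filed under 'en'.
                    lang = rendition.get('language') or 'en'
                    subtitles.setdefault(lang, []).append({'url': href})
    return thumbnails, subtitles
```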


@ -1,27 +1,91 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import functools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from .kaltura import KalturaIE from .kaltura import KalturaIE
from ..utils import extract_attributes from ..utils import (
extract_attributes,
int_or_none,
OnDemandPagedList,
parse_age_limit,
strip_or_none,
try_get,
)
class AsianCrushIE(InfoExtractor): class AsianCrushBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))' _VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|(?:cocoro|retrocrush)\.tv))'
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE _KALTURA_KEYS = [
'video_url', 'progressive_url', 'download_url', 'thumbnail_url',
'widescreen_thumbnail_url', 'screencap_widescreen',
]
_API_SUFFIX = {'retrocrush.tv': '-ott'}
def _call_api(self, host, endpoint, video_id, query, resource):
return self._download_json(
'https://api%s.%s/%s' % (self._API_SUFFIX.get(host, ''), host, endpoint), video_id,
'Downloading %s JSON metadata' % resource, query=query,
headers=self.geo_verification_headers())['objects']
def _download_object_data(self, host, object_id, resource):
return self._call_api(
host, 'search', object_id, {'id': object_id}, resource)[0]
def _get_object_description(self, obj):
return strip_or_none(obj.get('long_description') or obj.get('short_description'))
def _parse_video_data(self, video):
title = video['name']
entry_id, partner_id = [None] * 2
for k in self._KALTURA_KEYS:
k_url = video.get(k)
if k_url:
mobj = re.search(r'/p/(\d+)/.+?/entryId/([^/]+)/', k_url)
if mobj:
partner_id, entry_id = mobj.groups()
break
meta_categories = try_get(video, lambda x: x['meta']['categories'], list) or []
categories = list(filter(None, [c.get('name') for c in meta_categories]))
show_info = video.get('show_info') or {}
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, entry_id),
'ie_key': KalturaIE.ie_key(),
'id': entry_id,
'title': title,
'description': self._get_object_description(video),
'age_limit': parse_age_limit(video.get('mpaa_rating') or video.get('tv_rating')),
'categories': categories,
'series': show_info.get('show_name'),
'season_number': int_or_none(show_info.get('season_num')),
'season_id': show_info.get('season_id'),
'episode_number': int_or_none(show_info.get('episode_num')),
}
class AsianCrushIE(AsianCrushBaseIE):
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % AsianCrushBaseIE._VALID_URL_BASE
_TESTS = [{ _TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/', 'url': 'https://www.asiancrush.com/video/004289v/women-who-flirt',
'md5': 'c3b740e48d0ba002a42c0b72857beae6', 'md5': 'c3b740e48d0ba002a42c0b72857beae6',
'info_dict': { 'info_dict': {
'id': '1_y4tmjm5r', 'id': '1_y4tmjm5r',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Women Who Flirt', 'title': 'Women Who Flirt',
'description': 'md5:7e986615808bcfb11756eb503a751487', 'description': 'md5:b65c7e0ae03a85585476a62a186f924c',
'timestamp': 1496936429, 'timestamp': 1496936429,
'upload_date': '20170608', 'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com', 'uploader_id': 'craig@crifkin.com',
'age_limit': 13,
'categories': 'count:5',
'duration': 5812,
}, },
}, { }, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/', 'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
@ -41,67 +105,35 @@ class AsianCrushIE(InfoExtractor):
}, { }, {
'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/', 'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.retrocrush.tv/video/true-tears/012328v-i...gave-away-my-tears',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) host, video_id = re.match(self._VALID_URL, url).groups()
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) if host == 'cocoro.tv':
webpage = self._download_webpage(url, video_id)
entry_id, partner_id, title = [None] * 3 embed_vars = self._parse_json(self._search_regex(
vars = self._parse_json(
self._search_regex(
r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars', r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars',
default='{}'), video_id, fatal=False) default='{}'), video_id, fatal=False) or {}
if vars: video_id = embed_vars.get('entry_id') or video_id
entry_id = vars.get('entry_id')
partner_id = vars.get('partner_id')
title = vars.get('vid_label')
if not entry_id: video = self._download_object_data(host, video_id, 'video')
entry_id = self._search_regex( return self._parse_video_data(video)
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', player,
'kaltura id', group='id')
if not partner_id:
partner_id = self._search_regex(
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
description = self._html_search_regex(
r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description', fatal=False)
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': KalturaIE.ie_key(),
'id': video_id,
'title': title,
'description': description,
}
class AsianCrushPlaylistIE(InfoExtractor): class AsianCrushPlaylistIE(AsianCrushBaseIE):
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE _VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushBaseIE._VALID_URL_BASE
_TESTS = [{ _TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/', 'url': 'https://www.asiancrush.com/series/006447s/fruity-samurai',
'info_dict': { 'info_dict': {
'id': '12481', 'id': '6447',
'title': 'Scholar Who Walks the Night', 'title': 'Fruity Samurai',
'description': 'md5:7addd7c5132a09fd4741152d96cce886', 'description': 'md5:7535174487e4a202d3872a7fc8f2f154',
}, },
'playlist_count': 20, 'playlist_count': 13,
}, { }, {
'url': 'https://www.yuyutv.com/series/013920s/peep-show/', 'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
'only_matching': True, 'only_matching': True,
@ -111,35 +143,58 @@ class AsianCrushPlaylistIE(InfoExtractor):
}, { }, {
'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/', 'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.retrocrush.tv/series/012355s/true-tears',
'only_matching': True,
}] }]
_PAGE_SIZE = 1000000000
def _fetch_page(self, domain, parent_id, page):
videos = self._call_api(
domain, 'getreferencedobjects', parent_id, {
'max': self._PAGE_SIZE,
'object_type': 'video',
'parent_id': parent_id,
'start': page * self._PAGE_SIZE,
}, 'page %d' % (page + 1))
for video in videos:
yield self._parse_video_data(video)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) host, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, playlist_id) if host == 'cocoro.tv':
webpage = self._download_webpage(url, playlist_id)
entries = [] entries = []
for mobj in re.finditer( for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL, r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage): webpage):
attrs = extract_attributes(mobj.group(0)) attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix': if attrs.get('class') == 'clearfix':
entries.append(self.url_result( entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key())) mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = self._html_search_regex( title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage, r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title( 'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta( webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title', 'twitter:title', webpage, 'title',
default=None) or self._search_regex( default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False) r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title: if title:
title = re.sub(r'\s*\|\s*.+?$', '', title) title = re.sub(r'\s*\|\s*.+?$', '', title)
description = self._og_search_description( description = self._og_search_description(
webpage, default=None) or self._html_search_meta( webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False) 'twitter:description', webpage, 'description', fatal=False)
else:
show = self._download_object_data(host, playlist_id, 'show')
title = show.get('name')
description = self._get_object_description(show)
entries = OnDemandPagedList(
functools.partial(self._fetch_page, host, playlist_id),
self._PAGE_SIZE)
return self.playlist_result(entries, playlist_id, title, description) return self.playlist_result(entries, playlist_id, title, description)
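
Note on `AsianCrushBaseIE`: two small pieces carry most of the new logic, the per-host API URL and the Kaltura id lookup across the `_KALTURA_KEYS` fields. A minimal sketch of both follows; `api_url` and `find_kaltura_ids` are hypothetical names, and the URL used in the tests is an invented example of the `/p/<partner>/.../entryId/<entry>/` shape the regex expects.

```python
import re

KALTURA_KEYS = [
    'video_url', 'progressive_url', 'download_url', 'thumbnail_url',
    'widescreen_thumbnail_url', 'screencap_widescreen',
]
API_SUFFIX = {'retrocrush.tv': '-ott'}


def api_url(host, endpoint):
    # retrocrush.tv uses api-ott.<host>; every other host uses api.<host>.
    return 'https://api%s.%s/%s' % (API_SUFFIX.get(host, ''), host, endpoint)


def find_kaltura_ids(video):
    # Returns (partner_id, entry_id) from the first field whose URL matches.
    for key in KALTURA_KEYS:
        k_url = video.get(key)
        if k_url:
            mobj = re.search(r'/p/(\d+)/.+?/entryId/([^/]+)/', k_url)
            if mobj:
                return mobj.groups()
    return None, None
```

Unlike the diff, the sketch returns `(None, None)` explicitly when no field matches, so a caller can decide how to handle a missing entry id.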


@ -49,22 +49,17 @@ class BBCCoUkIE(InfoExtractor):
_LOGIN_URL = 'https://account.bbc.com/signin' _LOGIN_URL = 'https://account.bbc.com/signin'
_NETRC_MACHINE = 'bbc' _NETRC_MACHINE = 'bbc'
_MEDIASELECTOR_URLS = [ _MEDIA_SELECTOR_URL_TEMPL = 'https://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/%s/vpid/%s'
_MEDIA_SETS = [
# Provides HQ HLS streams with even better quality than the pc mediaset but fails # Provides HQ HLS streams with even better quality than the pc mediaset but fails
# with geolocation in some cases when it's not even geo restricted at all (e.g. # with geolocation in some cases when it's not even geo restricted at all (e.g.
# http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable. # http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable.
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s', 'iptv-all',
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s', 'pc',
] ]
_MEDIASELECTION_NS = 'http://bbc.co.uk/2008/mp/mediaselection'
_EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist' _EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'
_NAMESPACES = (
_MEDIASELECTION_NS,
_EMP_PLAYLIST_NS,
)
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.bbc.co.uk/programmes/b039g8p7', 'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
@ -261,8 +256,6 @@ class BBCCoUkIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
def _login(self): def _login(self):
username, password = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
@ -307,22 +300,14 @@ def _extract_asx_playlist(self, connection, programme_id):
def _extract_items(self, playlist): def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS) return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
def _findall_ns(self, element, xpath):
elements = []
for ns in self._NAMESPACES:
elements.extend(element.findall(xpath % ns))
return elements
def _extract_medias(self, media_selection): def _extract_medias(self, media_selection):
error = media_selection.find('./{%s}error' % self._MEDIASELECTION_NS) error = media_selection.get('result')
if error is None: if error:
media_selection.find('./{%s}error' % self._EMP_PLAYLIST_NS) raise BBCCoUkIE.MediaSelectionError(error)
if error is not None: return media_selection.get('media') or []
raise BBCCoUkIE.MediaSelectionError(error.get('id'))
return self._findall_ns(media_selection, './{%s}media')
def _extract_connections(self, media): def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection') return media.get('connection') or []
def _get_subtitles(self, media, programme_id): def _get_subtitles(self, media, programme_id):
subtitles = {} subtitles = {}
@ -334,13 +319,13 @@ def _get_subtitles(self, media, programme_id):
cc_url, programme_id, 'Downloading captions', fatal=False) cc_url, programme_id, 'Downloading captions', fatal=False)
if not isinstance(captions, compat_etree_Element): if not isinstance(captions, compat_etree_Element):
continue continue
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en') subtitles['en'] = [
subtitles[lang] = [
{ {
'url': connection.get('href'), 'url': connection.get('href'),
'ext': 'ttml', 'ext': 'ttml',
}, },
] ]
break
return subtitles return subtitles
def _raise_extractor_error(self, media_selection_error): def _raise_extractor_error(self, media_selection_error):
@ -350,10 +335,10 @@ def _raise_extractor_error(self, media_selection_error):
def _download_media_selector(self, programme_id): def _download_media_selector(self, programme_id):
last_exception = None last_exception = None
for mediaselector_url in self._MEDIASELECTOR_URLS: for media_set in self._MEDIA_SETS:
try: try:
return self._download_media_selector_url( return self._download_media_selector_url(
mediaselector_url % programme_id, programme_id) self._MEDIA_SELECTOR_URL_TEMPL % (media_set, programme_id), programme_id)
except BBCCoUkIE.MediaSelectionError as e: except BBCCoUkIE.MediaSelectionError as e:
if e.id in ('notukerror', 'geolocation', 'selectionunavailable'): if e.id in ('notukerror', 'geolocation', 'selectionunavailable'):
last_exception = e last_exception = e
@ -362,8 +347,8 @@ def _download_media_selector(self, programme_id):
self._raise_extractor_error(last_exception) self._raise_extractor_error(last_exception)
def _download_media_selector_url(self, url, programme_id=None): def _download_media_selector_url(self, url, programme_id=None):
media_selection = self._download_xml( media_selection = self._download_json(
url, programme_id, 'Downloading media selection XML', url, programme_id, 'Downloading media selection JSON',
expected_status=(403, 404)) expected_status=(403, 404))
return self._process_media_selector(media_selection, programme_id) return self._process_media_selector(media_selection, programme_id)
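
Note on the media selector hunks above: the XML namespaces are gone, and the selector now returns JSON with a `result` key on failure and a `media` list on success. A minimal sketch of the fallback across media sets; `select_media` and the injected `fetch_json` callable are hypothetical, and re-raising non-retryable errors is an assumption about the part of the method not shown in the hunk.

```python
MEDIA_SELECTOR_URL_TEMPL = 'https://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/%s/vpid/%s'
MEDIA_SETS = ['iptv-all', 'pc']
RETRYABLE = ('notukerror', 'geolocation', 'selectionunavailable')


class MediaSelectionError(Exception):
    def __init__(self, id):
        super().__init__(id)
        self.id = id


def extract_medias(media_selection):
    # A non-empty 'result' signals an error id such as 'geolocation'.
    error = media_selection.get('result')
    if error:
        raise MediaSelectionError(error)
    return media_selection.get('media') or []


def select_media(programme_id, fetch_json):
    last_exception = None
    for media_set in MEDIA_SETS:
        url = MEDIA_SELECTOR_URL_TEMPL % (media_set, programme_id)
        try:
            return extract_medias(fetch_json(url))
        except MediaSelectionError as e:
            if e.id in RETRYABLE:
                # Try the next media set before giving up.
                last_exception = e
                continue
            raise
    raise last_exception
```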
@ -377,7 +362,6 @@ def _process_media_selector(self, media_selection, programme_id):
if kind in ('video', 'audio'): if kind in ('video', 'audio'):
bitrate = int_or_none(media.get('bitrate')) bitrate = int_or_none(media.get('bitrate'))
encoding = media.get('encoding') encoding = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width')) width = int_or_none(media.get('width'))
height = int_or_none(media.get('height')) height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size')) file_size = int_or_none(media.get('media_file_size'))
@ -392,8 +376,6 @@ def _process_media_selector(self, media_selection, programme_id):
supplier = connection.get('supplier') supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat') transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist # ASX playlist
if supplier == 'asx': if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)): for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
@ -408,20 +390,11 @@ def _process_media_selector(self, media_selection, programme_id):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native', href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)) m3u8_id=format_id, fatal=False))
if re.search(self._USP_RE, href):
usp_formats = self._extract_m3u8_formats(
re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href),
programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for f in usp_formats:
if f.get('height') and f['height'] > 720:
continue
formats.append(f)
elif transfer_format == 'hds': elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False)) href, programme_id, f4m_id=format_id, fatal=False))
else: else:
if not service and not supplier and bitrate: if not supplier and bitrate:
format_id += '-%d' % bitrate format_id += '-%d' % bitrate
fmt = { fmt = {
'format_id': format_id, 'format_id': format_id,
@ -554,7 +527,7 @@ def _real_extract(self, url):
webpage = self._download_webpage(url, group_id, 'Downloading video page') webpage = self._download_webpage(url, group_id, 'Downloading video page')
error = self._search_regex( error = self._search_regex(
r'<div\b[^>]+\bclass=["\']smp__message delta["\'][^>]*>([^<]+)<', r'<div\b[^>]+\bclass=["\'](?:smp|playout)__message delta["\'][^>]*>\s*([^<]+?)\s*<',
webpage, 'error', default=None) webpage, 'error', default=None)
if error: if error:
raise ExtractorError(error, expected=True) raise ExtractorError(error, expected=True)
@ -607,16 +580,9 @@ class BBCIE(BBCCoUkIE):
IE_DESC = 'BBC' IE_DESC = 'BBC'
_VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)' _VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'
_MEDIASELECTOR_URLS = [ _MEDIA_SETS = [
# Provides HQ HLS streams but fails with geolocation in some cases when it's 'mobile-tablet-main',
# even not geo restricted at all 'pc',
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
# Provides more formats, namely direct mp4 links, but fails on some videos with
# notukerror for non UK (?) users (e.g.
# http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
'http://open.live.bbc.co.uk/mediaselector/4/mtis/stream/%s',
# Provides fewer formats, but works everywhere for everybody (hopefully)
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/journalism-pc/vpid/%s',
] ]
_TESTS = [{ _TESTS = [{
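
Reviewer note: the reworked `_download_media_selector` walks `_MEDIA_SETS` in order, formats each entry into a single URL template, and only moves on to the next media set for a small group of error ids. A minimal standalone sketch of that loop follows. The URL template, the `fetch` callable, and `fake_fetch` are assumptions made for illustration (the real value of `_MEDIA_SELECTOR_URL_TEMPL` is not visible in this hunk), and re-raising any other error id is a simplification of what the extractor does.

```python
class MediaSelectionError(Exception):
    """Stand-in for BBCCoUkIE.MediaSelectionError; carries an error id."""

    def __init__(self, error_id):
        super().__init__(error_id)
        self.id = error_id


# Hypothetical template: the real _MEDIA_SELECTOR_URL_TEMPL is not shown in this diff.
URL_TEMPLATE = 'https://example.invalid/mediaselector/mediaset/%s/vpid/%s'
MEDIA_SETS = ['mobile-tablet-main', 'pc']
RETRYABLE = ('notukerror', 'geolocation', 'selectionunavailable')


def download_media_selector(programme_id, fetch):
    """Try each media set in order and return the first successful result.

    `fetch` is an assumed callable that takes a URL and either returns
    parsed data or raises MediaSelectionError.
    """
    last_exception = None
    for media_set in MEDIA_SETS:
        try:
            return fetch(URL_TEMPLATE % (media_set, programme_id))
        except MediaSelectionError as e:
            if e.id in RETRYABLE:
                last_exception = e  # remember it and try the next media set
                continue
            raise  # any other error id is treated as fatal in this sketch
    raise last_exception


def fake_fetch(url):
    # Simulate the first media set being geo-blocked.
    if '/mobile-tablet-main/' in url:
        raise MediaSelectionError('geolocation')
    return {'url': url}


result = download_media_selector('p0000000', fake_fetch)
```

Here the first media set fails with a retryable id, so the result comes from the `pc` media set.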

@ -1,194 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
compat_str,
float_or_none,
int_or_none,
parse_iso8601,
try_get,
urljoin,
)
class BeamProBaseIE(InfoExtractor):
_API_BASE = 'https://mixer.com/api/v1'
_RATINGS = {'family': 0, 'teen': 13, '18+': 18}
def _extract_channel_info(self, chan):
user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
return {
'uploader': chan.get('token') or try_get(
chan, lambda x: x['user']['username'], compat_str),
'uploader_id': compat_str(user_id) if user_id else None,
'age_limit': self._RATINGS.get(chan.get('audience')),
}
class BeamProLiveIE(BeamProBaseIE):
IE_NAME = 'Mixer:live'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://mixer.com/niterhayven',
'info_dict': {
'id': '261562',
'ext': 'mp4',
'title': 'Introducing The Witcher 3 // The Grind Starts Now!',
'description': 'md5:0b161ac080f15fe05d18a07adb44a74d',
'thumbnail': r're:https://.*\.jpg$',
'timestamp': 1483477281,
'upload_date': '20170103',
'uploader': 'niterhayven',
'uploader_id': '373396',
'age_limit': 18,
'is_live': True,
'view_count': int,
},
'skip': 'niterhayven is offline',
'params': {
'skip_download': True,
},
}
_MANIFEST_URL_TEMPLATE = '%s/channels/%%s/manifest.%%s' % BeamProBaseIE._API_BASE
@classmethod
def suitable(cls, url):
return False if BeamProVodIE.suitable(url) else super(BeamProLiveIE, cls).suitable(url)
def _real_extract(self, url):
channel_name = self._match_id(url)
chan = self._download_json(
'%s/channels/%s' % (self._API_BASE, channel_name), channel_name)
if chan.get('online') is False:
raise ExtractorError(
'{0} is offline'.format(channel_name), expected=True)
channel_id = chan['id']
def manifest_url(kind):
return self._MANIFEST_URL_TEMPLATE % (channel_id, kind)
formats = self._extract_m3u8_formats(
manifest_url('m3u8'), channel_name, ext='mp4', m3u8_id='hls',
fatal=False)
formats.extend(self._extract_smil_formats(
manifest_url('smil'), channel_name, fatal=False))
self._sort_formats(formats)
info = {
'id': compat_str(chan.get('id') or channel_name),
'title': self._live_title(chan.get('name') or channel_name),
'description': clean_html(chan.get('description')),
'thumbnail': try_get(
chan, lambda x: x['thumbnail']['url'], compat_str),
'timestamp': parse_iso8601(chan.get('updatedAt')),
'is_live': True,
'view_count': int_or_none(chan.get('viewersTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(chan))
return info
class BeamProVodIE(BeamProBaseIE):
IE_NAME = 'Mixer:vod'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://mixer.com/willow8714?vod=2259830',
'md5': 'b2431e6e8347dc92ebafb565d368b76b',
'info_dict': {
'id': '2259830',
'ext': 'mp4',
'title': 'willow8714\'s Channel',
'duration': 6828.15,
'thumbnail': r're:https://.*source\.png$',
'timestamp': 1494046474,
'upload_date': '20170506',
'uploader': 'willow8714',
'uploader_id': '6085379',
'age_limit': 13,
'view_count': int,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
'only_matching': True,
}, {
'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
'only_matching': True,
}]
@staticmethod
def _extract_format(vod, vod_type):
if not vod.get('baseUrl'):
return []
if vod_type == 'hls':
filename, protocol = 'manifest.m3u8', 'm3u8_native'
elif vod_type == 'raw':
filename, protocol = 'source.mp4', 'https'
else:
assert False
data = vod.get('data') if isinstance(vod.get('data'), dict) else {}
format_id = [vod_type]
if isinstance(data.get('Height'), compat_str):
format_id.append('%sp' % data['Height'])
return [{
'url': urljoin(vod['baseUrl'], filename),
'format_id': '-'.join(format_id),
'ext': 'mp4',
'protocol': protocol,
'width': int_or_none(data.get('Width')),
'height': int_or_none(data.get('Height')),
'fps': int_or_none(data.get('Fps')),
'tbr': int_or_none(data.get('Bitrate'), 1000),
}]
def _real_extract(self, url):
vod_id = self._match_id(url)
vod_info = self._download_json(
'%s/recordings/%s' % (self._API_BASE, vod_id), vod_id)
state = vod_info.get('state')
if state != 'AVAILABLE':
raise ExtractorError(
'VOD %s is not available (state: %s)' % (vod_id, state),
expected=True)
formats = []
thumbnail_url = None
for vod in vod_info['vods']:
vod_type = vod.get('format')
if vod_type in ('hls', 'raw'):
formats.extend(self._extract_format(vod, vod_type))
elif vod_type == 'thumbnail':
thumbnail_url = urljoin(vod.get('baseUrl'), 'source.png')
self._sort_formats(formats)
info = {
'id': vod_id,
'title': vod_info.get('name') or vod_id,
'duration': float_or_none(vod_info.get('duration')),
'thumbnail': thumbnail_url,
'timestamp': parse_iso8601(vod_info.get('createdAt')),
'view_count': int_or_none(vod_info.get('viewsTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(vod_info.get('channel') or {}))
return info

@ -0,0 +1,60 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
try_get,
urlencode_postdata,
)
class BongaCamsIE(InfoExtractor):
_VALID_URL = r'https?://(?P<host>(?:[^/]+\.)?bongacams\d*\.com)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://de.bongacams.com/azumi-8',
'only_matching': True,
}, {
'url': 'https://cn.bongacams.com/azumi-8',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
channel_id = mobj.group('id')
amf = self._download_json(
'https://%s/tools/amf.php' % host, channel_id,
data=urlencode_postdata((
('method', 'getRoomData'),
('args[]', channel_id),
('args[]', 'false'),
)), headers={'X-Requested-With': 'XMLHttpRequest'})
server_url = amf['localData']['videoServerUrl']
uploader_id = try_get(
amf, lambda x: x['performerData']['username'], compat_str) or channel_id
uploader = try_get(
amf, lambda x: x['performerData']['displayName'], compat_str)
like_count = int_or_none(try_get(
amf, lambda x: x['performerData']['loversCount']))
formats = self._extract_m3u8_formats(
'%s/hls/stream_%s/playlist.m3u8' % (server_url, uploader_id),
channel_id, 'mp4', m3u8_id='hls', live=True)
self._sort_formats(formats)
return {
'id': channel_id,
'title': self._live_title(uploader or uploader_id),
'uploader': uploader,
'uploader_id': uploader_id,
'like_count': like_count,
'age_limit': 18,
'is_live': True,
'formats': formats,
}
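
Reviewer note: the new BongaCams extractor treats `videoServerUrl` as mandatory but reads the performer fields defensively, falling back to the channel id from the URL when no username is returned. The sketch below illustrates that pattern outside the extractor. `safe_get` is a simplified stand-in for the `try_get` helper, `build_stream_info` is a hypothetical function, and the sample response is invented.

```python
def safe_get(src, getter, expected_type=None):
    """Simplified stand-in for youtube-dl's try_get helper."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def build_stream_info(amf, channel_id):
    # Required field: a missing key is allowed to propagate as KeyError.
    server_url = amf['localData']['videoServerUrl']
    # Optional fields: fall back to the channel id or None.
    uploader_id = safe_get(
        amf, lambda x: x['performerData']['username'], str) or channel_id
    uploader = safe_get(
        amf, lambda x: x['performerData']['displayName'], str)
    return {
        'uploader': uploader,
        'uploader_id': uploader_id,
        'playlist_url': '%s/hls/stream_%s/playlist.m3u8' % (server_url, uploader_id),
    }


# Invented sample response; real responses contain more fields.
sample = {
    'localData': {'videoServerUrl': '//media.example.invalid'},
    'performerData': {'displayName': 'Azumi'},
}
info = build_stream_info(sample, 'azumi-8')
```

Because the sample has no `username`, the playlist URL is built from the channel id taken from the page URL.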

@ -28,6 +28,7 @@
parse_iso8601, parse_iso8601,
smuggle_url, smuggle_url,
str_or_none, str_or_none,
try_get,
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
UnsupportedError, UnsupportedError,
@ -470,18 +471,18 @@ def _extract_urls(ie, webpage):
def _parse_brightcove_metadata(self, json_data, video_id, headers={}): def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
title = json_data['name'].strip() title = json_data['name'].strip()
num_drm_sources = 0
formats = [] formats = []
sources_num = len(json_data.get('sources')) sources = json_data.get('sources') or []
key_systems_present = 0 for source in sources:
for source in json_data.get('sources', []):
container = source.get('container') container = source.get('container')
ext = mimetype2ext(source.get('type')) ext = mimetype2ext(source.get('type'))
src = source.get('src') src = source.get('src')
# https://apis.support.brightcove.com/playback/references/playback-api-video-fields-reference.html # https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object
if source.get('key_systems'): if container == 'WVM' or source.get('key_systems'):
key_systems_present += 1 num_drm_sources += 1
continue continue
elif ext == 'ism' or container == 'WVM': elif ext == 'ism':
continue continue
elif ext == 'm3u8' or container == 'M2TS': elif ext == 'm3u8' or container == 'M2TS':
if not src: if not src:
@ -539,23 +540,14 @@ def build_format_id(kind):
}) })
formats.append(f) formats.append(f)
if sources_num == key_systems_present:
raise ExtractorError('This video is DRM protected', expected=True)
if not formats: if not formats:
# for sonyliv.com DRM protected videos errors = json_data.get('errors')
s3_source_url = json_data.get('custom_fields', {}).get('s3sourceurl') if errors:
if s3_source_url: error = errors[0]
formats.append({ raise ExtractorError(
'url': s3_source_url, error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
'format_id': 'source', if sources and num_drm_sources == len(sources):
}) raise ExtractorError('This video is DRM protected.', expected=True)
errors = json_data.get('errors')
if not formats and errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
self._sort_formats(formats) self._sort_formats(formats)
@ -609,24 +601,27 @@ def _real_extract(self, url):
store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x) store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
def extract_policy_key(): def extract_policy_key():
webpage = self._download_webpage( base_url = 'http://players.brightcove.net/%s/%s_%s/' % (account_id, player_id, embed)
'http://players.brightcove.net/%s/%s_%s/index.min.js' config = self._download_json(
% (account_id, player_id, embed), video_id) base_url + 'config.json', video_id, fatal=False) or {}
policy_key = try_get(
policy_key = None config, lambda x: x['video_cloud']['policy_key'])
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key: if not policy_key:
policy_key = self._search_regex( webpage = self._download_webpage(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1', base_url + 'index.min.js', video_id)
webpage, 'policy key', group='pk')
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
store_pk(policy_key) store_pk(policy_key)
return policy_key return policy_key
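
Reviewer note: as I read the new side of the `_parse_brightcove_metadata` hunk, a source counts as DRM when it uses the `WVM` container or carries `key_systems`, and the DRM error is raised only when no formats were extracted, the API returned no errors, and every source was DRM. The sketch below restates that decision order in isolation. `ExtractionFailed`, `check_playable`, and `failure_message` are hypothetical names, and the sample payloads are invented.

```python
class ExtractionFailed(Exception):
    """Stand-in for youtube-dl's ExtractorError."""


def is_drm_source(source):
    # A source counts as DRM-protected if it uses the WVM container
    # or declares any key systems.
    return source.get('container') == 'WVM' or bool(source.get('key_systems'))


def check_playable(json_data, formats):
    """Raise a descriptive error when no formats could be extracted."""
    if formats:
        return
    errors = json_data.get('errors')
    if errors:
        # API errors take precedence over the DRM message.
        error = errors[0]
        raise ExtractionFailed(
            error.get('message') or error.get('error_subcode') or error['error_code'])
    sources = json_data.get('sources') or []
    num_drm_sources = sum(1 for s in sources if is_drm_source(s))
    if sources and num_drm_sources == len(sources):
        raise ExtractionFailed('This video is DRM protected.')


def failure_message(json_data, formats=()):
    """Return the error text check_playable would raise, or None."""
    try:
        check_playable(json_data, formats)
    except ExtractionFailed as e:
        return str(e)
    return None


# Invented payloads for illustration.
drm_only = {'sources': [{'container': 'WVM'}, {'key_systems': {'com.widevine.alpha': {}}}]}
api_error = {'sources': None, 'errors': [{'error_code': 'VIDEO_NOT_FOUND'}]}
```

Note that `json_data.get('sources') or []` also covers a response where `sources` is present but `null`, which the old `len(json_data.get('sources'))` call did not handle.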

@ -11,7 +11,47 @@
class CBSLocalIE(AnvatoIE): class CBSLocalIE(AnvatoIE):
_VALID_URL = r'https?://[a-z]+\.cbslocal\.com/(?:\d+/\d+/\d+|video)/(?P<id>[0-9a-z-]+)' _VALID_URL_BASE = r'https?://[a-z]+\.cbslocal\.com/'
_VALID_URL = _VALID_URL_BASE + r'video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
'info_dict': {
'id': '3580809',
'ext': 'mp4',
'title': 'A Very Blue Anniversary',
'description': 'CBS2s Cindy Hsu has more.',
'thumbnail': 're:^https?://.*',
'timestamp': int,
'upload_date': r're:^\d{8}$',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\WCBSTV',
'Syndication\\AOL',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\Yahoo',
'Content\\News',
'Content\\News\\Local News',
],
'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
mcp_id = self._match_id(url)
return self.url_result(
'anvato:anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67:' + mcp_id, 'Anvato', mcp_id)
class CBSLocalArticleIE(AnvatoIE):
_VALID_URL = CBSLocalIE._VALID_URL_BASE + r'\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'
_TESTS = [{ _TESTS = [{
# Anvato backend # Anvato backend
@ -52,31 +92,6 @@ class CBSLocalIE(AnvatoIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
'info_dict': {
'id': '3580809',
'ext': 'mp4',
'title': 'A Very Blue Anniversary',
'description': 'CBS2s Cindy Hsu has more.',
'thumbnail': 're:^https?://.*',
'timestamp': int,
'upload_date': r're:^\d{8}$',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\WCBSTV',
'Syndication\\AOL',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\Yahoo',
'Content\\News',
'Content\\News\\Local News',
],
'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
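
Reviewer note: the split into `CBSLocalIE` and `CBSLocalArticleIE` hinges on two patterns built from a shared `_VALID_URL_BASE`. The snippet below checks how they divide URLs, using the patterns from the diff. `classify` is a hypothetical helper, and the article URL is made up for illustration.

```python
import re

VALID_URL_BASE = r'https?://[a-z]+\.cbslocal\.com/'
VIDEO_URL_RE = VALID_URL_BASE + r'video/(?P<id>\d+)'
ARTICLE_URL_RE = VALID_URL_BASE + r'\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'


def classify(url):
    """Return (kind, id) for a CBS Local URL, or None if neither pattern matches."""
    for kind, pattern in (('video', VIDEO_URL_RE), ('article', ARTICLE_URL_RE)):
        mobj = re.match(pattern, url)
        if mobj:
            return kind, mobj.group('id')
    return None


video = classify('http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/')
# The article URL below is made up for illustration.
article = classify('http://newyork.cbslocal.com/2021/01/07/some-article-slug/')
```

The video pattern now captures only the numeric prefix, which is what the new `_real_extract` passes on as the Anvato `mcp_id`.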

@ -96,7 +96,10 @@ def _real_extract(self, url):
config['data_src'] % path, page_title, { config['data_src'] % path, page_title, {
'default': { 'default': {
'media_src': config['media_src'], 'media_src': config['media_src'],
} },
'f4m': {
'host': 'cnn-vh.akamaihd.net',
},
}) })

@ -32,6 +32,7 @@
compat_urlparse, compat_urlparse,
compat_xml_parse_error, compat_xml_parse_error,
) )
from ..downloader import FileDownloader
from ..downloader.f4m import ( from ..downloader.f4m import (
get_base_url, get_base_url,
remove_encrypted_media, remove_encrypted_media,
@ -336,8 +337,8 @@ class InfoExtractor(object):
object, each element of which is a valid dictionary by this specification. object, each element of which is a valid dictionary by this specification.
Additionally, playlists can have "id", "title", "description", "uploader", Additionally, playlists can have "id", "title", "description", "uploader",
"uploader_id", "uploader_url" attributes with the same semantics as videos "uploader_id", "uploader_url", "duration" attributes with the same semantics
(see above). as videos (see above).
_type "multi_video" indicates that there are multiple videos that _type "multi_video" indicates that there are multiple videos that
@ -1237,8 +1238,16 @@ def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
'ViewAction': 'view', 'ViewAction': 'view',
} }
def extract_interaction_type(e):
interaction_type = e.get('interactionType')
if isinstance(interaction_type, dict):
interaction_type = interaction_type.get('@type')
return str_or_none(interaction_type)
def extract_interaction_statistic(e): def extract_interaction_statistic(e):
interaction_statistic = e.get('interactionStatistic') interaction_statistic = e.get('interactionStatistic')
if isinstance(interaction_statistic, dict):
interaction_statistic = [interaction_statistic]
if not isinstance(interaction_statistic, list): if not isinstance(interaction_statistic, list):
return return
for is_e in interaction_statistic: for is_e in interaction_statistic:
@ -1246,8 +1255,8 @@ def extract_interaction_statistic(e):
continue continue
if is_e.get('@type') != 'InteractionCounter': if is_e.get('@type') != 'InteractionCounter':
continue continue
interaction_type = is_e.get('interactionType') interaction_type = extract_interaction_type(is_e)
if not isinstance(interaction_type, compat_str): if not interaction_type:
continue continue
# For interaction count some sites provide string instead of # For interaction count some sites provide string instead of
# an integer (as per spec) with non digit characters (e.g. ",") # an integer (as per spec) with non digit characters (e.g. ",")
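
Reviewer note: the `_json_ld` change accepts two shapes it previously skipped: an `interactionStatistic` given as a single object instead of a list, and an `interactionType` given as a nested object with an `@type` key. A standalone sketch of that normalization follows. `iter_interaction_counters` is a hypothetical generator, the inline string conversion approximates `str_or_none`, and the sample data is invented.

```python
def extract_interaction_type(entry):
    """Return the interaction type as a string, or None if it is missing."""
    interaction_type = entry.get('interactionType')
    if isinstance(interaction_type, dict):
        # Nested form: {"interactionType": {"@type": "..."}}
        interaction_type = interaction_type.get('@type')
    # Approximation of str_or_none: stringify anything that is not None.
    return None if interaction_type is None else str(interaction_type)


def iter_interaction_counters(entity):
    """Yield (interaction_type, raw_count) pairs from a JSON-LD entity."""
    statistic = entity.get('interactionStatistic')
    if isinstance(statistic, dict):
        statistic = [statistic]  # a single counter is wrapped into a list
    if not isinstance(statistic, list):
        return
    for counter in statistic:
        if not isinstance(counter, dict):
            continue
        if counter.get('@type') != 'InteractionCounter':
            continue
        interaction_type = extract_interaction_type(counter)
        if not interaction_type:
            continue
        yield interaction_type, counter.get('userInteractionCount')


# Invented sample: single-object statistic with a nested interaction type.
single = {
    'interactionStatistic': {
        '@type': 'InteractionCounter',
        'interactionType': {'@type': 'http://schema.org/WatchAction'},
        'userInteractionCount': 1234,
    },
}
counters = list(iter_interaction_counters(single))
```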
@ -1354,81 +1363,270 @@ def _form_hidden_inputs(self, form_id, html):
html, '%s form' % form_id, group='form') html, '%s form' % form_id, group='form')
return self._hidden_inputs(form) return self._hidden_inputs(form)
def _sort_formats(self, formats, field_preference=None): class FormatSort:
regex = r' *((?P<reverse>\+)?(?P<field>[a-zA-Z0-9_]+)((?P<seperator>[~:])(?P<limit>.*?))?)? *$'
default = ('hidden', 'has_video', 'extractor', 'lang', 'quality',
'res', 'fps', 'codec', 'size', 'br', 'asr',
'proto', 'ext', 'has_audio', 'source', 'format_id')
settings = {
'vcodec': {'type': 'ordered', 'regex': True,
'order': ['vp9', '(h265|he?vc?)', '(h264|avc)', 'vp8', '(mp4v|h263)', 'theora', '', None, 'none']},
'acodec': {'type': 'ordered', 'regex': True,
'order': ['opus', 'vorbis', 'aac', 'mp?4a?', 'mp3', 'e?a?c-?3', 'dts', '', None, 'none']},
'protocol': {'type': 'ordered', 'regex': True,
'order': ['(ht|f)tps', '(ht|f)tp$', 'm3u8.+', 'm3u8', '.*dash', '', 'mms|rtsp', 'none', 'f4']},
'vext': {'type': 'ordered', 'field': 'video_ext',
'order': ('mp4', 'webm', 'flv', '', 'none'),
'order_free': ('webm', 'mp4', 'flv', '', 'none')},
'aext': {'type': 'ordered', 'field': 'audio_ext',
'order': ('m4a', 'aac', 'mp3', 'ogg', 'opus', 'webm', '', 'none'),
'order_free': ('opus', 'ogg', 'webm', 'm4a', 'mp3', 'aac', '', 'none')},
'hidden': {'visible': False, 'forced': True, 'type': 'extractor', 'max': -1000},
'extractor_preference': {'priority': True, 'type': 'extractor'},
'has_video': {'priority': True, 'field': 'vcodec', 'type': 'boolean', 'not_in_list': ('none',)},
'has_audio': {'field': 'acodec', 'type': 'boolean', 'not_in_list': ('none',)},
'language_preference': {'priority': True, 'convert': 'ignore'},
'quality': {'priority': True, 'convert': 'float_none'},
'filesize': {'convert': 'bytes'},
'filesize_approx': {'convert': 'bytes'},
'format_id': {'convert': 'string'},
'height': {'convert': 'float_none'},
'width': {'convert': 'float_none'},
'fps': {'convert': 'float_none'},
'tbr': {'convert': 'float_none'},
'vbr': {'convert': 'float_none'},
'abr': {'convert': 'float_none'},
'asr': {'convert': 'float_none'},
'source_preference': {'convert': 'ignore'},
'codec': {'type': 'combined', 'field': ('vcodec', 'acodec')},
'bitrate': {'type': 'combined', 'field': ('tbr', 'vbr', 'abr'), 'same_limit': True},
'filesize_estimate': {'type': 'combined', 'same_limit': True, 'field': ('filesize', 'filesize_approx')},
'extension': {'type': 'combined', 'field': ('vext', 'aext')},
'dimension': {'type': 'multiple', 'field': ('height', 'width'), 'function': min}, # not named as 'resolution' because such a field exists
'res': {'type': 'alias', 'field': 'dimension'},
'ext': {'type': 'alias', 'field': 'extension'},
'br': {'type': 'alias', 'field': 'bitrate'},
'total_bitrate': {'type': 'alias', 'field': 'tbr'},
'video_bitrate': {'type': 'alias', 'field': 'vbr'},
'audio_bitrate': {'type': 'alias', 'field': 'abr'},
'framerate': {'type': 'alias', 'field': 'fps'},
'lang': {'type': 'alias', 'field': 'language_preference'}, # not named as 'language' because such a field exists
'proto': {'type': 'alias', 'field': 'protocol'},
'source': {'type': 'alias', 'field': 'source_preference'},
'size': {'type': 'alias', 'field': 'filesize_estimate'},
'samplerate': {'type': 'alias', 'field': 'asr'},
'video_ext': {'type': 'alias', 'field': 'vext'},
'audio_ext': {'type': 'alias', 'field': 'aext'},
'video_codec': {'type': 'alias', 'field': 'vcodec'},
'audio_codec': {'type': 'alias', 'field': 'acodec'},
'video': {'type': 'alias', 'field': 'has_video'},
'audio': {'type': 'alias', 'field': 'has_audio'},
'extractor': {'type': 'alias', 'field': 'extractor_preference'},
'preference': {'type': 'alias', 'field': 'extractor_preference'}}
_order = []
def _get_field_setting(self, field, key):
if field not in self.settings:
self.settings[field] = {}
propObj = self.settings[field]
if key not in propObj:
type = propObj.get('type')
if key == 'field':
default = 'preference' if type == 'extractor' else (field,) if type in ('combined', 'multiple') else field
elif key == 'convert':
default = 'order' if type == 'ordered' else 'float_string' if field else 'ignore'
else:
default = {'type': 'field', 'visible': True, 'order': [], 'not_in_list': (None,), 'function': max}.get(key, None)
propObj[key] = default
return propObj[key]
def _resolve_field_value(self, field, value, convertNone=False):
if value is None:
if not convertNone:
return None
else:
value = value.lower()
conversion = self._get_field_setting(field, 'convert')
if conversion == 'ignore':
return None
if conversion == 'string':
return value
elif conversion == 'float_none':
return float_or_none(value)
elif conversion == 'bytes':
return FileDownloader.parse_bytes(value)
elif conversion == 'order':
order_free = self._get_field_setting(field, 'order_free')
order_list = order_free if order_free and self._use_free_order else self._get_field_setting(field, 'order')
use_regex = self._get_field_setting(field, 'regex')
list_length = len(order_list)
empty_pos = order_list.index('') if '' in order_list else list_length + 1
if use_regex and value is not None:
for (i, regex) in enumerate(order_list):
if regex and re.match(regex, value):
return list_length - i
return list_length - empty_pos # not in list
else: # not regex or value = None
return list_length - (order_list.index(value) if value in order_list else empty_pos)
else:
if value.isnumeric():
return float(value)
else:
self.settings[field]['convert'] = 'string'
return value
def evaluate_params(self, params, sort_extractor):
self._use_free_order = params.get('prefer_free_formats', False)
self._sort_user = params.get('format_sort', [])
self._sort_extractor = sort_extractor
def add_item(field, reverse, closest, limit_text):
field = field.lower()
if field in self._order:
return
self._order.append(field)
limit = self._resolve_field_value(field, limit_text)
data = {
'reverse': reverse,
'closest': False if limit is None else closest,
'limit_text': limit_text,
'limit': limit}
if field in self.settings:
self.settings[field].update(data)
else:
self.settings[field] = data
sort_list = (
tuple(field for field in self.default if self._get_field_setting(field, 'forced'))
+ (tuple() if params.get('format_sort_force', False)
else tuple(field for field in self.default if self._get_field_setting(field, 'priority')))
+ tuple(self._sort_user) + tuple(sort_extractor) + self.default)
for item in sort_list:
match = re.match(self.regex, item)
if match is None:
raise ExtractorError('Invalid format sort string "%s" given by extractor' % item)
field = match.group('field')
if field is None:
continue
if self._get_field_setting(field, 'type') == 'alias':
field = self._get_field_setting(field, 'field')
reverse = match.group('reverse') is not None
closest = match.group('seperator') == '~'
limit_text = match.group('limit')
has_limit = limit_text is not None
has_multiple_fields = self._get_field_setting(field, 'type') == 'combined'
has_multiple_limits = has_limit and has_multiple_fields and not self._get_field_setting(field, 'same_limit')
fields = self._get_field_setting(field, 'field') if has_multiple_fields else (field,)
limits = limit_text.split(":") if has_multiple_limits else (limit_text,) if has_limit else tuple()
limit_count = len(limits)
for (i, f) in enumerate(fields):
add_item(f, reverse, closest,
limits[i] if i < limit_count
else limits[0] if has_limit and not has_multiple_limits
else None)
def print_verbose_info(self, to_screen):
to_screen('[debug] Sort order given by user: %s' % ','.join(self._sort_user))
if self._sort_extractor:
to_screen('[debug] Sort order given by extractor: %s' % ','.join(self._sort_extractor))
to_screen('[debug] Formats sorted by: %s' % ', '.join(['%s%s%s' % (
'+' if self._get_field_setting(field, 'reverse') else '', field,
'%s%s(%s)' % ('~' if self._get_field_setting(field, 'closest') else ':',
self._get_field_setting(field, 'limit_text'),
self._get_field_setting(field, 'limit'))
if self._get_field_setting(field, 'limit_text') is not None else '')
for field in self._order if self._get_field_setting(field, 'visible')]))
def _calculate_field_preference_from_value(self, format, field, type, value):
reverse = self._get_field_setting(field, 'reverse')
closest = self._get_field_setting(field, 'closest')
limit = self._get_field_setting(field, 'limit')
if type == 'extractor':
maximum = self._get_field_setting(field, 'max')
if value is None or (maximum is not None and value >= maximum):
value = 0
elif type == 'boolean':
in_list = self._get_field_setting(field, 'in_list')
not_in_list = self._get_field_setting(field, 'not_in_list')
value = 0 if ((in_list is None or value in in_list) and (not_in_list is None or value not in not_in_list)) else -1
elif type == 'ordered':
value = self._resolve_field_value(field, value, True)
# try to convert to number
val_num = float_or_none(value)
is_num = self._get_field_setting(field, 'convert') != 'string' and val_num is not None
if is_num:
value = val_num
return ((-10, 0) if value is None
else (1, value, 0) if not is_num # if a field has mixed strings and numbers, strings are sorted higher
else (0, -abs(value - limit), value - limit if reverse else limit - value) if closest
else (0, value, 0) if not reverse and (limit is None or value <= limit)
else (0, -value, 0) if limit is None or (reverse and value == limit) or value > limit
else (-1, value, 0))
def _calculate_field_preference(self, format, field):
type = self._get_field_setting(field, 'type') # extractor, boolean, ordered, field, multiple
get_value = lambda f: format.get(self._get_field_setting(f, 'field'))
if type == 'multiple':
type = 'field' # Only 'field' is allowed in multiple for now
actual_fields = self._get_field_setting(field, 'field')
def wrapped_function(values):
values = tuple(filter(lambda x: x is not None, values))
return (self._get_field_setting(field, 'function')(*values) if len(values) > 1
else values[0] if values
else None)
value = wrapped_function((get_value(f) for f in actual_fields))
else:
value = get_value(field)
return self._calculate_field_preference_from_value(format, field, type, value)
def calculate_preference(self, format):
# Determine missing protocol
if not format.get('protocol'):
format['protocol'] = determine_protocol(format)
# Determine missing ext
if not format.get('ext') and 'url' in format:
format['ext'] = determine_ext(format['url'])
if format.get('vcodec') == 'none':
format['audio_ext'] = format['ext']
format['video_ext'] = 'none'
else:
format['video_ext'] = format['ext']
format['audio_ext'] = 'none'
# if format.get('preference') is None and format.get('ext') in ('f4f', 'f4m'): # Not supported?
# format['preference'] = -1000
# Determine missing bitrates
if format.get('tbr') is None:
if format.get('vbr') is not None and format.get('abr') is not None:
format['tbr'] = format.get('vbr', 0) + format.get('abr', 0)
else:
if format.get('vcodec') != "none" and format.get('vbr') is None:
format['vbr'] = format.get('tbr') - format.get('abr', 0)
if format.get('acodec') != "none" and format.get('abr') is None:
format['abr'] = format.get('tbr') - format.get('vbr', 0)
return tuple(self._calculate_field_preference(format, field) for field in self._order)
def _sort_formats(self, formats, field_preference=[]):
if not formats: if not formats:
raise ExtractorError('No video formats found') raise ExtractorError('No video formats found')
format_sort = self.FormatSort() # params and to_screen are taken from the downloader
for f in formats: format_sort.evaluate_params(self._downloader.params, field_preference)
# Automatically determine tbr when missing based on abr and vbr (improves if self._downloader.params.get('verbose', False):
# formats sorting in some cases) format_sort.print_verbose_info(self._downloader.to_screen)
if 'tbr' not in f and f.get('abr') is not None and f.get('vbr') is not None: formats.sort(key=lambda f: format_sort.calculate_preference(f))
f['tbr'] = f['abr'] + f['vbr']
def _formats_key(f):
# TODO remove the following workaround
from ..utils import determine_ext
if not f.get('ext') and 'url' in f:
f['ext'] = determine_ext(f['url'])
if isinstance(field_preference, (list, tuple)):
return tuple(
f.get(field)
if f.get(field) is not None
else ('' if field == 'format_id' else -1)
for field in field_preference)
preference = f.get('preference')
if preference is None:
preference = 0
if f.get('ext') in ['f4f', 'f4m']: # Not yet supported
preference -= 0.5
protocol = f.get('protocol') or determine_protocol(f)
proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
if f.get('vcodec') == 'none': # audio only
preference -= 50
if self._downloader.params.get('prefer_free_formats'):
ORDER = ['aac', 'mp3', 'm4a', 'webm', 'ogg', 'opus']
else:
ORDER = ['webm', 'opus', 'ogg', 'mp3', 'aac', 'm4a']
ext_preference = 0
try:
audio_ext_preference = ORDER.index(f['ext'])
except ValueError:
audio_ext_preference = -1
else:
if f.get('acodec') == 'none': # video only
preference -= 40
if self._downloader.params.get('prefer_free_formats'):
ORDER = ['flv', 'mp4', 'webm']
else:
ORDER = ['webm', 'flv', 'mp4']
try:
ext_preference = ORDER.index(f['ext'])
except ValueError:
ext_preference = -1
audio_ext_preference = 0
return (
preference,
f.get('language_preference') if f.get('language_preference') is not None else -1,
f.get('quality') if f.get('quality') is not None else -1,
f.get('tbr') if f.get('tbr') is not None else -1,
f.get('filesize') if f.get('filesize') is not None else -1,
f.get('vbr') if f.get('vbr') is not None else -1,
f.get('height') if f.get('height') is not None else -1,
f.get('width') if f.get('width') is not None else -1,
proto_preference,
ext_preference,
f.get('abr') if f.get('abr') is not None else -1,
audio_ext_preference,
f.get('fps') if f.get('fps') is not None else -1,
f.get('filesize_approx') if f.get('filesize_approx') is not None else -1,
f.get('source_preference') if f.get('source_preference') is not None else -1,
f.get('format_id') if f.get('format_id') is not None else '',
)
formats.sort(key=_formats_key)
def _check_formats(self, formats, video_id): def _check_formats(self, formats, video_id):
if formats: if formats:
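Note on the removed `_formats_key` helper: it ranks formats by building a tuple in which every missing numeric field falls back to -1, so incomplete formats sort first (worst) and the best format ends up last. A minimal sketch of that idea, assuming a hypothetical, much shorter field list than the original key function:

```python
def formats_key(f):
    """Build a sort key; missing numeric fields rank lowest via -1."""
    def num(field):
        value = f.get(field)
        return value if value is not None else -1
    # Hypothetical subset of the fields used by the original key function.
    return (num('quality'), num('tbr'), num('height'), f.get('format_id') or '')


# Illustrative format dicts, not taken from any real extractor.
formats = [
    {'format_id': 'hd', 'height': 720, 'tbr': 2500},
    {'format_id': 'unknown'},
    {'format_id': 'sd', 'height': 360, 'tbr': 800},
]
formats.sort(key=formats_key)
print([f['format_id'] for f in formats])
```

Because Python compares tuples element by element, earlier fields dominate later ones, which is why the order of entries in the key decides which property wins a tie.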
@ -2514,16 +2712,18 @@ def _media_formats(src, cur_media_type, type_info={}):
# amp-video and amp-audio are very similar to their HTML5 counterparts # amp-video and amp-audio are very similar to their HTML5 counterparts
# so we will include them right here (see # so we will include them right here (see
# https://www.ampproject.org/docs/reference/components/amp-video) # https://www.ampproject.org/docs/reference/components/amp-video)
- media_tags = [(media_tag, media_type, '')
- for media_tag, media_type
- in re.findall(r'(?s)(<(?:amp-)?(video|audio)[^>]*/>)', webpage)]
+ # For dl8-* tags see https://delight-vr.com/documentation/dl8-video/
+ _MEDIA_TAG_NAME_RE = r'(?:(?:amp|dl8(?:-live)?)-)?(video|audio)'
+ media_tags = [(media_tag, media_tag_name, media_type, '')
+ for media_tag, media_tag_name, media_type
+ in re.findall(r'(?s)(<(%s)[^>]*/>)' % _MEDIA_TAG_NAME_RE, webpage)]
media_tags.extend(re.findall( media_tags.extend(re.findall(
# We only allow video|audio followed by a whitespace or '>'. # We only allow video|audio followed by a whitespace or '>'.
# Allowing more characters may end up in significant slow down (see # Allowing more characters may end up in significant slow down (see
# https://github.com/ytdl-org/youtube-dl/issues/11979, example URL: # https://github.com/ytdl-org/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml). # http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>(?:amp-)?(?:video|audio))(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage)) r'(?s)(<(?P<tag>%s)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>' % _MEDIA_TAG_NAME_RE, webpage))
for media_tag, media_type, media_content in media_tags: for media_tag, _, media_type, media_content in media_tags:
media_info = { media_info = {
'formats': [], 'formats': [],
'subtitles': {}, 'subtitles': {},
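Note on the hunk above: wrapping `_MEDIA_TAG_NAME_RE` in an extra group makes `re.findall` return three values per self-closing tag (whole tag, full tag name, media type), which is why the tuples grow from three to four elements. A small sketch, assuming a hypothetical page fragment:

```python
import re

# Pattern copied from the new side of the diff.
_MEDIA_TAG_NAME_RE = r'(?:(?:amp|dl8(?:-live)?)-)?(video|audio)'

# Hypothetical page fragment, for illustration only.
webpage = '<dl8-video src="a.mp4"/><amp-audio src="b.mp3" /><video src="c.mp4"/>'

# Three capture groups: the whole tag, the full tag name, the media type.
self_closing = re.findall(r'(?s)(<(%s)[^>]*/>)' % _MEDIA_TAG_NAME_RE, webpage)
media_tags = [(tag, name, media_type, '') for tag, name, media_type in self_closing]

for _, name, media_type, _ in media_tags:
    print(name, media_type)
```

The full tag name is only needed for the `(?P=tag)` back-reference in the paired-tag pattern, so the loop can discard it with `_`.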
@ -2596,6 +2796,13 @@ def _media_formats(src, cur_media_type, type_info={}):
return entries return entries
def _extract_akamai_formats(self, manifest_url, video_id, hosts={}): def _extract_akamai_formats(self, manifest_url, video_id, hosts={}):
signed = 'hdnea=' in manifest_url
if not signed:
# https://learn.akamai.com/en-us/webhelp/media-services-on-demand/stream-packaging-user-guide/GUID-BE6C0F73-1E06-483B-B0EA-57984B91B7F9.html
manifest_url = re.sub(
r'(?:b=[\d,-]+|(?:__a__|attributes)=off|__b__=\d+)&?',
'', manifest_url).strip('?')
formats = [] formats = []
hdcore_sign = 'hdcore=3.7.0' hdcore_sign = 'hdcore=3.7.0'
@ -2621,7 +2828,7 @@ def _extract_akamai_formats(self, manifest_url, video_id, hosts={}):
formats.extend(m3u8_formats) formats.extend(m3u8_formats)
http_host = hosts.get('http') http_host = hosts.get('http')
if http_host and m3u8_formats and 'hdnea=' not in m3u8_url: if http_host and m3u8_formats and not signed:
REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+' REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'
qualities = re.match(REPL_REGEX, m3u8_url).group(2).split(',') qualities = re.match(REPL_REGEX, m3u8_url).group(2).split(',')
qualities_length = len(qualities) qualities_length = len(qualities)
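Note on the `_extract_akamai_formats` change: unsigned manifest URLs now have the `b=`, `__a__=off`/`attributes=off` and `__b__=` query parameters removed before any formats are requested, while URLs carrying an `hdnea=` token are left untouched. A standalone sketch of that step; the function name and the example URLs are hypothetical:

```python
import re


def strip_akamai_filters(manifest_url):
    """Drop b=, __a__/attributes=off and __b__= unless the URL has hdnea=."""
    if 'hdnea=' in manifest_url:
        # Signed URL: leave the query string as it is.
        return manifest_url
    return re.sub(
        r'(?:b=[\d,-]+|(?:__a__|attributes)=off|__b__=\d+)&?',
        '', manifest_url).strip('?')


# Hypothetical URLs, for illustration only.
plain = 'https://example.com/i/clip_,300,600,.mp4.csmil/master.m3u8?b=300-600&__b__=450'
signed = 'https://example.com/i/clip.m3u8?hdnea=abc&b=300'
print(strip_akamai_filters(plain))
print(strip_akamai_filters(signed))
```

The trailing `.strip('?')` only matters when every query parameter was removed and a bare `?` would otherwise be left at the end of the URL.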


@ -10,6 +10,8 @@
find_xpath_attr, find_xpath_attr,
get_element_by_class, get_element_by_class,
int_or_none, int_or_none,
js_to_json,
merge_dicts,
smuggle_url, smuggle_url,
unescapeHTML, unescapeHTML,
) )
@ -98,6 +100,26 @@ def _real_extract(self, url):
bc_attr['data-bcid']) bc_attr['data-bcid'])
return self.url_result(smuggle_url(bc_url, {'source_url': url})) return self.url_result(smuggle_url(bc_url, {'source_url': url}))
def add_referer(formats):
for f in formats:
f.setdefault('http_headers', {})['Referer'] = url
# As of 01.12.2020 this path looks to cover all cases making the rest
# of the code unnecessary
jwsetup = self._parse_json(
self._search_regex(
r'(?s)jwsetup\s*=\s*({.+?})\s*;', webpage, 'jwsetup',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)
if jwsetup:
info = self._parse_jwplayer_data(
jwsetup, video_id, require_title=False, m3u8_id='hls',
base_url=url)
add_referer(info['formats'])
ld_info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts(info, ld_info)
# Obsolete
# We first look for clipid, because clipprog always appears before # We first look for clipid, because clipprog always appears before
patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')] patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')]
results = list(filter(None, (re.search(p, webpage) for p in patterns))) results = list(filter(None, (re.search(p, webpage) for p in patterns)))
@ -165,6 +187,7 @@ def get_text_attr(d, attr):
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
path, video_id, 'mp4', entry_protocol='m3u8_native', path, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls') if determine_ext(path) == 'm3u8' else [{'url': path, }] m3u8_id='hls') if determine_ext(path) == 'm3u8' else [{'url': path, }]
add_referer(formats)
self._sort_formats(formats) self._sort_formats(formats)
entries.append({ entries.append({
'id': '%s_%d' % (video_id, partnum + 1), 'id': '%s_%d' % (video_id, partnum + 1),
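Note on `add_referer`: the helper relies on `dict.setdefault` so that a format with no `http_headers` gets a fresh dict while existing headers are preserved. A minimal sketch, assuming the page URL is passed in explicitly instead of being captured from the enclosing scope as in the diff:

```python
def add_referer(formats, url):
    """Attach a Referer header to every format without dropping other headers."""
    for f in formats:
        f.setdefault('http_headers', {})['Referer'] = url


# Hypothetical format dicts and page URL, for illustration only.
formats = [
    {'url': 'https://example.com/a.m3u8'},
    {'url': 'https://example.com/b.mp4', 'http_headers': {'User-Agent': 'x'}},
]
add_referer(formats, 'https://example.com/page')
print(formats)
```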


@ -0,0 +1,52 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctv\.ca/(?P<id>(?:show|movie)s/[^/]+/[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ctv.ca/shows/your-morning/wednesday-december-23-2020-s5e88',
'info_dict': {
'id': '2102249',
'ext': 'flv',
'title': 'Wednesday, December 23, 2020',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'Your Morning delivers original perspectives and unique insights into the headlines of the day.',
'timestamp': 1608732000,
'upload_date': '20201223',
'series': 'Your Morning',
'season': '2020-2021',
'season_number': 5,
'episode_number': 88,
'tags': ['Your Morning'],
'categories': ['Talk Show'],
'duration': 7467.126,
},
}, {
'url': 'https://www.ctv.ca/movies/adam-sandlers-eight-crazy-nights/adam-sandlers-eight-crazy-nights',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
content = self._download_json(
'https://www.ctv.ca/space-graphql/graphql', display_id, query={
'query': '''{
resolvedPath(path: "/%s") {
lastSegment {
content {
... on AxisContent {
axisId
videoPlayerDestCode
}
}
}
}
}''' % display_id,
})['data']['resolvedPath']['lastSegment']['content']
video_id = content['axisId']
return self.url_result(
'9c9media:%s:%s' % (content['videoPlayerDestCode'], video_id),
'NineCNineMedia', video_id)


@ -29,7 +29,7 @@ class DRTVIE(InfoExtractor):
https?:// https?://
(?: (?:
(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*| (?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*|
(?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode)/ (?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode|program)/
) )
(?P<id>[\da-z_-]+) (?P<id>[\da-z_-]+)
''' '''
@ -111,6 +111,9 @@ class DRTVIE(InfoExtractor):
}, { }, {
'url': 'https://dr-massive.com/drtv/se/bonderoeven_71769', 'url': 'https://dr-massive.com/drtv/se/bonderoeven_71769',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.dr.dk/drtv/program/jagten_220924',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@ -16,7 +16,7 @@
class EpornerIE(InfoExtractor): class EpornerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eporner\.com/(?:hd-porn|embed)/(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?' _VALID_URL = r'https?://(?:www\.)?eporner\.com/(?:(?:hd-porn|embed)/|video-)(?P<id>\w+)(?:/(?P<display_id>[\w-]+))?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/', 'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/',
'md5': '39d486f046212d8e1b911c52ab4691f8', 'md5': '39d486f046212d8e1b911c52ab4691f8',
@ -43,7 +43,10 @@ class EpornerIE(InfoExtractor):
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0', 'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0', 'url': 'http://www.eporner.com/embed/3YRUtzMcWn0',
'only_matching': True,
}, {
'url': 'https://www.eporner.com/video-FJsA19J3Y3H/one-of-the-greats/',
'only_matching': True, 'only_matching': True,
}] }]
@ -57,7 +60,7 @@ def _real_extract(self, url):
video_id = self._match_id(urlh.geturl()) video_id = self._match_id(urlh.geturl())
hash = self._search_regex( hash = self._search_regex(
r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash') r'hash\s*[:=]\s*["\']([\da-f]{32})', webpage, 'hash')
title = self._og_search_title(webpage, default=None) or self._html_search_regex( title = self._og_search_title(webpage, default=None) or self._html_search_regex(
r'<title>(.+?) - EPORNER', webpage, 'title') r'<title>(.+?) - EPORNER', webpage, 'title')
@ -115,8 +118,8 @@ def calc_hash(s):
duration = parse_duration(self._html_search_meta( duration = parse_duration(self._html_search_meta(
'duration', webpage, default=None)) 'duration', webpage, default=None))
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'id="cinemaviews">\s*([0-9,]+)\s*<small>views', r'id=["\']cinemaviews1["\'][^>]*>\s*([0-9,]+)',
webpage, 'view count', fatal=False)) webpage, 'view count', default=None))
return merge_dicts(json_ld, { return merge_dicts(json_ld, {
'id': video_id, 'id': video_id,


@ -30,7 +30,11 @@
from .adultswim import AdultSwimIE from .adultswim import AdultSwimIE
from .aenetworks import ( from .aenetworks import (
AENetworksIE, AENetworksIE,
AENetworksCollectionIE,
AENetworksShowIE,
HistoryTopicIE, HistoryTopicIE,
HistoryPlayerIE,
BiographyIE,
) )
from .afreecatv import AfreecaTVIE from .afreecatv import AfreecaTVIE
from .airmozilla import AirMozillaIE from .airmozilla import AirMozillaIE
@ -56,6 +60,7 @@
AppleTrailersSectionIE, AppleTrailersSectionIE,
) )
from .archiveorg import ArchiveOrgIE from .archiveorg import ArchiveOrgIE
from .arcpublishing import ArcPublishingIE
from .arkena import ArkenaIE from .arkena import ArkenaIE
from .ard import ( from .ard import (
ARDBetaMediathekIE, ARDBetaMediathekIE,
@ -93,10 +98,6 @@
BBCCoUkPlaylistIE, BBCCoUkPlaylistIE,
BBCIE, BBCIE,
) )
from .beampro import (
BeamProLiveIE,
BeamProVodIE,
)
from .beeg import BeegIE from .beeg import BeegIE
from .behindkink import BehindKinkIE from .behindkink import BehindKinkIE
from .bellmedia import BellMediaIE from .bellmedia import BellMediaIE
@ -129,6 +130,7 @@
from .blinkx import BlinkxIE from .blinkx import BlinkxIE
from .bloomberg import BloombergIE from .bloomberg import BloombergIE
from .bokecc import BokeCCIE from .bokecc import BokeCCIE
from .bongacams import BongaCamsIE
from .bostonglobe import BostonGlobeIE from .bostonglobe import BostonGlobeIE
from .box import BoxIE from .box import BoxIE
from .bpb import BpbIE from .bpb import BpbIE
@ -173,7 +175,10 @@
CBCOlympicsIE, CBCOlympicsIE,
) )
from .cbs import CBSIE from .cbs import CBSIE
from .cbslocal import CBSLocalIE from .cbslocal import (
CBSLocalIE,
CBSLocalArticleIE,
)
from .cbsinteractive import CBSInteractiveIE from .cbsinteractive import CBSInteractiveIE
from .cbsnews import ( from .cbsnews import (
CBSNewsEmbedIE, CBSNewsEmbedIE,
@ -251,6 +256,7 @@
) )
from .cspan import CSpanIE from .cspan import CSpanIE
from .ctsnews import CtsNewsIE from .ctsnews import CtsNewsIE
from .ctv import CTVIE
from .ctvnews import CTVNewsIE from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import ( from .curiositystream import (
@ -345,7 +351,6 @@
) )
from .esri import EsriVideoIE from .esri import EsriVideoIE
from .europa import EuropaIE from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE
from .expotv import ExpoTVIE from .expotv import ExpoTVIE
from .expressen import ExpressenIE from .expressen import ExpressenIE
from .extremetube import ExtremeTubeIE from .extremetube import ExtremeTubeIE
@ -409,10 +414,10 @@
FrontendMastersLessonIE, FrontendMastersLessonIE,
FrontendMastersCourseIE FrontendMastersCourseIE
) )
from .fujitv import FujiTVFODPlus7IE
from .funimation import FunimationIE from .funimation import FunimationIE
from .funk import FunkIE from .funk import FunkIE
from .fusion import FusionIE from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE from .gaia import GaiaIE
from .gameinformer import GameInformerIE from .gameinformer import GameInformerIE
from .gamespot import GameSpotIE from .gamespot import GameSpotIE
@ -523,7 +528,6 @@
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .kakao import KakaoIE from .kakao import KakaoIE
from .kaltura import KalturaIE from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE from .kankan import KankanIE
from .karaoketv import KaraoketvIE from .karaoketv import KaraoketvIE
from .karrierevideos import KarriereVideosIE from .karrierevideos import KarriereVideosIE
@ -552,7 +556,10 @@
EHFTVIE, EHFTVIE,
ITTFIE, ITTFIE,
) )
from .lbry import LBRYIE from .lbry import (
LBRYIE,
LBRYChannelIE,
)
from .lci import LCIIE from .lci import LCIIE
from .lcp import ( from .lcp import (
LcpPlayIE, LcpPlayIE,
@ -703,9 +710,15 @@
NaverIE, NaverIE,
NaverLiveIE, NaverLiveIE,
) )
from .nba import NBAIE from .nba import (
NBAWatchEmbedIE,
NBAWatchIE,
NBAWatchCollectionIE,
NBAEmbedIE,
NBAIE,
NBAChannelIE,
)
from .nbc import ( from .nbc import (
CSNNEIE,
NBCIE, NBCIE,
NBCNewsIE, NBCNewsIE,
NBCOlympicsIE, NBCOlympicsIE,
@ -748,8 +761,14 @@
NexxIE, NexxIE,
NexxEmbedIE, NexxEmbedIE,
) )
from .nfl import NFLIE from .nfl import (
from .nhk import NhkVodIE NFLIE,
NFLArticleIE,
)
from .nhk import (
NhkVodIE,
NhkVodProgramIE,
)
from .nhl import NHLIE from .nhl import NHLIE
from .nick import ( from .nick import (
NickIE, NickIE,
@ -766,7 +785,6 @@
from .nitter import NitterIE from .nitter import NitterIE
from .njpwworld import NJPWWorldIE from .njpwworld import NJPWWorldIE
from .nobelprize import NobelPrizeIE from .nobelprize import NobelPrizeIE
from .noco import NocoIE
from .nonktube import NonkTubeIE from .nonktube import NonkTubeIE
from .noovo import NoovoIE from .noovo import NoovoIE
from .normalboots import NormalbootsIE from .normalboots import NormalbootsIE
@ -799,6 +817,7 @@
NRKSkoleIE, NRKSkoleIE,
NRKTVIE, NRKTVIE,
NRKTVDirekteIE, NRKTVDirekteIE,
NRKRadioPodkastIE,
NRKTVEpisodeIE, NRKTVEpisodeIE,
NRKTVEpisodesIE, NRKTVEpisodesIE,
NRKTVSeasonIE, NRKTVSeasonIE,
@ -1070,16 +1089,11 @@
from .sky import ( from .sky import (
SkyNewsIE, SkyNewsIE,
SkySportsIE, SkySportsIE,
SkySportsNewsIE,
) )
from .slideshare import SlideshareIE from .slideshare import SlideshareIE
from .slideslive import SlidesLiveIE from .slideslive import SlidesLiveIE
from .slutload import SlutloadIE from .slutload import SlutloadIE
from .smotri import (
SmotriIE,
SmotriCommunityIE,
SmotriUserIE,
SmotriBroadcastIE,
)
from .snotr import SnotrIE from .snotr import SnotrIE
from .sohu import SohuIE from .sohu import SohuIE
from .sonyliv import SonyLIVIE from .sonyliv import SonyLIVIE
@ -1162,7 +1176,6 @@
TagesschauIE, TagesschauIE,
) )
from .tass import TassIE from .tass import TassIE
from .tastytrade import TastyTradeIE
from .tbs import TBSIE from .tbs import TBSIE
from .tdslifeway import TDSLifewayIE from .tdslifeway import TDSLifewayIE
from .teachable import ( from .teachable import (
@ -1189,6 +1202,7 @@
TeleQuebecSquatIE, TeleQuebecSquatIE,
TeleQuebecEmissionIE, TeleQuebecEmissionIE,
TeleQuebecLiveIE, TeleQuebecLiveIE,
TeleQuebecVideoIE,
) )
from .teletask import TeleTaskIE from .teletask import TeleTaskIE
from .telewebion import TelewebionIE from .telewebion import TelewebionIE
@ -1220,7 +1234,10 @@
EMPFlixIE, EMPFlixIE,
MovieFapIE, MovieFapIE,
) )
from .toggle import ToggleIE from .toggle import (
ToggleIE,
MeWatchIE,
)
from .tonline import TOnlineIE from .tonline import TOnlineIE
from .toongoggles import ToonGogglesIE from .toongoggles import ToonGogglesIE
from .toutv import TouTvIE from .toutv import TouTvIE
@ -1253,7 +1270,14 @@
from .tv2hu import TV2HuIE from .tv2hu import TV2HuIE
from .tv4 import TV4IE from .tv4 import TV4IE
from .tv5mondeplus import TV5MondePlusIE from .tv5mondeplus import TV5MondePlusIE
from .tva import TVAIE from .tv5unis import (
TV5UnisVideoIE,
TV5UnisIE,
)
from .tva import (
TVAIE,
QubIE,
)
from .tvanouvelles import ( from .tvanouvelles import (
TVANouvellesIE, TVANouvellesIE,
TVANouvellesArticleIE, TVANouvellesArticleIE,
@ -1262,6 +1286,7 @@
TVCIE, TVCIE,
TVCArticleIE, TVCArticleIE,
) )
from .tver import TVerIE
from .tvigle import TvigleIE from .tvigle import TvigleIE
from .tvland import TVLandIE from .tvland import TVLandIE
from .tvn24 import TVN24IE from .tvn24 import TVN24IE
@ -1440,7 +1465,10 @@
from .medialaan import MedialaanIE from .medialaan import MedialaanIE
from .vube import VubeIE from .vube import VubeIE
from .vuclip import VuClipIE from .vuclip import VuClipIE
from .vvvvid import VVVVIDIE from .vvvvid import (
VVVVIDIE,
VVVVIDShowIE,
)
from .vyborymos import VyboryMosIE from .vyborymos import VyboryMosIE
from .vzaar import VzaarIE from .vzaar import VzaarIE
from .wakanim import WakanimIE from .wakanim import WakanimIE
@ -1471,7 +1499,10 @@
WeiboMobileIE WeiboMobileIE
) )
from .weiqitv import WeiqiTVIE from .weiqitv import WeiqiTVIE
from .wistia import WistiaIE from .wistia import (
WistiaIE,
WistiaPlaylistIE,
)
from .worldstarhiphop import WorldStarHipHopIE from .worldstarhiphop import WorldStarHipHopIE
from .wsj import ( from .wsj import (
WSJIE, WSJIE,
@ -1515,6 +1546,8 @@
YandexMusicTrackIE, YandexMusicTrackIE,
YandexMusicAlbumIE, YandexMusicAlbumIE,
YandexMusicPlaylistIE, YandexMusicPlaylistIE,
YandexMusicArtistTracksIE,
YandexMusicArtistAlbumsIE,
) )
from .yandexvideo import YandexVideoIE from .yandexvideo import YandexVideoIE
from .yapfiles import YapFilesIE from .yapfiles import YapFilesIE
@ -1547,11 +1580,11 @@
YoutubeSubscriptionsIE, YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE, YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE, YoutubeTruncatedURLIE,
YoutubeYtBeIE,
YoutubeYtUserIE, YoutubeYtUserIE,
YoutubeWatchLaterIE, YoutubeWatchLaterIE,
) )
from .zapiks import ZapiksIE from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zattoo import ( from .zattoo import (
BBVTVIE, BBVTVIE,
EinsUndEinsTVIE, EinsUndEinsTVIE,


@ -1,6 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re import re
import socket import socket
@ -8,6 +9,7 @@
from ..compat import ( from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_http_client, compat_http_client,
compat_str,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus, compat_urllib_parse_unquote_plus,
@ -16,14 +18,17 @@
clean_html, clean_html,
error_to_compat_str, error_to_compat_str,
ExtractorError, ExtractorError,
float_or_none,
get_element_by_id, get_element_by_id,
int_or_none, int_or_none,
js_to_json, js_to_json,
limit_length, limit_length,
parse_count, parse_count,
qualities,
sanitized_Request, sanitized_Request,
try_get, try_get,
urlencode_postdata, urlencode_postdata,
urljoin,
) )
@ -39,11 +44,13 @@ class FacebookIE(InfoExtractor):
photo\.php| photo\.php|
video\.php| video\.php|
video/embed| video/embed|
story\.php story\.php|
watch(?:/live)?/?
)\?(?:.*?)(?:v|video_id|story_fbid)=| )\?(?:.*?)(?:v|video_id|story_fbid)=|
[^/]+/videos/(?:[^/]+/)?| [^/]+/videos/(?:[^/]+/)?|
[^/]+/posts/| [^/]+/posts/|
groups/[^/]+/permalink/ groups/[^/]+/permalink/|
watchparty/
)| )|
facebook: facebook:
) )
@ -54,8 +61,6 @@ class FacebookIE(InfoExtractor):
_NETRC_MACHINE = 'facebook' _NETRC_MACHINE = 'facebook'
IE_NAME = 'facebook' IE_NAME = 'facebook'
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s' _VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=primary' _VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true&payloadtype=primary'
@ -72,6 +77,7 @@ class FacebookIE(InfoExtractor):
}, },
'skip': 'Requires logging in', 'skip': 'Requires logging in',
}, { }, {
# data.video
'url': 'https://www.facebook.com/video.php?v=274175099429670', 'url': 'https://www.facebook.com/video.php?v=274175099429670',
'info_dict': { 'info_dict': {
'id': '274175099429670', 'id': '274175099429670',
@ -133,6 +139,7 @@ class FacebookIE(InfoExtractor):
}, },
}, { }, {
# have 1080P, but only up to 720p in swf params # have 1080P, but only up to 720p in swf params
# data.video.story.attachments[].media
'url': 'https://www.facebook.com/cnn/videos/10155529876156509/', 'url': 'https://www.facebook.com/cnn/videos/10155529876156509/',
'md5': '9571fae53d4165bbbadb17a94651dcdc', 'md5': '9571fae53d4165bbbadb17a94651dcdc',
'info_dict': { 'info_dict': {
@ -147,6 +154,7 @@ class FacebookIE(InfoExtractor):
}, },
}, { }, {
# bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall # bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.media
'url': 'https://www.facebook.com/yaroslav.korpan/videos/1417995061575415/', 'url': 'https://www.facebook.com/yaroslav.korpan/videos/1417995061575415/',
'info_dict': { 'info_dict': {
'id': '1417995061575415', 'id': '1417995061575415',
@ -174,6 +182,7 @@ class FacebookIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.media
'url': 'https://www.facebook.com/groups/1024490957622648/permalink/1396382447100162/', 'url': 'https://www.facebook.com/groups/1024490957622648/permalink/1396382447100162/',
'info_dict': { 'info_dict': {
'id': '1396382447100162', 'id': '1396382447100162',
@ -193,18 +202,23 @@ class FacebookIE(InfoExtractor):
'url': 'https://www.facebook.com/amogood/videos/1618742068337349/?fref=nf', 'url': 'https://www.facebook.com/amogood/videos/1618742068337349/?fref=nf',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.mediaset.currMedia.edges
'url': 'https://www.facebook.com/ChristyClarkForBC/videos/vb.22819070941/10153870694020942/?type=2&theater', 'url': 'https://www.facebook.com/ChristyClarkForBC/videos/vb.22819070941/10153870694020942/?type=2&theater',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.video.story.attachments[].media
'url': 'facebook:544765982287235', 'url': 'facebook:544765982287235',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.media
'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/', 'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.video.creation_story.attachments[].media
'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/', 'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.video
'url': 'https://www.facebookcorewwwi.onion/video.php?v=274175099429670', 'url': 'https://www.facebookcorewwwi.onion/video.php?v=274175099429670',
'only_matching': True, 'only_matching': True,
}, { }, {
@ -212,6 +226,7 @@ class FacebookIE(InfoExtractor):
'url': 'https://www.facebook.com/onlycleverentertainment/videos/1947995502095005/', 'url': 'https://www.facebook.com/onlycleverentertainment/videos/1947995502095005/',
'only_matching': True, 'only_matching': True,
}, { }, {
# data.video
'url': 'https://www.facebook.com/WatchESLOne/videos/359649331226507/', 'url': 'https://www.facebook.com/WatchESLOne/videos/359649331226507/',
'info_dict': { 'info_dict': {
'id': '359649331226507', 'id': '359649331226507',
@ -222,7 +237,64 @@ class FacebookIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.all_subattachments.nodes[].media
'url': 'https://www.facebook.com/100033620354545/videos/106560053808006/',
'info_dict': {
'id': '106560053808006',
},
'playlist_count': 2,
}, {
# data.video.story.attachments[].media
'url': 'https://www.facebook.com/watch/?v=647537299265662',
'only_matching': True,
}, {
# data.node.comet_sections.content.story.attachments[].style_type_renderer.attachment.all_subattachments.nodes[].media
'url': 'https://www.facebook.com/PankajShahLondon/posts/10157667649866271',
'info_dict': {
'id': '10157667649866271',
},
'playlist_count': 3,
}, {
# data.nodes[].comet_sections.content.story.attachments[].style_type_renderer.attachment.media
'url': 'https://m.facebook.com/Alliance.Police.Department/posts/4048563708499330',
'info_dict': {
'id': '117576630041613',
'ext': 'mp4',
# TODO: title can be extracted from video page
'title': 'Facebook video #117576630041613',
'uploader_id': '189393014416438',
'upload_date': '20201123',
'timestamp': 1606162592,
},
'skip': 'Requires logging in',
}, {
# node.comet_sections.content.story.attached_story.attachments.style_type_renderer.attachment.media
'url': 'https://www.facebook.com/groups/ateistiskselskab/permalink/10154930137678856/',
'info_dict': {
'id': '211567722618337',
'ext': 'mp4',
'title': 'Facebook video #211567722618337',
'uploader_id': '127875227654254',
'upload_date': '20161122',
'timestamp': 1479793574,
},
}, {
# data.video.creation_story.attachments[].media
'url': 'https://www.facebook.com/watch/live/?v=1823658634322275',
'only_matching': True,
}, {
'url': 'https://www.facebook.com/watchparty/211641140192478',
'info_dict': {
'id': '211641140192478',
},
'playlist_count': 1,
'skip': 'Requires logging in',
}] }]
_SUPPORTED_PAGLETS_REGEX = r'(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_[0-9a-f]+)'
_api_config = {
'graphURI': '/api/graphql/'
}
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
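Note on `_SUPPORTED_PAGLETS_REGEX`: besides moving the alternation into a class attribute, the new pattern accepts hexadecimal story ids (`[0-9a-f]+`) where the inline pattern it replaces only accepted digits (`\d+`). A quick comparison, assuming a hypothetical pagelet id:

```python
import re

# Old inline alternation and new class attribute, both copied from the diff.
OLD = r'(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_\d+)'
NEW = r'(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_[0-9a-f]+)'

sample = 'hyperfeed_story_id_5f3a9c'  # hypothetical id containing hex letters

old_ok = re.fullmatch(OLD, sample) is not None
new_ok = re.fullmatch(NEW, sample) is not None
print(old_ok, new_ok)
```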
@ -305,23 +377,24 @@ def _login(self):
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
- def _extract_from_url(self, url, video_id, fatal_if_no_video=True):
- req = sanitized_Request(url)
- req.add_header('User-Agent', self._CHROME_USER_AGENT)
- webpage = self._download_webpage(req, video_id)
+ def _extract_from_url(self, url, video_id):
+ webpage = self._download_webpage(
+ url.replace('://m.facebook.com/', '://www.facebook.com/'), video_id)
video_data = None video_data = None
def extract_video_data(instances): def extract_video_data(instances):
video_data = []
for item in instances: for item in instances:
if item[1][0] == 'VideoConfig': if try_get(item, lambda x: x[1][0]) == 'VideoConfig':
video_item = item[2][0] video_item = item[2][0]
if video_item.get('video_id'): if video_item.get('video_id'):
return video_item['videoData'] video_data.append(video_item['videoData'])
return video_data
server_js_data = self._parse_json(self._search_regex( server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})(?:\);|,")', webpage, [r'handleServerJS\(({.+})(?:\);|,")', r'\bs\.handle\(({.+?})\);'],
'server js data', default='{}'), video_id, fatal=False) webpage, 'server js data', default='{}'), video_id, fatal=False)
if server_js_data: if server_js_data:
video_data = extract_video_data(server_js_data.get('instances', [])) video_data = extract_video_data(server_js_data.get('instances', []))
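Note on the reworked `extract_video_data`: it now tolerates malformed `instances` entries and collects every matching `videoData` instead of returning the first one. The sketch below uses a simplified stand-in for the `try_get` utility (an assumption, not the project's actual implementation) and hypothetical input data:

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in: return getter(src), or None on lookup errors."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def extract_video_data(instances):
    video_data = []
    for item in instances:
        # Guarded lookup: a short or None entry no longer raises.
        if try_get(item, lambda x: x[1][0]) == 'VideoConfig':
            video_item = item[2][0]
            if video_item.get('video_id'):
                video_data.append(video_item['videoData'])
    return video_data


# Hypothetical instances list, for illustration only.
instances = [
    ['x', [], []],    # item[1][0] would raise IndexError
    ['y', ['VideoConfig'], [{'video_id': '1', 'videoData': [{'hd_src': 'u'}]}]],
    ['z', None, []],  # item[1][0] would raise TypeError
]
print(extract_video_data(instances))
```

Returning a list rather than the first hit lets the caller treat an empty result as "no video data" with a plain `if not video_data:` check.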
@ -331,17 +404,118 @@ def extract_from_jsmods_instances(js_data):
return extract_video_data(try_get( return extract_video_data(try_get(
js_data, lambda x: x['jsmods']['instances'], list) or []) js_data, lambda x: x['jsmods']['instances'], list) or [])
def extract_dash_manifest(video, formats):
dash_manifest = video.get('dash_manifest')
if dash_manifest:
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
def process_formats(formats):
# Downloads with browser's User-Agent are rate limited. Working around
# with non-browser User-Agent.
for f in formats:
f.setdefault('http_headers', {})['User-Agent'] = 'facebookexternalhit/1.1'
self._sort_formats(formats)
def extract_relay_data(_filter):
return self._parse_json(self._search_regex(
r'handleWithCustomApplyEach\([^,]+,\s*({.*?%s.*?})\);' % _filter,
webpage, 'replay data', default='{}'), video_id, fatal=False) or {}
def extract_relay_prefetched_data(_filter):
replay_data = extract_relay_data(_filter)
for require in (replay_data.get('require') or []):
if require[0] == 'RelayPrefetchedStreamCache':
return try_get(require, lambda x: x[3][1]['__bbox']['result']['data'], dict) or {}
if not video_data: if not video_data:
- server_js_data = self._parse_json(
- self._search_regex(
- r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:pagelet_group_mall|permalink_video_pagelet|hyperfeed_story_id_\d+)',
- webpage, 'js data', default='{}'),
- video_id, transform_source=js_to_json, fatal=False)
+ server_js_data = self._parse_json(self._search_regex([
+ r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+' + self._SUPPORTED_PAGLETS_REGEX,
+ r'bigPipe\.onPageletArrive\(({.*?id\s*:\s*"%s".*?})\);' % self._SUPPORTED_PAGLETS_REGEX
+ ], webpage, 'js data', default='{}'), video_id, js_to_json, False)
video_data = extract_from_jsmods_instances(server_js_data) video_data = extract_from_jsmods_instances(server_js_data)
if not video_data: if not video_data:
if not fatal_if_no_video: data = extract_relay_prefetched_data(
return webpage, False r'"(?:dash_manifest|playable_url(?:_quality_hd)?)"\s*:\s*"[^"]+"')
if data:
entries = []
def parse_graphql_video(video):
formats = []
q = qualities(['sd', 'hd'])
for (suffix, format_id) in [('', 'sd'), ('_quality_hd', 'hd')]:
playable_url = video.get('playable_url' + suffix)
if not playable_url:
continue
formats.append({
'format_id': format_id,
'quality': q(format_id),
'url': playable_url,
})
extract_dash_manifest(video, formats)
process_formats(formats)
v_id = video.get('videoId') or video.get('id') or video_id
info = {
'id': v_id,
'formats': formats,
'thumbnail': try_get(video, lambda x: x['thumbnailImage']['uri']),
'uploader_id': try_get(video, lambda x: x['owner']['id']),
'timestamp': int_or_none(video.get('publish_time')),
'duration': float_or_none(video.get('playable_duration_in_ms'), 1000),
}
description = try_get(video, lambda x: x['savable_description']['text'])
title = video.get('name')
if title:
info.update({
'title': title,
'description': description,
})
else:
info['title'] = description or 'Facebook video #%s' % v_id
entries.append(info)
def parse_attachment(attachment, key='media'):
media = attachment.get(key) or {}
if media.get('__typename') == 'Video':
return parse_graphql_video(media)
nodes = data.get('nodes') or []
node = data.get('node') or {}
if not nodes and node:
nodes.append(node)
for node in nodes:
story = try_get(node, lambda x: x['comet_sections']['content']['story'], dict) or {}
attachments = try_get(story, [
lambda x: x['attached_story']['attachments'],
lambda x: x['attachments']
], list) or []
for attachment in attachments:
attachment = try_get(attachment, lambda x: x['style_type_renderer']['attachment'], dict)
ns = try_get(attachment, lambda x: x['all_subattachments']['nodes'], list) or []
for n in ns:
parse_attachment(n)
parse_attachment(attachment)
edges = try_get(data, lambda x: x['mediaset']['currMedia']['edges'], list) or []
for edge in edges:
parse_attachment(edge, key='node')
video = data.get('video') or {}
if video:
attachments = try_get(video, [
lambda x: x['story']['attachments'],
lambda x: x['creation_story']['attachments']
], list) or []
for attachment in attachments:
parse_attachment(attachment)
if not entries:
parse_graphql_video(video)
return self.playlist_result(entries, video_id)
if not video_data:
m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage) m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None: if m_msg is not None:
raise ExtractorError( raise ExtractorError(
@ -350,6 +524,43 @@ def extract_from_jsmods_instances(js_data):
elif '>You must log in to continue' in webpage: elif '>You must log in to continue' in webpage:
self.raise_login_required() self.raise_login_required()
if not video_data and '/watchparty/' in url:
post_data = {
'doc_id': 3731964053542869,
'variables': json.dumps({
'livingRoomID': video_id,
}),
}
prefetched_data = extract_relay_prefetched_data(r'"login_data"\s*:\s*{')
if prefetched_data:
lsd = try_get(prefetched_data, lambda x: x['login_data']['lsd'], dict)
if lsd:
post_data[lsd['name']] = lsd['value']
relay_data = extract_relay_data(r'\[\s*"RelayAPIConfigDefaults"\s*,')
for define in (relay_data.get('define') or []):
if define[0] == 'RelayAPIConfigDefaults':
self._api_config = define[2]
living_room = self._download_json(
urljoin(url, self._api_config['graphURI']), video_id,
data=urlencode_postdata(post_data))['data']['living_room']
entries = []
for edge in (try_get(living_room, lambda x: x['recap']['watched_content']['edges']) or []):
video = try_get(edge, lambda x: x['node']['video']) or {}
v_id = video.get('id')
if not v_id:
continue
v_id = compat_str(v_id)
entries.append(self.url_result(
self._VIDEO_PAGE_TEMPLATE % v_id,
self.ie_key(), v_id, video.get('name')))
return self.playlist_result(entries, video_id)
if not video_data:
# Video info not in first request, do a secondary request using # Video info not in first request, do a secondary request using
# tahoe player specific URL # tahoe player specific URL
tahoe_data = self._download_webpage( tahoe_data = self._download_webpage(
@ -379,8 +590,19 @@ def extract_from_jsmods_instances(js_data):
if not video_data: if not video_data:
raise ExtractorError('Cannot parse data') raise ExtractorError('Cannot parse data')
subtitles = {} if len(video_data) > 1:
entries = []
for v in video_data:
video_url = v[0].get('video_url')
if not video_url:
continue
entries.append(self.url_result(urljoin(
url, video_url), self.ie_key(), v[0].get('video_id')))
return self.playlist_result(entries, video_id)
video_data = video_data[0]
formats = [] formats = []
subtitles = {}
for f in video_data: for f in video_data:
format_id = f['stream_type'] format_id = f['stream_type']
if f and isinstance(f, dict): if f and isinstance(f, dict):
@ -399,22 +621,14 @@ def extract_from_jsmods_instances(js_data):
'url': src, 'url': src,
'preference': preference, 'preference': preference,
}) })
dash_manifest = f[0].get('dash_manifest') extract_dash_manifest(f[0], formats)
if dash_manifest:
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
subtitles_src = f[0].get('subtitles_src') subtitles_src = f[0].get('subtitles_src')
if subtitles_src: if subtitles_src:
subtitles.setdefault('en', []).append({'url': subtitles_src}) subtitles.setdefault('en', []).append({'url': subtitles_src})
if not formats: if not formats:
raise ExtractorError('Cannot find video formats') raise ExtractorError('Cannot find video formats')
# Downloads with browser's User-Agent are rate limited. Working around process_formats(formats)
# with non-browser User-Agent.
for f in formats:
f.setdefault('http_headers', {})['User-Agent'] = 'facebookexternalhit/1.1'
self._sort_formats(formats)
video_title = self._html_search_regex( video_title = self._html_search_regex(
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage, r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage,
@ -454,35 +668,13 @@ def extract_from_jsmods_instances(js_data):
'subtitles': subtitles, 'subtitles': subtitles,
} }
return webpage, info_dict return info_dict
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
real_url = self._VIDEO_PAGE_TEMPLATE % video_id if url.startswith('facebook:') else url real_url = self._VIDEO_PAGE_TEMPLATE % video_id if url.startswith('facebook:') else url
webpage, info_dict = self._extract_from_url(real_url, video_id, fatal_if_no_video=False) return self._extract_from_url(real_url, video_id)
if info_dict:
return info_dict
if '/posts/' in url:
video_id_json = self._search_regex(
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage, 'video ids', group='ids',
default='')
if video_id_json:
entries = [
self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
for vid in self._parse_json(video_id_json, video_id)]
return self.playlist_result(entries, video_id)
# Single Video?
video_id = self._search_regex(r'video_id:\s*"([0-9]+)"', webpage, 'single video id')
return self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
else:
_, info_dict = self._extract_from_url(
self._VIDEO_PAGE_TEMPLATE % video_id,
video_id, fatal_if_no_video=True)
return info_dict
class FacebookPluginsVideoIE(InfoExtractor): class FacebookPluginsVideoIE(InfoExtractor):
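
Review note on the Facebook hunk above: the new `parse_graphql_video` helper ranks the progressive `playable_url` variants with `qualities(['sd', 'hd'])`. A minimal standalone sketch of that ranking follows. The names `make_quality_ranker` and `build_progressive_formats` are hypothetical and not part of the diff, and the ranker only imitates what `qualities` from `..utils` appears to do (index in the list, or -1 for unknown ids).

```python
def make_quality_ranker(ordered_ids):
    """Return a function mapping a format id to its position in ordered_ids.

    Hypothetical stand-in for the `qualities` helper used in the diff;
    unknown ids rank as -1 so they sort below every known quality.
    """
    def rank(format_id):
        try:
            return ordered_ids.index(format_id)
        except ValueError:
            return -1
    return rank


def build_progressive_formats(video):
    """Collect the sd/hd progressive URLs from a GraphQL-style video dict."""
    rank = make_quality_ranker(['sd', 'hd'])
    formats = []
    for suffix, format_id in (('', 'sd'), ('_quality_hd', 'hd')):
        playable_url = video.get('playable_url' + suffix)
        if not playable_url:
            # Missing or empty variants are skipped, as in the diff.
            continue
        formats.append({
            'format_id': format_id,
            'quality': rank(format_id),
            'url': playable_url,
        })
    return formats
```

Because `hd` ranks above `sd`, a later sort on `quality` should prefer the HD URL whenever both are present.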


@ -0,0 +1,35 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class FujiTVFODPlus7IE(InfoExtractor):
_VALID_URL = r'https?://i\.fod\.fujitv\.co\.jp/plus7/web/[0-9a-z]{4}/(?P<id>[0-9a-z]+)'
_BASE_URL = 'http://i.fod.fujitv.co.jp/'
_BITRATE_MAP = {
300: (320, 180),
800: (640, 360),
1200: (1280, 720),
2000: (1280, 720),
}
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id)
for f in formats:
wh = self._BITRATE_MAP.get(f.get('tbr'))
if wh:
f.update({
'width': wh[0],
'height': wh[1],
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video_id,
'formats': formats,
'thumbnail': self._BASE_URL + 'pc/image/wbtn/wbtn_%s.jpg' % video_id,
}
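
Review note on the new FujiTV extractor above: the width and height are filled in by an exact lookup of each format's `tbr` in `_BITRATE_MAP`. The sketch below isolates that step so it can be tried without the extractor. `annotate_resolution` is a hypothetical name, and the sample format dicts in the usage note are invented for illustration.

```python
# Same bitrate -> (width, height) table as _BITRATE_MAP in the diff.
BITRATE_MAP = {
    300: (320, 180),
    800: (640, 360),
    1200: (1280, 720),
    2000: (1280, 720),
}


def annotate_resolution(formats, bitrate_map=BITRATE_MAP):
    """Add width/height to each format whose tbr is a key of bitrate_map.

    Formats with an unknown or missing tbr are left untouched.
    """
    for f in formats:
        wh = bitrate_map.get(f.get('tbr'))
        if wh:
            f.update({'width': wh[0], 'height': wh[1]})
    return formats
```

For example, `annotate_resolution([{'tbr': 800}, {'tbr': 1234}])` annotates only the first entry. The lookup is exact, so a bitrate that the manifest reports slightly off the table values (say 799.5) gets no resolution. Whether that can happen for this site is not something the diff shows.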


@ -1,16 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .once import OnceIE from .once import OnceIE
from ..compat import ( from ..compat import compat_urllib_parse_unquote
compat_urllib_parse_unquote,
)
from ..utils import (
unescapeHTML,
url_basename,
dict_get,
)
class GameSpotIE(OnceIE): class GameSpotIE(OnceIE):
@ -24,17 +15,16 @@ class GameSpotIE(OnceIE):
'title': 'Arma 3 - Community Guide: SITREP I', 'title': 'Arma 3 - Community Guide: SITREP I',
'description': 'Check out this video where some of the basics of Arma 3 is explained.', 'description': 'Check out this video where some of the basics of Arma 3 is explained.',
}, },
'skip': 'manifest URL give HTTP Error 404: Not Found',
}, { }, {
'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/', 'url': 'http://www.gamespot.com/videos/the-witcher-3-wild-hunt-xbox-one-now-playing/2300-6424837/',
'md5': '173ea87ad762cf5d3bf6163dceb255a6',
'info_dict': { 'info_dict': {
'id': 'gs-2300-6424837', 'id': 'gs-2300-6424837',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Now Playing - The Witcher 3: Wild Hunt', 'title': 'Now Playing - The Witcher 3: Wild Hunt',
'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.', 'description': 'Join us as we take a look at the early hours of The Witcher 3: Wild Hunt and more.',
}, },
'params': {
'skip_download': True, # m3u8 downloads
},
}, { }, {
'url': 'https://www.gamespot.com/videos/embed/6439218/', 'url': 'https://www.gamespot.com/videos/embed/6439218/',
'only_matching': True, 'only_matching': True,
@ -49,90 +39,40 @@ class GameSpotIE(OnceIE):
def _real_extract(self, url): def _real_extract(self, url):
page_id = self._match_id(url) page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id) webpage = self._download_webpage(url, page_id)
data_video_json = self._search_regex( data_video = self._parse_json(self._html_search_regex(
r'data-video=["\'](.*?)["\']', webpage, 'data video') r'data-video=(["\'])({.*?})\1', webpage,
data_video = self._parse_json(unescapeHTML(data_video_json), page_id) 'video data', group=2), page_id)
title = compat_urllib_parse_unquote(data_video['title'])
streams = data_video['videoStreams'] streams = data_video['videoStreams']
manifest_url = None
formats = [] formats = []
f4m_url = streams.get('f4m_stream')
if f4m_url: m3u8_url = streams.get('adaptive_stream')
manifest_url = f4m_url
formats.extend(self._extract_f4m_formats(
f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False))
m3u8_url = dict_get(streams, ('m3u8_stream', 'adaptive_stream'))
if m3u8_url: if m3u8_url:
manifest_url = m3u8_url
m3u8_formats = self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
m3u8_url, page_id, 'mp4', 'm3u8_native', m3u8_url, page_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False) m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats) for f in m3u8_formats:
progressive_url = dict_get( formats.append(f)
streams, ('progressive_hd', 'progressive_high', 'progressive_low', 'other_lr')) http_f = f.copy()
if progressive_url and manifest_url: del http_f['manifest_url']
qualities_basename = self._search_regex( http_f.update({
r'/([^/]+)\.csmil/', 'format_id': f['format_id'].replace('hls-', 'http-'),
manifest_url, 'qualities basename', default=None) 'protocol': 'http',
if qualities_basename: 'url': f['url'].replace('.m3u8', '.mp4'),
QUALITIES_RE = r'((,\d+)+,?)' })
qualities = self._search_regex( formats.append(http_f)
QUALITIES_RE, qualities_basename,
'qualities', default=None)
if qualities:
qualities = list(map(lambda q: int(q), qualities.strip(',').split(',')))
qualities.sort()
http_template = re.sub(QUALITIES_RE, r'%d', qualities_basename)
http_url_basename = url_basename(progressive_url)
if m3u8_formats:
self._sort_formats(m3u8_formats)
m3u8_formats = list(filter(
lambda f: f.get('vcodec') != 'none', m3u8_formats))
if len(qualities) == len(m3u8_formats):
for q, m3u8_format in zip(qualities, m3u8_formats):
f = m3u8_format.copy()
f.update({
'url': progressive_url.replace(
http_url_basename, http_template % q),
'format_id': f['format_id'].replace('hls', 'http'),
'protocol': 'http',
})
formats.append(f)
else:
for q in qualities:
formats.append({
'url': progressive_url.replace(
http_url_basename, http_template % q),
'ext': 'mp4',
'format_id': 'http-%d' % q,
'tbr': q,
})
onceux_json = self._search_regex( mpd_url = streams.get('adaptive_dash')
r'data-onceux-options=["\'](.*?)["\']', webpage, 'data video', default=None) if mpd_url:
if onceux_json: formats.extend(self._extract_mpd_formats(
onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri') mpd_url, page_id, mpd_id='dash', fatal=False))
if onceux_url:
formats.extend(self._extract_once_formats(re.sub(
r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url),
http_formats_preference=-1))
if not formats:
for quality in ['sd', 'hd']:
# It's actually a link to a flv file
flv_url = streams.get('f4m_{0}'.format(quality))
if flv_url is not None:
formats.append({
'url': flv_url,
'ext': 'flv',
'format_id': quality,
})
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': data_video['guid'], 'id': data_video.get('guid') or page_id,
'display_id': page_id, 'display_id': page_id,
'title': compat_urllib_parse_unquote(data_video['title']), 'title': title,
'formats': formats, 'formats': formats,
'description': self._html_search_meta('description', webpage), 'description': self._html_search_meta('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
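
Review note on the GameSpot hunk above: the rewritten extractor derives a direct HTTP format from every HLS format by copying the dict and swapping `.m3u8` for `.mp4`. The sketch below shows that transformation on its own. `derive_http_format` is a hypothetical name, and it uses `pop` where the diff uses `del`, so a format without `manifest_url` does not raise `KeyError` here.

```python
def derive_http_format(hls_format):
    """Build a progressive-download twin of an HLS format dict.

    Assumes, as the diff does, that the same path with an .mp4
    extension is served over plain HTTP.
    """
    http_f = hls_format.copy()
    # The derived format does not come from a manifest.
    http_f.pop('manifest_url', None)
    http_f.update({
        'format_id': hls_format['format_id'].replace('hls-', 'http-'),
        'protocol': 'http',
        'url': hls_format['url'].replace('.m3u8', '.mp4'),
    })
    return http_f
```

The input dict is left unchanged, so both the HLS entry and its HTTP twin can be appended to `formats`, which matches the loop in the diff.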


@ -20,19 +20,24 @@
ExtractorError, ExtractorError,
float_or_none, float_or_none,
HEADRequest, HEADRequest,
int_or_none,
is_html, is_html,
js_to_json, js_to_json,
KNOWN_EXTENSIONS, KNOWN_EXTENSIONS,
merge_dicts, merge_dicts,
mimetype2ext, mimetype2ext,
orderedSet, orderedSet,
parse_duration,
sanitized_Request, sanitized_Request,
smuggle_url, smuggle_url,
unescapeHTML, unescapeHTML,
unified_strdate, unified_timestamp,
unsmuggle_url, unsmuggle_url,
UnsupportedError, UnsupportedError,
url_or_none,
xpath_attr,
xpath_text, xpath_text,
xpath_with_ns,
) )
from .commonprotocols import RtmpIE from .commonprotocols import RtmpIE
from .brightcove import ( from .brightcove import (
@ -48,7 +53,6 @@
from .rutv import RUTVIE from .rutv import RUTVIE
from .tvc import TVCIE from .tvc import TVCIE
from .sportbox import SportBoxIE from .sportbox import SportBoxIE
from .smotri import SmotriIE
from .myvi import MyviIE from .myvi import MyviIE
from .condenast import CondeNastIE from .condenast import CondeNastIE
from .udn import UDNEmbedIE from .udn import UDNEmbedIE
@ -63,7 +67,10 @@
from .mofosex import MofosexEmbedIE from .mofosex import MofosexEmbedIE
from .spankwire import SpankwireIE from .spankwire import SpankwireIE
from .youporn import YouPornIE from .youporn import YouPornIE
from .vimeo import VimeoIE from .vimeo import (
VimeoIE,
VHXEmbedIE,
)
from .dailymotion import DailymotionIE from .dailymotion import DailymotionIE
from .dailymail import DailyMailIE from .dailymail import DailyMailIE
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
@ -123,6 +130,7 @@
from .gedi import GediEmbedsIE from .gedi import GediEmbedsIE
from .rcs import RCSEmbedsIE from .rcs import RCSEmbedsIE
from .bitchute import BitChuteIE from .bitchute import BitChuteIE
from .arcpublishing import ArcPublishingIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@ -201,11 +209,46 @@ class GenericIE(InfoExtractor):
{ {
'url': 'http://podcastfeeds.nbcnews.com/audio/podcast/MSNBC-MADDOW-NETCAST-M4V.xml', 'url': 'http://podcastfeeds.nbcnews.com/audio/podcast/MSNBC-MADDOW-NETCAST-M4V.xml',
'info_dict': { 'info_dict': {
'id': 'pdv_maddow_netcast_m4v-02-27-2015-201624', 'id': 'http://podcastfeeds.nbcnews.com/nbcnews/video/podcast/MSNBC-MADDOW-NETCAST-M4V.xml',
'ext': 'm4v', 'title': 'MSNBC Rachel Maddow (video)',
'upload_date': '20150228', 'description': 're:.*her unique approach to storytelling.*',
'title': 'pdv_maddow_netcast_m4v-02-27-2015-201624', },
} 'playlist': [{
'info_dict': {
'ext': 'mov',
'id': 'pdv_maddow_netcast_mov-12-03-2020-223726',
'title': 'MSNBC Rachel Maddow (video) - 12-03-2020-223726',
'description': 're:.*her unique approach to storytelling.*',
'upload_date': '20201204',
},
}],
},
# RSS feed with item with description and thumbnails
{
'url': 'https://anchor.fm/s/dd00e14/podcast/rss',
'info_dict': {
'id': 'https://anchor.fm/s/dd00e14/podcast/rss',
'title': 're:.*100% Hydrogen.*',
'description': 're:.*In this episode.*',
},
'playlist': [{
'info_dict': {
'ext': 'm4a',
'id': 'c1c879525ce2cb640b344507e682c36d',
'title': 're:Hydrogen!',
'description': 're:.*In this episode we are going.*',
'timestamp': 1567977776,
'upload_date': '20190908',
'duration': 459,
'thumbnail': r're:^https?://.*\.jpg$',
'episode_number': 1,
'season_number': 1,
'age_limit': 0,
},
}],
'params': {
'skip_download': True,
},
}, },
# RSS feed with enclosures and unsupported link URLs # RSS feed with enclosures and unsupported link URLs
{ {
@ -1986,22 +2029,6 @@ class GenericIE(InfoExtractor):
}, },
'add_ie': [SpringboardPlatformIE.ie_key()], 'add_ie': [SpringboardPlatformIE.ie_key()],
}, },
{
'url': 'https://www.youtube.com/shared?ci=1nEzmT-M4fU',
'info_dict': {
'id': 'uPDB5I9wfp8',
'ext': 'webm',
'title': 'Pocoyo: 90 minutos de episódios completos Português para crianças - PARTE 3',
'description': 'md5:d9e4d9346a2dfff4c7dc4c8cec0f546d',
'upload_date': '20160219',
'uploader': 'Pocoyo - Português (BR)',
'uploader_id': 'PocoyoBrazil',
},
'add_ie': [YoutubeIE.ie_key()],
'params': {
'skip_download': True,
},
},
{ {
'url': 'https://www.yapfiles.ru/show/1872528/690b05d3054d2dbe1e69523aa21bb3b1.mp4.html', 'url': 'https://www.yapfiles.ru/show/1872528/690b05d3054d2dbe1e69523aa21bb3b1.mp4.html',
'info_dict': { 'info_dict': {
@ -2106,23 +2133,23 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
{ # {
# Zype embed # # Zype embed
'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites', # 'url': 'https://www.cookscountry.com/episode/554-smoky-barbecue-favorites',
'info_dict': { # 'info_dict': {
'id': '5b400b834b32992a310622b9', # 'id': '5b400b834b32992a310622b9',
'ext': 'mp4', # 'ext': 'mp4',
'title': 'Smoky Barbecue Favorites', # 'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g', # 'thumbnail': r're:^https?://.*\.jpe?g',
'description': 'md5:5ff01e76316bd8d46508af26dc86023b', # 'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
'upload_date': '20170909', # 'upload_date': '20170909',
'timestamp': 1504915200, # 'timestamp': 1504915200,
}, # },
'add_ie': [ZypeIE.ie_key()], # 'add_ie': [ZypeIE.ie_key()],
'params': { # 'params': {
'skip_download': True, # 'skip_download': True,
}, # },
}, # },
{ {
# videojs embed # videojs embed
'url': 'https://video.sibnet.ru/shell.php?videoid=3422904', 'url': 'https://video.sibnet.ru/shell.php?videoid=3422904',
@ -2171,7 +2198,32 @@ class GenericIE(InfoExtractor):
# 'params': { # 'params': {
# 'force_generic_extractor': True, # 'force_generic_extractor': True,
# }, # },
# } # },
{
# VHX Embed
'url': 'https://demo.vhx.tv/category-c/videos/file-example-mp4-480-1-5mg-copy',
'info_dict': {
'id': '858208',
'ext': 'mp4',
'title': 'Untitled',
'uploader_id': 'user80538407',
'uploader': 'OTT Videos',
},
},
{
# ArcPublishing PoWa video player
'url': 'https://www.adn.com/politics/2020/11/02/video-senate-candidates-campaign-in-anchorage-on-eve-of-election-day/',
'md5': 'b03b2fac8680e1e5a7cc81a5c27e71b3',
'info_dict': {
'id': '8c99cb6e-b29c-4bc9-9173-7bf9979225ab',
'ext': 'mp4',
'title': 'Senate candidates wave to voters on Anchorage streets',
'description': 'md5:91f51a6511f090617353dc720318b20e',
'timestamp': 1604378735,
'upload_date': '20201103',
'duration': 1581,
},
},
] ]
def report_following_redirect(self, new_url): def report_following_redirect(self, new_url):
@ -2183,6 +2235,10 @@ def _extract_rss(self, url, video_id, doc):
playlist_desc_el = doc.find('./channel/description') playlist_desc_el = doc.find('./channel/description')
playlist_desc = None if playlist_desc_el is None else playlist_desc_el.text playlist_desc = None if playlist_desc_el is None else playlist_desc_el.text
NS_MAP = {
'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
}
entries = [] entries = []
for it in doc.findall('./channel/item'): for it in doc.findall('./channel/item'):
next_url = None next_url = None
@ -2198,10 +2254,33 @@ def _extract_rss(self, url, video_id, doc):
if not next_url: if not next_url:
continue continue
def itunes(key):
return xpath_text(
it, xpath_with_ns('./itunes:%s' % key, NS_MAP),
default=None)
duration = itunes('duration')
explicit = (itunes('explicit') or '').lower()
if explicit in ('true', 'yes'):
age_limit = 18
elif explicit in ('false', 'no'):
age_limit = 0
else:
age_limit = None
entries.append({ entries.append({
'_type': 'url_transparent', '_type': 'url_transparent',
'url': next_url, 'url': next_url,
'title': it.find('title').text, 'title': it.find('title').text,
'description': xpath_text(it, 'description', default=None),
'timestamp': unified_timestamp(
xpath_text(it, 'pubDate', default=None)),
'duration': int_or_none(duration) or parse_duration(duration),
'thumbnail': url_or_none(xpath_attr(it, xpath_with_ns('./itunes:image', NS_MAP), 'href')),
'episode': itunes('title'),
'episode_number': int_or_none(itunes('episode')),
'season_number': int_or_none(itunes('season')),
'age_limit': age_limit,
}) })
return { return {
@ -2321,7 +2400,7 @@ def _real_extract(self, url):
info_dict = { info_dict = {
'id': video_id, 'id': video_id,
'title': self._generic_title(url), 'title': self._generic_title(url),
'upload_date': unified_strdate(head_response.headers.get('Last-Modified')) 'timestamp': unified_timestamp(head_response.headers.get('Last-Modified'))
} }
# Check for direct link to a video # Check for direct link to a video
@ -2427,7 +2506,9 @@ def _real_extract(self, url):
# Sometimes embedded video player is hidden behind percent encoding # Sometimes embedded video player is hidden behind percent encoding
# (e.g. https://github.com/ytdl-org/youtube-dl/issues/2448) # (e.g. https://github.com/ytdl-org/youtube-dl/issues/2448)
# Unescaping the whole page allows to handle those cases in a generic way # Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse_unquote(webpage) # FIXME: unescaping the whole page may break URLs, commenting out for now.
# There probably should be a second run of generic extractor on unescaped webpage.
# webpage = compat_urllib_parse_unquote(webpage)
# Unescape squarespace embeds to be detected by generic extractor, # Unescape squarespace embeds to be detected by generic extractor,
# see https://github.com/ytdl-org/youtube-dl/issues/21294 # see https://github.com/ytdl-org/youtube-dl/issues/21294
@ -2509,6 +2590,10 @@ def _real_extract(self, url):
if tp_urls: if tp_urls:
return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform') return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform')
arc_urls = ArcPublishingIE._extract_urls(webpage)
if arc_urls:
return self.playlist_from_matches(arc_urls, video_id, video_title, ie=ArcPublishingIE.ie_key())
# Look for embedded rtl.nl player # Look for embedded rtl.nl player
matches = re.findall( matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"', r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
@ -2520,6 +2605,10 @@ def _real_extract(self, url):
if vimeo_urls: if vimeo_urls:
return self.playlist_from_matches(vimeo_urls, video_id, video_title, ie=VimeoIE.ie_key()) return self.playlist_from_matches(vimeo_urls, video_id, video_title, ie=VimeoIE.ie_key())
vhx_url = VHXEmbedIE._extract_url(webpage)
if vhx_url:
return self.url_result(vhx_url, VHXEmbedIE.ie_key())
vid_me_embed_url = self._search_regex( vid_me_embed_url = self._search_regex(
r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]', r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',
webpage, 'vid.me embed', default=None) webpage, 'vid.me embed', default=None)
@ -2775,11 +2864,6 @@ def _real_extract(self, url):
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group('url')) return self.url_result(mobj.group('url'))
# Look for embedded smotri.com player
smotri_url = SmotriIE._extract_url(webpage)
if smotri_url:
return self.url_result(smotri_url, 'Smotri')
# Look for embedded Myvi.ru player # Look for embedded Myvi.ru player
myvi_url = MyviIE._extract_url(webpage) myvi_url = MyviIE._extract_url(webpage)
if myvi_url: if myvi_url:
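
Review note on the `_extract_rss` hunk above: the new code reads `itunes:` tags through a namespace map and turns `itunes:explicit` into an `age_limit`. The sketch below reproduces that mapping with only the standard library. The function names and the sample feed item are invented for illustration, and `findtext` stands in for the `xpath_text`/`xpath_with_ns` helpers the diff uses.

```python
import xml.etree.ElementTree as ET

# Same namespace URI as NS_MAP in the diff.
NS_MAP = {'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd'}


def age_limit_from_explicit(value):
    """Map an itunes:explicit value to an age limit (18, 0 or None)."""
    explicit = (value or '').lower()
    if explicit in ('true', 'yes'):
        return 18
    if explicit in ('false', 'no'):
        return 0
    # Any other value, or a missing tag, stays undecided.
    return None


def item_age_limit(item):
    """Read itunes:explicit from an RSS <item> element and convert it."""
    return age_limit_from_explicit(
        item.findtext('itunes:explicit', namespaces=NS_MAP))


# Invented sample item, not taken from any real feed.
SAMPLE_ITEM = ET.fromstring(
    '<item xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">'
    '<title>Episode</title><itunes:explicit>Yes</itunes:explicit></item>')
```

Keeping `None` distinct from `0` matters here: `0` asserts the episode is safe for everyone, while `None` only says the feed gave no usable answer.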


@ -38,13 +38,17 @@ class GoIE(AdobePassIE):
'disneynow': { 'disneynow': {
'brand': '011', 'brand': '011',
'resource_id': 'Disney', 'resource_id': 'Disney',
} },
'fxnow.fxnetworks': {
'brand': '025',
'requestor_id': 'dtci',
},
} }
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:(?P<sub_domain>%s)\.)?go| (?:(?P<sub_domain>%s)\.)?go|
(?P<sub_domain_2>abc|freeform|disneynow) (?P<sub_domain_2>abc|freeform|disneynow|fxnow\.fxnetworks)
)\.com/ )\.com/
(?: (?:
(?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)| (?:[^/]+/)*(?P<id>[Vv][Dd][Kk][Aa]\w+)|
@ -99,6 +103,19 @@ class GoIE(AdobePassIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'https://fxnow.fxnetworks.com/shows/better-things/video/vdka12782841',
'info_dict': {
'id': 'VDKA12782841',
'ext': 'mp4',
'title': 'First Look: Better Things - Season 2',
'description': 'md5:fa73584a95761c605d9d54904e35b407',
},
'params': {
'geo_bypass_ip_block': '3.244.239.0/24',
# m3u8 download
'skip_download': True,
},
}, { }, {
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding', 'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
'only_matching': True, 'only_matching': True,


@ -22,7 +22,7 @@
class InstagramIE(InfoExtractor): class InstagramIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv)/(?P<id>[^/?#&]+))' _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/(?:p|tv|reel)/(?P<id>[^/?#&]+))'
_TESTS = [{ _TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc', 'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516', 'md5': '0d2da106a9d2631273e192b372806516',
@ -35,7 +35,7 @@ class InstagramIE(InfoExtractor):
'timestamp': 1371748545, 'timestamp': 1371748545,
'upload_date': '20130620', 'upload_date': '20130620',
'uploader_id': 'naomipq', 'uploader_id': 'naomipq',
'uploader': 'Naomi Leonor Phan-Quang', 'uploader': 'B E A U T Y F O R A S H E S',
'like_count': int, 'like_count': int,
'comment_count': int, 'comment_count': int,
'comments': list, 'comments': list,
@ -95,6 +95,9 @@ class InstagramIE(InfoExtractor):
}, { }, {
'url': 'https://www.instagram.com/tv/aye83DjauH/', 'url': 'https://www.instagram.com/tv/aye83DjauH/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.instagram.com/reel/CDUMkliABpa/',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@ -122,81 +125,92 @@ def _real_extract(self, url):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
(video_url, description, thumbnail, timestamp, uploader, (media, video_url, description, thumbnail, timestamp, uploader,
uploader_id, like_count, comment_count, comments, height, uploader_id, like_count, comment_count, comments, height,
width) = [None] * 11 width) = [None] * 12
shared_data = try_get(webpage, shared_data = self._parse_json(
(lambda x: self._parse_json( self._search_regex(
self._search_regex( r'window\._sharedData\s*=\s*({.+?});',
r'window\.__additionalDataLoaded\(\'/(?:p|tv)/(?:[^/?#&]+)/\',({.+?})\);', webpage, 'shared data', default='{}'),
x, 'additional data', default='{}'), video_id, fatal=False)
video_id, fatal=False),
lambda x: self._parse_json(
self._search_regex(
r'window\._sharedData\s*=\s*({.+?});',
x, 'shared data', default='{}'),
video_id, fatal=False)['entry_data']['PostPage'][0]),
None)
if shared_data: if shared_data:
media = try_get( media = try_get(
shared_data, shared_data,
(lambda x: x['graphql']['shortcode_media'], (lambda x: x['entry_data']['PostPage'][0]['graphql']['shortcode_media'],
lambda x: x['media']), lambda x: x['entry_data']['PostPage'][0]['media']),
dict) dict)
if media: # _sharedData.entry_data.PostPage is empty when authenticated (see
video_url = media.get('video_url') # https://github.com/ytdl-org/youtube-dl/pull/22880)
height = int_or_none(media.get('dimensions', {}).get('height')) if not media:
width = int_or_none(media.get('dimensions', {}).get('width')) additional_data = self._parse_json(
description = try_get( self._search_regex(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'], r'window\.__additionalDataLoaded\s*\(\s*[^,]+,\s*({.+?})\s*\)\s*;',
compat_str) or media.get('caption') webpage, 'additional data', default='{}'),
thumbnail = media.get('display_src') or media.get('thumbnail_src') video_id, fatal=False)
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date')) if additional_data:
uploader = media.get('owner', {}).get('full_name') media = try_get(
uploader_id = media.get('owner', {}).get('username') additional_data, lambda x: x['graphql']['shortcode_media'],
dict)
if media:
video_url = media.get('video_url')
height = int_or_none(media.get('dimensions', {}).get('height'))
width = int_or_none(media.get('dimensions', {}).get('width'))
description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption')
thumbnail = media.get('display_src') or media.get('display_url')
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username')
def get_count(key, kind): def get_count(keys, kind):
return int_or_none(try_get( if not isinstance(keys, (list, tuple)):
keys = [keys]
for key in keys:
count = int_or_none(try_get(
media, (lambda x: x['edge_media_%s' % key]['count'], media, (lambda x: x['edge_media_%s' % key]['count'],
lambda x: x['%ss' % kind]['count']))) lambda x: x['%ss' % kind]['count'])))
like_count = get_count('preview_like', 'like') if count is not None:
comment_count = get_count('to_comment', 'comment') return count
like_count = get_count('preview_like', 'like')
comment_count = get_count(
('preview_comment', 'to_comment', 'to_parent_comment'), 'comment')
comments = [{ comments = [{
'author': comment.get('user', {}).get('username'), 'author': comment.get('user', {}).get('username'),
'author_id': comment.get('user', {}).get('id'), 'author_id': comment.get('user', {}).get('id'),
'id': comment.get('id'), 'id': comment.get('id'),
'text': comment.get('text'), 'text': comment.get('text'),
'timestamp': int_or_none(comment.get('created_at')), 'timestamp': int_or_none(comment.get('created_at')),
} for comment in media.get( } for comment in media.get(
'comments', {}).get('nodes', []) if comment.get('text')] 'comments', {}).get('nodes', []) if comment.get('text')]
if not video_url: if not video_url:
edges = try_get( edges = try_get(
media, lambda x: x['edge_sidecar_to_children']['edges'], media, lambda x: x['edge_sidecar_to_children']['edges'],
list) or [] list) or []
if edges: if edges:
entries = [] entries = []
for edge_num, edge in enumerate(edges, start=1): for edge_num, edge in enumerate(edges, start=1):
node = try_get(edge, lambda x: x['node'], dict) node = try_get(edge, lambda x: x['node'], dict)
if not node: if not node:
continue continue
node_video_url = url_or_none(node.get('video_url')) node_video_url = url_or_none(node.get('video_url'))
if not node_video_url: if not node_video_url:
continue continue
entries.append({ entries.append({
'id': node.get('shortcode') or node['id'], 'id': node.get('shortcode') or node['id'],
'title': 'Video %d' % edge_num, 'title': 'Video %d' % edge_num,
'url': node_video_url, 'url': node_video_url,
'thumbnail': node.get('display_url'), 'thumbnail': node.get('display_url'),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])), 'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])), 'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')), 'view_count': int_or_none(node.get('video_view_count')),
}) })
return self.playlist_result( return self.playlist_result(
entries, video_id, entries, video_id,
'Post by %s' % uploader_id if uploader_id else None, 'Post by %s' % uploader_id if uploader_id else None,
description) description)
if not video_url: if not video_url:
video_url = self._og_search_video_url(webpage, secure=False) video_url = self._og_search_video_url(webpage, secure=False)


@ -1,30 +1,21 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import uuid
import xml.etree.ElementTree as etree
import json import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from .brightcove import BrightcoveNewIE from .brightcove import BrightcoveNewIE
from ..compat import (
compat_str,
compat_etree_register_namespace,
)
from ..utils import ( from ..utils import (
clean_html,
determine_ext, determine_ext,
ExtractorError,
extract_attributes, extract_attributes,
int_or_none, get_element_by_class,
JSON_LD_RE,
merge_dicts, merge_dicts,
parse_duration, parse_duration,
smuggle_url, smuggle_url,
try_get, try_get,
url_or_none, url_or_none,
xpath_with_ns,
xpath_element,
xpath_text,
) )
@ -32,14 +23,18 @@ class ITVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)'
_GEO_COUNTRIES = ['GB'] _GEO_COUNTRIES = ['GB']
_TESTS = [{ _TESTS = [{
'url': 'http://www.itv.com/hub/mr-bean-animated-series/2a2936a0053', 'url': 'https://www.itv.com/hub/liar/2a4547a0012',
'info_dict': { 'info_dict': {
'id': '2a2936a0053', 'id': '2a4547a0012',
'ext': 'flv', 'ext': 'mp4',
'title': 'Home Movie', 'title': 'Liar - Series 2 - Episode 6',
'description': 'md5:d0f91536569dec79ea184f0a44cca089',
'series': 'Liar',
'season_number': 2,
'episode_number': 6,
}, },
'params': { 'params': {
# rtmp download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
@ -62,220 +57,97 @@ def _real_extract(self, url):
params = extract_attributes(self._search_regex( params = extract_attributes(self._search_regex(
r'(?s)(<[^>]+id="video"[^>]*>)', webpage, 'params')) r'(?s)(<[^>]+id="video"[^>]*>)', webpage, 'params'))
ns_map = { ios_playlist_url = params.get('data-video-playlist') or params['data-video-id']
'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/', hmac = params['data-video-hmac']
'tem': 'http://tempuri.org/',
'itv': 'http://schemas.datacontract.org/2004/07/Itv.BB.Mercury.Common.Types',
'com': 'http://schemas.itv.com/2009/05/Common',
}
for ns, full_ns in ns_map.items():
compat_etree_register_namespace(ns, full_ns)
def _add_ns(name):
return xpath_with_ns(name, ns_map)
def _add_sub_element(element, name):
return etree.SubElement(element, _add_ns(name))
production_id = (
params.get('data-video-autoplay-id')
or '%s#001' % (
params.get('data-video-episode-id')
or video_id.replace('a', '/')))
req_env = etree.Element(_add_ns('soapenv:Envelope'))
_add_sub_element(req_env, 'soapenv:Header')
body = _add_sub_element(req_env, 'soapenv:Body')
get_playlist = _add_sub_element(body, ('tem:GetPlaylist'))
request = _add_sub_element(get_playlist, 'tem:request')
_add_sub_element(request, 'itv:ProductionId').text = production_id
_add_sub_element(request, 'itv:RequestGuid').text = compat_str(uuid.uuid4()).upper()
vodcrid = _add_sub_element(request, 'itv:Vodcrid')
_add_sub_element(vodcrid, 'com:Id')
_add_sub_element(request, 'itv:Partition')
user_info = _add_sub_element(get_playlist, 'tem:userInfo')
_add_sub_element(user_info, 'itv:Broadcaster').text = 'Itv'
_add_sub_element(user_info, 'itv:DM')
_add_sub_element(user_info, 'itv:RevenueScienceValue')
_add_sub_element(user_info, 'itv:SessionId')
_add_sub_element(user_info, 'itv:SsoToken')
_add_sub_element(user_info, 'itv:UserToken')
site_info = _add_sub_element(get_playlist, 'tem:siteInfo')
_add_sub_element(site_info, 'itv:AdvertisingRestriction').text = 'None'
_add_sub_element(site_info, 'itv:AdvertisingSite').text = 'ITV'
_add_sub_element(site_info, 'itv:AdvertisingType').text = 'Any'
_add_sub_element(site_info, 'itv:Area').text = 'ITVPLAYER.VIDEO'
_add_sub_element(site_info, 'itv:Category')
_add_sub_element(site_info, 'itv:Platform').text = 'DotCom'
_add_sub_element(site_info, 'itv:Site').text = 'ItvCom'
device_info = _add_sub_element(get_playlist, 'tem:deviceInfo')
_add_sub_element(device_info, 'itv:ScreenSize').text = 'Big'
player_info = _add_sub_element(get_playlist, 'tem:playerInfo')
_add_sub_element(player_info, 'itv:Version').text = '2'
headers = self.geo_verification_headers() headers = self.geo_verification_headers()
headers.update({ headers.update({
'Content-Type': 'text/xml; charset=utf-8', 'Accept': 'application/vnd.itv.vod.playlist.v2+json',
'SOAPAction': 'http://tempuri.org/PlaylistService/GetPlaylist', 'Content-Type': 'application/json',
'hmac': hmac.upper(),
}) })
ios_playlist = self._download_json(
ios_playlist_url, video_id, data=json.dumps({
'user': {
'itvUserId': '',
'entitlements': [],
'token': ''
},
'device': {
'manufacturer': 'Safari',
'model': '5',
'os': {
'name': 'Windows NT',
'version': '6.1',
'type': 'desktop'
}
},
'client': {
'version': '4.1',
'id': 'browser'
},
'variantAvailability': {
'featureset': {
'min': ['hls', 'aes', 'outband-webvtt'],
'max': ['hls', 'aes', 'outband-webvtt']
},
'platformTag': 'dotcom'
}
}).encode(), headers=headers)
video_data = ios_playlist['Playlist']['Video']
ios_base_url = video_data.get('Base')
info = self._search_json_ld(webpage, video_id, default={})
formats = [] formats = []
subtitles = {} for media_file in (video_data.get('MediaFiles') or []):
href = media_file.get('Href')
def extract_subtitle(sub_url): if not href:
ext = determine_ext(sub_url, 'ttml') continue
subtitles.setdefault('en', []).append({ if ios_base_url:
'url': sub_url, href = ios_base_url + href
'ext': 'ttml' if ext == 'xml' else ext, ext = determine_ext(href)
}) if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
resp_env = self._download_xml( href, video_id, 'mp4', entry_protocol='m3u8_native',
params['data-playlist-url'], video_id, m3u8_id='hls', fatal=False))
headers=headers, data=etree.tostring(req_env), fatal=False)
if resp_env:
playlist = xpath_element(resp_env, './/Playlist')
if playlist is None:
fault_code = xpath_text(resp_env, './/faultcode')
fault_string = xpath_text(resp_env, './/faultstring')
if fault_code == 'InvalidGeoRegion':
self.raise_geo_restricted(
msg=fault_string, countries=self._GEO_COUNTRIES)
elif fault_code not in (
'InvalidEntity', 'InvalidVodcrid', 'ContentUnavailable'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, fault_string), expected=True)
info.update({
'title': self._og_search_title(webpage),
'episode_title': params.get('data-video-episode'),
'series': params.get('data-video-title'),
})
else: else:
title = xpath_text(playlist, 'EpisodeTitle', default=None) formats.append({
info.update({ 'url': href,
'title': title,
'episode_title': title,
'episode_number': int_or_none(xpath_text(playlist, 'EpisodeNumber')),
'series': xpath_text(playlist, 'ProgrammeTitle'),
'duration': parse_duration(xpath_text(playlist, 'Duration')),
}) })
video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True)
media_files = xpath_element(video_element, 'MediaFiles', fatal=True)
rtmp_url = media_files.attrib['base']
for media_file in media_files.findall('MediaFile'):
play_path = xpath_text(media_file, 'URL')
if not play_path:
continue
tbr = int_or_none(media_file.get('bitrate'), 1000)
f = {
'format_id': 'rtmp' + ('-%d' % tbr if tbr else ''),
'play_path': play_path,
# Providing this swfVfy allows to avoid truncated downloads
'player_url': 'http://www.itv.com/mercury/Mercury_VideoPlayer.swf',
'page_url': url,
'tbr': tbr,
'ext': 'flv',
}
app = self._search_regex(
'rtmpe?://[^/]+/(.+)$', rtmp_url, 'app', default=None)
if app:
f.update({
'url': rtmp_url.split('?', 1)[0],
'app': app,
})
else:
f['url'] = rtmp_url
formats.append(f)
for caption_url in video_element.findall('ClosedCaptioningURIs/URL'):
if caption_url.text:
extract_subtitle(caption_url.text)
ios_playlist_url = params.get('data-video-playlist') or params.get('data-video-id')
hmac = params.get('data-video-hmac')
if ios_playlist_url and hmac and re.match(r'https?://', ios_playlist_url):
headers = self.geo_verification_headers()
headers.update({
'Accept': 'application/vnd.itv.vod.playlist.v2+json',
'Content-Type': 'application/json',
'hmac': hmac.upper(),
})
ios_playlist = self._download_json(
ios_playlist_url, video_id, data=json.dumps({
'user': {
'itvUserId': '',
'entitlements': [],
'token': ''
},
'device': {
'manufacturer': 'Safari',
'model': '5',
'os': {
'name': 'Windows NT',
'version': '6.1',
'type': 'desktop'
}
},
'client': {
'version': '4.1',
'id': 'browser'
},
'variantAvailability': {
'featureset': {
'min': ['hls', 'aes', 'outband-webvtt'],
'max': ['hls', 'aes', 'outband-webvtt']
},
'platformTag': 'dotcom'
}
}).encode(), headers=headers, fatal=False)
if ios_playlist:
video_data = ios_playlist.get('Playlist', {}).get('Video', {})
ios_base_url = video_data.get('Base')
for media_file in video_data.get('MediaFiles', []):
href = media_file.get('Href')
if not href:
continue
if ios_base_url:
href = ios_base_url + href
ext = determine_ext(href)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
href, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': href,
})
subs = video_data.get('Subtitles')
if isinstance(subs, list):
for sub in subs:
if not isinstance(sub, dict):
continue
href = url_or_none(sub.get('Href'))
if href:
extract_subtitle(href)
if not info.get('duration'):
info['duration'] = parse_duration(video_data.get('Duration'))
self._sort_formats(formats) self._sort_formats(formats)
info.update({ subtitles = {}
subs = video_data.get('Subtitles') or []
for sub in subs:
if not isinstance(sub, dict):
continue
href = url_or_none(sub.get('Href'))
if not href:
continue
subtitles.setdefault('en', []).append({
'url': href,
'ext': determine_ext(href, 'vtt'),
})
info = self._search_json_ld(webpage, video_id, default={})
if not info:
json_ld = self._parse_json(self._search_regex(
JSON_LD_RE, webpage, 'JSON-LD', '{}',
group='json_ld'), video_id, fatal=False)
if json_ld and json_ld.get('@type') == 'BreadcrumbList':
for ile in (json_ld.get('itemListElement:') or []):
item = ile.get('item:') or {}
if item.get('@type') == 'TVEpisode':
item['@context'] = 'http://schema.org'
info = self._json_ld(item, video_id, fatal=False) or {}
break
return merge_dicts({
'id': video_id, 'id': video_id,
'title': self._html_search_meta(['og:title', 'twitter:title'], webpage),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
}) 'duration': parse_duration(video_data.get('Duration')),
'description': clean_html(get_element_by_class('episode-info__synopsis', webpage)),
webpage_info = self._search_json_ld(webpage, video_id, default={}) }, info)
if not webpage_info.get('title'):
webpage_info['title'] = self._html_search_regex(
r'(?s)<h\d+[^>]+\bclass=["\'][^>]*episode-title["\'][^>]*>([^<]+)<',
webpage, 'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or webpage_info['episode']
return merge_dicts(info, webpage_info)
class ITVBTCCIE(InfoExtractor): class ITVBTCCIE(InfoExtractor):


@ -1,6 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import functools
import json import json
from .common import InfoExtractor from .common import InfoExtractor
@ -10,13 +11,73 @@
ExtractorError, ExtractorError,
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
OnDemandPagedList,
try_get, try_get,
urljoin,
) )
class LBRYIE(InfoExtractor): class LBRYBaseIE(InfoExtractor):
IE_NAME = 'lbry.tv' _BASE_URL_REGEX = r'https?://(?:www\.)?(?:lbry\.tv|odysee\.com)/'
_VALID_URL = r'https?://(?:www\.)?(?:lbry\.tv|odysee\.com)/(?P<id>@[^:]+:[0-9a-z]+/[^:]+:[0-9a-z])' _CLAIM_ID_REGEX = r'[0-9a-f]{1,40}'
_OPT_CLAIM_ID = '[^:/?#&]+(?::%s)?' % _CLAIM_ID_REGEX
_SUPPORTED_STREAM_TYPES = ['video', 'audio']
def _call_api_proxy(self, method, display_id, params, resource):
return self._download_json(
'https://api.lbry.tv/api/v1/proxy',
display_id, 'Downloading %s JSON metadata' % resource,
headers={'Content-Type': 'application/json-rpc'},
data=json.dumps({
'method': method,
'params': params,
}).encode())['result']
def _resolve_url(self, url, display_id, resource):
return self._call_api_proxy(
'resolve', display_id, {'urls': url}, resource)[url]
def _permanent_url(self, url, claim_name, claim_id):
return urljoin(url, '/%s:%s' % (claim_name, claim_id))
def _parse_stream(self, stream, url):
stream_value = stream.get('value') or {}
stream_type = stream_value.get('stream_type')
source = stream_value.get('source') or {}
media = stream_value.get(stream_type) or {}
signing_channel = stream.get('signing_channel') or {}
channel_name = signing_channel.get('name')
channel_claim_id = signing_channel.get('claim_id')
channel_url = None
if channel_name and channel_claim_id:
channel_url = self._permanent_url(url, channel_name, channel_claim_id)
info = {
'thumbnail': try_get(stream_value, lambda x: x['thumbnail']['url'], compat_str),
'description': stream_value.get('description'),
'license': stream_value.get('license'),
'timestamp': int_or_none(stream.get('timestamp')),
'tags': stream_value.get('tags'),
'duration': int_or_none(media.get('duration')),
'channel': try_get(signing_channel, lambda x: x['value']['title']),
'channel_id': channel_claim_id,
'channel_url': channel_url,
'ext': determine_ext(source.get('name')) or mimetype2ext(source.get('media_type')),
'filesize': int_or_none(source.get('size')),
}
if stream_type == 'audio':
info['vcodec'] = 'none'
else:
info.update({
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
})
return info
class LBRYIE(LBRYBaseIE):
IE_NAME = 'lbry'
_VALID_URL = LBRYBaseIE._BASE_URL_REGEX + r'(?P<id>\$/[^/]+/[^/]+/{1}|@{0}/{0}|(?!@){0})'.format(LBRYBaseIE._OPT_CLAIM_ID, LBRYBaseIE._CLAIM_ID_REGEX)
_TESTS = [{ _TESTS = [{
# Video # Video
'url': 'https://lbry.tv/@Mantega:1/First-day-LBRY:1', 'url': 'https://lbry.tv/@Mantega:1/First-day-LBRY:1',
@ -28,6 +89,8 @@ class LBRYIE(InfoExtractor):
'description': 'md5:f6cb5c704b332d37f5119313c2c98f51', 'description': 'md5:f6cb5c704b332d37f5119313c2c98f51',
'timestamp': 1595694354, 'timestamp': 1595694354,
'upload_date': '20200725', 'upload_date': '20200725',
'width': 1280,
'height': 720,
} }
}, { }, {
# Audio # Audio
@ -40,6 +103,12 @@ class LBRYIE(InfoExtractor):
'description': 'md5:661ac4f1db09f31728931d7b88807a61', 'description': 'md5:661ac4f1db09f31728931d7b88807a61',
'timestamp': 1591312601, 'timestamp': 1591312601,
'upload_date': '20200604', 'upload_date': '20200604',
'tags': list,
'duration': 2570,
'channel': 'The LBRY Foundation',
'channel_id': '0ed629d2b9c601300cacf7eabe9da0be79010212',
'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212',
'vcodec': 'none',
} }
}, { }, {
'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e', 'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e',
@ -47,45 +116,99 @@ class LBRYIE(InfoExtractor):
}, { }, {
'url': "https://odysee.com/@ScammerRevolts:b0/I-SYSKEY'D-THE-SAME-SCAMMERS-3-TIMES!:b", 'url': "https://odysee.com/@ScammerRevolts:b0/I-SYSKEY'D-THE-SAME-SCAMMERS-3-TIMES!:b",
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://lbry.tv/Episode-1:e7d93d772bd87e2b62d5ab993c1c3ced86ebb396',
'only_matching': True,
}, {
'url': 'https://lbry.tv/$/embed/Episode-1/e7d93d772bd87e2b62d5ab993c1c3ced86ebb396',
'only_matching': True,
}, {
'url': 'https://lbry.tv/Episode-1:e7',
'only_matching': True,
}, {
'url': 'https://lbry.tv/@LBRYFoundation/Episode-1',
'only_matching': True,
}, {
'url': 'https://lbry.tv/$/download/Episode-1/e7d93d772bd87e2b62d5ab993c1c3ced86ebb396',
'only_matching': True,
}] }]
def _call_api_proxy(self, method, display_id, params): def _real_extract(self, url):
return self._download_json( display_id = self._match_id(url)
'https://api.lbry.tv/api/v1/proxy', display_id, if display_id.startswith('$/'):
headers={'Content-Type': 'application/json-rpc'}, display_id = display_id.split('/', 2)[-1].replace('/', ':')
data=json.dumps({ else:
'method': method, display_id = display_id.replace(':', '#')
'params': params, uri = 'lbry://' + display_id
}).encode())['result'] result = self._resolve_url(uri, display_id, 'stream')
result_value = result['value']
if result_value.get('stream_type') not in self._SUPPORTED_STREAM_TYPES:
raise ExtractorError('Unsupported URL', expected=True)
claim_id = result['claim_id']
title = result_value['title']
streaming_url = self._call_api_proxy(
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
info = self._parse_stream(result, url)
info.update({
'id': claim_id,
'title': title,
'url': streaming_url,
})
return info
class LBRYChannelIE(LBRYBaseIE):
IE_NAME = 'lbry:channel'
_VALID_URL = LBRYBaseIE._BASE_URL_REGEX + r'(?P<id>@%s)/?(?:[?#&]|$)' % LBRYBaseIE._OPT_CLAIM_ID
_TESTS = [{
'url': 'https://lbry.tv/@LBRYFoundation:0',
'info_dict': {
'id': '0ed629d2b9c601300cacf7eabe9da0be79010212',
'title': 'The LBRY Foundation',
'description': 'Channel for the LBRY Foundation. Follow for updates and news.',
},
'playlist_count': 29,
}, {
'url': 'https://lbry.tv/@LBRYFoundation',
'only_matching': True,
}]
_PAGE_SIZE = 50
def _fetch_page(self, claim_id, url, page):
page += 1
result = self._call_api_proxy(
'claim_search', claim_id, {
'channel_ids': [claim_id],
'claim_type': 'stream',
'no_totals': True,
'page': page,
'page_size': self._PAGE_SIZE,
'stream_types': self._SUPPORTED_STREAM_TYPES,
}, 'page %d' % page)
for item in (result.get('items') or []):
stream_claim_name = item.get('name')
stream_claim_id = item.get('claim_id')
if not (stream_claim_name and stream_claim_id):
continue
info = self._parse_stream(item, url)
info.update({
'_type': 'url',
'id': stream_claim_id,
'title': try_get(item, lambda x: x['value']['title']),
'url': self._permanent_url(url, stream_claim_name, stream_claim_id),
})
yield info
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url).replace(':', '#') display_id = self._match_id(url).replace(':', '#')
uri = 'lbry://' + display_id result = self._resolve_url(
result = self._call_api_proxy( 'lbry://' + display_id, display_id, 'channel')
'resolve', display_id, {'urls': [uri]})[uri] claim_id = result['claim_id']
result_value = result['value'] entries = OnDemandPagedList(
if result_value.get('stream_type') not in ('video', 'audio'): functools.partial(self._fetch_page, claim_id, url),
raise ExtractorError('Unsupported URL', expected=True) self._PAGE_SIZE)
streaming_url = self._call_api_proxy( result_value = result.get('value') or {}
'get', display_id, {'uri': uri})['streaming_url'] return self.playlist_result(
source = result_value.get('source') or {} entries, claim_id, result_value.get('title'),
media = result_value.get('video') or result_value.get('audio') or {} result_value.get('description'))
signing_channel = result_value.get('signing_channel') or {}
return {
'id': result['claim_id'],
'title': result_value['title'],
'thumbnail': try_get(result_value, lambda x: x['thumbnail']['url'], compat_str),
'description': result_value.get('description'),
'license': result_value.get('license'),
'timestamp': int_or_none(result.get('timestamp')),
'tags': result_value.get('tags'),
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
'duration': int_or_none(media.get('duration')),
'channel': signing_channel.get('name'),
'channel_id': signing_channel.get('claim_id'),
'ext': determine_ext(source.get('name')) or mimetype2ext(source.get('media_type')),
'filesize': int_or_none(source.get('size')),
'url': streaming_url,
}
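
The new `LBRYIE._real_extract` above turns the matched URL path into a `lbry://` URI in two different ways, depending on whether the path starts with `$/`. A minimal standalone sketch of that normalization follows; the function name `lbry_uri_from_display_id` is hypothetical and is not part of the extractor, and no network call is made.

```python
def lbry_uri_from_display_id(display_id):
    """Sketch of the display_id handling shown in the diff (hypothetical helper)."""
    if display_id.startswith('$/'):
        # '$/embed/<name>/<claim_id>' -> '<name>:<claim_id>'
        display_id = display_id.split('/', 2)[-1].replace('/', ':')
    else:
        # '@channel:1/video:1' -> '@channel#1/video#1'
        display_id = display_id.replace(':', '#')
    return 'lbry://' + display_id


if __name__ == '__main__':
    print(lbry_uri_from_display_id('@Mantega:1/First-day-LBRY:1'))
    print(lbry_uri_from_display_id('$/embed/Episode-1/e7d93d772bd87e2b62d5ab993c1c3ced86ebb396'))
```

The sample inputs are taken from the `_TESTS` URLs in this diff, with the host part removed.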


@ -8,11 +8,15 @@
from ..compat import ( from ..compat import (
compat_b64decode, compat_b64decode,
compat_HTTPError, compat_HTTPError,
compat_str,
) )
from ..utils import ( from ..utils import (
clean_html,
ExtractorError, ExtractorError,
orderedSet, js_to_json,
unescapeHTML, parse_duration,
try_get,
unified_timestamp,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
@ -28,11 +32,15 @@ class LinuxAcademyIE(InfoExtractor):
) )
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://linuxacademy.com/cp/courses/lesson/course/1498/lesson/2/module/154', 'url': 'https://linuxacademy.com/cp/courses/lesson/course/7971/lesson/2/module/675',
'info_dict': { 'info_dict': {
'id': '1498-2', 'id': '7971-2',
'ext': 'mp4', 'ext': 'mp4',
'title': "Introduction to the Practitioner's Brief", 'title': 'What Is Data Science',
'description': 'md5:c574a3c20607144fb36cb65bdde76c99',
'timestamp': 1607387907,
'upload_date': '20201208',
'duration': 304,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -46,7 +54,8 @@ class LinuxAcademyIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '154', 'id': '154',
'title': 'AWS Certified Cloud Practitioner', 'title': 'AWS Certified Cloud Practitioner',
'description': 'md5:039db7e60e4aac9cf43630e0a75fa834', 'description': 'md5:a68a299ca9bb98d41cca5abc4d4ce22c',
'duration': 28835,
}, },
'playlist_count': 41, 'playlist_count': 41,
'skip': 'Requires Linux Academy account credentials', 'skip': 'Requires Linux Academy account credentials',
@ -74,6 +83,7 @@ def random_string():
self._AUTHORIZE_URL, None, 'Downloading authorize page', query={ self._AUTHORIZE_URL, None, 'Downloading authorize page', query={
'client_id': self._CLIENT_ID, 'client_id': self._CLIENT_ID,
'response_type': 'token id_token', 'response_type': 'token id_token',
'response_mode': 'web_message',
'redirect_uri': self._ORIGIN_URL, 'redirect_uri': self._ORIGIN_URL,
'scope': 'openid email user_impersonation profile', 'scope': 'openid email user_impersonation profile',
'audience': self._ORIGIN_URL, 'audience': self._ORIGIN_URL,
@ -129,7 +139,13 @@ def random_string():
access_token = self._search_regex( access_token = self._search_regex(
r'access_token=([^=&]+)', urlh.geturl(), r'access_token=([^=&]+)', urlh.geturl(),
'access token') 'access token', default=None)
if not access_token:
access_token = self._parse_json(
self._search_regex(
r'authorizationResponse\s*=\s*({.+?})\s*;', callback_page,
'authorization response'), None,
transform_source=js_to_json)['response']['access_token']
self._download_webpage( self._download_webpage(
'https://linuxacademy.com/cp/login/tokenValidateLogin/token/%s' 'https://linuxacademy.com/cp/login/tokenValidateLogin/token/%s'
@ -144,30 +160,84 @@ def _real_extract(self, url):
# course path # course path
if course_id: if course_id:
entries = [ module = self._parse_json(
self.url_result( self._search_regex(
urljoin(url, lesson_url), ie=LinuxAcademyIE.ie_key()) r'window\.module\s*=\s*({.+?})\s*;', webpage, 'module'),
for lesson_url in orderedSet(re.findall( item_id)
r'<a[^>]+\bhref=["\'](/cp/courses/lesson/course/\d+/lesson/\d+/module/\d+)', entries = []
webpage))] chapter_number = None
title = unescapeHTML(self._html_search_regex( chapter = None
(r'class=["\']course-title["\'][^>]*>(?P<value>[^<]+)', chapter_id = None
r'var\s+title\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1'), for item in module['items']:
webpage, 'title', default=None, group='value')) if not isinstance(item, dict):
description = unescapeHTML(self._html_search_regex( continue
r'var\s+description\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'description', default=None, group='value')) def type_field(key):
return self.playlist_result(entries, course_id, title, description) return (try_get(item, lambda x: x['type'][key], compat_str) or '').lower()
type_fields = (type_field('name'), type_field('slug'))
# Move to next module section
if 'section' in type_fields:
chapter = item.get('course_name')
chapter_id = item.get('course_module')
chapter_number = 1 if not chapter_number else chapter_number + 1
continue
# Skip non-lessons
if 'lesson' not in type_fields:
continue
lesson_url = urljoin(url, item.get('url'))
if not lesson_url:
continue
title = item.get('title') or item.get('lesson_name')
description = item.get('md_desc') or clean_html(item.get('description')) or clean_html(item.get('text'))
entries.append({
'_type': 'url_transparent',
'url': lesson_url,
'ie_key': LinuxAcademyIE.ie_key(),
'title': title,
'description': description,
'timestamp': unified_timestamp(item.get('date')) or unified_timestamp(item.get('created_on')),
'duration': parse_duration(item.get('duration')),
'chapter': chapter,
'chapter_id': chapter_id,
'chapter_number': chapter_number,
})
return {
'_type': 'playlist',
'entries': entries,
'id': course_id,
'title': module.get('title'),
'description': module.get('md_desc') or clean_html(module.get('desc')),
'duration': parse_duration(module.get('duration')),
}
# single video path # single video path
info = self._extract_jwplayer_data( m3u8_url = self._parse_json(
webpage, item_id, require_title=False, m3u8_id='hls',) self._search_regex(
title = self._search_regex( r'player\.playlist\s*=\s*(\[.+?\])\s*;', webpage, 'playlist'),
(r'>Lecture\s*:\s*(?P<value>[^<]+)', item_id)[0]['file']
r'lessonName\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1'), webpage, formats = self._extract_m3u8_formats(
'title', group='value') m3u8_url, item_id, 'mp4', entry_protocol='m3u8_native',
info.update({ m3u8_id='hls')
self._sort_formats(formats)
info = {
'id': item_id, 'id': item_id,
'title': title, 'formats': formats,
}) }
lesson = self._parse_json(
self._search_regex(
(r'window\.lesson\s*=\s*({.+?})\s*;',
r'player\.lesson\s*=\s*({.+?})\s*;'),
webpage, 'lesson', default='{}'), item_id, fatal=False)
if lesson:
info.update({
'title': lesson.get('lesson_name'),
'description': lesson.get('md_desc') or clean_html(lesson.get('desc')),
'timestamp': unified_timestamp(lesson.get('date')) or unified_timestamp(lesson.get('created_on')),
'duration': parse_duration(lesson.get('duration')),
})
if not info.get('title'):
info['title'] = self._search_regex(
(r'>Lecture\s*:\s*(?P<value>[^<]+)',
r'lessonName\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1'), webpage,
'title', group='value')
return info return info
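
The course path above walks `module['items']` once, treating "section" items as chapter boundaries and attaching the current chapter to each following "lesson". The sketch below isolates that bookkeeping under some assumptions: the function name `assign_chapters` is hypothetical, plain `dict.get` and `isinstance` checks stand in for `try_get` and `compat_str`, and only the chapter-related fields are kept.

```python
def assign_chapters(items):
    """Sketch of the section/lesson loop from the diff (hypothetical helper)."""
    entries = []
    chapter = chapter_id = chapter_number = None
    for item in items:
        if not isinstance(item, dict):
            continue
        type_info = item.get('type')
        if not isinstance(type_info, dict):
            type_info = {}
        # Lowercased 'name' and 'slug' of the item type, '' when missing
        type_fields = tuple(
            value.lower() if isinstance(value, str) else ''
            for value in (type_info.get('name'), type_info.get('slug')))
        if 'section' in type_fields:
            # A section starts a new chapter; it is not an entry itself
            chapter = item.get('course_name')
            chapter_id = item.get('course_module')
            chapter_number = 1 if not chapter_number else chapter_number + 1
            continue
        if 'lesson' not in type_fields:
            continue
        entries.append({
            'title': item.get('title') or item.get('lesson_name'),
            'chapter': chapter,
            'chapter_id': chapter_id,
            'chapter_number': chapter_number,
        })
    return entries
```

Lessons that appear before any section keep `None` for all three chapter fields, which matches the initial values in the diff.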


@ -2,12 +2,16 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
url_or_none,
xpath_text, xpath_text,
) )
@ -16,6 +20,8 @@ class MDRIE(InfoExtractor):
IE_DESC = 'MDR.DE and KiKA' IE_DESC = 'MDR.DE and KiKA'
_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z-]+-?(?P<id>\d+)(?:_.+?)?\.html' _VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z-]+-?(?P<id>\d+)(?:_.+?)?\.html'
_GEO_COUNTRIES = ['DE']
_TESTS = [{ _TESTS = [{
# MDR regularly deletes its videos # MDR regularly deletes its videos
'url': 'http://www.mdr.de/fakt/video189002.html', 'url': 'http://www.mdr.de/fakt/video189002.html',
@ -66,6 +72,22 @@ class MDRIE(InfoExtractor):
'duration': 3239, 'duration': 3239,
'uploader': 'MITTELDEUTSCHER RUNDFUNK', 'uploader': 'MITTELDEUTSCHER RUNDFUNK',
}, },
}, {
# empty bitrateVideo and bitrateAudio
'url': 'https://www.kika.de/filme/sendung128372_zc-572e3f45_zs-1d9fb70e.html',
'info_dict': {
'id': '128372',
'ext': 'mp4',
'title': 'Der kleine Wichtel kehrt zurück',
'description': 'md5:f77fafdff90f7aa1e9dca14f662c052a',
'duration': 4876,
'timestamp': 1607823300,
'upload_date': '20201213',
'uploader': 'ZDF',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://www.kika.de/baumhaus/sendungen/video19636_zc-fea7f8a0_zs-4bf89c60.html', 'url': 'http://www.kika.de/baumhaus/sendungen/video19636_zc-fea7f8a0_zs-4bf89c60.html',
'only_matching': True, 'only_matching': True,
@ -91,10 +113,13 @@ def _real_extract(self, url):
title = xpath_text(doc, ['./title', './broadcast/broadcastName'], 'title', fatal=True) title = xpath_text(doc, ['./title', './broadcast/broadcastName'], 'title', fatal=True)
type_ = xpath_text(doc, './type', default=None)
formats = [] formats = []
processed_urls = [] processed_urls = []
for asset in doc.findall('./assets/asset'): for asset in doc.findall('./assets/asset'):
for source in ( for source in (
'download',
'progressiveDownload', 'progressiveDownload',
'dynamicHttpStreamingRedirector', 'dynamicHttpStreamingRedirector',
'adaptiveHttpStreamingRedirector'): 'adaptiveHttpStreamingRedirector'):
@ -102,63 +127,49 @@ def _real_extract(self, url):
if url_el is None: if url_el is None:
continue continue
video_url = url_el.text video_url = url_or_none(url_el.text)
if video_url in processed_urls: if not video_url or video_url in processed_urls:
continue continue
processed_urls.append(video_url) processed_urls.append(video_url)
vbr = int_or_none(xpath_text(asset, './bitrateVideo', 'vbr'), 1000) ext = determine_ext(video_url)
abr = int_or_none(xpath_text(asset, './bitrateAudio', 'abr'), 1000)
ext = determine_ext(url_el.text)
if ext == 'm3u8': if ext == 'm3u8':
url_formats = self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native', video_url, video_id, 'mp4', entry_protocol='m3u8_native',
preference=0, m3u8_id='HLS', fatal=False) preference=0, m3u8_id='HLS', fatal=False))
elif ext == 'f4m': elif ext == 'f4m':
url_formats = self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
video_url + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, video_url + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id,
preference=0, f4m_id='HDS', fatal=False) preference=0, f4m_id='HDS', fatal=False))
else: else:
media_type = xpath_text(asset, './mediaType', 'media type', default='MP4') media_type = xpath_text(asset, './mediaType', 'media type', default='MP4')
vbr = int_or_none(xpath_text(asset, './bitrateVideo', 'vbr'), 1000) vbr = int_or_none(xpath_text(asset, './bitrateVideo', 'vbr'), 1000)
abr = int_or_none(xpath_text(asset, './bitrateAudio', 'abr'), 1000) abr = int_or_none(xpath_text(asset, './bitrateAudio', 'abr'), 1000)
filesize = int_or_none(xpath_text(asset, './fileSize', 'file size')) filesize = int_or_none(xpath_text(asset, './fileSize', 'file size'))
format_id = [media_type]
if vbr or abr:
format_id.append(compat_str(vbr or abr))
f = { f = {
'url': video_url, 'url': video_url,
'format_id': '%s-%d' % (media_type, vbr or abr), 'format_id': '-'.join(format_id),
'filesize': filesize, 'filesize': filesize,
'abr': abr, 'abr': abr,
'preference': 1, 'vbr': vbr,
} }
if vbr: if vbr:
width = int_or_none(xpath_text(asset, './frameWidth', 'width'))
height = int_or_none(xpath_text(asset, './frameHeight', 'height'))
f.update({ f.update({
'vbr': vbr, 'width': int_or_none(xpath_text(asset, './frameWidth', 'width')),
'width': width, 'height': int_or_none(xpath_text(asset, './frameHeight', 'height')),
'height': height,
}) })
url_formats = [f] if type_ == 'audio':
f['vcodec'] = 'none'
if not url_formats: formats.append(f)
continue
if not vbr:
for f in url_formats:
abr = f.get('tbr') or abr
if 'tbr' in f:
del f['tbr']
f.update({
'abr': abr,
'vcodec': 'none',
})
formats.extend(url_formats)
self._sort_formats(formats) self._sort_formats(formats)
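
The MDR hunk above replaces `'%s-%d' % (media_type, vbr or abr)` with a joined list, which appears to be what the new "empty bitrateVideo and bitrateAudio" test exercises: `%d` raises `TypeError` when both bitrates are `None`. A minimal sketch of the new construction, assuming plain `str` in place of `compat_str` and a hypothetical function name `build_format_id`:

```python
def build_format_id(media_type, vbr=None, abr=None):
    """Sketch of the format_id construction from the diff (hypothetical helper)."""
    parts = [media_type]
    if vbr or abr:
        # Prefer the video bitrate, fall back to the audio bitrate
        parts.append(str(vbr or abr))
    return '-'.join(parts)


def old_format_id(media_type, vbr=None, abr=None):
    """The previous expression, kept here only for comparison."""
    return '%s-%d' % (media_type, vbr or abr)
```

With both bitrates missing, `build_format_id('MP4')` yields the bare media type, whereas `old_format_id('MP4')` raises `TypeError`.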


@ -23,7 +23,7 @@ class MediasetIE(ThePlatformBaseIE):
https?:// https?://
(?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/ (?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/
(?: (?:
(?:video|on-demand)/(?:[^/]+/)+[^/]+_| (?:video|on-demand|movie)/(?:[^/]+/)+[^/]+_|
player/index\.html\?.*?\bprogramGuid= player/index\.html\?.*?\bprogramGuid=
) )
)(?P<id>[0-9A-Z]{16,}) )(?P<id>[0-9A-Z]{16,})
@ -88,6 +88,9 @@ class MediasetIE(ThePlatformBaseIE):
}, { }, {
'url': 'https://www.mediasetplay.mediaset.it/video/grandefratellovip/benedetta-una-doccia-gelata_F309344401044C135', 'url': 'https://www.mediasetplay.mediaset.it/video/grandefratellovip/benedetta-una-doccia-gelata_F309344401044C135',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.mediasetplay.mediaset.it/movie/herculeslaleggendahainizio/hercules-la-leggenda-ha-inizio_F305927501000102',
'only_matching': True,
}] }]
@staticmethod @staticmethod
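
Note on the `_VALID_URL` change in this hunk: adding `movie` to the first path alternative lets the new `only_matching` test URL resolve to an id. A minimal sketch, assuming a reduced pattern that contains only the `https` branch visible in the hunk (the rest of the real pattern is not shown here); `MEDIASET_URL_RE` and `mediaset_id` are hypothetical names:

```python
import re

# Reduced pattern: only the part of _VALID_URL visible in the hunk.
MEDIASET_URL_RE = re.compile(r'''(?x)
    https?://
        (?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/
        (?:
            (?:video|on-demand|movie)/(?:[^/]+/)+[^/]+_|
            player/index\.html\?.*?\bprogramGuid=
        )(?P<id>[0-9A-Z]{16,})
''')


def mediaset_id(url):
    """Return the programme GUID from a matching URL, else None."""
    m = MEDIASET_URL_RE.match(url)
    return m.group('id') if m else None
```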


@ -1,16 +1,14 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
from .common import InfoExtractor from .telecinco import TelecincoIE
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
smuggle_url,
) )
class MiTeleIE(InfoExtractor): class MiTeleIE(TelecincoIE):
IE_DESC = 'mitele.es' IE_DESC = 'mitele.es'
_VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player' _VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'
@ -53,7 +51,7 @@ class MiTeleIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
}, { }, {
'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player', 'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
'only_matching': True, 'only_matching': True,
@ -69,13 +67,11 @@ def _real_extract(self, url):
r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})', r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
webpage, 'Pre Player'), display_id)['prePlayer'] webpage, 'Pre Player'), display_id)['prePlayer']
title = pre_player['title'] title = pre_player['title']
video = pre_player['video'] video_info = self._parse_content(pre_player['video'], url)
video_id = video['dataMediaId']
content = pre_player.get('content') or {} content = pre_player.get('content') or {}
info = content.get('info') or {} info = content.get('info') or {}
info = { video_info.update({
'id': video_id,
'title': title, 'title': title,
'description': info.get('synopsis'), 'description': info.get('synopsis'),
'series': content.get('title'), 'series': content.get('title'),
@ -83,38 +79,7 @@ def _real_extract(self, url):
'episode': content.get('subtitle'), 'episode': content.get('subtitle'),
'episode_number': int_or_none(info.get('episode_number')), 'episode_number': int_or_none(info.get('episode_number')),
'duration': int_or_none(info.get('duration')), 'duration': int_or_none(info.get('duration')),
'thumbnail': video.get('dataPoster'),
'age_limit': int_or_none(info.get('rating')), 'age_limit': int_or_none(info.get('rating')),
'timestamp': parse_iso8601(pre_player.get('publishedTime')), 'timestamp': parse_iso8601(pre_player.get('publishedTime')),
} })
return video_info
if video.get('dataCmsId') == 'ooyala':
info.update({
'_type': 'url_transparent',
# for some reason only HLS is supported
'url': smuggle_url('ooyala:' + video_id, {'supportedformats': 'm3u8,dash'}),
})
else:
config = self._download_json(
video['dataConfig'], video_id, 'Downloading config JSON')
services = config['services']
gbx = self._download_json(
services['gbx'], video_id, 'Downloading gbx JSON')
caronte = self._download_json(
services['caronte'], video_id, 'Downloading caronte JSON')
cerbero = self._download_json(
caronte['cerbero'], video_id, 'Downloading cerbero JSON',
headers={
'Content-Type': 'application/json;charset=UTF-8',
'Origin': 'https://www.mitele.es'
},
data=json.dumps({
'bbx': caronte['bbx'],
'gbx': gbx['gbx']
}).encode('utf-8'))
formats = self._extract_m3u8_formats(
caronte['dls'][0]['stream'], video_id, 'mp4', 'm3u8_native', m3u8_id='hls',
query=dict([cerbero['tokens']['1']['cdn'].split('=', 1)]))
info['formats'] = formats
return info


@ -5,33 +5,137 @@
from .turner import TurnerBaseIE from .turner import TurnerBaseIE
from ..compat import ( from ..compat import (
compat_urllib_parse_urlencode, compat_parse_qs,
compat_urlparse, compat_str,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
int_or_none,
merge_dicts,
OnDemandPagedList, OnDemandPagedList,
remove_start, parse_duration,
parse_iso8601,
try_get,
update_url_query,
urljoin,
) )
class NBAIE(TurnerBaseIE): class NBACVPBaseIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:watch\.|www\.)?nba\.com/(?P<path>(?:[^/]+/)+(?P<id>[^?]*?))/?(?:/index\.html)?(?:\?.*)?$' def _extract_nba_cvp_info(self, path, video_id, fatal=False):
return self._extract_cvp_info(
'http://secure.nba.com/%s' % path, video_id, {
'default': {
'media_src': 'http://nba.cdn.turner.com/nba/big',
},
'm3u8': {
'media_src': 'http://nbavod-f.akamaihd.net',
},
}, fatal=fatal)
class NBAWatchBaseIE(NBACVPBaseIE):
_VALID_URL_BASE = r'https?://(?:(?:www\.)?nba\.com(?:/watch)?|watch\.nba\.com)/'
def _extract_video(self, filter_key, filter_value):
video = self._download_json(
'https://neulionscnbav2-a.akamaihd.net/solr/nbad_program/usersearch',
filter_value, query={
'fl': 'description,image,name,pid,releaseDate,runtime,tags,seoName',
'q': filter_key + ':' + filter_value,
'wt': 'json',
})['response']['docs'][0]
video_id = str(video['pid'])
title = video['name']
formats = []
m3u8_url = (self._download_json(
'https://watch.nba.com/service/publishpoint', video_id, query={
'type': 'video',
'format': 'json',
'id': video_id,
}, headers={
'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0_1 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A402 Safari/604.1',
}, fatal=False) or {}).get('path')
if m3u8_url:
m3u8_formats = self._extract_m3u8_formats(
re.sub(r'_(?:pc|iphone)\.', '.', m3u8_url), video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats)
for f in m3u8_formats:
http_f = f.copy()
http_f.update({
'format_id': http_f['format_id'].replace('hls-', 'http-'),
'protocol': 'http',
'url': http_f['url'].replace('.m3u8', ''),
})
formats.append(http_f)
info = {
'id': video_id,
'title': title,
'thumbnail': urljoin('https://nbadsdmt.akamaized.net/media/nba/nba/thumbs/', video.get('image')),
'description': video.get('description'),
'duration': int_or_none(video.get('runtime')),
'timestamp': parse_iso8601(video.get('releaseDate')),
'tags': video.get('tags'),
}
seo_name = video.get('seoName')
if seo_name and re.search(r'\d{4}/\d{2}/\d{2}/', seo_name):
base_path = ''
if seo_name.startswith('teams/'):
base_path += seo_name.split('/')[1] + '/'
base_path += 'video/'
cvp_info = self._extract_nba_cvp_info(
base_path + seo_name + '.xml', video_id, False)
if cvp_info:
formats.extend(cvp_info['formats'])
info = merge_dicts(info, cvp_info)
self._sort_formats(formats)
info['formats'] = formats
return info
class NBAWatchEmbedIE(NBAWatchBaseIE):
IE_NAME = 'nba:watch:embed'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'embed\?.*?\bid=(?P<id>\d+)'
_TESTS = [{
'url': 'http://watch.nba.com/embed?id=659395',
'md5': 'b7e3f9946595f4ca0a13903ce5edd120',
'info_dict': {
'id': '659395',
'ext': 'mp4',
'title': 'Mix clip: More than 7 points of Joe Ingles, Luc Mbah a Moute, Blake Griffin and 6 more in Utah Jazz vs. the Clippers, 4/15/2017',
'description': 'Mix clip: More than 7 points of Joe Ingles, Luc Mbah a Moute, Blake Griffin and 6 more in Utah Jazz vs. the Clippers, 4/15/2017',
'timestamp': 1492228800,
'upload_date': '20170415',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self._extract_video('pid', video_id)
class NBAWatchIE(NBAWatchBaseIE):
IE_NAME = 'nba:watch'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'(?:nba/)?video/(?P<id>.+?(?=/index\.html)|(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html', 'url': 'http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html',
'md5': '9e7729d3010a9c71506fd1248f74e4f4', 'md5': '9d902940d2a127af3f7f9d2f3dc79c96',
'info_dict': { 'info_dict': {
'id': '0021200253-okc-bkn-recap', 'id': '70946',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Thunder vs. Nets', 'title': 'Thunder vs. Nets',
'description': 'Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.', 'description': 'Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.',
'duration': 181, 'duration': 181,
'timestamp': 1354638466, 'timestamp': 1354597200,
'upload_date': '20121204', 'upload_date': '20121204',
}, },
'params': {
# m3u8 download
'skip_download': True,
},
}, { }, {
'url': 'http://www.nba.com/video/games/hornets/2014/12/05/0021400276-nyk-cha-play5.nba/', 'url': 'http://www.nba.com/video/games/hornets/2014/12/05/0021400276-nyk-cha-play5.nba/',
'only_matching': True, 'only_matching': True,
@ -39,116 +143,286 @@ class NBAIE(TurnerBaseIE):
'url': 'http://watch.nba.com/video/channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba', 'url': 'http://watch.nba.com/video/channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba',
'md5': 'b2b39b81cf28615ae0c3360a3f9668c4', 'md5': 'b2b39b81cf28615ae0c3360a3f9668c4',
'info_dict': { 'info_dict': {
'id': 'channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba', 'id': '330865',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Hawks vs. Cavaliers Game 1', 'title': 'Hawks vs. Cavaliers Game 1',
'description': 'md5:8094c3498d35a9bd6b1a8c396a071b4d', 'description': 'md5:8094c3498d35a9bd6b1a8c396a071b4d',
'duration': 228, 'duration': 228,
'timestamp': 1432134543, 'timestamp': 1432094400,
'upload_date': '20150520', 'upload_date': '20150521',
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://www.nba.com/clippers/news/doc-rivers-were-not-trading-blake',
'info_dict': {
'id': 'teams/clippers/2016/02/17/1455672027478-Doc_Feb16_720.mov-297324',
'ext': 'mp4',
'title': 'Practice: Doc Rivers - 2/16/16',
'description': 'Head Coach Doc Rivers addresses the media following practice.',
'upload_date': '20160216',
'timestamp': 1455672000,
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://www.nba.com/timberwolves/wiggins-shootaround#',
'info_dict': {
'id': 'timberwolves',
'title': 'Shootaround Access - Dec. 12 | Andrew Wiggins',
},
'playlist_count': 30,
'params': {
# Downloading the whole playlist takes too long
'playlist_items': '1-30',
}, },
}, { }, {
'url': 'http://www.nba.com/timberwolves/wiggins-shootaround#', 'url': 'http://watch.nba.com/nba/video/channels/nba_tv/2015/06/11/YT_go_big_go_home_Game4_061115',
'info_dict': { 'only_matching': True,
'id': 'teams/timberwolves/2014/12/12/Wigginsmp4-3462601', }, {
'ext': 'mp4', # only CVP mp4 format available
'title': 'Shootaround Access - Dec. 12 | Andrew Wiggins', 'url': 'https://watch.nba.com/video/teams/cavaliers/2012/10/15/sloan121015mov-2249106',
'description': 'Wolves rookie Andrew Wiggins addresses the media after Friday\'s shootaround.', 'only_matching': True,
'upload_date': '20141212', }, {
'timestamp': 1418418600, 'url': 'https://watch.nba.com/video/top-100-dunks-from-the-2019-20-season?plsrc=nba&collection=2019-20-season-highlights',
}, 'only_matching': True,
'params': {
'noplaylist': True,
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}] }]
_PAGE_SIZE = 30 def _real_extract(self, url):
display_id = self._match_id(url)
collection_id = compat_parse_qs(compat_urllib_parse_urlparse(url).query).get('collection', [None])[0]
if collection_id:
if self._downloader.params.get('noplaylist'):
self.to_screen('Downloading just video %s because of --no-playlist' % display_id)
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % collection_id)
return self.url_result(
'https://www.nba.com/watch/list/collection/' + collection_id,
NBAWatchCollectionIE.ie_key(), collection_id)
return self._extract_video('seoName', display_id)
def _fetch_page(self, team, video_id, page):
search_url = 'http://searchapp2.nba.com/nba-search/query.jsp?' + compat_urllib_parse_urlencode({
'type': 'teamvideo',
'start': page * self._PAGE_SIZE + 1,
'npp': (page + 1) * self._PAGE_SIZE + 1,
'sort': 'recent',
'output': 'json',
'site': team,
})
results = self._download_json(
search_url, video_id, note='Download page %d of playlist data' % page)['results'][0]
for item in results:
yield self.url_result(compat_urlparse.urljoin('http://www.nba.com/', item['url']))
def _extract_playlist(self, orig_path, video_id, webpage): class NBAWatchCollectionIE(NBAWatchBaseIE):
team = orig_path.split('/')[0] IE_NAME = 'nba:watch:collection'
_VALID_URL = NBAWatchBaseIE._VALID_URL_BASE + r'list/collection/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://watch.nba.com/list/collection/season-preview-2020',
'info_dict': {
'id': 'season-preview-2020',
},
'playlist_mincount': 43,
}]
_PAGE_SIZE = 100
if self._downloader.params.get('noplaylist'): def _fetch_page(self, collection_id, page):
self.to_screen('Downloading just video because of --no-playlist') page += 1
video_path = self._search_regex( videos = self._download_json(
r'nbaVideoCore\.firstVideo\s*=\s*\'([^\']+)\';', webpage, 'video path') 'https://content-api-prod.nba.com/public/1/endeavor/video-list/collection/' + collection_id,
video_url = 'http://www.nba.com/%s/video/%s' % (team, video_path) collection_id, 'Downloading page %d JSON metadata' % page, query={
return self.url_result(video_url) 'count': self._PAGE_SIZE,
'page': page,
self.to_screen('Downloading playlist - add --no-playlist to just download video') })['results']['videos']
playlist_title = self._og_search_title(webpage, fatal=False) for video in videos:
entries = OnDemandPagedList( program = video.get('program') or {}
functools.partial(self._fetch_page, team, video_id), seo_name = program.get('seoName') or program.get('slug')
self._PAGE_SIZE) if not seo_name:
continue
return self.playlist_result(entries, team, playlist_title) yield {
'_type': 'url',
'id': program.get('id'),
'title': program.get('title') or video.get('title'),
'url': 'https://www.nba.com/watch/video/' + seo_name,
'thumbnail': video.get('image'),
'description': program.get('description') or video.get('description'),
'duration': parse_duration(program.get('runtimeHours')),
'timestamp': parse_iso8601(video.get('releaseDate')),
}
def _real_extract(self, url): def _real_extract(self, url):
path, video_id = re.match(self._VALID_URL, url).groups() collection_id = self._match_id(url)
orig_path = path entries = OnDemandPagedList(
if path.startswith('nba/'): functools.partial(self._fetch_page, collection_id),
path = path[3:] self._PAGE_SIZE)
return self.playlist_result(entries, collection_id)
if 'video/' not in path:
webpage = self._download_webpage(url, video_id)
path = remove_start(self._search_regex(r'data-videoid="([^"]+)"', webpage, 'video id'), '/')
if path == '{{id}}': class NBABaseIE(NBACVPBaseIE):
return self._extract_playlist(orig_path, video_id, webpage) _VALID_URL_BASE = r'''(?x)
https?://(?:www\.)?nba\.com/
(?P<team>
blazers|
bucks|
bulls|
cavaliers|
celtics|
clippers|
grizzlies|
hawks|
heat|
hornets|
jazz|
kings|
knicks|
lakers|
magic|
mavericks|
nets|
nuggets|
pacers|
pelicans|
pistons|
raptors|
rockets|
sixers|
spurs|
suns|
thunder|
timberwolves|
warriors|
wizards
)
(?:/play\#)?/'''
_CHANNEL_PATH_REGEX = r'video/channel|series'
# See prepareContentId() of pkgCvp.js def _embed_url_result(self, team, content_id):
if path.startswith('video/teams'): return self.url_result(update_url_query(
path = 'video/channels/proxy/' + path[6:] 'https://secure.nba.com/assets/amp/include/video/iframe.html', {
'contentId': content_id,
'team': team,
}), NBAEmbedIE.ie_key())
return self._extract_cvp_info( def _call_api(self, team, content_id, query, resource):
'http://www.nba.com/%s.xml' % path, video_id, { return self._download_json(
'default': { 'https://api.nba.net/2/%s/video,imported_video,wsc/' % team,
'media_src': 'http://nba.cdn.turner.com/nba/big', content_id, 'Download %s JSON metadata' % resource,
}, query=query, headers={
'm3u8': { 'accessToken': 'internal|bb88df6b4c2244e78822812cecf1ee1b',
'media_src': 'http://nbavod-f.akamaihd.net', })['response']['result']
},
def _extract_video(self, video, team, extract_all=True):
video_id = compat_str(video['nid'])
team = video['brand']
info = {
'id': video_id,
'title': video.get('title') or video.get('headline') or video['shortHeadline'],
'description': video.get('description'),
'timestamp': parse_iso8601(video.get('published')),
}
subtitles = {}
captions = try_get(video, lambda x: x['videoCaptions']['sidecars'], dict) or {}
for caption_url in captions.values():
subtitles.setdefault('en', []).append({'url': caption_url})
formats = []
mp4_url = video.get('mp4')
if mp4_url:
formats.append({
'url': mp4_url,
}) })
if extract_all:
source_url = video.get('videoSource')
if source_url and not source_url.startswith('s3://') and self._is_valid_url(source_url, video_id, 'source'):
formats.append({
'format_id': 'source',
'url': source_url,
'preference': 1,
})
m3u8_url = video.get('m3u8')
if m3u8_url:
if '.akamaihd.net/i/' in m3u8_url:
formats.extend(self._extract_akamai_formats(
m3u8_url, video_id, {'http': 'pmd.cdn.turner.com'}))
else:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
content_xml = video.get('contentXml')
if team and content_xml:
cvp_info = self._extract_nba_cvp_info(
team + content_xml, video_id, fatal=False)
if cvp_info:
formats.extend(cvp_info['formats'])
subtitles = self._merge_subtitles(subtitles, cvp_info['subtitles'])
info = merge_dicts(info, cvp_info)
self._sort_formats(formats)
else:
info.update(self._embed_url_result(team, video['videoId']))
info.update({
'formats': formats,
'subtitles': subtitles,
})
return info
def _real_extract(self, url):
team, display_id = re.match(self._VALID_URL, url).groups()
if '/play#/' in url:
display_id = compat_urllib_parse_unquote(display_id)
else:
webpage = self._download_webpage(url, display_id)
display_id = self._search_regex(
self._CONTENT_ID_REGEX + r'\s*:\s*"([^"]+)"', webpage, 'video id')
return self._extract_url_results(team, display_id)
class NBAEmbedIE(NBABaseIE):
IE_NAME = 'nba:embed'
_VALID_URL = r'https?://secure\.nba\.com/assets/amp/include/video/(?:topI|i)frame\.html\?.*?\bcontentId=(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://secure.nba.com/assets/amp/include/video/topIframe.html?contentId=teams/bulls/2020/12/04/3478774/1607105587854-20201204_SCHEDULE_RELEASE_FINAL_DRUPAL-3478774&team=bulls&adFree=false&profile=71&videoPlayerName=TAMPCVP&baseUrl=&videoAdsection=nba.com_mobile_web_teamsites_chicagobulls&ampEnv=',
'only_matching': True,
}, {
'url': 'https://secure.nba.com/assets/amp/include/video/iframe.html?contentId=2016/10/29/0021600027boschaplay7&adFree=false&profile=71&team=&videoPlayerName=LAMPCVP',
'only_matching': True,
}]
def _real_extract(self, url):
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
content_id = qs['contentId'][0]
team = qs.get('team', [None])[0]
if not team:
return self.url_result(
'https://watch.nba.com/video/' + content_id, NBAWatchIE.ie_key())
video = self._call_api(team, content_id, {'videoid': content_id}, 'video')[0]
return self._extract_video(video, team)
class NBAIE(NBABaseIE):
IE_NAME = 'nba'
_VALID_URL = NBABaseIE._VALID_URL_BASE + '(?!%s)video/(?P<id>(?:[^/]+/)*[^/?#&]+)' % NBABaseIE._CHANNEL_PATH_REGEX
_TESTS = [{
'url': 'https://www.nba.com/bulls/video/teams/bulls/2020/12/04/3478774/1607105587854-20201204schedulereleasefinaldrupal-3478774',
'info_dict': {
'id': '45039',
'ext': 'mp4',
'title': 'AND WE BACK.',
'description': 'Part 1 of our 2020-21 schedule is here! Watch our games on NBC Sports Chicago.',
'duration': 94,
'timestamp': 1607112000,
'upload_date': '20201218',
},
}, {
'url': 'https://www.nba.com/bucks/play#/video/teams%2Fbucks%2F2020%2F12%2F17%2F64860%2F1608252863446-Op_Dream_16x9-64860',
'only_matching': True,
}, {
'url': 'https://www.nba.com/bucks/play#/video/wsc%2Fteams%2F2787C911AA1ACD154B5377F7577CCC7134B2A4B0',
'only_matching': True,
}]
_CONTENT_ID_REGEX = r'videoID'
def _extract_url_results(self, team, content_id):
return self._embed_url_result(team, content_id)
class NBAChannelIE(NBABaseIE):
IE_NAME = 'nba:channel'
_VALID_URL = NBABaseIE._VALID_URL_BASE + '(?:%s)/(?P<id>[^/?#&]+)' % NBABaseIE._CHANNEL_PATH_REGEX
_TESTS = [{
'url': 'https://www.nba.com/blazers/video/channel/summer_league',
'info_dict': {
'title': 'Summer League',
},
'playlist_mincount': 138,
}, {
'url': 'https://www.nba.com/bucks/play#/series/On%20This%20Date',
'only_matching': True,
}]
_CONTENT_ID_REGEX = r'videoSubCategory'
_PAGE_SIZE = 100
def _fetch_page(self, team, channel, page):
results = self._call_api(team, channel, {
'channels': channel,
'count': self._PAGE_SIZE,
'offset': page * self._PAGE_SIZE,
}, 'page %d' % (page + 1))
for video in results:
yield self._extract_video(video, team, False)
def _extract_url_results(self, team, content_id):
entries = OnDemandPagedList(
functools.partial(self._fetch_page, team, content_id),
self._PAGE_SIZE)
return self.playlist_result(entries, playlist_title=content_id)
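
Note on paging in this file: `OnDemandPagedList` hands `_fetch_page` a zero-based page index. The channel API in the diff is queried with an item `offset`, while the collection API is queried with a one-based `page`. A minimal sketch of the two query shapes, using only the parameter names that appear in the diff; `channel_page_query` and `collection_page_query` are hypothetical helpers, not part of the extractor:

```python
PAGE_SIZE = 100


def channel_page_query(channel, page, page_size=PAGE_SIZE):
    """Query for a zero-based page, as built in NBAChannelIE._fetch_page."""
    return {
        'channels': channel,
        'count': page_size,
        'offset': page * page_size,
    }


def collection_page_query(page, page_size=PAGE_SIZE):
    """Query for a zero-based page, sent as a one-based page number."""
    return {
        'count': page_size,
        'page': page + 1,
    }
```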


@ -158,7 +158,8 @@ def _real_extract(self, url):
class NBCSportsVPlayerIE(InfoExtractor): class NBCSportsVPlayerIE(InfoExtractor):
_VALID_URL = r'https?://vplayer\.nbcsports\.com/(?:[^/]+/)+(?P<id>[0-9a-zA-Z_]+)' _VALID_URL_BASE = r'https?://(?:vplayer\.nbcsports\.com|(?:www\.)?nbcsports\.com/vplayer)/'
_VALID_URL = _VALID_URL_BASE + r'(?:[^/]+/)+(?P<id>[0-9a-zA-Z_]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/9CsDKds0kvHI', 'url': 'https://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/9CsDKds0kvHI',
@ -174,12 +175,15 @@ class NBCSportsVPlayerIE(InfoExtractor):
}, { }, {
'url': 'https://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/_hqLjQ95yx8Z', 'url': 'https://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/_hqLjQ95yx8Z',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.nbcsports.com/vplayer/p/BxmELC/nbcsports/select/PHJSaFWbrTY9?form=html&autoPlay=true',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
iframe_m = re.search( iframe_m = re.search(
r'<iframe[^>]+src="(?P<url>https?://vplayer\.nbcsports\.com/[^"]+)"', webpage) r'<(?:iframe[^>]+|div[^>]+data-(?:mpx-)?)src="(?P<url>%s[^"]+)"' % NBCSportsVPlayerIE._VALID_URL_BASE, webpage)
if iframe_m: if iframe_m:
return iframe_m.group('url') return iframe_m.group('url')
@ -192,21 +196,29 @@ def _real_extract(self, url):
class NBCSportsIE(InfoExtractor): class NBCSportsIE(InfoExtractor):
# Does not include https because its certificate is invalid _VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?!vplayer/)(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
_TEST = { _TESTS = [{
# iframe src
'url': 'http://www.nbcsports.com//college-basketball/ncaab/tom-izzo-michigan-st-has-so-much-respect-duke', 'url': 'http://www.nbcsports.com//college-basketball/ncaab/tom-izzo-michigan-st-has-so-much-respect-duke',
'info_dict': { 'info_dict': {
'id': 'PHJSaFWbrTY9', 'id': 'PHJSaFWbrTY9',
'ext': 'flv', 'ext': 'mp4',
'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke', 'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke',
'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113', 'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113',
'uploader': 'NBCU-SPORTS', 'uploader': 'NBCU-SPORTS',
'upload_date': '20150330', 'upload_date': '20150330',
'timestamp': 1427726529, 'timestamp': 1427726529,
} }
} }, {
# data-mpx-src
'url': 'https://www.nbcsports.com/philadelphia/philadelphia-phillies/bruce-bochy-hector-neris-hes-idiot',
'only_matching': True,
}, {
# data-src
'url': 'https://www.nbcsports.com/boston/video/report-card-pats-secondary-no-match-josh-allen',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -274,33 +286,6 @@ def _real_extract(self, url):
} }
class CSNNEIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)'
_TEST = {
'url': 'http://www.csnne.com/video/snc-evening-update-wright-named-red-sox-no-5-starter',
'info_dict': {
'id': 'yvBLLUgQ8WU0',
'ext': 'mp4',
'title': 'SNC evening update: Wright named Red Sox\' No. 5 starter.',
'description': 'md5:1753cfee40d9352b19b4c9b3e589b9e3',
'timestamp': 1459369979,
'upload_date': '20160330',
'uploader': 'NBCU-SPORTS',
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': self._html_search_meta('twitter:player:stream', webpage),
'display_id': display_id,
}
class NBCNewsIE(ThePlatformIE): class NBCNewsIE(ThePlatformIE):
_VALID_URL = r'(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/([^/]+/)*(?:.*-)?(?P<id>[^/?]+)' _VALID_URL = r'(?x)https?://(?:www\.)?(?:nbcnews|today|msnbc)\.com/([^/]+/)*(?:.*-)?(?P<id>[^/?]+)'
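
Note on the `_extract_url` change in this file: the new pattern accepts an `<iframe src=...>` as well as a `<div>` carrying `data-src` or `data-mpx-src`, and it reuses `_VALID_URL_BASE` so both player hosts match. A minimal sketch with the two patterns copied from the diff; the HTML snippets in the usage below are made-up inputs, and `extract_url` is a standalone stand-in for the static method:

```python
import re

VALID_URL_BASE = r'https?://(?:vplayer\.nbcsports\.com|(?:www\.)?nbcsports\.com/vplayer)/'
EMBED_RE = re.compile(
    r'<(?:iframe[^>]+|div[^>]+data-(?:mpx-)?)src="(?P<url>%s[^"]+)"' % VALID_URL_BASE)


def extract_url(webpage):
    """Return the first embedded vplayer URL found in the page, else None."""
    m = EMBED_RE.search(webpage)
    return m.group('url') if m else None


# Hypothetical page fragments covering the three embed styles.
samples = [
    '<iframe width="640" src="https://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/9CsDKds0kvHI"></iframe>',
    '<div class="player" data-mpx-src="https://www.nbcsports.com/vplayer/p/BxmELC/nbcsports/select/PHJSaFWbrTY9?form=html"></div>',
    '<div data-src="https://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/_hqLjQ95yx8Z"></div>',
]
urls = [extract_url(s) for s in samples]
```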


@ -4,19 +4,15 @@
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlparse,
)
from ..utils import ( from ..utils import (
ExtractorError, clean_html,
int_or_none, determine_ext,
remove_end, get_element_by_class,
) )
class NFLIE(InfoExtractor): class NFLBaseIE(InfoExtractor):
IE_NAME = 'nfl.com' _VALID_URL_BASE = r'''(?x)
_VALID_URL = r'''(?x)
https?:// https?://
(?P<host> (?P<host>
(?:www\.)? (?:www\.)?
@ -34,15 +30,15 @@ class NFLIE(InfoExtractor):
houstontexans| houstontexans|
colts| colts|
jaguars| jaguars|
titansonline| (?:titansonline|tennesseetitans)|
denverbroncos| denverbroncos|
kcchiefs| (?:kc)?chiefs|
raiders| raiders|
chargers| chargers|
dallascowboys| dallascowboys|
giants| giants|
philadelphiaeagles| philadelphiaeagles|
redskins| (?:redskins|washingtonfootball)|
chicagobears| chicagobears|
detroitlions| detroitlions|
packers| packers|
@ -52,180 +48,113 @@ class NFLIE(InfoExtractor):
neworleanssaints| neworleanssaints|
buccaneers| buccaneers|
azcardinals| azcardinals|
stlouisrams| (?:stlouis|the)rams|
49ers| 49ers|
seahawks seahawks
)\.com| )\.com|
.+?\.clubs\.nfl\.com .+?\.clubs\.nfl\.com
) )
)/ )/
(?:.+?/)*
(?P<id>[^/#?&]+)
''' '''
_VIDEO_CONFIG_REGEX = r'<script[^>]+id="[^"]*video-config-[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}[^"]*"[^>]*>\s*({.+})'
_WORKING = False
def _parse_video_config(self, video_config, display_id):
video_config = self._parse_json(video_config, display_id)
item = video_config['playlist'][0]
mcp_id = item.get('mcpID')
if mcp_id:
info = self.url_result(
'anvato:GXvEgwyJeWem8KCYXfeoHWknwP48Mboj:' + mcp_id,
'Anvato', mcp_id)
else:
media_id = item.get('id') or item['entityId']
title = item['title']
item_url = item['url']
info = {'id': media_id}
ext = determine_ext(item_url)
if ext == 'm3u8':
info['formats'] = self._extract_m3u8_formats(item_url, media_id, 'mp4')
self._sort_formats(info['formats'])
else:
info['url'] = item_url
if item.get('audio') is True:
info['vcodec'] = 'none'
is_live = video_config.get('live') is True
thumbnails = None
image_url = item.get(item.get('imageSrc')) or item.get(item.get('posterImage'))
if image_url:
thumbnails = [{
'url': image_url,
'ext': determine_ext(image_url, 'jpg'),
}]
info.update({
'title': self._live_title(title) if is_live else title,
'is_live': is_live,
'description': clean_html(item.get('description')),
'thumbnails': thumbnails,
})
return info
class NFLIE(NFLBaseIE):
IE_NAME = 'nfl.com'
_VALID_URL = NFLBaseIE._VALID_URL_BASE + r'(?:videos?|listen|audio)/(?P<id>[^/#?&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.nfl.com/videos/nfl-game-highlights/0ap3000000398478/Week-3-Redskins-vs-Eagles-highlights', 'url': 'https://www.nfl.com/videos/baker-mayfield-s-game-changing-plays-from-3-td-game-week-14',
'md5': '394ef771ddcd1354f665b471d78ec4c6',
'info_dict': { 'info_dict': {
'id': '0ap3000000398478', 'id': '899441',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Week 3: Redskins vs. Eagles highlights', 'title': "Baker Mayfield's game-changing plays from 3-TD game Week 14",
'description': 'md5:56323bfb0ac4ee5ab24bd05fdf3bf478', 'description': 'md5:85e05a3cc163f8c344340f220521136d',
'upload_date': '20140921', 'upload_date': '20201215',
'timestamp': 1411337580, 'timestamp': 1608009755,
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'NFL',
} }
}, { }, {
'url': 'http://prod.www.steelers.clubs.nfl.com/video-and-audio/videos/LIVE_Post_Game_vs_Browns/9d72f26a-9e2b-4718-84d3-09fb4046c266', 'url': 'https://www.chiefs.com/listen/patrick-mahomes-travis-kelce-react-to-win-over-dolphins-the-breakdown',
'md5': 'cf85bdb4bc49f6e9d3816d130c78279c', 'md5': '6886b32c24b463038c760ceb55a34566',
'info_dict': { 'info_dict': {
'id': '9d72f26a-9e2b-4718-84d3-09fb4046c266', 'id': 'd87e8790-3e14-11eb-8ceb-ff05c2867f99',
'ext': 'mp4', 'ext': 'mp3',
'title': 'LIVE: Post Game vs. Browns', 'title': 'Patrick Mahomes, Travis Kelce React to Win Over Dolphins | The Breakdown',
'description': 'md5:6a97f7e5ebeb4c0e69a418a89e0636e8', 'description': 'md5:12ada8ee70e6762658c30e223e095075',
'upload_date': '20131229',
'timestamp': 1388354455,
'thumbnail': r're:^https?://.*\.jpg$',
} }
}, { }, {
'url': 'http://www.nfl.com/news/story/0ap3000000467586/article/patriots-seahawks-involved-in-lategame-skirmish', 'url': 'https://www.buffalobills.com/video/buffalo-bills-military-recognition-week-14',
'info_dict': {
'id': '0ap3000000467607',
'ext': 'mp4',
'title': 'Frustrations flare on the field',
'description': 'Emotions ran high at the end of the Super Bowl on both sides of the ball after a dramatic finish.',
'timestamp': 1422850320,
'upload_date': '20150202',
},
}, {
'url': 'http://www.patriots.com/video/2015/09/18/10-days-gillette',
'md5': '4c319e2f625ffd0b481b4382c6fc124c',
'info_dict': {
'id': 'n-238346',
'ext': 'mp4',
'title': '10 Days at Gillette',
'description': 'md5:8cd9cd48fac16de596eadc0b24add951',
'timestamp': 1442618809,
'upload_date': '20150918',
},
}, {
# lowercase data-contentid
'url': 'http://www.steelers.com/news/article-1/Tomlin-on-Ben-getting-Vick-ready/56399c96-4160-48cf-a7ad-1d17d4a3aef7',
'info_dict': {
'id': '12693586-6ea9-4743-9c1c-02c59e4a5ef2',
'ext': 'mp4',
'title': 'Tomlin looks ahead to Ravens on a short week',
'description': 'md5:32f3f7b139f43913181d5cbb24ecad75',
'timestamp': 1443459651,
'upload_date': '20150928',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.nfl.com/videos/nfl-network-top-ten/09000d5d810a6bd4/Top-10-Gutsiest-Performances-Jack-Youngblood',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.buffalobills.com/video/videos/Rex_Ryan_Show_World_Wide_Rex/b1dcfab2-3190-4bb1-bfc0-d6e603d6601a', 'url': 'https://www.raiders.com/audio/instant-reactions-raiders-week-14-loss-to-indianapolis-colts-espn-jason-fitz',
'only_matching': True, 'only_matching': True,
}] }]
@staticmethod def _real_extract(self, url):
def prepend_host(host, url): display_id = self._match_id(url)
if not url.startswith('http'): webpage = self._download_webpage(url, display_id)
if not url.startswith('/'): return self._parse_video_config(self._search_regex(
url = '/%s' % url self._VIDEO_CONFIG_REGEX, webpage, 'video config'), display_id)
url = 'http://{0:}{1:}'.format(host, url)
return url
@staticmethod
def format_from_stream(stream, protocol, host, path_prefix='', class NFLArticleIE(NFLBaseIE):
preference=0, note=None): IE_NAME = 'nfl.com:article'
url = '{protocol:}://{host:}/{prefix:}{path:}'.format( _VALID_URL = NFLBaseIE._VALID_URL_BASE + r'news/(?P<id>[^/#?&]+)'
protocol=protocol, _TEST = {
host=host, 'url': 'https://www.buffalobills.com/news/the-only-thing-we-ve-earned-is-the-noise-bills-coaches-discuss-handling-rising-e',
prefix=path_prefix, 'info_dict': {
path=stream.get('path'), 'id': 'the-only-thing-we-ve-earned-is-the-noise-bills-coaches-discuss-handling-rising-e',
) 'title': "'The only thing we've earned is the noise' | Bills coaches discuss handling rising expectations",
return { },
'url': url, 'playlist_count': 4,
'vbr': int_or_none(stream.get('rate', 0), 1000), }
'preference': preference,
'format_note': note,
}
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id = self._match_id(url)
video_id, host = mobj.group('id'), mobj.group('host') webpage = self._download_webpage(url, display_id)
entries = []
webpage = self._download_webpage(url, video_id) for video_config in re.findall(self._VIDEO_CONFIG_REGEX, webpage):
entries.append(self._parse_video_config(video_config, display_id))
config_url = NFLIE.prepend_host(host, self._search_regex( title = clean_html(get_element_by_class(
r'(?:(?:config|configURL)\s*:\s*|<nflcs:avplayer[^>]+data-config\s*=\s*)(["\'])(?P<config>.+?)\1', 'nfl-c-article__title', webpage)) or self._html_search_meta(
webpage, 'config URL', default='static/content/static/config/video/config.json', ['og:title', 'twitter:title'], webpage)
group='config')) return self.playlist_result(entries, display_id, title)
# For articles, the id in the url is not the video id
video_id = self._search_regex(
r'(?:<nflcs:avplayer[^>]+data-content[Ii]d\s*=\s*|content[Ii]d\s*:\s*)(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'video id', default=video_id, group='id')
config = self._download_json(config_url, video_id, 'Downloading player config')
url_template = NFLIE.prepend_host(
host, '{contentURLTemplate:}'.format(**config))
video_data = self._download_json(
url_template.format(id=video_id), video_id)
formats = []
cdn_data = video_data.get('cdnData', {})
streams = cdn_data.get('bitrateInfo', [])
if cdn_data.get('format') == 'EXTERNAL_HTTP_STREAM':
parts = compat_urllib_parse_urlparse(cdn_data.get('uri'))
protocol, host = parts.scheme, parts.netloc
for stream in streams:
formats.append(
NFLIE.format_from_stream(stream, protocol, host))
else:
cdns = config.get('cdns')
if not cdns:
raise ExtractorError('Failed to get CDN data', expected=True)
for name, cdn in cdns.items():
# LimeLight streams don't seem to work
if cdn.get('name') == 'LIMELIGHT':
continue
protocol = cdn.get('protocol')
host = remove_end(cdn.get('host', ''), '/')
if not (protocol and host):
continue
prefix = cdn.get('pathprefix', '')
if prefix and not prefix.endswith('/'):
prefix = '%s/' % prefix
preference = 0
if protocol == 'rtmp':
preference = -2
elif 'prog' in name.lower():
preference = 1
for stream in streams:
formats.append(
NFLIE.format_from_stream(stream, protocol, host,
prefix, preference, name))
self._sort_formats(formats)
thumbnail = None
for q in ('xl', 'l', 'm', 's', 'xs'):
thumbnail = video_data.get('imagePaths', {}).get(q)
if thumbnail:
break
return {
'id': video_id,
'title': video_data.get('headline'),
'formats': formats,
'description': video_data.get('caption'),
'duration': video_data.get('duration'),
'thumbnail': thumbnail,
'timestamp': int_or_none(video_data.get('posted'), 1000),
}

@ -3,51 +3,33 @@
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import urljoin
class NhkVodIE(InfoExtractor): class NhkBaseIE(InfoExtractor):
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[^/]+?-\d{8}-\d+)' _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/%s/%s/%s/all%s.json'
# Content available only for a limited period of time. Visit _BASE_URL_REGEX = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand'
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples. _TYPE_REGEX = r'/(?P<type>video|audio)/'
_TESTS = [{
# clip
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/9999011/',
'md5': '256a1be14f48d960a7e61e2532d95ec3',
'info_dict': {
'id': 'a95j5iza',
'ext': 'mp4',
'title': "Dining with the Chef - Chef Saito's Family recipe: MENCHI-KATSU",
'description': 'md5:5aee4a9f9d81c26281862382103b0ea5',
'timestamp': 1565965194,
'upload_date': '20190816',
},
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2015173/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
'only_matching': True,
}]
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/episode/%s/%s/all%s.json'
def _real_extract(self, url): def _call_api(self, m_id, lang, is_video, is_episode, is_clip):
lang, m_type, episode_id = re.match(self._VALID_URL, url).groups() return self._download_json(
self._API_URL_TEMPLATE % (
'v' if is_video else 'r',
'clip' if is_clip else 'esd',
'episode' if is_episode else 'program',
m_id, lang, '/all' if is_video else ''),
m_id, query={'apikey': 'EJfK8jdS57GqlupFgAfAAwr573q01y6k'})['data']['episodes'] or []
def _extract_episode_info(self, url, episode=None):
fetch_episode = episode is None
lang, m_type, episode_id = re.match(NhkVodIE._VALID_URL, url).groups()
if episode_id.isdigit(): if episode_id.isdigit():
episode_id = episode_id[:4] + '-' + episode_id[4:] episode_id = episode_id[:4] + '-' + episode_id[4:]
is_video = m_type == 'video' is_video = m_type == 'video'
episode = self._download_json( if fetch_episode:
self._API_URL_TEMPLATE % ( episode = self._call_api(
'v' if is_video else 'r', episode_id, lang, is_video, True, episode_id[:4] == '9999')[0]
'clip' if episode_id[:4] == '9999' else 'esd',
episode_id, lang, '/all' if is_video else ''),
episode_id, query={'apikey': 'EJfK8jdS57GqlupFgAfAAwr573q01y6k'})['data']['episodes'][0]
title = episode.get('sub_title_clean') or episode['sub_title'] title = episode.get('sub_title_clean') or episode['sub_title']
def get_clean_field(key): def get_clean_field(key):
@ -76,18 +58,121 @@ def get_clean_field(key):
'episode': title, 'episode': title,
} }
if is_video: if is_video:
vod_id = episode['vod_id']
info.update({ info.update({
'_type': 'url_transparent', '_type': 'url_transparent',
'ie_key': 'Piksel', 'ie_key': 'Piksel',
'url': 'https://player.piksel.com/v/refid/nhkworld/prefid/' + episode['vod_id'], 'url': 'https://player.piksel.com/v/refid/nhkworld/prefid/' + vod_id,
'id': vod_id,
}) })
else: else:
audio = episode['audio'] if fetch_episode:
audio_path = audio['audio'] audio_path = episode['audio']['audio']
info['formats'] = self._extract_m3u8_formats( info['formats'] = self._extract_m3u8_formats(
'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path, 'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
episode_id, 'm4a', entry_protocol='m3u8_native', episode_id, 'm4a', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False) m3u8_id='hls', fatal=False)
for f in info['formats']: for f in info['formats']:
f['language'] = lang f['language'] = lang
else:
info.update({
'_type': 'url_transparent',
'ie_key': NhkVodIE.ie_key(),
'url': url,
})
return info return info
class NhkVodIE(NhkBaseIE):
_VALID_URL = r'%s%s(?P<id>\d{7}|[^/]+?-\d{8}-[0-9a-z]+)' % (NhkBaseIE._BASE_URL_REGEX, NhkBaseIE._TYPE_REGEX)
# Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{
# video clip
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/9999011/',
'md5': '7a90abcfe610ec22a6bfe15bd46b30ca',
'info_dict': {
'id': 'a95j5iza',
'ext': 'mp4',
'title': "Dining with the Chef - Chef Saito's Family recipe: MENCHI-KATSU",
'description': 'md5:5aee4a9f9d81c26281862382103b0ea5',
'timestamp': 1565965194,
'upload_date': '20190816',
},
}, {
# audio clip
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/r_inventions-20201104-1/',
'info_dict': {
'id': 'r_inventions-20201104-1-en',
'ext': 'm4a',
'title': "Japan's Top Inventions - Miniature Video Cameras",
'description': 'md5:07ea722bdbbb4936fdd360b6a480c25b',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2015173/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
'only_matching': True,
}]
def _real_extract(self, url):
return self._extract_episode_info(url)
class NhkVodProgramIE(NhkBaseIE):
_VALID_URL = r'%s/program%s(?P<id>[0-9a-z]+)(?:.+?\btype=(?P<episode_type>clip|(?:radio|tv)Episode))?' % (NhkBaseIE._BASE_URL_REGEX, NhkBaseIE._TYPE_REGEX)
_TESTS = [{
# video program episodes
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/video/japanrailway',
'info_dict': {
'id': 'japanrailway',
'title': 'Japan Railway Journal',
},
'playlist_mincount': 1,
}, {
# video program clips
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/video/japanrailway/?type=clip',
'info_dict': {
'id': 'japanrailway',
'title': 'Japan Railway Journal',
},
'playlist_mincount': 5,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/video/10yearshayaomiyazaki/',
'only_matching': True,
}, {
# audio program
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/program/audio/listener/',
'only_matching': True,
}]
def _real_extract(self, url):
lang, m_type, program_id, episode_type = re.match(self._VALID_URL, url).groups()
episodes = self._call_api(
program_id, lang, m_type == 'video', False, episode_type == 'clip')
entries = []
for episode in episodes:
episode_path = episode.get('url')
if not episode_path:
continue
entries.append(self._extract_episode_info(
urljoin(url, episode_path), episode))
program_title = None
if entries:
program_title = entries[0].get('series')
return self.playlist_result(entries, program_id, program_title)
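
The way `_call_api` fills in `_API_URL_TEMPLATE` can be sketched in isolation. The template string and the flag-to-segment mapping below are copied from the hunk above; the function names `normalize_episode_id` and `build_api_url` are hypothetical, and the sketch only builds the URL without sending any request.

```python
API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/%s/%s/%s/all%s.json'


def normalize_episode_id(episode_id):
    # Purely numeric ids (e.g. '9999011') get a dash after the fourth digit.
    if episode_id.isdigit():
        return episode_id[:4] + '-' + episode_id[4:]
    return episode_id


def build_api_url(m_id, lang, is_video, is_episode, is_clip):
    # Same positional mapping as in the diff: media kind, list kind,
    # episode/program lookup, id, language, and an extra '/all' for video.
    return API_URL_TEMPLATE % (
        'v' if is_video else 'r',
        'clip' if is_clip else 'esd',
        'episode' if is_episode else 'program',
        m_id, lang, '/all' if is_video else '')


episode_id = normalize_episode_id('9999011')
print(build_api_url(episode_id, 'en', True, True, episode_id[:4] == '9999'))
```

Keeping the episode/program switch as a template slot is what lets the new program extractor reuse the same request helper as the single-episode extractor.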

@ -1,20 +1,23 @@
 # coding: utf-8
 from __future__ import unicode_literals

-import json
 import datetime
+import functools
+import json
+import math

 from .common import InfoExtractor
 from ..compat import (
     compat_parse_qs,
-    compat_urlparse,
+    compat_urllib_parse_urlparse,
 )
 from ..utils import (
     determine_ext,
     dict_get,
     ExtractorError,
-    int_or_none,
     float_or_none,
+    InAdvancePagedList,
+    int_or_none,
     parse_duration,
     parse_iso8601,
     remove_start,
@ -181,7 +184,7 @@ def _login(self):
         if urlh is False:
             login_ok = False
         else:
-            parts = compat_urlparse.urlparse(urlh.geturl())
+            parts = compat_urllib_parse_urlparse(urlh.geturl())
             if compat_parse_qs(parts.query).get('message', [None])[0] == 'cant_login':
                 login_ok = False
         if not login_ok:
@ -292,7 +295,7 @@ def _format_id_from_url(video_url):
                 'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
                 video_id, 'Downloading flv info')

-            flv_info = compat_urlparse.parse_qs(flv_info_webpage)
+            flv_info = compat_parse_qs(flv_info_webpage)
             if 'url' not in flv_info:
                 if 'deleted' in flv_info:
                     raise ExtractorError('The video has been deleted.',
@ -437,34 +440,76 @@ def get_video_info(items):
class NiconicoPlaylistIE(InfoExtractor): class NiconicoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/mylist/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/(?:user/\d+/)?mylist/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.nicovideo.jp/mylist/27411728', 'url': 'http://www.nicovideo.jp/mylist/27411728',
'info_dict': { 'info_dict': {
'id': '27411728', 'id': '27411728',
'title': 'AKB48のオールナイトニッポン', 'title': 'AKB48のオールナイトニッポン',
'description': 'md5:d89694c5ded4b6c693dea2db6e41aa08',
'uploader': 'のっく',
'uploader_id': '805442',
}, },
'playlist_mincount': 225, 'playlist_mincount': 225,
} }, {
'url': 'https://www.nicovideo.jp/user/805442/mylist/27411728',
'only_matching': True,
}]
_PAGE_SIZE = 100
def _call_api(self, list_id, resource, query):
return self._download_json(
'https://nvapi.nicovideo.jp/v2/mylists/' + list_id, list_id,
'Downloading %s JSON metadata' % resource, query=query,
headers={'X-Frontend-Id': 6})['data']['mylist']
def _parse_owner(self, item):
owner = item.get('owner') or {}
if owner:
return {
'uploader': owner.get('name'),
'uploader_id': owner.get('id'),
}
return {}
def _fetch_page(self, list_id, page):
page += 1
items = self._call_api(list_id, 'page %d' % page, {
'page': page,
'pageSize': self._PAGE_SIZE,
})['items']
for item in items:
video = item.get('video') or {}
video_id = video.get('id')
if not video_id:
continue
count = video.get('count') or {}
get_count = lambda x: int_or_none(count.get(x))
info = {
'_type': 'url',
'id': video_id,
'title': video.get('title'),
'url': 'https://www.nicovideo.jp/watch/' + video_id,
'description': video.get('shortDescription'),
'duration': int_or_none(video.get('duration')),
'view_count': get_count('view'),
'comment_count': get_count('comment'),
'ie_key': NiconicoIE.ie_key(),
}
info.update(self._parse_owner(video))
yield info
def _real_extract(self, url): def _real_extract(self, url):
list_id = self._match_id(url) list_id = self._match_id(url)
webpage = self._download_webpage(url, list_id) mylist = self._call_api(list_id, 'list', {
'pageSize': 1,
entries_json = self._search_regex(r'Mylist\.preload\(\d+, (\[.*\])\);', })
webpage, 'entries') entries = InAdvancePagedList(
entries = json.loads(entries_json) functools.partial(self._fetch_page, list_id),
entries = [{ math.ceil(mylist['totalItemCount'] / self._PAGE_SIZE),
'_type': 'url', self._PAGE_SIZE)
'ie_key': NiconicoIE.ie_key(), result = self.playlist_result(
'url': ('http://www.nicovideo.jp/watch/%s' % entries, list_id, mylist.get('name'), mylist.get('description'))
entry['item_data']['video_id']), result.update(self._parse_owner(mylist))
} for entry in entries] return result
return {
'_type': 'playlist',
'title': self._search_regex(r'\s+name: "(.*?)"', webpage, 'title'),
'id': list_id,
'entries': entries,
}
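
The page count passed to `InAdvancePagedList` above is the ceiling of `totalItemCount / _PAGE_SIZE`. A minimal sketch of that arithmetic follows, with hypothetical helper names `page_count` and `page_numbers`. It uses integer ceiling division rather than `math.ceil`, on the assumption that the result should be the same whether `/` performs true division or integer division on the interpreter in use.

```python
def page_count(total_items, page_size=100):
    # Integer ceiling division: the smallest number of pages that holds all items.
    return (total_items + page_size - 1) // page_size


def page_numbers(total_items, page_size=100):
    # The paged fetcher in the diff converts its 0-based page index to a
    # 1-based page number (page += 1) before calling the API.
    return list(range(1, page_count(total_items, page_size) + 1))


print(page_count(225), page_numbers(225))
```

Knowing the page count up front is what allows the list to be fetched lazily, one page per request, instead of scraping the whole playlist from the HTML page.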

@ -5,10 +5,11 @@
 from .common import InfoExtractor
 from ..utils import (
-    parse_iso8601,
-    float_or_none,
     ExtractorError,
+    float_or_none,
     int_or_none,
+    parse_iso8601,
+    try_get,
 )
@ -35,7 +36,7 @@ def _real_extract(self, url):
                 '$include': '[HasClosedCaptions]',
             })

-        if content_package.get('Constraints', {}).get('Security', {}).get('Type'):
+        if try_get(content_package, lambda x: x['Constraints']['Security']['Type']):
             raise ExtractorError('This video is DRM protected.', expected=True)

         manifest_base_url = content_package_url + 'manifest.'
@ -52,7 +53,7 @@ def _real_extract(self, url):
         self._sort_formats(formats)

         thumbnails = []
-        for image in content.get('Images', []):
+        for image in (content.get('Images') or []):
             image_url = image.get('Url')
             if not image_url:
                 continue
@ -70,7 +71,7 @@ def _real_extract(self, url):
                     continue
                 container.append(e_name)

-        season = content.get('Season', {})
+        season = content.get('Season') or {}

         info = {
             'id': content_id,
@ -79,13 +80,14 @@ def _real_extract(self, url):
             'timestamp': parse_iso8601(content.get('BroadcastDateTime')),
             'episode_number': int_or_none(content.get('Episode')),
             'season': season.get('Name'),
-            'season_number': season.get('Number'),
+            'season_number': int_or_none(season.get('Number')),
             'season_id': season.get('Id'),
-            'series': content.get('Media', {}).get('Name'),
+            'series': try_get(content, lambda x: x['Media']['Name']),
             'tags': tags,
             'categories': categories,
             'duration': float_or_none(content_package.get('Duration')),
             'formats': formats,
+            'thumbnails': thumbnails,
         }

         if content_package.get('HasClosedCaptions'):
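
The switch from chained `.get(..., {})` calls to `try_get` in this file matters when an intermediate key is present but holds `None`. The sketch below illustrates the difference; `try_get_sketch` is a hypothetical, simplified stand-in written for this example, not the project's actual helper, and the sample dictionary is invented.

```python
def try_get_sketch(src, getter):
    # Simplified stand-in: return getter(src), or None if any lookup
    # along the way fails.
    try:
        return getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None


content_package = {'Constraints': None}

# A chained .get() only falls back to {} when the key is missing,
# not when the key exists with a None value.
try:
    content_package.get('Constraints', {}).get('Security', {}).get('Type')
    chained_ok = True
except AttributeError:
    chained_ok = False

drm_type = try_get_sketch(
    content_package, lambda x: x['Constraints']['Security']['Type'])
print(chained_ok, drm_type)
```

The same reasoning applies to the `content.get('Images') or []` and `content.get('Season') or {}` changes: the `or` fallback also covers a key that is present with a null value.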

File diff suppressed because it is too large

@ -541,6 +541,10 @@ def _real_extract(self, url):
'format_id': format_id, 'format_id': format_id,
'filesize': file_size, 'filesize': file_size,
}) })
if format_id == '0p':
f['vcodec'] = 'none'
else:
f['fps'] = int_or_none(file_.get('fps'))
formats.append(f) formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)

View File

@ -6,16 +6,33 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
ExtractorError,
dict_get, dict_get,
ExtractorError,
int_or_none, int_or_none,
unescapeHTML,
parse_iso8601, parse_iso8601,
try_get,
unescapeHTML,
) )
class PikselIE(InfoExtractor): class PikselIE(InfoExtractor):
_VALID_URL = r'https?://player\.piksel\.com/v/(?:refid/[^/]+/prefid/)?(?P<id>[a-z0-9_]+)' _VALID_URL = r'''(?x)https?://
(?:
(?:
player\.
(?:
olympusattelecom|
vibebyvista
)|
(?:api|player)\.multicastmedia|
(?:api-ovp|player)\.piksel
)\.com|
(?:
mz-edge\.stream\.co|
movie-s\.nhk\.or
)\.jp|
vidego\.baltimorecity\.gov
)/v/(?:refid/(?P<refid>[^/]+)/prefid/)?(?P<id>[\w-]+)'''
_TESTS = [ _TESTS = [
{ {
'url': 'http://player.piksel.com/v/ums2867l', 'url': 'http://player.piksel.com/v/ums2867l',
@ -56,46 +73,41 @@ def _extract_url(webpage):
if mobj: if mobj:
return mobj.group('url') return mobj.group('url')
def _call_api(self, app_token, resource, display_id, query, fatal=True):
response = (self._download_json(
'http://player.piksel.com/ws/ws_%s/api/%s/mode/json/apiv/5' % (resource, app_token),
display_id, query=query, fatal=fatal) or {}).get('response')
failure = try_get(response, lambda x: x['failure']['reason'])
if failure:
if fatal:
raise ExtractorError(failure, expected=True)
self.report_warning(failure)
return response
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) ref_id, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-de-program-uuid=[\'"]([a-z0-9]+)',
webpage, 'program uuid', default=display_id)
app_token = self._search_regex([ app_token = self._search_regex([
r'clientAPI\s*:\s*"([^"]+)"', r'clientAPI\s*:\s*"([^"]+)"',
r'data-de-api-key\s*=\s*"([^"]+)"' r'data-de-api-key\s*=\s*"([^"]+)"'
], webpage, 'app token') ], webpage, 'app token')
response = self._download_json( query = {'refid': ref_id, 'prefid': display_id} if ref_id else {'v': display_id}
'http://player.piksel.com/ws/ws_program/api/%s/mode/json/apiv/5' % app_token, program = self._call_api(
video_id, query={ app_token, 'program', display_id, query)['WsProgramResponse']['program']
'v': video_id video_id = program['uuid']
})['response'] video_data = program['asset']
failure = response.get('failure')
if failure:
raise ExtractorError(response['failure']['reason'], expected=True)
video_data = response['WsProgramResponse']['program']['asset']
title = video_data['title'] title = video_data['title']
asset_type = dict_get(video_data, ['assetType', 'asset_type'])
formats = [] formats = []
m3u8_url = dict_get(video_data, [ def process_asset_file(asset_file):
'm3u8iPadURL', if not asset_file:
'ipadM3u8Url', return
'm3u8AndroidURL',
'm3u8iPhoneURL',
'iphoneM3u8Url'])
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
asset_type = dict_get(video_data, ['assetType', 'asset_type'])
for asset_file in video_data.get('assetFiles', []):
# TODO: extract rtmp formats # TODO: extract rtmp formats
http_url = asset_file.get('http_url') http_url = asset_file.get('http_url')
if not http_url: if not http_url:
continue return
tbr = None tbr = None
vbr = int_or_none(asset_file.get('videoBitrate'), 1024) vbr = int_or_none(asset_file.get('videoBitrate'), 1024)
abr = int_or_none(asset_file.get('audioBitrate'), 1024) abr = int_or_none(asset_file.get('audioBitrate'), 1024)
@ -118,6 +130,43 @@ def _real_extract(self, url):
'filesize': int_or_none(asset_file.get('filesize')), 'filesize': int_or_none(asset_file.get('filesize')),
'tbr': tbr, 'tbr': tbr,
}) })
def process_asset_files(asset_files):
for asset_file in (asset_files or []):
process_asset_file(asset_file)
process_asset_files(video_data.get('assetFiles'))
process_asset_file(video_data.get('referenceFile'))
if not formats:
asset_id = video_data.get('assetid') or program.get('assetid')
if asset_id:
process_asset_files(try_get(self._call_api(
app_token, 'asset_file', display_id, {
'assetid': asset_id,
}, False), lambda x: x['WsAssetFileResponse']['AssetFiles']))
m3u8_url = dict_get(video_data, [
'm3u8iPadURL',
'ipadM3u8Url',
'm3u8AndroidURL',
'm3u8iPhoneURL',
'iphoneM3u8Url'])
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
smil_url = dict_get(video_data, ['httpSmil', 'hdSmil', 'rtmpSmil'])
if smil_url:
transform_source = None
if ref_id == 'nhkworld':
# TODO: figure out if this is something to be fixed in urljoin,
# _parse_smil_formats or keep it here
transform_source = lambda x: x.replace('src="/', 'src="').replace('/media"', '/media/"')
formats.extend(self._extract_smil_formats(
re.sub(r'/od/[^/]+/', '/od/http/', smil_url), video_id,
transform_source=transform_source, fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}
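
How the new `_real_extract` picks the program query from the URL can be sketched as follows. The pattern here is deliberately simplified to `player.piksel.com` (the pattern in the diff accepts several more hosts), and `build_program_query` is a hypothetical name used only for this illustration.

```python
import re

# Simplified pattern: only player.piksel.com, same refid/prefid/id groups as the diff.
PIKSEL_URL_RE = (
    r'https?://player\.piksel\.com/v/'
    r'(?:refid/(?P<refid>[^/]+)/prefid/)?(?P<id>[\w-]+)')


def build_program_query(url):
    mobj = re.match(PIKSEL_URL_RE, url)
    if not mobj:
        return None
    ref_id, display_id = mobj.group('refid'), mobj.group('id')
    # With a refid, the program is addressed by (refid, prefid);
    # otherwise it is addressed by its short id.
    if ref_id:
        return {'refid': ref_id, 'prefid': display_id}
    return {'v': display_id}


print(build_program_query('http://player.piksel.com/v/ums2867l'))
print(build_program_query(
    'https://player.piksel.com/v/refid/nhkworld/prefid/nw_vod_v_en_2015_173'))
```

Because the program UUID is now read from the API response (`program['uuid']`) rather than scraped from the page, the query only needs whichever identifier the URL carries.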

@ -31,7 +31,12 @@ def _download_webpage_handle(self, *args, **kwargs):
         def dl(*args, **kwargs):
             return super(PornHubBaseIE, self)._download_webpage_handle(*args, **kwargs)

-        webpage, urlh = dl(*args, **kwargs)
+        ret = dl(*args, **kwargs)
+
+        if not ret:
+            return ret
+
+        webpage, urlh = ret

         if any(re.search(p, webpage) for p in (
                 r'<body\b[^>]+\bonload=["\']go\(\)',
@ -53,7 +58,7 @@ class PornHubIE(PornHubBaseIE):
     _VALID_URL = r'''(?x)
                     https?://
                         (?:
-                            (?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
+                            (?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
                             (?:www\.)?thumbzilla\.com/video/
                         )
                         (?P<id>[\da-z]+)
@ -152,6 +157,9 @@ class PornHubIE(PornHubBaseIE):
     }, {
         'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933',
         'only_matching': True,
+    }, {
+        'url': 'https://www.pornhub.org/view_video.php?viewkey=203640933',
+        'only_matching': True,
     }, {
         'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5e4acdae54a82',
         'only_matching': True,
@ -160,7 +168,7 @@ class PornHubIE(PornHubBaseIE):
     @staticmethod
     def _extract_urls(webpage):
         return re.findall(
-            r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.(?:com|net)/embed/[\da-z]+)',
+            r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub\.(?:com|net|org)/embed/[\da-z]+)',
             webpage)

     def _extract_count(self, pattern, webpage, name):
@ -280,14 +288,24 @@ def add_video_url(video_url):
             video_urls.append((v_url, None))
             video_urls_set.add(v_url)

+        def parse_quality_items(quality_items):
+            q_items = self._parse_json(quality_items, video_id, fatal=False)
+            if not isinstance(q_items, list):
+                return
+            for item in q_items:
+                if isinstance(item, dict):
+                    add_video_url(item.get('url'))
+
         if not video_urls:
-            FORMAT_PREFIXES = ('media', 'quality')
+            FORMAT_PREFIXES = ('media', 'quality', 'qualityItems')
             js_vars = extract_js_vars(
                 webpage, r'(var\s+(?:%s)_.+)' % '|'.join(FORMAT_PREFIXES),
                 default=None)
             if js_vars:
                 for key, format_url in js_vars.items():
-                    if any(key.startswith(p) for p in FORMAT_PREFIXES):
+                    if key.startswith(FORMAT_PREFIXES[-1]):
+                        parse_quality_items(format_url)
+                    elif any(key.startswith(p) for p in FORMAT_PREFIXES[:2]):
                         add_video_url(format_url)
             if not video_urls and re.search(
                     r'<[^>]+\bid=["\']lockedPlayer', webpage):
@ -343,12 +361,16 @@ def add_video_url(video_url):
             r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<',
             webpage, 'uploader', default=None)

+        def extract_vote_count(kind, name):
+            return self._extract_count(
+                (r'<span[^>]+\bclass="votes%s"[^>]*>([\d,\.]+)</span>' % kind,
+                 r'<span[^>]+\bclass=["\']votes%s["\'][^>]*\bdata-rating=["\'](\d+)' % kind),
+                webpage, name)
+
         view_count = self._extract_count(
             r'<span class="count">([\d,\.]+)</span> [Vv]iews', webpage, 'view')
-        like_count = self._extract_count(
-            r'<span class="votesUp">([\d,\.]+)</span>', webpage, 'like')
-        dislike_count = self._extract_count(
-            r'<span class="votesDown">([\d,\.]+)</span>', webpage, 'dislike')
+        like_count = extract_vote_count('Up', 'like')
+        dislike_count = extract_vote_count('Down', 'dislike')
         comment_count = self._extract_count(
             r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
@ -422,7 +444,7 @@ def _real_extract(self, url):
 class PornHubUserIE(PornHubPlaylistBaseIE):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph',
         'playlist_mincount': 118,
@ -490,7 +512,7 @@ def _real_extract(self, url):
 class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
-    _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
+    _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph/videos',
         'only_matching': True,
@ -605,7 +627,7 @@ def suitable(cls, url):
 class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net|org))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
         'info_dict': {
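
The `parse_quality_items` helper added in this file treats the `qualityItems_*` variable as JSON text holding a list of objects, each of which may carry a `url`. A standalone sketch of that defensive parsing is given below; it uses the standard `json` module instead of the extractor's `_parse_json`, the name `collect_quality_urls` is hypothetical, and the sample payload is invented for illustration.

```python
import json


def collect_quality_urls(quality_items):
    # Parse the raw JSON text; anything unparsable yields no URLs.
    try:
        items = json.loads(quality_items)
    except (TypeError, ValueError):
        return []
    # Only a top-level list is accepted, mirroring the isinstance check in the diff.
    if not isinstance(items, list):
        return []
    urls = []
    for item in items:
        # Skip non-dict entries and entries without a usable 'url'.
        if isinstance(item, dict) and item.get('url'):
            urls.append(item['url'])
    return urls


sample = '[{"url": "https://example.com/a.mp4"}, "junk", {"quality": "720"}]'
print(collect_quality_urls(sample))
```

Checking the type at every level keeps a malformed or restructured page variable from raising, so extraction can still fall through to the other format sources.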

@ -7,6 +7,8 @@
     ExtractorError,
     int_or_none,
     float_or_none,
+    try_get,
+    unescapeHTML,
     url_or_none,
 )
@ -55,10 +57,12 @@ class RedditRIE(InfoExtractor):
             'id': 'zv89llsvexdz',
             'ext': 'mp4',
             'title': 'That small heart attack.',
-            'thumbnail': r're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
+            'thumbnails': 'count:4',
             'timestamp': 1501941939,
             'upload_date': '20170805',
             'uploader': 'Antw87',
+            'duration': 12,
             'like_count': int,
             'dislike_count': int,
             'comment_count': int,
@ -116,13 +120,40 @@ def _real_extract(self, url):
         else:
             age_limit = None

+        thumbnails = []
+
+        def add_thumbnail(src):
+            if not isinstance(src, dict):
+                return
+            thumbnail_url = url_or_none(src.get('url'))
+            if not thumbnail_url:
+                return
+            thumbnails.append({
+                'url': unescapeHTML(thumbnail_url),
+                'width': int_or_none(src.get('width')),
+                'height': int_or_none(src.get('height')),
+            })
+
+        for image in try_get(data, lambda x: x['preview']['images']) or []:
+            if not isinstance(image, dict):
+                continue
+            add_thumbnail(image.get('source'))
+            resolutions = image.get('resolutions')
+            if isinstance(resolutions, list):
+                for resolution in resolutions:
+                    add_thumbnail(resolution)
+
         return {
             '_type': 'url_transparent',
             'url': video_url,
             'title': data.get('title'),
-            'thumbnail': url_or_none(data.get('thumbnail')),
+            'thumbnails': thumbnails,
             'timestamp': float_or_none(data.get('created_utc')),
             'uploader': data.get('author'),
+            'duration': int_or_none(try_get(
+                data,
+                (lambda x: x['media']['reddit_video']['duration'],
+                 lambda x: x['secure_media']['reddit_video']['duration']))),
             'like_count': int_or_none(data.get('ups')),
             'dislike_count': int_or_none(data.get('downs')),
             'comment_count': int_or_none(data.get('num_comments')),
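
The thumbnail collection added in this hunk walks `preview.images`, taking the `source` entry and every entry in `resolutions`, and HTML-unescapes each URL. A self-contained sketch of the same traversal follows. It assumes Python 3 and uses `html.unescape` in place of the project's `unescapeHTML`; the function name `collect_thumbnails` and the sample data are hypothetical.

```python
from html import unescape


def collect_thumbnails(data):
    thumbnails = []

    def add_thumbnail(src):
        # Ignore anything that is not a dict with a non-empty string URL.
        if not isinstance(src, dict):
            return
        url = src.get('url')
        if not isinstance(url, str) or not url:
            return
        thumbnails.append({
            'url': unescape(url),  # e.g. '&amp;' in query strings becomes '&'
            'width': src.get('width'),
            'height': src.get('height'),
        })

    images = (data.get('preview') or {}).get('images') or []
    for image in images:
        if not isinstance(image, dict):
            continue
        add_thumbnail(image.get('source'))
        resolutions = image.get('resolutions')
        if isinstance(resolutions, list):
            for resolution in resolutions:
                add_thumbnail(resolution)
    return thumbnails


sample = {'preview': {'images': [{
    'source': {'url': 'https://example.com/a.jpg?x=1&amp;y=2', 'width': 640, 'height': 360},
    'resolutions': [{'url': 'https://example.com/b.jpg', 'width': 108, 'height': 60}, None],
}]}}
print(collect_thumbnails(sample))
```

Returning a list of sized thumbnails instead of a single `thumbnail` URL leaves the choice of preferred size to the downloader.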

@ -6,14 +6,24 @@
 from ..utils import (
     determine_ext,
     ExtractorError,
+    find_xpath_attr,
     int_or_none,
+    unified_strdate,
+    url_or_none,
     xpath_attr,
     xpath_text,
 )


 class RuutuIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:ruutu|supla)\.fi/(?:video|supla)/(?P<id>\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:www\.)?(?:ruutu|supla)\.fi/(?:video|supla|audio)/|
+                            static\.nelonenmedia\.fi/player/misc/embed_player\.html\?.*?\bnid=
+                        )
+                        (?P<id>\d+)
+                    '''
     _TESTS = [
         {
             'url': 'http://www.ruutu.fi/video/2058907',
@ -71,15 +81,53 @@ class RuutuIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'age_limit': 0, 'age_limit': 0,
}, },
'expected_warnings': ['HTTP Error 502: Bad Gateway'], 'expected_warnings': [
} 'HTTP Error 502: Bad Gateway',
'Failed to download m3u8 information',
],
},
{
'url': 'http://www.supla.fi/audio/2231370',
'only_matching': True,
},
{
'url': 'https://static.nelonenmedia.fi/player/misc/embed_player.html?nid=3618790',
'only_matching': True,
},
{
# episode
'url': 'https://www.ruutu.fi/video/3401964',
'info_dict': {
'id': '3401964',
'ext': 'mp4',
'title': 'Temptation Island Suomi - Kausi 5 - Jakso 17',
'description': 'md5:87cf01d5e1e88adf0c8a2937d2bd42ba',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 2582,
'age_limit': 12,
'upload_date': '20190508',
'series': 'Temptation Island Suomi',
'season_number': 5,
'episode_number': 17,
'categories': ['Reality ja tositapahtumat', 'Kotimaiset suosikit', 'Romantiikka ja parisuhde'],
},
'params': {
'skip_download': True,
},
},
{
# premium
'url': 'https://www.ruutu.fi/video/3618715',
'only_matching': True,
},
] ]
_API_BASE = 'https://gatling.nelonenmedia.fi'
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_xml = self._download_xml( video_xml = self._download_xml(
-'https://gatling.nelonenmedia.fi/media-xml-cache', video_id,
+'%s/media-xml-cache' % self._API_BASE, video_id,
query={'id': video_id}) query={'id': video_id})
formats = [] formats = []
@ -96,9 +144,18 @@ def extract_formats(node):
continue continue
processed_urls.append(video_url) processed_urls.append(video_url)
ext = determine_ext(video_url) ext = determine_ext(video_url)
auth_video_url = url_or_none(self._download_webpage(
'%s/auth/access/v2' % self._API_BASE, video_id,
note='Downloading authenticated %s stream URL' % ext,
fatal=False, query={'stream': video_url}))
if auth_video_url:
processed_urls.append(auth_video_url)
video_url = auth_video_url
if ext == 'm3u8': if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
-video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+video_url, video_id, 'mp4',
+entry_protocol='m3u8_native', m3u8_id='hls',
+fatal=False))
elif ext == 'f4m': elif ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id='hds', fatal=False)) video_url, video_id, f4m_id='hds', fatal=False))
@ -136,18 +193,35 @@ def extract_formats(node):
extract_formats(video_xml.find('./Clip')) extract_formats(video_xml.find('./Clip'))
-drm = xpath_text(video_xml, './Clip/DRM', default=None)
-if not formats and drm:
-    raise ExtractorError('This video is DRM protected.', expected=True)
+def pv(name):
+    node = find_xpath_attr(
+        video_xml, './Clip/PassthroughVariables/variable', 'name', name)
+    if node is not None:
+        return node.get('value')
+if not formats:
+    drm = xpath_text(video_xml, './Clip/DRM', default=None)
+    if drm:
+        raise ExtractorError('This video is DRM protected.', expected=True)
+    ns_st_cds = pv('ns_st_cds')
+    if ns_st_cds != 'free':
+        raise ExtractorError('This video is %s.' % ns_st_cds, expected=True)
self._sort_formats(formats) self._sort_formats(formats)
themes = pv('themes')
return { return {
'id': video_id, 'id': video_id,
'title': xpath_attr(video_xml, './/Behavior/Program', 'program_name', 'title', fatal=True), 'title': xpath_attr(video_xml, './/Behavior/Program', 'program_name', 'title', fatal=True),
'description': xpath_attr(video_xml, './/Behavior/Program', 'description', 'description'), 'description': xpath_attr(video_xml, './/Behavior/Program', 'description', 'description'),
'thumbnail': xpath_attr(video_xml, './/Behavior/Startpicture', 'href', 'thumbnail'), 'thumbnail': xpath_attr(video_xml, './/Behavior/Startpicture', 'href', 'thumbnail'),
-'duration': int_or_none(xpath_text(video_xml, './/Runtime', 'duration')),
+'duration': int_or_none(xpath_text(video_xml, './/Runtime', 'duration')) or int_or_none(pv('runtime')),
'age_limit': int_or_none(xpath_text(video_xml, './/AgeLimit', 'age limit')), 'age_limit': int_or_none(xpath_text(video_xml, './/AgeLimit', 'age limit')),
'upload_date': unified_strdate(pv('date_start')),
'series': pv('series_name'),
'season_number': int_or_none(pv('season_number')),
'episode_number': int_or_none(pv('episode_number')),
'categories': themes.split(',') if themes else [],
'formats': formats, 'formats': formats,
} }
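
The `pv` helper in this hunk looks up a `<variable>` element by its `name` attribute and returns its `value`. A minimal standalone sketch of the same lookup, assuming only the standard library: the sample XML and the function name `passthrough_value` are hypothetical, and the real `media-xml-cache` response is not shown in this diff.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the element path and attribute names come from the diff.
SAMPLE_XML = """
<Playerdata>
  <Clip>
    <PassthroughVariables>
      <variable name="series_name" value="Temptation Island Suomi"/>
      <variable name="season_number" value="5"/>
      <variable name="themes" value="Reality,Romance"/>
    </PassthroughVariables>
  </Clip>
</Playerdata>
"""


def passthrough_value(root, name):
    """Return the value of the <variable> whose name attribute matches, else None."""
    node = root.find(
        "./Clip/PassthroughVariables/variable[@name='%s']" % name)
    return node.get('value') if node is not None else None


root = ET.fromstring(SAMPLE_XML)
series = passthrough_value(root, 'series_name')
themes = passthrough_value(root, 'themes')
# Same fallback as the extractor: an absent themes variable yields an empty list.
categories = themes.split(',') if themes else []
missing = passthrough_value(root, 'ns_st_cds')
```

Returning `None` for an absent variable is what lets the extractor pass the result straight into `int_or_none` and `unified_strdate` without extra checks.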

View File

@ -4,8 +4,12 @@
import re import re
from .brightcove import BrightcoveNewIE from .brightcove import BrightcoveNewIE
-from ..compat import compat_str
+from ..compat import (
+    compat_HTTPError,
+    compat_str,
+)
from ..utils import ( from ..utils import (
ExtractorError,
try_get, try_get,
update_url_query, update_url_query,
) )
@ -41,16 +45,22 @@ class SevenPlusIE(BrightcoveNewIE):
def _real_extract(self, url): def _real_extract(self, url):
path, episode_id = re.match(self._VALID_URL, url).groups() path, episode_id = re.match(self._VALID_URL, url).groups()
-media = self._download_json(
-    'https://videoservice.swm.digital/playback', episode_id, query={
-        'appId': '7plus',
-        'deviceType': 'web',
-        'platformType': 'web',
-        'accountId': 5303576322001,
-        'referenceId': 'ref:' + episode_id,
-        'deliveryId': 'csai',
-        'videoType': 'vod',
-    })['media']
+try:
+    media = self._download_json(
+        'https://videoservice.swm.digital/playback', episode_id, query={
+            'appId': '7plus',
+            'deviceType': 'web',
+            'platformType': 'web',
+            'accountId': 5303576322001,
+            'referenceId': 'ref:' + episode_id,
+            'deliveryId': 'csai',
+            'videoType': 'vod',
+        })['media']
+except ExtractorError as e:
+    if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+        raise ExtractorError(self._parse_json(
+            e.cause.read().decode(), episode_id)[0]['error_code'], expected=True)
+    raise
for source in media.get('sources', {}): for source in media.get('sources', {}):
src = source.get('src') src = source.get('src')

View File

@ -1,6 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
extract_attributes, extract_attributes,
@ -11,38 +13,61 @@
class SkyBaseIE(InfoExtractor): class SkyBaseIE(InfoExtractor):
-def _real_extract(self, url):
-    video_id = self._match_id(url)
-    webpage = self._download_webpage(url, video_id)
-    video_data = extract_attributes(self._search_regex(
-        r'(<div.+?class="[^"]*sdc-article-video__media-ooyala[^"]*"[^>]+>)',
-        webpage, 'video data'))
-    video_url = 'ooyala:%s' % video_data['data-video-id']
-    if video_data.get('data-token-required') == 'true':
-        token_fetch_options = self._parse_json(video_data.get(
-            'data-token-fetch-options', '{}'), video_id, fatal=False) or {}
-        token_fetch_url = token_fetch_options.get('url')
-        if token_fetch_url:
-            embed_token = self._download_webpage(urljoin(
-                url, token_fetch_url), video_id, fatal=False)
-            if embed_token:
-                video_url = smuggle_url(
-                    video_url, {'embed_token': embed_token.strip('"')})
+BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
+_SDC_EL_REGEX = r'(?s)(<div[^>]+data-(?:component-name|fn)="sdc-(?:articl|sit)e-video"[^>]*>)'
+def _process_ooyala_element(self, webpage, sdc_el, url):
+    sdc = extract_attributes(sdc_el)
+    provider = sdc.get('data-provider')
+    if provider == 'ooyala':
+        video_id = sdc['data-sdc-video-id']
+        video_url = 'ooyala:%s' % video_id
+        ie_key = 'Ooyala'
+        ooyala_el = self._search_regex(
+            r'(<div[^>]+class="[^"]*\bsdc-article-video__media-ooyala\b[^"]*"[^>]+data-video-id="%s"[^>]*>)' % video_id,
+            webpage, 'video data', fatal=False)
+        if ooyala_el:
+            ooyala_attrs = extract_attributes(ooyala_el) or {}
+            if ooyala_attrs.get('data-token-required') == 'true':
+                token_fetch_url = (self._parse_json(ooyala_attrs.get(
+                    'data-token-fetch-options', '{}'),
+                    video_id, fatal=False) or {}).get('url')
+                if token_fetch_url:
+                    embed_token = self._download_json(urljoin(
+                        url, token_fetch_url), video_id, fatal=False)
+                    if embed_token:
+                        video_url = smuggle_url(
+                            video_url, {'embed_token': embed_token})
+    elif provider == 'brightcove':
+        video_id = sdc['data-video-id']
+        account_id = sdc.get('data-account-id') or '6058004172001'
+        player_id = sdc.get('data-player-id') or 'RC9PQUaJ6'
+        video_url = self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id)
+        ie_key = 'BrightcoveNew'
     return {
         '_type': 'url_transparent',
         'id': video_id,
         'url': video_url,
+        'ie_key': ie_key,
+    }
+def _real_extract(self, url):
+    video_id = self._match_id(url)
+    webpage = self._download_webpage(url, video_id)
+    info = self._process_ooyala_element(webpage, self._search_regex(
+        self._SDC_EL_REGEX, webpage, 'sdc element'), url)
+    info.update({
         'title': self._og_search_title(webpage),
         'description': strip_or_none(self._og_search_description(webpage)),
-        'ie_key': 'Ooyala',
-    }
+    })
+    return info
class SkySportsIE(SkyBaseIE): class SkySportsIE(SkyBaseIE):
-_VALID_URL = r'https?://(?:www\.)?skysports\.com/watch/video/(?P<id>[0-9]+)'
-_TEST = {
+IE_NAME = 'sky:sports'
+_VALID_URL = r'https?://(?:www\.)?skysports\.com/watch/video/([^/]+/)*(?P<id>[0-9]+)'
+_TESTS = [{
'url': 'http://www.skysports.com/watch/video/10328419/bale-its-our-time-to-shine', 'url': 'http://www.skysports.com/watch/video/10328419/bale-its-our-time-to-shine',
'md5': '77d59166cddc8d3cb7b13e35eaf0f5ec', 'md5': '77d59166cddc8d3cb7b13e35eaf0f5ec',
'info_dict': { 'info_dict': {
@ -52,19 +77,55 @@ class SkySportsIE(SkyBaseIE):
'description': 'md5:e88bda94ae15f7720c5cb467e777bb6d', 'description': 'md5:e88bda94ae15f7720c5cb467e777bb6d',
}, },
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
-}
+}, {
+    'url': 'https://www.skysports.com/watch/video/sports/f1/12160544/abu-dhabi-gp-the-notebook',
+    'only_matching': True,
+}, {
+    'url': 'https://www.skysports.com/watch/video/tv-shows/12118508/rainford-brent-how-ace-programme-helps',
+    'only_matching': True,
+}]
class SkyNewsIE(SkyBaseIE): class SkyNewsIE(SkyBaseIE):
IE_NAME = 'sky:news'
_VALID_URL = r'https?://news\.sky\.com/video/[0-9a-z-]+-(?P<id>[0-9]+)' _VALID_URL = r'https?://news\.sky\.com/video/[0-9a-z-]+-(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'https://news.sky.com/video/russian-plane-inspected-after-deadly-fire-11712962', 'url': 'https://news.sky.com/video/russian-plane-inspected-after-deadly-fire-11712962',
-'md5': 'd6327e581473cea9976a3236ded370cd',
+'md5': '411e8893fd216c75eaf7e4c65d364115',
 'info_dict': {
-    'id': '1ua21xaDE6lCtZDmbYfl8kwsKLooJbNM',
+    'id': 'ref:1ua21xaDE6lCtZDmbYfl8kwsKLooJbNM',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Russian plane inspected after deadly fire', 'title': 'Russian plane inspected after deadly fire',
'description': 'The Russian Investigative Committee has released video of the wreckage of a passenger plane which caught fire near Moscow.', 'description': 'The Russian Investigative Committee has released video of the wreckage of a passenger plane which caught fire near Moscow.',
'uploader_id': '6058004172001',
'timestamp': 1567112345,
'upload_date': '20190829',
}, },
-'add_ie': ['Ooyala'],
+'add_ie': ['BrightcoveNew'],
} }
class SkySportsNewsIE(SkyBaseIE):
IE_NAME = 'sky:sports:news'
_VALID_URL = r'https?://(?:www\.)?skysports\.com/([^/]+/)*news/\d+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.skysports.com/golf/news/12176/10871916/dustin-johnson-ready-to-conquer-players-championship-at-tpc-sawgrass',
'info_dict': {
'id': '10871916',
'title': 'Dustin Johnson ready to conquer Players Championship at TPC Sawgrass',
'description': 'Dustin Johnson is confident he can continue his dominant form in 2017 by adding the Players Championship to his list of victories.',
},
'playlist_count': 2,
}
def _real_extract(self, url):
article_id = self._match_id(url)
webpage = self._download_webpage(url, article_id)
entries = []
for sdc_el in re.findall(self._SDC_EL_REGEX, webpage):
entries.append(self._process_ooyala_element(webpage, sdc_el, url))
return self.playlist_result(
entries, article_id, self._og_search_title(webpage),
self._html_search_meta(['og:description', 'description'], webpage))
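
`_SDC_EL_REGEX` drives both the single-video and the playlist extractors in this file: it captures the opening `<div>` tag of each Sky video component, and the `data-provider` attribute then selects the Ooyala or Brightcove path. A small sketch of that matching on hypothetical markup: the regex is copied from the diff, while the sample HTML and the simplified `attr` helper are assumptions (the extractor itself uses `extract_attributes`).

```python
import re

# Copied from the diff above.
SDC_EL_REGEX = r'(?s)(<div[^>]+data-(?:component-name|fn)="sdc-(?:articl|sit)e-video"[^>]*>)'

# Hypothetical page fragment; real Sky markup may differ.
SAMPLE_HTML = '''
<div class="a" data-component-name="sdc-article-video" data-provider="brightcove" data-video-id="111">
<p>text</p>
<div data-fn="sdc-site-video" data-provider="ooyala" data-sdc-video-id="222">
<div data-fn="sdc-other-widget" data-provider="ooyala">
'''


def attr(tag, name):
    """Simplified attribute lookup for double-quoted attributes only."""
    m = re.search(r'\b%s="([^"]*)"' % re.escape(name), tag)
    return m.group(1) if m else None


# findall returns one opening tag per matching component.
elements = re.findall(SDC_EL_REGEX, SAMPLE_HTML)
providers = [attr(el, 'data-provider') for el in elements]
```

The third `<div>` is skipped because its `data-fn` value matches neither `sdc-article-video` nor `sdc-site-video`.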

View File

@ -2,7 +2,12 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+    bool_or_none,
+    smuggle_url,
+    try_get,
+    url_or_none,
+)
class SlidesLiveIE(InfoExtractor): class SlidesLiveIE(InfoExtractor):
@ -18,8 +23,21 @@ class SlidesLiveIE(InfoExtractor):
'description': 'Watch full version of this video at https://slideslive.com/38902413.', 'description': 'Watch full version of this video at https://slideslive.com/38902413.',
'uploader': 'SlidesLive Videos - A', 'uploader': 'SlidesLive Videos - A',
'uploader_id': 'UC62SdArr41t_-_fX40QCLRw', 'uploader_id': 'UC62SdArr41t_-_fX40QCLRw',
'timestamp': 1597615266,
'upload_date': '20170925', 'upload_date': '20170925',
} }
}, {
# video_service_name = yoda
'url': 'https://slideslive.com/38935785',
'md5': '575cd7a6c0acc6e28422fe76dd4bcb1a',
'info_dict': {
'id': 'RMraDYN5ozA_',
'ext': 'mp4',
'title': 'Offline Reinforcement Learning: From Algorithms to Practical Challenges',
},
'params': {
'format': 'bestvideo',
},
}, { }, {
# video_service_name = youtube # video_service_name = youtube
'url': 'https://slideslive.com/38903721/magic-a-scientific-resurrection-of-an-esoteric-legend', 'url': 'https://slideslive.com/38903721/magic-a-scientific-resurrection-of-an-esoteric-legend',
@ -39,18 +57,48 @@ def _real_extract(self, url):
video_data = self._download_json( video_data = self._download_json(
'https://ben.slideslive.com/player/' + video_id, video_id) 'https://ben.slideslive.com/player/' + video_id, video_id)
service_name = video_data['video_service_name'].lower() service_name = video_data['video_service_name'].lower()
-assert service_name in ('url', 'vimeo', 'youtube')
+assert service_name in ('url', 'yoda', 'vimeo', 'youtube')
service_id = video_data['video_service_id'] service_id = video_data['video_service_id']
subtitles = {}
for sub in try_get(video_data, lambda x: x['subtitles'], list) or []:
if not isinstance(sub, dict):
continue
webvtt_url = url_or_none(sub.get('webvtt_url'))
if not webvtt_url:
continue
lang = sub.get('language') or 'en'
subtitles.setdefault(lang, []).append({
'url': webvtt_url,
})
info = { info = {
'id': video_id, 'id': video_id,
'thumbnail': video_data.get('thumbnail'), 'thumbnail': video_data.get('thumbnail'),
-    'url': service_id,
+    'is_live': bool_or_none(video_data.get('is_live')),
+    'subtitles': subtitles,
 }
-if service_name == 'url':
+if service_name in ('url', 'yoda'):
info['title'] = video_data['title'] info['title'] = video_data['title']
if service_name == 'url':
info['url'] = service_id
else:
formats = []
_MANIFEST_PATTERN = 'https://01.cdn.yoda.slideslive.com/%s/master.%s'
# use `m3u8` entry_protocol until EXT-X-MAP is properly supported by `m3u8_native` entry_protocol
formats.extend(self._extract_m3u8_formats(
_MANIFEST_PATTERN % (service_id, 'm3u8'),
service_id, 'mp4', m3u8_id='hls', fatal=False))
formats.extend(self._extract_mpd_formats(
_MANIFEST_PATTERN % (service_id, 'mpd'), service_id,
mpd_id='dash', fatal=False))
self._sort_formats(formats)
info.update({
'id': service_id,
'formats': formats,
})
else: else:
info.update({ info.update({
'_type': 'url_transparent', '_type': 'url_transparent',
'url': service_id,
'ie_key': service_name.capitalize(), 'ie_key': service_name.capitalize(),
'title': video_data.get('title'), 'title': video_data.get('title'),
}) })
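
The subtitle loop added in this file groups WebVTT tracks by language, skips malformed entries, and defaults to `'en'` when no language is given. A minimal sketch of that grouping outside the extractor: the function name `group_subtitles` and the sample data are hypothetical, and the plain scheme check is a simplified stand-in for `url_or_none`.

```python
def group_subtitles(raw_subtitles, default_lang='en'):
    """Group subtitle entries by language, skipping entries without a usable URL."""
    subtitles = {}
    for sub in raw_subtitles or []:
        if not isinstance(sub, dict):
            continue
        url = sub.get('webvtt_url')
        # Simplified stand-in for url_or_none: accept only http(s) strings.
        if not isinstance(url, str) or not url.startswith(('http://', 'https://')):
            continue
        lang = sub.get('language') or default_lang
        subtitles.setdefault(lang, []).append({'url': url})
    return subtitles


sample = [
    {'language': 'en', 'webvtt_url': 'https://example.com/en.vtt'},
    {'webvtt_url': 'https://example.com/unknown.vtt'},
    {'language': 'de', 'webvtt_url': None},
    'not-a-dict',
]
grouped = group_subtitles(sample)
```

Using `setdefault` keeps several tracks for the same language in one list, which is the shape the `subtitles` field in the info dict uses above.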

View File

@ -1,416 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
import hashlib
import uuid
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_text,
)
class SmotriIE(InfoExtractor):
IE_DESC = 'Smotri.com'
IE_NAME = 'smotri'
_VALID_URL = r'https?://(?:www\.)?(?:smotri\.com/video/view/\?id=|pics\.smotri\.com/(?:player|scrubber_custom8)\.swf\?file=)(?P<id>v(?P<realvideoid>[0-9]+)[a-z0-9]{4})'
_NETRC_MACHINE = 'smotri'
_TESTS = [
# real video id 2610366
{
'url': 'http://smotri.com/video/view/?id=v261036632ab',
'md5': '02c0dfab2102984e9c5bb585cc7cc321',
'info_dict': {
'id': 'v261036632ab',
'ext': 'mp4',
'title': 'катастрофа с камер видеонаблюдения',
'uploader': 'rbc2008',
'uploader_id': 'rbc08',
'upload_date': '20131118',
'thumbnail': 'http://frame6.loadup.ru/8b/a9/2610366.3.3.jpg',
},
},
# real video id 57591
{
'url': 'http://smotri.com/video/view/?id=v57591cb20',
'md5': '830266dfc21f077eac5afd1883091bcd',
'info_dict': {
'id': 'v57591cb20',
'ext': 'flv',
'title': 'test',
'uploader': 'Support Photofile@photofile',
'uploader_id': 'support-photofile',
'upload_date': '20070704',
'thumbnail': 'http://frame4.loadup.ru/03/ed/57591.2.3.jpg',
},
},
# video-password, not approved by moderator
{
'url': 'http://smotri.com/video/view/?id=v1390466a13c',
'md5': 'f6331cef33cad65a0815ee482a54440b',
'info_dict': {
'id': 'v1390466a13c',
'ext': 'mp4',
'title': 'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
'uploader': 'timoxa40',
'uploader_id': 'timoxa40',
'upload_date': '20100404',
'thumbnail': 'http://frame7.loadup.ru/af/3f/1390466.3.3.jpg',
},
'params': {
'videopassword': 'qwerty',
},
'skip': 'Video is not approved by moderator',
},
# video-password
{
'url': 'http://smotri.com/video/view/?id=v6984858774#',
'md5': 'f11e01d13ac676370fc3b95b9bda11b0',
'info_dict': {
'id': 'v6984858774',
'ext': 'mp4',
'title': 'Дача Солженицина ПАРОЛЬ 223322',
'uploader': 'psavari1',
'uploader_id': 'psavari1',
'upload_date': '20081103',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
'videopassword': '223322',
},
},
# age limit + video-password, not approved by moderator
{
'url': 'http://smotri.com/video/view/?id=v15408898bcf',
'md5': '91e909c9f0521adf5ee86fbe073aad70',
'info_dict': {
'id': 'v15408898bcf',
'ext': 'flv',
'title': 'этот ролик не покажут по ТВ',
'uploader': 'zzxxx',
'uploader_id': 'ueggb',
'upload_date': '20101001',
'thumbnail': 'http://frame3.loadup.ru/75/75/1540889.1.3.jpg',
'age_limit': 18,
},
'params': {
'videopassword': '333'
},
'skip': 'Video is not approved by moderator',
},
# age limit + video-password
{
'url': 'http://smotri.com/video/view/?id=v7780025814',
'md5': 'b4599b068422559374a59300c5337d72',
'info_dict': {
'id': 'v7780025814',
'ext': 'mp4',
'title': 'Sexy Beach (пароль 123)',
'uploader': 'вАся',
'uploader_id': 'asya_prosto',
'upload_date': '20081218',
'thumbnail': r're:^https?://.*\.jpg$',
'age_limit': 18,
},
'params': {
'videopassword': '123'
},
},
# swf player
{
'url': 'http://pics.smotri.com/scrubber_custom8.swf?file=v9188090500',
'md5': '31099eeb4bc906712c5f40092045108d',
'info_dict': {
'id': 'v9188090500',
'ext': 'mp4',
'title': 'Shakira - Don\'t Bother',
'uploader': 'HannahL',
'uploader_id': 'lisaha95',
'upload_date': '20090331',
'thumbnail': 'http://frame8.loadup.ru/44/0b/918809.7.3.jpg',
},
},
]
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<embed[^>]src=(["\'])(?P<url>http://pics\.smotri\.com/(?:player|scrubber_custom8)\.swf\?file=v.+?\1)',
webpage)
if mobj is not None:
return mobj.group('url')
mobj = re.search(
r'''(?x)<div\s+class="video_file">http://smotri\.com/video/download/file/[^<]+</div>\s*
<div\s+class="video_image">[^<]+</div>\s*
<div\s+class="video_id">(?P<id>[^<]+)</div>''', webpage)
if mobj is not None:
return 'http://smotri.com/video/view/?id=%s' % mobj.group('id')
def _search_meta(self, name, html, display_name=None):
if display_name is None:
display_name = name
return self._html_search_meta(name, html, display_name)
def _real_extract(self, url):
video_id = self._match_id(url)
video_form = {
'ticket': video_id,
'video_url': '1',
'frame_url': '1',
'devid': 'LoadupFlashPlayer',
'getvideoinfo': '1',
}
video_password = self._downloader.params.get('videopassword')
if video_password:
video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
video = self._download_json(
'http://smotri.com/video/view/url/bot/',
video_id, 'Downloading video JSON',
data=urlencode_postdata(video_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
video_url = video.get('_vidURL') or video.get('_vidURL_mp4')
if not video_url:
if video.get('_moderate_no'):
raise ExtractorError(
'Video %s has not been approved by moderator' % video_id, expected=True)
if video.get('error'):
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
if video.get('_pass_protected') == 1:
msg = ('Invalid video password' if video_password
else 'This video is protected by a password, use the --video-password option')
raise ExtractorError(msg, expected=True)
title = video['title']
thumbnail = video.get('_imgURL')
upload_date = unified_strdate(video.get('added'))
uploader = video.get('userNick')
uploader_id = video.get('userLogin')
duration = int_or_none(video.get('duration'))
# Video JSON does not provide enough meta data
# We will extract some from the video web page instead
webpage_url = 'http://smotri.com/video/view/?id=%s' % video_id
webpage = self._download_webpage(webpage_url, video_id, 'Downloading video page')
# Warning if video is unavailable
warning = self._html_search_regex(
r'<div[^>]+class="videoUnModer"[^>]*>(.+?)</div>', webpage,
'warning message', default=None)
if warning is not None:
self._downloader.report_warning(
'Video %s may not be available; smotri said: %s ' %
(video_id, warning))
# Adult content
if 'EroConfirmText">' in webpage:
self.report_age_confirmation()
confirm_string = self._html_search_regex(
r'<a[^>]+href="/video/view/\?id=%s&confirm=([^"]+)"' % video_id,
webpage, 'confirm string')
confirm_url = webpage_url + '&confirm=%s' % confirm_string
webpage = self._download_webpage(
confirm_url, video_id,
'Downloading video page (age confirmed)')
adult_content = True
else:
adult_content = False
view_count = self._html_search_regex(
r'(?s)Общее количество просмотров.*?<span class="Number">(\d+)</span>',
webpage, 'view count', fatal=False)
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
'uploader_id': uploader_id,
'duration': duration,
'view_count': int_or_none(view_count),
'age_limit': 18 if adult_content else 0,
}
class SmotriCommunityIE(InfoExtractor):
IE_DESC = 'Smotri.com community videos'
IE_NAME = 'smotri:community'
_VALID_URL = r'https?://(?:www\.)?smotri\.com/community/video/(?P<id>[0-9A-Za-z_\'-]+)'
_TEST = {
'url': 'http://smotri.com/community/video/kommuna',
'info_dict': {
'id': 'kommuna',
},
'playlist_mincount': 4,
}
def _real_extract(self, url):
community_id = self._match_id(url)
rss = self._download_xml(
'http://smotri.com/export/rss/video/by/community/-/%s/video.xml' % community_id,
community_id, 'Downloading community RSS')
entries = [
self.url_result(video_url.text, SmotriIE.ie_key())
for video_url in rss.findall('./channel/item/link')]
return self.playlist_result(entries, community_id)
class SmotriUserIE(InfoExtractor):
IE_DESC = 'Smotri.com user videos'
IE_NAME = 'smotri:user'
_VALID_URL = r'https?://(?:www\.)?smotri\.com/user/(?P<id>[0-9A-Za-z_\'-]+)'
_TESTS = [{
'url': 'http://smotri.com/user/inspector',
'info_dict': {
'id': 'inspector',
'title': 'Inspector',
},
'playlist_mincount': 9,
}]
def _real_extract(self, url):
user_id = self._match_id(url)
rss = self._download_xml(
'http://smotri.com/export/rss/user/video/-/%s/video.xml' % user_id,
user_id, 'Downloading user RSS')
entries = [self.url_result(video_url.text, 'Smotri')
for video_url in rss.findall('./channel/item/link')]
description_text = xpath_text(rss, './channel/description') or ''
user_nickname = self._search_regex(
'^Видео режиссера (.+)$', description_text,
'user nickname', fatal=False)
return self.playlist_result(entries, user_id, user_nickname)
class SmotriBroadcastIE(InfoExtractor):
IE_DESC = 'Smotri.com broadcasts'
IE_NAME = 'smotri:broadcast'
_VALID_URL = r'https?://(?:www\.)?(?P<url>smotri\.com/live/(?P<id>[^/]+))/?.*'
_NETRC_MACHINE = 'smotri'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
broadcast_id = mobj.group('id')
broadcast_url = 'http://' + mobj.group('url')
broadcast_page = self._download_webpage(broadcast_url, broadcast_id, 'Downloading broadcast page')
if re.search('>Режиссер с логином <br/>"%s"<br/> <span>не существует<' % broadcast_id, broadcast_page) is not None:
raise ExtractorError(
'Broadcast %s does not exist' % broadcast_id, expected=True)
# Adult content
if re.search('EroConfirmText">', broadcast_page) is not None:
(username, password) = self._get_login_info()
if username is None:
self.raise_login_required(
'Erotic broadcasts allowed only for registered users')
login_form = {
'login-hint53': '1',
'confirm_erotic': '1',
'login': username,
'password': password,
}
request = sanitized_Request(
broadcast_url + '/?no_redirect=1', urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
broadcast_page = self._download_webpage(
request, broadcast_id, 'Logging in and confirming age')
if '>Неверный логин или пароль<' in broadcast_page:
raise ExtractorError(
'Unable to log in: bad username or password', expected=True)
adult_content = True
else:
adult_content = False
ticket = self._html_search_regex(
(r'data-user-file=(["\'])(?P<ticket>(?!\1).+)\1',
r"window\.broadcast_control\.addFlashVar\('file'\s*,\s*'(?P<ticket>[^']+)'\)"),
broadcast_page, 'broadcast ticket', group='ticket')
broadcast_url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
broadcast_password = self._downloader.params.get('videopassword')
if broadcast_password:
broadcast_url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()
broadcast_json_page = self._download_webpage(
broadcast_url, broadcast_id, 'Downloading broadcast JSON')
try:
broadcast_json = json.loads(broadcast_json_page)
protected_broadcast = broadcast_json['_pass_protected'] == 1
if protected_broadcast and not broadcast_password:
raise ExtractorError(
'This broadcast is protected by a password, use the --video-password option',
expected=True)
broadcast_offline = broadcast_json['is_play'] == 0
if broadcast_offline:
raise ExtractorError('Broadcast %s is offline' % broadcast_id, expected=True)
rtmp_url = broadcast_json['_server']
mobj = re.search(r'^rtmp://[^/]+/(?P<app>.+)/?$', rtmp_url)
if not mobj:
raise ExtractorError('Unexpected broadcast rtmp URL')
broadcast_playpath = broadcast_json['_streamName']
broadcast_app = '%s/%s' % (mobj.group('app'), broadcast_json['_vidURL'])
broadcast_thumbnail = broadcast_json.get('_imgURL')
broadcast_title = self._live_title(broadcast_json['title'])
broadcast_description = broadcast_json.get('description')
broadcaster_nick = broadcast_json.get('nick')
broadcaster_login = broadcast_json.get('login')
rtmp_conn = 'S:%s' % uuid.uuid4().hex
except KeyError:
if protected_broadcast:
raise ExtractorError('Bad broadcast password', expected=True)
raise ExtractorError('Unexpected broadcast JSON')
return {
'id': broadcast_id,
'url': rtmp_url,
'title': broadcast_title,
'thumbnail': broadcast_thumbnail,
'description': broadcast_description,
'uploader': broadcaster_nick,
'uploader_id': broadcaster_login,
'age_limit': 18 if adult_content else 0,
'ext': 'flv',
'play_path': broadcast_playpath,
'player_url': 'http://pics.smotri.com/broadcast_play.swf',
'app': broadcast_app,
'rtmp_live': True,
'rtmp_conn': rtmp_conn,
'is_live': True,
}

View File

@ -1,40 +1,112 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import time
import uuid
from .common import InfoExtractor from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..compat import compat_HTTPError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
class SonyLIVIE(InfoExtractor): class SonyLIVIE(InfoExtractor):
-_VALID_URL = r'https?://(?:www\.)?sonyliv\.com/details/[^/]+/(?P<id>\d+)'
+_VALID_URL = r'https?://(?:www\.)?sonyliv\.com/(?:s(?:how|port)s/[^/]+|movies|clip|trailer|music-videos)/[^/?#&]+-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
-'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight",
+'url': 'https://www.sonyliv.com/shows/bachelors-delight-1700000113/achaari-cheese-toast-1000022678?watch=true',
 'info_dict': {
-    'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight",
-    'id': 'ref:5024612095001',
+    'title': 'Bachelors Delight - Achaari Cheese Toast',
+    'id': '1000022678',
     'ext': 'mp4',
-    'upload_date': '20170923',
-    'description': 'md5:7f28509a148d5be9d0782b4d5106410d',
-    'uploader_id': '5182475815001',
-    'timestamp': 1506200547,
+    'upload_date': '20200411',
+    'description': 'md5:3957fa31d9309bf336ceb3f37ad5b7cb',
+    'timestamp': 1586632091,
+    'duration': 185,
+    'season_number': 1,
+    'episode': 'Achaari Cheese Toast',
+    'episode_number': 1,
+    'release_year': 2016,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
-    'add_ie': ['BrightcoveNew'],
 }, {
-    'url': 'http://www.sonyliv.com/details/full%20movie/4951168986001/Sei-Raat-(Bangla)',
+    'url': 'https://www.sonyliv.com/movies/tahalka-1000050121?watch=true',
'only_matching': True,
}, {
'url': 'https://www.sonyliv.com/clip/jigarbaaz-1000098925',
'only_matching': True,
}, {
'url': 'https://www.sonyliv.com/trailer/sandwiched-forever-1000100286?watch=true',
'only_matching': True,
}, {
'url': 'https://www.sonyliv.com/sports/india-tour-of-australia-2020-21-1700000286/cricket-hls-day-3-1st-test-aus-vs-ind-19-dec-2020-1000100959?watch=true',
'only_matching': True,
}, {
'url': 'https://www.sonyliv.com/music-videos/yeh-un-dinon-ki-baat-hai-1000018779',
'only_matching': True, 'only_matching': True,
}] }]
_GEO_COUNTRIES = ['IN']
_TOKEN = None
-# BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
-BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5182475815001/default_default/index.html?videoId=ref:%s'
+def _call_api(self, version, path, video_id):
+    headers = {}
+    if self._TOKEN:
+        headers['security_token'] = self._TOKEN
+    try:
+        return self._download_json(
+            'https://apiv2.sonyliv.com/AGL/%s/A/ENG/WEB/%s' % (version, path),
+            video_id, headers=headers)['resultObj']
+    except ExtractorError as e:
+        if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+            message = self._parse_json(
+                e.cause.read().decode(), video_id)['message']
+            if message == 'Geoblocked Country':
+                self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
+            raise ExtractorError(message)
+        raise
+def _real_initialize(self):
+    self._TOKEN = self._call_api('1.4', 'ALL/GETTOKEN', None)
def _real_extract(self, url): def _real_extract(self, url):
-brightcove_id = self._match_id(url)
-return self.url_result(
-    smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {
-        'geo_countries': ['IN'],
-        'referrer': url,
-    }),
-    'BrightcoveNew', brightcove_id)
+video_id = self._match_id(url)
+content = self._call_api(
+    '1.5', 'IN/CONTENT/VIDEOURL/VOD/' + video_id, video_id)
+if content.get('isEncrypted'):
+    raise ExtractorError('This video is DRM protected.', expected=True)
+dash_url = content['videoURL']
+headers = {
+    'x-playback-session-id': '%s-%d' % (uuid.uuid4().hex, time.time() * 1000)
+}
formats = self._extract_mpd_formats(
dash_url, video_id, mpd_id='dash', headers=headers, fatal=False)
formats.extend(self._extract_m3u8_formats(
dash_url.replace('.mpd', '.m3u8').replace('/DASH/', '/HLS/'),
video_id, 'mp4', m3u8_id='hls', headers=headers, fatal=False))
for f in formats:
f.setdefault('http_headers', {}).update(headers)
self._sort_formats(formats)
metadata = self._call_api(
'1.6', 'IN/DETAIL/' + video_id, video_id)['containers'][0]['metadata']
title = metadata['title']
episode = metadata.get('episodeTitle')
if episode and title != episode:
title += ' - ' + episode
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': content.get('posterURL'),
'description': metadata.get('longDescription') or metadata.get('shortDescription'),
'timestamp': int_or_none(metadata.get('creationDate'), 1000),
'duration': int_or_none(metadata.get('duration')),
'season_number': int_or_none(metadata.get('season')),
'episode': episode,
'episode_number': int_or_none(metadata.get('episodeNumber')),
'release_year': int_or_none(metadata.get('year')),
}
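
Three small transformations in the rewritten SonyLIV extractor can be checked in isolation: building the `x-playback-session-id` header, deriving the HLS manifest URL from the DASH one, and scaling a millisecond timestamp down to seconds. A sketch under stated assumptions: the function names and the example URL are hypothetical, and `scaled_int` only approximates what `int_or_none(value, 1000)` does in the diff.

```python
import time
import uuid


def playback_session_headers(now=None):
    """Build the session header as '<32 hex chars>-<milliseconds since epoch>'."""
    now = time.time() if now is None else now
    return {'x-playback-session-id': '%s-%d' % (uuid.uuid4().hex, now * 1000)}


def hls_url_from_dash(dash_url):
    """Mirror the string replacements the extractor applies to the DASH URL."""
    return dash_url.replace('.mpd', '.m3u8').replace('/DASH/', '/HLS/')


def scaled_int(value, scale=1):
    """Rough stand-in for int_or_none(value, scale): integer-divide, None on bad input."""
    try:
        return int(value) // scale
    except (TypeError, ValueError):
        return None


headers = playback_session_headers(now=1.5)
hls_url = hls_url_from_dash('https://example.com/DASH/video.mpd')
timestamp = scaled_int('1586632091000', 1000)
```

The extractor sends the same header dictionary with the manifest requests and copies it into each format's `http_headers`.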

View File

@ -7,17 +7,24 @@
determine_ext, determine_ext,
ExtractorError, ExtractorError,
merge_dicts, merge_dicts,
orderedSet,
parse_duration, parse_duration,
parse_resolution, parse_resolution,
str_to_int, str_to_int,
url_or_none, url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin,
) )
class SpankBangIE(InfoExtractor): class SpankBangIE(InfoExtractor):
-_VALID_URL = r'https?://(?:[^/]+\.)?spankbang\.com/(?P<id>[\da-z]+)/(?:video|play|embed)\b'
+_VALID_URL = r'''(?x)
+    https?://
+        (?:[^/]+\.)?spankbang\.com/
+        (?:
+            (?P<id>[\da-z]+)/(?:video|play|embed)\b|
+            [\da-z]+-(?P<id_2>[\da-z]+)/playlist/[^/?#&]+
+        )
+    '''
_TESTS = [{ _TESTS = [{
'url': 'http://spankbang.com/3vvn/video/fantasy+solo', 'url': 'http://spankbang.com/3vvn/video/fantasy+solo',
'md5': '1cc433e1d6aa14bc376535b8679302f7', 'md5': '1cc433e1d6aa14bc376535b8679302f7',
@ -57,10 +64,14 @@ class SpankBangIE(InfoExtractor):
}, { }, {
'url': 'https://spankbang.com/2y3td/embed/', 'url': 'https://spankbang.com/2y3td/embed/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://spankbang.com/2v7ik-7ecbgu/playlist/latina+booty',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') or mobj.group('id_2')
webpage = self._download_webpage( webpage = self._download_webpage(
url.replace('/%s/embed' % video_id, '/%s/video' % video_id), url.replace('/%s/embed' % video_id, '/%s/video' % video_id),
video_id, headers={'Cookie': 'country=US'}) video_id, headers={'Cookie': 'country=US'})
@ -155,30 +166,33 @@ def extract_format(format_id, format_url):
class SpankBangPlaylistIE(InfoExtractor): class SpankBangPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?spankbang\.com/(?P<id>[\da-z]+)/playlist/[^/]+' _VALID_URL = r'https?://(?:[^/]+\.)?spankbang\.com/(?P<id>[\da-z]+)/playlist/(?P<display_id>[^/]+)'
_TEST = { _TEST = {
'url': 'https://spankbang.com/ug0k/playlist/big+ass+titties', 'url': 'https://spankbang.com/ug0k/playlist/big+ass+titties',
'info_dict': { 'info_dict': {
'id': 'ug0k', 'id': 'ug0k',
'title': 'Big Ass Titties', 'title': 'Big Ass Titties',
}, },
'playlist_mincount': 50, 'playlist_mincount': 40,
} }
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage( webpage = self._download_webpage(
url, playlist_id, headers={'Cookie': 'country=US; mobile=on'}) url, playlist_id, headers={'Cookie': 'country=US; mobile=on'})
entries = [self.url_result( entries = [self.url_result(
'https://spankbang.com/%s/video' % video_id, urljoin(url, mobj.group('path')),
ie=SpankBangIE.ie_key(), video_id=video_id) ie=SpankBangIE.ie_key(), video_id=mobj.group('id'))
for video_id in orderedSet(re.findall( for mobj in re.finditer(
r'<a[^>]+\bhref=["\']/?([\da-z]+)/play/', webpage))] r'<a[^>]+\bhref=(["\'])(?P<path>/?[\da-z]+-(?P<id>[\da-z]+)/playlist/%s(?:(?!\1).)*)\1'
% re.escape(display_id), webpage)]
title = self._html_search_regex( title = self._html_search_regex(
r'<h1>([^<]+)\s+playlist</h1>', webpage, 'playlist title', r'<h1>([^<]+)\s+playlist\s*<', webpage, 'playlist title',
fatal=False) fatal=False)
return self.playlist_result(entries, playlist_id, title) return self.playlist_result(entries, playlist_id, title)
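
The updated `SpankBangIE._VALID_URL` has two alternatives, each with its own named group, so the ID is taken from whichever group matched. The sketch below reproduces that pattern outside the extractor class to show the fallback; `extract_video_id` is a hypothetical helper, and the playlist URL in the usage note is invented.

```python
import re

# Same verbose pattern as in the diff, copied into a module-level constant.
VALID_URL = r'''(?x)
    https?://
        (?:[^/]+\.)?spankbang\.com/
        (?:
            (?P<id>[\da-z]+)/(?:video|play|embed)\b|
            [\da-z]+-(?P<id_2>[\da-z]+)/playlist/[^/?#&]+
        )
    '''


def extract_video_id(url):
    # Hypothetical helper: returns None when the URL does not match at all.
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    # Only one of the two groups is set, depending on which branch matched.
    return mobj.group('id') or mobj.group('id_2')
```

With this pattern, `https://spankbang.com/2y3td/embed/` resolves through the `id` group, while a playlist-style URL such as `https://spankbang.com/abc12-def34/playlist/example` resolves through `id_2`, giving the part after the hyphen.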


@ -3,50 +3,62 @@
from .adobepass import AdobePassIE from .adobepass import AdobePassIE
from ..utils import ( from ..utils import (
extract_attributes, int_or_none,
update_url_query,
smuggle_url, smuggle_url,
update_url_query,
) )
class SproutIE(AdobePassIE): class SproutIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?sproutonline\.com/watch/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?(?:sproutonline|universalkids)\.com/(?:watch|(?:[^/]+/)*videos)/(?P<id>[^/?#]+)'
_TEST = { _TESTS = [{
'url': 'http://www.sproutonline.com/watch/cowboy-adventure', 'url': 'https://www.universalkids.com/shows/remy-and-boo/season/1/videos/robot-bike-race',
'md5': '74bf14128578d1e040c3ebc82088f45f',
'info_dict': { 'info_dict': {
'id': '9dexnwtmh8_X', 'id': 'bm0foJFaTKqb',
'ext': 'mp4', 'ext': 'mp4',
'title': 'A Cowboy Adventure', 'title': 'Robot Bike Race',
'description': 'Ruff-Ruff, Tweet and Dave get to be cowboys for the day at Six Cow Corral.', 'description': 'md5:436b1d97117cc437f54c383f4debc66d',
'timestamp': 1437758640, 'timestamp': 1606148940,
'upload_date': '20150724', 'upload_date': '20201123',
'uploader': 'NBCU-SPROUT-NEW', 'uploader': 'NBCU-MPAT',
} },
} 'params': {
'skip_download': True,
},
}, {
'url': 'http://www.sproutonline.com/watch/cowboy-adventure',
'only_matching': True,
}, {
'url': 'https://www.universalkids.com/watch/robot-bike-race',
'only_matching': True,
}]
_GEO_COUNTRIES = ['US']
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) mpx_metadata = self._download_json(
video_component = self._search_regex( # http://nbcuunikidsprod.apps.nbcuni.com/networks/universalkids/content/videos/
r'(?s)(<div[^>]+data-component="video"[^>]*?>)', 'https://www.universalkids.com/_api/videos/' + display_id,
webpage, 'video component', default=None) display_id)['mpxMetadata']
if video_component: media_pid = mpx_metadata['mediaPid']
options = self._parse_json(extract_attributes( theplatform_url = 'https://link.theplatform.com/s/HNK2IC/' + media_pid
video_component)['data-options'], video_id) query = {
theplatform_url = options['video'] 'mbr': 'true',
query = { 'manifest': 'm3u',
'mbr': 'true', }
'manifest': 'm3u', if mpx_metadata.get('entitlement') == 'auth':
} query['auth'] = self._extract_mvpd_auth(url, media_pid, 'sprout', 'sprout')
if options.get('protected'): theplatform_url = smuggle_url(
query['auth'] = self._extract_mvpd_auth(url, options['pid'], 'sprout', 'sprout') update_url_query(theplatform_url, query), {
theplatform_url = smuggle_url(update_url_query( 'force_smil_url': True,
theplatform_url, query), {'force_smil_url': True}) 'geo_countries': self._GEO_COUNTRIES,
else: })
iframe = self._search_regex( return {
r'(<iframe[^>]+id="sproutVideoIframe"[^>]*?>)', '_type': 'url_transparent',
webpage, 'iframe') 'id': media_pid,
theplatform_url = extract_attributes(iframe)['src'] 'url': theplatform_url,
'series': mpx_metadata.get('seriesName'),
return self.url_result(theplatform_url, 'ThePlatform') 'season_number': int_or_none(mpx_metadata.get('seasonNumber')),
'episode_number': int_or_none(mpx_metadata.get('episodeNumber')),
'ie_key': 'ThePlatform',
}
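
The rewritten `SproutIE` no longer scrapes the page; it builds a thePlatform link from `mediaPid` and appends a small query string. A rough standard-library equivalent of that URL construction is sketched below. `build_theplatform_url` and its `auth_token` parameter are hypothetical stand-ins; the real code uses `update_url_query`, `smuggle_url`, and `_extract_mvpd_auth`, none of which are reproduced here.

```python
from urllib.parse import urlencode


def build_theplatform_url(media_pid, auth_token=None):
    # Base query used in the diff for every request.
    query = {
        'mbr': 'true',
        'manifest': 'm3u',
    }
    # In the diff, 'auth' is only added when the metadata marks the
    # video as entitlement == 'auth'; here a token is passed in directly.
    if auth_token:
        query['auth'] = auth_token
    return 'https://link.theplatform.com/s/HNK2IC/%s?%s' % (
        media_pid, urlencode(query))
```

This only covers the visible URL; the geo and `force_smil_url` hints that the diff attaches via `smuggle_url` are outside the scope of this sketch.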


@ -4,25 +4,28 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, clean_html,
ExtractorError,
int_or_none, int_or_none,
js_to_json, str_or_none,
unescapeHTML, try_get,
) )
class StitcherIE(InfoExtractor): class StitcherIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?stitcher\.com/podcast/(?:[^/]+/)+e/(?:(?P<display_id>[^/#?&]+?)-)?(?P<id>\d+)(?:[/#?&]|$)' _VALID_URL = r'https?://(?:www\.)?stitcher\.com/(?:podcast|show)/(?:[^/]+/)+e(?:pisode)?/(?:(?P<display_id>[^/#?&]+?)-)?(?P<id>\d+)(?:[/#?&]|$)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.stitcher.com/podcast/the-talking-machines/e/40789481?autoplay=true', 'url': 'http://www.stitcher.com/podcast/the-talking-machines/e/40789481?autoplay=true',
'md5': '391dd4e021e6edeb7b8e68fbf2e9e940', 'md5': 'e9635098e0da10b21a0e2b85585530f6',
'info_dict': { 'info_dict': {
'id': '40789481', 'id': '40789481',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Machine Learning Mastery and Cancer Clusters', 'title': 'Machine Learning Mastery and Cancer Clusters',
'description': 'md5:55163197a44e915a14a1ac3a1de0f2d3', 'description': 'md5:547adb4081864be114ae3831b4c2b42f',
'duration': 1604, 'duration': 1604,
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'upload_date': '20180126',
'timestamp': 1516989316,
}, },
}, { }, {
'url': 'http://www.stitcher.com/podcast/panoply/vulture-tv/e/the-rare-hourlong-comedy-plus-40846275?autoplay=true', 'url': 'http://www.stitcher.com/podcast/panoply/vulture-tv/e/the-rare-hourlong-comedy-plus-40846275?autoplay=true',
@ -38,6 +41,7 @@ class StitcherIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Page Not Found',
}, { }, {
# escaped title # escaped title
'url': 'http://www.stitcher.com/podcast/marketplace-on-stitcher/e/40910226?autoplay=true', 'url': 'http://www.stitcher.com/podcast/marketplace-on-stitcher/e/40910226?autoplay=true',
@ -45,37 +49,39 @@ class StitcherIE(InfoExtractor):
}, { }, {
'url': 'http://www.stitcher.com/podcast/panoply/getting-in/e/episode-2a-how-many-extracurriculars-should-i-have-40876278?autoplay=true', 'url': 'http://www.stitcher.com/podcast/panoply/getting-in/e/episode-2a-how-many-extracurriculars-should-i-have-40876278?autoplay=true',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.stitcher.com/show/threedom/episode/circles-on-a-stick-200212584',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id, audio_id = re.match(self._VALID_URL, url).groups()
audio_id = mobj.group('id')
display_id = mobj.group('display_id') or audio_id
webpage = self._download_webpage(url, display_id) resp = self._download_json(
'https://api.prod.stitcher.com/episode/' + audio_id,
display_id or audio_id)
episode = try_get(resp, lambda x: x['data']['episodes'][0], dict)
if not episode:
raise ExtractorError(resp['errors'][0]['message'], expected=True)
episode = self._parse_json( title = episode['title'].strip()
js_to_json(self._search_regex( audio_url = episode['audio_url']
r'(?s)var\s+stitcher(?:Config)?\s*=\s*({.+?});\n', webpage, 'episode config')),
display_id)['config']['episode']
title = unescapeHTML(episode['title']) thumbnail = None
formats = [{ show_id = episode.get('show_id')
'url': episode[episode_key], if show_id and episode.get('classic_id') != -1:
'ext': determine_ext(episode[episode_key]) or 'mp3', thumbnail = 'https://stitcher-classic.imgix.net/feedimages/%s.jpg' % show_id
'vcodec': 'none',
} for episode_key in ('episodeURL',) if episode.get(episode_key)]
description = self._search_regex(
r'Episode Info:\s*</span>([^<]+)<', webpage, 'description', fatal=False)
duration = int_or_none(episode.get('duration'))
thumbnail = episode.get('episodeImage')
return { return {
'id': audio_id, 'id': audio_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'description': description, 'description': clean_html(episode.get('html_description') or episode.get('description')),
'duration': duration, 'duration': int_or_none(episode.get('duration')),
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'formats': formats, 'url': audio_url,
'vcodec': 'none',
'timestamp': int_or_none(episode.get('date_created')),
'season_number': int_or_none(episode.get('season')),
'season_id': str_or_none(episode.get('season_id')),
} }
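
The new `StitcherIE._real_extract` unpacks `display_id` and `audio_id` straight from `groups()` and falls back to the numeric ID when the URL carries no slug. The sketch below exercises the updated pattern on two URLs taken from the test list; `split_ids` is a hypothetical helper.

```python
import re

# Updated pattern from the diff: accepts /podcast/ or /show/, and /e/ or /episode/.
VALID_URL = (
    r'https?://(?:www\.)?stitcher\.com/(?:podcast|show)/(?:[^/]+/)+'
    r'e(?:pisode)?/(?:(?P<display_id>[^/#?&]+?)-)?(?P<id>\d+)(?:[/#?&]|$)'
)


def split_ids(url):
    # groups() returns the groups in pattern order: (display_id, id).
    display_id, audio_id = re.match(VALID_URL, url).groups()
    # display_id is None for URLs without a slug, so fall back to the ID,
    # mirroring the `display_id or audio_id` expression in the diff.
    return display_id or audio_id, audio_id
```

For `https://www.stitcher.com/show/threedom/episode/circles-on-a-stick-200212584` the slug and the numeric ID are separated at the last hyphen, while the older `/e/40789481?autoplay=true` form has no slug and uses the ID for both values.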


@ -2,25 +2,40 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..utils import (
from ..utils import unified_strdate int_or_none,
parse_iso8601,
str_or_none,
strip_or_none,
try_get,
urljoin,
)
class StreetVoiceIE(InfoExtractor): class StreetVoiceIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?streetvoice\.com/[^/]+/songs/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:.+?\.)?streetvoice\.com/[^/]+/songs/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://streetvoice.com/skippylu/songs/94440/', 'url': 'https://streetvoice.com/skippylu/songs/123688/',
'md5': '15974627fc01a29e492c98593c2fd472', 'md5': '0eb535970629a5195685355f3ed60bfd',
'info_dict': { 'info_dict': {
'id': '94440', 'id': '123688',
'ext': 'mp3', 'ext': 'mp3',
'title': '', 'title': '流浪',
'description': 'Crispy脆樂團 - 輸', 'description': 'md5:8eb0bfcc9dcd8aa82bd6efca66e3fea6',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg',
'duration': 260, 'duration': 270,
'upload_date': '20091018', 'upload_date': '20100923',
'uploader': 'Crispy脆樂團', 'uploader': 'Crispy脆樂團',
'uploader_id': '627810', 'uploader_id': '627810',
'uploader_url': 're:^https?://streetvoice.com/skippylu/',
'timestamp': 1285261661,
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
'track': '流浪',
'track_id': '123688',
'album': '2010',
} }
}, { }, {
'url': 'http://tw.streetvoice.com/skippylu/songs/94440/', 'url': 'http://tw.streetvoice.com/skippylu/songs/94440/',
@ -29,21 +44,57 @@ class StreetVoiceIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
song_id = self._match_id(url) song_id = self._match_id(url)
base_url = 'https://streetvoice.com/api/v4/song/%s/' % song_id
song = self._download_json( song = self._download_json(base_url, song_id, query={
'https://streetvoice.com/api/v1/public/song/%s/' % song_id, song_id, data=b'') 'fields': 'album,comments_count,created_at,id,image,length,likes_count,name,nickname,plays_count,profile,share_count,synopsis,user,username',
})
title = song['name'] title = song['name']
author = song['user']['nickname']
formats = []
for suffix, format_id in [('hls/file', 'hls'), ('file', 'http'), ('file/original', 'original')]:
f_url = (self._download_json(
base_url + suffix + '/', song_id,
'Downloading %s format URL' % format_id,
data=b'', fatal=False) or {}).get('file')
if not f_url:
continue
f = {
'ext': 'mp3',
'format_id': format_id,
'url': f_url,
'vcodec': 'none',
}
if format_id == 'hls':
f['protocol'] = 'm3u8_native'
abr = self._search_regex(r'\.mp3\.(\d+)k', f_url, 'bitrate', default=None)
if abr:
abr = int(abr)
f.update({
'abr': abr,
'tbr': abr,
})
formats.append(f)
user = song.get('user') or {}
username = user.get('username')
get_count = lambda x: int_or_none(song.get(x + '_count'))
return { return {
'id': song_id, 'id': song_id,
'url': song['file'], 'formats': formats,
'title': title, 'title': title,
'description': '%s - %s' % (author, title), 'description': strip_or_none(song.get('synopsis')),
'thumbnail': self._proto_relative_url(song.get('image'), 'http:'), 'thumbnail': song.get('image'),
'duration': song.get('length'), 'duration': int_or_none(song.get('length')),
'upload_date': unified_strdate(song.get('created_at')), 'timestamp': parse_iso8601(song.get('created_at')),
'uploader': author, 'uploader': try_get(user, lambda x: x['profile']['nickname']),
'uploader_id': compat_str(song['user']['id']), 'uploader_id': str_or_none(user.get('id')),
'uploader_url': urljoin(url, '/%s/' % username) if username else None,
'view_count': get_count('plays'),
'like_count': get_count('likes'),
'comment_count': get_count('comments'),
'repost_count': get_count('share'),
'track': title,
'track_id': song_id,
'album': try_get(song, lambda x: x['album']['name']),
} }
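
In the new `StreetVoiceIE` code, each format dictionary is assembled from a format ID and the file URL returned by the API, with the bitrate read out of the URL when it is present. A standalone sketch of that per-format step follows; `build_format` is a hypothetical helper, and the URLs in the usage note are invented to show the `.mp3.<N>k` shape the regex expects.

```python
import re


def build_format(format_id, f_url):
    f = {
        'ext': 'mp3',
        'format_id': format_id,
        'url': f_url,
        'vcodec': 'none',
    }
    # HLS entries are marked so the native m3u8 downloader is used.
    if format_id == 'hls':
        f['protocol'] = 'm3u8_native'
    # The bitrate is optional: only set abr/tbr when the URL contains it.
    mobj = re.search(r'\.mp3\.(\d+)k', f_url)
    if mobj:
        abr = int(mobj.group(1))
        f.update({
            'abr': abr,
            'tbr': abr,
        })
    return f
```

A URL such as `https://example.com/song.mp3.128k.m3u8` would produce `abr` and `tbr` of 128, whereas `https://example.com/song.mp3` leaves both keys unset.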


@ -140,7 +140,7 @@ class TeachableIE(TeachableBaseIE):
@staticmethod @staticmethod
def _is_teachable(webpage): def _is_teachable(webpage):
return 'teachableTracker.linker:autoLink' in webpage and re.search( return 'teachableTracker.linker:autoLink' in webpage and re.search(
r'<link[^>]+href=["\']https?://process\.fs\.teachablecdn\.com', r'<link[^>]+href=["\']https?://(?:process\.fs|assets)\.teachablecdn\.com',
webpage) webpage)
@staticmethod @staticmethod
@ -269,7 +269,7 @@ def _real_extract(self, url):
r'(?s)(?P<li><li[^>]+class=(["\'])(?:(?!\2).)*?section-item[^>]+>.+?</li>)', r'(?s)(?P<li><li[^>]+class=(["\'])(?:(?!\2).)*?section-item[^>]+>.+?</li>)',
webpage): webpage):
li = mobj.group('li') li = mobj.group('li')
if 'fa-youtube-play' not in li: if 'fa-youtube-play' not in li and not re.search(r'\d{1,2}:\d{2}', li):
continue continue
lecture_url = self._search_regex( lecture_url = self._search_regex(
r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1', li, r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1', li,
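
The second Teachable hunk relaxes the lecture filter: a list item is now kept if it has the `fa-youtube-play` icon or contains something that looks like a duration. The sketch below isolates that condition; `looks_like_video_item` is a hypothetical helper, and the HTML snippets in the usage note are invented.

```python
import re


def looks_like_video_item(li_html):
    # Keep the item if it carries the video icon class...
    if 'fa-youtube-play' in li_html:
        return True
    # ...or if it contains an m:ss / mm:ss style duration, as in the diff.
    return re.search(r'\d{1,2}:\d{2}', li_html) is not None
```

An item like `<li class="section-item">Intro (12:34)</li>` passes because of the duration, and one with no icon and no time string is skipped. The duration heuristic can also match unrelated text such as a clock time, so it is a loose filter rather than a guarantee that the item is a video.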

Some files were not shown because too many files have changed in this diff