requests for particular CDNs #5

Open
claustromaniac opened this issue Sep 11, 2018 · 31 comments

@claustromaniac
Owner

claustromaniac commented Sep 11, 2018

Use this issue for requests regarding individual CDNs.


| Platform to investigate | Added in | Notes |
| --- | --- | --- |
| 75CDN | | |
| Advanced Hosting | | |
| Akamai | 98ec768 | |
| Alibaba Cloud | 47c01bd | |
| Amazon Cloudfront | 98ec768 | |
| Amazon Shield | 9e347b6 | |
| Azion | | Needs custom pragma in the request to get debug headers in the response |
| Azure | | Can't be detected reliably via headers. |
| Baidu | 6cc3ae8 | |
| BelugaCDN | 6cc3ae8 | |
| BootCDN | | |
| BootstrapCDN | 6cc3ae8 | |
| BunnyCDN | 6cc3ae8 | |
| CacheFly CDN | | Offers custom CDNs and multi-CDN setups, taking advantage of other popular CDNs (like Cloudflare) |
| CDN.net | | |
| CDN77 | e3112f7 | |
| CDNetworks | a149a10 | Uses Zenedge internally |
| cdnlion | | |
| ChinaCache | 9e347b6 | |
| Cloudflare | 1fa92cc | |
| Cloudflare AMP | | Already detected by CF filters |
| Cloudflare IPFS gateway | | Same as above |
| Edgecast | e3112f7 | |
| Fastly | 33f24e4 | |
| fly.io | 9e347b6 | |
| Flywheel | 9e347b6 | |
| G-CDN | a149a10 | |
| GitHub | 6cc3ae8 | |
| GoCache | e3112f7 | |
| Google AMP | | |
| Google Cloud | 33f24e4 | |
| Google Project Shield | 6e8c38d | |
| Huawei Cloud | | |
| Highwinds | a149a10 | |
| IBM Cloud | | CDN powered by Akamai. |
| ICSS | | |
| Incapsula | 6e8c38d | |
| Instart Logic | 6fe4d17 | |
| IPFS | 6fe4d17 | Not a CDN, but a gateway. |
| jsDelivr | | Uses StackPath, Cloudflare, Fastly, and Quantil. |
| KeyCDN | 6e8c38d | |
| Kinsta | 98ec768 | Hosting powered by Google Cloud, CDN powered by KeyCDN. |
| Leaseweb | 6fe4d17 | |
| Limelight | | |
| Link11 | | |
| MaxCDN / StackPath | | |
| MyraCloud | 98ec768 | |
| NetDNA | 9e347b6 | |
| Netlify | 6fe4d17 | |
| NetScout | | |
| Netskope | | |
| OVH | | |
| QiHU | 6cc3ae8 | |
| Qiniu | | |
| Quantil | e3112f7 | |
| section.io | a149a10 | |
| SingularCDN | 6fe4d17 | |
| Sucuri | 6e8c38d | |
| staticfile | | Open source CDN for open source libraries. Can detect by URL. |
| Tor2web | ad6de0c | Not a CDN, but a gateway. |
| TransparentCDN | 1ad2d8d | |
| Variti | 6cc3ae8 | |
| Zenedge | 6fe4d17 | |

Notes

  • Researching these cloud services can take quite a lot of time, and results are not guaranteed. Please be patient.
  • I only add new items to the extension when I can detect them in a fairly reliable and efficient way.
  • Items marked with ❌ are items that I already investigated considerably, but still haven't figured out how to detect reliably and efficiently. I may eventually get back to investigating these, but they lose priority.
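
For readers wondering what detecting a service "via headers" can look like, here is a minimal sketch. It is only an illustration: the `SIGNATURES` table and the `detect_cdn` function are hypothetical names, the two header markers are examples of headers those services are known to send, and none of this is taken from the extension's actual rules.

```python
from typing import Mapping, Optional

# Illustrative signatures only (an assumption, not the extension's real rules).
# Each service maps to response header names whose presence suggests that the
# service handled the request.
SIGNATURES = {
    "Cloudflare": ("cf-ray",),
    "Amazon CloudFront": ("x-amz-cf-id",),
}


def detect_cdn(headers: Mapping[str, str]) -> Optional[str]:
    """Return the first service whose marker header is present, else None."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    names = {name.lower() for name in headers}
    for service, markers in SIGNATURES.items():
        if any(marker in names for marker in markers):
            return service
    return None
```

A service only fits this approach if it sends a distinctive header on ordinary responses, which is why entries such as Azion (debug headers only on request) or Azure are harder to cover this way.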
@claustromaniac claustromaniac added the enhancement New feature or request label Sep 19, 2018
@claustromaniac claustromaniac changed the title from "feature idea: inform users of *potential* MitM risks other than Cloudflare" to "requests for particular CDNs" Sep 19, 2018
@claustromaniac claustromaniac added sticky and removed enhancement New feature or request labels Oct 6, 2018

@ghost

ghost commented Oct 11, 2018

@claustromaniac looking at https://www.cdnplanet.com, https://www.cdnoverview.com, and https://www.webpagetest.org/ reveals a myriad of CDNs.

Researching each and every one would be a gargantuan task and rather unlikely to be achievable.

So I'm wondering whether it would make sense to team up with them (perhaps via an API), e.g. https://github.com/turbobytes/cdnfinder or https://github.com/WPO-Foundation/webpagetest

@claustromaniac
Owner Author

TL;DR: The main goal of this extension is to raise awareness. If you want more precise information you can always use those tools on your own. Be aware that they are not failsafe either, though.


I was aware of those, and what you suggest is reasonable. To be honest, I already asked myself if I should take that route at some point. However, my decision has always been to stick to my current path for a very simple reason: I don't want to rely on third parties. That's the whole point of this extension.

I can't afford to make queries to third parties on each single request. That would be expensive. Furthermore, the scope of this extension is different than the scope of those web tools. If anything, I'd say those guys are in favor of CDNs. They even use CDNs themselves, which means that I would be making queries to CDNs just to detect other CDNs. See the problem? I wouldn't give up my privacy just to detect CDNs. That doesn't make sense to me. My privacy is the very concern that led me to create this extension in the first place.

That being said, I admit that having hundreds of CDNs to research is not the only trade-off of my current path. Not relying on third parties also means that I have fewer resources to work with. The extension has very limited information to analyze, which means there are things it cannot do, and it misses and will always miss a number of CDNs. Moreover, I don't even want to start using an internal database of IP ranges, because I can barely maintain the extension as it is.

That's the very reason I added heuristics, and I intend to keep improving that feature as much as possible. It won't tell you who is the middle man, but it is a pretty reliable indicator otherwise.
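
To make the idea of such a heuristic concrete, here is a rough sketch. The thread does not describe the extension's real heuristics, so everything here is an assumption: `PROXY_HINTS` and `looks_proxied` are hypothetical names, and the header list is just a sample of generic headers that caching proxies commonly add.

```python
from typing import Mapping

# Hypothetical hint list (an assumption for illustration): generic headers
# that intermediary caches often add, regardless of which vendor runs them.
PROXY_HINTS = ("via", "x-cache", "x-cache-hits", "x-served-by")


def looks_proxied(headers: Mapping[str, str]) -> bool:
    """Return True if any generic proxy/cache header is present."""
    # Header names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in headers}
    return any(hint in names for hint in PROXY_HINTS)
```

A check like this can only say that some intermediary is probably involved. It cannot name the middle man, which matches the limitation described above.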

@ghost

ghost commented Oct 11, 2018

Valid points for sure. What I meant with the API was not pulling from a 3rd party in real time but, say, periodically pulling their available header information and incorporating it into this WX, to lighten the research load.

I would not mind lending a hand in expanding the header data used for detection, so I'm wondering whether the heuristics could be extended to the browser's developer panel: highlight the row with the heuristic detection in the network tab and, when clicking on such a row, highlight the header(s) triggering the heuristics in the header tab.

Or perhaps a dedicated True View tab in the developer tools, sort of like this one in GC:

https://chrome.google.com/webstore/detail/is-it-cached/naikbjeckbmjhngcejdmcjhoedhckglk

I could then take a look and report particular headers for specific CDNs, and thus help expand the list of CDNs.

@claustromaniac
Owner Author

claustromaniac commented Oct 12, 2018

I've come up with similar ideas myself but, for now, that wouldn't be practical. Even if you provided me with header data gathered by yourself, I'd still want to analyze it, and my stockpile of data to analyze is large enough as it is. Besides, it may not seem to you like my methods are efficient, but the truth is I've had scarce time to work on this, and so far I have invested a lot of that time coding (not researching). You'll just have to be patient.

I will reconsider the idea sometime in the future, after the extension has matured enough.

I appreciate your continued interest and willingness to help, though. Thanks for that.

@Thorin-Oakenpants

I am a moron, so please bear with me: why is "github" treated as a CDN, especially given I am on "github"? This confuses me, and I seriously need help (in more ways than you could know)! TIA
hubba-hubba

@claustromaniac
Owner Author

claustromaniac commented Oct 15, 2018

You're not a moron :( You're 👖!

The extension treats GitHub as a CDN because it can be used as a CDN. It's mostly because of GitHub Pages. IIRC, there are at least 100,000 domains hosted on GitHub.

Some examples for you.

(I can go on and on if I want)...

As for why it shows while you're on GitHub... that's because I didn't bother implementing exceptions, since I don't consider that necessary.

> I seriously need help (in more ways than you could know)!

I hope it's not too bad 😿

@claustromaniac
Owner Author

claustromaniac commented Oct 15, 2018

Let's say it's sort of a special case, but I considered it one worth adding because ... https://whotracks.me/trackers.html (look where github stands in that ranking). It's not like GitHub offers reverse proxy services or the like, if that's what you were wondering. (It does use reverse proxies tho).

Plus... Microsoft.

@Thorin-Oakenpants

OK, thanks. I totally get the github pages, but to me that is not a CDN (well not if they use the github.io, but I see they can be anything, so yeah). Soz for the spam and stoopid Q's ... I'm a special kind of NEEDS JESUS, but you handled me very well. Tah

@claustromaniac
Owner Author

claustromaniac commented Oct 16, 2018

Nah, don't say that. Your question was actually good.

As I said before, it's a special case. GitHub is not technically a CDN and AFAIK they don't officially offer such services. There have been ways to take advantage of their infrastructure for similar goals, but that's all. You made me start to think that I should probably move it out of that fieldset in the options page, because lumping it with the other (actual) CDNs doesn't do it justice.

@claustromaniac
Owner Author

claustromaniac commented Oct 16, 2018

Similarly, I could make the extension detect corporations that threaten our privacy and/or security (even if they don't qualify as CDNs), and list them separately in the options...

@ghost

ghost commented Oct 16, 2018

I would prefer for GitHub to remain listed as a CDN as long as it acts as one (for 3rd-party domains) and thus falls into the category of a CDN (notwithstanding being owned by MS).

To make that distinction (detect corporations that threaten our privacy and/or security) would be helpful for the user but how much extra work/effort to put on your plate?

Loading fonts, libraries, css (and media files) from a CDN is probably less threatening than login credentials or otherwise sensitive personal data being decrypted at the edge server.

That goes back to what was discussed earlier, where domain protection services (e.g. firewall or geolocation obfuscation with SNI certificates) get mixed with CDN services, thus blurring the lines.

@Thorin-Oakenpants

If GitHub is a special case... (if you want this in FORTRAN or COBOL, I can do it for you)

IF TLD = github.*
THEN do not list as CDN
ELSE list as CDN
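
For anyone who prefers something runnable over pseudo-COBOL, the same exception could be sketched roughly like this. It is only an illustration: `GITHUB_HOSTS`, `is_github_host` and `should_list_github` are hypothetical names, and the host list is an assumption rather than anything the extension defines.

```python
from urllib.parse import urlsplit

# Assumed list of GitHub-owned hosts, for illustration only.
GITHUB_HOSTS = ("github.com", "github.io")


def is_github_host(url: str) -> bool:
    """Return True if the URL's host is a GitHub host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in GITHUB_HOSTS)


def should_list_github(page_url: str) -> bool:
    """Mirror the pseudocode: hide GitHub when the page itself is on GitHub."""
    return not is_github_host(page_url)
```

Matching on the exact host or a dot-prefixed suffix avoids treating lookalike hosts such as `notgithub.com` as GitHub.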

But then if you start to expand to "blurred lines" material, seems like a lot of work. I kinda like the idea of a separate category

@claustromaniac
Owner Author

claustromaniac commented Oct 17, 2018

Technically, GitHub is a platform for hosting Git repos, and GitHub Pages are meant to be just static pages hosted on GitHub's servers. That's what I mean when I say GitHub is not a CDN. The thing is, anyone can quite easily host static content on GitHub and then load it up from somewhere else. That would be the simplest way I can think of to use GitHub as a CDN.

When it comes to GitHub Pages, that service is different than your typical CDN in that GitHub is merely hosting the sites. We can be (pretty darn) sure GitHub is the one at the other end of the communication (the very end, after all intermediary proxies, etc), which makes this somewhat less potentially risky than caching proxies offered by CDNs.

Still, there are many legit reasons for wanting to detect content served by GitHub; that's why I added it. I'm just wondering if I should move it out of that group of options to avoid giving people the false impression that GitHub offers CDN services.

> To make that distinction (detect corporations that threaten our privacy and/or security) would be helpful for the user but how much extra work/effort to put on your plate?

I meant that I could once more broaden the scope of this extension a bit, and have it not only focus on detecting CDNs but also hosting sites (like GitHub) and such. If I did that, I should try to separate the items somehow so it becomes easier for users to understand what the extension is detecting. For starters, I should create a separate category in the options page for sites like GitHub, but I could also add some more information about each service somewhere... I'll have to think this through before I decide what to do.
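
One possible shape for such a grouping, as a sketch only: the `Category` enum, the `SERVICES` mapping and `group_by_category` are hypothetical names, and the sample assignments simply follow how the services are described in this thread (GitHub as hosting, IPFS and Tor2web as gateways).

```python
from enum import Enum
from typing import Dict, List


class Category(Enum):
    CDN = "CDN"
    HOSTING = "Hosting service"
    GATEWAY = "Gateway"


# Sample data for illustration; not the extension's real option list.
SERVICES: Dict[str, Category] = {
    "Cloudflare": Category.CDN,
    "Fastly": Category.CDN,
    "GitHub": Category.HOSTING,
    "IPFS": Category.GATEWAY,
    "Tor2web": Category.GATEWAY,
}


def group_by_category(services: Dict[str, Category]) -> Dict[Category, List[str]]:
    """Group service names under their category, sorted alphabetically."""
    groups: Dict[Category, List[str]] = {category: [] for category in Category}
    for name in sorted(services):
        groups[services[name]].append(name)
    return groups
```

An options page could then render one fieldset per key of the returned mapping, so GitHub shows up under its own heading instead of among the CDNs.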

> Loading fonts, libraries, css (and media files) from a CDN is probably less threatening than login credentials or otherwise sensitive personal data being decrypted at the edge server.

Sure, it should be less risky in general, but that's when you think mostly about security. From a privacy standpoint, third-party content served by CDNs is just as potentially dangerous, if not more.

@👖

If I list GitHub in the options as a hosting service or so, seeing it detected here shouldn't be confusing anymore (right?). Would you still prefer to not see it in the popup here? Just to be clear, I can do it like you said, I just don't personally care for it. Besides, not adding such exceptions could be useful: if you're on GitHub and the extension doesn't detect it, it could suggest something weird is going on (like phishing or something else).

Whatcha think?

@Thorin-Oakenpants

Whatever you do, just be consistent. If GitHub is put in a new category, then treat it the same as others you would put in there re: badge counter etc

> Would you still prefer to not see it in the popup here

It's not worth the extra work, especially as the number of items in the new category grows. The distinction has already been made by being in a new category! cogito ergo drinkies 🍺 night night

PS: would we get a shiny new color (green, glorious green https://en.wikipedia.org/wiki/Money_(Blackadder))

@ghost

ghost commented Oct 17, 2018

> From a privacy standpoint, third-party content served by CDNs is just as potentially dangerous, if not more.

How is the risk to user privacy, as in user tracking/profiling, elevated compared to a domain utilizing user profiling/tracking without a CDN involved?
Either party can utilise only the same set of technologies available for user tracking/profiling today, and privacy-conscious users would deploy countermeasures anyway. Such countermeasures protect all the same, whether a CDN is in play or not.


> When it comes to GitHub Pages, that service is different than your typical CDN in that GitHub is merely hosting the sites. We can be (pretty darn) sure GitHub is the one at the other end of the communication (the very end, after all intermediary proxies, etc), which makes this somewhat less potentially risky than caching proxies offered by CDNs.

That seems inconsistent, then: GitHub is serving content to 3rd-party domains and is owned by MS, and thus could potentially pose a risk to user privacy (in your own words).

If Github gets excluded from detection, that incl. heuristics, the WX would start losing its credibility.


> The thing is, anyone can quite easily host static content on GitHub and then load it up from somewhere else. That would be the simplest way I can think of to use GitHub as a CDN.

In which case it transforms into a content delivery network for domains unaffiliated with GitHub.


In Wikipedia CDN is an umbrella term spanning different types of content delivery services: video streaming, software downloads, web and mobile content acceleration, licensed/managed CDN, transparent caching, and services to measure CDN performance, load balancing, multi-CDN switching and analytics and cloud intelligence. CDN vendors may cross over into other industries like security, with DDoS protection and web application firewalls (WAF), and WAN optimization.

@ghost

ghost commented Oct 17, 2018

loading libraries from CDN may actually pose a security risk unless protected by SRI
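
For context, SRI (Subresource Integrity) lets a page pin the expected hash of a third-party file, so the browser refuses to run it if the CDN serves something different. The sketch below shows how such an integrity value is derived; `sri_hash` is a hypothetical helper name, and SHA-384 is just one of the algorithms SRI accepts.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Build an SRI integrity value: '<algorithm>-<base64 of the raw digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Example: hash the exact bytes of the library file that the page will load.
integrity = sri_hash(b"console.log('hello');")
```

The resulting string goes into the `integrity` attribute of the `<script>` or `<link>` tag, usually together with `crossorigin="anonymous"`.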

@claustromaniac
Owner Author

> How is the risk to user privacy, as in user tracking/profiling, elevated compared to a domain utilizing user profiling/tracking without a CDN involved?

My bad, I misread what you said before. For some weird reason I thought you were comparing third-party vs first-party, while you were actually only comparing the content type. I haven't slept well lately, sorry. 😅

> If Github gets excluded from detection, that incl. heuristics, the WX would start losing its credibility.

I never said anything about excluding GitHub. I only said I considered it more appropriate to move GitHub into a separate category in the options page/menu, because GitHub does not (AFAIK) offer CDN services to web developers. If you visit a site and GitHub is detected, you know GitHub is at the other end of the communication. That's the key difference. Services like Cloudflare are used by good and bad people alike, but GitHub is always GitHub.

> In Wikipedia CDN is an umbrella term spanning different types of content delivery services

It is indeed a pretty broad term, but GitHub is clearly different than the thirty-something CDNs already detected by TS, and it doesn't even sell itself as a CDN. It's only a hosting service, at least for now. That's why I consider it appropriate to make that distinction somewhere, or at the very least it should be clearly stated somewhere that some hosting services are being thrown in that same bag.

@ghost

ghost commented Oct 17, 2018

> I haven't slept well lately

😟

https://www.centurylink.com/business/networking/cdn.html

@claustromaniac
Owner Author

> PS: would we get a shiny new color (green, glorious green https://en.wikipedia.org/wiki/Money_(Blackadder))

Maybe in the next release.
