Iframe cross-origin issue



- X3M
- - Joined 8 Jul, 2016
  - 29 topics • 356 posts
- 1
- 31 Jan, 2017
- Quote
Hi there,

I've noticed that in NWJS we can access other domains from an iframe without getting blocked by CORS, but this does not work when testing in a browser.

As far as I know, we can bypass CORS by disabling web security in Chrome, which is not a suitable solution.

I'm wondering how web scraping services manage to bypass it, since they all use an iframe to analyze the content of a website.

I'm also wondering how the Chromium browser in NWJS does it. Does it have its web security flag disabled?

Thanks.
- blurymind
- - Joined 6 Dec, 2013
  - 27 topics • 407 posts
- 1
- 1 Feb, 2017
- Quote
This is something that I am also trying to figure out how to do.

Web scrapers tend to control a web browser process. Some use PhantomJS, a headless web browser optimized for that sort of thing.

http://phantomjs.org/

Some websites will detect that a web scraper is trying to access them and block it, so you need to make your scraper present itself to them as a browser.
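As a rough illustration of presenting a scraper as a browser, here is a minimal sketch using only Python's standard library. The User-Agent string and the helper names `build_browser_like_request` and `fetch_page` are assumptions made for this example, and sites that fingerprint clients more thoroughly may still block it.

```python
import urllib.request

# Assumption: an example browser-style User-Agent; any current browser's value would do.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_browser_like_request(url: str) -> urllib.request.Request:
    """Build a request whose headers resemble those a desktop browser sends."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": BROWSER_USER_AGENT,
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "en-US,en;q=0.9",
        },
    )


def fetch_page(url: str) -> str:
    """Download a page with the browser-like headers and decode it to text."""
    request = build_browser_like_request(url)
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_page("http://phantomjs.org/")` would perform the actual download; `build_browser_like_request` on its own touches no network and can be inspected safely.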

There are multiple modules for that in Python. Other people write their web scrapers in Ruby on Rails.

I did my first one in AutoHotkey + IE (COM). It's a pretty lame choice, but it works. AHK has regular expressions and even a built-in GUI toolkit. It's full of goodies.

Python is another great one if you are more serious about it. You can use Python + Flask + Beautiful Soup (it's better than regular expressions) to make a web app, but I have never tried to make a web app that is a web scraper yet. I might give it a try in the future, as I am getting pretty far with my research there.
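To show why an HTML parser tends to hold up better than regular expressions, here is a small sketch. It uses the standard library's `html.parser` as a stand-in for Beautiful Soup (which is a separate install); `LinkCollector`, `extract_links`, and `NAIVE_LINK_PATTERN` are names invented for this example.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, whatever the attribute order or quoting."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# A naive pattern: it only matches a lowercase tag with a double-quoted href placed first.
NAIVE_LINK_PATTERN = re.compile(r'<a href="([^"]*)"')
```

On markup that mixes quoting styles, letter case, and attribute order, the parser finds every link, while the naive pattern only catches the one written exactly the way it expects.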

I have run into the iframe security limitation just like you have (cross-domain access is forbidden), but I have yet to figure out an elegant way around it. JavaScript and jQuery won't allow it, so you might have to do something extra to get around that.

A strategy I want to try: download the target HTML to your localhost folder (Flask), then load it inside the iframe. That way it will be on the same domain as the page that is loading it in the iframe.
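The strategy above could be sketched roughly as follows. This version uses Python's built-in `http.server` instead of Flask so that it runs without extra installs; the `/proxy?url=...` route, the port, and all function names are assumptions for illustration, not a tested production proxy.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def fetch_html(url: str) -> bytes:
    """Download a page server-side, where the browser's cross-origin checks do not apply."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def resolve_proxy_request(path: str, fetch=fetch_html):
    """Map a request path such as /proxy?url=http://... to a (status, body) pair."""
    parsed = urlparse(path)
    target = parse_qs(parsed.query).get("url", [None])[0]
    if parsed.path != "/proxy" or target is None:
        return 404, b"Use /proxy?url=<target>"
    # Only allow web URLs, so the proxy cannot be used to read local files.
    if urlparse(target).scheme not in ("http", "https"):
        return 400, b"Only http and https targets are allowed"
    try:
        return 200, fetch(target)
    except (OSError, ValueError):
        return 502, b"Could not fetch the target page"


class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = resolve_proxy_request(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the proxy on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), ProxyHandler).serve_forever()
```

With `serve()` running and the host page served from the same origin, an iframe whose `src` is `/proxy?url=http://www.web-source.net` would be same-origin with that page, so its DOM should be readable from script. Relative links, images, and scripts inside the fetched page would still point at the original site and may break.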

- X3M
- - Joined 8 Jul, 2016
  - 29 topics • 356 posts
- 1
- 1 Feb, 2017
- Quote

blurymind Finally, here is the solution:

<object data="http://www.web-source.net" width="600" height="400">
    <embed src="http://www.web-source.net" width="600" height="400">
    Error: Embedded data could not be displayed.
</object>
I've never heard of <embed> tags before (maybe it is an HTML5 addition), but it does the job perfectly, and it looks more elegant than an iframe. No more CORS!
Well, technically it won't work if the target has set 'X-Frame-Options' to 'SAMEORIGIN' (as Google does).
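One way to tell in advance whether a target refuses to be framed is to look at its response headers. The sketch below checks `X-Frame-Options` and, as an added assumption beyond what the post mentions, the `frame-ancestors` directive of `Content-Security-Policy`; the function name is made up for this example.

```python
def blocks_cross_origin_framing(headers: dict) -> bool:
    """Return True if the response headers forbid embedding the page on another site."""
    normalized = {name.lower(): value for name, value in headers.items()}

    # X-Frame-Options: DENY or SAMEORIGIN both stop a foreign page from framing it.
    xfo = normalized.get("x-frame-options", "").strip().upper()
    if xfo in ("DENY", "SAMEORIGIN"):
        return True

    # Content-Security-Policy: any frame-ancestors list without a bare "*" restricts framing.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            return "*" not in parts[1:]

    return False
```

The `headers` argument can be any plain dict of header names to values, for example `dict(response.headers)` from a `urllib.request.urlopen` call.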

Edit: That does not seem to be the ideal way for scraping, since the embed tag hides the DOM elements of the frame.
The way to go must be the solution you proposed: loading the content into a blank.html.

- blurymind
- - Joined 6 Dec, 2013
  - 27 topics • 407 posts
- 1
- 1 Feb, 2017
- Quote
I need to grab string values from elements inside the iframe, so this wouldn't do it for me.
- X3M
- - Joined 8 Jul, 2016
  - 29 topics • 356 posts
- 1
- 1 Feb, 2017
- Quote
"I need to grab string values from elements inside the iframe, so this wouldn't do it for me."

Me too. I tried your solution, and it partially worked in NWJS but not in standard browsers.

This is what I did:

1- Created a second iframe (Iframe2) next to the iframe that loads the website (Iframe1).

2- Got the document's innerHTML from Iframe1.

3- Assigned that innerHTML to Iframe2.

4- Got the strings from Iframe2.

It worked well for most websites, but not all of them allow this.

Just give it a try.
- blurymind
- - Joined 6 Dec, 2013
  - 27 topics • 407 posts
- 1
- 1 Feb, 2017
- Quote
Yeah... I still need it to work in a browser, though.

I can already easily grab any info from the target websites via web scraping, with no need to use iframes at all.

I have reverse engineered how they work and already have code that scrapes them.

But the current job assignment requires me to make a web form that grabs data from a website. Of course, none of my bosses understand how these things work; they just want a web form with the required validations. But the data needed to validate some fields is stored on another website with another database, so my hacky web scraping solutions only work when the submission form is a native app that runs a web scraping macro.

My theory was to make a web scraper with a web interface. But I put that on hold, because they might eventually give me access to host my form on the same domain, which will of course get rid of the security block.
- blurymind
- - Joined 6 Dec, 2013
  - 27 topics • 407 posts
- 1
- 1 Feb, 2017
- Quote
X3M does your solution allow grabbing the current URL of what's inside the frame?
Basically, if the user clicks on a link inside the embedded frame, can we use the DOM in the web browser console to access the updated content URL of the frame?

That might solve it for me, partially at least, because some of the result's information is in the query string of the URL the user clicks on.
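Once the clicked URL is available as a string (for instance on the server side, or from a frame that is same-origin with the page), pulling values out of its query string is straightforward. A minimal sketch, with `query_params` and the sample URL as made-up names for illustration:

```python
from urllib.parse import parse_qs, urlparse


def query_params(url: str) -> dict:
    """Return the query-string parameters of a URL, keeping the last value of each name."""
    parsed = urlparse(url)
    return {name: values[-1] for name, values in parse_qs(parsed.query).items()}
```

For a hypothetical URL such as `https://example.com/result?id=42&name=Jane%20Doe`, this gives a dict with `id` and `name` keys, with percent-encoding already decoded.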
- X3M
- - Joined 8 Jul, 2016
  - 29 topics • 356 posts
- 1
- 2 Feb, 2017
- Quote
blurymind Nah, since I'm using the sandbox attribute without allow-scripts.
Well, if you want to make a robust web scraper, then client-side JavaScript is not the best way. It's better to make it server-side using NodeJS.

Here is a preview of my halted project. Basically, you get to pick whatever DOM element you want to scrape. I get the className first; if it does not exist, I get the tagName and scrape similar elements:

But I'm willing to complete this small project, since there is no web scraping software out there that is based on NWJS.
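The className-then-tagName fallback described above might look roughly like this on the server side. This is a sketch in Python using the standard library's `html.parser` rather than the NodeJS code of the actual project, which is not shown in the thread; `build_selector`, `SimilarElementCollector`, and `scrape_similar` are invented names, and the markup is assumed to be reasonably well formed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def build_selector(tag_name, class_name=None):
    """Prefer the picked element's first class; fall back to its tag name."""
    classes = (class_name or "").split()
    if classes:
        return ("class", classes[0])
    return ("tag", tag_name.lower())


class SimilarElementCollector(HTMLParser):
    """Collect the text of every element that matches the selector."""

    def __init__(self, selector):
        super().__init__()
        self.kind, self.value = selector
        self.texts = []
        self._depth = 0      # greater than 0 while inside a matching element
        self._buffer = []

    def _matches(self, tag, attrs):
        if self.kind == "tag":
            return tag == self.value
        return self.value in (dict(attrs).get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self._matches(tag, attrs):
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.texts.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def scrape_similar(html, tag_name, class_name=None):
    collector = SimilarElementCollector(build_selector(tag_name, class_name))
    collector.feed(html)
    collector.close()
    return collector.texts
```

Here `tag_name` and `class_name` stand for the tagName and className of whatever element the user picked; passing no class triggers the tag-name fallback.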