Logging into a website through Google and scraping data from it

Hello,


In Swift, I want to be able to log into a website that uses a Google account to log in and scrape page text from it.


Let me explain.


I'm trying to build an app that will run in the background and periodically pull page text from a website to see if it's changed. If it has changed, I will receive a notification, etc. That's the basis of the app. This wouldn't be too hard, but the problem is that the website I need to get to requires you to log in first. To be clear, I'm not trying to hack into a website that I don't have access to: if I log in using my Google account (which is what it prompts me to do), it lets me in just fine.

If I log out of the website in Safari and reload the page, I come to a screen with a list of my three Google accounts. I can log in with the third one in the list and proceed to the website.


I've already followed a Stack Overflow post that said how to scrape the content of a website and produced this code:


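import Foundation

// Replace the placeholder with the address of the page to fetch.
guard let url = URL(string: "website url here") else { fatalError("Invalid URL") }
let task = URLSession.shared.dataTask(with: url) { data, response, error in
    // Check for a transport error or a non-UTF-8 body instead of force-unwrapping.
    guard let data = data, let html = String(data: data, encoding: .utf8) else {
        print("Request failed: \(String(describing: error))")
        return
    }
    print(html)
}
task.resume()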


This code gathers the HTML from the specified page and prints it to the console. I tried it on apple.com (obviously not the website I want to use) and it printed the HTML for Apple's website, which contained some of the page text, but not all of it by the looks of it. I then tried scraping the website I want to get to, using the link it took me to after I logged in. In the console output I found the words "Google" and "Account" everywhere, meaning it had not gotten past the login screen.
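Rather than searching the output for "Google", one way to confirm what happened is to compare the host the request ended up on with the host that was asked for. URLSession follows HTTP redirects by default, and the response's url property reports the URL the response actually came from. The sketch below is only an illustration: the function name landedOnDifferentHost and the example URLs are made up, and it assumes the site sends an HTTP redirect to a separate sign-in host (such as accounts.google.com) instead of showing a login form on its own domain.

```swift
import Foundation

/// Returns true when the URL a request finally landed on has a different host
/// than the URL that was requested, which usually means a redirect to a
/// sign-in page. (Hypothetical helper, not part of any Apple API.)
func landedOnDifferentHost(requested: URL, finalURL: URL?) -> Bool {
    guard let requestedHost = requested.host?.lowercased(),
          let finalHost = finalURL?.host?.lowercased() else {
        // No usable URL to compare: treat it as "did not reach the page".
        return true
    }
    return finalHost != requestedHost
}
```

In the completion handler above, this could be called as landedOnDifferentHost(requested: url, finalURL: response?.url) before trying to parse the body. It only detects the problem; it does not sign in.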


What I want my app to do is, by whatever means necessary (whether that's creating a WebKit view, something similar to my example above, or something completely different), get past the Google page by logging in to that third account or somehow clicking it, then take all the text on the page it redirects to and convert it to a String to then be parsed, etc.
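For the second half of that (turning fetched HTML into a String of page text), here is a minimal sketch. It does nothing about the login step. The function name roughVisibleText is made up for illustration, and the regular expressions are a crude approximation rather than a real HTML parser: they do not decode entities such as &amp;amp; and they cannot see text that the page adds later with JavaScript, which may be why the apple.com output looked incomplete.

```swift
import Foundation

/// Very rough HTML-to-text conversion: drops <script> and <style> elements,
/// strips the remaining tags, and collapses whitespace.
/// (Hypothetical helper; a proper HTML parser would be more robust.)
func roughVisibleText(fromHTML html: String) -> String {
    var text = html
    // Remove script and style elements together with their contents.
    for tag in ["script", "style"] {
        text = text.replacingOccurrences(
            of: "<\(tag)\\b[^>]*>[\\s\\S]*?</\(tag)>",
            with: " ",
            options: [.regularExpression, .caseInsensitive])
    }
    // Replace every remaining tag with a space so adjacent words stay separated.
    text = text.replacingOccurrences(of: "<[^>]+>", with: " ", options: .regularExpression)
    // Collapse runs of whitespace into single spaces.
    text = text.replacingOccurrences(of: "\\s+", with: " ", options: .regularExpression)
    return text.trimmingCharacters(in: .whitespacesAndNewlines)
}
```

Passing it the html string from the completion handler would give a single line of text that can then be searched or compared.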


How would I go about doing this?

Answered by DTS Engineer in 327265022

> In Swift, I want to be able to log into a website that uses a Google account to log in and scrape page text from it.

Yeah, that’s going to be challenging.

> I'm trying to build an app that will run in the background and periodically pull page text from a website to see if it's changed.

And that even more so.

There’s a bunch of issues here:

  • On the non-technical side, many web sites take a dim view of web scraping. I recommend you seek legal advice before doing that.

  • iOS puts strict limits on how much an app can run in the background. You may be better off doing this work on a server, where you don’t have to worry about power, mobile networking, and so on. Also, server platforms often have good web scraping tools.

    If you need to notify the user of any changes, you can do that using push notifications.

  • As you’ve already discovered, one of the main sticking points is going to be authentication. I’ve discussed this before here on DevForums (see this thread and this other thread).
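Wherever the polling ends up running (in the app or on a server), the "has the page text changed" check itself is small. The following is a minimal sketch, not a recommendation from the answer above: the names fingerprint(of:) and ChangeDetector are invented for illustration, and it uses a 64-bit FNV-1a hash purely as a compact value to keep between checks. It is not cryptographic, and two different texts could in principle produce the same value.

```swift
/// 64-bit FNV-1a hash of a string's UTF-8 bytes. Used here only as a compact
/// fingerprint of the page text; it is not a cryptographic hash.
func fingerprint(of text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325   // FNV offset basis
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3        // FNV prime, wrapping multiply
    }
    return hash
}

/// Remembers the fingerprint of the last text it was given.
/// (Hypothetical type for illustration.)
struct ChangeDetector {
    var lastFingerprint: UInt64?

    /// Returns true only when `text` differs from the previously seen text.
    /// The very first call records a baseline and returns false.
    mutating func hasChanged(_ text: String) -> Bool {
        let current = fingerprint(of: text)
        defer { lastFingerprint = current }
        guard let previous = lastFingerprint else { return false }
        return previous != current
    }
}
```

To survive between runs, lastFingerprint would need to be saved somewhere persistent. Where exactly (a file, a database row, and so on) depends on where the check runs and is left open here.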

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"