String(contentsOf:) loads content of truncated URL

Question

Created Jul ’18


I am trying to analyze the contents of a site that was previously loaded with a URLRequest, but the content obtained with String(contentsOf:) is different from the content displayed in the WKWebView. It appears that String(contentsOf:) gets the content of the truncated URL, not the complete URL that was assigned.

Example:

        let contents = try String(contentsOf: URL(string: "https://www.amazon.com/gp/aw/d/B00BECJ4R8/ref=mp_s_a_1_1?ie=UTF8&qid=1531620716&sr=8-1-spons&pi=AC_SX236_SY340_QL65&keywords=cole+haan&psc=1")!)

It returns the contents of the page https://www.amazon.com/gp/aw/d/B00BECJ4R8/

Why is this happening?

Is there an alternative method that allows me to read the content of the actual URL, not the truncated URL?

Any advice is very much appreciated.

Thank you.


Answer 1

Claude31 OP

Jul ’18

Did you try adding

encoding: .utf8

after the URL? (In Objective-C the same constant is spelled NSUTF8StringEncoding.)
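A minimal sketch of that suggestion in current Swift syntax, where the encoding is passed as `.utf8`. The helper name `loadText(from:)` is hypothetical. Note that this call is still synchronous, so it only illustrates the `encoding:` parameter.

```swift
import Foundation

// Hypothetical helper: reads the resource at `url` and decodes it as UTF-8.
// String(contentsOf:encoding:) is synchronous: it blocks the calling thread
// until the whole resource has been read, and throws if decoding fails.
func loadText(from url: URL) throws -> String {
    return try String(contentsOf: url, encoding: .utf8)
}
```

With a local file URL this returns the file's text; with a network URL it blocks until the full response has arrived.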


Answer 2

OOPer OP

Jul ’18

First of all, DO NOT USE String.init(contentsOf:) or String.init(contentsOfFile:) for resources on an external network.

It's a blocking (synchronous) method: the calling thread is blocked until the whole response reaches the device. So, unless you know very well how to execute it on a non-UI thread, you should never use it.

Calling a synchronous network method on the UI (main) thread freezes the UI while it waits, and it puts your app at risk of being rejected.
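For reference, one way to keep such a blocking call off the UI thread is to run it on a global Grand Central Dispatch queue and hand the result back on the main queue. This is a minimal sketch, not a recommendation: the function name `loadTextInBackground` is hypothetical, the `callbackQueue` parameter exists only so the sketch can be tried outside an app, and `Result` requires Swift 5.

```swift
import Foundation
import Dispatch

// Hypothetical helper: performs the blocking read on a background queue
// and delivers the result on `callbackQueue` (the main queue by default,
// so UI code can use the result directly in `completion`).
func loadTextInBackground(from url: URL,
                          callbackQueue: DispatchQueue = .main,
                          completion: @escaping (Result<String, Error>) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // This call still blocks, but only the background thread.
        let result = Result<String, Error> {
            try String(contentsOf: url, encoding: .utf8)
        }
        callbackQueue.async {
            completion(result)
        }
    }
}
```

Even so, URLSession remains the better choice for network resources, because it does not tie up a thread while waiting and it reports HTTP responses and errors in detail.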

Use URLSession instead. You may not be accustomed to asynchronous methods, but you will need to learn them if you want to write an app that accesses network resources.

This code printed exactly the same HTML text that I got when I entered the URL in Safari.

        import Foundation

        let url = URL(string: "https://www.amazon.com/gp/aw/d/B00BECJ4R8/ref=mp_s_a_1_1?ie=UTF8&qid=1531620716&sr=8-1-spons&pi=AC_SX236_SY340_QL65&keywords=cole+haan&psc=1")!
        let task = URLSession.shared.dataTask(with: url) { data, response, error in
            if let error = error {
                print(error)
                return
            }
            guard let data = data else {
                print("data is nil")
                return
            }
            guard let text = String(data: data, encoding: .utf8) else {
                print("the response is not in UTF-8")
                return
            }
            print(text)
            // Use `text` inside this closure; you can call other methods or closures from here.
            // The closure runs on a background queue, so dispatch to DispatchQueue.main before updating the UI.
            //...
        }
        task.resume()
        // Do nothing after `task.resume()` that depends on the response:
        // the closure above runs later, when the response arrives.

But, unfortunately, this page is not reliable for robots (non-browser clients). Amazon's output depends on the browser's viewing history or login state. So, if you clear all of your browser's history, you'll see the same page as the truncated URL gives, even when you use the full URL.

You may need to find a URL that robots can access, or investigate how Amazon tracks the browser's state. It is very likely done with cookies, but I cannot be of any help in investigating Amazon's behavior.
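If the difference does come from cookies, one possible approach is to send the web view's cookies along with the URLSession request. This is a sketch under that assumption only; it does not establish what Amazon actually checks. The helper name `makeRequest(url:cookies:)` is hypothetical. In an app, the cookies could be obtained from the web view with `webView.configuration.websiteDataStore.httpCookieStore.getAllCookies(_:)` (iOS 11 and later), and the returned request passed to `URLSession.shared.dataTask(with:completionHandler:)`.

```swift
import Foundation

// Hypothetical helper: builds a request that carries the given cookies
// (for example, ones copied out of a WKWebView) in its Cookie header.
// The caller is responsible for passing only cookies that apply to `url`;
// requestHeaderFields(with:) does not filter by domain or path.
func makeRequest(url: URL, cookies: [HTTPCookie]) -> URLRequest {
    var request = URLRequest(url: url)
    // Turn off automatic cookie handling for this request so that the
    // shared cookie storage does not supply its own cookies.
    request.httpShouldHandleCookies = false
    // requestHeaderFields(with:) returns ["Cookie": "name1=value1; name2=value2"]
    // (or an empty dictionary when `cookies` is empty).
    for (field, value) in HTTPCookie.requestHeaderFields(with: cookies) {
        request.setValue(value, forHTTPHeaderField: field)
    }
    return request
}
```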
