Coding a web scraper - I have a big question

    Alvein
    wrote on 27 Jan 2020, 21:04 last edited by
    #1

    Hello.

    I've coded scrapers before, but this time there's something new I've never implemented.

    There's some login form with a reCAPTCHA v2.

    I've been thinking of showing a pop-up window with the actual page so the user can enter the login details and "solve" the CAPTCHA. Then I'd steal the session cookies once the protected content is reached and continue scraping from there as usual.

    Is this feasible?

    I'd say yes, but could somebody show me the right path to follow to implement the "pop-up" part? I.e., which component to use, etc.

    Remember, I actually need to browse/render the real login page and in some way, control whatever happens inside like server responses or cookies.
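To make the "continue scraping from there" step concrete, here is a minimal sketch in plain C++ (no Qt) of how captured session cookies could be replayed on later requests. The `CapturedCookie` struct and both function names are hypothetical, invented for illustration. In a Qt WebEngine setup the cookies themselves would presumably come from the `QWebEngineCookieStore::cookieAdded` signal. The domain matching is simplified and treats every cookie as valid for subdomains.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one cookie captured from the embedded browser.
struct CapturedCookie {
    std::string name;
    std::string value;
    std::string domain;  // e.g. "example.com" or ".example.com"
};

// True if `host` equals `domain` or is a subdomain of it.
// A leading dot in `domain` is ignored. Simplification: host-only
// cookies are not distinguished from domain cookies here.
bool domainMatches(const std::string& host, std::string domain) {
    if (!domain.empty() && domain.front() == '.') domain.erase(0, 1);
    if (domain.empty()) return false;
    if (host == domain) return true;
    if (host.size() <= domain.size()) return false;
    const std::size_t offset = host.size() - domain.size();
    // The suffix must match and be preceded by a dot ("a.example.com").
    return host.compare(offset, domain.size(), domain) == 0 &&
           host[offset - 1] == '.';
}

// Builds the value of a "Cookie:" request header for requests to `host`.
std::string cookieHeaderFor(const std::string& host,
                            const std::vector<CapturedCookie>& jar) {
    std::string header;
    for (const auto& cookie : jar) {
        if (!domainMatches(host, cookie.domain)) continue;
        if (!header.empty()) header += "; ";
        header += cookie.name + "=" + cookie.value;
    }
    return header;
}
```

The returned string would be sent as the `Cookie` header of each follow-up request. Path, expiry, and `Secure` handling are left out on purpose to keep the sketch short.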

    Thanks for your help.

      fcarney
      wrote on 27 Jan 2020, 21:22 last edited by
      #2

      Isn't the point of a captcha to prevent bots from scraping?

      C++ is a perfectly valid school of magic.

        kshegunov
        Moderators
        wrote on 27 Jan 2020, 21:46 last edited by
        #3

        @fcarney said in Coding a web scraper - I have a big question:

        Isn't the point of a captcha to prevent bots from scraping?

        It is. Also bots are mighty annoying.

        Read and abide by the Qt Code of Conduct

          Alvein
          wrote on 27 Jan 2020, 21:56 last edited by
          #4

          @fcarney @kshegunov It is a login form, and the user has a valid username and password. I'm not stealing anything or annoying people. I'm actually automating the website for the user, who is the one solving the CAPTCHA one way or another.

            kshegunov
            Moderators
            wrote on 27 Jan 2020, 22:07 last edited by
            #5

            Maybe, but you can understand our wariness. It's a JS-based system, so you need to run the whole JS machinery alongside the HTML layout engine. Probably QWebEngine is what you want, although I've not used it myself. After that, you do what a browser would do, I assume.
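One part of "what a browser would do" is remembering the cookies a server sets in its responses. While the page runs inside the web engine, the engine handles that itself. A scraper that continues afterwards with plain HTTP requests would have to read `Set-Cookie` headers on its own. The following is a minimal sketch in plain C++ under that assumption. `ParsedCookie`, `trim`, and `parseSetCookie` are hypothetical names, and attributes such as `Path` or `Expires` are deliberately ignored.

```cpp
#include <string>

// Hypothetical result type: the name/value pair of one Set-Cookie header.
struct ParsedCookie {
    std::string name;
    std::string value;
    bool valid = false;  // false if the header had no "name=value" part
};

// Removes leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Extracts the cookie name and value from a Set-Cookie header value,
// e.g. "sid=abc123; Path=/; HttpOnly". Everything after the first ';'
// (the attributes) is ignored in this sketch.
ParsedCookie parseSetCookie(const std::string& headerValue) {
    ParsedCookie out;
    const std::string pair = headerValue.substr(0, headerValue.find(';'));
    const auto eq = pair.find('=');
    if (eq == std::string::npos) return out;  // no '=' means not a cookie pair
    out.name = trim(pair.substr(0, eq));
    out.value = trim(pair.substr(eq + 1));
    out.valid = !out.name.empty();
    return out;
}
```

Only the first `=` separates name from value, so a value that itself contains `=` is kept intact.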

              Alvein
              wrote on 28 Jan 2020, 18:49 last edited by
              #6

              @kshegunov Maybe I should have used other words. But I understand what you mean.

              Thanks for your suggestion. I'm using QWebEngineView. It looks like the simplest option.

              One thing I've noticed is that the web engine CRAWLS in a debug build. This is a big annoyance because the login page becomes very unresponsive (especially the reCAPTCHA), and I have no explanation for it.

              In the release build, it works flawlessly.

                kshegunov
                Moderators
                wrote on 28 Jan 2020, 22:48 last edited by
                #7

                Unfortunately I really have no idea. I've never even looked at the documentation of that module, I just know it exists (for the mentioned purpose). Hopefully someone with an idea is going to pick up the thread and give you a decent suggestion.
