QImageScraper, a simple app to scrape images from Google, Bing and Yahoo
-
I created a simple image downloader that can download the images found by Google, Bing and Yahoo. Unlike many free image search apps, this app can download almost every image the search engines return (the best one I have ever used is Extreme Image Finder, but it is non-free and closed source).
This project has two purposes
- Help developers collect images for their computer vision/machine learning projects
- Study how to scrape data with Qt5
Why not all of the images are downloadable?
- The search engine (Bing, Yahoo, Google, etc.) fails to find a direct link to the image.
- The server "thinks" you are a robot rather than a real human.
I do not attempt to solve the first factor, but I do apply some techniques to alleviate the second. They are
- Always switch user agents
- Do not start network requests instantly; start each one after a random delay
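The two techniques above can be sketched in plain C++ as follows. This is only a minimal illustration, not the code used in QImageScraper: the helper names `pick_user_agent` and `random_delay` are hypothetical, and the user agent pool and delay bounds are assumed to be supplied by the caller. In a Qt5 app the chosen string would typically go into the request's user agent header and the delay into a single-shot timer.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: pick one user agent at random from a caller-supplied
// pool, so consecutive requests do not all carry the same header.
// Precondition: the pool must not be empty.
inline const std::string& pick_user_agent(const std::vector<std::string>& pool,
                                          std::mt19937& rng)
{
    std::uniform_int_distribution<std::size_t> dist(0, pool.size() - 1);
    return pool[dist(rng)];
}

// Hypothetical helper: return a random delay in [min_ms, max_ms] (inclusive).
// The caller waits this long before starting the next download.
// Precondition: min_ms <= max_ms.
inline std::chrono::milliseconds random_delay(int min_ms, int max_ms,
                                              std::mt19937& rng)
{
    std::uniform_int_distribution<int> dist(min_ms, max_ms);
    return std::chrono::milliseconds(dist(rng));
}
```

Passing the random engine in explicitly keeps the helpers easy to test with a fixed seed; a real app would seed it once from `std::random_device`.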
I would like to switch IP addresses from time to time too, but I haven't figured out how to do it yet. None of the free proxies I found work with QNetworkProxy. If you want to help me solve this problem, please read Recommended free, trustable proxy could work with QNetworkProxy.
Here is the project link.
-
Update to version 1.2, bug fixes. This app already suits my needs. If no suggestions or requests come from others, this may be the last version unless I find another bug to fix.
-
Update to version 1.3, fix two bugs
- Fix bug, cannot rename a file when its suffix is weird (e.g. smoke.jpg#$erty&)
- Fix bug, the "save at" directory setting was not loaded
Version 1.3 should be quite stable. I used it to download more than 5000 images from Bing for my small computer vision app last night.
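To illustrate the kind of suffix problem in the first fix, here is a minimal sketch of one way to clean such a name. It is not the fix that went into version 1.3; the function name `strip_suffix_junk` is hypothetical, and the rule it applies (keep only the letters and digits that directly follow the last dot) is an assumption that will not cover every odd URL, for example one whose trailing junk contains another dot.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: cut trailing junk from a file name by keeping only the
// run of letters and digits that directly follows the last dot.
// Names without a dot, or with nothing usable after it, are returned unchanged.
inline std::string strip_suffix_junk(const std::string& name)
{
    const std::size_t dot = name.find_last_of('.');
    if (dot == std::string::npos) {
        return name; // no extension at all
    }

    std::size_t end = dot + 1;
    while (end < name.size() &&
           std::isalnum(static_cast<unsigned char>(name[end]))) {
        ++end;
    }

    if (end == dot + 1) {
        return name; // the dot is not followed by a usable extension
    }
    return name.substr(0, end);
}
```

With the example from the list above, `strip_suffix_junk("smoke.jpg#$erty&")` yields `smoke.jpg`, while an already clean name such as `cat.png` is left as it is.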
-
Update to version 1.4
What changed
- Fix Google image search, which was broken by a minor change on Google's side
- Support proxy
- Add top and bottom buttons to scroll web page
- Go back to the top of the Gallery page rather than to the first page of the search engine; this avoids having to reset the search settings again and again
I planned to support Tor proxy in ver_1.4, but someone said this would be abusing Tor, so I gave up on the idea (it was almost done).