[Python] A program for bulk-downloading images from Pixabay [Pixabay API]

Pixabay is a free stock photo site that also provides an API, which makes it possible to download images in bulk, so I wrote a program for it in Python.

Feel free to use it for things like collecting images for machine learning.

import requests
import urllib.request  # "import urllib" alone does not guarantee that urllib.request is loaded
import math

api_key = "" # Enter your Pixabay API key here

print("A program that downloads images from Pixabay.")

keyword = input("Enter a search keyword: ")

page = 1
per_page = 200
limit = 0 # Download at most this many images. 0 means no limit.

uri = "https://pixabay.com/api/"

prms = {
    "key" : api_key,
    "q" : keyword,
    "lang" : "ja", # Language to search in. Default is en
    "image_type" : "all", # all, photo, illustration, vector
    "orientation" : "all", # all, horizontal, vertical
    "category" : "", # fashion, nature, backgrounds, science, education, people, feelings, religion, health, places, animals, industry, food, computer, sports, transportation, travel, buildings, business, music
    "min_width" : "0", # Minimum image width
    "min_height" : "0", # Minimum image height
    "colors" : "", # "grayscale", "transparent", "red", "orange", "yellow", "green", "turquoise", "blue", "lilac", "pink", "white", "gray", "black", "brown"
    "editors_choice" : "false", # Set to true to restrict results to images flagged as Editor's Choice
    "safesearch" : "false", # Safe search: false or true
    "order" : "popular", # Sort order (popular or latest)
    "page" : page, # Default is 1. The page number used for pagination
    "per_page" : per_page, # Default is 20. Number of results per page, from 3 to 200
    "callback" : "", # Name of a JSONP callback function, if you want one
    "pretty" : "false", # Whether to indent the JSON output: false or true
}

def download(uri, save_path):
    print("Downloading from " + uri)
    img_data = urllib.request.urlopen(uri).read()
    file_name = uri.split("/")[-1]  # The last path segment of the URL is used as the file name
    with open(save_path + "/" + file_name, mode="wb") as f:
        f.write(img_data)
        print("Saved to " + save_path + "/" + file_name + ".")

def fetch(data, save_path, cnt, total, limit):
    if limit and cnt > limit:
        return cnt  # Return the counter unchanged so the caller never receives None
    for value in data:
        if limit and cnt > limit:
            break
        image = value["webformatURL"] # Other options include imageURL (original size), webformatURL (640px), largeImageURL (1280px), and fullHDURL (1920px).
        print(str(cnt) + "/" + str(total))
        cnt = cnt + 1
        download(image, save_path)
    return cnt

save_path = "./img"  # Destination directory. Create it before running; this script does not create it.
print("save_path:" + save_path)

req = requests.get(uri, params=prms)
result = req.json()

total = result["totalHits"]
cnt = 1
cnt = fetch(result["hits"], save_path, cnt, total, limit)

page_num = math.ceil(total / per_page)

# Handle pagination
if page_num > page:
    for i in range(page + 1, page_num + 1, 1):
        if limit and cnt > limit:
            break
        prms["page"] = i
        print(i)
        req = requests.get(uri, params=prms)
        result = req.json()
        cnt = fetch(result["hits"], save_path, cnt, total, limit)

print("Download Finished.")
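
The page-count arithmetic and the file-name handling in the script above can also be pulled out into small helpers. The following is only a sketch of one way to do that, not part of the original program: the names `pages_to_fetch` and `build_save_path` are ones I made up for illustration, and the example.com URL is a placeholder rather than a real Pixabay address. The first helper takes `limit` into account when working out how many pages are needed, and the second creates the save directory if it is missing and ignores any query string when deriving the file name.

```python
import math
import os
import tempfile
from urllib.parse import urlparse


def pages_to_fetch(total_hits, per_page, limit=0):
    """Return how many result pages are needed (hypothetical helper).

    limit=0 means "no limit", mirroring the limit variable in the script.
    """
    wanted = min(total_hits, limit) if limit else total_hits
    return math.ceil(wanted / per_page)


def build_save_path(url, save_dir):
    """Create save_dir if needed and return the destination path (hypothetical helper)."""
    os.makedirs(save_dir, exist_ok=True)
    # Use only the path part of the URL so a query string never ends up in the file name.
    file_name = os.path.basename(urlparse(url).path)
    return os.path.join(save_dir, file_name)


if __name__ == "__main__":
    print(pages_to_fetch(500, 200))       # 3
    print(pages_to_fetch(500, 200, 150))  # 1

    # A temporary directory keeps the demo from touching the working directory.
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder URL, used only to show the file-name handling.
        path = build_save_path("https://example.com/get/abc_640.jpg?x=1",
                               os.path.join(tmp, "img"))
        print(os.path.basename(path))     # abc_640.jpg
```

With helpers like these, the pagination loop could stop requesting pages once `limit` is covered instead of relying only on the counter check, and `download` would no longer fail when `./img` does not exist yet.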

Last updated: 2019/08/02

First published: 2018/11/08
