About APIs, Data Scraping and Crowdfunding
Need an API from a site that doesn't have one? Come and I'll show you how to scrape websites
- Fundraising to Launch!
- Whaaat! Where's the API?
- What's This Web Scraping Thing?
- Creating the API
- Creating the Endpoints
- Scraping Data
- Installing Puppeteer
- Can We Only Get Data With Puppeteer?
- And the Result?
- Conclusions
A few days ago I started collaborating with a group of developers to create JustShip. JustShip is a space to help devs, makers and hackers walk the path and simply SHIP 🚀. The great thing about this is that it's made by the community and for the community, open source and transparent with its finances. All this sounds very nice, but to buy servers, hire services, pay for internet and more, you need money; that's why we launched a crowdfunding campaign on Ko-fi.
Fundraising to Launch!
Ko-fi is a good site to raise fiat money and so far it's not Cubanophobic. It has its limitations for withdrawing money, but nothing that a Cuban ninja 🥷🏻 with a thousand blockades and embargoes on top can't solve 😉.
So we decided to make a post explaining what it's about and launch the crowdfunding on Ko-fi.
Whaaat! Where's the API?
To make everything transparent we wanted to show how the fundraising was going on our initial landing page and what a shock it was when we saw that Ko-fi doesn't have an API! But of course, this wasn't going to stop us. We're computer scientists, creative and entrepreneurs! That's why we decided to scrape the web and create an API to get the information we wanted.
What's This Web Scraping Thing?
Put down the broom, it's not mopping, it's scraping. This 95.67% invented word comes from the English verb "to scrape". Web scraping is a technique for extracting information from websites using software programs. Usually, these programs simulate a human browsing the web.
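To make the idea concrete, here's a toy sketch with no real website involved: the HTML is hard-coded, and we "scrape" the page title out of it. Real scrapers fetch the page first and use far more robust parsing; this is only an illustration of the concept.

```javascript
// Toy illustration of "scraping": pull the title out of a raw HTML string.
// The HTML here is hard-coded; a real scraper would fetch it from a site.
const html =
  "<html><head><title>JustShip - Ship it!</title></head><body></body></html>";

function extractTitle(html) {
  // A naive pattern match: fine for a demo, fragile on real pages
  const match = html.match(/<title>(.*?)<\/title>/i);
  return match ? match[1] : null;
}

console.log(extractTitle(html)); // "JustShip - Ship it!"
```

A regex is enough for a demo, but real pages need a proper HTML parser or a browser, which is exactly where Puppeteer comes in later.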
Creating the API
To create something simple, fast and nice, we're going to build a small project with Express and deploy it on Vercel. All very serverless.
Creating the Endpoints
Express is a minimalist and flexible web framework for Node.js, written in JavaScript. We're going to create two routes: one that shows a warm greeting, and another that, given a Ko-fi account, extracts its data. This is how we create a basic endpoint:
// endpoint that shows a warm greeting
app.get("/", (req, res) => {
  res.send("Hello World!");
});
The other endpoint will receive the Ko-fi username from the url and display its information:
app.get("/:username", async (req, res) => {
  try {
    // we get the username from the url
    const { username } = req.params;
    // we scrape the data
    let result = await scrapping.getCrowdfunding(username);
    // we send the information to the user
    res.send(result);
  } catch (error) {
    // an error occurred? Well, we also have to show it to the user
    res.status(500).send({
      type: "error",
      code: error.name,
      message: error.message,
    });
  }
});
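One detail worth a word here: the `:username` param comes straight from the URL, so it pays to validate it before using it to build the Ko-fi address we'll scrape. A minimal sketch follows; the helper and the exact username rules are my assumptions, not part of the original project code.

```javascript
// Hypothetical helper (not from the original project): validate the
// username route param before building a Ko-fi URL with it.
function buildKofiUrl(username) {
  // Assumption: Ko-fi usernames use letters, digits, underscores and hyphens
  if (!/^[a-zA-Z0-9_-]+$/.test(username)) {
    throw new Error("Invalid username");
  }
  return `https://ko-fi.com/${username}`;
}

console.log(buildKofiUrl("justship")); // https://ko-fi.com/justship
```

Rejecting anything outside that character set keeps path tricks like `../` out of the URL we hand to the scraper.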
Good, we already have the two endpoints, now what's missing? Oh yeah! Scraping!
Scraping Data
To scrape the data we're going to use Puppeteer, a Node library maintained by Google (you can find its documentation at https://pptr.dev). To work with Puppeteer you follow the same steps a person would take in a browser. For example:
- open the browser
- open a new tab
- go to the url you want
- read the information you need
Installing Puppeteer
Installing Puppeteer is like installing any other Node library:
npm install puppeteer with npm, or yarn add puppeteer if you use yarn.
Less blah blah blah and more code? Here I show you how simple it is:
// we import the library
const puppeteer = require("puppeteer");

async function scraping(url) {
  // we "open" the browser
  const browser = await puppeteer.launch();
  // we open a new tab
  const page = await browser.newPage();
  // we go to the url we want
  await page.goto(url);
  // here we take a screenshot of the web we visited
  // and save it with the name screenshot.png
  await page.screenshot({ path: "screenshot.png" });
  // we close the browser (close returns a promise, so we await it)
  await browser.close();
}
This is a "hello world" with Puppeteer. Once the page you want is open, you can pick out the data you need using selectors:
let results = await page.evaluate(() => {
  const selector = "#page > .my-class";
  return document.querySelector(selector).textContent;
});
With the evaluate method we "step into" the page Puppeteer has open and execute JavaScript there, just as we would in the browser console. Inside it we simply use document.querySelector to select an element and return its text content.
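What evaluate hands back is usually raw text, so the last step is turning it into structured data the API can return. A hypothetical sketch: I'm assuming the scraped progress text looks like "$120 of $500 goal", which is an invented format for illustration, not Ko-fi's actual markup.

```javascript
// Hypothetical post-processing of scraped text. The "$X of $Y goal"
// format is an assumption for the example, not Ko-fi's real page content.
function parseProgress(text) {
  const match = text.match(/\$([\d.,]+)\s+of\s+\$([\d.,]+)/);
  if (!match) return null;
  const raised = parseFloat(match[1].replace(/,/g, ""));
  const goal = parseFloat(match[2].replace(/,/g, ""));
  return { raised, goal, percent: Math.round((raised / goal) * 100) };
}

console.log(parseProgress("$120 of $500 goal"));
// { raised: 120, goal: 500, percent: 24 }
```

Returning a small object like this is what lets the `/:username` endpoint send clean JSON instead of a blob of page text.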
Can We Only Get Data With Puppeteer?
No, not at all. In this brief (I know, it's very long) introduction I showed you how to read data, but you can also type into inputs, click buttons and much more. You could even build a bot of yourself posting kitty memes on Facebook if you wanted 😉.
And the Result?
Since this article wasn't meant to be so long, I can't include all the code here. My aim is only to show what can be done and which tools you can use to do it. Applying ingenuity (and lots of Google) to create your kitty-posting bot is up to you. The code is hosted on GitHub and is open source, so feel free to stop by, take a look and leave your Pull Request; it has several bugs that could be fun to fix. The code is deployed on Vercel and running; you can find it here. If you want to see someone else's crowdfunding, you just have to pass their profile in the url, like this JustShip example: https://kofi-data.ragnarok22.dev/justship
Conclusions
With Puppeteer you can simulate being a person and interact with another website: enter text in an input, click buttons, read the information other elements contain, and anything else you need. You can get the code to obtain Ko-fi crowdfunding data on my GitHub. By the way, here's my Ko-fi in case you're interested in supporting my content 😉. PS: If you want to support JustShip, stop by https://justship.to, where you'll learn more about the project and see the donation methods.