Scraping Crypto Details website using Python

About Cryptocurrency

A cryptocurrency is a digital or virtual currency secured by cryptography, which makes it nearly impossible to counterfeit or double-spend. Many cryptocurrencies run on decentralized networks based on blockchain technology, a distributed ledger enforced by a disparate network of computers.

CoinGecko

CoinGecko is a website that lists information about cryptocurrencies. It lets users check live prices, trading volumes, and other market data for each coin.

Project Outline

This project uses several Python libraries to scrape data from the CoinGecko website. We will use requests and Beautiful Soup to scrape data from the pages, pandas to organize it, and then save our data in a CSV file.

  1. Download the webpage using requests
  2. Inspect the HTML in the Browser
  3. Parse the webpage’s HTML code using Beautiful Soup
  4. Extract the information we want from the code
  5. Use Python lists and dictionaries to organize the extracted information
  6. Extract and combine data from multiple pages
  7. Save the extracted information to a CSV file
  8. Summary

Download the webpage using requests

The Python requests library, specifically requests.get(), lets us download the source code of a web page by passing in a URL. To keep our code clean, we'll assign the URL to a variable.

Now we can download the web page using requests.get.

Let’s check the status code to make sure that the request was successful.

A 200 code means the request was successful.

To access the page content, we can use the .text property of the response.

The page contains around 1,329,624 characters.
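The download steps above can be sketched as a small helper. This is only a minimal sketch, not the original notebook code: the listing URL, the `?page=` query parameter, the `User-Agent` header, and the function names `page_url` and `download_page` are all assumptions made for illustration.

```python
import requests

# Assumed listing URL; the original code for this step is not shown.
BASE_URL = "https://www.coingecko.com/"


def page_url(page=1):
    """Build the URL for one page of the coin list (assumes a ?page= parameter)."""
    return f"{BASE_URL}?page={page}"


def download_page(page=1, timeout=30):
    """Download one listing page and return its HTML as a string."""
    # A browser-like User-Agent is an assumption; some sites reject the default one.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(page_url(page), headers=headers, timeout=timeout)
    # A 200 status code means the request was successful.
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    return response.text
```

Calling `len(download_page())` would then report how many characters the downloaded page contains.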

Parse the webpage’s HTML code using Beautiful Soup

Beautiful Soup is a Python library used to parse, or extract data from, HTML, XML, and other markup language documents. It’s installed as beautifulsoup4, and the BeautifulSoup class is imported from the bs4 module.
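As a minimal sketch of the parsing step, the snippet below builds a BeautifulSoup object from a tiny stand-in HTML string instead of the real page, so it runs without a network request. The sample HTML and the variable names are assumptions made for illustration. The package can be installed with `pip install beautifulsoup4`.

```python
from bs4 import BeautifulSoup

# Tiny stand-in document so the example runs without a network request.
sample_html = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

# "html.parser" is Python's built-in parser, so no extra install is needed.
doc = BeautifulSoup(sample_html, "html.parser")

print(type(doc).__name__)  # BeautifulSoup
print(doc.title.text)      # Sample page
```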

Extract the information we want from the code

We’re getting closer to parsing the page. The list of coins is spread across many pages of 100 coins each, so the most efficient approach is to write a function that fetches a specific page and returns the first page by default.

We can see that there are 101 tr tags on the page. Only 100 of them are rows with the data we need; the extra one is the header row, which is also a tr tag.
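The difference between the two counts can be illustrated on a small scale. The table below is a hypothetical, heavily simplified stand-in with made-up values (one header row and two coin rows), not the real CoinGecko markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified table: one header row and two coin rows.
sample_html = """
<table>
  <thead><tr><th>#</th><th>Coin</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Bitcoin</td><td>$100</td></tr>
    <tr><td>2</td><td>Ethereum</td><td>$10</td></tr>
  </tbody>
</table>
"""

doc = BeautifulSoup(sample_html, "html.parser")

all_rows = doc.find_all("tr")                 # header row + data rows
data_rows = doc.find("tbody").find_all("tr")  # data rows only

print(len(all_rows), len(data_rows))  # 3 2
```

On the real page the same two lookups would give the 101 and 100 mentioned above, assuming the data rows all sit inside a single tbody tag.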

Now we can finally begin extracting the data we’ve been looking for. As we saw already, the tbody tag holds all the data, with one tr tag per coin. Let's start by finding the first row, that is, the first tr tag.
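A minimal sketch of that lookup, again on a hypothetical, simplified table with made-up values rather than the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified table with made-up values.
sample_html = """
<table>
  <thead><tr><th>#</th><th>Coin</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Bitcoin</td><td>$100</td></tr>
    <tr><td>2</td><td>Ethereum</td><td>$10</td></tr>
  </tbody>
</table>
"""

doc = BeautifulSoup(sample_html, "html.parser")

# find() returns the first match, so this is the first data row.
first_row = doc.find("tbody").find("tr")

# Each td tag is one cell; get_text(strip=True) drops surrounding whitespace.
cells = [td.get_text(strip=True) for td in first_row.find_all("td")]
print(cells)  # ['1', 'Bitcoin', '$100']
```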

To make our code more efficient, let us modify our function so that it takes a page number and returns only the table contents of that page. The table is all we need here, since it holds the list of coins and each page has only one table.

Great! We can now get the table for an entire page. In the next part, let us use it to extract the data properly.
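One way such a function could look is sketched below. The names `table_body_from_html` and `get_table_body` are hypothetical, and the URL pattern with a `?page=` parameter is an assumption, not taken from the original code. Splitting the parsing into its own helper makes it possible to try it on a plain HTML string.

```python
import requests
from bs4 import BeautifulSoup


def table_body_from_html(html):
    """Parse an HTML string and return its first <tbody> tag (or None)."""
    doc = BeautifulSoup(html, "html.parser")
    return doc.find("tbody")


def get_table_body(page=1):
    """Download one listing page and return its table body.

    The URL pattern is an assumption, not taken from the original code.
    """
    url = f"https://www.coingecko.com/?page={page}"
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to load {url}: {response.status_code}")
    return table_body_from_html(response.text)
```

Calling `get_table_body()` with no argument would fetch the first page, while `get_table_body(2)` would fetch the second.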

Use Python lists and dictionaries to organize the extracted information

We have seen how to select the child elements that contain the desired data. Now we can write a function which goes through each row of the table, pulls the data for that row, puts it in a dictionary, and finally builds a list of these dictionaries. After that, we use pandas to create a DataFrame from that list. For that, we need to install and import pandas first.
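The row-by-row approach described above might be sketched as follows. The sample table, its made-up values, the column order in `COLUMNS`, and the function name `parse_rows` are all assumptions for illustration; the real page has more columns and nested tags, so the cell positions would need to be checked in the browser. pandas can be installed with `pip install pandas`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified table; the real page has more columns and nested tags.
sample_html = """
<table><tbody>
  <tr><td>Bitcoin</td><td>$100</td><td>$50</td><td>$1,000</td></tr>
  <tr><td>Ethereum</td><td>$10</td><td>$5</td><td>$100</td></tr>
</tbody></table>
"""

# Assumed column order, valid for this sample only.
COLUMNS = ["name", "price", "volume", "market_cap"]


def parse_rows(tbody):
    """Turn each <tr> in the table body into a dictionary of column -> text."""
    records = []
    for row in tbody.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(COLUMNS, cells)))
    return records


tbody = BeautifulSoup(sample_html, "html.parser").find("tbody")
coins = parse_rows(tbody)        # list of dictionaries, one per coin
coins_df = pd.DataFrame(coins)   # dictionary keys become column names
print(coins_df.shape)  # (2, 4)
```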

Scraping another page for more info

Great. So far we can scrape a page and create a DataFrame out of it. Now let us scrape another page in a single step and create a second DataFrame. Later we can merge the two DataFrames into one larger dataset.

Let us merge both DataFrames, each with 100 rows and 5 columns, so that we get one large DataFrame with the data of 200 cryptocurrencies.
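Since both DataFrames share the same columns, one way to combine them is to stack them with `pd.concat`. This is an assumption about what "merge" means here; the original code is not shown. The two small DataFrames below are made-up stand-ins for the two scraped pages.

```python
import pandas as pd

# Made-up stand-ins for the DataFrames scraped from page 1 and page 2.
page1_df = pd.DataFrame([{"name": "Coin A", "price": "$1"}, {"name": "Coin B", "price": "$2"}])
page2_df = pd.DataFrame([{"name": "Coin C", "price": "$3"}, {"name": "Coin D", "price": "$4"}])

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 0, 1.
all_coins_df = pd.concat([page1_df, page2_df], ignore_index=True)
print(all_coins_df.shape)  # (4, 2)
```

With two real pages of 100 coins each, the same call would give 200 rows and leave the number of columns unchanged.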

Save the extracted information to a CSV file

The last step is to convert our data from a DataFrame to a CSV file, a widely supported data format. CSV files can be used for many things, including being opened in a spreadsheet program for a more pleasing presentation of the data.

We’ll write a function which will convert the keys in the dictionary to the header row, and the values will become data rows.
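A function of that kind could be sketched with the standard library's `csv.DictWriter`. The name `write_csv` is hypothetical, and the sketch assumes every dictionary in the list has the same keys.

```python
import csv


def write_csv(records, path):
    """Write a list of dictionaries to a CSV file.

    The keys of the first dictionary become the header row;
    each dictionary's values become one data row.
    """
    if not records:
        return  # nothing to write
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
```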

To convert the DataFrame to a CSV file, we use the to_csv() method.
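A minimal sketch of that call, using a small made-up DataFrame in place of the scraped one. The helper name `save_to_csv` is hypothetical.

```python
import pandas as pd

# Made-up stand-in for the scraped DataFrame.
coins_df = pd.DataFrame(
    [{"name": "Coin A", "price": "$1"}, {"name": "Coin B", "price": "$2"}]
)

# index=False leaves out the row numbers so the output only holds the data columns.
# With no path given, to_csv() returns the CSV text instead of writing a file.
csv_text = coins_df.to_csv(index=False)
print(csv_text)


def save_to_csv(df, path):
    """Write a DataFrame to a CSV file at the given path, without the index column."""
    df.to_csv(path, index=False)
```

Calling `save_to_csv(coins_df, "crypto_coins.csv")` would write the same content to a file in the current directory; the file name is just an example.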

Summary

What we have covered in this project

  • Downloaded the webpage containing information on the top 100 cryptocurrencies using the requests library
  • Parsed the HTML code using BeautifulSoup
  • Created a DataFrame from the extracted and compiled information
  • The DataFrame contains name, price, volume and market cap
  • Saved the DataFrame to a CSV file
