Scraping Cryptocurrency Details from a Website Using Python
About Cryptocurrency
A cryptocurrency is a digital or virtual currency that is secured by cryptography, which makes it nearly impossible to counterfeit or double-spend. Many cryptocurrencies are decentralized networks based on blockchain technology: a distributed ledger enforced by a disparate network of computers.
CoinGecko
CoinGecko is a website that tracks information about thousands of cryptocurrencies. It helps users follow live prices, trading volumes, market capitalization and many other metrics related to cryptocurrencies.
Project Outline
This project uses several Python libraries to scrape data from the CoinGecko website. We will use requests and Beautiful Soup to download and parse the pages, organize the results with pandas, and then save the data in a CSV file.
- Download the webpage using requests
- Inspect the HTML in the Browser
- Parse the webpage’s HTML code using Beautiful Soup
- Extract the information we want from the code
- Use Python lists and dictionaries to organize the extracted information
- Extract and combine data from multiple pages
- Save the extracted information to a CSV file
- Conclusion
Download the webpage using requests
The Python requests library, specifically requests.get(), lets us retrieve the source code of a web page by passing in its URL. To keep our code clean, we’ll assign the URL to a variable.
Now we can download the web page using requests.get().
Let’s check the response’s status code to make sure that the request was successful.
A 200 code means the request was successful.
The page content can be accessed through the .text property of the response.
The page contains around 1,329,624 characters.
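The post’s original code for this step is not shown, so here is a minimal sketch of it. It assumes the coin list is served from https://www.coingecko.com/ and sends a browser-like User-Agent header; both are assumptions, and download_page is a hypothetical helper name. CoinGecko may rate-limit or block automated requests, in which case the status check below raises an error instead of returning an error page.

```python
import requests

# Assumption: the coin list is served from CoinGecko's home page.
URL = "https://www.coingecko.com/"


def download_page(url: str = URL, timeout: float = 10.0) -> str:
    """Download `url` with requests.get() and return the page content as a string."""
    # Assumption: a browser-like User-Agent; some sites refuse the library default.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # A status code of 200 means the request was successful.
    if response.status_code != 200:
        raise RuntimeError(f"Request to {url} failed with status {response.status_code}")
    # .text is the decoded HTML of the page.
    return response.text


# Usage (needs network access):
# page_html = download_page()
# print(len(page_html))  # number of characters in the page
```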
Parse the webpage’s HTML code using Beautiful Soup
Beautiful Soup is a very useful Python library for parsing, or extracting data from, HTML, XML, and other markup documents. It’s installed as beautifulsoup4, and the BeautifulSoup class is imported from the bs4 module.
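A minimal sketch of the parsing step. To keep it runnable offline it parses a small hypothetical HTML string rather than the downloaded page; in practice you would pass the response’s .text instead. It uses Python’s built-in "html.parser", so no extra parser has to be installed.

```python
# Install first with: pip install beautifulsoup4
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (use response.text in practice).
page_html = """
<html>
  <head><title>Cryptocurrency Prices</title></head>
  <body><table><tbody><tr><td>Coin A</td></tr></tbody></table></body>
</html>
"""

# "html.parser" ships with Python, so no extra parser needs to be installed.
doc = BeautifulSoup(page_html, "html.parser")

print(doc.title.get_text())     # text of the <title> tag
print(len(doc.find_all("tr")))  # number of table rows found
```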
Extract the information we want from the code
We’re getting closer to parsing the page. Since the list of coins is spread across many pages of 100 coins each, the most efficient approach is to write a function that fetches a specific page but returns the first page by default.
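One way to write such a function is sketched below. It assumes that CoinGecko paginates its coin list with a ?page=N query parameter and that page 1 is the home page; check the URLs in your browser before relying on this. page_url and get_page are hypothetical names.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: page N of the coin list is reachable at BASE_URL?page=N.
BASE_URL = "https://www.coingecko.com/"


def page_url(page: int = 1) -> str:
    """Return the URL of one page of the coin list (the first page by default)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return BASE_URL if page == 1 else f"{BASE_URL}?page={page}"


def get_page(page: int = 1) -> BeautifulSoup:
    """Download one page of the coin list and return the parsed document."""
    response = requests.get(page_url(page), headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return BeautifulSoup(response.text, "html.parser")
```

Calling get_page() with no argument returns the first page, and get_page(2) the second, matching the default described above.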
We can see that there are 101 tr tags on the page. However, only 100 of these rows contain the coin data we need; the extra one is the header row, which is also a tr.
Now we can finally begin extracting the data we’ve been looking for. As we saw, the tbody tag contains all of the coin data in tr tags. Let’s find the first row, or tr tag, inside it.
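The row count and the first-row lookup can be reproduced on a small hypothetical table with one header row and two coin rows (the real page has one header row and 100 coin rows). As described above, it assumes the coin rows sit inside tbody.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature version of the coin table.
table_html = """
<table>
  <thead><tr><th>Coin</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>Coin A</td><td>$1.00</td></tr>
    <tr><td>Coin B</td><td>$2.00</td></tr>
  </tbody>
</table>
"""
doc = BeautifulSoup(table_html, "html.parser")

all_rows = doc.find_all("tr")                 # header row + coin rows
coin_rows = doc.find("tbody").find_all("tr")  # coin rows only
print(len(all_rows), len(coin_rows))          # 3 2

# The first coin row is the first <tr> inside <tbody>.
first_row = doc.find("tbody").find("tr")
print([td.get_text(strip=True) for td in first_row.find_all("td")])  # ['Coin A', '$1.00']
```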
To make our code more efficient, let’s modify our function so that it takes a page number and returns the table contents of that page. We only need the table that lists the coins, and each page contains just one table.
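A sketch of the modified function, assuming, as stated above, that each page holds exactly one table. The parsing is split into extract_table so it can be tried without network access; the URL uses the same assumed ?page=N pattern, and both function names are hypothetical.

```python
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag

BASE_URL = "https://www.coingecko.com/"  # assumption: same base URL as before


def extract_table(html: str) -> Tag:
    """Return the single <table> on a page, raising if there is none."""
    table = BeautifulSoup(html, "html.parser").find("table")
    if table is None:
        raise ValueError("no <table> found on the page")
    return table


def get_table(page: int = 1) -> Tag:
    """Download one page of the coin list and return just its table."""
    # Assumption: page N lives at BASE_URL?page=N; page 1 is the base URL itself.
    url = BASE_URL if page == 1 else f"{BASE_URL}?page={page}"
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return extract_table(response.text)
```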
Great! We can now get the table of an entire page. In the next part, let’s use it to parse the information properly.
Use Python lists and dictionaries to organize the extracted information
We can now see how to select the child elements that contain the desired data. Next, we write a function that goes through each row of the table, pulls out the data for that row, puts it in a dictionary, and finally collects all of these dictionaries in a list. After that, we use pandas to create a DataFrame from this list of dictionaries, so we first install and import pandas.
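The post’s own row-parsing code is not shown, so here is one possible approach: it takes the header texts from thead as dictionary keys and zips them with each row’s td texts. This assumes the table has a thead with one th per column; the real CoinGecko table may contain extra columns (icons, charts) that you would want to drop. The markup and values below are placeholders, not real market data, and parse_rows is a hypothetical name.

```python
# Install first with: pip install pandas beautifulsoup4
from typing import Dict, List

import pandas as pd
from bs4 import BeautifulSoup

# Placeholder table standing in for the table of one scraped page; values are dummies.
table_html = """
<table>
  <thead><tr><th>Coin</th><th>Price</th><th>24h Volume</th><th>Mkt Cap</th></tr></thead>
  <tbody>
    <tr><td>Coin A</td><td>$1.00</td><td>$10</td><td>$100</td></tr>
    <tr><td>Coin B</td><td>$2.00</td><td>$20</td><td>$200</td></tr>
  </tbody>
</table>
"""


def parse_rows(table) -> List[Dict[str, str]]:
    """Return one dictionary per coin row, keyed by the column headers."""
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    records = []
    for row in table.find("tbody").find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(headers, cells)))
    return records


table = BeautifulSoup(table_html, "html.parser").find("table")
coins_df = pd.DataFrame(parse_rows(table))  # one row per coin, one column per header
print(coins_df)
```

Keying the dictionaries by header text means the DataFrame’s column names come straight from the page instead of being hard-coded.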
Scraping another page for more info
Great. So far we can scrape a page and create a DataFrame from it. Now let’s scrape a second page and create another DataFrame, so that we can then combine the two into a larger dataset.
Let’s merge both DataFrames, each with 100 rows and 5 columns, to get one larger DataFrame with information on 200 cryptocurrencies.
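Stacking the rows of two DataFrames is done with pd.concat; DataFrame.merge is meant for joining on a shared key column, so concat is the closer fit for combining pages. The two small DataFrames below are placeholders for the page-1 and page-2 results.

```python
import pandas as pd

# Placeholders for the DataFrames built from page 1 and page 2.
page1_df = pd.DataFrame({"Coin": ["Coin A", "Coin B"], "Price": ["$1.00", "$2.00"]})
page2_df = pd.DataFrame({"Coin": ["Coin C"], "Price": ["$3.00"]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each page's index.
all_coins_df = pd.concat([page1_df, page2_df], ignore_index=True)
print(all_coins_df.shape)  # (3, 2)
```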
Save the extracted information to a CSV file
The last step is to convert our data from a DataFrame to a CSV file, a universal data format. CSV files can be used for many purposes, or opened in a spreadsheet program for a more readable presentation of the data.
We’ll write a function which will convert the keys in the dictionary to the header row, and the values will become data rows.
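A minimal sketch of such a function using only the standard library’s csv.DictWriter, which writes the dictionary keys as the header row and each dictionary’s values as a data row. write_csv and the file name coins_sample.csv are hypothetical.

```python
import csv
from typing import Dict, List


def write_csv(records: List[Dict[str, str]], path: str) -> None:
    """Write a list of dictionaries to `path`; the keys of the first one form the header."""
    if not records:
        raise ValueError("no records to write")
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()       # keys -> header row
        writer.writerows(records)  # values -> data rows


write_csv([{"Coin": "Coin A", "Price": "$1.00"}], "coins_sample.csv")
```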
To convert the DataFrame to a CSV file, we use the to_csv() method.
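With pandas this is a single call. The DataFrame and the file name crypto_currencies.csv below are placeholders; index=False keeps pandas’ row numbers out of the file, and reading the file back is a quick way to confirm the round trip.

```python
import pandas as pd

# Placeholder for the combined DataFrame of scraped coins.
coins_df = pd.DataFrame({"Coin": ["Coin A", "Coin B"], "Price": ["$1.00", "$2.00"]})

# index=False leaves out the 0, 1, 2, ... row labels.
coins_df.to_csv("crypto_currencies.csv", index=False)

# Read the file back to check that nothing was lost.
round_trip = pd.read_csv("crypto_currencies.csv")
print(round_trip.equals(coins_df))
```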
Summary
What we covered in this project:
- Downloaded the webpage containing information on the top 100 cryptocurrencies using the requests library
- Parsed the HTML code using BeautifulSoup
- Created a DataFrame from the extracted and compiled information
- The DataFrame contains the name, price, volume and market cap of each coin
- Saved the DataFrame to a CSV file
Coffee:
Meanwhile, if you found this blog insightful, you can buy me a coffee 🤗🤗 at: https://www.buymeacoffee.com/hebbaraditya