Importing data in Google Sheets is a breeze thanks to its many IMPORT functions. One of the most useful ones for pulling data from websites is the IMPORTXML Google Sheets function. In this article, we will cover everything you need to know about using XML files in Google Sheets, including detailed step-by-step guides with screenshots. Read on to learn more.
Table of Contents
What is XML?
XML stands for eXtensible Markup Language, similar to the HTML language. However, there is one difference between the two: the XML language does not have predefined tags to use. Instead of that, you’re able to define tags on your own that fulfill your needs.
This makes XML a highly customizable language that allows you to store your data in a format you like to store, share, and search the data. Another pro of using XML is that it’s a standardized language which means you can transmit and share data across multiple platforms and systems over the internet or locally. The recipient will still be able to parse the data because of the standard syntax.
Ways to Import XML to Google Sheets
Here are some of the ways you can import an XML into Google Sheets:
- Manual Import:ย The XML file’s contents must be copied and pasted into Google Sheets to use this method. This approach is relatively ineffective and should only be used when dealing with a small amount of data.
- IMPORTXML Function:ย The IMPORTXML function in Google Sheets imports structured data from several sources, including XML, HTML, and TSV.
- Google Apps Script: It works well to create small custom bits of code to load XML data into Google Sheets. However, writing and setting up scripts requires more technical expertise, which is why it can be hard to do for beginners.
What is the Google Sheet IMPORTXML Function, and When Should You Use It?
The IMPORTXML Google Sheets function allows you to import XML into Google Sheets. The data can be taken from structured data like HTML, CSV, XML, TSV, RSS, and ATOM feeds.
It can take information from XML fields that are enclosed in a HTML tag like <data> and </data>. Some of the best use case scenarios for the import XML Google Sheets function is that you can utilize it to import data available publicly on the internet. This means you can scrape data from other websites.
Syntax for the IMPORTXML Google Sheets Function
Before we look at IMPORTXML Google Sheets examples, let’s look at how the formula for the function works. Here is the syntax for the import XML to Google Sheets formula:
=IMPORTXML(link, xpath_query)
The formula has two parameters that are needed for it to work. These are:
- link: this parameter defines the link of the web page you want the formula to examine. The value for the URL parameter should either be enclosed inside quotation marks or should contain a reference to the cell containing the URL.
- xpath_query: this is the query you wish to run on the data. XPath is used as an expression language that specifies parts of XML documents. The value in this parameter must be enclosed inside quotation marks. This can also be a reference to cells containing the text. We will discuss more about this below.
What is XPath?
XPath is the Path Language for XML. It uses non-XML syntax and can provide a flexible method to address different parts of XML documents. Additionally, it may be used to check addressed nodes inside a document to see whether they follow a pattern.
Before we go into some detail, let’s take a look at some of the basics of HTML:
- All the information on a website is saved in XML format and displayed using HTML.
- For example, the xpath_query “/h1/@title” provides all the “title” attributes found in the h1 tags that are available on the page.
- You can modify these XPath arguments to match your website search criteria.
Some of the most commonly used XPath Google Sheets queries include:
- //h1ย โ This query is used to scrape all of the h1s in the document. You can change the number in the query to represent different header types.
- //titleย โ //h1ย โ This query is used to scrape all of the titles in the document.
- //meta[@name=’description’]/@content โ This query is used to scrape the meta description in the document.
- //@hrefย โ This query is used to scrape all of the links in the document.
- //link[@rel=โcanonicalโ]/@hrefย โ This query is used to scrape all of the canonicals in the document.
- //*[@itemtype]/@itemtype โ This query is used to scrape schema types in the document.
- //*[@hreflang] โ This query is used to scrape hreflang in the document.
If you’re using Google Chrome, you can use the DevTools feature, which will allow you to import XPaths to Google Sheets. To do this, right-click and click on Inspect in the drop-down menu. This will take you to the elements panel. There, right-click the element you highlighted, click on Copy, and then Copy XPath.
If youโre typing manually, be careful of typos, so you donโt encounter errors.
How to Use IMPORTXML in Google Sheets
Now that we know how the import XML Google Sheetย function works, let’s take a look at how to use the Google Sheetย import XML function.
Importing an XML Table
In this example, we want to import the data from a table into our spreadsheet. As a sample, we are going to use this Wikipediaย page. We are going to import the table in the Major species section.
Here are the steps you need to follow to do this:
- Open your browser and head over to the website. Make sure to use a modern browser that allows you to inspect the code of a web page. In this example, we are using the chromium-based Microsoft Edge.
- Scroll down to find the table and right-click on the XML element. In the drop-down menu, select the Inspectย option.
- This will open the source code for the page in a sidebar towards the right side of the screen. Here, you can find how your data has been tagged. It’s best if you right-click on the exact element. However, you can hover your cursor over the different elements, which your browser will highlight on the webpage, so you can know which element you want to choose.
- In this case, the data we wish to get is written asย <tr>…</tr> as it is a table. If we wish to write this in the formula, we will be writing it as โ//trโ
- Head over to your spreadsheet now, and click on the cell where you wish to enter the formula.
- Write the first part of the IMPORTXML formula, which is =IMPORTXML(
- Now, we need to enter the first parameter, the link to the webpage from where we wish to import the XML data. In this case, we will write it as โhttps://en.wikipedia.org/wiki/Fish_farming“. Make sure to add quotation marks.
- Now, we will add a Comma ( , )ย to separate the two parameters.
- For the second parameter, we will be writing the XPath query. In this example, we checked the query we needed in step 4. Write it as โ//trโย and add quotation marks.
- Add a bracket to close the formula and press the Enter button to execute it.
When you execute the formula, you may notice extra data being added to the spreadsheet. You can then nest this function into other Google Sheets functions to clean up the data.
However, there is another more straightforward solution for smaller tables like this. Simply select the data you want to keep in the spreadsheet and copy it. Then click on the cell you wish to paste the data in and paste it. This will remove the formula from the data, which means it won’t be updated in case changes are made to the web page.
Related Reading:ย Google Sheets Limitations
Importing an XML Class
Similar to the steps we followed in the previous method, you can import XML classes from websites to Sheets using IMPORTXML. In this example, we have another page from Wikipedia, and we will extract the headers from it. Here are the steps you need to follow to do so:
- Open your browser and head over to the website.
- Right-click on the XML element. In the drop-down menu, select the Inspect option.
- Look for the name of the class for the selected data. In this example, the class is called mw-headline and exists on <span>. The XPath tag we will use here is โ//span[@class=’mw-headline’]โ. Make sure to add quotation marks.
- Head over to your spreadsheet now, and click on the cell where you wish to enter the formula. Write the first part of the IMPORTXML formula, which is =IMPORTXML(.
- Now, we need to enter the first parameter, the link to the webpage from where we wish to import the XML data. In this case, we will write it as “https://en.wikipedia.org/wiki/Meat”. Make sure to add quotation marks.
- Now, we will add a Comma ( , )ย to separate the two parameters.
- For the second parameter, we will be writing the XPath query. In this example, we checked the query we needed in step 4. Write it as “//span[@class=’mw-headline’]”ย and make sure to add quotation marks.
- Add a bracket to close the formula and press the Enter button to execute it.
Frequently Asked Questions
What Is the Google Sheets IMPORTXML Function?
XML stands for eXtensible Markup Language. It is a language similar to the HTML language. However, there is one difference between the two, where the XML language does not have predefined tags to be used. Instead of that, you’re able to define tags on your own that fulfill your needs.
The IMPORTXML function in Google Sheets import structured data from several sources. These include XML, HTML, and TSV.
How Does the IMPORTXML Function Work in Google Sheets?
IMPORTXML is a feature that comes with Google Sheets. It can be used to collect publically accessible structured data from websites. Data can be imported using IMPORTXML from any structured data, including CSV, XML, RSS, TSV, ATOM XML feeds, and HTML.
Here is the syntax for the import XML to Google Sheets formula:
=IMPORTXML(link, xpath_query)
The formula has two parameters that are needed for it to work. The first one is the link which is used to define the link of the web page you want the formula to examine. The second one is xpath_query, the query you wish to run on the data. For example, you could use โ//h2โ to search all the H2 headings from a webpage
What Are Some Common XPaths?
Here are some of the commonly used XPaths you should know:
- The โ//@hrefโย query can obtain all of the links on a webpage.
- For extracting all of the internal links on a webpage, use โ//a[contains(@href, โexample.comโ)]/@hrefโ.
- Use โ//a[not(contains(@href, โexample.comโ))]/@hrefโย to extract all of the external links on a webpage.
- For the title of the page, use the โ//titleโย tag.
- For a description of Meta use the โ//meta[@name=โdescriptionโ]/@contentโย tag.
Wrapping Up the Google Sheets IMPORTXML Guide
After browsing this IMPORTXML Google Sheets guide, you should thoroughly understand how to scrape XML data from websites and into Google Sheets. And now that youโve tackled this function, learning other import formulas should be much easier. Itโs often best to get a headstart on learning everything you can about Google Sheets before encountering problems. That way, you can solve them more quickly. With that in mind, check out our rankings for the best online courses for Google Sheetsย to give you a head start.