A Guide to Parsing Data

NineAnimator uses the SwiftSoup library for working with HTML. Using DOM traversal or CSS selectors, it enables us to find and extract data from a website.

If you are not familiar with CSS selectors, it is recommended that you try out the simple SwiftSoup CSS selectors site: SwiftSoup Test Site. If you already know how to use SwiftSoup and understand the basics of CSS selectors, you may skip to A NineAnimator Parsing Example.

Quick Reference Guide

The Basics

Element name

The element selector selects HTML elements based on name.

let html: String = """
<html>
  <head>
    <title>Try SwiftSoup</title>
  </head>
  <body>
    <p>This is a SwiftSoup test page</p>
    <a href='http://example.com/'>Some example link</a>
  </body>
</html>
""";
let doc: Document = try SwiftSoup.parse(html)
let paragraph: Element = try doc.select("p").first()!
let link: Element = try doc.select("a").first()!

let bodyText: String = try doc.body()!.text();
// "This is a SwiftSoup test page Some example link"

let paragraphText: String = try paragraph.text();
// "This is a SwiftSoup test page"

let linkHref: String = try link.attr("href");
// "http://example.com/"

let linkText: String = try link.text();
// "Some example link"
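
The snippet above force-unwraps `first()` and leaves error handling to the caller. As a minimal sketch of a more defensive variant, the helper below (`firstText` is a hypothetical name, not part of SwiftSoup or NineAnimator) returns `nil` when nothing matches and catches parsing errors with the `Exception.Error` pattern used in the SwiftSoup README.

```swift
import SwiftSoup

/// Hypothetical helper: returns the text of the first element matching
/// `selector`, or nil if nothing matches or SwiftSoup throws.
func firstText(in html: String, matching selector: String) -> String? {
    do {
        let doc: Document = try SwiftSoup.parse(html)
        // first() returns an optional, so an empty result does not crash
        guard let element = try doc.select(selector).first() else {
            return nil
        }
        return try element.text()
    } catch Exception.Error(_, let message) {
        print("SwiftSoup error: \(message)")
        return nil
    } catch {
        print("Unexpected error: \(error)")
        return nil
    }
}

let sampleHtml = "<p>This is a SwiftSoup test page</p><a href='http://example.com/'>Some example link</a>"
let paragraphText = firstText(in: sampleHtml, matching: "p")
// Optional("This is a SwiftSoup test page")
let missingText = firstText(in: sampleHtml, matching: "h1")
// nil, because the page has no <h1>
```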

Classes and Id

The id selector uses the id attribute of an HTML element to select a specific element. The class selector selects HTML elements with a specific class attribute.

let html: String = """
<html>
  <head>
    <title>Try SwiftSoup</title>
  </head>
  <body>
    <p id="foo">weakness</p>
    <p id="bar">camera</p>
    <p id="foobar" class="common">offense</p>
    <p id="baz" class="common">stumble</p>
  </body>
</html>
""";
let doc: Document = try SwiftSoup.parse(html)
let paragraph: Elements = try doc.select("p")

let paragraphTextOne: String = try paragraph.select("#foo").text();
// "weakness"

let paragraphTextTwo: String = try paragraph.select("#foobar").text();
// "offense"

let paragraphTextClass: String = try paragraph.select(".common").text();
// "offense stumble"
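
Calling `text()` on a multi-element result joins the texts into one string, as in the `.common` case above. When each match is needed separately, one option is to convert the result to an array first. The sketch below assumes a trimmed-down copy of the HTML above; `texts` is a hypothetical helper name.

```swift
import SwiftSoup

let classHtml = """
<p id="foo">weakness</p>
<p id="foobar" class="common">offense</p>
<p id="baz" class="common">stumble</p>
"""

/// Hypothetical helper: returns the text of every element matching
/// `selector`, one entry per element, in document order.
func texts(in html: String, matching selector: String) throws -> [String] {
    let doc = try SwiftSoup.parse(html)
    // array() exposes the matched elements as a plain [Element]
    return try doc.select(selector).array().map { try $0.text() }
}

let commonTexts = try texts(in: classHtml, matching: "p.common")
// ["offense", "stumble"]
```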

Advanced Selectors

Combinators

Combinator selectors select HTML elements based on a specific relationship between them. Refer to combinators for the complete list of combinator selectors.

There are four different combinators in CSS:

  • descendant selector (space)
  • child selector (>)
  • adjacent sibling selector (+)
  • general sibling selector (~)

let html: String = """
<html>
  <head>
    <title>Try SwiftSoup</title>
  </head>
  <body>
    <div id="foo">
      <p id="bar">camera</p>
      <p id="foobar" class="common">offense</p>
      <a href='http://example.com/'>Some example link</a>
    </div>
  </body>
</html>
""";
let doc: Document = try SwiftSoup.parse(html)
let body: Elements = try doc.select("body")

let paragraphTextOne: String = try body.select("div > p").text();
// "camera offense"

let paragraphTextTwo: String = try body.select("p + p").text();
// "offense"

let linkHref: String = try body.select("#foo a").attr("href");
// "http://example.com/"
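
The example above does not exercise the general sibling combinator or the difference between the descendant and child combinators. The following sketch uses a slightly modified HTML fragment (the `<section>` wrapper is an assumption added for illustration) to contrast all four.

```swift
import SwiftSoup

let combinatorHtml = """
<div id="foo">
  <p id="bar">camera</p>
  <section><p id="nested">offense</p></section>
  <a href='http://example.com/'>Some example link</a>
</div>
"""
let combinatorDoc = try SwiftSoup.parse(combinatorHtml)

// Descendant (space): every <p> anywhere inside #foo
let descendantCount = try combinatorDoc.select("#foo p").array().count
// 2

// Child (>): only <p> elements directly under #foo
let childCount = try combinatorDoc.select("#foo > p").array().count
// 1

// General sibling (~): <a> elements that come after a sibling <p>
let siblingHref = try combinatorDoc.select("p ~ a").attr("href")
// "http://example.com/"

// Adjacent sibling (+): <a> immediately after a <p>;
// here <section> sits in between, so nothing matches
let adjacentCount = try combinatorDoc.select("p + a").array().count
// 0
```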

Attribute

The [attribute] selector is used to select elements with a specified attribute. Refer to Attribute selectors for the complete list of attribute selectors.

let html: String = """
<html>
  <head>
    <title>Try SwiftSoup</title>
  </head>
  <body>
    <div lang="en-us en-gb en-au en-nz">Hello World!</div>
    <div lang="pt">Olá Mundo!</div>
    <div lang="zh-CN">世界您好!</div>
  </body>
</html>
""";
let doc: Document = try SwiftSoup.parse(html)
let body: Elements = try doc.select("body")

let paragraphTextOne: String = try body.select("div[lang='pt']").text();
// "Olá Mundo!"
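
Beyond exact matches, attribute selectors can test for the presence of an attribute or for part of its value. The sketch below reuses the three `<div>` elements from the example above and shows the presence, prefix (`^=`), and substring (`*=`) forms.

```swift
import SwiftSoup

let attributeHtml = """
<div lang="en-us en-gb en-au en-nz">Hello World!</div>
<div lang="pt">Olá Mundo!</div>
<div lang="zh-CN">世界您好!</div>
"""
let attributeDoc = try SwiftSoup.parse(attributeHtml)

// [lang]: any <div> that has a lang attribute, whatever its value
let withLangCount = try attributeDoc.select("div[lang]").array().count
// 3

// [lang^=en]: lang value starts with "en"
let englishText = try attributeDoc.select("div[lang^=en]").text()
// "Hello World!"

// [lang*=gb]: lang value contains "gb" somewhere
let britishText = try attributeDoc.select("div[lang*=gb]").text()
// "Hello World!"
```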

A NineAnimator Parsing Example

The same concepts apply when parsing data in NineAnimator using SwiftSoup. As mentioned, most operations in NineAnimator are performed asynchronously with NineAnimator's asynchronous framework: the NineAnimatorPromise class. The example below shows how to parse data for the AnimeSource+Featured.swift file using NineAnimatorPromise.

TIP

NineAnimator provides a useful utility that parses the HTML and returns a Document object: responseBowl, available when making a request with NineAnimator's requestManager. This means you do not need to call SwiftSoup.parse(htmlString) yourself.

extension NASourceGogoAnime {
    // ...
    fileprivate var latestAnimeUpdates: NineAnimatorPromise<[AnimeLink]> {
        // Browse home
        return requestManager.request("/", handling: .browsing)
            .responseBowl
            .then {
                bowl -> [AnimeLink] in
                Log.info("Loading GogoAnime ongoing releases page")
                return try bowl
                    // Select every <li> that is a direct child of a <ul> directly under an element with the "last_episodes" class
                    .select(".last_episodes>ul>li")
                    .compactMap {
                        item -> AnimeLink? in
                        // Select the <a> element that is a direct child of the element with the "img" class
                        let linkContainer = try item.select(".img>a")

                        // Getting the link by retrieving the "href" attribute of the linkContainer
                        // "/boruto-naruto-next-generations-episode-237"
                        let episodePath = try linkContainer.attr("href")

                        // Match the anime identifier with regex
                        let animeIdentifierMatches = NASourceGogoAnime
                            .animeLinkFromEpisodePathRegex
                            .matches(in: episodePath, options: [], range: episodePath.matchingRange)
                        guard let animeIdentifierMatch = animeIdentifierMatches.first else { return nil }
                        let animeIdentifier = episodePath[animeIdentifierMatch.range(at: 1)]

                        // Reassemble the anime URL to something like '/category/xxx-xxxx'
                        guard let animeUrl = URL(string: "\(self.endpoint)/category/\(animeIdentifier)") else {
                            return nil
                        }

                        // Read the link to the artwork
                        // "https://example.com/cover/boruto-naruto-next-generations.png"
                        guard let artworkUrl = URL(string: try linkContainer.select("img").attr("src")) else {
                            return nil
                        }

                        // Select the <p> element with the class "name" inside the current ".last_episodes>ul>li" item
                        // "Boruto: Naruto Next Generations"
                        let animeTitle = try item.select("p.name").text()

                        return AnimeLink(
                            title: animeTitle,
                            link: animeUrl,
                            image: artworkUrl,
                            source: self
                        )
                }
        }
    }
    // ...
}

<div class="last_episodes loaddub">
  <ul class="items">
    <!-- .last_episodes > ul > li -->
    <li>
      <div class="img">
        <!-- .img > a -->
        <a
          href="/boruto-naruto-next-generations-episode-237"
          title="Boruto: Naruto Next Generations"
        >
          <img
            src="https://example.com/cover/boruto-naruto-next-generations.png"
            alt="Boruto: Naruto Next Generations"
          />
          <div class="type ic-SUB"></div>
        </a>
      </div>
      <!-- p.name -->
      <p class="name">
        <a
          href="/boruto-naruto-next-generations-episode-237"
          title="Boruto: Naruto Next Generations"
          >Boruto: Naruto Next Generations</a
        >
      </p>
      <p class="episode">Episode 237</p>
    </li>
  </ul>
</div>
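
The regex step in the example above turns an episode path into an anime identifier. NineAnimator's actual `animeLinkFromEpisodePathRegex` is not shown in this guide, so the sketch below uses a hypothetical pattern and a hypothetical helper name (`animeIdentifier(fromEpisodePath:)`) to illustrate the idea with Foundation alone.

```swift
import Foundation

// Hypothetical pattern (an assumption, not NineAnimator's real regex):
// capture everything between the leading "/" and the trailing "-episode-<number>"
let episodePathPattern = #"^/(.+)-episode-\d+$"#

/// Hypothetical helper: extracts the anime identifier from an episode path,
/// or returns nil when the path does not look like an episode link.
func animeIdentifier(fromEpisodePath path: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: episodePathPattern, options: []) else {
        return nil
    }
    let fullRange = NSRange(path.startIndex..., in: path)
    guard let match = regex.firstMatch(in: path, options: [], range: fullRange),
          let identifierRange = Range(match.range(at: 1), in: path) else {
        return nil
    }
    return String(path[identifierRange])
}

let identifier = animeIdentifier(fromEpisodePath: "/boruto-naruto-next-generations-episode-237")
// Optional("boruto-naruto-next-generations")

// Reassemble the category path, mirroring the '/category/xxx-xxxx' step above
let categoryPath = identifier.map { "/category/\($0)" }
// Optional("/category/boruto-naruto-next-generations")
```

Returning an optional keeps the helper usable inside `compactMap`, where items whose path does not match can simply be dropped.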