Adding a New Parser
This guide explains how to add a new manga source parser to the Kotatsu parsers library, from evaluating the target website to implementing and testing the parser. For the parser architecture, see Core Architecture; for code style conventions, see Code Style and Guidelines.
Understanding the Parser Framework
Before adding a new parser, it's important to understand the basic architecture of the parsing system.
classDiagram
MangaParser <|-- AbstractMangaParser
MangaParser <|-- LegacyMangaParser
LegacyMangaParser <|-- LegacyPagedMangaParser
LegacyPagedMangaParser <|-- TemplateParser
TemplateParser <|-- SiteSpecificParser
class SiteSpecificParser{
<<concrete>>
}
class TemplateParser{
<<abstract>>
}
class LegacyPagedMangaParser{
<<abstract>>
}
class LegacyMangaParser{
<<interface>>
}
class AbstractMangaParser{
<<abstract>>
}
class MangaParser{
<<interface>>
+getList()
+getDetails()
+getPages()
+getPageUrl()
+getFilterOptions()
}
Prerequisites
Before implementing a parser, you'll need:
- Basic understanding of Kotlin
- Knowledge of web scraping (JSoup) or JSON/GraphQL APIs
- Familiarity with the website you're creating a parser for
Evaluating a Manga Website
Start by investigating how the target website structures its content:
- Does it use a common CMS like WordPress with a known theme (Madara, WpComics)?
- Does it have a public API (REST, GraphQL)?
- What elements need to be parsed (manga lists, chapter lists, pages)?
Choosing a Base Parser
Based on your evaluation, select an appropriate base class:
Website Type | Recommended Base Class | Notes |
---|---|---|
Madara WordPress theme | MadaraParser | Common for many manga sites |
MangaReader-like sites | MangaReaderParser | Sites with a similar structure to MangaReader |
Single page content | LegacySinglePageMangaParser | Sites without pagination |
Custom implementation | LegacyPagedMangaParser | For sites with pagination but no existing template |
Implementation Steps
1. Create a New Parser Class
Create a new Kotlin file in the appropriate package under org.koitharu.kotatsu.parsers.site. The package structure follows the {site}.{language} convention.
flowchart TB
id1[Choose base parser class] --> id2[Create new file in appropriate package] --> id3[Add @MangaSourceParser annotation] --> id4[Implement required methods] --> id5[Configure parser properties] --> id6[Test the parser]
2. Add Parser Annotation
Add the @MangaSourceParser annotation to your class with:
- Internal name (all caps)
- Display name
- Language code
For example:
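A minimal sketch of an annotated class, assuming the annotation takes the three values in the order listed above. To keep the snippet compilable on its own, it declares a stand-in `MangaSourceParser` annotation; that declaration, the `NEWSITE` name, and the `New Site` title are assumptions for illustration. In the real project you would import the library's annotation instead of declaring one.

```kotlin
// Stand-in declaration so the sketch compiles on its own (an assumption);
// in the parsers library, import the real @MangaSourceParser instead.
@Target(AnnotationTarget.CLASS)
annotation class MangaSourceParser(
    val name: String,   // internal name, all caps
    val title: String,  // display name shown to users
    val locale: String, // language code
)

// Hypothetical parser for an English-language site called "New Site".
@MangaSourceParser("NEWSITE", "New Site", "en")
class NewSiteParser
```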
3. Extend Appropriate Base Class
Extend the appropriate base class and implement the constructor:
internal class NewSiteParser(context: MangaLoaderContext) :
    MadaraParser(context, MangaParserSource.NEWSITE, "mangasite.com")
Make sure to:
- Pass the MangaLoaderContext to the superclass
- Use the correct MangaParserSource enum value
- Specify the default domain
4. Configure Parser Properties
Override properties from the base class as needed:
override val tagPrefix = "manga-genre/"
override val listUrl = "manga/"
override val datePattern = "MMMM d, yyyy"
The specific properties to override depend on the base class you're extending.
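As one illustration of what such a property controls, the sketch below shows how a `datePattern` value could be applied to a scraped date string. It assumes the pattern uses `java.text.SimpleDateFormat` syntax, as the pattern letters suggest. `parseChapterDate` is a hypothetical helper, not a library function, and the base class may handle locales and relative dates differently.

```kotlin
import java.text.ParseException
import java.text.SimpleDateFormat
import java.util.Locale
import java.util.TimeZone

// Hypothetical helper: converts a date string scraped from a chapter list
// into epoch milliseconds, returning 0 when the text does not match.
fun parseChapterDate(text: String, pattern: String = "MMMM d, yyyy"): Long {
    val format = SimpleDateFormat(pattern, Locale.ENGLISH)
    format.timeZone = TimeZone.getTimeZone("UTC") // fixed zone keeps results reproducible
    return try {
        format.parse(text).time
    } catch (e: ParseException) {
        0L // unknown date; the caller decides how to display it
    }
}
```

If a site prints dates such as `05/01/2024`, passing `dd/MM/yyyy` as the pattern is the only change this sketch needs.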
5. Implement Required Methods
If you're extending a template parser like MadaraParser, you might not need to override any methods. Otherwise, implement the necessary methods:
Key Methods to Implement
For custom parsers, you'll need to implement these methods:
Method | Purpose |
---|---|
getListPage() | Fetch one page of the manga list |
getDetails() | Get detailed information about a manga |
getPages() | Get pages for a manga chapter |
getFilterOptions() | Return available filter options |
6. Custom Selectors and Parsers
For HTML parsing, define CSS selectors for the elements you need to extract:
protected open val selectMangaList = ".manga-list .item"
protected open val selectChapter = "li.wp-manga-chapter"
protected open val selectPage = "div.page-break img"
Use these selectors in your parsing methods to extract the required information.
Example Implementation
Here's an outline of implementing a parser for a Madara-based site:
- Create a new file in the appropriate package
- Add the @MangaSourceParser annotation
- Extend MadaraParser and configure the domain
- Override any necessary properties
- If needed, override parsing methods
For more complex sites that don't use a common template, you'll need to implement custom parsing logic using JSoup or handle API responses.
Testing Your Parser
1. Manual Testing
Test your parser with the following operations:
- Listing manga
- Searching
- Viewing manga details
- Reading chapters
2. Automated Testing
Before submitting, run the parser tests:
- Modify the MangaSources annotation class to include your parser
- Run the MangaParserTest test class
- Verify the results with the test report
Best Practices
1. Domain Handling
Never hardcode domains. Use configKeyDomain for the default domain and access it via the domain property:
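The reason for this rule can be sketched without the library. `DomainConfig` and `ListUrlBuilder` below are hypothetical stand-ins, not the real `configKeyDomain` API, and the URL layout is invented for illustration. The sketch only shows that a URL built from a resolved `domain` value follows a configuration change, whereas a hardcoded host would not.

```kotlin
// Hypothetical stand-in for the parser's domain configuration: a default
// domain plus an optional user-supplied replacement (for example, a mirror).
class DomainConfig(private val defaultDomain: String) {
    var userDomain: String? = null

    // Resolved on every access, so a changed setting takes effect immediately.
    val domain: String
        get() = userDomain?.takeIf { it.isNotBlank() } ?: defaultDomain
}

// Hypothetical parser fragment that builds every URL from the resolved domain.
class ListUrlBuilder(private val config: DomainConfig) {
    fun listPageUrl(page: Int): String =
        "https://${config.domain}/manga/page/$page/"
}
```

With a default of `mangasite.com`, `listPageUrl(2)` yields `https://mangasite.com/manga/page/2/`; after setting `userDomain` to a mirror, the same call targets the mirror with no code change.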
2. Generating Unique IDs
Ensure all manga, chapter, and page IDs are unique and domain-independent:
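One way to get both properties is to derive the ID from the source name and the relative URL, so that a domain change leaves existing IDs intact. The sketch below assumes this approach. `toRelativeUrl` and `stableId` are hypothetical helpers, and the hash is a plain base-31 polynomial chosen for illustration; it is not claimed to match the library's own ID function.

```kotlin
import java.net.URI

// Hypothetical helper: strips scheme and host so only path and query remain.
fun toRelativeUrl(url: String): String {
    val uri = URI(url)
    val path = uri.rawPath.orEmpty().ifEmpty { "/" }
    val query = uri.rawQuery
    return if (query != null) "$path?$query" else path
}

// Hypothetical 64-bit ID: a polynomial hash over the source name and the
// relative URL. Long arithmetic wraps on overflow, which is intended here.
fun stableId(sourceName: String, url: String): Long {
    var hash = 1125899906842597L
    for (ch in "$sourceName:${toRelativeUrl(url)}") {
        hash = 31 * hash + ch.code
    }
    return hash
}
```

Because the host never reaches the hash, `https://mangasite.com/manga/one-piece/` and the same path on a mirror map to the same ID, while the source name keeps IDs from different sites apart.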
3. Error Handling
Use extension functions like selectFirstOrThrow and attrAsRelativeUrl for better error handling:
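The pattern behind such helpers is to fail immediately with a message that names the missing element, rather than letting a null surface later as an unrelated error. The sketch below illustrates that pattern on a hypothetical `Node` type instead of real JSoup elements; `ParseException`, `firstOrThrow`, and `attrOrThrow` are stand-ins written for this example and are not the library's functions.

```kotlin
// Raised when the page does not have the structure the parser expects.
class ParseException(message: String) : RuntimeException(message)

// Hypothetical stand-in for a parsed HTML element: a tag name plus attributes.
data class Node(val tag: String, val attrs: Map<String, String> = emptyMap())

// Returns the first node with the given tag, or fails with a message that
// names what was being looked for.
fun List<Node>.firstOrThrow(tag: String): Node =
    firstOrNull { it.tag == tag }
        ?: throw ParseException("Cannot find <$tag> element")

// Returns a non-empty attribute value, or fails naming the attribute and tag.
fun Node.attrOrThrow(name: String): String =
    attrs[name]?.takeIf { it.isNotEmpty() }
        ?: throw ParseException("Attribute \"$name\" is missing on <$tag>")
```

When a site changes its markup, a failure like `Cannot find <h1> element` points directly at the selector that needs updating.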
4. Authentication Support
If your parser requires authentication, implement the MangaParserAuthProvider interface:
internal class AuthenticatedParser(...) : BaseParser(...), MangaParserAuthProvider {
    override val authUrl: String get() = "https://$domain/login"
    override val isAuthorized: Boolean get() = /* check cookies */
    override suspend fun getUsername(): String { /* implementation */ }
}
Advanced Topics
Custom Network Interceptors
For sites with special requirements (CAPTCHA, Cloudflare protection), implement the Interceptor interface:
internal class ProtectedSiteParser(...) : BaseParser(...), Interceptor {
    override fun intercept(chain: Interceptor.Chain): Response {
        // Custom handling of requests/responses
    }
}
Conclusion
Adding a new parser involves understanding the target website, choosing an appropriate base class, and implementing the necessary methods. By following the patterns established in existing parsers, you can create a reliable parser that integrates well with the Kotatsu ecosystem.
For more assistance, refer to the Core Architecture documentation or ask for help in the community channels mentioned in the CONTRIBUTING.md file.