Content storage and structure

Magnolia Documentation Team

Performance tuning guide 6.4

Magnolia stores all content (web pages, images, documents, configuration, data) in a content repository. The magnolia repository contains several workspaces, and you can add further custom workspaces.

A content repository is a high-level information management system that’s a superset of traditional data repositories. It implements content services such as:

Hierarchical, structured and unstructured content
Granular content access and access control
Node types, property types (text, number, date, binary)
Queries (XPath, SQL)
Import and export
Referential integrity
Versioning
Observation
Locking
Clustering
Multiple persistence models

The repository implementation chosen, Apache Jackrabbit, adheres to the Java Content Repository standard (JCR).

Hierarchical content store

A content repository is designed to store, search and retrieve hierarchical data. Data consists of a tree of nodes with associated properties. Data is stored in the properties. They may store simple values such as numbers and strings or binary data (images, documents) of arbitrary length. Nodes may optionally have one or more types associated with them, which in turn dictates the type of their properties, the number and type of their child nodes, and certain behavioral characteristics.

In the example below, A, B, C, and D are nodes. The boxes represent properties with Boolean, numerical, string, and binary values. You don’t need to worry about how the data is stored. The repository provides a standardized way to store and retrieve it whether it resides in a traditional database or in a file system.

Figure: Example node structure in a hierarchical content store

JCR standard API for content repositories

Java Content Repository (JCR) is a standard interface for accessing content repositories. JCR version 1.0 was specified in Java Specification Request 170 (JSR-170). Version 2.0, specified in JSR-283, is also final. JCR specifies a hierarchical content store with support for structured and unstructured content.

Magnolia was the first open-source content management system built specifically to leverage JCR. The standard decouples the responsibilities of content storage from content management and provides a common API that enables standardized content reuse across the enterprise and between applications. Magnolia uses the open-source Jackrabbit reference implementation.

Content storage

Magnolia typically has one repository, named magnolia. That repository, in turn, contains several workspaces. One workspace stores website content, another stores user accounts, a third stores configuration, and so on. For more on creating custom workspaces and naming conventions, see Workspaces.

Persistent storage

A persistence manager (PM) is an internal Jackrabbit component that handles the persistent storage of content nodes and properties. Each workspace of a Jackrabbit content repository can use a separate persistence manager to store content for that workspace. The persistence manager sits at the bottom layer of the Jackrabbit system architecture. Reliability, integrity and performance of the PM are crucial to the overall stability and performance of the repository.

In order to avoid integrity issues and to benefit from services such as observation, clustering and indexing, you should always access the content through the JCR API. Changing the data directly (bypassing the API) causes serious issues. This may sound restrictive but the API is actually quite versatile. You can even access the content repository from external applications using the API.

The choice of persistence managers includes:

Database: Magnolia uses a database as its persistence manager by default. This is the most common option. We ship WAR files and operating-system-specific bundles with the H2 database. H2 is an embedded database that allows us to package a fully operational Magnolia example into a single download, including configuration details and demonstration websites. It requires minimal installation effort from users. However, for production environments, we recommend an enterprise-scale database such as MySQL, PostgreSQL or Oracle. All of them work with JCR. Database connections are based on JDBC, involve zero deployment, and run fast.

Note: Magnolia supports the MySQL InnoDB storage engine but not the MyISAM engine. InnoDB is the default engine in MySQL 5.5 and later.

File system: This kind of data store is typically not meant to run in production environments, except in read-only cases, but it can be very fast.

In-memory: This is a great persistence manager for testing and for small workspaces. All content is kept in memory and is lost as soon as the repository is closed. It is even faster than a file system but, again, not meant for production use.

Magnolia DX Core allows you to switch between persistence managers. Each workspace has a workspace.xml file in which its persistence manager is configured.

To avoid losing content, you must create a verifiable content migration plan before switching persistence managers. Content migration is a time-consuming process. You can use SQL dumps, but you need to review them carefully to ensure they have the proper configurations to import to the target data store.

Changing the PersistenceManager entry in the XML allows you to switch persistence managers to use the class best suited for your use case. For more, see classes, and check out the Apache PersistenceManagerFAQ if needed.
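Before changing anything, it can help to see which persistence manager class each workspace currently uses. The following is a minimal, read-only sketch. It assumes that the workspaces live under one directory (such as the /mgnl-home/repositories/magnolia/workspaces/ path used later in this section) and that each workspace.xml declares the class on the same line as the opening PersistenceManager tag. The function name list_persistence_managers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<workspace> <persistence manager class>" for
# every workspace.xml found directly below the given workspaces directory.
# Assumes the class attribute sits on the same line as <PersistenceManager.
list_persistence_managers() {
    workspaces_dir=$1
    for config in "$workspaces_dir"/*/workspace.xml; do
        # Skip the unexpanded glob when no workspace.xml exists.
        [ -f "$config" ] || continue
        workspace=$(basename "$(dirname "$config")")
        # Extract the value of the class attribute; keep the first match only.
        class=$(sed -n 's/.*<PersistenceManager[[:space:]][[:space:]]*class="\([^"]*\)".*/\1/p' "$config" | head -n 1)
        printf '%s %s\n' "$workspace" "${class:-unknown}"
    done
}
```

For example, list_persistence_managers /mgnl-home/repositories/magnolia/workspaces prints one line per workspace. The function only reads the files, so it is safe to run before and after a switch to confirm what changed.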

Use native database tools to generate and restore backups using a verifiable content migration plan. Logical backups are a great option to copy the database to another environment. For more, see Backing up and restoring instances.

The Nodes API allows you to use a REST service for CRUD operations against a running Magnolia instance. Alternatively, you can import content via bootstrapping. Whichever migration approach you choose, ensure the content migration process is verifiable and planned.
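As a rough illustration of the REST route, the sketch below builds a Nodes API URL and reads a single node with curl. It assumes that the nodes endpoint is exposed at /.rest/nodes/v1/{workspace}/{path} under the base URL of the instance, so check the REST module documentation for your version. The base URL, workspace, node path and credentials are placeholders, and the helper names nodes_url and read_node are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the Nodes API URL for a workspace and node path.
# Assumes the endpoint layout <base>/.rest/nodes/v1/<workspace>/<path>.
nodes_url() {
    base=${1%/}        # strip one trailing slash from the base URL
    workspace=$2
    path=${3#/}        # strip one leading slash from the node path
    printf '%s/.rest/nodes/v1/%s/%s\n' "$base" "$workspace" "$path"
}

# Hypothetical helper: read a node as JSON (the "R" in CRUD).
# $1 is a user:password placeholder; use an account with REST access.
read_node() {
    curl --fail --silent --show-error \
        --user "$1" \
        --header 'Accept: application/json' \
        "$(nodes_url "$2" "$3" "$4")"
}
```

A call such as read_node user:password http://localhost:8080/magnoliaAuthor website /travel would fetch one node. Reading a few nodes from the source and the target instance in this way is one simple means of making a migration verifiable.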

If you removed the persistence volume for a Magnolia instance, or started Magnolia in a new environment using an existing database, you must delete the search index and restart the instance so that the index is rebuilt successfully. To remove the index, run the following in the affected environment.

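# Change to the workspaces directory of the magnolia repository.
cd /mgnl-home/repositories/magnolia/workspaces/
# Remove each workspace's index directory; -prune stops find from descending into a directory that is about to be deleted.
find . -type d -name index -prune -exec rm -rf {} +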
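Because the removal is irreversible, it can be worth previewing what will be deleted first. The following is a minimal sketch of such a dry run. The function name list_index_dirs is hypothetical, and the sketch assumes that the search indexes are directories named index, as the command above implies.

```shell
#!/bin/sh
# Hypothetical helper: list the index directories below the given directory
# that the removal command would delete, without deleting anything.
list_index_dirs() {
    # -prune keeps find from descending into a matched index directory,
    # so anything nested inside an index directory is not listed separately.
    find "$1" -type d -name index -prune -print | sort
}
```

Running list_index_dirs /mgnl-home/repositories/magnolia/workspaces should print one path per workspace index. If the list contains anything unexpected, stop and check the path before running the removal.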
