A plug-and-play snippet is available at the end of the article.
This guide explains how to create custom parsers for Media Cleaner to support additional plugins or themes.
Media Cleaner uses parsers to detect media usage across WordPress plugins and themes. Each parser hooks into the scanning process and registers found media references using the $wpmc global object.
Parsers can hook into these actions:
| Hook | Parameters | When it fires |
| --- | --- | --- |
| `wpmc_scan_once` | None | Once at the beginning of each scan. Use for global settings, theme options, site icons, etc. |
| `wpmc_scan_post` | `$html`, `$post_id` | For each post. `$html` is the post content. |
| `wpmc_scan_postmeta` | `$post_id` | For each post. Use to scan post meta fields. |
| `wpmc_scan_widget` | `$widget` | For each active widget. |
| `wpmc_scan_widgets` | None | After all widgets have been scanned. |
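As a minimal sketch of how a parser plugs into these hooks, the following registers a `wpmc_scan_postmeta` callback that reports one attachment ID stored in post meta. The meta key `_myplugin_image_id`, the constant, and the function names are hypothetical placeholders; `add_reference_id()` is documented below.

```php
<?php
// Hypothetical meta key; replace with the key your plugin actually uses.
const MYPLUGIN_IMAGE_META_KEY = '_myplugin_image_id';

/**
 * Normalize a raw meta value to a positive attachment ID, or 0 if unusable.
 */
function myplugin_meta_to_attachment_id( $value ) {
    return ( is_numeric( $value ) && (int) $value > 0 ) ? (int) $value : 0;
}

/**
 * Callback for wpmc_scan_postmeta: runs once per post.
 */
function myplugin_wpmc_scan_postmeta( $post_id ) {
    global $wpmc;
    $raw = get_post_meta( $post_id, MYPLUGIN_IMAGE_META_KEY, true );
    $id  = myplugin_meta_to_attachment_id( $raw );
    if ( $id > 0 ) {
        $wpmc->add_reference_id( $id, 'MYPLUGIN (ID)', $post_id );
    }
}

// Guarded so the file can also be loaded outside WordPress (for example in tests).
if ( function_exists( 'add_action' ) ) {
    add_action( 'wpmc_scan_postmeta', 'myplugin_wpmc_scan_postmeta', 10, 1 );
}
```

Keeping the value check in its own small function makes the parser easy to test without a running WordPress site.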
Core Reference Functions
These are the primary functions to register found media:
add_reference_id( $idOrIds, $type, $origin = null, $extra = null )
Register media by their attachment ID(s).
// Single ID
$wpmc->add_reference_id( 123, 'PLUGIN_NAME (ID)', $post_id );
// Multiple IDs
$wpmc->add_reference_id( [123, 456, 789], 'PLUGIN_NAME (ID)', $post_id );
Parameters:
- `$idOrIds` (int|array): Single attachment ID or array of IDs.
- `$type` (string): Identifier for the source (shown in the UI).
- `$origin` (int|null): The post ID where the reference was found.
- `$extra` (mixed|null): Additional data for debugging.
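Plugins often store several attachment IDs as one comma-separated string (for example a shortcode attribute such as `ids="1,2,3"`). The sketch below shows one way to turn such a string into a clean array of integers before registering it. `myplugin_parse_id_list()` is a hypothetical helper, not part of Media Cleaner.

```php
<?php
/**
 * Turn a comma-separated string such as "12, 34,abc,,56" into an array of
 * positive integer IDs. Hypothetical helper, not part of Media Cleaner.
 */
function myplugin_parse_id_list( $raw ) {
    $ids = [];
    foreach ( explode( ',', (string) $raw ) as $part ) {
        $part = trim( $part );
        // Keep only positive whole numbers; skip empty or non-numeric parts.
        if ( preg_match( '/^\d+$/', $part ) && (int) $part > 0 ) {
            $ids[] = (int) $part;
        }
    }
    // Remove duplicates and reindex.
    return array_values( array_unique( $ids ) );
}
```

Inside a parser callback, the result can be passed straight through: `$wpmc->add_reference_id( myplugin_parse_id_list( $value ), 'PLUGIN_NAME (ID)', $post_id );`.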
add_reference_url( $urlOrUrls, $type, $origin = null, $extra = null )
Register media by their URL(s).
// Single URL
$wpmc->add_reference_url( $url, 'PLUGIN_NAME (URL)', $post_id );
// Multiple URLs
$wpmc->add_reference_url( $urls_array, 'PLUGIN_NAME (URL)', $post_id );
Parameters:
- `$urlOrUrls` (string|array): Single URL or array of URLs.
- `$type` (string): Identifier for the source.
- `$origin` (int|null): The post ID where the reference was found.
- `$extra` (mixed|null): Additional data for debugging.
Important: URLs are automatically cleaned using `clean_url()` internally.
Helper Functions
URL Extraction
get_urls_from_string( $string )
Extract all media URLs from any string using regex. Works with any text content.
$urls = $wpmc->get_urls_from_string( $raw_content );
// Returns: ['2024/01/image.jpg', '2024/02/photo.png', ...]
get_urls_from_html( $html )
Parse HTML using DOMDocument to extract URLs from:
- `<img>` tags (src, srcset)
- `<video>` tags (src, poster)
- `<audio>` tags (src)
- `<source>` tags (src)
- `<a>` tags (href)
- `<link>` tags (href)
- `<meta>` tags (og:image, twitter:image)
- Background images in CSS
- PDF links
- iframes (recursively)
$urls = $wpmc->get_urls_from_html( $post_content );
clean_url( $url )
Convert a full URL to a cleaned relative path (e.g., 2024/01/image.jpg).
$clean = $wpmc->clean_url( 'https://example.com/wp-content/uploads/2024/01/image.jpg' );
// Returns: '2024/01/image.jpg'
Note: Most `$wpmc` methods already use `clean_url()` internally. Use this only when extracting URLs manually.
clean_url_from_resolution( $url )
Remove resolution suffix from a URL (e.g., -300x200).
$clean = $wpmc->clean_url_from_resolution( '2024/01/image-300x200.jpg' );
// Returns: '2024/01/image.jpg'
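To make "resolution suffix" concrete, here is a standalone sketch that strips a trailing `-WIDTHxHEIGHT` segment sitting directly before the file extension. It only illustrates the idea and is not Media Cleaner's actual implementation; inside a parser, call `$wpmc->clean_url_from_resolution()` instead. The function name is hypothetical.

```php
<?php
/**
 * Illustrative only: remove "-<width>x<height>" when it appears immediately
 * before the final file extension.
 */
function myplugin_strip_resolution( $path ) {
    return preg_replace( '/-\d+x\d+(?=\.[A-Za-z0-9]+$)/', '', $path );
}
```

Note that a size-like segment in the middle of a filename (for example `poster-10x20-final.jpg`) is left untouched by this sketch, because only a suffix right before the extension is treated as a generated thumbnail size.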
is_url( $url )
Check if a string is a valid URL (starts with http or /).
if ( $wpmc->is_url( $value ) ) {
    $urls[] = $wpmc->clean_url( $value );
}
Meta Data Extraction
get_from_meta( $meta, $lookFor, &$ids, &$urls, $rawMode = false )
Recursively search through meta data (arrays/objects) for specific keys and extract IDs and URLs.
$ids = [];
$urls = [];
$data = get_post_meta( $post_id, '_plugin_data', true );
$wpmc->get_from_meta(
    $data,
    ['id', 'url', 'image', 'thumbnail', 'background_image'], // Keys to look for
    $ids,
    $urls
);
$wpmc->add_reference_id( $ids, 'PLUGIN (ID)', $post_id );
$wpmc->add_reference_url( $urls, 'PLUGIN (URL)', $post_id );
Parameters:
- `$meta` (array|object): The data structure to search.
- `$lookFor` (array): Array of key names to look for.
- `&$ids` (array): Reference to array where IDs will be added.
- `&$urls` (array): Reference to array where URLs will be added.
- `$rawMode` (bool): If true, all values are added to URLs without type checking.
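The recursive lookup can be pictured with the following standalone sketch. It is a simplified illustration of the idea (walk nested arrays and objects, collect values stored under matching keys, send numeric values to the ID list and non-empty strings to the URL list), not Media Cleaner's implementation. `myplugin_collect_media()` is a hypothetical name.

```php
<?php
/**
 * Illustrative recursive search for media keys in nested data.
 */
function myplugin_collect_media( $data, array $look_for, array &$ids, array &$urls ) {
    if ( is_object( $data ) ) {
        $data = get_object_vars( $data );
    }
    if ( ! is_array( $data ) ) {
        return;
    }
    foreach ( $data as $key => $value ) {
        if ( is_array( $value ) || is_object( $value ) ) {
            // Descend into nested structures regardless of the key name.
            myplugin_collect_media( $value, $look_for, $ids, $urls );
        } elseif ( in_array( $key, $look_for, true ) ) {
            if ( is_numeric( $value ) ) {
                $ids[] = (int) $value;
            } elseif ( is_string( $value ) && $value !== '' ) {
                $urls[] = $value;
            }
        }
    }
}
```

The strict `in_array()` comparison matters here: list-style arrays have integer keys, and a loose comparison could treat index `0` as equal to a string key on older PHP versions.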
array_to_ids_or_urls( $meta, &$ids, &$urls, $recursive = false, $filters = [] )
Convert array values to IDs or URLs based on their type. Numeric values become IDs, file-like strings become URLs.
$ids = [];
$urls = [];
$wpmc->array_to_ids_or_urls(
    $block_attributes,
    $ids,
    $urls,
    true, // Recursive
    ['src', 'ids', 'url', 'image'] // Only process these keys
);
Parameters:
- `$meta` (array): The data to process.
- `&$ids` (array): Reference to array for IDs.
- `&$urls` (array): Reference to array for URLs.
- `$recursive` (bool): Whether to process nested arrays.
- `$filters` (array): If set, only process these specific keys.
Shortcode Processing
nested_shortcodes_to_array( $html )
Parse nested shortcodes into a hierarchical array structure (AST).
$html = '[gallery ids="1,2,3"][slider][slide id="5"/][/slider][/gallery]';
$nodes = $wpmc->nested_shortcodes_to_array( $html );
// Returns array structure with type, tag, attributes, and children
Return structure:
[
    [
        'type' => 'shortcode',
        'tag' => 'gallery',
        'attributes' => ['ids' => '1,2,3'],
        'children' => [...]
    ],
    [
        'type' => 'text',
        'content' => 'Some text between shortcodes'
    ]
]
Use with array_to_ids_or_urls() to extract media:
$nodes = $wpmc->nested_shortcodes_to_array( $html );
$wpmc->array_to_ids_or_urls( $nodes, $ids, $urls, true, ['src', 'ids', 'url'] );
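When only one shortcode tag is of interest, the node tree can also be walked by hand. The sketch below assumes the node shape shown above (`type`, `tag`, `attributes`, `children`) and collects the `ids` attribute of every shortcode with a given tag, at any nesting depth. `myplugin_ids_from_nodes()` is a hypothetical helper.

```php
<?php
/**
 * Collect integer IDs from the "ids" attribute of every shortcode node whose
 * tag matches $tag, including nested ones.
 */
function myplugin_ids_from_nodes( array $nodes, $tag ) {
    $ids = [];
    foreach ( $nodes as $node ) {
        if ( ! is_array( $node ) || ( $node['type'] ?? '' ) !== 'shortcode' ) {
            continue; // Skip text nodes.
        }
        if ( ( $node['tag'] ?? '' ) === $tag && isset( $node['attributes']['ids'] ) ) {
            foreach ( explode( ',', (string) $node['attributes']['ids'] ) as $part ) {
                $part = trim( $part );
                if ( preg_match( '/^\d+$/', $part ) ) {
                    $ids[] = (int) $part;
                }
            }
        }
        if ( ! empty( $node['children'] ) && is_array( $node['children'] ) ) {
            $ids = array_merge( $ids, myplugin_ids_from_nodes( $node['children'], $tag ) );
        }
    }
    return $ids;
}
```

In a parser, the input would come from `$wpmc->nested_shortcodes_to_array( $html )` and the result would be passed to `$wpmc->add_reference_id()`.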
get_all_shortcodes_attributes( $html, $ids_attr = [], $urls_attr = [] )
Extract specific attributes from all shortcodes in HTML.
$result = $wpmc->get_all_shortcodes_attributes(
    $html,
    ['id', 'ids', 'image_id'], // Attributes containing IDs
    ['src', 'url', 'image'] // Attributes containing URLs
);
// Returns: ['ids' => [...], 'urls' => [...]]
$ids = $result['ids'];
$urls = $result['urls'];
get_shortcode_attributes( $shortcode_tag, $post )
Get attributes from a specific shortcode tag in a post.
$attrs = $wpmc->get_shortcode_attributes( 'my_gallery', $post );
// Returns array of attribute arrays for each instance
WordPress Blocks (Gutenberg)
get_from_blocks( $html, $prefix, $keys, &$urls, &$ids )
Extract IDs and URLs from WordPress Gutenberg blocks.
$ids = [];
$urls = [];
// Scan for blocks like "myplugin/gallery", "myplugin/slider", etc.
$wpmc->get_from_blocks(
    $html,
    'myplugin/', // Block name prefix
    ['id', 'url', 'mediaId', 'imageUrl'], // Attributes to extract
    $urls,
    $ids
);
$wpmc->add_reference_url( $urls, 'MYPLUGIN (URL)', $post_id );
$wpmc->add_reference_id( $ids, 'MYPLUGIN (ID)', $post_id );
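For context, Gutenberg serializes block attributes as JSON inside HTML comments in the post content, for example `<!-- wp:myplugin/hero {"mediaId":5} /-->`, which is why a block name prefix is enough to locate a plugin's blocks. The standalone sketch below shows that format by pulling the attributes out with a regular expression. It is an illustration only, not how `get_from_blocks()` is implemented, and the function name is hypothetical; in a real parser, prefer `$wpmc->get_from_blocks()` or WordPress's `parse_blocks()`.

```php
<?php
/**
 * Illustrative only: find blocks whose name starts with $prefix and return
 * their names and decoded JSON attributes.
 */
function myplugin_block_attrs( $html, $prefix ) {
    $pattern = '/<!--\s+wp:(' . preg_quote( $prefix, '/' ) . '[a-z0-9-]*)\s+(\{.*?\})\s+\/?-->/s';
    $found   = [];
    if ( preg_match_all( $pattern, $html, $matches, PREG_SET_ORDER ) ) {
        foreach ( $matches as $m ) {
            $attrs = json_decode( $m[2], true );
            if ( is_array( $attrs ) ) {
                $found[] = [ 'name' => $m[1], 'attrs' => $attrs ];
            }
        }
    }
    return $found;
}
```

Blocks without attributes (such as a plain `<!-- wp:paragraph -->`) and closing comments are skipped, since they carry no JSON payload.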
Media ID Resolution
custom_attachment_url_to_postid( $url )
Get the attachment ID from a URL.
$attachment_id = $wpmc->custom_attachment_url_to_postid( $full_url );
find_media_id_from_file( $file, $doLog )
Find attachment ID from a file path.
$id = $wpmc->find_media_id_from_file( '2024/01/image.jpg', false );
get_id_from_clean_url( $clean_url )
Get attachment ID from a cleaned URL.
$id = $wpmc->get_id_from_clean_url( '2024/01/image.jpg' );
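When a plugin stores only URLs, one possible pattern is to register the reference by ID when the URL resolves to an attachment and fall back to a URL reference otherwise. The sketch below assumes that `custom_attachment_url_to_postid()` returns a falsy value when nothing is found, which this guide does not confirm. `myplugin_register_url()` is a hypothetical helper, and `$wpmc` is passed in as a parameter so the function can be exercised with a stub.

```php
<?php
/**
 * Register a media reference by ID if the URL resolves to an attachment,
 * otherwise by URL. Returns which kind of reference was registered.
 * Assumption: custom_attachment_url_to_postid() is falsy on failure.
 */
function myplugin_register_url( $wpmc, $url, $post_id ) {
    $id = $wpmc->custom_attachment_url_to_postid( $url );
    if ( $id ) {
        $wpmc->add_reference_id( (int) $id, 'MYPLUGIN (ID)', $post_id );
        return 'id';
    }
    $wpmc->add_reference_url( $url, 'MYPLUGIN (URL)', $post_id );
    return 'url';
}
```

Within a hook callback this would be called as `myplugin_register_url( $wpmc, $url, $post_id )` after `global $wpmc;`.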
Thumbnail Handling
get_thumbnails_urls( $id, $sizes_as_key = false )
Get all thumbnail URLs for an attachment ID.
$urls = $wpmc->get_thumbnails_urls( 123 );
// Returns: ['2024/01/image-150x150.jpg', '2024/01/image-300x200.jpg', ...]
// With size names as keys:
$urls = $wpmc->get_thumbnails_urls( 123, true );
// Returns: ['thumbnail' => '...', 'medium' => '...', ...]
get_thumbnails_urls_from_srcset( $media, $size = 'full' )
Get all URLs from an image's srcset.
$urls = $wpmc->get_thumbnails_urls_from_srcset( 123 );
// or
$urls = $wpmc->get_thumbnails_urls_from_srcset( $image_url );
Theme Scanning
get_images_from_themes( &$ids, &$urls )
Extract media from theme settings (header, logo, site icon, etc.).
$ids = [];
$urls = [];
$wpmc->get_images_from_themes( $ids, $urls );
$wpmc->add_reference_id( $ids, 'THEME' );
$wpmc->add_reference_url( $urls, 'THEME' );
get_favicon()
Get the favicon/site icon URL from Yoast SEO settings.
$favicon = $wpmc->get_favicon();
if ( !empty( $favicon ) ) {
    $wpmc->add_reference_url( $favicon, 'SITE ICON' );
}
Debugging
log( $message, $force = false )
Write to Media Cleaner's debug log (only when debug mode is enabled).
$wpmc->log( "Processing plugin data for post: $post_id" );
Full Template
<?php
/**
* Media Cleaner Parser Template for PLUGIN_NAME
*
* This parser enables Media Cleaner to detect media usage in PLUGIN_NAME.
*
* Replace PLUGIN_NAME with the actual plugin name throughout this file.
* Replace _plugin_meta_key with the actual meta key(s) used by the plugin.
*/
// Register parser hooks
add_action( 'wpmc_scan_once', 'wpmc_scan_once_PLUGIN_NAME', 10, 0 );
add_action( 'wpmc_scan_post', 'wpmc_scan_html_PLUGIN_NAME', 10, 2 );
add_action( 'wpmc_scan_postmeta', 'wpmc_scan_postmeta_PLUGIN_NAME', 10, 1 );
/**
* Runs once at the beginning of each scan.
* Use for: plugin global settings, options pages, theme settings, site icons, etc.
*/
function wpmc_scan_once_PLUGIN_NAME() {
    global $wpmc;
    $ids = [];
    $urls = [];
    // Example: Check plugin global settings
    $settings = get_option( 'plugin_name_settings', [] );
    if ( !empty( $settings ) ) {
        $wpmc->get_from_meta(
            $settings,
            ['logo', 'background', 'icon', 'image'], // Keys to look for
            $ids,
            $urls
        );
    }
    // Register found references
    if ( !empty( $ids ) ) {
        $wpmc->add_reference_id( $ids, 'PLUGIN_NAME SETTINGS (ID)' );
    }
    if ( !empty( $urls ) ) {
        $wpmc->add_reference_url( $urls, 'PLUGIN_NAME SETTINGS (URL)' );
    }
}
/**
* Runs for each post to scan its meta data.
* Use for: custom fields, plugin-specific post meta, serialized data, etc.
*
* @param int $post_id The post ID.
*/
function wpmc_scan_postmeta_PLUGIN_NAME( $post_id ) {
    global $wpmc;
    $ids = [];
    $urls = [];
    // Get plugin's post meta data
    $data = get_post_meta( $post_id, '_plugin_meta_key', true );
    if ( empty( $data ) ) {
        return;
    }
    // Handle JSON-encoded data
    if ( is_string( $data ) && !empty( $data ) ) {
        $decoded = json_decode( $data, true );
        if ( json_last_error() === JSON_ERROR_NONE ) {
            $data = $decoded;
        }
    }
    // Extract IDs and URLs from the meta data
    // Specify all attribute names the plugin uses for media
    $wpmc->get_from_meta(
        $data,
        ['id', 'url', 'image', 'thumbnail', 'background_image', 'src', 'mediaId'],
        $ids,
        $urls
    );
    // Register found references
    $wpmc->add_reference_id( $ids, 'PLUGIN_NAME (ID)', $post_id );
    $wpmc->add_reference_url( $urls, 'PLUGIN_NAME (URL)', $post_id );
}
/**
* Runs for each post to scan its HTML content.
* Use for: shortcodes, Gutenberg blocks, embedded media in content.
*
* @param string $html The post content HTML.
* @param int $post_id The post ID.
*/
function wpmc_scan_html_PLUGIN_NAME( $html, $post_id ) {
    global $wpmc;
    $ids = [];
    $urls = [];
    if ( empty( $html ) ) {
        return;
    }
    // ═══════════════════════════════════════════════════════════════════════
    // METHOD 1: Simple shortcode rendering
    // Best for: Plugins that use standard shortcodes with rendered output
    // ═══════════════════════════════════════════════════════════════════════
    // Check if plugin shortcode exists before processing
    if ( has_shortcode( $html, 'plugin_name' ) ) {
        $rendered = do_shortcode( $html );
        $shortcode_urls = $wpmc->get_urls_from_string( $rendered );
        $urls = array_merge( $urls, $shortcode_urls );
    }
    // ═══════════════════════════════════════════════════════════════════════
    // METHOD 2: Parse nested shortcodes
    // Best for: Complex shortcode structures like [gallery][image id="1"/][/gallery]
    // ═══════════════════════════════════════════════════════════════════════
    // if ( strpos( $html, '[plugin_name' ) !== false ) {
    //     $nodes = $wpmc->nested_shortcodes_to_array( $html );
    //     $wpmc->array_to_ids_or_urls(
    //         $nodes,
    //         $ids,
    //         $urls,
    //         true, // Recursive
    //         ['src', 'ids', 'url', 'image', 'id'] // Attributes to extract
    //     );
    // }
    // ═══════════════════════════════════════════════════════════════════════
    // METHOD 3: Extract specific shortcode attributes
    // Best for: When you know exactly which attributes contain media
    // ═══════════════════════════════════════════════════════════════════════
    // $result = $wpmc->get_all_shortcodes_attributes(
    //     $html,
    //     ['id', 'ids', 'image_id'], // Attributes containing IDs
    //     ['src', 'url', 'image'] // Attributes containing URLs
    // );
    // $ids = array_merge( $ids, $result['ids'] );
    // $urls = array_merge( $urls, $result['urls'] );
    // ═══════════════════════════════════════════════════════════════════════
    // METHOD 4: Scan Gutenberg blocks
    // Best for: Plugins that provide custom Gutenberg blocks
    // ═══════════════════════════════════════════════════════════════════════
    // if ( strpos( $html, '<!-- wp:plugin-name/' ) !== false ) {
    //     $wpmc->get_from_blocks(
    //         $html,
    //         'plugin-name/', // Block namespace prefix
    //         ['id', 'url', 'mediaId', 'imageUrl'], // Block attributes to extract
    //         $urls,
    //         $ids
    //     );
    // }
    // ═══════════════════════════════════════════════════════════════════════
    // METHOD 5: Direct regex extraction (use sparingly)
    // Best for: Non-standard formats or when other methods don't work
    // ═══════════════════════════════════════════════════════════════════════
    // Extract all URLs from the raw HTML (includes images, links, etc.)
    // $raw_urls = $wpmc->get_urls_from_html( $html );
    // $urls = array_merge( $urls, $raw_urls );
    // Register found references
    $wpmc->add_reference_id( $ids, 'PLUGIN_NAME (ID)', $post_id );
    $wpmc->add_reference_url( $urls, 'PLUGIN_NAME (URL)', $post_id );
}
/**
* OPTIONAL: Advanced scanning techniques
* Uncomment and adapt these as needed for your plugin.
*/
// ═══════════════════════════════════════════════════════════════════════════
// Scan taxonomy terms (for plugins that attach media to categories/tags)
// ═══════════════════════════════════════════════════════════════════════════
// function wpmc_scan_taxonomy_PLUGIN_NAME() {
//     global $wpmc;
//
//     $terms = get_terms([
//         'taxonomy' => 'your_taxonomy',
//         'hide_empty' => false,
//     ]);
//
//     // get_terms() returns a WP_Error if the taxonomy does not exist
//     if ( is_wp_error( $terms ) ) {
//         return;
//     }
//
//     foreach ( $terms as $term ) {
//         $image_id = get_term_meta( $term->term_id, 'image_id', true );
//         if ( !empty( $image_id ) ) {
//             $wpmc->add_reference_id( $image_id, 'PLUGIN_NAME TERM (ID)', $term->term_id );
//         }
//     }
// }
// add_action( 'wpmc_scan_once', 'wpmc_scan_taxonomy_PLUGIN_NAME', 10, 0 );
// ═══════════════════════════════════════════════════════════════════════════
// Include thumbnails for referenced images
// Use when the plugin might resize images differently than WordPress defaults
// ═══════════════════════════════════════════════════════════════════════════
// foreach ( $ids as $id ) {
// $thumbnail_urls = $wpmc->get_thumbnails_urls( $id );
// $wpmc->add_reference_url( $thumbnail_urls, 'PLUGIN_NAME (URL) {SAFE}', $post_id );
// }
// ═══════════════════════════════════════════════════════════════════════════
// Get attachment ID from URL when only URL is available
// ═══════════════════════════════════════════════════════════════════════════
// foreach ( $urls as $url ) {
// $attachment_id = $wpmc->custom_attachment_url_to_postid( $url );
// if ( $attachment_id ) {
// $wpmc->add_reference_id( $attachment_id, 'PLUGIN_NAME (ID)', $post_id );
// }
// }
// ═══════════════════════════════════════════════════════════════════════════
// Debug logging (enable in Media Cleaner settings)
// ═══════════════════════════════════════════════════════════════════════════
// $wpmc->log( "PLUGIN_NAME: Found " . count($ids) . " IDs and " . count($urls) . " URLs in post $post_id" );