Searching watermarks
Leave feedback
On this page
Use Watermarker.search()
to scan a document for objects that can be treated as watermarks. This includes watermarks added by third‑party tools.
import groupdocs.watermark as gw
with gw.Watermarker("document.pdf") as watermarker:
possible = watermarker.search()
for wm in possible:
if wm.image_data is not None:
print(len(wm.image_data))
print("Text", wm.text)
print("X", wm.x)
print("Y", wm.y)
print("RotateAngle", wm.rotate_angle)
print("Width", wm.width)
print("Height", wm.height)
print("PageNumber", wm.page_number)
print("")
Large documents may contain many candidates. Without parameters, search()
returns a subset such as backgrounds or floating objects. Use dedicated criteria to narrow results.
Find watermarks by exact text.
import groupdocs.watermark as gw
import groupdocs.watermark.search.searchcriteria as gws_sc
with gw.Watermarker("document.pdf") as watermarker:
text_criteria = gws_sc.TextSearchCriteria("© 2017")
possible = watermarker.search(text_criteria)
print("Found", possible.count, "possible watermark(s)")
Use regex with TextSearchCriteria
.
import re
import groupdocs.watermark as gw
import groupdocs.watermark.search.searchcriteria as gws_sc
with gw.Watermarker("document.pdf") as watermarker:
regex = re.compile(r"^© \d{4}$")
text_criteria = gws_sc.TextSearchCriteria(regex)
possible = watermarker.search(text_criteria)
print("Found", possible.count, "possible watermark(s)")
NoteWhenTextSearchCriteria
is provided, the API also scans the main document text along with shapes, XObjects, annotations, and other objects.
Find image watermarks that are visually similar to a sample image using perceptual hashing (DCT hash). Control sensitivity with max_difference
(0–1).
import groupdocs.watermark as gw
import groupdocs.watermark.search.searchcriteria as gws_sc
with gw.Watermarker("document.pdf") as watermarker:
image_criteria = gws_sc.ImageDctHashSearchCriteria("watermark.jpg")
image_criteria.max_difference = 0.9
possible = watermarker.search(image_criteria)
print("Found", possible.count, "possible watermark(s)")
Other image criteria:
ImageColorHistogramSearchCriteria
: robust to rotation, scaling, translation.ImageThumbnailSearchCriteria
: robust to rotation, scaling, minor color changes.
Combine multiple criteria with logical And/Or/Not.
import groupdocs.watermark as gw
import groupdocs.watermark.search.searchcriteria as gws_sc
with gw.Watermarker("document.pdf") as watermarker:
image_criteria = gws_sc.ImageDctHashSearchCriteria("logo.png")
image_criteria.max_difference = 0.9
text_criteria = gws_sc.TextSearchCriteria("Company Name")
angle_criteria = gws_sc.RotateAngleSearchCriteria(30, 60)
combined = image_criteria.or_(text_criteria).and_(angle_criteria)
possible = watermarker.search(combined)
print("Found", possible.count, "possible watermark(s)")
Find watermarks by text formatting such as font, size, and color ranges.
import groupdocs.watermark as gw
import groupdocs.watermark.search.searchcriteria as gws_sc
with gw.Watermarker("document.pdf") as watermarker:
criteria = gws_sc.TextFormattingSearchCriteria()
criteria.foreground_color_range = gws_sc.ColorRange()
criteria.foreground_color_range.min_hue = -5
criteria.foreground_color_range.max_hue = 10
criteria.foreground_color_range.min_brightness = 0.01
criteria.foreground_color_range.max_brightness = 0.99
criteria.background_color_range = gws_sc.ColorRange()
criteria.background_color_range.is_empty = True
criteria.font_name = "Arial"
criteria.min_font_size = 19
criteria.max_font_size = 42
criteria.font_bold = True
possible = watermarker.search(criteria)
print("Found", possible.count, "possible watermark(s)")
Limit search to specific object types to improve performance, either globally via WatermarkerSettings.searchable_objects
or per instance via Watermarker.searchable_objects
.
import os
import groupdocs.watermark as gw
import groupdocs.watermark.search as gws
settings = gw.WatermarkerSettings()
settings.searchable_objects = gws.SearchableObjects(
word_processing_searchable_objects=gws.WordProcessingSearchableObjects.HYPERLINKS | gws.WordProcessingSearchableObjects.TEXT,
spreadsheet_searchable_objects=gws.SpreadsheetSearchableObjects.HEADERS_FOOTERS,
presentation_searchable_objects=gws.PresentationSearchableObjects.SLIDES_BACKGROUNDS | gws.PresentationSearchableObjects.SHAPES,
diagram_searchable_objects=gws.DiagramSearchableObjects.NONE,
pdf_searchable_objects=gws.PdfSearchableObjects.ALL,
)
files = ["document.docx", "spreadsheet.xlsx", "presentation.pptx", "diagram.vsdx", "document.pdf"]
for file in files:
with gw.Watermarker(file, settings) as watermarker:
possible = watermarker.search()
print(f"In {os.path.basename(file)} found {possible.count} possible watermark(s).")
Restrict search to hyperlinks for a specific Watermarker
instance.
import groupdocs.watermark as gw
import groupdocs.watermark.search as gws
with gw.Watermarker("document.pdf") as watermarker:
watermarker.searchable_objects.pdf_searchable_objects = gws.PdfSearchableObjects.HYPERLINKS
possible = watermarker.search()
print("Found", possible.count, "hyperlink watermark(s)")
Enable tolerant matching when text contains unreadable characters between letters.
import groupdocs.watermark as gw
import groupdocs.watermark.search.searchcriteria as gws_sc
with gw.Watermarker("document.pdf") as watermarker:
criterion = gws_sc.TextSearchCriteria("Company name")
criterion.skip_unreadable_characters = True
result = watermarker.search(criterion)
print("Found", result.count, "possible watermark(s)")
Was this page helpful?
Any additional feedback you'd like to share with us?
Please tell us how we can improve this page.
Thank you for your feedback!
We value your opinion. Your feedback will help us improve our documentation.