Existing objects in PDF document

Extract information about page objects

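The example below opens a watermarked PDF and reports the objects on its first page: the number of XObjects, artifacts, and annotations, followed by the text and size of each artifact and annotation.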

extract_objects.py

from groupdocs.watermark import Watermarker
from groupdocs.watermark.options.pdf import PdfLoadOptions

def extract_objects():
    # Open the PDF with PDF-specific load options.
    with Watermarker("./document.pdf", PdfLoadOptions()) as watermarker:
        content = watermarker.get_content()
        page = content.pages[0]  # first page of the document
        # Report how many objects of each kind the page contains.
        print(f"Page 1: xobjects={len(page.xobjects)} "
              f"artifacts={len(page.artifacts)} annotations={len(page.annotations)}")
        # text can be empty or None, so normalize it before printing.
        for artifact in page.artifacts:
            text = (artifact.text or "").strip()
            print(f"  artifact text={text!r} size={round(artifact.width)}x{round(artifact.height)}")
        for annotation in page.annotations:
            text = (annotation.text or "").strip()
            print(f"  annotation text={text!r} size={round(annotation.width)}x{round(annotation.height)}")

if __name__ == "__main__":
    extract_objects()

document.pdf

document.pdf is the sample file used in this example.

extract-objects.txt

Page 1: xobjects=2 artifacts=2 annotations=0
  artifact text='CONFIDENTIAL' size=268x36
  artifact text='' size=135x40


Each object exposes text, image, x, y, width, height, and rotate_angle (artifacts also expose opacity, artifact_type, and artifact_subtype).
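Because XObjects, artifacts, and annotations share these properties, a single helper can summarize any of them. The sketch below is written against the property names listed above and, so that it runs without a PDF, is exercised with a hypothetical StandInObject dataclass. StandInObject and describe are illustrative names, not part of GroupDocs.Watermark, and the assumption is that the library objects expose these properties as listed. opacity is read with getattr because only artifacts are documented as exposing it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandInObject:
    """Hypothetical stand-in exposing the properties listed above.

    This is NOT a GroupDocs class; it only lets the helper run without a PDF.
    """
    text: Optional[str]
    x: float
    y: float
    width: float
    height: float
    rotate_angle: float
    image: Optional[object] = None


def describe(obj) -> str:
    """Summarize one xobject, artifact, or annotation on a single line."""
    text = (obj.text or "").strip()
    parts = [
        f"text={text!r}",
        f"pos=({round(obj.x)}, {round(obj.y)})",
        f"size={round(obj.width)}x{round(obj.height)}",
        f"rotate={obj.rotate_angle}",
        f"has_image={obj.image is not None}",
    ]
    # Only artifacts are documented as exposing opacity, so read it defensively.
    opacity = getattr(obj, "opacity", None)
    if opacity is not None:
        parts.append(f"opacity={opacity}")
    return " ".join(parts)


if __name__ == "__main__":
    sample = StandInObject(text=" CONFIDENTIAL ", x=100.4, y=200.6,
                           width=268.2, height=35.9, rotate_angle=45)
    print(describe(sample))
```

Inside the with block of extract_objects.py, you could call describe(obj) for every object in page.xobjects, page.artifacts, and page.annotations instead of formatting each collection separately.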

Remove and modify objects

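Each collection (xobjects, artifacts, and annotations) supports remove_at(index) and remove(object). When removing by index, iterate in reverse: removing an item shifts the indices of the items that follow it, so a forward loop would skip elements. The example below deletes every artifact whose text contains "watermark" (case-insensitive) and saves the result to a new file: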

remove_and_modify_objects.py

from groupdocs.watermark import Watermarker
from groupdocs.watermark.options.pdf import PdfLoadOptions

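def remove_and_modify_objects():
    with Watermarker("./document.pdf", PdfLoadOptions()) as watermarker:
        content = watermarker.get_content()
        for page in content.pages:
            # Walk backwards so removing index i never shifts an item not yet visited.
            for i in range(len(page.artifacts) - 1, -1, -1):
                text = page.artifacts[i].text
                # Remove artifacts whose text mentions "watermark" (case-insensitive).
                if text and "watermark" in text.lower():
                    page.artifacts.remove_at(i)
        # Write the modified document to a new file.
        watermarker.save("./output.pdf")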

if __name__ == "__main__":
    remove_and_modify_objects()

document.pdf

document.pdf is the sample file used in this example.

output.pdf

Binary file (PDF, 394 KB)

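The reason for iterating in reverse can be seen with a plain Python list standing in for page.artifacts and del items[i] standing in for remove_at(i). This assumes remove_at shifts later items down the same way deleting from a Python list does; remove_forward, remove_reverse, and is_watermark are illustrative names only.

```python
def is_watermark(text: str) -> bool:
    """Same test as the example above: text mentions "watermark"."""
    return "watermark" in text.lower()


def remove_forward(items, predicate):
    """Naive forward pass, shown only to illustrate the pitfall."""
    i = 0
    while i < len(items):
        if predicate(items[i]):
            del items[i]  # the next item slides into index i ...
        i += 1            # ... but the loop moves past it without checking it
    return items


def remove_reverse(items, predicate):
    """Reverse pass, the same pattern as the remove_at loop above."""
    for i in range(len(items) - 1, -1, -1):
        if predicate(items[i]):
            del items[i]  # only already-visited items shift, so none are skipped
    return items


if __name__ == "__main__":
    print(remove_forward(["Watermark A", "Watermark B", "Body"], is_watermark))
    print(remove_reverse(["Watermark A", "Watermark B", "Body"], is_watermark))
```

With two adjacent matches, the forward pass removes "Watermark A" and then skips "Watermark B", leaving ['Watermark B', 'Body']; the reverse pass leaves only ['Body'].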

To modify an object instead of removing it, assign a new string to obj.text to replace its text, or assign image bytes to obj.image_data to replace its image. To add a watermark to an image on a page, locate the image with page.find_images() and call image.add(watermark) on it.
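A minimal sketch of the text-replacement case, assuming page.artifacts can be iterated (as in extract_objects.py) and that assigning to artifact.text updates the object as described above. replace_artifact_text and StandInArtifact are hypothetical names, used only so the sketch runs without a PDF.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StandInArtifact:
    """Hypothetical stand-in for a PDF artifact; not a GroupDocs class."""
    text: Optional[str]


def replace_artifact_text(artifacts: Iterable, old: str, new: str) -> int:
    """Assign `new` to obj.text wherever the text contains `old` (case-insensitive).

    Returns the number of objects changed.
    """
    changed = 0
    for artifact in artifacts:
        if artifact.text and old.lower() in artifact.text.lower():
            artifact.text = new  # replace the object's text in place
            changed += 1
    return changed


if __name__ == "__main__":
    items = [StandInArtifact("CONFIDENTIAL"), StandInArtifact(None)]
    print(replace_artifact_text(items, "confidential", "PUBLIC"), items)
```

In a real script you would call it for each page inside the Watermarker block, for example replace_artifact_text(page.artifacts, "confidential", "PUBLIC"), and then call watermarker.save("./output.pdf") as in remove_and_modify_objects.py. Unlike removal, replacing text does not change the collection's size, so a forward loop is safe here.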
