What Is F1 Score in Machine Learning? A Practical Guide
A simple way to balance precision and recall when accuracy is misleading.
This post explains F1 with a clear confusion-matrix view, when it matters (imbalanced classes), and how to interpret trade-offs—plus a small Python example.
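As a rough illustration of why accuracy can mislead on imbalanced classes, here is a minimal sketch that computes F1 straight from confusion-matrix counts. The function name, the two models, and all the counts are made up for this example; they are not taken from the linked guide.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced dataset: 1,000 samples, only 10 of them positive.
TOTAL = 1000

# Model A (hypothetical) predicts "negative" for every sample.
tp_a, fp_a, fn_a, tn_a = 0, 0, 10, 990
accuracy_a = (tp_a + tn_a) / TOTAL
f1_a = f1_from_counts(tp_a, fp_a, fn_a)

# Model B (hypothetical) finds 8 of the 10 positives, with 12 false alarms.
tp_b, fp_b, fn_b, tn_b = 8, 12, 2, 978
accuracy_b = (tp_b + tn_b) / TOTAL
f1_b = f1_from_counts(tp_b, fp_b, fn_b)

print(f"Model A: accuracy={accuracy_a:.3f}  F1={f1_a:.3f}")
print(f"Model B: accuracy={accuracy_b:.3f}  F1={f1_b:.3f}")
```

Model A has the higher accuracy but an F1 of zero, because it never finds a positive. Model B gives up a little accuracy and gets a much more useful F1.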
#MachineLearning #DataScience #Python #ModelEvaluation #ai #medium #ML
@ai @theartificialintelligence @programming @towardsdatascience
@pythonclcoding @chartrdaily @medium
Removing "/Subtype /Watermark" images from a PDF using Linux
https://shkspr.mobi/blog/2026/01/removing-subtype-watermark-images-from-a-pdf-using-linux/
Problem: I've received a PDF which has a large "watermark" obscuring every page.
Investigating: Opening the PDF in LibreOffice Draw allowed me to see that the watermark was a separate image floating above the others.
Manual Solution: Hit page down, select image, delete, repeat 500 times. BORING!
Further Investigating: Using pdftk, it's possible to decompress a PDF. That makes it easier to look through manually.
pdftk input.pdf output output.pdf uncompress
Hey presto! A PDF you can open in a text editor! Deep joy!
Searching: On a hunch, I searched for "watermark" and found several lines like this:
<<
/Length 548
>>
stream
/Figure <</MCID 0 >>BDC q 0 0 477 733.464 re W n q /GS0 gs 479.2799893 0 0 735.5999836 -1.0800002 -1.0559941 cm /Im0 Do Q EMC
/Figure <</MCID 1 >>BDC Q q 28.333 300.661 420.334 126.141 re W n q /GS0 gs 420.3339603 0 0 126.1418879 28.3330078 300.6610601 cm /Im1 Do Q EMC
/Figure <</MCID 2 >>BDC Q q 16.106 0 444.787 215.464 re W n q /GS0 gs 444.7874274 0 0 216.5921386 16.1062775 -1.1281493 cm /Im2 Do Q EMC
/Artifact <</Subtype /Watermark /Type /Pagination >>BDC Q q 0.7361145 0 0 0.7361145 113.3616638 240.8575745 cm /GS1 gs /Fm0 Do Q EMC
endstream
endobj
Those are Marked Content Blocks. In theory you can just chop out the line with /Subtype /Watermark, but each stream has a /Length value, so you'd also need to adjust that to account for what you've changed. Otherwise the layout goes all screwy.
That led me to PyMuPDF which claimed to solve the problem. But running that code only removed some of the watermarks. It got stuck in an infinite loop on certain pages.
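To make the "chop it out and fix the length" idea concrete, here is a minimal sketch that works on raw bytes from an uncompressed PDF. It is not what the post ends up using. The names `WATERMARK`, `STREAM_OBJ` and `strip_and_fix_length` are invented for this example, and it assumes a simple stream dictionary like the one shown above. It also ignores the PDF's cross-reference table, whose byte offsets would shift too, so treat it as an illustration rather than a working tool.

```python
import re

# A marked-content block tagged /Subtype /Watermark, plus any trailing newline.
WATERMARK = re.compile(
    rb"/Artifact\s*<<[^>]*?/Subtype\s*/Watermark[^>]*?>>BDC.*?EMC\r?\n?",
    re.DOTALL,
)

# Assumption: a flat dictionary holding /Length, directly followed by the stream.
STREAM_OBJ = re.compile(
    rb"/Length\s+(?P<len>\d+)(?P<mid>[^>]*>>\s*stream\r?\n)"
    rb"(?P<data>.*?)(?P<tail>\r?\nendstream)",
    re.DOTALL,
)


def strip_and_fix_length(raw: bytes) -> bytes:
    """Remove watermark blocks from each stream and rewrite its /Length."""

    def fix(match: "re.Match[bytes]") -> bytes:
        data = WATERMARK.sub(b"", match.group("data"))
        new_length = str(len(data)).encode("ascii")
        return b"/Length " + new_length + match.group("mid") + data + match.group("tail")

    return STREAM_OBJ.sub(fix, raw)


# Tiny made-up object: one image draw plus one watermark block.
stream = b"q /Im0 Do Q\n/Artifact <</Subtype /Watermark >>BDC q /Fm0 Do Q EMC"
raw = b"<<\n/Length %d\n>>\nstream\n%s\nendstream\nendobj\n" % (len(stream), stream)

fixed = strip_and_fix_length(raw)
print(fixed.decode("latin-1"))
```

In the printed result the watermark block is gone and /Length matches the bytes that remain in the stream.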
So, now that I had more detailed knowledge, I managed to get an LLM to construct something which mostly seems to work.
Does it work with every PDF? I don't know. Does it contain subtle implementation bugs? Probably. Is there an easier way to do this? Not that I can find.
import re
import pymupdf

# Open the PDF
doc = pymupdf.open("output.pdf")

# Regex of the watermarks
pattern = re.compile(
    rb"/Artifact\s*<<[^>]*?/Subtype\s*/Watermark[^>]*?>>BDC.*?EMC",
    re.DOTALL
)

# Loop through the PDF's pages
for page_num, page in enumerate(doc, start=1):
    print(f"Processing page {page_num}")
    xrefs = page.get_contents()
    for xref in xrefs:
        cont = doc.xref_stream(xref)
        new_cont, n = pattern.subn(b"", cont)
        if n > 0:
            print(f"  Removed {n} watermark block(s)")
            doc.update_stream(xref, new_cont)

doc.save("no-watermarks.pdf")
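To see what that regex actually matches, here is a small sketch that runs the same pattern over the four content lines from the stream dump earlier in the post, with no PDF library involved. The `count_ops` helper is invented for this example. It counts the `q` (save graphics state) and `Q` (restore) operators before and after the removal.

```python
import re

# Same pattern as in the script above.
pattern = re.compile(
    rb"/Artifact\s*<<[^>]*?/Subtype\s*/Watermark[^>]*?>>BDC.*?EMC",
    re.DOTALL,
)

# The four marked-content lines copied from the uncompressed stream in the post.
sample = b"\n".join([
    b"/Figure <</MCID 0 >>BDC q 0 0 477 733.464 re W n q /GS0 gs 479.2799893 0 0 735.5999836 -1.0800002 -1.0559941 cm /Im0 Do Q EMC",
    b"/Figure <</MCID 1 >>BDC Q q 28.333 300.661 420.334 126.141 re W n q /GS0 gs 420.3339603 0 0 126.1418879 28.3330078 300.6610601 cm /Im1 Do Q EMC",
    b"/Figure <</MCID 2 >>BDC Q q 16.106 0 444.787 215.464 re W n q /GS0 gs 444.7874274 0 0 216.5921386 16.1062775 -1.1281493 cm /Im2 Do Q EMC",
    b"/Artifact <</Subtype /Watermark /Type /Pagination >>BDC Q q 0.7361145 0 0 0.7361145 113.3616638 240.8575745 cm /GS1 gs /Fm0 Do Q EMC",
])


def count_ops(stream: bytes) -> tuple:
    """Count standalone q (save) and Q (restore) operators in a content stream."""
    tokens = stream.split()
    return tokens.count(b"q"), tokens.count(b"Q")


cleaned, removed = pattern.subn(b"", sample)

print("blocks removed:", removed)
print("figures kept:", cleaned.count(b"/Figure"))
print("q/Q before:", count_ops(sample))
print("q/Q after: ", count_ops(cleaned))
```

On this sample the pattern removes only the /Artifact block and leaves the three /Figure blocks alone. The removed block starts with a `Q` that closes a `q` opened by the previous block, so the counts go from 7 and 7 to 6 and 5. That may or may not matter to a given viewer, but it is a plausible candidate for one of the subtle bugs.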
One of the (many) problems with Vibe Coding is that trying to get an LLM to spit out something useful depends massively on how well you know the subject area. I'm proud to say I know vanishingly little about the baroque PDF specification - which meant that most of my attempts to use various "AI" tools consisted of me saying "No, that doesn't work" and the accurs'd machine saying back "Golly-gee! You're right! Let me fix that!" and then breaking something else.
I'm not sure this is the future we wanted, but it looks like the future we've got.
#LLM #pdf #python

Ada meets Waldemar Cordeiro and Giorgio Moscati
(Learn more about them at https://www.waldemarcordeiro.com/ & https://ekac.org/moscati.html)
Find the sketch-a-day archives and tip jar at: https://abav.lugaralgum.com/sketch-a-day
Code for this sketch at: https://github.com/villares/sketch-a-day/tree/main/2026/sketch_2026_01_21 #Processing #Python #py5 #CreativeCoding

Affirm is hiring Senior Software Engineer, Backend (Servicing International)
🔧 #kotlin #python #react #vue #aws #kubernetes #mysql #seniorengineer
🌎 Remote; Poland
⏰ Internship
🏢 Affirm
Job details https://jobsfordevelopers.com/jobs/senior-software-engineer-backend-servicing-international-at-affirm-com-dec-17-2025-32e383?utm_source=mastodon.world&utm_medium=social&utm_campaign=posting
#jobalert #jobsearch #hiring
I just released pyglobegl 0.4.0 to PyPI. It now exposes the globe.gl arcs layer API. I also integrated Pandera for better validation when using the GeoPandas helper functions, and made the automated image-comparison tests less flaky. Still lots more globe.gl APIs to implement. #Python

New: TubeFlow AI, a free tool that helps create Shorts automatically. It combines Gemini and the Pexels API, with no sign-up and no ads. It saves time hunting for footage when working with AI. You can try it right away! #Shorts #AI #Python #CôngCụMiễnPhí #Video #Automation
A vacancy for a Data Officer within Dogs Trust's data science and analytics research team has gone live this morning:
- £37,130 per annum
- Fully remote (within UK)
- #SQL + #Python and/or #RStats experience sought, alongside #NLP / #TextMining
Deadline for applications: 2026-01-29
Further details here:
https://careers.dogstrust.org.uk/en/postings/ef80c5d6-b7b8-43f4-86f0-72e1ca634ab3
I think you're at risk of confusing two things here @mgd. If your goal is to "get a job" and you're trying to create a portfolio/example application - then you should choose a language that is _common_ in commercial development.
Guix itself is built in #guile #scheme which is a great language but it's not popular for commercial devel.
The other way to improve Guix (most of the work) is packaging software. That could be #Python, #clojure, etc - that's what @sharlatan means
Data Engineer passionate about integrating complex data and developing software solutions for biology and cybersecurity. Expert in Java, Spark and Big Data. Looking for new, innovative challenges.
#DataEngineer #BigData #Bioinformatics #Java #Python #Spark #CyberSecurity

On the note of #Excel & #Python, this one here looks interesting:
https://github.com/Amourspirit/python_libre_pythonista_ext
You can call remote APIs and keep the deployment of the #LibreOffice #Calc thin.
And host your own computing environment. It may get institutional acceptance if you document and automate it at scale.
Data Scientist specializing in data analysis. Expert in Big Data tools. Passionate about innovation at the intersection of data science, and open to cybersecurity.
#DataScience #MachineLearning #CyberSecurity #Python #Innovation
