python

22.01.2026 13:57
hasanaligultekin (@hasanaligultekin@me.dm)

What Is F1 Score in Machine Learning? A Practical Guide

A simple way to balance precision and recall when accuracy is misleading.

This post explains F1 with a clear confusion-matrix view, when it matters (imbalanced classes), and how to interpret trade-offs—plus a small Python example.
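As a rough illustration of the idea (not the linked article's own example, which isn't shown here): F1 is the harmonic mean of precision and recall, both of which come straight from confusion-matrix counts. The function names and the counts below are made up for illustration.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical imbalanced data: 1000 samples, only 10 of them positive.
# A model that always predicts "negative" misses every positive.
always_negative = dict(tp=0, fp=0, fn=10, tn=990)
print(accuracy(**always_negative))   # 0.99
print(f1_from_counts(tp=0, fp=0, fn=10))  # 0.0
```

Accuracy looks excellent here only because negatives dominate; F1 ignores true negatives, so it exposes that no positive was ever found.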

medium.com/@hasanaligultekin/w

#MachineLearning #DataScience #Python #ModelEvaluation #ai #medium #ML

@ai @theartificialintelligence @programming @towardsdatascience
@pythonclcoding @chartrdaily @medium







22.01.2026 13:35
Edent (@Edent@mastodon.social)

🆕 blog! “Removing "/Subtype /Watermark" images from a PDF using Linux”

Problem: I've received a PDF which has a large "watermark" obscuring every page.

Investigating: Opening the PDF in LibreOffice Draw allowed me to see that the watermark was a separate image floating above the others.

Manual Solution: Hit page down, select image, delete, repeat 500 times. …

👀 Read more: shkspr.mobi/blog/2026/01/remov






22.01.2026 13:34
blog (@blog@shkspr.mobi)

Removing "/Subtype /Watermark" images from a PDF using Linux

shkspr.mobi/blog/2026/01/remov

Problem: I've received a PDF which has a large "watermark" obscuring every page.

Investigating: Opening the PDF in LibreOffice Draw allowed me to see that the watermark was a separate image floating above the others.

Manual Solution: Hit page down, select image, delete, repeat 500 times. BORING!

Further Investigating: Using pdftk, it's possible to decompress a PDF. That makes it easier to look through manually.

pdftk input.pdf output output.pdf uncompress

Hey presto! A PDF you can open in a text editor! Deep joy!

Searching: On a hunch, I searched for "watermark" and found several lines like this:

<<
/Length 548
>>
stream
/Figure <</MCID 0 >>BDC q 0 0 477 733.464 re W n q /GS0 gs 479.2799893 0 0 735.5999836 -1.0800002 -1.0559941 cm /Im0 Do Q EMC 
/Figure <</MCID 1 >>BDC Q q 28.333 300.661 420.334 126.141 re W n q /GS0 gs 420.3339603 0 0 126.1418879 28.3330078 300.6610601 cm /Im1 Do Q EMC
/Figure <</MCID 2 >>BDC Q q 16.106 0 444.787 215.464 re W n q /GS0 gs 444.7874274 0 0 216.5921386 16.1062775 -1.1281493 cm /Im2 Do Q EMC
/Artifact <</Subtype /Watermark /Type /Pagination >>BDC Q q 0.7361145 0 0 0.7361145 113.3616638 240.8575745 cm /GS1 gs /Fm0 Do Q EMC
endstream
endobj

Those are Marked Content Blocks. In theory you can just chop out the line with /Subtype /Watermark, but each stream has a /Length entry, so you'd also need to adjust that to account for what you've changed. Otherwise the layout goes all screwy.
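A minimal sketch of that manual fix-up, assuming a single uncompressed stream object laid out like the sample above (dictionary, `stream`, data, `endstream`). The function names are hypothetical. It drops any line mentioning `/Subtype /Watermark` and rewrites `/Length` to the new byte count, leaving out the end-of-line marker just before `endstream`. It does not touch the file's cross-reference table, whose byte offsets would also shift in a real PDF.

```python
import re


def stream_length(data: bytes) -> int:
    """Byte count of stream data, excluding one trailing end-of-line marker."""
    for eol in (b"\r\n", b"\n", b"\r"):
        if data.endswith(eol):
            return len(data) - len(eol)
    return len(data)


def drop_watermark_lines(obj: bytes) -> bytes:
    """Remove watermark lines from one stream object and fix its /Length."""
    # Split into: dictionary, the "stream" keyword, data, "endstream", remainder.
    head, keyword, rest = obj.partition(b"stream\n")
    data, end, tail = rest.partition(b"endstream")

    # Keep every line that is not a watermark artifact.
    kept = b"".join(
        line
        for line in data.splitlines(keepends=True)
        if b"/Subtype /Watermark" not in line
    )

    # Rewrite the /Length entry in the dictionary to match the new data.
    new_head = re.sub(
        rb"/Length\s+\d+", b"/Length %d" % stream_length(kept), head, count=1
    )
    return new_head + keyword + kept + end + tail
```

Doing this by hand for every page is exactly the tedium that a library-level stream update avoids.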

That led me to PyMuPDF which claimed to solve the problem. But running that code only removed some of the watermarks. It got stuck in an infinite loop on certain pages.

So, now that I had more detailed knowledge, I managed to get an LLM to construct something which mostly seems to work.

Does it work with every PDF? I don't know. Does it contain subtle implementation bugs? Probably. Is there an easier way to do this? Not that I can find.

import re
import pymupdf

# Open the PDF
doc = pymupdf.open("output.pdf")

# Regex of the watermarks
pattern = re.compile(
    rb"/Artifact\s*<<[^>]*?/Subtype\s*/Watermark[^>]*?>>BDC.*?EMC",
    re.DOTALL
)

# Loop through the PDF's pages
for page_num, page in enumerate(doc, start=1):
    print(f"Processing page {page_num}")
    xrefs = page.get_contents()
    for xref in xrefs:
        cont = doc.xref_stream(xref)
        new_cont, n = pattern.subn(b"", cont)
        if n > 0:
            print(f"  Removed {n} watermark block(s)")
            doc.update_stream(xref, new_cont)

doc.save("no-watermarks.pdf")
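To see what that regex does without opening a PDF, here is a small sketch that applies the same pattern to an abbreviated copy of the sample stream from earlier (numbers shortened; the helper names are hypothetical). It also counts the `q` (save graphics state) and `Q` (restore) operators. In this sample the watermark block begins with a `Q` that closes a `q` opened by the previous block, so deleting the whole block leaves one `q` unmatched. Whether any viewer objects to that is not something the post establishes.

```python
import re

# Same pattern as in the script above.
WATERMARK = re.compile(
    rb"/Artifact\s*<<[^>]*?/Subtype\s*/Watermark[^>]*?>>BDC.*?EMC",
    re.DOTALL,
)

# Abbreviated version of the sample content stream (values shortened).
SAMPLE = (
    b"/Figure <</MCID 0 >>BDC q 0 0 477 733.464 re W n q /GS0 gs "
    b"479.28 0 0 735.6 -1.08 -1.056 cm /Im0 Do Q EMC\n"
    b"/Figure <</MCID 1 >>BDC Q q 28.333 300.661 420.334 126.141 re W n q /GS0 gs "
    b"420.334 0 0 126.142 28.333 300.661 cm /Im1 Do Q EMC\n"
    b"/Artifact <</Subtype /Watermark /Type /Pagination >>BDC Q q "
    b"0.736 0 0 0.736 113.362 240.858 cm /GS1 gs /Fm0 Do Q EMC\n"
)


def strip_watermarks(content: bytes) -> tuple[bytes, int]:
    """Return the content without watermark blocks, plus how many were removed."""
    return WATERMARK.subn(b"", content)


def q_balance(content: bytes) -> int:
    """Number of 'q' operators minus number of 'Q' operators (0 = balanced)."""
    tokens = content.split()
    return tokens.count(b"q") - tokens.count(b"Q")


cleaned, removed = strip_watermarks(SAMPLE)
print(removed)                               # 1
print(q_balance(SAMPLE), q_balance(cleaned)) # 0 1
```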

One of the (many) problems with Vibe Coding is that trying to get an LLM to spit out something useful depends massively on how well you know the subject area. I'm proud to say I know vanishingly little about the baroque PDF specification - which meant that most of my attempts to use various "AI" tools consisted of me saying "No, that doesn't work" and the accurs'd machine saying back "Golly-gee! You're right! Let me fix that!" and then breaking something else.

I'm not sure this is the future we wanted, but it looks like the future we've got.

#LLM #pdf #python




22.01.2026 13:26
villares (@villares@pynews.com.br)

Ada meets Waldemar Cordeiro and Giorgio Moscati
(Learn more about them at waldemarcordeiro.com/ & ekac.org/moscati.html)
Find the sketch-a-day archives and tip jar at: abav.lugaralgum.com/sketch-a-d
Code for this sketch at: github.com/villares/sketch-a-d #Processing #Python #py5 #CreativeCoding







22.01.2026 13:05
jobsfordevelopers (@jobsfordevelopers@mastodon.world)

Affirm is hiring Senior Software Engineer, Backend (Servicing International)

🔧 #kotlin #python #react #vue #aws #kubernetes #mysql #seniorengineer
🌎 Remote; Poland
⏰ Internship
🏢 Affirm

Job details jobsfordevelopers.com/jobs/sen
#jobalert #jobsearch #hiring






22.01.2026 12:57
owenrlamont (@owenrlamont@fosstodon.org)

I just released pyglobegl 0.4.0 to PyPI. It now exposes the globe.gl arcs layer API. I also integrated Pandera for better validation when using the GeoPandas helper functions, and made the automated image-comparison tests less flaky. Still lots more globe.gl APIs to implement. #Python







22.01.2026 12:20
reddit_tech_vn_bot (@reddit_tech_vn_bot@mastodon.maobui.com)

New: the free tool TubeFlow AI helps create Shorts automatically. It combines Gemini and the Pexels API, with no sign-up and no ads. It saves time finding footage when using AI. You can try it right now! #Shorts #AI #Python #CôngCụMiễnPhí #Video #Automation

reddit.com/r/SideProject/comme






22.01.2026 11:41
cd_newton (@cd_newton@hachyderm.io)

A vacancy for a Data Officer within Dogs Trust's data science and analytics research team has gone live this morning:
 
- £37,130 per annum
- Fully remote (within UK)
- #SQL + #Python and/or #RStats experience sought, alongside #NLP / #TextMining
 
Deadline for applications: 2026-01-29
 
Further details here:

careers.dogstrust.org.uk/en/po

#GetFediHired






22.01.2026 11:33
futurile (@futurile@mastodon.social)

@mgd @sharlatan

I think you're at risk of confusing two things here @mgd. If your goal is to "get a job" and you're trying to create a portfolio/example application - then you should choose a language that is _common_ in commercial development.

Guix itself is built in which is a great language but it's not popular for commercial devel.

The other way to improve Guix (most of the work) is packaging software. That could be , , etc - that's what @sharlatan means






22.01.2026 11:14
gaby_wald (@gaby_wald@framapiaf.org)

Data Engineer passionate about integrating complex data and developing software solutions for biology and cybersecurity. Expert in Java, Spark, and Big Data. Looking for new, innovative challenges.

#DataEngineer #BigData #Bioinformatics #Java #Python #Spark #CyberSecurity

linkedin.com/posts/gabriel-cha







22.01.2026 11:00
windsheep (@windsheep@infosec.exchange)

On the note of #Excel & #Python, this one here looks interesting:

github.com/Amourspirit/python_

You can call remote APIs and keep the deployment of the #LibreOffice #Calc thin.

And host your own computing environment. It may get institutional acceptance if you document and automate it at scale.






22.01.2026 10:53
gaby_wald (@gaby_wald@framapiaf.org)

Data Scientist specializing in data analysis. Expert in Big Data tools. Passionate about innovation at the intersection of data science; open to cybersecurity.

#DataScience #MachineLearning #CyberSecurity #Python #Innovation

linkedin.com/posts/gabriel-cha






