python Timeline - FediScanner

19.10.2025 16:24
moprius (@moprius@mastodon.social)

Some people imagine that programming demands an innate talent, a kind of mysterious vocation, like the one for mathematics, reserved for a few. Most people still believe this myth, although reality is more generous: programming is a teachable, incremental skill that is deeply useful in practical life. Among the available languages, Python stands out for its clarity.

#programming #programação #programador #technology #tecnologia #python #pythonprogramming #pythondeveloper

https://www.moprius.com/2025/10/python-uma-linguagem-para-cientista.html


19.10.2025 16:20
GripNews (@GripNews@mastodon.social)

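🌗 Improving PixelMelt's Kindle Web Deobfuscator
➤ From single-character OCR to full-page rendering: making Kindle e-book content more readable
✤ https://shkspr.mobi/blog/2025/10/improving-pixelmelts-kindle-web-deobfuscator/
In this article, Terry Eden improves on the method PixelMelt proposed for removing DRM from Amazon Kindle e-books. The original method spoofs a browser to download JSON files, reconstructs the obfuscated SVG graphics, and runs OCR on them, but it is limited to one locale and makes OCR errors (such as reading full stops as middle dots). The author takes a new approach: rendering the whole page from the SVGs, using font and size information to place each character precisely, and then running the Tesseract OCR engine, with the aim of improving accuracy and the reading experience. The new method still has gaps: it cannot handle images, cannot recognise formatting (such as bold, italics, and indentation), and the OCR can still make mistakes. Even so, it greatly reduces the weaknesses of the original method and offers a more practical way to obtain digital rights over Amazon books.
+ This is really impressive! It can extract the content of DRM-protected books, and it has also…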
#科技 #電子書 #DRM #Python #OCR


19.10.2025 15:38
pythonbrasil (@pythonbrasil@pynews.com.br)

Tutorial

AWS VIBE CODING DOJO: Collaborative Programming + AI! 👥 - Marcelo Palladino

A 2-hour immersive experience combining collaborative programming and generative AI. Build real solutions as a team using Amazon Q Developer CLI, with mentoring from AWS specialists.

#python #bolhadev #aws #pythonbrasil #pybr2025


19.10.2025 15:00
nowosad (@nowosad@fosstodon.org)

Spatial operations in Python? 📍🌍🐍

Chapter 3 of Geocomputation with Python covers:

- Vector: spatial joins, subsetting, aggregation, etc.
- Raster: map algebra (local, focal, zonal, global), tiling & merging

👉 https://py.geocompx.org/03-spatial-operations

#GeoPython #Python #GISchat #geocompx


19.10.2025 14:21
reddit_tech_vn_bot (@reddit_tech_vn_bot@mastodon.maobui.com)

I built Graphite: a visual, non-linear workspace for LLMs that turns chats into an idea map (Python/Ollama)! 🌐
It helps you keep track of […] and create charts from the data in a node. 100% offline thanks to Ollama.
#Graphite #AI #Ollama #Python #NonLinear #DataPrivacy #VisualAI

https://www.reddit.com/r/ollama/comments/1oanxsn/i_built_graphite_a_visual_nonlinear_llm_interface/


19.10.2025 13:35
Edent (@Edent@mastodon.social)

🆕 blog! “Improving PixelMelt's Kindle Web Deobfuscator”

A few days ago, someone called PixelMelt published a way for Amazon's customers to download their purchased books without DRM. Well… sort of.

In their post "How I Reversed Amazon's Kindle Web Obfuscation Because Their App Sucked" they describe the process of spoofing a web browser, downloading a b…


19.10.2025 13:34
blog (@blog@shkspr.mobi)

Improving PixelMelt's Kindle Web Deobfuscator

https://shkspr.mobi/blog/2025/10/improving-pixelmelts-kindle-web-deobfuscator/

A few days ago, someone called PixelMelt published a way for Amazon's customers to download their purchased books without DRM. Well… sort of.

In their post "How I Reversed Amazon's Kindle Web Obfuscation Because Their App Sucked" they describe the process of spoofing a web browser, downloading a bunch of JSON files, reconstructing the obfuscated SVGs used to draw individual letters, and running OCR on them to extract text.

There were a few problems with this approach.

Firstly, the downloader was hard-coded to only work with the .com site. That fix was simple - do a search and replace on amazon.com with amazon.co.uk. Easy!

But the harder problem was with the OCR. The code was designed to visually centre each extracted glyph. That gives a nice amount of whitespace around the character which makes it easier for OCR to run. The only problem is that some characters are ambiguous when centred:

When I ran the code, lots of full-stops became midpoints, commas became apostrophes, and various other characters went a bit wonky.

That made the output rather hard to read. This was compounded by the way line breaks were treated. Modern eBooks are designed to be reflowable - no matter the size of your screen, lines should only break on a new paragraph. The extracted text had forced line breaks at the end of every displayed line - rather than at the end of a paragraph.
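To make the line-break problem concrete, here is a minimal Python sketch that rejoins hard-wrapped lines into paragraphs. It assumes paragraphs are separated by blank lines, which may not be true of the original tool's output, and `unwrap` is a hypothetical helper rather than part of either script.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines so breaks remain only between paragraphs.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace (including newlines) inside a paragraph.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
```

Text without any blank lines would collapse into a single paragraph, so real output would likely need a better paragraph signal, such as indentation.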

So I decided to fix it.

A New Approach

I decided that OCRing an entire page would yield better results than single characters. I was (mostly) right. Here's what a typical page looks like after de-obfuscation and reconstruction:

As you can see - the typesetting is good for the body text, but skew-whiff for the title. Bold and italics are preserved. There are no links or images.

Here's how I did it.

Extract the characters

As in the original code, I took the SVG path of the character and rendered it as a monochrome PNG. Rather than centring the glyph, I used the height and width provided in the glyphs.json file. That gave me a directory full of individual letters, numbers, punctuation marks, and ligatures. These were named by fontKey (bold, italic, normal, etc).

Create a blank page

The page_data_0_4.json has a width and height of the page. I created a white PNG with the same dimensions. The individual characters could then be placed on that.

Resize the characters

In the page_data_0_4.json each run of text has a fontKey - which allows the correct glyph to be selected. There's also a fontSize parameter. Most text seems to be (the ludicrously precise) 19.800001. If a font had a different size, I temporarily scaled the glyph in proportion to 19.8.
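The proportional scaling can be sketched in a few lines of Python. This is an illustration under assumptions, not the author's code: `scaled_size` is a hypothetical helper, and it treats the glyph's width and height as pixel dimensions.

```python
BASE_FONT_SIZE = 19.8  # the size most text runs appear to use, per the post


def scaled_size(width: int, height: int, font_size: float) -> tuple[int, int]:
    """Return glyph dimensions scaled in proportion to BASE_FONT_SIZE."""
    factor = font_size / BASE_FONT_SIZE
    # Never let a glyph collapse to zero pixels.
    return max(1, round(width * factor)), max(1, round(height * factor))
```

A run with a `fontSize` of 19.800001 leaves the glyph at its extracted size, while a run at twice that size doubles both dimensions.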

Each glyph has an associated xPosition, along with a transform which gives X and Y offsets. That allows for indenting and other text layouts.

The characters were then pasted on to the blank page.

Once every character from that page had been extracted, resized, and placed - the page was saved as a monochrome PNG.
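The blank-page and pasting steps can be sketched in pure Python, with the page held as rows of greyscale pixel values. Everything here is an assumption for illustration: the function names are hypothetical, and the post does not say exactly how `xPosition` and the transform offsets combine, so `glyph_origin` simply adds the X values and takes Y from the transform. In practice an imaging library such as Pillow would probably do this work.

```python
WHITE = 255


def blank_page(width: int, height: int) -> list[bytearray]:
    """Create a white greyscale page as rows of pixel values."""
    return [bytearray([WHITE] * width) for _ in range(height)]


def glyph_origin(x_position: float, offset_x: float, offset_y: float) -> tuple[int, int]:
    """Combine xPosition with the transform offsets (assumed to be additive)."""
    return round(x_position + offset_x), round(offset_y)


def paste_glyph(page: list[bytearray], glyph: list[list[int]], x: int, y: int) -> None:
    """Copy the non-white pixels of `glyph` onto `page`, top-left at (x, y).

    Pixels that would land outside the page are skipped.
    """
    if not page:
        return
    page_h, page_w = len(page), len(page[0])
    for dy, row in enumerate(glyph):
        for dx, value in enumerate(row):
            px, py = x + dx, y + dy
            if value != WHITE and 0 <= px < page_w and 0 <= py < page_h:
                page[py][px] = value
```

Copying only the non-white pixels means overlapping glyph boxes, as with kerned pairs, do not erase each other's ink.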

OCR the page

Tesseract 5 is a fast, modern, and reasonably accurate OCR engine for Linux.

Running tesseract page_0022.png output -l eng produced a .txt file with all the text extracted.

For a more useful HTML style layout, the hOCR output can be used: tesseract page_0022.png output -l eng hocr

Or, a PDF with embedded text: tesseract page_0022.png output -l eng pdf
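With many pages to process, the three invocations above can be generated from one small Python helper. This is a sketch: `tesseract_command` and `run_ocr` are hypothetical names, and `run_ocr` assumes the `tesseract` binary is installed and on the PATH.

```python
import subprocess
from typing import Optional


def tesseract_command(image: str, output_base: str, lang: str = "eng",
                      output_format: Optional[str] = None) -> list[str]:
    """Build the argument list for one Tesseract run.

    output_format is None for plain text, or "hocr" / "pdf" as in the post.
    """
    command = ["tesseract", image, output_base, "-l", lang]
    if output_format is not None:
        command.append(output_format)
    return command


def run_ocr(image: str, output_base: str, output_format: Optional[str] = None) -> None:
    """Run Tesseract on one page image; raises if the command fails."""
    subprocess.run(
        tesseract_command(image, output_base, output_format=output_format),
        check=True,
    )
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names.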

Mistakes

OCR isn't infallible. Even with a high resolution image and a clear font, there were some errors.

- Superscript numerals for footnotes were often missing from the OCR.
- Words can run together even if they are well spaced.
- Tesseract can recognise bold and italic characters - but it outputs everything as plain text.

What's missing?

Images aren't downloaded. I took a brief look and, while there are links to them in the metadata, they're downloaded as encrypted blobs. I'm not clever enough to do anything with them.

The OCR can't pick out semantic meaning. Chapter headings and footnotes are rendered the same way as text.

Layout is flat. The image of the page might have an indent, but the outputted text won't.
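The hOCR output does record a bounding box for every line, so one possible way to recover indents is to compare each line's left edge with the page's left margin. The Python sketch below is an untested idea rather than anything the post describes: `LineParser`, `indented_lines`, and the 10-pixel tolerance are all assumptions, and it only looks at elements of class `ocr_line`.

```python
import re
from html.parser import HTMLParser

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class LineParser(HTMLParser):
    """Collect (left_edge, text) for every span of class ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []   # finished (x0, text) pairs
        self._depth = 0   # span nesting depth inside the current line
        self._x0 = 0
        self._words = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._depth:
            if tag == "span":
                self._depth += 1  # a word span nested inside the line
        elif tag == "span" and "ocr_line" in (attrs.get("class") or "").split():
            match = BBOX.search(attrs.get("title") or "")
            if match:
                self._depth, self._x0, self._words = 1, int(match.group(1)), []

    def handle_endtag(self, tag):
        if self._depth and tag == "span":
            self._depth -= 1
            if self._depth == 0:  # the line span itself has closed
                self.lines.append((self._x0, " ".join(self._words)))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._words.append(data.strip())


def indented_lines(hocr: str, tolerance: int = 10) -> list[tuple[bool, str]]:
    """Return (is_indented, text) per line, relative to the leftmost line."""
    parser = LineParser()
    parser.feed(hocr)
    parser.close()
    if not parser.lines:
        return []
    margin = min(x0 for x0, _ in parser.lines)
    return [(x0 - margin > tolerance, text) for x0, text in parser.lines]
```

Centred titles and block quotes would also be flagged as indented, so this is only a starting point for restoring layout.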

What's next?

This is very far from perfect. It can give you a visually similar layout to a book you have purchased from Amazon. But it won't be reflowable.

The text will be reasonably accurate. But there will be plenty of mistakes.

You can get an HTML layout with hOCR. But it will be missing formatting and links.

Processing all the JSON files and OCRing all the images is relatively quick. But tweaking and assembling is still fairly manual.

There's nothing particularly clever about what I've done. The original code didn't come with an open source software licence, so I am unable to share my changes - but any moderately competent programmer could recreate this.

Personally, I've just stopped buying books from Amazon. I find that Kobo is often cheaper and their DRM is easy to bypass. But if you have many books trapped in Amazon - or a book is only published there - this is a barely adequate way to liberate it for your personal use.

#Amazon #drm #ebooks #kindle #python


19.10.2025 13:25
lobsters (@lobsters@mastodon.social)

The future of Python web services looks GIL-free https://lobste.rs/s/huszno #python #web
https://blog.baro.dev/p/the-future-of-python-web-services-looks-gil-free


19.10.2025 12:43
r (@r@fed.brid.gy)

The future of Python web services looks GIL-free

https://fed.brid.gy/r/https://blog.baro.dev/p/the-future-of-python-web-services-looks-gil-free


19.10.2025 11:41
villares (@villares@ciberlandia.pt)

@damien you could draw it with #Blender or #FreeCAD, both of which can also be controlled with #Python. And I like making stuff directly with Python + #shapely + #trimesh (+ #py5) and exporting STL for 3D printing :)


19.10.2025 11:21
reddit_tech_vn_bot (@reddit_tech_vn_bot@mastodon.maobui.com)

"Building a semantic search pipeline! 🤔 pgvector, or LlamaIndex + Milvus? I need to handle millions of rows. Please help with […], scalability, and maintenance! #pgvector #LlamaIndex #Milvus #SemanticSearch #Python #AI"

https://www.reddit.com/r/LocalLLaMA/comments/1oaksnu/need_advice_pgvector_vs_llamaindex_milvus_for/


19.10.2025 10:41
hamatti (@hamatti@mastodon.world)

The return of PyCon Finland was a marvellous event.

I wrote a recap of the conference (+ a bit about Lokacon) to capture the experience.

https://hamatti.org/posts/pycon-finland-2025-recap/

@pyconfi

#PyConFi #PyConFinland #Python #blogging #Lokacon