• 0 Posts
  • 191 Comments
Joined 2 years ago
Cake day: July 9th, 2023


  • I fondly remember regularly logging into simtel20.wsmr.army.mil back in the day (WSMR = White Sands Missile Range). No issue, you just used “anonymous” as the username and your email address as the password. And even the email address was just a convention…


  • In nearly forty years on the internet (yes, I was around before the web), I had not seen anyone express an internet address in octal before this discussion, although I remember that it is legal. Hex, yes, but not octal.





  • Yes, but you can write it in different ways. If the numeric string contains a dot, the number to the left of it must be between 0 and 255, and it is put into the highest byte of the address. If the rest also contains a dot, repeat, but put the number into the second-highest byte, and so on.

    BUT: if the rest of the string does not contain a dot, that last number is put into all the remaining bytes.

    So 123.256 is a valid address. The 123 goes into the top byte, the 256 goes into the remaining three bytes, so the address would be 123.0.1.0.

    Most common example is 127.1, which is short for 127.0.0.1 - the localhost address.
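
    The rule above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `parse_ipv4_shorthand` is made up, and it handles decimal parts only, so the hex and octal notations are left out.

```python
def parse_ipv4_shorthand(text: str) -> str:
    """Expand a 1- to 4-part decimal IPv4 string to dotted-quad form.

    Hypothetical helper for illustration; decimal parts only.
    """
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1 to 4 parts, got {len(parts)}")
    if not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError("every part must be a non-empty decimal number")

    *leading, last = [int(p) for p in parts]

    # Every part that is followed by a dot fills exactly one byte.
    for n in leading:
        if n > 255:
            raise ValueError(f"{n} does not fit into one byte")

    # The last part fills all the bytes that are still free.
    remaining = 4 - len(leading)
    if last >= 256 ** remaining:
        raise ValueError(f"{last} does not fit into {remaining} byte(s)")

    address = 0
    for n in leading:
        address = (address << 8) | n
    address = (address << (8 * remaining)) | last

    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(parse_ipv4_shorthand("123.256"))  # 123.0.1.0
print(parse_ipv4_shorthand("127.1"))    # 127.0.0.1
```

    A string without any dot is treated the same way: the single number fills all four bytes.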



  • This depends on what you are actually looking for, and how you are looking for it.

    Do you really need pattern matching, or are you only looking for fixed strings? In the latter case, other tools may be faster.

    If you need a case-insensitive search on a mixed-case data set, make a copy that is all uppercase or all lowercase, and search there.

    If you only search in certain columns, make a copy that only includes these.

    Or import the data into a database.
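
    As a rough sketch of the "make a reduced copy" idea, assuming the data is CSV-like text: the function names `build_search_copy` and `find_fixed` and the sample data are made up for illustration, and whether this actually beats your current tool depends on the data.

```python
import csv
import io


def build_search_copy(csv_text, keep_columns):
    """Build a lowercased copy that only holds the columns worth searching.

    Returns a list of (row_number, fields) tuples; blank lines are skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [
        (row_number, [row[i].lower() for i in keep_columns])
        for row_number, row in enumerate(reader)
        if row
    ]


def find_fixed(search_copy, needle):
    """Plain substring search, no pattern matching involved."""
    needle = needle.lower()
    return [
        row_number
        for row_number, fields in search_copy
        if any(needle in field for field in fields)
    ]


data = "Alice,Berlin,x\nBOB,berlin,y\nCarol,Paris,z\n"
search_copy = build_search_copy(data, keep_columns=[1])  # done once
print(find_fixed(search_copy, "BERLIN"))  # [0, 1]
```

    The copy is built once; every later search then works on less data and needs no case folding of the rows.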




  • The compressing and renumbering seems to be more common with embedded Chinese fonts, where it makes a lot of sense space-wise. But yes, mark and copy the text, paste it into Word or Writer, and you get gibberish. I can’t verify the search, though. And, of course, Google Translate can’t do anything with it, either.




  • The problem lies in the PDFs themselves. Inside are objects that represent lines of glyphs, if you are lucky. A conversion tool can guess which of those lines belong together and produce the text.

    It cannot know any of the intentions behind it, though. Take a numbered list. The first line consists of two line objects: the number plus the . or the ), and the first line of text. The conversion tool can only guess at this point. As the line blocks with the numbers are all to the left of the line blocks with text, this could be a numbered list. Or it could be a table with two columns. Nothing in the PDF gives any hint.

    And that is the easy part. It assumes that the document either uses default fonts or keeps its embedded fonts untouched. If it uses embedded fonts and a PDF optimizer that only embeds the characters actually used and renumbers them, any copy or conversion tool is bound to fail.
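
    To show why the renumbering breaks copying, here is a toy model in Python. It is not how PDF files are actually structured; the functions `subset_and_renumber`, `render` and `naive_copy` are made up, and the "guess" in `naive_copy` is an arbitrary assumption of mine.

```python
def subset_and_renumber(text):
    """Keep only the characters actually used and number them by first use.

    Returns (codes, glyph_table). Toy model, not real PDF structures.
    """
    glyph_table = []   # the embedded "font": one shape per used character
    code_of = {}
    codes = []         # what the page content stores instead of characters
    for ch in text:
        if ch not in code_of:
            glyph_table.append(ch)
            code_of[ch] = len(glyph_table)  # codes start at 1
        codes.append(code_of[ch])
    return codes, glyph_table


def render(codes, glyph_table):
    """A viewer draws the right shapes, because it has the glyph table."""
    return "".join(glyph_table[code - 1] for code in codes)


def naive_copy(codes):
    """A copy tool without a code-to-character map can only guess.

    The guess used here (code 1 is 'A', code 2 is 'B', ...) is arbitrary.
    """
    return "".join(chr(0x40 + code) for code in codes)


codes, glyph_table = subset_and_renumber("hello")
print(codes)                       # [1, 2, 3, 3, 4]
print(render(codes, glyph_table))  # hello
print(naive_copy(codes))           # ABCCD
```

    The page still looks right on screen, but the stored codes no longer say which characters they stand for.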

    The same goes for protected PDFs, where you simply cannot copy the text to begin with.

    And then there are PDFs that just consist of scanned pages. Here you would need OCR software to get something readable out of them.

    PDF is an archival and output format, the end of a process. It is not something to work from.

    Always preserve the original file. Keep it safe. If you change tools, make sure you have a conversion path into something editable. The PDF is for giving away, nothing else.



  • I used an Apple II at school, but that was already the fourth computer I worked with. The first was a single-board computer with an LCD and one kilobyte of RAM, the second was a TI-99/4A with 16 KB, and the third was a C64. I never had a PC running Windows as its main OS, but one of my earlier PCs had Windows 95 as an alternative boot option, for gaming only.