- Jan 06, 2023
-
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
-
- Jan 05, 2023
-
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
-
- Dec 21, 2022
-
-
Sven M. Hallberg authored
The code in pdf.c actually does this already, but there is no reason not to be defensive here. Just for completeness' sake: There is nothing theoretically wrong with having even "earlier changes" (earlychange > 1), but we don't want that.
-
Sven M. Hallberg authored
Returning an empty HBytes was an artefact to satisfy the earlier structure of the grammar and is no longer necessary.
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
Oh. My. God.
-
- Dec 20, 2022
-
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
This replaces the validations on code9 etc. with one continuation that picks the appropriate parser. Also relaxes the parser to allow further output codes after the table is full. Looking at the spec, it seems to me at this time that the requirement for a clear code when the table is full is a requirement on producers of PDF files, but not on the file format itself. As far as I understand, conforming files can be created by a non-conforming process. Note: The implementation uses a slight trick to handle the last code (4095) correctly. Quoting the comment in act_output(): Rather than going through the effort of ensuring that the last code is only updated once, we simply assign one more code as a dummy. So, the table is now 4097 entries in actual size. The last one will receive a bogus update every cycle, so that the last real code does not. This is less work than actually detecting and avoiding the bogus updates.
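A minimal sketch of the dummy-slot trick described above, not the actual code from act_output(). The names struct table, table_init and update are hypothetical, and the sketch tracks only the last character of each entry; it assumes, as the message describes, that each code word patches the entry assigned on the previous round.

```c
#include <stdint.h>

#define MAX_CODES  4096             /* real codes are 0..4095 (12 bits) */
#define TABLE_SIZE (MAX_CODES + 1)  /* plus one dummy slot at index 4096 */
#define FIRST_FREE 258              /* 256 = clear, 257 = eod */

/* Hypothetical, stripped-down table: only the last character per entry. */
struct table {
    uint8_t last[TABLE_SIZE];
    int next;                       /* next code to be assigned */
};

static void table_init(struct table *t)
{
    t->next = FIRST_FREE;
}

/* Called once per code word; 'first' is the first character of the
 * string that the current code word expands to. */
static void update(struct table *t, uint8_t first)
{
    /* Patch the entry that was assigned on the previous round. */
    if (t->next > FIRST_FREE)
        t->last[t->next - 1] = first;

    /* Assign one more entry, dummy included. Once the dummy has been
     * assigned, t->next stays at TABLE_SIZE, so every later patch
     * lands on index 4096 and code 4095 is left alone. */
    if (t->next < TABLE_SIZE)
        t->next++;
}
```

With this layout, code 4095 is patched exactly once, on the round after it is assigned; all later rounds overwrite only the dummy slot.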
-
- Dec 19, 2022
-
-
Sven M. Hallberg authored
Since we don't expose the struct (any more), we might as well pick a simpler name for it.
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
Also removes an unneeded memset.
-
Sven M. Hallberg authored
This avoids creating an HBytes for each and every code word. Instead, the code words are collected into blocks behind each clear code and translated together into a single HBytes per block.
-
Sven M. Hallberg authored
This saves us from allocating and freeing the HBytes that were stored in the table. It should also save memory since it essentially shares common prefixes between codes. The only remaining call to malloc() is the one allocating the global context object itself.
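To illustrate how a table can share common prefixes without per-entry allocation, here is a hedged sketch. It is an assumption about the general technique, not the layout used in the actual commit; struct entry, table_clear, table_add and expand are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: a string is "the string of 'prefix'
 * followed by 'last'", so common prefixes are stored only once. */
struct entry {
    uint16_t prefix;  /* code of the string without its last byte */
    uint16_t len;     /* total length of the string in bytes */
    uint8_t  last;    /* last byte of the string */
};

static struct entry table[4097];    /* statically sized, no malloc() */

/* Reset the table to the 256 single-byte literals. */
static void table_clear(void)
{
    for (int i = 0; i < 256; i++) {
        table[i].prefix = 0;        /* unused for literals */
        table[i].len = 1;
        table[i].last = (uint8_t)i;
    }
}

/* Define 'code' as the string of 'prefix' extended by 'last'. */
static void table_add(int code, int prefix, uint8_t last)
{
    assert(code >= 258 && code < 4097);
    assert(prefix >= 0 && prefix < code);
    table[code].prefix = (uint16_t)prefix;
    table[code].len = (uint16_t)(table[prefix].len + 1);
    table[code].last = last;
}

/* Write the string for 'code' into buf (which must hold at least
 * table[code].len bytes) by walking the prefix chain backwards.
 * Returns the number of bytes written. */
static size_t expand(int code, uint8_t *buf)
{
    size_t n = table[code].len;
    for (size_t i = n; i > 0; i--) {
        buf[i - 1] = table[code].last;
        code = table[code].prefix;
    }
    return n;
}
```

For example, after table_clear(), table_add(258, 'a', 'b') and table_add(259, 258, 'c'), a call to expand(259, buf) fills buf with "abc" while the table holds only one extra byte per entry.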
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
Remember that HBytes itself just wraps a pointer and a size, so this does not significantly enlarge the struct, but it saves a whole bunch of allocation.
-
Sven M. Hallberg authored
No need for it to be part of the exposed interface.
-
Sven M. Hallberg authored
Commit 970f23cf already removed the use of h_butnot(), so there is no need anymore for act_output to handle code = 257 (eod).
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
This includes the global context variable and all semantic actions and validations. Besides being good practice, this makes the "LZW" in their names unnecessary.
-
Sven M. Hallberg authored
The only difference between the codeword and the litspec rules was that the latter validated that code < 258. This has become redundant because they were only still used for eod and clear, both of which have their own specific validation for the code value. Thus the litspec rules and their validations can go.
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
This frees up the more generic name.
-
Sven M. Hallberg authored
This makes LZW_literal redundant and removes the need to use h_butnot() to detect eod.
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
This makes test/valid/lzw.pdf report a decoder failure. Is that file actually valid?
-
Sven M. Hallberg authored
After the previous commit, we no longer need to know the last seen code. The only remaining use was the test whether we have already assigned a code (after clear). We can just as well detect that by inspecting the number of defined codes.
-
Sven M. Hallberg authored
This changes the logic of act_LZW_codeword such that it creates a new table entry after processing each code word, even though it does not yet know the last character. We know that we will discover the last character on the next round, before we need it for any output. In return we can remove all the fumbling around with prev_string. A tiny gripe remains in the fact that HBytes declares its token member const, so technically we are forbidden from filling in the last character after the fact. But also technically, we can sledgehammer-cast the const away, thanks. Also slightly extends coverage of the defensive asserts and exposes a bug (test/valid/lzw.pdf crashes) that I think must have been there before: It seems that we never validate that code words are actually in the defined range!?
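The deferred fill-in of the last character can be sketched as follows. This is an illustrative decoder over an array of already-extracted code words, not the Hammer-based act_LZW_codeword; all names (struct entry, reset, first_byte, decode) are hypothetical, clear and eod codes are simply rejected, and the prefix-chain table layout is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define FIRST_FREE 258   /* 256 = clear, 257 = eod */
#define TABLE_SIZE 4097  /* 4096 codes plus one dummy slot */

struct entry { uint16_t prefix; uint16_t len; uint8_t last; };

static struct entry tab[TABLE_SIZE];
static int next_code;

static void reset(void)
{
    for (int i = 0; i < 256; i++)
        tab[i] = (struct entry){ .prefix = 0, .len = 1, .last = (uint8_t)i };
    next_code = FIRST_FREE;
}

/* The first byte of an entry never depends on its (possibly still
 * unknown) last byte, because non-literal entries have len > 1. */
static uint8_t first_byte(int code)
{
    while (tab[code].len > 1)
        code = tab[code].prefix;
    return tab[code].last;
}

/* Decode n data codes into out; returns bytes written, or -1 on an
 * out-of-range code or insufficient output space. */
static long decode(const int *codes, size_t n, uint8_t *out, size_t cap)
{
    size_t pos = 0;
    reset();
    for (size_t i = 0; i < n; i++) {
        int c = codes[i];
        /* Validate that the code is actually defined. */
        if (c < 0 || c == 256 || c == 257 || c >= next_code || c >= 4096)
            return -1;
        /* Fill in the last byte of the entry created last round. */
        if (next_code > FIRST_FREE)
            tab[next_code - 1].last = first_byte(c);
        /* Emit the string for c by walking its prefix chain. */
        size_t len = tab[c].len;
        if (len > cap - pos)
            return -1;
        int e = c;
        for (size_t k = len; k > 0; k--) {
            out[pos + k - 1] = tab[e].last;
            e = tab[e].prefix;
        }
        pos += len;
        /* Create the next entry now; its last byte is not known yet. */
        if (next_code < TABLE_SIZE) {
            tab[next_code] = (struct entry){
                .prefix = (uint16_t)c, .len = (uint16_t)(len + 1), .last = 0 };
            next_code++;
        }
    }
    return (long)pos;
}
```

One consequence of this ordering is that the case where a code word refers to the entry created on the immediately preceding round needs no special handling: the entry is patched with its own first byte before it is expanded.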
-
Sven M. Hallberg authored
-
Sven M. Hallberg authored
This saves a tiny bit of code dup in updating ctx->old and building the return value.
-
Sven M. Hallberg authored
The only thing missing from act_LZW_codeword is to skip the table update on the first code after a clear. The rest of the relevant code path is virtually identical.
-
Sven M. Hallberg authored
This logically matches the H_ALLOC in act_LZW_literal. NB: We can drop the multiplication by sizeof(uint8_t) because the latter is guaranteed to be 1. If uint8_t exists, CHAR_BIT must equal 8.
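The size argument can be checked at compile time. The sketch below uses C11 _Static_assert and plain malloc() as a stand-in for the arena allocation in the real code; alloc_bytes is a hypothetical helper. The reasoning: uint8_t, if it exists, has exactly 8 bits and no padding, and no object can be smaller than a char, which has at least 8 bits, so a char must have exactly 8 bits and sizeof(uint8_t) must be 1.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Both hold on any platform that provides uint8_t at all. */
_Static_assert(CHAR_BIT == 8, "uint8_t exists, so bytes have 8 bits");
_Static_assert(sizeof(uint8_t) == 1, "uint8_t occupies exactly one byte");

/* Hypothetical helper: allocate n bytes without multiplying by
 * sizeof(uint8_t), which would always be a multiplication by 1. */
static uint8_t *alloc_bytes(size_t n)
{
    return malloc(n);
}
```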
-
Meredith L. Patterson authored
Fix recent instigator crashes. Closes #25, #31, #35, #36, and #37. See merge request !35.
-
- Dec 18, 2022
-
-
Sven M. Hallberg authored
-